All posts by Lennart Poettering

systemd.conf 2016 Over Now

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-2016-over-now.html

systemd.conf 2016 is Over Now!

A few days ago systemd.conf 2016 ended, our
second conference of this kind. I personally enjoyed this conference a
lot: the talks, the atmosphere, the audience, the organization, the
location, they all were excellent!

I’d like to take the opportunity to thanks everybody involved. In
particular I’d like to thank Chris, Daniel, Sandra and Henrike
for organizing the conference, your work was stellar!

I’d also like to thank our sponsors, without which the conference
couldn’t take place like this, of course. In particular I’d like to
thank our gold sponsor, Red Hat, our organizing sponsor Kinvolk, as
well as our silver sponsors CoreOS and Facebook. I’d also like to
thank our bronze sponsors Collabora, OpenSUSE, Pantheon, Pengutronix,
our supporting sponsor Codethink and last but not least our media
sponsor Linux Magazin. Thank you all!

I’d also like to thank the Video Operation Center
(“VOC”)
for their amazing work on live-streaming
the conference and making all talks available on YouTube. It’s amazing
how efficient the VOC is, it’s simply stunning! Thank you guys!

In case you missed this year’s iteration of the conference, please
have a look at our YouTube
Channel
. You’ll
find all of this year’s talks there, as well the ones from last
year. (For example, my welcome talk is available
here). Enjoy!

We hope to see you again next year, for systemd.conf 2017 in Berlin!

systemd.conf 2016 Workshop Tickets Available

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-2016-workshop-tickets-available.html

Tickets for systemd 2016 Workshop day still available!

We still have a number of ticket for the workshop day of systemd.conf
2016
available. If you are a newcomer to
systemd, and would like to learn about various systemd facilities, or
if you already know your way around, but would like to know more: this
is the best chance to do so. The workshop day is the 28th of
September, one day before the main conference, at the betahaus in
Berlin, Germany. The schedule for the day is available
here. There
are five interesting, extensive sessions, run by the systemd hackers
themselves. Who better to learn systemd from, than the folks who wrote
it?

Note that the workshop day and the main conference days require
different tickets. (Also note: there are still a few tickets available for
the main conference!).

Buy a ticket here.

See you in Berlin!

Preliminary systemd.conf 2016 Schedule

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/preliminary-systemdconf-2016-now-available.html

A Preliminary systemd.conf 2016 Schedule is Now Available!

We have just published a first, preliminary version of the
systemd.conf 2016
schedule
. There
is a small number of white slots in the schedule still, because we’re
missing confirmation from a small number of presenters. The missing
talks will be added in as soon as they are confirmed.

The schedule consists of 5 workshops by high-profile speakers during
the workshop day, 22 exciting talks during the main conference days,
followed by one full day of hackfests.

Please sign up for the conference soon! Only a limited number of
tickets are available, hence make sure to secure yours quickly before
they run out! (Last year we sold out.) Please sign up here for the
conference!

FINAL REMINDER! systemd.conf 2016 CfP Ends on Monday!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/final-reminder-systemdconf-2016-cfp-ends-on-monday.html

Please note that the systemd.conf 2016
Call for Participation ends on Monday, on Aug. 1st! Please send
in your talk proposal by then! We’ve already got a good number of
excellent submissions, but we are very interested in yours, too!

We are looking for talks on all facets of systemd: deployment,
maintenance, administration, development. Regardless of whether you
use it in the cloud, on embedded, on IoT, on the desktop, on mobile,
in a container or on the server: we are interested in your
submissions!

In addition to proposals for talks for the main conference, we are
looking for proposals for workshop sessions held during our
Workshop Day (the first day of the conference). The workshop format
consists of a day of 2-3h training sessions, that may cover any
systemd-related topic you’d like. We are both interested in
submissions from the developer community as well as submissions from
organizations making use of systemd! Introductory workshop sessions
are particularly welcome, as the Workshop Day is intended to open up
our conference to newcomers and people who aren’t systemd gurus yet,
but would like to become more fluent.

For further details on the submissions we are looking for and the CfP
process, please consult the CfP
page
and
submit your proposal using the provided form!

ALSO: Please sign up for the conference soon! Only a
limited number of tickets are available, hence make sure to secure
yours quickly before they run out! (Last year we sold out.) Please
sign up here for the
conference!

AND OF COURSE: We are also looking for more sponsors for
systemd.conf! If you are working on systemd-related projects, or make
use of it in your company, please consider becoming a sponsor of
systemd.conf
2016
!
Without our sponsors we couldn’t organize systemd.conf 2016!

Thank you very much, and see you in Berlin!

REMINDER! systemd.conf 2016 CfP Ends in Two Weeks!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/reminder-systemdconf-2016-cfp-ends-in-two-weeks.html

Please note that the systemd.conf 2016
Call for Participation ends in less than two weeks, on Aug. 1st!
Please send in your talk proposal by then! We’ve already got a good
number of excellent submissions, but we are interested in yours even
more!

We are looking for talks on all facets of systemd: deployment,
maintenance, administration, development. Regardless of whether you
use it in the cloud, on embedded, on IoT, on the desktop, on mobile,
in a container or on the server: we are interested in your
submissions!

In addition to proposals for talks for the main conference, we are
looking for proposals for workshop sessions held during our
Workshop Day (the first day of the conference). The workshop format
consists of a day of 2-3h training sessions, that may cover any
systemd-related topic you’d like. We are both interested in
submissions from the developer community as well as submissions from
organizations making use of systemd! Introductory workshop sessions
are particularly welcome, as the Workshop Day is intended to open up
our conference to newcomers and people who aren’t systemd gurus yet,
but would like to become more fluent.

For further details on the submissions we are looking for and the CfP
process, please consult the CfP
page
and
submit your proposal using the provided form!

And keep in mind:

REMINDER: Please sign up for the conference soon! Only a
limited number of tickets are available, hence make sure to secure
yours quickly before they run out! (Last year we sold out.) Please
sign up here for the
conference!

AND OF COURSE: We are also looking for more sponsors for
systemd.conf! If you are working on systemd-related projects, or make
use of it in your company, please consider becoming a sponsor of
systemd.conf
2016
!
Without our sponsors we couldn’t organize systemd.conf 2016!

Thank you very much, and see you in Berlin!

CfP is now open

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/cfp-is-now-open.html

The systemd.conf 2016 Call for Participation is Now Open!

We’d like to invite presentation and workshop proposals for systemd.conf 2016!

The conference will consist of three parts:

  • One day of workshops, consisting of in-depth (2-3hr) training and learning-by-doing sessions (Sept. 28th)
  • Two days of regular talks (Sept. 29th-30th)
  • One day of hackfest (Oct. 1st)

We are now accepting submissions for the first three days: proposals
for workshops, training sessions and regular talks. In particular, we
are looking for sessions including, but not limited to, the following
topics:

  • Use Cases: systemd in today’s and tomorrow’s devices and applications
  • systemd and containers, in the cloud and on servers
  • systemd in distributions
  • Embedded systemd and in IoT
  • systemd on the desktop
  • Networking with systemd
  • … and everything else related to systemd

Please submit your proposals by August 1st, 2016. Notification of acceptance will be sent out 1-2 weeks later.

If submitting a workshop proposal please contact the organizers for more details.

To submit a talk, please visit our CfP submission page.

For further information on systemd.conf 2016, please visit our conference web site.

Announcing systemd.conf 2016

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/announcing-systemdconf-2016.html

Announcing systemd.conf 2016

We are happy to announce the 2016 installment of systemd.conf, the conference of the systemd project!

After our successful first conference 2015 we’d like to repeat the event in 2016 for the second time. The conference will take place on September 28th until October 1st, 2016 at betahaus in Berlin, Germany. The event is a few days before LinuxCon Europe, which also is located in Berlin this year. This year, the conference will consist of two days of presentations, a one-day hackfest and one day of hands-on training sessions.

The website is online now, please visit https://conf.systemd.io/.

Tickets at early-bird prices are available already. Purchase them at https://ti.to/systemdconf/systemdconf-2016.

The Call for Presentations will open soon, we are looking forward to your submissions! A separate announcement will be published as soon as the CfP is open.

systemd.conf 2016 is a organized jointly by the systemd community and kinvolk.io.

We are looking for sponsors! We’ve got early commitments from some of last year’s sponsors: Collabora, Pengutronix & Red Hat. Please see the web site for details about how your company may become a sponsor, too.

If you have any questions, please contact us at [email protected].

Introducing sd-event

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/introducing-sd-event.html

The Event Loop API of libsystemd

When we began working on
systemd we built
it around a hand-written ad-hoc event loop, wrapping Linux
epoll
. The more
our project grew the more we realized the limitations of using raw
epoll:

  • As we used
    timerfd
    for our timer events, each event source cost one file descriptor and
    we had many of them! File descriptors are a scarce resource on UNIX,
    as
    RLIMIT_NOFILE
    is typically set to 1024 or similar, limiting the number of
    available file descriptors per process to 1021, which isn’t
    particularly a lot.

  • Ordering of event dispatching became a nightmare. In many cases, we
    wanted to make sure that a certain kind of event would always be
    dispatched before another kind of event, if both happen at the same
    time. For example, when the last process of a service dies, we might
    be notified about that via a SIGCHLD signal, via an
    sd_notify() “STATUS=”
    message, and via a control group notification. We wanted to get
    these events in the right order, to know when it’s safe to process
    and subsequently release the runtime data systemd keeps about the
    service or process: it shouldn’t be done if there are still events
    about it pending.

  • For each program we added to the systemd project we noticed we were
    adding similar code, over and over again, to work with epoll’s
    complex interfaces. For example, finding the right file descriptor
    and callback function to dispatch an epoll event to, without running
    into invalidated pointer issues is outright difficult and requires
    non-trivial code.

  • Integrating child process watching into our event loops was much
    more complex than one could hope, and even more so if child process
    events should be ordered against each other and unrelated kinds of
    events.

Eventually, we started working on
sd-bus. At
the same time we decided to seize the opportunity, put together a
proper event loop API in C, and then not only port sd-bus on top of
it, but also the rest of systemd. The result of this is
sd-event. After
almost two years of development we declared sd-event stable in systemd
version 221, and published it as official API of libsystemd.

Why?

sd-event.h,
of course, is not the first event loop API around, and it doesn’t
implement any really novel concepts. When we started working on it we
tried to do our homework, and checked the various existing event loop
APIs, maybe looking for candidates to adopt instead of doing our own,
and to learn about the strengths and weaknesses of the various
implementations existing. Ultimately, we found no implementation that
could deliver what we needed, or where it would be easy to add the
missing bits: as usual in the systemd project, we wanted something
that allows us access to all the Linux-specific bits, instead of
limiting itself to the least common denominator of UNIX. We weren’t
looking for an abstraction API, but simply one that makes epoll usable
in system code.

With this blog story I’d like to take the opportunity to introduce you
to sd-event, and explain why it might be a good candidate to adopt as
event loop implementation in your project, too.

So, here are some features it provides:

  • I/O event sources, based on epoll’s file descriptor watching,
    including edge triggered events (EPOLLET). See
    sd_event_add_io(3).

  • Timer event sources, based on timerfd_create(), supporting the
    CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTIME clocks, as well
    as the CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM clocks that
    can resume the system from suspend. When creating timer events a
    required accuracy parameter may be specified which allows coalescing
    of timer events to minimize power consumption. For each clock only a
    single timer file descriptor is kept, and all timer events are
    multiplexed with a priority queue. See
    sd_event_add_time(3).

  • UNIX process signal events, based on
    signalfd(2),
    including full support for real-time signals, and queued
    parameters. See sd_event_add_signal(3).

  • Child process state change events, based on
    waitid(2). See
    sd_event_add_child(3).

  • Static event sources, of three types: defer, post and exit, for
    invoking calls in each event loop, after other event sources or at
    event loop termination. See
    sd_event_add_defer(3).

  • Event sources may be assigned a 64bit priority value, that controls
    the order in which event sources are dispatched if multiple are
    pending simultanously. See
    sd_event_source_set_priority(3).

  • The event loop may automatically send watchdog notification messages
    to the service manager. See sd_event_set_watchdog(3).

  • The event loop may be integrated into foreign event loops, such as
    the GLib one. The event loop API is hence composable, the same way
    the underlying epoll logic is. See
    sd_event_get_fd(3)
    for an example.

  • The API is fully OOM safe.

  • A complete set of documentation in UNIX man page format is
    available, with
    sd-event(3)
    as the entry page.

  • It’s pretty widely available, and requires no extra
    dependencies. Since systemd is built on it, most major distributions
    ship the library in their default install set.

  • After two years of development, and after being used in all of
    systemd’s components, it has received a fair share of testing already,
    even though we only recently decided to declare it stable and turned
    it into a public API.

Note that sd-event has some potential drawbacks too:

  • If portability is essential to you, sd-event is not your best
    option. sd-event is a wrapper around Linux-specific APIs, and that’s
    visible in the API. For example: our event callbacks receive
    structures defined by Linux-specific APIs such as signalfd.

  • It’s a low-level C API, and it doesn’t isolate you from the OS
    underpinnings. While I like to think that it is relatively nice and
    easy to use from C, it doesn’t compromise on exposing the low-level
    functionality. It just fills the gaps in what’s missing between
    epoll, timerfd, signalfd and related concepts, and it does not hide
    that away.

Either way, I believe that sd-event is a great choice when looking for
an event loop API, in particular if you work on system-level software
and embedded, where functionality like timer coalescing or
watchdog support matter.

Getting Started

Here’s a short example how to use sd-event in a simple daemon. In this
example, we’ll not just use sd-event.h, but also sd-daemon.h to
implement a system service.

#include <alloca.h>
#include <endian.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#include <systemd/sd-daemon.h>
#include <systemd/sd-event.h>

static int io_handler(sd_event_source *es, int fd, uint32_t revents, void *userdata) {
        void *buffer;
        ssize_t n;
        int sz;

        /* UDP enforces a somewhat reasonable maximum datagram size of 64K, we can just allocate the buffer on the stack */
        if (ioctl(fd, FIONREAD, &sz) < 0)
                return -errno;
        buffer = alloca(sz);

        n = recv(fd, buffer, sz, 0);
        if (n < 0) {
                if (errno == EAGAIN)
                        return 0;

                return -errno;
        }

        if (n == 5 && memcmp(buffer, "EXIT\n", 5) == 0) {
                /* Request a clean exit */
                sd_event_exit(sd_event_source_get_event(es), 0);
                return 0;
        }

        fwrite(buffer, 1, n, stdout);
        fflush(stdout);
        return 0;
}

int main(int argc, char *argv[]) {
        union {
                struct sockaddr_in in;
                struct sockaddr sa;
        } sa;
        sd_event_source *event_source = NULL;
        sd_event *event = NULL;
        int fd = -1, r;
        sigset_t ss;

        r = sd_event_default(&event);
        if (r < 0)
                goto finish;

        if (sigemptyset(&ss) < 0 ||
            sigaddset(&ss, SIGTERM) < 0 ||
            sigaddset(&ss, SIGINT) < 0) {
                r = -errno;
                goto finish;
        }

        /* Block SIGTERM first, so that the event loop can handle it */
        if (sigprocmask(SIG_BLOCK, &ss, NULL) < 0) {
                r = -errno;
                goto finish;
        }

        /* Let's make use of the default handler and "floating" reference features of sd_event_add_signal() */
        r = sd_event_add_signal(event, NULL, SIGTERM, NULL, NULL);
        if (r < 0)
                goto finish;
        r = sd_event_add_signal(event, NULL, SIGINT, NULL, NULL);
        if (r < 0)
                goto finish;

        /* Enable automatic service watchdog support */
        r = sd_event_set_watchdog(event, true);
        if (r < 0)
                goto finish;

        fd = socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0);
        if (fd < 0) {
                r = -errno;
                goto finish;
        }

        sa.in = (struct sockaddr_in) {
                .sin_family = AF_INET,
                .sin_port = htobe16(7777),
        };
        if (bind(fd, &sa.sa, sizeof(sa)) < 0) {
                r = -errno;
                goto finish;
        }

        r = sd_event_add_io(event, &event_source, fd, EPOLLIN, io_handler, NULL);
        if (r < 0)
                goto finish;

        (void) sd_notifyf(false,
                          "READY=1\n"
                          "STATUS=Daemon startup completed, processing events.");

        r = sd_event_loop(event);

finish:
        event_source = sd_event_source_unref(event_source);
        event = sd_event_unref(event);

        if (fd >= 0)
                (void) close(fd);

        if (r < 0)
                fprintf(stderr, "Failure: %s\n", strerror(-r));

        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}

The example above shows how to write a minimal UDP/IP server, that
listens on port 7777. Whenever a datagram is received it outputs its
contents to STDOUT, unless it is precisely the string EXIT\n in
which case the service exits. The service will react to SIGTERM and
SIGINT and do a clean exit then. It also notifies the service manager
about its completed startup, if it runs under a service
manager. Finally, it sends watchdog keep-alive messages to the service
manager if it asked for that, and if it runs under a service manager.

When run as systemd service this service’s STDOUT will be connected to
the logging framework of course, which means the service can act as a
minimal UDP-based remote logging service.

To compile and link this example, save it as event-example.c, then run:

$ gcc event-example.c -o event-example `pkg-config --cflags --libs libsystemd`

For a first test, simply run the resulting binary from the command
line, and test it against the following netcat command line:

$ nc -u localhost 7777

For the sake of brevity error checking is minimal, and in a real-world
application should, of course, be more comprehensive. However, it
hopefully gets the idea across how to write a daemon that reacts to
external events with sd-event.

For further details on the functions used in the example above, please
consult the manual pages:
sd-event(3),
sd_event_exit(3),
sd_event_source_get_event(3),
sd_event_default(3),
sd_event_add_signal(3),
sd_event_set_watchdog(3),
sd_event_add_io(3),
sd_notifyf(3),
sd_event_loop(3),
sd_event_source_unref(3),
sd_event_unref(3).

Conclusion

So, is this the event loop to end all other event loops? Certainly
not. I actually believe in “event loop plurality”. There are many
reasons for that, but most importantly: sd-event is supposed to be an
event loop suitable for writing a wide range of applications, but it’s
definitely not going to solve all event loop problems. For example,
while the priority logic is important for many usecase it comes with
drawbacks for others: if not used carefully high-priority event
sources can easily starve low-priority event sources. Also, in order
to implement the priority logic, sd-event needs to linearly iterate
through the event structures returned by
epoll_wait(2)
to sort the events by their priority, resulting in worst case
O(n*log(n)) complexity on each event loop wakeup (for n = number of
file descriptors). Then, to implement priorities fully, sd-event only
dispatches a single event before going back to the kernel and asking
for new events. sd-event will hence not provide the theoretically
possible best scalability to huge numbers of file descriptors. Of
course, this could be optimized, by improving epoll, and making it
support how todays’s event loops actually work (after, all, this is
the problem set all event loops that implement priorities — including
GLib’s — have to deal with), but even then: the design of sd-event is focussed on
running one event loop per thread, and it dispatches events strictly
ordered. In many other important usecases a very different design is
preferable: one where events are distributed to a set of worker threads
and are dispatched out-of-order.

Hence, don’t mistake sd-event for what it isn’t. It’s not supposed to
unify everybody on a single event loop. It’s just supposed to be a
very good implementation of an event loop suitable for a large part of
the typical usecases.

Note that our APIs, including
sd-bus, integrate nicely into
sd-event event loops, but do not require it, and may be integrated
into other event loops too, as long as they support watching for time
and I/O events.

And that’s all for now. If you are considering using sd-event for your
project and need help or have questions, please direct them to the
systemd mailing list.

systemd.conf 2015 Summary

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-2015-summary.html

systemd.conf 2015 is Over Now!

Last week our first systemd.conf conference
took place at betahaus, in Berlin, Germany. With almost 100 attendees,
a dense schedule of 23 high-quality talks stuffed into a single track
on just two days, a productive hackfest and numerous consumed
Club-Mates I believe it was quite a success!

If you couldn’t attend the conference, you may watch all talks on our
YouTube
Channel
. The
slides are available
online
,
too.

Many photos from the conference are available on the Google Events
Page
. Enjoy!

I’d specifically like to thank Daniel Mack, Chris Kühl and Nils Magnus
for running the conference, and making sure that it worked out as
smoothly as it did! Thank you very much, you did a fantastic job!

I’d also specifically like to thank the CCC Video Operation
Center
folks for the excellent video coverage of
the conference. Not only did they implement a live-stream for the
entire talks part of the conference, but also cut and uploaded videos
of all talks to our YouTube
Channel

within the same day (in fact, within a few hours after the talks
finished). That’s quite an impressive feat!

The folks from LinuxTag e.V. put a lot of time and energy in the
organization. It was great to see how well this all worked out!
Excellent work!

(BTW, LinuxTag e.V. and the CCC Video Operation Center folks are
willing to help with the organization of Free Software community
events in Germany (and Europe?). Hence, if you need an entity that can
do the financial work and other stuff for your Free Software project’s
conference, consider pinging LinuxTag, they might be willing to
help. Similar, if you are organizing such an event and are thinking
about providing video coverage, consider pinging the the CCC VOC
folks! Both of them get our best recommendations!)

I’d also like to thank our conference
sponsors
!
Specifically, we’d like to thank our Gold Sponsors Red Hat and
CoreOS for their support. We’d also like to thank our Silver
Sponsor Codethink, and our Bronze Sponsors Pengutronix,
Pantheon, Collabora, Endocode, the Linux Foundation,
Samsung and Travelping, as well as our Cooperation Partners
LinuxTag and kinvolk.io, and our Media Partner Golem.de.

Last but not least I’d really like to thank our speakers and attendees
for presenting and participating in the conference. Of course, the
conference we put together specifically for you, and we really hope
you had as much fun at it as we did!

Thank you all for attending, supporting, and organizing systemd.conf
2015
! We are looking forward to seeing you
and working with you again at systemd.conf 2016!

Thanks!

Second Round of systemd.conf 2015 Sponsors

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/second-round-of-systemdconf-2015-sponsors.html

Second Round of systemd.conf 2015 Sponsors

We are happy to announce the second round of systemd.conf
2015
sponsors! In addition to those from
the first
announcement
, we have:

Our second Gold sponsor is Red Hat!

What began as a better way to build software—openness, transparency, collaboration—soon shifted the balance of power in an entire industry. The revolution of choice continues. Today Red Hat® is the world’s leading provider of open source solutions, using a community-powered approach to provide reliable and high-performing cloud, virtualization, storage, Linux®, and middleware technologies.

A Bronze sponsor is Samsung:

From the beginning we have established a very fast pace and are currently one of the biggest and fastest growing modern-technology R&D centers in East-Central Europe.
We have started with designing subsystems for digital satellite television, however, we have quickly expanded the scope of our interest. Currently, it includes advanced systems of digital television, platform convergence, mobile systems, smart solutions, and enterprise solutions.
Also a vital role in our activity plays the quality and certification center, which controls the conformity of Samsung Electronics products with the highest standards of quality and reliability.

A Bronze sponsor is travelping:

Travelping is passionate about networks, communications and devices. We empower our customers to deploy and operate networks using our state of the art products, solutions and services.
Our products and solutions are based on our industry proven physical and virtual appliance platforms. These purpose built platforms ensure best in class performance, scalability and reliability combined with consistent end to end management capabilities.
To build this products, Travelping has developed a own embedded, cross platform Linux distribution called CAROS.io which incorporates the systemd service manager and tools.

A Bronze sponsor is Collabora:

Collabora has over 10 years of experience working with top tier OEMs & silicon manufacturers worldwide to develop products based on Open Source software. Through the use of Open Source technologies and methodologies, Collabora helps clients in multiple market segments gain faster time to market and save millions of dollars in licensing and maintenance costs. Collabora has already brought to market several products relying on systemd extensively.

A Bronze sponsor is Endocode:

Endocode AG. An employee-owned, software engineering company from Berlin. Open Source is our heart and soul.

A Bronze sponsor is the Linux Foundation:

The Linux Foundation advances the growth of Linux and offers its collaborative principles and practices to any endeavor.

We are Cooperating with LinuxTag e.V. on the organization:

LinuxTag is Europe’s leading organizer of Linux and Open Source events. Born of the community and in business for 20 years, we organize LinuxTag, an annual conference and exhibition attracting thousands of visitors. We also participate and cooperate in organizing workshops, tutorials, seminars, and other events together with and for the Open Source community. Selected events include non-profit workshops, the German Kernel Summit at FrOSCon, participation in the Open Tech Summit, and others. We take care of the organizational framework of systemd.conf 2015. LinuxTag e.V. is a non-profit organization and welcomes donations of ideas and workforce.

A Media Partner is Golem:

Golem.de is an up to date online-publication intended for professional computer users. It provides technology insights of the IT and telecommunications industry. Golem.de offers profound and up to date information on significant and trending topics. Online- and IT-Professionals, marketing managers, purchasers, and readers inspired by technology receive substantial information on product, market and branding potentials through tests, interviews und market analysis.

We’d like to thank our sponsors for their support! Without sponsors our conference would not be possible!

The Conference s SOLD OUT since a few weeks. We no longer accept registrations, nor paper submissions.

For further details about systemd.conf consult the conference website.

See the the first round of sponsor announcements!

See you in Berlin!

systemd.conf close to being sold out!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-close-to-being-sold-out.html

Only 14 tickets still available!

systemd.conf 2015 is close to being sold out, there are only 14
tickets left now. If you haven’t bought your ticket yet, now is
the time to do it, because otherwise it will be too late and all
tickets will be gone!

Why attend? At this conference you’ll get to meet everybody who is
involved with the systemd project and learn what they are working on,
and where the project will go next. You’ll hear from major users and
projects working with systemd. It’s the primary forum where you can
make yourself heard and get first hand access to everybody who’s
working on the future of the core Linux userspace!

To get an idea about the schedule, please consult our preliminary
schedule
.

In order to register for the conference, please visit the
registration
page
.

We are still looking for sponsors. If you’d like to join the ranks of
systemd.conf 2015 sponsors, please have a look at our Becoming a
Sponsor
page!

For further details about systemd.conf consult the conference
website
.

Preliminary systemd.conf 2015 Schedule

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/preliminary-systemdconf-2015-schedule.html

A Preliminary systemd.conf 2015 Schedule is Now Online!

We are happy to announce that an initial, preliminary version of the
systemd.conf 2015
schedule
is now
online! (Please ignore that some rows in the schedule link the same
session twice on that page. That’s a bug in the web site CMS we are
working on to fix.)

We got an overwhelming number of high-quality submissions during the
CfP! Because there were so many good talks we really wanted to
accept, we decided to do two full days of talks now, leaving one more
day for the hackfest and BoFs. We also shortened many of the slots, to
make room for more. All in all we now have a schedule packed with
fantastic presentations!

The areas covered range from containers, to system provisioning,
stateless systems, distributed init systems, the kdbus IPC, control
groups, systemd on the desktop, systemd in embedded devices,
configuration management and systemd, and systemd in downstream
distributions.

We’d like to thank everybody who submited a presentation proposal!

Also, don’t forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
Register here!.

We are still looking for sponsors. If you’d like to join the ranks of
systemd.conf 2015 sponsors, please have a look at our Becoming a
Sponsor
page!

For further details about systemd.conf consult the conference
website
.

systemd.conf 2015 CfP REMINDER

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-2015-cfp-reminder.html

LAST REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!

Here’s the last reminder that the systemd.conf 2015 CfP ends on August
31st 11:59:59pm Central European Time (that’s monday next week)! Make
sure to submit your proposals until then!

Please submit your proposals on our
website
!

And don’t forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
Register here!.

For further details about systemd.conf consult the conference website.

First Round of systemd.conf 2015 Sponsors

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/first-round-of-systemdconf-2015-sponsors.html

First Round of systemd.conf 2015 Sponsors

We are happy to announce the first round of systemd.conf
2015
sponsors!

Our first Gold sponsor is CoreOS!

CoreOS develops software for modern infrastructure that delivers a consistent operating environment for distributed applications. CoreOS’s commercial offering, Tectonic, is an enterprise-ready platform that combines Kubernetes and the CoreOS stack to run Linux containers. In addition CoreOS is the creator and maintainer of open source projects such as CoreOS Linux, etcd, fleet, flannel and rkt. The strategies and architectures that influence CoreOS allow companies like Google, Facebook and Twitter to run their services at scale with high resilience. Learn more about CoreOS here https://coreos.com/, Tectonic here, https://tectonic.com/ or follow CoreOS on Twitter @coreoslinux.

A Silver sponsor is Codethink:

Codethink is a software services consultancy, focusing on engineering reliable systems for long-term deployment with open source technologies.

A Bronze sponsor is Pantheon:

Pantheon is a platform for professional website development, testing, and deployment. Supporting Drupal and WordPress, Pantheon runs over 100,000 websites for the world’s top brands, universities, and media organizations on top of over a million containers.

A Bronze sponsor is Pengutronix:

Pengutronix provides consulting, training and development services for Embedded Linux to customers from the industry. The Kernel Team ports Linux to customer hardware and has more than 3100 patches in the official mainline kernel. In addition to lowlevel ports, the Pengutronix Application Team is responsible for board support packages based on PTXdist or Yocto and deals with system integration (this is where systemd plays an important role). The Graphics Team works on accelerated multimedia tasks, based on the Linux kernel, GStreamer, Qt and web technologies.

We’d like to thank our sponsors for their support! Without sponsors our conference would not be possible!

We’ll shortly announce our second round of sponsors, please stay tuned!

If you’d like to join the ranks of systemd.conf 2015 sponsors, please have a look at our Becoming a Sponsor page!

Reminder! The systemd.conf 2015 Call for Presentations ends on monday, August 31st! Please make sure to submit your proposals on the CfP page until then!

Also, don’t forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
Register here!.

For further details about systemd.conf consult the conference website.

systemd.conf 2015 Call for Presentations

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-2015-call-for-presentations.html

REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!

We’d like to remind you that the systemd.conf 2015 Call for Presentations ends
on August 31st! Please submit your presentation proposals before that data
on our website.

We are specifically interested in submissions from projects and vendors building
today’s and tomorrow’s products, services and devices with systemd. We’d like to
learn about the problems you encounter and the benefits you see! Hence, if
you work for a company using systemd, please submit a presentation!

We are also specifically interested in submissions from downstream distribution
maintainers of systemd! If you develop or maintain systemd packages in a
distribution, please submit a presentation reporting about the state, future
and the problems of systemd packaging so that we can improve downstream
collaboration!

And of course, all talks regarding systemd usage in containers, in the cloud,
on servers, on the desktop, in mobile and in embedded are highly welcome! Talks
about systemd networking and kdbus IPC are very welcome too!

Please submit your presentations until August 31st!

And don’t forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
Register here!.

Also, limited travel and entry fee sponsorship is available for community contributors. Please contact us for details!

For further details about the CfP consult the CfP page.

For further details about systemd.conf consult the conference website.

Announcing systemd.conf 2015

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/announcing-systemdconf-2015.html

Announcing systemd.conf 2015

We are happy to announce the inaugural systemd.conf 2015 conference of the systemd project.

The conference takes place November 5th-7th, 2015 in Berlin, Germany.

Only a limited number of tickets are available, hence make sure to sign up quickly.

For further details consult the conference website.

The new sd-bus API of systemd

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/the-new-sd-bus-api-of-systemd.html

With the new v221 release of
systemd

we are declaring the
sd-bus
API shipped with
systemd
stable. sd-bus is our minimal D-Bus
IPC
C library, supporting as
back-ends both classic socket-based D-Bus and
kdbus. The library has been been
part of systemd for a while, but has only been used internally, since
we wanted to have the liberty to still make API changes without
affecting external consumers of the library. However, now we are
confident to commit to a stable API for it, starting with v221.

In this blog story I hope to provide you with a quick overview on
sd-bus, a short reiteration on D-Bus and its concepts, as well as a
few simple examples how to write D-Bus clients and services with it.

What is D-Bus again?

Let’s start with a quick reminder what
D-Bus actually is: it’s a
powerful, generic IPC system for Linux and other operating systems. It
knows concepts like buses, objects, interfaces, methods, signals,
properties. It provides you with fine-grained access control, a rich
type system, discoverability, introspection, monitoring, reliable
multicasting, service activation, file descriptor passing, and
more. There are bindings for numerous programming languages that are
used on Linux.

D-Bus has been a core component of Linux systems since more than 10
years. It is certainly the most widely established high-level local
IPC system on Linux. Since systemd’s inception it has been the IPC
system it exposes its interfaces on. And even before systemd, it was
the IPC system Upstart used to expose its interfaces. It is used by
GNOME, by KDE and by a variety of system components.

D-Bus refers to both a
specification
,
and a reference
implementation
. The
reference implementation provides both a bus server component, as well
as a client library. While there are multiple other, popular
reimplementations of the client library – for both C and other
programming languages –, the only commonly used server side is the
one from the reference implementation. (However, the kdbus project is
working on providing an alternative to this server implementation as a
kernel component.)

D-Bus is mostly used as local IPC, on top of AF_UNIX sockets. However,
the protocol may be used on top of TCP/IP as well. It does not
natively support encryption, hence using D-Bus directly on TCP is
usually not a good idea. It is possible to combine D-Bus with a
transport like ssh in order to secure it. systemd uses this to make
many of its APIs accessible remotely.

A frequently asked question about D-Bus is why it exists at all,
given that AF_UNIX sockets and FIFOs already exist on UNIX and have
been used for a long time successfully. To answer this question let’s
make a comparison with popular web technology of today: what
AF_UNIX/FIFOs are to D-Bus, TCP is to HTTP/REST. While AF_UNIX
sockets/FIFOs only shovel raw bytes between processes, D-Bus defines
actual message encoding and adds concepts like method call
transactions, an object system, security mechanisms, multicasting and
more.

From our 10year+ experience with D-Bus we know today that while there
are some areas where we can improve things (and we are working on
that, both with kdbus and sd-bus), it generally appears to be a very
well designed system, that stood the test of time, aged well and is
widely established. Today, if we’d sit down and design a completely
new IPC system incorporating all the experience and knowledge we
gained with D-Bus, I am sure the result would be very close to what
D-Bus already is.

Or in short: D-Bus is great. If you hack on a Linux project and need a
local IPC, it should be your first choice. Not only because D-Bus is
well designed, but also because there aren’t many alternatives that
can cover similar functionality.

Where does sd-bus fit in?

Let’s discuss why sd-bus exists, how it compares with the other
existing C D-Bus libraries and why it might be a library to consider
for your project.

For C, there are two established, popular D-Bus libraries: libdbus, as
it is shipped in the reference implementation of D-Bus, as well as
GDBus, a component of GLib, the low-level tool library of GNOME.

Of the two libdbus is the much older one, as it was written at the
time the specification was put together. The library was written with
a focus on being portable and to be useful as back-end for higher-level
language bindings. Both of these goals required the API to be very
generic, resulting in a relatively baroque, hard-to-use API that lacks
the bits that make it easy and fun to use from C. It provides the
building blocks, but few tools to actually make it straightforward to
build a house from them. On the other hand, the library is suitable
for most use-cases (for example, it is OOM-safe making it suitable for
writing lowest level system software), and is portable to operating
systems like Windows or more exotic UNIXes.

GDBus
is a much newer implementation. It has been written after considerable
experience with using a GLib/GObject wrapper around libdbus. GDBus is
implemented from scratch, shares no code with libdbus. Its design
differs substantially from libdbus, it contains code generators to
make it specifically easy to expose GObject objects on the bus, or
talking to D-Bus objects as GObject objects. It translates D-Bus data
types to GVariant, which is GLib’s powerful data serialization
format. If you are used to GLib-style programming then you’ll feel
right at home, hacking D-Bus services and clients with it is a lot
simpler than using libdbus.

With sd-bus we now provide a third implementation, sharing no code
with either libdbus or GDBus. For us, the focus was on providing kind
of a middle ground between libdbus and GDBus: a low-level C library
that actually is fun to work with, that has enough syntactic sugar to
make it easy to write clients and services with, but on the other hand
is more low-level than GDBus/GLib/GObject/GVariant. To be able to use
it in systemd’s various system-level components it needed to be
OOM-safe and minimal. Another major point we wanted to focus on was
supporting a kdbus back-end right from the beginning, in addition to
the socket transport of the original D-Bus specification (“dbus1”). In
fact, we wanted to design the library closer to kdbus’ semantics than
to dbus1’s, wherever they are different, but still cover both
transports nicely. In contrast to libdbus or GDBus portability is not
a priority for sd-bus, instead we try to make the best of the Linux
platform and expose specific Linux concepts wherever that is
beneficial. Finally, performance was also an issue (though a secondary
one): neither libdbus nor GDBus will win any speed records. We wanted
to improve on performance (throughput and latency) — but simplicity
and correctness are more important to us. We believe the result of our
work delivers our goals quite nicely: the library is fun to use,
supports kdbus and sockets as back-end, is relatively minimal, and the
performance is substantially
better

than both libdbus and GDBus.

To decide which of the three APIs to use for you C project, here are
short guidelines:

  • If you hack on a GLib/GObject project, GDBus is definitely your
    first choice.

  • If portability to non-Linux kernels — including Windows, Mac OS and
    other UNIXes — is important to you, use either GDBus (which more or
    less means buying into GLib/GObject) or libdbus (which requires a
    lot of manual work).

  • Otherwise, sd-bus would be my recommended choice.

(I am not covering C++ specifically here, this is all about plain C
only. But do note: if you use Qt, then QtDBus is the D-Bus API of
choice, being a wrapper around libdbus.)

Introduction to D-Bus Concepts

To the uninitiated D-Bus usually appears to be a relatively opaque
technology. It uses lots of concepts that appear unnecessarily complex
and redundant on first sight. But actually, they make a lot of
sense. Let’s have a look:

  • A bus is where you look for IPC services. There are usually two
    kinds of buses: a system bus, of which there’s exactly one per
    system, and which is where you’d look for system services; and a
    user bus, of which there’s one per user, and which is where you’d
    look for user services, like the address book service or the mail
    program. (Originally, the user bus was actually a session bus — so
    that you get multiple of them if you log in many times as the same
    user –, and on most setups it still is, but we are working on
    moving things to a true user bus, of which there is only one per
    user on a system, regardless how many times that user happens to
    log in.)

  • A service is a program that offers some IPC API on a bus. A
    service is identified by a name in reverse domain name
    notation. Thus, the org.freedesktop.NetworkManager service on the
    system bus is where NetworkManager’s APIs are available and
    org.freedesktop.login1 on the system bus is where
    systemd-logind‘s APIs are exposed.

  • A client is a program that makes use of some IPC API on a bus. It
    talks to a service, monitors it and generally doesn’t provide any
    services on its own. That said, lines are blurry and many services
    are also clients to other services. Frequently the term peer is
    used as a generalization to refer to either a service or a client.

  • An object path is an identifier for an object on a specific
    service. In a way this is comparable to a C pointer, since that’s
    how you generally reference a C object, if you hack object-oriented
    programs in C. However, C pointers are just memory addresses, and
    passing memory addresses around to other processes would make
    little sense, since they of course refer to the address space of
    the service, the client couldn’t make sense of it. Thus, the D-Bus
    designers came up with the object path concept, which is just a
    string that looks like a file system path. Example:
    /org/freedesktop/login1 is the object path of the ‘manager’
    object of the org.freedesktop.login1 service (which, as we
    remember from above, is still the service systemd-logind
    exposes). Because object paths are structured like file system
    paths they can be neatly arranged in a tree, so that you end up
    with a venerable tree of objects. For example, you’ll find all user
    sessions systemd-logind manages below the
    /org/freedesktop/login1/session sub-tree, for example called
    /org/freedesktop/login1/session/_7,
    /org/freedesktop/login1/session/_55 and so on. How services
    precisely label their objects and arrange them in a tree is
    completely up to the developers of the services.

  • Each object that is identified by an object path has one or more
    interfaces. An interface is a collection of signals, methods, and
    properties (collectively called members), that belong
    together. The concept of a D-Bus interface is actually pretty
    much identical to what you know from programming languages such as
    Java, which also know an interface concept. Which interfaces an
    object implements are up the developers of the service. Interface
    names are in reverse domain name notation, much like service
    names. (Yes, that’s admittedly confusing, in particular since it’s
    pretty common for simpler services to reuse the service name string
    also as an interface name.) A couple of interfaces are standardized
    though and you’ll find them available on many of the objects
    offered by the various services. Specifically, those are
    org.freedesktop.DBus.Introspectable, org.freedesktop.DBus.Peer
    and org.freedesktop.DBus.Properties.

  • An interface can contain methods. The word “method” is more or
    less just a fancy word for “function”, and is a term used pretty
    much the same way in object-oriented languages such as Java. The
    most common interaction between D-Bus peers is that one peer
    invokes one of these methods on another peer and gets a reply. A
    D-Bus method takes a couple of parameters, and returns others. The
    parameters are transmitted in a type-safe way, and the type
    information is included in the introspection data you can query
    from each object. Usually, method names (and the other member
    types) follow a CamelCase syntax. For example, systemd-logind
    exposes an ActivateSession method on the
    org.freedesktop.login1.Manager interface that is available on the
    /org/freedesktop/login1 object of the org.freedesktop.login1
    service.

  • A signature describes a set of parameters a function (or signal,
    property, see below) takes or returns. It’s a series of characters
    that each encode one parameter by its type. The set of types
    available is pretty powerful. For example, there are simpler types
    like s for string, or u for 32bit integer, but also complex
    types such as as for an array of strings or a(sb) for an array
    of structures consisting of one string and one boolean each. See
    the D-Bus specification
    for the full explanation of the type system. The
    ActivateSession method mentioned above takes a single string as
    parameter (the parameter signature is hence s), and returns
    nothing (the return signature is hence the empty string). Of
    course, the signature can get a lot more complex, see below for
    more examples.

  • A signal is another member type that the D-Bus object system
    knows. Much like a method it has a signature. However, they serve
    different purposes. While in a method call a single client issues a
    request on a single service, and that service sends back a response
    to the client, signals are for general notification of
    peers. Services send them out when they want to tell one or more
    peers on the bus that something happened or changed. In contrast to
    method calls and their replies they are hence usually broadcast
    over a bus. While method calls/replies are used for duplex
    one-to-one communication, signals are usually used for simplex
    one-to-many communication (note however that that’s not a
    requirement, they can also be used one-to-one). Example:
    systemd-logind broadcasts a SessionNew signal from its manager
    object each time a user logs in, and a SessionRemoved signal
    every time a user logs out.

  • A property is the third member type that the D-Bus object system
    knows. It’s similar to the property concept known by languages like
    C#. Properties also have a signature, and are more or less just
    variables that an object exposes, that can be read or altered by
    clients. Example: systemd-logind exposes a property Docked of
    the signature b (a boolean). It reflects whether systemd-logind
    thinks the system is currently in a docking station of some form
    (only applies to laptops …).

So much for the various concepts D-Bus knows. Of course, all these new
concepts might be overwhelming. Let’s look at them from a different
perspective. I assume many of the readers have an understanding of
today’s web technology, specifically HTTP and REST. Let’s try to
compare the concept of a HTTP request with the concept of a D-Bus
method call:

  • A HTTP request you issue on a specific network. It could be the
    Internet, or it could be your local LAN, or a company
    VPN. Depending on which network you issue the request on, you’ll be
    able to talk to a different set of servers. This is not unlike the
    “bus” concept of D-Bus.

  • On the network you then pick a specific HTTP server to talk
    to. That’s roughly comparable to picking a service on a specific bus.

  • On the HTTP server you then ask for a specific URL. The “path” part
    of the URL (by which I mean everything after the host name of the
    server, up to the last “/”) is pretty similar to a D-Bus object path.

  • The “file” part of the URL (by which I mean everything after the
    last slash, following the path, as described above), then defines
    the actual call to make. In D-Bus this could be mapped to an
    interface and method name.

  • Finally, the parameters of a HTTP call follow the path after the
    “?”, they map to the signature of the D-Bus call.

Of course, comparing an HTTP request to a D-Bus method call is a bit
comparing apples and oranges. However, I think it’s still useful to
get a bit of a feeling of what maps to what.

From the shell

So much about the concepts and the gray theory behind them. Let’s make
this exciting, let’s actually see how this feels on a real system.

Since a while systemd has included a tool busctl that is useful to
explore and interact with the D-Bus object system. When invoked
without parameters, it will show you a list of all peers connected to
the system bus. (Use --user to see the peers of your user bus
instead):

$ busctl
NAME                                       PID PROCESS         USER             CONNECTION    UNIT                      SESSION    DESCRIPTION
:1.1                                         1 systemd         root             :1.1          -                         -          -
:1.11                                      705 NetworkManager  root             :1.11         NetworkManager.service    -          -
:1.14                                      744 gdm             root             :1.14         gdm.service               -          -
:1.4                                       708 systemd-logind  root             :1.4          systemd-logind.service    -          -
:1.7200                                  17563 busctl          lennart          :1.7200       session-1.scope           1          -
[…]
org.freedesktop.NetworkManager             705 NetworkManager  root             :1.11         NetworkManager.service    -          -
org.freedesktop.login1                     708 systemd-logind  root             :1.4          systemd-logind.service    -          -
org.freedesktop.systemd1                     1 systemd         root             :1.1          -                         -          -
org.gnome.DisplayManager                   744 gdm             root             :1.14         gdm.service               -          -
[…]

(I have shortened the output a bit, to make keep things brief).

The list begins with a list of all peers currently connected to the
bus. They are identified by peer names like “:1.11”. These are called
unique names in D-Bus nomenclature. Basically, every peer has a
unique name, and they are assigned automatically when a peer connects
to the bus. They are much like an IP address if you so will. You’ll
notice that a couple of peers are already connected, including our
little busctl tool itself as well as a number of system services. The
list then shows all actual services on the bus, identified by their
service names (as discussed above; to discern them from the unique
names these are also called well-known names). In many ways
well-known names are similar to DNS host names, i.e. they are a
friendlier way to reference a peer, but on the lower level they just
map to an IP address, or in this comparison the unique name. Much like
you can connect to a host on the Internet by either its host name or
its IP address, you can also connect to a bus peer either by its
unique or its well-known name. (Note that each peer can have as many
well-known names as it likes, much like an IP address can have
multiple host names referring to it).

OK, that’s already kinda cool. Try it for yourself, on your local
machine (all you need is a recent, systemd-based distribution).

Let’s now go the next step. Let’s see which objects the
org.freedesktop.login1 service actually offers:

$ busctl tree org.freedesktop.login1
└─/org/freedesktop/login1
  ├─/org/freedesktop/login1/seat
  │ ├─/org/freedesktop/login1/seat/seat0
  │ └─/org/freedesktop/login1/seat/self
  ├─/org/freedesktop/login1/session
  │ ├─/org/freedesktop/login1/session/_31
  │ └─/org/freedesktop/login1/session/self
  └─/org/freedesktop/login1/user
    ├─/org/freedesktop/login1/user/_1000
    └─/org/freedesktop/login1/user/self

Pretty, isn’t it? What’s actually even nicer, and which the output
does not show is that there’s full command line completion
available: as you press TAB the shell will auto-complete the service
names for you. It’s a real pleasure to explore your D-Bus objects that
way!

The output shows some objects that you might recognize from the
explanations above. Now, let’s go further. Let’s see what interfaces,
methods, signals and properties one of these objects actually exposes:

$ busctl introspect org.freedesktop.login1 /org/freedesktop/login1/session/_31
NAME                                TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable interface -         -                                        -
.Introspect                         method    -         s                                        -
org.freedesktop.DBus.Peer           interface -         -                                        -
.GetMachineId                       method    -         s                                        -
.Ping                               method    -         -                                        -
org.freedesktop.DBus.Properties     interface -         -                                        -
.Get                                method    ss        v                                        -
.GetAll                             method    s         a{sv}                                    -
.Set                                method    ssv       -                                        -
.PropertiesChanged                  signal    sa{sv}as  -                                        -
org.freedesktop.login1.Session      interface -         -                                        -
.Activate                           method    -         -                                        -
.Kill                               method    si        -                                        -
.Lock                               method    -         -                                        -
.PauseDeviceComplete                method    uu        -                                        -
.ReleaseControl                     method    -         -                                        -
.ReleaseDevice                      method    uu        -                                        -
.SetIdleHint                        method    b         -                                        -
.TakeControl                        method    b         -                                        -
.TakeDevice                         method    uu        hb                                       -
.Terminate                          method    -         -                                        -
.Unlock                             method    -         -                                        -
.Active                             property  b         true                                     emits-change
.Audit                              property  u         1                                        const
.Class                              property  s         "user"                                   const
.Desktop                            property  s         ""                                       const
.Display                            property  s         ""                                       const
.Id                                 property  s         "1"                                      const
.IdleHint                           property  b         true                                     emits-change
.IdleSinceHint                      property  t         1434494624206001                         emits-change
.IdleSinceHintMonotonic             property  t         0                                        emits-change
.Leader                             property  u         762                                      const
.Name                               property  s         "lennart"                                const
.Remote                             property  b         false                                    const
.RemoteHost                         property  s         ""                                       const
.RemoteUser                         property  s         ""                                       const
.Scope                              property  s         "session-1.scope"                        const
.Seat                               property  (so)      "seat0" "/org/freedesktop/login1/seat... const
.Service                            property  s         "gdm-autologin"                          const
.State                              property  s         "active"                                 -
.TTY                                property  s         "/dev/tty1"                              const
.Timestamp                          property  t         1434494630344367                         const
.TimestampMonotonic                 property  t         34814579                                 const
.Type                               property  s         "x11"                                    const
.User                               property  (uo)      1000 "/org/freedesktop/login1/user/_1... const
.VTNr                               property  u         1                                        const
.Lock                               signal    -         -                                        -
.PauseDevice                        signal    uus       -                                        -
.ResumeDevice                       signal    uuh       -                                        -
.Unlock                             signal    -         -                                        -

As before, the busctl command supports command line completion, hence
both the service name and the object path used are easily put together
on the shell simply by pressing TAB. The output shows the methods,
properties, signals of one of the session objects that are currently
made available by systemd-logind. There’s a section for each
interface the object knows. The second column tells you what kind of
member is shown in the line. The third column shows the signature of
the member. In case of method calls that’s the input parameters, the
fourth column shows what is returned. For properties, the fourth
column encodes the current value of them.

So far, we just explored. Let’s take the next step now: let’s become
active – let’s call a method:

# busctl call org.freedesktop.login1 /org/freedesktop/login1/session/_31 org.freedesktop.login1.Session Lock

I don’t think I need to mention this anymore, but anyway: again
there’s full command line completion available. The third argument is
the interface name, the fourth the method name, both can be easily
completed by pressing TAB. In this case we picked the Lock method,
which activates the screen lock for the specific session. And yupp,
the instant I pressed enter on this line my screen lock turned on
(this only works on DEs that correctly hook into systemd-logind for
this to work. GNOME works fine, and KDE should work too).

The Lock method call we picked is very simple, as it takes no
parameters and returns none. Of course, it can get more complicated
for some calls. Here’s another example, this time using one of
systemd’s own bus calls, to start an arbitrary system unit:

# busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartUnit ss "cups.service" "replace"
o "/org/freedesktop/systemd1/job/42684"

This call takes two strings as input parameters, as we denote in the
signature string that follows the method name (as usual, command line
completion helps you getting this right). Following the signature the
next two parameters are simply the two strings to pass. The specified
signature string hence indicates what comes next. systemd’s StartUnit
method call takes the unit name to start as first parameter, and the
mode in which to start it as second. The call returned a single object
path value. It is encoded the same way as the input parameter: a
signature (just o for the object path) followed by the actual value.

Of course, some method call parameters can get a ton more complex, but
with busctl it’s relatively easy to encode them all. See the man
page
for
details.

busctl knows a number of other operations. For example, you can use
it to monitor D-Bus traffic as it happens (including generating a
.cap file for use with Wireshark!) or you can set or get specific
properties. However, this blog story was supposed to be about sd-bus,
not busctl, hence let’s cut this short here, and let me direct you
to the man page in case you want to know more about the tool.

busctl (like the rest of system) is implemented using the sd-bus
API. Thus it exposes many of the features of sd-bus itself. For
example, you can use to connect to remote or container buses. It
understands both kdbus and classic D-Bus, and more!

sd-bus

But enough! Let’s get back on topic, let’s talk about sd-bus itself.

The sd-bus set of APIs is mostly contained in the header file
sd-bus.h.

Here’s a random selection of features of the library, that make it
compare well with the other implementations available.

  • Supports both kdbus and dbus1 as back-end.

  • Has high-level support for connecting to remote buses via ssh, and
    to buses of local OS containers.

  • Powerful credential model, to implement authentication of clients
    in services. Currently 34 individual fields are supported, from the
    PID of the client to the cgroup or capability sets.

  • Support for tracking the life-cycle of peers in order to release
    local objects automatically when all peers referencing them
    disconnected.

  • The client builds an efficient decision tree to determine which
    handlers to deliver an incoming bus message to.

  • Automatically translates D-Bus errors into UNIX style errors and
    back (this is lossy though), to ensure best integration of D-Bus
    into low-level Linux programs.

  • Powerful but lightweight object model for exposing local objects on
    the bus. Automatically generates introspection as necessary.

The API is currently not fully documented, but we are working on
completing the set of manual pages. For details
see all pages starting with sd_bus_.

Invoking a Method, from C, with sd-bus

So much about the library in general. Here’s an example for connecting
to the bus and issuing a method call:

#include <stdio.h>
#include <stdlib.h>
#include <systemd/sd-bus.h>

int main(int argc, char *argv[]) {
        sd_bus_error error = SD_BUS_ERROR_NULL;
        sd_bus_message *m = NULL;
        sd_bus *bus = NULL;
        const char *path;
        int r;

        /* Connect to the system bus */
        r = sd_bus_open_system(&bus);
        if (r < 0) {
                fprintf(stderr, "Failed to connect to system bus: %s\n", strerror(-r));
                goto finish;
        }

        /* Issue the method call and store the respons message in m */
        r = sd_bus_call_method(bus,
                               "org.freedesktop.systemd1",           /* service to contact */
                               "/org/freedesktop/systemd1",          /* object path */
                               "org.freedesktop.systemd1.Manager",   /* interface name */
                               "StartUnit",                          /* method name */
                               &error,                               /* object to return error in */
                               &m,                                   /* return message on success */
                               "ss",                                 /* input signature */
                               "cups.service",                       /* first argument */
                               "replace");                           /* second argument */
        if (r < 0) {
                fprintf(stderr, "Failed to issue method call: %s\n", error.message);
                goto finish;
        }

        /* Parse the response message */
        r = sd_bus_message_read(m, "o", &path);
        if (r < 0) {
                fprintf(stderr, "Failed to parse response message: %s\n", strerror(-r));
                goto finish;
        }

        printf("Queued service job as %s.\n", path);

finish:
        sd_bus_error_free(&error);
        sd_bus_message_unref(m);
        sd_bus_unref(bus);

        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}

Save this example as bus-client.c, then build it with:

$ gcc bus-client.c -o bus-client `pkg-config --cflags --libs libsystemd`

This will generate a binary bus-client you can now run. Make sure to
run it as root though, since access to the StartUnit method is
privileged:

# ./bus-client
Queued service job as /org/freedesktop/systemd1/job/3586.

And that’s it already, our first example. It showed how we invoked a
method call on the bus. The actual function call of the method is very
close to the busctl command line we used before. I hope the code
excerpt needs little further explanation. It’s supposed to give you a
taste how to write D-Bus clients with sd-bus. For more more
information please have a look at the header file, the man page or
even the sd-bus sources.

Implementing a Service, in C, with sd-bus

Of course, just calling a single method is a rather simplistic
example. Let’s have a look on how to write a bus service. We’ll write
a small calculator service, that exposes a single object, which
implements an interface that exposes two methods: one to multiply two
64bit signed integers, and one to divide one 64bit signed integer by
another.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <systemd/sd-bus.h>

static int method_multiply(sd_bus_message *m, void *userdata, sd_bus_error *ret_error) {
        int64_t x, y;
        int r;

        /* Read the parameters */
        r = sd_bus_message_read(m, "xx", &x, &y);
        if (r < 0) {
                fprintf(stderr, "Failed to parse parameters: %s\n", strerror(-r));
                return r;
        }

        /* Reply with the response */
        return sd_bus_reply_method_return(m, "x", x * y);
}

static int method_divide(sd_bus_message *m, void *userdata, sd_bus_error *ret_error) {
        int64_t x, y;
        int r;

        /* Read the parameters */
        r = sd_bus_message_read(m, "xx", &x, &y);
        if (r < 0) {
                fprintf(stderr, "Failed to parse parameters: %s\n", strerror(-r));
                return r;
        }

        /* Return an error on division by zero */
        if (y == 0) {
                sd_bus_error_set_const(ret_error, "net.poettering.DivisionByZero", "Sorry, can't allow division by zero.");
                return -EINVAL;
        }

        return sd_bus_reply_method_return(m, "x", x / y);
}

/* The vtable of our little object, implements the net.poettering.Calculator interface */
static const sd_bus_vtable calculator_vtable[] = {
        SD_BUS_VTABLE_START(0),
        SD_BUS_METHOD("Multiply", "xx", "x", method_multiply, SD_BUS_VTABLE_UNPRIVILEGED),
        SD_BUS_METHOD("Divide",   "xx", "x", method_divide,   SD_BUS_VTABLE_UNPRIVILEGED),
        SD_BUS_VTABLE_END
};

int main(int argc, char *argv[]) {
        sd_bus_slot *slot = NULL;
        sd_bus *bus = NULL;
        int r;

        /* Connect to the user bus this time */
        r = sd_bus_open_user(&bus);
        if (r < 0) {
                fprintf(stderr, "Failed to connect to system bus: %s\n", strerror(-r));
                goto finish;
        }

        /* Install the object */
        r = sd_bus_add_object_vtable(bus,
                                     &slot,
                                     "/net/poettering/Calculator",  /* object path */
                                     "net.poettering.Calculator",   /* interface name */
                                     calculator_vtable,
                                     NULL);
        if (r < 0) {
                fprintf(stderr, "Failed to issue method call: %s\n", strerror(-r));
                goto finish;
        }

        /* Take a well-known service name so that clients can find us */
        r = sd_bus_request_name(bus, "net.poettering.Calculator", 0);
        if (r < 0) {
                fprintf(stderr, "Failed to acquire service name: %s\n", strerror(-r));
                goto finish;
        }

        for (;;) {
                /* Process requests */
                r = sd_bus_process(bus, NULL);
                if (r < 0) {
                        fprintf(stderr, "Failed to process bus: %s\n", strerror(-r));
                        goto finish;
                }
                if (r > 0) /* we processed a request, try to process another one, right-away */
                        continue;

                /* Wait for the next request to process */
                r = sd_bus_wait(bus, (uint64_t) -1);
                if (r < 0) {
                        fprintf(stderr, "Failed to wait on bus: %s\n", strerror(-r));
                        goto finish;
                }
        }

finish:
        sd_bus_slot_unref(slot);
        sd_bus_unref(bus);

        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}

Save this example as bus-service.c, then build it with:

$ gcc bus-service.c -o bus-service `pkg-config --cflags --libs libsystemd`

Now, let’s run it:

$ ./bus-service

In another terminal, let’s try to talk to it. Note that this service
is now on the user bus, not on the system bus as before. We do this
for simplicity reasons: on the system bus access to services is
tightly controlled so unprivileged clients cannot request privileged
operations. On the user bus however things are simpler: as only
processes of the user owning the bus can connect no further policy
enforcement will complicate this example. Because the service is on
the user bus, we have to pass the --user switch on the busctl
command line. Let’s start with looking at the service’s object tree.

$ busctl --user tree net.poettering.Calculator
└─/net/poettering/Calculator

As we can see, there’s only a single object on the service, which is
not surprising, given that our code above only registered one. Let’s
see the interfaces and the members this object exposes:

$ busctl --user introspect net.poettering.Calculator /net/poettering/Calculator
NAME                                TYPE      SIGNATURE RESULT/VALUE FLAGS
net.poettering.Calculator           interface -         -            -
.Divide                             method    xx        x            -
.Multiply                           method    xx        x            -
org.freedesktop.DBus.Introspectable interface -         -            -
.Introspect                         method    -         s            -
org.freedesktop.DBus.Peer           interface -         -            -
.GetMachineId                       method    -         s            -
.Ping                               method    -         -            -
org.freedesktop.DBus.Properties     interface -         -            -
.Get                                method    ss        v            -
.GetAll                             method    s         a{sv}        -
.Set                                method    ssv       -            -
.PropertiesChanged                  signal    sa{sv}as  -            -

The sd-bus library automatically added a couple of generic interfaces,
as mentioned above. But the first interface we see is actually the one
we added! It shows our two methods, and both take “xx” (two 64bit
signed integers) as input parameters, and return one “x”. Great! But
does it work?

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Multiply xx 5 7
x 35

Woohoo! We passed the two integers 5 and 7, and the service actually
multiplied them for us and returned a single integer 35! Let’s try the
other method:

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Divide xx 99 17
x 5

Oh, wow! It can even do integer division! Fantastic! But let’s trick
it into dividing by zero:

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Divide xx 43 0
Sorry, can't allow division by zero.

Nice! It detected this nicely and returned a clean error about it. If
you look in the source code example above you’ll see how precisely we
generated the error.

And that’s really all I have for today. Of course, the examples I
showed are short, and I don’t get into detail here on what precisely
each line does. However, this is supposed to be a short introduction
into D-Bus and sd-bus, and it’s already way too long for that …

I hope this blog story was useful to you. If you are interested in
using sd-bus for your own programs, I hope this gets you started. If
you have further questions, check the (incomplete) man pages, and
inquire us on IRC or the systemd mailing list. If you need more
examples, have a look at the systemd source tree, all of systemd’s
many bus services use sd-bus extensively.

Revisiting How We Put Together Linux Systems

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html

In a previous blog story I discussed
Factory Reset, Stateless Systems, Reproducible Systems & Verifiable Systems,
I now want to take the opportunity to explain a bit where we want to
take this with
systemd in the
longer run, and what we want to build out of it. This is going to be a
longer story, so better grab a cold bottle of
Club Mate before you start
reading.

Traditional Linux distributions are built around packaging systems
like RPM or dpkg, and an organization model where upstream developers
and downstream packagers are relatively clearly separated: an upstream
developer writes code, and puts it somewhere online, in a tarball. A
packager than grabs it and turns it into RPMs/DEBs. The user then
grabs these RPMs/DEBs and installs them locally on the system. For a
variety of uses this is a fantastic scheme: users have a large
selection of readily packaged software available, in mostly uniform
packaging, from a single source they can trust. In this scheme the
distribution vets all software it packages, and as long as the user
trusts the distribution all should be good. The distribution takes the
responsibility of ensuring the software is not malicious, of timely
fixing security problems and helping the user if something is wrong.

Upstream Projects

However, this scheme also has a number of problems, and doesn’t fit
many use-cases of our software particularly well. Let’s have a look at
the problems of this scheme for many upstreams:

  • Upstream software vendors are fully dependent on downstream
    distributions to package their stuff. It’s the downstream
    distribution that decides on schedules, packaging details, and how
    to handle support. Often upstream vendors want much faster release
    cycles then the downstream distributions follow.

  • Realistic testing is extremely unreliable and next to
    impossible. Since the end-user can run a variety of different
    package versions together, and expects the software he runs to just
    work on any combination, the test matrix explodes. If upstream tests
    its version on distribution X release Y, then there’s no guarantee
    that that’s the precise combination of packages that the end user
    will eventually run. In fact, it is very unlikely that the end user
    will, since most distributions probably updated a number of
    libraries the package relies on by the time the package ends up being
    made available to the user. The fact that each package can be
    individually updated by the user, and each user can combine library
    versions, plug-ins and executables relatively freely, results in a high
    risk of something going wrong.

  • Since there are so many different distributions in so many different
    versions around, if upstream tries to build and test software for
    them it needs to do so for a large number of distributions, which is
    a massive effort.

  • The distributions are actually quite different in many ways. In
    fact, they are different in a lot of the most basic
    functionality. For example, the path where to put x86-64 libraries
    is different on Fedora and Debian derived systems..

  • Developing software for a number of distributions and versions is
    hard: if you want to do it, you need to actually install them, each
    one of them, manually, and then build your software for each.

  • Since most downstream distributions have strict licensing and
    trademark requirements (and rightly so), any kind of closed source
    software (or otherwise non-free) does not fit into this scheme at
    all.

This all together makes it really hard for many upstreams to work
nicely with the current way how Linux works. Often they try to improve
the situation for them, for example by bundling libraries, to make
their test and build matrices smaller.

System Vendors

The toolbox approach of classic Linux distributions is fantastic for
people who want to put together their individual system, nicely
adjusted to exactly what they need. However, this is not really how
many of today’s Linux systems are built, installed or updated. If you
build any kind of embedded device, a server system, or even user
systems, you frequently do your work based on complete system images,
that are linearly versioned. You build these images somewhere, and
then you replicate them atomically to a larger number of systems. On
these systems, you don’t install or remove packages, you get a defined
set of files, and besides installing or updating the system there are
no ways how to change the set of tools you get.

The current Linux distributions are not particularly good at providing
for this major use-case of Linux. Their strict focus on individual
packages as well as package managers as end-user install and update
tool is incompatible with what many system vendors want.

Users

The classic Linux distribution scheme is frequently not what end users
want, either. Many users are used to app markets like Android, Windows
or iOS/Mac have. Markets are a platform that doesn’t package, build or
maintain software like distributions do, but simply allows users to
quickly find and download the software they need, with the app vendor
responsible for keeping the app updated, secured, and all that on the
vendor’s release cycle. Users tend to be impatient. They want their
software quickly, and the fine distinction between trusting a single
distribution or a myriad of app developers individually is usually not
important for them. The companies behind the marketplaces usually try
to improve this trust problem by providing sand-boxing technologies: as
a replacement for the distribution that audits, vets, builds and
packages the software and thus allows users to trust it to a certain
level, these vendors try to find technical solutions to ensure that
the software they offer for download can’t be malicious.

Existing Approaches To Fix These Problems

Now, all the issues pointed out above are not new, and there are
sometimes quite successful attempts to do something about it. Ubuntu
Apps, Docker, Software Collections, ChromeOS, CoreOS all fix part of
this problem set, usually with a strict focus on one facet of Linux
systems. For example, Ubuntu Apps focus strictly on end user (desktop)
applications, and don’t care about how we built/update/install the OS
itself, or containers. Docker OTOH focuses on containers only, and
doesn’t care about end-user apps. Software Collections tries to focus
on the development environments. ChromeOS focuses on the OS itself,
but only for end-user devices. CoreOS also focuses on the OS, but
only for server systems.

The approaches they find are usually good at specific things, and use
a variety of different technologies, on different layers. However,
none of these projects tried to fix this problems in a generic way,
for all uses, right in the core components of the OS itself.

Linux has come to tremendous successes because its kernel is so
generic: you can build supercomputers and tiny embedded devices out of
it. It’s time we come up with a basic, reusable scheme how to solve
the problem set described above, that is equally generic.

What We Want

The systemd cabal (Kay Sievers, Harald Hoyer, Daniel Mack, Tom
Gundersen, David Herrmann, and yours truly) recently met in Berlin
about all these things, and tried to come up with a scheme that is
somewhat simple, but tries to solve the issues generically, for all
use-cases, as part of the systemd project. All that in a way that is
somewhat compatible with the current scheme of distributions, to allow
a slow, gradual adoption. Also, and that’s something one cannot stress
enough: the toolbox scheme of classic Linux distributions is
actually a good one, and for many cases the right one. However, we
need to make sure we make distributions relevant again for all
use-cases, not just those of highly individualized systems.

Anyway, so let’s summarize what we are trying to do:

  • We want an efficient way that allows vendors to package their
    software (regardless if just an app, or the whole OS) directly for
    the end user, and know the precise combination of libraries and
    packages it will operate with.

  • We want to allow end users and administrators to install these
    packages on their systems, regardless which distribution they have
    installed on it.

  • We want a unified solution that ultimately can cover updates for
    full systems, OS containers, end user apps, programming ABIs, and
    more. These updates shall be double-buffered, (at least). This is an
    absolute necessity if we want to prepare the ground for operating
    systems that manage themselves, that can update safely without
    administrator involvement.

  • We want our images to be trustable (i.e. signed). In fact we want a
    fully trustable OS, with images that can be verified by a full
    trust chain from the firmware (EFI SecureBoot!), through the boot loader, through the
    kernel, and initrd. Cryptographically secure verification of the
    code we execute is relevant on the desktop (like ChromeOS does), but
    also for apps, for embedded devices and even on servers (in a post-Snowden
    world, in particular).

What We Propose

So much about the set of problems, and what we are trying to do. So,
now, let’s discuss the technical bits we came up with:

The scheme we propose is built around the variety of concepts of btrfs
and Linux file system name-spacing. btrfs at this point already has a
large number of features that fit neatly in our concept, and the
maintainers are busy working on a couple of others we want to
eventually make use of.

As first part of our proposal we make heavy use of btrfs sub-volumes and
introduce a clear naming scheme for them. We name snapshots like this:

  • usr:<vendorid>:<architecture>:<version> — This refers to a full
    vendor operating system tree. It’s basically a /usr tree (and no
    other directories), in a specific version, with everything you need to boot
    it up inside it. The <vendorid> field is replaced by some vendor
    identifier, maybe a scheme like
    org.fedoraproject.FedoraWorkstation. The <architecture> field
    specifies a CPU architecture the OS is designed for, for example
    x86-64. The <version> field specifies a specific OS version, for
    example 23.4. An example sub-volume name could hence look like this:
    usr:org.fedoraproject.FedoraWorkstation:x86_64:23.4

  • root:<name>:<vendorid>:<architecture> — This refers to an
    instance of an operating system. Its basically a root directory,
    containing primarily /etc and /var (but possibly more). Sub-volumes
    of this type do not contain a populated /usr tree though. The
    <name> field refers to some instance name (maybe the host name of
    the instance). The other fields are defined as above. An example
    sub-volume name is
    root:revolution:org.fedoraproject.FedoraWorkstation:x86_64.

  • runtime:<vendorid>:<architecture>:<version> — This refers to a
    vendor runtime. A runtime here is supposed to be a set of
    libraries and other resources that are needed to run apps (for the
    concept of apps see below), all in a /usr tree. In this regard this
    is very similar to the usr sub-volumes explained above, however,
    while a usr sub-volume is a full OS and contains everything
    necessary to boot, a runtime is really only a set of
    libraries. You cannot boot it, but you can run apps with it. An
    example sub-volume name is: runtime:org.gnome.GNOME3_20:x86_64:3.20.1

  • framework:<vendorid>:<architecture>:<version> — This is very
    similar to a vendor runtime, as described above, it contains just a
    /usr tree, but goes one step further: it additionally contains all
    development headers, compilers and build tools, that allow
    developing against a specific runtime. For each runtime there should
    be a framework. When you develop against a specific framework in a
    specific architecture, then the resulting app will be compatible
    with the runtime of the same vendor ID and architecture. Example:
    framework:org.gnome.GNOME3_20:x86_64:3.20.1

  • app:<vendorid>:<runtime>:<architecture>:<version> — This
    encapsulates an application bundle. It contains a tree that at
    runtime is mounted to /opt/<vendorid>, and contains all the
    application’s resources. The <vendorid> could be a string like
    org.libreoffice.LibreOffice, the <runtime> refers to one the
    vendor id of one specific runtime the application is built for, for
    example org.gnome.GNOME3_20:3.20.1. The <architecture> and
    <version> refer to the architecture the application is built for,
    and of course its version. Example:
    app:org.libreoffice.LibreOffice:GNOME3_20:x86_64:133

  • home:<user>:<uid>:<gid> — This sub-volume shall refer to the home
    directory of the specific user. The <user> field contains the user
    name, the <uid> and <gid> fields the numeric Unix UIDs and GIDs
    of the user. The idea here is that in the long run the list of
    sub-volumes is sufficient as a user database (but see
    below). Example: home:lennart:1000:1000.

btrfs partitions that adhere to this naming scheme should be clearly
identifiable. It is our intention to introduce a new GPT partition type
ID for this.

How To Use It

After we introduced this naming scheme let’s see what we can build of
this:

  • When booting up a system we mount the root directory from one of the
    root sub-volumes, and then mount /usr from a matching usr
    sub-volume. Matching here means it carries the same <vendor-id>
    and <architecture>. Of course, by default we should pick the
    matching usr sub-volume with the newest version by default.

  • When we boot up an OS container, we do exactly the same as the when
    we boot up a regular system: we simply combine a usr sub-volume
    with a root sub-volume.

  • When we enumerate the system’s users we simply go through the
    list of home snapshots.

  • When a user authenticates and logs in we mount his home
    directory from his snapshot.

  • When an app is run, we set up a new file system name-space, mount the
    app sub-volume to /opt/<vendorid>/, and the appropriate runtime
    sub-volume the app picked to /usr, as well as the user’s
    /home/$USER to its place.

  • When a developer wants to develop against a specific runtime he
    installs the right framework, and then temporarily transitions into
    a name space where /usris mounted from the framework sub-volume, and
    /home/$USER from his own home directory. In this name space he then
    runs his build commands. He can build in multiple name spaces at the
    same time, if he intends to builds software for multiple runtimes or
    architectures at the same time.

Instantiating a new system or OS container (which is exactly the same
in this scheme) just consists of creating a new appropriately named
root sub-volume. Completely naturally you can share one vendor OS
copy in one specific version with a multitude of container instances.

Everything is double-buffered (or actually, n-fold-buffered), because
usr, runtime, framework, app sub-volumes can exist in multiple
versions. Of course, by default the execution logic should always pick
the newest release of each sub-volume, but it is up to the user keep
multiple versions around, and possibly execute older versions, if he
desires to do so. In fact, like on ChromeOS this could even be handled
automatically: if a system fails to boot with a newer snapshot, the
boot loader can automatically revert back to an older version of the
OS.

An Example

Note that in result this allows installing not only multiple end-user
applications into the same btrfs volume, but also multiple operating
systems, multiple system instances, multiple runtimes, multiple
frameworks. Or to spell this out in an example:

Let’s say Fedora, Mageia and ArchLinux all implement this scheme,
and provide ready-made end-user images. Also, the GNOME, KDE, SDL
projects all define a runtime+framework to develop against. Finally,
both LibreOffice and Firefox provide their stuff according to this
scheme. You can now trivially install of these into the same btrfs
volume:

  • usr:org.fedoraproject.WorkStation:x86_64:24.7
  • usr:org.fedoraproject.WorkStation:x86_64:24.8
  • usr:org.fedoraproject.WorkStation:x86_64:24.9
  • usr:org.fedoraproject.WorkStation:x86_64:25beta
  • usr:org.mageia.Client:i386:39.3
  • usr:org.mageia.Client:i386:39.4
  • usr:org.mageia.Client:i386:39.6
  • usr:org.archlinux.Desktop:x86_64:302.7.8
  • usr:org.archlinux.Desktop:x86_64:302.7.9
  • usr:org.archlinux.Desktop:x86_64:302.7.10
  • root:revolution:org.fedoraproject.WorkStation:x86_64
  • root:testmachine:org.fedoraproject.WorkStation:x86_64
  • root:foo:org.mageia.Client:i386
  • root:bar:org.archlinux.Desktop:x86_64
  • runtime:org.gnome.GNOME3_20:x86_64:3.20.1
  • runtime:org.gnome.GNOME3_20:x86_64:3.20.4
  • runtime:org.gnome.GNOME3_20:x86_64:3.20.5
  • runtime:org.gnome.GNOME3_22:x86_64:3.22.0
  • runtime:org.kde.KDE5_6:x86_64:5.6.0
  • framework:org.gnome.GNOME3_22:x86_64:3.22.0
  • framework:org.kde.KDE5_6:x86_64:5.6.0
  • app:org.libreoffice.LibreOffice:GNOME3_20:x86_64:133
  • app:org.libreoffice.LibreOffice:GNOME3_22:x86_64:166
  • app:org.mozilla.Firefox:GNOME3_20:x86_64:39
  • app:org.mozilla.Firefox:GNOME3_20:x86_64:40
  • home:lennart:1000:1000
  • home:hrundivbakshi:1001:1001

In the example above, we have three vendor operating systems
installed. All of them in three versions, and one even in a beta
version. We have four system instances around. Two of them of Fedora,
maybe one of them we usually boot from, the other we run for very
specific purposes in an OS container. We also have the runtimes for
two GNOME releases in multiple versions, plus one for KDE. Then, we
have the development trees for one version of KDE and GNOME around, as
well as two apps, that make use of two releases of the GNOME
runtime. Finally, we have the home directories of two users.

Now, with the name-spacing concepts we introduced above, we can
actually relatively freely mix and match apps and OSes, or develop
against specific frameworks in specific versions on any operating
system. It doesn’t matter if you booted your ArchLinux instance, or
your Fedora one, you can execute both LibreOffice and Firefox just
fine, because at execution time they get matched up with the right
runtime, and all of them are available from all the operating systems
you installed. You get the precise runtime that the upstream vendor of
Firefox/LibreOffice did their testing with. It doesn’t matter anymore
which distribution you run, and which distribution the vendor prefers.

Also, given that the user database is actually encoded in the
sub-volume list, it doesn’t matter which system you boot, the
distribution should be able to find your local users automatically,
without any configuration in /etc/passwd.

Building Blocks

With this naming scheme plus the way how we can combine them on
execution we already came quite far, but how do we actually get these
sub-volumes onto the final machines, and how do we update them? Well,
btrfs has a feature they call “send-and-receive”. It basically allows
you to “diff” two file system versions, and generate a binary
delta. You can generate these deltas on a developer’s machine and then
push them into the user’s system, and he’ll get the exact same
sub-volume too. This is how we envision installation and updating of
operating systems, applications, runtimes, frameworks. At installation
time, we simply deserialize an initial send-and-receive delta into
our btrfs volume, and later, when a new version is released we just
add in the few bits that are new, by dropping in another
send-and-receive delta under a new sub-volume name. And we do it
exactly the same for the OS itself, for a runtime, a framework or an
app. There’s no technical distinction anymore. The underlying
operation for installing apps, runtime, frameworks, vendor OSes, as well
as the operation for updating them is done the exact same way for all.

Of course, keeping multiple full /usr trees around sounds like an
awful lot of waste, after all they will contain a lot of very similar
data, since a lot of resources are shared between distributions,
frameworks and runtimes. However, thankfully btrfs actually is able to
de-duplicate this for us. If we add in a new app snapshot, this simply
adds in the new files that changed. Moreover different runtimes and
operating systems might actually end up sharing the same tree.

Even though the example above focuses primarily on the end-user,
desktop side of things, the concept is also extremely powerful in
server scenarios. For example, it is easy to build your own usr
trees and deliver them to your hosts using this scheme. The usr
sub-volumes are supposed to be something that administrators can put
together. After deserializing them into a couple of hosts, you can
trivially instantiate them as OS containers there, simply by adding a
new root sub-volume for each instance, referencing the usr tree you
just put together. Instantiating OS containers hence becomes as easy
as creating a new btrfs sub-volume. And you can still update the images
nicely, get fully double-buffered updates and everything.

And of course, this scheme also applies great to embedded
use-cases. Regardless if you build a TV, an IVI system or a phone: you
can put together you OS versions as usr trees, and then use
btrfs-send-and-receive facilities to deliver them to the systems, and
update them there.

Many people when they hear the word “btrfs” instantly reply with “is
it ready yet?”. Thankfully, most of the functionality we really need
here is strictly read-only. With the exception of the home
sub-volumes (see below) all snapshots are strictly read-only, and are
delivered as immutable vendor trees onto the devices. They never are
changed. Even if btrfs might still be immature, for this kind of
read-only logic it should be more than good enough.

Note that this scheme also enables doing fat systems: for example,
an installer image could include a Fedora version compiled for x86-64,
one for i386, one for ARM, all in the same btrfs volume. Due to btrfs’
de-duplication they will share as much as possible, and when the image
is booted up the right sub-volume is automatically picked. Something
similar of course applies to the apps too!

This also allows us to implement something that we like to call
Operating-System-As-A-Virus. Installing a new system is little more
than:

  • Creating a new GPT partition table
  • Adding an EFI System Partition (FAT) to it
  • Adding a new btrfs volume to it
  • Deserializing a single usr sub-volume into the btrfs volume
  • Installing a boot loader into the EFI System Partition
  • Rebooting

Now, since the only real vendor data you need is the usr sub-volume,
you can trivially duplicate this onto any block device you want. Let’s
say you are a happy Fedora user, and you want to provide a friend with
his own installation of this awesome system, all on a USB stick. All
you have to do for this is do the steps above, using your installed
usr tree as source to copy. And there you go! And you don’t have to
be afraid that any of your personal data is copied too, as the usr
sub-volume is the exact version your vendor provided you with. Or with
other words: there’s no distinction anymore between installer images
and installed systems. It’s all the same. Installation becomes
replication, not more. Live-CDs and installed systems can be fully
identical.

Note that in this design apps are actually developed against a single,
very specific runtime, that contains all libraries it can link against
(including a specific glibc version!). Any library that is not
included in the runtime the developer picked must be included in the
app itself. This is similar how apps on Android declare one very
specific Android version they are developed against. This greatly
simplifies application installation, as there’s no dependency hell:
each app pulls in one runtime, and the app is actually free to pick
which one, as you can have multiple installed, though only one is used
by each app.

Also note that operating systems built this way will never see
“half-updated” systems, as it is common when a system is updated using
RPM/dpkg. When updating the system the code will either run the old or
the new version, but it will never see part of the old files and part
of the new files. This is the same for apps, runtimes, and frameworks,
too.

Where We Are Now

We are currently working on a lot of the groundwork necessary for
this. This scheme relies on the ability to monopolize the
vendor OS resources in /usr, which is the key of what I described in
Factory Reset, Stateless Systems, Reproducible Systems & Verifiable Systems
a few weeks back. Then, of course, for the full desktop app concept we
need a strong sandbox, that does more than just hiding files from the
file system view. After all with an app concept like the above the
primary interfacing between the executed desktop apps and the rest of the
system is via IPC (which is why we work on kdbus and teach it all
kinds of sand-boxing features), and the kernel itself. Harald Hoyer has
started working on generating the btrfs send-and-receive images based
on Fedora.

Getting to the full scheme will take a while. Currently we have many
of the building blocks ready, but some major items are missing. For
example, we push quite a few problems into btrfs, that other solutions
try to solve in user space. One of them is actually
signing/verification of images. The btrfs maintainers are working on
adding this to the code base, but currently nothing exists. This
functionality is essential though to come to a fully verified system
where a trust chain exists all the way from the firmware to the
apps. Also, to make the home sub-volume scheme fully workable we
actually need encrypted sub-volumes, so that the sub-volume’s
pass-phrase can be used for authenticating users in PAM. This doesn’t
exist either.

Working towards this scheme is a gradual process. Many of the steps we
require for this are useful outside of the grand scheme though, which
means we can slowly work towards the goal, and our users can already
take benefit of what we are working on as we go.

Also, and most importantly, this is not really a departure from
traditional operating systems:

Each app, each OS and each app sees a traditional Unix hierarchy with
/usr, /home, /opt, /var, /etc. It executes in an environment that is
pretty much identical to how it would be run on traditional systems.

There’s no need to fully move to a system that uses only btrfs and
follows strictly this sub-volume scheme. For example, we intend to
provide implicit support for systems that are installed on ext4 or
xfs, or that are put together with traditional packaging tools such as
RPM or dpkg: if the the user tries to install a
runtime/app/framework/os image on a system that doesn’t use btrfs so
far, it can just create a loop-back btrfs image in /var, and push the
data into that. Even us developers will run our stuff like this for a
while, after all this new scheme is not particularly useful for highly
individualized systems, and we developers usually tend to run
systems like that.

Also note that this in no way a departure from packaging systems like
RPM or DEB. Even if the new scheme we propose is used for installing
and updating a specific system, it is RPM/DEB that is used to put
together the vendor OS tree initially. Hence, even in this scheme
RPM/DEB are highly relevant, though not strictly as an end-user tool
anymore, but as a build tool.

So Let’s Summarize Again What We Propose

  • We want a unified scheme, how we can install and update OS images,
    user apps, runtimes and frameworks.

  • We want a unified scheme how you can relatively freely mix OS
    images, apps, runtimes and frameworks on the same system.

  • We want a fully trusted system, where cryptographic verification of
    all executed code can be done, all the way to the firmware, as
    standard feature of the system.

  • We want to allow app vendors to write their programs against very
    specific frameworks, under the knowledge that they will end up being
    executed with the exact same set of libraries chosen.

  • We want to allow parallel installation of multiple OSes and versions
    of them, multiple runtimes in multiple versions, as well as multiple
    frameworks in multiple versions. And of course, multiple apps in
    multiple versions.

  • We want everything double buffered (or actually n-fold buffered), to
    ensure we can reliably update/rollback versions, in particular to
    safely do automatic updates.

  • We want a system where updating a runtime, OS, framework, or OS
    container is as simple as adding in a new snapshot and restarting
    the runtime/OS/framework/OS container.

  • We want a system where we can easily instantiate a number of OS
    instances from a single vendor tree, with zero difference for doing
    this on order to be able to boot it on bare metal/VM or as a
    container.

  • We want to enable Linux to have an open scheme that people can use
    to build app markets and similar schemes, not restricted to a
    specific vendor.

Final Words

I’ll be talking about this at LinuxCon Europe in October. I originally
intended to discuss this at the Linux Plumbers Conference (which I
assumed was the right forum for this kind of major plumbing level
improvement), and at linux.conf.au, but there was no interest in my
session submissions there…

Of course this is all work in progress. These are our current ideas we
are working towards. As we progress we will likely change a number of
things. For example, the precise naming of the sub-volumes might look
very different in the end.

Of course, we are developers of the systemd project. Implementing this
scheme is not just a job for the systemd developers. This is a
reinvention how distributions work, and hence needs great support from
the distributions. We really hope we can trigger some interest by
publishing this proposal now, to get the distributions on board. This
after all is explicitly not supposed to be a solution for one specific
project and one specific vendor product, we care about making this
open, and solving it for the generic case, without cutting corners.

If you have any questions about this, you know how you can reach us
(IRC, mail, G+, …).

The future is going to be awesome!

FUDCON + GNOME.Asia Beijing 2014

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/fudcon-gnomeasia.html

Thanks to the funding from FUDCON I had the chance to attend and
keynote at the combined FUDCON Beijing 2014
and GNOME.Asia 2014 conference in
Beijing, China.

My talk was about systemd’s present and future, what we achieved
and where we are going. In my talk I tried to explain a bit where we
are coming from, and how we changed focus from being purely an init
system, to more being a set of basic building blocks to build an OS
from. Most of the talk I talked about where we still intend to take
systemd, which areas we believe should be covered by systemd, and of
course, also the always difficult question, on where to draw the line
and what clearly is outside of the focus of systemd. The slides of my
talk you find
online
. (No video recording I am aware of, sorry.)

The combined conferences were a lot of fun, and as usual, the best
discussions I had in the hallway track, discussing Linux and
systemd.

A number of pictures of the conference are now
online
. Enjoy!

After the conference I stayed for a few more days in Beijing, doing
a bit of sightseeing. What a fantastic city! The food was amazing, we
tried all kinds of fantastic stuff, from Peking duck, to Bullfrog
Sechuan style. Yummy. And one of those days I am sure I will find the
time to actually sort my photos and put them online, too.

I am really looking forward to the next FUDCON/GNOME.Asia!

Factory Reset, Stateless Systems, Reproducible Systems & Verifiable Systems

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/stateless.html

(Just a small heads-up: I don’t blog as much as I used to, I
nowadays update my Google+
page
a lot more frequently. You might want to subscribe that if
you are interested in more frequent technical updates on what we are
working on.)

In the past weeks we have been working on a couple of features for
systemd
that enable a number of new usecases I’d like to shed some light
on. Taking benefit of the /usr
merge
that a number of distributions have completed we want to
bring runtime behaviour of Linux systems to the next level. With the
/usr merge completed most static vendor-supplied OS data is
found exclusively in /usr, only a few additional bits in
/var and /etc are necessary to make a system
boot. On this we can build to enable a couple of new features:

  1. A mechanism we call Factory Reset shall flush out
    /etc and /var, but keep the vendor-supplied
    /usr, bringing the system back into a well-defined, pristine
    vendor state with no local state or configuration. This functionality
    is useful across the board from servers, to desktops, to embedded
    devices.
  2. A Stateless System goes one step further: a system like
    this never stores /etc or /var on persistent
    storage, but always comes up with pristine vendor state. On systems
    like this every reboot acts as factor reset. This functionality is
    particularly useful for simple containers or systems that boot off the
    network or read-only media, and receive all configuration they need
    during runtime from vendor packages or protocols like DHCP or are
    capable of discovering their parameters automatically from the
    available hardware or periphery.
  3. Reproducible Systems multiply a vendor image into many
    containers or systems. Only local configuration or state is stored
    per-system, while the vendor operating system is pulled in from the
    same, immutable, shared snapshot. Each system hence has its private
    /etc and /var for receiving local configuration,
    however the OS tree in /usr is pulled in via bind mounts (in
    case of containers) or technologies like NFS (in case of physical
    systems), or btrfs snapshots from a golden master image. This is
    particular interesting for containers where the goal is to run
    thousands of container images from the same OS tree. However, it also
    has a number of other usecases, for example thin client systems, which
    can boot the same NFS share a number of times. Furthermore this
    mechanism is useful to implement very simple OS installers, that
    simply unserialize a /usr snapshot into a file system,
    install a boot loader, and reboot.
  4. Verifiable Systems are closely related to stateless
    systems: if the underlying storage technology can cryptographically
    ensure that the vendor-supplied OS is trusted and in a consistent
    state, then it must be made sure that /etc or /var
    are either included in the OS image, or simply unnecessary for booting.

Concepts

A number of Linux-based operating systems have tried to implement
some of the schemes described out above in one way or
another. Particularly interesting are GNOME’s OSTree, CoreOS and Google’s Android and
ChromeOS. They generally found different solutions for the specific
problems you have when implementing schemes like this, sometimes taking
shortcuts that keep only the specific case in mind, and cannot cover
the general purpose. With systemd now being at the core of so many
distributions and deeply involved in bringing up and maintaining the
system we came to the conclusion that we should attempt to add generic
support for setups like this to systemd itself, to open this up for
the general purpose distributions to build on. We decided to focus on
three kinds of systems:

  1. The stateful system, the traditional system as we know it with
    machine-specific /etc, /usr and /var, all
    properly populated.
  2. Startup without a populated /var, but with configured
    /etc. (We will call these volatile systems.)
  3. Startup without either /etc or /var. (We will
    call these stateless systems.)

A factory reset is just a special case of the latter two modes,
where the system boots up without /var and /etc but
the next boot is a normal stateful boot like like the first described
mode. Note that a mode where /etc is flushed, but
/var is not is nothing we intend to cover (why? well, the
user ID question becomes much harder, see below, and we simply saw no
usecase for it worth the trouble).

Problems

Booting up a system without a populated /var is relatively
straight-forward. With a
few lines of tmpfiles configuration
it is possible to populate
/var with its basic structure in a way that is sufficient to
make a system boot cleanly. systemd version 214 and newer ship with
support for this. Of course, support for this scheme in systemd is
only a small part of the solution. While a lot of software
reconstructs the directory hierarchy it needs in /var
automatically, many software does not. In case like this it is
necessary to ship a couple of additional tmpfiles lines that setup up
at boot-time the necessary files or directories in /var to
make the software operate, similar to what RPM or DEB packages would
set up at installation time.

Booting up a system without a populated /etc is a more
difficult task. In /etc we have a lot of configuration bits
that are essential for the system to operate, for example and most
importantly system user and group information in /etc/passwd
and /etc/group. If the system boots up without /etc
there must be a way to replicate the minimal information necessary in
it, so that the system manages to boot up fully.

To make this even more complex, in order to support “offline”
updates of /usr that are replicated into a number of systems
possessing private /etc and /var there needs to be a
way how these directories can be upgraded transparently when
necessary, for example by recreating caches like
/etc/ld.so.cache or adding missing system users to
/etc/passwd on next reboot.

Starting with systemd 215 (yet unreleased, as I type this) we will
ship with a number of features in systemd that make /etc-less
boots functional:

  • A new tool systemd-sysusers as been added. It introduces
    a new drop-in directory /usr/lib/sysusers.d/. Minimal
    descriptions of necessary system users and groups can be placed
    there. Whenever the tool is invoked it will create these users in
    /etc/passwd and /etc/group should they be
    missing. It is only suitable for creating system users and groups, not
    for normal users. It will write to the files directly via the
    appropriate glibc APIs, which is the right thing to do for system
    users. (For normal users no such APIs exist, as the users might be
    stored centrally on LDAP or suchlike, and they are out of focus for
    our usecase.) The major benefit of this tool is that system user
    definition can happen offline: a package simply has to drop in a new
    file to register a user. This makes system user registration
    declarative instead of imperative — which is the way
    how system users are traditionally created from RPM or DEB
    installation scripts. By being declarative it is easy to replicate the
    users on next boot to a number of system instances.

    To make this new
    tool interesting for packaging scripts we make it easy to
    alternatively invoke it during package installation time, thus being a
    good alternative to invocations of useradd -r and
    groupadd -r.

    Some OS designs use a static, fixed user/group list stored in
    /usr as primary database for users/groups, which fixed
    UID/GID mappings. While this works for specific systems, this cannot
    cover the general purpose. As the UID/GID range for system
    users/groups is very small (only containing 998 users and groups on most systems), the
    best has to be made from this space and only UIDs/GIDs necessary on
    the specific system should be allocated. This means allocation has to
    be dynamic and adjust to what is necessary.

    Also note that this tool has
    one very nice feature: in addition to fully dynamic, and fully static
    UID/GID assignment for the users to create, it supports reading
    UID/GID numbers off existing files in /usr, so that vendors
    can make use of setuid/setgid binaries owned by specific users.

  • We also added a default
    user definition list
    which creates the most basic users the system
    and systemd need. Of course, very likely downstream distributions
    might need to alter this default list, add new entries and possibly
    map specific users to particular numeric UIDs.
  • A new condition ConditionNeedsUpdate= has been
    added. With this mechanism it is possible to conditionalize execution
    of services depending on whether /usr is newer than
    /etc or /var. The idea is that various services that
    need to be added into the boot process on upgrades make use of this to
    not delay boot-ups on normal boots, but run as necessary should
    /usr have been update since the last boot. This is
    implemented based on the mtime timestamp of the
    /usr: if the OS has been updated the packaging software
    should touch the directory, thus informing all instances that
    an upgrade of /etc and /var might be necessary.
  • We added a number of service files, that make use of the new
    ConditionNeedsUpdate= switch, and run a couple of services
    after each update. Among them are the aforementiond
    systemd-sysusers tool, as well as services that rebuild the
    udev hardware database, the journal catalog database and the library
    cache in /etc/ld.so.cache.
  • If systemd detects an empty /etc at early boot it will
    now use the unit
    preset
    information to enable all services by default that the
    vendor or packager declared. It will then proceed booting.
  • We added a
    new tmpfiles snippet
    that is able to reconstruct the
    most basic structure of /etc if it is missing.
  • tmpfiles also gained the ability copy entire directory trees into
    place should they be missing. This is particularly useful for copying
    certain essential files or directories into /etc without
    which the system refuses to boot. Currently the most prominent
    candidates for this are /etc/pam.d and
    /etc/dbus-1. In the long run we hope that packages can be
    fixed so that they always work correctly without configuration in
    /etc. Depending on the software this means that they should
    come with compiled-in defaults that just work should their
    configuration file be missing, or that they should fall back to static
    vendor-supplied configuration in /usr that is used whenever
    /etc doesn’t have any configuration. Both the PAM and the
    D-Bus case are probably candidates for the latter. Given that there
    are probably many cases like this we are working with a number of
    folks to introduce a new directory called /usr/share/etc
    (name is not settled yet) to major distributions, that always
    contain the full, original, vendor-supplied configuration of all
    packages. This is very useful here, so that there’s an obvious place
    to copy the original configuration from, but it is also useful
    completely independently as this provides administrators with an easy
    place to diff their own configuration in /etc
    against to see what local changes are in place.
  • We added a new --tmpfs= switch to systemd-nspawn
    to make testing of systems with unpopulated /etc and
    /var easy. For example, to run a fully state-less container, use a command line like this:

    # system-nspawn -D /srv/mycontainer --read-only --tmpfs=/var --tmpfs=/etc -b

    This command line will invoke the container tree stored in
    /srv/mycontainer in a read-only way, but with a (writable)
    tmpfs mounted to /var and /etc. With a very recent
    git snapshot of systemd invoking a Fedora rawhide system should mostly
    work OK, modulo the D-Bus and PAM problems mentioned above. A later
    version of systemd-nspawn is likely to gain a high-level
    switch --mode={stateful|volatile|stateless} that sets
    combines this into simple switches reusing the vocabulary introduced
    earlier.

What’s Next

Pulling this all together we are very close to making boots with
empty /etc and /var on general purpose Linux
operating systems a reality. Of course, while doing the groundwork in
systemd gets us some distance, there’s a lot of work left. Most
importantly: the majority of Linux packages are simply incomptible
with this scheme the way they are currently set up. They do not work
without configuration in /etc or state directories in
/var; they do not drop system user information in
/usr/lib/sysusers.d. However, we believe it’s our job to do
the groundwork, and to start somewhere.

So what does this mean for the next steps? Of course, currently
very little of this is available in any distribution (simply already
because 215 isn’t even released yet). However, this will hopefully
change quickly. As soon as that is accomplished we can start working
on making the other components of the OS work nicely in this
scheme. If you are an upstream developer, please consider making your
software work correctly if /etc and/or /var are not
populated. This means:

  • When you need a state directory in /var and it is missing,
    create it first. If you cannot do that, because you dropped priviliges
    or suchlike, please consider dropping in a tmpfiles snippet that
    creates the directory with the right permissions early at boot, should
    it be missing.
  • When you need configuration files in /etc to work
    properly, consider changing your application to work nicely when these
    files are missing, and automatically fall back to either built-in
    defaults, or to static vendor-supplied configuration files shipped in
    /usr, so that administrators can override configuration in
    /etc but if they don’t the default configuration counts.
  • When you need a system user or group, consider dropping in a file
    into /usr/lib/sysusers.d describing the users. (Currently
    documentation on this is minimal, we will provide more docs on this
    shortly.)

If you are a packager, you can also help on making this all work:

  • Ask upstream to implement what we describe above, possibly even preparing a patch for this.
  • If upstream will not make these changes, then consider dropping in
    tmpfiles snippets that copy the bare minimum of configuration files to
    make your software work from somewhere in /usr into
    /etc.
  • Consider moving from imperative useradd commands in
    packaging scripts, to declarative sysusers files. Ideally,
    this is shipped upstream too, but if that’s not possible then simply
    adding this to packages should be good enough.

Of course, before moving to declarative system user definitions you
should consult with your distribution whether their packaging policy
even allows that. Currently, most distributions will not, so we have
to work to get this changed first.

Anyway, so much about what we have been working on and where we want to take this.

Conclusion

Before we finish, let me stress again why we are doing all
this:

  1. For end-user machines like desktops, tablets or mobile phones, we
    want a generic way to implement factory reset, which the user can make
    use of when the system is broken (saves you support costs), or when he
    wants to sell it and get rid of his private data, and renew that “fresh
    car smell”.
  2. For embedded machines we want a generic way how to reset
    devices. We also want a way how every single boot can be identical to
    a factory reset, in a stateless system design.
  3. For all kinds of systems we want to centralize vendor data in
    /usr so that it can be strictly read-only, and fully
    cryptographically verified as one unit.
  4. We want to enable new kinds of OS installers that simply
    deserialize a vendor OS /usr snapshot into a new file system,
    install a boot loader and reboot, leaving all first-time configuration
    to the next boot.
  5. We want to enable new kinds of OS updaters that build on this, and
    manage a number of vendor OS /usr snapshots in verified states, and
    which can then update /etc and /var simply by
    rebooting into a newer version.
  6. We wanto to scale container setups naturally, by sharing a single
    golden master /usr tree with a large number of instances that
    simply maintain their own private /etc and /var for
    their private configuration and state, while still allowing clean
    updates of /usr.
  7. We want to make thin clients that share /usr across the
    network work by allowing stateless bootups. During all discussions on
    how /usr was to be organized this was fequently mentioned. A
    setup like this so far only worked in very specific cases, with this
    scheme we want to make this work in general case.

Of course, we have no illusions, just doing the groundwork for all
of this in systemd doesn’t make this all a real-life solution
yet. Also, it’s very unlikely that all of Fedora (or any other general
purpose distribution) will support this scheme for all its packages
soon, however, we are quite confident that the idea is convincing,
that we need to start somewhere, and that getting the most core
packages adapted to this shouldn’t be out of reach.

Oh, and of course, the concepts behind this are really not new, we
know that. However, what’s new here is that we try to make them
available in a general purpose OS core, instead of special purpose
systems.

Anyway, let’s get the ball rolling! Late’s make stateless systems a
reality!

And that’s all I have for now. I am sure this leaves a lot of
questions open. If you have any, join us on IRC on #systemd
on freenode or comment on Google+.