systemd.conf 2015 Summary

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-2015-summary.html

systemd.conf 2015 is Over Now!

Last week our first systemd.conf conference
took place at betahaus, in Berlin, Germany. With almost 100 attendees,
a dense schedule of 23 high-quality talks stuffed into a single track
on just two days, a productive hackfest and numerous consumed
Club-Mates, I believe it was quite a success!

If you couldn’t attend the conference, you may watch all talks on our
YouTube Channel. The slides are available online, too.

Many photos from the conference are available on the Google Events
Page. Enjoy!

I’d specifically like to thank Daniel Mack, Chris Kühl and Nils Magnus
for running the conference, and making sure that it worked out as
smoothly as it did! Thank you very much, you did a fantastic job!

I’d also specifically like to thank the CCC Video Operation Center
folks for the excellent video coverage of the conference. Not only did
they implement a live-stream for the entire talks part of the
conference, but they also cut and uploaded videos of all talks to our
YouTube Channel within the same day (in fact, within a few hours after
the talks finished). That’s quite an impressive feat!

The folks from LinuxTag e.V. put a lot of time and energy in the
organization. It was great to see how well this all worked out!
Excellent work!

(BTW, LinuxTag e.V. and the CCC Video Operation Center folks are
willing to help with the organization of Free Software community
events in Germany (and Europe?). Hence, if you need an entity that can
handle the finances and other organizational work for your Free
Software project’s conference, consider pinging LinuxTag, they might
be willing to help. Similarly, if you are organizing such an event and
are thinking about providing video coverage, consider pinging the CCC
VOC folks! Both of them get our best recommendations!)

I’d also like to thank our conference sponsors!
Specifically, we’d like to thank our Gold Sponsors Red Hat and
CoreOS for their support. We’d also like to thank our Silver
Sponsor Codethink, and our Bronze Sponsors Pengutronix,
Pantheon, Collabora, Endocode, the Linux Foundation,
Samsung and Travelping, as well as our Cooperation Partners
LinuxTag and kinvolk.io, and our Media Partner Golem.de.

Last but not least I’d really like to thank our speakers and attendees
for presenting and participating in the conference. Of course, we put
the conference together specifically for you, and we really hope you
had as much fun at it as we did!

Thank you all for attending, supporting, and organizing systemd.conf
2015! We are looking forward to seeing you and working with you again
at systemd.conf 2016!

Thanks!

Second Round of systemd.conf 2015 Sponsors

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/second-round-of-systemdconf-2015-sponsors.html

Second Round of systemd.conf 2015 Sponsors

We are happy to announce the second round of systemd.conf 2015
sponsors! In addition to those from the first announcement, we have:

Our second Gold sponsor is Red Hat!

What began as a better way to build software—openness, transparency, collaboration—soon shifted the balance of power in an entire industry. The revolution of choice continues. Today Red Hat® is the world’s leading provider of open source solutions, using a community-powered approach to provide reliable and high-performing cloud, virtualization, storage, Linux®, and middleware technologies.

A Bronze sponsor is Samsung:

From the beginning we have established a very fast pace and are currently one of the biggest and fastest growing modern-technology R&D centers in East-Central Europe.
We started with designing subsystems for digital satellite television; however, we quickly expanded the scope of our interests. Currently, these include advanced systems of digital television, platform convergence, mobile systems, smart solutions, and enterprise solutions.
A vital role in our activity is also played by the quality and certification center, which verifies the conformity of Samsung Electronics products with the highest standards of quality and reliability.

A Bronze sponsor is travelping:

Travelping is passionate about networks, communications and devices. We empower our customers to deploy and operate networks using our state of the art products, solutions and services.
Our products and solutions are based on our industry-proven physical and virtual appliance platforms. These purpose-built platforms ensure best-in-class performance, scalability and reliability, combined with consistent end-to-end management capabilities.
To build these products, Travelping has developed its own embedded, cross-platform Linux distribution called CAROS.io, which incorporates the systemd service manager and tools.

A Bronze sponsor is Collabora:

Collabora has over 10 years of experience working with top tier OEMs & silicon manufacturers worldwide to develop products based on Open Source software. Through the use of Open Source technologies and methodologies, Collabora helps clients in multiple market segments gain faster time to market and save millions of dollars in licensing and maintenance costs. Collabora has already brought to market several products relying on systemd extensively.

A Bronze sponsor is Endocode:

Endocode AG. An employee-owned, software engineering company from Berlin. Open Source is our heart and soul.

A Bronze sponsor is the Linux Foundation:

The Linux Foundation advances the growth of Linux and offers its collaborative principles and practices to any endeavor.

We are Cooperating with LinuxTag e.V. on the organization:

LinuxTag is Europe’s leading organizer of Linux and Open Source events. Born of the community and in business for 20 years, we organize LinuxTag, an annual conference and exhibition attracting thousands of visitors. We also participate and cooperate in organizing workshops, tutorials, seminars, and other events together with and for the Open Source community. Selected events include non-profit workshops, the German Kernel Summit at FrOSCon, participation in the Open Tech Summit, and others. We take care of the organizational framework of systemd.conf 2015. LinuxTag e.V. is a non-profit organization and welcomes donations of ideas and workforce.

A Media Partner is Golem:

Golem.de is an up-to-date online publication intended for professional computer users. It provides technology insights into the IT and telecommunications industry. Golem.de offers profound and up-to-date information on significant and trending topics. Online and IT professionals, marketing managers, purchasers, and readers inspired by technology receive substantial information on product, market and branding potentials through tests, interviews and market analysis.

We’d like to thank our sponsors for their support! Without sponsors our conference would not be possible!

The conference has been SOLD OUT for a few weeks now. We no longer accept registrations, nor paper submissions.

For further details about systemd.conf consult the conference website.

See the first round of sponsor announcements!

See you in Berlin!

systemd.conf close to being sold out!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-close-to-being-sold-out.html

Only 14 tickets still available!

systemd.conf 2015 is close to being sold out, there are only 14
tickets left now. If you haven’t bought your ticket yet, now is
the time to do it, because otherwise it will be too late and all
tickets will be gone!

Why attend? At this conference you’ll get to meet everybody who is
involved with the systemd project and learn what they are working on,
and where the project will go next. You’ll hear from major users and
projects working with systemd. It’s the primary forum where you can
make yourself heard and get first hand access to everybody who’s
working on the future of the core Linux userspace!

To get an idea about the schedule, please consult our preliminary
schedule.

In order to register for the conference, please visit the registration
page.

We are still looking for sponsors. If you’d like to join the ranks of
systemd.conf 2015 sponsors, please have a look at our Becoming a
Sponsor page!

For further details about systemd.conf consult the conference
website.

Preliminary systemd.conf 2015 Schedule

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/preliminary-systemdconf-2015-schedule.html

A Preliminary systemd.conf 2015 Schedule is Now Online!

We are happy to announce that an initial, preliminary version of the
systemd.conf 2015 schedule is now online! (Please ignore that some
rows in the schedule link the same session twice on that page. That’s
a bug in the website CMS that we are working to fix.)

We got an overwhelming number of high-quality submissions during the
CfP! Because there were so many good talks we really wanted to
accept, we decided to do two full days of talks now, leaving one more
day for the hackfest and BoFs. We also shortened many of the slots, to
make room for more. All in all we now have a schedule packed with
fantastic presentations!

The areas covered range from containers to system provisioning,
stateless systems, distributed init systems, the kdbus IPC, control
groups, systemd on the desktop, systemd in embedded devices,
configuration management and systemd, and systemd in downstream
distributions.

We’d like to thank everybody who submitted a presentation proposal!

Also, don’t forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
Register here!

We are still looking for sponsors. If you’d like to join the ranks of
systemd.conf 2015 sponsors, please have a look at our Becoming a
Sponsor page!

For further details about systemd.conf consult the conference
website.

systemd.conf 2015 CfP REMINDER

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-2015-cfp-reminder.html

LAST REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!

Here’s the last reminder that the systemd.conf 2015 CfP ends on August
31st, 11:59:59pm Central European Time (that’s Monday next week)! Make
sure to submit your proposals by then!

Please submit your proposals on our website!

And don’t forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
Register here!

For further details about systemd.conf consult the conference website.

First Round of systemd.conf 2015 Sponsors

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/first-round-of-systemdconf-2015-sponsors.html

First Round of systemd.conf 2015 Sponsors

We are happy to announce the first round of systemd.conf 2015
sponsors!

Our first Gold sponsor is CoreOS!

CoreOS develops software for modern infrastructure that delivers a consistent operating environment for distributed applications. CoreOS’s commercial offering, Tectonic, is an enterprise-ready platform that combines Kubernetes and the CoreOS stack to run Linux containers. In addition, CoreOS is the creator and maintainer of open source projects such as CoreOS Linux, etcd, fleet, flannel and rkt. The strategies and architectures that influence CoreOS allow companies like Google, Facebook and Twitter to run their services at scale with high resilience. Learn more about CoreOS at https://coreos.com/ and Tectonic at https://tectonic.com/, or follow CoreOS on Twitter @coreoslinux.

A Silver sponsor is Codethink:

Codethink is a software services consultancy, focusing on engineering reliable systems for long-term deployment with open source technologies.

A Bronze sponsor is Pantheon:

Pantheon is a platform for professional website development, testing, and deployment. Supporting Drupal and WordPress, Pantheon runs over 100,000 websites for the world’s top brands, universities, and media organizations on top of over a million containers.

A Bronze sponsor is Pengutronix:

Pengutronix provides consulting, training and development services for Embedded Linux to customers from the industry. The Kernel Team ports Linux to customer hardware and has more than 3100 patches in the official mainline kernel. In addition to low-level ports, the Pengutronix Application Team is responsible for board support packages based on PTXdist or Yocto and deals with system integration (this is where systemd plays an important role). The Graphics Team works on accelerated multimedia tasks, based on the Linux kernel, GStreamer, Qt and web technologies.

We’d like to thank our sponsors for their support! Without sponsors our conference would not be possible!

We’ll announce our second round of sponsors shortly, so please stay tuned!

If you’d like to join the ranks of systemd.conf 2015 sponsors, please have a look at our Becoming a Sponsor page!

Reminder! The systemd.conf 2015 Call for Presentations ends on Monday, August 31st! Please make sure to submit your proposals on the CfP page by then!

Also, don’t forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
Register here!

For further details about systemd.conf consult the conference website.

systemd.conf 2015 Call for Presentations

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/systemdconf-2015-call-for-presentations.html

REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!

We’d like to remind you that the systemd.conf 2015 Call for Presentations ends
on August 31st! Please submit your presentation proposals before that date
on our website.

We are specifically interested in submissions from projects and vendors building
today’s and tomorrow’s products, services and devices with systemd. We’d like to
learn about the problems you encounter and the benefits you see! Hence, if
you work for a company using systemd, please submit a presentation!

We are also specifically interested in submissions from downstream distribution
maintainers of systemd! If you develop or maintain systemd packages in a
distribution, please submit a presentation reporting about the state, future
and the problems of systemd packaging so that we can improve downstream
collaboration!

And of course, all talks regarding systemd usage in containers, in the cloud,
on servers, on the desktop, in mobile and in embedded are highly welcome! Talks
about systemd networking and kdbus IPC are very welcome too!

Please submit your presentations by August 31st!

And don’t forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
Register here!

Also, limited travel and entry fee sponsorship is available for community contributors. Please contact us for details!

For further details about the CfP consult the CfP page.

For further details about systemd.conf consult the conference website.

Announcing systemd.conf 2015

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/announcing-systemdconf-2015.html

Announcing systemd.conf 2015

We are happy to announce the inaugural systemd.conf 2015 conference of the systemd project.

The conference takes place November 5th-7th, 2015 in Berlin, Germany.

Only a limited number of tickets are available, hence make sure to sign up quickly.

For further details consult the conference website.

The new sd-bus API of systemd

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/the-new-sd-bus-api-of-systemd.html

With the new v221 release of systemd we are declaring the sd-bus API
shipped with systemd stable. sd-bus is our minimal D-Bus IPC C
library, supporting as back-ends both classic socket-based D-Bus and
kdbus. The library has been part of systemd for a while, but has only
been used internally, since we wanted the liberty to still make API
changes without affecting external consumers of the library. However,
we are now confident enough to commit to a stable API for it, starting
with v221.

In this blog story I hope to provide you with a quick overview of
sd-bus, a short reiteration of D-Bus and its concepts, as well as a
few simple examples of how to write D-Bus clients and services with it.

What is D-Bus again?

Let’s start with a quick reminder of what D-Bus actually is: it’s a
powerful, generic IPC system for Linux and other operating systems. It
knows concepts like buses, objects, interfaces, methods, signals,
properties. It provides you with fine-grained access control, a rich
type system, discoverability, introspection, monitoring, reliable
multicasting, service activation, file descriptor passing, and
more. There are bindings for numerous programming languages that are
used on Linux.

D-Bus has been a core component of Linux systems for more than 10
years. It is certainly the most widely established high-level local
IPC system on Linux. Since systemd’s inception it has been the IPC
system it exposes its interfaces on. And even before systemd, it was
the IPC system Upstart used to expose its interfaces. It is used by
GNOME, by KDE and by a variety of system components.

D-Bus refers to both a specification, and a reference
implementation. The
reference implementation provides both a bus server component, as well
as a client library. While there are multiple other, popular
reimplementations of the client library – for both C and other
programming languages –, the only commonly used server side is the
one from the reference implementation. (However, the kdbus project is
working on providing an alternative to this server implementation as a
kernel component.)

D-Bus is mostly used as local IPC, on top of AF_UNIX sockets. However,
the protocol may be used on top of TCP/IP as well. It does not
natively support encryption, hence using D-Bus directly on TCP is
usually not a good idea. It is possible to combine D-Bus with a
transport like ssh in order to secure it. systemd uses this to make
many of its APIs accessible remotely.

A frequently asked question about D-Bus is why it exists at all,
given that AF_UNIX sockets and FIFOs already exist on UNIX and have
been used for a long time successfully. To answer this question let’s
make a comparison with popular web technology of today: what
AF_UNIX/FIFOs are to D-Bus, TCP is to HTTP/REST. While AF_UNIX
sockets/FIFOs only shovel raw bytes between processes, D-Bus defines
actual message encoding and adds concepts like method call
transactions, an object system, security mechanisms, multicasting and
more.

From our 10+ years of experience with D-Bus we know today that while
there are some areas where we can improve things (and we are working
on that, both with kdbus and sd-bus), it generally appears to be a
very well-designed system that stood the test of time, aged well and is
widely established. Today, if we’d sit down and design a completely
new IPC system incorporating all the experience and knowledge we
gained with D-Bus, I am sure the result would be very close to what
D-Bus already is.

Or in short: D-Bus is great. If you hack on a Linux project and need a
local IPC, it should be your first choice. Not only because D-Bus is
well designed, but also because there aren’t many alternatives that
can cover similar functionality.

Where does sd-bus fit in?

Let’s discuss why sd-bus exists, how it compares with the other
existing C D-Bus libraries and why it might be a library to consider
for your project.

For C, there are two established, popular D-Bus libraries: libdbus, as
it is shipped in the reference implementation of D-Bus, as well as
GDBus, a component of GLib, the low-level tool library of GNOME.

Of the two, libdbus is the much older one, as it was written at the
time the specification was put together. The library was written with
a focus on portability and on usefulness as a back-end for higher-level
language bindings. Both of these goals required the API to be very
generic, resulting in a relatively baroque, hard-to-use API that lacks
the bits that make it easy and fun to use from C. It provides the
building blocks, but few tools to actually make it straightforward to
build a house from them. On the other hand, the library is suitable
for most use-cases (for example, it is OOM-safe making it suitable for
writing lowest level system software), and is portable to operating
systems like Windows or more exotic UNIXes.

GDBus is a much newer implementation. It was written after
considerable experience with using a GLib/GObject wrapper around
libdbus. GDBus is implemented from scratch and shares no code with
libdbus. Its design differs substantially from libdbus: it contains
code generators to make it specifically easy to expose GObject objects
on the bus, or to talk to D-Bus objects as GObject objects. It
translates D-Bus data
types to GVariant, which is GLib’s powerful data serialization
format. If you are used to GLib-style programming then you’ll feel
right at home, hacking D-Bus services and clients with it is a lot
simpler than using libdbus.

With sd-bus we now provide a third implementation, sharing no code
with either libdbus or GDBus. For us, the focus was on providing kind
of a middle ground between libdbus and GDBus: a low-level C library
that actually is fun to work with, that has enough syntactic sugar to
make it easy to write clients and services with, but on the other hand
is more low-level than GDBus/GLib/GObject/GVariant. To be able to use
it in systemd’s various system-level components it needed to be
OOM-safe and minimal. Another major point we wanted to focus on was
supporting a kdbus back-end right from the beginning, in addition to
the socket transport of the original D-Bus specification (“dbus1”). In
fact, we wanted to design the library closer to kdbus’ semantics than
to dbus1’s, wherever they are different, but still cover both
transports nicely. In contrast to libdbus or GDBus, portability is not
a priority for sd-bus; instead, we try to make the best of the Linux
platform and expose specific Linux concepts wherever that is
beneficial. Finally, performance was also an issue (though a secondary
one): neither libdbus nor GDBus will win any speed records. We wanted
to improve on performance (throughput and latency) — but simplicity
and correctness are more important to us. We believe the result of our
work delivers our goals quite nicely: the library is fun to use,
supports kdbus and sockets as back-end, is relatively minimal, and the
performance is substantially better than both libdbus and GDBus.

To decide which of the three APIs to use for your C project, here are
short guidelines:

  • If you hack on a GLib/GObject project, GDBus is definitely your
    first choice.

  • If portability to non-Linux kernels — including Windows, Mac OS and
    other UNIXes — is important to you, use either GDBus (which more or
    less means buying into GLib/GObject) or libdbus (which requires a
    lot of manual work).

  • Otherwise, sd-bus would be my recommended choice.

(I am not covering C++ specifically here, this is all about plain C
only. But do note: if you use Qt, then QtDBus is the D-Bus API of
choice, being a wrapper around libdbus.)

Introduction to D-Bus Concepts

To the uninitiated D-Bus usually appears to be a relatively opaque
technology. It uses lots of concepts that appear unnecessarily complex
and redundant on first sight. But actually, they make a lot of
sense. Let’s have a look:

  • A bus is where you look for IPC services. There are usually two
    kinds of buses: a system bus, of which there’s exactly one per
    system, and which is where you’d look for system services; and a
    user bus, of which there’s one per user, and which is where you’d
    look for user services, like the address book service or the mail
    program. (Originally, the user bus was actually a session bus — so
    that you get multiple of them if you log in many times as the same
    user –, and on most setups it still is, but we are working on
    moving things to a true user bus, of which there is only one per
    user on a system, regardless how many times that user happens to
    log in.)

  • A service is a program that offers some IPC API on a bus. A
    service is identified by a name in reverse domain name
    notation. Thus, the org.freedesktop.NetworkManager service on the
    system bus is where NetworkManager’s APIs are available and
    org.freedesktop.login1 on the system bus is where
    systemd-logind‘s APIs are exposed.

  • A client is a program that makes use of some IPC API on a bus. It
    talks to a service, monitors it and generally doesn’t provide any
    services on its own. That said, lines are blurry and many services
    are also clients to other services. Frequently the term peer is
    used as a generalization to refer to either a service or a client.

  • An object path is an identifier for an object on a specific
    service. In a way this is comparable to a C pointer, since that’s
    how you generally reference a C object, if you hack object-oriented
    programs in C. However, C pointers are just memory addresses, and
    passing memory addresses around to other processes would make
    little sense, since they of course refer to the address space of
    the service, the client couldn’t make sense of it. Thus, the D-Bus
    designers came up with the object path concept, which is just a
    string that looks like a file system path. Example:
    /org/freedesktop/login1 is the object path of the ‘manager’
    object of the org.freedesktop.login1 service (which, as we
    remember from above, is still the service systemd-logind
    exposes). Because object paths are structured like file system
    paths they can be neatly arranged in a tree, so that you end up
    with a venerable tree of objects. For example, you’ll find all user
    sessions systemd-logind manages below the
    /org/freedesktop/login1/session sub-tree, for example called
    /org/freedesktop/login1/session/_7,
    /org/freedesktop/login1/session/_55 and so on. How services
    precisely label their objects and arrange them in a tree is
    completely up to the developers of the services.

  • Each object that is identified by an object path has one or more
    interfaces. An interface is a collection of signals, methods, and
    properties (collectively called members), that belong
    together. The concept of a D-Bus interface is actually pretty
    much identical to what you know from programming languages such as
    Java, which also know an interface concept. Which interfaces an
    object implements is up to the developers of the service. Interface
    names are in reverse domain name notation, much like service
    names. (Yes, that’s admittedly confusing, in particular since it’s
    pretty common for simpler services to reuse the service name string
    also as an interface name.) A couple of interfaces are standardized
    though and you’ll find them available on many of the objects
    offered by the various services. Specifically, those are
    org.freedesktop.DBus.Introspectable, org.freedesktop.DBus.Peer
    and org.freedesktop.DBus.Properties.

  • An interface can contain methods. The word “method” is more or
    less just a fancy word for “function”, and is a term used pretty
    much the same way in object-oriented languages such as Java. The
    most common interaction between D-Bus peers is that one peer
    invokes one of these methods on another peer and gets a reply. A
    D-Bus method takes a couple of parameters, and returns others. The
    parameters are transmitted in a type-safe way, and the type
    information is included in the introspection data you can query
    from each object. Usually, method names (and the other member
    types) follow a CamelCase syntax. For example, systemd-logind
    exposes an ActivateSession method on the
    org.freedesktop.login1.Manager interface that is available on the
    /org/freedesktop/login1 object of the org.freedesktop.login1
    service.

  • A signature describes a set of parameters a function (or signal,
    property, see below) takes or returns. It’s a series of characters
    that each encode one parameter by its type. The set of types
    available is pretty powerful. For example, there are simpler types
    like s for string, or u for 32bit integer, but also complex
    types such as as for an array of strings or a(sb) for an array
    of structures consisting of one string and one boolean each. See
    the D-Bus specification
    for the full explanation of the type system. The
    ActivateSession method mentioned above takes a single string as
    parameter (the parameter signature is hence s), and returns
    nothing (the return signature is hence the empty string). Of
    course, signatures can get a lot more complex; see below for more
    examples, and see the short code sketch right after this list for
    how a complex signature is assembled with sd-bus.

  • A signal is another member type that the D-Bus object system
    knows. Much like a method it has a signature. However, they serve
    different purposes. While in a method call a single client issues a
    request on a single service, and that service sends back a response
    to the client, signals are for general notification of
    peers. Services send them out when they want to tell one or more
    peers on the bus that something happened or changed. In contrast to
    method calls and their replies they are hence usually broadcast
    over a bus. While method calls/replies are used for duplex
    one-to-one communication, signals are usually used for simplex
    one-to-many communication (note however that that’s not a
    requirement, they can also be used one-to-one). Example:
    systemd-logind broadcasts a SessionNew signal from its manager
    object each time a user logs in, and a SessionRemoved signal
    every time a user logs out.

  • A property is the third member type that the D-Bus object system
    knows. It’s similar to the property concept known by languages like
    C#. Properties also have a signature, and are more or less just
    variables that an object exposes, that can be read or altered by
    clients. Example: systemd-logind exposes a property Docked of
    the signature b (a boolean). It reflects whether systemd-logind
    thinks the system is currently in a docking station of some form
    (only applies to laptops …).
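
Since signatures are easier to grasp in practice, here is a minimal,
hypothetical sketch of how a more complex signature such as a(sb) is
assembled with sd-bus’s message APIs. (This snippet is not from the
original post; the service name, object path, interface and method are
made up for illustration.)

#include <systemd/sd-bus.h>

/* Builds a method call message carrying an array of (string, boolean)
 * structures, i.e. the signature "a(sb)". All names used here are
 * hypothetical. */
static int build_flags_call(sd_bus *bus, sd_bus_message **out) {
        sd_bus_message *m = NULL;
        int r;

        r = sd_bus_message_new_method_call(bus, &m,
                                           "net.example.Demo",   /* hypothetical service name */
                                           "/net/example/Demo",  /* hypothetical object path */
                                           "net.example.Demo",   /* hypothetical interface */
                                           "SetFlags");          /* hypothetical method */
        if (r < 0)
                return r;

        /* For arrays, sd_bus_message_append() expects the number of
         * entries first, followed by the members of each (sb) struct
         * in turn. */
        r = sd_bus_message_append(m, "a(sb)", 2,
                                  "verbose", 1,
                                  "debug",   0);
        if (r < 0) {
                sd_bus_message_unref(m);
                return r;
        }

        *out = m;
        return 0;
}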

So much for the various concepts D-Bus knows. Of course, all these new
concepts might be overwhelming. Let’s look at them from a different
perspective. I assume many of the readers have an understanding of
today’s web technology, specifically HTTP and REST. Let’s try to
compare the concept of an HTTP request with the concept of a D-Bus
method call:

  • An HTTP request is something you issue on a specific network. It could be the
    Internet, or it could be your local LAN, or a company
    VPN. Depending on which network you issue the request on, you’ll be
    able to talk to a different set of servers. This is not unlike the
    “bus” concept of D-Bus.

  • On the network you then pick a specific HTTP server to talk
    to. That’s roughly comparable to picking a service on a specific bus.

  • On the HTTP server you then ask for a specific URL. The “path” part
    of the URL (by which I mean everything after the host name of the
    server, up to the last “/”) is pretty similar to a D-Bus object path.

  • The “file” part of the URL (by which I mean everything after the
    last slash, following the path, as described above), then defines
    the actual call to make. In D-Bus this could be mapped to an
    interface and method name.

  • Finally, the parameters of an HTTP call follow the path after the
    “?”, they map to the signature of the D-Bus call.

Of course, comparing an HTTP request to a D-Bus method call is a bit
like comparing apples and oranges. However, I think it’s still useful
for getting a feeling of what maps to what.

From the shell

So much about the concepts and the gray theory behind them. Let’s make
this exciting, let’s actually see how this feels on a real system.

For a while now systemd has included a tool busctl that is useful to
explore and interact with the D-Bus object system. When invoked
without parameters, it will show you a list of all peers connected to
the system bus. (Use --user to see the peers of your user bus
instead):

$ busctl
NAME                                       PID PROCESS         USER             CONNECTION    UNIT                      SESSION    DESCRIPTION
:1.1                                         1 systemd         root             :1.1          -                         -          -
:1.11                                      705 NetworkManager  root             :1.11         NetworkManager.service    -          -
:1.14                                      744 gdm             root             :1.14         gdm.service               -          -
:1.4                                       708 systemd-logind  root             :1.4          systemd-logind.service    -          -
:1.7200                                  17563 busctl          lennart          :1.7200       session-1.scope           1          -
[…]
org.freedesktop.NetworkManager             705 NetworkManager  root             :1.11         NetworkManager.service    -          -
org.freedesktop.login1                     708 systemd-logind  root             :1.4          systemd-logind.service    -          -
org.freedesktop.systemd1                     1 systemd         root             :1.1          -                         -          -
org.gnome.DisplayManager                   744 gdm             root             :1.14         gdm.service               -          -
[…]

(I have shortened the output a bit, to keep things brief.)

The list begins with a list of all peers currently connected to the
bus. They are identified by peer names like “:1.11”. These are called
unique names in D-Bus nomenclature. Basically, every peer has a
unique name, and they are assigned automatically when a peer connects
to the bus. They are much like IP addresses, if you will. You’ll
notice that a couple of peers are already connected, including our
little busctl tool itself as well as a number of system services. The
list then shows all actual services on the bus, identified by their
service names (as discussed above; to discern them from the unique
names these are also called well-known names). In many ways
well-known names are similar to DNS host names, i.e. they are a
friendlier way to reference a peer, but on the lower level they just
map to an IP address, or in this comparison the unique name. Much like
you can connect to a host on the Internet by either its host name or
its IP address, you can also connect to a bus peer either by its
unique or its well-known name. (Note that each peer can have as many
well-known names as it likes, much like an IP address can have
multiple host names referring to it).

OK, that’s already kinda cool. Try it for yourself, on your local
machine (all you need is a recent, systemd-based distribution).

Let’s now go the next step. Let’s see which objects the
org.freedesktop.login1 service actually offers:

$ busctl tree org.freedesktop.login1
└─/org/freedesktop/login1
  ├─/org/freedesktop/login1/seat
  │ ├─/org/freedesktop/login1/seat/seat0
  │ └─/org/freedesktop/login1/seat/self
  ├─/org/freedesktop/login1/session
  │ ├─/org/freedesktop/login1/session/_31
  │ └─/org/freedesktop/login1/session/self
  └─/org/freedesktop/login1/user
    ├─/org/freedesktop/login1/user/_1000
    └─/org/freedesktop/login1/user/self

Pretty, isn’t it? What’s actually even nicer, though the output does
not show it, is that there’s full command line completion
available: as you press TAB the shell will auto-complete the service
names for you. It’s a real pleasure to explore your D-Bus objects that
way!

The output shows some objects that you might recognize from the
explanations above. Now, let’s go further. Let’s see what interfaces,
methods, signals and properties one of these objects actually exposes:

$ busctl introspect org.freedesktop.login1 /org/freedesktop/login1/session/_31
NAME                                TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable interface -         -                                        -
.Introspect                         method    -         s                                        -
org.freedesktop.DBus.Peer           interface -         -                                        -
.GetMachineId                       method    -         s                                        -
.Ping                               method    -         -                                        -
org.freedesktop.DBus.Properties     interface -         -                                        -
.Get                                method    ss        v                                        -
.GetAll                             method    s         a{sv}                                    -
.Set                                method    ssv       -                                        -
.PropertiesChanged                  signal    sa{sv}as  -                                        -
org.freedesktop.login1.Session      interface -         -                                        -
.Activate                           method    -         -                                        -
.Kill                               method    si        -                                        -
.Lock                               method    -         -                                        -
.PauseDeviceComplete                method    uu        -                                        -
.ReleaseControl                     method    -         -                                        -
.ReleaseDevice                      method    uu        -                                        -
.SetIdleHint                        method    b         -                                        -
.TakeControl                        method    b         -                                        -
.TakeDevice                         method    uu        hb                                       -
.Terminate                          method    -         -                                        -
.Unlock                             method    -         -                                        -
.Active                             property  b         true                                     emits-change
.Audit                              property  u         1                                        const
.Class                              property  s         "user"                                   const
.Desktop                            property  s         ""                                       const
.Display                            property  s         ""                                       const
.Id                                 property  s         "1"                                      const
.IdleHint                           property  b         true                                     emits-change
.IdleSinceHint                      property  t         1434494624206001                         emits-change
.IdleSinceHintMonotonic             property  t         0                                        emits-change
.Leader                             property  u         762                                      const
.Name                               property  s         "lennart"                                const
.Remote                             property  b         false                                    const
.RemoteHost                         property  s         ""                                       const
.RemoteUser                         property  s         ""                                       const
.Scope                              property  s         "session-1.scope"                        const
.Seat                               property  (so)      "seat0" "/org/freedesktop/login1/seat... const
.Service                            property  s         "gdm-autologin"                          const
.State                              property  s         "active"                                 -
.TTY                                property  s         "/dev/tty1"                              const
.Timestamp                          property  t         1434494630344367                         const
.TimestampMonotonic                 property  t         34814579                                 const
.Type                               property  s         "x11"                                    const
.User                               property  (uo)      1000 "/org/freedesktop/login1/user/_1... const
.VTNr                               property  u         1                                        const
.Lock                               signal    -         -                                        -
.PauseDevice                        signal    uus       -                                        -
.ResumeDevice                       signal    uuh       -                                        -
.Unlock                             signal    -         -                                        -

As before, the busctl command supports command line completion, hence
both the service name and the object path used are easily put together
on the shell simply by pressing TAB. The output shows the methods,
properties, signals of one of the session objects that are currently
made available by systemd-logind. There’s a section for each
interface the object knows. The second column tells you what kind of
member is shown in the line. The third column shows the signature of
the member. In the case of method calls that’s the input parameters,
and the fourth column shows what is returned. For properties, the
fourth column encodes their current value.

So far, we just explored. Let’s take the next step now: let’s become
active – let’s call a method:

# busctl call org.freedesktop.login1 /org/freedesktop/login1/session/_31 org.freedesktop.login1.Session Lock

I don’t think I need to mention this anymore, but anyway: again
there’s full command line completion available. The third argument is
the interface name, the fourth the method name, both can be easily
completed by pressing TAB. In this case we picked the Lock method,
which activates the screen lock for the specific session. And yup,
the instant I pressed enter on this line my screen lock turned on
(this only works on DEs that correctly hook into systemd-logind.
GNOME works fine, and KDE should work too).

The Lock method call we picked is very simple, as it takes no
parameters and returns none. Of course, it can get more complicated
for some calls. Here’s another example, this time using one of
systemd’s own bus calls, to start an arbitrary system unit:

# busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartUnit ss "cups.service" "replace"
o "/org/freedesktop/systemd1/job/42684"

This call takes two strings as input parameters, as we denote in the
signature string that follows the method name (as usual, command line
completion helps you get this right). Following the signature, the
next two parameters are simply the two strings to pass. The specified
signature string hence indicates what comes next. systemd’s StartUnit
method call takes the unit name to start as first parameter, and the
mode in which to start it as second. The call returned a single object
path value. It is encoded the same way as the input parameter: a
signature (just o for the object path) followed by the actual value.

Of course, some method call parameters can get a ton more complex, but
with busctl it’s relatively easy to encode them all. See the man page
for details.

busctl knows a number of other operations. For example, you can use
it to monitor D-Bus traffic as it happens (including generating a
.cap file for use with Wireshark!) or you can set or get specific
properties. However, this blog story was supposed to be about sd-bus,
not busctl, hence let’s cut this short here, and let me direct you
to the man page in case you want to know more about the tool.

busctl (like the rest of systemd) is implemented using the sd-bus
API. Thus it exposes many of the features of sd-bus itself. For
example, you can use it to connect to remote or container buses. It
understands both kdbus and classic D-Bus, and more!

sd-bus

But enough! Let’s get back on topic, let’s talk about sd-bus itself.

The sd-bus set of APIs is mostly contained in the header file
sd-bus.h.

Here’s a random selection of features of the library that make it
compare well with the other implementations available.

  • Supports both kdbus and dbus1 as back-end.

  • Has high-level support for connecting to remote buses via ssh, and
    to buses of local OS containers (see the short sketch right after
    this list).

  • Powerful credential model, to implement authentication of clients
    in services. Currently 34 individual fields are supported, from the
    PID of the client to the cgroup or capability sets.

  • Support for tracking the life-cycle of peers in order to release
    local objects automatically when all peers referencing them have
    disconnected.

  • The client builds an efficient decision tree to determine which
    handlers to deliver an incoming bus message to.

  • Automatically translates D-Bus errors into UNIX style errors and
    back (this is lossy though), to ensure best integration of D-Bus
    into low-level Linux programs.

  • Powerful but lightweight object model for exposing local objects on
    the bus. Automatically generates introspection as necessary.
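
To illustrate the remote and container support mentioned in the list
above, here is a minimal sketch, assuming the sd_bus_open_system_remote()
and sd_bus_open_system_machine() entry points of sd-bus; the host and
container names are made up. (This snippet is not from the original
post.)

#include <stdio.h>
#include <string.h>
#include <systemd/sd-bus.h>

int main(void) {
        sd_bus *remote = NULL, *container = NULL;
        int r;

        /* Connect to the system bus of a remote host, tunneled over ssh
         * ("root@somehost.example.com" is a hypothetical host) */
        r = sd_bus_open_system_remote(&remote, "root@somehost.example.com");
        if (r < 0)
                fprintf(stderr, "Failed to connect to remote bus: %s\n", strerror(-r));

        /* Connect to the system bus of a local OS container registered
         * with machined ("fedora-container" is a hypothetical name) */
        r = sd_bus_open_system_machine(&container, "fedora-container");
        if (r < 0)
                fprintf(stderr, "Failed to connect to container bus: %s\n", strerror(-r));

        sd_bus_unref(remote);
        sd_bus_unref(container);
        return 0;
}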

The API is currently not fully documented, but we are working on
completing the set of manual pages. For details
see all pages starting with sd_bus_.

Invoking a Method, from C, with sd-bus

So much about the library in general. Here’s an example of connecting
to the bus and issuing a method call:

#include <stdio.h>
#include <stdlib.h>
#include <systemd/sd-bus.h>

int main(int argc, char *argv[]) {
        sd_bus_error error = SD_BUS_ERROR_NULL;
        sd_bus_message *m = NULL;
        sd_bus *bus = NULL;
        const char *path;
        int r;

        /* Connect to the system bus */
        r = sd_bus_open_system(&bus);
        if (r < 0) {
                fprintf(stderr, "Failed to connect to system bus: %s\n", strerror(-r));
                goto finish;
        }

        /* Issue the method call and store the response message in m */
        r = sd_bus_call_method(bus,
                               "org.freedesktop.systemd1",           /* service to contact */
                               "/org/freedesktop/systemd1",          /* object path */
                               "org.freedesktop.systemd1.Manager",   /* interface name */
                               "StartUnit",                          /* method name */
                               &error,                               /* object to return error in */
                               &m,                                   /* return message on success */
                               "ss",                                 /* input signature */
                               "cups.service",                       /* first argument */
                               "replace");                           /* second argument */
        if (r < 0) {
                fprintf(stderr, "Failed to issue method call: %s\n", error.message);
                goto finish;
        }

        /* Parse the response message */
        r = sd_bus_message_read(m, "o", &path);
        if (r < 0) {
                fprintf(stderr, "Failed to parse response message: %s\n", strerror(-r));
                goto finish;
        }

        printf("Queued service job as %s.\n", path);

finish:
        sd_bus_error_free(&error);
        sd_bus_message_unref(m);
        sd_bus_unref(bus);

        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}

Save this example as bus-client.c, then build it with:

$ gcc bus-client.c -o bus-client `pkg-config --cflags --libs libsystemd`

This will generate a binary bus-client you can now run. Make sure to
run it as root though, since access to the StartUnit method is
privileged:

# ./bus-client
Queued service job as /org/freedesktop/systemd1/job/3586.

And that’s it already, our first example. It showed how we invoked a
method call on the bus. The actual function call of the method is very
close to the busctl command line we used before. I hope the code
excerpt needs little further explanation. It’s supposed to give you a
taste of how to write D-Bus clients with sd-bus. For more
information please have a look at the header file, the man page or
even the sd-bus sources.
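
As a small follow-up sketch (not from the original post), sd-bus also
provides one-call convenience helpers for reading properties, so you
don’t have to assemble an org.freedesktop.DBus.Properties.Get call
manually. Here we read the State property (signature s) of the session
object shown earlier; the session path is assumed to match that output:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <systemd/sd-bus.h>

int main(void) {
        sd_bus_error error = SD_BUS_ERROR_NULL;
        sd_bus *bus = NULL;
        char *state = NULL;
        int r;

        /* Connect to the system bus */
        r = sd_bus_open_system(&bus);
        if (r < 0) {
                fprintf(stderr, "Failed to connect to system bus: %s\n", strerror(-r));
                return EXIT_FAILURE;
        }

        /* Read the "State" property of the session object from above */
        r = sd_bus_get_property_string(bus,
                                       "org.freedesktop.login1",
                                       "/org/freedesktop/login1/session/_31", /* session path from above */
                                       "org.freedesktop.login1.Session",
                                       "State",
                                       &error,
                                       &state);
        if (r < 0)
                fprintf(stderr, "Failed to get property: %s\n", error.message);
        else
                printf("Session state: %s\n", state);

        free(state);
        sd_bus_error_free(&error);
        sd_bus_unref(bus);
        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}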

Implementing a Service, in C, with sd-bus

Of course, just calling a single method is a rather simplistic
example. Let’s have a look at how to write a bus service. We’ll write
a small calculator service, that exposes a single object, which
implements an interface that exposes two methods: one to multiply two
64bit signed integers, and one to divide one 64bit signed integer by
another.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <systemd/sd-bus.h>

static int method_multiply(sd_bus_message *m, void *userdata, sd_bus_error *ret_error) {
        int64_t x, y;
        int r;

        /* Read the parameters */
        r = sd_bus_message_read(m, "xx", &x, &y);
        if (r < 0) {
                fprintf(stderr, "Failed to parse parameters: %s\n", strerror(-r));
                return r;
        }

        /* Reply with the response */
        return sd_bus_reply_method_return(m, "x", x * y);
}

static int method_divide(sd_bus_message *m, void *userdata, sd_bus_error *ret_error) {
        int64_t x, y;
        int r;

        /* Read the parameters */
        r = sd_bus_message_read(m, "xx", &x, &y);
        if (r < 0) {
                fprintf(stderr, "Failed to parse parameters: %s\n", strerror(-r));
                return r;
        }

        /* Return an error on division by zero */
        if (y == 0) {
                sd_bus_error_set_const(ret_error, "net.poettering.DivisionByZero", "Sorry, can't allow division by zero.");
                return -EINVAL;
        }

        return sd_bus_reply_method_return(m, "x", x / y);
}

/* The vtable of our little object, implements the net.poettering.Calculator interface */
static const sd_bus_vtable calculator_vtable[] = {
        SD_BUS_VTABLE_START(0),
        SD_BUS_METHOD("Multiply", "xx", "x", method_multiply, SD_BUS_VTABLE_UNPRIVILEGED),
        SD_BUS_METHOD("Divide",   "xx", "x", method_divide,   SD_BUS_VTABLE_UNPRIVILEGED),
        SD_BUS_VTABLE_END
};

int main(int argc, char *argv[]) {
        sd_bus_slot *slot = NULL;
        sd_bus *bus = NULL;
        int r;

        /* Connect to the user bus this time */
        r = sd_bus_open_user(&bus);
        if (r < 0) {
                fprintf(stderr, "Failed to connect to system bus: %s\n", strerror(-r));
                goto finish;
        }

        /* Install the object */
        r = sd_bus_add_object_vtable(bus,
                                     &slot,
                                     "/net/poettering/Calculator",  /* object path */
                                     "net.poettering.Calculator",   /* interface name */
                                     calculator_vtable,
                                     NULL);
        if (r < 0) {
                fprintf(stderr, "Failed to issue method call: %s\n", strerror(-r));
                goto finish;
        }

        /* Take a well-known service name so that clients can find us */
        r = sd_bus_request_name(bus, "net.poettering.Calculator", 0);
        if (r < 0) {
                fprintf(stderr, "Failed to acquire service name: %s\n", strerror(-r));
                goto finish;
        }

        for (;;) {
                /* Process requests */
                r = sd_bus_process(bus, NULL);
                if (r < 0) {
                        fprintf(stderr, "Failed to process bus: %s\n", strerror(-r));
                        goto finish;
                }
                if (r > 0) /* we processed a request, try to process another one, right-away */
                        continue;

                /* Wait for the next request to process */
                r = sd_bus_wait(bus, (uint64_t) -1);
                if (r < 0) {
                        fprintf(stderr, "Failed to wait on bus: %s\n", strerror(-r));
                        goto finish;
                }
        }

finish:
        sd_bus_slot_unref(slot);
        sd_bus_unref(bus);

        return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}

Save this example as bus-service.c, then build it with:

$ gcc bus-service.c -o bus-service `pkg-config --cflags --libs libsystemd`

Now, let’s run it:

$ ./bus-service

In another terminal, let’s try to talk to it. Note that this service
is now on the user bus, not on the system bus as before. We do this
for simplicity reasons: on the system bus access to services is
tightly controlled so unprivileged clients cannot request privileged
operations. On the user bus, however, things are simpler: as only
processes of the user owning the bus can connect, no further policy
enforcement will complicate this example. Because the service is on
the user bus, we have to pass the --user switch on the busctl
command line. Let’s start with looking at the service’s object tree.

$ busctl --user tree net.poettering.Calculator
└─/net/poettering/Calculator

As we can see, there’s only a single object on the service, which is
not surprising, given that our code above only registered one. Let’s
see the interfaces and the members this object exposes:

$ busctl --user introspect net.poettering.Calculator /net/poettering/Calculator
NAME                                TYPE      SIGNATURE RESULT/VALUE FLAGS
net.poettering.Calculator           interface -         -            -
.Divide                             method    xx        x            -
.Multiply                           method    xx        x            -
org.freedesktop.DBus.Introspectable interface -         -            -
.Introspect                         method    -         s            -
org.freedesktop.DBus.Peer           interface -         -            -
.GetMachineId                       method    -         s            -
.Ping                               method    -         -            -
org.freedesktop.DBus.Properties     interface -         -            -
.Get                                method    ss        v            -
.GetAll                             method    s         a{sv}        -
.Set                                method    ssv       -            -
.PropertiesChanged                  signal    sa{sv}as  -            -

The sd-bus library automatically added a couple of generic interfaces,
as mentioned above. But the first interface we see is actually the one
we added! It shows our two methods, and both take “xx” (two 64bit
signed integers) as input parameters, and return one “x”. Great! But
does it work?

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Multiply xx 5 7
x 35

Woohoo! We passed the two integers 5 and 7, and the service actually
multiplied them for us and returned a single integer 35! Let’s try the
other method:

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Divide xx 99 17
x 5

Oh, wow! It can even do integer division! Fantastic! But let’s trick
it into dividing by zero:

$ busctl --user call net.poettering.Calculator /net/poettering/Calculator net.poettering.Calculator Divide xx 43 0
Sorry, can't allow division by zero.

Nice! It detected this nicely and returned a clean error about it. If
you look in the source code example above you’ll see how precisely we
generated the error.
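
For completeness, here is a minimal client-side sketch (not part of
the original post) showing how that error surfaces when calling the
service from C instead of busctl: the error name and message the
service set end up in the caller’s sd_bus_error structure.

#include <stdio.h>
#include <inttypes.h>
#include <systemd/sd-bus.h>

int main(void) {
        sd_bus_error error = SD_BUS_ERROR_NULL;
        sd_bus_message *reply = NULL;
        sd_bus *bus = NULL;
        int64_t result;
        int r;

        /* Connect to the user bus, where our service runs */
        r = sd_bus_open_user(&bus);
        if (r < 0)
                goto finish;

        /* Divide 43 by 0, which the service refuses with a clean error */
        r = sd_bus_call_method(bus,
                               "net.poettering.Calculator",  /* service */
                               "/net/poettering/Calculator", /* object path */
                               "net.poettering.Calculator",  /* interface */
                               "Divide",                     /* method */
                               &error,
                               &reply,
                               "xx", (int64_t) 43, (int64_t) 0);
        if (r < 0) {
                /* Prints: net.poettering.DivisionByZero: Sorry, can't allow division by zero. */
                fprintf(stderr, "%s: %s\n", error.name, error.message);
                goto finish;
        }

        r = sd_bus_message_read(reply, "x", &result);
        if (r >= 0)
                printf("Result: %" PRIi64 "\n", result);

finish:
        sd_bus_error_free(&error);
        sd_bus_message_unref(reply);
        sd_bus_unref(bus);
        return r < 0 ? 1 : 0;
}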

And that’s really all I have for today. Of course, the examples I
showed are short, and I don’t get into detail here on what precisely
each line does. However, this is supposed to be a short introduction
into D-Bus and sd-bus, and it’s already way too long for that …

I hope this blog story was useful to you. If you are interested in
using sd-bus for your own programs, I hope this gets you started. If
you have further questions, check the (incomplete) man pages, and ask
us on IRC or the systemd mailing list. If you need more examples, have
a look at the systemd source tree; all of systemd’s many bus services
use sd-bus extensively.

Revisiting How We Put Together Linux Systems

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html

In a previous blog story I discussed
Factory Reset, Stateless Systems, Reproducible Systems & Verifiable Systems.
I now want to take the opportunity to explain a bit where we want to
take this with
systemd in the
longer run, and what we want to build out of it. This is going to be a
longer story, so better grab a cold bottle of
Club Mate before you start
reading.

Traditional Linux distributions are built around packaging systems
like RPM or dpkg, and an organization model where upstream developers
and downstream packagers are relatively clearly separated: an upstream
developer writes code, and puts it somewhere online, in a tarball. A
packager then grabs it and turns it into RPMs/DEBs. The user then
grabs these RPMs/DEBs and installs them locally on the system. For a
variety of uses this is a fantastic scheme: users have a large
selection of readily packaged software available, in mostly uniform
packaging, from a single source they can trust. In this scheme the
distribution vets all software it packages, and as long as the user
trusts the distribution all should be good. The distribution takes the
responsibility of ensuring the software is not malicious, of timely
fixing security problems and helping the user if something is wrong.

Upstream Projects

However, this scheme also has a number of problems, and doesn’t fit
many use-cases of our software particularly well. Let’s have a look at
the problems of this scheme for many upstreams:

  • Upstream software vendors are fully dependent on downstream
    distributions to package their stuff. It’s the downstream
    distribution that decides on schedules, packaging details, and how
    to handle support. Often upstream vendors want much faster release
    cycles than the downstream distributions follow.

  • Realistic testing is extremely unreliable and next to
    impossible. Since the end-user can run a variety of different
    package versions together, and expects the software he runs to just
    work on any combination, the test matrix explodes. If upstream tests
    its version on distribution X release Y, then there’s no guarantee
    that that’s the precise combination of packages that the end user
    will eventually run. In fact, it is very unlikely that the end user
    will, since most distributions probably updated a number of
    libraries the package relies on by the time the package ends up being
    made available to the user. The fact that each package can be
    individually updated by the user, and each user can combine library
    versions, plug-ins and executables relatively freely, results in a high
    risk of something going wrong.

  • Since there are so many different distributions in so many different
    versions around, if upstream tries to build and test software for
    them it needs to do so for a large number of distributions, which is
    a massive effort.

  • The distributions are actually quite different in many ways. In
    fact, they are different in a lot of the most basic
    functionality. For example, the path where to put x86-64 libraries
    is different on Fedora and Debian derived systems.

  • Developing software for a number of distributions and versions is
    hard: if you want to do it, you need to actually install them, each
    one of them, manually, and then build your software for each.

  • Since most downstream distributions have strict licensing and
    trademark requirements (and rightly so), any kind of closed source
    software (or otherwise non-free) does not fit into this scheme at
    all.

This all together makes it really hard for many upstreams to work
nicely with the current way Linux works. Often they try to improve
the situation for themselves, for example by bundling libraries, to make
their test and build matrices smaller.

System Vendors

The toolbox approach of classic Linux distributions is fantastic for
people who want to put together their individual system, nicely
adjusted to exactly what they need. However, this is not really how
many of today’s Linux systems are built, installed or updated. If you
build any kind of embedded device, a server system, or even user
systems, you frequently do your work based on complete system images,
that are linearly versioned. You build these images somewhere, and
then you replicate them atomically to a larger number of systems. On
these systems, you don’t install or remove packages, you get a defined
set of files, and besides installing or updating the system there is
no way to change the set of tools you get.

The current Linux distributions are not particularly good at providing
for this major use-case of Linux. Their strict focus on individual
packages as well as package managers as end-user install and update
tool is incompatible with what many system vendors want.

Users

The classic Linux distribution scheme is frequently not what end users
want, either. Many users are used to app markets like the ones Android, Windows
or iOS/Mac have. Markets are a platform that doesn’t package, build or
maintain software like distributions do, but simply allows users to
quickly find and download the software they need, with the app vendor
responsible for keeping the app updated, secured, and all that on the
vendor’s release cycle. Users tend to be impatient. They want their
software quickly, and the fine distinction between trusting a single
distribution or a myriad of app developers individually is usually not
important for them. The companies behind the marketplaces usually try
to improve this trust problem by providing sand-boxing technologies: as
a replacement for the distribution that audits, vets, builds and
packages the software and thus allows users to trust it to a certain
level, these vendors try to find technical solutions to ensure that
the software they offer for download can’t be malicious.

Existing Approaches To Fix These Problems

Now, all the issues pointed out above are not new, and there are
sometimes quite successful attempts to do something about it. Ubuntu
Apps, Docker, Software Collections, ChromeOS, CoreOS all fix part of
this problem set, usually with a strict focus on one facet of Linux
systems. For example, Ubuntu Apps focus strictly on end user (desktop)
applications, and don’t care about how we build/update/install the OS
itself, or containers. Docker OTOH focuses on containers only, and
doesn’t care about end-user apps. Software Collections tries to focus
on the development environments. ChromeOS focuses on the OS itself,
but only for end-user devices. CoreOS also focuses on the OS, but
only for server systems.

The approaches they find are usually good at specific things, and use
a variety of different technologies, on different layers. However,
none of these projects tried to fix these problems in a generic way,
for all uses, right in the core components of the OS itself.

Linux has achieved tremendous success because its kernel is so
generic: you can build supercomputers and tiny embedded devices out of
it. It’s time we come up with a basic, reusable scheme for solving
the problem set described above that is equally generic.

What We Want

The systemd cabal (Kay Sievers, Harald Hoyer, Daniel Mack, Tom
Gundersen, David Herrmann, and yours truly) recently met in Berlin
about all these things, and tried to come up with a scheme that is
somewhat simple, but tries to solve the issues generically, for all
use-cases, as part of the systemd project. All that in a way that is
somewhat compatible with the current scheme of distributions, to allow
a slow, gradual adoption. Also, and that’s something one cannot stress
enough: the toolbox scheme of classic Linux distributions is
actually a good one, and for many cases the right one. However, we
need to make sure we make distributions relevant again for all
use-cases, not just those of highly individualized systems.

Anyway, so let’s summarize what we are trying to do:

  • We want an efficient way that allows vendors to package their
    software (regardless of whether it’s just an app, or the whole OS) directly for
    the end user, and know the precise combination of libraries and
    packages it will operate with.

  • We want to allow end users and administrators to install these
    packages on their systems, regardless of which distribution they have
    installed on it.

  • We want a unified solution that ultimately can cover updates for
    full systems, OS containers, end user apps, programming ABIs, and
    more. These updates shall be double-buffered, (at least). This is an
    absolute necessity if we want to prepare the ground for operating
    systems that manage themselves, that can update safely without
    administrator involvement.

  • We want our images to be trustable (i.e. signed). In fact we want a
    fully trustable OS, with images that can be verified by a full
    trust chain from the firmware (EFI SecureBoot!), through the boot loader, through the
    kernel, and initrd. Cryptographically secure verification of the
    code we execute is relevant on the desktop (like ChromeOS does), but
    also for apps, for embedded devices and even on servers (in a post-Snowden
    world, in particular).

What We Propose

So much about the set of problems, and what we are trying to do. So,
now, let’s discuss the technical bits we came up with:

The scheme we propose is built around a variety of concepts from btrfs
and Linux file system name-spacing. btrfs at this point already has a
large number of features that fit neatly in our concept, and the
maintainers are busy working on a couple of others we want to
eventually make use of.

As the first part of our proposal we make heavy use of btrfs sub-volumes and
introduce a clear naming scheme for them. We name snapshots like this:

  • usr:<vendorid>:<architecture>:<version> — This refers to a full
    vendor operating system tree. It’s basically a /usr tree (and no
    other directories), in a specific version, with everything you need to boot
    it up inside it. The <vendorid> field is replaced by some vendor
    identifier, maybe a scheme like
    org.fedoraproject.FedoraWorkstation. The <architecture> field
    specifies a CPU architecture the OS is designed for, for example
    x86-64. The <version> field specifies a specific OS version, for
    example 23.4. An example sub-volume name could hence look like this:
    usr:org.fedoraproject.FedoraWorkstation:x86_64:23.4

  • root:<name>:<vendorid>:<architecture> — This refers to an
    instance of an operating system. It’s basically a root directory,
    containing primarily /etc and /var (but possibly more). Sub-volumes
    of this type do not contain a populated /usr tree though. The
    <name> field refers to some instance name (maybe the host name of
    the instance). The other fields are defined as above. An example
    sub-volume name is
    root:revolution:org.fedoraproject.FedoraWorkstation:x86_64.

  • runtime:<vendorid>:<architecture>:<version> — This refers to a
    vendor runtime. A runtime here is supposed to be a set of
    libraries and other resources that are needed to run apps (for the
    concept of apps see below), all in a /usr tree. In this regard this
    is very similar to the usr sub-volumes explained above, however,
    while a usr sub-volume is a full OS and contains everything
    necessary to boot, a runtime is really only a set of
    libraries. You cannot boot it, but you can run apps with it. An
    example sub-volume name is: runtime:org.gnome.GNOME3_20:x86_64:3.20.1

  • framework:<vendorid>:<architecture>:<version> — This is very
    similar to a vendor runtime, as described above, it contains just a
    /usr tree, but goes one step further: it additionally contains all
    development headers, compilers and build tools, that allow
    developing against a specific runtime. For each runtime there should
    be a framework. When you develop against a specific framework in a
    specific architecture, then the resulting app will be compatible
    with the runtime of the same vendor ID and architecture. Example:
    framework:org.gnome.GNOME3_20:x86_64:3.20.1

  • app:<vendorid>:<runtime>:<architecture>:<version> — This
    encapsulates an application bundle. It contains a tree that at
    runtime is mounted to /opt/<vendorid>, and contains all the
    application’s resources. The <vendorid> could be a string like
    org.libreoffice.LibreOffice, the <runtime> refers to the
    vendor ID of one specific runtime the application is built for, for
    example org.gnome.GNOME3_20:3.20.1. The <architecture> and
    <version> refer to the architecture the application is built for,
    and of course its version. Example:
    app:org.libreoffice.LibreOffice:GNOME3_20:x86_64:133

  • home:<user>:<uid>:<gid> — This sub-volume shall refer to the home
    directory of the specific user. The <user> field contains the user
    name, the <uid> and <gid> fields the numeric Unix UIDs and GIDs
    of the user. The idea here is that in the long run the list of
    sub-volumes is sufficient as a user database (but see
    below). Example: home:lennart:1000:1000.

btrfs partitions that adhere to this naming scheme should be clearly
identifiable. It is our intention to introduce a new GPT partition type
ID for this.
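
To make the naming scheme a bit more tangible, here’s a minimal sketch of how sub-volumes following it could be created by hand (the mount point /mnt and the version number are illustrative, and in practice the vendor trees would be deserialized from images rather than created empty, see below):

# mount /dev/sda2 /mnt
# btrfs subvolume create "/mnt/usr:org.fedoraproject.FedoraWorkstation:x86_64:23.4"
# btrfs subvolume create "/mnt/root:revolution:org.fedoraproject.FedoraWorkstation:x86_64"
# btrfs subvolume create "/mnt/home:lennart:1000:1000"
# btrfs property set -ts "/mnt/usr:org.fedoraproject.FedoraWorkstation:x86_64:23.4" ro true

The last command marks the vendor tree read-only, matching the immutability this scheme expects from vendor-supplied sub-volumes.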

How To Use It

After we introduced this naming scheme let’s see what we can build of
this:

  • When booting up a system we mount the root directory from one of the
    root sub-volumes, and then mount /usr from a matching usr
    sub-volume. Matching here means it carries the same <vendorid>
    and <architecture>. Of course, by default we should pick the
    matching usr sub-volume with the newest version (see the sketch
    after this list).

  • When we boot up an OS container, we do exactly the same as when
    we boot up a regular system: we simply combine a usr sub-volume
    with a root sub-volume.

  • When we enumerate the system’s users we simply go through the
    list of home snapshots.

  • When a user authenticates and logs in we mount his home
    directory from his snapshot.

  • When an app is run, we set up a new file system name-space, mount the
    app sub-volume to /opt/<vendorid>/, and the appropriate runtime
    sub-volume the app picked to /usr, as well as the user’s
    /home/$USER to its place.

  • When a developer wants to develop against a specific runtime he
    installs the right framework, and then temporarily transitions into
    a name space where /usr is mounted from the framework sub-volume, and
    /home/$USER from his own home directory. In this name space he then
    runs his build commands. He can build in multiple name spaces at the
    same time, if he intends to build software for multiple runtimes or
    architectures at once.
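
Here’s a rough sketch of what the first two items could boil down to on the command line (the device name and sub-volume names are illustrative, and in a real system the initrd logic would do this rather than an administrator):

# mount -o subvol="root:revolution:org.fedoraproject.FedoraWorkstation:x86_64" /dev/sda2 /sysroot
# mount -o subvol="usr:org.fedoraproject.FedoraWorkstation:x86_64:23.4" /dev/sda2 /sysroot/usr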

Instantiating a new system or OS container (which is exactly the same
in this scheme) just consists of creating a new appropriately named
root sub-volume. Quite naturally, you can share one vendor OS
copy in one specific version with a multitude of container instances.
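
As a sketch (the instance name is made up), creating one more container instance then is little more than:

# btrfs subvolume create "/mnt/root:testmachine2:org.fedoraproject.FedoraWorkstation:x86_64"

followed by booting the combination of this root sub-volume with the shared usr sub-volume, for example as an OS container.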

Everything is double-buffered (or actually, n-fold-buffered), because
usr, runtime, framework, app sub-volumes can exist in multiple
versions. Of course, by default the execution logic should always pick
the newest release of each sub-volume, but it is up to the user to keep
multiple versions around, and possibly execute older versions, if he
desires to do so. In fact, like on ChromeOS this could even be handled
automatically: if a system fails to boot with a newer snapshot, the
boot loader can automatically revert to an older version of the
OS.

An Example

Note that as a result this allows installing not only multiple end-user
applications into the same btrfs volume, but also multiple operating
systems, multiple system instances, multiple runtimes, multiple
frameworks. Or to spell this out in an example:

Let’s say Fedora, Mageia and ArchLinux all implement this scheme,
and provide ready-made end-user images. Also, the GNOME, KDE, SDL
projects all define a runtime+framework to develop against. Finally,
both LibreOffice and Firefox provide their stuff according to this
scheme. You can now trivially install all of these into the same btrfs
volume:

  • usr:org.fedoraproject.WorkStation:x86_64:24.7
  • usr:org.fedoraproject.WorkStation:x86_64:24.8
  • usr:org.fedoraproject.WorkStation:x86_64:24.9
  • usr:org.fedoraproject.WorkStation:x86_64:25beta
  • usr:org.mageia.Client:i386:39.3
  • usr:org.mageia.Client:i386:39.4
  • usr:org.mageia.Client:i386:39.6
  • usr:org.archlinux.Desktop:x86_64:302.7.8
  • usr:org.archlinux.Desktop:x86_64:302.7.9
  • usr:org.archlinux.Desktop:x86_64:302.7.10
  • root:revolution:org.fedoraproject.WorkStation:x86_64
  • root:testmachine:org.fedoraproject.WorkStation:x86_64
  • root:foo:org.mageia.Client:i386
  • root:bar:org.archlinux.Desktop:x86_64
  • runtime:org.gnome.GNOME3_20:x86_64:3.20.1
  • runtime:org.gnome.GNOME3_20:x86_64:3.20.4
  • runtime:org.gnome.GNOME3_20:x86_64:3.20.5
  • runtime:org.gnome.GNOME3_22:x86_64:3.22.0
  • runtime:org.kde.KDE5_6:x86_64:5.6.0
  • framework:org.gnome.GNOME3_22:x86_64:3.22.0
  • framework:org.kde.KDE5_6:x86_64:5.6.0
  • app:org.libreoffice.LibreOffice:GNOME3_20:x86_64:133
  • app:org.libreoffice.LibreOffice:GNOME3_22:x86_64:166
  • app:org.mozilla.Firefox:GNOME3_20:x86_64:39
  • app:org.mozilla.Firefox:GNOME3_20:x86_64:40
  • home:lennart:1000:1000
  • home:hrundivbakshi:1001:1001

In the example above, we have three vendor operating systems
installed, each in three versions, and one even in a beta
version. We have four system instances around. Two of them are Fedora,
maybe one of them we usually boot from, the other we run for very
specific purposes in an OS container. We also have the runtimes for
two GNOME releases in multiple versions, plus one for KDE. Then, we
have the development trees for one version of KDE and GNOME around, as
well as two apps, that make use of two releases of the GNOME
runtime. Finally, we have the home directories of two users.

Now, with the name-spacing concepts we introduced above, we can
actually relatively freely mix and match apps and OSes, or develop
against specific frameworks in specific versions on any operating
system. It doesn’t matter if you booted your ArchLinux instance, or
your Fedora one, you can execute both LibreOffice and Firefox just
fine, because at execution time they get matched up with the right
runtime, and all of them are available from all the operating systems
you installed. You get the precise runtime that the upstream vendor of
Firefox/LibreOffice did their testing with. It doesn’t matter anymore
which distribution you run, and which distribution the vendor prefers.

Also, given that the user database is actually encoded in the
sub-volume list, it doesn’t matter which system you boot: the
distribution should be able to find your local users automatically,
without any configuration in /etc/passwd.
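
A minimal sketch of what such an enumeration could look like (the output is illustrative, and real code would of course use proper APIs instead of parsing tool output):

# btrfs subvolume list /mnt | awk '{ print $NF }' | grep '^home:'
home:lennart:1000:1000
home:hrundivbakshi:1001:1001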

Building Blocks

With this naming scheme plus the way we can combine sub-volumes at
execution time we already got quite far, but how do we actually get these
sub-volumes onto the final machines, and how do we update them? Well,
btrfs has a feature called “send-and-receive”. It basically allows
you to “diff” two file system versions, and generate a binary
delta. You can generate these deltas on a developer’s machine and then
push them into the user’s system, and he’ll get the exact same
sub-volume too. This is how we envision installation and updating of
operating systems, applications, runtimes, frameworks. At installation
time, we simply deserialize an initial send-and-receive delta into
our btrfs volume, and later, when a new version is released we just
add in the few bits that are new, by dropping in another
send-and-receive delta under a new sub-volume name. And we do it
exactly the same way for the OS itself, for a runtime, a framework or an
app. There’s no technical distinction anymore. The underlying
operation for installing apps, runtimes, frameworks and vendor OSes, as well
as the operation for updating them, is done the exact same way for all.
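
A minimal sketch of these mechanics, using sub-volume names from the example above (the file-based transport is an assumption; a real updater would stream the delta over the network):

# btrfs send -p "/mnt/usr:org.fedoraproject.WorkStation:x86_64:24.7" \
      "/mnt/usr:org.fedoraproject.WorkStation:x86_64:24.8" > delta.btrfs

This runs on the build machine and generates a binary delta between the old and the new vendor tree. On the target machine, deserializing it creates the new sub-volume right next to the old one:

# btrfs receive /mnt < delta.btrfs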

Of course, keeping multiple full /usr trees around sounds like an
awful lot of waste, after all they will contain a lot of very similar
data, since a lot of resources are shared between distributions,
frameworks and runtimes. However, thankfully btrfs actually is able to
de-duplicate this for us. If we add in a new app snapshot, this simply
adds in the new files that changed. Moreover different runtimes and
operating systems might actually end up sharing the same tree.

Even though the example above focuses primarily on the end-user,
desktop side of things, the concept is also extremely powerful in
server scenarios. For example, it is easy to build your own usr
trees and deliver them to your hosts using this scheme. The usr
sub-volumes are supposed to be something that administrators can put
together. After deserializing them into a couple of hosts, you can
trivially instantiate them as OS containers there, simply by adding a
new root sub-volume for each instance, referencing the usr tree you
just put together. Instantiating OS containers hence becomes as easy
as creating a new btrfs sub-volume. And you can still update the images
nicely, get fully double-buffered updates and everything.

And of course, this scheme also applies nicely to embedded
use-cases. Regardless of whether you build a TV, an IVI system or a phone: you
can put together your OS versions as usr trees, and then use
btrfs-send-and-receive facilities to deliver them to the systems, and
update them there.

Many people when they hear the word “btrfs” instantly reply with “is
it ready yet?”. Thankfully, most of the functionality we really need
here is strictly read-only. With the exception of the home
sub-volumes (see below) all snapshots are strictly read-only, and are
delivered as immutable vendor trees onto the devices. They are never
changed. Even if btrfs might still be immature, for this kind of
read-only logic it should be more than good enough.

Note that this scheme also enables doing fat systems: for example,
an installer image could include a Fedora version compiled for x86-64,
one for i386, one for ARM, all in the same btrfs volume. Due to btrfs’
de-duplication they will share as much as possible, and when the image
is booted up the right sub-volume is automatically picked. Something
similar of course applies to the apps too!

This also allows us to implement something that we like to call
Operating-System-As-A-Virus. Installing a new system is little more
than:

  • Creating a new GPT partition table
  • Adding an EFI System Partition (FAT) to it
  • Adding a new btrfs volume to it
  • Deserializing a single usr sub-volume into the btrfs volume
  • Installing a boot loader into the EFI System Partition
  • Rebooting
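
Sketched out as commands this might look like the following (the target device /dev/sdb, the partition sizes and the source path are illustrative assumptions, not a tested recipe):

# sgdisk --zap-all /dev/sdb
# sgdisk --new=1:0:+512M --typecode=1:ef00 /dev/sdb
# sgdisk --new=2:0:0 --typecode=2:8300 /dev/sdb
# mkfs.vfat /dev/sdb1
# mkfs.btrfs /dev/sdb2
# mount /dev/sdb2 /mnt
# btrfs send "/path/to/usr:org.fedoraproject.FedoraWorkstation:x86_64:23.4" | btrfs receive /mnt

After that, all that’s left is installing a boot loader into the ESP and rebooting.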

Now, since the only real vendor data you need is the usr sub-volume,
you can trivially duplicate this onto any block device you want. Let’s
say you are a happy Fedora user, and you want to provide a friend with
his own installation of this awesome system, all on a USB stick. All
you have to do for this is follow the steps above, using your installed
usr tree as the source to copy. And there you go! And you don’t have to
be afraid that any of your personal data is copied too, as the usr
sub-volume is the exact version your vendor provided you with. Or in
other words: there’s no distinction anymore between installer images
and installed systems. It’s all the same. Installation becomes
replication, nothing more. Live-CDs and installed systems can be fully
identical.

Note that in this design apps are actually developed against a single,
very specific runtime, that contains all libraries it can link against
(including a specific glibc version!). Any library that is not
included in the runtime the developer picked must be included in the
app itself. This is similar to how apps on Android declare one very
specific Android version they are developed against. This greatly
simplifies application installation, as there’s no dependency hell:
each app pulls in one runtime, and the app is actually free to pick
which one, as you can have multiple installed, though only one is used
by each app.

Also note that operating systems built this way will never see
“half-updated” systems, as it is common when a system is updated using
RPM/dpkg. When updating the system the code will either run the old or
the new version, but it will never see part of the old files and part
of the new files. This is the same for apps, runtimes, and frameworks,
too.

Where We Are Now

We are currently working on a lot of the groundwork necessary for
this. This scheme relies on the ability to monopolize the
vendor OS resources in /usr, which is the key to what I described in
Factory Reset, Stateless Systems, Reproducible Systems & Verifiable Systems
a few weeks back. Then, of course, for the full desktop app concept we
need a strong sandbox, that does more than just hiding files from the
file system view. After all with an app concept like the above the
primary interfacing between the executed desktop apps and the rest of the
system is via IPC (which is why we work on kdbus and teach it all
kinds of sand-boxing features), and the kernel itself. Harald Hoyer has
started working on generating the btrfs send-and-receive images based
on Fedora.

Getting to the full scheme will take a while. Currently we have many
of the building blocks ready, but some major items are missing. For
example, we push quite a few problems into btrfs that other solutions
try to solve in user space. One of them is actually
signing/verification of images. The btrfs maintainers are working on
adding this to the code base, but currently nothing exists. This
functionality is essential though to come to a fully verified system
where a trust chain exists all the way from the firmware to the
apps. Also, to make the home sub-volume scheme fully workable we
actually need encrypted sub-volumes, so that the sub-volume’s
pass-phrase can be used for authenticating users in PAM. This doesn’t
exist either.

Working towards this scheme is a gradual process. Many of the steps we
require for this are useful outside of the grand scheme though, which
means we can slowly work towards the goal, and our users can already
benefit from what we are working on as we go.

Also, and most importantly, this is not really a departure from
traditional operating systems:

Each app, each OS and each OS container sees a traditional Unix hierarchy with
/usr, /home, /opt, /var, /etc. It executes in an environment that is
pretty much identical to how it would be run on traditional systems.

There’s no need to fully move to a system that uses only btrfs and
follows strictly this sub-volume scheme. For example, we intend to
provide implicit support for systems that are installed on ext4 or
xfs, or that are put together with traditional packaging tools such as
RPM or dpkg: if the user tries to install a
runtime/app/framework/os image on a system that doesn’t use btrfs so
far, it can just create a loop-back btrfs image in /var, and push the
data into that. Even we developers will run our stuff like this for a
while, after all this new scheme is not particularly useful for highly
individualized systems, and we developers usually tend to run
systems like that.

Also note that this is in no way a departure from packaging systems like
RPM or DEB. Even if the new scheme we propose is used for installing
and updating a specific system, it is RPM/DEB that is used to put
together the vendor OS tree initially. Hence, even in this scheme
RPM/DEB are highly relevant, though not strictly as an end-user tool
anymore, but as a build tool.

So Let’s Summarize Again What We Propose

  • We want a unified scheme for how we can install and update OS images,
    user apps, runtimes and frameworks.

  • We want a unified scheme for how you can relatively freely mix OS
    images, apps, runtimes and frameworks on the same system.

  • We want a fully trusted system, where cryptographic verification of
    all executed code can be done, all the way to the firmware, as
    standard feature of the system.

  • We want to allow app vendors to write their programs against very
    specific frameworks, under the knowledge that they will end up being
    executed with the exact same set of libraries chosen.

  • We want to allow parallel installation of multiple OSes and versions
    of them, multiple runtimes in multiple versions, as well as multiple
    frameworks in multiple versions. And of course, multiple apps in
    multiple versions.

  • We want everything double buffered (or actually n-fold buffered), to
    ensure we can reliably update/rollback versions, in particular to
    safely do automatic updates.

  • We want a system where updating a runtime, OS, framework, or OS
    container is as simple as adding in a new snapshot and restarting
    the runtime/OS/framework/OS container.

  • We want a system where we can easily instantiate a number of OS
    instances from a single vendor tree, with zero difference whether
    it is then booted on bare metal, in a VM or as a
    container.

  • We want to enable Linux to have an open scheme that people can use
    to build app markets and similar schemes, not restricted to a
    specific vendor.

Final Words

I’ll be talking about this at LinuxCon Europe in October. I originally
intended to discuss this at the Linux Plumbers Conference (which I
assumed was the right forum for this kind of major plumbing level
improvement), and at linux.conf.au, but there was no interest in my
session submissions there…

Of course this is all work in progress. These are our current ideas we
are working towards. As we progress we will likely change a number of
things. For example, the precise naming of the sub-volumes might look
very different in the end.

Of course, we are developers of the systemd project. Implementing this
scheme is not just a job for the systemd developers. This is a
reinvention of how distributions work, and hence needs great support from
the distributions. We really hope we can trigger some interest by
publishing this proposal now, to get the distributions on board. This
after all is explicitly not supposed to be a solution for one specific
project and one specific vendor product, we care about making this
open, and solving it for the generic case, without cutting corners.

If you have any questions about this, you know how you can reach us
(IRC, mail, G+, …).

The future is going to be awesome!

FUDCON + GNOME.Asia Beijing 2014

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/fudcon-gnomeasia.html

Thanks to the funding from FUDCON I had the chance to attend and
keynote at the combined FUDCON Beijing 2014
and GNOME.Asia 2014 conference in
Beijing, China.

My talk was about systemd’s present and future, what we achieved
and where we are going. In my talk I tried to explain a bit where we
are coming from, and how we changed focus from being purely an init
system, to being more of a set of basic building blocks to build an OS
from. Most of the talk covered where we still intend to take
systemd, which areas we believe should be covered by systemd, and of
course, also the always difficult question of where to draw the line
and what clearly is outside of the focus of systemd. The slides of my
talk are available
online
. (No video recording I am aware of, sorry.)

The combined conferences were a lot of fun, and as usual, I had the best
discussions in the hallway track, talking about Linux and
systemd.

A number of pictures of the conference are now
online
. Enjoy!

After the conference I stayed for a few more days in Beijing, doing
a bit of sightseeing. What a fantastic city! The food was amazing, we
tried all kinds of fantastic stuff, from Peking duck to Sichuan-style
bullfrog. Yummy. And one of these days I am sure I will find the
time to actually sort my photos and put them online, too.

I am really looking forward to the next FUDCON/GNOME.Asia!

Factory Reset, Stateless Systems, Reproducible Systems & Verifiable Systems

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/stateless.html

(Just a small heads-up: I don’t blog as much as I used to, I
nowadays update my Google+
page
a lot more frequently. You might want to subscribe that if
you are interested in more frequent technical updates on what we are
working on.)

In the past weeks we have been working on a couple of features for
systemd
that enable a number of new usecases I’d like to shed some light
on. Taking benefit of the /usr
merge
that a number of distributions have completed we want to
bring runtime behaviour of Linux systems to the next level. With the
/usr merge completed most static vendor-supplied OS data is
found exclusively in /usr, only a few additional bits in
/var and /etc are necessary to make a system
boot. On this we can build to enable a couple of new features:

  1. A mechanism we call Factory Reset shall flush out
    /etc and /var, but keep the vendor-supplied
    /usr, bringing the system back into a well-defined, pristine
    vendor state with no local state or configuration. This functionality
    is useful across the board from servers, to desktops, to embedded
    devices.
  2. A Stateless System goes one step further: a system like
    this never stores /etc or /var on persistent
    storage, but always comes up with pristine vendor state. On systems
    like this every reboot acts as factory reset. This functionality is
    particularly useful for simple containers or systems that boot off the
    network or read-only media, and receive all configuration they need
    during runtime from vendor packages or protocols like DHCP or are
    capable of discovering their parameters automatically from the
    available hardware or periphery.
  3. Reproducible Systems multiply a vendor image into many
    containers or systems. Only local configuration or state is stored
    per-system, while the vendor operating system is pulled in from the
    same, immutable, shared snapshot. Each system hence has its private
    /etc and /var for receiving local configuration,
    however the OS tree in /usr is pulled in via bind mounts (in
    case of containers) or technologies like NFS (in case of physical
    systems), or btrfs snapshots from a golden master image. This is
    particularly interesting for containers where the goal is to run
    thousands of container images from the same OS tree. However, it also
    has a number of other usecases, for example thin client systems, which
    can boot from the same NFS share any number of times. Furthermore this
    mechanism is useful to implement very simple OS installers, that
    simply unserialize a /usr snapshot into a file system,
    install a boot loader, and reboot.
  4. Verifiable Systems are closely related to stateless
    systems: if the underlying storage technology can cryptographically
    ensure that the vendor-supplied OS is trusted and in a consistent
    state, then it must be made sure that /etc or /var
    are either included in the OS image, or simply unnecessary for booting.

Concepts

A number of Linux-based operating systems have tried to implement
some of the schemes described above in one way or
another. Particularly interesting are GNOME’s OSTree, CoreOS and Google’s Android and
ChromeOS. They generally found different solutions for the specific
problems you have when implementing schemes like this, sometimes taking
shortcuts that keep only the specific case in mind, and cannot cover
the general case. With systemd now being at the core of so many
distributions and deeply involved in bringing up and maintaining the
system we came to the conclusion that we should attempt to add generic
support for setups like this to systemd itself, to open this up for
the general purpose distributions to build on. We decided to focus on
three kinds of systems:

  1. The stateful system, the traditional system as we know it with
    machine-specific /etc, /usr and /var, all
    properly populated.
  2. Startup without a populated /var, but with configured
    /etc. (We will call these volatile systems.)
  3. Startup without either /etc or /var. (We will
    call these stateless systems.)

A factory reset is just a special case of the latter two modes,
where the system boots up without /var and /etc but
the next boot is a normal stateful boot like the first described
mode. Note that a mode where /etc is flushed, but
/var is not, is nothing we intend to cover (why? well, the
user ID question becomes much harder, see below, and we simply saw no
usecase for it worth the trouble).

Problems

Booting up a system without a populated /var is relatively
straight-forward. With a
few lines of tmpfiles configuration
it is possible to populate
/var with its basic structure in a way that is sufficient to
make a system boot cleanly. systemd version 214 and newer ship with
support for this. Of course, support for this scheme in systemd is
only a small part of the solution. While a lot of software
reconstructs the directory hierarchy it needs in /var
automatically, much software does not. In cases like this it is
necessary to ship a couple of additional tmpfiles lines that set up
at boot time the necessary files or directories in /var to
make the software operate, similar to what RPM or DEB packages would
set up at installation time.
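
As a minimal sketch (the service name, directories and access modes are made up), such a tmpfiles drop-in could look like this, and would be applied at boot by systemd-tmpfiles --create:

# cat >/usr/lib/tmpfiles.d/myservice.conf <<'EOF'
d /var/lib/myservice 0750 myservice myservice -
d /var/cache/myservice 0750 myservice myservice -
EOF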

Booting up a system without a populated /etc is a more
difficult task. In /etc we have a lot of configuration bits
that are essential for the system to operate, for example and most
importantly system user and group information in /etc/passwd
and /etc/group. If the system boots up without /etc
there must be a way to replicate the minimal information necessary in
it, so that the system manages to boot up fully.

To make this even more complex, in order to support “offline”
updates of /usr that are replicated into a number of systems
possessing private /etc and /var there needs to be a
way for these directories to be upgraded transparently when
necessary, for example by recreating caches like
/etc/ld.so.cache or adding missing system users to
/etc/passwd on next reboot.

Starting with systemd 215 (yet unreleased, as I type this) we will
ship with a number of features in systemd that make /etc-less
boots functional:

  • A new tool systemd-sysusers has been added (a short usage sketch
    follows after this list). It introduces
    a new drop-in directory /usr/lib/sysusers.d/. Minimal
    descriptions of necessary system users and groups can be placed
    there. Whenever the tool is invoked it will create these users in
    /etc/passwd and /etc/group should they be
    missing. It is only suitable for creating system users and groups, not
    for normal users. It will write to the files directly via the
    appropriate glibc APIs, which is the right thing to do for system
    users. (For normal users no such APIs exist, as the users might be
    stored centrally on LDAP or suchlike, and they are out of focus for
    our usecase.) The major benefit of this tool is that system user
    definition can happen offline: a package simply has to drop in a new
    file to register a user. This makes system user registration
    declarative instead of imperative — which is the way
    system users are traditionally created from RPM or DEB
    installation scripts. By being declarative it is easy to replicate the
    users on next boot to a number of system instances.

    To make this new
    tool interesting for packaging scripts we make it easy to
    alternatively invoke it during package installation time, thus being a
    good alternative to invocations of useradd -r and
    groupadd -r.

    Some OS designs use a static, fixed user/group list stored in
    /usr as primary database for users/groups, with fixed
    UID/GID mappings. While this works for specific systems, this cannot
    cover the general case. As the UID/GID range for system
    users/groups is very small (only containing 998 users and groups on most systems), the
    best has to be made from this space and only UIDs/GIDs necessary on
    the specific system should be allocated. This means allocation has to
    be dynamic and adjust to what is necessary.

    Also note that this tool has
    one very nice feature: in addition to fully dynamic, and fully static
    UID/GID assignment for the users to create, it supports reading
    UID/GID numbers off existing files in /usr, so that vendors
    can make use of setuid/setgid binaries owned by specific users.

  • We also added a default
    user definition list
    which creates the most basic users the system
    and systemd need. Of course, very likely downstream distributions
    might need to alter this default list, add new entries and possibly
    map specific users to particular numeric UIDs.
  • A new condition ConditionNeedsUpdate= has been
    added. With this mechanism it is possible to conditionalize execution
    of services depending on whether /usr is newer than
    /etc or /var. The idea is that various services that
    need to be added into the boot process on upgrades make use of this to
    not delay boot-ups on normal boots, but run as necessary should
    /usr have been updated since the last boot. This is
    implemented based on the mtime timestamp of
    /usr: if the OS has been updated the packaging software
    should touch the directory, thus informing all instances that
    an upgrade of /etc and /var might be necessary.
  • We added a number of service files, that make use of the new
    ConditionNeedsUpdate= switch, and run a couple of services
    after each update. Among them are the aforementioned
    systemd-sysusers tool, as well as services that rebuild the
    udev hardware database, the journal catalog database and the library
    cache in /etc/ld.so.cache.
  • If systemd detects an empty /etc at early boot it will
    now use the unit
    preset
    information to enable all services by default that the
    vendor or packager declared. It will then proceed booting.
  • We added a
    new tmpfiles snippet
    that is able to reconstruct the
    most basic structure of /etc if it is missing.
  • tmpfiles also gained the ability to copy entire directory trees into
    place should they be missing. This is particularly useful for copying
    certain essential files or directories into /etc without
    which the system refuses to boot. Currently the most prominent
    candidates for this are /etc/pam.d and
    /etc/dbus-1. In the long run we hope that packages can be
    fixed so that they always work correctly without configuration in
    /etc. Depending on the software this means that they should
    come with compiled-in defaults that just work should their
    configuration file be missing, or that they should fall back to static
    vendor-supplied configuration in /usr that is used whenever
    /etc doesn’t have any configuration. Both the PAM and the
    D-Bus case are probably candidates for the latter. Given that there
    are probably many cases like this we are working with a number of
    folks to introduce a new directory called /usr/share/etc
    (name is not settled yet) to major distributions, that always
    contains the full, original, vendor-supplied configuration of all
    packages. This is very useful here, so that there’s an obvious place
    to copy the original configuration from, but it is also useful
    completely independently, as this provides administrators with an easy
    place to diff their own configuration in /etc
    against, to see what local changes are in place.
  • We added a new --tmpfs= switch to systemd-nspawn
    to make testing of systems with unpopulated /etc and
    /var easy. For example, to run a fully stateless container, use a command line like this:

    # systemd-nspawn -D /srv/mycontainer --read-only --tmpfs=/var --tmpfs=/etc -b

    This command line will invoke the container tree stored in
    /srv/mycontainer in a read-only way, but with a (writable)
    tmpfs mounted to /var and /etc. With a very recent
    git snapshot of systemd invoking a Fedora rawhide system should mostly
    work OK, modulo the D-Bus and PAM problems mentioned above. A later
    version of systemd-nspawn is likely to gain a high-level
    switch --mode={stateful|volatile|stateless} that
    combines this into a single simple switch, reusing the vocabulary introduced
    earlier.
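
And here’s the short systemd-sysusers sketch promised above (the user name and description are made up): a package drops in a declarative file, and the tool (run at boot or at package installation time) creates the user should it be missing:

# cat >/usr/lib/sysusers.d/myservice.conf <<'EOF'
u myservice - "My Service Daemon"
EOF
# systemd-sysusers

Invoked without arguments the tool processes all installed sysusers.d files.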

What’s Next

Pulling this all together we are very close to making boots with
empty /etc and /var on general purpose Linux
operating systems a reality. Of course, while doing the groundwork in
systemd gets us some distance, there’s a lot of work left. Most
importantly: the majority of Linux packages are simply incompatible
with this scheme the way they are currently set up. They do not work
without configuration in /etc or state directories in
/var; they do not drop system user information in
/usr/lib/sysusers.d. However, we believe it’s our job to do
the groundwork, and to start somewhere.

So what does this mean for the next steps? Of course, currently
very little of this is available in any distribution (simply already
because 215 isn’t even released yet). However, this will hopefully
change quickly. As soon as that is accomplished we can start working
on making the other components of the OS work nicely in this
scheme. If you are an upstream developer, please consider making your
software work correctly if /etc and/or /var are not
populated. This means:

  • When you need a state directory in /var and it is missing,
    create it first. If you cannot do that, because you dropped privileges
    or suchlike, please consider dropping in a tmpfiles snippet that
    creates the directory with the right permissions early at boot, should
    it be missing.
  • When you need configuration files in /etc to work
    properly, consider changing your application to work nicely when these
    files are missing, and automatically fall back to either built-in
    defaults, or to static vendor-supplied configuration files shipped in
    /usr, so that administrators can override configuration in
    /etc, but if they don’t, the default configuration applies.
  • When you need a system user or group, consider dropping in a file
    into /usr/lib/sysusers.d describing the users. (Currently
    documentation on this is minimal, we will provide more docs on this
    shortly.)

If you are a packager, you can also help on making this all work:

  • Ask upstream to implement what we describe above, possibly even preparing a patch for this.
  • If upstream will not make these changes, then consider dropping in
    tmpfiles snippets that copy the bare minimum of configuration files to
    make your software work from somewhere in /usr into
    /etc (see the sketch after this list).
  • Consider moving from imperative useradd commands in
    packaging scripts, to declarative sysusers files. Ideally,
    this is shipped upstream too, but if that’s not possible then simply
    adding this to packages should be good enough.
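
As a sketch of the copy approach mentioned in the list above (the paths are made up; C is the tmpfiles line type that copies a file or directory into place if it is missing):

# cat >/usr/lib/tmpfiles.d/myservice-etc.conf <<'EOF'
C /etc/myservice.conf - - - - /usr/share/myservice/myservice.conf
EOF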

Of course, before moving to declarative system user definitions you
should consult with your distribution whether their packaging policy
even allows that. Currently, most distributions will not, so we have
to work to get this changed first.

Anyway, so much about what we have been working on and where we want to take this.

Conclusion

Before we finish, let me stress again why we are doing all
this:

  1. For end-user machines like desktops, tablets or mobile phones, we
    want a generic way to implement factory reset, which the user can make
    use of when the system is broken (saves you support costs), or when he
    wants to sell it and get rid of his private data, and renew that “fresh
    car smell”.
  2. For embedded machines we want a generic way to reset
    devices. We also want a way for every single boot to be identical to
    a factory reset, in a stateless system design.
  3. For all kinds of systems we want to centralize vendor data in
    /usr so that it can be strictly read-only, and fully
    cryptographically verified as one unit.
  4. We want to enable new kinds of OS installers that simply
    deserialize a vendor OS /usr snapshot into a new file system,
    install a boot loader and reboot, leaving all first-time configuration
    to the next boot.
  5. We want to enable new kinds of OS updaters that build on this, and
    manage a number of vendor OS /usr snapshots in verified states, and
    which can then update /etc and /var simply by
    rebooting into a newer version.
  6. We want to scale container setups naturally, by sharing a single
    golden master /usr tree with a large number of instances that
    simply maintain their own private /etc and /var for
    their private configuration and state, while still allowing clean
    updates of /usr.
  7. We want to make thin clients that share /usr across the
    network work by allowing stateless bootups. During all discussions on
    how /usr was to be organized this was frequently mentioned. A
    setup like this so far only worked in very specific cases; with this
    scheme we want to make it work in the general case.

Of course, we have no illusions: just doing the groundwork for all
of this in systemd doesn’t make this all a real-life solution
yet. Also, it’s very unlikely that all of Fedora (or any other general
purpose distribution) will support this scheme for all its packages
soon, however, we are quite confident that the idea is convincing,
that we need to start somewhere, and that getting the most core
packages adapted to this shouldn’t be out of reach.

Oh, and of course, the concepts behind this are really not new, we
know that. However, what’s new here is that we try to make them
available in a general purpose OS core, instead of special purpose
systems.

Anyway, let’s get the ball rolling! Let’s make stateless systems a
reality!

And that’s all I have for now. I am sure this leaves a lot of
questions open. If you have any, join us on IRC on #systemd
on freenode or comment on Google+.

Upcoming Events

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/dates.html

You are invited to three events:

Christoph Wickert set up a Fedora 19
Release Party
here in Berlin! Please join us on Tuesday, July
2nd
.

We’ll have another Berlin Open
Source Meetup
on Sunday, July 14th.

And finally, there’s going to be another systemd
Hackfest
, this time colocated with GUADEC, on Tuesday/Wednesday, August 6th/7th.

See you soon!

GNOME.Asia and LinuxCon Japan

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/asia-2013.html

Two weeks ago I attended GNOME.Asia/Seoul and LinuxCon Japan/Tokyo, thanks
to sponsoring by the GNOME Foundation and the Linux Foundation. At GNOME.Asia I
spoke about Sandboxed
Applications for GNOME
, and at LinuxCon Japan about the first
three years of systemd
. (I think at least the latter one was videotaped,
and recordings might show up on the net eventually). I like to believe both
talks went pretty well, and helped get the message across to the community about what
we are working on and what our roadmap is, and what we expect from the
various projects, and especially GNOME. However, for me personally the
hallway track was the most interesting part. The personal Q&A regarding
our work on kdbus, cgroups, systemd and related projects were highly
interesting. In fact, at both conferences we had something like impromptu
hackfests on the topics of kdbus and cgroups, with some conference attendees.
I also enjoyed the opportunity to be on Karen’s upcoming GNOME podcast,
recorded in a session at Gyeongbokgung Palace in Seoul (what better place could
there be for a podcast recording?).

I’d like to thank the GNOME and Linux foundations for sponsoring my attendance to these conferences. I’d especially like to thank the organizers of GNOME.Asia for their perfectly organized conference!

What Are We Breaking Now?

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/brno.html

At the end of February devconf.cz
took place in Brno, Czech Republic. At the conference Kay Sievers,
Harald Hoyer and I did two presentations about our work on systemd
and about the systemd Journal. These talks were taped and the
recordings are now available online.

First, here’s our talk about What Are We
Breaking Now?
, in which we try to give an overview on what we
are working on currently in the systemd context, and what we expect to
do in the next few months. We cover Predictable Network Interface
Names
, the Boot
Loader Spec
, kdbus, the Apps framework, and more.

And then, I did my second talk about The systemd
Journal
, with a focus on how to practically make use of
journalctl, as a day-to-day tool for administrators (these practical
bits start around 28:40). The commands demoed here are all explained in an earlier blog story of
mine
.

Unfortunately, the audience questions are sometimes hard or
impossible to understand from the videos, and sometimes the text on
the slides is hard to read, but I still believe that the two talks are
quite interesting.

systemd Hackfest!

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/hackfest.html

Hey, you, systemd hacker, Fedora hacker! Listen up! This Thu/Fri is the systemd
Hackfest
in Brno/Czech Rep, right before devconf.cz! On thursday we’ll talk about
(and hack on) all things systemd. And the hackfest friday is going to be a Fedora Activity Day,
so we’ll have a focus on systemd integration into Fedora.

You are invited!

See you in Brno!

The Biggest Myths

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/the-biggest-myths.html

Since we first proposed systemd
for inclusion in the distributions it has been frequently discussed in
many forums, mailing lists and conferences. In these discussions one
can often hear certain myths about systemd, that are repeated over and
over again, but certainly don’t gain any truth by constant
repetition. Let’s take the time to debunk a few of them:

  1. Myth: systemd is monolithic.

    If you build systemd with all configuration options enabled you
    will build 69 individual binaries. These binaries all serve different
    tasks, and are neatly separated for a number of reasons. For example,
    we designed systemd with security in mind, hence most daemons run at
    minimal privileges (using kernel capabilities, for example) and are
    responsible for very specific tasks only, to minimize their security
    surface and impact. Also, systemd parallelizes the boot more than any
    prior solution. This parallization happens by running more processes
    in parallel. Thus it is essential that systemd is nicely split up into
    many binaries and thus processes. In fact, many of these
    binaries[1] are separated out so nicely, that they are very
    useful outside of systemd, too.

    A package involving 69 individual binaries can hardly be called
    monolithic. What is different from prior solutions however,
    is that we ship more components in a single tarball, and maintain them
    upstream in a single repository with a unified release cycle.

  2. Myth: systemd is about speed.

    Yes, systemd is fast (A
    pretty complete userspace boot-up in ~900ms, anyone?
    ), but that’s
    primarily just a side-effect of doing things right. In fact, we
    never really sat down and optimized the last tiny bit of performance
    out of systemd. Instead, we actually frequently knowingly picked the
    slightly slower code paths in order to keep the code more
    readable. This doesn’t mean being fast was irrelevant for us, but
    reducing systemd to its speed is quite a misconception,
    since that is certainly not anywhere near the top of our list of
    goals.

  3. Myth: systemd’s fast boot-up is irrelevant for
    servers.

    That is just completely not true. Many administrators actually are
    keen on reduced downtimes during maintenance windows. In High
    Availability setups it’s kinda nice if the failed machine comes back
    up really fast. In cloud setups with a large number of VMs or
    containers the price of slow boots multiplies with the number of
    instances. Spending minutes of CPU and IO on really slow boots of
    hundreds of VMs or containers reduces your system’s density
    drastically, heck, it even costs you more energy. Slow boots can be
    quite financially expensive. Also, fast booting of containers enables
    schemes such as socket
    activated containers
    , which let you drastically increase the
    density of your cloud system.

    Of course, in many server setups boot-up is indeed irrelevant, but
    systemd is supposed to cover the whole range. And yes, I am aware
    that often it is the server firmware that costs the most time at
    boot-up, and the OS is anyway fast compared to that, but well, systemd
    is still supposed to cover the whole range (see above…), and no,
    not all servers have such bad firmware, and certainly not VMs and
    containers, which are servers of a kind, too.[2]

  4. Myth: systemd is incompatible with shell scripts.

    This is entirely bogus. We just don’t use them for the boot
    process, because we believe they aren’t the best tool for that
    specific purpose, but that doesn’t mean systemd is incompatible with
    them. You can easily run shell scripts as systemd services; heck, you
    can run scripts written in any language as systemd services.
    systemd doesn’t care the slightest bit what’s inside your
    executable. Moreover, we heavily use shell scripts for our own
    purposes, for installing, building and testing systemd. And you can stick
    your scripts in the early boot process, use them for normal services,
    or run them during late shutdown; there are practically no
    limits.
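
    To make this concrete, here is a minimal sketch of a unit that wraps
    a plain shell script as a service (the script path is made up for
    illustration):

    [Unit]
    Description=Example service wrapping a shell script

    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/my-maintenance.sh

    Drop a file like this into /etc/systemd/system/ and you can start
    and stop your script like any other service.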

  5. Myth: systemd is difficult.

    This too is complete nonsense. A systemd platform is actually much
    simpler than traditional Linuxes because it unifies
    system objects and their dependencies as systemd units. The
    configuration file language is very simple, and we got rid of
    redundant configuration files. We provide uniform tools for much
    of the configuration of the system. The system is much less of a
    conglomerate than traditional Linuxes are. We also have pretty
    comprehensive documentation (all linked
    from the homepage
    ) about pretty much every detail of systemd, and
    this not only covers admin/user-facing interfaces, but also developer
    APIs.

    systemd certainly comes with a learning curve. Everything
    does. However, we like to believe that it is actually simpler to
    understand systemd than a Shell-based boot for most people. Surprised
    we say that? Well, as it turns out, Shell is not a pretty language to
    learn; its syntax is arcane and complex. systemd unit files are
    substantially easier to understand, they do not expose a programming
    language, but are simple and declarative by nature. That all said, if
    you are experienced in shell, then yes, adopting systemd will take a
    bit of learning.

    To make learning easy we tried hard to provide the maximum
    compatibility to previous solutions. But not only that, on many
    distributions you’ll find that some of the traditional tools will now
    even tell you — while executing what you are asking for — how you
    could do it with the newer tools instead, in a possibly nicer way.

    Anyway, the take-away is that systemd is probably as simple as
    such a system can be, and that we try hard to make it easy to
    learn. And yes, if you know sysvinit then adopting systemd will
    require a bit of learning, but quite frankly, if you mastered
    sysvinit, then systemd should be easy for you.

  6. Myth: systemd is not modular.

    Not true at all. At compile time you have a number of
    configure switches to select what you want to build, and what
    not. And we
    document
    how you can select in even more detail what you need,
    going beyond our configure switches.

    This modularity is not totally unlike the one of the Linux kernel,
    where you can select many features individually at compile time. If the
    kernel is modular enough for you then systemd should be pretty close,
    too.
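
    As a rough illustration, disabling optional components at build time
    looks something like this (the exact switch names vary between
    releases, so consult ./configure --help for your version):

    ./configure --disable-binfmt --disable-vconsole --disable-quotacheck

    Each such switch simply drops one optional component from the build.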

  7. Myth: systemd is only for desktops.

    That is certainly not true. With systemd we try to cover pretty
    much the same range as Linux itself does. While we care for desktop
    uses, we also care pretty much the same way for server uses, and
    embedded uses as well. You can bet that Red Hat wouldn’t make it a
    core piece of RHEL7 if it wasn’t the best option for managing services
    on servers.

    People from numerous companies work on systemd. Car manufacturers
    build it into cars, Red Hat uses it for a server operating system, and
    GNOME uses many of its interfaces for improving the desktop. You find
    it in toys, in space telescopes, and in wind turbines.

    Most of the features I recently worked on are probably relevant
    primarily on servers, such as container
    support
    , resource
    management
    or the security
    features
    . We cover desktop systems pretty well already, and there
    are a number of companies doing systemd development for embedded
    systems; some even offer consulting services for it.

  8. Myth: systemd was created as result of the NIH syndrome.

    This is not true. Before we began working on systemd we were
    pushing for Canonical’s Upstart to be widely adopted (and Fedora/RHEL
    used it too for a while). However, we eventually came to the
    conclusion that its design was inherently flawed at its core (at least
    in our eyes: most fundamentally, it leaves dependency management to
    the admin/developer, instead of solving this hard problem in code),
    and if something’s wrong in the core you better replace it, rather
    than fix it. This was hardly the only reason though; other things
    came into play too, such as the licensing/contribution agreement mess
    around it. NIH wasn’t one of the reasons, though…[3]

  9. Myth: systemd is a freedesktop.org project.

    Well, systemd is certainly hosted at fdo, but freedesktop.org is
    little else but a repository for code and documentation. Pretty much
    any coder can request a repository there and dump his stuff there (as
    long as it’s somewhat relevant for the infrastructure of free
    systems). There’s no cabal involved, no “standardization” scheme, no
    project vetting, nothing. It’s just a nice, free, reliable place to
    have your repository. In that regard it’s a bit like SourceForge,
    github, kernel.org, just not commercial and without over-the-top
    requirements, and hence a good place to keep our stuff.

    So yes, we host our stuff at fdo, but the assumption implied by
    this myth, that there is a group of people who meet and then agree
    on what future free systems should look like, is entirely bogus.

  10. Myth: systemd is not UNIX.

    There’s certainly some truth in that. systemd’s sources do not
    contain a single line of code originating from original UNIX. However,
    we derive inspiration from UNIX, and thus there’s a ton of UNIX in
    systemd. For example, the UNIX idea of “everything is a file” finds
    reflection in that in systemd all services are exposed at runtime in a
    kernel file system, the cgroupfs. Then, one of the original
    features of UNIX was multi-seat support, based on built-in terminal
    support. Text terminals are hardly the state of the art in how you
    interface with your computer these days, however. With systemd we
    brought native multi-seat
    support back, but this time with full support for today’s hardware,
    covering graphics, mice, audio, webcams and more, and all that fully
    automatic, hotplug-capable and without configuration. In fact, the
    design of systemd as a suite of integrated tools, each with its own
    individual purpose, but together more than just the sum
    of the parts, is pretty much at the core of the UNIX philosophy. Then,
    the way our project is handled (i.e. maintaining much of the core OS
    in a single git repository) is much closer to the BSD model (which is
    a true UNIX, unlike Linux) of doing things (where most of the core OS
    is kept in a single CVS/SVN repository) than things on Linux ever
    were.

    Ultimately, UNIX is something different for everybody. For us
    systemd maintainers it is something we derive inspiration from. For
    others it is a religion, and much like the other world religions there
    are different readings and understandings of it. Some define UNIX
    based on specific pieces of code heritage, others see it just as a set
    of ideas, others as a set of commands or APIs, and even others as a
    definition of behaviours. Of course, it is impossible to ever make all
    these people happy.

    Ultimately the question whether something is UNIX or not matters
    very little. Being technically excellent is hardly exclusive to
    UNIX. For us, UNIX is a major influence (heck, the biggest one), but
    we also have other influences. Hence in some areas systemd will be
    very UNIXy, and in others a little bit less.

  11. Myth: systemd is complex.

    There’s certainly some truth in that. Modern computers are complex
    beasts, and the OS running on them will hence have to be complex
    too. However, systemd is certainly not more complex than prior
    implementations of the same components. If anything, it’s simpler, and
    has less redundancy (see above). Moreover, building a simple OS based
    on systemd involves much fewer packages than a traditional Linux
    did. Fewer packages make it easier to build your system, and get rid of
    interdependencies and of much of the differing behaviour of every
    component involved.

  12. Myth: systemd is bloated.

    Well, bloated certainly has many different definitions. But in
    most definitions systemd is probably the opposite of bloat. Since
    systemd components share a common code base, they tend to share much
    more code for common code paths. Here’s an example: in a traditional
    Linux setup, sysvinit, start-stop-daemon, inetd, cron, dbus, all
    implemented a scheme to execute processes with various configuration
    options in a certain, hopefully clean environment. On systemd the code
    paths for all of this, for the configuration parsing, as well as the
    actual execution is shared. This means less code, less room for
    mistakes, less memory and cache pressure, and is thus a very good
    thing. And as a side-effect you actually get a ton more functionality
    for it…

    As mentioned above, systemd is also pretty modular. You can choose
    at build time which components you need, and which you don’t
    need. People can hence specifically choose the level of “bloat” they
    want.

    When you build systemd, it only requires three dependencies: glibc,
    libcap and dbus. That’s it. It can make use of more dependencies, but
    these are entirely optional.

    So, yeah, whichever way you look at it, it’s really not
    bloated.

  13. Myth: systemd being Linux-only is not nice to the BSDs.

    Completely wrong. The BSD folks are pretty much uninterested in
    systemd. If systemd was portable, this would change nothing, they
    still wouldn’t adopt it. And the same is true for the other Unixes in
    the world. Solaris has SMF, BSD has their own “rc” system, and they
    always maintained it separately from Linux. The init system is very
    close to the core of the entire OS. And these other operating systems
    hence define themselves among other things by their core
    userspace. The assumption that they’d adopt our core userspace if we
    just made it portable, is completely without any foundation.

  14. Myth: systemd being Linux-only makes it impossible for Debian to adopt it as default.

    Debian supports non-Linux kernels in their distribution. systemd
    won’t run on those. Is that a problem though, and should it hinder
    them from adopting systemd as the default? Not really. The folks who
    ported Debian to these other kernels were willing to invest time in a
    massive porting effort, they set up test and build systems, and
    patched and built numerous packages for their goal. The maintenance
    of both a systemd unit file and a classic init script for the packaged
    services is a negligible amount of work compared to that, especially
    since those scripts more often than not exist already.

  15. Myth: systemd could be ported to other kernels if its maintainers just wanted to.

    That is simply not true. Porting systemd to other kernels is not
    feasible. We just use too many Linux-specific interfaces. For a few
    one might find replacements on other kernels, and some features one
    might want to turn off, but for most this is not really possible. Here’s a
    small, very incomplete list: cgroups, fanotify, umount2(),
    /proc/self/mountinfo
    (including notification), /dev/swaps (same),
    udev, netlink,
    the structure of /sys, /proc/$PID/comm,
    /proc/$PID/cmdline, /proc/$PID/loginuid, /proc/$PID/stat,
    /proc/$PID/session, /proc/$PID/exe, /proc/$PID/fd, tmpfs, devtmpfs,
    capabilities, namespaces of all kinds, various prctl()s, numerous
    ioctls,
    the mount() system call and its semantics, selinux, audit,
    inotify, statfs, O_DIRECTORY, O_NOATIME, /proc/$PID/root, waitid(),
    SCM_CREDENTIALS, SCM_RIGHTS, mkostemp(), /dev/input, ...

    And no, if you look at this list and pick out the few where you can
    think of obvious counterparts on other kernels, then think again, and
    look at the others you didn’t pick, and the complexity of replacing
    them.

  16. Myth: systemd is not portable for no reason.

    Nonsense! We use the Linux-specific functionality because we need
    it to implement what we want. Linux has so many features that
    UNIX/POSIX didn’t have, and we want to empower the user with
    them. These features are incredibly useful, but only if they are
    actually exposed in a friendly way to the user, and that’s what we do
    with systemd.

  17. Myth: systemd uses binary configuration files.

    No idea who came up with this crazy myth, but it’s absolutely not
    true. systemd is configured pretty much exclusively via simple text
    files. A few settings you can also alter with the kernel command line
    and via environment variables. There’s nothing binary in its
    configuration (not even XML). Just plain, simple, easy-to-read text
    files.

  18. Myth: systemd is a feature creep.

    Well, systemd certainly covers more ground than it used to. It’s
    not just an init system anymore, but the basic userspace building
    block to build an OS from. However, we carefully make sure to keep most of
    the features optional. You can turn a lot off at compile time, and
    even more at runtime. Thus you can choose freely how much feature
    creeping you want.

  19. Myth: systemd forces you to do something.

    systemd is not the mafia. It’s Free Software, you can do with it
    whatever you want, and that includes not using it. That’s pretty much
    the opposite of “forcing”.

  20. Myth: systemd makes it impossible to run syslog.

    Not true. We carefully made sure when we introduced
    the journal
    that all data is also passed on to any syslog daemon
    running. In fact, if anything changed, it is only that syslog now gets
    more complete data than it did before, since we now cover early
    boot output as well as the STDOUT/STDERR of all system services.
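
    For example, the collected output of a service can be read back from
    the journal with a simple match (the unit name here is made up):

    journalctl _SYSTEMD_UNIT=foobar.service

    Any syslog daemon running in parallel receives the very same
    messages, forwarded by the journal.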

  21. Myth: systemd is incompatible.

    We try very hard to provide the best possible compatibility with
    sysvinit. In fact, the vast majority of init scripts should work just
    fine on systemd, unmodified. However, there actually are indeed a few
    incompatibilities, but we try to document
    these
    and explain what to do about them. Ultimately every system
    that is not actually sysvinit itself will have a certain amount of
    incompatibilities with it, since it will not share the exact same code
    paths.

    It is our goal to ensure that differences between the various
    distributions are kept at a minimum. That means unit files usually
    work just fine on a different distribution than the one you wrote them on, which
    is a big improvement over classic init scripts which are very hard to
    write in a way that they run on multiple Linux distributions, due to
    numerous incompatibilities between them.

  22. Myth: systemd is not scriptable, because of its D-Bus use.

    Not true. Pretty much every single D-Bus interface systemd provides
    is also available in a command line tool, for example in systemctl,
    loginctl,
    timedatectl,
    hostnamectl,
    localectl
    and suchlike. You can easily call these tools from shell scripts, they
    open up pretty much the entire API from the command line with
    easy-to-use commands.

    That said, D-Bus actually has bindings for almost any scripting
    language this world knows. Even from the shell you can invoke
    arbitrary D-Bus methods with dbus-send
    or gdbus. If
    anything, this improves scriptability due to the good support of D-Bus
    in the various scripting languages.
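
    As a small sketch (with a made-up unit name), here is the same
    operation done once with the command line tool and once over raw
    D-Bus:

    # Via systemctl:
    systemctl restart foobar.service

    # Via the same D-Bus interface that systemctl uses internally:
    dbus-send --system --print-reply \
        --dest=org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager.RestartUnit \
        string:"foobar.service" string:"replace"

    Both are equally scriptable; the former is simply more convenient.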

  23. Myth: systemd requires you to use some arcane configuration
    tools instead of allowing you to edit your configuration files
    directly.

    Not true at all. We offer some configuration tools, and using them
    gets you a bit of additional functionality (for example, command line
    completion for all settings!), but there’s no need at all to use
    them. You can always edit the files in question directly if you wish,
    and that’s fully supported. Of course sometimes you need to explicitly
    reload configuration of some daemon after editing the configuration,
    but that’s pretty much true for most UNIX services.
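
    A typical edit-and-reload cycle hence looks like this (the unit name
    is made up):

    # Edit the unit file directly, with whatever editor you like:
    vi /etc/systemd/system/foobar.service

    # Then tell systemd to re-read its configuration:
    systemctl daemon-reload
    systemctl restart foobar.service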

  24. Myth: systemd is unstable and buggy.

    Certainly not according to our data. We have been monitoring the
    Fedora bug tracker (and some others) closely for a long long time. The
    number of bugs is very low for such a central component of the OS,
    especially if you discount the numerous RFE bugs we track for the
    project. We are pretty good at keeping systemd out of the list of
    blocker bugs of the distribution. We have a relatively fast
    development cycle with mostly incremental changes to keep quality and
    stability high.

  25. Myth: systemd is not debuggable.

    False. Some people try to imply that the shell is a good
    debugger. Well, it isn’t really. In systemd we provide you with actual
    debugging features instead. For example: interactive debugging,
    verbose tracing, the ability to mask any component during boot, and
    more. Also, we provide documentation
    for it
    .

    It’s certainly well debuggable, we needed that for our own
    development work, after all. But we’ll grant you one thing: it uses
    different debugging tools, we believe more appropriate ones for the
    purpose, though.
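
    To give a taste of those debugging features: on the kernel command
    line you can raise systemd’s log level and redirect its output to
    kmsg,

    systemd.log_level=debug systemd.log_target=kmsg

    and after boot you can ask which services took how long to start:

    systemd-analyze blame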

  26. Myth: systemd makes changes for the changes’ sake.

    Very much untrue. We pretty much exclusively have technical
    reasons for the changes we make, and we explain them in the various
    pieces of documentation, wiki pages, blog articles, mailing list
    announcements. We try hard to avoid making incompatible changes, and
    if we do we try to document the why and how in detail. And if you
    wonder about something, just ask us!

  27. Myth: systemd is a Red-Hat-only project, is private property
    of some smart-ass developers, who use it to push their views to the
    world.

    Not true. Currently, there are 16 hackers with commit powers to the
    systemd git tree. Of these 16 only six are employed by Red Hat. The 10
    others are folks from ArchLinux, from Debian, from Intel, even from
    Canonical, Mandriva, Pantheon and a number of community folks with
    full commit rights. And they frequently commit big stuff, major
    changes. Then, there are 374 individuals with patches in our tree, and
    they too came from a number of different companies and backgrounds,
    and many of those have way more than one patch in the tree. The
    discussions about where we want to take systemd are done in the open,
    on our IRC channel (#systemd on freenode, you are always
    welcome), on our mailing
    list
    , and on public hackfests (such
    as our next one in Brno
    , you are invited). We regularly attend
    various conferences, to collect feedback, to explain what we are doing
    and why, like few others do. We maintain blogs, engage in social
    networks (we actually
    have some pretty interesting content on Google+
    , and our Google+
    Community is pretty alive, too
    .), and try really hard to explain
    the why and how of what we do, to listen to feedback, and to
    figure out where the current issues are (for example, from that
    feedback we compiled this list of often-heard myths about
    systemd…).

    What most systemd contributors probably share is a rough idea of how a
    good OS should look, and the desire to make it happen. However,
    since the project is Open Source and rooted in the community, systemd
    is just what people want it to be, and if it’s not
    what they want then they can drive the direction with patches and
    code, and if that’s not feasible, there are numerous other
    options to use too; systemd is never exclusive.

    One goal of systemd is to unify the dispersed Linux landscape a
    bit. We try to get rid of many of the more pointless differences of
    the various distributions in various areas of the core OS. As part of
    that we sometimes adopt schemes that were previously used by only one
    of the distributions and push them to a level where they are the
    default of systemd, trying to gently nudge everybody towards the same
    set of basic configuration. This is never exclusive though:
    distributions can continue to deviate from that if they wish; however,
    if they end up using the well-supported default their work becomes
    much easier and they might gain a feature or two. Now, as it turns
    out, more frequently than not the schemes we adopted as the
    best-supported ones in systemd were actually Debianisms rather than
    Fedoraisms/Redhatisms. For example, systems running systemd now generally store
    their hostname in /etc/hostname, something that used to be
    specific to Debian and now is used across distributions.

    One thing we’ll grant you though: we can sometimes be
    smart-asses. We try to be prepared whenever we open our mouths, in
    order to be able to back up what we claim with facts. That might make
    us appear as smart-asses.

    But in general, yes, some of the more influential contributors to
    systemd work for Red Hat, but they are in the minority, and systemd is
    a healthy, open community with different interests, different
    backgrounds, just unified by a few rough ideas where the trip should
    go, a community where code and its design counts, and certainly not
    company affiliation.

  28. Myth: systemd doesn’t support /usr split from the root directory.

    Nonsense. Since its beginnings systemd has supported the
    --with-rootprefix= option to its configure script
    which allows you to tell systemd to neatly split up the stuff needed
    for early boot and the stuff needed for later on. All this logic is
    fully present and we keep it up-to-date right there in systemd’s build
    system.

    Of course, we still don’t think that actually
    booting with /usr unavailable is a good idea
    , but we
    support this just fine in our build system. This won’t fix the
    inherent problems of the scheme that you’ll encounter all across the
    board, but you can’t blame those on systemd, since we support the
    split as well as it can be supported.
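
    As a sketch, a split-/usr build of systemd might be configured like
    this (the pairing of the two options is an assumption; check
    ./configure --help for your version):

    ./configure --prefix=/usr --with-rootprefix=

    With an empty root prefix the early-boot tools land below the root
    directory, while everything else is installed below /usr.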

  29. Myth: systemd doesn’t allow you to replace its components.

    Not true, you can turn off and replace pretty much any part of
    systemd, with very few exceptions. And those exceptions (such as
    journald) generally allow you to run an alternative side by side to
    it, while cooperating nicely with it.

  30. Myth: systemd’s use of D-Bus instead of sockets makes it intransparent.

    This claim is already contradictory in itself: D-Bus uses sockets
    as transport, too. Hence whenever D-Bus is used to send something
    around, a socket is used for that too. D-Bus is mostly a standardized
    serialization of messages to send over these sockets. If anything this
    makes it more transparent, since this serialization is well
    documented and well understood, and there are numerous tracing tools and
    language bindings for it. This is very much unlike the usual
    homegrown protocols the various classic UNIX daemons use to
    communicate locally.
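
    For example, all of systemd’s bus traffic can be watched live with a
    stock tracing tool:

    dbus-monitor --system "sender='org.freedesktop.systemd1'"

    Try doing that with a homegrown socket protocol.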

Hmm, did I write that I just wanted to debunk a “few” myths? Maybe these
were more than just a few… Anyway, I hope I managed to clear up a
couple of misconceptions. Thanks for your time.

Footnotes

[1] For example, systemd-detect-virt,
systemd-tmpfiles and
systemd-udevd.

[2] Also, we are trying to do our little part to maybe
make this better. By exposing boot-time performance of the firmware
more prominently in systemd’s boot output we hope to shame the
firmware writers into cleaning up their stuff.

[3] And anyways, guess which project includes a library “libnih” — Upstart or systemd?[4]

[4] Hint: it’s not systemd!

systemd for Administrators, Part XX

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/socket-activated-containers.html

This is no time for procrastination: here is already the twentieth
installment of my ongoing series on systemd for Administrators:

Socket Activated Internet Services and OS Containers

Socket Activation is an important feature of systemd. When we first
announced systemd we already tried to make the point of how great
socket activation is for increasing the parallelization and robustness of
socket services, but also for simplifying the dependency logic of the
boot. In this episode I’d like to explain why socket activation is an
important tool for drastically improving how many services and even
containers you can run on a single system with the same resource
usage. Or in other words, how you can drive up the density of customer
sites on a system while spending less on new hardware.

Socket Activated Internet Services

First, let’s take a step back. What was socket activation again? —
Basically, socket activation simply means that systemd sets up
listening sockets (IP or otherwise) on behalf of your services
(without these running yet), and then starts (activates) the
services as soon as the first connection comes in. Depending on the
technology the services might idle for a while after having processed
the connection and possible follow-up connections before they exit on
their own, so that systemd will again listen on the sockets and
activate the services again the next time they are connected to. For
the client it is not visible whether the service it is interested in
is currently running or not. The service’s IP socket stays continuously
connectable, no connection attempt ever fails, and all connects will
be processed promptly.

A setup like this lowers resource usage: as services are only
running when needed they only consume resources when required. Many
internet sites and services can benefit from that. For example, web
site hosters will have noticed that of the multitude of web sites on
the Internet only a tiny fraction gets a continuous stream of
requests: the huge majority of web sites still needs to be available
all the time but gets requests only very infrequently. A scheme
like socket activation lets you take advantage of this. Hosting many of
these sites on a single system and activating their
services only as necessary allows a large degree of over-commit: you can
run more sites on your system than the available resources would
otherwise allow. Of course, one shouldn’t over-commit too much, to avoid
contention during peak times.

Socket activation like this is easy to use in systemd. Many modern
Internet daemons already support socket activation out of the box (and
for those which don’t yet it’s not
hard
to add). Together with systemd’s instantiated
units support
it is easy to write a pair of service and socket
templates that then may be instantiated multiple times, once for each
site. Then, (optionally) make use of some of the security
features
of systemd to nicely isolate the customer’s site’s
services from each other (think: each customer’s service should only
see the home directory of the customer, everybody else’s directories
should be invisible), and there you go: you now have a highly scalable
and reliable server system, that serves a maximum of securely
sandboxed services at a minimum of resources, and all nicely done with
built-in technology of your OS.
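
Here is a hedged sketch of what such a template pair might look like;
the unit names, paths and the server binary are made up for
illustration:

[email protected]:

[Socket]
ListenStream=/run/sites/%i.socket

[email protected]:

[Service]
ExecStart=/usr/bin/site-server --root=/srv/www/%i
User=%i

Instantiating site@customer1.socket then gives each customer a
socket-activated service of their own.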

This kind of setup is already in production use in a number of
companies. For example, the great folks at Pantheon are running their
scalable instant Drupal system on a setup that is similar to this. (In
fact, Pantheon’s David Strauss pioneered this scheme. David, you
rock!)

Socket Activated OS Containers

All of the above can already be done with older versions of
systemd. If you use a distribution that is based on systemd, you can
right away set up a system like the one explained above. But let’s
take this one step further. With systemd 197 (to be included in Fedora
19), we added support for socket activating not only individual
services, but entire OS containers. And I really have to say it
at this point: this is stuff I am really excited
about. 😉

Basically, with socket activated OS containers, the host’s systemd
instance will listen on a number of ports on behalf of a container,
for example one for SSH, one for web and one for the database, and as
soon as the first connection comes in, it will spawn the container
this is intended for, and pass to it all three sockets. Inside of the
container, another systemd is running and will accept the sockets and
then distribute them further, to the services running inside the
container using normal socket activation. The SSH, web and database
services will only see the inside of the container, even though they
have been activated by sockets that were originally created on the
host! Again, to the clients all of this is invisible. That an entire OS
container is spawned, triggered by a simple network connection, is entirely
transparent to the client side.[1]

The OS containers may contain (as the name suggests) a full
operating system, which might even be a different distribution from the
one running on the host. For example, you could run your host on Fedora,
but run a number of Debian containers inside of it. The OS containers
will have their own systemd init system, their own SSH instances,
their own process tree, and so on, but will share a number of other
facilities (such as memory management) with the host.

For now, only systemd’s own trivial container manager, systemd-nspawn,
has been updated to support this kind of socket activation. We hope
that libvirt-lxc will
soon gain similar functionality. At this point, let’s see in more
detail how such a setup is configured in systemd using nspawn:

First, please use a tool such as debootstrap or yum’s
--installroot to set up a container OS
tree[2]. The details of that are a bit out of scope
for this story; there’s plenty of documentation around on how to do
this. Of course, make sure you have systemd v197 installed inside
the container. For accessing the container from the command line,
consider using systemd-nspawn
itself. After you have configured everything properly, try to boot it up
from the command line with systemd-nspawn’s -b switch.
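
For example, assuming the container tree lives in /srv/mycontainer (as
in the rest of this story), booting it interactively is just:

# systemd-nspawn -bD /srv/mycontainer

This gives you the container’s console right in your terminal.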

Assuming you now have a working container that boots up fine, let’s
write a service file for it, to turn the container into a systemd
service on the host you can start and stop. Let’s create
/etc/systemd/system/mycontainer.service on the host:

[Unit]
Description=My little container

[Service]
ExecStart=/usr/bin/systemd-nspawn -jbD /srv/mycontainer
KillMode=process

This service can already be started and stopped via systemctl
start and systemctl stop. However, there’s no nice way
to actually get a shell prompt inside the container. So let’s add SSH
to it, and even more: let’s configure SSH so that a connection to the
container’s SSH port will socket-activate the entire container. First,
let’s begin with telling the host that it shall now listen on the SSH
port of the container. Let’s create
/etc/systemd/system/mycontainer.socket on the host:

[Unit]
Description=The SSH socket of my little container

[Socket]
ListenStream=23

If we start this unit with systemctl start on the host
then it will listen on port 23, and as soon as a connection comes in
it will activate our container service we defined above. We pick port
23 here, instead of the usual 22, as our host’s SSH is already
listening on that. nspawn virtualizes the process list and the file
system tree, but does not actually virtualize the network stack, hence
we just pick different ports for the host and the various containers
here.

Of course, the system inside the container doesn’t yet know what to
do with the socket it gets passed due to socket activation. If you’d
now try to connect to the port, the container would start up but the
incoming connection would be immediately closed since the container
can’t handle it yet. Let’s fix that!

All that’s necessary for that is to teach SSH inside the container
socket activation. For that let’s simply write a pair of socket and
service units for SSH. Let’s create
/etc/systemd/system/sshd.socket in the container:

[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=23
Accept=yes

Then, let’s add the matching SSH service file
/etc/systemd/system/[email protected] in the container:

[Unit]
Description=SSH Per-Connection Server for %I

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket

Then, make sure to hook sshd.socket into the
sockets.target so that the unit is started automatically when the
container boots up:

ln -s /etc/systemd/system/sshd.socket /etc/systemd/system/sockets.target.wants/

And that’s it. If we now activate mycontainer.socket on
the host, the host’s systemd will bind the socket and we can connect
to it. If we do this, the host’s systemd will activate the container,
and pass the socket in to it. The container’s systemd will then take
the socket, match it up with sshd.socket inside the
container. As there’s still our incoming connection queued on it, it
will then immediately trigger an instance of [email protected],
and we’ll have our login.

And that’s already everything there is to it. You can easily add
additional listening sockets to
mycontainer.socket. Everything listed therein will be passed
to the container on activation, and will be matched up as well as
possible with all socket units configured inside the
container. Sockets that cannot be matched up will be closed, and
sockets that aren’t passed in but are configured for listening will be
bound by the container’s systemd instance.

So, let’s take a step back again. What did we gain through all of
this? Well, basically, we can now offer a number of full OS containers
on a single host, and the containers can offer their services without
running continuously. The density of OS containers on the host can
hence be increased drastically.

Of course, this only works for kernel-based virtualization, not for
hardware virtualization. That is, something like this can only be
implemented on systems such as libvirt-lxc or nspawn, but not in
qemu/kvm.

If you have a number of containers set up like this, here’s one
cool thing the journal allows you to do. If you pass -m to
journalctl on the host, it will automatically discover the
journals of all local containers and interleave them on
display. Nifty, eh?

With systemd 197 you have everything on board to set up your own socket
activated OS containers. However, there are a couple of
improvements we’re likely to add soon: for example, right now even if
all services inside the container exit on idle, the container still
will stay around, and we really should make it exit on idle too, if
all its services exited and no logins are around. As it turns out we
already have much of the infrastructure for this around: we can reuse
the auto-suspend functionality we added for laptops, since detecting when a
laptop is idle and suspending it is a very similar problem to
detecting when a container is idle and shutting it down.

Anyway, this blog story is already way too long. I hope I haven’t
lost you half-way already with all this talk of virtualization,
sockets, services, different OSes and stuff. I hope this blog story is
a good starting point for setting up powerful highly scalable server
systems. If you want to know more, consult the documentation and drop
by our IRC channel. Thank you!

Footnotes

[1] And BTW, this
is another reason
why fast boot times the way systemd offers them
are actually a really good thing on servers, too.

[2] To make it easy: you need a command line such as yum
--releasever=19 --nogpg --installroot=/srv/mycontainer/ --disablerepo='*'
--enablerepo=fedora install systemd passwd yum fedora-release vim-minimal

to install Fedora, and debootstrap --arch=amd64 unstable
/srv/mycontainer/
to install Debian. Also see the bottom of systemd-nspawn(1).
Also note that auditing is currently broken for containers, and if enabled in
the kernel will cause all kinds of errors in the container. Use
audit=0 on the host’s kernel command line to turn it off.

systemd for Administrators, Part XIX

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/detect-virt.html

Happy new year 2013! Here is now the nineteenth installment of my
ongoing series on systemd for Administrators:

Detecting Virtualization

When we started working on systemd
we had a closer look at what the various existing init scripts used on
Linux were actually doing. Among other things we noticed that a
number of them were explicitly checking whether they were running in
a virtualized environment (i.e. in a kvm, VMWare, LXC guest or
suchlike) or not. Some init scripts disabled themselves in such
cases[1], others enabled themselves only in such
cases[2]. Frequently, it would probably have been a better
idea to check for other conditions rather than explicitly checking for
virtualization, but after looking at this from all sides we came to
the conclusion that in many cases explicitly conditionalizing services
based on detected virtualization is a valid thing to do. As a result
we added a new configuration option to systemd that can be used to
conditionalize services this way: ConditionVirtualization;
we also added a small tool that can be used in shell scripts to detect
virtualization: systemd-detect-virt(1);
and finally, we added a minimal bus interface to query this from other
applications.

Detecting whether your code is run inside a virtualized environment
is actually not that hard. Depending on what precisely you want to
detect it’s little more than running the CPUID instruction and maybe
checking a few files in /sys and /proc. The
complexity is mostly about knowing the strings to look for, and
keeping this list up-to-date. Currently, the virtualization
detection code in systemd can detect the following virtualization
systems:

  • Hardware virtualization (i.e. VMs):

    • qemu
    • kvm
    • vmware
    • microsoft
    • oracle
    • xen
    • bochs
  • Same-kernel virtualization (i.e. containers):

    • lxc
    • lxc-libvirt
    • systemd-nspawn
    • openvz

Let’s have a look at how one may make use of this functionality.

Conditionalizing Units

Adding ConditionVirtualization
to the [Unit] section of a unit file is enough to
conditionalize it depending on which virtualization is used or whether
one is used at all. Here’s an example:

[Unit]
Description=My Foobar Service (runs only on guests)
ConditionVirtualization=yes

[Service]
ExecStart=/usr/bin/foobard

Instead of specifying “yes” or “no” it is possible
to specify the ID of a specific virtualization solution (for example
“kvm”, “vmware”, …), or either
“container” or “vm” to check whether the kernel or the hardware is
virtualized. Also, checks can be prefixed with an exclamation mark (“!”) to invert a check. For further details see the manual page.
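
For example, a unit that should run everywhere except in containers
could use the following (the service itself is made up, the condition
syntax is as documented):

[Unit]
Description=My Foobar Service (skipped in containers)
ConditionVirtualization=!container

[Service]
ExecStart=/usr/bin/foobard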

In Shell Scripts

In shell scripts it is easy to check for virtualized systems with
the systemd-detect-virt(1)
tool. Here’s an example:

if systemd-detect-virt -q ; then
        echo "Virtualization is used:" `systemd-detect-virt`
else
        echo "No virtualization is used."
fi

If this tool is run it will return with an exit code of zero
(success) if a virtualization solution has been found, non-zero
otherwise. It will also print a short identifier of the used
virtualization solution, which can be suppressed with
-q. Also, with the -c and -v parameters it is
possible to detect only kernel or only hardware virtualization
environments. For further details see the manual
page
.

In Programs

Whether virtualization is available is also exported on the system bus:

$ gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1 --method org.freedesktop.DBus.Properties.Get org.freedesktop.systemd1.Manager Virtualization
(<'systemd-nspawn'>,)

This property contains the empty string if no virtualization is
detected. Note that some container environments cannot be detected
directly from unprivileged code. That’s why we expose this property on
the bus rather than providing a library — the bus implicitly solves
the privilege problem quite nicely.

Note that all of this will only ever detect and return information
about the “inner-most” virtualization solution. If you stack
virtualization (“We must go deeper!”) then these interfaces will
expose the one the code is most directly interfacing
with. Specifically that means that if a container solution is used
inside of a VM, then only the container is generally detected and
returned.

Footnotes

[1] For example: running certain device management services in a
container environment that has no access to any physical hardware makes little sense.

[2] For example: some VM solutions work best if certain
vendor-specific userspace components are running that connect the
guest with the host in some way.