ASG! 2019 CfP Re-Opened!
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/asg-2019-cfp-re-opened.html
Due to popular request we have re-opened the Call for Participation
(CFP) for All Systems Go! 2019 for one
day. It will close again TODAY, on the 15th of July 2019, midnight Central
European Summer Time! If you missed the deadline so far, we’d like to
invite you to submit your proposals for consideration to the CFP
submission site quickly!
(And yes, this is the last extension, there’s not going to be any
more extensions.)
All Systems Go! is everybody’s favourite low-level Userspace Linux
conference, taking place in Berlin, Germany, on September 20-22, 2019.
For more information please visit our conference
website!
Walkthrough for Portable Services in Go
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/walkthrough-for-portable-services-in-go.html
Portable Services Walkthrough (Go Edition)
A few months ago I posted a blog story with a walkthrough of systemd
Portable
Services. The
example service given was written in C, and the image was built with
mkosi
. In this blog story I’d
like to revisit the exercise, but this time focus on a different
aspect: modern programming languages like Go and Rust push users a lot
more towards static linking of libraries than the usual dynamic
linking preferred by C (at least in the way C is used by traditional
Linux distributions).
Static linking means we can greatly simplify image building: if we
don’t have to link against shared libraries during runtime we don’t
have to include them in the portable service image. And that means
pretty much all need for building an image from a Linux distribution
of some kind goes away as we’ll have next to no dependencies that
would require us to rely on a distribution package manager or
distribution packages. In fact, as it turns out, we only need as few
as three files in the portable service image to be fully functional.
So, let’s have a closer look at how such an image can be put
together. All of the following is available in this git
repository.
A Simple Go Service
Let’s start with a simple Go service, an HTTP service that simply
counts how often a page from it is requested. Here are the sources:
main.go
— note that I am not a seasoned Go programmer, hence please be
gracious.
The service implements systemd’s socket activation protocol, and thus
can receive bound TCP listener sockets from systemd, using the
$LISTEN_PID
and $LISTEN_FDS
environment variables.
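To make the protocol a bit more concrete, here is a minimal sketch of how a
Go program might pick up a single passed-in listener. This is not the actual
main.go from the repository; the helper name and the fallback port are made
up for illustration, and the canonical description of the fd-passing scheme
is in sd_listen_fds(3).

package main

import (
        "fmt"
        "net"
        "net/http"
        "os"
        "strconv"
)

// systemd passes activated sockets starting at file descriptor 3.
const listenFdsStart = 3

// activationListener is a hypothetical helper: it checks $LISTEN_PID and
// $LISTEN_FDS and wraps the first passed file descriptor in a net.Listener.
func activationListener() (net.Listener, error) {
        if pid, err := strconv.Atoi(os.Getenv("LISTEN_PID")); err != nil || pid != os.Getpid() {
                return nil, fmt.Errorf("socket not passed to this process")
        }
        if n, err := strconv.Atoi(os.Getenv("LISTEN_FDS")); err != nil || n < 1 {
                return nil, fmt.Errorf("no file descriptors passed")
        }
        return net.FileListener(os.NewFile(listenFdsStart, "listen-fd"))
}

func main() {
        l, err := activationListener()
        if err != nil {
                // Fallback for running outside of systemd, e.g. during development.
                if l, err = net.Listen("tcp", ":8088"); err != nil {
                        panic(err)
                }
        }
        http.Serve(l, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintln(w, "Hello!")
        }))
}

The real service additionally persists the visitor counter in $STATE_DIRECTORY,
as described next.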
The service will store the counter data in the directory indicated in
the $STATE_DIRECTORY
environment variable, which happens to be an
environment variable current systemd versions set based on the
StateDirectory=
setting in service files.
Two Simple Unit Files
When a service shall be managed by systemd a unit file is
required. Since the service we are putting together shall be socket
activatable, we even have two:
portable-walkthrough-go.service
(the description of the service binary itself) and
portable-walkthrough-go.socket
(the description of the sockets to listen on for the service).
These units are not particularly remarkable: the .service
file
primarily contains the command line to invoke and a StateDirectory=
setting to make sure the service when invoked gets its own private
state directory under /var/lib/
(and the $STATE_DIRECTORY
environment variable is set to the resulting path). The .socket
file
simply lists 8088 as TCP/IP port to listen on.
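For orientation, the two units might look roughly like this. This is a
hand-written sketch, not a verbatim copy of the files in the repository; in
particular the ExecStart= path is an assumption made for illustration.

portable-walkthrough-go.service:

[Unit]
Description=A simple HTTP service for the portable service walkthrough

[Service]
# Illustrative path; the real unit points at wherever the image ships the binary.
ExecStart=/usr/local/bin/portable-walkthrough-go
StateDirectory=portable-walkthrough-go

portable-walkthrough-go.socket:

[Unit]
Description=Listening socket for the walkthrough service

[Socket]
ListenStream=8088

[Install]
WantedBy=sockets.target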
An OS Description File
OS images (and that includes portable service images) generally should
include an
os-release
file. Usually, that is provided by the distribution. Since we are
building an image without any distribution let’s write our own
version of such a
file. Later
on we can use the portablectl inspect
command to have a look at this
metadata of our image.
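As a rough idea of what such a hand-written file could contain (the field
values below are purely illustrative, only the field names are standard
os-release fields):

ID=portable-walkthrough-go
PRETTY_NAME="Portable Walkthrough Go Service"
VERSION_ID=1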
Putting it All Together
The four files described above are already every file we need to build
our image. Let’s now put the portable service image together. For that
I’ve written a
Makefile
. It
contains two relevant rules: the first one builds the static binary
from the Go program sources. The second one then puts together a
squashfs
file system combining the following:
- The compiled, statically linked service binary
- The two systemd unit files
- The os-release file
- A couple of empty directories such as /proc/, /sys/, /dev/ and so on
  that need to be over-mounted with the respective kernel API file
  system. We need to create them as empty directories here since Linux
  insists on directories existing in order to over-mount them, and since
  the image we are building is going to be an immutable read-only image
  (squashfs) these directories cannot be created dynamically when the
  portable image is mounted.
- Two empty files /etc/resolv.conf and /etc/machine-id that can be
  over-mounted with the same files from the host.
And that’s already it. After a quick make
we’ll have our portable
service image portable-walkthrough-go.raw
and are ready to go.
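If you don’t want to open the repository right away, the two Makefile rules
boil down to roughly the following shell steps. The paths, the staging
directory and the exact mksquashfs flags are assumptions made for
illustration; the Makefile in the repository is the authoritative version.

# Build a statically linked service binary (no cgo, hence no shared library deps).
CGO_ENABLED=0 go build -o portable-walkthrough-go main.go

# Stage the handful of files and empty directories the image needs.
mkdir -p rootfs/usr/local/bin rootfs/usr/lib/systemd/system \
         rootfs/proc rootfs/sys rootfs/dev rootfs/run rootfs/tmp rootfs/var/tmp rootfs/etc
cp portable-walkthrough-go rootfs/usr/local/bin/
cp portable-walkthrough-go.service portable-walkthrough-go.socket rootfs/usr/lib/systemd/system/
cp os-release rootfs/usr/lib/os-release
touch rootfs/etc/resolv.conf rootfs/etc/machine-id

# Pack the staged tree into an immutable squashfs image.
mksquashfs rootfs portable-walkthrough-go.raw -noappend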
Trying it out
Let’s now attach the portable service image to our host system:
# portablectl attach ./portable-walkthrough-go.raw
(Matching unit files with prefix 'portable-walkthrough-go'.)
Created directory /etc/systemd/system.attached.
Created directory /etc/systemd/system.attached/portable-walkthrough-go.socket.d.
Written /etc/systemd/system.attached/portable-walkthrough-go.socket.d/20-portable.conf.
Copied /etc/systemd/system.attached/portable-walkthrough-go.socket.
Created directory /etc/systemd/system.attached/portable-walkthrough-go.service.d.
Written /etc/systemd/system.attached/portable-walkthrough-go.service.d/20-portable.conf.
Created symlink /etc/systemd/system.attached/portable-walkthrough-go.service.d/10-profile.conf → /usr/lib/systemd/portable/profile/default/service.conf.
Copied /etc/systemd/system.attached/portable-walkthrough-go.service.
Created symlink /etc/portables/portable-walkthrough-go.raw → /home/lennart/projects/portable-walkthrough-go/portable-walkthrough-go.raw.
The portable service image is now attached to the host, which means we
can now go and start it (or even enable it):
# systemctl start portable-walkthrough-go.socket
Let’s see if our little web service works, by doing an HTTP request on port 8088:
# curl localhost:8088
Hello! You are visitor #1!
Let’s try this again, to check if it counts correctly:
# curl localhost:8088
Hello! You are visitor #2!
Nice! It worked. Let’s now stop the service, and detach the image again:
# systemctl stop portable-walkthrough-go.service portable-walkthrough-go.socket
# portablectl detach portable-walkthrough-go
Removed /etc/systemd/system.attached/portable-walkthrough-go.service.
Removed /etc/systemd/system.attached/portable-walkthrough-go.service.d/10-profile.conf.
Removed /etc/systemd/system.attached/portable-walkthrough-go.service.d/20-portable.conf.
Removed /etc/systemd/system.attached/portable-walkthrough-go.service.d.
Removed /etc/systemd/system.attached/portable-walkthrough-go.socket.
Removed /etc/systemd/system.attached/portable-walkthrough-go.socket.d/20-portable.conf.
Removed /etc/systemd/system.attached/portable-walkthrough-go.socket.d.
Removed /etc/portables/portable-walkthrough-go.raw.
Removed /etc/systemd/system.attached.
And there we go, the portable image file is detached from the host again.
A Couple of Notes
- Of course, this is a simplistic example: in real life services will
  be more than one compiled file, even when statically linked. But
  you get the idea, and it’s very easy to extend the example above to
  include any additional, auxiliary files in the portable service
  image.
- The service is very nicely sandboxed during runtime: while it runs
  as a regular service on the host (and you thus can watch its logs or
  do resource management on it like you would do for all other
  systemd services), it runs in a very restricted environment under a
  dynamically assigned UID that ceases to exist when the service is
  stopped again.
- Originally I wanted to make the service not only socket activatable
  but also implement exit-on-idle, i.e. add logic so that the
  service terminates on its own when there’s no ongoing HTTP
  connection for a while. I couldn’t figure out how to do this
  race-freely in Go though, but I am sure an interested reader might
  want to add that? (A naive, non-race-free starting point is sketched
  below.) By combining socket activation with exit-on-idle
  we can turn this project into an exercise of putting together an
  extremely resource-friendly and robust service architecture: the
  service is started only when needed and terminates when no longer
  needed. This would allow packing services at a much higher density
  even on systems with few resources.
- While the basic concepts of portable services have been around
  since systemd 239, it’s best to try the above with systemd 241 or
  newer since the portable service logic received a number of fixes
  since then.
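For reference, here is a naive sketch of the exit-on-idle idea mentioned
above: it simply arms a timer that terminates the process after 30 seconds
without requests. Note that this version deliberately ignores the race
described above (a connection could be accepted right as the timer fires),
so treat it as a starting point only, not a solution.

package main

import (
        "fmt"
        "net/http"
        "os"
        "time"
)

const idleTimeout = 30 * time.Second

func main() {
        // Exit when no request has been handled for idleTimeout.
        idle := time.AfterFunc(idleTimeout, func() { os.Exit(0) })

        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
                // Every request pushes the idle deadline out again.
                idle.Reset(idleTimeout)
                fmt.Fprintln(w, "Hello!")
        })

        // In the real service the listener would come from socket activation;
        // we bind directly here to keep the example short.
        http.ListenAndServe(":8088", nil)
}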
Further Reading
A low-level document introducing Portable Services is shipped along
with systemd.
Please have a look at the blog story from a few months
ago
that did something very similar with a service written in C.
There are also relevant manual pages:
portablectl(1)
and
systemd-portabled(8)
.
ASG! 2018 Tickets
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/asg-2018-tickets.html
Buy your tickets for All Systems Go!
2018 soon, they are quickly selling out!
The conference takes place on September 28-30, in Berlin, Germany, in
a bit over two weeks.
Why should you attend? If you are interested in low-level Linux
userspace, then All Systems Go! is the right conference for you. It
covers all topics relevant to foundational open-source Linux
technologies. For details on the covered topics see our schedule for day #1
and for day #2.
For more information please visit our conference
website!
See you in Berlin!
ASG! 2018 CfP Closes TODAY
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/asg-2018-cfp-closes-today.html
The Call for Participation (CFP) for All Systems Go!
2018 will close TODAY, on 30th of
July! We’d like to invite you to submit your proposals for
consideration to the CFP submission
site quickly!
All Systems Go! is everybody’s favourite low-level Userspace Linux
conference, taking place in Berlin, Germany, on September 28-30, 2018.
For more information please visit our conference
website!
ASG! 2018 CfP Closes Soon
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/asg-2018-cfp-closes-soon.html
The Call for Participation (CFP) for All Systems Go!
2018 will close in one week, on 30th of
July! We’d like to invite you to submit your proposals for
consideration to the CFP submission
site quickly!
Notification of acceptance and non-acceptance will go out within 7
days of the closing of the CFP.
All topics relevant to foundational open-source Linux technologies are
welcome. In particular, however, we are looking for proposals
including, but not limited to, the following topics:
- Low-level container executors and infrastructure
- IoT and embedded OS infrastructure
- BPF and eBPF filtering
- OS, container, IoT image delivery and updating
- Building Linux devices and applications
- Low-level desktop technologies
- Networking
- System and service management
- Tracing and performance measuring
- IPC and RPC systems
- Security and Sandboxing
While our focus is definitely more on the user-space side of things,
talks about kernel projects are welcome, as long as they have a clear
and direct relevance for user-space.
For more information please visit our conference
website!
Walkthrough for Portable Services
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/walkthrough-for-portable-services.html
Portable Services with systemd v239
systemd
v239
contains a great number of new features. One of them is first class
support for Portable
Services. In
this blog story I’d like to shed some light on what they are and why
they might be interesting for your application.
What are “Portable Services”?
The “Portable Service” concept takes inspiration from classic
chroot()
environments as well as container management and brings a
number of their features to more regular system service management.
While the definition of what a “container” really is is hotly debated,
I figure people can generally agree that the “container” concept
primarily provides two major features:
- Resource bundling: a container generally brings its own file system
  tree along, bundling any shared libraries and other resources it
  might need along with the main service executables.
- Isolation and sand-boxing: a container operates in a name-spaced
  environment that is relatively detached from the host. Besides
  living in its own file system namespace it usually also has its own
  user database, process tree and so on. Access from the container to
  the host is limited with various security technologies.
Of these two concepts the first one is also what traditional UNIX
chroot()
environments are about.
Both resource bundling and isolation/sand-boxing are concepts systemd
has implemented to varying degrees for a longer time. Specifically,
RootDirectory=
and
RootImage=
have been around for a long time, and so have been the various
sand-boxing
features
systemd provides. The Portable Services concept builds on that,
putting these features together in a new, integrated way to make them
more accessible and usable.
OK, so what precisely is a “Portable Service”?
Much like a container image, a portable service on disk can be just a
directory tree that contains service executables and all their
dependencies, in a hierarchy resembling the normal Linux directory
hierarchy. A portable service can also be a raw disk image, containing
a file system containing such a tree (which can be mounted via a
loop-back block device), or multiple file systems (in which case they
need to follow the Discoverable Partitions
Specification
and be located within a GPT partition table). Regardless whether the
portable service on disk is a simple directory tree or a raw disk
image, let’s call this concept the portable service image.
Such images can be generated with any tool typically used for the
purpose of installing OSes inside some directory, for example dnf
--installroot= or debootstrap. There are very few requirements made
on these trees, except the following two:
- The tree should carry systemd unit files for relevant services in them.
- The tree should carry /usr/lib/os-release (or /etc/os-release) OS
  release information.
Of course, as you might notice, OS trees generated from any of today’s
big distributions generally qualify for these two requirements without
any further modification, as pretty much all of them adopted
/usr/lib/os-release
and tend to ship their major services with
systemd unit files.
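In other words, a minimal portable service tree could be as small as this
(names here are purely illustrative):

/usr/bin/foobard
/usr/lib/systemd/system/foobar.service
/usr/lib/os-release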
A portable service image generated like this can be “attached” or
“detached” from a host:
- “Attaching” an image to a host is done through the new portablectl
  attach command. This command dissects the image, reading the
  os-release information and searching for unit files in it. It then
  copies relevant unit files out of the image and into
  /etc/systemd/system/. After that it augments any copied service
  unit files in two ways: a drop-in adding a RootDirectory= or
  RootImage= line is added in so that even though the unit files are
  now available on the host, when started they run the referenced
  binaries from the image. It also symlinks in a second drop-in which
  is called a “profile”, which is supposed to carry additional
  security settings to enforce on the attached services, to ensure
  the right amount of sand-boxing.
- “Detaching” an image from the host is done through portablectl
  detach. It reverses the steps above: the unit files copied out are
  removed again, and so are the two drop-in files generated for them.
While a portable service is attached its relevant unit files are made
available on the host like any others: they will appear in systemctl
list-unit-files, you can enable and disable them, you can start them
and stop them. You can extend them with systemctl edit. You can
introspect them. You can apply resource management to them like to any
other service, and you can process their logs like any other service
and so on. That’s because they really are native systemd services,
except that they have a ‘twist’ if you so will: they have tougher
security by default and store their resources in a root directory or
image.
And that’s already the essence of what Portable Services are.
A couple of interesting points:
- Even though the focus is on shipping service unit files in
  portable service images, you can actually ship timer units, socket
  units, target units and path units in portable services too. This
  means you can very naturally do time, socket and path based
  activation. It’s also entirely fine to ship multiple service units
  in the same image, in case you have more complex applications.
- This concept introduces zero new metadata. Unit files are an
  existing concept, as are os-release files, and — in case you opt
  for raw disk images — GPT partition tables are already established
  too. This also means existing tools to generate images can be
  reused for building portable service images to a large degree as no
  completely new artifact types need to be generated.
- Because the Portable Service concept introduces zero new metadata
  and just builds on existing security and resource bundling
  features of systemd it’s implemented in a set of distinct tools,
  relatively disconnected from the rest of systemd. Specifically, the
  main user-facing command is portablectl, and the actual operations
  are implemented in systemd-portabled.service. If you so will,
  portable services are a true add-on to systemd, just making a
  specific work-flow nicer to use than with the basic operations
  systemd otherwise provides. Also note that systemd-portabled
  provides bus APIs accessible to any program that wants to interface
  with it; portablectl is just one tool that happens to be shipped
  along with systemd.
- Since Portable Services are a feature we only added very recently
  we wanted to keep some freedom to make changes still. Due to that
  we decided to install the portablectl command into
  /usr/lib/systemd/ for now, so that it does not appear in $PATH by
  default. This means, for now, you have to invoke it with a full
  path: /usr/lib/systemd/portablectl. We expect to move it into
  /usr/bin/ very soon though, and make it a fully supported interface
  of systemd.
- You may wonder which unit files contained in a portable service
  image are the ones considered “relevant” and are actually copied
  out by the portablectl attach operation. Currently, this is derived
  from the image name. Let’s say you have an image stored in a
  directory /var/lib/portables/foobar_4711/ (or alternatively in a
  raw image /var/lib/portables/foobar_4711.raw). In that case the
  unit files copied out match the patterns foobar*.service,
  foobar*.socket, foobar*.target, foobar*.path, foobar*.timer.
- The Portable Services concept does not define any specific method
  how images get on the deployment machines, that’s entirely up to
  administrators. You can just scp them there, or wget them. You
  could even package them as RPMs and then deploy them with dnf if
  you feel adventurous.
- Portable service images can reside in any directory you
  like. However, if you place them in /var/lib/portables/ then
  portablectl will find them easily and can show you a list of
  images you can attach and suchlike.
- Attaching a portable service image can be done persistently, so
  that it remains attached on subsequent boots (which is the default),
  or it can be attached only until the next reboot, by passing
  --runtime to portablectl.
- Because portable service images are ultimately just regular OS
  images, it’s natural and easy to build a single image that can be
  used in three different ways:
  - It can be attached to any host as a portable service image.
  - It can be booted as an OS container, for example in a container
    manager like systemd-nspawn.
  - It can be booted as a host system, for example on bare metal or
    in a VM manager.
  Of course, to qualify for the latter two the image needs to
  contain more than just the service binaries, the os-release file
  and the unit files. To be bootable in an OS container manager such
  as systemd-nspawn the image needs to contain an init system of some
  form, for example systemd. To be bootable on bare metal or as a VM
  it also needs a boot loader of some form, for example systemd-boot.
Profiles
In the previous section the “profile” concept was briefly
mentioned. Since they are a major feature of the Portable Services
concept, they deserve some focus. A “profile” is ultimately just a
pre-defined drop-in file for unit files that are attached to a
host. They are supposed to mostly contain sand-boxing and security
settings, but may actually contain any other settings, too. When a
portable service is attached a suitable profile has to be selected. If
none is selected explicitly, the default profile called default
is
used. systemd ships with four different profiles out of the box:
- The default profile provides a medium level of security. It contains
  settings to drop capabilities, enforce system call filters, restrict
  many kernel interfaces and mount various file systems read-only.
- The strict profile is similar to the default profile, but generally
  uses the most restrictive sand-boxing settings. For example
  networking is turned off and access to AF_NETLINK sockets is
  prohibited.
- The trusted profile is the least strict of them all. In fact it makes
  almost no restrictions at all. A service run with this profile has
  basically full access to the host system.
- The nonetwork profile is mostly identical to default, but also turns
  off network access.
Note that the profile is selected at the time the portable service
image is attached, and it applies to all service files attached, in
case multiple are shipped in the same image. Thus, the sand-boxing
restrictions to enforce are selected by the administrator attaching the
image and not by the image vendor.
Additional profiles can be defined easily by the administrator, if
needed. We might also add additional profiles sooner or later to be
shipped with systemd out of the box.
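For example, a custom profile is just another drop-in. Under an assumed name
and location (the exact profile search path may differ between systemd
versions), a file such as /etc/systemd/portable/profile/mine/service.conf
could contain something like:

[Service]
# Tighter sand-boxing than the default profile, purely as an illustration.
DynamicUser=yes
ProtectSystem=strict
ProtectHome=yes
PrivateDevices=yes
PrivateTmp=yes

A profile can then be selected at attach time, for example with
portablectl attach --profile=mine ….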
What’s the use-case for this? If I have containers, why should I bother?
Portable Services are primarily intended to cover use-cases where code
should feel more like “extensions” to the host system rather than live
in disconnected, separate worlds. The profile concept is
supposed to be tunable to the exact right amount of integration or
isolation needed for an application.
In the container world the concept of “super-privileged containers”
has been touted a lot, i.e. containers that run with full
privileges. It’s precisely that use-case that portable services are
intended for: extensions to the host OS, that default to isolation,
but can optionally get as much access to the host as needed, and can
naturally benefit from the full functionality of the host. The
concept should hence be useful for all kinds of low-level system
software that isn’t shipped with the OS itself but needs varying
degrees of integration with it. Besides servers and appliances this
should be particularly interesting for IoT and embedded devices.
Because portable services are just a relatively small extension to the
way system services are otherwise managed, they can be treated like
regular services for almost all use-cases: they will appear alongside
regular services in all tools that can introspect systemd unit data,
and can be managed the same way when it comes to logging, resource
management, runtime life-cycles and so on.
Portable services are a very generic concept. While the original
use-case is OS extensions, it’s of course entirely up to you and other
users to use them in a suitable way of your choice.
Walkthrough
Let’s have a look at how this all can be used. We’ll start with building
a portable service image from scratch, before we attach, enable and
start it on a host.
Building a Portable Service image
As mentioned, you can use any tool you like that can create OS trees
or raw images for building Portable Service images, for example
debootstrap
or dnf --installroot=
. For this example walkthrough
run we’ll use mkosi
, which is
ultimately just a fancy wrapper around dnf
and debootstrap
but
makes a number of things particularly easy when repetitively building
images from source trees.
I have pushed everything necessary to reproduce this walkthrough
locally to a GitHub
repository. Let’s check it out:
$ git clone https://github.com/systemd/portable-walkthrough.git
Let’s have a look in the repository:
- First of all, walkthroughd.c is the main source file of our little
  service. To keep things simple it’s written in C, but it could be in
  any language of your choice. The daemon as implemented won’t do much:
  it just starts up and waits for SIGTERM, at which point it will shut
  down. It’s ultimately useless, but hopefully illustrates how this all
  fits together. The C code has no dependencies besides libc.
- walkthroughd.service is a systemd unit file that starts our little
  daemon. It’s a simple service, hence the unit file is trivial.
- Makefile is a short make build script to build the daemon binary.
  It’s pretty trivial, too: it just takes the C file and builds a
  binary from it. It can also install the daemon. It places the binary
  in /usr/local/lib/walkthroughd/walkthroughd (why not in
  /usr/local/bin? because it’s not a user-facing binary but a system
  service binary), and its unit file in
  /usr/local/lib/systemd/walkthroughd.service. If you want to test the
  daemon on the host we can now simply run make and then ./walkthroughd
  in order to check everything works.
- mkosi.default is a file that tells mkosi how to build the image. We
  opt for a Fedora-based image here (but we might as well have used
  Debian here, or any other supported distribution). We need no
  particular packages during runtime (after all we only depend on
  libc), but during the build phase we need gcc and make, hence these
  are the only packages we list in BuildPackages=.
- mkosi.build is a shell script that is invoked during mkosi’s build
  logic. All it does is invoke make and make install to build and
  install our little daemon, and afterwards it extends the
  distribution-supplied /etc/os-release file with an additional field
  that describes our portable service a bit.
Let’s now use this to build the portable service image. For that we
use the mkosi tool. It’s
sufficient to invoke it without parameters to build the first image: it
will automatically discover mkosi.default
and mkosi.build
which
tell it what to do. (Note that if you work on a project like this for
a longer time, mkosi -if is probably the better command to use, as
it speeds up building substantially by using an incremental build
mode.) mkosi
will download the necessary RPMs, and put them all
together. It will build our little daemon inside the image and after
all that’s done it will output the resulting image:
walkthroughd_1.raw
.
Because we opted to build a GPT raw disk image in mkosi.default
this
file is actually a raw disk image containing a GPT partition
table. You can use fdisk -l walkthroughd_1.raw
to enumerate the
partition table. You can also use systemd-nspawn -i walkthroughd_1.raw
to explore the image quickly if you need.
Using the Portable Service Image
Now that we have a portable service image, let’s see how we can
attach, enable and start the service included within it.
First, let’s attach the image:
# /usr/lib/systemd/portablectl attach ./walkthroughd_1.raw
(Matching unit files with prefix 'walkthroughd'.)
Created directory /etc/systemd/system/walkthroughd.service.d.
Written /etc/systemd/system/walkthroughd.service.d/20-portable.conf.
Created symlink /etc/systemd/system/walkthroughd.service.d/10-profile.conf → /usr/lib/systemd/portable/profile/default/service.conf.
Copied /etc/systemd/system/walkthroughd.service.
Created symlink /etc/portables/walkthroughd_1.raw → /home/lennart/projects/portable-walkthrough/walkthroughd_1.raw.
The command will show you exactly what it has been doing: it just
copied the main service file out, and added the two drop-ins, as
expected.
Let’s see if the unit is now available on the host, just like a regular unit, as promised:
# systemctl status walkthroughd.service
● walkthroughd.service - A simple example service
Loaded: loaded (/etc/systemd/system/walkthroughd.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/walkthroughd.service.d
└─10-profile.conf, 20-portable.conf
Active: inactive (dead)
Nice, it worked. We see that the unit file is available and that
systemd correctly discovered the two drop-ins. The unit is neither
enabled nor started however. Yes, attaching a portable service image
doesn’t imply enabling or starting it. It just means the unit files
contained in the image are made available to the host. It’s up to the
administrator to then enable them (so that they are automatically
started when needed, for example at boot), and/or start them (in case
they shall run right-away).
Let’s now enable and start the service in one step:
# systemctl enable --now walkthroughd.service
Created symlink /etc/systemd/system/multi-user.target.wants/walkthroughd.service → /etc/systemd/system/walkthroughd.service.
Let’s check if it’s running:
# systemctl status walkthroughd.service
● walkthroughd.service - A simple example service
Loaded: loaded (/etc/systemd/system/walkthroughd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/walkthroughd.service.d
└─10-profile.conf, 20-portable.conf
Active: active (running) since Wed 2018-06-27 17:55:30 CEST; 4s ago
Main PID: 45003 (walkthroughd)
Tasks: 1 (limit: 4915)
Memory: 4.3M
CGroup: /system.slice/walkthroughd.service
└─45003 /usr/local/lib/walkthroughd/walkthroughd
Jun 27 17:55:30 sigma walkthroughd[45003]: Initializing.
Perfect! We can see that the service is now enabled and running. The daemon is running as PID 45003.
Now that we verified that all is good, let’s stop, disable and detach the service again:
# systemctl disable --now walkthroughd.service
Removed /etc/systemd/system/multi-user.target.wants/walkthroughd.service.
# /usr/lib/systemd/portablectl detach ./walkthroughd_1.raw
Removed /etc/systemd/system/walkthroughd.service.
Removed /etc/systemd/system/walkthroughd.service.d/10-profile.conf.
Removed /etc/systemd/system/walkthroughd.service.d/20-portable.conf.
Removed /etc/systemd/system/walkthroughd.service.d.
Removed /etc/portables/walkthroughd_1.raw.
And finally, let’s see that it’s really gone:
# systemctl status walkthroughd
Unit walkthroughd.service could not be found.
Perfect! It worked!
I hope the above gets you started with Portable Services. If you have
further questions, please contact our mailing
list.
Further Reading
A more low-level document explaining details is shipped
along with systemd.
There are also relevant manual pages:
portablectl(1)
and
systemd-portabled(8)
.
For further information about mkosi
see its homepage.
All Systems Go! 2018 CfP Open
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/all-systems-go-2018-cfp-open.html
The Call for Participation (CFP) for All Systems Go!
2018 is now open. We’d like to invite you
to submit your proposals for consideration to the CFP submission
site.
The CFP will close on July 30th. Notification of acceptance and
non-acceptance will go out within 7 days of the closing of the CFP.
All topics relevant to foundational open-source Linux technologies are
welcome. In particular, however, we are looking for proposals
including, but not limited to, the following topics:
- Low-level container executors and infrastructure
- IoT and embedded OS infrastructure
- BPF and eBPF filtering
- OS, container, IoT image delivery and updating
- Building Linux devices and applications
- Low-level desktop technologies
- Networking
- System and service management
- Tracing and performance measuring
- IPC and RPC systems
- Security and Sandboxing
While our focus is definitely more on the user-space side of things,
talks about kernel projects are welcome, as long as they have a clear
and direct relevance for user-space.
For more information please visit our conference
website!
All Systems Go! 2017 Videos Online!
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/all-systems-go-2017-videos-online.html
For those living under a rock, the videos from everybody’s favourite
Userspace Linux Conference All Systems Go!
2017 are now available online.
The videos for my own two talks are available here:
- Synchronizing Images with casync (Slides)
- Containers without a Container Manager, with systemd (Slides)
Of course, this is the stellar work of the CCC
VOC folks, who are hard to beat when it comes to
videotaping of community conferences.
Attending and Speaking at GNOME.Asia 2017 Summit
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/attending-and-speaking-at-gnomeasia-2017-summit.html
The GNOME.Asia Summit 2017 organizers
invited me to speak at their conference in Chongqing/China, and it
was an excellent event! Here’s my brief report:
Because we arrived one day early in Chongqing, my GNOME friends Sri,
Matthias, Jonathan, David and I started our journey with an excursion
to the Dazu Rock
Carvings, a short
bus trip from Chongqing, and an excellent (and sometimes quite
surprising) sight. I mean, where else can you see a centuries-old
Buddha with 1000+ hands, holding a Nexus 5 cell phone? Here’s
proof:
The GNOME.Asia schedule was excellent, with various good talks,
including some about Flatpak, Endless OS, rpm-ostree, Blockchains and
more. My own talk was about The Path to a Fully Protected GNOME
Desktop OS Image (Slides available
here). In the
hallway track I did my best to advocate
casync to whoever was willing to
listen, and I think enough were ;-). As we all know attending
conferences is at least as much about the hallway track as about the
talks, and GNOME.Asia was a fantastic way to meet the Chinese GNOME
and Open Source communities.
The day after the conference the organizers of GNOME.Asia organized a
Chongqing day trip. A particular highlight was the ubiquitous hot pot,
sometimes with the local speciality: fresh pig brain.
Here are some random photos from the trip: sights, food, social event and
more.
I’d like to thank the GNOME Foundation for funding my trip to
GNOME.Asia. And that’s all for now. But let me close with an old
Chinese wisdom:
The Trials Of A Long Journey Always Feeling, Civilized Travel Pass Reputation.
IP Accounting and Access Lists with systemd
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/ip-accounting-and-access-lists-with-systemd.html
TL;DR: systemd now can do per-service IP traffic accounting, as well
as access control for IP address ranges.
Last Friday we released systemd
235. I
already blogged about its Dynamic User feature in
detail, but
there’s one more piece of new functionality that I think deserves special
attention: IP accounting and access control.
Before v235 systemd already provided per-unit resource management
hooks for a number of different kinds of resources: consumed CPU time,
disk I/O, memory usage and number of tasks. With v235 another kind of
resource can be controlled per-unit with systemd: network traffic
(specifically IP).
Three new unit file settings have been added in this context:
- IPAccounting= is a boolean setting. If enabled for a unit, all IP
  traffic sent and received by processes associated with it is counted
  both in terms of bytes and of packets.
- IPAddressDeny= takes an IP address prefix (that means: an IP address
  with a network mask). All traffic from and to this address will be
  prohibited for processes of the service.
- IPAddressAllow= is the matching positive counterpart to
  IPAddressDeny=. All traffic matching this IP address/network mask
  combination will be allowed, even if otherwise listed in
  IPAddressDeny=.
The three options are thin wrappers around kernel functionality
introduced with Linux 4.11: the control group eBPF hooks. The actual
work is done by the kernel, systemd just provides a number of new
settings to configure this facet of it. Note that cgroup/eBPF is
unrelated to classic Linux firewalling,
i.e. NetFilter/iptables
. It’s up to you whether you use one or the
other, or both in combination (or of course neither).
IP Accounting
Let’s have a closer look at the IP accounting logic mentioned
above. Let’s write a simple unit
/etc/systemd/system/ip-accounting-test.service
:
[Service]
ExecStart=/usr/bin/ping 8.8.8.8
IPAccounting=yes
This simple unit invokes the
ping(8) command to
send a series of ICMP/IP ping packets to the IP address 8.8.8.8 (which
is the Google DNS server IP; we use it for testing here, since it’s
easy to remember, reachable everywhere and known to react to ICMP
pings; any other IP address responding to pings would be fine to use,
too). The IPAccounting=
option is used to turn on IP accounting for
the unit.
Let’s start this service after writing the file. Let’s then have a
look at the status output of systemctl
:
# systemctl daemon-reload
# systemctl start ip-accounting-test
# systemctl status ip-accounting-test
● ip-accounting-test.service
Loaded: loaded (/etc/systemd/system/ip-accounting-test.service; static; vendor preset: disabled)
Active: active (running) since Mon 2017-10-09 18:05:47 CEST; 1s ago
Main PID: 32152 (ping)
IP: 168B in, 168B out
Tasks: 1 (limit: 4915)
CGroup: /system.slice/ip-accounting-test.service
└─32152 /usr/bin/ping 8.8.8.8
Okt 09 18:05:47 sigma systemd[1]: Started ip-accounting-test.service.
Okt 09 18:05:47 sigma ping[32152]: PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
Okt 09 18:05:47 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=29.2 ms
Okt 09 18:05:48 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=28.0 ms
This shows the ping
command running — it’s currently at its second
ping cycle as we can see in the logs at the end of the output. More
interesting however is the IP:
line further up showing the current
IP byte counters. It currently shows 168 bytes have been received, and
168 bytes have been sent. That the two counters are at the same value
is not surprising: ICMP ping requests and responses are supposed to
have the same size. Note that this line is shown only if
IPAccounting=
is turned on for the service, as only then this data
is collected.
Let’s wait a bit, and invoke systemctl status
again:
# systemctl status ip-accounting-test
● ip-accounting-test.service
Loaded: loaded (/etc/systemd/system/ip-accounting-test.service; static; vendor preset: disabled)
Active: active (running) since Mon 2017-10-09 18:05:47 CEST; 4min 28s ago
Main PID: 32152 (ping)
IP: 22.2K in, 22.2K out
Tasks: 1 (limit: 4915)
CGroup: /system.slice/ip-accounting-test.service
└─32152 /usr/bin/ping 8.8.8.8
Okt 09 18:10:07 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=260 ttl=59 time=27.7 ms
Okt 09 18:10:08 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=261 ttl=59 time=28.0 ms
Okt 09 18:10:09 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=262 ttl=59 time=33.8 ms
Okt 09 18:10:10 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=263 ttl=59 time=48.9 ms
Okt 09 18:10:11 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=264 ttl=59 time=27.2 ms
Okt 09 18:10:12 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=265 ttl=59 time=27.0 ms
Okt 09 18:10:13 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=266 ttl=59 time=26.8 ms
Okt 09 18:10:14 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=267 ttl=59 time=27.4 ms
Okt 09 18:10:15 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=268 ttl=59 time=29.7 ms
Okt 09 18:10:16 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=269 ttl=59 time=27.6 ms
As we can see, after 269 pings the counters are much higher: at 22K.
Note that while systemctl status
shows only the byte counters,
packet counters are kept as well. Use the low-level systemctl show
command to query the current raw values of the in and out packet and
byte counters:
# systemctl show ip-accounting-test -p IPIngressBytes -p IPIngressPackets -p IPEgressBytes -p IPEgressPackets
IPIngressBytes=37776
IPIngressPackets=449
IPEgressBytes=37776
IPEgressPackets=449
Of course, the same information is also available via the D-Bus
APIs. If you want to process this data further consider talking proper
D-Bus, rather than scraping the output of systemctl show
.
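For example, with busctl the same counters can be read as properties of the
unit’s Service interface. The first call resolves the unit name to its bus
object path; the escaped path shown in the second call is specific to this
example unit and will of course differ for other units:

# busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
      org.freedesktop.systemd1.Manager GetUnit s ip-accounting-test.service
# busctl get-property org.freedesktop.systemd1 \
      /org/freedesktop/systemd1/unit/ip_2daccounting_2dtest_2eservice \
      org.freedesktop.systemd1.Service IPIngressBytes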
Now, let’s stop the service again:
# systemctl stop ip-accounting-test
When a service with such accounting turned on terminates, a log line
about all its consumed resources is written to the logs. Let’s check
with journalctl
:
# journalctl -u ip-accounting-test -n 5
-- Logs begin at Thu 2016-08-18 23:09:37 CEST, end at Mon 2017-10-09 18:17:02 CEST. --
Okt 09 18:15:50 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=603 ttl=59 time=26.9 ms
Okt 09 18:15:51 sigma ping[32152]: 64 bytes from 8.8.8.8: icmp_seq=604 ttl=59 time=27.2 ms
Okt 09 18:15:52 sigma systemd[1]: Stopping ip-accounting-test.service...
Okt 09 18:15:52 sigma systemd[1]: Stopped ip-accounting-test.service.
Okt 09 18:15:52 sigma systemd[1]: ip-accounting-test.service: Received 49.5K IP traffic, sent 49.5K IP traffic
The last line shown is the interesting one, that shows the accounting
data. It’s actually a structured log message, and among its metadata
fields it contains the more comprehensive raw data:
# journalctl -u ip-accounting-test -n 1 -o verbose
-- Logs begin at Thu 2016-08-18 23:09:37 CEST, end at Mon 2017-10-09 18:18:50 CEST. --
Mon 2017-10-09 18:15:52.649028 CEST [s=89a2cc877fdf4dafb2269a7631afedad;i=14d7;b=4c7e7adcba0c45b69d612857270716d3;m=137592e75e;t=55b1f81298605;x=c3c9b57b28c9490e]
PRIORITY=6
_BOOT_ID=4c7e7adcba0c45b69d612857270716d3
_MACHINE_ID=e87bfd866aea4ae4b761aff06c9c3cb3
_HOSTNAME=sigma
SYSLOG_FACILITY=3
SYSLOG_IDENTIFIER=systemd
_UID=0
_GID=0
_TRANSPORT=journal
_PID=1
_COMM=systemd
_EXE=/usr/lib/systemd/systemd
_CAP_EFFECTIVE=3fffffffff
_SYSTEMD_CGROUP=/init.scope
_SYSTEMD_UNIT=init.scope
_SYSTEMD_SLICE=-.slice
CODE_FILE=../src/core/unit.c
_CMDLINE=/usr/lib/systemd/systemd --switched-root --system --deserialize 25
_SELINUX_CONTEXT=system_u:system_r:init_t:s0
UNIT=ip-accounting-test.service
CODE_LINE=2115
CODE_FUNC=unit_log_resources
MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0
INVOCATION_ID=98a6e756fa9d421d8dfc82b6df06a9c3
IP_METRIC_INGRESS_BYTES=50880
IP_METRIC_INGRESS_PACKETS=605
IP_METRIC_EGRESS_BYTES=50880
IP_METRIC_EGRESS_PACKETS=605
MESSAGE=ip-accounting-test.service: Received 49.6K IP traffic, sent 49.6K IP traffic
_SOURCE_REALTIME_TIMESTAMP=1507565752649028
The interesting fields of this log message are of course
IP_METRIC_INGRESS_BYTES=
, IP_METRIC_INGRESS_PACKETS=
,
IP_METRIC_EGRESS_BYTES=
, IP_METRIC_EGRESS_PACKETS=
that show the
consumed data.
The log message carries a message
ID
that may be used to quickly search for all such resource log messages
(ae8f7b866b0347b9af31fe1c80b127c0
). We can combine a search term for
messages of this ID with journalctl
‘s -u
switch to quickly find
out about the resource usage of any invocation of a specific
service. Let’s try:
# journalctl -u ip-accounting-test MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0
-- Logs begin at Thu 2016-08-18 23:09:37 CEST, end at Mon 2017-10-09 18:25:27 CEST. --
Okt 09 18:15:52 sigma systemd[1]: ip-accounting-test.service: Received 49.6K IP traffic, sent 49.6K IP traffic
Of course, the output above shows only one message at the moment,
since we started the service only once, but a new one will appear
every time you start and stop it again.
The IP accounting logic is also hooked up with
systemd-run
,
which is useful for transiently running a command as a systemd service
with IP accounting turned on. Let’s try it:
# systemd-run -p IPAccounting=yes --wait wget https://cfp.all-systems-go.io/en/ASG2017/public/schedule/2.pdf
Running as unit: run-u2761.service
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 878ms
IP traffic received: 231.0K
IP traffic sent: 3.7K
This uses wget
to download the
PDF version of the schedule of everybody’s favorite Linux user-space
conference All Systems Go! 2017 (BTW,
have you already booked your ticket? We are very close to selling out,
be quick!). The IP traffic this command generated was 231K ingress and
4K egress. In the systemd-run
command line two parameters are
important. First of all, we use -p IPAccounting=yes
to turn on IP
accounting for the transient service (as above). And secondly we use
--wait
to tell systemd-run
to wait for the service to exit. If
--wait
is used, systemd-run
will also show you various statistics
about the service that just ran and terminated, including the IP
statistics you are seeing if IP accounting has been turned on.
It’s fun to combine this sort of IP accounting with interactive
transient units. Let’s try that:
# systemd-run -p IPAccounting=1 -t /bin/sh
Running as unit: run-u2779.service
Press ^] three times within 1s to disconnect TTY.
sh-4.4# dnf update
…
sh-4.4# dnf install firefox
…
sh-4.4# exit
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 5.297s
IP traffic received: …B
IP traffic sent: …B
This uses systemd-run
‘s --pty
switch (or short: -t
), which opens
an interactive pseudo-TTY connection to the invoked service process,
which is a bourne shell in this case. Doing this means we have a full,
comprehensive shell with job control and everything. Since the shell
is running as part of a service with IP accounting turned on, all IP
traffic we generate or receive will be accounted for. And as soon as
we exit the shell, we’ll see what it consumed. (For the sake of
brevity I actually didn’t paste the whole output above, but truncated
core parts. Try it out for yourself, if you want to see the output in
full.)
Sometimes it might make sense to turn on IP accounting for a unit that
is already running. For that, use systemctl set-property
foobar.service IPAccounting=yes, which will instantly turn on
accounting for it. Note that it won’t count retroactively though: only
the traffic sent/received after the point in time you turned it on
will be collected. You may turn off accounting for the unit with the
same command.
Of course, sometimes it’s interesting to collect IP accounting data
for all services, and turning on IPAccounting=yes
in every single
unit is cumbersome. To deal with that there’s a global option
DefaultIPAccounting=
available which can be set in /etc/systemd/system.conf
.
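That is, something along these lines in /etc/systemd/system.conf (or in a
drop-in file under /etc/systemd/system.conf.d/):

[Manager]
DefaultIPAccounting=yes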
IP Access Lists
So much about IP accounting. Let’s now have a look at IP access
control with systemd 235. As mentioned above, the two new unit file
settings, IPAddressAllow=
and IPAddressDeny=
may be used for
that. They operate in the following way:
- If the source address of an incoming packet or the destination
  address of an outgoing packet matches one of the IP
  addresses/network masks in the relevant unit’s IPAddressAllow=
  setting then it will be allowed to go through.
- Otherwise, if a packet matches an IPAddressDeny= entry configured
  for the service it is dropped.
- If the packet matches neither of the above it is allowed to go
  through.
Or in other words, IPAddressDeny=
implements a blacklist, but
IPAddressAllow=
takes precedence.
Let’s try that out. Let’s modify our last example above in order to
get a transient service running an interactive shell which has such an
access list set:
# systemd-run -p IPAddressDeny=any -p IPAddressAllow=8.8.8.8 -p IPAddressAllow=127.0.0.0/8 -t /bin/sh
Running as unit: run-u2850.service
Press ^] three times within 1s to disconnect TTY.
sh-4.4# ping 8.8.8.8 -c1
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=27.9 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 27.957/27.957/27.957/0.000 ms
sh-4.4# ping 8.8.4.4 -c1
PING 8.8.4.4 (8.8.4.4) 56(84) bytes of data.
ping: sendmsg: Operation not permitted
^C
--- 8.8.4.4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
sh-4.4# ping 127.0.0.2 -c1
PING 127.0.0.1 (127.0.0.2) 56(84) bytes of data.
64 bytes from 127.0.0.2: icmp_seq=1 ttl=64 time=0.116 ms
--- 127.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.116/0.116/0.116/0.000 ms
sh-4.4# exit
The access list we set up uses IPAddressDeny=any
in order to define
an IP white-list: all traffic will be prohibited for the session,
except for what is explicitly white-listed. In this command line, we
white-listed two address prefixes: 8.8.8.8 (with no explicit network
mask, which means the mask with all bits turned on is implied,
i.e. /32
), and 127.0.0.0/8. Thus, the service can communicate with
Google’s DNS server and everything on the local loop-back, but nothing
else. The commands run in this interactive shell show this: First we
try pinging 8.8.8.8 which happily responds. Then, we try to ping
8.8.4.4 (that’s Google’s other DNS server, but excluded from this
white-list), and as we see it is immediately refused with an Operation
not permitted error. As a last step we ping 127.0.0.2 (which is on the
local loop-back), and we see it works fine again, as expected.
In the example above we used IPAddressDeny=any
. The any
identifier is a shortcut for writing 0.0.0.0/0 ::/0, i.e. it’s a
shortcut for everything, on both IPv4 and IPv6. A number of other
such shortcuts exist. For example, instead of spelling out
127.0.0.0/8
we could also have used the more descriptive shortcut
localhost
which is expanded to 127.0.0.0/8 ::1/128, i.e. everything
on the local loopback device, on both IPv4 and IPv6.
Being able to configure IP access lists individually for each unit is
pretty nice already. However, typically one wants to configure this
comprehensively, not just for individual units, but for a set of units
in one go or even the system as a whole. In systemd, that’s possible
by making use of
.slice
units (for those who don’t know systemd that well, slice units are a
concept for organizing services in a hierarchical tree for the purpose of
resource management): the IP access list in effect for a unit is the
combination of the individual IP access lists configured for the unit
itself and those of all slice units it is contained in.
By default, system services are assigned to
system.slice
,
which in turn is a child of the root slice
-.slice
. Either
of these two slice units is hence suitable for locking down all
system services at once. If an access list is configured on
system.slice
it will only apply to system services, however, if
configured on -.slice
it will apply to all user processes of the
system, including all user session processes (i.e. which are by
default assigned to user.slice
which is a child of -.slice
) in
addition to the system services.
Let’s make use of this:
# systemctl set-property system.slice IPAddressDeny=any IPAddressAllow=localhost
# systemctl set-property apache.service IPAddressAllow=10.0.0.0/8
The two commands above are a very powerful way to first turn off all
IP communication for all system services (with the exception of
loop-back traffic), followed by an explicit white-listing of
10.0.0.0/8 (which could refer to the local company network, you get
the idea) but only for the Apache service.
Use-cases
After playing around a bit with this, let’s talk about use-cases. Here
are a few ideas:
- The IP access list logic can in many ways provide a more modern
  replacement for the venerable TCP Wrapper, but unlike it it applies
  to all IP sockets of a service unconditionally, and requires no
  explicit support in any way in the service’s code: no patching
  required. On the other hand, TCP wrappers have a number of features
  this scheme cannot cover, most importantly systemd’s IP access lists
  operate solely on the level of IP addresses and network masks, there
  is no way to configure access by DNS name (though quite frankly, that
  is a very dubious feature anyway, as doing networking — unsecured
  networking even — in order to restrict networking sounds quite
  questionable, at least to me).
- It can also replace (or augment) some facets of IP firewalling,
  i.e. Linux NetFilter/iptables. Right now, systemd’s access lists are
  of course a lot more minimal than NetFilter, but they have one major
  benefit: they understand the service concept, and thus are a lot more
  context-aware than NetFilter. Classic firewalls, such as NetFilter,
  derive most service context from the IP port number alone, but we
  live in a world where IP port numbers are a lot more dynamic than
  they used to be. As one example, a BitTorrent client or server may
  use any IP port it likes for its file transfer, and writing IP
  firewalling rules matching that precisely is hence hard. With the
  systemd IP access list implementing this is easy: just set the list
  for your BitTorrent service unit, and all is good.
- Let me stress though that you should be careful when comparing
  NetFilter with systemd’s IP address list logic, it’s really like
  comparing apples and oranges: to start with, the IP address list
  logic has a clearly local focus, it only knows what a local service
  is and manages access of it. NetFilter on the other hand may run on
  border gateways, at a point where the traffic flowing through is pure
  IP, carrying no information about a systemd unit concept or anything
  like that.
- It’s a simple way to lock down distribution/vendor supplied system
  services by default. For example, if you ship a service that you know
  never needs to access the network, then simply set IPAddressDeny=any
  (possibly combined with IPAddressAllow=localhost) for it, and it will
  live in a very tight networking sand-box it cannot escape
  from. systemd itself makes use of this for a number of its services
  by default now. For example, the logging service
  systemd-journald.service, the login manager systemd-logind or the
  core-dump processing unit systemd-coredump@.service all have such a
  rule set out-of-the-box, because we know that neither of these
  services should be able to access the network, under any
  circumstances.
- Because the IP access list logic can be combined with transient
  units, it can be used to quickly and effectively sandbox arbitrary
  commands, and even include them in shell pipelines and such. For
  example, let’s say we don’t trust our curl implementation (maybe it
  got modified locally by a hacker, and phones home?), but want to use
  it anyway to download the slides of my most recent casync talk in
  order to print them, but want to make sure it doesn’t connect
  anywhere except where we tell it to:

  # systemd-resolve 0pointer.de
  0pointer.de: 85.214.157.71
               2a01:238:43ed:c300:10c3:bcf3:3266:da74
  -- Information acquired via protocol DNS in 2.8ms.
  -- Data is authenticated: no
  # systemd-run --pipe -p IPAddressDeny=any \
        -p IPAddressAllow=85.214.157.71 \
        -p IPAddressAllow=2a01:238:43ed:c300:10c3:bcf3:3266:da74 \
        curl http://0pointer.de/public/casync-kinvolk2017.pdf | lp
So much about use-cases. This is by no means a comprehensive list of
what you can do with it, after all both IP accounting and IP access
lists are very generic concepts. But I do hope the above inspires your
imagination.
What does that mean for packagers?
IP accounting and IP access control are primarily concepts for the
local administrator. However, as suggested above, it’s a very good
idea to ship services that by design have no network-facing
functionality with an access list of IPAddressDeny=any
(and possibly
IPAddressAllow=localhost
), in order to improve the out-of-the-box
security of our systems.
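Concretely, that could be a vendor drop-in shipped with the package, for
example as /usr/lib/systemd/system/foobar.service.d/50-lockdown.conf (the
unit and file names here are of course just placeholders):

[Service]
IPAddressDeny=any
IPAddressAllow=localhost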
An option for security-minded distributions might be a more radical
approach: ship the system with -.slice
or system.slice
configured
to IPAddressDeny=any
by default, and ask the administrator to punch
holes into that for each network facing service with systemctl
set-property … IPAddressAllow=…. But of course, that’s only an
option for distributions willing to break compatibility with what was
before.
Notes
A couple of additional notes:
IP accounting and access lists may be mixed with socket
activation. In this case, it’s a good idea to configure access lists
and accounting for both the socket unit that activates and the
service unit that is activated, as both units maintain fully separate
settings. Note that IP accounting and access lists configured on the
socket unit apply to all sockets created on behalf of that unit, and
even if these sockets are passed on to the activated services, they
will still remain in effect and belong to the socket unit. This also
means that IP traffic done on such sockets will be accounted to the
socket unit, not the service unit. The fact that IP access lists are
maintained separately for the kernel sockets created on behalf of the
socket unit and for the kernel sockets created by the service code
itself enables some interesting uses. For example, it’s possible to
set a relatively open access list on the socket unit, but a very
restrictive access list on the service unit, thus making the sockets
configured through the socket unit the only way in and out of the
service.
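A hypothetical socket/service pair illustrating that pattern (unit
names, port and address range are invented for this sketch):
# foo.socket: the listening socket gets the relatively open list
[Socket]
ListenStream=8080
IPAddressDeny=any
IPAddressAllow=192.168.0.0/16
# foo.service: sockets the service creates on its own stay locked down
[Service]
ExecStart=/usr/bin/foo-daemon
IPAddressDeny=any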
systemd’s IP accounting and access lists apply to IP sockets only,
not to sockets of any other address families. That also means that
AF_PACKET (i.e. raw) sockets are not covered. Hence, it’s a good idea
to combine IP access lists with
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
in order to lock this down.
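For instance, a unit might combine the two mechanisms like this (a
sketch, not taken from any real unit file):
[Service]
IPAddressDeny=any
IPAddressAllow=localhost
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6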
You may wonder if the per-unit resource log message and
systemd-run --wait may also show you details about other types of
resources consumed by a service. The answer is yes: if you turn on
CPUAccounting= for a service, you’ll also see a summary of consumed
CPU time in the log message and the command output. And we are
planning to hook up IOAccounting= the same way too, soon.
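For example, to get such a summary for an ad-hoc command (the command
itself is arbitrary here):
# systemd-run --wait -p CPUAccounting=yes -p IPAccounting=yes du -s /usr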
Note that IP accounting and access lists aren’t entirely
free. systemd inserts an eBPF program into the IP pipeline to make
this functionality work. However, eBPF execution has been optimized
for speed in recent kernel versions already, and given that it
currently is in the focus of interest to many I’d expect it to be
optimized even further, so that the cost for enabling these features
will be negligible, if it isn’t already.
IP accounting is currently not recursive. That means you cannot use
a slice unit to join the accounting of multiple units into one. This
is something we definitely want to add, but it requires some more
kernel work first.
You might wonder how the PrivateNetwork= setting relates to
IPAddressDeny=any. Superficially they have similar effects: they make
the network unavailable to services. However, looking more closely
there are a number of differences.
PrivateNetwork= is implemented using Linux network name-spaces. As
such it entirely detaches all networking of a service from the host,
including non-IP networking. It does so by creating a private little
environment the service lives in where communication with itself is
still allowed though. In addition, using the JoinsNamespaceOf=
dependency additional services may be added to the same environment,
thus permitting communication with each other but not with anything
outside of this group.
IPAddressAllow= and IPAddressDeny= are much less invasive. First of
all they apply to IP networking only, and can match against specific
IP addresses. A service running with PrivateNetwork= turned off but
IPAddressDeny=any turned on may enumerate the network interfaces and
their IP configuration even though it cannot actually do any IP
communication. On the other hand if you turn on PrivateNetwork= all
network interfaces besides lo disappear. Long story short: depending
on your use-case one, the other, both or neither might be suitable
for sand-boxing of your service. If possible I’d always turn on both,
for best security, and that’s what we do for all of systemd’s own
long-running services.
And that’s all for now. Have fun with per-unit IP accounting and
access lists!
All Systems Go! 2017 Schedule Published
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/all-systems-go-2017-schedule-published.html
I am happy to announce that we have published the All Systems Go! 2017 schedule!
We are very happy with the large number and the quality of the
submissions we got, and the resulting schedule is exceptionally
strong.
Without further ado:
Here’s the schedule for the first day (Saturday, 21st of October).
And here’s the schedule for the second day (Sunday, 22nd of October).
Here are a couple of keywords from the topics of the talks:
1password, azure, bluetooth, build systems,
casync, cgroups, cilium, cockpit, containers,
ebpf, flatpak, habitat, IoT, kubernetes,
landlock, meson, OCI, rkt, rust, secureboot,
skydive, systemd, testing, tor, varlink,
virtualization, wifi, and more.
Our speakers are from all across the industry: Chef, CoreOS, Covalent,
Facebook, Google, Intel, Kinvolk, Microsoft, Mozilla, Pantheon,
Pengutronix, Red Hat, SUSE and more.
For further information about All Systems Go! visit our conference web site.
Make sure to buy your ticket for All Systems Go! 2017 now! A limited
number of tickets are left at this point, so make sure you get yours
before we are all sold out! Find all details here.
See you in Berlin!
All Systems Go! 2017 CfP Closes Soon!
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/all-systems-go-2017-cfp-closes-soon.html
Please make sure to get your presentation proposals for All Systems Go! 2017 in now! The CfP closes on Sunday!
In case you haven’t heard about All Systems Go! yet, here’s a quick reminder what kind of conference it is, and why you should attend and speak there:
All Systems Go! is an Open Source community conference focused
on the projects and technologies at the foundation of modern Linux
systems — specifically low-level user-space technologies. Its goal is
to provide a friendly and collaborative gathering place for
individuals and communities working to push these technologies
forward. All Systems Go! 2017 takes place in Berlin,
Germany on October 21st+22nd. All Systems Go! is a
2-day event with 2-3 talks happening in parallel. Full presentation
slots are 30-45 minutes in length and lightning talk slots are 5-10
minutes.
In particular, we are looking for sessions including, but not limited to, the following topics:
- Low-level container executors and infrastructure
- IoT and embedded OS infrastructure
- OS, container, IoT image delivery and updating
- Building Linux devices and applications
- Low-level desktop technologies
- Networking
- System and service management
- Tracing and performance measuring
- IPC and RPC systems
- Security and Sandboxing
While our focus is definitely more on the user-space side of things,
talks about kernel projects are welcome too, as long as they have a
clear and direct relevance for user-space.
To submit your proposal now please visit our CFP submission web site.
For further information about All Systems Go! visit our conference web site.
systemd.conf will not take place this year in lieu of All
Systems Go!. All Systems Go! welcomes all projects that
contribute to Linux user space, which, of course, includes
systemd. Thus, anything you think was appropriate for submission to
systemd.conf is also fitting for All Systems Go!
All Systems Go! 2017 Speakers
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/all-systems-go-2017-speakers.html
Don’t forget to send in your submissions to the All Systems Go! 2017 CfP! Proposals are accepted until September 3rd!
A couple of headline speakers have been announced now:
- Alban Crequy (Kinvolk)
- Brian “Redbeard” Harrington (CoreOS)
- Gianluca Borello (Sysdig)
- Jon Boulle (NStack/CoreOS)
- Martin Pitt (Debian)
- Thomas Graf (covalent.io/Cilium)
- Vincent Batts (Red Hat/OCI)
- (and yours truly)
These folks will also review your submissions as part of the papers committee!
All Systems Go! is an Open Source community conference focused on the projects and technologies at the foundation of modern Linux systems — specifically low-level user-space technologies. Its goal is to provide a friendly and collaborative gathering place for individuals and communities working to push these technologies forward.
All Systems Go! 2017 takes place in Berlin, Germany on October 21st+22nd.
To submit your proposal now please visit our CFP submission web site.
For further information about All Systems Go! visit our conference web site.
casync Video
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/casync-video.html
Video of my casync Presentation @ kinvolk
The great folks at kinvolk have uploaded a
video of my casync presentation at their offices last
week.
The slides are
available as well.
Enjoy!
mkosi — A Tool for Generating OS Images
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/mkosi-a-tool-for-generating-os-images.html
Introducing mkosi
After blogging about
casync
I realized I never blogged about the
mkosi
tool that combines nicely
with it. mkosi
has been around for a while already, and it’s time to
make it a bit better known. mkosi
stands for Make Operating System
Image, and is a tool for precisely that: generating an OS tree or
image that can be booted.
Yes, there are many tools like mkosi
, and a number of them are quite
well known and popular. But mkosi
has a number of features that I
think make it interesting for a variety of use-cases that other tools
don’t cover that well.
What is mkosi?
What are those use-cases, and what precisely sets mkosi
apart?
mkosi
is definitely a tool with a focus on developers’ needs for
building OS images, for testing and debugging, but also for generating
production images with cryptographic protection. A typical use-case
would be to add a mkosi.default
file to an existing project (for
example, one written in C or Python), and thus making it easy to
generate an OS image for it. mkosi
will put together the image with
development headers and tools, compile your code in it, run your test
suite, then throw away the image again, and build a new one, this time
without development headers and tools, and install your build
artifacts in it. This final image is then “production-ready”, and only
contains your built program and the minimal set of packages you
configured otherwise. Such an image could then be deployed with
casync
(or any other tool of course) to be delivered to your set of
servers, or IoT devices or whatever you are building.
mkosi
is supposed to be legacy-free: the focus is clearly on
today’s technology, not yesteryear’s. Specifically this means that
we’ll generate GPT partition tables, not MBR/DOS ones. When you tell
mkosi
to generate a bootable image for you, it will make it bootable
on EFI, not on legacy BIOS. The GPT images generated follow
specifications such as the Discoverable Partitions
Specification,
so that /etc/fstab
can remain unpopulated and tools such as
systemd-nspawn
can automatically dissect the image and boot from
them.
So, let’s have a look at the specific images it can generate:
- Raw GPT disk image, with ext4 as root
- Raw GPT disk image, with btrfs as root
- Raw GPT disk image, with a read-only squashfs as root
- A plain directory on disk containing the OS tree directly (this is useful for creating generic container images)
- A btrfs subvolume on disk, similar to the plain directory
- A tarball of a plain directory
When any of the GPT choices above are selected, a couple of additional
options are available:
- A swap partition may be added in
- The system may be made bootable on EFI systems
- Separate partitions for /home and /srv may be added in
- The root, /home and /srv partitions may be optionally encrypted with LUKS
- The root partition may be protected using dm-verity, thus making offline attacks on the generated system hard
- If the image is made bootable, the dm-verity root hash is automatically added to the kernel command line, and the kernel together with its initial RAM disk and the kernel command line is optionally cryptographically signed for UEFI SecureBoot
Note that mkosi
is distribution-agnostic. It currently can build
images based on the following Linux distributions:
- Fedora
- Debian
- Ubuntu
- ArchLinux
- openSUSE
Note though that not all distributions are supported at the same
feature level currently. Also, as mkosi
is based on dnf --installroot, debootstrap, pacstrap and zypper, and those
packages are not packaged universally on all distributions, you might
not be able to build images for all those distributions on arbitrary
host distributions. For example, Fedora doesn’t package zypper
,
hence you cannot build an openSUSE image easily on Fedora, but you can
still build Fedora (obviously…), Debian, Ubuntu and ArchLinux images
on it just fine.
The GPT images are put together in a way that they aren’t just
compatible with UEFI systems, but also with VM and container managers
(that is, at least the smart ones, i.e. VM managers that know UEFI,
and container managers that grok GPT disk images) to a large
degree. In fact, the idea is that you can use mkosi
to build a
single GPT image that may be used to:
- Boot on bare-metal boxes
- Boot in a VM
- Boot in a systemd-nspawn container
- Directly run a systemd service off of it, using systemd’s RootImage= unit file setting
Note that in all four cases the dm-verity
data is automatically used
if available to ensure the image is not tampered with (yes, you read
that right, systemd-nspawn
and systemd’s RootImage=
setting
automatically do dm-verity
these days if the image has it.)
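As an illustration of the last case, a unit along these lines would
run its payload directly off such an image (the image path and binary
name are made up for this sketch):
[Unit]
Description=Example service running directly off a GPT disk image
[Service]
RootImage=/var/lib/machines/foobar.raw
ExecStart=/usr/bin/foo-daemon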
Mode of Operation
The simplest usage of mkosi
is by simply invoking it without
parameters (as root):
# mkosi
Without any configuration this will create a GPT disk image for you,
will call it image.raw
and drop it in the current directory. The
distribution used will be the same one as your host runs.
Of course in most cases you want more control about how the image is
put together, i.e. select package sets, select the distribution, size
partitions and so on. Most of that you can actually specify on the
command line, but it is recommended to instead create a couple of
mkosi.$SOMETHING
files and directories in some directory. Then,
simply change to that directory and run mkosi
without any further
arguments. The tool will then look in the current working directory
for these files and directories and make use of them (similar to how
make
looks for a Makefile
…). Every single file/directory is
optional, but if they exist they are honored. Here’s a list of the
files/directories mkosi
currently looks for:
mkosi.default — This is the main configuration file, here you can
configure what kind of image you want, which distribution, which
packages and so on.
mkosi.extra/ — If this directory exists, then mkosi will copy
everything inside it into the images built. You can place arbitrary
directory hierarchies in here, and they’ll be copied over whatever is
already in the image, after it was put together by the distribution’s
package manager. This is the best way to drop additional static files
into the image, or override distribution-supplied ones.
mkosi.build — This executable file is supposed to be a build
script. When it exists, mkosi will build two images, one after the
other in the mode already mentioned above: the first version is the
build image, and may include various build-time dependencies such as
a compiler or development headers. The build script is also copied
into it, and then run inside it. The script should then build
whatever shall be built and place the result in $DESTDIR (don’t
worry, popular build tools such as Automake or Meson all honor
$DESTDIR anyway, so there’s not much to do here explicitly). It may
also run a test suite, or anything else you like. After the script
finished, the build image is removed again, and a second image (the
final image) is built. This time, no development packages are
included, and the build script is not copied into the image again —
however, the build artifacts from the first run (i.e. those placed in
$DESTDIR) are copied into the image.
mkosi.postinst — If this executable script exists, it is invoked
inside the image (inside a systemd-nspawn invocation) and can adjust
the image as it likes at a very late point in the image
preparation. If mkosi.build exists, i.e. the dual-phased development
build process is used, then this script will be invoked twice: once
inside the build image and once inside the final image. The first
parameter passed to the script clarifies which phase it is run in (a
short sketch follows below).
mkosi.nspawn — If this file exists, it should contain a container
configuration file for systemd-nspawn (see systemd.nspawn(5) for
details), which shall be shipped along with the final image and shall
be included in the check-sum calculations (see below).
mkosi.cache/ — If this directory exists, it is used as package cache
directory for the builds. This directory is effectively bind mounted
into the image at build time, in order to speed up building
images. The package installers of the various distributions will
place their package files here, so that subsequent runs can reuse
them.
mkosi.passphrase — If this file exists, it should contain a
pass-phrase to use for the LUKS encryption (if that’s enabled for the
image built). This file should not be readable to other users.
mkosi.secure-boot.crt and mkosi.secure-boot.key should be an
X.509 key pair to use for signing the kernel and initrd for UEFI
SecureBoot, if that’s enabled.
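Here’s a sketch of what a phase-aware mkosi.postinst could look like;
note that the exact phase strings passed in "$1" are an assumption
here, check the mkosi documentation for the authoritative values:
#!/bin/sh
case "$1" in
    build) echo "adjusting the build image" ;;
    final) echo "adjusting the final image" ;;
esac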
How to use it
So, let’s come back to our most trivial example, without any of the
mkosi.$SOMETHING
files around:
# mkosi
As mentioned, this will create an image file image.raw
in the current
directory. How do we use it? Of course, we could dd
it onto some USB
stick and boot it on a bare-metal device. However, it’s much simpler
to first run it in a container for testing:
# systemd-nspawn -bi image.raw
And there you go: the image should boot up, and just work for you.
Now, let’s make things more interesting. Let’s still not use any of
the mkosi.$SOMETHING
files around:
# mkosi -t raw_btrfs --bootable -o foobar.raw
# systemd-nspawn -bi foobar.raw
This is similar to the above, but we made three changes: it’s no
longer GPT + ext4
, but GPT + btrfs
. Moreover, the system is made
bootable on UEFI systems, and finally, the output is now called
foobar.raw
.
Because this system is bootable on UEFI systems, we can run it in KVM:
qemu-kvm -m 512 -smp 2 -bios /usr/share/edk2/ovmf/OVMF_CODE.fd -drive format=raw,file=foobar.raw
This will look very similar to the systemd-nspawn
invocation, except
that this uses full VM virtualization rather than container
virtualization. (Note that the way to run a UEFI qemu/kvm instance
appears to change all the time and is different on the various
distributions. It’s quite annoying, and I can’t really tell you what
the right qemu command line is to make this work on your system.)
Of course, it’s not all raw GPT disk images with mkosi
. Let’s try
a plain directory image:
# mkosi -d fedora -t directory -o quux
# systemd-nspawn -bD quux
Of course, if you generate the image as plain directory you can’t boot
it on bare-metal just like that, nor run it in a VM.
A more complex command line is the following:
# mkosi -d fedora -t raw_squashfs --checksum --xz --package=openssh-clients --package=emacs
In this mode we explicitly pick Fedora as the distribution to use, ask
mkosi
to generate a compressed GPT image with a root squashfs,
compress the result with xz
, and generate a SHA256SUMS
file with
the hashes of the generated artifacts. The package will contain the
SSH client as well as everybody’s favorite editor.
Now, let’s make use of the various mkosi.$SOMETHING
files. Let’s
say we are working on some Automake-based project and want to make it
easy to generate a disk image off the development tree with the
version you are hacking on. Create a configuration file:
# cat > mkosi.default <<EOF
[Distribution]
Distribution=fedora
Release=24
[Output]
Format=raw_btrfs
Bootable=yes
[Packages]
# The packages to appear in both the build and the final image
Packages=openssh-clients httpd
# The packages to appear in the build image, but absent from the final image
BuildPackages=make gcc libcurl-devel
EOF
And let’s add a build script:
# cat > mkosi.build <<EOF
#!/bin/sh
cd $SRCDIR
./autogen.sh
./configure --prefix=/usr
make -j `nproc`
make install
EOF
# chmod +x mkosi.build
And with all that in place we can now build our project into a disk image, simply by typing:
# mkosi
Let’s try it out:
# systemd-nspawn -bi image.raw
Of course, if you do this you’ll notice that building an image like
this can be quite slow. And slow build times are actively hurtful to
your productivity as a developer. Hence let’s make things a bit
faster. First, let’s make use of a package cache shared between runs:
# mkdir mkosi.cache
Building images now should already be substantially faster (and
generate less network traffic) as the packages will now be downloaded
only once and reused. However, you’ll notice that unpacking all those
packages and the rest of the work is still quite slow. But mkosi
can
help you with that. Simply use mkosi
‘s incremental build feature. In
this mode mkosi
will make a copy of the build and final images
immediately before dropping in your build sources or artifacts, so
that building an image becomes a lot quicker: instead of always
starting totally from scratch a build will now reuse everything it can
reuse from a previous run, and immediately begin with building your
sources rather than the build image to build your sources in. To
enable the incremental build feature use -i
:
# mkosi -i
Note that if you use this option, the package list is not updated
anymore from your distribution’s servers, as the cached copy is made
after all packages are installed, and hence until you actually delete
the cached copy the distribution’s network servers aren’t contacted
again and no RPMs or DEBs are downloaded. This means the distribution
you use becomes “frozen in time” this way. (Which might be a bad
thing, but also a good thing, as it makes things kinda reproducible.)
Of course, if you run mkosi
a couple of times you’ll notice that it
won’t overwrite the generated image when it already exists. You can
either delete the file yourself first (rm image.raw
) or let mkosi
do it for you right before building a new image, with mkosi -f
. You
can also tell mkosi
to not only remove any such pre-existing images,
but also remove any cached copies of the incremental feature, by using
-f
twice.
I wrote mkosi
originally in order to test systemd, and quickly
generate a disk image of various distributions with the most current
systemd version from git, without all that affecting my host system. I
regularly use mkosi
for that today, in incremental mode. The two
commands I use most in that context are:
# mkosi -if && systemd-nspawn -bi image.raw
And sometimes:
# mkosi -iff && systemd-nspawn -bi image.raw
The latter I use only if I want to regenerate everything based on the
very newest set of RPMs provided by Fedora, instead of a cached
snapshot of it.
BTW, the mkosi
files for systemd are included in the systemd git
tree:
mkosi.default
and
mkosi.build
. This
way, any developer who wants to quickly test something with current
systemd git, or wants to prepare a patch based on it and test it can
check out the systemd repository and simply run mkosi
in it and a
few minutes later he has a bootable image he can test in
systemd-nspawn
or KVM. casync
has similar files:
mkosi.default
,
mkosi.build
.
Random Interesting Features
As mentioned already, mkosi will generate dm-verity enabled disk
images if you ask for it. For that use the --verity switch on the
command line or the Verity= setting in mkosi.default. Of course,
dm-verity implies that the root volume is read-only. In this mode the
top-level dm-verity hash will be placed along-side the output disk
image in a file named the same way, but with the .roothash suffix. If
the image is to be created bootable, the root hash is also included
on the kernel command line in the roothash= parameter, which current
systemd versions can use to both find and activate the root partition
in a dm-verity protected way. BTW: it’s a good idea to combine this
dm-verity mode with the raw_squashfs image mode, to generate a
genuinely protected, compressed image suitable for running in your
IoT device.
As indicated above, mkosi can automatically create a check-sum file
SHA256SUMS for you (--checksum) covering all the files it outputs
(which could be the image file itself, a matching .nspawn file using
the mkosi.nspawn file mentioned above, as well as the .roothash file
for the dm-verity root hash.) It can then optionally sign this with
gpg (--sign). Note that systemd’s machinectl pull-tar and machinectl
pull-raw commands can download these files and the SHA256SUMS file
automatically and verify things on download. In other words: what
mkosi outputs is perfectly ready for download with these two systemd
commands.
As mentioned, mkosi is big on supporting UEFI SecureBoot. To make use
of that, place your X.509 key pair in two files mkosi.secureboot.crt
and mkosi.secureboot.key, and set SecureBoot= or --secure-boot. If
so, mkosi will sign the kernel/initrd/kernel command line combination
during the build. Of course, if you use this mode, you should also
use Verity=/--verity=, otherwise the setup makes only partial
sense. Note that mkosi will not help you with actually enrolling the
keys you use in your UEFI BIOS.
mkosi has minimal support for git checkouts: when it recognizes it is
run in a git checkout and you use the mkosi.build script stuff, the
source tree will be copied into the build image, but with all files
excluded by .gitignore removed.
There’s support for encryption in place. Use --encrypt= or
Encrypt=. Note that the UEFI ESP is never encrypted though, and the
root partition only if explicitly requested. The /home and /srv
partitions are unconditionally encrypted if that’s enabled.
Images may be built with all documentation removed.
The password for the root user and additional kernel command line
arguments may be configured for the image to generate.
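Tying a few of these options together, an invocation along the
following lines (a sketch built only from the switches described
above) would produce a compressed, check-summed, signed and
verity-protected squashfs image:
# mkosi -d fedora -t raw_squashfs --bootable --verity --checksum --sign --xz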
Minimum Requirements
Current mkosi
requires Python 3.5, and has a number of dependencies,
listed in the
README
. Most
notably you need a somewhat recent systemd version to make use of its
full feature set: systemd 233. Older versions are already packaged for
various distributions, but much of what I describe above is only
available in the most recent release mkosi 3
.
The UEFI SecureBoot support requires sbsign
which currently isn’t
available in Fedora, but there’s a
COPR.
Future
It is my intention to continue turning mkosi
into a tool suitable
for:
- Testing and debugging projects
- Building images for secure devices
- Building portable service images
- Building images for secure VMs and containers
One of the biggest goals I have for the future is to teach mkosi
and
systemd
/sd-boot
native support for A/B IoT style partition
setups. The idea is that the combination of systemd
, casync
and
mkosi
provides generic building blocks for building secure,
auto-updating devices in a generic way, even though all pieces
may be used individually, too.
FAQ
Why are you reinventing the wheel again? This is exactly like
$SOMEOTHERPROJECT! — Well, to my knowledge there’s no tool that
integrates this nicely with your project’s development tree, and can
do dm-verity and UEFI SecureBoot and all that stuff for you. So nope,
I don’t think this is exactly like $SOMEOTHERPROJECT, thank you very
much.
What about creating MBR/DOS partition images? — That’s really out of
focus to me. This is an exercise in figuring out how generic OSes and
devices in the future should be built and an attempt to commoditize
OS image building. And no, the future doesn’t speak MBR, sorry. That
said, I’d be quite interested in adding support for booting on
Raspberry Pi, possibly using a hybrid approach, i.e. using a GPT disk
label, but arranging things in a way that the Raspberry Pi boot
protocol (which is built around DOS partition tables) can still work.
Is this portable? — Well, depends what you mean by portable. No, this
tool runs on Linux only, and as it uses systemd-nspawn during the
build process it doesn’t run on non-systemd systems either. But then
again, you should be able to create images for any architecture you
like with it, but of course if you want the image bootable on
bare-metal systems only systems doing UEFI are supported (but
systemd-nspawn should still work fine on them).
Where can I get this stuff? — Try GitHub. And some distributions
carry packaged versions, but I think none of them carry the current
v3 yet.
Is this a systemd project? — Yes, it’s hosted under the systemd
GitHub umbrella. And yes, during run-time systemd-nspawn in a current
version is required. But no, the code-bases are separate otherwise,
already because systemd is a C project, and mkosi Python.
Requiring systemd 233 is a pretty steep requirement, no? — Yes, but
the feature we need kind of matters (systemd-nspawn’s --overlay=
switch), and again, this isn’t supposed to be a tool for legacy
systems.
Can I run the resulting images in LXC or Docker? — Humm, I am not an
LXC nor Docker guy. If you select directory or subvolume as image
type, LXC should be able to boot the generated images just fine, but
I didn’t try. Last time I looked, Docker doesn’t permit running
proper init systems as PID 1 inside the container, as they define
their own run-time without intention to emulate a proper
system. Hence, no, I don’t think it will work, at least not with an
unpatched Docker version. That said, again, don’t ask me questions
about Docker, it’s not precisely my area of expertise, and quite
frankly I am not a fan. To my knowledge neither LXC nor Docker are
able to run containers directly off GPT disk images, hence the
various raw_xyz image types are definitely not compatible with
either. That means if you want to generate a single raw disk image
that can be booted unmodified both in a container and on bare-metal,
then systemd-nspawn is the container manager to go for (specifically,
its -i/--image= switch).
Should you care? Is this a tool for you?
Well, that’s up to you really.
If you hack on some complex project and need a quick way to compile
and run your project on a specific current Linux distribution, then
mkosi
is an excellent way to do that. Simply drop the mkosi.default
and mkosi.build
files in your git
tree and everything will be
easy. (And of course, as indicated above: if the project you are
hacking on happens to be called systemd
or casync
be aware that
those files are already part of the git tree — you can just use them.)
If you hack on some embedded or IoT device, then mkosi
is a great
choice too, as it will make it reasonably easy to generate secure
images that are protected against offline modification, by using
dm-verity
and UEFI SecureBoot.
If you are an administrator and need a nice way to build images for a
VM or systemd-nspawn
container, or a portable service then mkosi
is an excellent choice too.
If you care about legacy computers, old distributions, non-systemd
init systems, old VM managers, Docker, … then no, mkosi
is not for
you, but there are plenty of well-established alternatives around that
cover that nicely.
And never forget: mkosi
is an Open Source project. We are happy to
accept your patches and other contributions.
Oh, and one unrelated last thing: don’t forget to submit your talk
proposal
and/or buy a ticket for
All Systems Go! 2017 in Berlin — the
conference where things like systemd
, casync
and mkosi
are
discussed, along with a variety of other Linux userspace projects used
for building systems.
casync — A tool for distributing file system images
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
Introducing casync
In the past months I have been working on a new project:
casync
. casync
takes
inspiration from the popular rsync
file
synchronization tool as well as the probably even more popular
git
revision control system. It combines the
idea of the rsync
algorithm with the idea of git
-style
content-addressable file systems, and creates a new system for
efficiently storing and delivering file system images, optimized for
high-frequency update cycles over the Internet. Its current focus is
on delivering IoT, container, VM, application, portable service or OS
images, but I hope to extend it later in a generic fashion to become
useful for backups and home directory synchronization as well (but
more about that later).
The basic technological building blocks casync
is built from are
neither new nor particularly innovative (at least not anymore),
however the way casync
combines them is different from existing tools,
and that’s what makes it useful for a variety of usecases that other
tools can’t cover that well.
Why?
I created casync
after studying how today’s popular tools store and
deliver file system images. To very briefly and incompletely name
a few: Docker has a layered tarball approach,
OSTree serves the
individual files directly via HTTP and maintains packed deltas to
speed up updates, while other systems operate on the block layer and
place raw squashfs
images (or other archival file systems, such as
ISO 9660) for download on HTTP shares (in the better cases combined
with zsync
data).
Neither of these approaches appeared fully convincing to me when used
in high-frequency update cycle systems. In such systems, it is
important to optimize towards a couple of goals:
- Most importantly, make updates cheap traffic-wise (for this most tools use image deltas of some form)
- Put boundaries on disk space usage on servers (keeping deltas between all version combinations clients might want to run updates between, would suggest keeping an exponentially growing amount of deltas on servers)
- Put boundaries on disk space usage on clients
- Be friendly to Content Delivery Networks (CDNs), i.e. serve neither too many small nor too many overly large files, and only require the most basic form of HTTP. Provide the repository administrator with high-level knobs to tune the average file size delivered.
- Simplicity to use for users, repository administrators and developers
I don’t think any of the tools mentioned above are really good on more
than a small subset of these points.
Specifically: Docker’s layered tarball approach dumps the “delta”
question onto the feet of the image creators: the best way to make
your image downloads minimal is basing your work on an existing image
clients might already have, and inherit its resources, maintaining full
history. Here, revision control (a tool for the developer) is
intermingled with update management (a concept for optimizing
production delivery). As container histories grow individual deltas
are likely to stay small, but on the other hand a brand-new deployment
usually requires downloading the full history onto the deployment
system, even though there’s no use for it there, and likely requires
substantially more disk space and download sizes.
OSTree’s serving of individual files is unfriendly to CDNs (as many small files in file trees cause an explosion of HTTP GET requests). To counter that, OSTree supports placing pre-calculated delta images between selected revisions on the delivery servers, which means a certain amount of revision management that leaks into the clients.
Delivering direct squashfs (or other file system) images is almost beautifully simple, but of course means every update requires a full download of the newest image, which is bad both for disk usage and for generated traffic. Enhancing it with zsync makes this a much better option, as it can reduce generated traffic substantially at very little cost in history/metadata (no explicit deltas between a large number of versions need to be prepared server-side). On the other hand, the server requirements in disk space and functionality (HTTP Range requests) are minus points for the usecase I am interested in.
(Note: all the mentioned systems have great properties, and it’s not my intention to badmouth them. The only point I am trying to make is that for the use case I care about — file system image delivery with high-frequency update cycles — each system comes with certain drawbacks.)
Security & Reproducability
Besides the issues pointed out above I wasn’t happy with the security
and reproducability properties of these systems. In today’s world
where security breaches involving hacking and breaking into connected
systems happen every day, an image delivery system that cannot make
strong guarantees regarding data integrity is out of
date. Specifically, the tarball format is famously undeterministic:
the very same file tree can result in any number of different
valid serializations depending on the tool used, its version and the
underlying OS and file system. Some tar
implementations attempt to
correct that by guaranteeing that each file tree maps to exactly
one valid serialization, but such a property is always only specific
to the tool used. I strongly believe that any good update system must
guarantee on every single link of the chain that there’s only one
valid representatin of the data to deliver, that can easily be
verified.
What casync Is
So much for the background of why I created casync. Now, let’s have a look at what casync actually is like, and what it does. Here’s the brief technical overview:
Encoding: Let’s take a large linear data stream, split it into variable-sized chunks (the size of each being a function of the chunk’s contents), and store these chunks in individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value may be used as a key for retrieving the full chunk data. Let’s call this directory a “chunk store”. At the same time, generate a “chunk index” file that lists these chunk hash values plus their respective chunk sizes in a simple linear array. The chunking algorithm is supposed to create variable, but similarly sized chunks from the data stream, and do so in a way that the same data results in the same chunks even if placed at varying offsets. For more information see this blog story.
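To make the encoding step more tangible, here is a minimal sketch of content-defined chunking in Go. It is purely illustrative and makes several assumptions of its own: casync itself is written in C and uses its own rolling hash (buzhash, as described below) with its own window size and cut-point rules, and it compresses chunks before storing them, while this sketch uses a naive stand-in rolling hash, made-up size parameters, and merely prints a digest/size listing in the spirit of a chunk index.
    package main

    import (
        "bufio"
        "crypto/sha256"
        "fmt"
        "io"
        "os"
    )

    const (
        windowSize = 48         // bytes fed into the rolling hash
        minChunk   = 16 * 1024  // never cut a chunk shorter than this
        maxChunk   = 256 * 1024 // never let a chunk grow larger than this
        cutMask    = 0xFFFF     // cut when hash&cutMask == 0, ~64K average
    )

    // chunk splits r into variable-sized chunks whose boundaries depend only on
    // the local content, so identical data produces identical chunks even when
    // it appears at different offsets. emit must consume data before returning.
    func chunk(r io.Reader, emit func(data []byte)) error {
        br := bufio.NewReader(r)
        buf := make([]byte, 0, maxChunk)
        for {
            c, err := br.ReadByte()
            if err == io.EOF {
                if len(buf) > 0 {
                    emit(buf) // final, possibly short chunk
                }
                return nil
            }
            if err != nil {
                return err
            }
            buf = append(buf, c)
            // Naive rolling hash over the trailing window, recomputed from
            // scratch for clarity; a real implementation (like buzhash)
            // updates it incrementally in O(1) per byte.
            var hash uint32
            if len(buf) >= windowSize {
                for _, b := range buf[len(buf)-windowSize:] {
                    hash = hash*31 + uint32(b)
                }
            }
            if (len(buf) >= minChunk && hash&cutMask == 0) || len(buf) >= maxChunk {
                emit(buf)
                buf = buf[:0]
            }
        }
    }

    func main() {
        f, err := os.Open("image.raw") // hypothetical input stream
        if err != nil {
            panic(err)
        }
        defer f.Close()
        // Print something akin to a chunk index: the strong digest that would
        // name each chunk file in the store, plus the chunk's size.
        if err := chunk(f, func(data []byte) {
            fmt.Printf("%x %d\n", sha256.Sum256(data), len(data))
        }); err != nil {
            panic(err)
        }
    }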
Decoding: Let’s take the chunk index file, and reassemble the large
linear data stream by concatenating the uncompressed chunks retrieved
from the chunk store, keyed by the listed chunk hash values.
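And here is a matching sketch of the decoding direction, again only illustrative: it assumes a made-up plain-text index of “&lt;sha256-hex&gt; &lt;size&gt;” lines and a flat directory of uncompressed chunk files named after their digest, which is not what casync’s binary .caibx/.castr formats actually look like, but it shows the reassemble-and-verify idea.
    package main

    import (
        "bufio"
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "io"
        "os"
        "path/filepath"
    )

    // reassemble reads the chunk index, fetches every chunk from the store by
    // its digest, verifies it, and concatenates the chunks back into the
    // original stream.
    func reassemble(indexPath, storeDir string, out io.Writer) error {
        idx, err := os.Open(indexPath)
        if err != nil {
            return err
        }
        defer idx.Close()

        scanner := bufio.NewScanner(idx)
        for scanner.Scan() {
            var digest string
            var size int
            if _, err := fmt.Sscanf(scanner.Text(), "%s %d", &digest, &size); err != nil {
                return err
            }
            data, err := os.ReadFile(filepath.Join(storeDir, digest))
            if err != nil {
                return err
            }
            // Verify that the chunk really matches the digest and size the
            // index advertises before using it.
            sum := sha256.Sum256(data)
            if hex.EncodeToString(sum[:]) != digest || len(data) != size {
                return fmt.Errorf("chunk %s is corrupted", digest)
            }
            if _, err := out.Write(data); err != nil {
                return err
            }
        }
        return scanner.Err()
    }

    func main() {
        out, err := os.Create("reassembled.raw") // hypothetical output
        if err != nil {
            panic(err)
        }
        defer out.Close()
        if err := reassemble("image.index", "store", out); err != nil {
            panic(err)
        }
    }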
As an extra twist, we introduce a well-defined, reproducible, random-access serialization format for file trees (think: a more modern tar), to permit efficient, stable storage of complete file trees in the system, simply by serializing them and then passing them into the encoding step explained above.
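The following is a small sketch of what such a reproducible, depth-first file tree serialization means in practice. To be clear, this is not the .catar format — the record layout here is entirely made up — it merely demonstrates the key property: entries are visited in a fixed, sorted order and only deliberately chosen metadata is written, so the same tree always produces the same byte stream.
    package main

    import (
        "encoding/binary"
        "io"
        "os"
        "path/filepath"
    )

    // writeEntry emits one fixed-layout record: mode, path length, data length,
    // then the path and the file contents. Timestamps, ownership and the like
    // are deliberately left out here; which metadata to include is exactly the
    // kind of knob a real format needs to make explicit.
    func writeEntry(w io.Writer, path string, mode os.FileMode, data []byte) error {
        hdr := struct {
            Mode    uint32
            PathLen uint32
            DataLen uint64
        }{uint32(mode), uint32(len(path)), uint64(len(data))}
        if err := binary.Write(w, binary.LittleEndian, hdr); err != nil {
            return err
        }
        if _, err := io.WriteString(w, path); err != nil {
            return err
        }
        _, err := w.Write(data)
        return err
    }

    // serialize walks dir depth-first. os.ReadDir returns entries sorted by
    // name, which is what makes the traversal order (and hence the output)
    // stable.
    func serialize(w io.Writer, dir string) error {
        entries, err := os.ReadDir(dir)
        if err != nil {
            return err
        }
        for _, e := range entries {
            path := filepath.Join(dir, e.Name())
            info, err := e.Info()
            if err != nil {
                return err
            }
            var data []byte
            switch {
            case e.IsDir():
                if err := serialize(w, path); err != nil {
                    return err
                }
            case info.Mode().IsRegular():
                if data, err = os.ReadFile(path); err != nil {
                    return err
                }
            }
            if err := writeEntry(w, path, info.Mode(), data); err != nil {
                return err
            }
        }
        return nil
    }

    func main() {
        // Serialize the current directory to stdout; feeding this stream into
        // the chunking step above is all that is needed to store a file tree.
        if err := serialize(os.Stdout, "."); err != nil {
            panic(err)
        }
    }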
Finally, let’s put all this on the network: for each image you want to
deliver, generate a chunk index file and place it on an HTTP
server. Do the same with the chunk store, and share it between the
various index files you intend to deliver.
Why bother with all of this? Streams with similar contents will result
in mostly the same chunk files in the chunk store. This means it is
very efficient to store many related versions of a data stream in the
same chunk store, thus minimizing disk usage. Moreover, when
transferring linear data streams chunks already known on the receiving
side can be made use of, thus minimizing network traffic.
Why is this different from rsync or OSTree, or similar tools? Well, one major difference between casync and those tools is that we remove file boundaries before chunking things up. This means that small files are lumped together with their siblings and large files are chopped into pieces, which permits us to recognize similarities in files and directories beyond file boundaries, and makes sure our chunk sizes are pretty evenly distributed, without the file boundaries affecting them.
The “chunking” algorithm is based on the buzhash rolling hash function. SHA256 is used as the strong hash function to generate digests of the chunks. xz is used to compress the individual chunks.
Here’s a diagram, hopefully explaining a bit how the encoding process works, were it not for my crappy drawing skills:
The diagram shows the encoding process from top to bottom. It starts with a block device or a file tree, which is then serialized and chunked up into variable-sized blocks. The compressed chunks are then placed in the chunk store, while a chunk index file is written listing the chunk hashes in order. (The original SVG of this graphic may be found here.)
Details
Note that casync operates on two different layers, depending on the usecase of the user:
- You may use it on the block layer. In this case the raw block data on disk is taken as-is, read directly from the block device, split into chunks as described above, compressed, stored and delivered.
- You may use it on the file system layer. In this case, the file tree serialization format mentioned above comes into play: the file tree is serialized depth-first (much like tar would do it) and then split into chunks, compressed, stored and delivered.
The fact that it may be used on both the block and file system layer
opens it up for a variety of different usecases. In the VM and IoT
ecosystems shipping images as block-level serializations is more
common, while in the container and application world file-system-level
serializations are more typically used.
Chunk index files referring to block-layer serializations carry the .caibx suffix, while chunk index files referring to file system serializations carry the .caidx suffix. Note that you may also use casync as a direct tar replacement, i.e. without the chunking, just generating the plain linear file tree serialization. Such files carry the .catar suffix. Internally .caibx files are identical to .caidx files, the only difference is semantic: .caidx files describe a .catar file, while .caibx files may describe any other blob. Finally, chunk stores are directories carrying the .castr suffix.
Features
Here are a couple of other features casync has:
- When downloading a new image you may use casync’s --seed= feature: each block device, file, or directory specified is processed using the same chunking logic described above, and is used as preferred source when putting together the downloaded image locally, avoiding network transfer of it. This of course is useful whenever updating an image: simply specify one or more old versions as seed and only download the chunks that truly changed since then. Note that using seeds requires no history relationship between seed and the new image to download. This has major benefits: you can even use it to speed up downloads of relatively foreign and unrelated data. For example, when downloading a container image built using Ubuntu you can use your Fedora host OS tree in /usr as seed, and casync will automatically use whatever it can from that tree, for example timezone and locale data that tends to be identical between distributions. Example: casync extract http://example.com/myimage.caibx --seed=/dev/sda1 /dev/sda2. This will place the block-layer image described by the indicated URL in the /dev/sda2 partition, using the existing /dev/sda1 data as seeding source. An invocation like this could typically be used by IoT systems with an A/B partition setup. Example 2: casync extract http://example.com/mycontainer-v3.caidx --seed=/srv/container-v1 --seed=/srv/container-v2 /srv/container-v3 is very similar but operates on the file system layer, and uses two old container versions to seed the new version.
- When operating on the file system level, the user has fine-grained control over the metadata included in the serialization. This is relevant since different usecases tend to require a different set of saved/restored metadata. For example, when shipping OS images, file access bits/ACLs and ownership matter, while file modification times hurt. When doing personal backups OTOH file ownership matters little but file modification times are important. Moreover, different backing file systems support different feature sets, and storing more information than necessary might make it impossible to validate a tree against an image if the metadata cannot be replayed in full. Due to this, casync provides a set of --with= and --without= parameters that allow fine-grained control of the data stored in the file tree serialization, including the granularity of modification times and more. The precise set of selected metadata features is also always part of the serialization, so that seeding can work correctly and automatically.
- casync tries to be as accurate as possible when storing file system metadata. This means that besides the usual baseline of file metadata (file ownership and access bits), and more advanced features (extended attributes, ACLs, file capabilities), a number of more exotic bits of metadata are stored as well, including Linux chattr(1) file attributes, as well as FAT file attributes (you may wonder why the latter? — EFI is FAT, and /efi is part of the comprehensive serialization of any host). In the future I intend to extend this further, for example storing btrfs subvolume information where available. Note that as described above every single type of metadata may be turned off and on individually, hence if you don’t need FAT file bits (and I figure it’s pretty likely you don’t), then they won’t be stored.
- The user creating .caidx or .caibx files may control the desired average chunk length (before compression) freely, using the --chunk-size= parameter. Smaller chunks increase the number of generated files in the chunk store and increase HTTP GET load on the server, but also ensure that sharing between similar images is improved, as identical patterns in the images stored are more likely to be recognized. By default casync will use a 64K average chunk size. Tweaking this can be particularly useful when adapting the system to specific CDNs, or when delivering compressed disk images such as squashfs (see below).
- Emphasis is placed on making all invocations reproducible, well-defined and strictly deterministic. As mentioned above this is a requirement to reach the intended security guarantees, but is also useful for many other usecases. For example, the casync digest command may be used to calculate a hash value identifying a specific directory in all desired detail (use --with= and --without= to pick the desired detail). Moreover, the casync mtree command may be used to generate a BSD mtree(5) compatible manifest of a directory tree, .caidx or .catar file.
- The file system serialization format is nicely composable. By this I mean that the serialization of a file tree is the concatenation of the serializations of all files and file subtrees located at the top of the tree, with zero metadata references from any of these serializations into the others. This property is essential to ensure maximum reuse of chunks when similar trees are serialized.
- When extracting file trees or disk image files, casync will automatically create reflinks from any specified seeds if the underlying file system supports it (such as btrfs, ocfs, and future xfs). After all, instead of copying the desired data from the seed, we can just tell the file system to link up the relevant blocks (see the sketch after this list). This works both when extracting .caidx and .caibx files — the latter of course only when the extracted disk image is placed in a regular raw image file on disk, rather than directly on a plain block device, as plain block devices do not know the concept of reflinks.
- Optionally, when extracting file trees, casync can create traditional UNIX hardlinks for identical files in specified seeds (--hardlink=yes). This works on all UNIX file systems, and can save substantial amounts of disk space. However, this only works for very specific usecases where disk images are considered read-only after extraction, as any changes made to one tree will propagate to all other trees sharing the same hardlinked files, as that’s the nature of hardlinks. In this mode, casync exposes OSTree-like behaviour, which is built heavily around read-only hardlink trees.
- casync tries to be smart when choosing what to include in file system images. Implicitly, file systems such as procfs and sysfs are excluded from serialization, as they expose API objects, not real files. Moreover, the “nodump” (+d) chattr(1) flag is honoured by default, permitting users to mark files to exclude from serialization.
- When creating and extracting file trees casync may apply an automatic or explicit UID/GID shift. This is particularly useful when transferring container images for use with Linux user namespacing.
- In addition to local operation, casync currently supports HTTP, HTTPS, FTP and ssh natively for downloading chunk index files and chunks (the ssh mode requires installing casync on the remote host, though; an sftp mode not requiring that should be easy to add). When creating index files or chunks, only ssh is supported as remote backend.
- When operating on block-layer images, you may expose locally or remotely stored images as local block devices. Example: casync mkdev http://example.com/myimage.caibx exposes the disk image described by the indicated URL as a local block device in /dev, which you then may use the usual block device tools on, such as mount or fdisk (only read-only though). Chunks are downloaded on access with high priority, and at low priority when idle in the background. Note that in this mode, casync also plays a role similar to “dm-verity”, as all blocks are validated against the strong digests in the chunk index file before passing them on to the kernel’s block layer. This feature is implemented through Linux’s NBD kernel facility.
- Similarly, when operating on file-system-layer images, you may mount locally or remotely stored images as regular file systems. Example: casync mount http://example.com/mytree.caidx /srv/mytree mounts the file tree image described by the indicated URL as a local directory /srv/mytree. This feature is implemented through Linux’s FUSE kernel facility. Note that special care is taken that the images exposed this way can be packed up again with casync make and are guaranteed to return the bit-by-bit exact same serialization again that they were mounted from. No data is lost or changed while passing things through FUSE (OK, strictly speaking this is a lie, we do lose ACLs, but that’s hopefully just a temporary gap to be fixed soon).
- In IoT A/B fixed-size partition setups the file systems placed in the two partitions are usually much smaller than the partition size, in order to keep some room for later, larger updates. casync is able to analyze the superblock of a number of common file systems in order to determine the actual size of a file system stored on a block device, so that writing a file system to such a partition and reading it back again will result in reproducible data. Moreover, this speeds up the seeding process, as there’s little point in seeding the unused space after the file system within the partition.
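As promised in the reflink item above, here is a small sketch of the mechanism reflink-based seeding boils down to on Linux. This is not casync code, and all file names here are made up; it just demonstrates asking the file system (via the FICLONERANGE ioctl, as exposed by golang.org/x/sys/unix) to share a block range between a seed file and a freshly extracted image instead of copying the bytes.
    package main

    import (
        "log"
        "os"

        "golang.org/x/sys/unix"
    )

    // cloneRange asks the file system to make dest share the given block range
    // with seed. On file systems without reflink support (or on plain block
    // devices) this fails, and a caller would fall back to a normal copy.
    func cloneRange(seed, dest *os.File, seedOff, destOff, length uint64) error {
        return unix.IoctlFileCloneRange(int(dest.Fd()), &unix.FileCloneRange{
            Src_fd:      int64(seed.Fd()),
            Src_offset:  seedOff,
            Src_length:  length,
            Dest_offset: destOff,
        })
    }

    func main() {
        seed, err := os.Open("old-image.raw") // hypothetical seed image
        if err != nil {
            log.Fatal(err)
        }
        defer seed.Close()

        dest, err := os.OpenFile("new-image.raw", os.O_RDWR|os.O_CREATE, 0o644)
        if err != nil {
            log.Fatal(err)
        }
        defer dest.Close()

        // Share the first 1 MiB of the seed with the new image. Offsets and
        // length must be aligned to the file system's block size.
        if err := cloneRange(seed, dest, 0, 0, 1<<20); err != nil {
            log.Fatalf("reflink failed (file system may not support it): %v", err)
        }
    }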
Example Command Lines
Here’s how to use casync, explained with a few examples:
$ casync make foobar.caidx /some/directory
This will create a chunk index file foobar.caidx in the local directory, and populate the chunk store directory default.castr located next to it with the chunks of the serialization (you can change the name for the store directory with --store= if you like). This command operates on the file-system level. A similar command operating on the block level:
$ casync make foobar.caibx /dev/sda1
This command creates a chunk index file foobar.caibx in the local directory describing the current contents of the /dev/sda1 block device, and populates default.castr in the same way as above. Note that you may as well read a raw disk image from a file instead of a block device:
$ casync make foobar.caibx myimage.raw
To reconstruct the original file tree from the .caidx file and the chunk store of the first command, use:
$ casync extract foobar.caidx /some/other/directory
And similarly for the block-layer version:
$ casync extract foobar.caibx /dev/sdb1
or, to extract the block-layer version into a raw disk image:
$ casync extract foobar.caibx myotherimage.raw
The above are the most basic commands, operating on local data
only. Now let’s make this more interesting, and reference remote
resources:
$ casync extract http://example.com/images/foobar.caidx /some/other/directory
This extracts the specified .caidx onto a local directory. This of course assumes that foobar.caidx was uploaded to the HTTP server in the first place, along with the chunk store. You can use any command you like to accomplish that, for example scp or rsync. Alternatively, you can let casync do this directly when generating the chunk index:
$ casync make ssh.example.com:images/foobar.caidx /some/directory
This will use ssh to connect to the ssh.example.com server, and then place the .caidx file and the chunks on it. Note that this mode of operation is “smart”: this scheme will only upload chunks currently missing on the server side, and not retransmit what is already available.
Note that you can always configure the precise path or URL of the chunk store via the --store= option. If you do not do that, then the store path is automatically derived from the path or URL: the last component of the path or URL is replaced by default.castr.
Of course, when extracting .caidx or .caibx files from remote sources, using a local seed is advisable:
$ casync extract http://example.com/images/foobar.caidx --seed=/some/existing/directory /some/other/directory
Or on the block layer:
$ casync extract http://example.com/images/foobar.caibx --seed=/dev/sda1 /dev/sdb2
When creating chunk indexes on the file system layer casync will by default store metadata as accurately as possible. Let’s create a chunk index with reduced metadata:
$ casync make foobar.caidx --with=sec-time --with=symlinks --with=read-only /some/dir
This command will create a chunk index for a file tree serialization that has three features above the absolute baseline supported: 1s granularity timestamps, symbolic links and a single read-only bit. In this mode, all the other metadata bits are not stored, including nanosecond timestamps, full UNIX permission bits, file ownership or even ACLs or extended attributes.
Now let’s make a .caidx file available locally as a mounted file system, without extracting it:
$ casync mount http://example.com/images/foobar.caidx /mnt/foobar
And similarly, let’s make a .caibx file available locally as a block device:
$ casync mkdev http://example.com/images/foobar.caibx
This will create a block device in /dev and print the used device node path to STDOUT.
As mentioned, casync is big on reproducibility. Let’s make use of that to calculate a digest identifying a very specific version of a file tree:
$ casync digest .
This digest will include all metadata bits casync and the underlying file system know about. Usually, to make this useful you want to configure exactly what metadata to include:
$ casync digest --with=unix .
This makes use of the --with=unix shortcut for selecting metadata fields. Specifying --with=unix selects all metadata that traditional UNIX file systems support. It is a shortcut for writing out: --with=16bit-uids --with=permissions --with=sec-time --with=symlinks --with=device-nodes --with=fifos --with=sockets.
Note that when calculating digests or creating chunk indexes you may also use the negative --without= option to remove specific features, starting from the most precise set:
$ casync digest --without=flag-immutable
This generates a digest with the most accurate metadata, but leaves one feature out: chattr(1)’s immutable (+i) file flag.
To list the contents of a .caidx file use a command like the following:
$ casync list http://example.com/images/foobar.caidx
or
$ casync mtree http://example.com/images/foobar.caidx
The former command will generate a brief list of files and directories, not too different from tar t or ls -al in its output. The latter command will generate a BSD mtree(5) compatible manifest. Note that casync actually stores substantially more file metadata than mtree files can express, though.
What casync isn’t
- casync is not an attempt to minimize serialization and downloaded deltas to the extreme. Instead, the tool is supposed to find a good middle ground that is good on traffic and disk space, but not at the price of convenience or requiring explicit revision control. If you care about updates that are absolutely minimal, there are binary delta systems around that might be an option for you, such as Google’s Courgette.
- casync is not a replacement for rsync, or git or zsync or anything like that. They have very different usecases and semantics. For example, rsync permits you to directly synchronize two file trees remotely. casync just cannot do that, and it is unlikely it ever will.
Where next?
casync is supposed to be a generic synchronization tool. Its primary focus for now is delivery of OS images, but I’d like to make it useful for a couple of other usecases, too. Specifically:
- To make the tool useful for backups, encryption is missing. I have pretty concrete plans how to add that. When implemented, the tool might become an alternative to restic or tarsnap.
- Right now, if you want to deploy casync in real life, you still need to validate the downloaded .caidx or .caibx file yourself, for example with some gpg signature. It is my intention to integrate with gpg in a minimal way so that signing and verifying chunk index files is done automatically.
- In the longer run, I’d like to build an automatic synchronizer for $HOME between systems from this. Each $HOME instance would be stored automatically at regular intervals in the cloud using casync, and conflicts would be resolved locally.
- casync is written in a shared library style, but it is not yet built as one. Specifically this means that almost all of casync’s functionality is supposed to be available as a C API soon, and applications can process casync files on every level. It is my intention to make this library useful enough so that it will be easy to write a module for GNOME’s gvfs subsystem in order to make remote or local .caidx files directly available to applications (as an alternative to casync mount). In fact the idea is to make this all flexible enough that even the remoting backends can be replaced easily, for example to replace casync’s default HTTP/HTTPS backends built on CURL with GNOME’s own HTTP implementation, in order to share cookies, certificates, … There’s also an alternative method to integrate with casync in place already: simply invoke casync as a subprocess. casync will inform you about a certain set of state changes using a mechanism compatible with sd_notify(3). In future it will also propagate progress data this way and more.
- I intend to add a new seeding back-end that sources chunks from the local network. After downloading the new .caidx file off the Internet casync would then search for the listed chunks on the local network first before retrieving them from the Internet. This should speed things up on all installations that have multiple similar systems deployed in the same network.
Further plans are listed tersely in the TODO file.
FAQ:
- Is this a systemd project? — casync is hosted under the github systemd umbrella, and the projects share the same coding style. However, the codebases are distinct and without interdependencies, and casync works fine both on systemd systems and systems without it.
- Is casync portable? — At the moment: no. I only run Linux and that’s what I code for. That said, I am open to accepting portability patches (unlike for systemd, which doesn’t really make sense on non-Linux systems), as long as they don’t interfere too much with the way casync works. Specifically this means that I am not too enthusiastic about merging portability patches for OSes lacking the openat(2) family of APIs.
- Does casync require reflink-capable file systems to work, such as btrfs? — No it doesn’t. The reflink magic in casync is employed when the file system permits it, and it’s good to have it, but it’s not a requirement, and casync will implicitly fall back to copying when it isn’t available. Note that casync supports a number of file system features on a variety of file systems that aren’t available everywhere, for example FAT’s system/hidden file flags or xfs’s projinherit file flag.
- Is casync stable? — I just tagged the first, initial release. While I have been working on it for quite some time and it is quite featureful, this is the first time I advertise it publicly, and it hence received very little testing outside of its own test suite. I am also not fully ready to commit to the stability of the current serialization or chunk index format. I don’t see any breakages coming for it though. casync is pretty light on documentation right now, and does not even have a man page. I also intend to correct that soon.
- Are the .caidx/.caibx and .catar file formats open and documented? — casync is Open Source, so if you want to know the precise format, have a look at the sources for now. It’s definitely my intention to add comprehensive docs for both formats however. Don’t forget this is just the initial version right now.
- casync is just like $SOMEOTHERTOOL! Why are you reinventing the wheel (again)? — Well, because casync isn’t “just like” some other tool. I am pretty sure I did my homework, and that there is no tool just like casync right now. The tools coming closest are probably rsync, zsync, tarsnap, restic, but they are quite different beasts each.
- Why did you invent your own serialization format for file trees? Why don’t you just use tar? — That’s a good question, and other systems — most prominently tarsnap — do that. However, as mentioned above tar doesn’t enforce reproducibility. It also doesn’t really do random access: if you want to access some specific file you need to read every single byte stored before it in the tar archive to find it, which is of course very expensive. The serialization casync implements places a focus on reproducibility, random access, and metadata control. Much like traditional tar it can still be generated and extracted in a stream fashion though.
- Does casync save/restore SELinux/SMACK file labels? — At the moment not. That’s not because I wouldn’t want it to, but simply because I am not a guru of either of these systems, and didn’t want to implement something I do not fully grok nor can test. If you look at the sources you’ll find that there’s already some definitions in place that keep room for them though. I’d be delighted to accept a patch implementing this fully.
- What about delivering squashfs images? How well does chunking work on compressed serializations? — That’s a very good point! Usually, if you apply a chunking algorithm to a compressed data stream (let’s say a tar.gz file), then changing a single bit at the front will propagate into the entire remainder of the file, so that minimal changes will explode into major changes. Thankfully this doesn’t apply that strictly to squashfs images, as squashfs provides random access to files and directories and thus breaks up the compression streams in regular intervals to make seeking easy. This fact is beneficial for systems employing chunking, such as casync, as it means single bit changes might affect their vicinity but will not explode in an unbounded fashion. In order to achieve best results when delivering squashfs images through casync the block sizes of squashfs and the chunk sizes of casync should be matched up (using casync’s --chunk-size= option). How precisely to choose both values is left to research by the user, for now.
- What does the name casync mean? — It’s a synchronizing tool, hence the -sync suffix, following rsync’s naming. It makes use of the content-addressable concept of git, hence the ca- prefix.
- Where can I get this stuff? Is it already packaged? — Check out the sources on GitHub. I just tagged the first version. Martin Pitt has packaged casync for Ubuntu. There is also an ArchLinux package.
Should you care? Is this a tool for you?
Well, that’s up to you really. If you are involved with projects that
need to deliver IoT, VM, container, application or OS images, then
maybe this is a great tool for you — but other options exist, some of
which are linked above.
Note that casync is an Open Source project: if it doesn’t do exactly what you need, prepare a patch that adds what you need, and we’ll consider it.
If you are interested in the project and would like to talk about this in person, I’ll be presenting casync soon at Kinvolk’s Linux Technologies Meetup in Berlin, Germany. You are invited. I also intend to talk about it at All Systems Go!, also in Berlin.
All Systems Go! 2017 CfP Open
Post Syndicated from Lennart Poettering original http://0pointer.net/blog/all-systems-go-2017-cfp-open.html
We’d like to invite presentation proposals for All Systems Go! 2017!
All Systems Go! is an Open Source community conference focused on the projects and technologies at the foundation of modern Linux systems — specifically low-level user-space technologies. Its goal is to provide a friendly and collaborative gathering place for individuals and communities working to push these technologies forward.
All Systems Go! 2017 takes place in Berlin, Germany on October 21st+22nd.
All Systems Go! is a 2-day event with 2-3 talks happening in parallel. Full presentation slots are 30-45 minutes in length and lightning talk slots are 5-10 minutes.
We are now accepting submissions for presentation proposals. In particular, we are looking for sessions including, but not limited to, the following topics:
- Low-level container executors and infrastructure
- IoT and embedded OS infrastructure
- OS, container, IoT image delivery and updating
- Building Linux devices and applications
- Low-level desktop technologies
- Networking
- System and service management
- Tracing and performance measuring
- IPC and RPC systems
- Security and Sandboxing
While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome too, as long as they have a clear and direct relevance for user-space.
Please submit your proposals by September 3rd. Notification of acceptance will be sent out 1-2 weeks later.
To submit your proposal now please visit our CFP submission web site.
For further information about All Systems Go! visit our conference web site.
systemd.conf will not take place this year; All Systems Go! takes its place. All Systems Go! welcomes all projects that contribute to Linux user space, which, of course, includes systemd. Thus, anything you think was appropriate for submission to systemd.conf is also fitting for All Systems Go!