Tag Archives: trap

Grafana Labs and Intel partner on Grafana and Snap

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2016/04/11/grafana-labs-and-intel-partner-on-grafana-and-snap/

At Grafana Labs, we’re building an OpenSaaS monitoring platform around Grafana.
We want to create a powerful yet turnkey experience, one that offers the best of both the SaaS and open source worlds.
Our sense is that monitoring has gotten too expensive, and creates unnecessary silos. Users have lost control of their data destiny, and are trapped by inflexible solutions and spiraling costs.
We think that there’s a better way forward.

Case 224: Unsupported Accusations

Post Syndicated from The Codeless Code original http://thecodelesscode.com/case/224

While passing by the temple’s Support Desk, the nun
Hwídah heard of strange behavior in a certain
application. Since she had been appointed by master
Banzen to assist with production issues, the nun
dutifully described the symptoms to the application’s senior monk.

“Occasionally a user will return to a record they had
previously edited, only to discover that some information is
missing,” said Hwídah. “The behavior is not repeatable, and
the users confess that they may be imagining things.”

“I have heard these reports,” said the senior monk. “There is
no bug in the code that I can see, nor can we reproduce the
problem in a lower environment.”

“Still, it may be prudent to investigate further,” said the nun.

The monk sighed. “We are all exceedingly busy. Only a few
users have reported this issue, and even they doubt
themselves. So far, all are content to simply re-enter the
‘missing’ information and continue about their business.
Can you offer me one shred of evidence that this is anything
more than user error?”

The nun shook her head, bowed, and departed.

- - -

That night, the senior monk was awoken from his sleep by a
squeaking under his bed, of the sort a mouse might make.
This sound continued throughout the night—sometimes in
one place, sometimes another, presumably as the intruder
wandered about in search of food. A sandal flung in the
direction of the sound resulted in immediate quiet, but
eventually the squeaking would begin again in a different
part of the room.

“This is doubtless some lesson that the meddlesome Hwídah
wishes to teach me,” he complained to his fellows the next
day, dark circles under his eyes. “Yet I will not be
bullied into chasing nonexistent bugs. If the nun is so
annoyed by the squeaking of our users, let her deal with

The monk set mousetraps in the corners and equipped himself
with a pair of earplugs. Thus he passed the next night, and
the night after, though his sleep was less restful than he
would have liked.

On the seventh night, the exhausted monk turned off the
light and fell hard upon his bed. There was a loud CRACK
and the monk found himself tumbling through space. With a
CRASH he bounced off his mattress and rolled onto a cold
stone floor. His bed had, apparently, fallen through the
floor into the basement.

Perched high on a ladder—just outside the gaping hole in
the basement’s wooden ceiling—was the nun Hwídah, her
face lit only by a single candle hanging nearby. She
descended and dropped an old brace-and-bit hand drill into
the monk’s lap. Then she crouched down next to his ear.

“If you don’t understand it, it’s dangerous,” whispered the nun.

Amazon SES Best Practices: Top 5 Best Practices for List Management

Post Syndicated from Manuel Basurto original https://aws.amazon.com/blogs/ses/amazon-ses-best-practices-top-5-best-practices-for-list-management/

If you are an Amazon SES customer, you probably know that in addition to managing your email campaigns, you need to be mindful of your reputation as an email sender.

Maintaining a good reputation as a sender is vital if you rely on email delivery as part of your business – if you have a good reputation, your emails have the best chance of arriving in your recipients’ inboxes. If your reputation is questionable, your messages could get dropped or banished to the spam folder. Recipient ISPs may also decide to throttle your email, preventing you from delivering emails to your recipients on time.

This blog post provides five best practices to help you keep your email-sending reputation and deliverability high by focusing on the source of most deliverability problems: list acquisition and management.

Without further ado, here are our Top 5 Best Practices for List Management:

1. Use confirmed opt-in (a.k.a. double opt-in or the gold standard).

The principle behind this is simple – when a user enters an email address on your website, you need to verify that the address is legitimate before you add it to the mailing list you use for your regular campaigns. To this end, you send a verification email to the address and ask the subscriber to click a link in the email, which will then enable the account. By clicking on this link, the email address owner is confirming that they are willing to receive the email notifications they signed up for on your website. The benefits of this practice are evident:

  • You will not send to an unverified address more than once (or a few times, if the customer requests a second verification email). If the address is fake (or a typo) and the email reaches someone who doesn’t want to hear from you, that person is less likely to complain, because they will only get one email.
  • Since your actual mail campaigns go only to addresses you have verified, you know that you are making good use of your resources and that your campaigns are actually wanted by the recipients.
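The confirmation step described above can be sketched in Python as follows. This is a minimal illustration, not SES functionality: the `MailingList` class, the HMAC-signed token scheme and `SECRET_KEY` are assumptions made for the example, and a production system would also expire tokens and persist its state.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in practice, load it from secure configuration.
SECRET_KEY = secrets.token_bytes(32)


def make_confirmation_token(email: str, key: bytes = SECRET_KEY) -> str:
    """Sign the normalized address so a confirmation link cannot be forged."""
    normalized = email.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def confirm(email: str, token: str, key: bytes = SECRET_KEY) -> bool:
    """Return True only if the token from the clicked link matches the address."""
    expected = make_confirmation_token(email, key)
    return hmac.compare_digest(expected, token)


class MailingList:
    """Hypothetical list store: addresses stay pending until confirmed."""

    def __init__(self) -> None:
        self.pending: set[str] = set()
        self.confirmed: set[str] = set()

    def sign_up(self, email: str) -> str:
        address = email.strip().lower()
        self.pending.add(address)
        # Send exactly one verification email containing this token.
        return make_confirmation_token(address)

    def click_link(self, email: str, token: str) -> bool:
        address = email.strip().lower()
        if address in self.pending and confirm(address, token):
            self.pending.discard(address)
            self.confirmed.add(address)
            return True
        return False

    def campaign_recipients(self) -> set[str]:
        # Regular campaigns go only to confirmed addresses.
        return set(self.confirmed)
```

Signing the address with a server-side key means a valid link cannot be produced for an address without that key, and `hmac.compare_digest` compares tokens without leaking timing information.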

2. Process bounces and complaints.

SES provides feedback on bounces and complaints through SNS (or email) to make it easy for you to be alerted to addresses that bounce or recipients who complain. If you get a hard bounce or a complaint, you should remove that email address from your list. You should also identify the root cause of the bounces and complaints. For example, say that you notice that your bounce rate for new subscriptions is rising. This could be an indicator that people are signing up for your service using fake email addresses. While it is not unusual for someone to sign up using a fake email address, you need to make sure that you are not encouraging your customers to do so. One way in which you could be encouraging customers to do this is by giving away free stuff without asking for a confirmed opt-in. If you are in this situation, you need to change the incentive that drives customers to sign up using fake addresses: either remove the gifts or implement confirmed opt-in (there is a reason we call this the gold standard).
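A handler for these notifications might look like the following Python sketch. It assumes SNS delivers the SES notification as a JSON string in the envelope's `Message` field and uses the field names `notificationType`, `bounce.bounceType`, `bouncedRecipients` and `complaint.complainedRecipients` as I understand the SES notification format; check them against the current AWS documentation before relying on this. It also assumes the list stores lowercase addresses.

```python
import json


def addresses_to_remove(sns_body: str) -> set[str]:
    """Extract the addresses to drop from one SNS-delivered SES notification."""
    envelope = json.loads(sns_body)
    # SNS wraps the SES notification as a JSON string in the "Message" field.
    notification = json.loads(envelope["Message"])
    kind = notification.get("notificationType")

    if kind == "Bounce":
        bounce = notification["bounce"]
        # Only hard (permanent) bounces justify removal; transient ones may recover.
        if bounce.get("bounceType") != "Permanent":
            return set()
        recipients = bounce.get("bouncedRecipients", [])
    elif kind == "Complaint":
        recipients = notification["complaint"].get("complainedRecipients", [])
    else:
        return set()  # e.g. delivery notifications: nothing to remove

    return {r["emailAddress"].lower() for r in recipients}


def apply_feedback(mailing_list: set[str], sns_body: str) -> set[str]:
    """Return the list with hard-bounced and complaining addresses removed."""
    return mailing_list - addresses_to_remove(sns_body)
```

Counting how many removals come from recent signups, rather than only deleting addresses, is what lets you spot the rising bounce rate described above.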

3. Remove non-engagers.

You need to operate under the assumption that if a customer is not opening or clicking your email, then they are not interested in what you’re sending. Define a timeframe that makes sense to your business, and if a recipient doesn’t interact with your mail within that timeframe, stop emailing them. This tactic is a great complement to double opt-in and should be standard for any email sender. Regardless of whether a customer originally opted in through double opt-in or just a regular signup, an email address can go stale and become a spamtrap.  Spamtraps are silent reputation traps, which means that you will get no indication that you are hitting them – removing non-engagers is the only way to avoid them. They are used by many organizations to measure a sender’s reputation, and particularly how well the sender is measuring engagement. If you continue to email spamtraps, your mail could end up in the spam folder, your domain could be blacklisted, and SES could suspend your service.
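The engagement window can be applied with a few lines of Python. The sketch assumes a hypothetical `last_activity` mapping from each address to the later of its signup time and its most recent open or click, so that brand-new subscribers are not dropped before they have had a chance to engage. The window length is whatever makes sense for your business; nothing here is an SES default.

```python
from datetime import datetime, timedelta


def split_by_engagement(
    last_activity: dict[str, datetime],
    now: datetime,
    window: timedelta,
) -> tuple[set[str], set[str]]:
    """Split addresses into (keep, stop_mailing) using an engagement window.

    last_activity maps each address to the later of its signup time and its
    most recent open or click.
    """
    cutoff = now - window
    keep = {address for address, when in last_activity.items() if when >= cutoff}
    stop = set(last_activity) - keep
    return keep, stop
```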

4. Make it easy for your recipients to unsubscribe.

If you are sending bulk email (as opposed to mail that is the result of a transaction), then you need to make it easy for customers to opt out of the mail. Include an easy-to-spot opt-out link in every bulk email, and use the List-Unsubscribe header for easy integration with ISPs that support it. If a customer does not want the mail, you should not send it to them. Sending email to an unwilling recipient will do more harm than good. In many locations, including the US, Canada, and much of Europe and Asia, enabling recipients to easily opt out of your email is a legal requirement.
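Adding both the visible link and the header with Python's standard `email` package might look like this. The unsubscribe URL and mailto address are placeholders, and the header value follows the RFC 2369 form of comma-separated URIs in angle brackets; actually sending the message through SES (for example as a raw message) is left out of the sketch.

```python
from email.message import EmailMessage


def build_bulk_message(sender: str, recipient: str, subject: str, body: str,
                       unsubscribe_url: str, unsubscribe_mailto: str) -> EmailMessage:
    """Build a bulk message with a visible opt-out link and List-Unsubscribe."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # RFC 2369: one or more URIs in angle brackets, separated by commas.
    msg["List-Unsubscribe"] = f"<{unsubscribe_mailto}>, <{unsubscribe_url}>"
    # The same opt-out link, easy to spot in the body of every bulk email.
    msg.set_content(f"{body}\n\nTo unsubscribe, visit: {unsubscribe_url}\n")
    return msg
```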

5. Keep your mailing lists independent.

If you operate more than one website, you should never mix your subscriber lists. Customers who sign up for website A should never (under any circumstance) receive an email from website B, unless they sign up for that one too. The reason is simple: These customers have only agreed to receive email from website A. Furthermore, if your customers get mail from a website unknown to them, they are likely to mark that mail as spam, thus hurting your email reputation.

Never forget: Your email campaigns are only as good as their ability to reach your customers, and following best practices can be the difference between a delivered and a dropped email. While the above best practices should help you, list management is only a part of the equation – the quality of your content also plays a big role in your ability to deliver email. Nevertheless, we hope our recommendations in this post will prove useful in your email endeavors.

If you have questions, feel free to let us know via the SES Forums or in the comment section of this blog.

systemd for Administrators, Part XX

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/socket-activated-containers.html

This is no time for procrastination: here is already the twentieth
installment of my ongoing series on systemd for Administrators:

Socket Activated Internet Services and OS Containers

Socket activation is an important feature of systemd. When
we first announced systemd we already tried to make the point how great
socket activation is for increasing parallelization and robustness of
socket services, but also for simplifying the dependency logic of the
boot. In this episode I’d like to explain why socket activation is an
important tool for drastically improving how many services and even
containers you can run on a single system with the same resource
usage. Or in other words, how you can drive up the density of customer
sites on a system while spending less on new hardware.

Socket Activated Internet Services

First, let’s take a step back. What was socket activation again? —
Basically, socket activation simply means that systemd sets up
listening sockets (IP or otherwise) on behalf of your services
(without these running yet), and then starts (activates) the
services as soon as the first connection comes in. Depending on the
technology the services might idle for a while after having processed
the connection and possible follow-up connections before they exit on
their own, so that systemd will again listen on the sockets and
activate the services again the next time they are connected to. For
the client it is not visible whether the service it is interested in
is currently running or not. The service’s IP socket stays continuously
connectable, no connection attempt ever fails, and all connections are
processed promptly.
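To make the receiving side concrete, here is a minimal Python sketch of a socket-activated service. It implements the `sd_listen_fds(3)` convention directly: systemd sets `LISTEN_PID` to the service's PID and `LISTEN_FDS` to the number of passed sockets, which start at file descriptor 3. The fallback port 8080 and the greeting are assumptions for illustration only.

```python
import os
import socket

# Per the sd_listen_fds(3) convention, passed sockets start at fd 3.
SD_LISTEN_FDS_START = 3


def listen_fds(unset_environment: bool = True) -> list[int]:
    """Return the file descriptors systemd passed in, or [] if none."""
    pid_str = os.environ.get("LISTEN_PID", "")
    count_str = os.environ.get("LISTEN_FDS", "")
    if unset_environment:
        # Keep the variables from leaking into any child processes.
        os.environ.pop("LISTEN_PID", None)
        os.environ.pop("LISTEN_FDS", None)
    if not (pid_str.isdigit() and count_str.isdigit()):
        return []
    # The sockets are meant for the process systemd started, not for others.
    if int(pid_str) != os.getpid():
        return []
    count = int(count_str)
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def open_listener(fallback_port: int = 8080) -> socket.socket:
    """Use the socket from systemd if present, else bind one ourselves."""
    fds = listen_fds()
    if fds:
        # systemd already created the socket, bound it and is listening on it.
        return socket.socket(fileno=fds[0])
    # Started by hand (e.g. during development): bind a socket ourselves.
    return socket.create_server(("127.0.0.1", fallback_port))


def serve() -> None:
    """Answer every connection with a one-line greeting (illustrative only)."""
    server = open_listener()
    while True:
        conn, _addr = server.accept()
        with conn:
            conn.sendall(b"hello from a socket-activated service\n")
```

When systemd starts such a process because of an incoming connection, that connection is already queued on the inherited socket, so the first `accept()` returns it immediately and the client never notices that the service was not running.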

A setup like this lowers resource usage: as services are only
running when needed they only consume resources when required. Many
internet sites and services can benefit from that. For example, web
site hosters will have noticed that of the multitude of web sites that
are on the Internet only a tiny fraction gets a continuous stream of
requests: the huge majority of web sites still needs to be available
all the time but gets requests only very infrequently. With a scheme
like socket activation you can take advantage of this. Hosting many of
these sites on a single system like this and only activating their
services as necessary allows a large degree of over-commit: you can
run more sites on your system than the available resources actually
allow. Of course, one shouldn’t over-commit too much to avoid
contention during peak times.

Socket activation like this is easy to use in systemd. Many modern
Internet daemons already support socket activation out of the box (and
for those which don’t yet it’s not difficult
to add). Together with systemd’s instantiated
units support
it is easy to write a pair of service and socket
templates that then may be instantiated multiple times, once for each
site. Then, (optionally) make use of some of the security features
of systemd to nicely isolate the customer’s site’s
services from each other (think: each customer’s service should only
see the home directory of the customer, everybody else’s directories
should be invisible), and there you go: you now have a highly scalable
and reliable server system, that serves a maximum of securely
sandboxed services at a minimum of resources, and all nicely done with
built-in technology of your OS.

This kind of setup is already in production use in a number of
companies. For example, the great folks at Pantheon are running their
scalable instant Drupal system on a setup that is similar to this. (In
fact, Pantheon’s David Strauss pioneered this scheme. David, you

Socket Activated OS Containers

All of the above can already be done with older versions of
systemd. If you use a distribution that is based on systemd, you can
right-away set up a system like the one explained above. But let’s
take this one step further. With systemd 197 (to be included in Fedora
19), we added support for socket activating not only individual
services, but entire OS containers. And I really have to say it
at this point: this is stuff I am really excited
about. 😉

Basically, with socket activated OS containers, the host’s systemd
instance will listen on a number of ports on behalf of a container,
for example one for SSH, one for web and one for the database, and as
soon as the first connection comes in, it will spawn the container
this is intended for, and pass to it all three sockets. Inside of the
container, another systemd is running and will accept the sockets and
then distribute them further, to the services running inside the
container using normal socket activation. The SSH, web and database
services will only see the inside of the container, even though they
have been activated by sockets that were originally created on the
host! Again, none of this is visible to the clients. That an entire OS
container is spawned, triggered by a simple network connection, is entirely
transparent to the client side.[1]

The OS containers may contain (as the name suggests) a full
operating system, that might even be a different distribution than is
running on the host. For example, you could run your host on Fedora,
but run a number of Debian containers inside of it. The OS containers
will have their own systemd init system, their own SSH instances,
their own process tree, and so on, but will share a number of other
facilities (such as memory management) with the host.

For now, only systemd’s own trivial container manager, systemd-nspawn,
has been updated to support this kind of socket activation. We hope
that libvirt-lxc will
soon gain similar functionality. At this point, let’s see in more
detail how such a setup is configured in systemd using nspawn:

First, please use a tool such as debootstrap or yum’s
--installroot to set up a container OS
tree[2]. The details of that are a bit out of focus
for this story; there’s plenty of documentation around how to do
this. Of course, make sure you have systemd v197 installed inside
the container. For accessing the container from the command line,
consider using systemd-nspawn
itself. After you configured everything properly, try to boot it up
from the command line with systemd-nspawn’s -b switch.

Assuming you now have a working container that boots up fine, let’s
write a service file for it, to turn the container into a systemd
service on the host you can start and stop. Let’s create
/etc/systemd/system/mycontainer.service on the host:

[Unit]
Description=My little container

[Service]
ExecStart=/usr/bin/systemd-nspawn -jbD /srv/mycontainer 3
KillMode=process

This service can already be started and stopped via systemctl start
and systemctl stop. However, there’s no nice way
to actually get a shell prompt inside the container. So let’s add SSH
to it, and even more: let’s configure SSH so that a connection to the
container’s SSH port will socket-activate the entire container. First,
let’s begin with telling the host that it shall now listen on the SSH
port of the container. Let’s create
/etc/systemd/system/mycontainer.socket on the host:

[Unit]
Description=The SSH socket of my little container

[Socket]
ListenStream=23

If we start this unit with systemctl start on the host
then it will listen on port 23, and as soon as a connection comes in
it will activate our container service we defined above. We pick port
23 here, instead of the usual 22, as our host’s SSH is already
listening on that. nspawn virtualizes the process list and the file
system tree, but does not actually virtualize the network stack, hence
we just pick different ports for the host and the various containers here.

Of course, the system inside the container doesn’t yet know what to
do with the socket it gets passed due to socket activation. If you’d
now try to connect to the port, the container would start-up but the
incoming connection would be immediately closed since the container
can’t handle it yet. Let’s fix that!

All that’s necessary is to teach SSH inside the container
socket activation. For that let’s simply write a pair of socket and
service units for SSH. Let’s create
/etc/systemd/system/sshd.socket in the container:

[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=23
Accept=yes

Then, let’s add the matching SSH service file
/etc/systemd/system/sshd@.service in the container:

[Unit]
Description=SSH Per-Connection Server for %I

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket

Then, make sure to hook sshd.socket into the
sockets.target so that unit is started automatically when the
container boots up:

ln -s /etc/systemd/system/sshd.socket /etc/systemd/system/sockets.target.wants/

And that’s it. If we now activate mycontainer.socket on
the host, the host’s systemd will bind the socket and we can connect
to it. If we do this, the host’s systemd will activate the container,
and pass the socket in to it. The container’s systemd will then take
the socket, match it up with sshd.socket inside the
container. As there’s still our incoming connection queued on it, it
will then immediately trigger an instance of sshd@.service,
and we’ll have our login.
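sshd -i is an inetd-style server, and the same per-connection model works for any program: with Accept=yes in the socket unit, systemd accepts each connection itself and starts one service instance per connection, and with StandardInput=socket the connection becomes the instance's standard input (and, in that case, by default its standard output too, per my reading of systemd.exec(5)). A minimal Python sketch of such a per-connection server, using a hypothetical upper-casing echo protocol:

```python
import sys
from typing import BinaryIO


def handle_connection(conn_in: BinaryIO, conn_out: BinaryIO) -> int:
    """Serve one client: echo each line back upper-cased; return lines served."""
    served = 0
    for line in conn_in:
        conn_out.write(line.upper())
        conn_out.flush()
        served += 1
    return served


def main() -> None:
    # Under Accept=yes plus StandardInput=socket, stdin/stdout are the accepted
    # connection; returning (and exiting) ends this per-connection instance.
    handle_connection(sys.stdin.buffer, sys.stdout.buffer)
```

Keeping the protocol logic in a function that takes two byte streams means it can be exercised without any sockets at all, which is convenient when testing outside the container.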

And that’s already everything there is to it. You can easily add
additional sockets to listen on to
mycontainer.socket. Everything listed therein will be passed
to the container on activation, and will be matched up as well as
possible with all socket units configured inside the
container. Sockets that cannot be matched up will be closed, and
sockets that aren’t passed in but are configured for listening will be
bound by the container’s systemd instance.
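The matching rule described above can be modelled as a small pure function. This is a hypothetical, simplified sketch rather than systemd's code: `ListenSpec` stands in for the full socket parameters (family, type, address) that systemd compares, and the unit names used with it are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListenSpec:
    """Hypothetical, simplified description of a listening socket."""
    kind: str     # e.g. "stream" or "datagram"
    address: str  # e.g. "0.0.0.0:23" or "/run/example.sock"


def match_passed_sockets(
    passed: dict[int, ListenSpec],
    configured: dict[str, ListenSpec],
) -> tuple[dict[str, int], list[int], list[str]]:
    """Return (unit -> reused fd, fds to close, units that must bind themselves)."""
    # Group the passed-in fds by what they listen on, lowest fd first.
    unmatched: dict[ListenSpec, list[int]] = {}
    for fd, spec in sorted(passed.items()):
        unmatched.setdefault(spec, []).append(fd)

    reused: dict[str, int] = {}
    to_bind: list[str] = []
    for unit, spec in configured.items():
        fds = unmatched.get(spec)
        if fds:
            reused[unit] = fds.pop(0)  # a passed socket matches this unit
        else:
            to_bind.append(unit)       # nothing passed in: bind it ourselves

    # Whatever is left over matched no configured unit and gets closed.
    to_close = sorted(fd for fds in unmatched.values() for fd in fds)
    return reused, to_close, to_bind
```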

So, let’s take a step back again. What did we gain through all of
this? Well, basically, we can now offer a number of full OS containers
on a single host, and the containers can offer their services without
running continuously. The density of OS containers on the host can
hence be increased drastically.

Of course, this only works for kernel-based virtualization, not for
hardware virtualization. I.e., something like this can only be
implemented on systems such as libvirt-lxc or nspawn, but not in
hardware virtualization such as qemu/kvm.
If you have a number of containers set up like this, here’s one
cool thing the journal allows you to do. If you pass -m to
journalctl on the host, it will automatically discover the
journals of all local containers and interleave them on
display. Nifty, eh?

With systemd 197 you have everything to set up your own socket
activated OS containers on-board. However, there are a couple of
improvements we’re likely to add soon: for example, right now even if
all services inside the container exit on idle, the container still
will stay around, and we really should make it exit on idle too, if
all its services exited and no logins are around. As it turns out we
already have much of the infrastructure for this around: we can reuse
the auto-suspend functionality we added for laptops: detecting when a
laptop is idle and suspending it then is a very similar problem to
detecting when a container is idle and shutting it down then.

Anyway, this blog story is already way too long. I hope I haven’t
lost you half-way already with all this talk of virtualization,
sockets, services, different OSes and stuff. I hope this blog story is
a good starting point for setting up powerful highly scalable server
systems. If you want to know more, consult the documentation and drop
by our IRC channel. Thank you!


[1] And BTW, this
is another reason
why fast boot times the way systemd offers them
are actually a really good thing on servers, too.

[2] To make it easy: you need a command line such as yum
--releasever=19 --nogpg --installroot=/srv/mycontainer/ --disablerepo='*'
--enablerepo=fedora install systemd passwd yum fedora-release vim-minimal

to install Fedora, and debootstrap --arch=amd64 unstable /srv/mycontainer/
to install Debian. Also see the bottom of systemd-nspawn(1).
Also note that auditing is currently broken for containers, and if enabled in
the kernel will cause all kinds of errors in the container. Use
audit=0 on the host’s kernel command line to turn it off.

Why systemd?

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/why.html

systemd is
still a young project, but it is not a baby anymore. I posted the initial
announcement precisely a year ago. Since then most of the
big distributions have decided to adopt it in one way or another, and many
smaller distributions have already switched. The first big
distribution with systemd by default will be Fedora 15, due end of
May. It is expected that the others will follow the lead a bit later
(with one exception). Many
embedded developers have already adopted it too, and there’s even a company specializing in engineering and
consulting services for systemd. In short: within one year
systemd became a really successful project.

However, there are still folks who we haven’t won over yet. If you
fall into one of the following categories, then please have a look at
the comparison of init systems below:

  • You are working on an embedded project and are wondering whether
    it should be based on systemd.
  • You are a user or administrator and wondering which distribution
    to pick, and are pondering whether it should be based on systemd or not.
  • You are a user or administrator and wondering why your favourite
    distribution has switched to systemd, if everything already worked so
    well before.
  • You are developing a distribution that hasn’t switched yet, and
    you are wondering whether to invest the work and go systemd.

And even if you don’t fall into any of these categories, you might still
find the comparison interesting.

We’ll be comparing the three most relevant init systems for Linux:
sysvinit, Upstart and systemd. Of course there are other init systems
in existence, but they play virtually no role in the big
picture. Unless you run Android (which is a completely different beast
anyway), you’ll almost definitely run one of these three init systems
on your Linux kernel. (OK, or busybox, but then you are basically not
running any init system at all.) Unless you have a soft spot for
exotic init systems there’s little need to look further. Also, I am
kinda lazy, and don’t want to spend the time on analyzing those other
systems in enough detail to be completely fair to them.

Speaking of fairness: I am of course one of the creators of
systemd. I will try my best to be fair to the other two contenders,
but in the end, take it with a grain of salt. I am sure though that
should I be grossly unfair or otherwise incorrect somebody will point
it out in the comments of this story, so consider having a look at
those, before you put too much trust in what I say.

We’ll look at the currently implemented features in a released
version. Grand plans don’t count.

General Features

Feature | sysvinit | Upstart | systemd
Interfacing via D-Bus | no | yes | yes
Shell-free bootup | no | no | yes
Modular C coded early boot services included | no | no | yes
Socket-based Activation | no | no[2] | yes
Socket-based Activation: inetd compatibility | no | no[2] | yes
Bus-based Activation | no | no[3] | yes
Device-based Activation | no | no[4] | yes
Configuration of device dependencies with udev rules | no | no | yes
Path-based Activation (inotify) | no | no | yes
Timer-based Activation | no | no | yes
Mount handling | no | no[5] | yes
fsck handling | no | no[5] | yes
Quota handling | no | no | yes
Automount handling | no | no | yes
Swap handling | no | no | yes
Snapshotting of system state | no | no | yes
XDG_RUNTIME_DIR Support | no | no | yes
Optionally kills remaining processes of users logging out | no | no | yes
Linux Control Groups Integration | no | no | yes
Audit record generation for started services | no | no | yes
SELinux integration | no | no | yes
PAM integration | no | no | yes
Encrypted hard disk handling (LUKS) | no | no | yes
SSL Certificate/LUKS Password handling, including Plymouth, Console, wall(1), TTY and GNOME agents | no | no | yes
Network Loopback device handling | no | no | yes
binfmt_misc handling | no | no | yes
System-wide locale handling | no | no | yes
Console and keyboard setup | no | no | yes
Infrastructure for creating, removing, cleaning up of temporary and volatile files | no | no | yes
Handling for /proc/sys sysctl | no | no | yes
Plymouth integration | no | yes | yes
Save/restore random seed | no | no | yes
Static loading of kernel modules | no | no | yes
Automatic serial console handling | no | no | yes
Unique Machine ID handling | no | no | yes
Dynamic host name and machine meta data handling | no | no | yes
Reliable termination of services | no | no | yes
Early boot /dev/log logging | no | no | yes
Minimal kmsg-based syslog daemon for embedded use | no | no | yes
Respawning on service crash without losing connectivity | no | no | yes
Gapless service upgrades | no | no | yes
Graphical UI | no | no | yes
Built-In Profiling and Tools | no | no | yes
Instantiated services | no | yes | yes
PolicyKit integration | no | no | yes
Remote access/Cluster support built into client tools | no | no | yes
Can list all processes of a service | no | no | yes
Can identify service of a process | no | no | yes
Automatic per-service CPU cgroups to even out CPU usage between them | no | no | yes
Automatic per-user cgroups | no | no | yes
SysV compatibility | yes | yes | yes
SysV services controllable like native services | yes | no | yes
SysV-compatible /dev/initctl | yes | no | yes
Reexecution with full serialization of state | yes | no | yes
Interactive boot-up | no[6] | no[6] | yes
Container support (as advanced chroot() replacement) | no | no | yes
Dependency-based bootup | no[7] | no | yes
Disabling of services without editing files | yes | no | yes
Masking of services without editing files | no | no | yes
Robust system shutdown within PID 1 | no | no | yes
Built-in kexec support | no | no | yes
Dynamic service generation | no | no | yes
Upstream support in various other OS components | yes | no | yes
Service files compatible between distributions | no | no | yes
Signal delivery to services | no | no | yes
Reliable termination of user sessions before shutdown | no | no | yes
utmp/wtmp support | yes | yes | yes
Easily writable, extensible and parseable service files, suitable for manipulation with enterprise management tools | no | no | yes

[1] Read-Ahead implementation for Upstart available in separate package ureadahead, requires non-standard kernel patch.

[2] Socket activation implementation for Upstart available as preview, lacks parallelization support hence entirely misses the point of socket activation.

[3] Bus activation implementation for Upstart posted as patch, not merged.

[4] udev device event bridge implementation for Upstart available as preview, forwards entire udev database into Upstart, not practical.

[5] Mount handling utility mountall for Upstart available in separate package, covers only boot-time mounts, very limited dependency system.

[6] Some distributions offer this implemented in shell.

[7] LSB init scripts support this, if they are used.

Available Native Service Settings

Setting | sysvinit | Upstart | systemd
OOM Adjustment | no | yes[1] | yes
Working Directory | no | yes | yes
Root Directory (chroot()) | no | yes | yes
Environment Variables | no | yes | yes
Environment Variables from external file | no | no | yes
Resource Limits | no | some[2] | yes
User/Group/Supplementary Groups | no | no | yes
IO Scheduling Class/Priority | no | no | yes
CPU Scheduling Nice Value | no | yes | yes
CPU Scheduling Policy/Priority | no | no | yes
CPU Scheduling Reset on fork() control | no | no | yes
CPU affinity | no | no | yes
Timer Slack | no | no | yes
Capabilities Control | no | no | yes
Secure Bits Control | no | no | yes
Control Group Control | no | no | yes
High-level file system namespace control: making directories inaccessible | no | no | yes
High-level file system namespace control: making directories read-only | no | no | yes
High-level file system namespace control: private /tmp | no | no | yes
High-level file system namespace control: mount inheritance | no | no | yes
Input on Console | yes | yes | yes
Output on Syslog | no | no | yes
Output on kmsg/dmesg | no | no | yes
Output on arbitrary TTY | no | no | yes
Kill signal control | no | no | yes
Conditional execution: by identified CPU virtualization/container | no | no | yes
Conditional execution: by file existence | no | no | yes
Conditional execution: by security framework | no | no | yes
Conditional execution: by kernel command line | no | no | yes

[1] Upstart supports only the deprecated oom_adj mechanism, not the current oom_score_adj logic.

[2] Upstart lacks support for RLIMIT_RTTIME and RLIMIT_RTPRIO.

Note that some of these options are relatively easily added to SysV
init scripts, by editing the shell sources. The table above focuses
on easily accessible options that do not require source code changes.


Feature | sysvinit | Upstart | systemd
Maturity | > 15 years | 6 years | 1 year
Specialized professional consulting and engineering services available | no | no | yes
Copyright-assignment-free contributing | yes | no | yes


As the tables above hopefully show in all clarity systemd
has left behind both sysvinit and Upstart in almost every
aspect. With the exception of the project’s age/maturity systemd wins
in every category. At this point in time it will be very hard for
sysvinit and Upstart to catch up with the features systemd provides
today. In one year we managed to push systemd forward much further
than Upstart has been pushed in six.

It is our intention to drive forward the development of the Linux
platform with systemd. In the next release cycle we will focus more
strongly on providing the same features and speed improvement we
already offer for the system to the user login session. This will
bring much closer integration with the other parts of the OS and
applications, making the most of the features the service manager
provides, and making it available to login sessions. Certain
components such as ConsoleKit will be made redundant by these
upgrades, and services relying on them will be updated. The
burden for maintaining these then obsolete components
will be passed on to the vendors who plan to continue to rely on them.

If you are wondering whether or not to adopt systemd, then systemd
obviously wins when it comes to mere features. Of course that should
not be the only aspect to keep in mind. In the long run, sticking with
the existing infrastructure (such as ConsoleKit) comes at a price:
porting work needs to take place, and additional maintenance work for
bitrotting code needs to be done. Going it on your own means increased workload.

That said, adopting systemd is also not free. Especially if you
made investments in the other two solutions adopting systemd means
work. The basic work to adopt systemd is relatively minimal for
porting over SysV systems (since compatibility is provided), but can
mean substantial work when coming from Upstart. If you plan to go for
a 100% systemd system without any SysV compatibility (recommended for
embedded, long run goal for the big distributions) you need to be
willing to invest some work to rewrite init scripts as simple systemd
unit files.

systemd is in the process of becoming a comprehensive, integrated
and modular platform providing everything needed to bootstrap and
maintain an operating system’s userspace. It includes C rewrites of
all basic early boot init scripts that are shipped with the various
distributions. Especially in the embedded case, adopting systemd
provides you with almost everything you need in one step, and you can
pick the modules you want. The other two init systems are single,
individual components, which need a great number of additional
components with differing interfaces to be useful. systemd's emphasis
on providing a platform instead of just a component allows for
closer integration and cleaner APIs. Sooner or later this will
trickle up to the applications. Already, there are accepted XDG
specifications (e.g. the XDG basedir spec, more specifically
XDG_RUNTIME_DIR) that are not supported on the other init systems.

systemd is also a big opportunity for Linux standardization. Since
it standardizes many interfaces of the system that previously
differed between distributions and implementations,
adopting it helps to work against the balkanization of the Linux
interfaces. Choosing systemd means defining more precisely
what the Linux platform is about. This improves the lives of
programmers, users and administrators alike.

I believe that momentum is clearly with systemd. We invite you to
join our community and be part of that momentum.

systemd for Administrators, Part VI

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/changing-roots.html

Here’s another installment of
systemd for Administrators:

Changing Roots

As an administrator or developer, sooner or later you'll encounter
chroot(). The chroot() system call simply shifts what
a process and all its children consider the root directory /, thus
limiting what the process can see of the file hierarchy to a subtree
of it. chroot() environments have two primary uses:

  1. For security purposes: In this use a specific isolated daemon is
    chroot()ed into a private subdirectory, so that when exploited the
    attacker can see only the subdirectory instead of the full OS
    hierarchy: he is trapped inside the chroot() jail.
  2. To set up and control a debugging, testing, building, installation
    or recovery image of an OS: For this a whole guest operating
    system hierarchy is mounted or bootstrapped into a subdirectory of the
    host OS, and then a shell (or some other application) is started
    inside it, with this subdirectory turned into its /. To the shell it
    appears as if it was running inside a system that can differ greatly
    from the host OS. For example, it might run a different distribution
    or even a different architecture (Example: host x86_64, guest
    i386). It cannot see the full hierarchy of the host OS.

On a classic System-V-based operating system it is relatively easy
to use chroot() environments. For example, to start a specific daemon
for test or other reasons inside a chroot()-based guest OS tree, mount
/proc, /sys and a few other API file systems into
the tree, and then use chroot(1) to enter the chroot, and
finally run the SysV init script via /sbin/service from
inside the chroot.

On a systemd-based OS things are not that easy anymore. One of the
big advantages of systemd is that all daemons are guaranteed to be
invoked in a completely clean and independent context which is in no
way related to the context of the user asking for the service to be
started. While in sysvinit-based systems a large part of the execution
context (like resource limits, environment variables and suchlike) is
inherited from the user shell invoking the init script, in systemd the
user just notifies the init daemon, and the init daemon will then fork
off the daemon in a sane, well-defined and pristine execution context
and no inheritance of the user context parameters takes place. While
this is a formidable feature, it actually breaks traditional approaches
to invoking a service inside a chroot() environment: since the actual
daemon is always spawned off PID 1 and thus inherits the chroot()
settings from it, it is irrelevant whether the client which asked for
the daemon to start is chroot()ed or not. On top of that, since
systemd actually places its local communications sockets in
/run/systemd, a process in a chroot() environment will not even
be able to talk to the init system (which, however, is probably a good thing, and the
daring can of course work around this by making use of bind mounts).

This of course raises the question of how to use chroot()s properly in
a systemd environment. And here’s what we came up with for you, which
hopefully answers this question thoroughly and comprehensively:

Let's cover the first use case first: locking a daemon into a
chroot() jail for security purposes. To begin with, chroot() as a
security tool is actually quite dubious, since chroot() is not a
one-way street. It is relatively easy to escape a chroot()
environment, as even the man page points out. Only in combination
with a few other techniques can it be made somewhat secure. Due to
that it usually requires specific support in the applications to
chroot() themselves in a tamper-proof way. On top of that it usually
requires a deep understanding of the chroot()ed service to set up the
chroot() environment properly, for example to know which directories
to bind mount from the host tree, in order to make available all
communication channels in the chroot() the service actually needs.
Putting this together, chroot()ing software for security purposes is
almost always done best in the C code of the daemon itself. The
developer knows best (or at least should know best) how to properly
lock down the chroot(), and what minimal set of files, file systems
and directories the daemon will need inside the chroot(). These days a
number of daemons are capable of doing this, unfortunately however of
those running by default on a normal Fedora installation only two are
doing this: Avahi and
RealtimeKit. Both apparently written by the same really smart
dude. Chapeau! 😉 (Verify this easily by running ls -l
on your system.)
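To make "chroot()ing in the C code of the daemon itself" a bit more concrete, here is a minimal sketch of the usual sequence, assuming the daemon starts as root and has a dedicated unprivileged uid/gid to drop to. The function name enter_chroot_jail() and its parameters are hypothetical; this is not the code Avahi or RealtimeKit actually use, and a real daemon must also prepare everything it needs inside the jail (sockets, log channels and so on) before calling it.

```c
#define _DEFAULT_SOURCE           /* chroot() and setgroups() in glibc */
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: lock the calling process into `dir`, then drop root.
 * Assumes it is called as root after all privileged setup is done.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enter_chroot_jail(const char *dir, uid_t uid, gid_t gid)
{
    /* Change into the jail first, so that no working directory is left
     * outside the new root (a classic chroot() escape route). */
    if (chdir(dir) < 0)
        return -1;
    if (chroot(".") < 0)
        return -1;
    if (chdir("/") < 0)
        return -1;

    /* A process that keeps root can simply chroot() again and walk out,
     * so drop privileges. Groups and gid must go while we are still root. */
    if (setgroups(0, NULL) < 0)
        return -1;
    if (setgid(gid) < 0)
        return -1;
    if (setuid(uid) < 0)
        return -1;

    /* Paranoia: make sure root cannot be regained. */
    if (uid != 0 && setuid(0) == 0)
        return -1;
    return 0;
}
```

The ordering carries most of the security here: chdir() before and after chroot(), and supplementary groups and gid dropped before the uid.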

That all said, systemd of course does offer you a way to chroot()
specific daemons and manage them like any other with the usual
tools. This is supported via the RootDirectory= option in
systemd service files. Here’s an example:

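[Unit]
Description=A chroot()ed Service

[Service]
RootDirectory=/srv/chroot/foobar
ExecStartPre=/usr/local/bin/setup-foobar-chroot.sh
ExecStart=/usr/bin/foobard
RootDirectoryStartOnly=yes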

In this example, RootDirectory= configures where to
chroot() to before invoking the daemon binary specified with
ExecStart=. Note that the path specified in
ExecStart= needs to refer to the binary inside the chroot(),
it is not a path to the binary in the host tree (i.e. in this example
the binary executed is seen as
/srv/chroot/foobar/usr/bin/foobard from the host OS). Before
the daemon is started a shell script setup-foobar-chroot.sh
is invoked, whose purpose it is to set up the chroot environment as
necessary, i.e. mount /proc and similar file systems into it,
depending on what the service might need. With the
RootDirectoryStartOnly= switch we ensure that only the daemon
as specified in ExecStart= is chrooted, but not the
ExecStartPre= script which needs to have access to the full
OS hierarchy so that it can bind mount directories from there. (For
more information on these switches see the respective man pages.)
If you place a unit file like this in
/etc/systemd/system/foobar.service you can start your
chroot()ed service by typing systemctl start foobar.service.
You may then introspect it with systemctl status foobar.service.
It is accessible to the administrator like
any other service; the fact that it is chroot()ed does (unlike on
SysV) not alter how your monitoring and control tools interact with
it.

Newer Linux kernels support file system namespaces. These are
similar to chroot() but a lot more powerful, and they do not
suffer from the same security problems as chroot(). systemd
exposes a subset of what you can do with file system namespaces right
in the unit files themselves. Often these are a useful and simpler
alternative to setting up a full chroot() environment in a
subdirectory. With the switches ReadOnlyDirectories= and
InaccessibleDirectories= you may set up a file system
namespace jail for your service. Initially, it will be identical to
your host OS’ file system namespace. By listing directories in these
directives you may then mark certain directories or mount points of
the host OS as read-only or even completely inaccessible to the
daemon. Example:

[Unit]
Description=A Service With No Access to /home

[Service]
ExecStart=/usr/bin/foobard
InaccessibleDirectories=/home

This service will have access to the entire file system tree of the
host OS with one exception: /home will not be visible to it, thus
protecting the user's data from potential exploiters. (See the
man page for details on these options.)
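Under the hood, a read-only directory in a private file system namespace can be built from a handful of Linux system calls. The sketch below shows that general technique, roughly what an option like ReadOnlyDirectories= asks for. It is not systemd's implementation, the function name is hypothetical, and it needs CAP_SYS_ADMIN (in practice: root).

```c
#define _GNU_SOURCE               /* unshare() and CLONE_NEWNS */
#include <sched.h>
#include <sys/mount.h>

/* Hypothetical helper: move the calling process into a private mount
 * namespace in which `path` is read-only, leaving the host unaffected.
 * Needs CAP_SYS_ADMIN. Returns 0 on success, -1 on failure. */
int make_readonly_in_private_ns(const char *path)
{
    /* New mount namespace: starts out as a copy of the host's. */
    if (unshare(CLONE_NEWNS) < 0)
        return -1;
    /* Keep our mount changes from propagating back to the host. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return -1;
    /* Bind-mount the directory onto itself ... */
    if (mount(path, path, NULL, MS_BIND | MS_REC, NULL) < 0)
        return -1;
    /* ... and flip that bind mount to read-only. */
    if (mount(NULL, path, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY, NULL) < 0)
        return -1;
    return 0;
}
```

A daemon would call this before exec()ing the actual service binary, so that only the service (and its children) see the restricted view.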

File system namespaces are in fact a better replacement for
chroot()s in many ways. Eventually Avahi and RealtimeKit
should probably be updated to make use of namespaces, replacing
chroot().

So much for the security use case. Now, let's look at the other
use case: setting up and controlling OS images for debugging, testing,
building, installing or recovering.

chroot() environments are relatively simple things: they only
virtualize the file system hierarchy. By chroot()ing into a
subdirectory a process still has complete access to all system calls,
can kill all processes and shares just about everything else with the host
it is running on. Running an OS (or a small part of an OS) inside a
chroot() is hence a dangerous affair: the isolation between host and
guest is limited to the file system, everything else can be freely
accessed from inside the chroot(). For example, if you upgrade a
distribution inside a chroot(), and the package scripts send a SIGTERM
to PID 1 to trigger a reexecution of the init system, this will
actually take place in the host OS! On top of that, SysV shared
memory, abstract namespace sockets and other IPC primitives are shared
between host and guest. While a completely secure isolation for
testing, debugging, building, installing or recovering an OS is
probably not necessary, a basic isolation to avoid accidental
modifications of the host OS from inside the chroot() environment is
desirable: you never know what code package scripts execute which
might interfere with the host OS.

To deal with chroot() setups for this use case, systemd offers you a
couple of features:

First of all, systemctl detects when it is run in a
chroot. If so, most of its operations will become NOPs, with the
exception of systemctl enable and systemctl
disable. If a package installation script hence calls these two
commands, services will be enabled in the guest OS. However, should a
package installation script include a command like systemctl
as part of the package upgrade process this will have no
effect at all when run in a chroot() environment.
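One plausible way for a tool to notice that it is running inside a chroot(), assuming /proc is mounted in the chroot, is to compare the device and inode numbers of / with those of /proc/1/root, i.e. PID 1's root directory. The sketch below implements that heuristic. Treat the function name and the -1 "cannot tell" convention as assumptions; this is not systemctl's actual code.

```c
#include <sys/stat.h>

/* Hypothetical helper: 1 if we seem to run inside a chroot(), 0 if not,
 * -1 if we cannot tell (e.g. /proc is not mounted or access is denied). */
int running_in_chroot(void)
{
    struct stat our_root, init_root;

    if (stat("/", &our_root) < 0)
        return -1;
    /* /proc/1/root resolves to PID 1's actual root directory. */
    if (stat("/proc/1/root", &init_root) < 0)
        return -1;
    /* Same device and inode: our / is PID 1's /, so no chroot(). */
    return !(our_root.st_dev == init_root.st_dev &&
             our_root.st_ino == init_root.st_ino);
}
```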

More importantly however systemd comes out-of-the-box with the systemd-nspawn
tool which acts as chroot(1) on steroids: it makes use of file system
and PID namespaces to boot a simple lightweight container on a file
system tree. It can be used almost like chroot(1), except that the
isolation from the host OS is much more complete, a lot more secure
and even easier to use. In fact, systemd-nspawn is capable of
booting a complete systemd or sysvinit OS in a container with a single
command. Since it virtualizes PIDs, the init system in the container
can act as PID 1 and thus do its job as normal. In contrast to
chroot(1) this tool will implicitly mount /proc and
/sys for you.
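The PID virtualization mentioned above comes from Linux PID namespaces. As a rough illustration (not systemd-nspawn's code; the function names are hypothetical), the sketch below clone()s a child into a new PID namespace and reports whether the child sees itself as PID 1. Creating the namespace requires CAP_SYS_ADMIN, so unprivileged callers simply get -1.

```c
#define _GNU_SOURCE               /* clone() and CLONE_NEWPID */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs in the child: the first process of a new PID namespace is PID 1. */
static int pidns_child(void *arg)
{
    (void)arg;
    return getpid() == 1 ? 0 : 1;   /* becomes the child's exit status */
}

/* Hypothetical helper: 1 if the child saw itself as PID 1, 0 if not,
 * -1 if the namespace could not be created (e.g. missing privileges). */
int child_sees_itself_as_pid1(void)
{
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;
    /* The stack grows downwards on common Linux architectures,
     * so clone() gets a pointer to the top of the buffer. */
    pid = clone(pidns_child, stack + stack_size, CLONE_NEWPID | SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    if (waitpid(pid, &status, 0) < 0) {
        free(stack);
        return -1;
    }
    free(stack);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

In a real container the child would go on to exec() an init binary, which then performs its usual PID 1 duties inside the namespace.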

Here's an example of how in three commands you can boot a Debian OS on
your Fedora machine inside an nspawn container:

# yum install debootstrap
# debootstrap --arch=amd64 unstable debian-tree/
# systemd-nspawn -D debian-tree/

This will bootstrap the OS directory tree and then simply invoke a
shell in it. If you want to boot a full system in the container, use a
command like this:

# systemd-nspawn -D debian-tree/ /sbin/init

And after a quick bootup you should have a shell prompt, inside a
complete OS, booted in your container. The container will not be able
to see any of the processes outside of it. It will share the network
configuration, but not be able to modify it. (Expect a couple of
EPERMs during boot for that, which however should not be
fatal). Directories like /sys and /proc/sys are
available in the container, but mounted read-only in order to prevent
the container from modifying kernel or hardware configuration. Note
however that this protects the host OS only from accidental
changes of its parameters. A process in the container can manually
remount the file systems read-writeable and then change whatever it
wants to change.

So, what’s so great about systemd-nspawn again?

  1. It's really easy to use. No need to manually mount /proc
    and /sys into your chroot() environment. The tool will do it
    for you and the kernel automatically cleans it up when the container
    terminates.
  2. The isolation is much more complete, protecting the host OS from
    accidental changes from inside the container.
  3. It’s so good that you can actually boot a full OS in the
    container, not just a single lonesome shell.
  4. It’s actually tiny and installed everywhere where systemd is
    installed. No complicated installation or setup.

systemd itself has been modified to work very well in such a
container. For example, when shutting down and detecting that it is
run in a container, it just calls exit() instead of reboot() as the last
step.

Note that systemd-nspawn is not a full container
solution. If you need that, LXC is the better choice for
you. It uses the same underlying kernel technology but offers a lot
more, including network virtualization. If you will,
systemd-nspawn is the GNOME 3 of container solutions:
slick and trivially easy to use — but with few configuration
options. LXC OTOH is more like KDE: more configuration options than lines of
code. I wrote systemd-nspawn specifically to cover testing,
debugging, building, installing, recovering. That’s what you should use
it for and what it is really good at, and where it is a much much nicer
alternative to chroot(1).

So, let's wrap this up; this was already long enough. Here's what to take home from
this little blog story:

  1. Secure chroot()s are best done natively in the C sources of your program.
  2. ReadOnlyDirectories=, InaccessibleDirectories=
    might be suitable alternatives to a full chroot() environment.
  3. RootDirectory= is your friend if you want to chroot() a specific service.
  4. systemd-nspawn is made of awesome.
  5. chroot()s are lame, file system namespaces are totally l33t.

All of this is readily available on your Fedora 15 system.

And that’s it for today. See you again for the next installment.

My thoughts on the future of Gnome-VFS

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/gnomevfs-future.html

One of the major construction sites in GNOME and all the other free
desktop environments is the VFS abstraction. Recently, there has been
some discussion about developing DVFS
as a replacement for the venerable Gnome-VFS system. Here are my 5 euro
cent on this issue (Yepp, I am not fully up-to-date on the whole DVFS
discussion, but during my flight from HEL to HAM I wrote this up,
without being necessarily too well informed, lacking an Internet
connection. Hence, if you find that I am an uninformed idiot, you're of
course welcome to flame me!):

First of all, we have to acknowledge that Gnome-VFS never achieved any major
adoption besides some core (not even all) GNOME applications. The reasons are
many, among them: the API wasn’t all fun, using Gnome-VFS added another
dependency to applications, KDE uses a different abstraction (KIO), and many
others. Adoption was suboptimal, and due to that user experience was
suboptimal, too (to say the least).

One of the basic problems of Gnome-VFS is that it is a (somewhat) redundant
abstraction layer over yet another abstraction layer. Gnome-VFS makes available
an API that offers more or less the same functionality as the (most of the
time) underlying POSIX API. The POSIX API is widely accepted, relatively
easy to use and portable. The same is not true for
Gnome-VFS. Semantics of the translation between Gnome-VFS and POSIX are not
always that clear. Paths understood by Gnome-VFS (URLs) follow a different
model than those of the Linux kernel. Applications which understand Gnome-VFS
can deal with FTP and HTTP resources, while the majority of applications,
which do not link against Gnome-VFS, cannot. Integration of
Gnome-VFS-speaking and POSIX-speaking applications is difficult and most of the
time only partially implementable.

So, in short: On one side we have the POSIX API, which is a file system
abstraction API, with a (kernel-based) virtual file system behind it. And on the other
side we have the Gnome-VFS API which is also a file system abstraction API and
a virtual file system behind it. Hence, why did we decide to standardize on
Gnome-VFS, and not just on POSIX?

The major reason of course is that until recently accessing FTP,
HTTP and other protocol shares through the POSIX API was not doable
without special kernel patches. However, a while ago the FUSE system
was merged into the Linux kernel and has been made available for
other operating systems as well, among them FreeBSD and MacOS X. This
allows implementing file system drivers in userspace. Currently there
are all kinds of these FUSE based file systems around, FTP and SSHFS
are only two of them. My very own fusedav tool
implements WebDAV for FUSE.

Another (*the* other?) major problem of the POSIX file system API is
its synchronous design. While that is usually not a problem for local
file systems and for high-speed network file systems such as NFS it
becomes a problem for slow network FSs such as HTTP or FTP. Having the
GUI block for several seconds while an application saves its documents
is certainly not user friendly. But can this be fixed? Yes, definitely, it can!
Firstly, there already is the POSIX AIO interface — which however is
quite unfriendly to use (one reason is its use of Unix signals for
notification of completed IO operations). Secondly, the (Linux) kernel
people are working on a better asynchronous IO API (see the
syslets/fibrils discussion). Unfortunately it will take a while
before that new API will finally be available in upstream
kernels. However, there’s always the third solution: add an
asynchronous API entirely in userspace. This is doable in a clean (and
glib-ified) fashion: have a couple of worker threads which
(synchronously) execute the various POSIX file system functions and
add a nice, asynchronous API that can start and stop these threads,
feed them operations to execute, and so on.
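A minimal sketch of that third approach follows, simplified to one POSIX thread per request instead of a pool of worker threads, and to a single operation (stat()). All names are hypothetical. A glib-ified version would additionally deliver completion to the main loop rather than have the caller block in async_stat_finish().

```c
#define _POSIX_C_SOURCE 200809L   /* strdup() */
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical request object: start it, keep working, collect later. */
struct async_stat {
    pthread_t thread;
    char *path;
    struct stat st;
    int error;                    /* 0 on success, otherwise an errno value */
};

static void *async_stat_worker(void *arg)
{
    struct async_stat *req = arg;
    /* The blocking POSIX call runs here, off the caller's (GUI) thread. */
    req->error = stat(req->path, &req->st) < 0 ? errno : 0;
    return NULL;
}

/* Start a stat() in a worker thread; returns NULL if that fails. */
struct async_stat *async_stat_start(const char *path)
{
    struct async_stat *req = calloc(1, sizeof *req);
    if (req == NULL)
        return NULL;
    req->path = strdup(path);
    if (req->path == NULL ||
        pthread_create(&req->thread, NULL, async_stat_worker, req) != 0) {
        free(req->path);
        free(req);
        return NULL;
    }
    return req;
}

/* Wait for completion, copy out the result and release the request.
 * Returns 0 or the errno value reported by stat(). */
int async_stat_finish(struct async_stat *req, struct stat *out)
{
    pthread_join(req->thread, NULL);  /* makes the worker's writes visible */
    int error = req->error;
    if (error == 0 && out != NULL)
        *out = req->st;
    free(req->path);
    free(req);
    return error;
}
```

The synchronous POSIX call stays untouched; only the waiting moves out of the caller's thread, which is the property the paragraph above is after.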

So, what’s the grand solution I propose for the desktop VFS mess? First, kick
Gnome-VFS entirely and don’t replace it. Instead write a small D-Bus-accessible
daemon that organizes a special directory ~/net/. Populate that
directory with subdirectories for all WebDAV, FTP, NFS and SMB shares that can
be found on the local network using both Avahi-based browsing and native SMB
browsing. Now use the Linux automounting interface on top of that directory and
automount the respective share every time someone wants to access it. For
shares that are not announced via Avahi/Samba, add some D-Bus API (and a nice
UI) for adding arbitrary shares. NFS and CIFS/SMB shares are mounted with the
fast, optimized kernel filesystem implementation; WebDAV and FTP on the other
hand are accessed via userspace FUSE-based file systems. The latter should also
integrate with D-Bus in some way, to query the user nicely for access
credentials and suchlike, with gnome-keyring support and everything.

~/net/ itself can — but probably doesn’t need to — be a FUSE
filesystem itself.

A shared library should be made available that will implement a few
remaining things that are not available in the POSIX file
system API directly:

  • As mentioned, some nice Glib-ish asynchronous POSIX file system
    API wrapper
  • High-level file system operations such as copying, moving,
    deleting (trash!) which show a nice GUI when they are long-running
  • An API to translate and set up URL <-> filesystem
    mappings, i.e. something that translates
    ftp://test.local/a/certain/path/ to
    ~/net/ftp:test.local/a/certain/path and vice versa (and
    probably also to a more user-friendly notation, maybe like “FTP Share
    on test.local” or similar). (Needs to communicate with the ~/net/
    handling daemon to set up mappings if required)
  • Meta data extraction. It makes sense to integrate that with
    extended attribute support (EA) in the kernel file system layer, which should be used more often anyway.
  • Explicit mount operations (in contrast to implicit mounts, that
    are done through automounting) (this also needs to communicate with
    the ~/net/ daemon in some way)
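The URL to ~/net/ mapping from the list above could start out as plain string handling. This sketch covers only the scheme://host/path direction, using the scheme:host naming from the example. The function name and the choice to drop trailing slashes are assumptions. It does no percent-decoding, and it treats everything between :// and the next / (including any user name or port) as the host.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping "scheme://host/path" -> "<home>/net/scheme:host/path".
 * Trailing slashes on the path are dropped. Writes the result to buf and
 * returns 0, or returns -1 if the URL is malformed or buf is too small. */
int url_to_net_path(const char *url, const char *home, char *buf, size_t len)
{
    const char *sep = strstr(url, "://");
    if (sep == NULL || sep == url)
        return -1;                      /* no scheme */

    const char *host = sep + 3;
    const char *path = strchr(host, '/');
    size_t host_len = path ? (size_t)(path - host) : strlen(host);
    if (host_len == 0)
        return -1;                      /* no host */

    if (path == NULL)
        path = "";
    size_t path_len = strlen(path);
    while (path_len > 0 && path[path_len - 1] == '/')
        path_len--;

    int n = snprintf(buf, len, "%s/net/%.*s:%.*s%.*s", home,
                     (int)(sep - url), url,
                     (int)host_len, host,
                     (int)path_len, path);
    if (n < 0 || (size_t)n >= len)
        return -1;                      /* output would be truncated */
    return 0;
}
```

With home set to the user's home directory, ftp://test.local/a/certain/path/ maps to the ~/net/ftp:test.local/a/certain/path form from the list. The reverse direction would split the first component after net/ at its first ':'.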

Et voilà! Without a lot of new code you get a nice, asynchronous,
modern, well integrated file system, that doesn’t suck. (or at least,
it doesn’t suck as much as other solutions).

Also, this way we can escape the “abstraction trap”. Let KDE play
the abstraction game, maybe they’ll grow up eventually and learn that
abstracting abstracted abstraction layers is child’s play.

Yeah, sure, this proposed solution also has a few drawbacks, but so be it. Here's a short, incomplete list:

  • The POSIX file system API sucks for file systems that don’t have “inodes” or that are attached to a specific user session. — Yes, sure, but both problems have been overcome by the FUSE project, at least partially.
  • Not that portable — Yes, but FUSE is now available for many systems besides Linux. The automount project is the bigger problem. But all you lose if you run this proposed system on these (let’s say “legacy”) systems that don’t have FUSE or automounting is access to FTP and WebDAV shares. So what? Local files can still be accessed.
  • Translating between URLs and $HOME/net/ based paths sucks — yepp, it does. But much less than not being able to access FTP/WebDAV shares from some apps but not from others, as we have it right now.
  • Bah, you suck — Yes, I do. On a straw, taking a nip from my caipirinha, right at the moment.

I guess I don’t have to list all the advantages of this solution, do I?

BTW, pumping massive amounts of data through D-Bus sucks anyway.

And no, I am not going to hack on this. Too busy with other stuff.

The plane is now landing in HAM, that shall conclude our small rant.

Update: No, I didn’t get a Caipirinha during my flight. That line I
added in before publishing the blog story, which was when I was drinking my
Caipirinha. In contrast to other people from the Free Software community I don’t
own my own private jet yet, with two stewardesses that might fix me a