Control Groups vs. Control Groups

Syndicated from Lennart Poettering’s original post: https://0pointer.net/blog/projects/cgroups-vs-cgroups.html

TL;DR: systemd does not require the performance-sensitive bits of Linux control
groups to be enabled in the kernel. However, it does require some
non-performance-sensitive bits of the control group logic.

In some areas of the community there’s still some confusion about Linux
control groups and their performance impact, and about what precisely systemd
requires of them. In the hope of clearing this up a bit, I’d like to point out
a few things:

Control Groups are two things: (A) a way to hierarchically group and label
processes, and (B) a way to then apply resource limits to these groups. systemd
only requires the former (A), not the latter (B). That means you can compile
your kernel without any control group resource controllers (B) and systemd will
work perfectly on it. However, if you additionally disable the grouping feature
entirely (A), systemd will complain loudly at boot and proceed only reluctantly,
with a big warning and in a limited-functionality mode.
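To tell (A) and (B) apart on a running machine, one option is to look at
/proc/cgroups, which the kernel provides when control group support is compiled
in and which lists one line per controller with the columns subsys_name,
hierarchy, num_cgroups and enabled. The Python sketch below assumes exactly that
format, and assumes that a missing file means CONFIG_CGROUPS is off; the
function names are hypothetical helpers for illustration, not part of systemd.

```python
from typing import List, Optional


def read_proc_cgroups(path: str = "/proc/cgroups") -> Optional[str]:
    """Return the contents of /proc/cgroups, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        # Assumption: a missing file means the kernel lacks the grouping feature (A).
        return None


def enabled_controllers(proc_cgroups: Optional[str]) -> Optional[List[str]]:
    """Return the enabled controllers (B), or None if grouping (A) is absent."""
    if proc_cgroups is None:
        return None
    controllers = []
    for line in proc_cgroups.splitlines():
        line = line.strip()
        # Skip blank lines and the "#subsys_name hierarchy num_cgroups enabled" header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Columns: subsys_name, hierarchy, num_cgroups, enabled (1 or 0).
        if len(fields) >= 4 and fields[3] == "1":
            controllers.append(fields[0])
    return controllers


if __name__ == "__main__":
    found = enabled_controllers(read_proc_cgroups())
    if found is None:
        print("no control group support (A): systemd runs in limited mode")
    else:
        print("grouping (A) available; controllers (B):", found or "none")
```

An empty list is the interesting case here: grouping (A) is present, no
controller (B) is, and that is all systemd needs.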

At compile time, the grouping/labelling feature in the kernel is enabled by
CONFIG_CGROUPS=y, and the individual controllers by CONFIG_CGROUP_FREEZER=y,
CONFIG_CGROUP_DEVICE=y, CONFIG_CGROUP_CPUACCT=y, CONFIG_CGROUP_MEM_RES_CTLR=y,
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y, CONFIG_CGROUP_MEM_RES_CTLR_KMEM=y,
CONFIG_CGROUP_PERF=y, CONFIG_CGROUP_SCHED=y, CONFIG_BLK_CGROUP=y,
CONFIG_NET_CLS_CGROUP=y, CONFIG_NETPRIO_CGROUP=y. Since (as mentioned) systemd
only needs the former (A), not the latter (B), you may disable all of the
controller options and enable only CONFIG_CGROUPS=y if you want to run systemd
on your system.
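The same distinction can be checked at build time against a kernel .config
file. This is a rough sketch, not a tool that ships with systemd: it only knows
the option names listed above (some were renamed in later kernels), it treats
both built-in (=y) and modular (=m) controllers as enabled, and
check_for_systemd is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Controller options (B) as listed above; later kernels renamed some of them.
CONTROLLER_OPTIONS = [
    "CONFIG_CGROUP_FREEZER", "CONFIG_CGROUP_DEVICE", "CONFIG_CGROUP_CPUACCT",
    "CONFIG_CGROUP_MEM_RES_CTLR", "CONFIG_CGROUP_MEM_RES_CTLR_SWAP",
    "CONFIG_CGROUP_MEM_RES_CTLR_KMEM", "CONFIG_CGROUP_PERF",
    "CONFIG_CGROUP_SCHED", "CONFIG_BLK_CGROUP", "CONFIG_NET_CLS_CGROUP",
    "CONFIG_NETPRIO_CGROUP",
]


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse .config lines of the form NAME=value; '#' lines are comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" is a comment, so the option counts as off.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def check_for_systemd(text: str) -> Tuple[bool, List[str]]:
    """Return (grouping feature A enabled, enabled controller options B)."""
    options = parse_kconfig(text)
    grouping = options.get("CONFIG_CGROUPS") == "y"
    # Treat built-in (y) and modular (m) controllers alike as "enabled".
    controllers = [o for o in CONTROLLER_OPTIONS if options.get(o) in ("y", "m")]
    return grouping, controllers
```

A kernel configured as suggested above yields (True, []): grouping on, every
controller off.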

What about the performance impact of these options? Well, every bit of code
comes at some price, so none of these options is entirely free. However, the
grouping feature (A) changes the general logic very little: it just attaches
hierarchical labels to processes, and its impact is minimal since that is
usually not in any hot path of the OS. This is different for the various
controllers (B), which have a much bigger impact since they influence the
resource management of the OS and are full of hot paths. This means that the
kernel feature that systemd strictly requires (A) has a minimal effect on system
performance, while the actual performance-sensitive features of control groups
(B) are entirely optional.

On boot, systemd will mount all controller hierarchies it finds enabled in the
kernel to individual directories below /sys/fs/cgroup/. This is the official
place where kernel controllers are mounted these days; the /sys/fs/cgroup/
mount point in the kernel was created precisely for this purpose. Since the
control group controllers are a shared facility that might be used by a number
of different subsystems, a few projects have agreed on a set of rules to keep
the various bits of code from stepping on each other’s toes when using these
directories.
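To see which hierarchies actually ended up mounted below /sys/fs/cgroup/, one
can filter /proc/mounts for entries of type cgroup. A minimal sketch, assuming
the usual /proc/mounts field layout (device, mount point, type, options, ...)
and assuming the caller passes in the set of known controller names (for
example taken from /proc/cgroups) so that generic mount flags such as rw or
relatime can be dropped; cgroup_hierarchies is a hypothetical helper.

```python
from typing import Dict, List, Set


def cgroup_hierarchies(proc_mounts: str, known_controllers: Set[str]) -> Dict[str, List[str]]:
    """Map each mounted cgroup hierarchy to its controllers or name= label."""
    hierarchies = {}
    for line in proc_mounts.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        mount_point, options = fields[1], fields[3].split(",")
        # Keep controller names and named-hierarchy labels; drop flags such as
        # rw, relatime or release_agent=...
        hierarchies[mount_point] = [
            opt for opt in options
            if opt in known_controllers or opt.startswith("name=")
        ]
    return hierarchies
```

On a system laid out as described here, the result contains one entry per
controller hierarchy plus /sys/fs/cgroup/systemd carrying only the
name=systemd label and no controller.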

systemd will also maintain its own, private, controller-less, named control
group hierarchy, which is mounted to /sys/fs/cgroup/systemd/. This hierarchy is
the private property of systemd, and other software should not try to interfere
with it. This hierarchy is how systemd makes use of the naming and grouping
feature of control groups (A) without requiring any kernel controller to be
enabled for it.
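A process can look up where it sits in systemd’s private hierarchy by reading
/proc/self/cgroup, where each line has the form
hierarchy-ID:controller-list:cgroup-path and a named hierarchy shows up as
name=systemd. The sketch below only parses that text and writes nothing to
/sys/fs/cgroup/systemd/; the helper names are hypothetical, and the sample path
/system/httpd.service used in the test is illustrative.

```python
from typing import Dict, Optional


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each hierarchy label in /proc/<pid>/cgroup to the process's cgroup path."""
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Format: hierarchy-ID:controller-list:cgroup-path. maxsplit=2 keeps any
        # ':' inside the path intact.
        _hierarchy_id, labels, path = line.split(":", 2)
        for label in labels.split(","):
            paths[label] = path
    return paths


def systemd_cgroup(text: str) -> Optional[str]:
    """Return the process's path in systemd's private name=systemd hierarchy."""
    return parse_proc_cgroup(text).get("name=systemd")
```

Usage would be along the lines of reading open("/proc/self/cgroup").read() and
passing the result to systemd_cgroup; None means the named hierarchy is absent.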

Now, you might notice that by default systemd does create per-service cgroups
in the “cpu” controller if it finds it enabled in the kernel. This is entirely
optional, however. We chose to make use of it by default to even out CPU usage
between system services. Example: on a traditional web server machine, Apache
might end up with 100 CGI worker processes, while MySQL only has 5 processes
running. Without the “cpu” controller, Apache altogether ends up with 20x more
CPU available than MySQL (100 processes versus 5), since the kernel tries to
give every process the same amount of CPU time. If, on the other hand, we put
these two services into individual groups in the “cpu” controller by default,
Apache and MySQL get the same amount of CPU, which we think is a good default.
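The Apache/MySQL arithmetic can be spelled out. This sketch assumes every
process is permanently CPU-bound and that all groups carry equal weight, which
is a simplification of what the scheduler really does; it uses exact fractions
to avoid rounding noise, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict


def per_process_split(processes: Dict[str, int]) -> Dict[str, Fraction]:
    """CPU share per service when the scheduler is fair per process (no "cpu" groups)."""
    total = sum(processes.values())
    return {svc: Fraction(n, total) for svc, n in processes.items()}


def per_group_split(processes: Dict[str, int]) -> Dict[str, Fraction]:
    """CPU share per service when each service is one equally weighted "cpu" group."""
    return {svc: Fraction(1, len(processes)) for svc in processes}


services = {"apache": 100, "mysql": 5}
flat = per_process_split(services)
grouped = per_group_split(services)
print(flat["apache"] / flat["mysql"])       # 20
print(grouped["apache"], grouped["mysql"])  # 1/2 1/2
```

Without groups Apache gets 20/21 of the CPU and MySQL 1/21; with one group per
service each gets 1/2, regardless of how many processes it forks.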

Note that if the “cpu” controller is not enabled in the kernel, systemd will
not attempt to make use of the “cpu” hierarchy as described above. Also, even
if it is enabled in the kernel, it is trivial to tell systemd not to use it:
simply edit /etc/systemd/system.conf and set DefaultControllers= to the empty
string.
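If you want to check programmatically what a given system.conf does with this
setting, here is a small sketch using Python’s configparser. It assumes the
option lives in the [Manager] section, takes a space-separated list of
controller names, and that an absent or commented-out option means the built-in
“cpu” default described above; default_controllers is a hypothetical helper.

```python
import configparser
from typing import List


def default_controllers(system_conf: str) -> List[str]:
    """Return the controllers systemd would create per-service cgroups in."""
    # interpolation=None so '%' in values is taken literally; strict=False
    # tolerates repeated keys (the last one wins).
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(system_conf)
    value = parser.get("Manager", "DefaultControllers", fallback=None)
    if value is None:
        # Assumption: unset or commented out means the built-in "cpu" default.
        return ["cpu"]
    # Assumption: a space-separated list; the empty string yields [].
    return value.split()
```

With "DefaultControllers=" (empty) under [Manager], the result is an empty list,
which matches turning the “cpu” hierarchy off for services.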

Let’s discuss a few frequently heard complaints regarding systemd’s use of control groups:

  • systemd mounts all controllers to /sys/fs/cgroup/ even though
    my software requires them at /dev/cgroup/ (or some other place)!
    The standardization of /sys/fs/cgroup/ as the mount point of the
    hierarchies is a relatively recent change in the kernel, and some software
    has not been updated for it yet. If you cannot change the software in
    question, you are welcome to unmount the hierarchies from /sys/fs/cgroup/
    and mount them wherever you need them instead. However, make sure to leave
    /sys/fs/cgroup/systemd/ untouched.
  • systemd makes use of the “cpu” hierarchy, but it should keep its dirty
    fingers off it!
    As mentioned above, just set systemd’s DefaultControllers= option to the
    empty string.
  • I need my two controllers “foo” and “bar” mounted into one hierarchy,
    but systemd mounts them in two!
    Use the JoinControllers= setting
    in /etc/systemd/system.conf to mount several controllers into a single
    hierarchy.
  • Control groups are evil and they make everything slower! Well,
    please read the text above and understand the difference between
    “control-groups-as-in-naming-and-grouping” (A) and “cgroups-as-in-controllers”
    (B). Then turn off all controllers in your kernel build (B) but leave
    CONFIG_CGROUPS=y (A) enabled.
  • I have heard some kernel developers really hate control groups
    and think systemd is evil because it requires them!
    Well, there are a couple of things behind some folks’ dislike of control
    groups. Primarily, it is probably because the hackers in question do not
    distinguish the naming-and-grouping bits of the control group logic (A)
    from the controllers that are based on it (B). Their beef is mainly with
    the latter (which systemd does not require, which is the key point I am
    trying to make in the text above), but there are other issues as well: for
    example, the code of the grouping logic is not the most beautiful bit of
    code ever written by man (which is thankfully likely to get better now,
    since the control groups subsystem has an active maintainer again). And
    then, for some developers it is important to be able to compare the runtime
    behaviour of many historic kernel versions in order to find bugs (git
    bisect). Since systemd requires kernels with basic control group support
    enabled, and this is a relatively recent addition to the kernel, it is
    difficult for them to use a newer distribution with all those old kernels
    that predate cgroups. Anyway, the summary is probably that what matters to
    developers is different from what matters to users and administrators.
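For the JoinControllers= case from the list above: the setting takes a
space-separated list of groups, each a comma-separated list of controllers to be
mounted together, so “foo” and “bar” would be written as JoinControllers=foo,bar.
A sketch of how such a value breaks down, assuming the joint hierarchy is
mounted at a directory named after the comma-joined controller list; both
helper names are hypothetical, and the naming scheme should be checked against
the documentation of your systemd version.

```python
from typing import List


def join_controllers(value: str) -> List[List[str]]:
    """Split a JoinControllers= value into groups of controllers sharing a hierarchy."""
    # Groups are separated by whitespace; controllers within a group by commas.
    return [group.split(",") for group in value.split()]


def hierarchy_path(group: List[str]) -> str:
    """Assumed mount point for a joint hierarchy (hypothetical naming scheme)."""
    return "/sys/fs/cgroup/" + ",".join(group)


for group in join_controllers("foo,bar"):
    print(group, "->", hierarchy_path(group))
```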

I hope this explanation was useful for a reader or two! Thank you for your time!