For every page of memory in the system, the kernel maintains a set of page
flags describing how the page is used and various aspects of its current
state. Space for page flags has been in chronic short supply, leading to a desire to
eliminate or consolidate them whenever possible. That objective, though,
is hampered by the fact that the purpose of many page flags is not well
understood. In a memory-management-track session at the 2024 Linux Storage,
Filesystem, Memory-Management and BPF Summit, Matthew Wilcox set out to
cooperatively update the page-flag documentation to improve that situation.
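To see why flag space runs short, it may help to picture the flags as single bits packed into one fixed-width word. The following is a minimal sketch under stated assumptions: the flag names echo kernel naming, but the bit positions, the helper names, and the 8-bit width are hypothetical and do not reflect the kernel's actual layout.

```python
from enum import IntFlag


class PageFlag(IntFlag):
    """Toy page flags; the bit positions are illustrative assumptions."""
    LOCKED = 1 << 0
    REFERENCED = 1 << 1
    UPTODATE = 1 << 2
    DIRTY = 1 << 3
    LRU = 1 << 4


# Hypothetical width of the flag word; the real budget depends on the
# architecture and kernel configuration.
FLAG_WORD_BITS = 8


def set_flag(word: int, flag: PageFlag) -> int:
    """Return the flag word with the given flag set."""
    return word | int(flag)


def clear_flag(word: int, flag: PageFlag) -> int:
    """Return the flag word with the given flag cleared."""
    return word & ~int(flag)


def has_flag(word: int, flag: PageFlag) -> bool:
    """Report whether the given flag is set in the word."""
    return bool(word & int(flag))


def free_bits() -> int:
    """Bits left over for new flags in this toy layout."""
    return FLAG_WORD_BITS - len(PageFlag)
```

Every new flag consumes one of the remaining bits for every page in the system, which is why removing or merging a poorly understood flag is attractive.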
The problem of sharing page tables across processes has been discussed
numerous times over the years, Khalid Aziz said at the beginning of his 2024 Linux Storage,
Filesystem, Memory-Management and BPF Summit session on the topic. He
was there to, once again, talk about the proposed mshare() system call (which, in its
current form, is no longer actually a system call but the feature still
goes by that name) and to see what can be done to finally get it into the
mainline.
The kernel’s hugetlbfs
subsystem was the first mechanism by which the kernel made huge pages
available to user space; it was added to the 2.5.46 development kernel in
2002. While hugetlbfs remains useful, it is also viewed as a sort of
second memory-management subsystem that would be best unified with the rest
of the kernel. At the 2024 Linux Storage,
Filesystem, Memory-Management and BPF Summit, Peter Xu raised the
question of what that unification would involve and what the first steps
might be.
The 2024 Linux
Storage, Filesystem, Memory-Management and BPF Summit was a development
conference, where discussion was prioritized and presentations with a lot
of slides were discouraged. Paul McKenney seemingly flouted this
convention in a joint session of the storage, filesystem, and
memory-management tracks where he presented about 50 slides — in five
minutes, twice. The subject was the use of the read-copy-update (RCU)
mechanism in the memory-reclaim process, and whether changes to RCU would
be needed for that purpose.
Looking up a virtual memory area (VMA) in a process’s address space, for
the handling of page faults or any of a number of other tasks, in
multi-threaded processes has long been bedeviled by lock contention in the
kernel. As a result, developer gatherings have been subjected to many
sessions on how to improve the situation. At the 2024 Linux Storage,
Filesystem, Memory-Management and BPF Summit, developers in the
memory-management track met, in a session led by Liam Howlett, to talk
about a situation that has improved considerably in recent times, but which
still offers opportunities for optimization.
Brendan Jackman started his memory-management-track session at the 2024 Linux Storage,
Filesystem, Memory-Management and BPF Summit by saying that, for some
years now, the kernel community has been stuck in a reactive posture with
regard to hardware vulnerabilities. Each problem shows up with its own
scary name, and kernel developers find a way to mitigate it, usually losing
performance in the process. Jackman said that it is time to take back the
initiative against these vulnerabilities by reconsidering the more
general use of address-space isolation.
Optimizing the kernel’s memory use is made much easier if developers have
an accurate idea of how memory is being used, but the kernel’s
instrumentation is not as good as it could be. When Suren Baghdasaryan and
Kent Overstreet presented their
memory-allocation profiling work, which is meant to address this
shortcoming, at the 2023 Linux Storage, Filesystem, Memory Management, and
BPF Summit, their objective was uncontroversial but the proposed solution
ran into opposition that played out at length on the mailing lists
over the last year. So it may be a bit surprising that, when the two
returned to the memory-management track in the 2024 gathering, the
controversy was gone and the discussion focused on improving details of the
implementation.
The kernel stack is a scarce and tightly constrained resource; kernel
developers often have to go far out of their way to avoid using too much
stack space. The size of the stack is also fixed, leading to situations
where it is too small for some code paths, while wastefully large for
others. At the 2024 Linux Storage,
Filesystem, Memory Management, and BPF Summit, Pasha Tatashin proposed
making the kernel stack size dynamic, making more space available when
needed while saving memory overall. This change is not as easy to
implement as it might seem, though.
The page
structure is a complicated beast, but some parts of it are more
intimidating than others. The mapcount field is one of the
scarier parts. It allegedly records the number of references to the page
in page tables, but, as David Hildenbrand described during the
memory-management track at the 2024 Linux Storage,
Filesystem, Memory Management, and BPF Summit, things are more
complicated than that. Few people truly understand the semantics of this
field, but the situation will hopefully get better over time.
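The simple reading of mapcount, as a count of page-table entries pointing at a page, can be sketched as follows. This is a toy model of that naive reading only, not of the more complicated semantics discussed in the session; the class and method names are hypothetical. The one kernel detail borrowed here is that the stored value starts at -1, so an unmapped page reports a count of zero.

```python
class ToyPage:
    """Toy page that tracks how many page-table entries map it."""

    def __init__(self) -> None:
        # Stored with a bias of -1, mirroring the kernel's _mapcount field.
        self._mapcount = -1

    def map_into_page_table(self) -> None:
        """Record one more page-table entry referring to this page."""
        self._mapcount += 1

    def unmap_from_page_table(self) -> None:
        """Drop one page-table reference; refuse to go below 'unmapped'."""
        if self._mapcount < 0:
            raise ValueError("page is not mapped")
        self._mapcount -= 1

    def mapcount(self) -> int:
        """Number of page-table entries currently mapping the page."""
        return self._mapcount + 1

    def is_mapped(self) -> bool:
        return self._mapcount >= 0
```

The model assumes a single small page with one counter; large folios, where a page can be mapped in more than one way, are outside what this sketch covers.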
Security updates have been issued by AlmaLinux (firefox, nodejs, and thunderbird), Fedora (uriparser), Oracle (firefox and thunderbird), Slackware (mariadb), SUSE (cairo, gdk-pixbuf, krb5, libosinfo, postgresql14, and python310), and Ubuntu (firefox, linux-aws, linux-aws-5.15, and linux-azure).
There are two fundamental levels of memory allocator in the Linux kernel:
the page allocator, which allocates memory in units of pages, and the slab
allocator, which allocates arbitrarily-sized chunks that are usually (but
not necessarily) smaller than a page. The slab allocator is the one that
stands behind commonly used kernel functions like kmalloc(). At
the 2024 Linux
Storage, Filesystem, Memory Management, and BPF Summit, slab maintainer
Vlastimil Babka provided an update on recent changes at the slab level and
discussed the changes that are yet to come.
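The two-level arrangement described above can be illustrated with a toy model: a page allocator that hands out whole pages, and a slab layer that carves each page into equal-sized slots and rounds requests up to a size class. This is a rough sketch, not the kernel's implementation; the class names, the 4096-byte page size, and the list of size classes are assumptions chosen for illustration, and only requests smaller than a page are handled.

```python
PAGE_SIZE = 4096  # assumed page size
# Illustrative size classes, loosely modeled on typical kmalloc cache sizes.
SIZE_CLASSES = (8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048)


class ToyPageAllocator:
    """Lower level: hands out memory one whole page at a time."""

    def __init__(self) -> None:
        self.pages_handed_out = 0

    def alloc_page(self) -> bytearray:
        self.pages_handed_out += 1
        return bytearray(PAGE_SIZE)


class ToySlabCache:
    """One cache per object size; carves pages into equal-sized slots."""

    def __init__(self, obj_size: int, pages: ToyPageAllocator) -> None:
        self.obj_size = obj_size
        self.pages = pages
        self.free_slots = []  # (page, offset) pairs ready to hand out

    def alloc(self):
        if not self.free_slots:
            # Out of slots: get a fresh page and split it up.
            page = self.pages.alloc_page()
            last_start = PAGE_SIZE - self.obj_size
            for offset in range(0, last_start + 1, self.obj_size):
                self.free_slots.append((page, offset))
        return self.free_slots.pop()

    def free(self, slot) -> None:
        self.free_slots.append(slot)


class ToyKmalloc:
    """Upper level: rounds each request up to a size class."""

    def __init__(self) -> None:
        self.pages = ToyPageAllocator()
        self.caches = {s: ToySlabCache(s, self.pages) for s in SIZE_CLASSES}

    def size_class(self, size: int) -> int:
        for candidate in SIZE_CLASSES:
            if size <= candidate:
                return candidate
        raise ValueError("request too large for this toy model")

    def kmalloc(self, size: int):
        return self.caches[self.size_class(size)].alloc()
```

In this model, 32 requests of 100 bytes each are rounded up to 128 bytes and fit in a single page; the 33rd request forces the slab layer to ask the page allocator for another page.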
The term “memory tiering” refers to the management of memory placement on
systems with multiple types of memory, each of which has its own
performance characteristics. On such systems, poor placement can lead to
significantly worse performance. A memory-management-track discussion at
the 2024 Linux Storage,
Filesystem, Memory Management, and BPF Summit took yet another look at
tiering challenges with a focus on upcoming technologies that may simplify
(or complicate) the picture.
Bundles are multiple buffers used in a single operation. On the
receive side, this means a single receive may utilize multiple
buffers, reducing the roundtrip through the networking stack from N
per N buffers to just a single one. On the send side, this also
enables better handling of how an application deals with sends from
a socket, eliminating the need to serialize sends on a single
socket. Bundles work with provided buffers, hence this feature also
adds support for provided buffers for send operations.
Security updates have been issued by Debian (bind9, chromium, and thunderbird), Fedora (buildah, chromium, firefox, mingw-python-werkzeug, and suricata), Mageia (golang), Oracle (firefox and nodejs:20), Red Hat (firefox, httpd:2.4, nodejs, and thunderbird), and SUSE (firefox, git-cliff, and ucode-intel).
Non-uniform memory access (NUMA) systems are organized with their CPUs
grouped into nodes, each of which has memory attached to it. All memory in
the system is accessible from all CPUs, but memory attached to the local
node is faster. The kernel’s memory-policy
(“mempolicy”) interface allows threads to inform the kernel about how
they would like their memory placed to get the best performance. In recent
years, the NUMA concept has been extended to support the management of
different types of memory in a system, pushing the limits of the mempolicy
subsystem. In a remotely presented session at the 2024 Linux Storage,
Filesystem, Memory Management, and BPF Summit, Gregory Price discussed
the ways in which the kernel’s memory-policy support should evolve to
handle today’s more-complex systems.
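As a rough illustration of what a memory policy expresses, the sketch below models two placement strategies in the spirit of the kernel's MPOL_INTERLEAVE and MPOL_PREFERRED modes. It is a toy model under stated assumptions: the function names are hypothetical, nodes are plain integers, and fallback simply walks nodes in numeric order rather than by distance as a real system would.

```python
from itertools import cycle


def place_interleaved(num_pages, nodes):
    """Spread pages round-robin across the given nodes."""
    if not nodes:
        raise ValueError("at least one node is required")
    node_iter = cycle(nodes)
    return [next(node_iter) for _ in range(num_pages)]


def place_preferred(num_pages, preferred, free_pages):
    """Fill the preferred node first, then fall back to the others.

    free_pages maps node id -> number of free pages on that node.
    Fallback order is by node id, which is a simplification.
    """
    free = dict(free_pages)  # work on a copy so the caller's dict is untouched
    order = [preferred] + sorted(n for n in free if n != preferred)
    placement = []
    for _ in range(num_pages):
        for node in order:
            if free.get(node, 0) > 0:
                free[node] -= 1
                placement.append(node)
                break
        else:
            raise MemoryError("no free pages on any node")
    return placement
```

Interleaving trades local-node latency for aggregate bandwidth, while a preferred-node policy keeps memory close to the thread until the node fills up. Choices like these become harder to express once nodes stand for different types of memory rather than just different locations.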
The DAMON
subsystem was the subject of the first session in the memory-management
track at the Linux
Storage, Filesystem, Memory Management, and BPF Summit. DAMON
maintainer SeongJae Park introduced the data-access monitoring
framework, which can generate snapshots of how memory is accessed, enabling
the detection of hot and cold regions of memory in both the virtual and
physical address spaces. The session covered recent changes and future
plans for this tool.
Ronnie Sahlberg, Jonathan Maple, and Jeremy Allison of CiQ have published a white
paper looking at the security-relevant bug fixes applied (or not
applied) to the RHEL 8.x kernel over time.
This means that over time, the security of the RHEL kernels get
worse and worse as more issues are discovered in the upstream code
and are potentially exploitable but fewer and fewer of the fixes
for these known bugs are back-ported into RHEL kernels.
After reaching RHEL 8.7, the theory is that the kernel has been
stabilized, with a corresponding improvement in security. However
we still have an influx of newly discovered bugs in the upstream
kernel affecting RHEL 8.7 that are not addressed. Each minor
version of upstream is released on an approximately quarterly basis
and we can see that the influx of new bugs that are unaddressed in
RHEL is growing. The number of known issues in these kernels
increases by approximately 250 new bugs per quarter or more.
The merge window for the 6.10 kernel release opened on May 12; between
then and the time of this writing, 6,819 non-merge commits were pulled into
the mainline kernel for that release. Your editor has taken some time out
from LSFMM+BPF in an attempt to keep
up with the commit flood. Read on for an overview of the most significant
changes that were pulled in the early part of the 6.10 merge window.
The Mozilla Foundation has announced
that its new executive director will be Nabiha Syed.
Syed is known for her mission-driven leadership, focused on
increasing transparency into the most powerful institutions in
society. She comes to Mozilla after leading The Markup, an
award-winning publication that challenges technology to serve the
public good, from its launch through its successful acquisition in
2024.