[$] How to cope with hardware-poisoned page-cache pages

Post Syndicated from original https://lwn.net/Articles/893565/

“Hardware poisoning” is a mechanism for detecting and handling memory
errors in a running system. When a particular range of memory ceases to
remember correctly, it is “poisoned” and further accesses to it will
generate errors. The kernel has had support for hardware poisoning
for over a decade, but that doesn’t mean it can’t be improved. At the
2022 Linux Storage, Filesystem, Memory-management and BPF Summit, Yang
Shi discussed the
challenges of dealing with hardware poisoning when it affects memory used
for the page cache.