On the Brokenness of File Locking

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/locking.html

It’s amazing how far Linux has come without providing for proper file
locking that works and is usable from userspace. A little overview why file
locking is still in a very sad state:

To begin with, there’s a plethora of APIs, and all of them are awful:

  • POSIX File locking as available with fcntl(F_SET_LK): the POSIX
    locking API is the most portable one and in theory works across NFS. It can do
    byte-range locking. So much on the good side. On the bad side there’s a lot
    more however: locks are bound to processes, not file descriptors. That means
    that this logic cannot be used in threaded environments unless combined with a
    process-local mutex. This is hard to get right, especially in libraries that do
    not know the environment they are run in, i.e. whether they are used in
    threaded environments or not. The worst part however is that POSIX locks are
    automatically released if a process calls close() on any (!) of
    its open file descriptors for that file. That means that when one part of a
    program locks a file and another by coincidence accesses it too for a short
    time, the first part’s lock will be broken and it won’t be notified about that.
    Modern software tends to load big frameworks (such as Gtk+ or Qt) into memory
    as well as arbitrary modules via mechanisms such as NSS, PAM, gvfs,
    GTK_MODULES, Apache modules, GStreamer modules where one module seldom can
    control what another module in the same process does or accesses. The effect of
    this is that POSIX locks are unusable in any non-trivial program where it
    cannot be ensured that a file that is locked is never accessed by
    any other part of the process at the same time. Example: a user managing
    daemon wants to write /etc/passwd and locks the file for that. At
    the same time in another thread (or from a stack frame further down)
    something calls getpwuid() which internally accesses
    /etc/passwd and causes the lock to be released, the first thread
    (or stack frame) not knowing that. Furthermore should two threads use the
    locking fcntl()s on the same file they will interfere with each other’s locks
    and reset the locking ranges and flags of each other. On top of that locking
    cannot be used on any file that is publicly accessible (i.e. has the R bit set
    for groups/others, i.e. more access bits on than 0600), because that would
    otherwise effectively give arbitrary users a way to indefinitely block
    execution of any process (regardless of the UID it is running under) that wants
    to access and lock the file. This is generally not an acceptable security risk.
    Finally, while POSIX file locks are supposedly NFS-safe they not always really
    are as there are still many NFS implementations around where locking is not properly
    implemented, and NFS tends to be used in heterogenous networks. The biggest
    problem about this is that there is no way to properly detect whether file
    locking works on a specific NFS mount (or any mount) or not.
  • The other API for POSIX file locks: lockf() is another API for the
    same mechanism and suffers by the same problems. One wonders why there are two
    APIs for the same messed up interface.
  • BSD locking based on flock(). The semantics of this kind of
    locking are much nicer than for POSIX locking: locks are bound to file
    descriptors, not processes. This kind of locking can hence be used safely
    between threads and can even be inherited across fork() and
    exec(). Locks are only automatically broken on the close()
    call for the one file descriptor they were created with (or the last duplicate
    of it). On the other hand this kind of locking does not offer byte-range
    locking and suffers by the same security problems as POSIX locking, and works
    on even less cases on NFS than POSIX locking (i.e. on BSD and Linux < 2.6.12
    they were NOPs returning success). And since BSD locking is not as portable as
    POSIX locking this is sometimes an unsafe choice. Some OSes even find it funny
    to make flock() and fcntl(F_SET_LK) control the same locks.
    Linux treats them independently — except for the cases where it doesn’t: on
    Linux NFS they are transparently converted to POSIX locks, too now. What a chaos!
  • Mandatory locking is available too. It’s based on the POSIX locking API but
    not portable in itself. It’s dangerous business and should generally be avoided
    in cleanly written software.
  • Traditional lock file based file locking. This is how things where done
    traditionally, based around known atomicity guarantees of certain basic file
    system operations. It’s a cumbersome thing, and requires polling of the file
    system to get notifications when a lock is released. Also, On Linux NFS < 2.6.5
    it doesn’t work properly, since O_EXCL isn’t atomic there. And of course the
    client cannot really know what the server is running, so again this brokeness
    is not detectable.

The Disappointing Summary

File locking on Linux is just broken. The broken semantics of POSIX locking
show that the designers of this API apparently never have tried to actually use
it in real software. It smells a lot like an interface that kernel people
thought makes sense but in reality doesn’t when you try to use it from
userspace.

Here’s a list of places where you shouldn’t use file locking due to the
problems shown above: If you want to lock a file in $HOME, forget about it as
$HOME might be NFS and locks generally are not reliable there. The same applies
to every other file system that might be shared across the network. If the file
you want to lock is accessible to more than your own user (i.e. an access mode
> 0700), forget about locking, it would allow others to block your
application indefinitely. If your program is non-trivial or threaded or uses a
framework such as Gtk+ or Qt or any of the module-based APIs such as NSS, PAM,
… forget about about POSIX locking. If you care about portability, don’t use
file locking.

Or to turn this around, the only case where it is kind of safe to use file locking
is in trivial applications where portability is not key and by using BSD
locking on a file system where you can rely that it is local and on files
inaccessible to others. Of course, that doesn’t leave much, except for private
files in /tmp for trivial user applications.

Or in one sentence: in its current state Linux file locking is unusable.

And that is a shame.

Update: Check out the follow-up story on this topic.