On the Brokenness of File Locking

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/locking.html

It’s amazing how far Linux has come without providing for proper file
locking that works and is usable from userspace. A little overview of why file
locking is still in a very sad state:

To begin with, there’s a plethora of APIs, and all of them are awful:

POSIX file locking as available with fcntl(F_SETLK): the POSIX
locking API is the most portable one and in theory works across NFS. It can do
byte-range locking. So much for the good side. On the bad side there’s a lot
more, however: locks are bound to processes, not file descriptors. That means
that this logic cannot be used in threaded environments unless combined with a
process-local mutex. This is hard to get right, especially in libraries that do
not know the environment they are run in, i.e. whether they are used in
threaded environments or not. The worst part however is that POSIX locks are
automatically released if a process calls close() on any (!) of
its open file descriptors for that file. That means that when one part of a
program locks a file and another by coincidence accesses it too for a short
time, the first part’s lock will be broken and it won’t be notified about that.
Modern software tends to load big frameworks (such as Gtk+ or Qt) into memory
as well as arbitrary modules via mechanisms such as NSS, PAM, gvfs,
GTK_MODULES, Apache modules, GStreamer modules where one module seldom can
control what another module in the same process does or accesses. The effect of
this is that POSIX locks are unusable in any non-trivial program where it
cannot be ensured that a file that is locked is never accessed by
any other part of the process at the same time. Example: a user management
daemon wants to write /etc/passwd and locks the file for that. At
the same time in another thread (or from a stack frame further down)
something calls getpwuid() which internally accesses
/etc/passwd and causes the lock to be released, the first thread
(or stack frame) not knowing that. Furthermore, should two threads use the
locking fcntl()s on the same file, they will interfere with each other’s locks
and reset the locking ranges and flags of each other. On top of that, locking
cannot be used on any file that is publicly accessible (i.e. has the R bit set
for groups/others, i.e. more access bits on than 0600), because that would
otherwise effectively give arbitrary users a way to indefinitely block
execution of any process (regardless of the UID it is running under) that wants
to access and lock the file. This is generally not an acceptable security risk.
Finally, while POSIX file locks are supposedly NFS-safe, they are not always so
in practice, as there are still many NFS implementations around where locking is
not properly implemented, and NFS tends to be used in heterogeneous networks. The
biggest problem with this is that there is no way to properly detect whether file
locking works on a specific NFS mount (or any mount) or not.
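
The close() problem described above is easy to reproduce. What follows is a minimal sketch in Python, whose fcntl.lockf() wraps the fcntl() locking calls; it assumes a local Linux file system and a temporary file. The helper names lock_is_free() and demo_posix_lock_lost_on_close() are made up for this illustration. A forked child process is used to probe the lock, since a process never conflicts with its own POSIX locks.

```python
import fcntl
import os
import tempfile


def lock_is_free(path):
    """Fork a child and report whether it could take an exclusive POSIX lock."""
    pid = os.fork()
    if pid == 0:
        # Child: a separate process, so the parent's POSIX lock blocks it.
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0  # lock acquired, so nobody else was holding it
        except OSError:
            status = 1  # lock is held by another process
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo_posix_lock_lost_on_close():
    """Return (held_before, held_after) around an unrelated open()/close()."""
    fd, path = tempfile.mkstemp()
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # "our" part of the program locks the file
        held_before = not lock_is_free(path)

        # Some other part of the same process briefly touches the file...
        other = os.open(path, os.O_RDONLY)
        os.close(other)  # ...and this close() silently drops our lock.

        held_after = not lock_is_free(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling demo_posix_lock_lost_on_close() should return (True, False): the lock is held at first and is gone after the unrelated close(), without the lock holder ever being told.
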

The other API for POSIX file locks: lockf() is another API for the
same mechanism and suffers from the same problems. One wonders why there are two
APIs for the same messed-up interface.

BSD locking based on flock(). The semantics of this kind of
locking are much nicer than for POSIX locking: locks are bound to file
descriptors, not processes. This kind of locking can hence be used safely
between threads and can even be inherited across fork() and
exec(). Locks are only automatically broken on the close()
call for the one file descriptor they were created with (or the last duplicate
of it). On the other hand, this kind of locking does not offer byte-range
locking and suffers from the same security problems as POSIX locking, and works
in even fewer cases on NFS than POSIX locking (i.e. on BSD and Linux < 2.6.12
they were NOPs returning success). And since BSD locking is not as portable as
POSIX locking this is sometimes an unsafe choice. Some OSes even find it funny
to make flock() and fcntl(F_SETLK) control the same locks.
Linux treats them independently, except for the cases where it doesn’t: on
Linux NFS they are now transparently converted to POSIX locks, too. What chaos!
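
For comparison, here is the same experiment with flock(), again as a minimal Python sketch that assumes a local Linux file system. The helper names try_flock() and demo_flock_survives_unrelated_close() are made up for this illustration. No fork is needed here: since the lock belongs to the descriptor and not to the process, a second open() in the same process already conflicts with it.

```python
import fcntl
import os
import tempfile


def try_flock(path):
    """Try an exclusive flock() through a fresh open(); True if it succeeded."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False  # someone (possibly this very process) holds the lock
    finally:
        os.close(fd)  # also drops the lock again if we did get it


def demo_flock_survives_unrelated_close():
    """Return (held_after_other_close, held_after_dup_close)."""
    fd, path = tempfile.mkstemp()
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)

        # An unrelated open()/close() on the same file leaves the lock alone.
        other = os.open(path, os.O_RDONLY)
        os.close(other)
        held_after_other_close = not try_flock(path)

        # Closing one duplicate does not release it either, only the last one does.
        dup_fd = os.dup(fd)
        os.close(dup_fd)
        held_after_dup_close = not try_flock(path)

        return held_after_other_close, held_after_dup_close
    finally:
        os.close(fd)
        os.unlink(path)
```

Here demo_flock_survives_unrelated_close() should return (True, True): neither the unrelated close() nor closing a duplicate of the locking descriptor breaks the lock.
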

Mandatory locking is available too. It’s based on the POSIX locking API but
not portable in itself. It’s dangerous business and should generally be avoided
in cleanly written software.

Traditional lock file based file locking. This is how things were done
traditionally, based around known atomicity guarantees of certain basic file
system operations. It’s a cumbersome thing, and requires polling of the file
system to get notifications when a lock is released. Also, on Linux NFS < 2.6.5
it doesn’t work properly, since O_EXCL isn’t atomic there. And of course the
client cannot really know what the server is running, so again this brokenness
is not detectable.
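
To show how cumbersome the lock file approach is, here is a minimal Python sketch of it, relying on open() with O_CREAT|O_EXCL being atomic (which, as said, cannot be assumed on every NFS setup). The function names, the timeout and the poll interval are assumptions made for this illustration. Stale lock files left behind by crashed processes are deliberately not handled.

```python
import os
import time


def acquire_lock_file(lock_path, timeout=5.0, poll_interval=0.1):
    """Create lock_path exclusively, polling until timeout; True on success."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            # Somebody else holds the lock; all we can do is poll.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner, purely informational
        return True


def release_lock_file(lock_path):
    """Release the lock by removing the lock file."""
    os.unlink(lock_path)
```

A caller would wrap its critical section in acquire_lock_file() and release_lock_file(), ideally with try/finally so that the lock file is removed on errors too. Note that nothing wakes up a waiter when the lock is released; it only notices on its next poll.
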

The Disappointing Summary

File locking on Linux is just broken. The broken semantics of POSIX locking
show that the designers of this API apparently never tried to actually use
it in real software. It smells a lot like an interface that kernel people
thought made sense but that in reality doesn’t when you try to use it from
userspace.

Here’s a list of places where you shouldn’t use file locking due to the
problems shown above: If you want to lock a file in $HOME, forget about it as
$HOME might be NFS and locks generally are not reliable there. The same applies
to every other file system that might be shared across the network. If the file
you want to lock is accessible to more than your own user (i.e. an access mode
> 0700), forget about locking, it would allow others to block your
application indefinitely. If your program is non-trivial or threaded or uses a
framework such as Gtk+ or Qt or any of the module-based APIs such as NSS, PAM,
… forget about POSIX locking. If you care about portability, don’t use
file locking.

Or to turn this around, the only case where it is kind of safe to use file locking
is in trivial applications where portability is not key, using BSD
locking on a file system that you can rely on being local and on files
inaccessible to others. Of course, that doesn’t leave much, except for private
files in /tmp for trivial user applications.
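
A sketch of what that one remaining case could look like in Python follows. The function name open_and_lock_private() is made up for this illustration, and treating "inaccessible to others" as "owned by us, with no group/other permission bits set" is an assumption. Whether the file system is really local is something the code cannot verify, which is exactly the problem described above.

```python
import fcntl
import os
import stat


def open_and_lock_private(path):
    """Open path and flock() it, but only if nobody else can access the file.

    Returns the locked file descriptor; the caller releases it with os.close().
    """
    fd = os.open(path, os.O_RDWR)
    try:
        info = os.fstat(fd)  # check the opened file itself, not the path
        if info.st_uid != os.geteuid():
            raise PermissionError("file is not owned by the current user")
        if stat.S_IMODE(info.st_mode) & 0o077:
            raise PermissionError("file is accessible to group or others")
        # Non-blocking, so that we fail instead of hanging on a held lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BaseException:
        os.close(fd)
        raise
    return fd
```

The mode check only narrows down who could block us; it says nothing about NFS, so this remains suitable for little more than private files of trivial applications.
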

Or in one sentence: in its current state Linux file locking is unusable.

And that is a shame.

Update: Check out the follow-up story on this topic.
