Scaling Git’s garbage collection (GitHub blog)

Post Syndicated from original https://lwn.net/Articles/908035/

The GitHub blog has a detailed look at garbage collection in Git
and the work that has been done to make it faster.

To solve this problem, we turned to a long-discussed idea on the
Git mailing list: cruft packs. The idea is simple: store an
auxiliary list of mtime data alongside a pack containing
just unreachable objects. To garbage collect a repository, Git
places the unreachable objects in a pack. That pack is designated
as a “cruft pack” because Git also writes the mtime data
corresponding to each object in a separate file alongside that
pack. This makes it possible to update the mtime of a
single unreachable object without changing the mtimes of
any other unreachable object.
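
The idea in the quoted passage can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Git's on-disk format or implementation: the `CruftPack` class, its `add`, `freshen`, and `prune` methods, and the object IDs are all hypothetical names invented for illustration. It keeps the unreachable objects in one table and their mtimes in a separate one, so refreshing one object's mtime leaves every other entry untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CruftPack:
    """Toy in-memory model (hypothetical): unreachable objects plus a
    separate per-object mtime table, loosely mirroring the pack and the
    auxiliary mtime file described above."""
    objects: Dict[str, bytes] = field(default_factory=dict)  # oid -> contents
    mtimes: Dict[str, int] = field(default_factory=dict)     # oid -> mtime in seconds

    def add(self, oid: str, data: bytes, mtime: int) -> None:
        # Store the object and record its own mtime.
        self.objects[oid] = data
        self.mtimes[oid] = mtime

    def freshen(self, oid: str, now: int) -> None:
        # Only this object's entry changes; all other mtimes stay as they were.
        self.mtimes[oid] = max(self.mtimes[oid], now)

    def prune(self, cutoff: int) -> List[str]:
        # Drop every object whose mtime is older than the cutoff.
        expired = sorted(oid for oid, m in self.mtimes.items() if m < cutoff)
        for oid in expired:
            del self.objects[oid]
            del self.mtimes[oid]
        return expired


pack = CruftPack()
pack.add("aaa111", b"old blob", mtime=100)
pack.add("bbb222", b"old tree", mtime=100)

pack.freshen("aaa111", now=500)     # touch a single unreachable object
expired = pack.prune(cutoff=300)    # only objects older than the cutoff go away
print(expired, sorted(pack.objects))
```

In this model, freshening `aaa111` protects it from the later prune while `bbb222`, whose mtime was never updated, expires. A single shared timestamp for the whole pack could not make that distinction.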