SourceHut outage post-mortem

Post Syndicated from daroc original https://lwn.net/Articles/958794/

SourceHut has published a post-mortem of its outage earlier this month.
The post-mortem covers the causes of the outage and what steps SourceHut
took to mitigate it, ending by saying:

As unfortunate as these events were, we welcome opportunities to stress-test
our emergency procedures; we found them to be compatible with our objectives
for the alpha and we learned a lot of ways to improve our reliability
further for the future. We are going to continue working on our
post-incident tasks, building up our infrastructure’s resilience,
reliability, and scalability as planned. Once we address the high-priority
tasks, though, our first order of business in the immediate future will be
to get some rest.