Post Syndicated from Brian Wilson original https://www.backblaze.com/blog/b2-503-500-server-error/
Just try again — it’s free, easy, and will work.
Seriously, that’s it. Occasionally, I’ll see questions that amount to, “I’m getting a 503 error; does that mean B2 is down?” The short answer is no, B2 is not down, and I wanted to use today’s post to go into a bit more detail on how to handle a 500 or 503 error. The error simply means that B2 is functioning as designed as the most affordable, easy-to-use cloud storage service on the planet.
As we’ve described in our developer docs, the best decision is to write your integration so that it retries in the event of a 500 or 503. This modest amount of upfront work will result in a stable and transparent long-term experience.
The Backblaze Contract Architecture
To understand the vast majority of B2 500 and 503 errors, it’s helpful to go into the “contract architecture” for B2. To create a service that is fully scalable at incredibly low cost, Backblaze has had to innovate in a number of areas. One way is what we refer to as “contract architecture.” It’s the approach that let us cut a large expense in traditional cloud storage infrastructure — high bandwidth load balancers for uploads.
Here’s how it works: when a client wants to push data to Backblaze, it contacts a “dispatching server.” That dispatching server figures out where the data will ultimately live inside a given Backblaze data center.
The dispatching server tells the client “there is space over on vault-9015.”
Armed with that information (and an auth token), the client ends its connection with the dispatching server and creates a brand new request directly to vault-9015. The “contract” concept is not novel: ultimately, all APIs are contracts between two entities (machines). In the B2 case, our design leverages that insight as the client and vault negotiate how they will work together. In this example, once authenticated, the client continues to transmit to vault-9015 until it’s done or the vault fills up (or happens to go offline). In those instances, all the client has to do is return to the dispatching server to get information for the next available vault. This is a relatively trivial step and can be easily handled at the software level.
What Causes a B2 500 or 503 Error Response?
The client knows when to go back to the dispatching server because it receives (wait for it) a 500 or 503 error from vault-9015. The system is designed to send a firm message that says, in effect, “stop uploading to vault-9015.” We documented the specifics of what happens where in the B2 error handling protocols. The bottom line is an error in the 500 block should be interpreted by the client as the signal to GO BACK to the dispatching server and ask for a new vault for uploads. Rinse and repeat. It’s a free process that causes negligible incremental overhead.
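The "go back to the dispatching server" loop can be sketched as follows. This is a minimal illustration, not Backblaze's client code: `get_target` and `send` are hypothetical callables standing in for the request to the dispatching server and the upload to the vault, and the attempt limit of 5 is an assumed value.

```python
from typing import Callable, Tuple

# Hypothetical shape of what the dispatching server hands back:
# (upload_url, auth_token).
UploadTarget = Tuple[str, str]


def upload_with_fresh_target(
    get_target: Callable[[], UploadTarget],
    send: Callable[[str, str, bytes], int],
    data: bytes,
    max_attempts: int = 5,  # assumed limit, not taken from the post
) -> int:
    """Upload data, asking for a new vault after any 5xx response.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        # Always ask the dispatching server for a target; never reuse
        # the URL of a vault that just answered with a 5xx.
        url, token = get_target()
        status = send(url, token, data)
        if status == 200:
            return attempt
        if 500 <= status <= 599:
            continue  # "stop uploading to this vault": get a new one
        # Anything else (for example a 4xx) is not fixed by retrying.
        raise RuntimeError(f"upload failed with status {status}")
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Passing the two steps in as callables keeps the retry policy separate from whichever HTTP library the integration uses, which also makes the policy easy to test with fakes.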
What if, after getting a 503 and asking the dispatch server for a new URL, you try to upload and get ANOTHER 503 from the new vault? To address this unusual case, write your software to pause for a few seconds, then go back to the dispatch server. In this scenario, the client has hit a statistically unusual situation: it was directed to a vault with very little space left, and another client filled that space first. The second 503 is a sign the system is functioning as designed. Your program can handle it gracefully by going back to the dispatch server.
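One way to express the "pause for a few seconds" rule is sketched below. The first retry happens immediately and later retries wait before returning to the dispatch server. The 2-second base, the doubling, the 30-second cap, and the limit of 6 attempts are all assumptions made for illustration, since the post only says "a few seconds". `get_target` and `send` are hypothetical callables for the dispatch request and the vault upload.

```python
import time


def pause_before_retry(consecutive_failures, base_seconds=2.0, cap_seconds=30.0):
    """Seconds to wait before the next attempt.

    No pause before the first attempt or the first retry; after a second
    consecutive 5xx, wait base_seconds and double from there, up to a cap.
    """
    if consecutive_failures <= 1:
        return 0.0
    return min(cap_seconds, base_seconds * 2 ** (consecutive_failures - 2))


def upload_with_pauses(get_target, send, data, max_attempts=6, sleep=time.sleep):
    """Retry an upload, pausing after repeated 5xx responses.

    Returns the upload URL that finally accepted the data.
    """
    failures = 0
    while failures < max_attempts:
        delay = pause_before_retry(failures)
        if delay > 0:
            sleep(delay)
        url, token = get_target()  # always go back to the dispatch server
        status = send(url, token, data)
        if status == 200:
            return url
        if not 500 <= status <= 599:
            raise RuntimeError(f"upload failed with status {status}")
        failures += 1
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.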
Other services, notably Amazon S3, provide the client with a “well known URL.” The client can merrily push data to the URL, and Amazon handles load balancing and finding open storage space after receiving the data. That’s a totally valid approach, but objectively more expensive as it involves high bandwidth load balancers. There are other interesting implications to the load balancing scenario. If you’re interested, I wrote a blog post on the difference between the two approaches.
As I discussed in that post, the contract architecture does introduce some complexity when the client has to go back to the dispatching server. But, for that modest amount of error handling upfront, we help fuel Backblaze B2 as an infinitely scalable, fully sustainable service that has been and will continue to be the affordability leader in the object storage market.
The post What To Do When You Get a B2 503 (or 500) Server Error appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.