Tag Archives: Durable Objects

Durable Objects: Easy, Fast, Correct — Choose three.

2021-08-03 Kenton Varda

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/durable-objects-easy-fast-correct-choose-three/

Durable Objects: Easy, Fast, Correct — Choose three.

Storage in distributed systems is surprisingly hard to get right. Distributed databases and consensus are well-known to be extremely hard to build. But, application code isn’t necessarily easy either. There are many ways in which apps that use databases can have subtle timing bugs that could result in inconsistent results, or even data loss. Worse, these problems can be very hard to test for, as they’ll often manifest only under heavy load, or only after a sudden machine failure.

Up until recently, Durable Objects were no exception. A Durable Object is a special kind of Cloudflare Worker that has access to persistent storage and processes requests in one of Cloudflare’s points of presence. Each Object has its own private storage, accessible through a classical key/value storage API. Like any classical database API, this storage API had to be used carefully to avoid possible race conditions and data loss, especially when performance mattered. And like any classical database API, many apps got it wrong.

However, rather than fix the apps, we decided to fix the model. Last month, we rolled out deep changes to the Durable Objects runtime such that many applications which previously contained subtle race conditions are now correct by default, and many that were previously slow are now fast. Developers can now write their code in an intuitive way, and have it work. No changes at all are needed to your code in order to take advantage of these new features.

So, let me tell you about what changed…

Background: Durable Objects are Single-Threaded

To understand what changed, it’s necessary to first understand Durable Objects. For a full introduction, see the Durable Objects announcement blog post.

The most important point is: Each Durable Object runs in exactly one location, in one single thread, at a time. Each object has its own private on-disk storage. This is a very different situation from a typical database, where many clients may be accessing the same data. In Durable Objects, any particular piece of data belongs to exactly one thread at a time.

Because a single Durable Object is single-threaded, it’s possible, and even encouraged, to keep state and perform synchronization in memory. This is, indeed, the killer feature of Durable Objects. With classical databases, in-memory state is extremely difficult to keep synchronized between all database clients. But with Durable Objects, since each piece of data belongs to a specific thread, this synchronization is easy.

However, interacting with the disk is still an I/O (input/output) operation, which means that each operation returns a Promise which you must await. As we’ll see, this re-introduces some of the synchronization difficulties that we were trying to avoid. However, it turns out, we can solve these difficulties within the system itself, without bothering application developers.

An Example

Consider this code:

// Used to be slow and racy -- but not anymore!
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}

At first glance, this seems like reasonable code that returns a unique number each time it is called (incrementing each time).

Unfortunately, before now, this code had two problems:

It had a subtle race condition (even though Durable Objects are single-threaded!).
It was kind of slow.

The Race Condition

A race condition occurs when two operations running concurrently might interfere with each other in a way that makes them behave incorrectly. Race conditions are commonly associated with code that uses multiple threads.

JavaScript, however, famously does not use threads. Instead, it uses event-driven programming, with callbacks. It’s not possible for two pieces of JavaScript code to be running "at the same time" in the same isolate (and Durable Objects promises that no other isolate could possibly be accessing the same storage). Does that mean that race conditions aren’t a problem in JavaScript, the way they are in multi-threaded apps?

Unfortunately, it does not. The problem is, the code above is an async function, containing two await statements. Each time await is used, execution pauses, waiting for the specified Promise to complete.

In the meantime, though, other code can run! For example, the Durable Object might receive two requests at the same time. If each of them calls getUniqueNumber(), then the two calls might be interleaved. Each time one call performs an await, execution may switch to the other call. So, the two calls might end up looking like this:

Request 1 timeline	Request 2 timeline
`async function getUniqueNumber() { let val = await this.storage.get("counter");`
	`async function getUniqueNumber() { let val = await this.storage.get("counter");`
`await this.storage.put("counter", val + 1);`
	`await this.storage.put("counter", val + 1);`
`return val; }`
	`return val; }`

There’s a big problem here: Both of these two calls will call get("counter") before either of them calls put("counter", val + 1). That means, both of them will return the same value!

This problem is especially bad because it only happens when multiple requests are being handled at the same time — and even then, only sometimes. It is very hard to test for this kind of problem, and everything might seem just fine when the application is deployed, as long as it isn’t getting too much traffic. But one day, when a lot of visitors try to use the same object at the same time, all of a sudden getUniqueNumber() starts returning duplicates!

The Slowness

To add insult to injury, getUniqueNumber() was (until recently) pretty slow. The problem is, it has to do two round trips to storage — a get() and a put(). The get() might typically take a couple milliseconds. The put(), however, will take much longer, probably tens of milliseconds.

Why is put() so slow? Because we don’t want to lose data. The worst thing an application can do is tell the user that their action was successful when it wasn’t. If, for some reason, a write cannot be completed, then it’s imperative that the application presents an error to the user, so that the user knows that something is wrong and they’ll have to try again or look for a fix.

In order to make sure an application does not prematurely report success to the user, await put() has to make sure it doesn’t return until the data is actually safe on disk. Disks are slow, so this might take a while.

But that’s not all. Disks can fail. In order for the data to be really safe, we have to write the same data on multiple disks, in multiple machines. That means we have to wait for some network traffic.

But that’s still not all. What if a meteor were to come out of the sky and land on a Cloudflare data center, completely destroying it? Or, more likely, what if the power or network connection failed? We don’t want a user’s data to be lost in this case, or even temporarily become unavailable. Therefore, Durable Object data is replicated to multiple Cloudflare locations. This requires communicating across long distances before any write can be confirmed. There is little we can do to make this faster, the speed of light being what it is.

A call to getUniqueNumber() will therefore always take tens of milliseconds. If an application calls it multiple times, awaiting each call before beginning the next, it can easily become very slow very quickly. Or, at least, that was the case before our recent changes.

The Wrong Fixes

There are several ways that an application could fix these problems, but all of them have their own issues.

Transactions?

Many databases offer "transactions". A transaction allows an application to make sure some operation completes "atomically", with no interference from concurrent operations.

The Durable Objects storage API has always supported transactions. We could use them to fix our getUniqueNumber() implementation like so:

// No more race condition... but slow and complicated.
async function getUniqueNumber() {
  let val;
  await this.storage.transaction(async (txn) => {
    val = await txn.get("counter");
    await txn.put("counter", val + 1);
  });
  return val;
}

This fixes our race condition. Now, if getUniqueNumber() is called multiple times concurrently such that the storage operations interleave, the system will detect the problem. One of the concurrent calls will be chosen to be the "winner", and will complete normally. The other calls will be canceled and retried, so that they can see the value written by the first call.

This fixes our problems! But, at some cost:

getUniqueNumber() is now even slower than it was before. The difference typically won’t be huge, but setting up a transaction does require some additional coordination in the database. Of course, if the transaction needs to be retried, then it may end up being much slower. And retries will tend to happen more when load gets high… the worst possible time.
Speaking of retries, many developers might not realize that the transaction callback can be called multiple times. It’s difficult to test for this, since retries will only happen when concurrent operations cause conflicts. The problem is especially acute when the application is trying to synchronize not just on-disk state, but also in-memory state — if the transaction callback modifies in-memory state, it must be careful to ensure that its changes are idempotent. The need for idempotency may not be top of mind for most developers, and tests won’t catch the problem, making it very easy to end up deploying buggy code.

So we solved our problem, but we did it with a foot-gun. If we keep using the foot-gun, we’re probably going to shoot our own feet eventually.

Is there another way?

In-memory caching?

Durable Objects’ superpower is their in-memory state. Each object has only one active instance at any particular time. All requests sent to that object are handled by that same instance. That means, you can store some state in memory.

// Much faster! But (used to be) wrong.
async function getUniqueNumber() {
  if (this.val === undefined) {
    this.val = await this.storage.get("counter");
  }

  let result = this.val;
  ++this.val;
  this.storage.put("counter", this.val);
  return result;
}

This code is MUCH faster than the previous implementation, because it stores the value in memory. In fact, after the function runs once, further calls won’t wait for any I/O at all — they will return immediately. This is because by caching the value in memory, we avoid waiting for a get() (except for the first time), and we don’t wait for the put() either, trusting that it will complete asynchronously later on.

Returning immediately also means that there’s no opportunity for concurrency, so the calls that return immediately will always return unique numbers! This means that not only is this implementation faster than our original implementation, it is also more correct. This is only possible because the Durable Objects platform guarantees that there will only be one instance, and therefore only one copy of this.val.

Unfortunately, there are two problems with this code:

We still have a race condition on initialization. If the first two calls to getUniqueNumber() happen to occur at about the same time, then initialization will be performed multiple times. The second call will likely clobber what the first call did, and the two calls will end up returning the same number. We could solve this problem by making initialization more complicated — the first call could create an initialization promise, and other concurrent calls could wait on it, so that initialization really only happens once. But this creates even deeper complexity: What if initialization fails for some reason? The object could be placed in a permanently broken state. It’s possible to get this right, but it’s surprisingly tricky.
Because we don’t wait for the put() to report success, it’s possible that it could be silently lost. For example, if the machine hosting the Durable Object suffered a sudden power failure, then the Durable Object would be transferred to some other machine. When it starts up there, calls to getUniqueNumber() might return numbers that had already been returned under the old instance before it failed, because the put()s hadn’t actually completed before the failure occurred. But if we await the put(), then our function becomes slow again, and creates more opportunities for race conditions (e.g. in the calling code).

Our answer: Make it automatic

When looking at this, we had two options:

Try to carefully document these problems and educate developers about them, so that they could write code that does the right thing.
Change the system so that naturally-written code just does the right thing by default — and runs quickly.

We chose option 2. We accomplished this in three parts.

Part 1: Input Gates

Let’s go back to our original example. Can we make this example "just work", even in the face of concurrent requests?

// Can this "just work" please?
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}

It turns out we can! We create a new rule:

Input gates: While a storage operation is executing, no events shall be delivered to the object except for storage completion events. Any other events will be deferred until such a time as the object is no longer executing JavaScript code and is no longer waiting for any storage operations. We say that these events are waiting for the "input gate" to open.

If we do this, then our storage operations above are no longer an opportunity for concurrency. Our concurrent requests now look like this:

Request 1 timeline	Request 2 timeline
`async function getUniqueNumber() { let val = await this.storage.get("counter");`
	`// Request 2 delivery is blocked because // request 1 is waiting for storage.`
`await this.storage.put("counter", val + 1);`
	`// Request 2 delivery is blocked because // request 1 is waiting for storage.`
`return val; }`
	`async function getUniqueNumber() { let val = await this.storage.get("counter"); await this.storage.put("counter", val + 1); return val; }`

The two calls return unique numbers, as expected. Hooray! (Unfortunately, we did it by delaying the second request, creating latency and reducing throughput — but we’ll address that in part 3, below.)

Note that our rule does not preclude making multiple concurrent requests to storage at the same time. You can still say:

let promise1 = this.storage.get("foo");
let promise2 = this.storage.put("bar", 123);
await promise1;
frob();
await promise2;

Here, the get() and put() execute concurrently. Moreover, the call to frob() may execute before the put() has completed (but strictly after the get() completes, since we awaited that promise). However, no other event — such as receiving a new request — can unexpectedly happen in the meantime.

On the other hand, the rule protects you not just against concurrent incoming requests, but also concurrent responses to outgoing requests. For example, say you have:

async function task1() {
  await fetch("https://example.com/api1");
  return await this.getUniqueNumber();
}
async function task2() {
  await fetch("https://example.com/api2");
  return await this.getUniqueNumber();
}
let promise1 = task1();
let promise2 = task2();
let val1 = await promise1;
let val2 = await promise2;

This code launches two fetch() calls concurrently. After each fetch completes, getUniqueNumber() is invoked. Could the two calls interfere with each other?

No, they will not. The completion of a fetch() is itself a kind of event. Our rule states that such events cannot be delivered while storage events are in progress. When the first of the two fetches returns, the app calls getUniqueNumber(), which starts performing some storage operations. If the second fetch() also returns while these storage operations are still outstanding, that return will be deferred until after the storage operations are done. Once again, our code ends up correct!

At this point, the async programming experts in the audience are probably starting to feel like something is fishy here. Indeed, there is a catch. What if we do:

// Still a problem even with input gates.
let promise1 = getUniqueNumber();
let promise2 = getUniqueNumber();
let val1 = await promise1;
let val2 = await promise2;

In this case, there is, in fact, a problem. Two calls to getUniqueNumber() are initiated by the same event. The application does not await the first call before starting the second, so the two calls end up running concurrently. Our special rule doesn’t protect us here, because there is no incoming event that can be deferred between when the two calls are made. From the system’s point of view, there’s no way to distinguish this code from code which legitimately decided to perform two storage operations in parallel.

As such, in this case, the two calls to getUniqueNumber() will interfere with each other. However, this problem is far less likely to come about by accident, and is far easier to catch in testing. This bug is deterministic, not caused by the unpredictable timing of network events. We consider this an acceptable caveat in order to solve the larger problem posed by concurrent requests.

Part 2: Output Gates

Let’s go back to our in-memory caching example. Can we make it work?

// Can we make this "just work"?
async function getUniqueNumber() {
  if (this.val === undefined) {
    this.val = await this.storage.get("counter");
  }

  let result = this.val;
  ++this.val;
  this.storage.put("counter", this.val);
  return result;
}

With input gates (part 1), we’ve solved one of the two problems this code had: the race condition of initialization. We no longer need to worry that two requests will call this at the same time, leading this.val to be initialized twice.

However, the problem with not awaiting the put() is still there. If we don’t await it, then we could lose data. If we do await it, then the call is slow.

We make another new rule:

Output gates: When a storage write operation is in progress, any new outgoing network messages will be held back until the write has completed. We say that these messages are waiting for the "output gate" to open. If the write ultimately fails, the outgoing network messages will be discarded and replaced with errors, while the Durable Object will be shut down and restarted from scratch.

With this rule, we no longer have to await the result of put(). Our code can happily continue executing and just assume the put() will succeed. If the put() doesn’t succeed, then anything the application does here will never be observable to the rest of the world anyway. For example, if the app prematurely sends a response to the user saying that the operation succeeded, this response will not actually be delivered until after the put() completes successfully. So, by the time the user receives the message, it is no longer "premature"! In the very rare event that the write operation fails, the user will not receive the premature confirmation at all.

Note that output gates apply not only to responses sent back to a client, but also to new outgoing requests made with fetch() — those requests will be delayed from being sent until all prior writes are confirmed. So, once again, it is impossible for anything else in the world to observe a premature confirmation.

With this change, our getUniqueNumber() implementation with in-memory caching is now fully correct, while retaining most of its speed advantage over the non-caching implementation. Except for the very first call, the application will never be blocked waiting for getUniqueNumber() to finish. The final response from the app to the client will be delayed pending write confirmation, but that write can be performed in parallel with any writes the application performs after getUniqueNumber() completes.

Part 3: Automatic in-memory caching

Our in-memory caching example now works great. But, it’s still a little bit complicated and unnatural to write. Let’s go back to our original, simple code one more time… can we make it fast by default?

// Can we make this not just work, but just work FAST?
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}

The answer to this part is a classic one: we can add automatic caching to the storage layer, just like most operating systems do for disk storage.

We have rolled out an in-memory caching layer for Durable Objects. This layer keeps up to several megabytes worth of data directly in memory in the process where the object runs.

When a get() requests a key that is in cache, the operation returns immediately, without even context-switching out of the thread and isolate where the object is hosted. If the key is not in cache, then a storage request will still be needed, but reads complete relatively quickly.

Better yet, put() requests now always complete "instantaneously". A put() simply writes to cache. We rely on output gates ("part 2", above) to prevent the premature confirmation of writes to any external party. Writes will be coalesced (even if you await them), so that the output gate waits only for O(1) network round trips of latency, not O(n).

Moreover, because get() and put() now complete instantly in most or all cases, the negative impact of input gates on throughput is largely mitigated, because the gate now spends relatively little time blocked.

With Durable Objects built-in caching, our simple code is now just as fast as our code that manually implemented in-memory caching. Combined with input and output gates, our code is now simple, fast, and correct, all at the same time.

Bonus Correctness

Our caching layer provides some bonus consistency guarantees, in addition to performance.

First, writes are automatically coalesced. That is, if you perform multiple put() or delete() operations without awaiting them or anything else in between, then the operations are automatically grouped together and stored atomically. In the case of a sudden power failure, after coming back up, either all of the writes will have been stored, or none of them will. For example:

// Move a value from "foo" to "bar".
let val = await this.storage.get("foo");

this.storage.delete("foo");
this.storage.put("bar", val);
// There's no possibility of data loss, because the delete() and the
// following put() are automatically coalesced into one atomic
// operation. This is true as long as you do not `await` anything
// in between.

Second, the API is also able to provide stronger ordering guarantees for reads. Previously, overlapping storage operations did not have guaranteed ordering. For example, if you issued a get() and a put() on the same key at the same time (without awaiting one before starting the other), then it was not deterministic whether the get() might return the value written by the put() — regardless of the ordering of the statements in your code. The caching layer fixes this. Now, operations are performed in exactly the order in which they were initiated, regardless of when they complete.

These two features eliminate more subtle bugs that might otherwise be hard to catch in testing, so that you don’t have to be a database expert to write code that works.

Optional Bypass

We expect gates and caching will be a win in the vast majority of use cases, but not always. In some use cases, concurrency won’t lead to any problems, and so blocking it may be a loss. Sometimes, the application is OK with prematurely confirming writes in order to minimize latency. And sometimes, caching may just waste memory because the same keys are not frequently accessed.

For those cases, we offer explicit bypasses:

this.storage.get("foo", {allowConcurrency: true, noCache: true});
this.storage.put("foo", "bar", {allowUnconfirmed: true, noCache: true});

Developers who have taken the time to think carefully about these issues can use these flags to tune performance to their specific needs. For those who don’t want to think about it, the defaults should work well.

Conclusion

Concurrency is hard. It doesn’t matter if you’re a novice or an expert: even experts regularly get it wrong. It’s difficult to think about all the ways that concurrent operations might overlap to corrupt your application state.

The traditional answer has been to make applications stateless, and defer all concurrency control to the database layer using transactions. However, transactions are slow, which is a big reason why so many web applications today take hundreds of milliseconds or more to respond to basic actions.

Durable Objects are all about state. By keeping state in memory in addition to on disk, and directing requests for the same data to be coordinated through the same instance, we can make applications much faster. But until recently, this was extremely tricky to get right.

With input gates, output gates, and caching, code written in the most intuitive way now "just works", and runs fast. This means you can focus on building your application, without wasting time optimizing I/O performance and debugging obscure race conditions.

Building Waiting Room on Workers and Durable Objects

2021-06-16 Fabienne Semeria

Post Syndicated from Fabienne Semeria original https://blog.cloudflare.com/building-waiting-room-on-workers-and-durable-objects/

Building Waiting Room on Workers and Durable Objects

In January, we announced the Cloudflare Waiting Room, which has been available to select customers through Project Fair Shot to help COVID-19 vaccination web applications handle demand. Back then, we mentioned that our system was built on top of Cloudflare Workers and the then brand new Durable Objects. In the coming days, we are making Waiting Room available to customers on our Business and Enterprise plans. As we are expanding availability, we are taking this opportunity to share how we came up with this design.

What does the Waiting Room do?

You may have seen lines of people queueing in front of stores or other buildings during sales for a new sneaker or phone. That is because stores have restrictions on how many people can be inside at the same time. Every store has its own limit based on the size of the building and other factors. If more people want to get inside than the store can hold, there will be too many people in the store.

The same situation applies to web applications. When you build a web application, you have to budget for the infrastructure to run it. You make that decision according to how many users you think the site will have. But sometimes, the site can see surges of users above what was initially planned. This is where the Waiting Room can help: it stands between users and the web application and automatically creates an orderly queue during traffic spikes.

The main job of the Waiting Room is to protect a customer’s application while providing a good user experience. To do that, it must make sure that the number of users of the application around the world does not exceed limits set by the customer. Using this product should not degrade performance for end users, so it should not add significant latency and should admit them automatically. In short, this product has three main requirements: respect the customer’s limits for users on the web application, keep latency low, and provide a seamless end user experience.

When there are more users trying to access the web application than the limits the customer has configured, new users are given a cookie and greeted with a waiting room page. This page displays their estimated wait time and automatically refreshes until the user is automatically admitted to the web application.

Configuring Waiting Rooms

The important configurations that define how the waiting room operates are:

Total Active Users – the total number of active users that can be using the application at any given time
New Users Per Minute – how many new users per minute are allowed into the application, and
Session Duration – how long a user session lasts. Note: the session is renewed as long as the user is active. We terminate it after Session Duration minutes of inactivity.

How does the waiting room work?

If a web application is behind Cloudflare, every request from an end user to the web application will go to a Cloudflare data center close to them. If the web application enables the waiting room, Cloudflare issues a ticket to this user in the form of an encrypted cookie.

At any given moment, every waiting room has a limit on the number of users that can go to the web application. This limit is based on the customer configuration and the number of users currently on the web application. We refer to the number of users that can go into the web application at any given time as the number of user slots. The total number of users slots is equal to the limit configured by the customer minus the total number of users that have been let through.

When a traffic surge happens on the web application the number of user slots available on the web application keeps decreasing. Current user sessions need to end before new users go in. So user slots keep decreasing until there are no more slots. At this point the waiting room starts queueing.

The chart above is a customer’s traffic to a web application between 09:40 and 11:30. The configuration for total active users is set to 250 users (yellow line). As time progresses there are more and more users on the application. The number of user slots available (orange line) in the application keeps decreasing as more users get into the application (green line). When there are more users on the application, the number of slots available decreases and eventually users start queueing (blue line). Queueing users ensures that the total number of active users stays around the configured limit.

To effectively calculate the user slots available, every service at the edge data centers should let its peers know how many users it lets through to the web application.

Coordination within a data center is faster and more reliable than coordination between many different data centers. So we decided to divide the user slots available on the web application to individual limits for each data center. The advantage of doing this is that only the data center limits will get exceeded if there is a delay in traffic information getting propagated. This ensures we don’t overshoot by much even if there is a delay in getting the latest information.

The next step was to figure out how to divide this information between data centers. For this we decided to use the historical traffic data on the web application. More specifically, we track how many different users tried to access the application across every data center in the preceding few minutes. The great thing about historical traffic data is that it’s historical and cannot change anymore. So even with a delay in propagation, historical traffic data will be accurate even when the current traffic data is not.

Let’s see an actual example: the current time is Thu, 27 May 2021 16:33:20 GMT. For the minute Thu, 27 May 2021 16:31:00 GMT there were 50 users in Nairobi and 50 in Dublin. For the minute Thu, 27 May 2021 16:32:00 GMT there were 45 users in Nairobi and 55 in Dublin. This was the only traffic on the application during that time.

Every data center looks at what the share of traffic to each data center was two minutes in the past. For Thu, 27 May 2021 16:33:20 GMT that value is Thu, 27 May 2021 16:31:00 GMT.

Thu, 27 May 2021 16:31:00 GMT: 
{
  Nairobi: 0.5, //50/100(total) users
  Dublin: 0.5,  //50/100(total) users
},
Thu, 27 May 2021 16:32:00 GMT: 
{
  Nairobi: 0.45, //45/100(total) users
  Dublin: 0.55,  //55/100(total) users
}

For the minute Thu, 27 May 2021 16:33:00 GMT, the number of user slots available will be divided equally between Nairobi and Dublin as the traffic ratio for Thu, 27 May 2021 16:31:00 GMT is 0.5 and 0.5. So, if there are 1000 slots available, Nairobi will be able to send 500 and Dublin can send 500.

For the minute Thu, 27 May 2021 16:34:00 GMT, the number of user slots available will be divided using the ratio 0.45 (Nairobi) to 0.55 (Dublin). So if there are 1000 slots available, Nairobi will be able to send 450 and Dublin can send 550.

The service at the edge data centers counts the number of users it let into the web application. It will start queueing when the data center limit is approached. The presence of limits for the data center that change based on historical traffic helps us to have a system that doesn’t need to communicate often between data centers.

Clustering

In order to let people access the application fairly we need a way to keep track of their position in the queue. A bucket has an identifier (bucketId) calculated based on the time the user tried to visit the waiting room for the first time. All the users who visited the waiting room between 19:51:00 and 19:51:59 are assigned to the bucketId 19:51:00. It’s not practical to track every end user in the waiting room individually. When end users visit the application around the same time, they are given the same bucketId. So we cluster users who came around the same time as one time bucket.

We mentioned an encrypted cookie that is assigned to the user when they first visit the waiting room. Every time the user comes back, they bring this cookie with them. The cookie is a ticket for the user to get into the web application. The content below is the typical information the cookie contains when visiting the web application. This user first visited around Wed, 26 May 2021 19:51:00 GMT, waited for around 10 minutes and got accepted on Wed, 26 May 2021 20:01:13 GMT.

{
  "bucketId": "Wed, 26 May 2021 19:51:00 GMT",
  "lastCheckInTime": "Wed, 26 May 2021 20:01:13 GMT",
  "acceptedAt": "Wed, 26 May 2021 20:01:13 GMT",
 }

Here

bucketId – the bucketId is the cluster the ticket is assigned to. This tracks the position in the queue.

acceptedAt – the time when the user got accepted to the web application for the first time.

lastCheckInTime – the time when the user was last seen in the waiting room or the web application.

Once a user has been let through to the web application, we have to check how long they are eligible to spend there. Our customers can customize how long a user spends on the web application using Session Duration. Whenever we see an accepted user we set the cookie to expire Session Duration minutes from when we last saw them.

Waiting Room State

Previously we talked about the concept of user slots and how we can function even when there is a delay in communication between data centers. The waiting room state helps to accomplish this. It is formed by historical data of events happening in different data centers. So when a waiting room is first created, there is no waiting room state as there is no recorded traffic. The only information available is the customer’s configured limits. Based on that we start letting users in. In the background the service (introduced later in this post as Data Center Durable Object) running in the data center periodically reports about the tickets it has issued to a co-ordinating service and periodically gets a response back about things happening around the world.

As time progresses more and more users with different bucketIds show up in different parts of the globe. Aggregating this information from the different data centers gives the waiting room state.

Let’s look at an example: there are two data centers, one in Nairobi and the other in Dublin. When there are no user slots available for a data center, users start getting queued. Different users who were assigned different bucketIds get queued. The data center state from Dublin looks like this:

activeUsers: 50,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 20,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 40,
    }
  }
]

The same thing is happening in Nairobi and the data from there looks like this:

activeUsers: 151,
buckets: 
[ 
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
    },
  } 
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 30,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 20,
    }
  }
]

This information from data centers are reported in the background and aggregated to form a data structure similar to the one below:

activeUsers: 201, // 151(Nairobi) + 50(Dublin)
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2, // 2 users from (Nairobi)
    },
  }
  {
    key: "Thu, 27 May 2021 15:55:00 GMT", 
    data: 
    {
      waiting: 50, // 20 from Nairobi and 30 from Dublin
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60, // 20 from Nairobi and 40 from Dublin
    }
  }
]

The data structure above is a sorted list of all the bucketIds in the waiting room. The waiting field has information about how many people are waiting with a particular bucketId. The activeUsers field has information about the number of users who are active on the web application.

Imagine for this customer, the limits they have set in the dashboard are

Total Active Users – 200
New Users Per Minute – 200

As per their configuration only 200 customers can be at the web application at any time. So users slots available for the waiting room state above are 200 – 201(activeUsers) = -1. So no one can go in and users get queued.

Now imagine that some users have finished their session and activeUsers is now 148.

Now userSlotsAvailable = 200 – 148 = 52 users. We should let 52 of the users who have been waiting the longest into the application. We achieve this by giving the eligible slots to the oldest buckets in the queue. In the example below 2 users are waiting from bucket Thu, 27 May 2021 15:54:00 GMT and 50 users are waiting from bucket Thu, 27 May 2021 15:55:00 GMT. These are the oldest buckets in the queue who get the eligible slots.

activeUsers: 148,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
      eligibleSlots: 2,
    },
  }
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 50,
      eligibleSlots: 50,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60,
      eligibleSlots: 0,
    }
  }
]

If there are eligible slots available for all the users in their bucket, then they can be sent to the web application from any data center. This ensures the fairness of the waiting room.

There is another case that can happen where we do not have enough eligible slots for a whole bucket. When this happens things get a little more complicated as we cannot send everyone from that bucket to the web application. Instead, we allocate a share of eligible slots to each data center.

key: "Thu, 27 May 2021 15:56:00 GMT",
data: 
{
  waiting: 60,
  eligibleSlots: 20,
}

As we did before, we use the ratio of past traffic from each data center to decide how many users it can let through. So if the current time is Thu, 27 May 2021 16:34:10 GMT both data centers look at the traffic ratio in the past at Thu, 27 May 2021 16:32:00 GMT and send a subset of users from those data centers to the web application.

Thu, 27 May 2021 16:32:00 GMT: 
{
  Nairobi: 0.25, // 0.25 * 20 = 5 eligibleSlots
  Dublin: 0.75,  // 0.75 * 20 = 15 eligibleSlots
}

Estimated wait time

When a request comes from a user we look at their bucketId. Based on the bucketId it is possible to know how many people are in front of the user’s bucketId from the sorted list. Similar to how we track the activeUsers we also calculate the average number of users going to the web application per minute. Dividing the number of people who are in front of the user by the average number of users going to the web application gives us the estimated time. This is what is shown to the user who visits the waiting room.

avgUsersToWebApplication:  30,
activeUsers: 148,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
      eligibleSlots: 2,
    },
  }
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 50,
      eligibleSlots: 50,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60,
      eligibleSlots: 0,
    }
  }
]

In the case above for a user with bucketId Thu, 27 May 2021 15:56:00 GMT, there are 60 users ahead of them. With 30 activeUsersToWebApplication per minute, the estimated time to get into the web application is 60/30 which is 2 minutes.

Implementation with Workers and Durable Objects

Now that we have talked about the user experience and the algorithm, let’s focus on the implementation. Our product is specifically built for customers who experience high volumes of traffic, so we needed to run code at the edge in a highly scalable manner. Cloudflare has a great culture of building upon its own products, so we naturally thought of Workers. The Workers platform uses Isolates to scale up and can scale horizontally as there are more requests.

The Workers product has an ecosystem of tools like wrangler which help us to iterate and debug things quickly.

Workers also reduce long-term operational work.

For these reasons, the decision to build on Workers was easy. The more complex choice in our design was for the coordination. As we have discussed before, our workers need a way to share the waiting room state. We need every worker to be aware of changes in traffic patterns quickly in order to respond to sudden traffic spikes. We use the proportion of traffic from two minutes before to allocate user slots among data centers, so we need a solution to aggregate this data and make it globally available within this timeframe. Our design also relies on having fast coordination within a data center to react quickly to changes. We considered a few different solutions before settling on Cache and Durable Objects.

Idea #1: Workers KV

We started to work on the project around March 2020. At that point, Workers offered two options for storage: the Cache API and KV. Cache is shared only at the data center level, so for global coordination we had to use KV. Each worker writes its own key to KV that describes the requests it received and how it processed them. Each key is set to expire after a few minutes if the worker stopped writing. To create a workerState, the worker periodically does a list operation on the KV namespace to get the state around the world.

This design has some flaws because KV wasn’t built for a use case like this. The state of a waiting room changes all the time to match traffic patterns. Our use case is write intensive and KV is intended for read-intensive workflows. As a consequence, our proof of concept implementation turned out to be more expensive than expected. Moreover, KV is eventually consistent: it takes time for information written to KV to be available in all of our data centers. This is a problem for Waiting Room because we need fine-grained control to be able to react quickly to traffic spikes that may be happening simultaneously in several locations across the globe.

Idea #2: Centralized Database

Another alternative was to run our own databases in our core data centers. The Cache API in Workers lets us use the cache directly within a data center. If there is frequent communication with the core data centers to get the state of the world, the cached data in the data center should let us respond with minimal latency on the request hot path. There would be fine-grained control on when the data propagation happens and this time can be kept low.

As noted before, this application is very write-heavy and the data is rather short-lived. For these reasons, a standard relational database would not be a good fit. This meant we could not leverage the existing database clusters maintained by our in-house specialists. Rather, we would need to use an in-memory data store such as Redis, and we would have to set it up and maintain it ourselves. We would have to install a data store cluster in each of our core locations, fine tune our configuration, and make sure data is replicated between them. We would also have to create a proxy service running in our core data centers to gate access to that database and validate data before writing to it.

We could likely have made it work, at the cost of substantial operational overhead. While that is not insurmountable, this design would introduce a strong dependency on the availability of core data centers. If there were issues in the core data centers, it would affect the product globally whereas an edge-based solution would be more resilient. If an edge data center goes offline Anycast takes care of routing the traffic to the nearby data centers. This will ensure a web application will not be affected.

The Scalable Solution: Durable Objects

Around that time, we learned about Durable Objects. The product was in closed beta back then, but we decided to embrace Cloudflare’s thriving dogfooding culture and did not let that deter us. With Durable Objects, we could create one global Durable Object instance per waiting room instead of maintaining a single database. This object can exist anywhere in the world and handle redundancy and availability. So Durable Objects give us sharding for free. Durable Objects gave us fine-grained control as well as better availability as they run in our edge data centers. Additionally, each waiting room is isolated from the others: adverse events affecting one customer are less likely to spill over to other customers.

Implementation with Durable Objects
Based on these advantages, we decided to build our product on Durable Objects.

As mentioned above, we use a worker to decide whether to send users to the Waiting Room or the web application. That worker periodically sends a request to a Durable Object saying how many users it sent to the Waiting Room and how many it sent to the web application. A Durable Object instance is created on the first request and remains active as long as it is receiving requests. The Durable Object aggregates the counters sent by every worker to create a count of users sent to the Waiting Room and a count of users on the web application.

A Durable Object instance is only active as long as it is receiving requests and can be restarted during maintenance. When a Durable Object instance is restarted, its in-memory state is cleared. To preserve the in-memory data on Durable Object restarts, we back up the data using the Cache API. This offers weaker guarantees than using the Durable Object persistent storage as data may be evicted from cache, or the Durable Object can be moved to a different data center. If that happens, the Durable Object will have to start without cached data. On the other hand, persistent storage at the edge still has limited capacity. Since we can rebuild state very quickly from worker updates, we decided that cache is enough for our use case.

Scaling up
When traffic spikes happen around the world, new workers are created. Every worker needs to communicate how many users have been queued and how many have been let through to the web application. However, while workers automatically scale horizontally when traffic increases, Durable Objects do not. By design, there is only one instance of any Durable Object. This instance runs on a single thread so if it receives requests more quickly than it can respond, it can become overloaded. To avoid that, we cannot let every worker send its data directly to the same Durable Object. The way we achieve scalability is by sharding: we create per data center Durable Object instances that report up to one global instance.

The aggregation is done in two stages: at the data-center level and at the global level.

Data Center Durable Object
When a request comes to a particular location, we can see the corresponding data center by looking at the cf.colo field on the request. The Data Center Durable Object keeps track of the number of workers in the data center. It aggregates the state from all those workers. It also responds to workers with important information within a data center like the number of users making requests to a waiting room or number of workers. Frequently, it updates the Global Durable Object and receives information about other data centers as the response.

Worker User Slots

Above we talked about how a data center gets user slots allocated to it based on the past traffic patterns. If every worker in the data center talks to the Data Center Durable Object on every request, the Durable Object could get overwhelmed. Worker User Slots help us to overcome this problem.

Every worker keeps track of the number of users it has let through to the web application and the number of users that it has queued. The worker user slots are the number of users a worker can send to the web application at any point in time. This is calculated from the user slots available for the data center and the worker count in the data center. We divide the total number of user slots available for the data center by the number of workers in the data center to get the user slots available for each worker. If there are two workers and 10 users that can be sent to the web application from the data center, then we allocate five as the budget for each worker. This division is needed because every worker makes its own decisions on whether to send the user to the web application or the waiting room without talking to anyone else.

When the traffic changes, new workers can spin up or old workers can die. The worker count in a data center is dynamic as the traffic to the data center changes. Here we make a trade off similar to the one for inter data center coordination: there is a risk of overshooting the limit if many more workers are created between calls to the Data Center Durable Object. But too many calls to the Data Center Durable Object would make it hard to scale. In this case though, we can use Cache for faster synchronization within the data center.

Cache

On every interaction to the Data Center Durable Object, the worker saves a copy of the data it receives to the cache. Every worker frequently talks to the cache to update the state it has in memory with the state in cache. We also adaptively adjust the rate of writes from the workers to the Data Center Durable Object based on the number of workers in the data center. This helps to ensure that we do not take down the Data Center Durable Object when traffic changes.

Global Durable Object

The Global Durable Object is designed to be simple and stores the information it receives from any data center in memory. It responds with the information it has about all data centers. It periodically saves its in-memory state to cache using the Workers Cache API so that it can withstand restarts as mentioned above.

Recap

This is how the waiting room works right now. Every request with the enabled waiting room goes to a worker at a Cloudflare edge data center. When this happens, the worker looks for the state of the waiting room in the Cache first. We use cache here instead of Data Center Durable Object so that we do not overwhelm the Durable Object instance when there is a spike in traffic. Plus, reading data from cache is faster. The workers periodically make a request to the Data Center Durable Object to get the waiting room state which they then write to the cache. The idea here is that the cache should have a recent copy of the waiting room state.

Workers can examine the request to know which data center they are in. Every worker periodically makes a request to the corresponding Data Center Durable Object. This interaction updates the worker state in the Data Center Durable Object. In return, the workers get the waiting room state from the Data Center Durable Object. The Data Center Durable Object sends the data center state to the Global Durable Object periodically. In the response, the Data Center Durable Object receives all data center states globally. It then calculates the waiting room state and returns that state to a worker in its response.

The advantage of this design is that it’s possible to adjust the rate of writes from workers to the Data Center Durable Object and from the Data Center Durable Object to the Global Durable Object based on the traffic received in the waiting room. This helps us respond to requests during high traffic without overloading the individual Durable Object instances.

Conclusion

By using Workers and Durable Objects, Waiting Room was able to scale up to keep web application servers online for many of our early customers during large spikes of traffic. It helped keep vaccination sign-ups online for companies and governments around the world for free through Project Fair Shot: Verto Health was able to serve over 4 million customers in Canada; Ticket Tailor reduced their peak resource utilization from 70% down to 10%; the County of San Luis Obispo was able to stay online during traffic surges of up to 23,000 users; and the country of Latvia was able to stay online during surges of thousands of requests per second. These are just a few of the customers we served and will continue to serve until Project Fair Shot ends.

In the coming days, we are rolling out the Waiting Room to customers on our business plan. Sign up today to prevent spikes of traffic to your web application. If you are interested in access to Durable Objects, it’s currently available to try out in Open Beta.

Noise

Tag Archives: Durable Objects

Durable Objects: Easy, Fast, Correct — Choose three.

Background: Durable Objects are Single-Threaded

An Example

The Race Condition

The Slowness

The Wrong Fixes

Transactions?

In-memory caching?

Our answer: Make it automatic

Part 1: Input Gates

Part 2: Output Gates

Part 3: Automatic in-memory caching

Bonus Correctness

Optional Bypass

Conclusion

Building Waiting Room on Workers and Durable Objects

What does the Waiting Room do?

Configuring Waiting Rooms

How does the waiting room work?

Clustering

Waiting Room State

Estimated wait time

Implementation with Workers and Durable Objects

Idea #1: Workers KV

Idea #2: Centralized Database

The Scalable Solution: Durable Objects

Worker User Slots

Cache

Global Durable Object

Recap

Conclusion

The collective thoughts of the interwebz

Background: Durable Objects are Single-Threaded

An Example

The Race Condition

The Slowness

The Wrong Fixes

Transactions?

In-memory caching?

Our answer: Make it automatic

Part 1: Input Gates

Part 2: Output Gates

Part 3: Automatic in-memory caching

Bonus Correctness

Optional Bypass

Conclusion

What does the Waiting Room do?

Configuring Waiting Rooms

How does the waiting room work?

Clustering

What is in the cookie returned to an end user?

Waiting Room State

Estimated wait time

Implementation with Workers and Durable Objects

Idea #1: Workers KV

Idea #2: Centralized Database

The Scalable Solution: Durable Objects

Worker User Slots

Cache

Global Durable Object

Recap

Conclusion

The collective thoughts of the interwebz