Tag Archives: Waiting Room

Cloudflare and COVID-19: Project Fair Shot Update

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/cloudflare-and-covid-19-project-fair-shot-update/

In February 2021, Cloudflare launched Project Fair Shot — a program that gave our Waiting Room product free of charge to any government, municipality, private/public business, or anyone responsible for the scheduling and/or dissemination of the COVID-19 vaccine.

Placing our Waiting Room technology in front of the vaccine scheduling application ensured that:

  • Applications would remain available, reliable, and resilient against massive spikes of traffic for users attempting to get their vaccine appointment scheduled.
  • Visitors could wait for their long-awaited vaccine with confidence, arriving at a branded queuing page that provided accurate, estimated wait times.
  • Vaccines would get distributed equitably, and not just to folks with faster reflexes or Internet connections.

Since February, we’ve seen a good number of participants in Project Fair Shot. To date, we have helped more than 100 customers across more than 10 countries schedule approximately 100 million vaccinations. Even better, these vaccinations went smoothly, with customers like the County of San Luis Obispo regularly dealing with more than 20,000 appointments in a day. As the Web Services Administrator for the County of San Luis Obispo put it: “The bottom line is Cloudflare saved lives today. Our County will forever be grateful for your participation in getting the vaccine to those that need it most in an elegant, efficient and ethical manner.”

We are happy to have helped not just in the US, but worldwide as well. In Canada, we partnered with a number of organizations and the Canadian government to increase access to the vaccine. As the CEO of Verto Health, one of those partners, stated: “Our relationship with Cloudflare went from ‘Let’s try Waiting Room’ to ‘Unless you have this, we’re not going live with that public-facing site.’” In one European country, we saw over three million people go through the Waiting Room in less than 24 hours, leading to a significantly smoother and less stressful experience. Cities in Japan, working closely with our partner Classmethod, have been able to vaccinate over 40 million people and are on track to complete their vaccination process across 317 cities. If you want more stories from Project Fair Shot, check out our case studies.

A European customer seeing very high amounts of traffic during a vaccination event

We are continuing to add more customers to Project Fair Shot every day to ensure we are doing all that we can to help distribute more vaccines. With the emergence of the Delta variant and others, distributing vaccines (and soon, booster shots) remains a very real challenge in keeping everyone healthy and resilient. Because of these new developments, Cloudflare will be extending Project Fair Shot until at least July 1, 2022. Though we are not excited to see the pandemic continue, we are humbled to be able to provide our services and play a critical part in helping us collectively move towards a better tomorrow.

Building Waiting Room on Workers and Durable Objects

Post Syndicated from Fabienne Semeria original https://blog.cloudflare.com/building-waiting-room-on-workers-and-durable-objects/

In January, we announced the Cloudflare Waiting Room, which has been available to select customers through Project Fair Shot to help COVID-19 vaccination web applications handle demand. Back then, we mentioned that our system was built on top of Cloudflare Workers and the then brand new Durable Objects. In the coming days, we are making Waiting Room available to customers on our Business and Enterprise plans. As we are expanding availability, we are taking this opportunity to share how we came up with this design.

What does the Waiting Room do?

You may have seen lines of people queueing in front of stores or other buildings during sales for a new sneaker or phone. That is because stores have restrictions on how many people can be inside at the same time. Every store has its own limit based on the size of the building and other factors. If everyone who wanted to get inside were let in at once, there would be too many people in the store, so the rest wait in line outside.

The same situation applies to web applications. When you build a web application, you have to budget for the infrastructure to run it. You make that decision according to how many users you think the site will have. But sometimes, the site can see surges of users above what was initially planned. This is where the Waiting Room can help: it stands between users and the web application and automatically creates an orderly queue during traffic spikes.

The main job of the Waiting Room is to protect a customer’s application while providing a good user experience. To do that, it must make sure that the number of users of the application around the world does not exceed limits set by the customer. Using this product should not degrade performance for end users, so it should not add significant latency and should admit them automatically. In short, this product has three main requirements: respect the customer’s limits for users on the web application, keep latency low, and provide a seamless end user experience.

When there are more users trying to access the web application than the limits the customer has configured, new users are given a cookie and greeted with a waiting room page. This page displays their estimated wait time and automatically refreshes until the user is automatically admitted to the web application.

Configuring Waiting Rooms

The important configurations that define how the waiting room operates are:

  1. Total Active Users – the total number of active users that can be using the application at any given time
  2. New Users Per Minute – how many new users per minute are allowed into the application, and
  3. Session Duration – how long a user session lasts. Note: the session is renewed as long as the user is active. We terminate it after Session Duration minutes of inactivity.

How does the waiting room work?

If a web application is behind Cloudflare, every request from an end user to the web application will go to a Cloudflare data center close to them. If the web application enables the waiting room, Cloudflare issues a ticket to this user in the form of an encrypted cookie.

Waiting Room Overview

At any given moment, every waiting room has a limit on the number of users that can go to the web application. This limit is based on the customer configuration and the number of users currently on the web application. We refer to the number of users that can go into the web application at any given time as the number of user slots. The total number of user slots is equal to the limit configured by the customer minus the total number of users that have been let through.

When a traffic surge happens on the web application the number of user slots available on the web application keeps decreasing. Current user sessions need to end before new users go in. So user slots keep decreasing until there are no more slots. At this point the waiting room starts queueing.

The chart above is a customer’s traffic to a web application between 09:40 and 11:30. The configuration for total active users is set to 250 users (yellow line). As time progresses there are more and more users on the application. The number of user slots available (orange line) in the application keeps decreasing as more users get into the application (green line). When there are more users on the application, the number of slots available decreases and eventually users start queueing (blue line). Queueing users ensures that the total number of active users stays around the configured limit.

To effectively calculate the user slots available, every service at the edge data centers should let its peers know how many users it lets through to the web application.

Coordination within a data center is faster and more reliable than coordination between many different data centers. So we decided to divide the user slots available on the web application to individual limits for each data center. The advantage of doing this is that only the data center limits will get exceeded if there is a delay in traffic information getting propagated. This ensures we don’t overshoot by much even if there is a delay in getting the latest information.

The next step was to figure out how to divide this information between data centers. For this we decided to use the historical traffic data on the web application. More specifically, we track how many different users tried to access the application across every data center in the preceding few minutes. The great thing about historical traffic data is that it’s historical and cannot change anymore. So even with a delay in propagation, historical traffic data will be accurate even when the current traffic data is not.

Let’s see an actual example: the current time is Thu, 27 May 2021 16:33:20 GMT. For the minute Thu, 27 May 2021 16:31:00 GMT there were 50 users in Nairobi and 50 in Dublin. For the minute Thu, 27 May 2021 16:32:00 GMT there were 45 users in Nairobi and 55 in Dublin. This was the only traffic on the application during that time.

Every data center looks at what the share of traffic to each data center was two minutes in the past. For Thu, 27 May 2021 16:33:20 GMT that value is Thu, 27 May 2021 16:31:00 GMT.

Thu, 27 May 2021 16:31:00 GMT: 
{
  Nairobi: 0.5, //50/100(total) users
  Dublin: 0.5,  //50/100(total) users
},
Thu, 27 May 2021 16:32:00 GMT: 
{
  Nairobi: 0.45, //45/100(total) users
  Dublin: 0.55,  //55/100(total) users
}

For the minute Thu, 27 May 2021 16:33:00 GMT, the number of user slots available will be divided equally between Nairobi and Dublin as the traffic ratio for Thu, 27 May 2021 16:31:00 GMT is 0.5 and 0.5. So, if there are 1000 slots available, Nairobi will be able to send 500 and Dublin can send 500.

For the minute Thu, 27 May 2021 16:34:00 GMT, the number of user slots available will be divided using the ratio 0.45 (Nairobi) to 0.55 (Dublin). So if there are 1000 slots available, Nairobi will be able to send 450 and Dublin can send 550.
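
The split described above can be sketched in a few lines of JavaScript. This is a minimal illustration rather than the production code: the function name `divideSlots` and the shape of its input are assumptions made for this example, and rounding down is just one way to avoid handing out more slots than exist.

```javascript
// Split the available user slots between data centers in proportion to the
// number of users each one saw in a past minute.
// `divideSlots` and its argument shapes are hypothetical names for this sketch.
function divideSlots(slotsAvailable, usersByDataCenter) {
  const total = Object.values(usersByDataCenter).reduce((sum, n) => sum + n, 0);
  const limits = {};
  if (total === 0) {
    // No recorded traffic means no ratio to apply; this sketch returns an
    // empty result and leaves that case to the caller.
    return limits;
  }
  for (const [dataCenter, users] of Object.entries(usersByDataCenter)) {
    // Multiply before dividing so the arithmetic stays on integers, then
    // round down so the shares never add up to more than slotsAvailable.
    limits[dataCenter] = Math.floor((slotsAvailable * users) / total);
  }
  return limits;
}

// Traffic recorded for 16:31:00 decides the split used during 16:33:00,
// and traffic recorded for 16:32:00 decides the split used during 16:34:00.
const limitsFor1633 = divideSlots(1000, { Nairobi: 50, Dublin: 50 });
const limitsFor1634 = divideSlots(1000, { Nairobi: 45, Dublin: 55 });
console.log(limitsFor1633, limitsFor1634);
```

With the numbers from the example, Nairobi and Dublin each get 500 slots for 16:33:00, then 450 and 550 for 16:34:00.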

The service at the edge data centers counts the number of users it let into the web application. It will start queueing when the data center limit is approached. The presence of limits for the data center that change based on historical traffic helps us to have a system that doesn’t need to communicate often between data centers.

Clustering

In order to let people access the application fairly, we need a way to keep track of their position in the queue. It’s not practical to track every end user in the waiting room individually, so we cluster users who arrive around the same time into one time bucket. A bucket has an identifier (bucketId) calculated from the time the user tried to visit the waiting room for the first time. For example, all the users who first visited the waiting room between 19:51:00 and 19:51:59 are assigned the bucketId 19:51:00.
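
As a rough sketch of the bucketing idea, a bucketId can be derived by rounding the first-visit timestamp down to the start of its minute. The function name `bucketIdFor` is an assumption for this example, and the date is borrowed from the cookie example later in this post. The real service may compute the identifier differently.

```javascript
const MINUTE_MS = 60 * 1000;

// Round a first-visit timestamp (milliseconds since the epoch) down to the
// start of its minute and format it the way bucketIds are written in this post.
// `bucketIdFor` is a hypothetical helper, not part of any Cloudflare API.
function bucketIdFor(firstVisitMs) {
  const minuteStart = Math.floor(firstVisitMs / MINUTE_MS) * MINUTE_MS;
  return new Date(minuteStart).toUTCString();
}

// Two users arriving at 19:51:00 and 19:51:59 land in the same bucket.
const earlyVisitor = bucketIdFor(Date.UTC(2021, 4, 26, 19, 51, 0));
const lateVisitor = bucketIdFor(Date.UTC(2021, 4, 26, 19, 51, 59));
console.log(earlyVisitor === lateVisitor, earlyVisitor);
```

Because every user in the same minute shares one identifier, the queue only needs one entry per minute rather than one per user.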

We mentioned an encrypted cookie that is assigned to the user when they first visit the waiting room. Every time the user comes back, they bring this cookie with them. The cookie is a ticket for the user to get into the web application. The content below is the typical information the cookie contains when visiting the web application. This user first visited around Wed, 26 May 2021 19:51:00 GMT, waited for around 10 minutes and got accepted on Wed, 26 May 2021 20:01:13 GMT.

{
  "bucketId": "Wed, 26 May 2021 19:51:00 GMT",
  "lastCheckInTime": "Wed, 26 May 2021 20:01:13 GMT",
  "acceptedAt": "Wed, 26 May 2021 20:01:13 GMT"
}

Here

bucketId – the bucketId is the cluster the ticket is assigned to. This tracks the position in the queue.

acceptedAt – the time when the user got accepted to the web application for the first time.

lastCheckInTime – the time when the user was last seen in the waiting room or the web application.

Once a user has been let through to the web application, we have to check how long they are eligible to spend there. Our customers can customize how long a user spends on the web application using Session Duration. Whenever we see an accepted user we set the cookie to expire Session Duration minutes from when we last saw them.
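
The expiry rule can be illustrated with the cookie fields shown earlier. This is a sketch under assumptions: the helper names `sessionExpiry` and `isSessionActive` are invented for this example, and the five-minute Session Duration is an arbitrary value chosen for illustration.

```javascript
// Return the moment an accepted user's cookie should expire: Session Duration
// minutes after the last time the user was seen.
// Both helpers are hypothetical names used only in this sketch.
function sessionExpiry(ticket, sessionDurationMinutes) {
  const lastSeenMs = Date.parse(ticket.lastCheckInTime);
  return new Date(lastSeenMs + sessionDurationMinutes * 60 * 1000);
}

// A session counts as active if the user was accepted and has not yet
// reached the expiry time.
function isSessionActive(ticket, sessionDurationMinutes, nowMs) {
  return ticket.acceptedAt !== undefined &&
    nowMs < sessionExpiry(ticket, sessionDurationMinutes).getTime();
}

const exampleTicket = {
  bucketId: "Wed, 26 May 2021 19:51:00 GMT",
  lastCheckInTime: "Wed, 26 May 2021 20:01:13 GMT",
  acceptedAt: "Wed, 26 May 2021 20:01:13 GMT",
};

// Assumed Session Duration of 5 minutes.
console.log(sessionExpiry(exampleTicket, 5).toUTCString());
```

Each new request from an active user would update lastCheckInTime, which pushes the expiry forward and keeps the session alive.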

Waiting Room State

Previously we talked about the concept of user slots and how we can function even when there is a delay in communication between data centers. The waiting room state helps to accomplish this. It is formed by historical data of events happening in different data centers. So when a waiting room is first created, there is no waiting room state as there is no recorded traffic. The only information available is the customer’s configured limits. Based on that we start letting users in. In the background the service (introduced later in this post as Data Center Durable Object) running in the data center periodically reports about the tickets it has issued to a co-ordinating service and periodically gets a response back about things happening around the world.

As time progresses more and more users with different bucketIds show up in different parts of the globe. Aggregating this information from the different data centers gives the waiting room state.

Let’s look at an example: there are two data centers, one in Nairobi and the other in Dublin. When there are no user slots available for a data center, users start getting queued. Different users who were assigned different bucketIds get queued. The data center state from Dublin looks like this:

activeUsers: 50,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 20,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 40,
    }
  }
]

The same thing is happening in Nairobi and the data from there looks like this:

activeUsers: 151,
buckets: 
[ 
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
    },
  },
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 30,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 20,
    }
  }
]

This information from the data centers is reported in the background and aggregated to form a data structure similar to the one below:

activeUsers: 201, // 151(Nairobi) + 50(Dublin)
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2, // 2 from Nairobi
    },
  },
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data:
    {
      waiting: 50, // 30 from Nairobi and 20 from Dublin
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60, // 20 from Nairobi and 40 from Dublin
    }
  }
]

The data structure above is a sorted list of all the bucketIds in the waiting room. The waiting field has information about how many people are waiting with a particular bucketId. The activeUsers field has information about the number of users who are active on the web application.
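
The aggregation step can be sketched as a merge of the per-data-center states. The function name `aggregateStates` is an assumption for this example, and the input objects simply mirror the Dublin and Nairobi snippets above. How the real service merges reports is not described in more detail here.

```javascript
// Merge per-data-center states into one waiting room state: sum activeUsers,
// sum the waiting counts of buckets that share a key, and sort oldest first.
// `aggregateStates` is a hypothetical helper for this sketch.
function aggregateStates(dataCenterStates) {
  let activeUsers = 0;
  const waitingByBucket = new Map();
  for (const state of dataCenterStates) {
    activeUsers += state.activeUsers;
    for (const bucket of state.buckets) {
      const soFar = waitingByBucket.get(bucket.key) || 0;
      waitingByBucket.set(bucket.key, soFar + bucket.data.waiting);
    }
  }
  const buckets = [...waitingByBucket.entries()]
    .map(([key, waiting]) => ({ key, data: { waiting } }))
    .sort((a, b) => Date.parse(a.key) - Date.parse(b.key));
  return { activeUsers, buckets };
}

const dublinState = {
  activeUsers: 50,
  buckets: [
    { key: "Thu, 27 May 2021 15:55:00 GMT", data: { waiting: 20 } },
    { key: "Thu, 27 May 2021 15:56:00 GMT", data: { waiting: 40 } },
  ],
};
const nairobiState = {
  activeUsers: 151,
  buckets: [
    { key: "Thu, 27 May 2021 15:54:00 GMT", data: { waiting: 2 } },
    { key: "Thu, 27 May 2021 15:55:00 GMT", data: { waiting: 30 } },
    { key: "Thu, 27 May 2021 15:56:00 GMT", data: { waiting: 20 } },
  ],
};

console.log(JSON.stringify(aggregateStates([dublinState, nairobiState]), null, 2));
```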

Imagine for this customer, the limits they have set in the dashboard are

Total Active Users – 200
New Users Per Minute – 200

As per their configuration, only 200 users can be on the web application at any time. So the user slots available for the waiting room state above are 200 – 201 (activeUsers) = -1. So no one can go in and users get queued.

Now imagine that some users have finished their session and activeUsers is now 148.

Now userSlotsAvailable = 200 – 148 = 52 users. We should let 52 of the users who have been waiting the longest into the application. We achieve this by giving the eligible slots to the oldest buckets in the queue. In the example below 2 users are waiting from bucket Thu, 27 May 2021 15:54:00 GMT and 50 users are waiting from bucket Thu, 27 May 2021 15:55:00 GMT. These are the oldest buckets in the queue who get the eligible slots.

activeUsers: 148,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
      eligibleSlots: 2,
    },
  },
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 50,
      eligibleSlots: 50,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60,
      eligibleSlots: 0,
    }
  }
]

If there are eligible slots available for all the users in their bucket, then they can be sent to the web application from any data center. This ensures the fairness of the waiting room.
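
Handing slots to the oldest buckets first can be sketched as a single pass over the sorted list. The function name `assignEligibleSlots` is an assumption for this example, and the sketch deliberately ignores the New Users Per Minute limit to stay short.

```javascript
// Walk the buckets from oldest to newest and give each one as many of the
// free user slots as it has waiting users, until the slots run out.
// `assignEligibleSlots` is a hypothetical helper. It assumes state.buckets is
// already sorted oldest first and does not apply New Users Per Minute.
function assignEligibleSlots(totalActiveUsers, state) {
  // A negative difference (more active users than the limit) means no slots.
  let remaining = Math.max(0, totalActiveUsers - state.activeUsers);
  const buckets = state.buckets.map((bucket) => {
    const eligibleSlots = Math.min(bucket.data.waiting, remaining);
    remaining -= eligibleSlots;
    return {
      key: bucket.key,
      data: { waiting: bucket.data.waiting, eligibleSlots },
    };
  });
  return { activeUsers: state.activeUsers, buckets };
}

const queuedState = {
  activeUsers: 148,
  buckets: [
    { key: "Thu, 27 May 2021 15:54:00 GMT", data: { waiting: 2 } },
    { key: "Thu, 27 May 2021 15:55:00 GMT", data: { waiting: 50 } },
    { key: "Thu, 27 May 2021 15:56:00 GMT", data: { waiting: 60 } },
  ],
};

// Total Active Users is 200, so 52 slots are free.
console.log(JSON.stringify(assignEligibleSlots(200, queuedState), null, 2));
```

With 148 active users the first two buckets are fully covered and the third gets nothing, matching the state shown above.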

There is another case that can happen where we do not have enough eligible slots for a whole bucket. When this happens things get a little more complicated as we cannot send everyone from that bucket to the web application. Instead, we allocate a share of eligible slots to each data center.

key: "Thu, 27 May 2021 15:56:00 GMT",
data: 
{
  waiting: 60,
  eligibleSlots: 20,
}

As we did before, we use the ratio of past traffic from each data center to decide how many users it can let through. So if the current time is Thu, 27 May 2021 16:34:10 GMT both data centers look at the traffic ratio in the past at Thu, 27 May 2021 16:32:00 GMT and send a subset of users from those data centers to the web application.

Thu, 27 May 2021 16:32:00 GMT: 
{
  Nairobi: 0.25, // 0.25 * 20 = 5 eligibleSlots
  Dublin: 0.75,  // 0.75 * 20 = 15 eligibleSlots
}

Estimated wait time

When a request comes from a user we look at their bucketId. Based on the bucketId it is possible to know how many people are in front of the user’s bucketId from the sorted list. Similar to how we track the activeUsers we also calculate the average number of users going to the web application per minute. Dividing the number of people who are in front of the user by the average number of users going to the web application gives us the estimated time. This is what is shown to the user who visits the waiting room.

avgUsersToWebApplication:  30,
activeUsers: 148,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
      eligibleSlots: 2,
    },
  },
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 50,
      eligibleSlots: 50,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60,
      eligibleSlots: 0,
    }
  }
]

In the case above, for a user with bucketId Thu, 27 May 2021 15:56:00 GMT, there are 60 users ahead of them. With an avgUsersToWebApplication of 30 per minute, the estimated time to get into the web application is 60/30, which is 2 minutes.
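
The estimate can be sketched as follows. The post does not spell out exactly how the users ahead are counted, so this example makes an assumption: it counts the users still without an eligible slot in every bucket up to and including the user's own, which reproduces the figure of 60 used above. The function name `estimatedWaitMinutes` is also invented for this sketch.

```javascript
// Estimate the wait in minutes for a user in the given bucket.
// Assumption: "users ahead" means users without an eligible slot in every
// bucket up to and including the user's own bucket. Buckets are expected to
// be sorted oldest first. `estimatedWaitMinutes` is a hypothetical helper.
function estimatedWaitMinutes(state, bucketId) {
  let usersAhead = 0;
  for (const bucket of state.buckets) {
    usersAhead += bucket.data.waiting - bucket.data.eligibleSlots;
    if (bucket.key === bucketId) break;
  }
  // If the average admission rate is 0 this yields Infinity (or NaN when
  // nobody is ahead); a real page would need to handle those cases.
  return usersAhead / state.avgUsersToWebApplication;
}

const stateWithRate = {
  avgUsersToWebApplication: 30,
  activeUsers: 148,
  buckets: [
    { key: "Thu, 27 May 2021 15:54:00 GMT", data: { waiting: 2, eligibleSlots: 2 } },
    { key: "Thu, 27 May 2021 15:55:00 GMT", data: { waiting: 50, eligibleSlots: 50 } },
    { key: "Thu, 27 May 2021 15:56:00 GMT", data: { waiting: 60, eligibleSlots: 0 } },
  ],
};

console.log(estimatedWaitMinutes(stateWithRate, "Thu, 27 May 2021 15:56:00 GMT"));
```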

Implementation with Workers and Durable Objects

Now that we have talked about the user experience and the algorithm, let’s focus on the implementation. Our product is specifically built for customers who experience high volumes of traffic, so we needed to run code at the edge in a highly scalable manner. Cloudflare has a great culture of building upon its own products, so we naturally thought of Workers. The Workers platform uses Isolates to scale up and can scale horizontally as there are more requests.

The Workers product has an ecosystem of tools like wrangler which help us to iterate and debug things quickly.

Workers also reduce long-term operational work.

For these reasons, the decision to build on Workers was easy. The more complex choice in our design was for the coordination. As we have discussed before, our workers need a way to share the waiting room state. We need every worker to be aware of changes in traffic patterns quickly in order to respond to sudden traffic spikes. We use the proportion of traffic from two minutes before to allocate user slots among data centers, so we need a solution to aggregate this data and make it globally available within this timeframe. Our design also relies on having fast coordination within a data center to react quickly to changes. We considered a few different solutions before settling on Cache and Durable Objects.

Idea #1: Workers KV

We started to work on the project around March 2020. At that point, Workers offered two options for storage: the Cache API and KV. Cache is shared only at the data center level, so for global coordination we had to use KV. Each worker writes its own key to KV that describes the requests it received and how it processed them. Each key is set to expire after a few minutes if the worker stopped writing. To create a workerState, the worker periodically does a list operation on the KV namespace to get the state around the world.

Design using KV

This design has some flaws because KV wasn’t built for a use case like this. The state of a waiting room changes all the time to match traffic patterns. Our use case is write intensive and KV is intended for read-intensive workflows. As a consequence, our proof of concept implementation turned out to be more expensive than expected. Moreover, KV is eventually consistent: it takes time for information written to KV to be available in all of our data centers. This is a problem for Waiting Room because we need fine-grained control to be able to react quickly to traffic spikes that may be happening simultaneously in several locations across the globe.

Idea #2: Centralized Database

Another alternative was to run our own databases in our core data centers. The Cache API in Workers lets us use the cache directly within a data center. If there is frequent communication with the core data centers to get the state of the world, the cached data in the data center should let us respond with minimal latency on the request hot path. There would be fine-grained control on when the data propagation happens and this time can be kept low.

Design using Core Data centers

As noted before, this application is very write-heavy and the data is rather short-lived. For these reasons, a standard relational database would not be a good fit. This meant we could not leverage the existing database clusters maintained by our in-house specialists. Rather, we would need to use an in-memory data store such as Redis, and we would have to set it up and maintain it ourselves. We would have to install a data store cluster in each of our core locations, fine tune our configuration, and make sure data is replicated between them. We would also have to create a proxy service running in our core data centers to gate access to that database and validate data before writing to it.

We could likely have made it work, at the cost of substantial operational overhead. While that is not insurmountable, this design would introduce a strong dependency on the availability of core data centers. If there were issues in the core data centers, it would affect the product globally whereas an edge-based solution would be more resilient. If an edge data center goes offline Anycast takes care of routing the traffic to the nearby data centers. This will ensure a web application will not be affected.

The Scalable Solution: Durable Objects

Around that time, we learned about Durable Objects. The product was in closed beta back then, but we decided to embrace Cloudflare’s thriving dogfooding culture and did not let that deter us. With Durable Objects, we could create one global Durable Object instance per waiting room instead of maintaining a single database. This object can exist anywhere in the world and handle redundancy and availability. So Durable Objects give us sharding for free. Durable Objects gave us fine-grained control as well as better availability as they run in our edge data centers. Additionally, each waiting room is isolated from the others: adverse events affecting one customer are less likely to spill over to other customers.

Implementation with Durable Objects

Based on these advantages, we decided to build our product on Durable Objects.

As mentioned above, we use a worker to decide whether to send users to the Waiting Room or the web application. That worker periodically sends a request to a Durable Object saying how many users it sent to the Waiting Room and how many it sent to the web application. A Durable Object instance is created on the first request and remains active as long as it is receiving requests. The Durable Object aggregates the counters sent by every worker to create a count of users sent to the Waiting Room and a count of users on the web application.

A Durable Object instance is only active as long as it is receiving requests and can be restarted during maintenance. When a Durable Object instance is restarted, its in-memory state is cleared. To preserve the in-memory data on Durable Object restarts, we back up the data using the Cache API. This offers weaker guarantees than using the Durable Object persistent storage as data may be evicted from cache, or the Durable Object can be moved to a different data center. If that happens, the Durable Object will have to start without cached data. On the other hand, persistent storage at the edge still has limited capacity. Since we can rebuild state very quickly from worker updates, we decided that cache is enough for our use case.

Scaling up

When traffic spikes happen around the world, new workers are created. Every worker needs to communicate how many users have been queued and how many have been let through to the web application. However, while workers automatically scale horizontally when traffic increases, Durable Objects do not. By design, there is only one instance of any Durable Object. This instance runs on a single thread so if it receives requests more quickly than it can respond, it can become overloaded. To avoid that, we cannot let every worker send its data directly to the same Durable Object. The way we achieve scalability is by sharding: we create per data center Durable Object instances that report up to one global instance.

Durable Objects implementation

The aggregation is done in two stages: at the data-center level and at the global level.

Data Center Durable Object

When a request comes to a particular location, we can see the corresponding data center by looking at the cf.colo field on the request. The Data Center Durable Object keeps track of the number of workers in the data center. It aggregates the state from all those workers. It also responds to workers with important information within a data center like the number of users making requests to a waiting room or number of workers. Frequently, it updates the Global Durable Object and receives information about other data centers as the response.

Worker User Slots

Above we talked about how a data center gets user slots allocated to it based on the past traffic patterns. If every worker in the data center talks to the Data Center Durable Object on every request, the Durable Object could get overwhelmed. Worker User Slots help us to overcome this problem.

Every worker keeps track of the number of users it has let through to the web application and the number of users that it has queued. The worker user slots are the number of users a worker can send to the web application at any point in time. This is calculated from the user slots available for the data center and the worker count in the data center. We divide the total number of user slots available for the data center by the number of workers in the data center to get the user slots available for each worker. If there are two workers and 10 users that can be sent to the web application from the data center, then we allocate five as the budget for each worker. This division is needed because every worker makes its own decisions on whether to send the user to the web application or the waiting room without talking to anyone else.
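
The per-worker budget can be sketched like this. The names `workerUserSlots` and `createWorkerBudget` are assumptions for this example, and rounding down when the slots do not divide evenly is a choice made for the sketch, not something the post specifies.

```javascript
// Divide the data center's user slots evenly between its workers.
// Rounding down is an assumption of this sketch.
function workerUserSlots(dataCenterSlots, workerCount) {
  if (workerCount <= 0) return 0;
  return Math.floor(dataCenterSlots / workerCount);
}

// A worker decides locally, without talking to anyone else, whether the next
// user goes to the web application (true) or to the waiting room (false).
// `createWorkerBudget` is a hypothetical helper, not a Workers API.
function createWorkerBudget(dataCenterSlots, workerCount) {
  let remaining = workerUserSlots(dataCenterSlots, workerCount);
  return {
    tryAdmit() {
      if (remaining <= 0) return false;
      remaining -= 1;
      return true;
    },
  };
}

// Two workers sharing 10 slots get a budget of five each.
const budget = createWorkerBudget(10, 2);
const decisions = [];
for (let i = 0; i < 6; i++) decisions.push(budget.tryAdmit());
console.log(decisions);
```

The sixth user is sent to the waiting room because this worker has spent its share, even if another worker in the same data center still has budget left.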

Waiting room inside a data center

When the traffic changes, new workers can spin up or old workers can die. The worker count in a data center is dynamic as the traffic to the data center changes. Here we make a trade off similar to the one for inter data center coordination: there is a risk of overshooting the limit if many more workers are created between calls to the Data Center Durable Object. But too many calls to the Data Center Durable Object would make it hard to scale. In this case though, we can use Cache for faster synchronization within the data center.

Cache

On every interaction with the Data Center Durable Object, the worker saves a copy of the data it receives to the cache. Every worker frequently reads the cache to update the state it has in memory with the state in cache. We also adaptively adjust the rate of writes from the workers to the Data Center Durable Object based on the number of workers in the data center. This helps to ensure that we do not take down the Data Center Durable Object when traffic changes.

Global Durable Object

The Global Durable Object is designed to be simple and stores the information it receives from any data center in memory. It responds with the information it has about all data centers. It periodically saves its in-memory state to cache using the Workers Cache API so that it can withstand restarts as mentioned above.
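
The store-and-respond behavior can be illustrated with a plain JavaScript class. This sketch does not use the real Durable Objects API and leaves out the periodic backup to cache. The class name `GlobalStateSketch` and its `report` method are invented for this example.

```javascript
// Keep the latest state reported by each data center in memory and answer
// every report with everything known so far.
// This is a plain class for illustration only, not a Durable Object.
class GlobalStateSketch {
  constructor() {
    this.statesByDataCenter = new Map();
  }

  // Store the caller's state, then return the states of all data centers.
  report(dataCenter, state) {
    this.statesByDataCenter.set(dataCenter, state);
    return Object.fromEntries(this.statesByDataCenter);
  }
}

const globalState = new GlobalStateSketch();
globalState.report("Dublin", { activeUsers: 50 });
const view = globalState.report("Nairobi", { activeUsers: 151 });
console.log(view);
```

After the second report, the Nairobi caller sees both its own state and the one Dublin reported earlier, which is all a data center needs to compute the waiting room state locally.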

Components of waiting room

Recap

This is how the waiting room works right now. Every request with the enabled waiting room goes to a worker at a Cloudflare edge data center. When this happens, the worker looks for the state of the waiting room in the Cache first. We use cache here instead of Data Center Durable Object so that we do not overwhelm the Durable Object instance when there is a spike in traffic. Plus, reading data from cache is faster. The workers periodically make a request to the Data Center Durable Object to get the waiting room state which they then write to the cache. The idea here is that the cache should have a recent copy of the waiting room state.

Workers can examine the request to know which data center they are in. Every worker periodically makes a request to the corresponding Data Center Durable Object. This interaction updates the worker state in the Data Center Durable Object. In return, the workers get the waiting room state from the Data Center Durable Object. The Data Center Durable Object sends the data center state to the Global Durable Object periodically. In the response, the Data Center Durable Object receives all data center states globally. It then calculates the waiting room state and returns that state to a worker in its response.

The advantage of this design is that it’s possible to adjust the rate of writes from workers to the Data Center Durable Object and from the Data Center Durable Object to the Global Durable Object based on the traffic received in the waiting room. This helps us respond to requests during high traffic without overloading the individual Durable Object instances.

Conclusion

By using Workers and Durable Objects, Waiting Room was able to scale up to keep web application servers online for many of our early customers during large spikes of traffic. It helped keep vaccination sign-ups online for companies and governments around the world for free through Project Fair Shot: Verto Health was able to serve over 4 million customers in Canada; Ticket Tailor reduced their peak resource utilization from 70% down to 10%; the County of San Luis Obispo was able to stay online during traffic surges of up to 23,000 users; and the country of Latvia was able to stay online during surges of thousands of requests per second. These are just a few of the customers we served and will continue to serve until Project Fair Shot ends.

In the coming days, we are rolling out Waiting Room to customers on our Business plan. Sign up today to protect your web application from spikes of traffic. If you are interested in Durable Objects, they are currently available to try out in Open Beta.

Cloudflare Waiting Room

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/cloudflare-waiting-room/

Today, we are excited to announce Cloudflare Waiting Room! It will first be available to select customers through a new program called Project Fair Shot, which aims to help with the problem of overwhelming demand for COVID-19 vaccinations causing appointment registration websites to fail. General availability on our Business and Enterprise plans will follow in the near future.

Wait, you’re excited about a… Waiting Room?

Most of us are familiar with the concept of a waiting room, and rarely are we excited about the idea of being in one. Usually our first experience of one is at a doctor’s office — yes, you have an appointment, but sometimes the doctor is running late (or one of the patients was). Given the doctor can only see one person at a time… the waiting room was born, as a mechanism to queue up patients.

While servers can handle more concurrent requests than a doctor can, they too can be overwhelmed. If, in a pre-COVID world, you’ve ever tried buying tickets to a popular concert or event, you’ve probably encountered a waiting room online. It limits requests inbound to an application and places these requests into a virtual queue. Once the number of users in the application has dropped, new users are let in, within the thresholds the application can handle. This protects the origin servers supporting the application from being inundated with too many requests, while also ensuring equity from a user perspective: users who try to access a resource when the system is overloaded are not unfairly dropped and forced to reconnect, hoping for another chance to join the queue.

Why Now?

Given not many of us are going to live concerts any time soon, why is Cloudflare doing this now?

Well, perhaps we aren’t going to concerts, but the second-order effects of COVID-19 have created a huge need for waiting rooms. First of all, given social distancing and the closing of many places of business and government, customers and citizens have shifted to online channels, putting substantially more strain on business and government infrastructure.

Second, the pandemic and the flow-on consequences of it have meant many folks around the world have come to rely on resources that they didn’t need twelve months earlier. To be specific, these are often health or government-related resources — for example, unemployment insurance websites. The online infrastructure was set up to handle a peak load that didn’t foresee the impact of COVID-19. We’re seeing a similar pattern emerge with websites that are related to vaccines.

Historically, the number of organizations that needed waiting rooms was quite small. Most online businesses see a fairly consistent user load, rather than huge crushes of people all at once. The organizations that did need waiting rooms were able to build custom ones, integrated deeply into their applications (for example, for buying tickets). With Cloudflare’s Waiting Room, a Waiting Room can be set up in a matter of minutes for any website, without changing the application or writing a single line of code.

Whether you are an engineering architect or a business operations analyst, setting up a Waiting Room is simple. We make it quick and easy to ensure your applications are reliable and protected from unexpected spikes in traffic. Other features we felt were important are automatic enablement and dynamic outflow. In other words, a waiting room should turn on automatically when thresholds are exceeded and, as users finish their tasks in the application, release batches of varying size from the queue to take their place. It should just work. Lastly, we’ve seen the major impact COVID-19 has had on users and businesses alike, especially, but not limited to, the health and government sectors. We wanted to provide another way to ensure these applications remain available and functional so all users can receive the care that they need rather than errors in their browser.
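Automatic enablement and dynamic outflow can be pictured with a toy, single-process model. The sketch below is only an illustration under simplifying assumptions: the `QueueSketch` class and its methods are hypothetical, users are identified by plain strings, and there is none of the global coordination that the real product needs.

```typescript
// Toy model: visitors are admitted until the limit is reached, then queued
// in arrival order; every departure frees a slot for the head of the queue.
class QueueSketch {
  private active = new Set<string>();
  private queue: string[] = [];
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Automatic enablement: queueing starts by itself once the limit is hit.
  arrive(id: string): "admitted" | "queued" {
    if (this.queue.length === 0 && this.active.size < this.limit) {
      this.active.add(id);
      return "admitted";
    }
    this.queue.push(id);
    return "queued";
  }

  // Dynamic outflow: a departure lets in as many queued users as now fit.
  leave(id: string): string[] {
    this.active.delete(id);
    const admitted: string[] = [];
    while (this.queue.length > 0 && this.active.size < this.limit) {
      const next = this.queue.shift() as string;
      this.active.add(next);
      admitted.push(next);
    }
    return admitted;
  }

  isQueueing(): boolean {
    return this.queue.length > 0;
  }
}

// Usage: a room that allows two active users at a time.
const room = new QueueSketch(2);
room.arrive("a"); // "admitted"
room.arrive("b"); // "admitted"
room.arrive("c"); // "queued", the limit has been reached
room.leave("a");  // ["c"], the first user in the queue takes the free slot
```

New arrivals join the back of the queue whenever anyone is already waiting, even if a slot happens to be free, which keeps the order fair.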

How does Cloudflare’s Waiting Room work?

We built Waiting Room on top of our edge network and our Workers product. By leveraging Workers and our new Durable Objects offering, we were able to remove the need for any customer coding and provide a seamless, out-of-the-box product that will ‘just work’. On top of this, we get the benefits of the scale and performance of our Workers product to ensure we maintain extremely low latency overhead, keep the estimated wait times presented to end users as accurate as possible, and avoid keeping any user in the queue longer than needed. But building a centralized system in a decentralized network is no easy task. When requests come into an application from around the world, we need to be able to get a broad, accurate view of what that load looks like inbound and outbound to a given application.

Request going through Cloudflare without a Waiting Room

These requests, as fast as they are, still take time to travel across the planet. And so, a unique edge case was presented. What if a website is getting reasonable traffic from North America and Europe, but then a sudden major spike of traffic takes place from South America – how do we know when to keep letting users into the application and when to kick in the Waiting Room to protect the origin servers from being overloaded?

Thanks to some clever engineering and our Workers product, we were able to create a system that almost immediately keeps itself synced with global demand to an application, giving us the necessary insight into when we should and should not be queueing users into the Waiting Room. By leveraging our global Anycast network and more than 200 data centers, we remove any single point of failure to protect our customers’ infrastructure, while also providing a great experience to end users who have to wait a small amount of time to enter the application under high load.

Request going through Cloudflare with a Waiting Room

How to set up a Waiting Room

Setting up a Waiting Room is incredibly easy and very fast! At the simplest end of the scale, a user needs to fill out only five fields: 1) the name of the Waiting Room, 2) a hostname (which will already be pre-populated with the zone it’s being configured on), 3) the total active users that can be in the application at any given time, 4) the new users per minute allowed into the application, and 5) the session duration for any given user. No coding or any application changes are necessary.
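For readers who think in data structures, the five fields can be pictured as a small configuration object. The sketch below is only an illustration: the property names, the choice of minutes as the unit for session duration, and the `validateConfig` checks are assumptions for this example, not the dashboard’s or the API’s actual schema.

```typescript
// Hypothetical representation of the five fields from the setup form.
interface WaitingRoomConfig {
  name: string;                   // 1) name of the Waiting Room
  hostname: string;               // 2) hostname, pre-populated from the zone
  totalActiveUsers: number;       // 3) users allowed in the application at once
  newUsersPerMinute: number;      // 4) users allowed in per minute
  sessionDurationMinutes: number; // 5) session duration (unit is an assumption)
}

// Basic sanity checks; returns a list of problems (empty when all is fine).
function validateConfig(config: WaitingRoomConfig): string[] {
  const problems: string[] = [];
  if (config.name.trim() === "") problems.push("name is required");
  if (config.hostname.trim() === "") problems.push("hostname is required");
  const numericFields: Array<[string, number]> = [
    ["totalActiveUsers", config.totalActiveUsers],
    ["newUsersPerMinute", config.newUsersPerMinute],
    ["sessionDurationMinutes", config.sessionDurationMinutes],
  ];
  for (const [field, value] of numericFields) {
    if (!Number.isInteger(value) || value <= 0) {
      problems.push(field + " must be a positive integer");
    }
  }
  return problems;
}

// Usage with made-up values.
const exampleConfig: WaitingRoomConfig = {
  name: "vaccine-signup",
  hostname: "appointments.example.com",
  totalActiveUsers: 500,
  newUsersPerMinute: 100,
  sessionDurationMinutes: 5,
};
const exampleProblems = validateConfig(exampleConfig); // no problems found
```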

We provide the option of using our default Waiting Room template for customers who don’t want to add additional branding. This simplifies the process of getting a Waiting Room up and running.

That’s it! Press save and the Waiting Room is ready to go!

For customers with more time and technical ability, the same process is followed, except we give full customization capabilities to our users so they can brand the Waiting Room, ensuring it matches the look and feel of their overall product.

Lastly, managing different Waiting Rooms is incredibly easy. With our Manage Waiting Room table, at a glance you are able to get a full snapshot of which rooms are actively queueing, not queueing, and/or disabled.

We are very excited to put the power of our Waiting Room into the hands of our customers to ensure they continue to focus on their businesses and customers. Keep an eye out for another blog post coming soon with major updates to our Waiting Room product for Enterprise!