Tag Archives: Waiting Room

Waiting Room adds multi-host and path coverage, unlocking broader protection and multilingual setups

Post Syndicated from Arielle Olache original http://blog.cloudflare.com/multihost-waiting-room/

Cloudflare Waiting Room protects sites from overwhelming traffic surges by placing excess visitors in a fully customizable virtual waiting room, admitting them dynamically as spots become available. Instead of throwing error pages or delivering poorly-performing site pages, Waiting Room empowers customers to take control of their end-user experience during unmanageable traffic surges.

A key decision customers make when setting up a waiting room is which pages it will protect. Before now, customers could select one hostname and path combination to determine which pages would be covered by a waiting room. Today, we are thrilled to announce that Waiting Room now supports coverage of multiple hostname and path combinations with a single waiting room, giving customers more flexibility and offering broader site coverage without interruptions to end-user flows. This new capability is available to all Enterprise customers with an Advanced Waiting Room purchase.

Waiting Room site placement

As part of the simple, no-coding-necessary process for deploying a waiting room, customers specify a hostname and path combination to indicate which pages are covered by a particular waiting room. When a site visitor makes a preliminary request to that hostname and path or any of its subpaths, they will be issued a waiting room cookie and either be admitted to the site or placed in a waiting room if the site is at capacity.

Last year, we added Waiting Room Bypass Rules, giving customers many options for creating exceptions to this hostname and path coverage. This unlocked capabilities such as User Agent Bypassing, geo-targeting, URL exclusions, and administrative IP bypassing. It also allowed for some added flexibility regarding which pages a waiting room would apply to on a customer’s site by adding the ability to exclude URLs, paths, and query strings. While this update allowed for greater specificity regarding which traffic should be gated by Waiting Room, it didn’t offer the broader coverage that many customers still needed to protect greater portions of their sites with a single waiting room.

Why customers needed broader Waiting Room coverage

Let’s review some simple yet impactful examples of why this broader coverage option was important for our customers. Imagine you have an online store, example.com, and you want to ensure that a single waiting room covers the entire customer journey — from the homepage, to product browsing, to checkout. Many sites would delineate these steps in the flow using paths: example.com/, example.com/shop/product1, example.com/checkout. Because Waiting Room assumes an implied wildcard at the end of a waiting room’s configured path, this use case was already satisfied for these sites. Thus, placing a waiting room at example.com/ would cover all the URLs associated with every step of this customer journey. In this setup, once a site visitor has passed the waiting room, they would not be re-queued at any step in their user flow because they are still using the same waiting room’s cookie, which indicates to Waiting Room that they are the same user between URLs.

However, many sites delineate different steps of this type of shopping flow using subdomains instead of or as well as paths. For example, many sites will have their checkout page on a different subdomain, such as checkout.example.com. Before now, if a customer had this site structure and wanted to protect their entire site with a single waiting room, they would have needed to deploy a waiting room at example.com/ and another at checkout.example.com/. This option was not ideal for many customers because a site visitor could be queued at two different parts of the same customer journey. This is because the waiting room at checkout.example.com/ would count the same visitor as a different user than the waiting room covering example.com/.

That said, there are cases where it is wise to separate waiting rooms on a single site. For example, a ticketing website could place a waiting room at its apex domain (example.com) and distinct waiting rooms with pre-queues on pages for specific events (example.com/popular_artist_tour). The waiting room set at example.com/ ensures that the main point of entry to the site does not get overwhelmed and crash when ticket sales for any one event open. The waiting room placed on the specific event page ensures that traffic for a single event can start queuing ahead of the event without impacting traffic going to other parts of the site.

Ultimately, whether a customer wants one or multiple waiting rooms to protect their site, we wanted to give our customers the flexibility to deploy Waiting Room however was best for their use cases and site structure. We’re thrilled to announce that Waiting Room now supports multi-hostname and path coverage with a single waiting room!

Getting started with multi-host and path coverage

Customers can now configure a waiting room on multiple hostname and path combinations — or routes — belonging to the same zone. To do so, navigate to Traffic > Waiting Room and select Create. The name of your domain will already be populated. To add additional routes to your waiting room configuration, select Add Hostname and Path. You can then enter another hostname and path to be covered by the same waiting room. Remember, there is an implied wildcard after each path. Therefore, creating a waiting room for each URL you want your waiting room to cover is unnecessary. Only create additional routes for URLs that wouldn’t be covered by the other hostname and path combinations you have already entered.

When deploying a waiting room that covers multiple hostname and path combinations, you must create a unique cookie name for this waiting room (more on that later!). Then, proceed with the same workflow as usual to deploy your waiting room.

Deploying a multi-language waiting room

A frequent customer request was the ability to cover a multi-language site with a single waiting room, offering different text per language while counting all site traffic toward the same waiting room limits. There are various ways a site can be structured to delineate between different language options, but the two most common are by subdomain or by path. For a site that uses path delineation, this could look like example.com/en and example.com/es for English and Spanish, respectively. For a site using subdomain delineation, this could look like en.example.com/ and es.example.com/. Before multi-host Waiting Room coverage, the subdomain variation could not be covered by a single waiting room.

Waiting Room’s existing configuration options already supported the path variation. However, this was only true if the customer wanted to gate the entire site, which they could do by placing the waiting room at example.com/. Many e-commerce customers asked for the ability to gate different high-demand product pages selling the same product but with different language options. For instance, consider example.com/en/product_123 and example.com/es/product_123, where the customer wants the same waiting room and traffic limits to cover both URLs. Before now, this would not have been possible without some complex bypass rule logic.

Now, customers can deploy a waiting room that accommodates either the subdomain or path approach to structuring a multi-language site. The only remaining step is setting up your waiting room to serve different languages when a user is queued. This can be achieved by constructing a template that reads the URL to determine the locale and defines the appropriate translations for each locale within the template.

Here is an example of a template that determines the locale from the URL path, and renders the translated text:

<!DOCTYPE html>
<html>
  <head>
    <title>Waiting Room powered by Cloudflare</title>
  </head>
  <body>
    <section>
      <h1 id="inline-msg">
        You are now in line.
      </h1>
      <h1 id="patience-msg">
        Thank you for your patience.
      </h1>
    </section>
    <h2 id="waitTime"></h2>

    <script>
      var locale = location.pathname.split("/")[1] || "en";
      var translations = {
        "en": {
          "waittime_60_less": "Your estimated wait time is {{waitTime}} minute.",
          "waittime_60_greater": "Your estimated wait time is {{waitTimeHours}} hours and {{waitTimeHourMinutes}} minutes.",
          "inline-msg": "You are now in line.",
          "patience-msg": "Thank you for your patience.",
        },
        "es": {
          "waittime_60_less": "El tiempo de espera estimado es {{waitTime}} minuto.",
          "waittime_60_greater": "El tiempo de espera estimado es {{waitTimeHours}} de horas y {{waitTimeHourMinutes}} minutos.",
          "inline-msg": "Ahora se encuentra en la fila de espera previa.",
          "patience-msg": "Gracias por su paciencia.",
        }
      };

      // Fall back to English if the path segment is not a recognized locale
      if (!(locale in translations)) locale = "en";

      {{#waitTimeKnown}}
      var minutes = parseInt( {{waitTime}} , 10);
      var time = document.getElementById('waitTime');

      if (minutes < 61) {
        time.innerText = translations[locale]["waittime_60_less"];
      } else {
        time.innerText = translations[locale]["waittime_60_greater"];
      }
      {{/waitTimeKnown}}

      // translate template text for each of the elements with "id" suffixed with "msg"
      for (const msg of ["inline-msg", "patience-msg"]) {
        const el = document.getElementById(msg);
        if (el == null) continue;
        el.innerText = translations[locale][msg];
      }
    </script>
  </body>
</html>

Here’s how the template works: given a site that distinguishes locales with paths such as /en/product_123 or /es/product_123, the <script> body extracts the locale from location.pathname via locale = location.pathname.split("/")[1] after the page renders. If the extracted segment is not a locale defined within the translations object, we default to "en". For example, if a user visits example.com/product_123, the English text is rendered by default.

Similarly, to support a multi-language waiting room for sites that delineate their structure by subdomain, you only need to update how you extract the locale from the URL. Instead of looking at the pathname, look at the hostname property of the window.location object, e.g. locale = location.hostname.split(".")[0], given a site structure like en.example.com and es.example.com.
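
For example, a minimal sketch of that change, reusing the translations object from the template above:

var locale = location.hostname.split(".")[0];
// Fall back to English if the subdomain is not a recognized locale
if (!(locale in translations)) locale = "en";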

The above code is a pared down example of starter templates we have added to our developer documentation, which we have included to make it easier for you to get started with a multi-language waiting room. You can download these templates and edit them to have the look and feel aligned with your site and with the language options your site offers.

[Image: an example of the starter template provided in the dev docs. This waiting room is in Queue-all mode and displays the Spanish text when the user visits example.com/es/product_123.]

These templates can be further expanded to include other languages by adding translations to the `translations` object for each locale. Thus, a single template can serve multiple languages based on the locale, whether the site delineates it by subdomain or by path.
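
For instance, a hypothetical French entry (the translation strings here are only illustrative) could be added alongside the existing locales:

translations["fr"] = {
  "waittime_60_less": "Votre temps d'attente estimé est de {{waitTime}} minute.",
  "waittime_60_greater": "Votre temps d'attente estimé est de {{waitTimeHours}} heures et {{waitTimeHourMinutes}} minutes.",
  "inline-msg": "Vous êtes maintenant dans la file d'attente.",
  "patience-msg": "Merci de votre patience."
};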

How we handle cookies with a multi-hostname and path waiting room

The waiting room assigns a __cfwaitingroom cookie to each user to manage that user's state, which determines their position in line, among other properties needed to make queueing decisions. Traditionally, for a single hostname and path configuration, it was trivial to just set the cookie as __cfwaitingroom=[cookie-value]; Domain=example.com; Path=/es/product_123. This ensured that when the page refreshed, the browser would send us the same Waiting Room cookie to examine and update. However, this became non-trivial when we wanted to share the same cookie across different subdomain or path combinations, for example, on example.com/en/product_123.

Approaches to setting multiple cookies

There were two approaches that we brainstormed and evaluated to allow the sharing of the cookie for the same waiting room.

The first approach we considered was to issue multiple Set-Cookie headers in the HTTP response. For example, in the multi-language example above, we could issue the following in the response header:

Set-Cookie: __cfwaitingroom=[cookie-value]…Domain=example.com; Path=/en/product_123 …
Set-Cookie: __cfwaitingroom=[cookie-value]...Domain=example.com; Path=/es/product_123 …

This approach would allow us to handle multiple hostnames and paths for the same waiting room. However, it is not a very scalable solution. RFC 6265 does not define a strict limit on the number or size of cookies, but browsers have their own implementation-specific limits. For example, Chrome allows the response header to grow up to 256 KB before failing the transaction with ERR_RESPONSE_HEADERS_TOO_BIG. Additionally, the response header would grow proportionally to the number of unique hostname and path combinations, redundantly repeating the same cookie value N times (where N is the number of additional routes). While this approach would have allowed us to handle a bounded list of hostname and path combinations, it was not ideal for cases where we can expect more than a handful of routes for a particular site.

The second approach, which we chose to move forward with, was to set the cookie on the apex domain (or the most common subdomain). In other words, we figure out the most common subdomain from the list of routes and use that as the cookie's domain. Similarly, for the paths, we determine the longest common prefix from the list of paths and use that as the value of the path attribute. For example, consider a waiting room with the following two routes configured: a.example.com/shoes/product_123 and b.example.com/shoes/product_456.

To determine the domain to set for the cookie, we can see that example.com is the most common subdomain among the list of domains; applying the most common subdomain algorithm gives us example.com as the domain. Applying the longest common prefix algorithm on the set of paths, /shoes/product_123 and /shoes/product_456, gives us /shoes as the path. Thus, the Set-Cookie header would be the following:

Set-Cookie: … __cfwaitingroom=[cookie-value]; Domain=example.com; Path=/shoes …

Now, when a visitor accesses any of the pages covered by this waiting room, we can guarantee Waiting Room receives the right cookie, and only one Set-Cookie header is included in the response.
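
A sketch of how these two values could be derived (an illustration of the approach, not Cloudflare's actual implementation):

// Longest common domain suffix: ["a.example.com", "b.example.com"] -> "example.com"
function commonDomain(hosts) {
  const labels = hosts.map((h) => h.split(".").reverse());
  const suffix = [];
  for (let i = 0; ; i++) {
    const label = labels[0][i];
    if (label === undefined || !labels.every((l) => l[i] === label)) break;
    suffix.push(label);
  }
  return suffix.reverse().join(".");
}

// Longest common path prefix: ["/shoes/product_123", "/shoes/product_456"] -> "/shoes"
function commonPathPrefix(paths) {
  const segments = paths.map((p) => p.split("/").filter(Boolean));
  const prefix = [];
  for (let i = 0; ; i++) {
    const segment = segments[0][i];
    if (segment === undefined || !segments.every((s) => s[i] === segment)) break;
    prefix.push(segment);
  }
  return "/" + prefix.join("/");
}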

However, we are not done yet. Although we are able to identify the common subdomain (or apex domain) and common path prefix, there would still be a problem if the routes of one waiting room overlapped with another's. For example, say we configure two waiting rooms for the same site where we are selling our special shoes:

Waiting Room A
    a.example.com/shoes/product_123
    b.example.com/shoes/product_456

Waiting Room B
    c.example.com/shoes/product_789
    d.example.com/shoes/product_012

If we apply the same algorithm as described above, we would get the same domain and path attributes for the two cookies. For Waiting Room A, the domain would be example.com and the path would be /shoes. Similarly, for Waiting Room B, the most common subdomain would also be example.com and the longest common path prefix would be /shoes. This would effectively prevent us from distinguishing the two cookies and routing the visitor to the right waiting room. To solve this issue of overlapping cookies, we introduced the requirement of setting a custom suffix that is unique within the customer’s zone. This is why you must provide a custom cookie suffix when configuring a waiting room that covers multiple hostnames or paths. If Waiting Room A was assigned cookie suffix “a” and Waiting Room B was assigned cookie suffix “b”, the two Set-Cookie definitions would look like the following:

Waiting Room A

Set-Cookie: __cfwaitingroom_a=[cookie-value]; Domain=example.com; Path=/shoes

Waiting Room B

Set-Cookie: __cfwaitingroom_b=[cookie-value]; Domain=example.com; Path=/shoes

When a visitor makes a request to any page where the two different waiting rooms are configured, Waiting Room can distinguish and select the cookie that corresponds to the given request, and use it to determine which waiting room’s settings apply to that user depending on where they are on the site.

With the addition of multihost and multipath Waiting Room coverage, we’re thrilled to offer more flexible options for incorporating Waiting Room into your site, giving your visitors the best experience possible while protecting your site during times of high traffic. Make sure your site is always protected from traffic surges. Try out Waiting Room today or reach out to us to learn more about the Advanced Waiting Room!

How Waiting Room makes queueing decisions on Cloudflare’s highly distributed network

Post Syndicated from George Thomas original http://blog.cloudflare.com/how-waiting-room-queues/

Almost three years ago, we launched Cloudflare Waiting Room to protect our customers’ sites from overwhelming spikes in legitimate traffic that could bring down their sites. Waiting Room gives customers control over user experience even in times of high traffic by placing excess traffic in a customizable, on-brand waiting room, dynamically admitting users as spots become available on their sites. Since the launch of Waiting Room, we’ve continued to expand its functionality based on customer feedback with features like mobile app support, analytics, Waiting Room bypass rules, and more.

We love announcing new features and solving problems for our customers by expanding the capabilities of Waiting Room. But today, we want to give you a behind-the-scenes look at how we have evolved the core mechanism of our product, namely, exactly how it kicks in to queue traffic in response to spikes.

How was the Waiting Room built, and what are the challenges?

The diagram below shows a quick overview of where Waiting Room sits when a customer enables it for their website.

[Diagram: an overview of where Waiting Room sits between visitors and a customer's website on Cloudflare's network.]

Waiting Room is built on Cloudflare Workers, which run across a global network of Cloudflare data centers. Requests to a customer’s website can go to many different Cloudflare data centers. To optimize for minimal latency and enhanced performance, these requests are routed to the geographically closest data center. When a new user makes a request to the host/path covered by a waiting room, the waiting room worker decides whether to send the user to the origin or to the waiting room. This decision is made using the waiting room state, which gives an idea of how many users are on the origin.

The waiting room state changes continuously based on traffic around the world. This information can either be stored in a central location, or changes can be propagated around the world eventually. Storing this information in a central location can add significant latency to each request, as the central location can be really far from where the request originates. So every data center works with its own waiting room state, which is a snapshot of the traffic pattern for the website around the world available at that point in time. Before letting a user into the website, we do not want to wait for information from everywhere else in the world, as that adds significant latency to the request. This is why we chose not to have a central location, but instead a pipeline where changes in traffic eventually get propagated around the world.

This pipeline, which aggregates the waiting room state in the background, is built on Cloudflare Durable Objects. In 2021, we wrote a blog post about how the aggregation pipeline works and the design decisions we made there, if you are interested. This pipeline ensures that every data center gets updated information about changes in traffic within a few seconds.

Waiting Room has to decide whether to send users to the website or queue them based on the state it currently sees. This has to be done while making sure we queue at the right time, so that the customer's website does not get overloaded. We also have to make sure we do not queue too early, as we might be queueing for a falsely suspected spike in traffic, and being in a queue could cause some users to abandon going to the website. Waiting Room runs on every server in Cloudflare’s network, which spans over 300 cities in more than 100 countries. We want to make sure that, for every new user, the decision whether to go to the website or the queue is made with minimal latency. This is what makes deciding when to queue a hard question for the waiting room. In this blog, we will cover how we approached that tradeoff. Our algorithm has evolved to decrease false positives while continuing to respect the customer’s set limits.

How a waiting room decides when to queue users

The most important factor that determines when your waiting room will start queuing is how you configure the traffic settings. There are two traffic limits that you set when configuring a waiting room: total active users and new users per minute. The total active users is a target threshold for how many simultaneous users you want to allow on the pages covered by your waiting room. New users per minute defines the target threshold for the maximum rate of user influx to your website per minute. A sharp spike in either of these values might result in queuing. Another configuration that affects how we calculate the total active users is session duration. A user is considered active for session duration minutes after their most recent request to any page covered by the waiting room.
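
As a sketch, the limits from the example below map to waiting room traffic settings like the following (field names follow the Waiting Room API; the session duration value here is an assumption for illustration):

// Illustrative traffic settings for the example below
const waitingRoomSettings = {
  total_active_users: 200,   // target ceiling for simultaneous users on covered pages
  new_users_per_minute: 200, // target ceiling for the rate of new user influx
  session_duration: 5        // minutes a user stays active after their last request (assumed value)
};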

The graph below is from one of our internal monitoring tools and shows a customer's traffic pattern over two days. This customer has set both limits, new users per minute and total active users, to 200.

[Graph: two days of this customer's traffic, showing total active users and queued users against their 200-user limits.]

If you look at their traffic, you can see that users were queued on September 11th around 11:45. At that point in time, the total active users was around 200. As the total active users ramped down (around 12:30), the number of queued users dropped to 0. Queueing started again on September 11th around 15:00, when the total active users reached 200 again. The users queued around this time ensured that the traffic going to the website stayed around the limits set by the customer.

Once a user gets access to the website, we give them an encrypted cookie which indicates they have already gained access. The contents of the cookie can look like this:

{  
  "bucketId": "Mon, 11 Sep 2023 11:45:00 GMT",
  "lastCheckInTime": "Mon, 11 Sep 2023 11:45:54 GMT",
  "acceptedAt": "Mon, 11 Sep 2023 11:45:54 GMT"
}

The cookie is like a ticket which indicates entry to the waiting room. The bucketId indicates which cluster of users this user is part of. The acceptedAt time and lastCheckInTime indicate when the last interaction with the workers took place. This information determines whether the ticket is still valid for entry when compared against the session duration value the customer set while configuring the waiting room. If the cookie is valid, we let the user through, ensuring that users who are on the website can continue browsing it. If the cookie is invalid, we create a new cookie and treat the user as a new user; if there is queueing happening on the website, they go to the back of the queue. In the next section, let us see how we decide when to queue those users.
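
A simplified sketch of that validity check, assuming the cookie fields shown above and a session duration given in minutes:

// Sketch: the ticket is valid if the last check-in is within the session duration
function isCookieValid(cookie, sessionDurationMinutes, now = Date.now()) {
  const lastCheckIn = Date.parse(cookie.lastCheckInTime);
  return now - lastCheckIn <= sessionDurationMinutes * 60 * 1000;
}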

To understand this further, let's see what the contents of the waiting room state are. For the customer we discussed above, at the time "Mon, 11 Sep 2023 11:45:54 GMT", the state could look like this.

{
  "activeUsers": 50
}

As mentioned above, the customer’s configuration has both new users per minute and total active users set to 200.

So the state indicates that there is space for new users, as there are only 50 active users when it's possible to have 200, leaving room for another 150 users. Let's assume those 50 users came from two data centers: San Jose (20 users) and London (30 users). We also keep track of the number of workers that are active across the globe, as well as the number of workers active at the data center in which the state is calculated. The state key below could be the one calculated at San Jose.

{  
  "activeUsers": 50,
  "globalWorkersActive": 10,
  "dataCenterWorkersActive": 3,
  "trafficHistory": {
    "Mon, 11 Sep 2023 11:44:00 GMT": {
       San Jose: 20/200, // 10%
       London: 30/200, // 15%
       Anywhere: 150/200 // 75%
    }
  }
}

Imagine at the time "Mon, 11 Sep 2023 11:45:54 GMT", we get a request to that waiting room at a data center in San Jose.

To see if the user that reached San Jose can go to the origin, we first check the traffic history for the past minute to see the distribution of traffic at that time. This is because a lot of websites are popular in certain parts of the world, and for many of these websites the traffic tends to come from the same data centers.

Looking at the traffic history for the minute "Mon, 11 Sep 2023 11:44:00 GMT", we see San Jose had 20 of the 200 users going there (10%) at that time. For the current time "Mon, 11 Sep 2023 11:45:54 GMT", we divide the slots available at the website in the same ratio as the traffic history of the past minute. So we can send 10% of the 150 available slots from San Jose, which is 15 users. We also know that there are three active workers, as "dataCenterWorkersActive" is 3.

The number of slots available for the data center is divided evenly among the workers in that data center, so every worker in San Jose can send 15/3 users to the website. If the worker that received the traffic has not sent any users to the origin during the current minute, it can send up to five users (15/3).
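
Putting those numbers into a sketch (a simplification of the logic described above):

// State at San Jose for "Mon, 11 Sep 2023 11:45:54 GMT"
const totalActiveUsers = 200;        // customer-configured limit
const activeUsers = 50;              // from the waiting room state
const dataCenterWorkersActive = 3;
const sanJoseShare = 20 / 200;       // 10% of last minute's traffic came via San Jose

const slotsAvailable = totalActiveUsers - activeUsers;                        // 150
const dataCenterSlots = slotsAvailable * sanJoseShare;                        // 15
const perWorkerSlots = Math.floor(dataCenterSlots / dataCenterWorkersActive); // 5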

At the same time ("Mon, 11 Sep 2023 11:45:54 GMT"), imagine a request goes to a data center in Delhi. The worker at the data center in Delhi checks the trafficHistory and sees that there are no slots allotted for it. For traffic like this, we have reserved the Anywhere slots, as we are still really far away from the limit.

{  
  "activeUsers":50,
  "globalWorkersActive": 10,
  "dataCenterWorkersActive": 1,
  "trafficHistory": {
    "Mon, 11 Sep 2023 11:44:00 GMT": {
       San Jose: 20/200, // 10%
       London: 30/200, // 15%
       Anywhere: 150/200 // 75%
    }
  }
}

The Anywhere slots are divided among all the active workers in the globe, as any worker around the world can take a part of this pie: 75% of the remaining 150 slots, which is 113.

The state key also keeps track of the number of workers (globalWorkersActive) that have spawned around the world. The Anywhere slots allotted are divided among all the active workers in the world, if available. globalWorkersActive is 10 when we look at the waiting room state, so every active worker can send as many as 113/10, which is approximately 11 users. So the first 11 users that come to a worker in the minute Mon, 11 Sep 2023 11:45:00 GMT get admitted to the origin, and the extra users get queued. The extra reserved slots (5) in San Jose for the minute Mon, 11 Sep 2023 11:45:00 GMT discussed before ensure that we can admit up to 16 (5 + 11) users from a worker in San Jose.
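
Continuing the sketch for the Anywhere slots shared by every worker in the world:

const slotsAvailable = 150;      // 200 limit minus 50 active users, as above
const anywhereShare = 150 / 200; // 75%, from the traffic history
const globalWorkersActive = 10;

const anywhereSlots = Math.round(slotsAvailable * anywhereShare);               // 113
const perWorkerAnywhereSlots = Math.floor(anywhereSlots / globalWorkersActive); // 11
// A worker in San Jose can admit up to 5 local + 11 Anywhere = 16 users this minute.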

Queuing at the worker level can cause users to get queued before the data center's available slots are used up

As we can see from the example above, we decide whether to queue or not at the worker level. The number of new users that go to workers around the world can be non-uniform. To understand what can happen when there is non-uniform distribution of traffic to two workers, let us look at the diagram below.

[Diagram: a non-uniform distribution of new users across two workers in the same data center.]

Imagine the slots available for a data center in San Jose are ten and there are two workers running there. Seven users go to worker1 and one user goes to worker2. In this situation, worker1 will let five of the seven users through to the website and queue the other two, as worker1 only has five slots available. The one user who shows up at worker2 also gets to go to the origin. So we queue two users, when in reality the San Jose data center could have sent ten users and only eight showed up.

This issue of dividing slots evenly among workers results in queueing before a waiting room’s configured traffic limits are reached, typically within 20-30% of the limits set. This approach has advantages, which we will discuss next. We have since made changes to decrease the frequency with which queuing occurs outside that 20-30% range, queuing as close to the limits as possible while still ensuring Waiting Room is prepared to catch spikes. Later in this blog, we will cover how we achieved this by updating how we allocate and count slots.

What is the advantage of workers making these decisions?

The example above talked about how workers in San Jose and Delhi make decisions to let users through to the origin. The advantage of making decisions at the worker level is that we can do so without adding significant latency to the request. To make the decision, there is no need to leave the data center to get information about the waiting room, as we are always working with the state that is currently available in the data center. The queueing starts when the slots run out within the worker. The absence of added latency enables customers to keep the waiting room on all the time without worrying about extra latency for their users.

Waiting Room’s number one priority is to ensure that customers’ sites remain up and running at all times, even in the face of unexpected and overwhelming traffic surges. To that end, it is critical that a waiting room stays near or below the traffic limits set by the customer for that room. When a spike happens at one data center, say San Jose, the local state at that data center will take a few seconds to reach Delhi.

Splitting the slots among workers ensures that working with slightly outdated data does not cause the overall limit to be exceeded by an impactful amount. For example, the activeUsers value can be 26 in the San Jose data center and 100 in the other data center where the spike is happening. At that point in time, sending extra users in from Delhi may not overshoot the overall limit by much, as Delhi only has a part of the pie to start with. Therefore, queueing before overall limits are reached is part of the design, to make sure your overall limits are respected. In the next section, we will cover the approaches we implemented to queue as close to the limits as possible without increasing the risk of exceeding traffic limits.

Allocating more slots when traffic is low relative to waiting room limits

The first case we wanted to address was queuing that occurs when traffic is far from limits. While rare, and typically lasting one refresh interval (20s) for the end users who are queued, this was our first priority when updating our queuing algorithm. To solve this, when allocating slots we looked at the utilization (how far you are from the traffic limits) and allotted more slots when traffic was really far away from the limits. The idea was to prevent the queueing that happens at low utilization while still being able to readjust the slots available per worker when there are more users on the origin.

To understand this, let's revisit the example where there is a non-uniform distribution of traffic to two workers. Two workers similar to the ones we discussed before are shown below. In this case, the utilization is low (10%), meaning we are far from the limits. So the slots allocated (8) are closer to the slotsAvailable for the San Jose data center, which is 10. As you can see in the diagram below, all eight users that go to either worker get to reach the website with this modified slot allocation, as we provide more slots per worker at lower utilization levels.

[Diagram: with utilization-based allocation at 10% utilization, each worker gets eight slots and all eight users across both workers are admitted.]

The diagram below shows how the slots allocated per worker change with utilization (how far away you are from the limits). As you can see, we allocate more slots per worker at lower utilization. As the utilization increases, the slots allocated per worker decrease, since we are getting closer to the limits and need to be better prepared for spikes in traffic. At 10% utilization, every worker gets close to the total slots available for the data center. As utilization approaches 100%, it approaches the slots available divided by the worker count in the data center.

[Graph: slots allocated per worker decreasing as utilization increases.]

How do we achieve more slots at lower utilization?

This section delves into the mathematics which helps us get there. If you are not interested in these details, meet us at the “Risk of over provisioning” section.

To understand this further, let's revisit the previous example where requests come to the Delhi data center. The activeUsers value is 50, so utilization is 50/200, which is 25%.

{
  "activeUsers": 50,
  "globalWorkersActive": 10,
  "dataCenterWorkersActive": 1,
  "trafficHistory": {
    "Mon, 11 Sep 2023 11:44:00 GMT": {
       San Jose: 20/200, // 10%
       London: 30/200, // 15%
       Anywhere: 150/200 // 75%
    }
  }
}

The idea is to allocate more slots at lower utilization levels. This ensures that customers do not see unexpected queueing behaviors when traffic is far away from limits. At time Mon, 11 Sep 2023 11:45:54 GMT requests to Delhi are at 25% utilization based on the local state key.

To allocate more slots at lower utilization, we added a workerMultiplier which moves proportionally with the utilization. At lower utilization the multiplier is lower, and at higher utilization it is close to one.

workerMultiplier = (utilization)^curveFactor
adaptedWorkerCount = actualWorkerCount * workerMultiplier

utilization – how far away from the limits you are.

curveFactor – an adjustable exponent which decides how aggressive we are with the distribution of extra budgets at lower worker counts. To understand this, let's look at how the graphs of y = x and y = x^2 behave between 0 and 1.

[Graph: y = x and y = x^2 for x between 0 and 1.]

The graph of y = x is a straight line passing through (0, 0) and (1, 1).

The graph of y = x^2 is a curve where y increases more slowly than x for x < 1, also passing through (0, 0) and (1, 1).

Using the concept of how these curves work, we derived the formula for workerMultiplier, where y = workerMultiplier, x = utilization, and curveFactor is the adjustable power which decides how aggressive we are with the distribution of extra budgets at lower worker counts. When curveFactor is 1, the workerMultiplier is equal to the utilization.

Let's come back to the example we discussed before and see how this plays out. At time Mon, 11 Sep 2023 11:45:54 GMT, requests to Delhi are at 25% utilization based on the local state key. The Anywhere slots are divided among all the active workers in the globe, as any worker around the world can take a part of this pie, i.e. 75% of the remaining 150 slots (113).

globalWorkersActive is 10 when we look at the waiting room state. In this case, we do not divide the 113 slots by 10, but instead by the adapted worker count, which is globalWorkersActive * workerMultiplier. If curveFactor is 1, the workerMultiplier equals the utilization, which is 25% or 0.25.

So effective workerCount = 10 * 0.25 = 2.5

So, every active worker can send as many as 113/2.5, which is approximately 45 users. The first 45 users that come to a worker in the minute Mon, 11 Sep 2023 11:45:00 GMT get admitted to the origin. The extra users get queued.
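
As a sketch, the adapted division looks like this:

function slotsPerWorker(slots, actualWorkerCount, utilization, curveFactor = 1) {
  const workerMultiplier = Math.pow(utilization, curveFactor);
  const adaptedWorkerCount = actualWorkerCount * workerMultiplier;
  return Math.floor(slots / adaptedWorkerCount);
}

slotsPerWorker(113, 10, 0.25); // 45 slots per worker at 25% utilization
slotsPerWorker(113, 10, 1.0);  // 11: at utilization 1.0 this reduces to slots / actualWorkerCount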

Therefore, at lower utilization (when traffic is farther from the limits), each worker gets more slots. But if the slots of all workers are added up, there is a higher chance of exceeding the overall limit.

Risk of over provisioning

The method of giving more slots at lower utilization decreases the chances of queuing when traffic is low relative to traffic limits. However, at lower utilization levels, a uniform spike happening around the world could cause more users to go to the origin than expected. The diagram below shows a case where this can be an issue. The slots available for the data center are ten. At the 10% utilization we discussed before, each worker can have eight slots. If eight users show up at one worker and seven show up at another, we will send fifteen users to the website when the data center only has ten slots available.

[Diagram: at low utilization, two over-provisioned workers admit fifteen users when the data center only has ten slots available.]

With the range of customers and types of traffic we have, we saw cases where this became a problem: a traffic spike from low utilization levels could overshoot the global limits. This is because we are over-provisioned at lower utilization, which increases the risk of significantly exceeding traffic limits. We needed a safer approach that would not cause limits to be exceeded while also decreasing the chance of queueing when traffic is low relative to traffic limits.

Taking a step back and thinking about our approach, one of our assumptions was that the traffic in a data center directly correlates with the worker count found in that data center. In practice, we found that this was not true for all customers. Even if the traffic correlates with the worker count, the new users going to the workers in the data center may not. This is because the slots we allocate are for new users, but the traffic a data center sees consists of both users who are already on the website and new users trying to get to it.

In the next section, we talk about an approach where worker counts are not used and workers instead communicate with other workers in the data center. For that, we introduced a new service: a Durable Object counter.

Decrease the number of times we divide the slots by introducing Data Center Counters

From the example above, we can see that overprovisioning at the worker level carries the risk of using more slots than are allotted for a data center. If we do not overprovision at low utilization, we risk queuing users well before their configured limits are reached, which we discussed first. So there has to be a solution that achieves both things.

The overprovisioning was done so that workers do not run out of slots quickly when an uneven number of new users reach a handful of workers. If there is a way for workers in a data center to communicate with each other, we do not need to divide slots among them based on worker count. For that communication to take place, we introduced counters. Counters are a set of small Durable Object instances that do the counting for a set of workers in a data center.

To understand how this avoids the use of worker counts, let's check the diagram below, in which two workers talk to a Data Center Counter. Just as before, the workers let users through to the website based on the waiting room state. Previously, the count of users let through was stored in the memory of each worker; with counters, it is kept in the Data Center Counter. Whenever a new user makes a request to a worker, the worker asks the counter for its current value. In the example below, for the first new request to the worker, the counter value received is 9. When a data center has 10 slots available, that means the user can go to the website. If the next worker receives a new user and makes a request just after that, it will get the value 10 and, based on the slots available, that user will get queued.

[Diagram: two workers requesting ticket values from a Data Center Counter to decide admission.]

The Data Center Counter acts as a point of synchronization for the workers in the waiting room. Essentially, this enables the workers to talk to each other without really talking to each other directly. This is similar to how a ticketing counter works. Whenever one worker lets someone in, it requests a ticket from the counter, so another worker requesting a ticket will not get the same ticket number. If the ticket value is valid, the new user gets to go to the website. So, when different numbers of new users show up at different workers, we will not over-allocate or under-allocate slots for a worker, as the number of slots used is tracked by the counter for the data center.
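
A minimal sketch of such a counter as a Durable Object (illustrative, not the production implementation):

// Hands out the next ticket number for this data center on each request.
export class DataCenterCounter {
  constructor(state) {
    this.state = state;
  }

  async fetch(request) {
    let ticket = (await this.state.storage.get("ticket")) || 0;
    await this.state.storage.put("ticket", ticket + 1);
    return new Response(JSON.stringify({ ticket }));
  }
}

// Worker side (sketch): admit the user if their ticket fits within the
// data center's available slots, otherwise queue them.
// const { ticket } = await (await counterStub.fetch(request.url)).json();
// const admitted = ticket < slotsAvailableForDataCenter;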

The diagram below shows the behavior when an uneven number of new users reach the workers: one gets seven new users and the other worker gets one new user. All eight users that show up at the workers get to the website, as eight is below the ten slots available for the data center.

[Diagram: eight users spread unevenly across two workers are all admitted, since the counter stays below the ten available slots.]

This also does not cause excess users to be sent to the website, as we stop sending users once the counter value equals the slotsAvailable for the data center. Of the fifteen users that show up at the workers in the diagram below, ten get to the website and five get queued, which is what we would expect.

[Diagram: of fifteen users, ten are admitted and five are queued once the counter reaches the ten available slots.]

The risk of overprovisioning at lower utilization also no longer exists, as the counters help the workers communicate with each other.

To understand this further, let's look at the previous example we talked about and see how it works with the actual waiting room state.

The waiting room state for the customer is as follows.

{  
  "activeUsers": 50,
  "globalWorkersActive": 10,
  "dataCenterWorkersActive": 3,
  "trafficHistory": {
    "Mon, 11 Sep 2023 11:44:00 GMT": {
       San Jose: 20/200, // 10%
       London: 30/200, // 15%
       Anywhere: 150/200 // 75%
    }
  }
}

The objective is to not divide the slots among workers, so that we don’t need to use that information from the state. At time Mon, 11 Sep 2023 11:45:54 GMT, requests come to San Jose. So, we can send 10% of the 150 available slots from San Jose, which is 15.

The Durable Object counter at San Jose keeps returning its current counter value for every new user that reaches the data center, incrementing the value by 1 after returning it to a worker. So the first 15 new users that come to the workers each get a unique counter value, and if the value received for a user is less than 15, that user gets to use one of the data center's slots.

Once the slots available for the data center run out, users can make use of the Anywhere slots, as these are not reserved for any particular data center. Once a worker in San Jose gets a ticket value of 15, it realizes that it's not possible to get to the website using San Jose's slots.

The Anywhere slots are available to all the active workers in the globe, i.e. 75% of the remaining 150 slots (113). The Anywhere slots are handled by a Durable Object that workers from different data centers can talk to when they want to use them. Even if 128 (113 + 15) users end up going to the same worker for this customer, we will not queue them. This increases Waiting Room's ability to handle an uneven number of new users arriving at workers around the world, which in turn helps customers queue close to their configured limits.
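
Putting the two tiers together, the admission decision could be sketched as follows (counter.next() is a hypothetical helper wrapping the ticket request shown earlier):

async function admitOrQueue(localCounter, anywhereCounter, localSlots, anywhereSlots) {
  // First try the slots reserved for this data center...
  if ((await localCounter.next()) < localSlots) return "admit";
  // ...then fall back to the globally shared Anywhere slots.
  if ((await anywhereCounter.next()) < anywhereSlots) return "admit";
  return "queue";
}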

Why do counters work well for us?

When we built Waiting Room, we wanted the decisions for entry into the website to be made at the worker level itself, without talking to other services while the request is in flight to the website. We made that choice to avoid adding latency to user requests. By introducing a synchronization point at a Durable Object counter, we deviate from that by introducing a call to the counter.

However, the Durable Object for a data center stays within that same data center, leading to minimal additional latency, usually less than 10 ms. For calls to the Durable Object that handles the Anywhere slots, the worker may have to cross oceans and long distances, which could push the latency to around 60 or 70 ms. The 95th percentile values shown below are higher because of calls that go to farther data centers.

[Chart: latency distribution for counter calls; the 95th percentile is higher due to calls to distant data centers.]

The design decision to add counters adds slight extra latency for new users going to the website. We deemed the trade-off acceptable because it reduces the number of users that get queued before limits are reached. In addition, the counters are only required while new users are trying to get in. Once users reach the origin, they get entry directly from the workers, as the proof of entry is available in the cookie they come with, and we can let them in based on that.

Counters are really simple services which do simple counting and nothing else, which keeps their memory and CPU footprint minimal. Moreover, we have a lot of counters around the world, each handling the coordination between a subset of workers. This helps the counters successfully handle the synchronization load from the workers. These factors add up to make counters a viable solution for our use case.

Summary

Waiting Room was designed with our number one priority in mind: to ensure that our customers’ sites remain up and running, no matter the volume or ramp-up of legitimate traffic. Waiting Room runs on every server in Cloudflare’s network, which spans over 300 cities in more than 100 countries. We want to make sure that, for every new user, the decision whether to go to the website or the queue is made with minimal latency and at the right time. This decision is a hard one, as queuing too early at a data center can cause us to queue before the customer-set limits are reached, while queuing too late can cause us to overshoot them.

With our initial approach, where we divided slots among our workers evenly, we sometimes queued too early but were pretty good at respecting customer-set limits. Our next approach of giving more slots at low utilization (traffic levels low compared to customer limits) did better in the cases where we queued earlier than the customer-set limits, as every worker had more slots to work with. But, as we have seen, this made us more likely to overshoot when a sudden spike in traffic occurred after a period of low utilization.

With counters, we get the best of both worlds by avoiding the division of slots by worker counts. Using counters, we can ensure that we do not queue too early or too late relative to the customer-set limits. This comes at the cost of a little latency on every request from a new user, which we have found to be negligible and a better user experience than getting queued early.

We keep iterating on our approach to make sure we are always queuing people at the right time and, above all, protecting your website. As more and more customers use Waiting Room, we learn more about different types of traffic, and that is helping us make the product better for everyone.

Understand the impact of your waiting room’s settings with Waiting Room Analytics

Post Syndicated from Arielle Olache original http://blog.cloudflare.com/understand-the-impact-of-your-waiting-rooms-settings-with-waiting-room-analytics/

In January 2021, we gave you a behind-the-scenes look at how we built Waiting Room on Cloudflare’s Durable Objects. Today, we are thrilled to announce the launch of Waiting Room Analytics and tell you more about how we built this feature. Waiting Room Analytics offers insights into end-user experience and provides visualizations of your waiting room traffic. These new metrics enable you to make well-informed configuration decisions, ensuring an optimal end-user experience while protecting your site from overwhelming traffic spikes.

If you’ve ever bought tickets for a popular concert online you’ll likely have been put in a virtual queue. That’s what Waiting Room provides. It keeps your site up and running in the face of overwhelming traffic surges. Waiting Room sends excess visitors to a customizable virtual waiting room and admits them to your site as spots become available.

While customers have come to rely on the protection Waiting Room provides against traffic surges, they have faced challenges analyzing their waiting room’s performance and impact on end-user flow. Without feedback about waiting room traffic as it relates to waiting room settings, it was challenging to make Waiting Room configuration decisions.

Up until now, customers could only monitor their waiting room's status endpoint to get a general idea of waiting room traffic. This endpoint displays the current number of queued users, active users on the site, and the estimated wait time shown to the last user in line.
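
A status endpoint response looks roughly like this (a sketch; field names and values are illustrative):

{
  "status": "queueing",
  "event_id": "",
  "estimated_queued_users": 1500,
  "estimated_total_active_users": 200,
  "max_estimated_time_minutes": 8
}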

The status endpoint is still a great tool for an at-a-glance understanding of the near real-time status of a waiting room. However, there were many questions customers had about their waiting rooms that were either difficult or impossible to answer using the status endpoint, such as:

  • How long did visitors wait in the queue?
  • What was my peak number of visitors?
  • How long was the pre-queue for my online event?
  • How did changing my waiting room's settings impact wait times?

Today, Waiting Room is ready to answer those questions and more with the launch of Waiting Room Analytics, available in the Waiting Room dashboard to all Business and Enterprise plans! We will show you the new waiting room metrics available and review how these metrics can help you make informed decisions about your waiting room's settings. We'll also walk you through the unique challenge of how we built Waiting Room Analytics on our distributed network.

How Waiting Room settings impact traffic

Before covering the newly available Waiting Room metrics, let's review some key settings you configure when creating a waiting room. Understanding these settings is essential as they directly impact your waiting room's analytics, traffic, and user experience.

When configuring a waiting room, you will first define traffic limits to your site by setting two values–Total active users and New users per minute. Total active users is a target threshold for how many simultaneous users you want to allow on the pages covered by your waiting room. Waiting Room will kick in as traffic ramps up to keep active users near this limit. The other value which will control the volume of traffic allowed past your waiting room is New users per minute. This setting defines the target threshold for the maximum rate of user influx to your application. Waiting Room will kick in when the influx accelerates to keep this rate near your limits. Queuing occurs when traffic is at or near your New users per minute or Total active users target values.

The two other settings which impact your traffic flow and user wait times are session duration and session renewal. The session duration setting determines how long it takes for end-user sessions to expire, thereby freeing up spots on your site. If you enable session renewal, users can stay on your site as long as they want, provided they make a request at least once every session_duration minutes. If you disable session renewal, users' sessions will expire once the session_duration you set has run out. After the session expires, the user is issued a new waiting room cookie upon their next request. If there is active queueing, this user is placed at the back of the queue; otherwise, they can continue browsing for another session_duration minutes.

Let's walk through the new analytics available in the Waiting Room dashboard, which allows you to see how these settings can impact waiting room throughput, how many users get queued, and how long users wait to enter your site from the queue.

Waiting Room Analytics in the dash

To access metrics for a waiting room, navigate to the Waiting Room dashboard, where you can find pre-built visualizations of your waiting room traffic. The dashboard offers at-a-glance metrics for the peak waiting room traffic over the last 24 hours.

To dig deeper and analyze up to 30 days of historical data, open your waiting room's analytics by selecting View more.

Alternatively, we've made it easy to home in on analytics for a past waiting room event (within the last 30 days). You can automatically open the analytics dashboard to a past event's exact start and end time, including the pre-queueing period, by selecting the blue link in the events table.

User insights

The first two metrics–Time in queue and Time on origin–provide insights into your end-users' experience and behavior. The time in queue values help you understand how long queued users waited before accessing your site over the selected time period. The time on origin values shed light on end-user behavior by displaying an estimate of the range of time users spend on your site before leaving. If session renewal is disabled, this time will max out at session_duration and reflect the time at which users are issued a new waiting room cookie. For both metrics, we provide times for the typical user, represented by the range between the 50th and 75th percentiles, as well as for the top 5% of users who spend the longest in the queue or on your site.

If session renewal is disabled, keeping an eye on Time on origin values is especially important. When sessions do not renew, once a user's session has expired, they are given a new waiting room cookie upon their next request. The user will be put at the back of the line if there is active queueing. Otherwise, they will continue browsing, but their session timer will start over, and your analytics will never show a Time on origin greater than your configured session duration, even if individual users are on your site longer than that. If session renewal is disabled and the typical time on origin is close to your configured session duration, this could be an indicator that you need to give your users more time to complete their journey before putting them back in line.

Analyze past waiting room traffic

Scrolling down the page, you will find visualizations of your user traffic compared to your waiting room's target thresholds for Total active users and New users per minute. These two settings determine when your waiting room will start queueing as traffic increases. The Total active users setting controls the number of concurrent users on your site, while the New users per minute threshold restricts the flow rate of users onto your site.

To zoom in on a time period, drag your cursor from the left to the right of the period you are interested in; the other graphs, along with the insights, will update to reflect this time period.

On the Active users graph, each bar represents the maximum number of queued users stacked on top of the maximum number of users on your site at that point in time. The example below shows how the waiting room kicked in at different times relative to the active user threshold. The total length of each bar shows how many users were either on the site or waiting to enter it, with a clear divide between the two values where the active user threshold took effect. Hover over any bar to display a tooltip with the exact values for the period you are interested in.

Easily identify peak traffic and when the waiting room started queuing to protect your site from a traffic surge.

Below the Active users chart is the New users per minute graph, which shows the rate of users entering your application per minute compared to your configured threshold. Make sure to review this graph to identify any surges in the rate of users to your application that may have caused queueing.

The New users per minute graph helps you identify peaks in the rate of users entering your site which triggered queueing.

This graph shows queued user and active user data from the same time period as the spike seen in the New users per minute graph above. When analyzing your waiting room’s metrics, be sure to review both graphs to understand which Waiting Room traffic setting triggered queueing and to what extent.

Adjusting settings with Waiting Room Analytics

By leveraging the insights provided by the analytics dashboard, you can fine-tune your waiting room settings while ensuring a safe and seamless user experience. A common concern for customers is longer-than-desired wait times during high-traffic periods. We will walk through some guidelines for evaluating peak traffic and the setting adjustments that could be made.

Identify peak traffic. The first step is to identify when peak traffic occurred. To do so, zoom out to 30 days or some time period inclusive of a known high-traffic event. Reviewing the graph, locate a period where traffic peaked and use your cursor to highlight from the left to the right of the peak. This will zoom in to that time period, updating all other values on the analytics page.

Evaluate wait times. Now that you have homed in on the time period of peak traffic, review the Time in queue metric to analyze whether the wait times during peak traffic were acceptable. If you determine that wait times were significantly longer than you had anticipated, consider the following options to reduce wait times for your next traffic peak.

Decrease session duration when session renewal is enabled. This is a safe option as it does not increase the allowed load to your site. By decreasing this duration, you decrease the amount of time it takes for spots to open up as users go idle. This is a good option if your customer journey is typically request-heavy, such as a checkout flow. For other situations, such as video streaming or long-form content viewing, this may not be a good option, as users may not make frequent requests even though they are not actually idle.

Disable session renewal. This option also does not increase the allowed load to your site. Disabling session renewal means that users will have session_duration minutes to stay on the site before being put back in the queue. This option is popular for high-demand events such as product drops, where customers want to give as many users as possible a fair chance to participate and avoid inventory hoarding. When disabling session renewal, review your waiting room’s analytics to determine an appropriate session duration to set.

The Time on origin values will give you an idea of how long users need before leaving your site. Consider a case where the session duration is set to 10 minutes, but even the users who stay the longest spend only around five minutes on the site. With session renewal disabled, this customer could reduce wait times by decreasing the session duration to five minutes without disrupting most users, allowing more users to get access.

Adjust Total active users or New users per minute settings. Lastly, you can decrease wait times by increasing your waiting room’s traffic limits, Total active users or New users per minute. This is the most sure-fire way to reduce wait times, but it also requires more consideration. Before increasing either limit, you will need to evaluate whether it is safe to do so and make small, iterative adjustments, monitoring certain signals to ensure your origin can still handle the load. A few things to consider monitoring as you adjust settings are origin CPU and memory utilization and increases in 5xx errors, which can be reviewed in Cloudflare’s Web Analytics tab. Analyze historical traffic patterns during similar events or periods of high demand. If you observe that previous traffic surges were successfully managed without site instability or crashes, it is a strong signal that you can consider increasing waiting room limits.

Utilize the Active user chart as well as the New users per minute chart to determine which limit is primarily responsible for triggering queuing so that you know which limit to adjust. After considering these signals and making adjustments to your waiting room’s traffic limits, closely monitor the impact of these changes using the waiting room analytics dashboard. Continuously assess the performance and stability of your site, keeping an eye on server metrics and user feedback.

How we built Waiting Room Analytics

At Cloudflare, we love to build on top of our own products. Waiting Room is built on Workers and Durable Objects. Workers auto-scale with the request rate: because they are built on isolates, hundreds or thousands of them can spin up within 5 milliseconds without much overhead. Every request to an application behind a waiting room goes through a Worker.

We optimized the way we track end users visiting the application while maintaining their position in the queue. Tracking every user individually would incur more overhead in state, CPU, and memory. Instead, we divide users into buckets based on the timestamp of their first request. For example, all users who visited a waiting room for the first time between 00:00:00 and 00:00:59 are assigned to the bucket 00:00:00. Every end user gets a unique encrypted cookie on their first request to the waiting room, and its contents are updated as the user's status in the queue changes. Once the end user is accepted to the origin, we set the cookie expiry to session_duration minutes (configurable in the dashboard) from the last request timestamp. In the cookie, we also track the timestamp of when the end user joined the queue, which is used to calculate time waited in queue.
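As a rough sketch of this bucketing scheme, truncating the first request's timestamp to the start of its minute yields the bucket key (the helper below is illustrative, not Waiting Room's actual code):

// Group a user into a minute bucket based on their first request time.
function bucketIdFor(firstRequest: Date): string {
  const bucket = new Date(firstRequest.getTime());
  bucket.setUTCSeconds(0, 0); // truncate to the start of the minute
  return bucket.toUTCString(); // e.g. "Thu, 27 May 2021 15:54:00 GMT"
}

A user arriving at 00:00:10 and one arriving at 00:00:45 share the same bucket, so the queue only needs per-minute counts rather than per-user state.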

Collection of metrics in a distributed environment

In a distributed environment, the challenge when building analytics is to collect data from multiple nodes and aggregate it. Every worker in every data center sees user requests and needs to report metrics based on them to a coordinating service in that data center. The data aggregation could have been done in two ways.

i) Writing data from every worker when a request is received
In this design, every worker that receives a request is responsible for reporting metrics for it. This would let us write raw data to our analytics pipeline for every request, without the overhead of aggregating it first. However, Waiting Room's configurations are minute- and user-based: every user is put into a bucket based on the timestamp of their first request. Raw per-request data from workers is tied to neither time buckets nor users, so it would not map cleanly onto the model customers configure.

ii) Using the existing aggregation pipeline
Waiting Room is designed in such a way that we do not track every request; instead, we group users into buckets based on the first time we saw them. This is tracked in the cookie that is issued to the user when they make their first request. In the current system, the data reported by the workers is sent upstream to the data center coordinator, which is responsible for aggregating the data seen from all the workers in that particular data center. This aggregated data is then further processed and sent upstream to the global Durable Object, which aggregates the data from all the other data centers. This data is used to decide whether to queue a user or send them to the origin.

We decided to use the existing pipeline that powers Waiting Room routing for analytics as well. Data aggregated this way provides more value to customers because it matches the model we use for routing decisions, so customers can see directly how changing their settings affects waiting room behavior. It also saves space: instead of writing analytics data per request, we write a pre-processed, aggregated analytics log every minute, making the data much less noisy.
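Conceptually, every hop in this pipeline, workers to data center coordinator to global Durable Object, performs the same merge of per-minute bucket counts. A simplified TypeScript sketch with assumed data shapes:

// Assumed shape: queued-user counts keyed by minute bucket.
type BucketCounts = Map<string, number>;

// Merge reports from many workers (or many data centers) into one view.
function mergeReports(reports: BucketCounts[]): BucketCounts {
  const merged: BucketCounts = new Map();
  for (const report of reports) {
    for (const [bucketId, count] of report) {
      merged.set(bucketId, (merged.get(bucketId) ?? 0) + count);
    }
  }
  return merged;
}

Because the merge is associative, partial aggregates can be combined in any order, which is what makes a single aggregated log line per minute possible.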

This diagram depicts multiple workers in different locations receiving requests and reporting to their respective data center coordinators, which aggregate the data and send the aggregated keys upstream to the global Durable Object. The global Durable Object further aggregates all the keys received from the data center coordinators to compute a global aggregate key.

Histograms

The metrics available via Cloudflare’s GraphQL API are a combination of configured values set by the customer when creating a waiting room and values computed from the traffic a waiting room sees. Waiting Room aggregates data every minute for each metric based on the requests it sees. While some metrics, like new users per minute and total active users, are counts that can be pre-processed and aggregated with a simple summation, metrics like time on origin and total time waited in queue cannot simply be added together into a single value.

For example, there could be users who waited in the queue for four minutes alongside a new user who joined the queue two minutes ago. Simply summing these two data points would not make sense: six minutes would be an incorrect representation of the total time waited in the queue. Therefore, to capture this value more accurately, we store the data in a histogram. This enables us to represent the typical case (50th percentile) and the approximate worst case (in this case, the 95th percentile) for that metric.

We decided to store time on origin and total time waited in queue in a histogram distribution so that we could represent the data and calculate quantiles precisely. We used HdrHistogram, a High Dynamic Range Histogram, for recording the data. HdrHistograms are scalable: we were able to record a dynamic number of values with auto-resizing without inflating CPU or memory usage or introducing latency. The time to record a value in an HdrHistogram ranges from 3 to 6 nanoseconds, and both querying and recording values run in constant time. Two histograms can simply be added together into a bigger histogram, and histograms can be compressed and encoded/decoded as base64 strings. This enabled us to scalably pass the data structure between our internal services for further aggregation.
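To illustrate why histograms suit this pipeline, here is a toy fixed-bucket histogram in TypeScript. It is far simpler than HdrHistogram, but it shows the two properties we rely on: merging is cheap element-wise addition, and quantiles fall out of a single scan:

// A toy one-second-resolution histogram; HdrHistogram is much more
// sophisticated, but merge and quantile work on the same principle.
class ToyHistogram {
  private counts = new Array<number>(1000).fill(0);
  private total = 0;

  record(valueSeconds: number): void {
    const i = Math.min(Math.max(0, Math.floor(valueSeconds)), this.counts.length - 1);
    this.counts[i]++;
    this.total++;
  }

  // Element-wise addition: partial histograms from many workers can be
  // combined in any order before computing quantiles.
  merge(other: ToyHistogram): void {
    other.counts.forEach((c, i) => { this.counts[i] += c; });
    this.total += other.total;
  }

  quantile(q: number): number {
    if (this.total === 0) return 0;
    const target = q * this.total;
    let seen = 0;
    for (let i = 0; i < this.counts.length; i++) {
      seen += this.counts[i];
      if (seen >= target) return i; // bucket lower bound, in seconds
    }
    return this.counts.length - 1;
  }
}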

The memory footprint of an HdrHistogram is constant and depends on the bucket size, precision, and range of the histogram. We use the default 32-bit bucket size, and the precision is set to three significant digits. The histogram has a dynamic range of values enabled through auto-resizing. However, to ensure efficient data storage, we imposed a limit of 5,000 recorded points per minute. Although this limit was chosen somewhat arbitrarily, it has proven adequate for the data points transmitted from the workers to the Durable Objects on a minute-by-minute basis.

Requests to a website behind a waiting room go to the Cloudflare data center closest to the user. Our workers around the world record values in histograms, which are compressed and sent to the data center Durable Object periodically. The histograms from multiple workers are uncompressed and aggregated into a single histogram per data center. The resulting histogram is compressed and sent upstream to the global Durable Object, where the histograms from all data centers receiving traffic are uncompressed and aggregated. The resulting histogram is the final data structure used for statistical analysis, and we query it directly for quantile values. The histogram objects are instantiated once at the start of the service and reset after every successful sync with the upstream service.

Writing data to the pipeline

The global Durable Object aggregates all the metrics, computes quantiles and sends the data to a worker which is responsible for analytics reporting. This worker reads data from Workers KV in order to get the Waiting Room configurations. All the metrics are aggregated into a single analytics message. These messages are written every minute to Clickhouse. We leveraged an internal version of Workers Analytics Engine in order to write the data. This allowed us to quickly write our logs to Clickhouse with minimum interactions with all the systems involved in the pipeline.

We write analytics events from the runtime in the form of blobs and doubles with a specific schema, and the event data gets written to a Clickhouse cluster. We extract the data into a Clickhouse view and apply ABR to facilitate fast queries at any timescale; you can expand the time range from 30 minutes to 30 days without any lag. ABR adaptively chooses the resolution of data based on the query. For example, it chooses a lower resolution for a long time range and vice versa. As of now, the analytics data is available in the Clickhouse table for 30 days, meaning you cannot query data older than 30 days, in the dashboard or otherwise.

Sampling

Waiting Room Analytics samples data in order to run large queries efficiently while providing consistent response times. Indexing the data on waiting room id has enabled quicker, more efficient scans, but we still need to handle unbounded data gracefully. To tackle this, we use Adaptive Bit Rate (ABR), which lets us write the data at multiple resolutions (100%, 10%, 1%, and so on) and then read the best available resolution. The sample rate "adapts" based on how long the query takes to run: if the query takes too long at 100% resolution, the next resolution is picked, and so on, until the first successful result. However, since we pre-process and aggregate data before writing to the pipeline, we expect 100% resolution on reads for shorter periods of time (up to seven days). For longer time ranges, the data will be sampled.
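A simplified sketch of that read-side fallback, assuming a runQuery helper that rejects when a query exceeds its time budget:

// Try progressively coarser sample rates until a query completes in time.
const RESOLUTIONS = [1.0, 0.1, 0.01]; // 100%, 10%, 1% of rows

async function queryWithABR<T>(
  runQuery: (sampleRate: number) => Promise<T>,
): Promise<T> {
  for (const rate of RESOLUTIONS) {
    try {
      return await runQuery(rate); // assumed to reject on timeout
    } catch {
      // Too slow at this resolution; fall back to a coarser sample.
    }
  }
  throw new Error("query failed at every resolution");
}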

Get Waiting Room Analytics via GraphQL API

Lastly, to make metrics available to customers and to the Waiting Room dashboard, we exposed the analytics data available in Clickhouse via GraphQL API.

If you prefer to build your own dashboards or systems based on waiting room traffic data, then Waiting Room Analytics via GraphQL API is for you. Build your own custom dashboards using the GraphQL framework and use a GraphQL client such as GraphiQL to run queries and explore the schema.

The Waiting Room Analytics dataset can be found under the Zones Viewer as waitingRoomAnalyticsAdaptive and waitingRoomAnalyticsAdaptiveGroups. You can filter the dataset by zone, waiting_room_id, and the request time period; see the dataset schema under ZoneWaitingRoomAnalyticsAdaptive. You can order the data by ascending or descending metric values.

You can explore the dimensions under waitingRoomAnalyticsAdaptiveGroups that can be used to group the data based on time, waiting room id, and so on. The "max", "min", "avg", and "sum" functions give the maximum, minimum, average, and sum of a metric aggregated over a time period. Additionally, the "avgWeighted" function calculates the weighted average of a metric. It is used for metrics stored in histograms, such as time spent on origin and total time waited, where a weighted average accounts for the distribution and significance of different data points, giving a more accurate representation than a simple average.

For example, to evaluate the weighted average for time spent on origin, the value of total active users is used as a weight. To better illustrate this concept, let’s consider an example. Imagine there is a website behind a Waiting Room and we want to evaluate the average time spent on the origin over a certain time period, let’s say an hour. During this hour, the number of active users on the website fluctuates. At some points, there may be more users actively browsing the site while at other times the number of active users might decrease. To calculate the weighted average for the time spent on the origin, we take into account the number of total active users at each instant in time. The rationale behind this is that the more users are actively using the website, the more representative their time spent on origin becomes in relation to the overall user activity.

By incorporating the total active users as weights in the calculation, we give more importance to the time spent on the origin during periods when there are more users actively engaging with the website. This provides a more accurate representation of the average time spent on the origin, accounting for variations in user activity throughout the designated time period.

The value of new users per minute is used as the weight to compute the weighted average for total time waited in queue. This is because, for the total time waited in the queue, the value of new users per minute at that instant matters most: it signifies the number of users who joined the queue and were subsequently admitted to the origin.
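In code, the weighted average is the familiar sum of value times weight divided by the sum of weights. A short sketch over assumed per-minute samples, using total active users as the weight for time on origin:

interface MinuteSample {
  timeOnOriginP50: number;  // per-minute 50th percentile, in seconds
  totalActiveUsers: number; // the weight for time on origin
}

function avgWeightedTimeOnOrigin(samples: MinuteSample[]): number {
  let weightedSum = 0;
  let weightSum = 0;
  for (const s of samples) {
    weightedSum += s.timeOnOriginP50 * s.totalActiveUsers;
    weightSum += s.totalActiveUsers;
  }
  // Minutes with more active users pull the average toward their values.
  return weightSum === 0 ? 0 : weightedSum / weightSum;
}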

You can apply these aggregation functions to the list of metrics exposed under each function. However, if you just want the logs per minute for a time period, rather than the breakdown of the time period (minute, fifteen minutes, hours), you can remove the datetime dimension from the query. For a list of sample queries to get you started, refer to our dev docs.

Below is a query to calculate the average, maximum, and minimum of total active users, estimated wait time, total queued users, and session duration every fifteen minutes. It also calculates the weighted averages of time spent in queue and time spent on origin. The query is done at the zone level, and the response is returned in JSON format.
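The original code for this first query is not shown here; based on the schema conventions of the example that follows, it might look roughly like the sketch below. The metric field names and the fifteen-minute dimension are assumptions, so verify them against the ZoneWaitingRoomAnalyticsAdaptive schema in GraphiQL before use.

{
  viewer {
    zones(filter: {zoneTag: "example-zone"}) {
      waitingRoomAnalyticsAdaptiveGroups(
        limit: 96,
        filter: {datetime_geq: "2023-03-15T00:00:00Z", datetime_leq: "2023-03-15T12:00:00Z", waitingRoomId: "example-waiting-room-id"},
        orderBy: [datetimeFifteenMinutes_ASC]
      ) {
        # Field names below are illustrative; check the schema for exact names.
        avg { totalActiveUsers estimatedWaitTime totalQueuedUsers sessionDuration }
        max { totalActiveUsers estimatedWaitTime totalQueuedUsers sessionDuration }
        min { totalActiveUsers estimatedWaitTime totalQueuedUsers sessionDuration }
        avgWeighted { totalTimeWaitedP50 timeOnOriginP50 }
        dimensions { datetimeFifteenMinutes }
      }
    }
  }
}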

Following is an example query to find the weighted averages of time on origin (50th percentile) and total time waited (90th percentile) for a certain period and aggregate this data over one hour.

{
  viewer {
    zones(filter: {zoneTag: "example-zone"}) {
      waitingRoomAnalyticsAdaptiveGroups(limit: 10, filter: {datetime_geq: "2023-03-15T04:00:00Z", datetime_leq: "2023-03-15T04:45:00Z", waitingRoomId: "example-waiting-room-id"}, orderBy: [datetimeHour_ASC]) {
        avgWeighted {
          timeOnOriginP50
          totalTimeWaitedP90
        }
        dimensions {
          datetimeHour
        }
      }
    }
  }
}

Sample Response

{
  "data": {
    "viewer": {
      "zones": [
        {
          "waitingRoomAnalyticsAdaptiveGroups": [
            {
              "avgWeighted": {
                "timeOnOriginP50": 83.83,
                "totalTimeWaitedP90": 994.45
              },
              "dimensions": {
                "datetimeHour": "2023-05-24T04:00:00Z"
              }
            }
          ]
        }
      ]
    }
  },
  "errors": null
}

You can find more examples in our developer documentation.
Waiting Room Analytics is live and available to all Business and Enterprise customers, and we are excited for you to explore it! Don’t have a waiting room set up? Make sure your site is always protected from unexpected traffic surges. Try out Waiting Room today!

Introducing Waiting Room Bypass Rules

Post Syndicated from Arielle Olache original https://blog.cloudflare.com/waiting-room-bypass-rules/

Leveraging the power and versatility of Cloudflare’s Ruleset Engine, Waiting Room now offers customers more fine-tuned control over their waiting room traffic. Queue only the traffic you want to with Waiting Room Bypass Rules, now available to all Enterprise customers with an Advanced Purchase of Waiting Room.

Customers depend on Waiting Room for always-on protection from unexpected and overwhelming traffic surges that would otherwise bring their site down. Waiting Room places excess users in a fully customizable virtual waiting room, admitting new visitors dynamically as spots become available on a customer’s site. Instead of throwing error pages or delivering poorly-performing site pages, Waiting Room empowers customers to take control of their end-user experience during unmanageable traffic surges.

Take control of your customer experience with a fully customizable virtual waiting room.

Additionally, customers use Waiting Room Event Scheduling to manage user flow and ensure reliable site performance before, during, and after online events such as product restocks, seasonal sales, and ticket sales. With Event Scheduling, customers schedule changes to their waiting rooms’ settings and custom queuing page ahead of time, with options to pre-queue early arrivers and offload event traffic from their origins after the event has concluded.

As part of the simple, no-coding-necessary process for deploying a waiting room, customers specify a hostname and path combination, which defines the location of their waiting room on their site. When a site visitor makes a preliminary request to that hostname and path or any of its subpaths, they will be issued a waiting room cookie and placed in the queue if the waiting room is queuing at that time.

The hostname and path approach to defining the placement of a waiting room is intuitive and makes it easy to deploy a waiting room to a site. But many customers needed more granular control over which traffic and which parts of their site a waiting room covered under its configured hostname and path. Use cases for allowing specific traffic to bypass a waiting room were varied.

Examples included granting internal administrators site access at all times, never blocking internal user agents performing operations like synthetic monitoring, excluding specific subpaths or query strings, and queuing only traffic from certain countries. Given the diversity of our customers’ requests to exclude traffic from a waiting room’s coverage, we built a bypass feature that gives customers the versatility necessary to deploy waiting rooms aligned with their existing site architecture and use cases. With the release of Event Scheduling, Waiting Room customers could queue when they wanted to; now, we are excited to announce that customers have more flexibility to queue who and where they want with Waiting Room Bypass Rules.

Who gets queued is up to you

Waiting Room Bypass Rules allow customers to write expressions that define what traffic should bypass their waiting room. A waiting room will not apply to incoming requests matching conditions based on one or more available fields, like IP address, URI path, query string, and country. Bypass rules supersede all Waiting Room features; even in Queue-all mode, event pre-queuing, or any other waiting room state, traffic matching your enabled rules’ expressions will never be queued or issued a waiting room cookie. Waiting Room rules are created via the Waiting Room API or the Waiting Room dashboard using a familiar rule management interface found throughout the Cloudflare dashboard. Waiting Room rules are managed at the individual waiting room level for precise control over each waiting room’s traffic.

Use the familiar rule builder found throughout the Cloudflare dashboard to define which traffic should bypass your waiting room.

Manage bypass rules at the individual waiting room level for precise control over each waiting room’s traffic coverage.

Built with the Ruleset Engine

We love building Cloudflare products on top of Cloudflare products; we knew that the versatility we wanted to offer to our customers with regard to what traffic a waiting room should apply to would best be achieved by integrating with Cloudflare’s powerful Ruleset Engine. The Ruleset Engine provides the infrastructure for many of our highly customizable products, such as the new Origin Rules feature and our WAF Managed Rulesets, by providing a unified way to represent the concept of “rules” and an easy-to-integrate software library for consistently executing those rules. This allows all sorts of Cloudflare products to provide extremely similar rules capabilities, with the same language for defining conditions (which can grow quite complex!) and very similar APIs.

We’ve even found that some of Waiting Room’s product functionality can be best implemented on top of the Ruleset Engine; earlier this year we migrated some select core Waiting Room logic into the Ruleset Engine, implemented as rulesets that we now transparently deploy to every zone with a waiting room. This newly migrated implementation has been running for months, and the flexibility of the Ruleset Engine will make certain future Waiting Room features easier than ever to build.

The Ruleset Engine works on two major concepts: rulesets and rules. A ruleset is a list of rules that the engine executes in order. A rule consists of a condition and an action, with the action being executed if the condition evaluates to “true”.

Under the hood, when you create the first rule for a waiting room, we create a hidden ruleset attached to that waiting room. We put together a ruleset of our own which runs on every request to your zone, dispatching to your custom ruleset if a request matches the attached waiting room. This all sounds somewhat complicated, but don’t worry. You simply manage rules at the individual waiting room level, while we abstract away the complexity of the underlying rulesets.

The Waiting Room Bypass action’s core implementation is quite simple: when the condition on one of your bypass rules is true for a request, the bypass action simply clears the flag on the request that would have told the waiting room code to kick in. This way, the bypass action ensures that Waiting Room doesn’t touch your request and the request doesn’t affect your waiting room’s statistics or queuing.
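A minimal sketch of that behavior in TypeScript, using a hypothetical per-request context flag (Cloudflare's internal representation is more involved):

// Hypothetical per-request context; the real internal flag differs.
interface RequestContext {
  waitingRoomApplies: boolean; // set when the request matches a waiting room's hostname and path
}

function bypassWaitingRoom(ctx: RequestContext): void {
  // Clearing the flag means the waiting room logic never runs for this
  // request, so it is neither queued nor counted in waiting room metrics.
  ctx.waitingRoomApplies = false;
}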

Bypass rules in action

Creating a bypass rule is easy and requires no coding or application changes. Let’s walk through a couple of real-world scenarios to demonstrate how easy it is to deploy a bypass rule from the Waiting Room dashboard or API.

Set up an Administrative IP Bypass Rule via the Waiting Room Dashboard

As mentioned before, many customers wanted to ensure that their site administrators and other internal employees–which they identify by IP address–could access their site at all times, regardless of a waiting room’s queueing status. Allowing unrestricted access to specific internal employees is especially important before online events when customers pre-queue early site visitors ahead of an event’s start time. Without the ability to bypass the waiting room before the event starts, internal employees could not review their event pages in a production environment, adding friction and uncertainty to their review process. Let’s see how we can fix that with a Waiting Room bypass rule.

Before setting up your administrator bypass rule, you can create an IP list if you have more than a handful of IPs you’d like to bypass your waiting room. Then, you will need to configure a waiting room from the Waiting Room dashboard. From that waiting room’s expanded view, navigate to Manage rules, where you can create, disable, and delete rules for a waiting room.

Once you have created your waiting room, from that waiting room’s expanded view, select Manage rules to create Waiting Room bypass rules.

Using the rule builder, give your rule a descriptive name and build your expression. For this example, we will select “IP Source Address” from the Field drop-down and “is in list” from the Operator drop-down, and then select the name of the IP list we created earlier. Once we’ve built the expression, we can either save this rule as a draft and deploy it later or save the rule and deploy it now.

Allow site administrators to bypass your waiting room by creating a Waiting Room bypass rule via the Waiting Room dashboard.

Once saved, the rule will appear on this waiting room’s rules management page along with all other rules for this waiting room. All requests for which this rule’s expression evaluates to true will bypass the waiting room. Thus, any user with an IP address from this managed list will never be queued when this rule is enabled.

Enable or disable Waiting Room rules from an individual waiting room’s rule management dashboard.

For added oversight, there are indicators on the Waiting Room dashboard that clearly signal if a waiting room has any bypass rules enabled. Now that we have deployed the admin bypass rule, we can see from the Waiting Room table that there is an active rule for this waiting room.

Easily glean which waiting rooms have bypass rules active from the Waiting Room table.

Bypass rules via the Waiting Room API – path bypass example

Another common customer request now achievable using Waiting Room Bypass Rules is path exclusions. As mentioned previously, a waiting room applies to all requests hitting the hostname and path combination of the configured waiting room and any URLs under that path. For many Waiting Room customers, there were specific URLs, paths, or query strings that they did not want a waiting room to apply to under their configured hostname and path. There are various reasons why a customer would want to make exceptions like this but let’s consider the following use case to illustrate the utility of bypassing specific parts of a site or application.

Consider a movie ticketing platform that wants to protect its ticketing web application from purchasing surges due to blockbuster releases. They create a waiting room to cover their ticketing web app by placing it at ticketing.example.com/. After tickets are purchased, the ticketing platform sends movie-goers an email or a text which links back to a URL under ticketing.example.com/. The URL the user receives via text or email is: ticketing.example.com/myaccount/mobiletickets/userverified?ticketid=<ticketID>

This link opens a page in the user’s mobile browser containing a QR code that the movie theater will scan in place of a physical ticket. They want to ensure that the waiting room does not apply to customers trying to open their mobile tickets.

To do this, they would create a bypass rule via the Waiting Room API.

With a waiting room already configured at ticketing.example.com/, use the following API call to create a bypass rule:

curl -X POST "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/waiting_rooms/<ROOM_ID>/rules" \
-H "Authorization: Bearer <API_TOKEN>" \
-d '{
  "description": "ticket holders bypass waiting room",
  "expression": "ends_with(http.request.uri.path, \"/userverified\")",
  "action": "bypass_waiting_room"
}'

Let’s break down this call. First, there is the URL, which needs to be populated with the zone id and the waiting room id. Then we pass our API token in the Authorization header. This call will create a new Waiting Room rule that will be added after any existing rules for this waiting room.

Within the body of the call, first define the optional description parameter, which indicates the purpose of the rule for easy reference later.

"description": "ticket holders bypass waiting room"

Next, write the expression, which defines the exact traffic that should bypass the waiting room. In this example, customers using the direct link sent via text should bypass the waiting room. These direct links end with the path userverified, so we create an ends_with condition.

"expression": "ends_with(http.request.uri.path, \"/userverified\")

When creating a bypass rule for a path, query string, or URL, make sure the expression also excludes the subrequests that load assets on the pages covered by the waiting room. In this example, let’s assume we are hosting assets on a different subdomain not covered by the waiting room, so we do not need to include these subrequests in the expression.
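For instance, if the page assets in this example were instead served from the same hostname under a path such as /assets/, the bypass expression might combine both conditions (the /assets/ path is purely illustrative):

"expression": "ends_with(http.request.uri.path, \"/userverified\") or starts_with(http.request.uri.path, \"/assets/\")"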

Lastly, set the action parameter to bypass_waiting_room to indicate that this traffic should bypass the waiting room and deploy the rule. This customer’s waiting room now covers precisely the parts of their application they want to cover. Their waiting room will protect their web application from ticket purchasing traffic while ensuring that customers who need to display their mobile tickets at the movie theater can do so reliably without being placed in a Waiting Room queue.

With the addition of Waiting Room Bypass Rules, customers now have more flexibility to deploy waiting rooms on their terms, to cover exactly the traffic they want. For more on Waiting Room Bypass Rules and Waiting Room, check out our developer documentation.

Not using Cloudflare yet? Start now with our Business plan which includes one basic Waiting Room or contact us about our advanced Waiting Room with Event Scheduling, Customized Templates, and Waiting Room Bypass Rules available on our Enterprise plan.

Waiting Room Event Scheduling protects your site during online events

Post Syndicated from Arielle Olache original https://blog.cloudflare.com/waiting-room-event-scheduling/

You’ve got big plans for your ecommerce strategy in the form of online events — seasonal sales, open registration periods, product drops, ticket sales, and more. With all the hype you’ve generated, you’ll get a lot of site traffic, and that’s a good thing! With Waiting Room Event Scheduling, you can protect your servers from being overloaded during your event while delivering a user experience that is unique to the occasion and consistent with your brand. Available now to enterprise customers with an advanced Waiting Room subscription, Event Scheduling allows you to plan changes to your waiting room’s settings and custom queueing page ahead of time, ensuring flawless execution of your online event.

More than always-on protection

We launched Waiting Room to protect our customers’ servers during traffic spikes. Waiting Room sends excess visitors to a virtual queue during traffic surges, letting visitors in dynamically as spots become available on your site. By automatically queuing traffic that exceeds your site’s capacity, Waiting Room protects your origin servers and your customer experience. Additionally, the Waiting Room’s queuing page can be customized to match the look and feel of your site so that your users never feel as though they have left your application, ensuring a seamless customer experience.

While many of our customers use Waiting Room as an always-on failsafe against potential traffic spikes, some customers use Waiting Room to manage traffic during time-boxed online events. While these events can undoubtedly result in traffic spikes that Waiting Room safeguards against, they present unique challenges for our customers.

In the lifecycle of an online event, various stages of the event generally require the settings of a waiting room to be updated. While each customer’s event requirements are unique, consider the following customer use cases. To prevent a mad rush to a landing page that could overwhelm their site ahead of an event, some customers want to queue early arrivers in the days or hours leading up to the event. During an event, some customers want to impose stricter limits on how long visitors have to browse and complete transactions to ensure that as many visitors as possible get a fair chance to partake. After an event has concluded, many customers want to offload event traffic, blocking access to the event pages while informing users that the event has ended.

For each of these scenarios, our customers want to communicate expectations to their users and customize the look and feel of their queuing page to ensure a seamless, on-brand user experience. Combine all the use cases in the example above into one timeline, and you’ve got at least three event stages that would require waiting room settings and queuing pages to be updated, all with perfect timing.

While these use cases were technically feasible with Waiting Room, they required on-the-spot updates to its configuration settings. This strategy was not ideal or practical when customers needed to be absolutely sure their waiting room would update in lockstep with the timeline of their event. In short, many customers needed to schedule changes to the behavior of their waiting room ahead of time. We built Event Scheduling to give our customers flexibility and control over when and how their waiting room settings change, ensuring that these changes will happen automatically as planned.

Introducing Waiting Room Event Scheduling

With Event Scheduling, you can schedule cascading changes to your waiting room ahead of time as discrete events. For each waiting room event, you can customize traffic thresholds, session duration, queuing method, and the content and styling of your queuing page. Refer to the Create Event API documentation for a complete list of customizable event settings.

New Queuing Methods

Giving our customers the ability to schedule changes to a waiting room’s settings is a game-changer for customers with event-based Waiting Room requirements, but we didn’t stop there. We’ve also added two new queueing methods — Reject and Passthrough — to give our customers more options for controlling user flow before, during and after their online events.

In the example where customers wanted to offload site traffic after an event, the Reject queuing method would do just that! A waiting room with the Reject queuing method configured will offload traffic from your site, presenting users with a static, fully customizable HTML page. Conversely, the Passthrough queuing method allows all visitors unrestricted access to your site. Unlike simply disabling your waiting room to achieve this, Passthrough has the advantage of being scheduled to turn on or off ahead of time and providing user traffic stats through the Waiting Room status endpoint.

If you prefer to have the waiting room in a completely passive state, having the waiting room on and configured with the Passthrough queueing method allows you to turn on Queue All quickly. Queue All places all new site visitors in a queue, which is a life-saver in the case of unexpected site downtime or any other crisis. Before deactivating Queue All, you can see how many users are waiting to enter your site. Queue All also overrides any active waiting room event, giving you authoritative, fast control in an emergency.

Event Pre-queuing

As part of an event’s configuration, you can also enable a pre-queue, a virtual holding area for visitors that arrive at your event before its start time. Pre-queues add extra protection against traffic spikes that can cause your site to crash.

To illustrate how, imagine your customer is a devoted fan trying to get tickets to their favorite band’s upcoming concert. Ticket sales open in one hour, so they visit your sales page about ten minutes before sales open. There is a static landing page on your site where the ticket sales page will be. The fan starts refreshing the page in the minutes leading up to the start time, hoping to get access as soon as sales open. Now multiply that one hopeful concert-goer by many thousands of fans, and before your sale has even begun, your site is already overwhelmed and at risk of crashing. Having a pre-queue in place protects your site from this type of user activity that has the potential to overwhelm your site at a time when stakes are very high for your brand. And with the ability to fully customize the pre-queuing page, you can still generate the same excitement you would have with your event’s landing page.

Taking it a step further, you can elect to shuffle the pre-queue, randomly assigning users who reach your application during the pre-queue period a place in line when the event starts. If your event uses the First in First Out queuing method, randomizing your pre-queue can help promote fairness, especially if your pre-queuing period spans many time zones. Like the Random queuing method, implementing a randomized pre-queue neutralizes the advantage customers in earlier time zones have to grab a place in line ahead of the event’s start time. Ultimately, fairness for your event is unique to you and your customers’ perspectives and needs. With the order of entry options available for both the pre-queue and overflow queuing during your event, you have control over managing fairness of entry to align with your unique requirements.

Creating a waiting room event

Similarly to configuring a waiting room, scheduling events with Waiting Room is incredibly easy and requires no coding or application changes. You will first need to have a baseline waiting room configured. Then, you can schedule events for this waiting room from the Waiting Room dashboard. In the event creation workflow, you’ll indicate when you would like the event to start and end and configure an optional pre-queue.

Unless specified otherwise, your event will always inherit the configuration settings of its associated waiting room. That way, you only need to update the waiting room settings that you would like to change for the duration of the event. You can optionally create a queuing page for your users that is unique to the event and preview what it will look like in different queuing states and browsers, ensuring that your end users never see garbled CSS or broken-looking pages!

Before saving your event, you will be able to review your waiting room and event settings side by side, making it easy to verify how the behavior of your waiting room will change for the duration of the event.

Once created, your event will appear in the Waiting Room dashboard nested under its associated waiting room. The date of each waiting room’s next event is indicated in the dashboard’s default view so that you can tell at a glance if any waiting room’s settings may change due to an upcoming event. You can expand each waiting room’s row to list upcoming events and associated durations. Additionally, if an event is live, a green dot will appear next to this date, adding extra assurance that it has kicked in.

Event Scheduling in action

Tying it all together, let’s walk through a real-world scenario to demonstrate the versatility and practicality of Event Scheduling. In this example, we have a major women’s fashion retailer, let’s call them Shopflare, with an upcoming flash sale. A flash sale is an online sales event that is different from a regular sale in that it lasts for a brief time and offers substantial discounts or limited stock. Often, retailers target a specific audience using marketing campaigns in the days leading up to a flash sale.

In our example, Shopflare’s marketing team plans to send an email campaign to a set of target customers, promoting their Spring Flash Sale, where they will be offering free shipping and 40% off on their freshest spring arrivals for one day only! How could Shopflare use Waiting Room and Event Scheduling to help this sales event go off without a hitch?

Preparing for the flash sale

One week before the sale, Shopflare’s web team creates a landing page with a countdown for their spring flash sale at example.com/sales/spring_flash_sale. They place a waiting room at this URL with a First In First Out queuing method and their desired traffic thresholds to allow traffic while ensuring their site remains performant. They then send an email campaign to their target audience directly linking to the sale’s landing page. With their baseline waiting room in place, early traffic to the URL will not overwhelm their site. Shopflare’s team also prepares for the upcoming sale by scheduling two cascading waiting room events ahead of time. Let’s review Shopflare’s flash sale requirements related to Waiting Room and review the steps they would take with Event Scheduling to satisfy them.

Pre-queueing and event overflow queuing

A few hours before the sale starts, Shopflare wants to allow shoppers to start “lining up” to secure a spot ahead of those who arrive after the event start time. They want to create a lottery for these early arrivers, randomly assigning them a place in line when the sale starts to mitigate the advantage that customers from earlier time zones have to secure a spot in line. To do this, they would create an event for the waiting room they have already configured.

The event creation workflow consists of four main steps: Details, Settings, Customization, and Review. On the Details page of the event creation workflow, they would enter their sale start and end times, set the start time of the pre-queue and enable “Shuffle at Event Start” to create a randomized pre-queue.

While the sale is in progress, Shopflare wants an overflow queue to protect their site from being overwhelmed by traffic in excess of their waiting room limits, letting these users in First in First Out when spots open up on their site. Since their underlying waiting room is already configured with the traffic thresholds they want to enforce for the duration of the event, they would simply leave the Settings page of the event creation workflow unchanged and proceed to Customization.

On the Customization step, Shopflare will create a custom queuing experience for their sale by uploading a custom HTML template that contains the HTML for both their pre-queueing page and their overflow queue.

Shopflare wants their pre-queuing page to get shoppers excited about the beginning of the sale, so they ensure it is branded and unique to the flash sale while setting clear expectations for shoppers. For their overflow queue, they want the same look and feel as their pre-queuing page, with updated messaging that gives shoppers an estimated wait time and explains the reason for queuing.

Sale conclusion

Once the sale has ended, Shopflare wants to allow active shoppers a five-minute grace period to complete their purchases without admitting any more new visitors. For 48 hours post-sale, they would like to present all visitors with a static page letting them know the sale has concluded while providing a redirect link back to their homepage. To achieve this, Shopflare would create another event for the baseline waiting room that starts when the previous event ends without a pre-queue enabled.

To offload all new site traffic after the sale has ended while giving active shoppers a five-minute grace period, from the Settings page of the event creation workflow, they would set session duration to five minutes, disable session renewal and select the Reject All queuing method.

Once again, on the Customization tab, they would elect to override the underlying waiting room template with a custom event template and upload their custom Reject page HTML. They would then review and save their event, and it will appear alongside their previously created event in the Waiting Room dashboard.

And that’s it! With their waiting room events in place, Shopflare can rest assured that their site will be protected and that their customers will have an on-brand, transparent shopping experience on the big day. Each customer and online event is unique, but however you choose to manage your user traffic, Event Scheduling for Cloudflare Waiting Room offers the options necessary to deliver a stellar and fair user experience while protecting your application during your online event. We can’t wait to support you in your next online event!

For more on Event Scheduling and Waiting Room, check out our developer documentation.


Waiting Room: Random Queueing and Custom Web/Mobile Apps

Post Syndicated from Tyler Caslin original https://blog.cloudflare.com/waiting-room-random-queueing-and-custom-web-mobile-apps/

Today, we are announcing the general availability of Cloudflare Waiting Room to customers on our Enterprise plans, making it easier than ever to protect your website against traffic spikes. We are also excited to present several new features that have user experience in mind — an alternative queueing method and support for custom web/mobile applications.

First-In-First-Out (FIFO) Queueing

Whether you’ve waited to check out at a supermarket or stood in line at a bank, you’ve undoubtedly experienced FIFO queueing. FIFO stands for First-In-First-Out, which simply means that people are seen in the order they arrive — i.e., those who arrive first are processed before those who arrive later.

When Waiting Room was introduced earlier this year, it was first deployed to protect COVID-19 vaccine distributors from overwhelming demand — a service we offer free of charge under Project Fair Shot. At the time, FIFO queueing was the natural option due to its wide acceptance in day-to-day life and accurate estimated wait times. One problem with FIFO is that users who arrive later could see long estimated wait times and decide to abandon the website.

We take customer feedback seriously and improve products based on it. A frequent request was to handle users irrespective of the time they arrive in the Waiting Room. In response, we developed an additional approach: random queueing.

A New Approach to Fairness: Random Queueing

You can think of random queueing as participating in a raffle for a prize. In a raffle, people obtain tickets and put them into a big container. Later, tickets are drawn at random to determine the winners. The more time you spend in the raffle, the better your chances of winning at least once, since there will be fewer tickets in the container. No matter what, everyone participating in the raffle has an opportunity to win.

Similarly, in a random queue, users are selected from the Waiting Room at random, regardless of their initial arrival time. This means that you could be let into the application before someone who arrived earlier than you, or vice versa. Just like how you can buy more tickets in a raffle, joining a random queue earlier than someone else will give you more attempts to be accepted, but does not guarantee you will be let in. However, at any particular time, you will have the same chance to be let into the website as anyone else. This is different from a raffle, where you could have more tickets than someone else at a given time, providing you with an advantage.

Random queueing is designed to give everyone a fair chance. Imagine waking up excited to purchase new limited-edition sneakers only to find that the FIFO queue is five hours long and full of users that either woke up in the middle of the night to get in line or joined from earlier time zones. Even if you waited five hours, those sneakers would likely be sold out by the time you reach the website. In this case, you’d probably abandon the Waiting Room completely and do something else. On the other hand, if you were aware that the queue was random, you’d likely stick around. After all, you have a chance to be accepted and make a purchase!

As a result, random queueing is perfect for short-lived scenarios with lots of hype, such as product launches, holiday traffic, special events, and limited-time sales.

By contrast, when the event ends and traffic returns to normal, a FIFO queue is likely more suitable, since its widely accepted structure and accurate estimated wait times provide a consistent user experience.

How Does Random Queueing Work?

Perhaps the best part about random queueing is that it maintains the same internal structure that powers FIFO. As a result, if you change the queueing method in the dashboard — even when you may be actively queueing users — the transition to the new method is seamless. Imagine you have users 1, 2, 3, 4, and 5 waiting in a FIFO queue in the order 5 →  4 → 3 → 2 → 1, where user 1 will be the next user to access the application. Let’s assume you switch to random queueing. Now, any user can be accepted next. Let’s assume user 4 is accepted. If you decide to immediately switch back to FIFO queueing, the queue will reflect the order 5 → 3 → 2 → 1. In other words, transitioning from FIFO to random and back to FIFO will respect the initial queue positions of the users! But how does this work? To understand, we first need to remember how we built Waiting Room for FIFO.

Recall the Waiting Room configurations:

  • Total Active Users. The total number of active users that can be using the application at any given time.
  • New Users Per Minute. The maximum number of new users per minute that can be accepted to the application.

Next, remember that Waiting Room is powered by cookies. When you join the Waiting Room for the first time, you are assigned an encrypted cookie. You bring this cookie back to the Waiting Room and update it with every request, using it to prove your initial arrival time and status.

Properties in the Waiting Room cookie include:

  • bucketId. The timestamp rounded down to the nearest minute of the user’s first request to the Waiting Room. If you arrive at 10:23:45, you will be grouped into a bucket for 10:23:00.
  • acceptedAt. The timestamp when the user got accepted to the origin website for the first time.
  • refreshIntervalSeconds. When queueing, this is the number of seconds the user must wait before sending another request to the Waiting Room.
  • lastCheckInTime. The last time each user checked into the Waiting Room or origin website. When queueing, this is only updated for requests every refreshIntervalSeconds.

For any given minute, we can calculate the number of users we can let into the origin website. Let’s say we deploy a Waiting Room on “https://example.com/waitingroom” that can support 10,000 Total Active Users, and we allow up to 2,000 New Users Per Minute. If there are currently 7,000 active users on the website, we have 10,000 – 7,000 = 3,000 open slots. However, we need to take min(3,000, 2,000) = 2,000 to respect the New Users Per Minute limit. Thus, we have 2,000 available slots we can give out.
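
As a minimal sketch of that calculation (the names here are illustrative, not Waiting Room’s actual internals):

def available_slots(total_active_users, new_users_per_minute, active_users):
    # Open slots are capped by the configured Total Active Users...
    open_slots = max(total_active_users - active_users, 0)
    # ...and by the New Users Per Minute limit.
    return min(open_slots, new_users_per_minute)

# 10,000 - 7,000 = 3,000 open slots, capped at 2,000 New Users Per Minute
assert available_slots(10_000, 2_000, 7_000) == 2_000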

Let’s assume there are 2,500 queued users that joined over the last three minutes in groups of 500, 1,000, and 1,000, respectively for the timestamps 15:54, 15:55, and 15:56. To respect FIFO queueing, we will take our 2,000 available slots and try to reserve them for users who joined first. Thus, we will reserve 500 available slots for the users who joined at 15:54 and then reserve 1,000 available slots for the users who joined at 15:55. When we get to the users for 15:56, we see that we only have 500 slots left, which is not enough for the 1,000 queued users for this minute:

{
	"activeUsers": 7000,
	"buckets": [{
			"key": "Thu, 27 May 2021 15:54:00 GMT",
			"data": {
				"waiting": 500,
				"reservedSlots": 500
			}
		},
		{
			"key": "Thu, 27 May 2021 15:55:00 GMT",
			"data": {
				"waiting": 1000,
				"reservedSlots": 1000
			}
		},
		{
			"key": "Thu, 27 May 2021 15:56:00 GMT",
			"data": {
				"waiting": 1000,
				"reservedSlots": 500
			}
		}
	]
}

Since we have reserved slots for all users with bucketIds of 15:54 and 15:55, they can be let into the origin website from any data center. However, we can only let in a subset of the users who initially arrived at 15:56.

| Timestamp (bucketId) | Queued Users | Reserved Slots | Strategy |
| --- | --- | --- | --- |
| 15:54 | 500 | 500 | Accept all users |
| 15:55 | 1,000 | 1,000 | Accept all users |
| 15:56 | 1,000 | 500 | Accept subset of users |

These 500 slots for 15:56 are allocated to each Cloudflare edge data center based on its respective historical traffic data, and further divided for each Cloudflare Worker within the data center. For example, let’s assume there are two data centers — Nairobi and Dublin — which share 60% and 40% of the traffic, respectively, for this minute. In this case, we will allocate 500 * .6 = 300 slots for Nairobi and 500 * .4 = 200 slots for Dublin. In Nairobi, let’s say there are 3 active workers, so we will grant each of them 300 / 3 = 100 slots. If you make a request to a worker in Nairobi and your bucketId is 15:56, you will be allowed in and consume a slot if the worker still has at least one of its 100 slots available. Since we have reserved all 2,000 available slots, users with bucketIds after 15:56 will have to continue queueing.
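
Here is a rough Python sketch of the reservation and division logic described above. The function names are hypothetical, the Dublin worker count is assumed, and the real system runs across many data centers rather than in one loop:

def reserve_slots(buckets, slots):
    # Reserve slots for queued users, oldest bucket first (FIFO)
    reservations = {}
    for bucket_id, waiting in sorted(buckets.items()):
        reservations[bucket_id] = min(waiting, slots)
        slots -= reservations[bucket_id]
    return reservations, slots  # leftover slots go to brand-new users

reserved, leftover = reserve_slots(
    {"15:54": 500, "15:55": 1000, "15:56": 1000}, 2000)
# reserved == {"15:54": 500, "15:55": 1000, "15:56": 500}, leftover == 0

# The 500 slots for 15:56 are split by historical traffic share, then
# divided evenly among the workers in each data center (worker counts assumed)
traffic_share = {"Nairobi": 0.6, "Dublin": 0.4}
workers = {"Nairobi": 3, "Dublin": 2}
per_worker_slots = {
    colo: int(500 * share) // workers[colo]
    for colo, share in traffic_share.items()
}
# per_worker_slots == {"Nairobi": 100, "Dublin": 100}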

Let’s modify this case and assume we only have 200 queued users, all of which are in the 15:54 bucket. First, we reserve 200 slots for these queued users, leaving us 2,000 – 200 = 1,800 remaining slots. Since we have reserved slots for all queued users, we can use the remaining 1,800 slots on new users — people who have just made their first request to the Waiting Room and don’t have a cookie or bucketId yet. Similar to how we handle buckets with fewer slots than queued users, we will distribute these 1,800 slots to each data center, allocating 1,800 * .6 = 1,080 to Nairobi and 1,800 * .4 = 720 to Dublin. In Nairobi, we will split these equally across the 3 workers, giving them 1,080 / 3 = 360 slots each. If you are a new user making a request to a worker in Nairobi, you will be accepted and take a slot if the worker has at least one of its 360 slots available, otherwise you will be marked as a queued user and enter the Waiting Room.

Now that we have outlined the concepts for FIFO, we can understand how random queueing operates. Simply put, random queueing functions the same way as FIFO, except we pretend that every user is new. In other words, we will not look at reserved slots when making the decision if the user should be let in. Let’s revisit the last case with 200 queued users in the 15:54 bucket and 2,000 available slots. When random queueing, we allocate the full 2,000 slots to new users, meaning Nairobi gets 2,000 * .6 = 1,200 slots and each of its 3 workers gets 1,200 / 3 = 400 slots. No matter how many users are queued or freshly joining the Waiting Room, all of them will have a chance at taking these slots.

Finally, let’s reiterate that we are only pretending that all users are new — we still assign them to bucketIds and reserve slots as if we were FIFO queueing, but simply don’t make any use of this logic while random queueing is active. That way, we can maintain the same FIFO structure while we are random queueing so that if necessary, we can smoothly transition back to FIFO queueing and respect initial user arrival times.
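
Conceptually, the only difference between the two methods at decision time is whether the reserved slots are consulted. A simplified sketch with hypothetical names (the real logic lives inside our Workers):

def should_admit(is_new_user, bucket_id, reserved, free_slots, method):
    if method == "random":
        # Pretend everyone is new: all users race for the worker's slots
        return (True, free_slots - 1) if free_slots > 0 else (False, free_slots)
    # FIFO: queued users consume the slots reserved for their bucket...
    if not is_new_user:
        if reserved.get(bucket_id, 0) > 0:
            reserved[bucket_id] -= 1
            return True, free_slots
        return False, free_slots
    # ...while brand-new users take any leftover, unreserved slots
    return (True, free_slots - 1) if free_slots > 0 else (False, free_slots)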

How “Random” is Random Queueing?

Since random queueing is essentially a race for available slots, we were concerned that it could be exploited if slot releases and queued-user check-ins did not occur at random times.

To ensure all queued users can attempt to get into the website at the same rate, we store (in the encrypted cookie) the last time each user checked into the Waiting Room (lastCheckInTime) to prevent them from attempting to gain access to the website until a number of seconds have passed (refreshIntervalSeconds). This means that spamming the page refresh button will not give you an advantage over other queued users! Be patient — the browser will refresh automatically the moment you are eligible for another chance.

Next, let’s imagine five queued users checking into the Waiting Room every refreshIntervalSeconds=30 at approximately the :00 and :30 minute marks. A new queued user joins the Waiting Room and checks in at approximately :15 and :45. If new slots are randomly released, this new user will have about a 50% chance of being selected next, since it alone captures slots released during the :00–:15 and :30–:45 ranges. On the other hand, the other five queued users share the :15–:30 and :45–:00 ranges, giving them about a 50% / 5 = 10% chance each. Now suppose that new slots are not randomly released and are instead always released at :59. In this case, the new queued user will have virtually no chance of being selected before the other five queued users, because those users will always check in one second later at :00, immediately consuming any newly released slots.

To address this vulnerability, we changed our implementation to ensure that slots are released randomly and encouraged users to check in at random offsets from each other. To help split up users that are checking in at similar times, we vary each user’s refreshIntervalSeconds by a small, pseudo-randomly generated offset for each check-in and store this new refresh interval in the encrypted Waiting Room cookie for validation on the next request. Thus, a user who previously checked in every 30 seconds might now check in after 29 seconds, then 31 seconds, then 27 seconds, and so on — but still averaging a 30-second refresh interval. Over time, these slight check-in variations become significant, spreading out user check-in times and strengthening the randomness of the queue. If you are curious to learn more about the apparent “randomness” behind mixing user check-in intervals, you can think of it as a chaotic system subjected to the butterfly effect.
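
A sketch of that jitter, with the offset magnitude chosen arbitrarily for illustration (we don’t specify the exact value we use):

import random

BASE_REFRESH_SECONDS = 30
MAX_JITTER_SECONDS = 3  # assumed magnitude, for illustration only

def next_refresh_interval():
    # Vary each check-in by a small pseudo-random offset so that users who
    # joined at similar times drift apart instead of staying in lockstep.
    # The interval still averages BASE_REFRESH_SECONDS over time.
    return BASE_REFRESH_SECONDS + random.randint(-MAX_JITTER_SECONDS, MAX_JITTER_SECONDS)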

Nevertheless, we weren’t convinced our efforts were enough and wanted to test random queueing empirically to validate its integrity. We conducted a simulation of 10,000 users joining a Waiting Room uniformly across 30 minutes. When let into the application, users spent approximately 1 minute “browsing” before they stopped checking in. We ran this experiment for both FIFO and random queueing and graphed each user’s observed wait time in seconds in the Waiting Room against the minute they initially arrived (starting from 0). Recall that users are grouped by minute using bucketIds, so each user’s arrival time is truncated down to the minute.

[Figure: scatter plots of observed wait time in seconds versus arrival minute, for FIFO and random queueing]

Based on our data, we can see immediately for FIFO queueing that, as the arrival minute increases, the observed wait time increases linearly. This makes sense for a FIFO queue, since the “line” will just get longer if there are more users entering the queue than leaving it. For each arrival minute, there is very little variation among user wait times, meaning that if you and your friend join a Waiting Room at approximately the same time, you will both be accepted around the same time. If you join a couple of minutes before your friend, you will almost always be accepted first.

When looking at the results for random queueing, we observe users experiencing varied wait times regardless of the arrival minute. This is expected, and helps prove the “randomness” of the random queue! We can see that, if you join five minutes after your friend, although your friend will have more chances to get in, you may still be accepted first! However, there are so many data points overlapping with each other in the plot that it is hard to tell how they are distributed. For instance, it could be possible that most of these data points experience extreme wait times, but as humans we aren’t able to tell.

As a result, we created heatmaps of these plots in Python using numpy.histogram2d and displayed them with matplotlib.pyplot:

import json
import sys

import numpy as np
import matplotlib.pyplot as plt

# Usage: python heatmap.py results.json
filename = sys.argv[1]

with open(filename) as file:
    data = json.load(file)

x = data["ArrivalMinutes"]
y = data["WaitTimeSeconds"]

# Bin the (arrival minute, wait time) pairs into a 30x30 grid
heatmap, _, _ = np.histogram2d(x, y, bins=(30, 30))

plt.clf()
plt.title(filename)
plt.xlabel('Arrival Minute Buckets')
plt.ylabel('WaitTime Buckets')
# Transpose so arrival minutes run along the x-axis
plt.imshow(heatmap.T, origin='lower')
plt.show()

The heatmaps display where the data points are concentrated in the original plot, using brighter (hotter) colors to represent areas containing more points:

[Figure: heatmaps of the wait time versus arrival minute plots, for FIFO and random queueing]

By inspecting the generated heatmaps, we can conclude that FIFO and random queueing are working properly. For FIFO queueing, users are being accepted in the order they arrive. For random queueing, we can see that users are accepted to the origin regardless of arrival time. Overall, we can see the heatmap for random queueing is well distributed, indicating it is sufficiently random!

If you are curious why random queueing has very hot colors along the lowest wait times followed by very dark colors afterward, it is actually because of how we simulated the queue. For the simulation, we spoofed the bucketIds of the users and let them all join the Waiting Room at once to see who would be let in first. In the random queueing heatmap, the bright colors along the lowest wait time buckets indicate that many users were accepted quickly after joining the queue across all bucketIds. This is expected, demonstrating that random queueing does not give an edge to users who join earlier, giving each user a fair chance regardless of their bucketId.

The reason users were almost immediately accepted in WaitTime Bucket 0 is that the simulation started with no users on the origin, meaning new users were accepted until the Waiting Room limits were reached. Since this first wave of accepted users “browsed” on the origin for a minute before leaving, no additional users were let in during this time. Thus, the colors are very dark for WaitTime Buckets 1 and 2. Similarly, the second wave of users is randomly selected afterward, followed by another period when no users were accepted in WaitTime Bucket 5. The longer the wait time, the more attempts a particular user will have had to be let in, so extreme wait times are unlikely. We can see this by observing the colors grow darker as the WaitTime Bucket approaches 29.

How Is Estimated Time Calculated for Random Queueing?

In a random queue, you can be accepted at any moment… so how can you display an estimated wait time? For a particular user, this is an impossible task, but when you observe all the users together, you can accurately account for most user experiences using a probabilistic estimated wait time range.

At any given moment, we know:

  • letInPerMinute. The current average users per minute being let into the origin.
  • currentlyWaiting. The current number of users waiting in the queue.

Therefore, we can calculate the probability of a user being let into the origin in the next minute:

P(LetInOverMinute) = letInPerMinute / currentlyWaiting

If there are 100 users waiting in the queue, and we are currently letting in 10 users per minute, the probability a user will be let in over the next minute is 10 / 100 = .1 (10%).

Using P(LetInOverMinute), we can determine the n minutes needed for a p chance of being let into the origin:

p = 1 – (1 – P(LetInOverMinute))^n

Recall that the probability of getting in at least once is the complement of not getting in at all. The probability of not being let into the origin over n minutes is (1 – P(LetInOverMinute))^n. Therefore, the probability of getting in at least once is 1 – (1 – P(LetInOverMinute))^n. Solving this equation for n gives:

n = log(1 – p) / log(1 – P(LetInOverMinute))

Thus, if we want to calculate the estimated wait time to have a p = .5 (50%) chance of getting into the origin with the probability of getting let in during a particular minute P(LetInOverMinute) = .1 (10%), we calculate:

n = log(1 – .5) / log(1 – .1) ≈ 6.58 minutes or 6 minutes and 35 seconds

In this case, we estimate that 50% of users will wait less than 6 minutes and 35 seconds and the remaining 50% of users will wait longer than this.
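
Putting the formulas together, here is a small sketch that reproduces these numbers and the percentile wait times discussed below:

import math

def estimated_wait_minutes(p, let_in_per_minute, currently_waiting):
    # Minutes needed for a probability p of being admitted at least once
    p_minute = let_in_per_minute / currently_waiting
    return math.log(1 - p) / math.log(1 - p_minute)

for p in (0.25, 0.5, 0.75):
    print(p, round(estimated_wait_minutes(p, 10, 100), 2))
# 0.25 -> ~2.73 minutes, 0.5 -> ~6.58 minutes, 0.75 -> ~13.16 minutes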

So, which estimated wait times are displayed to the user? It is up to you! If you create a Mustache HTML template for a Waiting Room, you will now be able to use the variables waitTime25Percentile, waitTime50Percentile, and waitTime75Percentile to display the estimated wait times in minutes when p = .25, p = .5, and p = .75, respectively. There are also new variables that are used to display and determine the queueing method, such as queueingMethod, isFIFOQueue, and isRandomQueue. If you want to display something more dynamic like a custom view in a mobile app, keep reading to learn about our new JSON response, which provides a REST API for the same set of variables.

Supporting Dynamic Applications with a JSON Response

Before, customers could only deploy static Mustache HTML templates to customize the style of their Waiting Rooms. These templates work well for most use cases, but fall short if you want to display anything that requires state. Let’s imagine you’re queueing to buy concert tickets on your mobile device, and you see an embedded video of your favorite song. Naturally, you click on it and start singing along! A couple of seconds later, the browser refreshes the page automatically to update your status in the Waiting Room, resetting your video to the start.

The purpose of the new JSON response is to give full control to a custom application, allowing it to determine what to display to the user and when to refresh. As a result, the application can maintain state and make sure your videos are never interrupted again!

Once the JSON response is enabled for a Waiting Room, any request to the Waiting Room with the header Accept: application/json will receive a JSON object with all the fields from the Mustache template.

An example request when the queueing method is FIFO:

curl -X GET "https://example.com/waitingroom" \
    -H "Accept: application/json"
{
    "cfWaitingRoom": {
        "inWaitingRoom": true,
        "waitTimeKnown": true,
        "waitTime": 10,
        "waitTime25Percentile": 0,
        "waitTime50Percentile": 0,
        "waitTime75Percentile": 0,
        "waitTimeFormatted": "10 minutes",
        "queueIsFull": false,
        "queueAll": false,
        "lastUpdated": "2020-08-03T23:46:00.000Z",
        "refreshIntervalSeconds": 20,
        "queueingMethod": "fifo",
        "isFIFOQueue": true,
        "isRandomQueue": false
    }
}

An example request when the queueing method is random:

curl -X GET "https://example.com/waitingroom" \
    -H "Accept: application/json"
{
    "cfWaitingRoom": {
        "inWaitingRoom": true,
        "waitTimeKnown": true,
        "waitTime": 10,
        "waitTime25Percentile": 5,
        "waitTime50Percentile": 10,
        "waitTime75Percentile": 15,
        "waitTimeFormatted": "5 minutes to 15 minutes",
        "queueIsFull": false,
        "queueAll": false,
        "lastUpdated": "2020-08-03T23:46:00.000Z",
        "refreshIntervalSeconds": 20,
        "queueingMethod": "random",
        "isFIFOQueue": false,
        "isRandomQueue": true
    }
}

A few important reminders before you get started:

  1. Don’t forget that Waiting Room uses a cookie to maintain a user’s status! Without a cookie in the request, the Waiting Room will think the user has just joined the queue.
  2. Don’t forget to refresh! Inspect the ‘Refresh’ HTTP response header or the refreshIntervalSeconds property and send another request to the Waiting Room after that number of seconds.
  3. Keep in mind that if the user’s request is let into the origin, JSON may not necessarily be returned. To gracefully parse all responses, send JSON from the origin website if the header Accept: application/json is present. For example, the origin could return:

{
	"cfWaitingRoom": {
		"inWaitingRoom": false
	},
	"authToken": "abcd"
}
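
Putting these reminders together, a minimal polling client might look like the Python sketch below. It assumes the JSON response is enabled, keeps the Waiting Room cookie in the session’s cookie jar, and assumes the origin returns JSON with an authToken as in the example above:

import time
import requests

URL = "https://example.com/waitingroom"

with requests.Session() as session:  # the cookie jar preserves our queue status
    while True:
        resp = session.get(URL, headers={"Accept": "application/json"})
        body = resp.json()
        room = body.get("cfWaitingRoom", {})
        if not room.get("inWaitingRoom", False):
            print("Admitted! authToken:", body.get("authToken"))
            break
        print("Still queued, estimated wait:", room.get("waitTimeFormatted"))
        # Respect refreshIntervalSeconds: polling faster gains nothing
        time.sleep(room.get("refreshIntervalSeconds", 20))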

Embedding a Waiting Room in a Webpage: SameSite Cookies and IFrames

What are SameSite cookies and IFrames?

SameSite and Secure are attributes in the HTTP response Set-Cookie header. SameSite determines when cookies are sent to a website, while Secure indicates whether a secure context (HTTPS) is required.

There are three different values of SameSite:

  • SameSite=Lax. This is the default value when the SameSite attribute is not present. Cookies are not sent on cross-site sub-requests unless the user is following a link to the third-party site. If you are on example1.com, cookies will not be sent to example2.com unless you click a link that navigates to example2.com.
  • SameSite=Strict. Cookies are sent only in first-party contexts. If you are on example1.com, cookies will never be sent to example2.com even if you click a link that navigates to example2.com.
  • SameSite=None. Cookies are sent for all contexts, but the Secure attribute must be set. If you are on example1.com, cookies will be sent to example2.com for all sub-requests. If Secure is not set, the browser will block the cookie.

IFrames (Inline Frames) allow HTML documents to embed other HTML documents, such as an advertisement, video, or webpage. When an application from a third-party website is rendered inside an IFrame, cookies will only be sent to it if SameSite=None is set.
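
For example, for a waiting room embedded in a third-party IFrame to keep working, its cookie would have to be set along these lines (the cookie name and value here are illustrative):

Set-Cookie: __cfwaitingroom=<encrypted value>; SameSite=None; Secure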

Why is this all important? In the past, we did not set SameSite, meaning it defaulted to SameSite=Lax for all responses. As a result, a user queueing through an IFrame would never have their cookie updated and would appear to the Waiting Room to be joining for the first time on every request. Today, we are introducing customization for both the SameSite and Secure attributes, which will allow Waiting Rooms to be displayed in IFrames!

At the moment, this is only configurable through the Cloudflare API. By default, the configuration for SameSite and Secure will be set to “auto”, automatically selecting the most flexible option. In this case, SameSite will be set to None if Always Use HTTPS is enabled, otherwise it will be set to Lax. Similarly, Secure will only be set if Always Use HTTPS is enabled. In other words, Waiting Room IFrames will work properly by default as long as Always Use HTTPS is toggled. If you are wondering why Always Use HTTPS is used here, remember that SameSite=None requires that Secure is also set, or else the browser will block the Waiting Room cookie.

If you decide to manually configure the behavior of SameSite and Secure through the API, be careful! We do guard against setting SameSite=None without Secure, but if you set Secure on every request (secure=”always”) and don’t have Always Use HTTPS enabled, a user who sends an insecure (HTTP) request to the Waiting Room will have their cookie blocked by their browser!
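
As a sketch of what that API call might look like (the endpoint shape and field names here are our best-effort illustration; check the current API documentation before relying on them):

curl -X PATCH "https://api.cloudflare.com/client/v4/zones/<zone_id>/waiting_rooms/<waiting_room_id>" \
    -H "Authorization: Bearer <api_token>" \
    -H "Content-Type: application/json" \
    --data '{"cookie_attributes": {"samesite": "auto", "secure": "auto"}}'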

If you want to explore using IFrames with Waiting Room yourself, here is a simple example of a Cloudflare Worker that renders the Waiting Room on “https://example.com/waitingroom” in an IFrame:

const html = `<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <title>Waiting Room IFrame Example</title>
  </head>
  <body>
    <h1>Waiting Room IFrame!</h1>
    <iframe src="https://example.com/waitingroom" width="1200" height="700"></iframe>
  </body>
</html>
`

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

// Serve a static page that embeds the waiting room in an IFrame
async function handleRequest(request) {
  return new Response(html, {
    headers: { "Content-Type": "text/html" },
  })
}

Looking Forward

Waiting Room still has plenty of room to grow! Every day, we are seeing more Waiting Rooms deployed to protect websites from traffic spikes. As Waiting Room continues to be used for new purposes, we will keep adding features to make it as customizable and user-friendly as possible.

Stay tuned — what we have announced today is just the tip of the iceberg of what we have planned for Waiting Room!

Cloudflare and COVID-19: Project Fair Shot Update

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/cloudflare-and-covid-19-project-fair-shot-update/

In February 2021, Cloudflare launched Project Fair Shot — a program that gave our Waiting Room product free of charge to any government, municipality, private/public business, or anyone responsible for the scheduling and/or dissemination of the COVID-19 vaccine.

By having our Waiting Room technology in front of the vaccine scheduling application, it ensured that:

  • Applications would remain available, reliable, and resilient against massive spikes of traffic for users attempting to get their vaccine appointment scheduled.
  • Visitors could wait for their long-awaited vaccine with confidence, arriving at a branded queuing page that provided accurate, estimated wait times.
  • Vaccines would get distributed equitably, and not just to folks with faster reflexes or Internet connections.

Since February, we’ve seen a good number of participants in Project Fair Shot. To date, we have helped more than 100 customers across more than 10 countries to schedule approximately 100 million vaccinations. Even better, these vaccinations went smoothly, with customers like the County of San Luis Obispo regularly dealing with more than 20,000 appointments in a day. “The bottom line is Cloudflare saved lives today. Our County will forever be grateful for your participation in getting the vaccine to those that need it most in an elegant, efficient and ethical manner” — Web Services Administrator for the County of San Luis Obispo.

We are happy to have helped not just in the US, but worldwide as well. In Canada, we partnered with a number of organizations and the Canadian government to increase access to the vaccine. One partner stated: “Our relationship with Cloudflare went from ‘Let’s try Waiting Room’ to ‘Unless you have this, we’re not going live with that public-facing site.’” — CEO of Verto Health. In another country in Europe, we saw over three million people go through the Waiting Room in less than 24 hours, leading to a significantly smoother and less stressful experience. Cities in Japan — working closely with our partner, Classmethod — have been able to vaccinate over 40 million people and are on track to complete their vaccination process across 317 cities. If you want more stories from Project Fair Shot, check out our case studies.

[Figure: A European customer seeing very high amounts of traffic during a vaccination event]

We are continuing to add more customers to Project Fair Shot every day to ensure we are doing all that we can to help distribute more vaccines. With the emergence of the Delta variant and others, vaccine distribution (and soon, booster shots) is still very much a real problem to keep everyone healthy and resilient. Because of these new developments, Cloudflare will be extending Project Fair Shot until at least July 1, 2022. Though we are not excited to see the pandemic continue, we are humbled to be able to provide our services and be a critical part in helping us collectively move towards a better tomorrow.

Building Waiting Room on Workers and Durable Objects

Post Syndicated from Fabienne Semeria original https://blog.cloudflare.com/building-waiting-room-on-workers-and-durable-objects/

In January, we announced the Cloudflare Waiting Room, which has been available to select customers through Project Fair Shot to help COVID-19 vaccination web applications handle demand. Back then, we mentioned that our system was built on top of Cloudflare Workers and the then brand new Durable Objects. In the coming days, we are making Waiting Room available to customers on our Business and Enterprise plans. As we are expanding availability, we are taking this opportunity to share how we came up with this design.

What does the Waiting Room do?

You may have seen lines of people queueing in front of stores or other buildings during sales for a new sneaker or phone. That is because stores have restrictions on how many people can be inside at the same time. Every store has its own limit based on the size of the building and other factors. If more people want to get inside than the store can hold, a queue forms outside: letting everyone in at once would leave too many people in the store.

The same situation applies to web applications. When you build a web application, you have to budget for the infrastructure to run it. You make that decision according to how many users you think the site will have. But sometimes, the site can see surges of users above what was initially planned. This is where the Waiting Room can help: it stands between users and the web application and automatically creates an orderly queue during traffic spikes.

The main job of the Waiting Room is to protect a customer’s application while providing a good user experience. To do that, it must make sure that the number of users of the application around the world does not exceed limits set by the customer. Using this product should not degrade performance for end users, so it should not add significant latency and should admit them automatically. In short, this product has three main requirements: respect the customer’s limits for users on the web application, keep latency low, and provide a seamless end user experience.

When there are more users trying to access the web application than the limits the customer has configured, new users are given a cookie and greeted with a waiting room page. This page displays their estimated wait time and refreshes automatically until the user is admitted to the web application.

Configuring Waiting Rooms

The important configurations that define how the waiting room operates are:

  1. Total Active Users – the total number of active users that can be using the application at any given time
  2. New Users Per Minute – how many new users per minute are allowed into the application, and
  3. Session Duration – how long a user session lasts. Note: the session is renewed as long as the user is active. We terminate it after Session Duration minutes of inactivity.

How does the waiting room work?

If a web application is behind Cloudflare, every request from an end user to the web application will go to a Cloudflare data center close to them. If the web application enables the waiting room, Cloudflare issues a ticket to this user in the form of an encrypted cookie.

[Figure: Waiting Room overview]

At any given moment, every waiting room has a limit on the number of users that can go to the web application. This limit is based on the customer configuration and the number of users currently on the web application. We refer to the number of users that can go into the web application at any given time as the number of user slots. The total number of user slots is equal to the limit configured by the customer minus the total number of users that have been let through.

When a traffic surge happens on the web application, the number of user slots available keeps decreasing, because current user sessions need to end before new users can go in. Once there are no more slots, the waiting room starts queueing.

[Chart: a customer’s traffic to a web application between 09:40 and 11:30]

The chart above shows a customer’s traffic to a web application between 09:40 and 11:30. The configuration for total active users is set to 250 users (yellow line). As time progresses, there are more and more users on the application. The number of user slots available (orange line) keeps decreasing as more users get into the application (green line). When all the slots are taken, users start queueing (blue line). Queueing users ensures that the total number of active users stays around the configured limit.

To effectively calculate the user slots available, every service at the edge data centers should let its peers know how many users it lets through to the web application.

Coordination within a data center is faster and more reliable than coordination between many different data centers. So we decided to divide the user slots available on the web application to individual limits for each data center. The advantage of doing this is that only the data center limits will get exceeded if there is a delay in traffic information getting propagated. This ensures we don’t overshoot by much even if there is a delay in getting the latest information.

The next step was to figure out how to divide this information between data centers. For this we decided to use the historical traffic data on the web application. More specifically, we track how many different users tried to access the application across every data center in the preceding few minutes. The great thing about historical traffic data is that it’s historical and cannot change anymore. So even with a delay in propagation, historical traffic data will be accurate even when the current traffic data is not.

Let’s see an actual example: the current time is Thu, 27 May 2021 16:33:20 GMT. For the minute Thu, 27 May 2021 16:31:00 GMT there were 50 users in Nairobi and 50 in Dublin. For the minute Thu, 27 May 2021 16:32:00 GMT there were 45 users in Nairobi and 55 in Dublin. This was the only traffic on the application during that time.

Every data center looks at what the share of traffic to each data center was two minutes in the past. For Thu, 27 May 2021 16:33:20 GMT that value is Thu, 27 May 2021 16:31:00 GMT.

Thu, 27 May 2021 16:31:00 GMT: 
{
  Nairobi: 0.5, //50/100(total) users
  Dublin: 0.5,  //50/100(total) users
},
Thu, 27 May 2021 16:32:00 GMT: 
{
  Nairobi: 0.45, //45/100(total) users
  Dublin: 0.55,  //55/100(total) users
}

For the minute Thu, 27 May 2021 16:33:00 GMT, the number of user slots available will be divided equally between Nairobi and Dublin as the traffic ratio for Thu, 27 May 2021 16:31:00 GMT is 0.5 and 0.5. So, if there are 1000 slots available, Nairobi will be able to send 500 and Dublin can send 500.

For the minute Thu, 27 May 2021 16:34:00 GMT, the number of user slots available will be divided using the ratio 0.45 (Nairobi) to 0.55 (Dublin). So if there are 1000 slots available, Nairobi will be able to send 450 and Dublin can send 550.
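
In a Python sketch, the allocation from historical counts looks roughly like this (function name hypothetical):

def slot_allocation(minute_counts, available_slots):
    # Split available user slots between data centers in proportion to
    # their share of the traffic recorded two minutes ago
    total = sum(minute_counts.values())
    return {colo: int(available_slots * users / total)
            for colo, users in minute_counts.items()}

# Counts for 16:31 divide the slots for minute 16:33...
assert slot_allocation({"Nairobi": 50, "Dublin": 50}, 1000) == {"Nairobi": 500, "Dublin": 500}
# ...and counts for 16:32 divide the slots for minute 16:34
assert slot_allocation({"Nairobi": 45, "Dublin": 55}, 1000) == {"Nairobi": 450, "Dublin": 550}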

The service at the edge data centers counts the number of users it lets into the web application and starts queueing when the data center limit is approached. Having per data center limits that change based on historical traffic lets us build a system that doesn’t need to communicate often between data centers.

Clustering

In order to let people access the application fairly, we need a way to keep track of their position in the queue. It’s not practical to track every end user in the waiting room individually, so we cluster users who arrive around the same time into one time bucket. A bucket has an identifier (bucketId) calculated based on the time the user first tried to visit the waiting room: all the users who visited the waiting room between 19:51:00 and 19:51:59 are assigned the bucketId 19:51:00.
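
In essence, the bucketId is just the arrival time truncated to the minute, as in this sketch:

from datetime import datetime

def bucket_id(first_seen: datetime) -> str:
    # Truncate the user's first-request time down to the minute
    return first_seen.replace(second=0, microsecond=0).strftime(
        "%a, %d %b %Y %H:%M:%S GMT")

# A user arriving at 19:51:23 joins the 19:51:00 bucket
print(bucket_id(datetime(2021, 5, 26, 19, 51, 23)))
# Wed, 26 May 2021 19:51:00 GMT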

We mentioned an encrypted cookie that is assigned to the user when they first visit the waiting room. Every time the user comes back, they bring this cookie with them. The cookie is a ticket for the user to get into the web application. The content below is the typical information the cookie contains when visiting the web application. This user first visited around Wed, 26 May 2021 19:51:00 GMT, waited for around 10 minutes and got accepted on Wed, 26 May 2021 20:01:13 GMT.

{
  "bucketId": "Wed, 26 May 2021 19:51:00 GMT",
  "lastCheckInTime": "Wed, 26 May 2021 20:01:13 GMT",
  "acceptedAt": "Wed, 26 May 2021 20:01:13 GMT"
}

Here:

  • bucketId – the cluster the ticket is assigned to. This tracks the user’s position in the queue.
  • acceptedAt – the time when the user got accepted to the web application for the first time.
  • lastCheckInTime – the time when the user was last seen in the waiting room or the web application.

Once a user has been let through to the web application, we have to check how long they are eligible to spend there. Our customers can customize how long a user spends on the web application using Session Duration. Whenever we see an accepted user we set the cookie to expire Session Duration minutes from when we last saw them.

Waiting Room State

Previously we talked about the concept of user slots and how we can function even when there is a delay in communication between data centers. The waiting room state helps to accomplish this. It is formed by historical data of events happening in different data centers. So when a waiting room is first created, there is no waiting room state as there is no recorded traffic. The only information available is the customer’s configured limits. Based on that we start letting users in. In the background, the service running in the data center (introduced later in this post as the Data Center Durable Object) periodically reports the tickets it has issued to a coordinating service and periodically gets a response back about things happening around the world.

As time progresses more and more users with different bucketIds show up in different parts of the globe. Aggregating this information from the different data centers gives the waiting room state.

Let’s look at an example: there are two data centers, one in Nairobi and the other in Dublin. When there are no user slots available for a data center, users start getting queued. Different users who were assigned different bucketIds get queued. The data center state from Dublin looks like this:

activeUsers: 50,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 20,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 40,
    }
  }
]

The same thing is happening in Nairobi and the data from there looks like this:

activeUsers: 151,
buckets: 
[ 
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
    },
  },
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 30,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 20,
    }
  }
]

This information from the data centers is reported in the background and aggregated to form a data structure similar to the one below:

activeUsers: 201, // 151(Nairobi) + 50(Dublin)
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2, // 2 users from (Nairobi)
    },
  },
  {
    key: "Thu, 27 May 2021 15:55:00 GMT", 
    data: 
    {
      waiting: 50, // 20 from Nairobi and 30 from Dublin
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60, // 20 from Nairobi and 40 from Dublin
    }
  }
]

The data structure above is a sorted list of all the bucketIds in the waiting room. The waiting field has information about how many people are waiting with a particular bucketId. The activeUsers field has information about the number of users who are active on the web application.

Imagine the limits this customer has set in the dashboard are:

  • Total Active Users – 200
  • New Users Per Minute – 200

As per their configuration, only 200 customers can be on the web application at any time. So the user slots available for the waiting room state above are 200 – 201 (activeUsers) = -1. No one can go in, and users get queued.

Now imagine that some users have finished their session and activeUsers is now 148.

Now userSlotsAvailable = 200 – 148 = 52 users. We should let the 52 users who have been waiting the longest into the application. We achieve this by giving the eligible slots to the oldest buckets in the queue. In the example below, 2 users are waiting from bucket Thu, 27 May 2021 15:54:00 GMT and 50 users are waiting from bucket Thu, 27 May 2021 15:55:00 GMT. These are the oldest buckets in the queue, so they get the eligible slots.

activeUsers: 148,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
      eligibleSlots: 2,
    },
  },
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 50,
      eligibleSlots: 50,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60,
      eligibleSlots: 0,
    }
  }
]

If there are eligible slots available for all the users in their bucket, then they can be sent to the web application from any data center. This ensures the fairness of the waiting room.

There is another case that can happen where we do not have enough eligible slots for a whole bucket. When this happens things get a little more complicated as we cannot send everyone from that bucket to the web application. Instead, we allocate a share of eligible slots to each data center.

key: "Thu, 27 May 2021 15:56:00 GMT",
data: 
{
  waiting: 60,
  eligibleSlots: 20,
}

As we did before, we use the ratio of past traffic from each data center to decide how many users it can let through. So if the current time is Thu, 27 May 2021 16:34:10 GMT both data centers look at the traffic ratio in the past at Thu, 27 May 2021 16:32:00 GMT and send a subset of users from those data centers to the web application.

Thu, 27 May 2021 16:32:00 GMT: 
{
  Nairobi: 0.25, // 0.25 * 20 = 5 eligibleSlots
  Dublin: 0.75,  // 0.75 * 20 = 15 eligibleSlots
}
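
A compact sketch of this allocation (hypothetical names; short keys stand in for the full bucketId timestamps):

def assign_eligible_slots(buckets, user_slots_available):
    # Give eligible slots to the oldest buckets first; buckets is a
    # chronologically sorted list of (bucket_id, waiting) pairs
    assignments = []
    for bucket_id, waiting in buckets:
        eligible = min(waiting, user_slots_available)
        user_slots_available -= eligible
        assignments.append((bucket_id, waiting, eligible))
    return assignments

queue = [("15:54", 2), ("15:55", 50), ("15:56", 60)]
print(assign_eligible_slots(queue, 52))
# [('15:54', 2, 2), ('15:55', 50, 50), ('15:56', 60, 0)]

# A partially served bucket (20 eligible slots for 60 waiting users) is
# further divided between data centers using the past traffic ratio
ratios = {"Nairobi": 0.25, "Dublin": 0.75}
shares = {colo: int(20 * r) for colo, r in ratios.items()}
# shares == {"Nairobi": 5, "Dublin": 15}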

Estimated wait time

When a request comes from a user, we look at their bucketId. Based on the bucketId, it is possible to know how many people are in front of the user from the sorted list. Similar to how we track activeUsers, we also calculate the average number of users going to the web application per minute. Dividing the number of people in front of the user by the average number of users going to the web application per minute gives us the estimated wait time. This is what is shown to the user who visits the waiting room.

avgUsersToWebApplication:  30,
activeUsers: 148,
buckets: 
[  
  {
    key: "Thu, 27 May 2021 15:54:00 GMT",
    data: 
    {
      waiting: 2,
      eligibleSlots: 2,
    },
  },
  {
    key: "Thu, 27 May 2021 15:55:00 GMT",
    data: 
    {
      waiting: 50,
      eligibleSlots: 50,
    }
  },
  {
    key: "Thu, 27 May 2021 15:56:00 GMT",
    data: 
    {
      waiting: 60,
      eligibleSlots: 0,
    }
  }
]

In the case above, for a user with bucketId Thu, 27 May 2021 15:56:00 GMT, there are 60 users ahead of them. With avgUsersToWebApplication at 30 users per minute, the estimated time to get into the web application is 60/30, which is 2 minutes.
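
One way to read this example in code (everyone waiting at or before your bucket who doesn’t yet hold an eligible slot counts as being in front of you) reproduces the numbers above. This is a sketch under that assumption, with short keys standing in for full timestamps:

def estimated_wait_minutes(user_bucket_id, buckets, avg_users_per_minute):
    # buckets: chronologically sorted (bucket_id, waiting, eligible_slots)
    in_front = sum(waiting - eligible
                   for bucket_id, waiting, eligible in buckets
                   if bucket_id <= user_bucket_id)
    return in_front / avg_users_per_minute

queue = [("15:54", 2, 2), ("15:55", 50, 50), ("15:56", 60, 0)]
print(estimated_wait_minutes("15:56", queue, 30))  # 60 / 30 = 2.0 minutes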

Implementation with Workers and Durable Objects

Now that we have talked about the user experience and the algorithm, let’s focus on the implementation. Our product is specifically built for customers who experience high volumes of traffic, so we needed to run code at the edge in a highly scalable manner. Cloudflare has a great culture of building upon its own products, so we naturally thought of Workers. The Workers platform uses Isolates to scale up and can scale horizontally as there are more requests.

The Workers product has an ecosystem of tools, like wrangler, which help us iterate and debug things quickly. Workers also reduce long-term operational work.

For these reasons, the decision to build on Workers was easy. The more complex choice in our design was for the coordination. As we have discussed before, our workers need a way to share the waiting room state. We need every worker to be aware of changes in traffic patterns quickly in order to respond to sudden traffic spikes. We use the proportion of traffic from two minutes before to allocate user slots among data centers, so we need a solution to aggregate this data and make it globally available within this timeframe. Our design also relies on having fast coordination within a data center to react quickly to changes. We considered a few different solutions before settling on Cache and Durable Objects.

Idea #1: Workers KV

We started to work on the project around March 2020. At that point, Workers offered two options for storage: the Cache API and KV. Cache is shared only at the data center level, so for global coordination we had to use KV. Each worker writes its own key to KV that describes the requests it received and how it processed them. Each key is set to expire after a few minutes if the worker stopped writing. To create a workerState, the worker periodically does a list operation on the KV namespace to get the state around the world.

[Figure: Design using KV]

This design has some flaws because KV wasn’t built for a use case like this. The state of a waiting room changes all the time to match traffic patterns. Our use case is write intensive and KV is intended for read-intensive workflows. As a consequence, our proof of concept implementation turned out to be more expensive than expected. Moreover, KV is eventually consistent: it takes time for information written to KV to be available in all of our data centers. This is a problem for Waiting Room because we need fine-grained control to be able to react quickly to traffic spikes that may be happening simultaneously in several locations across the globe.

Idea #2: Centralized Database

Another alternative was to run our own databases in our core data centers. The Cache API in Workers lets us use the cache directly within a data center. If there is frequent communication with the core data centers to get the state of the world, the cached data in the data center should let us respond with minimal latency on the request hot path. There would be fine-grained control on when the data propagation happens and this time can be kept low.

[Figure: Design using core data centers]

As noted before, this application is very write-heavy and the data is rather short-lived. For these reasons, a standard relational database would not be a good fit. This meant we could not leverage the existing database clusters maintained by our in-house specialists. Rather, we would need to use an in-memory data store such as Redis, and we would have to set it up and maintain it ourselves. We would have to install a data store cluster in each of our core locations, fine-tune our configuration, and make sure data is replicated between them. We would also have to create a proxy service running in our core data centers to gate access to that database and validate data before writing to it.

We could likely have made it work, at the cost of substantial operational overhead. While that is not insurmountable, this design would introduce a strong dependency on the availability of core data centers. If there were issues in the core data centers, it would affect the product globally whereas an edge-based solution would be more resilient. If an edge data center goes offline Anycast takes care of routing the traffic to the nearby data centers. This will ensure a web application will not be affected.

The Scalable Solution: Durable Objects

Around that time, we learned about Durable Objects. The product was in closed beta back then, but we decided to embrace Cloudflare’s thriving dogfooding culture and did not let that deter us. With Durable Objects, we could create one global Durable Object instance per waiting room instead of maintaining a single database. This object can exist anywhere in the world and handle redundancy and availability. So Durable Objects give us sharding for free. Durable Objects gave us fine-grained control as well as better availability as they run in our edge data centers. Additionally, each waiting room is isolated from the others: adverse events affecting one customer are less likely to spill over to other customers.

Implementation with Durable Objects

Based on these advantages, we decided to build our product on Durable Objects.

As mentioned above, we use a worker to decide whether to send users to the Waiting Room or the web application. That worker periodically sends a request to a Durable Object saying how many users it sent to the Waiting Room and how many it sent to the web application. A Durable Object instance is created on the first request and remains active as long as it is receiving requests. The Durable Object aggregates the counters sent by every worker to create a count of users sent to the Waiting Room and a count of users on the web application.

A Durable Object instance is only active as long as it is receiving requests and can be restarted during maintenance. When a Durable Object instance is restarted, its in-memory state is cleared. To preserve the in-memory data on Durable Object restarts, we back up the data using the Cache API. This offers weaker guarantees than using the Durable Object persistent storage as data may be evicted from cache, or the Durable Object can be moved to a different data center. If that happens, the Durable Object will have to start without cached data. On the other hand, persistent storage at the edge still has limited capacity. Since we can rebuild state very quickly from worker updates, we decided that cache is enough for our use case.

Scaling up

When traffic spikes happen around the world, new workers are created. Every worker needs to communicate how many users have been queued and how many have been let through to the web application. However, while workers automatically scale horizontally when traffic increases, Durable Objects do not. By design, there is only one instance of any Durable Object. This instance runs on a single thread, so if it receives requests more quickly than it can respond, it can become overloaded. To avoid that, we cannot let every worker send its data directly to the same Durable Object. The way we achieve scalability is by sharding: we create per-data-center Durable Object instances that report up to one global instance.

[Figure: Durable Objects implementation]

The aggregation is done in two stages: at the data-center level and at the global level.

Data Center Durable Object

When a request comes to a particular location, we can see the corresponding data center by looking at the cf.colo field on the request. The Data Center Durable Object keeps track of the number of workers in the data center and aggregates the state from all those workers. It also responds to workers with important information within a data center, like the number of users making requests to a waiting room or the number of workers. It frequently updates the Global Durable Object and receives information about other data centers in the response.

Worker User Slots

Above we talked about how a data center gets user slots allocated to it based on the past traffic patterns. If every worker in the data center talks to the Data Center Durable Object on every request, the Durable Object could get overwhelmed. Worker User Slots help us to overcome this problem.

Every worker keeps track of the number of users it has let through to the web application and the number of users that it has queued. The worker user slots are the number of users a worker can send to the web application at any point in time. This is calculated from the user slots available for the data center and the worker count in the data center. We divide the total number of user slots available for the data center by the number of workers in the data center to get the user slots available for each worker. If there are two workers and 10 users that can be sent to the web application from the data center, then we allocate five as the budget for each worker. This division is needed because every worker makes its own decisions on whether to send the user to the web application or the waiting room without talking to anyone else.

[Figure: Waiting room inside a data center]

When the traffic changes, new workers can spin up or old workers can die. The worker count in a data center is dynamic as the traffic to the data center changes. Here we make a trade off similar to the one for inter data center coordination: there is a risk of overshooting the limit if many more workers are created between calls to the Data Center Durable Object. But too many calls to the Data Center Durable Object would make it hard to scale. In this case though, we can use Cache for faster synchronization within the data center.

Cache

On every interaction to the Data Center Durable Object, the worker saves a copy of the data it receives to the cache. Every worker frequently talks to the cache to update the state it has in memory with the state in cache. We also adaptively adjust the rate of writes from the workers to the Data Center Durable Object based on the number of workers in the data center. This helps to ensure that we do not take down the Data Center Durable Object when traffic changes.
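
The exact policy isn’t spelled out here, but the shape of the trade-off is simple: the more workers in a data center, the less often each one should write, keeping the aggregate request rate to the Durable Object roughly constant. As a sketch, with an assumed base rate:

BASE_WRITE_INTERVAL_SECONDS = 1.0  # assumed base rate, for illustration only

def write_interval(worker_count: int) -> float:
    # With N workers each writing every N seconds, the Data Center Durable
    # Object sees roughly one write per second regardless of worker count
    return BASE_WRITE_INTERVAL_SECONDS * max(worker_count, 1)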

Global Durable Object

The Global Durable Object is designed to be simple and stores the information it receives from any data center in memory. It responds with the information it has about all data centers. It periodically saves its in-memory state to cache using the Workers Cache API so that it can withstand restarts as mentioned above.

[Figure: Components of the waiting room]

Recap

This is how the waiting room works right now. Every request to a page covered by a waiting room goes to a worker at a Cloudflare edge data center. When this happens, the worker looks for the state of the waiting room in the Cache first. We use cache here instead of the Data Center Durable Object so that we do not overwhelm the Durable Object instance when there is a spike in traffic. Plus, reading data from cache is faster. The workers periodically make a request to the Data Center Durable Object to get the waiting room state, which they then write to the cache. The idea here is that the cache should have a recent copy of the waiting room state.

Workers can examine the request to know which data center they are in. Every worker periodically makes a request to the corresponding Data Center Durable Object. This interaction updates the worker state in the Data Center Durable Object. In return, the workers get the waiting room state from the Data Center Durable Object. The Data Center Durable Object sends the data center state to the Global Durable Object periodically. In the response, the Data Center Durable Object receives all data center states globally. It then calculates the waiting room state and returns that state to a worker in its response.

The advantage of this design is that it’s possible to adjust the rate of writes from workers to the Data Center Durable Object and from the Data Center Durable Object to the Global Durable Object based on the traffic received in the waiting room. This helps us respond to requests during high traffic without overloading the individual Durable Object instances.

Conclusion

By using Workers and Durable Objects, Waiting Room was able to scale up to keep web application servers online for many of our early customers during large spikes of traffic. It helped keep vaccination sign-ups online for companies and governments around the world for free through Project Fair Shot: Verto Health was able to serve over 4 million customers in Canada; Ticket Tailor reduced their peak resource utilization from 70% down to 10%; the County of San Luis Obispo was able to stay online during traffic surges of up to 23,000 users; and the country of Latvia was able to stay online during surges of thousands of requests per second. These are just a few of the customers we served and will continue to serve until Project Fair Shot ends.

In the coming days, we are rolling out the Waiting Room to customers on our business plan. Sign up today to prevent spikes of traffic to your web application. If you are interested in access to Durable Objects, it’s currently available to try out in Open Beta.

Cloudflare Waiting Room

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/cloudflare-waiting-room/

Today, we are excited to announce Cloudflare Waiting Room! It will first be available to select customers through a new program called Project Fair Shot which aims to help with the problem of overwhelming demand for COVID-19 vaccinations causing appointment registration websites to fail. General availability in our Business and Enterprise plans will be added in the near future.

Wait, you’re excited about a… Waiting Room?

Most of us are familiar with the concept of a waiting room, and rarely are we excited about the idea of being in one. Usually our first experience of one is at a doctor’s office — yes, you have an appointment, but sometimes the doctor is running late (or one of the patients was). Given the doctor can only see one person at a time… the waiting room was born, as a mechanism to queue up patients.

While servers can handle more concurrent requests than a doctor can, they too can be overwhelmed. If, in a pre-COVID world, you’ve ever tried buying tickets to a popular concert or event, you’ve probably encountered a waiting room online. It limits requests inbound to an application, and places these requests into a virtual queue. Once the number of users in the application has reduced, new users are let in within the defined thresholds the application can handle. This protects the origin servers supporting the application from being inundated with too many requests, while also ensuring equity from a user perspective — users who try to access a resource when the system is overloaded are not unfairly dropped and forced to reconnect, hoping for another chance to join the queue.

Why Now?

Given not many of us are going to live concerts any time soon, why is Cloudflare doing this now?

Well, perhaps we aren’t going to concerts, but the second-order effects of COVID-19 have created a huge need for waiting rooms. First of all, given social distancing and the closing of many places of business and government, customers and citizens have shifted to online channels, putting substantially more strain on business and government infrastructure.

Second, the pandemic and its flow-on consequences have meant many folks around the world have come to rely on resources that they didn’t need twelve months earlier. To be specific, these are often health or government-related resources, such as unemployment insurance websites. That online infrastructure was provisioned for peak loads estimated before anyone could foresee the impact of COVID-19. We’re seeing a similar pattern emerge with websites related to vaccines.

Historically, the number of organizations that needed waiting rooms was quite small. Most online businesses see a fairly consistent user load rather than huge crushes of people all at once, and the few organizations that did need queueing built custom waiting rooms integrated deeply into their applications (for example, ticket sellers). With Cloudflare’s Waiting Room, no application changes are necessary: a Waiting Room can be set up for any website in a matter of minutes without writing a single line of code.

Whether you are an engineering architect or a business operations analyst, setting up a Waiting Room is simple. We make it quick and easy to ensure your applications are reliable and protected from unexpected spikes in traffic. Other features we felt were important are automatic enablement and dynamic outflow. In other words, a waiting room should turn on automatically when thresholds are exceeded and, as users finish their tasks in the application, release appropriately sized buckets of queued users to take their place. It should just work. Lastly, we’ve seen the major impact COVID-19 has had on users and businesses alike, especially, but not limited to, the health and government sectors. We wanted to provide another way to ensure these applications remain available and functional so all users can receive the care they need rather than errors in their browser.
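
To illustrate dynamic outflow, here is one way the size of each admitted bucket could be computed. This is a hedged sketch using the configuration fields described later in this post, not the product's internals:

```typescript
// A hedged sketch of dynamic outflow: how many queued users to admit on
// each tick. Parameter names mirror the configuration fields described
// later in this post; the formula itself is an illustration.
function usersToAdmit(
  activeUsers: number,       // users currently in the application
  totalActiveUsers: number,  // configured capacity
  newUsersPerMinute: number, // configured admission rate
  queuedUsers: number        // users waiting in the queue
): number {
  const freeSlots = Math.max(0, totalActiveUsers - activeUsers);
  // The bucket is bounded by free capacity, the admission rate,
  // and how many users are actually waiting.
  return Math.min(freeSlots, newUsersPerMinute, queuedUsers);
}
```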

How does Cloudflare’s Waiting Room work?

We built Waiting Room on top of our edge network and our Workers product. By leveraging Workers and our new Durable Objects offering, we were able to remove the need for any customer coding and provide a seamless, out-of-the-box product that will ‘just work’. On top of this, we get the benefits of the scale and performance of our Workers product, letting us keep latency overhead extremely low, keep the estimated wait times presented to end users as accurate as possible, and avoid keeping any user in the queue longer than needed. But building a centralized system in a decentralized network is no easy task. When requests come into an application from around the world, we need a broad, accurate view of the inbound and outbound load for that application.
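
Conceptually, the Worker in front of the origin makes a per-request decision: pass through, or hold the visitor on a queue page. The sketch below shows that shape; the state lookup and inline queue page are hypothetical stand-ins for the real product logic:

```typescript
// A hedged sketch of the per-request decision a Waiting Room Worker makes.
type WaitingRoomState = { queueingActive: boolean; estimatedWaitMins: number };

async function getWaitingRoomState(request: Request): Promise<WaitingRoomState> {
  // The real product consults Durable Objects here (see the sync sketch
  // earlier in this archive); hard-coded to keep the example self-contained.
  return { queueingActive: false, estimatedWaitMins: 0 };
}

export default {
  async fetch(request: Request): Promise<Response> {
    const state = await getWaitingRoomState(request);
    if (!state.queueingActive) {
      // Under capacity: pass the request straight through to the origin.
      return fetch(request);
    }
    // At capacity: serve the waiting room page instead of hitting the origin.
    const body = `<h1>You are in line</h1>
<p>Estimated wait: ${state.estimatedWaitMins} minutes</p>`;
    return new Response(body, {
      status: 200,
      headers: { "content-type": "text/html" },
    });
  },
};
```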

Cloudflare Waiting Room
Request going through Cloudflare without a Waiting Room

These requests, as fast as they are, still take time to travel across the planet. And so, a unique edge case was presented. What if a website is getting reasonable traffic from North America and Europe, but then a sudden major spike of traffic takes place from South America – how do we know when to keep letting users into the application and when to kick in the Waiting Room to protect the origin servers from being overloaded?

Thanks to some clever engineering and our Workers product, we were able to create a system that keeps itself synced with global demand on an application almost immediately, giving us the insight we need into when we should and should not be queueing users into the Waiting Room. By leveraging our global Anycast network and more than 200 data centers, we remove any single point of failure in protecting our customers’ infrastructure, while still providing a great experience to end users, who wait only a short time to enter the application under high load.

Cloudflare Waiting Room
Request going through Cloudflare with a Waiting Room

How to set up a Waiting Room

Setting up a Waiting Room is incredibly easy and very fast! At its simplest, a user needs to fill out only five fields: 1) the name of the Waiting Room, 2) a hostname (pre-populated with the zone it’s being configured on), 3) the total active users that can be in the application at any given time, 4) the new users per minute allowed into the application, and 5) the session duration for any given user. No coding or application changes are necessary.
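
For those who prefer automation, the same five fields map naturally onto an API call. The sketch below shows what that might look like; treat the endpoint and field names as assumptions to confirm against the current Waiting Room API documentation:

```typescript
// A sketch of creating the same five-field configuration via the API.
// Endpoint and field names are assumptions to verify against current docs.
async function createWaitingRoom(zoneId: string, apiToken: string) {
  const resp = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/waiting_rooms`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        name: "product_launch",    // 1) name of the Waiting Room
        host: "shop.example.com",  // 2) hostname to protect
        path: "/",                 //    path under that hostname
        total_active_users: 200,   // 3) concurrent users allowed in
        new_users_per_minute: 200, // 4) admission rate
        session_duration: 5,       // 5) session duration, in minutes
      }),
    }
  );
  return resp.json();
}
```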


We provide the option of using our default Waiting Room template for customers who don’t want to add additional branding. This simplifies the process of getting a Waiting Room up and running.


That’s it! Press save and the Waiting Room is ready to go!


For customers with more time and technical ability, the process is the same, except that we provide full customization capabilities so they can brand the Waiting Room to match the look and feel of their overall product.


Lastly, managing multiple Waiting Rooms is incredibly easy. With our Manage Waiting Room table, you can see at a glance which rooms are actively queueing, which are not, and which are disabled.


We are very excited to put the power of our Waiting Room into the hands of our customers so they can stay focused on their businesses and their users. Keep an eye out for another blog post coming soon with major updates to our Waiting Room product for Enterprise!