Tag Archives: Customers

Cloudflare’s tenant platform in action: Meter deploys DNS filtering at scale

Post Syndicated from Mythili Prabhu original http://blog.cloudflare.com/gateway-managed-service-provider-meter/

In January 2023, we announced support for Managed Service Providers (MSPs) and other businesses to create 'parent-child' and account-level policy configurations when deploying Cloudflare for DNS filtering. Specifically, organizations leverage the integration between our Tenant API and Cloudflare Gateway, our Secure Web Gateway (SWG), to protect their remote or office end users with web filtering and inspection. Already, customers like the US federal government, Malwarebytes, and a large global ISP take advantage of this integration to enable simpler, more flexible policy management across large deployments for their end customers.

Today, we're excited to showcase another similar story: Meter, a provider of Internet infrastructure, is leveraging the Tenant API integration for DNS filtering to help their clients enforce acceptable Internet use policies.

How Meter deploys Cloudflare to secure Internet browsing

Meter, headquartered in San Francisco and founded in 2015, provides Internet infrastructure that includes routing, switching, wireless, and applications. They help deliver faster, more efficient, more secure networking experiences for a diverse range of corporate spaces, including offices, warehouses, retail, manufacturing, biotech, and education institutions.

Meter integrates with the Cloudflare Tenant API to provide DNS filtering to their customers. With the Meter dashboard, Meter customers can set policies to block or allow Internet traffic to domains, categorized by security risk (phishing, malware, DGA, etc.) or content theme (adult, gambling, shopping, etc.).

Across this customer base, having parent-child relationships in security policies is often critical. For example, specific schools within an overall district may have different policies about what Internet browsing is or is not acceptable.

Cloudflare’s parent-child configurability means that Meter administrators are equipped to set differential, granular policies for specific offices, retail locations, or warehouses (‘child accounts’) within a larger business (‘parent account’). DNS queries are first filtered against parent account policies before filtering against more specific child account policies.
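
As a rough sketch of how a deployment like this is wired together, a provider can create a child account under its tenant with the Cloudflare v4 API. The request shape below follows Cloudflare's tenant documentation, but the token variable, the tenant unit ID, and the account name are placeholders, and the exact fields available depend on the tenant's entitlements.

```typescript
// Sketch: create a "child" customer account under a tenant via the Cloudflare v4 API.
// CF_API_TOKEN and TENANT_UNIT_ID are placeholders for the provider's own credentials/IDs.
const CF_API = "https://api.cloudflare.com/client/v4";

async function createChildAccount(name: string): Promise<string> {
  const res = await fetch(`${CF_API}/accounts`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name,                                     // e.g. "Northside High School"
      type: "standard",
      unit: { id: process.env.TENANT_UNIT_ID }, // associates the account with the tenant (parent)
    }),
  });
  const body = await res.json();
  if (!body.success) throw new Error(JSON.stringify(body.errors));
  return body.result.id; // account ID used for the per-customer Gateway configuration below
}
```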

At a more technical level, each “child” customer account can have its own users and API tokens for account management. Customers of Meter set up their DNS endpoints as Gateway locations, which may be defined as IPv4, IPv6, DoH, or DoT endpoints, and DNS policies can then be defined for those locations. In addition, each Meter customer can customize their block page and even upload their own certificate to serve that custom block page.
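
To make the per-customer pieces concrete, here is a hedged sketch of the two Gateway API calls involved: registering a DNS endpoint as a Gateway location and attaching a DNS filtering policy. The paths follow Cloudflare's Gateway API documentation, while the account ID, location name, network, and the category IDs in the filter expression are illustrative placeholders.

```typescript
// Sketch: per-customer Gateway configuration for a single child account.
async function cfPost(path: string, payload: unknown) {
  const res = await fetch(`https://api.cloudflare.com/client/v4${path}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
  return res.json();
}

const accountId = "<child-account-id>"; // placeholder for the account created above

// 1. Register a DNS endpoint (Gateway location) for one of the customer's sites.
await cfPost(`/accounts/${accountId}/gateway/locations`, {
  name: "warehouse-berlin",
  client_default: false,
  networks: [{ network: "203.0.113.10/32" }], // the site's egress IP (illustrative)
});

// 2. Attach a DNS policy blocking security-risk categories for that customer.
await cfPost(`/accounts/${accountId}/gateway/rules`, {
  name: "block-security-risks",
  action: "block",
  enabled: true,
  filters: ["dns"],
  // Category IDs are illustrative; real IDs come from the Gateway categories endpoint.
  traffic: "any(dns.security_category[*] in {80 83 117})",
});
```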

What’s next

MSPs and infrastructure companies like Meter play a vital role in bringing cybersecurity solutions to customers of all sizes and needs. Cloudflare will continue to invest in our tenant architecture to equip MSPs with the flexibility and simplicity they need to serve their end customers.

DNS filtering to protect users on the Internet is a valuable solution for MSPs to deliver with Cloudflare. But DNS filtering is just the first of several Zero Trust services that Cloudflare intends to support via our tenant platform, so stay tuned for more.

If you are an MSP or an infrastructure company looking to deliver Cloudflare security for your end customers, learn more here.

How we scaled and protected Eurovision 2023 voting with Pages and Turnstile

Post Syndicated from Dirk-Jan van Helmond original http://blog.cloudflare.com/how-cloudflare-scaled-and-protected-eurovision-2023-voting/

2023 was the first year that non-participating countries could vote for their favorites during the Eurovision Song Contest, adding millions of additional viewers and voters to an already impressive 162 million tuning in from the participating countries. It became a truly global event with a potential for disruption from multiple sources. To prepare for anything, Cloudflare helped scale and protect the voting application, used by millions of dedicated fans around the world to choose the winner.

In this blog we will cover how once.net used based.io, the platform they built, to monitor, manage and scale the Eurovision voting application, and how it handled all of that traffic using many Cloudflare services. The speed with which DNS changes made through the Cloudflare API propagate globally allowed them to scale their backend within seconds. At the same time, Cloudflare Pages was ready to serve any amount of traffic to the voting landing page so fans didn’t miss a beat. And to cap it off, by combining Cloudflare CDN, DDoS protection, WAF, and Turnstile, they made sure that attackers didn’t steal any of the limelight.

The unsung heroes

Based.io is a resilient live data platform built by the once.net team, capable of scaling up to 400 million concurrent connected users. It’s built from the ground up for speed and performance, consisting of an observable real-time graph database, a networking layer, cloud functions, analytics, and infrastructure orchestration. Since all system information, traffic analysis and disruptions are monitored in real time, the platform is instantly responsive to variable demand, which enables real-time scaling of infrastructure during spikes, outages and attacks.

Although the based.io platform on its own is currently in closed beta, it is already serving a few flagship customers in production assisted by the software and services of the once.net team. One such customer is Tally, a platform used by multiple broadcasters in Europe to add live interaction to traditional television. Over 100 live shows have been performed using the platform. Another is Airhub, a startup that handles and logs automatic drone flights. And of course the star of this blog post, the Eurovision Song Contest.

Setting the stage

The Eurovision Song Contest is one of the world’s most popular broadcast contests, and this year it reached 162 million people across 38 broadcasting countries. In addition, the three live shows were viewed 4.8 million times on TikTok, while 7.6 million people watched the Grand Final live on YouTube. With such an audience, it is no surprise that Cloudflare sees its impact on the Internet. Last year, we wrote a blog post where we showed lower-than-average traffic during, and higher-than-average traffic after, the Grand Final. This year, the traffic from participating countries showed an even more remarkable surge:

HTTP Requests per Second from Norway, with a similar pattern visible in countries such as the UK, Sweden and France. Internet traffic spiked at 21:20 UTC, when voting started.

Such large amounts of traffic are nothing new to the Eurovision Song Contest. Eurovision has relied on Cloudflare’s services for over a decade now, and Cloudflare has helped to protect Eurovision.tv and improve its performance through noticeably faster load times for visitors from all corners of the world. Year after year, the Eurovision team continued to use our services more, discovering additional features to further improve performance and reliability, with increasingly fine-grained control over their traffic flows. Eurovision.tv uses Page Rules to cache additional content on Cloudflare’s edge, speeding up delivery without sacrificing up-to-the-minute updates during the global event. Finally, to protect their backend and content management system, the team has placed their admin portals behind Cloudflare Zero Trust to delegate responsibilities down to individual levels.
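
As an illustration of the kind of rule involved (not Eurovision's actual configuration), a "cache everything" Page Rule can be created through the API roughly as sketched below. The zone ID, URL pattern, and TTL are placeholders, and the minimum edge cache TTL allowed depends on the plan.

```typescript
// Sketch: create a "cache everything" Page Rule for a content path via the Cloudflare v4 API.
// ZONE_ID, the URL pattern and the TTL are placeholders for illustration only.
await fetch(`https://api.cloudflare.com/client/v4/zones/${process.env.ZONE_ID}/pagerules`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    status: "active",
    targets: [
      { target: "url", constraint: { operator: "matches", value: "example.com/news/*" } },
    ],
    actions: [
      { id: "cache_level", value: "cache_everything" }, // cache HTML and other normally-bypassed content
      { id: "edge_cache_ttl", value: 3600 },             // keep edge copies reasonably fresh
    ],
  }),
});
```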

Since then the contest itself has also evolved, sometimes by choice, sometimes by force. During the COVID-19 pandemic, the reduced live audience made in-person cheering impossible for many fans, so the Eurovision Song Contest asked once.net to build a new iOS and Android application in which fans could cheer virtually. The feature was an instant hit, and it was clear that it would become part of this year’s contest as well.

A screenshot of the official Eurovision Song Contest application showing the real-time number of connected fans (1) and allowing them to cheer (2) for their favorites.

In 2023, once.net was also asked to handle the paid voting from the regions where phone and SMS voting was not possible. It was the first time that Eurovision allowed voting online. The challenge that had to be overcome was the extreme peak demand on the platform when the show was live, and especially when the voting window started.

Complicating it further was the fact that during last year’s show there had been a large number of targeted and coordinated attacks.

To prepare for these spikes in demand and for determined adversaries, once.net needed a platform that was not only resilient and highly scalable, but that could also act as a mitigation layer in front of it. once.net selected Cloudflare for this functionality and integrated Cloudflare deeply with its real-time monitoring and management platform. To understand how and why, it’s essential to understand based.io’s underlying architecture.

The based.io platform

Instead of relying on network or HTTP load balancers, based.io uses a client-side service discovery pattern, selecting the most suitable server to connect to and leveraging Cloudflare's fast cache propagation infrastructure to handle spikes in traffic (both malicious and benign).

First, each server continuously registers a unique access key, with an expiration of 15 seconds, which must be used when a client connects to that server. In addition, the backend servers report their health (active connections, CPU, memory usage, requests per second, etc.) to the service registry every 300 milliseconds. Clients then request the optimal server URL and associated access key from a central discovery registry and proceed to establish a long-lived connection with that server. When a server gets overloaded, it disconnects a certain number of clients, and those clients go through the discovery process again.
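
The client side of that flow roughly translates into the sketch below. The /discover endpoint name, the response shape, and the retry timing are assumptions made for illustration; based.io's actual protocol is not public.

```typescript
// Sketch of the client-side service discovery loop described above.
// The /discover endpoint, response fields and retry policy are illustrative assumptions.
interface DiscoveryResult {
  serverUrl: string; // the most suitable backend, chosen from the reported health stats
  accessKey: string; // short-lived key (~15 s) the server will accept
}

async function connectWithDiscovery(registryUrl: string): Promise<WebSocket> {
  while (true) {
    // The registry sits behind Cloudflare's cache with a short TTL, so this stays cheap
    // even when millions of clients ask at roughly the same time.
    const res = await fetch(`${registryUrl}/discover`);
    const { serverUrl, accessKey } = (await res.json()) as DiscoveryResult;

    try {
      // Connect directly to the selected backend, presenting the short-lived key.
      const ws = new WebSocket(`${serverUrl}?key=${encodeURIComponent(accessKey)}`);
      await new Promise<void>((resolve, reject) => {
        ws.onopen = () => resolve();
        ws.onerror = () => reject(new Error("connect failed"));
      });
      // If the server later sheds load and closes the socket, rediscover.
      ws.onclose = () => connectWithDiscovery(registryUrl);
      return ws;
    } catch {
      // Key may have expired or the server got overloaded; go through discovery again.
      await new Promise((r) => setTimeout(r, 250 + Math.random() * 500));
    }
  }
}
```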

The central discovery registry would normally be a huge bottleneck and attack target. based.io resolves this by putting the registry behind Cloudflare’s global network with a cache time of three seconds. Since the system relies on real-time stats to distribute load and uses short-lived access keys, it is crucial that the cache updates fast and reliably. This is where Cloudflare’s infrastructure proved its worth, both through its fast-updating cache and by reducing origin load with Tiered Caching.
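
On the registry side, that short cache window can be expressed as a shared-cache response header, roughly as in the sketch below. The Workers-style handler shape and the pickBestServer helper are assumptions for illustration, and the route would also need a rule that makes Cloudflare cache it (for example "cache everything"), since JSON responses are not cached by default.

```typescript
// Sketch: serve the discovery response so Cloudflare's shared cache keeps it for ~3 seconds.
// The handler shape is a generic Workers-style fetch handler, used here purely for illustration.
export default {
  async fetch(request: Request): Promise<Response> {
    const best = pickBestServer(); // chosen from the health stats servers report every 300 ms
    return new Response(JSON.stringify(best), {
      headers: {
        "Content-Type": "application/json",
        // s-maxage applies to shared caches like Cloudflare; browsers don't cache it at all.
        "Cache-Control": "public, s-maxage=3, max-age=0",
      },
    });
  },
};

// Placeholder for the real selection logic described above.
function pickBestServer(): { serverUrl: string; accessKey: string } {
  return { serverUrl: "wss://node-42.example.com", accessKey: "short-lived-key" };
}
```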

Not using load balancers means the based.io system allows clients to connect to the backend servers directly through Cloudflare, resulting in better performance and a more resilient infrastructure by eliminating the load balancers as a potential attack surface. It also results in a better distribution of connections, using real-time information about server health, the number of active connections, and active subscriptions.

Scaling up the platform happens automatically under load by deploying additional machines that can each handle 40,000 connected users. These are spun up in batches of a couple of hundred and as each machine spins up, it reaches out directly to the Cloudflare API to configure its own DNS record and proxy status. Thanks to Cloudflare’s high speed DNS system, these changes are then propagated globally within seconds, resulting in a total machine turn-up time of around three seconds. This means faster discovery of new servers and faster dynamic rebalancing from the clients. And since the voting window of the Eurovision Song Contest is only 45 minutes, with the main peak within minutes after the window opens, every second counts!
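
Each machine registering itself comes down to a single DNS record creation call against the Cloudflare API, sketched below. The zone ID, hostname pattern and token are placeholders, and the real turn-up script presumably does more (health checks, retries) than this minimal version.

```typescript
// Sketch: a freshly booted backend registers itself in DNS, proxied through Cloudflare.
// ZONE_ID, the hostname pattern and the API token are placeholders.
async function registerSelf(publicIp: string, index: number) {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${process.env.ZONE_ID}/dns_records`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        type: "A",
        name: `node-${index}`, // e.g. node-123.<zone>, later handed out by the discovery registry
        content: publicIp,
        ttl: 1,                // 1 means "automatic" for proxied records
        proxied: true,         // keep Cloudflare's CDN and DDoS protection in front of the node
      }),
    }
  );
  const body = await res.json();
  if (!body.success) throw new Error(JSON.stringify(body.errors));
}
```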

High level architecture of the based.io platform used for the 2023 Eurovision Song Contest

To vote, users of the mobile app and viewers globally were pointed to the voting landing page, esc.vote. Building a frontend web application able to handle this kind of audience is a challenge in itself. Although hosting it yourself and putting a CDN in front seems straightforward, that still requires you to own, configure and manage your origin infrastructure. once.net decided to leverage Cloudflare’s infrastructure directly by hosting the voting landing page on Cloudflare Pages. Deploying was as quick as a commit to their Git repository, and they never had to worry about reachability or scaling of the webpage.

once.net also used Cloudflare Turnstile to protect the payment API endpoints that were used to validate online votes. They used the invisible Turnstile widget to make sure requests were not coming from emulated browsers (e.g. Selenium). Best of all, because the widget is invisible, users did not have to go through any extra steps, which allowed for a better user experience and better conversion.
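
Server-side, validating the Turnstile token that accompanies each payment request is a single call to Cloudflare's siteverify endpoint. The sketch below assumes the secret key lives in an environment variable and that the payment handler passes the token and client IP in; the surrounding payment logic is not shown.

```typescript
// Sketch: verify an invisible-Turnstile token before accepting a vote payment.
// TURNSTILE_SECRET is a placeholder; the calling payment handler is not shown.
async function verifyTurnstileToken(token: string, remoteIp?: string): Promise<boolean> {
  const form = new FormData();
  form.append("secret", process.env.TURNSTILE_SECRET!);
  form.append("response", token);  // token produced by the invisible widget on the page
  if (remoteIp) form.append("remoteip", remoteIp);

  const res = await fetch("https://challenges.cloudflare.com/turnstile/v0/siteverify", {
    method: "POST",
    body: form,
  });
  const outcome = await res.json();
  return outcome.success === true; // reject the payment request if this is false
}
```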

Cloudflare Pages stealing the show!

After the two semi-finals went according to plan, with approximately 200,000 concurrent users during each, May 13 brought the Grand Final. The once.net team made sure that there were enough machines ready to take the initial load, jumped on a call with Cloudflare to monitor the event, and watched the number of concurrent users slowly climb. During the event, there were a few attempts to DDoS the site, which were automatically and instantaneously mitigated without any noticeable impact on visitors.

The based.io discovery registry server also got some attention. Since the cache TTL was set quite low at five seconds, a high rate of distributed traffic to it could still result in a significant load. Luckily, on its own, the highly optimized based.io server can already handle around 300,000 requests per second. Still, it was great to see that during the event the cache hit ratio for normal traffic was 20%, and during one significant attack the cache hit ratio peaked towards 80%. This showed how easy it is to leverage a combination of Cloudflare CDN and DDoS protection to mitigate such attacks, while still being able to serve dynamic and real time content.

When the curtains finally closed, 1.3 million concurrent users were connected to the based.io platform at peak. The based.io platform handled a total of 350 million events and served seven million unique users in three hours. The voting landing page hosted by Cloudflare Pages served 2.3 million requests per second at peak, and Turnstile made sure that the voting payments came from real human fans. Although the Cloudflare platform didn’t blink at such a flood of traffic, it is no surprise that it shows up as a short crescendo in our traffic statistics.

Get in touch with us

If you’re also working on or with an application that would benefit from Cloudflare’s speed and security, but don’t know where to start, reach out and we’ll work together.

Goodbye, section 2.8 and hello to Cloudflare’s new terms of service

Post Syndicated from Eugene Kim original http://blog.cloudflare.com/updated-tos/

Earlier this year, we blogged about an incident where we mistakenly throttled a customer due to internal confusion about a potential violation of our Terms of Service. That incident highlighted a growing point of confusion for many of our customers. Put simply, our terms had not kept pace with the rapid innovation here at Cloudflare, especially with respect to our Developer Platform. We’re excited to announce new updates that will modernize our terms and cut down on customer confusion and frustration.

We want our terms to set clear expectations about what we’ll deliver and what customers can do with our services. But drafting terms is often an iterative process, and iteration over a decade can lead to bloat, complexity, and vestigial branches in need of pruning. Now, time to break out the shears.

Snip, snip

To really nip this in the bud, we started at the source: the content-based restriction housed in Section 2.8 of our Self-Serve Subscription Agreement.

Cloudflare is much, much more than a CDN, but that wasn’t always the case. The CDN was one of our first services and originally designed to serve HTML content like webpages. User attempts to serve video and other large files hosted outside of Cloudflare were disruptive on many levels. So, years ago, we added Section 2.8 to give Cloudflare the means to preserve the original intent of the CDN: limiting use of the CDN to webpages.

Over time, Cloudflare’s network became larger and more robust and its portfolio broadened to include services like Stream, Images, and R2. These services are explicitly designed to allow customers to serve non-HTML content like video, images, and other large files hosted directly by Cloudflare. And yet, Section 2.8 persisted in our Self-Serve Subscription Agreement–the umbrella terms that apply to all services. We acknowledge that this didn’t make much sense.

To address the problem, we’ve done a few things. First, we moved the content-based restriction concept to a new CDN-specific section in our Service-Specific Terms. We want to be clear that this restriction only applies to use of our CDN. Next, we got rid of the antiquated HTML vs. non-HTML construct, which was far too broad. Finally, we made it clear that customers can serve video and other large files using the CDN so long as that content is hosted by a Cloudflare service like Stream, Images, or R2. This will allow customers to confidently innovate on our Developer Platform while leveraging the speed, security, and reliability of our CDN. Video and large files hosted outside of Cloudflare will still be restricted on our CDN, but we think that our service features, generous free tier, and competitive pricing (including zero egress fees on R2) make for a compelling package for developers that want to access the reach and performance of our network.

Here are a few examples of how our terms of service fit together for various use cases:

Customer A is on a free, pro, or business plan and wants to use the CDN service.

Customer B is on a free, pro, or business plan and wants to use the Developer Platform and Zero Trust services.

Customer C is on a free, pro, or business plan and wants to use Stream with the CDN service and Magic Transit with the CDN service.

Quality of life upgrades

We also took this opportunity to tune up other aspects of our Terms of Service to make for a more user-first experience. For example, we streamlined our Self-Serve Subscription Agreement to make it clearer and easier to understand from the start.

We also heard previous complaints and removed an old restriction on benchmarking–we’re confident in the performance of our network and services, unlike some of our competitors. Last but not least, we renamed the Supplemental Terms to the Service-Specific Terms and gave them a major facelift to improve clarity and usability.

Users first

We’ve learned a lot from our users throughout this process, and we are always grateful for your feedback. Our terms were never meant to act as a gating mechanism that stifled innovation. With these updates, we hope that customers will feel confident in building the next generation of apps and services on Cloudflare. And we’ll keep the shears handy as we continue to work to help build a better Internet.

How Cloudflare erroneously throttled a customer’s web traffic

Post Syndicated from Jeremy Hartman original https://blog.cloudflare.com/how-cloudflare-erroneously-throttled-a-customers-web-traffic/

Over the years when Cloudflare has had an outage that affected our customers we have very quickly blogged about what happened, why, and what we are doing to address the causes of the outage. Today’s post is a little different. It’s about a single customer’s website not working correctly because of incorrect action taken by Cloudflare.

Although the customer was not in any way banned from Cloudflare and did not lose access to their account, their website didn’t work. And it didn’t work because Cloudflare applied a bandwidth throttle between us and their origin server. The effect was that the website was unusable.

Because of this unusual throttle there was some internal confusion for our customer support team about what had happened. They, incorrectly, believed that the customer had been limited because of a breach of section 2.8 of our Self-Serve Subscription Agreement which prohibits use of our self-service CDN to serve excessive non-HTML content, such as images and video, without a paid plan that includes those services (this is, for example, designed to prevent someone building an image-hosting service on Cloudflare and consuming a huge amount of bandwidth; for that sort of use case we have paid image and video plans).

However, this customer wasn’t breaking section 2.8, and they were a paying customer, including a paying customer of Cloudflare Workers, through which the throttled traffic was passing. This throttle should not have happened. In addition, there is and was no need for the customer to upgrade to some other plan level.

This incident has set off a number of workstreams inside Cloudflare to ensure better communication between teams, to prevent such an incident from happening again, and to ensure that communications between Cloudflare and our customers are much clearer.

Before we explain our own mistake and how it came to be, we’d like to apologize to the customer. We realize the serious impact this had, and how we fell short of expectations. In this blog post, we want to explain what happened, and more importantly what we’re going to change to make sure it does not happen again.

Background

On February 2, an on-call network engineer received an alert for a congesting interface with Equinix IX in our Ashburn data center. While this is not an unusual alert, this one stood out for two reasons. First, it was the second day in a row that it happened, and second, the congestion was due to a sudden and extreme spike of traffic.

The engineer in charge identified the customer’s domain as being responsible for this sudden spike of traffic between Cloudflare and their origin network, a storage provider. Because this congestion happened on a physical interface connected to external peers, there was an immediate impact on many of our customers and peers. Port congestion like this typically incurs packet loss, slow throughput and higher-than-usual latency. While we have automatic mitigation in place for congesting interfaces, in this case the mitigation was unable to resolve the impact completely.

The traffic from this customer went suddenly from an average of 1,500 requests per second with a 0.5 MB payload per request, to 3,000 requests per second (2x) with a payload of more than 12 MB per request (25x).
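
Back-of-the-envelope, and assuming those averages held during the spike: 1,500 requests per second at 0.5 MB each is roughly 0.75 GB/s, or about 6 Gbps, of traffic pulled from the origin, while 3,000 requests per second at 12 MB each is roughly 36 GB/s, or about 288 Gbps, an increase of nearly 50x and easily enough to congest a peering interface.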

The congestion happened between Cloudflare and the origin network. Caching did not happen because the requests were all unique URLs going to the origin, and therefore we had no ability to serve from cache.

A Cloudflare engineer decided to apply a throttling mechanism to prevent the zone from pulling so much traffic from their origin. Let’s be very clear on this action: Cloudflare does not have an established process to throttle customers that consume large amounts of bandwidth, and does not intend to have one. This remediation was a mistake, it was not sanctioned, and we deeply regret it.

We lifted the throttle through internal escalation 12 hours and 53 minutes after having set it up.

What’s next

To make sure a similar incident does not happen again, we are establishing clear rules to mitigate issues like this one. Any action taken against a customer domain, paying or not, will require multiple levels of approval and clear communication to the customer. Our tooling will be improved to reflect this. We have many ways of shaping traffic in situations where a huge spike affects a link, and we could have applied a different mitigation in this instance.

We are in the process of rewriting our terms of service to better reflect the type of services that our customers deliver on our platform today. We are also committed to explaining to our users in plain language what is permitted under self-service plans. As a developer-first company with transparency as one of its core principles, we know we can do better here. We will follow up with a blog post dedicated to these changes later.

Once again, we apologize to the customer for this action and for the confusion it created for other Cloudflare customers.