Incorrect proxying of 24 hostnames on January 24, 2022

Post Syndicated from Jeremy Hartman original https://blog.cloudflare.com/incorrect-proxying-of-24-hostnames-on-january-24-2022/

Incorrect proxying of 24 hostnames on January 24, 2022

On January 24, 2022, as a result of an internal Cloudflare product migration, 24 hostnames (including www.cloudflare.com) that were actively proxied through the Cloudflare global network were mistakenly redirected to the wrong origin. During this incident, traffic destined for these hostnames was passed through to the clickfunnels.com origin and may have resulted in a clickfunnels.com page being displayed instead of the intended website. This was our doing and clickfunnels.com was unaware of our error until traffic started to reach their origin.

API calls or other expected responses to and from these hostnames may not have responded properly, or may have failed completely. For example, if you were making an API call to api.example.com, and api.example.com was an impacted hostname, you likely would not have received the response you would have expected.

Here is what happened:

At 2022-01-24 22:24 UTC we started a migration of hundreds of thousands of custom hostnames to the Cloudflare for SaaS product. Cloudflare for SaaS allows SaaS providers to manage their customers’ websites and SSL certificates at scale – more information is available here. This migration was intended to be completely seamless, with the outcome being enhanced features and security for our customers. The migration process was designed to read the custom hostname configuration from a database and migrate it from SaaS v1 (the old system) to SaaS v2 (the current version) automatically.

To better understand what happened next, it’s important to explain a bit more about how custom hostnames are configured.

First, Cloudflare for SaaS customers can configure any hostname; but before we will proxy traffic to them, they must prove (via DNS validation) that they actually are allowed to handle that hostname’s traffic.

When the Cloudflare for SaaS customer first configures the hostname, it is marked as pending until DNS validation has occurred. Pending hostnames are very common for Cloudflare for SaaS customers as the hostname gets provisioned, and then the SaaS provider will typically work with their customer to put in place the appropriate DNS validation that proves ownership.

Once a hostname passes DNS validation, it moves from a pending state to an active state and can be proxied. Except in one case: there’s a special check for whether the hostname is marked as blocked within Cloudflare’s system. A blocked hostname is one that can’t be activated without manual approval by our Trust & Safety team. Some scenarios that could lead to a hostname being blocked include when the hostname is a Cloudflare-owned property, a well known brand, or a hostname in need of additional review for a variety of reasons.

During this incident, a very small number of blocked hostnames were erroneously moved to the active state while migrating clickfunnels.com’s customers. Once that occurred, traffic destined for those previously blocked hostnames was then processed by a configuration belonging to clickfunnels.com, sending traffic to the clickfunnels.com’s origin. One of those hostnames was www.cloudflare.com. Note that it was www.cloudflare.com and not cloudflare.com, so subdomains like dash.cloudflare.com, api.cloudflare.com, cdnjs.cloudflare.com, and so on were unaffected by this problem.

As the migration process continued down the list of hostnames, additional traffic was re-routed to the clickfunnels.com origin. At 23:06 UTC www.cloudflare.com was affected. At 23:15 UTC an incident was declared internally. Since the first alert we received was for www.cloudflare.com, we started our investigation there. In the next 19 minutes, the team restored www.cloudflare.com to its correct origin, determined the breadth of the impact and the root cause of the incident, and began remediation for the remaining affected hostnames.

By 2022-01-25 00:13 UTC, all custom hostnames had been restored to their proper configuration and the incident was closed. We have contacted all the customers who were affected by this error. We have worked with ClickFunnels to delete logs of this event to ensure that no data erroneously sent to the clickfunnels.com’s origin is retained by them and are very grateful for their speedy assistance.

Here is a graph (on a log scale) of requests to clickfunnels.com during the event. Out of a total of 268,430,157 requests redirected, 268,220,296 (99.92%) were for www.cloudflare.com:

Incorrect proxying of 24 hostnames on January 24, 2022

At Cloudflare, we take these types of incidents very seriously, dedicating massive amounts of resources in preventative action and in follow-up engineering. In this case, there are both procedural and technical follow-ups to prevent reoccurrence. Here are our next steps:

  • No more blocked hostname overrides. All blocked hostname changes will route through our verification pipeline as part of the migration process.
  • All migrations will require explicit validation and approval from SaaS customers for a blocked hostname to be considered for activation.
  • Additional monitoring will be added to the hostnames being migrated to spot potential erroneous traffic patterns and alert the migration team.
  • Additional monitoring added for www.cloudflare.com.
  • Stage hostname activations on non-production elements prior to promoting to production will enable the ability to verify the new hostname state is expected. This will allow us to catch issues before they hit production traffic.

Conclusion

This event exposed previously unknown gaps in our process and technology that directly impacted our customers. We are truly sorry for the disruption to our customers and any potential visitor to the impacted properties. Our commitment is to provide fully reliable and secure products, and we will continue to make every effort possible to deliver just that for our customers and partners.