Proof of Stake and our next experiments in web3

Post Syndicated from Wesley Evans original https://blog.cloudflare.com/next-gen-web3-network/

A little under four years ago we announced Cloudflare's first experiments in web3 with our gateway to the InterPlanetary File System (IPFS). Three years ago we announced our experimental Ethereum Gateway. At Cloudflare, we often take experimental bets on the future of the Internet to help new technologies gain maturity and stability, and to gain a deep understanding of them ourselves.

Four years after our initial experiments in web3, it's time to launch our next series of experiments to help advance the state of the art. These experiments are focused on finding new ways to help solve the scale and environmental challenges that face blockchain technologies today. Over the last two years there has been a rapid increase in the use of the underlying technologies that power web3. This growth has been a catalyst for a generation of startups, yet it has also had negative externalities.

At Cloudflare, we are committed to helping to build a better Internet. That commitment balances technology and impact. The environmental impact of certain older blockchain technologies has been challenging. The Proof of Work (PoW) consensus systems that secure many blockchains were instrumental in bootstrapping the web3 ecosystem. However, these systems do not scale well with the usage rates we see today. Proof of Work networks rely on complex cryptographic functions to create consensus and prevent attacks. Every node on the network is in a race to find a solution to a cryptographic problem. The function at the core of this puzzle is a hash function: it takes an input x and produces an output y. If you know x, it's easy to compute y; given only y, it's prohibitively costly to find a matching x. Miners have to find an x whose output y has certain properties, such as ten leading zeros. Even with advanced chips, this is costly and heavily based on chance.
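To make that race concrete, here is a minimal Python sketch of the kind of brute-force search a Proof of Work miner performs. The block contents and the difficulty target (a hash with a fixed number of leading zero hex digits) are illustrative assumptions, not any particular chain's real parameters.

```python
import hashlib
import itertools

def mine(block_data: bytes, difficulty: int = 4) -> tuple[int, str]:
    """Search for a nonce whose hash starts with `difficulty` zero hex digits.

    Hashing is cheap in one direction, but finding an input that produces a
    constrained output requires brute-force trial and error, which is the
    source of Proof of Work's energy cost.
    """
    target_prefix = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest

nonce, digest = mine(b"example block header")
print(f"nonce={nonce} hash={digest}")
```

Each additional leading zero hex digit multiplies the expected number of hashes by 16, which is why difficulty increases translate so directly into electricity consumption.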

While that process is incredibly secure because of the vast amount of hashing that occurs, Proof of Work networks are wasteful. This waste is driven by the fact that Proof of Work consensus mechanisms are electricity-intensive. In a PoW ecosystem, energy consumption scales directly with usage. So, for example, when web3 took off in 2020, the Bitcoin network started to consume the same amount of electricity as Switzerland (according to the Cambridge Bitcoin Electricity Consumption Index).

This inefficiency in resource use is by design, to make it difficult for any single actor to threaten the security of the blockchain, and for the majority of the last decade, while use was low, it was an acceptable trade-off. The other major trade-off made for the security of Proof of Work has been the number of transactions the network can handle per second. For instance, the Ethereum network can currently process about 30 transactions per second. This ultra-low transaction rate was acceptable when the technology was in development, but it does not scale to meet current demand.

As recently as two years ago, web3 was an up-and-coming ecosystem with minimal traffic compared to the majority of the traffic on the Internet. The last two years changed the web3 use landscape. Traffic to the Bitcoin and Ethereum networks has exploded. Technologies like Smart Contracts that power DeFi, NFTs, and DAOs have started to become commonplace in the developer community (and in the Twitter lexicon). The next generation of blockchains like Flow, Avalanche, and Solana, as well as Polygon, are being developed and adopted by major companies like Meta.

In order to balance our commitment to the environment and to help build a better Internet, it is clear to us that we should launch new experiments to help find a path forward: one that balances the clear need to drastically reduce the energy consumption of web3 technologies with the ability to scale web3 networks by orders of magnitude if the need arises to support increased user adoption over the next five years. How do we do that? We believe that the next generation of consensus protocols is likely to be based on Proof of Stake and Proof of Spacetime consensus mechanisms.

Today, we are excited to announce the start of our experiments for the next generation of web3 networks that are embracing Proof of Stake; beginning with Ethereum. Over the next few months, Cloudflare will launch, and fully stake, Ethereum validator nodes on the Cloudflare global network as the community approaches its transition from Proof of Work to Proof of Stake with “The Merge.” These nodes will serve as a testing ground for research on energy efficiency, consistency management, and network speed.

There is a lot in that paragraph, so let's break it down! The short version, though, is that Cloudflare is going to participate in the research and development of the core infrastructure that helps keep Ethereum secure, fast, and energy efficient for everyone.

Proof of Stake is a next-generation consensus protocol to secure blockchains. Unlike Proof of Work, which relies on miners racing each other with increasingly complex cryptography to mine a block, Proof of Stake secures new transactions to the network through self-interest. Validator nodes (run by people who verify new blocks for the chain) are required to put a significant asset up as collateral in a smart contract to prove that they will act in good faith; for Ethereum, that is 32 ETH. Validator nodes that follow the network's rules earn rewards; validators that violate the rules have portions of their stake taken away. Anyone can operate a validator node as long as they meet the stake requirement. This is key: Proof of Stake networks require lots and lots of validator nodes to validate and attest to new transactions. The more participants there are in the network, the harder it is for bad actors to launch a 51% attack to compromise the security of the blockchain.

To add new blocks to the Ethereum chain once it shifts to Proof of Stake, validators are chosen at random to create (validate) new blocks. Once a validator has a new block ready, it is submitted to a committee of 128 other validators who act as attestors. They verify that the transactions included in the block are legitimate and that the block is not malicious. If the block is found to be valid, this attestation is added to the blockchain. Critically, the validation and attestation process is not computationally complex, and this reduction in complexity leads to significant energy efficiency improvements.
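As a rough illustration of the flow described above, here is a hedged Python sketch of random proposer selection followed by a committee vote. The committee size of 128 comes from the paragraph above; the two-thirds approval threshold and the stub validity check are simplifying assumptions, and the real Ethereum attestation rules are considerably more involved.

```python
import random
from dataclasses import dataclass

COMMITTEE_SIZE = 128        # attestors per block, as described above
APPROVAL_THRESHOLD = 2 / 3  # illustrative supermajority, not the exact spec value

@dataclass
class Validator:
    validator_id: int
    stake: float = 32.0  # ETH locked as collateral

def looks_valid(block: dict) -> bool:
    # Placeholder for real transaction and signature verification,
    # which is cheap compared to Proof of Work hashing.
    return bool(block.get("transactions"))

def propose_and_attest(validators: list[Validator], block: dict) -> bool:
    """One simplified round: pick a proposer at random, then a committee attests."""
    proposer = random.choice(validators)
    block["proposer"] = proposer.validator_id
    committee = random.sample(validators, COMMITTEE_SIZE)
    votes = sum(1 for _ in committee if looks_valid(block))
    return votes / COMMITTEE_SIZE >= APPROVAL_THRESHOLD

validators = [Validator(i) for i in range(500)]
print("block accepted:", propose_and_attest(validators, {"transactions": ["tx1", "tx2"]}))
```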

The energy required to operate a Proof of Stake validator node is orders of magnitude less than that of a Proof of Work miner. Early estimates from the Ethereum Foundation suggest that the entire Ethereum network could use as little as 2.6 megawatts of power. Put another way, Ethereum will use roughly 99.5% less energy post-Merge than it does today.

We are going to support the development and deployment of Ethereum by running Proof of Stake validator nodes on our network. For the Ethereum ecosystem, running validator nodes on our network allows us to offer even more geographic decentralization in places like EMEA, LATAM, and APJC while also adding infrastructure decentralization to the network. This helps keep the network secure, outage resistant, and closer to the goal of global decentralization for everyone. For us, it's a commitment to helping research and experiment with scaling the next generation of blockchain networks that could be a key part of the story of the Internet. Running blockchain technology at scale is incredibly difficult, and Proof of Stake systems have their own unique requirements that will take time to understand and improve. Part of this experimentation is also a commitment: Cloudflare has not run, and will not run, its own Proof of Work infrastructure on our network.

Finally, what is “The Merge”? The Merge is the planned combination of the Ethereum Mainnet (the chain you interact with today when you make an Eth transaction) and the Ethereum Beacon Chain. The Beacon Chain is the Proof of Stake blockchain running in parallel with the Ethereum Mainnet. The Merge is much more significant than a consensus update.

Consensus updates have already happened multiple times on Ethereum, whether to fix a vulnerability or to introduce new features. The Merge is different in that it cannot be shipped as a normal upgrade. Previous forks were enabled at a specific block number: once enough blocks are mined, the new consensus rules apply automatically. With the Merge, there must be exactly one state root at which the fork happens. That merge state root marks the start of Ethereum Mainnet's Proof of Stake journey. It is chosen and set at a certain total difficulty, rather than a block height, to avoid merging on a high block with low difficulty.

Once The Merge occurs later in 2022, Ethereum will be a full Proof of Stake ecosystem that no longer relies on Proof of Work.

This is just the start of our commitment to help build the next generation of web3 networks. We are excited to work with our partners across the cryptography, web3, and infrastructure communities to help create the next generation of blockchain ecosystems that are environmentally sustainable, secure, and blazing fast.

Public access for our Ethereum and IPFS gateways now available

Post Syndicated from Wesley Evans original https://blog.cloudflare.com/ea-web3-gateways/

Today we are excited to announce that our Ethereum and IPFS gateways are publicly available to all Cloudflare customers for the first time. Since the announcement of our private beta last September, the interest in our Eth and IPFS gateways has been overwhelming. We are humbled by the demand for these tools, and we are excited to get them into as many developers' hands as possible. Starting today, any Cloudflare customer can log into the dashboard and configure a zone for Ethereum, IPFS, or both!

Over the last eight months of the private beta, we’ve been busy working to fully operationalize the gateways to ensure they meet the needs of our customers!

First, we have created a new API with end-to-end managed hostname deployment. This ensures that creating and managing gateways remains extremely quick and easy as you continue to scale. It is paramount to give time and focus back to developers so you can concentrate on your core product and services and leave the infrastructural components to us!

Second, we've added a brand new UI, bringing web3 to Cloudflare's zone-level dashboard. Regardless of the workflow you are used to, we now have parity between our UI and API, so we fit into your existing processes: no time is wasted internally figuring out how to integrate. Instead, it's a quick setup, and you can start serving content or connecting your services!

Third, we are pleased to say that you will soon have testnet support, so new development can be easily tested, hardened, and deployed to mainnet without incurring additional risk to your brand trust or product availability, and without the concern that something may fail silently and begin a cascade of problems throughout your network. We want to ensure that anyone leveraging our web3 gateways can be confident that the changes they push forward do not impact end user experience. At the end of the day, the Internet is for end users, and their experience and perception must always be kept within our purview.

Lastly, Cloudflare loves to build on top of Cloudflare. This helps us stay resilient and also shows our commitment to, and belief in, all the products we create! We have always used our SSL for SaaS and Workers products in the background. Building on our own services gives our customers the ability to define and control their own HTTP features on top of traffic destined for web3 gateways, including: rate limits, WAF rules, custom security filters, serving video, customer-defined Workers logic, custom redirects, and more!

Today, thousands of different individuals, companies, and DAOs are building new products leveraging our web3 gateways, the most reliable web3 infrastructure with the largest network.

Here are just a few snippets of how people are already using our web3 Gateways, and we can’t wait to see what you build on them:

  • DeFi DAOs use the Cloudflare IPFS gateway to serve their front end web applications globally without latency or cache penalties.
  • NFT designers use the Ethereum Gateway to effortlessly drop new offerings, and the IPFS gateway to store them in a fully decentralized system.
  • Large Dapp developers trust us to handle huge traffic spikes quickly and efficiently, without rate limits or overage caps. They combine all our offerings into a single pane of glass so that they don’t have to juggle multiple systems.

As part of this announcement, we will begin migrating our existing users away from the legacy gateway endpoints and onto our new API, which is easier to use, highly managed, and more robust. To ensure a smooth transition, you will first need to make sure you have signed up for a Cloudflare account if you do not already have one. We have also kept our free users in mind: they will continue to use the gateways at no cost with our free tier option. This includes no cap on the amount of traffic that can be pushed through our gateways, along with the most transparent and forecastable pricing models in the market today. We are very excited about the future and look forward to sharing the next iterations of web3 at Cloudflare!

Also, if you can’t wait to start building on our gateways, check out our product documentation for more guidance.

Serving Cloudflare Pages sites to the IPFS network

Post Syndicated from Thibault Meunier original https://blog.cloudflare.com/cloudflare-pages-on-ipfs/

Four years ago, Cloudflare went Interplanetary by offering a gateway to the IPFS network. This meant that if you hosted content on IPFS, we offered to make it available to every user of the Internet through HTTPS and with Cloudflare protection. IPFS allows you to choose a storage provider you are comfortable with, while providing a standard interface for Cloudflare to serve this data.

Since then, businesses have gained new tools to streamline web development. Cloudflare Workers, Pages, and R2 are enabling developers to bring services online in a matter of minutes, with built-in scaling, security, and analytics.

Today, we’re announcing we’re bridging the two. We will make it possible for our customers to serve their sites on the IPFS network.

In this post, we’ll learn how you will be able to build your website with Cloudflare Pages, and leverage the IPFS integration to make your content accessible and available across multiple providers.

A primer on IPFS

The InterPlanetary FileSystem (IPFS) is a peer-to-peer network for storing content on a distributed file system. It is composed of a set of computers called nodes that store and relay content using a common addressing system. In short, a set of participants agree to maintain a shared index of content the network can provide, and where to find it.

Let’s take two random participants in the network: Alice, a cat person, and Bob, a dog person.

As a participant in the network, Alice keeps connections with a subset of participants, referred to as her peers. When Alice makes her content 🐱 available on IPFS, it means she announces to her peers that she has content 🐱, and these peers record in their routing tables that 🐱 is provided by Alice's node.

Each node has a routing table, and a datastore. The routing table retains a mapping of content to peers, and the datastore retains the content a given node stores. In the above case, only Alice has content, a 🐱.

When Bob wants to retrieve 🐱, he tells his peers he is looking for 🐱. These peers point him to Alice. Alice then provides 🐱 to Bob. Bob can verify 🐱 is the content he was looking for, because the content identifier he requested is derived from the 🐱 content itself, using a secure, cryptographic hash function.
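The self-verifying property Bob relies on can be shown in a few lines of Python. Real IPFS CIDs wrap the digest in a multihash and a multibase encoding rather than a bare SHA-256 hex string, so this is only a sketch of the idea.

```python
import hashlib

def content_id(data: bytes) -> str:
    # Simplified content identifier: a hash derived from the content itself.
    return hashlib.sha256(data).hexdigest()

# Alice publishes her content and announces its identifier to her peers.
cat = "🐱 a picture of Alice's cat".encode()
cid = content_id(cat)

# Bob fetches bytes from *some* peer and can verify them without trusting it.
received = cat
assert content_id(received) == cid, "content does not match the requested identifier"
print("verified:", cid)
```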

Even better, if Bob becomes a cat person, he can announce to his peers that he is also providing a cat. Bob's love for cats could be genuine, or motivated by an interest in providing the content, such as a contract with Alice. IPFS provides a common ground to share content, without being opinionated about how this content has to be stored or what guarantees come with it.

How Pages websites are made available on IPFS

Content is made available as follows.

The components are:

  • Pages storage: Storage solution for Cloudflare Pages.
  • IPFS Index Proxy: Service maintaining a map between IPFS CID and location of the data. This is operated on Cloudflare Workers and using Workers KV to store the mapping.
  • IPFS node: Cloudflare-hosted IPFS node serving Pages content. It has an in-house datastore module, able to communicate with the IPFS Index Proxy.
  • IPFS network: The rest of the IPFS network.

When you opt in to serving your Cloudflare Pages site on IPFS, a call is made to the IPFS index proxy. This call first fetches your Pages content and transforms it into a CID, then both populates IndexDB to associate the CID with the content and reaches out to Cloudflare's IPFS nodes to tell them they are able to provide the CID.
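A minimal sketch of that indexing flow, assuming a simple key-value interface: the real service runs on Workers and Workers KV, and the function names, storage URL scheme, and simplified CID below are invented for illustration.

```python
import hashlib

# Stand-in for the CID -> location index (Workers KV in the real system).
index_db: dict[str, str] = {}

def announce_to_ipfs_nodes(cid: str) -> None:
    # Placeholder: in production this tells Cloudflare's IPFS nodes they can provide the CID.
    print(f"now providing {cid}")

def index_pages_asset(asset_bytes: bytes, storage_location: str) -> str:
    """Compute a (simplified) CID for a Pages asset and record where its bytes live."""
    cid = hashlib.sha256(asset_bytes).hexdigest()  # real CIDs use multihash encodings
    index_db[cid] = storage_location               # associate the CID with the stored content
    announce_to_ipfs_nodes(cid)
    return cid

cid = index_pages_asset(b"<html>hello</html>", "pages-storage://deployment-123/index.html")
```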

For example, imagine your website structure is as follows:

  • /
    • index.html
    • static/
      • cats.txt
      • beautiful_cats.txt

To provide this website on IPFS, Cloudflare has to compute a CID for /. CIDs are built recursively. To compute the CID for a given folder /, one needs the CIDs of index.html and static/. The CID of index.html is derived from its binary representation, and that of static/ from cats.txt and beautiful_cats.txt.

This works similarly to a Merkle tree, except nodes can reference each other as long as they still form a Directed Acyclic Graph (DAG). This structure is therefore referred to as a MerkleDAG.
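Here is a small Python sketch of that recursive construction, using plain SHA-256 over sorted child links in place of the real UnixFS/MerkleDAG encoding; the file names come from the example above.

```python
import hashlib

def file_cid(data: bytes) -> str:
    # Leaf node: the identifier is derived from the file's bytes.
    return hashlib.sha256(data).hexdigest()

def dir_cid(entries: dict[str, str]) -> str:
    # Directory node: the identifier is derived from its (name, child CID) links,
    # so identical files or subtrees naturally share the same child node.
    links = "".join(f"{name}:{cid};" for name, cid in sorted(entries.items()))
    return hashlib.sha256(links.encode()).hexdigest()

cats = file_cid(b"list of cats")
beautiful_cats = file_cid(b"list of beautiful cats")
static = dir_cid({"cats.txt": cats, "beautiful_cats.txt": beautiful_cats})
root = dir_cid({"index.html": file_cid(b"<html>...</html>"), "static/": static})
print("root CID:", root)
```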

In our example, it’s not unlikely for cats.txt and beautiful_cats.txt to have data in common. In this case, Cloudflare can be smart in the way it builds the MerkleDAG.

This reduces the storage and bandwidth requirement when accessing the website on IPFS.

If you want to learn more about how you can model a file system on IPFS, you can check the UnixFS specification.

Cloudflare stores every CID and the content it references in IndexDB. This allows Cloudflare IPFS nodes to serve Cloudflare Pages assets when requested.

Let’s put this together

Let’s take an example: pages-on-ipfs.com is hosted on IPFS. It is built using Hugo, a static site generator, and Cloudflare Pages with the public documentation template. Its source is available on GitHub. If you have an IPFS compatible client, you can access it at ipns://pages-on-ipfs.com as well.

1. Read Cloudflare Pages deployment documentation

For the purpose of this blog, I want to create a simple Cloudflare Pages website. I have experience with Hugo, so I choose it as the framework for the project.

I type "cloudflare pages" in the search bar of my web browser, then head to Read the docs > Framework Guide > Deploy a Hugo site.

2. Create a site

This is where I use Hugo, and your configuration might differ. In short, I use hugo new site pages-on-ipfs, create an index and two static resources, et voilà. The result is available on the source GitHub for this project.

3. Deploy using Cloudflare Pages

On the Cloudflare Dashboard, I go to Account Home > Pages > Create a project. I select the GitHub repository I created, and configure the build tool to build a Hugo website. Basically, I follow what's written in the Cloudflare Pages documentation.

Upon clicking Save and Deploy, my website is deployed on pages-on-ipfs.pages.dev. I also configure it to be available at pages-on-ipfs.com.

4. Serve my content on IPFS

First, I opt my zone into the Cloudflare Pages integration with IPFS. This feature is not available for everyone to try out yet.

This allows Cloudflare to index the content of my website. Once indexed, I get the CID for my deployment baf…1. I can check that my content is available at this hash on IPFS using an IPFS gateway https://baf…1.ipfs.cf-ipfs.com/.

5. Make my IPFS website available at pages-on-ipfs.com

Having one domain name that serves both the Cloudflare Pages and the IPFS version, depending on whether the client supports IPFS, is ideal. Fortunately, the IPFS ecosystem supports such a feature via DNSLink. DNSLink is a way to specify the IPFS content a domain is associated with.

For pages-on-ipfs.com, I create a TXT record on _dnslink.pages-on-ipfs.com with value dnslink=/ipfs/baf…1. Et voilà. I can now access ipns://pages-on-ipfs.com via an IPFS client.
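For readers who want to see what a client actually looks up, here is a hedged sketch using the third-party dnspython package to read the _dnslink record created above; the record name and value mirror the ones in this section.

```python
import dns.resolver  # third-party package: dnspython

def resolve_dnslink(domain: str) -> str:
    """Return the dnslink= value an IPFS client would follow for this domain."""
    answers = dns.resolver.resolve(f"_dnslink.{domain}", "TXT")
    for record in answers:
        text = b"".join(record.strings).decode()
        if text.startswith("dnslink="):
            return text.removeprefix("dnslink=")  # e.g. "/ipfs/baf..."
    raise ValueError(f"no dnslink record found for {domain}")

print(resolve_dnslink("pages-on-ipfs.com"))
```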

6. (Optional) Replicate my website

The content of my website can now easily be replicated and pinned by other IPFS nodes. This can either be done at home via an IPFS client or using a pinning service such as Pinata.

What’s next

We'll make this service available later this year as it is being refined. We are committed to making information move freely and helping build a better Internet. Cloudflare Pages' work of solving developer problems continues, as developers are now able to make their sites accessible to more users.

Over the years, IPFS has been used by more and more people. While Cloudflare started by making IPFS content available to web users through an HTTP interface, we now think it's time to give back. Allowing Cloudflare assets to be served over a public distributed network extends developers' and users' capabilities on an open web.

Common questions

  • I am already hosting my website on IPFS. Can I pin it using Cloudflare?
    • No. This project aims at serving Cloudflare hosted content via IPFS. We are still investigating how to best replicate and re-provide already-existing IPFS content via Cloudflare infrastructure.
  • Does this make IPFS more centralized?
    • No. Cloudflare does not have the authority to decide who can be a node operator nor what content they provide.
  • Are there guarantees the content is going to be available?
    • Yes. As long as you choose Cloudflare to host your website on IPFS, it will be available on IPFS. Should you move to another provider, it would be your responsibility to make sure the content remains available. IPFS allows for this transition to be smooth using a pinning service.
  • Is IPFS private?
    • It depends. Generally, no. IPFS is a p2p protocol. The nodes you peer with and request content from would know the content you are looking for. The privacy depends on the trust you have in your peers to not snoop on the data you request.
  • Can users verify the integrity of my website?
    • Yes. They need to access your website through an IPFS compatible client. Ideally, IPFS content integrity is turned into a web standard, similar to subresource integrity.

Gaining visibility in IPFS systems

Post Syndicated from Pop Chunhapanya original https://blog.cloudflare.com/ipfs-measurements/

We have been operating an IPFS gateway for the last four years. It started as a research experiment in 2018, providing end-to-end integrity with IPFS. A year later, we made IPFS content faster to fetch. Last year, we announced we were committed to making the IPFS gateway part of our product offering. Throughout this process, we needed to know how our setup performed in order to inform our design decisions.

To this end, we’ve developed the IPFS Gateway monitor, an observability tool that runs various IPFS scenarios on a given gateway endpoint. In this post, you’ll learn how we use this tool and go over discoveries we made along the way.

Refresher on IPFS

IPFS is a distributed system for storing and accessing files, websites, applications, and data. It’s different from a traditional centralized file system in that IPFS is completely distributed. Any participant can join and leave at any time without the loss of overall performance.

However, to access a file on IPFS, users cannot just use their web browsers: they need to run an IPFS node and fetch the file over IPFS's own protocol. IPFS gateways exist to enable users to do this with only a web browser.

Cloudflare provides an IPFS gateway at https://cloudflare-ipfs.com, so anyone can just access IPFS files by using the gateway URL in their browsers.

As IPFS and the Cloudflare IPFS Gateway have become more and more popular, we need a way to know how performant the gateway is: how much time it takes to retrieve IPFS-hosted content and how reliable it is. However, IPFS gateways are not like normal websites that only receive HTTP requests and return HTTP responses. The gateways need to run IPFS nodes internally, and sometimes perform content routing and peer routing to find the nodes that provide IPFS content. They sometimes also need to resolve IPNS names to discover content. So, in order to measure performance, we need to run measurements many times across many scenarios.

Enter the IPFS Gateway monitor

The IPFS Gateway monitor is that tool. It allows anyone to check the performance of their gateway and export the results to the Prometheus monitoring system.

This monitor is composed of three independent binaries:

  1. ipfs-gw-measure is the tool that calls the gateway URL and does the measurement scenarios.
  2. ipfs-gw-monitor is a way to call the measurement tool multiple times.
  3. Prometheus Exporter exposes prometheus-readable metrics.

To interact with the IPFS network, the codebase also provides a way to start an IPFS node.

A scenario is a set of instructions a user performs given the state of our IPFS system. For instance, we want to know how fast newly uploaded content can be found by the gateway, or if popular content has a low query time. We’ll discuss more of this in the next section.
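As an illustration of what a single scenario measurement might look like, here is a hedged Python sketch that times a fetch through the gateway, in the spirit of ipfs-gw-measure. It is not the actual tool, and the CID below is a placeholder rather than real content.

```python
import time
import urllib.request

GATEWAY = "https://cloudflare-ipfs.com"

def time_fetch(cid: str, timeout: float = 300.0) -> float:
    """Fetch /ipfs/<cid> through the gateway and return the elapsed time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(f"{GATEWAY}/ipfs/{cid}", timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start

# Repeat the measurement to derive min / max / average, as in the tables below.
samples = [time_fetch("bafybeig...placeholder...") for _ in range(10)]
print(f"min={min(samples):.3f}s max={max(samples):.3f}s avg={sum(samples)/len(samples):.3f}s")
```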

Putting this together, the system is the following.

For this experiment, we operated both the IPFS monitor and a test IPFS node. The IPFS node allows the monitor to provide content to the IPFS network, ensuring that the content provided is fresh and actually hosted. Peers were not fixed; the node leveraged IPFS's default peer discovery mechanism.

Cloudflare IPFS Gateway is treated as an opaque system. The monitor performs end-to-end tests by contacting the gateway via its public API, either https://cloudflare-ipfs.com or via a custom domain registered with the gateway.

The following scenarios were run on consumer hardware in March. They are not representative of all IPFS users. All values provided below were measured against Cloudflare's IPFS gateway endpoint.

First scenarios: Gateway to serve immutable IPFS contents

IPFS contents are the most primitive contents being served by the IPFS network and IPFS gateways. By their nature, they are immutable and addressable only by the hash of the content. Users can access the IPFS contents by putting the Content IDentifier (CID) in the URL path of the gateway. For example, ipfs://bafybeig7hluox6xefqdgmwcntvsguxcziw2oeogg2fbvygex2aj6qcfo64. We measure three common scenarios that users will often encounter.

The first one is when users fetch popular content which has a high probability of being found in our cache already. During our experiment, we measured a response time of around 116 ms for such content.

The second one is the case where users create and upload new content to the IPFS network, such as via the integration between Cloudflare Pages and IPFS. This scenario is a lot slower than the first because the content is not in our cache yet, and it takes some time to discover it. The content we upload during this measurement is a random 32-byte piece of content.

The last measurement is when users try to download content that does not exist. This one is the slowest. Because of the nature of IPFS content routing, there is no signal that tells us content doesn't exist, so setting a timeout is the only way to decide whether it does. Currently, the Cloudflare IPFS gateway has a timeout of around five minutes.

  Scenario                        Min (s)   Max (s)   Avg (s)
  1 ipfs/newly-created-content    0.276     343       44.4
  2 ipfs/in-cache-content         0.0825    0.539     0.116
  3 ipfs/unavailable-cid          90        341       279

Second scenarios: Gateway to serve mutable IPNS named contents

Since IPFS contents are immutable, the only way for users to change content is to create new content and distribute the new CID to other users. Sometimes distributing the new CID is hard, and that problem is out of scope for IPFS. The InterPlanetary Naming System (IPNS) tries to solve this. IPNS is a naming system that allows users to locate content using an IPNS name instead of a CID. This name is a hash of a user's public key. Internally, IPNS maintains a section of the IPFS DHT which maps names to CIDs. Therefore, when users want to download content by name through the gateway, the gateway first has to resolve the name to get the CID, then use the CID to query the IPFS content as usual.

The example for fetching the IPNS named content is at ipns://k51qzi5uqu5diu2krtwp4jbt9u824cid3a4gbdybhgoekmcz4zhd5ivntan5ey

We measured many scenarios for IPNS as shown in the table below. Three scenarios are similar to the ones involving IPFS contents. There are two more scenarios added: the first one is measuring the response time when users query the gateway using an existing IPNS name, and the second one is measuring the response time when users query the gateway immediately after new content is published under the name.

  Scenario                      Min (s)   Max (s)   Avg (s)
  1 ipns/newly-created-name     5.50      110       33.7
  2 ipns/existing-name          7.19      113       28.0
  3 ipns/republished-name       5.62      80.4      43.8
  4 ipns/in-cache-content       0.0353    0.0886    0.0503
  5 ipns/unavailable-name       60.0      146       81.0

Even though users can use IPNS to provide others a stable address to fetch the content, the address can still be hard to remember. This is what DNSLink is for. Users can address their content using DNSLink, which is just a normal domain name in the Domain Name System (DNS). As a domain owner, you only have to create a TXT record with the value dnslink=/ipfs/baf…1, and your domain can be fetched from a gateway.

There are two ways to access the DNSLink websites using the gateway. The first way is to access the website using the domain name as a URL hostname, for example, https://ipfs.example.com. The second way is to put the domain name as a URL path, for example, https://cloudflare-ipfs.com/ipns/ipfs.example.com.

  Scenario                                  Min (s)   Max (s)   Avg (s)
  1 dnslink/ipfs-domain-as-url-hostname     0.251     18.6      0.831
  2 dnslink/ipfs-domain-as-url-path         0.148     1.70      0.346
  3 dnslink/ipns-domain-as-url-hostname     7.87      44.2      21.0
  4 dnslink/ipns-domain-as-url-path         6.87      72.6      19.0

What does this mean for regular IPFS users?

These measurements highlight that IPFS content is fetched best when cached; this is orders of magnitude faster than fetching new content. This result is expected, as content published on IPFS can take time for nodes to retrieve, as highlighted in previous studies. When it comes to naming IPFS resources, leveraging DNSLink appears to be the faster strategy. This is likely due to the poor connection of the IPFS node used in this setup, preventing the IPNS name from propagating rapidly. Overall, IPNS names would benefit from a resolver to speed up resolution without losing the trust they provide.

As we mentioned in September, IPFS use has seen significant growth. So has our tooling. The IPFS Gateway monitor can be found on GitHub, and we will keep looking at improving this first set of metrics.

At the time of writing, using IPFS via a gateway seems to provide lower retrieval times, while allowing finer-grained control over security settings in the browser context. This configuration preserves the content validity properties offered by IPFS, but reduces the number of nodes a user is peering with to one: the gateway. Ideally, we would like users to peer with Cloudflare because we're offering the best service, while still having the possibility to retrieve content from external sources if they want to. We'll be conducting more measurements to better understand how to best leverage Cloudflare's presence in 270 cities to better serve the IPFS network.

The NSA Says that There are No Known Flaws in NIST’s Quantum-Resistant Algorithms

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/05/the-nsa-says-that-there-are-no-known-flaws-in-nists-quantum-resistant-algorithms.html

Rob Joyce, the director of cybersecurity at the NSA, said so in an interview:

The NSA already has classified quantum-resistant algorithms of its own that it developed over many years, said Joyce. But it didn’t enter any of its own in the contest. The agency’s mathematicians, however, worked with NIST to support the process, trying to crack the algorithms in order to test their merit.

“Those candidate algorithms that NIST is running the competitions on all appear strong, secure, and what we need for quantum resistance,” Joyce said. “We’ve worked against all of them to make sure they are solid.”

The purpose of the open, public international scrutiny of the separate NIST algorithms is “to build trust and confidence,” he said.

I believe him. This is what the NSA did with NIST’s candidate algorithms for AES and then for SHA-3. NIST’s Post-Quantum Cryptography Standardization Process looks good.

I still worry about the long-term security of the submissions, though. In 2018, in an essay titled “Cryptography After the Aliens Land,” I wrote:

…there is always the possibility that those algorithms will fall to aliens with better quantum techniques. I am less worried about symmetric cryptography, where Grover’s algorithm is basically an upper limit on quantum improvements, than I am about public-key algorithms based on number theory, which feel more fragile. It’s possible that quantum computers will someday break all of them, even those that today are quantum resistant.

It took us a couple of decades to fully understand von Neumann computer architecture. I’m sure it will take years of working with a functional quantum computer to fully understand the limits of that architecture. And some things that we think of as computationally hard today will turn out not to be.

Celebrate Scratch Week with us

Post Syndicated from Joanne Vincent original https://www.raspberrypi.org/blog/scratch-week/

Scratch Week is a global celebration of Scratch that takes place from 15 to 21 May this year. Below, we’ve put together some free resources to help get kids coding with this easy-to-use, block-based programming language. If you’re not sure what Scratch is, check out our introduction video for parents.

""

Visit Scratch Island on Code Club World

Code Club World is a great place to start coding for children who have never done any coding or programming before. The Code Club World online platform lets them begin their coding journey with fun activities, starting by creating their own personal avatar.

The islands on Code Club World.

Then on Scratch Island, kids can code a game to find a hidden bug, design a fun ‘silly eyes’ app, or animate a story. No experience necessary! We’ve just added a parents’ guide to explain how Code Club World works.

Explore Scratch projects 

For kids who feel ready to move beyond the basics of Scratch this Scratch Week, our Projects site offers a catalogue of projects that further enhance kids’ coding skills as they earn badges and explore, design, and invent.

A platform game your kids can code in Scratch with our project path.

With the More Scratch path, they will create six projects to make apps, games, and simulations using message broadcasting, if..then and if..then..else decisions, and variables. Then with the Further Scratch path, they can explore the advanced features of Scratch in another six projects to use boolean logic, functions, and clones while creating apps, games, computer-generated art, and simulations.

Discover young people’s Scratch creations

Be inspired by the amazing things young tech creators worldwide code in Scratch by visiting the Coolest Projects Global 2022 showcase. Young people are showing off Scratch games, stories, art, and more. In our Coolest Projects online gallery, these creations are displayed amongst hundreds of others from around the world — it’s the ideal place to get inspired.

A young coder shows off her tech project for Coolest Projects to two other young tech creators.

Learn something new with our Introduction to Scratch course 

Are you curious about coding too? If you would like to start learning so you can better help young people with their creative projects, our online course Introduction to Programming with Scratch is perfect for you. It’s available on-demand, so you can join at any time and receive four weeks’ free access (select the ‘limited access’ option when you register). This course is a fun, inspiring, and colourful starting point if you have never tried coding before. 

If you’re a parent looking for more coding activities to share with your kids, you can sign up to our parent-focused newsletter.

We hope you enjoy exploring these resources during Scratch Week. 

The post Celebrate Scratch Week with us appeared first on Raspberry Pi.

Better late than never

Post Syndicated from original https://bivol.bg/%D0%BF%D0%BE-%D0%B4%D0%BE%D0%B1%D1%80%D0%B5-%D0%BA%D1%8A%D1%81%D0%BD%D0%BE-%D0%BE%D1%82%D0%BA%D0%BE%D0%BB%D0%BA%D0%BE%D1%82%D0%BE-%D0%BD%D0%B8%D0%BA%D0%BE%D0%B3%D0%B0.html

Monday, 16 May 2022


Democratic Bulgaria came out with a declaration in the National Assembly. A declaration in which some, though not all, truths were told. I don't know how many. I do not claim to know the reasons for Bulgaria to…

Can we fix bearer tokens?

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/59704.html

Last month I wrote about how bearer tokens are just awful, and a week later Github announced that someone had managed to exfiltrate bearer tokens from Heroku that gave them access to, well, a lot of Github repositories. This has inevitably resulted in a whole bunch of discussion about a number of things, but people seem to be largely ignoring the fundamental issue that maybe we just shouldn’t have magical blobs that grant you access to basically everything even if you’ve copied them from a legitimate holder to Honest John’s Totally Legitimate API Consumer.

To make it clearer what the problem is here, let’s use an analogy. You have a safety deposit box. To gain access to it, you simply need to be able to open it with a key you were given. Anyone who turns up with the key can open the box and do whatever they want with the contents. Unfortunately, the key is extremely easy to copy – anyone who is able to get hold of your keyring for a moment is in a position to duplicate it, and then they have access to the box. Wouldn’t it be better if something could be done to ensure that whoever showed up with a working key was someone who was actually authorised to have that key?

To achieve that we need some way to verify the identity of the person holding the key. In the physical world we have a range of ways to achieve this, from simply checking whether someone has a piece of ID that associates them with the safety deposit box all the way up to invasive biometric measurements that supposedly verify that they’re definitely the same person. But computers don’t have passports or fingerprints, so we need another way to identify them.

When you open a browser and try to connect to your bank, the bank’s website provides a TLS certificate that lets your browser know that you’re talking to your bank instead of someone pretending to be your bank. The spec allows this to be a bi-directional transaction – you can also prove your identity to the remote website. This is referred to as “mutual TLS”, or mTLS, and a successful mTLS transaction ends up with both ends knowing who they’re talking to, as long as they have a reason to trust the certificate they were presented with.

That’s actually a pretty big constraint! We have a reasonable model for the server – it’s something that’s issued by a trusted third party and it’s tied to the DNS name for the server in question. Clients don’t tend to have stable DNS identity, and that makes the entire thing sort of awkward. But, thankfully, maybe we don’t need to? We don’t need the client to be able to prove its identity to arbitrary third party sites here – we just need the client to be able to prove it’s a legitimate holder of whichever bearer token it’s presenting to that site. And that’s a much easier problem.

Here’s the simple solution – clients generate a TLS cert. This can be self-signed, because all we want to do here is be able to verify whether the machine talking to us is the same one that had a token issued to it. The client contacts a service that’s going to give it a bearer token. The service requests mTLS auth without being picky about the certificate that’s presented. The service embeds a hash of that certificate in the token before handing it back to the client. Whenever the client presents that token to any other service, the service ensures that the mTLS cert the client presented matches the hash in the bearer token. Copy the token without copying the mTLS certificate and the token gets rejected. Hurrah hurrah hats for everyone.
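Here is a self-contained Python sketch of that binding, assuming nothing beyond the standard library. It hashes whatever certificate the client presented during mTLS, embeds the thumbprint in a signed token, and later rejects any presentation where the thumbprint does not match. Real deployments would put the thumbprint in RFC 8705's cnf / x5t#S256 claim inside a proper JWT rather than this toy token format.

```python
import base64
import hashlib
import hmac
import json

ISSUER_SECRET = b"issuer-signing-key"  # placeholder secret

def cert_thumbprint(cert_der: bytes) -> str:
    # SHA-256 over the certificate the client presented during the mTLS handshake.
    return base64.urlsafe_b64encode(hashlib.sha256(cert_der).digest()).decode().rstrip("=")

def issue_token(subject: str, client_cert_der: bytes) -> str:
    payload = json.dumps({"sub": subject, "cnf": {"x5t#S256": cert_thumbprint(client_cert_der)}})
    sig = hmac.new(ISSUER_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload.encode()).decode() + "." + sig

def verify_presentation(token: str, presented_cert_der: bytes) -> bool:
    body, sig = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(body)
    if not hmac.compare_digest(sig, hmac.new(ISSUER_SECRET, payload, hashlib.sha256).hexdigest()):
        return False  # token was tampered with
    claims = json.loads(payload)
    # The token is only valid when accompanied by the certificate it was bound to.
    return claims["cnf"]["x5t#S256"] == cert_thumbprint(presented_cert_der)

cert = b"fake-der-bytes-of-client-cert"
token = issue_token("ci-runner-42", cert)
assert verify_presentation(token, cert)
assert not verify_presentation(token, b"attacker-cert")
```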

Well except for the obvious problem that if you’re in a position to exfiltrate the bearer tokens you can probably just steal the client certificates and keys as well, and now you can pretend to be the original client and this is not adding much additional security. Fortunately pretty much everything we care about has the ability to store the private half of an asymmetric key in hardware (TPMs on Linux and Windows systems, the Secure Enclave on Macs and iPhones, either a piece of magical hardware or Trustzone on Android) in a way that avoids anyone being able to just steal the key.

How do we know that the key is actually in hardware? Here’s the fun bit – it doesn’t matter. If you’re issuing a bearer token to a system then you’re already asserting that the system is trusted. If the system is lying to you about whether or not the key it’s presenting is hardware-backed then you’ve already lost. If it lied and the system is later compromised then sure all your apes get stolen, but maybe don’t run systems that lie and avoid that situation as a result?

Anyway. This is covered in RFC 8705 so why aren’t we all doing this already? From the client side, the largest generic issue is that TPMs are astonishingly slow in comparison to doing a TLS handshake on the CPU. RSA signing operations on TPMs can take around half a second, which doesn’t sound too bad, except your browser is probably establishing multiple TLS connections to subdomains on the site it’s connecting to and performance is going to tank. Fixing this involves doing whatever’s necessary to convince the browser to pipe everything over a single TLS connection, and that’s just not really where the web is right at the moment. Using EC keys instead helps a lot (~0.1 seconds per signature on modern TPMs), but it’s still going to be a bottleneck.

The other problem, of course, is that ecosystem support for hardware-backed certificates is just awful. Windows lets you stick them into the standard platform certificate store, but the docs for this are hidden in a random PDF in a Github repo. Macs require you to do some weird bridging between the Secure Enclave API and the keychain API. Linux? Well, the standard answer is to do PKCS#11, and I have literally never met anybody who likes PKCS#11 and I have spent a bunch of time in standards meetings with the sort of people you might expect to like PKCS#11 and even they don’t like it. It turns out that loading a bunch of random C bullshit that has strong feelings about function pointers into your security critical process is not necessarily something that is going to improve your quality of life, so instead you should use something like this and just have enough C to bridge to a language that isn’t secretly plotting to kill your pets the moment you turn your back.

And, uh, obviously none of this matters at all unless people actually support it. Github has no support at all for validating the identity of whoever holds a bearer token. Most issuers of bearer tokens have no support for embedding holder identity into the token. This is not good! As of last week, all three of the big cloud providers support virtualised TPMs in their VMs – we should be running CI on systems that can do that, and tying any issued tokens to the VMs that are supposed to be making use of them.

So sure this isn’t trivial. But it’s also not impossible, and making this stuff work would improve the security of, well, everything. We literally have the technology to prevent attacks like Github suffered. What do we have to do to get people to actually start working on implementing that?

On Russian and Ukrainian artillery… and more

Post Syndicated from Григор original http://www.gatchev.info/blog/?p=2443

Someone told me something that got me thinking.

In the war in Ukraine, Russian artillery operates in the classic manner of the Second World War. A battery deploys, prepares to fire, and waits for orders from the battalion commander on what to shoot at. It has no say in where it deploys, whom it fires on, and so on: it does what it is ordered to do. Without orders it only defends itself if attacked (and even that may be forbidden).

Ukrainian artillery is scattered into small units, often even a single gun. They are given a (fairly large) area in which to deploy, and they choose exactly where, when, and where to move, and so on. There are almost no central orders. Instead, there is a smartphone application. Drone operators enter the coordinates of a spotted target into it, the gun crews enter their position data, and the application instantly gives them the laying data needed to hit that specific target with high accuracy. Whichever guns have the target in range fire on it, according to its importance. Decentralized fire control.

As a result, Ukrainian artillery often even manages to surpass the Russian in effectiveness and results, despite being several times smaller and, with the exception of the roughly 10% of its guns supplied by the West, older and weaker than the Russian. The destruction of the Russian force attempting to establish a bridgehead across the Siverskyi Donets demonstrated this most convincingly… At the same time, it is far less vulnerable than the Russian artillery: a single gun is much easier to move and hide, much harder to hit with standard artillery, and so on.

Are the Russians really so far behind that they cannot do the same? Do they not have capable programmers?

Oh, they do, and how! 98% of botnets are built and controlled by Russians. True, not by random cybercriminals, but by Unit 26165 of the GRU, the Russian army's hackers. But that makes them no less, rather more, available for military purposes. Cybercrime earns and pays well, easily hires quality talent, and it takes exceptional skill to be a leader in it. The GRU's hackers evidently are.

So why don't they do it, then?

Because such an application would violate principle number one of the Russian army: it exists to carry out orders. Personal initiative is encouraged only when it serves the execution of a specific order, and even then not always. In all other cases it is praised in words and punished in deeds.

And that is because of one fundamental fact. Beneath the "capitalist" paint, Russia remains a feudal state.

For example, profitable businesses there are legally the property of the respective oligarchs, but in reality they are fiefs, granted for loyal service. The moment the "seigneur" decides that someone else would serve him better, the holding passes into the "ownership" of that other person, for an undisclosed sum (as a rule, zero). If the current "owner" does not understand the system, say he is under the illusion that the business is truly his property, then whatever is necessary to prove his mistake happens to him (Khodorkovsky, anyone?). The situation is similar with all other kinds of "holdings": high posts, privileges, and so on.

The size and importance of a "holding" is determined by exactly one thing: the extent to which its "owner" is above the law. This does not always match the usual notions. For example, a typical army general commands at minimum a brigade, several thousand soldiers, and an enormous amount of combat equipment, while a lieutenant from the "services" often has not a single subordinate and his only weapon is a pistol. Yet if the lieutenant orders the general to guard his car, the general will comply on the spot, even though the lieutenant is officially far below him in the hierarchy. The "services", being in effect a political police, are simply a noble caste unconditionally above all others, because they are more trusted by the regime than anyone else. Accordingly, the idea that you could stand up to them over something, or sue them, would make a normal Russian laugh.

That is why unquestioning obedience in the army is of supreme importance for Russia. Because of its heavy weaponry, the army is an even more powerful institution than the "services", and at the same time it is far less trusted by the feudal elite. If a culture of self-initiative were allowed in it, the elite would face a threat more frightening than any external enemy. For every feudal elite, its most terrifying enemy is the common people it tramples, and the army is recruited mostly from them… That is why this kind of flexibility, based on self-initiative, will not be permitted in the Russian army even at the risk of losing the war.

In short, Russia is feudalism with a layer of capitalist paint on top so thin it is transparent. Ukraine, meanwhile, is a rather imperfect democracy, even one with feudal remnants, but still more of a democracy. That is why it can afford to give its soldiers freedom of initiative: it can count on their loyalty. Russia would not dare count on it.

Freedom of initiative and flexibility in carrying out tasks are enormous advantages in modern war. In theory, its most effective achievement, the integration of weapon types and branches of the armed forces, is a purely command task. In practice, however, much like an economy, the modern battlefield is far too complex and dynamic to be grasped adequately by a centralized command system. Delegating initiative and decentralizing action therefore bring a huge advantage. In 1982, Israel managed to rout Arab armies many times more numerous mainly because it allowed initiative and decentralized action. That the Israeli army was armed with Western weapons and the Arab armies with Russian ones was also a factor, but a less significant one.

That is why Ukraine, despite having several times smaller human and military resources than Russia, has excellent chances of winning this war. It is, in essence, a war of capitalism against feudalism. We have seen that war more than once, on the military front, the economic one, and the social one. We know how it ends.

It is also worth reflecting on something else. In 2014, Ukraine managed to push part of the Russian agents there out of power. Not many, but still. As a result, in just eight years quite a few Ukrainians reached the point of driving cars that look luxurious by Bulgarian standards. And the envious Bai Ganyo types spit on them for it, instead of asking themselves why they do not drive such cars, and what it would take for them to have that opportunity one day.

Actually, not so that they get it, but so that all of us in Bulgaria get it… They will not grasp this; if they were capable of that, they would not be envious Bai Ganyo types. But perhaps those of us who consider ourselves outside that group would do well to take action on the matter.

Upcoming Speaking Engagements

Post Syndicated from Schneier.com Webmaster original https://www.schneier.com/blog/archives/2022/05/upcoming-speaking-engagements-19.html

This is a current list of where and when I am scheduled to speak:

  • I’m speaking on “Securing a World of Physically Capable Computers” at OWASP Belgium’s chapter meeting in Antwerp, Belgium, on May 17, 2022.
  • I’m speaking at Future Summits in Antwerp, Belgium, on May 18, 2022.
  • I’m speaking at IT-S Now 2022 in Vienna, Austria, on June 2, 2022.
  • I’m speaking at the 14th International Conference on Cyber Conflict, CyCon 2022, in Tallinn, Estonia, on June 3, 2022.
  • I’m speaking at the RSA Conference 2022 in San Francisco, June 6-9, 2022.
  • I’m speaking at the Dublin Tech Summit in Dublin, Ireland, June 15-16, 2022.

The list is maintained on this page.

Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

Post Syndicated from Alex Krivit original https://blog.cloudflare.com/part1-coreless-purge/

There is a famous quote attributed to a Netscape engineer: “There are only two difficult problems in computer science: cache invalidation and naming things.” While naming things does oddly take up an inordinate amount of time, cache invalidation shouldn’t.

In the past we've written about Cloudflare's incredibly fast response times, whether content is cached on our global network or not. If content is cached, it can be served from Cloudflare cache servers, which are distributed across the globe and are generally a lot closer in physical proximity to the visitor. This saves the visitor's request from needing to go all the way back to an origin server for a response. But what happens when a webmaster updates something on their origin and would like these caches to be updated as well? This is where cache "purging" (also known as "invalidation") comes in.

Customers thinking about setting up a CDN and caching infrastructure consider questions like:

  • How do different caching invalidation/purge mechanisms compare?
  • How many times a day/hour/minute do I expect to purge content?
  • How quickly can the cache be purged when needed?

This blog will discuss why invalidating cached assets is hard, what Cloudflare has done to make it easy (because we care about your experience as a developer), and the engineering work we’re putting in this year to make the performance and scalability of our purge services the best in the industry.

What makes purging difficult also makes it useful

(i) Scale
The first thing that complicates cache invalidation is doing it at scale. With data centers in over 270 cities around the globe, our most popular users’ assets can be replicated at every corner of our network. This also means that a purge request needs to be distributed to all data centers where that content is cached. When a data center receives a purge request, it needs to locate the cached content to ensure that subsequent visitor requests for that content are not served stale/outdated data. Requests for the purged content should be forwarded to the origin for a fresh copy, which is then re-cached on its way back to the user.

This process repeats for every data center in Cloudflare’s fleet. And due to Cloudflare’s massive network, maintaining this consistency when certain data centers may be unreachable or go offline, is what makes purging at scale difficult.

Making sure that every data center gets the purge command and remains up-to-date with its content logs is only part of the problem. Getting the purge request to data centers quickly so that content is updated uniformly is the next reason why cache invalidation is hard.  

(ii) Speed
When purging an asset, race conditions abound. Requests for an asset can happen at any time, and may not follow a pattern of predictability. Content can also change unpredictably. Therefore, when content changes and a purge request is sent, it must be distributed across the globe quickly. If purging an individual asset, say an image, takes too long, some visitors will be served the new version, while others are served outdated content. This data inconsistency degrades user experience, and can lead to confusion as to which version is the “right” version. Websites can sometimes even break in their entirety due to this purge latency (e.g. by upgrading versions of a non-backwards compatible JavaScript library).

Purging at speed is also difficult when combined with Cloudflare’s massive global footprint. For example, if a purge request is traveling at the speed of light between Tokyo and Cape Town (both cities where Cloudflare has data centers), just the transit alone (no authorization of the purge request or execution) would take over 180ms on average based on submarine cable placement. Purging a smaller network footprint may reduce these speed concerns while making purge times appear faster, but does so at the expense of worse performance for customers who want to make sure that their cached content is fast for everyone.
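To get a feel for where a figure like 180ms comes from, here is a back-of-envelope sketch. The fiber speed is the usual approximation of two-thirds of the speed of light; the 36,000 km cable-path figure is an assumption for illustration, not a measurement of any actual route.

```typescript
// Back-of-envelope propagation delay for a purge message travelling in fiber.
// Assumptions (not actual route data): light in fiber travels at roughly
// two-thirds of c (~200,000 km/s), and the real submarine cable path is much
// longer than the ~14,700 km great-circle distance between the two cities.

const FIBER_KM_PER_MS = 200; // ~200,000 km/s expressed as km per millisecond

function propagationDelayMs(cablePathKm: number): number {
  return cablePathKm / FIBER_KM_PER_MS;
}

console.log(propagationDelayMs(14_700)); // ~74 ms on an impossible straight-line path
console.log(propagationDelayMs(36_000)); // ~180 ms on an assumed, more realistic cable route
```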

(iii) Scope
The final thing that makes purge difficult is making sure that only the unneeded web assets are invalidated. Maintaining a cache is important for egress cost savings and response speed. If webmasters needlessly purge all content, their origins could be knocked over by a thundering herd of requests. It’s a delicate balance of purging just enough: too much can result in both monetary and downtime costs, and too little will result in visitors receiving outdated content.

At Cloudflare, what to invalidate in a data center is often dictated by the type of purge. Purge everything, as you could probably guess, purges all cached content associated with a website. Purge by prefix purges content based on a URL prefix. Purge by hostname can invalidate content based on a hostname. Purge by URL or single file purge focuses on purging specified URLs. Finally, Purge by tag purges assets that are marked with Cache-Tag headers. These markers offer webmasters flexibility in grouping assets together. When a purge request for a tag comes into a data center, all assets marked with that tag will be invalidated.
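As a rough illustration of how those purge types map onto API calls, here is a sketch against the zone purge_cache endpoint of Cloudflare’s public API. The zone ID and token are placeholders, the exact request bodies should be checked against the current API documentation, and some of these purge types are limited to certain plans.

```typescript
// Sketch: the same endpoint handles every purge type; only the request body changes.
// ZONE_ID and API_TOKEN are placeholders.
const ZONE_ID = "your-zone-id";
const API_TOKEN = "your-api-token";

async function purge(body: Record<string, unknown>): Promise<void> {
  const resp = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  );
  console.log(resp.status, await resp.json());
}

// Purge everything
await purge({ purge_everything: true });

// Purge by URL / single file purge
await purge({ files: ["https://example.com/images/logo.png"] });

// Purge by prefix
await purge({ prefixes: ["example.com/static/"] });

// Purge by hostname
await purge({ hosts: ["images.example.com"] });

// Purge by tag: assets must have been served with a matching Cache-Tag response header
await purge({ tags: ["product-images"] });
```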

With that overview in mind, the remainder of this blog will focus on putting each element of invalidation together to benchmark the performance of Cloudflare’s purge pipeline and provide context for what that performance means in the real world. We’ll review how fast Cloudflare can invalidate cached content across the world. This provides a baseline for how quick our purge systems are today, which we will use to show how much we improve by the time we launch our new purge system later this year.

How does purge work currently?

In general, purge takes the following route through Cloudflare’s data centers.

  • A purge request is initiated via the API or UI. This request specifies how our data centers should identify the assets to be purged. This can be accomplished via cache-tag header(s), URL(s), entire hostnames, and much more.
  • The request is received by any Cloudflare data center and is identified as a purge request. It is then routed to a Cloudflare core data center (a small set of data centers responsible for network management activities).
  • When a core data center receives it, the request is processed by a number of internal services that (for example) make sure the request is being sent from an account with the appropriate authorization to purge the asset. Following this, the request gets fanned out globally to all Cloudflare data centers using our distribution service.
  • When received by a data center, the purge request is processed and all assets with the matching identification criteria are either located and removed, or marked as stale. These stale assets are not served in response to requests and are instead re-pulled from the origin.
  • After being pulled from the origin, the response is written to cache again, replacing the purged version.

Now let’s look at this process in practice. Below we describe Cloudflare’s purge benchmarking that uses real-world performance data from our purge pipeline.

Benchmarking purge performance design

In order to understand how performant Cloudflare’s purge system is, we measured the time from when the purge request was sent to the moment the purge completed and the asset was no longer served from cache.

In general, the process of measuring purge speeds involves: (i) ensuring that a particular piece of content is cached, (ii) sending the command to invalidate the cache, (iii) simultaneously checking our internal system logs for how the purge request is routed through our infrastructure, and (iv) measuring when the asset is removed from cache (first miss).
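Here is a rough sketch of what that loop looks like when approximated from the outside by polling the CF-Cache-Status response header. Our real benchmark reads internal logs rather than polling, so this is only an approximation; the asset URL, zone ID, and token are placeholders.

```typescript
// Sketch: approximate purge-to-first-miss time from the outside by polling
// CF-Cache-Status. The benchmark described in this post uses internal logs instead.
// ASSET_URL, ZONE_ID, and API_TOKEN are placeholders.
const ASSET_URL = "https://example.com/images/logo.png";
const ZONE_ID = "your-zone-id";
const API_TOKEN = "your-api-token";

async function cacheStatus(url: string): Promise<string | null> {
  const resp = await fetch(url);
  return resp.headers.get("CF-Cache-Status"); // e.g. HIT, MISS, EXPIRED
}

async function sendPurge(url: string): Promise<void> {
  // Single-file purge via the purge_cache endpoint shown earlier.
  await fetch(`https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`, {
    method: "POST",
    headers: { Authorization: `Bearer ${API_TOKEN}`, "Content-Type": "application/json" },
    body: JSON.stringify({ files: [url] }),
  });
}

async function timePurge(): Promise<number> {
  // (i) make sure the asset is cached
  while ((await cacheStatus(ASSET_URL)) !== "HIT") { /* warm the cache */ }

  // (ii) send the purge command and start the clock
  const start = Date.now();
  await sendPurge(ASSET_URL);

  // (iii)/(iv) poll until the asset is no longer served from cache (first miss)
  while ((await cacheStatus(ASSET_URL)) === "HIT") { /* keep polling */ }
  return Date.now() - start;
}

console.log(`purge-to-first-miss: ${await timePurge()} ms`);
```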

This process measures how quickly cache is invalidated from the perspective of an average user.

  • Clock starts
    As noted above, in this experiment we’re using sampled RUM data from our purge systems. The goal of this experiment is to benchmark current data for how long it can take to purge an asset on Cloudflare across different regions. Once the asset was cached in a region on Cloudflare, we identified when a purge request was received for that asset. At that instant, the clock started for this experiment. We include in this time any retries we needed to make (due to data centers missing the initial purge request) to ensure that the purge was done consistently across our network. The clock continues as the request transits our purge pipeline (data center > core > fanout > purge from all data centers).
  • Clock stops
    The clock stopped when the purged asset was removed from cache, meaning that the data center was no longer serving the asset from cache in response to visitors’ requests. Our internal logging measures the precise moment that the cached content has been removed or expired, and from that data we were able to determine the following benchmarks for our purge types in various regions.

Results

We’ve divided our benchmarks in two ways: by purge type and by region.

We singled out Purge by URL because it identifies a single target asset to be purged. While that asset can be stored in multiple locations, the amount of data to be purged is strictly defined.

We’ve combined all other types of purge (everything, tag, prefix, hostname) together because the amount of data to be removed is highly variable. Purging a whole website or by assets identified with cache tags could mean we need to find and remove a multitude of content from many different data centers in our network.

Secondly, we have segmented our benchmark measurements by region, and specifically we confined the benchmarks to particular cache servers in each region because we were concerned about clock skew between different data centers. By limiting the test to the same cache servers, even if there was skew, they’d all be skewed in the same way.

We took the latency from representative data centers in each of the following regions, as well as the global latency. Data centers were not evenly distributed across regions, but in total they represent about 90 different cities around the world:

  • Africa
  • Asia Pacific Region (APAC)
  • Eastern Europe (EEUR)
  • Eastern North America (ENAM)
  • Oceania
  • South America (SA)
  • Western Europe (WEUR)
  • Western North America (WNAM)

The global latency numbers represent purge data from all Cloudflare data centers in over 270 cities globally. In the results below, global latency numbers may be larger than the regional numbers because they represent all of our data centers instead of only a regional portion, so outliers and retries can have an outsized effect.

Below are the results for how quickly our current purge pipeline was able to invalidate content, by purge type and region. All times are in seconds and divided into P50, P75, and P99 quantiles. “P50” means that 50% of the purges completed at the indicated latency or faster.

Purge By URL

          P50     P75     P99
AFRICA    0.95s   1.94s   6.42s
APAC      0.91s   1.87s   6.34s
EEUR      0.84s   1.66s   6.30s
ENAM      0.85s   1.71s   6.27s
OCEANIA   0.95s   1.96s   6.40s
SA        0.91s   1.86s   6.33s
WEUR      0.84s   1.68s   6.30s
WNAM      0.87s   1.74s   6.25s
GLOBAL    1.31s   1.80s   6.35s

Purge Everything, by Tag, by Prefix, by Hostname

          P50       P75     P99
AFRICA    1.42s     1.93s   4.24s
APAC      1.30s     2.00s   5.11s
EEUR      1.24s     1.77s   4.07s
ENAM      1.08s     1.62s   3.92s
OCEANIA   1.16s     1.70s   4.01s
SA        1.25s     1.79s   4.106s
WEUR      1.19s     1.73s   4.04s
WNAM      0.9995s   1.53s   3.83s
GLOBAL    1.57s     2.32s   5.97s

A general note about these benchmarks — the data represented here was taken from over 48 hours (two days) of RUM purge latency data in May 2022. If you are interested in how quickly your content can be invalidated on Cloudflare, we suggest you test our platform with your website.

Those numbers are good, and much faster than most of our competitors. Even in the worst case, the time from when you tell us to purge an item to when it is removed globally is less than seven seconds. In most cases, it’s less than a second. That’s great for most applications, but we want to be even faster. Our goal is to get cache purge as close as theoretically possible to the speed-of-light limit for a network our size, which is 200ms.

Intriguingly, LEO satellite networks may be able to provide even lower global latency than fiber optics because of the straightness of the paths between satellites that use laser links. We’ve done calculations of latency between LEO satellites that suggest that there are situations in which going to space will be the fastest path between two points on Earth. We’ll let you know if we end up using laser-space-purge.
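The intuition behind that comes from two factors: light in vacuum is roughly 50% faster than light in fiber, and inter-satellite laser paths can be much straighter than cable routes. The sketch below illustrates the comparison; the distances used are assumptions for illustration, not real route measurements.

```typescript
// Why space links can win: vacuum is ~50% faster than fiber, and laser paths
// between satellites can be much straighter than submarine cable routes.
// All distances below are assumptions for illustration, not real routes.
const VACUUM_KM_PER_MS = 300; // ~300,000 km/s
const FIBER_KM_PER_MS = 200;  // ~200,000 km/s

const cableRouteKm = 36_000;     // assumed submarine cable path (from the earlier sketch)
const satelliteRouteKm = 16_500; // assumed up/down legs plus inter-satellite hops

console.log(cableRouteKm / FIBER_KM_PER_MS);      // ~180 ms over fiber
console.log(satelliteRouteKm / VACUUM_KM_PER_MS); // ~55 ms via laser links
```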

Just as we have with network performance, we are going to relentlessly measure our cache performance as well as the cache performance of our competitors. We won’t be satisfied until we verifiably are the fastest everywhere. To do that, we’ve built a new cache purge architecture which we’re confident will make us the fastest cache purge in the industry.

Our new architecture

Through the end of 2022, we will continue this blog series, incrementally showing how we will become the fastest, most scalable purge system in the industry. We will continue to update you on how our purge system is developing and benchmark our data along the way.

Getting there will involve rearchitecting and optimizing our purge service, which hasn’t received a systematic redesign in over a decade. We’re excited to do our development in the open, and bring you along on our journey.

So what do we plan on updating?

Introducing Coreless Purge

The first version of our cache purge system was designed on top of a set of central core services including authorization, authentication, request distribution, and filtering, among other features that made it a high-reliability service. These core components have ultimately become a bottleneck in terms of scale and performance as our network continues to expand globally. While most of our purge dependencies have been containerized, the message queue used was still running on bare metal, which led to increased operational overhead when our system needed to scale.

Last summer, we built a proof of concept for a completely decentralized cache invalidation system using in-house tech – Cloudflare Workers and Durable Objects. Using Durable Objects as a queuing mechanism gives us the flexibility to scale horizontally by adding more Durable Objects as needed and can reduce time to purge with quick regional fanouts of purge requests.
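As a deliberately simplified sketch of what that kind of Worker-plus-Durable-Object queue could look like: the class name, binding, and broadcast function below are illustrative only, not our actual implementation.

```typescript
// Deliberately simplified sketch, not Cloudflare's actual implementation.
// A Worker performs the checks that used to live in the core, then hands the
// purge request to a Durable Object that queues it and fans it out.
export interface Env {
  PURGE_QUEUE: DurableObjectNamespace; // illustrative binding name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Authorization, filtering, etc. would happen here before queuing.
    const id = env.PURGE_QUEUE.idFromName("regional-queue");
    return env.PURGE_QUEUE.get(id).fetch(request);
  },
};

export class PurgeQueue {
  private pending: string[] = [];

  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const { assets } = (await request.json()) as { assets: string[] };
    this.pending.push(...assets);
    // A real implementation would persist the queue via this.state.storage
    // so nothing is lost if the object restarts.

    // Fan the batch out to every data center; broadcastToDataCenters is a
    // placeholder for whatever distribution mechanism the real system uses.
    await broadcastToDataCenters(this.pending);
    this.pending = [];
    return new Response("queued", { status: 202 });
  }
}

// Placeholder: in reality this would be a purpose-built broadcast service.
async function broadcastToDataCenters(assets: string[]): Promise<void> {
  void assets;
}
```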

In the new purge system, we’re ripping out the reliance on core data centers and moving all of that functionality to every data center. We’re calling it Coreless Purge.

Here’s a general overview of how coreless purge will work:

  • A purge request will be initiated via the API or UI. This request will specify how we should identify the assets to be purged.
  • The request will be routed to the nearest Cloudflare data center where it is identified to be a purge request and be passed to a Worker that will perform several of the key functions that currently occur in the core (like authorization, filtering, etc).
  • From there, the Worker will pass the purge request to a Durable Object in the data center. The Durable Object will queue all the requests and broadcast them to every data center when they are ready to be processed.
  • When the Durable Object broadcasts the purge request to every data center, another Worker will pass the request to the service in the data center that will invalidate the content in cache (executes the purge).

We believe this re-architecture of our system, built by stringing together multiple services from the Workers platform, will improve both the speed and the scalability of the purge requests we are able to handle.

Conclusion

We’re going to spend a lot of time building and optimizing purge because, if there’s one thing we learned here today, it’s that cache invalidation is a difficult problem, but those are exactly the types of problems that get us out of bed in the morning.

If you want to help us optimize our purge pipeline, we’re hiring.
