Network performance update: Platform Week

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-platform-week/

In September 2021, we shared extensive benchmarking results of 1,000 networks all around the world. The results showed that on a range of tests (TCP connection time, time to first byte, time to last byte), and on different measures (p95, mean), Cloudflare was the fastest provider in 49% of networks around the world. Since then, we’ve worked to continuously improve performance, with the ultimate goal of being the fastest everywhere and an intermediate goal to grow the number of networks where we’re the fastest by at least 10% every Innovation Week. We met that goal during Security Week (March 2022), and we’re carrying the work over to Platform Week (May 2022).

We’re excited to update you on the latest results, but before we do: after running with this benchmark for nine months, we’ve also been looking for ways to improve the benchmark itself — to make it even more representative of speeds in the real world. To that end, we’re expanding our measured networks from 1,000 to 3,000, to give an even more accurate sense of real world performance across the globe.

In terms of results: using the old benchmark of 1,000 networks, we’re the fastest in 69% of them. In this new expanded view of 3,000 networks, we’re the fastest in 42% of them. We’ve demonstrated a consistent ability to improve our performance against what we measure, and we’re excited to optimize our performance and lift our ranking in these smaller networks all around the world.

In addition to sharing a general update on where our network performance stands, we’re also sharing updated performance metrics on our Workers platform (given that it’s Platform Week!). We’ve done an extensive benchmark of Cloudflare Workers vs Fastly’s Compute@Edge.

We’ve got the results below, but before we get to that, we want to spend a bit of time on our revamped measurements.

Revamped measurements

A few months ago, we discussed the performance of Cloudflare Workers, as compared to other similar offerings out there. We compared our performance to Fastly’s Compute@Edge, showing that Workers was significantly faster.

After we published our results, there were questions and suggestions on how to improve our testing methodology (including sharing more detail about where and how we ran the tests). As we re-ran tests for this iteration, we made some small changes, and also worked to address the suggestions from the community. Let’s talk about what’s changed and why.

Measuring what matters

To quantify global network performance, we have to get enough data from around the world, across all manner of different networks, comparing ourselves with other providers. We used Real User Measurements (RUM) to fetch a 100kB file from different providers. Users around the world report the performance of different providers. The more users who report the data, the higher fidelity the signal is. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.

In the process of quantifying network performance, it became clear where we needed to expand our scope in measuring performance. After Security Week, we were fastest in 71% of networks, and we decided that we wanted to expand the pool of networks from the top 1,000 most reported networks to the top 3,000 most reported networks.

Until now, we’ve shown the graph as the absolute number of networks where Cloudflare is #1 in network performance. From here on out, we will show this graph as a percentage of the networks measured, since the denominator will keep changing as we set ourselves harder and harder challenges.

Benchmarking for everyone

At the time we published our earlier set of benchmarks, we had an unnecessary “no benchmarking” clause in our Terms of Service. It has since been removed. It’s been a while since we’ve worried about such things, and the clause lived past its intended life in our ToS.

We’ve done the work to show where we are the fastest provider, and it’s important that everyone else be able to validate that independently. We’re also confident that well run benchmarks will only help further improve performance for Workers, other Cloudflare products, and the Internet as a whole. Game on.

Measuring Workers performance from real users

To run our tests, we used Catchpoint, an industry standard “synthetic” testing tool, and measurements collected from real users distributed around the world. Catchpoint is a monitoring platform that has around 2,000 total endpoints distributed around the world that can be configured to fetch specific resources and time each test. Catchpoint is useful for network providers like us as it provides a consistent, repeatable way to measure end-to-end performance of a workload, and delivers a best-effort approximation for what a user sees.

Catchpoint has a series of backbone nodes that are embedded in ISPs around the world. That means that these nodes are plugged into ISP routers just like you are, and the traffic goes through the ISP network to each endpoint they are monitoring. These can approximate a real user, but they will never truly approximate a real user. For example, the bandwidth for these nodes is 100% dedicated for platform monitoring, as opposed to your home Internet connection, where your Internet experience will be a mixed bag of different use cases, some of which won’t talk to Workers applications at all.

For this new test, we chose 300 backbone nodes that are embedded in last mile ISPs around the world. We filtered out nodes in cloud providers, or in metro areas with multiple transit options, trying to remove duplicate paths as much as possible.

We cross-checked these tests with our own data set, which is collected from users connecting to free websites when they are served 1xxx error pages, similar to how we collect data for global network performance. When a user sees this error page, it contains a call that executes these tests and uploads the performance metrics to Cloudflare. These calls run independently of the error page itself, ensuring that Cloudflare did not get a head start in these tests.

We also changed our test methodology to use paid accounts for both Fastly and Cloudflare.

Changing the numbers we look at

For this new test, we decided to look at Wait time in addition to Time to First Byte (TTFB). Time to First Byte is the time it takes for a web server to establish a connection, fetch the content, and deliver the first byte of a response to a user. Wait time is the time the client spends waiting for the server to send back that first byte after the connection is established; we use it to look at the actual time the server spends computing the request on the machine. Wait time is a subcomponent of TTFB: it contains the machine processing time, the time it takes the server to hand the response to the socket to be sent back to the user, and the time it takes the response to reach the user. There could be latency in passing the response to the socket on the machine, but we measured that hand-off time for all tests and found it to be zero.

So we would calculate the wait time as the amount of time spent on the box plus the time it took the server to send the packet over the wire to the client.

However, others have noted that using Time to First Byte to measure performance of serverless computing solutions could potentially be misleading because user performance measurements can be influenced by more than just the time spent computing functions on the machine. For example, things like the time to connect to the server, DNS resolution, and cache times can impact Time to First Byte. Time to connect to the server (connect) can be impacted by how well peered a network is (or how poorly). DNS resolution can be impacted by client behavior, local DNS behavior, or by the performance of the provider’s DNS. Cache times can be driven by the performance of the cache on the server itself.

We agree that it is difficult to tease out computing time from Time to First Byte. To look at that particular aspect of serverless computing, we look at Wait, which is a value that contains the least amount of variables: we aren’t caching anything during our tests, DNS isn’t part of the wait time, and the only thing impacting Wait aside from the time spent on the machine is the Connect time.

However, since we’re trying to measure the end user experience and not just the amount of time spent on a server, we want to use both the Wait and TTFB so that we can show the time spent both on the server and how that impacts end-to-end performance.
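
If you want to reproduce this decomposition yourself, the browser’s standard Resource Timing API exposes the same phases. A minimal sketch, assuming the resource has already been fetched (the URL is a placeholder; cross-origin resources need a Timing-Allow-Origin header to expose detailed timings):

// Decompose a request into the phases discussed above using Resource Timing.
const [entry] = performance.getEntriesByName('https://example.com/100kB.bin');

const dns     = entry.domainLookupEnd - entry.domainLookupStart; // DNS resolution
const connect = entry.connectEnd - entry.connectStart;           // TCP + TLS setup
const wait    = entry.responseStart - entry.requestStart;        // "Wait": request sent to first byte
const ttfb    = entry.responseStart - entry.startTime;           // end-to-end Time to First Byte

console.log({ dns, connect, wait, ttfb });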

What’s in the test?

For our test this time around, we decided to measure three things: a simple JavaScript function like last time, a complex JavaScript function, and a complex Rust function. Here are the functions for each of them:

// Simple JavaScript function (the "JS no-op" test): return a small plain-text response.
async function getErrorResponse(event, message, status) {
  return new Response(message, {status: status, headers: {'Content-Type': 'text/plain'}});
}

// Complex JavaScript function (the "JS hard" test): a busy loop of 15,000
// sin/abs/floor operations.
function testHardBusyLoop() {
  let value = 0;
  let offset = Date.now();

  for (let n = 0; n < 15000; n++) {
    value += Math.floor(Math.abs(Math.sin(offset + n)) * 10);
  }

  return value;
}

// Complex Rust function (the "Rust hard" test): the same busy loop. Date is
// assumed to come from the `worker` crate used for Rust Workers.
fn test_hard_busy_loop() -> i32 {
  let mut value = 0;
  let offset = Date::now().as_millis();

  for n in 0..15000 {
    value += (((offset + n) as f64).sin().abs() * 10.0) as i32;
  }

  value
}

The goal of each of these tests is simple: test the ability of Workers and Compute@Edge to perform compute actions.
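
The post doesn’t show how these functions are wired into a deployed Worker. A minimal Service Worker-style sketch might look like this (the route and response shape are assumptions for illustration):

// Hypothetical wiring for the JavaScript tests above; not taken from the post.
addEventListener('fetch', (event) => {
  const value = testHardBusyLoop(); // on-box compute, measured as "Wait"
  event.respondWith(new Response(String(value), {
    status: 200,
    headers: { 'Content-Type': 'text/plain' },
  }));
});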

Latest update on network performance

We have two updates on data: a general network performance update, and then data on how Workers compares with Compute@Edge.

At Security Week (March 2022), we shared that we were faster in more of the most reported networks than our competitors. Out of the top 1,000 networks in the world (by number of IPv4 addresses advertised), here’s a breakdown of the number of networks in which each provider is number one in p95 TCP Connection Time, which represents the time it takes for a user to connect to the provider. This data is from Security Week (March 2022):

[Chart: number of top 1,000 networks where each provider is #1 in p95 TCP connection time, Security Week (March 2022)]

Recognizing that we are now looking at different numbers of networks, here is what the distribution looks like for the top 3,000 networks for Platform Week (May 2022):

[Chart: number of top 3,000 networks where each provider is #1 in p95 TCP connection time, Platform Week (May 2022)]

In addition to being the fastest across popular networks, Cloudflare is also committed to being the fastest provider in every country.

Using data on the top 1,000 networks from Security Week (March 2022), here’s what the world map looks like:

[Map: fastest network provider by country, top 1,000 networks]

And here’s what the world looks like while looking at the top 3,000 networks:

[Map: fastest network provider by country, top 3,000 networks]

Cloudflare became #1 in more countries in Africa and Europe, some of which previously did not have enough samples to be attributed to any one provider.

Workers vs Compute@Edge

Moving on from general network results, what does performance look like when comparing serverless compute products across providers — in this case, Cloudflare Workers performance to Compute@Edge? Looking at the Catchpoint data, the first thing we noticed was that Cloudflare is faster than Fastly in all tests at the 95th percentile for Time to First Byte:

Test                  95th percentile TTFB (ms)
Cloudflare JS no-op   469
Fastly JS no-op       596
Cloudflare JS hard    481
Fastly JS hard        631
Cloudflare Rust hard  493
Fastly Rust hard      577

Cloudflare is significantly faster in all tests compared to Fastly. But let’s dig into why that is. Looking at p95 Wait, we can see that Cloudflare has an edge in most tests related to on-box compute:

Test                  95th Percentile Wait (ms)
Cloudflare JS no-op   123
Fastly JS no-op       123
Cloudflare JS hard    136
Fastly JS hard        170
Cloudflare Rust hard  160
Fastly Rust hard      121

Looking at the Wait times, you can see that Cloudflare does have a significant edge in on-box performance, except in Rust, where Fastly claims their workloads are the most optimized. This data backs up that claim. But why is Fastly so much slower on Time to First Byte? The answer lies in the rest of the request. While latency spent in compute matters, it only matters in conjunction with the rest of the network performance. And Cloudflare has an advantage over Fastly in Connect times and SSL establishment times:

Test                  95th Percentile Connect (ms)   95th Percentile SSL (ms)
Cloudflare JS no-op   81                             289
Fastly JS no-op       88                             293
Cloudflare JS hard    81                             286
Fastly JS hard        88                             291
Cloudflare Rust hard  81                             288
Fastly Rust hard      88                             295

Cloudflare’s hyper-optimized web stack allows us to process all requests faster, meaning that Workers code gets started faster. Having that extra head start in Connect and SSL allows us to further increase the distance between us and Compute@Edge.

To verify these results, we compared the Catchpoint results to our own data set. Here is the p95 TTFB for the JavaScript and Rust hard loops for both Fastly and Cloudflare from our data:

[Chart: p95 TTFB for the JavaScript and Rust hard loops, Cloudflare vs Fastly, from Cloudflare’s RUM data]

As you can see, Cloudflare is faster on JavaScript and Rust calls. And when you look at Wait time, you will see that in our data set, unlike in the Catchpoint results, Cloudflare even beats Fastly in the Rust hard tests:

[Chart: p95 Wait for the JavaScript and Rust hard loops, Cloudflare vs Fastly, from Cloudflare’s RUM data]

The big takeaway from this is that in addition to Cloudflare being faster for the time spent processing requests, Cloudflare’s network and performance optimizations as a whole set us apart and make our Workers platform even faster.

A fast network makes for a faster developer platform

Cloudflare’s commitment to building the fastest network allows us to deliver unparalleled performance for all applications, including our developer tools. Whether it’s by accelerating Cloudflare Pages performance by hosting on every single Cloudflare server, or by deploying Workers in 275 cities without developers needing to configure a thing, Cloudflare’s developer tools are all built on top of Cloudflare’s global network. We’re committed to making our network faster so that our developer products are as performant as they could possibly be.

And it’s not just the developer platform. Cloudflare runs an integrated and optimized stack that includes DDoS protection, WAF, rate limiting, bot management, caching, SSL, smart routing, and more. By having a single software stack we are able to offer the widest range of features while ensuring that performance (no matter which of our products you use) remains excellent. We don’t want you to have to compromise performance to get security or vice versa.

Proof of Stake and our next experiments in web3

Post Syndicated from Wesley Evans original https://blog.cloudflare.com/next-gen-web3-network/

A little under four years ago we announced Cloudflare’s first experiments in web3 with our gateway to the InterPlanetary File System (IPFS). Three years ago we announced our experimental Ethereum Gateway. At Cloudflare, we often take experimental bets on the future of the Internet to help new technologies gain maturity and stability and for us to gain deep understanding of them.

Four years after our initial experiments in web3, it’s time to launch our next series of experiments to help advance the state of the art. These experiments are focused on finding new ways to help solve the scale and environmental challenges that face blockchain technologies today. Over the last two years there has been a rapid increase in the use of the underlying technologies that power web3. This growth has been a catalyst for a generation of startups, and yet it has also had negative externalities.

At Cloudflare, we are committed to helping to build a better Internet. That commitment balances technology and impact. The impact of certain older blockchain technologies on the environment has been challenging. The Proof of Work (PoW) consensus systems that secure many blockchains were instrumental in the bootstrapping of the web3 ecosystem. However, these systems do not scale well with the usage rates we see today. Proof of Work networks rely on complex cryptographic functions to create consensus and prevent attacks. Every node on the network is in a race to find a solution to this cryptographic problem. The cryptographic function used is called a hash function: it takes an input x and produces an output y. If you know x, it’s easy to compute y; but given only y, it’s prohibitively costly to find a matching x. Miners have to find an x whose output y has certain properties, such as ten leading zeros. Even with advanced chips, this is costly and heavily based on chance.
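
To make the mining race concrete, here is a toy version of that search in JavaScript: brute-force inputs until a SHA-256 digest starts with a chosen prefix of zeros. Real networks use different hash functions and difficulty encodings; this only illustrates the trial-and-error nature of the work.

// Toy Proof of Work: find an x whose SHA-256 digest starts with `prefix`.
const { createHash } = require('node:crypto');

function mine(prefix = '0000') {
  for (let x = 0; ; x++) {
    const y = createHash('sha256').update(String(x)).digest('hex');
    if (y.startsWith(prefix)) return { x, y }; // success is purely chance-driven
  }
}

console.log(mine()); // each extra hex zero multiplies the expected attempts by 16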

While that process is incredibly secure because of the vast amount of hashing that occurs, Proof of Work networks are wasteful. This waste is driven by the fact that Proof of Work consensus mechanisms are electricity-intensive. In a PoW ecosystem, energy consumption scales directly with usage. So, for example, when web3 took off in 2020, the Bitcoin network started to consume the same amount of electricity as Switzerland (according to the Cambridge Bitcoin Electricity Consumption Index).

This inefficiency in resource use is by design to make it difficult for any single actor to threaten the security of the blockchain, and for the majority of the last decade, while use was low, this inefficiency was an acceptable trade off. The other major trade off accepted with the security of Proof of Work has been the number of transactions the network can handle per second. For instance, the Ethereum network can currently process about 30 transactions per second. This ultra-low transaction rate was acceptable when the technology was in development, but does not scale to meet the current demand.

As recently as two years ago, web3 was an up-and-coming ecosystem with minimal traffic compared to the majority of the traffic on the Internet. The last two years changed the web3 use landscape. Traffic to the Bitcoin and the Ethereum networks has exploded. Technologies like Smart Contracts that power DeFi, NFTs, and DAOs have started to become commonplace in the developer community (and in the Twitter lexicon). The next generation of blockchains like Flow, Avalanche, Solana, as well as Polygon are being developed and adopted by major companies like Meta.

In order to balance our commitment to the environment and to help build a better Internet, it is clear to us that we should launch new experiments to help find a path forward. A path that balances the clear need to drastically reduce the energy consumption of web3 technologies and the capability to scale the web3 networks by orders of magnitude if the need arises to support increased user adoption over the next five years. How do we do that? We believe that the next generation of consensus protocols is likely to be based on Proof of Stake and Proof of Spacetime consensus mechanisms.

Today, we are excited to announce the start of our experiments for the next generation of web3 networks that are embracing Proof of Stake; beginning with Ethereum. Over the next few months, Cloudflare will launch, and fully stake, Ethereum validator nodes on the Cloudflare global network as the community approaches its transition from Proof of Work to Proof of Stake with “The Merge.” These nodes will serve as a testing ground for research on energy efficiency, consistency management, and network speed.

There is a lot in that paragraph so let’s break it down! The short version though is that Cloudflare is going to participate in the research and development of the core infrastructure that helps keep Ethereum secure, fast, as well as energy efficient for everyone.

Proof of Stake is a next-generation consensus protocol to secure blockchains. Unlike Proof of Work, which relies on miners racing each other with increasingly complex cryptography to mine a block, Proof of Stake secures new transactions to the network through self-interest. Validator nodes (run by people who verify new blocks for the chain) are required to put a significant asset up as collateral in a smart contract to prove that they will act in good faith. For instance, for Ethereum that is 32 ETH. Validator nodes that follow the network’s rules earn rewards; validators that violate the rules will have portions of their stake taken away. Anyone can operate a validator node as long as they meet the stake requirement. This is key: Proof of Stake networks require lots and lots of validator nodes to validate and attest to new transactions. The more participants there are in the network, the harder it is for bad actors to launch a 51% attack to compromise the security of the blockchain.

To add new blocks to the Ethereum chain, once it shifts to Proof of Stake, validators are chosen at random to create new blocks (validate). Once they have a new block ready it is submitted to a committee of 128 other validators who act as attestors. They verify that the transactions included in the block are legitimate and that the block is not malicious. If the block is found to be valid, it is this attestation that is added to the blockchain. Critically, the validation and attestation process is not computationally complex. This reduction in complexity leads to significant energy efficiency improvements.
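
As a highly simplified illustration of that flow (the committee size comes from the paragraph above; the random selection is a toy stand-in, not Ethereum’s actual algorithm):

// Toy model: pick a proposer at random, then a committee of 128 attestors.
const validators = Array.from({ length: 1000 }, (_, i) => `validator-${i}`);

const shuffle = (pool) => [...pool].sort(() => Math.random() - 0.5); // fine for a toy

const [proposer, ...rest] = shuffle(validators);
const committee = rest.slice(0, 128);

// The block is added only if the committee attests that it is valid.
const attestations = committee.map((v) => ({ v, valid: true /* verification elided */ }));
const accepted = attestations.every((a) => a.valid);
console.log({ proposer, committeeSize: committee.length, accepted });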

The energy required to operate a Proof of Stake validator node is orders of magnitude less than that of a Proof of Work miner. Early estimates from the Ethereum Foundation suggest that the entire Ethereum network could use as little as 2.6 megawatts of power. Put another way, Ethereum is expected to use 99.5% less energy post-merge than today.

We are going to support the development and deployment of Ethereum by running Proof of Stake validator nodes on our network. For the Ethereum ecosystem, running validator nodes on our network allows us to offer even more geographic decentralization in places like EMEA, LATAM, and APJC while also adding infrastructure decentralization to the network. This helps keep the network secure, outage resistant, and closer to the goal of global decentralization for everyone. For us, it’s a commitment to helping research and experiment with scaling the next generation of blockchain networks that could be a key part of the story of the Internet. Running blockchain technology at scale is incredibly difficult, and Proof of Stake systems have their own unique requirements that will take time to understand and improve. Part of this experimentation, though, is a commitment: Cloudflare has not run, and will not run, our own Proof of Work infrastructure on our network.

Finally, what is “The Merge”? The Merge is the planned combination of the Ethereum Mainnet (the chain you interact with today when you make an Eth transaction) and the Ethereum Beacon Chain. The Beacon chain is the Proof of Stake blockchain running in parallel with the Ethereum Mainnet. The Merge is much more significant than a consensus update.

Consensus updates have happened multiple times already on Ethereum, whether because of a vulnerability or the need to introduce new features. The Merge is different in that it cannot be done as a normal upgrade. Previous forks have been enabled at a specific block number: once enough blocks are mined, the new consensus rules apply automatically. With the Merge, there has to be a single agreed state root at which the fork happens. That merge state root marks the start of the Ethereum Mainnet PoS journey. It is chosen and set at a certain difficulty, instead of a block height, to avoid merging a high block with low difficulty.

Once The Merge occurs later this year in 2022, Ethereum will be a full Proof of Stake ecosystem that no longer relies on Proof of Work.

This is just the start of our commitment to help build the next generation of web3 networks. We are excited to work with our partners across the cryptography, web3, and infrastructure communities to help create the next generation of blockchain ecosystems that are environmentally sustainable, secure, and blazing fast.

Public access for our Ethereum and IPFS gateways now available

Post Syndicated from Wesley Evans original https://blog.cloudflare.com/ea-web3-gateways/

Today we are excited to announce that our Ethereum and IPFS gateways are publicly available to all Cloudflare customers for the first time. Since our announcement of our private beta last September the interest in our Eth and IPFS gateways has been overwhelming. We are humbled by the demand for these tools, and we are excited to get them into as many developers’ hands as possible. Starting today, any Cloudflare customer can log into the dashboard and configure a zone for Ethereum, IPFS, or both!

Over the last eight months of the private beta, we’ve been busy working to fully operationalize the gateways to ensure they meet the needs of our customers!

First, we have created a new API with end-to-end managed hostname deployment. This ensures that creating and managing gateways stays quick and easy as you scale. We want to give time and focus back to developers, so you can concentrate on your core product and services and leave the infrastructural components to us.

Second, we’ve added a brand new UI, bringing web3 to Cloudflare’s zone-level dashboard. Regardless of the workflow you are used to, there is now parity between our UI and API, so we fit into your existing processes: no time is wasted figuring out “how we integrate,” just a quick setup before you start serving content or connecting your services.

Third, we are pleased to say that you will soon have testnet support, so your new development can be tested, hardened, and deployed to mainnet without additional risk to your brand trust or product availability, and without the concern that something may fail silently and begin a cascade of problems throughout your network. We want anyone leveraging our web3 gateways to have confidence that the changes they push will not impact the end user experience. At the end of the day, the Internet is for end users, and their experience and perception must always be kept in our purview.

Lastly, Cloudflare loves to build on top of Cloudflare. This helps us stay resilient and also shows our commitment and belief in all the products we create! We have always used our SSL for SaaS and Workers products in the background. Building on our own services gives our customers the ability to define and control their own HTTP features on top of traffic destined for web3 gateways, including: rate limits, WAF rules, custom security filters, serving video, customer defined Workers logic, custom redirects and more!

Today, thousands of individuals, companies, and DAOs are building new products leveraging our web3 gateways — the most reliable web3 infrastructure with the largest network.

Here are just a few snippets of how people are already using our web3 Gateways, and we can’t wait to see what you build on them:

  • DeFi DAOs use the Cloudflare IPFS gateway to serve their front end web applications globally without latency or cache penalties.
  • NFT designers use the Ethereum Gateway to effortlessly drop new offerings, and the IPFS gateway to store them in a fully decentralized system.
  • Large Dapp developers trust us to handle huge traffic spikes quickly and efficiently, without rate limits or overage caps. They combine all our offerings into a single pane of glass so that they don’t have to juggle multiple systems.

As part of this announcement, we will begin migrating our existing users away from the legacy gateway endpoints and onto our new API, which is easier to use, highly managed, and more robust. To ensure a smooth transition, you will first need to sign up for a Cloudflare account if you do not already have one. We have also kept our free users in mind: free users will continue to use the gateways at no cost with our free tier option. This includes no cap on the amount of traffic that can be pushed through our gateways, along with the most transparent and forecastable pricing model in the market today. We are very excited about the future and look forward to sharing the next iterations of web3 at Cloudflare!

Also, if you can’t wait to start building on our gateways, check out our product documentation for more guidance.

Serving Cloudflare Pages sites to the IPFS network

Post Syndicated from Thibault Meunier original https://blog.cloudflare.com/cloudflare-pages-on-ipfs/

Four years ago, Cloudflare went Interplanetary by offering a gateway to the IPFS network. This meant that if you hosted content on IPFS, we offered to make it available to every user of the Internet through HTTPS and with Cloudflare protection. IPFS allows you to choose a storage provider you are comfortable with, while providing a standard interface for Cloudflare to serve this data.

Since then, businesses have new tools to streamline web development. Cloudflare Workers, Pages, and R2 are enabling developers to bring services online in a matter of minutes, with built-in scaling, security, and analytics.

Today, we’re announcing we’re bridging the two. We will make it possible for our customers to serve their sites on the IPFS network.

In this post, we’ll learn how you will be able to build your website with Cloudflare Pages, and leverage the IPFS integration to make your content accessible and available across multiple providers.

A primer on IPFS

The InterPlanetary FileSystem (IPFS) is a peer-to-peer network for storing content on a distributed file system. It is composed of a set of computers called nodes that store and relay content using a common addressing system. In short, a set of participants agree to maintain a shared index of content the network can provide, and where to find it.

Let’s take two random participants in the network: Alice, a cat person, and Bob, a dog person.

As a participant in the network, Alice keeps connections with a subset of participants, referred to as her peers. When Alice makes her content 🐱 available on IPFS, she announces to her peers that she has content 🐱, and these peers record in their routing tables that 🐱 is provided by Alice’s node.

[Diagram: Alice announces to her peers that her node provides 🐱]

Each node has a routing table, and a datastore. The routing table retains a mapping of content to peers, and the datastore retains the content a given node stores. In the above case, only Alice has content, a 🐱.
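
As a rough mental model (a real IPFS node uses a Kademlia DHT and pluggable datastores, so this is only an illustration), the two structures behave like a pair of maps:

// Simplified model of a node's routing table and datastore.
const routingTable = new Map(); // content ID -> set of peers known to provide it
const datastore = new Map();    // content ID -> content stored locally

// Alice announces her content: her peers record her as a provider.
function announce(cid, peerId) {
  if (!routingTable.has(cid)) routingTable.set(cid, new Set());
  routingTable.get(cid).add(peerId);
}

// Bob asks: who provides this content ID?
function findProviders(cid) {
  return routingTable.get(cid) ?? new Set();
}

announce('🐱', 'alice');
console.log(findProviders('🐱')); // Set { 'alice' }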

When Bob wants to retrieve 🐱, he tells his peers he wants 🐱. These peers point him to Alice, and Alice then provides 🐱 to Bob. Bob can verify 🐱 is the content he was looking for, because the content identifier he requested is derived from the 🐱 content itself, using a secure, cryptographic hash function.

Even better, if Bob becomes a cat person, he can announce to his peers that he is also providing a cat. Bob’s love for cats could be genuine, or he could have an interest in providing the content, such as a contract with Alice. IPFS provides a common ground to share content, without being opinionated on how this content has to be stored and what guarantees it comes with.

How Pages websites are made available on IPFS

Content is made available as follows.

[Diagram: components involved in making Pages content available on IPFS]

The components are:

  • Pages storage: Storage solution for Cloudflare Pages.
  • IPFS Index Proxy: Service maintaining a map between IPFS CIDs and the location of the data. It runs on Cloudflare Workers and uses Workers KV to store the mapping.
  • IPFS node: Cloudflare-hosted IPFS node serving Pages content. It has an in-house datastore module, able to communicate with the IPFS Index Proxy.
  • IPFS network: The rest of the IPFS network.

When you opt in to serving your Cloudflare Pages site on IPFS, a call is made to the IPFS Index Proxy. This call first fetches your Pages content and transforms it into a CID, then both populates IndexDB to associate the CID with the content and reaches out to Cloudflare’s IPFS nodes to tell them they are able to provide the CID.

For example, imagine your website structure is as follows:

  • /
    • index.html
    • static/
      • cats.txt
      • beautiful_cats.txt

To provide this website on IPFS, Cloudflare has to compute a CID for /. CIDs are built recursively: to compute the CID for a given folder /, one needs the CIDs of index.html and static/. The CID of index.html is derived from its binary representation, and that of static/ from cats.txt and beautiful_cats.txt.
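
The recursion can be sketched in a few lines. This is a simplification (real CIDs follow the UnixFS and MerkleDAG formats rather than plain SHA-256 over sorted entries), but it shows how a folder’s identifier depends on its children’s identifiers:

// Simplified recursive identifier: a folder's hash is derived from the
// hashes of its children, so changing any file changes the root.
const { createHash } = require('node:crypto');

const sha256 = (data) => createHash('sha256').update(data).digest('hex');

function identifierFor(node) {
  if (typeof node === 'string') return sha256(node); // file: hash its bytes
  const entries = Object.entries(node)               // folder: hash (name, child hash) pairs
    .map(([name, child]) => `${name}=${identifierFor(child)}`)
    .sort()
    .join('\n');
  return sha256(entries);
}

const site = {
  'index.html': '<html>...</html>',
  'static': { 'cats.txt': 'cats', 'beautiful_cats.txt': 'beautiful cats' },
};
console.log(identifierFor(site)); // stands in for the CID of /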

This works similarly to a Merkle tree, except nodes can reference each other as long as they still form a Directed Acyclic Graph (DAG). This structure is therefore referred to as a MerkleDAG.

In our example, it’s not unlikely for cats.txt and beautiful_cats.txt to have data in common. In this case, Cloudflare can be smart in the way it builds the MerkleDAG.

[Diagram: a MerkleDAG in which cats.txt and beautiful_cats.txt share common data blocks]

This reduces the storage and bandwidth requirement when accessing the website on IPFS.

If you want to learn more about how you can model a file system on IPFS, you can check the UnixFS specification.

Cloudflare stores every CID and the content it references in IndexDB. This allows Cloudflare IPFS nodes to serve Cloudflare Pages assets when requested.

Let’s put this together

Let’s take an example: pages-on-ipfs.com is hosted on IPFS. It is built using Hugo, a static site generator, and Cloudflare Pages with the public documentation template. Its source is available on GitHub. If you have an IPFS-compatible client, you can access it at ipns://pages-on-ipfs.com as well.

1. Read Cloudflare Pages deployment documentation

For the purpose of this blog, I want to create a simple Cloudflare Pages website. I have experience with Hugo, so I choose it as my framework for the project.

I type “cloudflare pages” in the search bar of my web browser, then head to the Read the docs > Framework Guide > Deploy a Hugo site.

2. Create a site

This is where I use Hugo, and your configuration might differ. In short, I use hugo new site pages-on-ipfs, create an index and two static resources, et voilà. The result is available on the source GitHub for this project.

3. Deploy using Cloudflare Pages

On the Cloudflare Dashboard, I go to Account Home > Pages > Create a project. I select the GitHub repository I created, and configure the build tool to build a Hugo website. Basically, I follow what’s written in the Cloudflare Pages documentation.

Upon clicking Save and Deploy, my website is deployed on pages-on-ipfs.pages.dev. I also configure it to be available at pages-on-ipfs.com.

4. Serve my content on IPFS

First, I opt in my zone on Cloudflare Pages integration with IPFS. This feature is not available yet for everyone to try out.

This allows Cloudflare to index the content of my website. Once indexed, I get the CID for my deployment baf…1. I can check that my content is available at this hash on IPFS using an IPFS gateway https://baf…1.ipfs.cf-ipfs.com/.

5. Make my IPFS website available at pages-on-ipfs.com

Ideally, one domain name would give access to both the Cloudflare Pages version and the IPFS version, depending on whether the client supports IPFS. Fortunately, the IPFS ecosystem supports such a feature via DNSLink. DNSLink is a way to specify the IPFS content a domain is associated with.

For pages-on-ipfs.com, I create a TXT record on _dnslink.pages-on-ipfs.com with value dnslink=/ipfs/baf…1. Et voilà. I can now access ipns://pages-on-ipfs.com via an IPFS client.
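
You can check a DNSLink record yourself with any DNS client; for example, via Cloudflare’s public DNS-over-HTTPS JSON endpoint:

// Query the _dnslink TXT record over DNS-over-HTTPS.
const res = await fetch(
  'https://cloudflare-dns.com/dns-query?name=_dnslink.pages-on-ipfs.com&type=TXT',
  { headers: { accept: 'application/dns-json' } }
);
const { Answer } = await res.json();
console.log(Answer?.map((a) => a.data)); // should include the dnslink=/ipfs/… value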

6. (Optional) Replicate my website

The content of my website can now easily be replicated and pinned by other IPFS nodes. This can either be done at home via an IPFS client or using a pinning service such as Pinata.

What’s next

We’ll make this service available later this year as it is refined. We are committed to making information move freely and helping to build a better Internet. Cloudflare Pages’ work of solving developer problems continues, as developers are now able to make their sites accessible to more users.

Over the years, IPFS has been used by more and more people. While Cloudflare started by making IPFS content available to web users through an HTTP interface, we now think it’s time to give back. Allowing Cloudflare assets to be served over a public distributed network extends developers’ and users’ capabilities on an open web.

Common questions

  • I am already hosting my website on IPFS. Can I pin it using Cloudflare?
    • No. This project aims at serving Cloudflare hosted content via IPFS. We are still investigating how to best replicate and re-provide already-existing IPFS content via Cloudflare infrastructure.
  • Does this make IPFS more centralized?
    • No. Cloudflare does not have the authority to decide who can be a node operator nor what content they provide.
  • Are there guarantees the content is going to be available?
    • Yes. As long as you choose Cloudflare to host your website on IPFS, it will be available on IPFS. Should you move to another provider, it would be your responsibility to make sure the content remains available. IPFS allows for this transition to be smooth using a pinning service.
  • Is IPFS private?
    • It depends. Generally, no. IPFS is a p2p protocol. The nodes you peer with and request content from would know the content you are looking for. The privacy depends on the trust you have in your peers to not snoop on the data you request.
  • Can users verify the integrity of my website?
    • Yes. They need to access your website through an IPFS compatible client. Ideally, IPFS content integrity is turned into a web standard, similar to subresource integrity.

Gaining visibility in IPFS systems

Post Syndicated from Pop Chunhapanya original https://blog.cloudflare.com/ipfs-measurements/

We have been operating an IPFS gateway for the last four years. It started as a research experiment in 2018, providing end-to-end integrity with IPFS. A year later, we made IPFS content faster to fetch. Last year, we announced we were committed to making the IPFS gateway part of our product offering. Throughout this process, we needed to measure how our setup performed in order to inform our design decisions.

To this end, we’ve developed the IPFS Gateway monitor, an observability tool that runs various IPFS scenarios on a given gateway endpoint. In this post, you’ll learn how we use this tool and go over discoveries we made along the way.

Refresher on IPFS

IPFS is a distributed system for storing and accessing files, websites, applications, and data. It’s different from a traditional centralized file system in that IPFS is completely distributed. Any participant can join and leave at any time without the loss of overall performance.

However, in order to access any file in IPFS, users cannot just use a web browser: they need to run an IPFS node and access the file from IPFS using its own protocol. IPFS gateways exist to let users do this using only a web browser.

Cloudflare provides an IPFS gateway at https://cloudflare-ipfs.com, so anyone can just access IPFS files by using the gateway URL in their browsers.
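
In practice, fetching IPFS content through the gateway requires nothing more than an HTTP client. For example, using the example CID that appears later in this post:

// Fetch an IPFS object via the gateway and time the request.
const cid = 'bafybeig7hluox6xefqdgmwcntvsguxcziw2oeogg2fbvygex2aj6qcfo64';
const start = Date.now();
const res = await fetch(`https://cloudflare-ipfs.com/ipfs/${cid}`);
console.log(res.status, `${Date.now() - start}ms`); // timing varies with cache state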

As IPFS and the Cloudflare IPFS Gateway have become more and more popular, we need a way to know how performant it is: how much time it takes to retrieve IPFS-hosted content and how reliable it is. However, IPFS gateways are not like normal websites which only receive HTTP requests and return HTTP responses. The gateways need to run IPFS nodes internally and sometimes do content routing and peer routing to find the nodes which provide IPFS contents. They sometimes also need to resolve IPNS names to discover the contents. So, in order to measure the performance, we need to do measurements many times for many scenarios.

Enter the IPFS Gateway monitor

The IPFS Gateway monitor is that tool. It allows anyone to check the performance of their gateway and export the results to the Prometheus monitoring system.

This monitor is composed of three independent binaries:

  1. ipfs-gw-measure is the tool that calls the gateway URL and does the measurement scenarios.
  2. ipfs-gw-monitor is a way to call the measurement tool multiple times.
  3. Prometheus Exporter exposes Prometheus-readable metrics.

To interact with the IPFS network, the codebase also provides a way to start an IPFS node.

A scenario is a set of instructions a user performs given the state of our IPFS system. For instance, we want to know how fast newly uploaded content can be found by the gateway, or if popular content has a low query time. We’ll discuss more of this in the next section.

Putting this together, the system is the following.

[Diagram: the IPFS Gateway monitor, its test IPFS node, and the gateway under test]

During this experiment, we operated both the IPFS monitor and a test IPFS node. The IPFS node allows the monitor to provide content to the IPFS network, which lets us be sure that the content provided is fresh and actually hosted. Peers were not fixed, and leverage the IPFS default peer discovery mechanism.

Cloudflare IPFS Gateway is treated as an opaque system. The monitor performs end-to-end tests by contacting the gateway via its public API, either https://cloudflare-ipfs.com or via a custom domain registered with the gateway.

The following scenarios have been run on consumer hardware in March. They are not representative of all IPFS users. All values provided below have been sourced against Cloudflare’s IPFS gateway endpoint.

First scenarios: Gateway to serve immutable IPFS contents

IPFS contents are the most primitive contents being served by the IPFS network and IPFS gateways. By their nature, they are immutable and addressable only by the hash of the content. Users can access the IPFS contents by putting the Content IDentifier (CID) in the URL path of the gateway. For example, ipfs://bafybeig7hluox6xefqdgmwcntvsguxcziw2oeogg2fbvygex2aj6qcfo64. We measure three common scenarios that users will often encounter.

The first one is when users fetch popular content, which has a high probability of already being in our cache. During our experiment, we measured a response time of around 116 ms for such content.

The second one is the case where users create and upload content to the IPFS network, such as via the integration between Cloudflare Pages and IPFS. This scenario is a lot slower than the first because the content is not in our cache yet, and it takes some time to discover it. The content we upload during this measurement is a random 32-byte piece of content.

The last measurement is when users try to download content that does not exist. This one is the slowest: because of the nature of content routing in IPFS, there is no signal that tells us content doesn’t exist, so waiting for a timeout is the only way to tell whether the content exists. Currently, the Cloudflare IPFS gateway has a timeout of around five minutes.

Scenario                        Min (s)   Max (s)   Avg (s)
1  ipfs/newly-created-content   0.276     343       44.4
2  ipfs/in-cache-content        0.0825    0.539     0.116
3  ipfs/unavailable-cid         90        341       279

Second scenarios: Gateway to serve mutable IPNS named contents

Since IPFS contents are immutable, when users want to change the content, the only way to do so is to create new content and distribute the new CID to other users. Sometimes distributing the new CID is hard, and is out of scope of IPFS. The InterPlanetary Naming System (IPNS) tries to solve this. IPNS is a naming system that — instead of locating the content using the CID — allows users to locate the content using the IPNS name instead. This name is a hash of a user’s public key. Internally, IPNS maintains a section of the IPFS DHT which maps from a name to a CID. Therefore, when the users want to download the contents using names through the gateway, the gateway has to first resolve the name to get the CID, then use the CID to query the IPFS content as usual.

The example for fetching the IPNS named content is at ipns://k51qzi5uqu5diu2krtwp4jbt9u824cid3a4gbdybhgoekmcz4zhd5ivntan5ey

We measured many scenarios for IPNS as shown in the table below. Three scenarios are similar to the ones involving IPFS contents. There are two more scenarios added: the first one is measuring the response time when users query the gateway using an existing IPNS name, and the second one is measuring the response time when users query the gateway immediately after new content is published under the name.

Scenario                     Min (s)   Max (s)   Avg (s)
1  ipns/newly-created-name   5.50      110       33.7
2  ipns/existing-name        7.19      113       28.0
3  ipns/republished-name     5.62      80.4      43.8
4  ipns/in-cache-content     0.0353    0.0886    0.0503
5  ipns/unavailable-name     60.0      146       81.0

Even though users can use IPNS to provide others a stable address to fetch the content, the address can still be hard to remember. This is what DNSLink is for. Users can address their content using DNSLink, which is just a normal domain name in the Domain Name System (DNS). As a domain owner, you only have to create a TXT record with the value dnslink=/ipfs/baf…1, and your domain can be fetched from a gateway.

There are two ways to access the DNSLink websites using the gateway. The first way is to access the website using the domain name as a URL hostname, for example, https://ipfs.example.com. The second way is to put the domain name as a URL path, for example, https://cloudflare-ipfs.com/ipns/ipfs.example.com.

Scenario                                  Min (s)   Max (s)   Avg (s)
1  dnslink/ipfs-domain-as-url-hostname    0.251     18.6      0.831
2  dnslink/ipfs-domain-as-url-path        0.148     1.70      0.346
3  dnslink/ipns-domain-as-url-hostname    7.87      44.2      21.0
4  dnslink/ipns-domain-as-url-path        6.87      72.6      19.0

What does this mean for regular IPFS users?

These measurements highlight that IPFS content is fetched fastest when cached: an order of magnitude faster than fetching new content. This result is expected, as content newly published on IPFS can take time for other nodes to discover, as highlighted in previous studies. When it comes to naming IPFS resources, leveraging DNSLink appears to be the faster strategy. This is likely due to the poor connectivity of the IPFS node used in this setup, preventing the IPNS name from propagating rapidly. Overall, IPNS names would benefit from using a resolver to speed up resolution without losing the trust they provide.

As we mentioned in September, IPFS use has seen important growth. So has our tooling. The IPFS Gateway monitor can be found on GitHub, and we will keep looking at improving this first set of metrics.

At the time of writing, using IPFS via a gateway seems to provide lower retrieval times, while allowing for finer-grained control over security settings in the browser context. This configuration preserves the content validity properties offered by IPFS, but reduces the number of nodes a user is peering with to one: the gateway. Ideally, we would like users to peer with Cloudflare because we’re offering the best service, while still having the possibility of retrieving content from external sources if they want to. We’ll be conducting more measurements to better understand how to best leverage Cloudflare’s presence in 270 cities to better serve the IPFS network.

Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

Post Syndicated from Alex Krivit original https://blog.cloudflare.com/part1-coreless-purge/

There is a famous quote attributed to a Netscape engineer: “There are only two difficult problems in computer science: cache invalidation and naming things.” While naming things does oddly take up an inordinate amount of time, cache invalidation shouldn’t.

In the past we’ve written about Cloudflare’s incredibly fast response times, whether content is cached on our global network or not. If content is cached, it can be served from Cloudflare cache servers, which are distributed across the globe and are generally much closer in physical proximity to the visitor. This saves the visitor’s request from needing to go all the way back to an origin server for a response. But what happens when a webmaster updates something on their origin and would like these caches to be updated as well? This is where cache “purging” (also known as “invalidation”) comes in.

Customers thinking about setting up a CDN and caching infrastructure consider questions like:

  • How do different caching invalidation/purge mechanisms compare?
  • How many times a day/hour/minute do I expect to purge content?
  • How quickly can the cache be purged when needed?

This blog will discuss why invalidating cached assets is hard, what Cloudflare has done to make it easy (because we care about your experience as a developer), and the engineering work we’re putting in this year to make the performance and scalability of our purge services the best in the industry.

What makes purging difficult also makes it useful

(i) Scale
The first thing that complicates cache invalidation is doing it at scale. With data centers in over 270 cities around the globe, our most popular users’ assets can be replicated at every corner of our network. This also means that a purge request needs to be distributed to all data centers where that content is cached. When a data center receives a purge request, it needs to locate the cached content to ensure that subsequent visitor requests for that content are not served stale/outdated data. Requests for the purged content should be forwarded to the origin for a fresh copy, which is then re-cached on its way back to the user.

This process repeats for every data center in Cloudflare’s fleet. And due to Cloudflare’s massive network, maintaining this consistency when certain data centers may be unreachable or go offline, is what makes purging at scale difficult.

Making sure that every data center gets the purge command and remains up-to-date with its content logs is only part of the problem. Getting the purge request to data centers quickly so that content is updated uniformly is the next reason why cache invalidation is hard.  

(ii) Speed
When purging an asset, race conditions abound. Requests for an asset can happen at any time, and may not follow a pattern of predictability. Content can also change unpredictably. Therefore, when content changes and a purge request is sent, it must be distributed across the globe quickly. If purging an individual asset, say an image, takes too long, some visitors will be served the new version, while others are served outdated content. This data inconsistency degrades user experience, and can lead to confusion as to which version is the “right” version. Websites can sometimes even break in their entirety due to this purge latency (e.g. by upgrading versions of a non-backwards compatible JavaScript library).

Purging at speed is also difficult when combined with Cloudflare’s massive global footprint. For example, if a purge request is traveling at the speed of light between Tokyo and Cape Town (both cities where Cloudflare has data centers), just the transit alone (no authorization of the purge request or execution) would take over 180ms on average based on submarine cable placement. Purging a smaller network footprint may reduce these speed concerns while making purge times appear faster, but does so at the expense of worse performance for customers who want to make sure that their cached content is fast for everyone.

(iii) Scope
The final thing that makes purge difficult is making sure that only the unneeded web assets are invalidated. Maintaining a cache is important for egress cost savings and response speed. Webmasters’ origins could be knocked over by a thundering herd of requests if they purge all content needlessly. It’s a delicate balance of purging just enough: too much can result in both monetary and downtime costs, and too little will result in visitors receiving outdated content.

At Cloudflare, what to invalidate in a data center is often dictated by the type of purge. Purge everything, as you could probably guess, purges all cached content associated with a website. Purge by prefix purges content based on a URL prefix. Purge by hostname can invalidate content based on a hostname. Purge by URL or single file purge focuses on purging specified URLs. Finally, Purge by tag purges assets that are marked with Cache-Tag headers. These markers offer webmasters flexibility in grouping assets together. When a purge request for a tag comes into a data center, all assets marked with that tag will be invalidated.
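
Each of these purge types maps to a different request body on Cloudflare’s purge_cache API endpoint. A sketch of three of them (the zone ID, API token, URL, and tag are placeholders; some purge types are plan-dependent):

// POST to the zone's purge_cache endpoint with the purge type in the body.
const endpoint = 'https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/purge_cache';
const headers = {
  'Authorization': 'Bearer <API_TOKEN>',
  'Content-Type': 'application/json',
};
const purge = (body) =>
  fetch(endpoint, { method: 'POST', headers, body: JSON.stringify(body) });

await purge({ files: ['https://example.com/image.png'] }); // purge by URL
await purge({ tags: ['product-images'] });                 // purge by Cache-Tag
await purge({ purge_everything: true });                   // purge everything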

With that overview in mind, the remainder of this blog will focus on putting each element of invalidation together to benchmark the performance of Cloudflare’s purge pipeline and provide context for what performance means in the real-world. We’ll be reviewing how fast Cloudflare can invalidate cached content across the world. This will provide a baseline analysis for how quick our purge systems are presently, which we will use to show how much we will improve by the time we launch our new purge system later this year.

How does purge work currently?

In general, purge takes the following route through Cloudflare’s data centers.

  • A purge request is initiated via the API or UI. This request specifies how our data centers should identify the assets to be purged. This can be accomplished via cache-tag header(s), URL(s), entire hostnames, and much more.
  • The request is received by any Cloudflare data center and is identified to be a purge request. It is then routed to a Cloudflare core data center (a set of a few data centers responsible for network management activities).
  • When a core data center receives it, the request is processed by a number of internal services that (for example) make sure the request is being sent from an account with the appropriate authorization to purge the asset. Following this, the request gets fanned out globally to all Cloudflare data centers using our distribution service.
  • When received by a data center, the purge request is processed and all assets with the matching identification criteria are either located and removed, or marked as stale. These stale assets are not served in response to requests and are instead re-pulled from the origin.
  • After being pulled from the origin, the response is written to cache again, replacing the purged version.

Now let’s look at this process in practice. Below we describe Cloudflare’s purge benchmarking that uses real-world performance data from our purge pipeline.

Benchmarking purge performance design

In order to understand how performant Cloudflare’s purge system is, we measured the time it took from sending the purge request to the moment that the purge is complete and the asset is no longer served from cache.  

In general, the process of measuring purge speeds involves: (i) ensuring that a particular piece of content is cached, (ii) sending the command to invalidate the cache, (iii) simultaneously checking our internal system logs for how the purge request is routed through our infrastructure, and (iv) measuring when the asset is removed from cache (first miss).

This process measures how quickly cache is invalidated from the perspective of an average user.
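
A rough external approximation of steps (i) through (iv) above can be scripted against the cf-cache-status response header, which reports whether an asset was served from cache. In this sketch, the URL is a placeholder and sendPurgeRequest is a hypothetical helper wrapping the purge API:

// Warm the cache, purge, then poll until the first MISS.
const asset = 'https://example.com/logo.png';
const cacheStatus = async () => (await fetch(asset)).headers.get('cf-cache-status');

await fetch(asset);                      // (i) ensure the asset is cached
const start = Date.now();
await sendPurgeRequest(asset);           // (ii) hypothetical purge call
while ((await cacheStatus()) === 'HIT') {
  // (iii) keep polling until the purge has propagated
}
console.log(`purge visible after ~${Date.now() - start}ms`); // (iv) first miss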

  • Clock starts
    As noted above, in this experiment we’re using sampled RUM data from our purge systems. The goal of the experiment is to benchmark how long it can take to purge an asset on Cloudflare across different regions. Once an asset is cached in a region on Cloudflare, we identify when a purge request is received for it; at that instant, the clock starts. We include in this time any retries we needed to make (due to data centers missing the initial purge request) to ensure the purge was applied consistently across our network. The clock continues as the request transits our purge pipeline (data center > core > fanout > purge from all data centers).
  • Clock stops
    The clock stops when the purged asset is removed from cache, meaning the data center is no longer serving it from cache in response to visitors’ requests. Our internal logging captures the precise moment the cached content is removed or expired, and from that data we determined the following benchmarks for our purge types in various regions.

Results

We’ve divided our benchmarks in two ways: by purge type and by region.

We singled out Purge by URL because it identifies a single target asset to be purged. While that asset can be stored in multiple locations, the amount of data to be purged is strictly defined.

We’ve combined all other types of purge (everything, tag, prefix, hostname) together because the amount of data to be removed is highly variable. Purging a whole website or by assets identified with cache tags could mean we need to find and remove a multitude of content from many different data centers in our network.

Secondly, we segmented our benchmark measurements by region, and specifically confined the benchmarks to particular data center servers in each region because we were concerned about clock skew between different data centers. Limiting the test to the same cache servers means that even if there was skew, they’d all be skewed in the same way.

We took the latency from representative data centers in each of the following regions, as well as the global latency. Data centers were not evenly distributed across regions, but in total they represent about 90 different cities around the world:

  • Africa
  • Asia Pacific Region (APAC)
  • Eastern Europe (EEUR)
  • Eastern North America (ENAM)
  • Oceania
  • South America (SA)
  • Western Europe (WEUR)
  • Western North America (WNAM)

The global latency numbers represent purge data from all Cloudflare data centers in over 270 cities globally. In the results below, the global numbers may be larger than the regional numbers because they cover all of our data centers instead of only a regional portion, so outliers and retries can have an outsized effect.

Below are the results for how quickly our current purge pipeline was able to invalidate content, by purge type and region. All times are in seconds and divided into P50, P75, and P99 quantiles; “P50” means that 50% of the purges completed at the indicated latency or faster.
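
As a refresher on how such quantiles can be computed, here is a simple nearest-rank implementation over a set of latency samples (a sketch for illustration, not our analytics pipeline):

// Nearest-rank percentile over latency samples (in seconds).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

const latencies = [0.84, 0.91, 1.02, 1.31, 1.94, 6.42]; // made-up sample data
percentile(latencies, 50); // => 1.02: half of these purges were this fast or faster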

Purge By URL

Region    P50     P75     P99
AFRICA    0.95s   1.94s   6.42s
APAC      0.91s   1.87s   6.34s
EEUR      0.84s   1.66s   6.30s
ENAM      0.85s   1.71s   6.27s
OCEANIA   0.95s   1.96s   6.40s
SA        0.91s   1.86s   6.33s
WEUR      0.84s   1.68s   6.30s
WNAM      0.87s   1.74s   6.25s
GLOBAL    1.31s   1.80s   6.35s

Purge Everything, by Tag, by Prefix, by Hostname

Region    P50     P75     P99
AFRICA    1.42s   1.93s   4.24s
APAC      1.30s   2.00s   5.11s
EEUR      1.24s   1.77s   4.07s
ENAM      1.08s   1.62s   3.92s
OCEANIA   1.16s   1.70s   4.01s
SA        1.25s   1.79s   4.11s
WEUR      1.19s   1.73s   4.04s
WNAM      1.00s   1.53s   3.83s
GLOBAL    1.57s   2.32s   5.97s

A general note about these benchmarks — the data represented here was taken from over 48 hours (two days) of RUM purge latency data in May 2022. If you are interested in how quickly your content can be invalidated on Cloudflare, we suggest you test our platform with your website.

Those numbers are good, and much faster than most of our competitors. Even in the worst case, the time from when you tell us to purge an item to when it is removed globally is less than seven seconds. In most cases, it’s less than a second. That’s great for most applications, but we want to be even faster. Our goal is to get cache purge as close as theoretically possible to the speed-of-light limit for a network our size, which is roughly 200ms (about the round trip, at the speed of light in fiber, to the far side of the globe).

Intriguingly, LEO satellite networks may be able to provide even lower global latency than fiber optics because of the straightness of the paths between satellites that use laser links. We’ve done calculations of latency between LEO satellites that suggest that there are situations in which going to space will be the fastest path between two points on Earth. We’ll let you know if we end up using laser-space-purge.

Just as we have with network performance, we are going to relentlessly measure our cache performance as well as the cache performance of our competitors. We won’t be satisfied until we verifiably are the fastest everywhere. To do that, we’ve built a new cache purge architecture which we’re confident will make us the fastest cache purge in the industry.

Our new architecture

Through the end of 2022, we will continue this blog series, incrementally showing how we will become the fastest, most scalable purge system in the industry. We will keep you updated on how our purge system is developing, and benchmark our data along the way.

Getting there will involve rearchitecting and optimizing our purge service, which hasn’t received a systematic redesign in over a decade. We’re excited to do our development in the open, and bring you along on our journey.

So what do we plan on updating?

Introducing Coreless Purge

The first version of our cache purge system was designed on top of a set of central core services, including authorization, authentication, request distribution, and filtering, among other features that made it a high-reliability service. These core components ultimately became a bottleneck in terms of scale and performance as our network continued to expand globally. While most of our purge dependencies have been containerized, the message queue used was still running on bare metal, which led to increased operational overhead when our system needed to scale.

Last summer, we built a proof of concept for a completely decentralized cache invalidation system using in-house tech – Cloudflare Workers and Durable Objects. Using Durable Objects as a queuing mechanism gives us the flexibility to scale horizontally by adding more Durable Objects as needed and can reduce time to purge with quick regional fanouts of purge requests.

In the new purge system, we’re ripping out the reliance on core data centers and moving all of that functionality to every data center; we’re calling it coreless purge.

Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

Here’s a general overview of how coreless purge will work:

  • A purge request will be initiated via the API or UI. This request will specify how we should identify the assets to be purged.
  • The request will be routed to the nearest Cloudflare data center, where it is identified as a purge request and passed to a Worker that will perform several of the key functions that currently occur in the core (such as authorization and filtering).
  • From there, the Worker will pass the purge request to a Durable Object in the data center. The Durable Object will queue all the requests and broadcast them to every data center when they are ready to be processed.
  • When the Durable Object broadcasts the purge request to every data center, another Worker will pass the request to the service in the data center that will invalidate the content in cache (executes the purge).
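
As a rough sketch of what this flow could look like in Workers code (the PURGE_QUEUE binding, the PurgeQueue class, and the isAuthorized and broadcastToDataCenters helpers are all hypothetical names for illustration, not our actual implementation):

// Hypothetical stubs for logic not shown here.
declare function isAuthorized(req: Request): Promise<boolean>;
declare function broadcastToDataCenters(purge: unknown): Promise<void>;

export default {
  async fetch(request: Request, env: any): Promise<Response> {
    // The Worker takes over the core's responsibilities:
    // authorization, filtering, etc.
    if (!(await isAuthorized(request))) {
      return new Response("unauthorized", { status: 403 });
    }

    // Hand the purge request to a Durable Object acting as a queue.
    const id = env.PURGE_QUEUE.idFromName("regional-queue");
    return env.PURGE_QUEUE.get(id).fetch(request);
  },
};

export class PurgeQueue {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const purge = await request.json();

    // Queue the request durably, then broadcast it to every data center.
    await this.state.storage.put(crypto.randomUUID(), purge);
    await broadcastToDataCenters(purge);

    return new Response("queued", { status: 202 });
  }
}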

We believe this re-architecture of our system, built by stringing together multiple services from the Workers platform, will help improve both the speed and scalability of the purge requests we will be able to handle.

Conclusion

We’re going to spend a lot of time building and optimizing purge because, if there’s one thing we learned here today, it’s that cache invalidation is a difficult problem. And those are exactly the types of problems that get us out of bed in the morning.

If you want to help us optimize our purge pipeline, we’re hiring.

Announcing the Cloudflare Images Sourcing Kit

Post Syndicated from Paulo Costa original https://blog.cloudflare.com/cloudflare-images-sourcing-kit/

Announcing the Cloudflare Images Sourcing Kit

Announcing the Cloudflare Images Sourcing Kit

When we announced Cloudflare Images to the world, we introduced a way to store images within the product and help customers move away from the egress fees incurred when using remote sources for their deliveries via Cloudflare.

To store the images in Cloudflare, customers can upload them via UI with a simple drag and drop, or via API for scenarios with a high number of objects for which scripting their way through the upload process makes more sense.

To create flexibility in how you import images, we’ve recently also included the ability to upload via URL, or to define custom names and paths for your images to allow a simple mapping between customer repositories and the objects in Cloudflare. It’s also possible to serve from a custom hostname: to create flexibility in how your end users see the path, to improve delivery performance by removing the need for additional TLS negotiations, or to improve your brand recognition through URL consistency.

Still, there was no simple way to tell our product: “Tens of millions of images are in this repository URL. Go and grab them all for me.”

In some scenarios, our customers have buckets with millions of images to upload to Cloudflare Images. Their goal is to migrate all objects to Cloudflare through a one-time process, allowing them to drop the external storage altogether.

In another common scenario, different departments in larger companies use independent systems configured with varying storage repositories, feeding them at different times and with uneven upload volumes. These teams would benefit from reusing source definitions to get all those new images into Cloudflare, ensuring the portfolio stays up to date while not paying egregious egress fees by serving the public directly from those multiple storage providers.

These situations required the upload process to Cloudflare Images to include logistical coordination and scripting knowledge. Until now.

Announcing the Cloudflare Images Sourcing Kit

Today, we are happy to share with you our Sourcing Kit, where you can define one or more sources containing the objects you want to migrate to Cloudflare Images.

But, what exactly is Sourcing? In industries like manufacturing, it implies a number of operations, from selecting suppliers, to vetting raw materials and delivering reports to the process owners.

So, we borrowed that definition and translated it into a Cloudflare Images set of capabilities allowing you to:

  1. Define one or multiple repositories of images to bulk import;
  2. Reuse those sources and import only new images;
  3. Make sure that only actual, usable images are imported, and not other objects or file types that exist in that source;
  4. Define the target path and filename for imported images;
  5. Obtain logs for the bulk operations.

The new kit does it all. So let’s go through it.

How the Cloudflare Images Sourcing Kit works

In the Cloudflare Dashboard, you will soon find the Sourcing Kit under Images.

In it, you will be able to create a new source definition, view existing ones, and view the status of the last operations.

Announcing the Cloudflare Images Sourcing Kit

Clicking on the create button will launch the wizard that will guide you through the first bulk import from your defined source:

Announcing the Cloudflare Images Sourcing Kit

First, you will need to input the Name of the Source and the URL for accessing it. You’ll be able to save the definitions and reuse the source whenever you wish.
After running the necessary validations, you’ll be able to define the rules for the import process.

The first option allows an optional Path Prefix. Defining a prefix gives the images uploaded from this particular source a unique identifier, differentiating them from images imported from other sources.

Announcing the Cloudflare Images Sourcing Kit

The naming rule in place already respects the source image’s name and path, so let’s assume there’s a puppy image to be retrieved at:

https://my-bucket.s3.us-west-2.amazonaws.com/folderA/puppy.png

When imported without any Path Prefix, you’ll find the image at:

https://imagedelivery.net/<AccountId>/folderA/puppy.png

Now, you might want to create an additional Path Prefix to identify the source, for example by noting that this bucket is from the Technical Writing department. In the puppy’s case, the result would be:

https://imagedelivery.net/<AccountId>/techwriting/folderA/puppy.png

Custom Path prefixes also provide a way to prevent name clashes coming from other sources.
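
Conceptually, the mapping from source object to delivery URL is simple concatenation. A tiny sketch (the function is ours, for illustration only):

// Derive the delivery URL for an imported image from its source path
// and an optional prefix.
function deliveryUrl(accountId: string, sourcePath: string, prefix?: string): string {
  const path = prefix ? `${prefix}/${sourcePath}` : sourcePath;
  return `https://imagedelivery.net/${accountId}/${path}`;
}

deliveryUrl("<AccountId>", "folderA/puppy.png", "techwriting");
// => "https://imagedelivery.net/<AccountId>/techwriting/folderA/puppy.png"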

Still, there will be times when customers don’t want to use them. And when reusing a source to import images, clashes can occur where two imports resolve to the same path and filename destination.

By default, we don’t overwrite existing images, but you can select that option to refresh the catalog you keep in the Cloudflare pipeline.

Announcing the Cloudflare Images Sourcing Kit

Once these inputs are defined, a click on the Create and start migration button at the bottom will trigger the upload process.

Announcing the Cloudflare Images Sourcing Kit

This action will show the final wizard screen, where the migration status is displayed. The progress log will report any errors obtained during the upload and is also available to download.

Announcing the Cloudflare Images Sourcing Kit
Announcing the Cloudflare Images Sourcing Kit

You can reuse, edit, or delete source definitions when no operations are running. And at any point, from the home page of the kit, you can check the status of an ongoing migration or return to the last migration report.

Announcing the Cloudflare Images Sourcing Kit

What’s next?

With the Beta version of the Cloudflare Images Sourcing Kit, we will allow you to define AWS S3 buckets as a source for the imports. In the following versions, we will enable definitions for other common repositories, such as the ones from Azure Storage Accounts or Google Cloud Storage.

And while we’re aiming for this to be a simple UI, we also plan to make everything available through CLI: from defining the repository URL to starting the upload process and retrieving a final report.

Apply for the Beta version

We will be releasing the Beta version of this kit in the following weeks, allowing you to source your images from third party repositories to Cloudflare.

If you want to be the first to use Sourcing Kit, request to join the waitlist on the Cloudflare Images dashboard.

Announcing the Cloudflare Images Sourcing Kit

Send email using Workers with MailChannels

Post Syndicated from Erwin van der Koogh original https://blog.cloudflare.com/sending-email-from-workers-with-mailchannels/

Send email using Workers with MailChannels

Send email using Workers with MailChannels

Here at Cloudflare we often talk about HTTP and related protocols as we work to help build a better Internet. However, the Simple Mail Transfer Protocol (SMTP) — used to send emails — is still a massive part of the Internet too.

Even though SMTP is turning 40 years old this year, most businesses still rely on email to validate user accounts, send notifications, announce new features, and more.

Sending an email is simple from a technical standpoint, but getting an email actually delivered to an inbox can be extremely tricky. Because of the enormous amount of spam that is sent every single day, all major email providers are very wary of things like new domains and IP addresses that start sending emails.

That is why we are delighted to announce a partnership with MailChannels. MailChannels has created an email sending service specifically for Cloudflare Workers that removes all the friction associated with sending emails. To use their service, you do not need to validate a domain or create a separate account. MailChannels filters spam before sending out an email, so you can feel safe putting user-submitted content in an email and be confident that it won’t ruin your domain reputation with email providers. But the absolute best part? Thanks to our friends at MailChannels, it is completely free to send email.

In the words of their CEO Ken Simpson: “Cloudflare Workers and Pages are changing the game when it comes to ease of use and removing friction to get started. So when we sat down to see what friction we could remove from sending out emails, it turns out that with our incredible anti-spam and anti-phishing, the answer is “everything”. We can’t wait to see what applications the community is going to build on top of this.”

The only constraint at the moment is that the integration works only when the request comes from a Cloudflare IP address. So it won’t work yet when you are developing on your local machine or running a test on your build server.

First, let’s walk through how to send your first email using a Worker.

export default {
  async fetch(request) {
    // Build the MailChannels API request.
    const send_request = new Request('https://api.mailchannels.net/tx/v1/send', {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        personalizations: [
          {
            to: [{ email: '[email protected]', name: 'Test Recipient' }],
          },
        ],
        from: {
          email: '[email protected]',
          name: 'Workers - MailChannels integration',
        },
        subject: 'Look! No servers',
        content: [
          {
            type: 'text/plain',
            value: 'And no email service accounts and all for free too!',
          },
        ],
      }),
    })

    // Send the request to MailChannels and relay its response.
    return fetch(send_request)
  },
}

That is all there is to it. You can modify the example to make it send whatever email you want.

The MailChannels integration makes it easy to send emails to and from anywhere with Workers. However, we also wanted to make it easier to send emails to yourself from a form on your website. This is perfect for quickly and painlessly setting up pages such as “Contact Us” forms, landing pages, and sales inquiries.

The MailChannels Pages Plugin, built with the Pages Plugins framework we announced earlier this week, lets visitors email you without you having to expose your email address.

The only thing you need to do is copy and paste the following code snippet into your /functions/_middleware.ts file. Then, every submission from a form with the data-static-form-name attribute will automatically be emailed to you.

import mailchannelsPlugin from "@cloudflare/pages-plugin-mailchannels";

export const onRequest = mailchannelsPlugin({
  personalizations: [
    {
      to: [{ name: "ACME Support", email: "[email protected]" }],
    },
  ],
  from: { name: "Enquiry", email: "[email protected]" },
  respondWith: () =>
    new Response(null, {
      status: 302,
      headers: { Location: "/thank-you" },
    }),
});

Here is an example of what such a form could look like. You can make the form as complex as you like; the only thing it needs is the data-static-form-name attribute. You can give the attribute any value you like, to distinguish between different forms.

<!DOCTYPE html>
<html>
  <body>
    <h1>Contact</h1>
    <form data-static-form-name="contact">
      <div>
        <label>Name<input type="text" name="name" /></label>
      </div>
      <div>
        <label>Email<input type="email" name="email" /></label>
      </div>
      <div>
        <label>Message<textarea name="message"></textarea></label>
      </div>
      <button type="submit">Send!</button>
    </form>
  </body>
</html>

So as you can see there is no barrier left when it comes to sending out emails. You can copy and paste the above Worker or Pages code into your projects and immediately start to send email for free.

If you have any questions about using MailChannels in your Workers, or want to learn more about Workers in general, please join our Cloudflare Developer Discord server.

Route to Workers, automate your email processing

Post Syndicated from Joao Sousa Botto original https://blog.cloudflare.com/announcing-route-to-workers/

Route to Workers, automate your email processing

Route to Workers, automate your email processing

Cloudflare Email Routing has quickly grown to a few hundred thousand users, and we’re incredibly excited with the number of feature requests that reach our product team every week. We hear you, we love the feedback, and we want to give you all that you’ve been asking for. What we don’t like is making you wait, or making you feel like your needs are too unique to be addressed.

That’s why we’re taking a different approach – we’re giving you the power tools that you need to implement any logic you can dream of to process your emails in the fastest, most scalable way possible.

Today we’re announcing Route to Workers, for which we’ll start a closed beta soon. You can join the waitlist today.

How this works

When using Route to Workers, your Email Routing rules can have a Worker process the messages reaching any of your custom email addresses.

Route to Workers, automate your email processing

Even if you haven’t used Cloudflare Workers before, we are making onboarding as easy as can be. You can start creating Workers straight from the Email Routing dashboard, with just one click.

Route to Workers, automate your email processing

After clicking Create, you will be able to choose a starter that allows you to get up and running with minimal effort. Starters are templates that pre-populate your Worker with the code you would write for popular use cases, such as creating a blocklist or allowlist, content-based filtering, tagging messages, or pinging you on Slack for urgent emails.

Route to Workers, automate your email processing

You can then use the code editor to make your new Worker process emails in exactly the way you want it to – the options are endless.

Route to Workers, automate your email processing

And for those of you who prefer to jump right into writing your own code, you can go straight to the editor without using a starter. You can write Workers in a language you likely already know: Cloudflare built Workers to execute JavaScript and WebAssembly, and has continuously added support for new languages.

The Workers you’ll use for processing emails are just regular Workers that listen to incoming events, implement some logic, and respond accordingly. You can use all the features that a normal Worker can.

The main difference is that instead of:

export default {
  async fetch(request, env, ctx) {
    handleRequest(request);
  }
}

You’ll have:

export default {
  async email(message, env, ctx) {
    handleEmail(message);
  }
}

The new `email` event will provide you with the “from” and “to” fields, the full headers, and the raw body of the message. You can then use them in any way that fits your use case, including calling other APIs and orchestrating complex decision workflows. In the end, you can decide what action to take, including rejecting the message or forwarding it to one of your Email Routing destination addresses.

With these capabilities, you can easily create logic that, for example, only accepts messages coming from one specific address and, when one matches the criteria, forwards the message to one or more of your verified destination addresses while also immediately alerting you on Slack. The code for such a feature could be as simple as this:

export default {
    async email(message, env, ctx) {
        switch (message.to) {
            case "[email protected]":
                await fetch("https://webhook.slack/notification", {
                    method: "POST", // fetch needs an explicit method to send a body
                    body: `Got a marketing email from ${ message.from }, subject: ${ message.headers.get("subject") }`,
                });
                // Illustrative helper that forwards the message to one or
                // more verified destination addresses.
                await sendEmail(message, [
                    "[email protected]",
                    "[email protected]",
                ]);
                break;

            default:
                message.reject("Unknown address");
        }
    },
};

Route to Workers enables everyone to programmatically process their emails and use them as triggers for any other action. We think this is pretty powerful.

Process up to 100,000 emails/day for free

The first 100,000 Worker requests (or Email Triggers) each day are free, and paid plans start at just $5 per 10 million requests. You will be able to keep track of your Email Workers usage right from the Email Routing dashboard.

Route to Workers, automate your email processing

Join the Waitlist

You can join the waitlist today by going to the Email section of your dashboard, navigating to the Email Workers tab, and clicking the Join Waitlist button.

Route to Workers, automate your email processing

We are expecting to start the closed beta in just a few weeks, and can’t wait to hear about what you’ll build with it!

As usual, if you have any questions or feedback about Email Routing, please come see us in the Cloudflare Community and the Cloudflare Discord.

Closed Caption support coming to Stream Live

Post Syndicated from Mickie Betz original https://blog.cloudflare.com/stream-live-captions/

Closed Caption support coming to Stream Live

Closed Caption support coming to Stream Live

Building inclusive technology is core to the Cloudflare mission. Cloudflare Stream has supported captions for on-demand videos for several years. Soon, Stream will auto-detect embedded captions and include them in the live stream delivered to your viewers.

Thousands of Cloudflare customers use the Stream product to build video functionality into their apps. With live caption support, Stream customers can better serve their users with a more comprehensive viewing experience.

Enabling Closed Captions in Stream Live

Stream Live scans for CEA-608 and CEA-708 captions in incoming live streams ingested via SRT and RTMPS. Assuming the live streams you are pushing to Cloudflare Stream contain captions, you don’t have to do anything further: the captions will simply be included in the manifest file.

Closed Caption support coming to Stream Live

If you are using the Stream Player, these captions will be rendered automatically. If you are using your own player, you simply have to configure your player to display captions.

Closed Caption support coming to Stream Live

Currently, Stream Live supports captions for a single language during the live event. However, once the event completes and the live event becomes an on-demand video, you can upload captions for multiple languages.

What is CEA-608 and CEA-708?

When captions were first introduced in 1973, they were open captions. This means the captions were literally overlaid on top of the picture in the video and therefore could not be turned off. In 1982, we saw the introduction of closed captions during live television. Captions were no longer imprinted on the video; they were instead passed via a separate feed and rendered on the video by the television set.

CEA-608 (also known as Line 21) and CEA-708 are well-established standards used to transmit captions. CEA-708 is a modern iteration of CEA-608, offering support for nearly every language as well as text positioning, something not supported by CEA-608.

Availability

Live caption support will be available in closed beta next month. To request access, sign up for the closed beta.

Including captions in any video stream is critical to making your content more accessible. For example, the majority of live events are watched on mute, which increases the value of captions. While Stream Live does not generate live captions yet, we plan to build support for automatic live captions in the future.

Stream with sub-second latency is like a magical HDMI cable to the cloud

Post Syndicated from J. Scott Miller original https://blog.cloudflare.com/magic-hdmi-cable/

Stream with sub-second latency is like a magical HDMI cable to the cloud

Stream with sub-second latency is like a magical HDMI cable to the cloud

Starting today, in open beta, Cloudflare Stream supports video playback with sub-second latency over SRT or RTMPS at scale. Just like HLS and DASH formats, playback over RTMPS and SRT costs $1 per 1,000 minutes delivered regardless of video encoding settings used.

Stream is like a magic HDMI cable to the cloud. You can easily connect a video stream and display it on as many screens as you want, wherever you want around the world.

What do we mean by sub-second?

Video latency is the time it takes from when a camera sees something happen live to when viewers of a broadcast see the same thing on their screens. Although we like to think that what’s on TV is happening in the studio and in your living room at the same time, this is not the case. Often, cable TV takes five seconds to reach your home.

On the Internet, latencies across different services vary widely, from multiple minutes down to a few seconds or less. Live streaming technologies like HLS and DASH, used by the most common video streaming websites, typically offer 10 to 30 seconds of latency, and this is what you can achieve with Stream Live today. However, this range does not feel natural for use cases where the viewers interact with the broadcasters. Imagine a text chat next to an esports live stream, or a Q&A session in a remote webinar. These new ways of interacting with a broadcast won’t work with the typical latencies the industry is used to. You need one to two seconds at most to achieve the feeling that the viewer is in the same room as the broadcaster.

We expect Cloudflare Stream to deliver sub-second latencies reliably in most parts of the world by routing the video as much as possible within the Cloudflare network. For example, when you’re sending video from San Francisco on your Comcast home connection, the video travels directly to the nearest point where Comcast and Cloudflare connect, for example, San Jose. Whenever a viewer joins, say from Austin, the viewer connects to the Cloudflare location in Dallas, which then establishes a connection over the Cloudflare backbone to San Jose. This setup avoids unreliable long-distance connections and allows Cloudflare to monitor the reliability and latency of the video all the way from the broadcaster’s last mile to the viewer’s last mile.

Serverless, dynamic topology

With Cloudflare Stream, the latency of content from source to destination depends purely on the physical distance between them: with no centralized routing, each Cloudflare location talks to the other Cloudflare locations and shares the video among them. This results in the minimum possible latency regardless of the locale you are broadcasting from.

We’ve tested about 500ms of glass-to-glass latency from San Francisco to London, both from and to residential networks. If both the broadcaster and the viewers were in California, this number would be lower, simply because the shorter distance means less speed-of-light delay. An early tester was able to achieve 300ms of latency by broadcasting using OBS via RTMPS to Cloudflare Stream and pulling down that content over SRT using ffplay.

Stream with sub-second latency is like a magical HDMI cable to the cloud

Any server in the Cloudflare Anycast network can receive and publish low-latency video, which means that you’re automatically broadcasting to the nearest server with no configuration necessary. To minimize latency and avoid network congestion, we route video traffic between broadcaster and audience servers using the same network telemetry as Argo.

On top of this, we construct a dynamic distribution topology, unique to the stream, which grows to meet the capacity needs of the broadcast. We’re just getting started with low-latency video, and we will continue to focus on latency and playback reliability as our real-time video features grow.

An HDMI cable to the cloud

Most video on the Internet is delivered over HTTP, the protocol your browser uses to load websites. This has many advantages, such as easily achieved interoperability across viewer devices. Maybe more importantly, HTTP can use existing infrastructure like caches, which reduce the cost of video delivery.

Using HTTP has a cost in latency, as it is not a protocol built to deliver video. There have been many attempts to deliver low-latency video over HTTP, some reducing latency to a few seconds, but none reach the levels achievable by protocols designed with video in mind. WebRTC and video delivery over QUIC have the potential to further reduce latency, but face inconsistent support across platforms today.

Video-oriented protocols, such as RTMPS and SRT, side-step some of the challenges above but often require custom client libraries and are not available in modern web browsers. While we now support low latency video today over RTMPS and SRT, we are actively exploring other delivery protocols.

There’s no silver bullet yet, and our goal is to make video delivery as easy as possible by supporting the set of protocols that enables our customers to meet their unique and creative needs. Today, that can mean receiving RTMPS and delivering low-latency SRT, or ingesting SRT while publishing HLS. In the future, that may include ingesting WebRTC, or publishing over QUIC, HTTP/3 or WebTransport. There are many interesting technologies on the horizon.

We’re excited to see new use cases emerge as low-latency video becomes easier to integrate and less costly to manage. A remote cycling instructor can ask her students to slow down in response to an increase in heart rate; an esports league can effortlessly repeat their live feed to remote broadcasters to provide timely, localized commentary while interacting with their audience.

Creative uses of low latency video

The viewer experience at events like a concert or a sporting match can be augmented with live video delivered in real time to participants’ phones. This way, they can experience the event in real time and see the goal scored, or the details of what’s happening on the stage.

Often in big cities, people cheering loudly across the city can be heard before you see the goal scored on your own screen. That gap can be eliminated when every video screen shows the same content at the same time.

Esports games, large company meetings, or conferences can have presenters or commentators react in real time to comments in the chat. The delay between a fan making a comment and seeing the reaction on the video stream can be eliminated.

Online exercise bikes can provide even more relevant and timely feedback from the live instructors, adding to the sense of community developed while riding them.

Viewers of esports streams can easily be switched from passive viewers to active live participants, as there is no delay in the broadcast.

Security cameras can be monitored from anywhere in the world without having to open ports or set up centralized servers to receive and relay video.

Getting Started

Get started by using your existing inputs on Cloudflare Stream. Without the need to reconnect, they will be available instantly for playback with the RTMPS/SRT playback URLs.

If you don’t have any inputs on Stream, sign up for $5/mo. You will get the ability to push live video, broadcast, record and now pull video with sub-second latency.

You will need to use a program like FFmpeg or OBS to push video. For playback, you can use VLC for RTMPS and ffplay for SRT. To integrate with your native app, you can utilize FFmpeg wrappers such as ffmpeg-kit for iOS.

RTMPS and SRT playback work with the recently launched custom domain support, so you can use the domain of your choice and keep your branding.

Bring your own ingest domain to Stream Live

Post Syndicated from Zaid Farooqui original https://blog.cloudflare.com/bring-your-own-ingest-domain-to-stream-live/

Bring your own ingest domain to Stream Live

Bring your own ingest domain to Stream Live

The last two years have given rise to hundreds of live streaming platforms. Most live streaming platforms enable their creators to go live by providing them with a server and an RTMP/SRT key that they can configure in their broadcasting app.

Until today, even if your live streaming platform was called live-yoga-classes.com, your users would need to push the RTMPS feed to live.cloudflare.com. Starting today, every Stream account can configure its own domain in the Stream dashboard. And your creators can broadcast to a domain such as push.live-yoga-classes.com.

This feature is available to all Stream accounts, including self-serve customers at no additional cost. Every Cloudflare account with a Stream subscription can add up to five ingest domains.

Secure CNAMEing for live video ingestion

Cloudflare Stream only supports encrypted video ingestion using RTMPS and SRT protocols. These are secure protocols and, similar to HTTPS, ensure encryption between the broadcaster and Cloudflare servers. Unlike non-secure protocols like RTMP, secure RTMP (or RTMPS) protects your users from monster-in-the-middle attacks.

In an insecure world, you could simply CNAME a domain to another domain, regardless of whether you own the domain you are sending traffic to. Because Stream Live intentionally does not support insecure live streams, you cannot simply CNAME your domain to live.cloudflare.com. So we leveraged other Cloudflare products, such as Spectrum, to natively support custom-branded domains in the Stream Live product without making live streams any less private for your broadcasters.

Configuring Custom Domain for Live Ingestion

To begin configuring your custom domain, add the domain to your Cloudflare account as a regular zone.

Bring your own ingest domain to Stream Live
Add a zone to your Cloudflare account

Next, CNAME the domain to live.cloudflare.com.
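
In zone-file notation, using the example platform from earlier in this post, that record would look something like this:

push.live-yoga-classes.com.  300  IN  CNAME  live.cloudflare.com.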

Bring your own ingest domain to Stream Live
CNAME the zone to live.cloudflare.com

Assuming you have a Stream subscription, visit the Inputs page and click on the Settings icon:

Bring your own ingest domain to Stream Live
Click on Settings icon on Live Inputs page

Next, add the domain you configured in the previous step as a Live Ingest Domain:

Bring your own ingest domain to Stream Live
Add Custom Ingest Domain

If your domain is successfully added, you will see a confirmation:

Bring your own ingest domain to Stream Live
Confirmation of domain being added as an ingest domain

Once you’ve added your ingest domain, test it by changing your existing configuration in your broadcasting software to your ingest domain. You can read the complete docs and limitations in the Stream Live developer docs.

What’s Next

Besides the branding upside (you don’t have to instruct your users to configure a domain such as live.cloudflare.com), custom domains help you avoid vendor lock-in and enable seamless migration. For example, if you have an existing live video pipeline that you are considering moving to Stream Live, this makes the migration one step easier, because you no longer have to ask your users to change any settings in their broadcasting app.

A natural next step is to support custom keys. Currently, your users must still use keys that are provided by Stream Live. Soon, you will be able to bring your own keys. Custom domains combined with custom keys will help you migrate to Stream Live with zero breaking changes for your end users.

Cloudflare Stream simplifies creator management for creator platforms

Post Syndicated from Ben Krebsbach original https://blog.cloudflare.com/stream-creator-management/

Cloudflare Stream simplifies creator management for creator platforms

Cloudflare Stream simplifies creator management for creator platforms

Creator platforms across the world use Cloudflare Stream to rapidly build video experiences into their apps. These platforms serve a diverse range of creators, enabling them to share their passion with their beloved audience. While working with creator platforms, we learned that many Stream customers track video usage on a per-creator basis in order to answer critical questions such as:

  • “Who are our fastest growing creators?”
  • “How much do we charge or pay creators each month?”
  • “What can we do more of in order to serve our creators?”

Introducing the Creator Property

Creator platforms enable artists, teachers and hobbyists to express themselves through various media, including video. We built Cloudflare Stream for these platforms, enabling them to rapidly build video use cases without needing to build and maintain a video pipeline at scale.

At its heart, every creator platform must manage ownership of user-generated content. When a video is uploaded to Stream, Stream returns a video ID. Platforms using Stream have traditionally had to maintain their own index to track content ownership. For example, when a user with internal user ID 83721759 uploads a video to Stream with video ID 06aadc28eb1897702d41b4841b85f322, the platform must maintain a database table of some sort to keep track of the fact that Stream video ID 06aadc28eb1897702d41b4841b85f322 belongs to internal user 83721759.

With the introduction of the creator property, platforms no longer need to maintain this index. Stream already has a direct creator upload feature to enable users to upload videos directly to Stream using tokenized URLs and without exposing account-wide auth information. You can now set the creator field with your user’s internal user ID at the time of requesting a tokenized upload URL:

curl -X POST "https://api.cloudflare.com/client/v4/accounts/023e105f4ecef8ad9ca31a8372d0c353/stream/direct_upload" \
     -H "X-Auth-Email: [email protected]" \
     -H "X-Auth-Key: c2547eb745079dac9320b638f5e225cf483cc5cfdda41" \
     -H "Content-Type: application/json" \
     --data '{"maxDurationSeconds":300,"expiry":"2021-01-02T02:20:00Z","creator": "<CREATOR_ID>", "thumbnailTimestampPct":0.529241,"allowedOrigins":["example.com"],"requireSignedURLs":true,"watermark":{"uid":"ea95132c15732412d22c1476fa83f27a"}}'

When the user uploads the video, the creator property will automatically be set to your internal user ID, and can be leveraged for operations such as pulling analytics data for your creators.

Query By Creator Property

Setting the creator property on your video uploads is just the beginning. You can now filter Stream Analytics via the Dashboard or the GraphQL API using the creator property.

Cloudflare Stream simplifies creator management for creator platforms
Filter Stream Analytics in the Dashboard using the Creator property

Previously, if you wanted to generate a monthly report of all your creators and the number of minutes of watch time recorded for their videos, you’d likely use a scripting language such as Python to do the following:

  1. Call the Stream GraphQL API requesting a list of videos and their watch time
  2. Traverse through the list of videos and query your internal index to determine which creator each video belongs to
  3. Sum up the video watch time for each creator to get a clean report showing you video consumption grouped by the video creator

The creator property eliminates this three step manual process. You can make a single API call to the GraphQL API to request a list of creators and the consumption of their videos for a given time period. Here is an example GraphQL API query that returns minutes delivered by creator:

query {
  viewer {
    accounts(filter: { accountTag: "<ACCOUNT_ID>" }) {
      streamMinutesViewedAdaptiveGroups(
        limit: 10
        orderBy: [sum_minutesViewed_DESC]
        filter: { date_gt: "2022-03-31", date_lt: "2022-05-01" }
      ) {
        sum {
          minutesViewed
        }
        dimensions {
          creator
        }
      }
    }
  }
}

Stream is focused on helping creator platforms innovate and scale. Matt Ober, CTO of NFT media management platform Piñata, says “By allowing us to upload and then query using creator IDs, large-scale analytics of Cloudflare Stream is about to get a lot easier.”

Getting Started

Read the docs to learn more about setting the creator property on new and previously uploaded videos. You can also set the creator property on live inputs, so the recorded videos generated from the live event will already have the creator field populated.

Being able to filter analytics is just the beginning. We can’t wait to enable more creator-level operations, so you can spend more time on what makes your idea unique and less time maintaining table stakes infrastructure.

Announcing Pub/Sub: Programmable MQTT-based Messaging

Post Syndicated from Matt Silverlock original https://blog.cloudflare.com/announcing-pubsub-programmable-mqtt-messaging/

Announcing Pub/Sub: Programmable MQTT-based Messaging

Announcing Pub/Sub: Programmable MQTT-based Messaging

One of the underlying questions that drives Platform Week is “how do we enable developers to build full stack applications on Cloudflare?”. With Workers as a serverless environment for easily deploying distributed-by-default applications, KV and Durable Objects for caching and coordination, and R2 as our zero-egress cost object store, we’ve continued to discuss what else we need to build to help developers both build new apps and/or bring existing ones over to Cloudflare’s Developer Platform.

With that in mind, we’re excited to announce the private beta of Cloudflare Pub/Sub, a programmable message bus built on the ubiquitous and industry-standard MQTT protocol supported by tens of millions of existing devices today.

In a nutshell, Pub/Sub allows you to:

  • Publish event, telemetry or sensor data from any MQTT capable client (and in the future, other client-facing protocols)
  • Write code that can filter, aggregate and/or modify messages as they’re published to the broker using Cloudflare Workers, and before they’re distributed to subscribers, without the need to ferry messages to a single “cloud region”
  • Push events from applications in other clouds, or from on-prem, with Pub/Sub acting as a programmable event router or a hook into persistent data storage (such as R2 or KV)
  • Move logic out of the client, where it can be hard (or risky!) to push updates, or where running code on devices raises the materials cost (CPU, memory), while still keeping latency as low as possible (your code runs in every location).

And there’s likely a long list of things we haven’t even predicted yet. We’ve seen developers build incredible things on top of Cloudflare Workers, and we’re excited to see what they build with the power of a programmable message bus like Pub/Sub, too.

Why, and what is, MQTT?

If you haven’t heard of MQTT before, you might be surprised to know that it’s one of the most pervasive “messaging protocols” deployed today. There are tens of millions (at least!) of devices that speak MQTT today, from connected payment terminals through to autonomous vehicles, cell phones, and even video games. Sensor readings, telemetry, financial transactions and/or mobile notifications & messages are all common use-cases for MQTT, and the flexibility of the protocol allows developers to make trade-offs around reliability, topic hierarchy and persistence specific to their use-case.

We chose MQTT as the foundation for Cloudflare Pub/Sub as we believe in building on top of open, accessible standards, as we did when we chose the Service Worker API as the foundation for Workers, and with our recently announced participation in the Winter Community Group around server-side runtime APIs. We also wanted to enable existing clients an easy path to benefit from Cloudflare’s scale and programmability, and ensure that developers have a rich ecosystem of client libraries in languages they’re familiar with today.

Beyond that, however, we also think MQTT meets the needs of a modern “publish-subscribe” messaging service. It has flexible delivery guarantees, TLS for transport encryption (no bespoke crypto!), a scalable topic creation and subscription model, extensible per-message metadata, and importantly, it provides a well-defined specification with clear error messages.

With that in mind, we expect to support many more “on-ramps” to Pub/Sub: a lot of the best parts of MQTT can be abstracted away from clients who might want to talk to us over HTTP or WebSockets.
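
To make that concrete, here is a sketch of what publishing might look like from an existing client using the open-source mqtt.js library; the broker hostname, credentials, and topic below are placeholders for illustration, not a real endpoint:

import mqtt from "mqtt"; // widely used open-source MQTT client

// Placeholder endpoint and credentials.
const client = mqtt.connect("mqtts://example-broker.example.cloudflarepubsub.com:8883", {
  username: "payment-terminal-42",
  password: "<AUTH_TOKEN>",
});

client.on("connect", () => {
  // Publish a transaction counter; the Worker example in the next section
  // consumes messages shaped like this one.
  client.publish(
    "/transactions/store-7",
    JSON.stringify({ region: "wnam", merchantId: "m-1001", transactionsProcessed: 12 })
  );
  client.end(); // cleanly disconnect after publishing
});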

Building Blocks

Given the ability to write code that acts on every message published to a Pub/Sub Broker, what does it look like in practice?

Here’s a simple-but-illustrative example of handling Pub/Sub messages directly in a Worker. We have clients (in this case, payment terminals) reporting back transaction data, and we want to capture the number of transactions processed in each region, so we can track transaction volumes over time.

Specifically, we:

  1. Filter on a specific topic prefix for messages we care about
  2. Parse the message for a specific key:value pair as a metric
  3. Write that metric directly into Workers Analytics Engine, our new serverless time-series analytics service, so we can directly query it with GraphQL.

This saves us having to stand up and maintain an external metrics service, configure another cloud service, or think about how it will scale: we can do it all directly on Cloudflare.

// TypeScript

async function pubsub(
  messages: Array<PubSubMessage>,
  env: any,
  ctx: ExecutionContext
): Promise<Array<PubSubMessage>> {
  
  for (let msg of messages) {
    // Extract a value from the payload and write it to Analytics Engine
    // In this example, a transactionsProcessed counter that our clients are sending
    // back to us.
    if (msg.topic.startsWith("/transactions/")) {
      // This is non-blocking, and doesn’t hold up our message
      // processing.
      env.TELEMETRY.writeDataPoint({
        // We label this metric so that we can query against these labels
        labels: [`${msg.broker}.${msg.namespace}`, msg.payload.region, msg.payload.merchantId],
        metrics: [msg.payload.transactionsProcessed ?? 0]
      });
    }
  }

  // Return our messages back to the Broker
  return messages;
}

const worker = {
  async fetch(req: Request, env: any, ctx: ExecutionContext) {
    // Critical: you must validate the incoming request is from your Broker
    // In the future, Workers will be able to do this on your behalf for Workers
    // in the same account as your Pub/Sub Broker.
    if (await isValidBrokerRequest(req)) {

      // Parse the incoming PubSub messages
      let incomingMessages: Array<PubSubMessage> = await req.json();
      
      // Pass the message to our pubsub handler, and capture the returned
      // messages
      let outgoingMessages = await pubsub(incomingMessages, env, ctx);

      // Re-serialize the messages and return a HTTP 200 so our Broker
      // knows we’ve successfully handled them
      return new Response(JSON.stringify(outgoingMessages), { status: 200 });
    }

    return new Response("not a valid Broker request", { status: 403 });
  },
};

export default worker;

We can then query these metrics directly using a familiar language: SQL. Our query takes the metrics we’ve written and gives us a breakdown of transactions processed by our payment devices, grouped by merchant (and again, all on Cloudflare):

SELECT
  label_2 as region,
  label_3 as merchantId,
  sum(metric_1) as total_transactions
FROM TELEMETRY
WHERE
  metric_1 > 0
  AND timestamp >= now() - 604800
GROUP BY
  region,
  merchantId
ORDER BY
  total_transactions DESC
LIMIT 10

You could replace or augment the calls to Analytics Engine with any number of examples:

  • Asynchronously writing messages (using ctx.waitUntil) on specific topics to our R2 object storage without blocking message delivery
  • Rewriting messages on-the-fly with data populated from KV, before the message is pushed to subscribers
  • Aggregate messages based on their payload and HTTP POST them to legacy infrastructure hosted outside of Cloudflare

Pub/Sub gives you a way to get data into Cloudflare’s network, filter, aggregate and/or mutate it, and push it back out to subscribers — whether there’s 10, 1,000 or 10,000 of them listening on that topic.

Where are we headed?

As we often like to say: we’re just getting started. The private beta for Pub/Sub is just the beginning of our journey, and we have a long list of capabilities we’re already working on.

Critically, one of our priorities is to cover as much of the MQTT v5.0 specification as we can, so that customers can migrate existing deployments and have it “just work”. Useful capabilities like shared subscriptions that allow you to load-balance messages across many subscribers; wildcard subscriptions (both single- and multi-tier) for aggregation use cases, stronger delivery guarantees (QoS), and support for additional authentication modes (specifically, Mutual TLS) are just a few of the things we’re working on.

Beyond that, we’re focused on making sure Pub/Sub’s developer experience is the best it can be, and during the beta we’ll be:

  • Supporting a new set of “pubsub” sub-commands in Wrangler, our developer CLI, so that getting started is as low-friction as possible
  • Building ‘native’ bindings (similar to how Workers KV operates) that allow you to publish messages and subscribe to topics directly from Worker code, regardless of whether the message originates from (or is destined for) a client beyond Cloudflare
  • Exploring more ways to publish & subscribe from non-MQTT based clients, including HTTP requests and WebSockets, so that integrating existing code is even easier.

Our developer documentation will cover these capabilities as we land them.

We’re also aware that pricing is a huge part of developer experience, and are committed to ensuring that there is an accessible and flexible free tier. We want to enable developers to experiment, prototype and solve problems we haven’t thought of yet. We’ll be sharing more on pricing during the course of the beta.

Getting Started

If you want to start using Pub/Sub, sign up for the private beta: we plan to start enabling access within the next month. We’re looking forward to collecting feedback from developers and seeing what folks start to build.

In the meantime, review the brand-new Pub/Sub developer documentation to understand how Pub/Sub works under the hood, the MQTT protocol, and how it integrates with Cloudflare Workers.

How we built config staging and versioning with HTTP applications

Post Syndicated from Garrett Galow original https://blog.cloudflare.com/version-and-stage-configuration-changes-with-http-applications/

How we built config staging and versioning with HTTP applications

How we built config staging and versioning with HTTP applications

Last December, we announced a closed beta of a new product, HTTP Applications, giving customers the ability to better control their L7 Cloudflare configuration with versioning and staging capabilities. Today, we are expanding this beta to all enterprise customers who want to participate. In this post, I will talk about some of the improvements that have landed and go into more detail about how this product works.

HTTP Applications

A quick recap of what HTTP Applications are and what they can do. For a deeper dive on how to use them see the previous blog post.

As previously mentioned: HTTP Applications are a way to manage configuration by use case, rather than by hostname. Each HTTP Application has a purpose, whether that is handling the configuration of your marketing website or an internal application. Each HTTP Application consists of a set of versions where each represents a snapshot of settings for managing traffic — Page Rules, Firewall Rules, cache settings, etc. Each version of configuration inside the HTTP Application is independent of the others, and when a new version is created, it is initialized as a copy of the version that preceded it.

An HTTP Application can be represented with the following diagram:

How we built config staging and versioning with HTTP applications

Each HTTP Application is sourced from an existing zone. That zone’s current configuration is copied to instantiate the first version of the HTTP Application. After that, changes made to the zone and changes made to version 1 are independent of each other. Versions themselves don’t affect any traffic for a zone until they are deployed via Routing Rules.

Routing Rules

Unlike zones, each version of an HTTP Application is independent of any specific hostname. So if versions are not tied to a hostname, like zones, then how do you decide which version of an HTTP Application will affect a specific set of traffic? The answer is Routing Rules. With Routing Rules, you get to decide which version of an HTTP Application is applied to traffic. Routing Rules are powered by Cloudflare’s Ruleset Engine and rely on the use of conditional “if, then” rules to map hostnames controlled in your Cloudflare account to a version of configuration. As an example, a rule could be:

If zone.name = `example.com`
Then use configuration of HTTP Application id: xyz, version 2

When this rule executes on our global network, instead of applying the regular zone configuration of example.com, we will instead use the configuration defined in version 2 of the HTTP Application.

Expanding the previous diagram we get the following:

How we built config staging and versioning with HTTP applications

The combination of Routing Rules and HTTP Applications means you can ‘stage’ a set of changes, via a version, without requiring a separate staging zone as has been required in the past. Cloudflare will provide you with specific IPs that can be used to test the configuration before rolling it out to production. This means you can catch misconfigurations in rules or other settings before it impacts your customers.

How HTTP Applications and Routing Rules work

Let’s break down how this all works behind the scenes to give you a safe way to test changes to your configuration. In all of Cloudflare’s data centers around the world, every request is first inspected and associated with an account/config pair so that we know which configuration settings to apply to it. If you have the zone ‘example.com’, with an id of 123, in your account, with an id of 777, then when a request for example.com/cat.jpg arrives at the Cloudflare network, the ownership lookup will return a value like 777:123, which denotes the account and config settings we should use to process that request.

When HTTP Applications and Routing Rules are being used, the ownership lookup occurs as normal, but instead of loading configuration based on the zone for the account:config pair, Cloudflare does one additional lookup to see if any routing rules are in place that would change which configuration should be used. If a rule like the one before exists:

If zone.name = `example.com`
Then use configuration of HTTP Application id: xyz, version 2

Then when ownership is evaluated, instead of loading configuration for account:config 777:123, Cloudflare will load the configuration of that HTTP Application version. Let’s say version 2 from the rule has a config id of 456; the lookup value for loading configuration will then be 777:456.
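
In pseudocode, that two-step lookup reads something like the sketch below. This is a simplified illustration of the behavior described above; the function names and data shapes are ours, not Cloudflare’s actual internals:

// Stub: the normal ownership lookup, hostname -> account:config pair.
function ownershipLookup(hostname) {
  return { accountId: 777, configId: 123 }; // e.g. example.com -> 777:123
}

// Stub: is a routing rule in place that points this zone at an
// HTTP Application version with its own config id?
function routingRuleLookup(hostname) {
  return hostname === "example.com" ? { versionConfigId: 456 } : null;
}

function resolveConfig(hostname) {
  const { accountId, configId } = ownershipLookup(hostname);
  const rule = routingRuleLookup(hostname);
  // With a rule present, 777:123 becomes 777:456.
  return `${accountId}:${rule ? rule.versionConfigId : configId}`;
}

console.log(resolveConfig("example.com")); // "777:456"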

How we built config staging and versioning with HTTP applications

Because Routing Rules are implemented with the Ruleset Engine, we can implement a special type of rule that allows a version to be staged, such that it is executed only for requests sent to IPs reserved for testing. The resulting diagram is almost the same, but because the request is being sent to staging IPs, Cloudflare’s network will route that request to a different version of the HTTP Application that has a set of changes not yet deployed for all other traffic.

How we built config staging and versioning with HTTP applications

This is what enables you to safely test a set of changes and then deploy the exact same configuration to all traffic. If anything goes wrong when testing in staging or rolling out to production, you can simply roll back to the previous version that was deployed. No need to try to hunt down which settings may have changed; that investigation can be done after the issue has been resolved through a quick, one-click rollback.

Now available for Enterprise Customers

HTTP Applications and Routing Rules put power and safety in customers’ hands so that configuration changes can be made more easily. When issues do arise, they can be resolved quickly through rollbacks. We will continue expanding the capabilities offered throughout the year, but if you are interested in trying it out now and are an enterprise customer, talk to your account manager to get access!

Introducing Custom Domains for Workers

Post Syndicated from Kabir Sikand original https://blog.cloudflare.com/custom-domains-for-workers/

Introducing Custom Domains for Workers

Introducing Custom Domains for Workers

Today, we’re happy to announce Custom Domains for Workers. Custom Domains allow you to hook up a domain to your Worker, without having to fuss about certificates, origin servers or DNS – it just works. Let’s take a look at how we built Custom Domains and how you can use them.

The magic of Cloudflare DNS

Under the hood, we’re leveraging Cloudflare DNS to register your Worker as the origin for your domain. All you need to do is head to your Worker, go to the Triggers tab, and click Add Custom Domain. Cloudflare will handle creating the DNS record and issuing a certificate on your behalf. In seconds, your domain will point to your Worker, and all you need to worry about is writing your code. We’ll also help guide you through the process of creating these new records and replacing any existing ones. We built this with a straightforward ethos in mind: we should be clear and transparent about the actions we’re taking, and make them easy to understand.

We’ve made a few welcome changes to how things work when you’re using a Custom Domain on your Worker. First off, when you send a request to any path on that Custom Domain, your Worker will be triggered. No need to create a route with /* at the end.

Introducing Custom Domains for Workers

Second, your Custom Domain Worker is considered the ‘origin server’. That means there’s no need to `fetch(event.request)` once you’re in that Worker; instead, talk to any internal or external services you need by creating request objects in your code, or talk to other Workers services using any of our available bindings. We’ve also increased the limit on external requests to 1,000 when using our Unbound usage model. You can talk to any services you’d like – things like payment, communication, analytics, or tracking services come to mind, not to mention your databases. If that’s not enough for your use case, feel free to reach out via Discord or support, and we’ll happily chat.

Introducing Custom Domains for Workers

Finally, what if you need to talk to your Worker from another one? Since these Workers act as origin servers, you can just send a normal request to the associated endpoint, and we’ll invoke that Worker – even if it’s on the same Cloudflare zone.
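
To sketch both points, here is a Worker behind a Custom Domain acting as its own origin: it builds fresh outbound requests instead of forwarding event.request, and any other Worker can invoke it with an ordinary fetch to its hostname. The URLs below are placeholders:

export default {
  async fetch(request, env) {
    // Talk to an external service directly; this URL is a placeholder.
    const resp = await fetch("https://api.payments.example/charge", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ amount: 1000 }),
    });
    return new Response(`payment service answered ${resp.status}`);
  }
}

// From any other Worker, invoking this one is just a normal request
// to its Custom Domain, even within the same Cloudflare zone:
//   const result = await fetch("https://billing.example.com/charge", { method: "POST" });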

Introducing Custom Domains for Workers

Let’s build an example application

We’ll start by writing our application. I have an example that I’ve called api-gateway below. It checks the incoming request path, and delegates work to downstream Workers using Service Bindings. For any privileged endpoints, it performs an authorization check before executing that code:

export default {
  async fetch(request, environment) {
    const url = new URL(request.url);
    switch (url.pathname) {
      case '/login':
        return await environment.login.fetch(request);

      case '/logout':
        return await environment.logout.fetch(request);

      case '/admin': {
        // Check the "Authorization" header before serving the privileged endpoint.
        const authCheck = await environment.auth.fetch(request.clone());
        if (authCheck.status != 200) { return authCheck; }
        // If the auth check passes, send the request to the /admin endpoint
        return await environment.admin.fetch(request);
      }

      case '/robots.txt':
        return new Response(null, { status: 204 });
    }
    return new Response('Not Found.', { status: 404 });
  }
}

Now that I have a working application, I want to serve it on my Custom Domain. To hook this up, head over to Workers, then Triggers, click ‘Add Custom Domain’, and type in your desired hostname. You’ll be guided through a simple workflow to generate the new DNS record, with your Worker as its target.

Best of all, with Custom Domains you can reap the performance benefits of DNS-routable Workers; Cloudflare never has to look through a routing table to invoke your Worker. And, by leveraging Service Bindings, you can customize your routing to your heart’s content – using URL parameters, headers, request content, or query strings to appropriately invoke the right Worker at the right time.

We’re excited to see what you build with Custom Domains. Custom Domains are available in an Open Beta starting today. Support is built right into the Cloudflare Dashboard and APIs, and CLI support via Wrangler is coming soon.

Introducing Pages Plugins

Post Syndicated from Nevi Shah original https://blog.cloudflare.com/cloudflare-pages-plugins/

Introducing Pages Plugins

Introducing Pages Plugins

Last November, we announced that Pages is now a full-stack development platform with our open beta integration with Cloudflare Workers. Using file-based routing, you can drop your Pages Functions into a /functions folder and deploy them alongside your static assets to add dynamic functionality to your site. However, throughout this beta period, we observed the types of projects users have been building, noticed some common patterns, and identified ways to make these users more efficient.

There are certain functionalities that are shared between projects; for example, validating authorization headers, creating an API server, reporting errors, and integrating with third-party vendors to track aspects like performance. The frequent need for these patterns across projects made us wonder, “What if we could provide the ready-made code for developers to add to their existing project?”

Introducing Pages Plugins!

What’s a Pages Plugin?

With Pages Functions, we introduced file-based routing, so users could avoid writing their own routing logic, significantly reducing the amount of boilerplate code a typical application requires. Pages Plugins aims to offer a similar experience!

A Pages Plugin is a reusable – and customizable – chunk of runtime code that can be incorporated anywhere within your Pages application. A Plugin is effectively a composable Pages Function, granting Plugins the full power of Functions (and therefore, Workers), including the ability to set up middleware, parameterized routes, and static assets.
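
Conceptually, a Plugin is a function that accepts options and returns a Pages Function, which is what lets you mount it as middleware or on a parameterized route. A rough sketch of that shape, with an illustrative logger (the names and options here are ours, not a published interface):

// A Plugin takes configuration and returns a Pages Function, so consumers
// can mount it like any Function (e.g. as middleware). Illustrative only.
export default function loggerPlugin(options) {
  return async (context) => {
    // Report the request to the configured endpoint, then hand off
    // to the rest of the application.
    await fetch(options.endpoint, {
      method: "POST",
      body: JSON.stringify({ url: context.request.url }),
    });
    return context.next();
  };
}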

How does it work?

Today, Pages Plugins is launching with a few ready-made solutions for Sentry, Honeycomb, and Stytch (more below), but it’s important to note that developers anywhere can create and share their own Pages Plugins, too! You just need to install a Plugin, mount it to a route within the /functions directory, and configure it according to its needs.

Let’s take a look at a Plugins example for a hypothetical ACME logger:

Assume you find an @acme/pages-plugin-logger package on npm and want to use it within your application – you’d install, import, and invoke it as you would any other npm module. After passing in the required (hypothetical) configuration and mounting it as the top-level middleware’s onRequest export, the ACME logger will report on all incoming requests:

// file: /functions/_middleware.ts

import MyLogger from "@acme/pages-plugin-logger";

// Setup logging for all URL routes & methods
export const onRequest = MyLogger({
 endpoint: "https://logs.acme.com/new",
 secret: "password",
});

You can help grow the Plugins ecosystem by building and sharing your Plugins on npm and our developer documentation, and you can immediately get started by using one of Cloudflare’s official launch partner Plugins today.

Introducing our Plugins launch partners

With Pages, we’re always working to see how we can best cater to user workflows by integrating directly with users’ preferred tools. We see Plugins as an excellent opportunity to collaborate with popular third-party observability, monitoring, and authentication providers to provide their own Pages Plugins.

Today, we’re proud to launch our Pages Plugins with Sentry, Honeycomb and Stytch as official partners!

Introducing Pages Plugins

Sentry

Sentry provides developer-first application monitoring for real-time insights into your production deployments. With Sentry you can see the errors, crashes, or latencies experienced while using your app and get the deep context needed to solve issues quickly, like the line of code where the error occurred, the developer or commit that introduced the error, or the API call or database query causing the slowdown. The Sentry Plugin automatically captures any exceptions in your Pages Functions and sends them to Sentry where you can aggregate, analyze, and triage any issues your application encounters.

// ./functions/_middleware.ts

import sentryPlugin from "@cloudflare/pages-plugin-sentry";

export const onRequest = sentryPlugin({
 dsn: "YOUR_SENTRY_DSN",
});

Honeycomb

Honeycomb is another observability and monitoring platform, designed to visualize, analyze, and improve application quality and performance so you can find patterns and outliers in your application data. The Honeycomb Plugin creates traces for every request that your Pages application receives and automatically sends that information to Honeycomb for analysis.

// ./functions/_middleware.ts

import honeycombPlugin from "@cloudflare/pages-plugin-honeycomb";

export const onRequest = honeycombPlugin({
 apiKey: "YOUR_HONEYCOMB_API_KEY",
 dataset: "YOUR_HONEYCOMB_DATASET_NAME",
});

Stytch

Observability is just one use case of how Pages Plugins can help you build a more powerful app. Stytch is an API-first platform that improves security and promotes a better user experience with passwordless authentication. Our Stytch Plugin transparently validates user sessions, allowing you to easily protect parts of your application behind a Stytch login.

// ./functions/_middleware.ts

import stytchPlugin from "@cloudflare/pages-plugin-stytch";
import { envs } from "@cloudflare/pages-plugin-stytch/api";

export const onRequest = stytchPlugin({
  project_id: "YOUR_STYTCH_PROJECT_ID",
  secret: "YOUR_STYTCH_PROJECT_SECRET",
  env: envs.live
});

More Plugins, more fun!

As a developer platform, we know it’s crucial to build relationships with the creators of the tooling and frameworks you use alongside Pages, and we look forward to growing our partnership ecosystem even more in the future. However, beyond partnerships, we realized there were some more extremely useful Plugins we could build ourselves to get you started!

  • Google Chat: creates a Google Chat bot which can respond to messages. It also includes an API for interacting with Google Chat (for example, for creating messages) without the need for user input. This API is useful for situations such as alerts.
  • Cloudflare Access: a middleware to validate Cloudflare Access JWT assertions (see the usage sketch after this list). It also includes an API to look up additional information about a given user’s JSON Web Token.
  • Static forms: intercepts static HTML form submissions and can perform actions such as storing the data in KV.
  • GraphQL: creates a GraphQL API for a given schema. It also ships with the GraphQL Playground to simplify development and help you test out your API.
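
As an example, mounting the Cloudflare Access Plugin mentioned above would follow the same pattern as the partner Plugins. Treat the option names below as assumptions, and check the Plugin’s documentation for the exact interface:

// ./functions/_middleware.ts
// Assumed usage following the same mounting pattern as the Plugins above;
// the option names are illustrative.
import cloudflareAccessPlugin from "@cloudflare/pages-plugin-cloudflare-access";

export const onRequest = cloudflareAccessPlugin({
  domain: "https://your-team.cloudflareaccess.com", // your Access team domain
  aud: "YOUR_ACCESS_APPLICATION_AUD", // the Access application's AUD tag
});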

Over the next couple of months we will be working to build out some of the most requested Plugins relevant to your projects. For now, you can find all officially supported Plugins in our developer documentation.

No time to wait? Author your own!

But don’t let us be your bottleneck! The beauty of Plugins is how easy they are to create and distribute. In fact, we encourage you to follow our documentation to create and share your own Plugin; chances are, if you’re building a Plugin for your own project, someone else would benefit greatly from it too!

We’re excited to see Plugins from the community solving their own common use-cases or as integrations with their favorite platforms. Once you’ve built a Plugin, you can surface your work if you choose by creating a PR against the Community Plugins page in our documentation. This way any Pages user can read about your Plugin and mount it in their own Pages application.

What’s next for Pages Functions

As you try out Plugins and take advantage of our Functions offering, it’s important to note there are some truly exciting updates coming soon. As we march toward the Functions general availability launch, we will provide proper analytics and logging, so you can have better insight into your site’s performance and visibility into issues for debugging. Additionally, with R2 now in open beta, and D1 in the works, we’re excited to provide support for R2 and D1 bindings for your full stack Pages projects soon!

Of course, because Functions is still in open beta, we currently offer 100k requests per day for free. However, as we aim for general availability of Functions, you can expect a billing model similar to the Workers Paid billing model.

Share what you build

While you begin building out your Plugin, be sure to reach out to us in our Cloudflare Developers Discord server in the #pages-plugins channel. We’d love to see what you’re building and help you along the way!

Our growing Developer Platform partner ecosystem

Post Syndicated from Dawn Parzych original https://blog.cloudflare.com/developer-platform-ecosystem/

Our growing Developer Platform partner ecosystem

Our growing Developer Platform partner ecosystem

Deploying an application to the cloud requires a robust network, ample storage, and compute power. But that only covers the architecture; it doesn’t include all the other tools and services necessary to build, deploy, and support your applications. It’s easy to get bogged down in researching whether everything you want to use works together, and that takes time away from actually developing and building.

Cloudflare is focused on building a modern cloud with a developer-first experience. That developer experience includes creating an ecosystem filled with the tools and services developers need.

Deciding to build an application starts with the planning and design elements. What programming languages, frameworks, or runtimes will be used? The answer to that question can influence the architecture and hosting decisions. Choosing a runtime may lock you into using specific vendors. Our goal is to provide flexibility and eliminate the friction involved when wanting to migrate on to (or off) Cloudflare.

You can’t forget the tools necessary for configuration and interoperability. This can include areas such as authentication and infrastructure management. You have multiple pieces of your application that need to talk with one another to ensure a secure and seamless experience. We support and build standards for compatibility and to simplify the configuration and interoperability when deploying your applications.

During the build process, and after your application has been deployed, it is essential to test and verify that things are working properly. This is where logging, analytics, and alerting come in. Being notified quickly when something isn’t working is part of the equation. The other part is reducing the context switching among various debugging tools and applications so you can resolve the issue quickly.

And finally, improving your application through iterating and extending functionality. This can come from adding plug-ins from third-party providers or running A/B tests to determine whether a feature is moving in the right direction.

At Cloudflare, our mission is to help build a better Internet, and all of these elements are necessary to achieve it. Our focus is on providing the network, storage, and compute power to deliver apps. We look to partners to help in the other areas and provide an ecosystem. Our intention with this strategy is to get out of the way, so developers can build faster and safer.

To achieve our mission, we look for partners with similar philosophies. We’ve previously announced integrations in several of these areas.

Everything we do is built on the foundation of the open Internet and open standards. We want to give developers the choice to pick from a variety of tools, and may the best solution win. We believe developers should have the freedom and flexibility to choose the best tool for their work, not just the best tool within a locked-in platform.

We understand that partnering is key to achieving the above goal. Today, we’re excited to announce a growing ecosystem of partners who are tapping into our world’s best network. Our developer platform products and partner products fit seamlessly into existing stacks and improve developer productivity through simple development and deploy workflows.

Our growing Developer Platform partner ecosystem

Source, build and deploy partners:

Along with having the best network in the world, we also have really good compute and database products. However, for developers to plan, build, and continuously iterate, they need a host of other services that work together seamlessly to deliver a great end-to-end building experience. This is why we partnered with DevOps platforms and CMSs to make the entire development lifecycle better.

On the DevOps front, developers can create new Pages projects by connecting their repos stored on GitLab or GitHub and make site changes there via the usual git commands.

Our growing Developer Platform partner ecosystem

“Developers can be more productive when they create, test, secure, and deploy software from a single DevOps application instead of bouncing between multiple different tools. Cloudflare Pages’ integration with GitLab makes it easier for joint users to develop and deploy new code to Cloudflare’s network using the same syntax and git commands they’re already comfortable using.”
— Michael LeBeau, Alliance Manager at GitLab

Also, on the CMS front, we have built deploy hooks, which let developers connect their favorite headless CMSs and trigger deployments in Cloudflare Pages whenever content is updated.
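
Under the hood, a deploy hook is just an HTTP endpoint: the CMS (or any script) sends a POST request to the hook’s URL and Pages queues a new build. A minimal sketch, assuming you have created a hook for your project and substituting its real URL for the placeholder below:

// Trigger a Pages build by POSTing to a deploy hook URL.
// The URL is a placeholder; copy the real one from your Pages project settings.
const hookUrl = "https://api.cloudflare.com/client/v4/pages/webhooks/deploy_hooks/<HOOK_ID>";

const response = await fetch(hookUrl, { method: "POST" });
console.log(response.ok ? "build queued" : `failed with status ${response.status}`);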

Our growing Developer Platform partner ecosystem

“With this integration, our customers can work more efficiently and cross-functionally with their teams. Marketing teams can update Contentful directly to automatically trigger deployments without relying on their developers to update any of their code base, which results in a better, more productive experience for all teams involved.”
— Jeff Blattel, Director of Technical Partnerships at Contentful

“We’re delighted to partner with Cloudflare and excited by this new release from Cloudflare Pages. At Sanity, we care deeply about people working with content on our platform. Cloudflare’s new deploy hooks allow developers to automate builds for static sites based on content changes, which is a huge improvement for content creators. Combining these with structured content and our GROQ-powered Webhooks, our customers can be strategic about when these builds should happen. We’re stoked to be part of this release and can’t wait to see what the community will build with Sanity and Cloudflare!”
— Even Westvang, Co-founder, Sanity.io

“At Strapi, we’re excited about this partnership with Cloudflare because it enables developers to abstract away the complexity of deploying changes to production for content teams. By using the Deploy Hook for Strapi, everyone can push new content and updates quickly and autonomously.”
— Pierre Burgy, CEO, Strapi.io

Database partners:

We’ve consistently heard a common theme: developers want to build more of their applications on Workers. In addition to having our own first-party solutions such as Workers KV, Durable Objects, R2, and D1, we have also partnered with best-of-breed data companies to bring their capabilities to the Workers developer platform and open up a wide variety of use cases on Workers.

Our growing Developer Platform partner ecosystem

“With Cloudflare and Macrometa, developers can now build and deliver powerful and compelling data-driven experiences in a way that the centralized clouds will never be able to. The world deserves a better cloud – an edge cloud.”
— Chetan Venkatesh, Co-founder/CEO, Macrometa

“The integration of Cloudflare workers with Fauna allows our joint customers to simply and cost effectively bring both data and logic to the edge. Fauna was developed from the ground up as a globally distributed, serverless database and when combined with Cloudflare Workers serverless compute fabric provides developers with a simple, yet powerful abstraction over the complexities of distributed infrastructure.”
— Evan Weaver, Co-founder/CTO, Fauna

Debugging, analytics, troubleshooting partners:

Troubleshooting is key in the developer workflow. True observability means having an end-to-end view of an entire system, inclusive of third-party integrations and deployments. By combining real-time data with data from other logging and monitoring endpoints, customers can get robust, instant feedback on how their websites and applications are performing, all in one place.

Our growing Developer Platform partner ecosystem

“Teams using Cloudflare Workers with Splunk Observability get full-stack visibility and contextualized insights from metrics, traces and logs across all of their infrastructure, applications and user transactions in real-time and at any scale. With Splunk Observability, IT and DevOps teams now have a seamless and analytics-powered workflow across monitoring, troubleshooting, incident response and optimization. We’re excited to partner with Cloudflare to help developers and operations teams slice through the complexity of modern applications and ship code more quickly and reliably.”
— Jeff Lo, Director of Product Marketing, Splunk

“Maintaining a strong security posture means ensuring every part of your toolchain is being monitored – from the datacenter/VPC, to your edge network, all the way to your end users. With Datadog’s partnership with Cloudflare, you get edge computing logs alongside the rest of your application stack’s telemetry – giving you an end to end view of your application’s health, performance and security.”
— Michael Gerstenhaber, Sr. Director, Datadog

“Reduce downtime and solve customer-impacting issues faster with an integrated observability platform for all of your Cloudflare data, including its Workers serverless platform. By using Cloudflare Workers with Sumo Logic, customers can seamlessly correlate system issues measured by performance monitoring, gain deep visibility provided by logging, and monitor user experience provided by tracing and transaction analytics.”
— Abelardo Gonzalez, Director of Product Marketing, Sumo Logic

“Monitoring your Cloudflare Workers serverless environment with New Relic lets you deliver serverless apps with confidence by rapidly identifying when something goes wrong and quickly pinpointing the problem—without wading through millions of invocation logs. Monitor, visualize, troubleshoot, and alert on all your Workers apps in a single experience.”
— Raj Ramanujam, Vice President, Alliances and Channels, New Relic

“With Cloudflare Workers and Sentry, software teams immediately have the tools and information to solve issues and learn continuously about their code health instead of worrying about systems and resources. We’re thrilled to partner with Cloudflare on building technologies that make it easy for developers to deploy with confidence.”
— Elain Szu, Vice President of Marketing, Sentry

“Honeycomb is excited to partner with Cloudflare as they build an ecosystem of tools that support the full lifecycle of delivering successful apps. Writing and deploying code is only part of the equation. Understanding how that code performs and behaves when it is in the hands of users also determines success. Cloudflare and Honeycomb together are shining the light of observability all the way to the edge, which helps technical teams build and operate better apps.”
— Charity Majors, CTO & Co-Founder, Honeycomb

And we are just getting started. Over the last 10 years, Cloudflare has built one of the fastest, most reliable, and most secure networks in the world. We are now enabling developers to build on that network seamlessly, using best-in-class products. Visit our Developer Platform Partner Ecosystem directory for more information on integrations.

If you’re interested in learning more about how to partner with us, reach out to us by filling out this quick form.

Magic NAT: everywhere, unbounded, and lower cost

Post Syndicated from Annika Garbers original https://blog.cloudflare.com/magic-nat/

Magic NAT: everywhere, unbounded, and lower cost

Magic NAT: everywhere, unbounded, and lower cost

Network Address Translation (NAT) is one of the most common and versatile network functions, used by everything from your home router to the largest ISPs. Today, we’re delighted to introduce a new approach to NAT that solves the problems of traditional hardware and virtual solutions. Magic NAT is free from capacity constraints, available everywhere through our global Anycast architecture, and operates across any network (physical or cloud). For Internet connectivity providers, Magic NAT for Carriers operates across high volumes of traffic, removing the complexity and cost associated with NATing thousands or millions of connections.

What does NAT do?

The main function of NAT is in its name: NAT is responsible for translating the network address in the header of an IP packet from one address to another – for example, translating the private IP 192.168.0.1 to the publicly routable IP 192.0.2.1. Organizations use NAT to grant Internet connectivity from private networks, enable routing within private networks with overlapping IP space, and preserve limited IP resources by mapping thousands of connections to a single IP. These use cases are typically accomplished with a hardware appliance within a physical network or a managed service delivered by a cloud provider.

Let’s look at those different use cases.

Allowing traffic from private subnets to connect to the Internet

Resources within private subnets often need to reach out to the public Internet. The most common example of this is connectivity from your laptop, which might be allocated a private address like 192.168.0.1, reaching out to a public resource like google.com. In order for Google to respond to a request from your laptop, the source IP of your request needs to be publicly routable on the Internet. To accomplish this, your ISP translates the private source IP in your request to a public IP (and reverse-translates for the responses back to you). This use case is often referred to as public NAT, performed by hardware or software acting as a “NAT gateway.”

Magic NAT: everywhere, unbounded, and lower cost
Public NAT translates private IP addresses to public ones so that traffic from within private networks can access the Internet.

Users might also have requirements around the specific IP addresses that outgoing packets are NAT’d to. For example, they may need packets to egress from only one or a small subset of IPs so that the services they’re reaching out to can positively identify them – e.g. “only allow traffic from this specific source IP and block everything else.” They might also want traffic to NAT to IPs that accurately reflect the source’s geolocation, in order to pass the “pizza test”: are the results returned for the search term “pizza near me” geographically relevant? These requirements can increase the complexity of a customer’s NAT setup.

Enabling communication between private subnets with overlapping IP space

NATs are also used for routing traffic within fully private networks, in order to enable communication between resources with overlapping IP space. One example: imagine that you’re an IT architect at a retail company with a hundred geographically distributed store locations and a central data center. To make your life easier, you want to use the same IP address management scheme for all of your stores – e.g. host all of your printers on 10.0.1.0/24, point of sale devices on 10.0.2.0/24, and security cameras on 10.0.3.0/24. These devices need to reach out to resources hosted in your data center, which is also on your private network. The challenge: if multiple devices across your stores have the same source IP, how do return packets from your data center get back to the right device? This is where private NAT comes in.

Magic NAT: everywhere, unbounded, and lower cost
Private NAT translates IPs into a different private range so that devices with overlapping IP space can communicate with each other.

A NAT gateway sitting in a private network can enable connectivity between overlapping subnets by translating the original source IP (the one shared by multiple resources) to an IP in a different range. This can enable communication between mirrored subnets and other resources (like in our store → datacenter example), as well as between the mirrored subnets themselves – e.g. if traffic needed to flow between our store locations directly, such as a VoIP call from one store to another.

Conserving IP address space

As of 2019, the available pool of allocatable IPv4 space has been exhausted, making addresses a limited resource. In order to conserve their IPv4 space while the industry slowly transitions to IPv6, ISPs have adopted carrier-grade NAT solutions to map multiple users to a single IP, maximizing the mileage of the space they have available. This uses the same mechanisms for address translation we’ve already described, but at a large scale – ISPs need to deploy devices that can handle thousands or millions of concurrent connections without impacting traffic performance.
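
Conceptually, the NAT device keeps a connection-tracking table that maps each private source to a port on the shared public IP, so return traffic can be routed back to the right client. A toy model of that table, with all addresses illustrative:

// Toy model of a NAT translation table; addresses and ports are illustrative.
const PUBLIC_IP = "192.0.2.1";
let nextPort = 50000;

// Maps "privateIP:privatePort" -> allocated public source port.
const translations = new Map();

function translate(privateIp, privatePort) {
  const key = `${privateIp}:${privatePort}`;
  if (!translations.has(key)) {
    translations.set(key, nextPort++); // allocate a fresh public port
  }
  // Return traffic arriving at PUBLIC_IP on this port is mapped back
  // to the original private address using the same table.
  return `${PUBLIC_IP}:${translations.get(key)}`;
}

console.log(translate("10.1.1.10", 40001)); // 192.0.2.1:50000
console.log(translate("10.2.3.4", 51515)); // 192.0.2.1:50001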

Magic NAT: everywhere, unbounded, and lower cost

Challenges with existing NAT solutions

Today, users accomplish the use cases we’ve described with a physical appliance (often a firewall) or a virtual appliance delivered as a managed service from a cloud provider. These approaches have the same fundamental limitations as other hardware and virtualized hardware solutions traditionally used to accomplish most network functions.

Geography constraints

Physical or virtual devices performing NAT are deployed in one or a few specific locations (e.g. within a company’s data center or in a specific cloud region). Traffic may need to be backhauled out of its way through those specific locations to be NAT’d. A common example is the hub and spoke network architecture, where all Internet-bound traffic is backhauled from geographically distributed locations to be filtered and passed through a NAT gateway to the Internet at a central “hub.” (We’ve written about this challenge previously in the context of hardware firewalls.)

Managed NAT services offered by cloud providers require customers to deploy NAT gateway instances in specific availability zones. This means that if customers have origin services in multiple availability zones, they either need to backhaul traffic from one zone to another, incurring fees and latency, or deploy instances in multiple zones. They also need to plan for redundancy – for example, AWS recommends configuring a NAT gateway in every availability zone for “zone-independent architecture.”

Capacity constraints

Each appliance or virtual device can only support up to a certain amount of traffic, and higher supported traffic volumes usually come at a higher cost. Beyond these limits, users need to deploy multiple NAT instances and design mechanisms to load balance traffic across them, adding additional hardware and network hops to their stack.

Cost challenges

Physical devices that perform NAT functionality have several costs associated – in addition to the upfront CAPEX for device purchases, organizations need to plan for installation, maintenance, and upgrade costs. While managed cloud services don’t carry the same cost line items of traditional hardware, leading providers’ models include multiple costs and variable pricing that can be hard to predict. A combination of hourly charges, data processing charges, and data transfer charges can lead to surprises at the end of the month, especially if traffic experiences momentary spikes.

Hybrid infrastructure challenges

More and more customers we talk to are embracing hybrid (datacenter/cloud), multi-cloud, or poly-cloud infrastructure to diversify their spend and leverage the best of breed features offered by each provider. This means deploying separate NAT instances across each of these networks, which introduces additional complexity, management overhead, and cost.

Magic NAT: everywhere, unbounded, cross-platform, and predictably priced

Over the past few years, as we’ve been growing our portfolio of network services, we’ve heard over and over from customers that they want an alternative to the NAT solutions currently available on the market and a better way to address the challenges we described. We’re excited to introduce Magic NAT, the latest entrant in our “Magic” family of services designed to help customers build their next-generation networks on Cloudflare.

How does it work?

Magic NAT is built on the foundational components of Cloudflare One, our Zero Trust network-as-a-service platform. You can follow a few simple steps to get set up:

  1. Connect to Cloudflare. Magic NAT works with all of our network-layer on-ramps including Anycast GRE or IPsec, CNI, and WARP. Users set up a tunnel or direct connection and route privately sourced traffic across it; packets land at the closest Cloudflare location automatically.
  2. Upgrade for Internet connectivity. Users can enable Internet-bound TCP and UDP traffic (any port) to access resources on the Internet from Cloudflare IPs.
  3. (Optional) Enable dedicated egress IPs. Available if you need traffic to egress from one or multiple dedicated IPs rather than a shared pool. Dedicated egress IPs may be useful if you interact with services that “allowlist” specific IP addresses or otherwise care about which IP addresses are seen by servers on the Internet.
  4. (Optional) Layer on security policies for safe access. Magic NAT works natively with Cloudflare One security tools including Magic Firewall and our Secure Web Gateway. Users can add policies on top of East/West and Internet-bound traffic to secure all network traffic with L3 through L7 protection.

Address translation between IP versions will also be supported, including 4to6 and 6to4 NAT capabilities to ensure backwards and forwards compatibility when clients or servers are only reachable via IPv4 or IPv6.

Magic NAT: everywhere, unbounded, and lower cost

Anycast: Magic NAT is everywhere, automatically

With Cloudflare’s Anycast architecture and global network of over 275 cities across the world, users no longer need to think about deploying NAT capabilities in specific locations or “availability zones.” Anycast on-ramps mean that traffic automatically lands at the closest Cloudflare location. If that location becomes unavailable (e.g. for maintenance), traffic fails over automatically to the next closest – zero configuration work from customers required. Failover from Cloudflare to customer networks is also automatic; we’ll always route traffic across the healthiest available path to you.

Scale: Magic NAT leverages Cloudflare’s entire network capacity

Cloudflare’s global capacity is at 141 Tbps and counting, and automated traffic management systems like Unimog allow us to take full advantage of that capacity to serve high volumes of traffic smoothly. We absorb some of the largest DDoS attacks on the Internet, process hundreds of Gbps for customers through Magic Firewall, and provide privacy for millions of user devices across the world – and Magic NAT is built with this scale in mind. You’ll never need to provision and load balance across multiple instances or worry about traffic throttling or congestion again.

Cost: no more hardware costs and no surprises

Magic NAT, like our other network services, is priced based on the 95th percentile of clean bandwidth for your network: no installation, maintenance, or upgrades, and no surprise charges for data transfer spikes. Unlike managed services offered by cloud providers, we won’t charge you for traffic twice. This means fair, predictable billing based on what you actually use.

Hybrid and multi-cloud: simplify networking across environments

Today, customers deploying NAT across on-prem environments and cloud properties need to manage separate instances for each network. As with Cloudflare’s other products that provide an overlay across multiple environments (e.g. Magic Firewall), we can dramatically simplify this architecture by giving users a single place for all of their traffic to NAT through, regardless of source or origin network.

Summary

How traditional NAT solutions compare with Magic NAT:

  • Location-dependent vs. Anycast: traditional solutions require deploying physical or virtual appliances in one or more locations, with additional cost for redundancy. Magic NAT is everywhere and extremely fault-tolerant, automatically; no more planning availability zones.
  • Capacity-limited vs. scalable: physical and virtual appliances have upper limits for throughput, so you need to deploy and load balance across multiple devices to overcome them. Magic NAT leverages Cloudflare’s entire network capacity, automatically; no more planning for capacity.
  • High (hardware) and/or unpredictable (cloud) cost vs. fair, predictable pricing: traditional solutions mean CAPEX plus installation, maintenance, and upgrades, or stacked hourly, data processing, and data transfer charges for a managed cloud service. With Magic NAT, there is no more sticker shock from unexpected data processing charges at the end of the month.
  • Tied to a physical network or single cloud vs. multi-cloud: traditional solutions require multiple instances to cover traffic flows across the entire network. Magic NAT simplifies networking across environments, with one control plane for all of your traffic flows.

Learn more

Magic NAT is currently in beta, translating network addresses globally for a variety of workloads, large and small. We’re excited to get your feedback about it and other new capabilities we’re cooking up to help you simplify and future-proof your network – learn more or contact your account team about getting access today!

Introducing Workers Analytics Engine

Post Syndicated from Jon Levine original https://blog.cloudflare.com/workers-analytics-engine/

Introducing Workers Analytics Engine

Introducing Workers Analytics Engine

Today we’re excited to introduce Workers Analytics Engine, a new way to get telemetry about anything using Cloudflare Workers. Workers Analytics Engine provides time series analytics built for the serverless era.

Workers Analytics Engine uses the same technology that powers Cloudflare’s analytics for millions of customers, who generate tens of millions of events per second. This unique architecture provides significant benefits over traditional metrics systems – and even enables our customers to build analytics for their customers.

Why use Workers Analytics Engine

Workers Analytics Engine can be used to get telemetry about just about anything.

Our initial motivation for building Workers Analytics Engine was to help internal teams at Cloudflare better understand what’s happening in their Workers. For example, one early internal customer is our R2 storage product. The R2 team is using the Analytics Engine to measure how many reads and writes happen in R2, how many users make these requests, how many bytes are transferred, how long the operations take, and so forth.

After seeing quick adoption from internal teams at Cloudflare, we realized that many customers could benefit from using this product.

For example, Workers Analytics Engine can also be used to build custom security rules. You could use it to implement something like fail2ban, a program that can ban malicious traffic. Every time someone logs in, you could record information like their location and IP. On subsequent logins, you could query the rate of login attempts from these attackers, and block them if they’ve attempted to sign in too many times in a given period.
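
A sketch of that pattern, using the writeDataPoint() API described later in this post. The binding name LOGIN_ATTEMPTS and the blocking logic here are illustrative assumptions, not a prescribed design:

export default {
  async fetch(request, env) {
    const ip = request.headers.get("CF-Connecting-IP") ?? "unknown";
    // Record one data point per login attempt, labeled by client IP.
    env.LOGIN_ATTEMPTS.writeDataPoint({
      labels: [ip],
      metrics: [1],
    });
    // A separate process could then query the SQL API for noisy IPs, e.g.:
    //   SELECT label_1 AS ip, count() AS attempts
    //   FROM LOGIN_ATTEMPTS
    //   GROUP BY ip
    //   ORDER BY attempts DESC
    // and block any IP whose recent attempt count crosses a threshold.
    return new Response("login attempt recorded");
  }
}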

Workers Analytics Engine can even be used to track things in the world that have nothing (yet!) to do with Workers. For example, imagine you have a network of IoT sensors that connect to the Internet to report weather and air quality data, like temperature, air pressure, wind speed, and PM2.5 pollution. Using Workers Analytics Engine, you could deploy a Worker in just a few minutes that collects these reports, and then query and visualize the data using our analytics APIs.

How to use Workers Analytics Engine

There are three steps to get started with Workers Analytics Engine:

  1. Configure your analytics using Wrangler
  2. Write data using the Workers Runtime API
  3. Query your data using our SQL or GraphQL API.

Configuring Workers Analytics Engine in Wrangler

To start using Workers Analytics Engine, you first need to configure it in Wrangler. This is done by creating a binding in wrangler.toml.

[analytics_engine]
bindings = [
    { name = "WEATHER" }
]

Your analytics can be named after the event in the world that they represent. For example, readings from our weather sensor above might be named “WEATHER.”

For our current beta release, customers may only create one binding at a time. In the future, we plan to enable customers to define multiple bindings, or even define them on-the-fly from within the Workers runtime.

Writing data from the Workers runtime

Once a binding is declared in Wrangler, you get a new environment variable in the Workers runtime that represents your Analytics Engine. This variable has a method, writeDataPoint(). A “data point” is a structured event which consists of a vector of labels and a vector of metrics.

A metric is just a “number” type field that can be aggregated in some way – for example, it could be summed, averaged, or quantiled. A label is a “string” type field that can be used for grouping or filtering.

For example, suppose you are collecting air quality samples. Each data point would represent a reading from your weather sensor. Metrics might include numbers like the temperature or air pressure reading. The labels could include the location of the sensor and the hardware identifier of the sensor.

Here’s what this looks like in code:

export default {
  async fetch(request: Request, env) {
    env.WEATHER.writeDataPoint({
      labels: ["Seattle", "USA", "pro_sensor_9000"],
      metrics: [25, 0.5]
    });
    return new Response("OK!");
  }
}

In our initial version, developers are responsible for providing fields in a consistent order, so that they have the same semantics when querying. In a future iteration, we plan to let developers name their labels and metrics in the binding, and then use these names when writing data points in the runtime.

Querying and visualizing data

To query your data, Cloudflare provides a rich SQL API. For example:

SELECT label_1 AS city, avg(metric_2) AS avg_humidity
FROM WEATHER
WHERE metric_1 > 0
GROUP BY city
ORDER BY avg_humidity DESC
LIMIT 10

The results would show you the top 10 cities that had the highest average humidity readings when the temperature was above 0.

Note that, for our initial version, labels and metrics are accessed via names that have 1-based indexing. In the future, when we let developers name labels and metrics in their binding, these names will also be available via the SQL API.

Workers Analytics Engine is optimized for powering time series analytics that can be visualized using tools like Grafana. Every event written from the runtime is automatically populated with a timestamp field. This makes it incredibly easy to make time series charts in Grafana:

Introducing Workers Analytics Engine

The macro $timeSeries simply expands to intDiv(toUInt32(timestamp), 60) * 60 * 1000 – i.e. the timestamp rounded down to the nearest minute (as defined by our $step parameter) and converted into milliseconds. Grafana also provides $timeFilter, which can be changed at the Grafana dashboard level. We could easily add another series here by just “grouping” on another field like “city”.

Data can also be queried using our GraphQL API. At this time, the GraphQL API only supports querying total counts for each named binding.

Finally, the Cloudflare dashboard also provides a high-level count of the total number of data points seen for each binding. In the future, we plan to offer rich analytical abilities through the dashboard.

How is this different from traditional metrics systems?

Many developers are familiar with metrics systems like Prometheus. We built Workers Analytics Engine based on our experience providing analytics for millions of Cloudflare customers. Writing structured event logs and querying them using a relational database model is very different from writing metrics – but it’s also much more powerful.

Here are some of the benefits of our model, compared with metrics systems:

  • Unlimited cardinality of label values: In a traditional metrics system, like Prometheus, every time you add a new label value, under the hood you are actually adding a new metric. If you have multiple labels for one data point, this can rapidly increase the number of metrics. Nearly everyone using a metrics system runs into challenges with cardinality. For example, you may start by including a “customer ID” in a label – but what happens when you have thousands or millions of customers? In contrast, when using Workers Analytics Engine, every label value is stored independently – so every data point can have unique label values with no problem.
  • Low latency reporting: Pull-based metrics systems must check for new metrics at some fixed interval, known as a scrape interval. Commonly this is set to one minute or longer – and this is the absolute fastest that your data can be collected. With Workers Analytics Engine, we can report on new data points within a few seconds.
  • Fast queries at any timescale: Everyone who uses Prometheus knows what happens when you expand that range selector in Grafana to change from looking back 30 minutes to seven days… you wait, and you’re lucky if you get any results at all. Whole new pieces of software exist just for the challenge of storing Prometheus metrics long-term. In contrast, Workers Analytics Engine is super fast at querying anything from the last five minutes of data to the last seven days. Try it for yourself and see!

And of course, Workers Analytics Engine runs on Cloudflare’s global network. So rather than worrying about running your own Prometheus server, setting up Thanos, and closely tracking cardinality, you can just write data and query it using our SQL API.

What’s next

Today we’re introducing a closed beta for Workers Analytics Engine. You can join the waitlist by signing up here. We already have many teams at Cloudflare happily using this and would love to get your feedback at this early stage, as we are quickly adding new functionality.

We have an ambitious roadmap ahead of us. One critical use case we plan to support is building analytics and usage-based billing for your customers – so if you’re a platform looking to build analytics into your product, we’d love to talk to you!

And of course, if this sounds fun to work on, we’re hiring engineers on the Data team to work in San Francisco, London, or remote locations!