Tag Archives: Product News

Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

Post Syndicated from Alex Krivit original https://blog.cloudflare.com/part1-coreless-purge/

There is a famous quote attributed to a Netscape engineer: “There are only two difficult problems in computer science: cache invalidation and naming things.” While naming things does oddly take up an inordinate amount of time, cache invalidation shouldn’t.

In the past we’ve written about Cloudflare’s incredibly fast response times, whether content is cached on our global network or not. If content is cached, it can be served from one of Cloudflare’s cache servers, which are distributed across the globe and are generally much closer to the visitor. This saves the visitor’s request from needing to go all the way back to an origin server for a response. But what happens when a webmaster updates something on their origin and would like these caches to be updated as well? This is where cache “purging” (also known as “invalidation”) comes in.

Customers thinking about setting up a CDN and caching infrastructure consider questions like:

  • How do different caching invalidation/purge mechanisms compare?
  • How many times a day/hour/minute do I expect to purge content?
  • How quickly can the cache be purged when needed?

This blog will discuss why invalidating cached assets is hard, what Cloudflare has done to make it easy (because we care about your experience as a developer), and the engineering work we’re putting in this year to make the performance and scalability of our purge services the best in the industry.

What makes purging difficult also makes it useful

(i) Scale
The first thing that complicates cache invalidation is doing it at scale. With data centers in over 270 cities around the globe, our most popular users’ assets can be replicated at every corner of our network. This also means that a purge request needs to be distributed to all data centers where that content is cached. When a data center receives a purge request, it needs to locate the cached content to ensure that subsequent visitor requests for that content are not served stale/outdated data. Requests for the purged content should be forwarded to the origin for a fresh copy, which is then re-cached on its way back to the user.

This process repeats for every data center in Cloudflare’s fleet. Maintaining this consistency across a network of our size, even when certain data centers are unreachable or go offline, is what makes purging at scale difficult.

Making sure that every data center gets the purge command and remains up-to-date with its content logs is only part of the problem. Getting the purge request to data centers quickly so that content is updated uniformly is the next reason why cache invalidation is hard.  

(ii) Speed
When purging an asset, race conditions abound. Requests for an asset can happen at any time and may not follow a predictable pattern. Content can also change unpredictably. Therefore, when content changes and a purge request is sent, it must be distributed across the globe quickly. If purging an individual asset, say an image, takes too long, some visitors will be served the new version while others are served outdated content. This data inconsistency degrades user experience and can lead to confusion as to which version is the “right” version. Websites can sometimes even break in their entirety due to this purge latency (e.g. by upgrading versions of a non-backwards compatible JavaScript library).

Purging at speed is also difficult when combined with Cloudflare’s massive global footprint. For example, if a purge request is traveling at the speed of light between Tokyo and Cape Town (both cities where Cloudflare has data centers), just the transit alone (no authorization of the purge request or execution) would take over 180ms on average based on submarine cable placement. Purging a smaller network footprint may reduce these speed concerns while making purge times appear faster, but does so at the expense of worse performance for customers who want to make sure that their cached content is fast for everyone.

(iii) Scope
The final thing that makes purge difficult is making sure that only the unneeded web assets are invalidated. Maintaining a cache is important for egress cost savings and response speed. Webmasters’ origins could be knocked over by a thundering herd of requests if all content is purged needlessly. It’s a delicate balance of purging just enough: too much can result in both monetary and downtime costs, and too little will result in visitors receiving outdated content.

At Cloudflare, what to invalidate in a data center is often dictated by the type of purge. Purge everything, as you could probably guess, purges all cached content associated with a website. Purge by prefix purges content based on a URL prefix. Purge by hostname can invalidate content based on a hostname. Purge by URL or single file purge focuses on purging specified URLs. Finally, Purge by tag purges assets that are marked with Cache-Tag headers. These markers offer webmasters flexibility in grouping assets together. When a purge request for a tag comes into a data center, all assets marked with that tag will be invalidated.
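For illustration, here is roughly what two of these purge types look like as calls to Cloudflare’s public purge API (a sketch; the zone ID, token, URLs, and tag are placeholders):

// Purge by URL: invalidate two specific assets in a zone's cache.
await fetch('https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/purge_cache', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <API_TOKEN>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    files: ['https://example.com/styles.css', 'https://example.com/logo.png'],
  }),
});

// Purge by tag: invalidate every asset served with a matching Cache-Tag header.
await fetch('https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/purge_cache', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <API_TOKEN>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ tags: ['product-images'] }),
});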

With that overview in mind, the remainder of this blog will focus on putting each element of invalidation together to benchmark the performance of Cloudflare’s purge pipeline and provide context for what performance means in the real-world. We’ll be reviewing how fast Cloudflare can invalidate cached content across the world. This will provide a baseline analysis for how quick our purge systems are presently, which we will use to show how much we will improve by the time we launch our new purge system later this year.

How does purge work currently?

Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

In general, purge takes the following route through Cloudflare’s data centers.

  • A purge request is initiated via the API or UI. This request specifies how our data centers should identify the assets to be purged. This can be accomplished via cache-tag header(s), URL(s), entire hostnames, and much more.
  • The request is received by any Cloudflare data center and identified as a purge request. It is then routed to a Cloudflare core data center (a set of a few data centers responsible for network management activities).
  • When a core data center receives it, the request is processed by a number of internal services that (for example) make sure the request is being sent from an account with the appropriate authorization to purge the asset. Following this, the request gets fanned out globally to all Cloudflare data centers using our distribution service.
  • When received by a data center, the purge request is processed and all assets with the matching identification criteria are either located and removed, or marked as stale. These stale assets are not served in response to requests and are instead re-pulled from the origin.
  • After being pulled from the origin, the response is written to cache again, replacing the purged version.

Now let’s look at this process in practice. Below we describe Cloudflare’s purge benchmarking that uses real-world performance data from our purge pipeline.

Benchmarking purge performance design

In order to understand how performant Cloudflare’s purge system is, we measured the time from when the purge request is sent to the moment the purge is complete and the asset is no longer served from cache.

In general, the process of measuring purge speeds involves: (i) ensuring that a particular piece of content is cached, (ii) sending the command to invalidate the cache, (iii) simultaneously checking our internal system logs for how the purge request is routed through our infrastructure, and (iv) measuring when the asset is removed from cache (first miss).

This process measures how quickly cache is invalidated from the perspective of an average user.

  • Clock starts
    As noted above, in this experiment we’re using sampled RUM data from our purge systems. The goal of this experiment is to benchmark how long it can take to purge an asset on Cloudflare across different regions. Once the asset was cached in a region on Cloudflare, we identified when a purge request was received for that asset. At that instant, the clock started for this experiment. We include in this time any retries we needed to make (due to data centers missing the initial purge request) to ensure that the purge was done consistently across our network. The clock continues as the request transits our purge pipeline (data center > core > fanout > purge from all data centers).
  • Clock stops
    The clock stopped when the purged asset was removed from cache, meaning that the data center was no longer serving the asset from cache in response to visitors’ requests. Our internal logging measures the precise moment that the cached content has been removed or expired, and from that data we were able to determine the following benchmarks for our purge types in various regions.
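For intuition, a simplified external version of this measurement (ours uses internal logs rather than polling) might look like the sketch below; the asset URL, zone ID, and token are placeholders, and it assumes an environment with no local HTTP cache (e.g. Node):

// Warm the cache, purge, then poll until the edge reports a cache miss.
const ASSET = 'https://example.com/logo.png';

await fetch(ASSET); // first request populates the cache
await fetch(ASSET); // second request should now be served as a HIT

const start = Date.now();

// Issue the purge (placeholder zone ID and API token).
await fetch('https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/purge_cache', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <API_TOKEN>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ files: [ASSET] }),
});

// Poll until the CF-Cache-Status header stops reporting HIT.
let status;
do {
  await new Promise((r) => setTimeout(r, 50));
  const res = await fetch(ASSET);
  status = res.headers.get('CF-Cache-Status');
} while (status === 'HIT');

console.log(`Purge observed after ${Date.now() - start}ms (status: ${status})`);

This only approximates purge latency from a single vantage point, which is exactly why the benchmarks below rely on internal logs across many data centers instead.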

Results

We’ve divided our benchmarks in two ways: by purge type and by region.

We singled out Purge by URL because it identifies a single target asset to be purged. While that asset can be stored in multiple locations, the amount of data to be purged is strictly defined.

We’ve combined all other types of purge (everything, tag, prefix, hostname) together because the amount of data to be removed is highly variable. Purging a whole website or by assets identified with cache tags could mean we need to find and remove a multitude of content from many different data centers in our network.

Secondly, we segmented our benchmark measurements by region, and we confined the benchmarks to specific cache servers in each region because we were concerned about clock skew between different data centers. By limiting the test to the same cache servers, even if there was skew, they’d all be skewed in the same way.

We took the latency from the representative data centers in each of the following regions and the global latency. Data centers were not evenly distributed in each region, but in total represent about 90 different cities around the world:  

  • Africa
  • Asia Pacific Region (APAC)
  • Eastern Europe (EEUR)
  • Eastern North America (ENAM)
  • Oceania
  • South America (SA)
  • Western Europe (WEUR)
  • Western North America (WNAM)

The global latency numbers represent purge data from all Cloudflare data centers in over 270 cities globally. In the results below, the global latency numbers may be larger than the regional numbers because they represent all of our data centers instead of only a regional portion, so outliers and retries can have an outsized effect.

Below are the results for how quickly our current purge pipeline was able to invalidate content by purge type and region. All times are represented in seconds and divided into P50, P75, and P99 quantiles: “P50” means that 50% of the purges completed at the indicated latency or faster.

Purge By URL

Region    P50     P75     P99
AFRICA    0.95s   1.94s   6.42s
APAC      0.91s   1.87s   6.34s
EEUR      0.84s   1.66s   6.30s
ENAM      0.85s   1.71s   6.27s
OCEANIA   0.95s   1.96s   6.40s
SA        0.91s   1.86s   6.33s
WEUR      0.84s   1.68s   6.30s
WNAM      0.87s   1.74s   6.25s
GLOBAL    1.31s   1.80s   6.35s

Purge Everything, by Tag, by Prefix, by Hostname

Region    P50     P75     P99
AFRICA    1.42s   1.93s   4.24s
APAC      1.30s   2.00s   5.11s
EEUR      1.24s   1.77s   4.07s
ENAM      1.08s   1.62s   3.92s
OCEANIA   1.16s   1.70s   4.01s
SA        1.25s   1.79s   4.11s
WEUR      1.19s   1.73s   4.04s
WNAM      1.00s   1.53s   3.83s
GLOBAL    1.57s   2.32s   5.97s

A general note about these benchmarks — the data represented here was taken from over 48 hours (two days) of RUM purge latency data in May 2022. If you are interested in how quickly your content can be invalidated on Cloudflare, we suggest you test our platform with your website.

Those numbers are good and much faster than most of our competitors. Even in the worst case, the time from when you tell us to purge an item to when it is removed globally is less than seven seconds. In most cases, it’s less than a second. That’s great for most applications, but we want to be even faster. Our goal is to get cache purge as close as theoretically possible to the speed-of-light limit for a network our size, which is about 200ms.

Intriguingly, LEO satellite networks may be able to provide even lower global latency than fiber optics because of the straightness of the paths between satellites that use laser links. We’ve done calculations of latency between LEO satellites that suggest that there are situations in which going to space will be the fastest path between two points on Earth. We’ll let you know if we end up using laser-space-purge.

Just as we have with network performance, we are going to relentlessly measure our cache performance as well as the cache performance of our competitors. We won’t be satisfied until we verifiably are the fastest everywhere. To do that, we’ve built a new cache purge architecture which we’re confident will make us the fastest cache purge in the industry.

Our new architecture

Through the end of 2022, we will continue this blog series, incrementally showing how we will become the fastest, most scalable purge system in the industry. We will continue to update you on how our purge system is developing and benchmark our data along the way.

Getting there will involve rearchitecting and optimizing our purge service, which hasn’t received a systematic redesign in over a decade. We’re excited to do our development in the open, and bring you along on our journey.

So what do we plan on updating?

Introducing Coreless Purge

The first version of our cache purge system was designed on top of a set of central core services, including authorization, authentication, request distribution, and filtering, among other features that made it a high-reliability service. These core components ultimately became a bottleneck in terms of scale and performance as our network continued to expand globally. While most of our purge dependencies have been containerized, the message queue used was still running on bare metal, which led to increased operational overhead when our system needed to scale.

Last summer, we built a proof of concept for a completely decentralized cache invalidation system using in-house tech – Cloudflare Workers and Durable Objects. Using Durable Objects as a queuing mechanism gives us the flexibility to scale horizontally by adding more Durable Objects as needed and can reduce time to purge with quick regional fanouts of purge requests.

In the new purge system, we’re ripping out the reliance on core data centers and moving all of that functionality to every data center; we’re calling it Coreless Purge.

Here’s a general overview of how coreless purge will work:

  • A purge request will be initiated via the API or UI. This request will specify how we should identify the assets to be purged.
  • The request will be routed to the nearest Cloudflare data center, where it is identified as a purge request and passed to a Worker that will perform several of the key functions that currently occur in the core (like authorization, filtering, etc.).
  • From there, the Worker will pass the purge request to a Durable Object in the data center. The Durable Object will queue all the requests and broadcast them to every data center when they are ready to be processed.
  • When the Durable Object broadcasts the purge request to every data center, another Worker will pass the request to the service in the data center that will invalidate the content in cache (executes the purge).

We believe this re-architecture of our system built by stringing together multiple services from the Workers platform will help improve both the speed and scalability of the purge requests we will be able to handle.
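To make that concrete, here is a heavily simplified sketch of the coreless flow. The binding names, class name, and queueing logic are illustrative assumptions, not the actual internals:

// A Worker at the receiving data center: authorize the purge request at
// the edge, then hand it to a Durable Object that queues and broadcasts it.
export default {
  async fetch(request, env) {
    // Authorization that previously ran in a core data center (placeholder check).
    if (request.headers.get('Authorization') !== `Bearer ${env.EXPECTED_TOKEN}`) {
      return new Response('Unauthorized', { status: 401 });
    }
    // PURGE_QUEUE is an assumed Durable Object namespace binding.
    const id = env.PURGE_QUEUE.idFromName('purge-queue');
    return env.PURGE_QUEUE.get(id).fetch(request);
  },
};

// The Durable Object: persist incoming purge requests so none are lost,
// then (in the real system) broadcast them to every data center.
export class PurgeQueue {
  constructor(state, env) {
    this.state = state;
  }
  async fetch(request) {
    const purge = await request.json();
    await this.state.storage.put(`purge:${Date.now()}`, purge);
    // A real implementation would fan the request out here for execution
    // against each data center's local cache.
    return new Response('queued', { status: 202 });
  }
}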

Conclusion

We’re going to spend a lot of time building and optimizing purge because, if there’s one thing we learned here today, it’s that cache invalidation is a difficult problem, but those are exactly the types of problems that get us out of bed in the morning.

If you want to help us optimize our purge pipeline, we’re hiring.

Announcing the Cloudflare Images Sourcing Kit

Post Syndicated from Paulo Costa original https://blog.cloudflare.com/cloudflare-images-sourcing-kit/

When we announced Cloudflare Images to the world, we introduced a way to store images within the product and help customers avoid the egress fees incurred when using remote sources for their deliveries via Cloudflare.

To store the images in Cloudflare, customers can upload them via the UI with a simple drag and drop, or via the API for scenarios with a high number of objects, where scripting the upload process makes more sense.

To create flexibility in how images are imported, we’ve recently also included the ability to upload via URL or define custom names and paths for your images, allowing a simple mapping between customer repositories and the objects in Cloudflare. It’s also possible to serve from a custom hostname, which creates flexibility in how your end users see the path, improves delivery performance by removing the need for additional TLS negotiations, and improves your brand recognition through URL consistency.

Still, there was no simple way to tell our product: “Tens of millions of images are in this repository URL. Go and grab them all for me.”

In some scenarios, our customers have buckets with millions of images to upload to Cloudflare Images. Their goal is to migrate all objects to Cloudflare through a one-time process, allowing them to drop the external storage altogether.

In another common scenario, different departments in larger companies use independent systems configured with varying storage repositories, all of which they feed at specific times with uneven upload volumes. It would be ideal if they could reuse source definitions to get all those new images into Cloudflare, ensuring the portfolio is up-to-date while not paying egregious egress fees by serving the public directly from those multiple storage providers.

These situations required the upload process to Cloudflare Images to include logistical coordination and scripting knowledge. Until now.

Announcing the Cloudflare Images Sourcing Kit

Today, we are happy to share with you our Sourcing Kit, where you can define one or more sources containing the objects you want to migrate to Cloudflare Images.

But, what exactly is Sourcing? In industries like manufacturing, it implies a number of operations, from selecting suppliers, to vetting raw materials and delivering reports to the process owners.

So, we borrowed that definition and translated it into a Cloudflare Images set of capabilities allowing you to:

  1. Define one or multiple repositories of images to bulk import;
  2. Reuse those sources and import only new images;
  3. Make sure that only actual usable images are imported, and not other objects or file types that exist in that source;
  4. Define the target path and filename for imported images;
  5. Obtain logs for the bulk operations.

The new kit does it all. So let’s go through it.

How the Cloudflare Images Sourcing Kit works

In the Cloudflare Dashboard, you will soon find the Sourcing Kit under Images.

In it, you will be able to create a new source definition, view existing ones, and view the status of the last operations.

Clicking on the create button will launch the wizard that will guide you through the first bulk import from your defined source.

First, you will need to input the Name of the Source and the URL for accessing it. You’ll be able to save the definitions and reuse the source whenever you wish.
After running the necessary validations, you’ll be able to define the rules for the import process.

The first option you have allows an optional Path Prefix. Defining a prefix gives the images uploaded from this particular source a unique identifier, differentiating them from the ones imported from other sources.

The naming rule in place respects the source image name and path already, so let’s assume there’s a puppy image to be retrieved at:

https://my-bucket.s3.us-west-2.amazonaws.com/folderA/puppy.png

When imported without any Path Prefix, you’ll find the image at

https://imagedelivery.net/<AccountId>/folderA/puppy.png

Now, you might want to create an additional Path Prefix to identify the source, for example by mentioning that this bucket is from the Technical Writing department. In the puppy case, the result would be:

https://imagedelivery.net/<AccountId>/techwriting/folderA/puppy.png

Custom Path prefixes also provide a way to prevent name clashes coming from other sources.

Still, there will be times when customers don’t want to use them. And when reusing a source to import images, clashes can occur where two objects resolve to the same destination path and filename.

By default, we don’t overwrite existing images, but we allow you to select that option to refresh the catalog you keep in the Cloudflare pipeline.

Once these inputs are defined, a click on the Create and start migration button at the bottom will trigger the upload process.

This action will show the final wizard screen, where the migration status is displayed. The progress log will report any errors obtained during the upload and is also available to download.

You can reuse, edit, or delete source definitions when no operations are running, and at any point, from the home page of the kit, you can check the status of the ongoing migration or return to the last migration report.

What’s next?

With the Beta version of the Cloudflare Images Sourcing Kit, we will allow you to define AWS S3 buckets as a source for the imports. In the following versions, we will enable definitions for other common repositories, such as the ones from Azure Storage Accounts or Google Cloud Storage.

And while we’re aiming for this to be a simple UI, we also plan to make everything available through CLI: from defining the repository URL to starting the upload process and retrieving a final report.

Apply for the Beta version

We will be releasing the Beta version of this kit in the following weeks, allowing you to source your images from third-party repositories into Cloudflare.

If you want to be the first to use Sourcing Kit, request to join the waitlist on the Cloudflare Images dashboard.

Send email using Workers with MailChannels

Post Syndicated from Erwin van der Koogh original https://blog.cloudflare.com/sending-email-from-workers-with-mailchannels/

Here at Cloudflare we often talk about HTTP and related protocols as we work to help build a better Internet. However, the Simple Mail Transfer Protocol (SMTP) — used to send emails — is still a massive part of the Internet too.

Even though SMTP is turning 40 years old this year, most businesses still rely on email to validate user accounts, send notifications, announce new features, and more.

Sending an email is simple from a technical standpoint, but getting an email actually delivered to an inbox can be extremely tricky. Because of the enormous amount of spam that is sent every single day, all major email providers are very wary of things like new domains and IP addresses that start sending emails.

That is why we are delighted to announce a partnership with MailChannels. MailChannels has created an email sending service specifically for Cloudflare Workers that removes all the friction associated with sending emails. To use their service, you do not need to validate a domain or create a separate account. MailChannels filters spam before sending out an email, so you can feel safe putting user-submitted content in an email and be confident that it won’t ruin your domain reputation with email providers. But the absolute best part? Thanks to our friends at MailChannels, it is completely free to send email.

In the words of their CEO Ken Simpson: “Cloudflare Workers and Pages are changing the game when it comes to ease of use and removing friction to get started. So when we sat down to see what friction we could remove from sending out emails, it turns out that with our incredible anti-spam and anti-phishing, the answer is “everything”. We can’t wait to see what applications the community is going to build on top of this.”

The only constraint currently is that the integration only works when the request comes from a Cloudflare IP address. So it won’t work yet when you are developing on your local machine or running a test on your build server.

First, let’s walk through how to send your first email using a Worker.

export default {
  async fetch(request) {
    const send_request = new Request('https://api.mailchannels.net/tx/v1/send', {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        personalizations: [
          {
            to: [{ email: '[email protected]', name: 'Test Recipient' }],
          },
        ],
        from: {
          email: '[email protected]',
          name: 'Workers - MailChannels integration',
        },
        subject: 'Look! No servers',
        content: [
          {
            type: 'text/plain',
            value: 'And no email service accounts and all for free too!',
          },
        ],
      }),
    })
    // Actually send the email and surface MailChannels' response.
    return fetch(send_request)
  },
}

That is all there is to it. You can modify the example to make it send whatever email you want.

The MailChannels integration makes it easy to send emails to and from anywhere with Workers. However, we also wanted to make it easier to send emails to yourself from a form on your website. This is perfect for quickly and painlessly setting up pages such as “Contact Us” forms, landing pages, and sales inquiries.

The Pages Plugin Framework that we announced earlier this week allows other people to email you without exposing your email address.

The only thing you need to do is copy and paste the following code snippet in your /functions/_middleware.ts file. Now, every form that has the data-static-form-name attribute will automatically be emailed to you.

import mailchannelsPlugin from "@cloudflare/pages-plugin-mailchannels";

export const onRequest = mailchannelsPlugin({
  personalizations: [
    {
      to: [{ name: "ACME Support", email: "[email protected]" }],
    },
  ],
  from: { name: "Enquiry", email: "[email protected]" },
  respondWith: () =>
    new Response(null, {
      status: 302,
      headers: { Location: "/thank-you" },
    }),
});

Here is an example of what such a form would look like. You can make the form as complex as you like; the only thing it needs is the data-static-form-name attribute. You can give it any name you like to be able to distinguish between different forms.

<!DOCTYPE html>
<html>
  <body>
    <h1>Contact</h1>
    <form data-static-form-name="contact">
      <div>
        <label>Name<input type="text" name="name" /></label>
      </div>
      <div>
        <label>Email<input type="email" name="email" /></label>
      </div>
      <div>
        <label>Message<textarea name="message"></textarea></label>
      </div>
      <button type="submit">Send!</button>
    </form>
  </body>
</html>

So as you can see there is no barrier left when it comes to sending out emails. You can copy and paste the above Worker or Pages code into your projects and immediately start to send email for free.

If you have any questions about using MailChannels in your Workers, or want to learn more about Workers in general, please join our Cloudflare Developer Discord server.

Route to Workers, automate your email processing

Post Syndicated from Joao Sousa Botto original https://blog.cloudflare.com/announcing-route-to-workers/

Cloudflare Email Routing has quickly grown to a few hundred thousand users, and we’re incredibly excited by the number of feature requests that reach our product team every week. We hear you, we love the feedback, and we want to give you all that you’ve been asking for. What we don’t like is making you wait, or making you feel like your needs are too unique to be addressed.

That’s why we’re taking a different approach – we’re giving you the power tools that you need to implement any logic you can dream of to process your emails in the fastest, most scalable way possible.

Today we’re announcing Route to Workers, for which we’ll start a closed beta soon. You can join the waitlist today.

How this works

When using Route to Workers, your Email Routing rules can have a Worker process the messages reaching any of your custom email addresses.

Even if you haven’t used Cloudflare Workers before, we are making onboarding as easy as can be. You can start creating Workers straight from the Email Routing dashboard, with just one click.

After clicking Create, you will be able to choose a starter that allows you to get up and running with minimal effort. Starters are templates that pre-populate your Worker with the code you would write for popular use cases such as creating a blocklist or allowlist, content based filtering, tagging messages, pinging you on Slack for urgent emails, etc.

You can then use the code editor to make your new Worker process emails in exactly the way you want it to – the options are endless.

And for those of you who prefer to jump right into writing your own code, you can go straight to the editor without using a starter. You can write Workers in a language you likely already know. Cloudflare built Workers to execute JavaScript and WebAssembly and has continuously added support for new languages.

The Workers you’ll use for processing emails are just regular Workers that listen to incoming events, implement some logic, and reply accordingly. You can use all the features that a normal Worker would.

The main difference being that instead of:

export default {
  async fetch(request, env, ctx) {
    handleRequest(request);
  }
}

You’ll have:

export default {
  async email(message, env, ctx) {
    handleEmail(message);
  }
}

The new `email` event will provide you with the “from” and “to” fields, the full headers, and the raw body of the message. You can then use them in any way that fits your use case, including calling other APIs and orchestrating complex decision workflows. In the end, you can decide what action to take, including rejecting or forwarding the email to one of your Email Routing destination addresses.

With these capabilities you can easily create logic that, for example, only accepts messages coming from one specific address and, when one matches the criteria, forwards to one or more of your verified destination addresses while also immediately alerting you on Slack. Code for such a feature could be as simple as this:

export default {
  async email(message, env, ctx) {
    switch (message.to) {
      case "[email protected]":
        await fetch("https://webhook.slack/notification", {
          method: "POST",
          body: `Got a marketing email from ${message.from}, subject: ${message.headers.get("subject")}`,
        });
        sendEmail(message, [
          "marketing@corp",
          "sales@corp",
        ]);
        break;

      default:
        message.reject("Unknown address");
    }
  },
};

Route to Workers enables everyone to programmatically process their emails and use them as triggers for any other action. We think this is pretty powerful.

Process up to 100,000 emails/day for free

The first 100,000 Worker requests (or Email Triggers) each day are free, and paid plans start at just $5 per 10 million requests. You will be able to keep track of your Email Workers usage right from the Email Routing dashboard.

Join the Waitlist

You can join the waitlist today by going to the Email section of your dashboard, navigating to the Email Workers tab, and clicking the Join Waitlist button.

We are expecting to start the closed beta in just a few weeks, and can’t wait to hear about what you’ll build with it!

As usual, if you have any questions or feedback about Email Routing, please come see us in the Cloudflare Community and the Cloudflare Discord.

Closed Caption support coming to Stream Live

Post Syndicated from Mickie Betz original https://blog.cloudflare.com/stream-live-captions/

Building inclusive technology is core to the Cloudflare mission. Cloudflare Stream has supported captions for on-demand videos for several years. Soon, Stream will auto-detect embedded captions and include them in the live stream delivered to your viewers.

Thousands of Cloudflare customers use the Stream product to build video functionality into their apps. With live caption support, Stream customers can better serve their users with a more comprehensive viewing experience.

Enabling Closed Captions in Stream Live

Stream Live scans for CEA-608 and CEA-708 captions in incoming live streams ingested via SRT and RTMPS.  Assuming the live streams you are pushing to Cloudflare Stream contain captions, you don’t have to do anything further: the captions will simply get included in the manifest file.

If you are using the Stream Player, these captions will be rendered automatically. If you are using your own player, you simply have to configure your player to display captions.

Currently, Stream Live supports captions for a single language during the live event. While the support for captions is limited to one language during the live stream, you can upload captions for multiple languages once the event completes and the live event becomes an on-demand video.
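Once a live event ends and becomes an on-demand video, adding captions for another language is a single API call. A sketch, assuming placeholder account ID, video UID, and token, and a hardcoded WebVTT snippet:

// Upload a German WebVTT caption file to a recorded video.
const vtt = 'WEBVTT\n\n00:00.000 --> 00:04.000\nGuten Tag!';

const form = new FormData();
form.append('file', new Blob([vtt], { type: 'text/vtt' }), 'captions.vtt');

await fetch(
  'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/stream/<VIDEO_UID>/captions/de',
  {
    method: 'PUT',
    headers: { 'Authorization': 'Bearer <API_TOKEN>' },
    body: form,
  }
);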

What are CEA-608 and CEA-708?

When captions were first introduced in 1973, they were open captions. This means the captions were literally overlaid on top of the picture in the video and therefore, could not be turned off. In 1982, we saw the introduction of closed captions during live television. Captions were no longer imprinted on the video and were instead passed via a separate feed and rendered on the video by the television set.

CEA-608 (also known as Line 21) and CEA-708 are well-established standards used to transmit captions. CEA-708 is a modern iteration of CEA-608, offering support for nearly every language and text positioning–something not supported with CEA-608.

Availability

Live caption support will be available in closed beta next month. To request access, sign up for the closed beta.

Including captions in any video stream is critical to making your content more accessible. For example, the majority of live events are watched on mute, which makes captions all the more valuable. While Stream Live does not generate live captions yet, we plan to build support for automatic live captions in the future.

Stream with sub-second latency is like a magical HDMI cable to the cloud

Post Syndicated from J. Scott Miller original https://blog.cloudflare.com/magic-hdmi-cable/

Starting today, in open beta, Cloudflare Stream supports video playback with sub-second latency over SRT or RTMPS at scale. Just like HLS and DASH formats, playback over RTMPS and SRT costs $1 per 1,000 minutes delivered regardless of video encoding settings used.

Stream is like a magic HDMI cable to the cloud. You can easily connect a video stream and display it from as many screens as you want wherever you want around the world.

What do we mean by sub-second?

Video latency is the time it takes from when a camera sees something happen live to when viewers of a broadcast see the same thing happen on their screen. Although we like to think that what’s on TV is happening in the studio and in your living room simultaneously, this is not the case. Often, cable TV takes five seconds to reach your home.

On the Internet, the range of latencies across different services varies widely, from multiple minutes down to a few seconds or less. Live streaming technologies like HLS and DASH, used by the most common video streaming websites, typically offer 10 to 30 seconds of latency, and this is what you can achieve with Stream Live today. However, this range does not feel natural for quite a few use cases where the viewers interact with the broadcasters. Imagine a text chat next to an esports live stream, or a Q&A session in a remote webinar. These new ways of interacting with the broadcast won’t work with the typical latencies the industry is used to. You need one to two seconds at most to achieve the feeling that the viewer is in the same room as the broadcaster.

We expect Cloudflare Stream to deliver sub-second latencies reliably in most parts of the world by routing the video as much as possible within the Cloudflare network. For example, when you’re sending video from San Francisco on your Comcast home connection, the video travels directly to the nearest point where Comcast and Cloudflare connect, for example, San Jose. Whenever a viewer joins, say from Austin, the viewer connects to the Cloudflare location in Dallas, which then establishes a connection using the Cloudflare backbone to San Jose. This setup avoids unreliable long-distance connections and allows Cloudflare to monitor the reliability and latency of the video all the way from the broadcaster’s last mile to the viewer’s last mile.

Serverless, dynamic topology

With Cloudflare Stream, the latency of content from the source to the destination is purely dependent on the physical distance between them: with no centralized routing, each Cloudflare location talks to other Cloudflare locations and shares the video among each other. This results in the minimum possible latency regardless of the locale you are broadcasting from.

We’ve tested about 500ms of glass-to-glass latency from San Francisco to London, both from and to residential networks. If both the broadcaster and the viewers were in California, this number would be lower, simply because there would be less distance for the signal to travel at the speed of light. An early tester was able to achieve 300ms of latency by broadcasting using OBS via RTMPS to Cloudflare Stream and pulling down that content over SRT using ffplay.

Any server in the Cloudflare Anycast network can receive and publish low-latency video, which means that you’re automatically broadcasting to the nearest server with no configuration necessary. To minimize latency and avoid network congestion, we route video traffic between broadcaster and audience servers using the same network telemetry as Argo.

On top of this, we construct a dynamic distribution topology, unique to the stream, which grows to meet the capacity needs of the broadcast. We’re just getting started with low-latency video, and we will continue to focus on latency and playback reliability as our real-time video features grow.

An HDMI cable to the cloud

Most video on the Internet is delivered over HTTP, the protocol used for loading websites in your browser. This has many advantages, such as interoperability that is easy to achieve across viewer devices. Maybe more importantly, HTTP can use existing infrastructure like caches, which reduces the cost of video delivery.

Using HTTP has a latency cost, as it is not a protocol built to deliver video. There have been many attempts to deliver low-latency video over HTTP, with some reducing latency to a few seconds, but none reach the levels achievable by protocols designed with video in mind. WebRTC and video delivery over QUIC have the potential to further reduce latency, but face inconsistent support across platforms today.

Video-oriented protocols, such as RTMPS and SRT, side-step some of the challenges above but often require custom client libraries and are not available in modern web browsers. While we now support low latency video today over RTMPS and SRT, we are actively exploring other delivery protocols.

There’s no silver bullet yet, and our goal is to make video delivery as easy as possible by supporting the set of protocols that enables our customers to meet their unique and creative needs. Today that can mean receiving RTMPS and delivering low-latency SRT, or ingesting SRT while publishing HLS. In the future, that may include ingesting WebRTC or publishing over QUIC or HTTP/3 or WebTransport. There are many interesting technologies on the horizon.

We’re excited to see new use cases emerge as low-latency video becomes easier to integrate and less costly to manage. A remote cycling instructor can ask her students to slow down in response to an increase in heart rate; an esports league can effortlessly repeat their live feed to remote broadcasters to provide timely, localized commentary while interacting with their audience.

Creative uses of low latency video

Viewer experience at events like a concert or a sporting event can be augmented with live video delivered in real time to participants’ phones. This way they can experience the event in real time and see the goal scored or the details of what’s happening on the stage.

Often in big cities, you can hear neighbors across the street cheer a goal before you see it scored on your own screen. This lag can be eliminated when every video screen shows the same content at the same time.

Think of esports games, large company meetings, or conferences where presenters or commentators react in real time to comments in chat. The delay between a fan making a comment and seeing the reaction on the video stream can be eliminated.

Online exercise bikes can provide even more relevant and timely feedback from the live instructors, adding to the sense of community developed while riding them.

Participants in esports streams can be switched from a passive viewer to an active live participant easily as there is no delay in the broadcast.

Security cameras can be monitored from anywhere in the world without having to open ports or set up centralized servers to receive and relay video.

Getting Started

Get started by using your existing inputs on Cloudflare Stream. Without the need to reconnect, they will be available instantly for playback with the RTMPS/SRT playback URLs.

If you don’t have any inputs on Stream, sign up for $5/mo. You will get the ability to push live video, broadcast, record and now pull video with sub-second latency.

You will need to use a computer program like FFmpeg or OBS to push video. For playback, you can use VLC for RTMPS and FFplay for SRT. To integrate playback in your native app, you can utilize FFmpeg wrappers for native apps such as ffmpeg-kit for iOS.
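For example, a push/pull round trip from the command line might look something like the sketch below. The endpoints, port, key, and SRT parameters are placeholders; use the exact URLs shown on your live input’s page:

# Push a local file to your live input over RTMPS (placeholder stream key).
ffmpeg -re -i input.mp4 -c copy -f flv rtmps://live.cloudflare.com:443/live/<STREAM_KEY>

# Pull the same broadcast back with sub-second latency over SRT.
ffplay 'srt://live.cloudflare.com:778?passphrase=<PASSPHRASE>&streamid=<STREAM_ID>'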

RTMPS and SRT playback work with the recently launched custom domain support, so you can use the domain of your choice and keep your branding.

Bring your own ingest domain to Stream Live

Post Syndicated from Zaid Farooqui original https://blog.cloudflare.com/bring-your-own-ingest-domain-to-stream-live/

The last two years have given rise to hundreds of live streaming platforms. Most live streaming platforms enable their creators to go live by providing them with a server and an RTMP/SRT key that they can configure in their broadcasting app.

Until today, even if your live streaming platform was called live-yoga-classes.com, your users would need to push the RTMPS feed to live.cloudflare.com. Starting today, every Stream account can configure its own domain in the Stream dashboard. And your creators can broadcast to a domain such as push.live-yoga-classes.com.

This feature is available to all Stream accounts, including self-serve customers at no additional cost. Every Cloudflare account with a Stream subscription can add up to five ingest domains.

Secure CNAMEing for live video ingestion

Cloudflare Stream only supports encrypted video ingestion using RTMPS and SRT protocols. These are secure protocols and, similar to HTTPS, ensure encryption between the broadcaster and Cloudflare servers. Unlike non-secure protocols like RTMP, secure RTMP (or RTMPS) protects your users from monster-in-the-middle attacks.

In an insecure world, you could simply CNAME a domain to another domain, regardless of whether you own the domain you are sending traffic to. Because Stream Live intentionally does not support insecure live streams, you cannot simply CNAME your domain to live.cloudflare.com. So we leveraged other Cloudflare products, such as Spectrum, to natively support custom-branded domains in the Stream Live product without making the live streams less private for your broadcasters.

Configuring Custom Domain for Live Ingestion

To begin configuring your custom domain, add the domain to your Cloudflare account as a regular zone.

[Image: Add a zone to your Cloudflare account]

Next, CNAME the domain to live.cloudflare.com.

[Image: CNAME the zone to live.cloudflare.com]

Assuming you have a Stream subscription, visit the Inputs page and click on the Settings icon:

[Image: Click on Settings icon on Live Inputs page]

Next, add the domain you configured in the previous step as a Live Ingest Domain:

[Image: Add Custom Ingest Domain]

If your domain is successfully added, you will see a confirmation:

[Image: Confirmation of domain being added as an ingest domain]

Once you’ve added your ingest domain, test it by changing your existing configuration in your broadcasting software to your ingest domain. You can read the complete docs and limitations in the Stream Live developer docs.

What’s Next

Besides the branding upside — you don’t have to instruct your users to configure a domain such as live.cloudflare.com — custom domains help you avoid vendor lock-in and enable seamless migration. For example, if you have an existing live video pipeline that you are considering moving to Stream Live, this makes the migration one step easier, because you no longer have to ask your users to change any settings in their broadcasting app.

A natural next step is to support custom keys. Currently, your users must still use keys that are provided by Stream Live. Soon, you will be able to bring your own keys. Custom domains combined with custom keys will help you migrate to Stream Live with zero breaking changes for your end users.

Cloudflare Stream simplifies creator management for creator platforms

Post Syndicated from Ben Krebsbach original https://blog.cloudflare.com/stream-creator-management/

Creator platforms across the world use Cloudflare Stream to rapidly build video experiences into their apps. These platforms serve a diverse range of creators, enabling them to share their passion with their beloved audience. While working with creator platforms, we learned that many Stream customers track video usage on a per-creator basis in order to answer critical questions such as:

  • “Who are our fastest growing creators?”
  • “How much do we charge or pay creators each month?”
  • “What can we do more of in order to serve our creators?”

Introducing the Creator Property

Creator platforms enable artists, teachers and hobbyists to express themselves through various media, including video. We built Cloudflare Stream for these platforms, enabling them to rapidly build video use cases without needing to build and maintain a video pipeline at scale.

At its heart, every creator platform must manage ownership of user-generated content. When a video is uploaded to Stream, Stream returns a video ID. Platforms using Stream have traditionally had to maintain their own index to track content ownership. For example, when a user with internal user ID 83721759 uploads a video to Stream with video ID 06aadc28eb1897702d41b4841b85f322, the platform must maintain a database table of some sort to keep track of the fact that Stream video ID 06aadc28eb1897702d41b4841b85f322 belongs to internal user 83721759.

With the introduction of the creator property, platforms no longer need to maintain this index. Stream already has a direct creator upload feature to enable users to upload videos directly to Stream using tokenized URLs and without exposing account-wide auth information. You can now set the creator field with your user’s internal user ID at the time of requesting a tokenized upload URL:

curl -X POST "https://api.cloudflare.com/client/v4/accounts/023e105f4ecef8ad9ca31a8372d0c353/stream/direct_upload" \
     -H "X-Auth-Email: [email protected]" \
     -H "X-Auth-Key: c2547eb745079dac9320b638f5e225cf483cc5cfdda41" \
     -H "Content-Type: application/json" \
     --data '{"maxDurationSeconds":300,"expiry":"2021-01-02T02:20:00Z","creator": "<CREATOR_ID>", "thumbnailTimestampPct":0.529241,"allowedOrigins":["example.com"],"requireSignedURLs":true,"watermark":{"uid":"ea95132c15732412d22c1476fa83f27a"}}'

When the user uploads the video, the creator property will automatically be set to your internal user ID and can be leveraged for operations such as pulling analytics data for your creators.
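For context, the direct_upload response contains a one-time uploadURL (plus the new video’s uid); the browser then posts the file straight to Stream, and the creator value travels with the video. A sketch, where `response` is the API response from the call above and `fileInput` is an assumed <input type="file"> element:

// Parse the one-time upload URL returned by the direct_upload call.
const { result } = await response.json(); // { uploadURL: '...', uid: '...' }

// In the user's browser: upload the file directly to Stream. The resulting
// video is created with creator already set to your internal user ID.
const form = new FormData();
form.append('file', fileInput.files[0]);
await fetch(result.uploadURL, { method: 'POST', body: form });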

Query By Creator Property

Setting the creator property on your video uploads is just the beginning. You can now filter Stream Analytics via the Dashboard or the GraphQL API using the creator property.

[Image: Filter Stream Analytics in the Dashboard using the Creator property]

Previously, if you wanted to generate a monthly report of all your creators and the number of minutes of watch time recorded for their videos, you’d likely use a scripting language such as Python to do the following:

  1. Call the Stream GraphQL API requesting a list of videos and their watch time
  2. Traverse through the list of videos and query your internal index to determine which creator each video belongs to
  3. Sum up the video watch time for each creator to get a clean report showing you video consumption grouped by the video creator

The creator property eliminates this three step manual process. You can make a single API call to the GraphQL API to request a list of creators and the consumption of their videos for a given time period. Here is an example GraphQL API query that returns minutes delivered by creator:

query {
  viewer {
    accounts(filter: { accountTag: "<ACCOUNT_ID>" }) {
      streamMinutesViewedAdaptiveGroups(
        limit: 10
        orderBy: [sum_minutesViewed_DESC]
        filter: { date_gt: "2022-03-31", date_lt: "2022-05-01" }
      ) {
        sum {
          minutesViewed
        }
        dimensions {
          creator
        }
      }
    }
  }
}

Stream is focused on helping creator platforms innovate and scale. Matt Ober, CTO of NFT media management platform Piñata, says “By allowing us to upload and then query using creator IDs, large-scale analytics of Cloudflare Stream is about to get a lot easier.”

Getting Started

Read the docs to learn more about setting the creator property on new and previously uploaded videos. You can also set the creator property on live inputs, so the recorded videos generated from the live event will already have the creator field populated.

Being able to filter analytics is just the beginning. We can’t wait to enable more creator-level operations, so you can spend more time on what makes your idea unique and less time maintaining table stakes infrastructure.

Announcing D1: our first SQL database

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/introducing-d1/

We announced Cloudflare Workers in 2017, giving developers access to compute on our network. We were excited about the possibilities this unlocked, but we quickly realized — most real world applications are stateful. Since then, we’ve delivered KV, Durable Objects, and R2, giving developers access to various types of storage.

Today, we’re excited to announce D1, our first SQL database.

While the wait for beta access shouldn’t be long (we’ll start letting folks in as early as June; sign up here), we’re excited to share some details of what’s to come.

Meet D1, the database designed for Cloudflare Workers

D1 is built on SQLite. Not only is SQLite the most ubiquitous database in the world, used by billions of devices a day, it’s also the first ever serverless database. Surprised? SQLite was so ahead of its time, it dubbed itself “serverless” before the term became associated with cloud services, back when it literally meant “not involving a server”.

Since Workers itself runs between the server and the client, and was inspired by technology built for the client, SQLite seemed like the perfect fit for our first entry into databases.

So what can you build with D1? The true answer is “almost anything!”, but that might not be very helpful in triggering the imagination, so how about a live demo?

D1 Demo: Northwind Traders

You can check out an example of D1 in action by trying out our demo running here: northwind.d1sql.com.

If you’re wondering “Who are Northwind Traders?”, Northwind Traders is the “Hello, World!” of databases, if you will: a sample database that Microsoft provided alongside Microsoft Access as its own tutorial. It first appeared 25 years ago in 1997, and you’ll find many examples of its use on the Internet.

It’s a typical business application, with a realistic schema, many foreign keys, and many different tables — a truly timeless representation of data.

When was the recent order of Queso Cabrales shipped, and what ship was it on? You can quickly find out. Someone calling in about ordering some Chai? Good thing Exotic Liquids still has 39 units in stock, for just $18 each.
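For instance, the shipping question maps to a join across the classic Northwind tables (a sketch using the standard sample schema’s table and column names):

-- Find the most recent shipped order containing Queso Cabrales,
-- along with the shipping company that carried it.
SELECT o.OrderID, o.ShippedDate, s.CompanyName AS Shipper
FROM Orders o
JOIN "Order Details" od ON od.OrderID = o.OrderID
JOIN Products p ON p.ProductID = od.ProductID
JOIN Shippers s ON s.ShipperID = o.ShipVia
WHERE p.ProductName = 'Queso Cabrales'
ORDER BY o.ShippedDate DESC
LIMIT 1;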

We welcome you to play and poke around, and answer any questions you have about Northwind Traders’ business.

The Northwind Traders demo also features a dashboard where you can find details and metrics about the D1 SQL queries happening behind the scenes.

What can you build with D1?

Going back to our original question before the demo, however, what can you build with D1?

While you may not be running Northwind Traders yourself, you’re likely running a very similar piece of software somewhere. Even at the very core of Cloudflare’s service is a database. A SQL database filled with tables, materialized views and a plethora of stored procedures. Every time a customer interacts with our dashboard they end up changing state in that database.

The reality is that databases are everywhere. They are inside the web browser you’re reading this on, inside every app on your phone, and the storage for your bank transaction, travel reservations, business applications, and on and on. Our goal with D1 is to help you build anything from APIs to rich and powerful applications, including eCommerce sites, accounting software, SaaS solutions, and CRMs.

You can even combine D1 with Cloudflare Access and create internal dashboards and admin tools that are securely locked to only the people in your organization. The world, truly, is your oyster.

The D1 developer experience

We’ll talk about the capabilities, and upcoming features further down in the post, but at the core of it, the strength of D1 is the developer experience: allowing you to go from nothing to a full stack application in an instant. Think back to a tool you’ve used that made development feel magical — that’s exactly what we want developing with Workers and D1 to feel like.

To give you a sense of it, here’s what getting started with D1 will look like.

Creating your first D1 database

With D1, you will be able to create a database in just a few clicks: define the tables, then insert or upload some data. No need to memorize any commands unless you want to.

Of course, if the command line is your jam, we’ve got you covered too: earlier this week, we announced the new and improved Wrangler 2, the best tool for wrangling and deploying your Workers. Wrangler will also come with native D1 support, so you can create and manage databases with a few simple commands.
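
The final syntax may change before launch, but based on Wrangler’s existing conventions we expect it to look roughly like this (illustrative only):

$ wrangler d1 create northwind
$ wrangler d1 execute northwind --file=./schema.sql
$ wrangler d1 execute northwind --command="SELECT COUNT(*) FROM Product;"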

Accessing D1 from your Worker

Attaching D1 to your Worker is as easy as creating a new binding. Each D1 database that you attach to your Worker gets attached with its own binding on the env parameter:

export default {
  async fetch(request, env, ctx) {
    const { pathname } = new URL(request.url)
    if (pathname === '/num-products') {
      // env.DB is the D1 binding configured for this Worker.
      const { result } = await env.DB.get(`SELECT count(*) AS num_products FROM Product;`)
      return new Response(`There are ${result.num_products} products in the D1 database!`)
    }
    // Fall through for any other path so the Worker always returns a Response.
    return new Response('Not found', { status: 404 })
  }
}
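
The binding itself will live in your wrangler.toml. The exact D1 stanza wasn’t final at the time of writing, but based on how other bindings are configured, expect something along these lines (names here are placeholders):

[[d1_databases]]
binding = "DB"                     # exposed to your Worker as env.DB
database_name = "northwind"
database_id = "<id returned when you create the database>"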

Or, for a slightly more complex example, you can safely pass parameters from the URL to the database using a Router and parameterised queries:

import { Router } from 'itty-router';
const router = Router();

router.get('/product/:id', async ({ params }, env) => {
  const { result } = await env.DB.get(
    `SELECT * FROM Product WHERE ID = $id;`,
    { $id: params.id }
  )
  return new Response(JSON.stringify(result), {
    headers: {
      'content-type': 'application/json'
    }
  })
})

export default {
  fetch: router.handle,
}

So what can you expect from D1?

First and foremost, we want you to be able to develop with D1, without having to worry about cost.

At Cloudflare, we don’t believe in keeping your data hostage, so D1, like R2, will be free of egress charges. Our plan is to price D1 the way we price our storage products: by charging for base storage plus the database operations performed.

But, again, we don’t want our customers worrying about cost, or about what happens if their business takes off and they need more storage or have more activity. We want you to be able to build applications as simple or complex as you can dream up. We will ensure that D1 costs less and performs better than comparable centralized solutions. The promise of serverless, and of a global network like Cloudflare’s, is better performance and lower cost, driven by our architecture.

Here’s a small preview of the features in D1.

Read replication

With D1, we want to make it easy to store your whole application’s state in one place, so you can perform arbitrary queries across the full data set. That’s what makes relational databases so powerful.

However, we don’t think powerful should be synonymous with cumbersome. Most relational databases are huge, monolithic things, and configuring replication isn’t trivial, so in general most systems are designed so that all reads and writes flow back to a single instance. D1 takes a different approach.

With D1, we want to take configuration off your hands, and take advantage of Cloudflare’s global network. D1 will create read-only clones of your data, close to where your users are, and constantly keep them up-to-date with changes.

Batching

Many operations in an application don’t just generate a single query. If your logic is running in a Worker near your user, but each of these queries needs to execute on the database, then sending them across the wire one-by-one is extremely inefficient.

D1’s API includes batching: anywhere you can send a single SQL statement you can also provide an array of them, meaning you only need a single HTTP round-trip to perform multiple operations. This is perfect for transactions that need to execute and commit atomically:

async function recordPurchase(env, userId, productId, amount) {
  // Both statements are sent in a single round-trip and commit atomically.
  const result = await env.DB.exec([
    [
      `UPDATE users SET balance = balance - $amount WHERE user_id = $user_id`,
      { $amount: amount, $user_id: userId },
    ],
    [
      `UPDATE product SET total_sales = total_sales + $amount WHERE product_id = $product_id`,
      { $amount: amount, $product_id: productId },
    ],
  ])
  return result
}
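
Under the API sketched above, and with env passed through explicitly (the helper runs outside the fetch handler, so it has no ambient env), calling it from a Worker might look like this:

export default {
  async fetch(request, env) {
    // Assume the client POSTs a JSON body like {"userId": 1, "productId": 2, "amount": 10}.
    const { userId, productId, amount } = await request.json()
    const result = await recordPurchase(env, userId, productId, amount)
    return new Response(JSON.stringify(result), {
      headers: { 'content-type': 'application/json' },
    })
  },
}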

Embedded compute

But we’re going further. With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database, giving you total control and maximum performance: each request first hits your Worker near your users, then, depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work.

Backups and redundancy

There are few things as critical as the data stored in your main application’s database, so D1 will automatically save snapshots of your database to Cloudflare’s cloud storage service, R2, at regular intervals, with a one-click restoration process. And, since we’re building on the redundant storage of Durable Objects, your database can physically move locations as needed, resulting in self-healing from even the most catastrophic problems in seconds.

Importing and exporting data

While D1 already supports the SQLite API, making it easy for you to write your queries, you might also need data to run them on. If you’re not creating a brand-new application, you may want to import an existing dataset from another source or database, which is why we’ll be working on allowing you to bring your own data to D1.

Likewise, one of SQLite’s advantages is its portability. If your application has a dedicated staging environment, say, you’ll be able to clone a snapshot of that data down to your local machine to develop against. And we’ll be adding more flexibility, such as the ability to create a new database with a set of test data for each new pull request on your Pages project.

What’s next?

This wouldn’t be a Cloudflare announcement if we didn’t conclude with “we’re just getting started!” — and it’s true! We are really excited about all the possibilities a database on our global network opens up.

Are you already thinking about what you’re going to build with D1 and Workers? Same. Give us your details, and we’ll give you access as soon as we can — look out for a beta invite from us starting as early as June 2022!

Logs on R2: slash your logging costs

Post Syndicated from Tanushree Sharma original https://blog.cloudflare.com/logs-r2/

Hot on the heels of the R2 open beta announcement, we’re excited that Cloudflare enterprise customers can now use Logpush to store logs on R2!

Raw logs from our products are used by our customers for debugging performance issues, investigating security incidents, maintaining security standards for compliance, and much more. You shouldn’t have to make tradeoffs between keeping the logs you need and managing tight budgets. With R2’s low costs, we’re making this decision easier for our customers!

Getting into the numbers

Cloudflare helps customers at different levels of scale — from a few requests per day, up to a million requests per second. Because of this, the cost of log storage also varies widely. For customers with higher-traffic websites, log storage costs can grow large, quickly.

As an example, imagine a website that gets 100,000 requests per second. This site would generate about 9.2 TB of HTTP request logs per day, or 850 GB/day after gzip compression. Over a month, that adds up to about 26 TB of compressed HTTP logs (850 GB/day × ~30 days).

For a typical use case, imagine that you write and read the data exactly once – for example, you might write the data to object storage before ingesting it into an alerting system. Compare the costs of R2 and S3 (note that this excludes costs per operation to read/write data).

Provider                 Storage price   Data transfer price                         Total cost (data read once)
R2                       $0.015/GB       $0                                          $390/month
S3 (Standard, US East)   $0.023/GB       $0.09/GB for first 10 TB; then $0.085/GB    $2,858/month

In this example, R2 leads to 86% savings! It’s worth noting that querying logs is where another hefty price tag comes in: Amazon Athena, for instance, charges based on the amount of data scanned, so if your team is looking back through historical data, each query can cost hundreds of dollars.

Many of our customers have tens to hundreds of domains behind Cloudflare and the majority of our Enterprise customers also use multiple Cloudflare products. Imagine how costs will scale if you need to store HTTP, WAF and Spectrum logs for all of your Internet properties behind Cloudflare.

For SaaS customers that are building the next big thing on Cloudflare, logs are important to get visibility into customer usage and performance. Your customer’s developers may also want access to raw logs to understand errors during development and to troubleshoot production issues. Costs for storing logs multiply and add up quickly!

The flip side: log retrieval

When designing products, one of Cloudflare’s core principles is ease of use. We take on the complexity, so you don’t have to. Storing logs is only half the battle; you also need to be able to access the relevant logs when you need them, whether in the heat of an incident or when doing an in-depth analysis.

Our product, Logpull, offers seven days of log retention and an easy-to-use API to access them. Our customers love that Logpull doesn’t require any third-party setup, since it’s completely managed by Cloudflare. However, Logpull is limited in how long it retains logs, the type of logs it stores (only HTTP request logs), and the amount of data that can be queried at one time.

We’re building tools for log retrieval that make it super easy to get your data out of R2 for any of our datasets. Similar to Logpull, we’ll start by supporting lookups by time period and Ray ID. From there, we’ll tackle more complex queries, like returning all logs between time X and Y that have 500 errors, or where the WAF action was “block”.

We’re looking for customers to join a closed beta for our Log Retrieval API. If you’re interested in testing it out, giving feedback, and ultimately helping us shape the product, sign up here.

Logs on R2: How to get started

Enterprise customers first need to get R2 added to their contract. Reach out to your account team if this is something you’re interested in! Once enabled, create an R2 bucket for your logs and follow the Logpush setup flow to create your job.

It’s that simple! If you have questions, our Logpush to R2 developer docs go into more detail.
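
If you’d rather script the setup than click through the dashboard, Logpush jobs can also be created through the API. The request below is a sketch, with placeholders for you to fill in; see the Logpush to R2 developer docs for the exact destination_conf parameters:

curl -X POST "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/logpush/jobs" \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  --data '{
    "name": "http-requests-to-r2",
    "dataset": "http_requests",
    "destination_conf": "r2://<BUCKET>/http_requests?account-id=<ACCOUNT_ID>&access-key-id=<R2_ACCESS_KEY_ID>&secret-access-key=<R2_SECRET_ACCESS_KEY>",
    "logpull_options": "fields=ClientIP,EdgeStartTimestamp,RayID&timestamps=rfc3339"
  }'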

More to come

We’re continuing to build out more advanced Logpush features with a focus on customization. Here’s a preview of what’s next on the roadmap:

  • New datasets: Network Analytics Logs, Workers Invocation Logs
  • Log filtering
  • Custom log formatting

We also have exciting plans to build out log analysis and forensics capabilities on top of R2. We want to make log storage tightly coupled to the Cloudflare dash so you can see high level analytics and drill down into individual log lines all in one view. Stay tuned to the blog for more!

Introducing Cache Reserve: massively extending Cloudflare’s cache

Post Syndicated from Alex Krivit original https://blog.cloudflare.com/introducing-cache-reserve/

One hundred percent. 100%. One-zero-zero. That’s the cache ratio we’re all chasing. Having a high cache ratio means that more of a website’s content is served from a Cloudflare data center close to where a visitor is requesting the website. Serving content from Cloudflare’s cache means it loads faster for visitors, saves website operators money on egress fees from origins, and provides multiple layers of resiliency and protection to make sure that content is always available to be served.

Today, I’m delighted to announce a massive extension of the benefits of caching with Cache Reserve: a new way to persistently serve all static content from Cloudflare’s global cache. By using Cache Reserve, customers can see higher cache hit ratios and lower egress bills.

Why is getting a 100% cache ratio difficult?

Every second, Cloudflare serves tens of millions of requests from our cache, which equates to multiple terabytes per second of cached data delivered to website visitors around the world. At this massive scale, we must ensure that the most requested content is cached in the areas where it is most popular. Otherwise, visitors might wait too long for content to be delivered from farther away, and our network would run inefficiently. If cache storage in a certain region is full, our network avoids imposing these inefficiencies on our customers by evicting less-popular content from the data center and replacing it with more-requested content.

This works well for the majority of use cases, but all customers have long tail content that is rarely requested and may be evicted from cache. This can be a cause of concern for customers, as this unpopular content can be a major cost driver if it is evicted repeatedly and needs to be served from an origin. This concern can be especially significant for customers with massive content libraries. So how can we make sure to keep this less popular content in cache to shield the customer from origin egress?

Cache Reserve removes customer content from this popularity contest and ensures that even if the specific content hasn’t been requested in months, it can still be served from Cloudflare’s cache – avoiding the need to pull it from the origin and saving the customer money on egress. Cache Reserve helps get customers closer to that 100% cache ratio and helps serve all of their content from our global CDN, forever.  

Why is cache eviction needed?

Most content served from our cache starts its journey from an origin server, where the content is hosted. To be admitted to Cloudflare’s cache, content sent from the origin must meet certain eligibility criteria that ensure it can be reused to respond to other requests for a website (i.e., content that doesn’t change based on who is visiting the site).

After content is admitted to cache, the next question is how long it should remain there. Since cache ratios are calculated by taking the number of requests for content and identifying the portion answered from a cache server instead of an origin server, ensuring content remains cached in the areas where it is highly requested is paramount to achieving a high cache ratio.

Some CDNs use a pay-to-play model that allows customers to pay more money to ensure content is cached in certain areas for some length of time. At Cloudflare, we don’t charge customers based on where or for how long something is cached. This means that we have to use signals other than a customer’s willingness to pay to make sure that the right content is cached for the right amount of time and in the right areas.

Where to cache a piece of content is pretty straightforward (wherever it’s being requested); how long content should remain in cache is highly variable.

Beyond headers like cache-control or cdn-cache-control, which help determine how long a customer wants something to be served from cache, the other element that CDNs must consider is whether they need to evict content early to optimize storage of more popular assets. We do eviction based on an algorithm called “least recently used” or LRU. This means that the least-requested content can be evicted from cache first to make space for more popular content when storage space is full.

This caching strategy requires keeping track of a lot of information about when requests come in, and constantly updating the cache so that the hottest content is kept and the least popular content is evicted. This works well, and is fair, for the wide array of customers our CDN supports.
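
To make the idea concrete, here is a toy LRU cache in JavaScript. This is purely an illustration of the eviction policy, not Cloudflare’s actual implementation, which has to track popularity across a distributed fleet:

class LRUCache {
  constructor(capacity) {
    this.capacity = capacity
    this.map = new Map() // Map preserves insertion order: oldest entries come first.
  }
  get(key) {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key)
    // Re-insert to mark this key as most recently used.
    this.map.delete(key)
    this.map.set(key, value)
    return value
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key)
    this.map.set(key, value)
    // Evict the least recently used entry when over capacity.
    if (this.map.size > this.capacity) {
      const oldestKey = this.map.keys().next().value
      this.map.delete(oldestKey)
    }
  }
}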

However, if a customer has a large library of content that might go through cycles of popularity and which they’d like to serve from cache regardless, then LRU might mean additional origin egress as assets that are requested sparingly over a long time frame are pulled more from the origin.    

That’s where Cache Reserve comes in. Cache Reserve is not an alternative to our popularity-based cache but a complement to it. By backstopping all cacheable content in Cache Reserve, customers no longer have to worry about cache eviction or ephemerality.

Cache Reserve

Cache Reserve is a large, persistent data store that is implemented on top of R2. By pushing a single button in the dashboard, all of your website’s cacheable content will be written to Cache Reserve. In the same way that Tiered Cache builds a hierarchy of caches between your visitors and your origin, Cache Reserve serves as the ultimate upper-tier cache that will reserve storage space for your assets for as long as you want. This ensures that your content is always served from cache, shielding your origin from unneeded egress fees, and improving response performance.

How Does Cache Reserve Work?

Cache Reserve sits between our edge data centers and your origin and provides guaranteed SLAs for how long your content can remain in cache.

As content is pulled from the origin, it will be written to Cache Reserve, followed by upper-tier data centers, and then lower-tier data centers, until it reaches the client to fulfill the request. Subsequent requests for the same content will not need to go all the way back to the origin and can, instead, be served from a cache closer to the visitor, improving both the performance and the cost of serving the assets. As content gets evicted from lower and upper tiers, it will be backstopped by Cache Reserve.

Cache Reserve voids the request-based eviction that’s implemented in LRU and ensures that assets will remain in cache as long as they are needed. Cache Reserve extends the benefits of Tiered Cache by reducing the number of times Cloudflare’s network needs to ask an origin for content we should have in cache, while simultaneously limiting the number of connections and requests that our data centers need to open to your origin to ask for missing content. Using Cache Reserve with tiered cache helps collapse the number of requests that result from multiple concurrent cache misses from lower-tiers for the same content.

As an example, let’s assume a cold request for example.com, something our network has never seen before. If a client request comes into the closest lower-tier data center and misses, that lower tier is mapped to an upper-tier data center. When the lower tier asks the upper tier for the content and that also misses, the upper tier will ask Cache Reserve. Being the ultimate upper tier, Cache Reserve is then the only part of our network that will ask the origin for the content if it is not already stored on our network. This limits the origin resources you need to devote to serving this content: once it’s written to Cache Reserve, your origin doesn’t need to fan the content out to any other part of Cloudflare’s network.

When your content does need updating, Cache Reserve will respect cache-control headers and purge requests. This means that if you want to control how long something remains fresh in Cache Reserve, before Cloudflare goes back to your origin to revalidate the content, set it as a cache-control header and it will be respected without risk of early eviction. Or if you want to update content on the fly, you can send a purge request which will be respected in both Cloudflare’s cache and in Cache Reserve.
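
For example, serving an asset from your origin with a header like the one below would keep it fresh for 30 days before Cloudflare revalidates it with your origin:

cache-control: public, max-age=2592000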

How do you use Cache Reserve?

Currently, Cache Reserve is in closed beta. It’s available to anyone who wants to sign up, but we will be rolling it out to customers slowly over the coming weeks, so we can quickly triage edge cases and make fundamental improvements before we make it generally available to everyone.

To sign up for the Cache Reserve beta:

  • Simply go to the Caching tile in the dashboard.
  • Navigate to the Cache Reserve page and push the sign up button.

The Cache Reserve plan will mimic the low cost of R2: storage will be $0.015 per GB per month, and operations will be $0.36 per million reads and $4.50 per million writes. For more information about pricing, please refer to the R2 page to get a general idea (a dedicated Cache Reserve pricing page is coming soon).

Try it out!

Cache Reserve holds tremendous promise to increase cache hit ratios — which will improve the economics of running any website while speeding up visitors’ experiences. We’re excited to begin letting people use Cache Reserve soon. Be sure to check out the beta and let us know what you think.

Durable Objects Alarms — a wake-up call for your applications

Post Syndicated from Matt Alonso original https://blog.cloudflare.com/durable-objects-alarms/

Since we launched Durable Objects, developers have leveraged them as a novel building block for distributed applications.

Durable Objects provide globally unique instances of a JavaScript class a developer writes, accessed via a unique ID. The Durable Object associated with each ID implements some fundamental component of an application — a banking application might have a Durable Object representing each bank account, for example. The bank account object would then expose methods for incrementing a balance, transferring money or any other actions that the application needs to do on the bank account.
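
A minimal sketch of such a class might look like this (the class name, routes, and storage key are our own invention for illustration, not a prescribed API):

export class BankAccount {
  constructor(state, env) {
    this.storage = state.storage
  }

  // All requests for a given account ID reach this same instance,
  // so the balance can be read and updated without races.
  async fetch(request) {
    const { pathname, searchParams } = new URL(request.url)
    let balance = (await this.storage.get('balance')) ?? 0

    if (pathname === '/deposit') {
      balance += Number(searchParams.get('amount'))
      await this.storage.put('balance', balance)
    } else if (pathname === '/withdraw') {
      balance -= Number(searchParams.get('amount'))
      await this.storage.put('balance', balance)
    }

    return new Response(JSON.stringify({ balance }), {
      headers: { 'content-type': 'application/json' },
    })
  }
}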

Durable Objects work well as a stateful backend for applications — while Workers can instantiate a new instance of your code in any of Cloudflare’s data centers in response to a request, Durable Objects guarantee that all requests for a given Durable Object will reach the same instance on Cloudflare’s network.

Each Durable Object is single-threaded and has access to a stateful storage API, making it easy to build consistent and highly-available distributed applications on top of them.

This makes developing distributed systems easier, and we’ve seen some impressive applications launched atop Durable Objects, from collaborative whiteboarding tools to conflict-free replicated data type (CRDT) systems for coordinating distributed state.

However, up until now, there’s been a piece missing — how do you invoke a Durable Object when a client Worker is not making requests to it?

As with any distributed system, Durable Objects can become unavailable and stop running. Perhaps the machine you were running on was unplugged, or the datacenter burned down and is never coming back, or an individual object exceeded its memory limit and was reset. Before today, a subsequent request would reinitialize the Durable Object on another machine, but there was no way to programmatically wake up an Object.

Durable Objects Alarms are here to change that, unlocking new use cases for Durable Objects like queues and deferred processing.

What is a Durable Object Alarm?

Durable Object Alarms allow you, from within your Durable Object, to schedule the object to be woken up at a time in the future. When the alarm’s scheduled time comes, the Durable Object’s alarm() handler will be called. If this handler throws an exception, the alarm will be automatically retried using exponential backoff until it succeeds — alarms have guaranteed at-least-once execution.

How are Alarms different from Workers Cron Triggers?

Alarms are more fine-grained than Cron Triggers. While a Workers service can have up to three Cron Triggers configured at once, it can have an unlimited number of Durable Objects, each of which can have a single alarm active at a time.

Alarms are directly scheduled from and invoke a function within your Durable Object. Cron Triggers, on the other hand, are not programmatic — they execute based on their schedules, which have to be configured via the Cloudflare Dashboard or centralized configuration APIs.

How do I use Alarms?

First, you’ll need to add the durable_object_alarms compatibility flag to your wrangler.toml.

compatibility_flags = ["durable_object_alarms"]

Next, implement an alarm() handler in your Durable Object that will be called when the alarm executes. From anywhere else in your Durable Object, call state.storage.setAlarm() and pass in a time for the alarm to run at. You can use state.storage.getAlarm() to retrieve the currently set alarm time.

In this example, we implemented an alarm handler that wakes the Durable Object up once every 10 seconds to batch requests to a single Durable Object, deferring processing until there is enough work in the queue for it to be worthwhile to process them.

export default {
  async fetch(request, env) {
    let id = env.BATCHER.idFromName("foo");
    return await env.BATCHER.get(id).fetch(request);
  },
};

const SECONDS = 1000;

export class Batcher {
  constructor(state, env) {
    this.state = state;
    this.storage = state.storage;
    this.state.blockConcurrencyWhile(async () => {
      let vals = await this.storage.list({ reverse: true, limit: 1 });
      this.count = vals.size == 0 ? 0 : parseInt(vals.keys().next().value);
    });
  }
  async fetch(request) {
    this.count++;

    // If there is no alarm currently set, set one for 10 seconds from now
    // Any further POSTs in the next 10 seconds will be part of this batch.
    let currentAlarm = await this.storage.getAlarm();
    if (currentAlarm == null) {
      this.storage.setAlarm(Date.now() + 10 * SECONDS);
    }

    // Add the request to the batch.
    await this.storage.put(this.count, await request.text());
    return new Response(JSON.stringify({ queued: this.count }), {
      headers: {
        "content-type": "application/json;charset=UTF-8",
      },
    });
  }
  async alarm() {
    let vals = await this.storage.list();
    await fetch("http://example.com/some-upstream-service", {
      method: "POST",
      body: Array.from(vals.values()),
    });
    await this.storage.deleteAll();
    this.count = 0;
  }
}

Once every 10 seconds, the alarm() handler will be called. In the event an unexpected error terminates the Durable Object, it will be re-instantiated on another machine, following a short delay, after which it can continue processing.

Under the hood, Alarms are implemented by making reads and writes to the storage layer. This means Alarm get and set operations follow the same rules as any other storage operation – writes are coalesced with other writes, and reads have a defined ordering. See our blog post on the caching layer we implemented for Durable Objects for more information.

Durable Objects Alarms guarantee fault-tolerance

Alarms are designed to have no single point of failure and to run entirely on our edge – every Cloudflare data center running Durable Objects is capable of running alarms, including migrating Durable Objects from unhealthy data centers to healthy ones as necessary to ensure that their Alarm executes. Single failures should resolve in under 30 seconds, while multiple failures may take slightly longer.

We achieve this by storing alarms in the same distributed datastore that backs the Durable Object storage API. This allows alarm reads and writes to behave identically to storage reads and writes and to be performed atomically with them, and ensures that alarms are replicated across multiple datacenters.

Within each data center capable of running Durable Objects, there are multiple processes responsible for tracking upcoming alarms and triggering them, providing fault tolerance and scalability within the data center. A single elected leader in each data center is responsible for detecting failure of other data centers and assigning responsibility of those alarms to healthy local processes in its own data center. In the event of leader failure, another leader will be elected and become responsible for executing Alarms in the data center. This allows us to guarantee at-least-once execution for all Alarms.

How do I get started?

Alarms are a great way to build new distributed primitives, like queues, atop Durable Objects. They also provide a method for guaranteeing work within a Durable Object will complete, without relying on a client request to “kick” the Object.

You can get started with Alarms now by enabling Durable Objects in the Cloudflare dashboard. For more info, check the developer docs or jump in our Discord.

A New Hope for Object Storage: R2 enters open beta

Post Syndicated from Greg McKeon original https://blog.cloudflare.com/r2-open-beta/

In September, we announced that we were building our own object storage solution: Cloudflare R2. R2 is our answer to egregious egress charges from incumbent cloud providers, letting developers store as much data as they want without worrying about the cost of accessing that data.

The response has been overwhelming.

  • Independent developers had bills too small for cloud providers to negotiate fair egress rates with them. Egress charges were the largest line-item on their cloud bills, strangling side projects and the new businesses they were building.
  • Large corporations had written off multi-cloud storage – and thus multi-cloud itself – as a pipe dream. They came to us with excitement, pitching new products that integrated data with partner companies.
  • Non-profit research organizations were paying massive egress fees just to share experiment data with one another. Egress fees were having a real impact on their ability to collaborate, driving silos between organizations and restricting the experiments and analyses they could run.

Cloudflare exists to help build a better Internet. Today, the Internet gets what it deserves: R2 is now in open beta.

Self-serve customers can enable R2 in the Cloudflare dashboard. Enterprise accounts can reach out to their CSM for onboarding.

Internal and external APIs

R2 has two APIs: an API accessible only from within Workers, which we call the In-Worker API, and an S3-compatible API, which exposes your bucket at a URL of the form https://<account_id>.r2.cloudflarestorage.com/<bucket> (matching the endpoint used in the script below). Before you can make requests to R2, you’ll need to be authenticated — R2 buckets are private by default.

In-Worker API

With the in-Worker API, a bucket is “bound” to a specific Worker, which can then perform PUT, GET, DELETE and LIST operations against the bucket.
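
Here is a minimal sketch of those operations from inside a Worker, assuming a bucket has been bound to the Worker as MY_BUCKET:

export default {
  async fetch(request, env) {
    // Use the URL path (minus the leading slash) as the object key.
    const key = new URL(request.url).pathname.slice(1)

    switch (request.method) {
      case 'PUT':
        await env.MY_BUCKET.put(key, request.body)
        return new Response(`Put ${key} successfully!`)
      case 'GET': {
        const object = await env.MY_BUCKET.get(key)
        if (object === null) {
          return new Response('Object Not Found', { status: 404 })
        }
        return new Response(object.body)
      }
      case 'DELETE':
        await env.MY_BUCKET.delete(key)
        return new Response('Deleted!')
      default:
        return new Response('Method Not Allowed', { status: 405 })
    }
  },
}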

S3-compatible API

For the S3-compatible API, authentication is done the same way as on S3: SigV4 against an R2 URL. SigV4 signs requests using a secret key to authenticate them to R2. This means public access to R2 over the Internet is only possible today by hosting a Worker, connecting it to R2, and routing requests through it.

The easiest way to test the S3-compatible API is to use an S3 client. One of the most popular S3 clients is the boto3 SDK.

In Python, copy the following script and fill in the account_id, access_key_id, and secret_access_key variables with your R2 account credentials.

#!/usr/bin/env python
import boto3
import pprint
from botocore.client import Config

# Fill these in with your R2 account credentials.
account_id = ''
access_key_id = ''
secret_access_key = ''
endpoint = f'https://{account_id}.r2.cloudflarestorage.com'

cl = boto3.client(
    's3',
    aws_access_key_id=access_key_id,
    aws_secret_access_key=secret_access_key,
    endpoint_url=endpoint,
    config=Config(
        region_name='auto',  # R2 does not use AWS regions
        s3={'addressing_style': 'path'},
        retries=dict(max_attempts=0),
    ),
)

printer = pprint.PrettyPrinter().pprint

printer(cl.head_bucket(Bucket='some-bucket'))
printer(cl.create_bucket(Bucket='some-other-bucket'))
printer(cl.put_object(Bucket='some-bucket', Key='my object', Body='some payload'))

Features

R2 comes with support for all basic create/read/update/delete S3 features through both of its APIs.

During the open beta period, we’re targeting R2 to sustain 1,000 GET operations per second and 100 PUT operations per second, per bucket. R2 supports objects up to approximately 5 TB in size, with individual parts limited to 5 GB of data.

R2 provides strongly consistent access to data. Once a PUT is confirmed by R2, future GET operations will always reflect the new key/value pair. The only exception to this is when deleting a bucket. For a short period of time following deletion, the bucket may still exist and continue to allow reads/writes.

Pricing

When we initially announced R2, we included preliminary pricing numbers. One of our main goals with R2 has been to serve the developers who can’t negotiate large discounts with cloud vendors. To that end, we’re also announcing a forever-free tier that lets developers start building on R2 with no charges at all.

R2 charges depend on the total volume of data stored and the type of operation performed on the data:

  • Storage is priced at $0.015 / GB, per month.
  • Class A operations (including writes and lists) cost $4.50 / million.
  • Class B operations cost $0.36 / million.

Class A operations tend to mutate state, such as creating a bucket, listing objects in a bucket, or writing an object. Class B operations tend to read existing state, for example reading an object from a bucket. You can find more information on pricing and a full list of operation types in the docs.

Of course, there is no charge for egress bandwidth from R2. You can access your bucket to your heart’s content.

R2’s forever-free tier includes:

  • 10 GB-months of stored data
  • 1,000,000 Class A operations, per month
  • 10,000,000 Class B operations, per month

Free usage resets each month. While in the open beta phase, R2 usage over the free tier will be billed.

Future plans

We’ve spent the past six months in closed beta with a number of design partners, building out our storage solution. Backed by Durable Objects, R2’s novel architecture delivers both high availability and consistent performance.

While we’ve made great progress on R2, we still have plenty left to build in the coming months.

Improving performance

Our first priority is to improve performance and reliability. While we’ve thrown internal usage and our design partners’ demands at R2, there’s no substitute for live production traffic.

During the open beta period, R2 can sustain a maximum of 1,000 GET operations per second and 100 PUT operations per second, per bucket. We’ll look to raise these limits as we get comfortable operating the system. If you have higher needs, reach out to us!

When you create a bucket, you won’t see a region selector. Our vision for R2 includes automatically globally distributed storage, where R2 seamlessly places each object into the storage region closest to where the request comes from. Today, R2 primarily stores data in North America, which can lead to higher latencies when accessing content from other regions. We’ll first look to address this by adding additional regions where objects can be created, before adding automatic migration of existing objects across regions. Similar to what we’ve built with jurisdictional restrictions for Durable Objects, we’ll also enable restricting where an R2 bucket places data to comply with privacy regulations.

Expanding R2’s feature set

We’ll then focus on expanding R2 capabilities beyond the basic S3 API. In the near term, we’re focused on delivering:

  • Support for TTLs, so data can automatically be deleted from buckets over time.
  • Public buckets, so a bucket can be exposed to the Internet without writing a Worker.
  • Pre-signed URL support, which delegates read and write access for a specific key to a token (see the sketch after this list).
  • Integration with Cloudflare’s cache, to scale read requests and provide global distribution of data.
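
To give a flavor of pre-signed URLs, here is how you would generate one with boto3 against an S3-compatible endpoint, reusing the client cl from the script above (a sketch; R2-side support for honoring these URLs is what’s still to come):

# Generate a URL that grants time-limited read access to one object.
url = cl.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'some-bucket', 'Key': 'my object'},
    ExpiresIn=3600,  # seconds the URL remains valid
)
print(url)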

If you have additional feature requests that aren’t listed above, we want to hear from you! Reach out and let us know what you need to make R2 your new, zero-cost egress object store.

Cloudflare Relay Worker

Post Syndicated from Matt Boyle original https://blog.cloudflare.com/cloudflare-relay-worker/

Our Notification Center offers first class support for a variety of popular services (a list of which are available here). However, even with such extensive support, you may use a tool that isn’t on that list. In that case, it is possible to leverage Cloudflare Workers in combination with a generic webhook to deliver notifications to any service that accepts webhooks.

Today, we are excited to announce that we are open sourcing a Cloudflare Worker that will make it as easy as possible for you to transform our generic webhook response into any format you require. Here’s how to do it.

For this example, we are going to write a Cloudflare Worker that takes a generic webhook response, transforms it into the correct format and delivers it to Rocket Chat, a popular customer service messaging platform.  When Cloudflare sends you a generic webhook, it will have the following schema, where “text” and “data” will vary depending on the alert that has fired:

{
   "name": "Your custom webhook",
   "text": "The alert text",
   "data": {
       "some": "further",
       "info": [
           "about",
           "your",
           "alert",
           "in"
       ],
       "json": "format"
   },
   "ts": 123456789
}

Whereas Rocket Chat is looking for this format:

{
   "text": "Example message",
   "attachments": [
       {
           "title": "Rocket Chat",
           "title_link": "https://rocket.chat",
           "text": "Rocket.Chat, the best open source chat",
           "image_url": "/images/integration-attachment-example.png",
           "color": "#764FA5"
       }
   ]
}

Getting Started

Firstly, you’ll need to ensure you are ready to develop on the Cloudflare Workers platform. You can find more information on how to do that here. For the purpose of this example, we will assume you have a Cloudflare account and Wrangler, the Workers CLI, setup.

Next, let us see the steps to extend the notifications system in detail.

Step 1
Clone the webhook relay worker GitHub repository: git clone git@github.com:cloudflare/cf-webhook-relay.git

Step 2
Check the webhook payload format required by your communication tool. In this specific case, it would look like the Rocket Chat example payload shared above.

Step 3
Sign up for Rocket Chat and add a webhook integration to accept incoming webhook notifications.

Step 4
Configure a Wrangler secret for request authentication, and the Rocket Chat URL for sending requests, in your Worker (see Environment variables in the Cloudflare Workers docs; in this example, the URL is stored as a plain variable rather than an encrypted secret).

Step 5
Modify your Worker to accept POST webhook requests, using the secret (sent by Cloudflare in the cf-webhook-auth header) for authentication.

if (request.headers.get("cf-webhook-auth") !== WEBHOOK_SECRET) {
    // Reject requests that don't carry the shared secret.
    return new Response(":(", {
        headers: {'content-type': 'text/plain'},
        status: 401
    })
}

Step 6
Convert the incoming request payload from the notification system (like in the example shared above) to the Rocket Chat format in the worker.

// Parse the incoming generic webhook payload.
let incReq = await request.json()
let msg = incReq.text
let webhookName = incReq.name
// Reshape it into the payload format Rocket Chat expects.
let rocketBody = {
    "text": webhookName,
    "attachments": [
        {
            "title": "Cloudflare Webhook",
            "text": msg,
            "title_link": "https://cloudflare.com",
            "color": "#764FA5"
        }
    ]
}

Step 7
Configure the Worker to send POST requests to the Rocket Chat webhook with the converted payload.

const rocketReq = {
    headers: {
        'content-type': 'application/json',
    },
    method: 'POST',
    body: JSON.stringify(rocketBody),
}
const response = await fetch(
    ROCKET_CHAT_URL,
    rocketReq,
)
const res = await response.json()
console.log(res)
return new Response(":)", {
    headers: {'content-type': 'text/plain'},
})
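
Putting steps 5 through 7 together, a complete Worker written in module syntax might look like the sketch below (in this form, the WEBHOOK_SECRET and ROCKET_CHAT_URL values from step 4 are read from the env parameter rather than globals):

export default {
  async fetch(request, env) {
    // Only accept authenticated POST requests from Cloudflare's notification system.
    if (request.method !== 'POST' ||
        request.headers.get('cf-webhook-auth') !== env.WEBHOOK_SECRET) {
      return new Response(':(', {
        headers: { 'content-type': 'text/plain' },
        status: 401,
      })
    }

    // Convert the generic webhook payload into Rocket Chat's format.
    const incReq = await request.json()
    const rocketBody = {
      text: incReq.name,
      attachments: [
        {
          title: 'Cloudflare Webhook',
          text: incReq.text,
          title_link: 'https://cloudflare.com',
          color: '#764FA5',
        },
      ],
    }

    // Forward the converted payload to the Rocket Chat webhook.
    await fetch(env.ROCKET_CHAT_URL, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(rocketBody),
    })

    return new Response(':)', {
      headers: { 'content-type': 'text/plain' },
    })
  },
}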

Step 8
Set up deployment configuration in your wrangler.toml file and publish your Worker. You can now see the Worker in the Cloudflare dashboard.

Step 9
You can manage and monitor the Worker with a variety of available tools.

Step 10
Add the Worker URL as a generic webhook to the notification destinations in the Cloudflare dashboard: Configure webhooks · Cloudflare Fundamentals docs.

Step 11
Create a notification with the destination as the configured generic webhook: Create a Notification · Cloudflare Fundamentals docs.

Step 12
Tada! With your Cloudflare Worker running, you will now receive all notifications in Rocket Chat. You can configure any other communication tool in the same way.

We know that a notification system is essential for proactively monitoring any issues that may arise within a project. With this announcement, we’re excited to make notifications available for any communication service, without you having to worry about compatibility. We have lots of updates planned, like adding more alertable events to choose from and extending our support to a wide range of webhook services.

If you’re interested in building scalable services and solving interesting technical problems, we are hiring engineers on our team in Austin & Lisbon.

Secret Management with HashiCorp Vault

Post Syndicated from Mitz Amano original https://blog.cloudflare.com/secret-management-with-hashicorp-vault/

Many applications these days require authentication to external systems: usernames and passwords to access databases, service accounts to access cloud services, and so on. In such cases, private information, like passwords and keys, becomes necessary, and it is essential to take extra care in managing such sensitive data. For example, if you write your AWS key or password into a deployment script and push it to a Git repository, every user who can read the repository can also access the secret, and you could be in trouble. Even if it’s an internal repository, you run the risk of a potential leak.

How we were managing secrets in the service

Before we talk about Vault, let’s take a look at how we used to manage secrets.

Salt

We use SaltStack as a bare-metal configuration management tool. The core of the Salt ecosystem consists of two major components: the Salt Master and the Salt Minion. The configuration state is owned by the Salt Master, and thousands of Salt Minions automatically install packages, generate configuration files, and start services on each node based on that state. The state may contain secrets, such as passwords and API keys. When we deploy secrets to a node, we encrypt the plaintext using a GPG key owned by the Salt Master and fill an ASCII-armored secret into the state file. Once the state is applied, the Salt Master decrypts the PGP message using its own key, and the Salt Minion retrieves the rendered data from the Master.
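
For illustration, producing such an ASCII-armored secret for the state file is a single GPG invocation (the recipient key ID here is hypothetical):

$ echo -n 'db-password-123' | gpg --armor --encrypt --recipient salt-master@example.com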

Kubernetes

We were using Lockbox, a secure way to store Kubernetes secrets offline. The secret is asymmetrically encrypted and can only be decrypted by the Lockbox Kubernetes controller. The controller synchronizes with Secret objects: a Secret generated from Lockbox is created in the corresponding namespace. Since namespace administrator privileges are assigned per engineering team, ordinary users cannot read Secret objects.

Why these secrets management were insufficient

Prior to Vault, GnuPG and Lockbox were used in this way to encrypt and decrypt most secrets in the data center. Nevertheless, they were inadequate in certain cases:

  • Lack of scoping secrets: The secret data in ASCII armor could only be decrypted by a specific node when the client read it, which still wasn’t enough control. Salt owns a GPG key per Salt Master, and core services (Kubernetes, databases, storage, logging, tracing, monitoring, etc.) are deployed to hundreds of Salt Minions by a few Salt Masters. Nodes are often reused for different services after hardware repairs, so the same GPG key ends up decrypting the secrets of various services, and maintaining a separate GPG key per service would be complicated. Moreover, a given secret is typically used only by a single service. For example, an access key for object storage is needed to back up a repository; in the previous configuration, that API key was decrypted by a shared Salt Master, so there was a risk of it being referenced by another service or used for another purpose. It is impossible to scope secret access as long as the same GPG key is shared.

    Another case is Kubernetes. Namespace-scoped access control and API access restrictions via the RBAC model are excellent. But the etcd datastore that Kubernetes uses is not encrypted by default, and Secret objects are stored there. We needed to think about encryption-at-rest via a third-party KMS, or about preventing Secrets from being stored in etcd at all. In other words, access to a secret must be properly controlled for the secret itself.

  • Rotation and static secrets: Anyone who has access to the Salt Master GPG key can theoretically decrypt all current and future secrets. And with as many secrets as we have, rotating the encryption of all of them is impractical. Current and future secret management requires a process for easy rotation, and for using dynamically generated secrets instead.
  • Session management: Users and services with GPG keys can decrypt secrets at any time, so GPG secret decryption effectively has no TTL. (You can set an expiration date on a GPG key, but it’s just metadata: trying to encrypt a new secret after the expiration date produces a warning, yet existing secrets can still be decrypted.) A temporary session is needed to limit access to when it’s actually required.
  • Audit: GPG doesn’t provide an audit trail. Audit trails help us trace who read which secret, when, and from where. An audit trail should contain details including the date, time, and user information associated with each secret read (and login), regardless of whether a user or a service performed it.

HashiCorp Vault

Armed with our set of requirements, we chose HashiCorp Vault to make better secret management with a better security model.

  • Scoping secrets: When a client logs in, a Vault token is generated through an auth method (backend). This token carries a policy that defines access, so it is clear what data the client can access after logging in.
  • Rotation and dynamic secrets: Version-controlled static secrets in the KV V2 secret engine let us easily update or roll back secrets with a single request. In addition, dynamic secrets and credentials are available to eliminate manual rotation. Ideally, credentials are short-lived and rotated frequently, and each service has restricted access. These properties are essential to reduce the impact of an attack, but they are operationally difficult and impossible to satisfy without automation. Vault solves this by allowing operators to provide dynamically generated credentials to their services; Vault manages the credential lifecycle, rotating and revoking credentials as needed.
  • Session management: Vault provides a login process to obtain a token, and various auth methods are available. It is possible to integrate with an Identity Provider and authenticate using a JWT. Since the Vault token has a TTL, it can be managed as a short-lived credential for accessing secrets.
  • Audit: Vault supports audit logs that record who accessed which Vault API, when, and from where.

We also built Vault clusters for high availability, reliability, and handling large numbers of requests.

  • We use Integrated Storage, in which every node in the Vault cluster holds a duplicate copy of Vault’s data, so a client can retrieve the same result from any node.
  • Performance Replication keeps our Vault clusters in sync, so every cluster offers the same result.
  • Requests from clients are routed from a single service IP to one of the clusters. Anycast routes incoming traffic to the nearest cluster, which handles requests efficiently. If one cluster goes down, requests are automatically routed to another available cluster.

Service integrations

We use the appropriate auth backend and secret engine to integrate Vault with the services responsible for each core component.

Salt

The configuration state is owned by the Salt Master, and hundreds of Salt Minions automatically install packages, generate configuration files, and start services on each node based on its role. The state data may contain secrets, such as API keys, and the Salt Minion retrieves them from Vault. Salt logs in to Vault using a JWT signed by the Salt Master, via the JWT auth method.

Kubernetes

Kubernetes reads Vault secrets through an operator that synchronizes with Secret objects. The Kubernetes Auth method uses the Service Account token JWT to login, just like the JWT Auth method. This JWT contains the service account name, UID, and namespace. Vault can scope namespace based on dynamic policy.

Identity Provider – User login

Additionally, Vault can work with the Identity Provider through a delegated authorization method based on OAuth 2.0 so that users can get tokens with the right policies. The JWT issued by the Identity Provider contains the group or user ID to which it belongs, and this metadata can be used to assign a Vault policy.

Integrated ecosystem – Auth x Secret

Vault provides a plugin system for two major components: authentication (auth methods) and secret management (secret engines). Vault can enable both officially provided plugins and custom plugins you build yourself. Auth methods provide authentication for obtaining a Vault token in various ways; as mentioned in the service integration examples above, we mainly use JWT, OIDC, and Kubernetes for login. Secret engines, on the other hand, provide secrets in various ways, such as KV for static secrets or PKI for certificate signing and issuing.

And they form an ecosystem: Vault can easily integrate auth methods and secret engines with each other. For instance, if we add a dynamic database credential secret engine, all existing platforms instantly gain support for it, without needing to reinvent how they authenticate to a separate service. Similarly, we can add a new platform into the mix, and it instantly has access to all the existing secret engines and their functionality. Additionally, Vault can enforce permissions on the endpoint paths provided by secret engines, based on the authentication method and policies.
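
As a rough sketch of this flow from a client’s perspective (the role name and secret path below are hypothetical), a service first logs in through an auth method to obtain a token, then reads the secrets its policy allows:

$ vault write auth/jwt/login role=backup-service jwt=@service-token.jwt
$ vault kv get secret/backups/object-storage-key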

Wrap up

Vault integration for the core components is already ongoing, and many GPG secrets have been migrated to Vault. We aim to expand service integrations in our data centers, adopt dynamic credentials, and improve CI/CD for Vault. Interested? We’re hiring for security platform engineering!

Building many private virtual networks through Cloudflare Zero Trust

Post Syndicated from Nuno Diegues original https://blog.cloudflare.com/building-many-private-virtual-networks-through-cloudflare-zero-trust/

We built Cloudflare’s Zero Trust platform to help companies rely on our network to connect their private networks securely, while improving performance and reducing operational burden. With it, you could build a single virtual private network, where all your connected private networks had to be uniquely identifiable.

Starting today, we are thrilled to announce that you can start building many segregated virtual private networks over Cloudflare Zero Trust, beginning with virtualized connectivity for the connectors Cloudflare WARP and Cloudflare Tunnel.

Connecting your private networks through Cloudflare

Consider your team, with various services hosted across distinct private networks, and employees accessing those resources. More than ever, those employees may be roaming, remote, or actually in a company office. Regardless, you need to ensure only they can access your private services. Even then, you want to have granular control over what each user can access within your network.

This is where Cloudflare can help you. We make our global, performant network available to you, acting as a virtual bridge between your employees and private services. With your employees’ devices running Cloudflare WARP, their traffic egresses through Cloudflare’s network. On the other side, your private services are behind Cloudflare Tunnel, accessible only through Cloudflare’s network. Together, these connectors protect your virtual private network end to end.

The beauty of this setup is that your traffic is immediately faster and more secure. But you can then take it a step further and extract value from many Cloudflare services for your private network routed traffic: auditing, fine-grained filtering, data loss protection, malware detection, safe browsing, and many others.

Our customers are already in love with our Zero Trust private network routing solution. However, like all things we love, it can still improve.

The problem of overlapping networks

In the image above, the user can access any private service as if they were physically located within the network of that private service. For example, this means typing jira.intra in the browser or SSH-ing to a private IP 10.1.2.3 will work seamlessly despite neither of those private services being exposed to the Internet.

However, this has a big assumption in place: those underlying private IPs are assumed to be unique in the private networks connected to Cloudflare in the customer’s account.

Suppose now that your team has two (or more) data centers that use the same IP space, usually referred to as a CIDR, such as 10.1.0.0/16. Maybe one is the current primary and the other is the secondary, replicating one another. In such a situation, there would be a machine in each of those two data centers with the same IP, 10.1.2.3.

Until today, you could not set up that via Cloudflare. You would connect data center 1 with a Cloudflare Tunnel responsible for traffic to 10.1.0.0/16. You would then do the same in data center 2, but receive an error forbidding you to create an ambiguous IP route:

$ cloudflared tunnel route ip add 10.1.0.0/16 dc-2-tunnel

API error: Failed to add route: code: 1014, reason: You already have a route defined for this exact IP subnet

In an ideal world, a team would not have this problem: every private network would have unique IP space. But that is just not feasible in practice, particularly for large enterprises. Consider the case where two companies merge: it is borderline impossible to expect them to rearrange their private networks to preserve IP addressing uniqueness.

Getting started on your new virtual networks

You can now overcome the problem above by creating unique virtual networks that logically segregate your overlapping IP routes. You can think of a virtual network as a group of IP subspaces. This effectively allows you to compose your overall infrastructure into independent (virtualized) private networks that are reachable by your Cloudflare Zero Trust organization through Cloudflare WARP.

Let us set up this scenario.

We start by creating two virtual networks, with one being the default:

$ cloudflared tunnel vnet add --default vnet-frankfurt "For London and Munich employees primarily"

Successfully added virtual network vnet-frankfurt with ID: 8a6ea860-cd41-45eb-b057-bb6e88a71692 (as the new default for this account)

$ cloudflared tunnel vnet add vnet-sydney "For APAC employees primarily"

Successfully added virtual network vnet-sydney with ID: e436a40f-46c4-496e-80a2-b8c9401feac7

We can then create the Tunnels and route the CIDRs to them:

$ cloudflared tunnel create tunnel-fra

Created tunnel tunnel-fra with id 79c5ba59-ce90-4e91-8c16-047e07751b42

$ cloudflared tunnel create tunnel-syd

Created tunnel tunnel-syd with id 150ef29f-2fb0-43f8-b56f-de0baa7ab9d8

$ cloudflared tunnel route ip add --vnet vnet-frankfurt 10.1.0.0/16 tunnel-fra

Successfully added route for 10.1.0.0/16 over tunnel 79c5ba59-ce90-4e91-8c16-047e07751b42

$ cloudflared tunnel route ip add --vnet vnet-sydney 10.1.0.0/16 tunnel-syd

Successfully added route for 10.1.0.0/16 over tunnel 150ef29f-2fb0-43f8-b56f-de0baa7ab9d8

And that’s it! Both your Tunnels can now be run, and they will connect your private data centers to Cloudflare despite the overlapping IPs.

Your users will now be routed through the virtual network vnet-frankfurt by default. Should a user want otherwise, they can choose in the WARP client settings to be routed via vnet-sydney instead.

When the user changes the selected virtual network, the WARP client informs Cloudflare’s network of the routing decision, and that knowledge propagates to all our data centers via Quicksilver in a matter of seconds. The WARP client then restarts its connectivity to our network, breaking existing TCP connections that were being routed to the previously selected virtual network. This may be perceived as if you had disconnected and reconnected the WARP client.

Every current Cloudflare Zero Trust organization using private network routing now has a default virtual network encompassing its existing IP routes to Cloudflare Tunnels. You can start using the commands above to expand your private network with overlapping IPs and reassign the default virtual network if desired.

If you do not have overlapping IPs in your private infrastructure, no action will be required.

What’s next

This is just the beginning of our support for distinct virtual networks at Cloudflare. As you may have seen, last week we announced the ability to create, deploy, and manage Cloudflare Tunnels directly from the Zero Trust dashboard. Today, virtual networks are only supported through the cloudflared CLI, but we are looking to integrate virtual network management into the dashboard as well.

Our next step will be to make Cloudflare Gateway aware of these virtual networks so that Zero Trust policies can be applied to these overlapping IP ranges. Once Gateway is aware of these virtual networks, we will also surface this concept with Network Logging for auditability and troubleshooting moving forward.

Email Routing Insights

Post Syndicated from Joao Sousa Botto original https://blog.cloudflare.com/email-routing-insights/

Have you ever wanted to try a new email service but worried it might cause you to miss emails? If you have, you’re definitely not alone. Some of us email ourselves to make sure messages reach the correct destination; others don’t rely on a new address for anything serious until they’ve seen it work for a few days. In any case, emails often contain important information, and we need to trust that they won’t get lost for any reason.

To help reduce these worries about whether emails are being received and forwarded – and for troubleshooting if needed – we are rolling out a new Overview page to Email Routing. On the Overview tab people now have full visibility into our service and can see exactly how we are routing emails on their behalf.

Routing Status and Metrics

The first thing you will see in the new tab is an at-a-glance view of the service. This includes the routing status (to know if the service is configured and running), whether the necessary DNS records are configured correctly, and the number of custom and destination addresses on the zone.

Below the configuration summary, you will see more advanced statistics about the number of messages received on your custom addresses and what happened to those messages. You will see how many emails were forwarded or dropped by Email Routing (based on the rules you created), and how many fell under other scenarios, such as being rejected by Email Routing (due to errors, failed security checks, or being considered spam) or rejected by your destination mailbox. You now have exact counts and a chart, so that you can track these metrics over time.

Activity Log

On the Cloudflare Email Routing tab you’ll also see the Activity Log, where you can drill deeper into specific behaviors. These logs show you details about the email messages that reached one of the custom addresses you have configured on your Cloudflare zone.

For each message the logs will show you the Message ID, Sender, Custom Address, when Cloudflare Email Routing received it, and the action that was taken. You can also expand the row to see the SPF, DMARC, and DKIM status of that message along with any relevant error messaging.

We know looking at every message can be overwhelming, especially when you’re resorting to the logs for troubleshooting, so you have a few options for filtering:

  • Search for specific people (email addresses) that have messaged you.
  • Filter to show only one of your custom addresses.
  • Filter to show only messages where a specific action was taken.

Routes and Settings

Next to the Overview tab, you will find the Routes tab with the configuration UI that is likely already familiar to you. That’s where you create custom addresses, add and verify destination addresses, and create rules with the relationships between the custom and destination addresses.

Lastly, the Settings tab includes less common actions, such as DNS configuration and options for offboarding from Email Routing.

We hope you enjoy this update. And if you have any questions or feedback about this product, please come see us in the Cloudflare Community and the Cloudflare Discord.

Domain Scoped Roles – Early Access

Post Syndicated from Garrett Galow original https://blog.cloudflare.com/domain-scoped-roles-early-access/

Today, Cloudflare is making it easier for enterprise account owners to manage their team’s access to Cloudflare by allowing user access to be scoped to sets of domains. Ensuring users have exactly the access they need and no more is critical, and Domain Scoped Roles provide a significant step forward. Additionally, with the introduction of Domain Groups, account owners can grant users access to domains by group instead of individually. Domains can be added or removed from these groups to automatically update the access of those who have access to the group. This reduces toil in managing user access.

One of the most common uses we have seen for Domain Scoped Roles is to limit access to production domains to a small set of team members, while still allowing development and pre-production domains to be open to the rest of the team. That way, someone can’t make changes to a production domain unless they are given access.

How to use Domain Scoped Roles

If you are an enterprise customer, please talk with your CSM to get you and your team enrolled. Note that you must have Super Administrator privileges to modify account memberships.

Once the beta has been enabled for you, here is how to start using it:

  • Log in to dash.cloudflare.com, select your account, and navigate to the members page.
  • From here, you can either invite a new member with a Domain Scoped Role or modify an existing user’s permissions. In this case, we will invite a new user.

  • When inviting new members, there are three things to provide:
    • Which users to invite.
    • The scope of resources they will have access to:
      • Selecting “All Domains” will allow you to select legacy roles.
    • The role(s), which decide what permissions are granted.

  • Before sending the invite, you will be able to confirm the users, scope, and roles.

Domain Groups

In addition to manually creating inclusion or exclusion lists per user, account owners can also create Domain Groups to grant one or more users access to a group of domains. Domain Groups can be created from the member invite flow or directly from Account Configurations -> Lists. When creating a domain group, the user selects the domains to include; from that point on, the group can be used when inviting a user to the account.

Domain Group Creation Screen

Domain Group selection during member invite

Introducing Bach

Domain Scoped Roles is possible because of a new permission system called Bach. Bach provides a policy-based system for defining authorization to Cloudflare’s control plane. Authorization defines what someone can do in a system. Bach has been powering API Tokens, but going forward all authorization will use Bach. This gives customers the ability to define more granular permissions and resource scoping. Resources can be any object a user interacts with, whether that be accounts, zones, Workers environments, or DNS records, to name a few. Whereas before, Cloudflare’s RBAC system relied on assigning a set of roles in which each role defined broad permissions that applied to an entire account, Bach’s policies allow for deeper permission grants that can be scoped to sets of resources.

Let’s take a look at the legacy system and how it compares to what Bach supports. Typically, a user’s permissions are defined by the ‘roles’ they are assigned. These include ‘Super Administrator’, ‘Administrator’, ‘Cloudflare for Teams’, and ‘Cloudflare Workers Admin’, to name a few. In the legacy system, each of these maps to an explicit set of simple permissions like ‘workers:read’, ‘workers:edit’, or ‘zones:edit’. When requests are made via the Cloudflare API or the Cloudflare dashboard, Cloudflare’s API Gateway checks whether the actor has the correct permissions for the endpoint requested. In order to change a zone setting, the actor would need the ‘zones:edit’ permission, which could be granted from one of many roles. Below is what a user with only ‘DNS’ permissions is granted in the legacy system:

Legacy DNS User Role Permissions

Permission      Edit   Read
dns_records     ✓      ✓
legal           ✗      ✓
account         ✗      ✓
subscription    ✗      ✓
zone            ✗      ✓
zone_settings   ✗      ✓

While straightforward, this had many drawbacks. First, there was no way to limit which resources the permissions applied to, whether a set of resources or attributes of resources. Second, permissions for a given ‘resource’ were simply read or edit, where edit included creating and deleting. These don’t fully capture real needs – for example, limiting deletions while allowing edits.

With Bach we have an expanded capability to define granular permissions for authorization. An example policy defining the aforementioned ‘DNS’ member’s access would look like this:

Bach DNS User Role Policy

(slightly modified for readability)

# Legacy DNS Role - applies to all zones
policies:
  - id: 186f95f3bda1443c986aeb78b05eb60b
    permission_groups:
    - id: 49ce85367bae433b9f0717ed4fea5c74
      name: DNS
      meta:
        description: Can edit DNS records.
        editable: 'false'
        label: dns_admin
        scopes: com.cloudflare.api.account
      permissions:
      - key: com.cloudflare.registrar.domain.read
      - key: com.cloudflare.registrar.domain.list
      - key: com.cloudflare.registrar.contact.read
      - key: com.cloudflare.registrar.contact.list
      - key: com.cloudflare.api.account.secondary-dns.update
      - key: com.cloudflare.api.account.secondary-dns.read
      - key: com.cloudflare.api.account.secondary-dns.delete
      - key: com.cloudflare.api.account.secondary-dns.create
      - key: com.cloudflare.api.account.zone.secondary-dns.update
      - key: com.cloudflare.api.account.zone.secondary-dns.read
      - key: com.cloudflare.api.account.zone.secondary-dns.delete
      - key: com.cloudflare.api.account.zone.secondary-dns.create
      - key: com.cloudflare.api.account.notification.*
      - key: com.cloudflare.api.account.custom-ns.update
      - key: com.cloudflare.api.account.custom-ns.list
      - key: com.cloudflare.api.account.custom-ns.*
      - key: com.cloudflare.edge.spectrum.app.list
      - key: com.cloudflare.edge.spectrum.app.read
      - key: com.cloudflare.api.account.zone.custom-page.read
      - key: com.cloudflare.api.account.zone.custom-page.list
      - key: com.cloudflare.api.account.zone.setting.read
      - key: com.cloudflare.api.account.zone.setting.list
      - key: com.cloudflare.api.account.zone.dnssec.update
      - key: com.cloudflare.api.account.zone.dnssec.read
      - key: com.cloudflare.api.account.dns-firewall.cluster.delete
      - key: com.cloudflare.api.account.dns-firewall.cluster.update
      - key: com.cloudflare.api.account.dns-firewall.cluster.read
      - key: com.cloudflare.api.account.dns-firewall.cluster.create
      - key: com.cloudflare.api.account.dns-firewall.cluster.list
      - key: com.cloudflare.api.account.zone.dns-record.delete
      - key: com.cloudflare.api.account.zone.dns-record.update
      - key: com.cloudflare.api.account.zone.dns-record.read
      - key: com.cloudflare.api.account.zone.dns-record.create
      - key: com.cloudflare.api.account.zone.dns-record.list
      - key: com.cloudflare.api.account.zone.page-rule.read
      - key: com.cloudflare.api.account.zone.page-rule.list
      - key: com.cloudflare.api.account.zone.railgun-connection.read
      - key: com.cloudflare.api.account.zone.railgun-connection.list
      - key: com.cloudflare.api.account.zone.subscription.read
      - key: com.cloudflare.api.account.zone.aml.read
      - key: com.cloudflare.api.account.zone.read
      - key: com.cloudflare.api.account.zone.list
      - key: com.cloudflare.api.account.audit-log.read
      - key: com.cloudflare.api.account.custom-page.read
      - key: com.cloudflare.api.account.custom-page.list
      - key: com.cloudflare.api.account.railgun.read
      - key: com.cloudflare.api.account.railgun.list
      - key: com.cloudflare.api.account.dpa.read
      - key: com.cloudflare.api.account.subscription.read
      - key: com.cloudflare.api.account.subscription.list
      - key: com.cloudflare.api.account.read
      - key: com.cloudflare.api.account.list
    resource_groups:
    - id: 2fe938e0a5824128bdc8c42f9339b127
      name: com.cloudflare.api.account.a67e14daa5f8dceeb91fe5449ba496eb
      meta:
        editable: 'false'
      scope:
        key: com.cloudflare.api.account.a67e14daa5f8dceeb91fe5449ba496eb
        objects:
        - key: "*"
    access: allow

You can see a greater granularity in the permissions that are defined. In both cases, the user can perform the same actions, but the granularity means we are more explicit about what can be done. Here, for DNS records (under com.cloudflare.api.account.zone.dns-record), there are explicit create, read, update, delete, and list permissions. In the resource group section, we can see that the scope defines an account; any objects in that account, like domains, will match the "*" value under ‘objects’. This means these permissions apply throughout the entire account. Now, let’s modify this user’s permissions to be scoped to a single domain and see how the policy changes.

Bach DNS User Role with Domain Scoping Policy

(slightly modified for readability)

# Zone Scoped DNS role - scoped to 1 zone
policies:
- id: 80b25dd735b040708155c85d0ed8a508
  permission_groups:
  - id: 132c52e7e6654b999c183cfcbafd37d7
    name: Zone DNS
    meta:
      description: Grants access to edit DNS settings for zones in an account.
      editable: 'false'
      label: zone_dns_admin
      scopes: com.cloudflare.api.account.zone
    permissions:
    - key: com.cloudflare.api.account.zone.secondary-dns.*
    - key: com.cloudflare.api.account.zone.dnssec.*
    - key: com.cloudflare.api.account.zone.dns-record.*
    - key: com.cloudflare.api.account.zone.analytics.dns-report.*
    - key: com.cloudflare.api.account.zone.analytics.dns-bytime.*
    - key: com.cloudflare.api.account.zone.setting.read
    - key: com.cloudflare.api.account.zone.setting.list
    - key: com.cloudflare.api.account.zone.rate-plan.read
    - key: com.cloudflare.api.account.zone.subscription.read
    - key: com.cloudflare.api.account.zone.read
    - key: com.cloudflare.api.account.subscription.read
    - key: com.cloudflare.api.account.subscription.list
    - key: com.cloudflare.api.account.read
  resource_groups:
  - scope:
      key: com.cloudflare.api.account.a67e14daa5f8dceeb91fe5449ba496eb
      objects:
      - key: com.cloudflare.api.account.zone.b1fbb152bbde3bd28919a7f4bdca841f
  access: allow

Once we scope the user’s permissions to only include one explicit zone, we see two main differences. First, there is a large reduction in permissions, because we do not grant the user access to read or list the many account-level resources that the legacy account-scoped roles grant. The only account-level permissions granted are account.read, subscription.read, and subscription.list; these are necessary for a user to be able to use the dashboard. When viewing the account in the dashboard, the only account-level product shown is domains. Other products, like Cloudflare Workers and Zero Trust, will be hidden.

Second, in the resource groups scope section, we see an explicit mention of a zone (the key com.cloudflare.api.account.zone.b1fbb152bbde3bd28919a7f4bdca841f). This means that the zone permissions outlined in the policy only apply to that specific zone. Any attempt to access other features of that zone, or to modify DNS for any other zone, will be rejected by Cloudflare.
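
To make this concrete, here is a conceptual sketch in TypeScript of how a policy like the ones above might be evaluated. This is an illustration of the idea, not Bach’s actual data model or logic; the types and wildcard-matching rules are assumptions based on the policies shown.

// Conceptual sketch only: not Bach's real implementation.
type Policy = {
  access: "allow";
  permissionKeys: string[]; // e.g. "com.cloudflare.api.account.zone.dns-record.*"
  resourceKeys: string[];   // e.g. a specific zone key, or "*" for everything in scope
};

// A trailing ".*" covers all actions on a resource type; "*" matches anything.
function matches(pattern: string, key: string): boolean {
  if (pattern === "*") return true;
  if (pattern.endsWith(".*")) return key.startsWith(pattern.slice(0, -1));
  return pattern === key;
}

function isAllowed(policies: Policy[], permission: string, resource: string): boolean {
  return policies.some(policy =>
    policy.access === "allow" &&
    policy.permissionKeys.some(k => matches(k, permission)) &&
    policy.resourceKeys.some(r => matches(r, resource))
  );
}

// The zone-scoped DNS role: DNS edits succeed only on the one explicit zone.
const zoneScopedDns: Policy = {
  access: "allow",
  permissionKeys: ["com.cloudflare.api.account.zone.dns-record.*"],
  resourceKeys: ["com.cloudflare.api.account.zone.b1fbb152bbde3bd28919a7f4bdca841f"],
};

isAllowed([zoneScopedDns], "com.cloudflare.api.account.zone.dns-record.update",
  "com.cloudflare.api.account.zone.b1fbb152bbde3bd28919a7f4bdca841f"); // true
isAllowed([zoneScopedDns], "com.cloudflare.api.account.zone.dns-record.update",
  "com.cloudflare.api.account.zone.some-other-zone-tag"); // false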

What’s next

If you are an enterprise customer and interested in getting started with Domain Scoped Roles, please contact your CSM to get enabled for the Early Access period. This announcement represents a significant milestone in our migration to Bach, an authorization system built for Cloudflare’s scale. It will allow us to expand these same capabilities to more products in the future and to create an authorization system that puts customers more in control of their team’s access across all of Cloudflare’s services. Stay tuned as we are just getting started!

Cloudflare Observability

Post Syndicated from Tanushree Sharma original https://blog.cloudflare.com/vision-for-observability/

Whether you’re a software engineer deploying a new feature, a network engineer updating routes, or a security engineer configuring a new firewall rule, you need visibility to know if your system is behaving as intended — and if it’s not, to know how to fix it.

Cloudflare is committed to helping our customers get visibility into the services they have protected behind Cloudflare. Being a single pane of glass for all network activity has always been one of Cloudflare’s goals. Today, we’re outlining the future vision for Cloudflare observability.

What is observability?

Observability means gaining visibility into the internal state of a system. It’s used to give users the tools to figure out what’s happening, where it’s happening, and why.

At Cloudflare, we believe that observability has three core components: monitoring, analytics, and forensics. Monitoring measures the health of a system – it tells you when something is going wrong. Analytics give you the tools to visualize data to identify patterns and insights. Forensics helps you answer very specific questions about an event.

Observability becomes particularly important in the context of security to validate that any mitigating actions performed by our security products, such as Firewall or Bot Management, are not false positives. Was that request correctly classified as malicious? And if it wasn’t, which detection system classified it as such?

Additionally, Cloudflare has products that improve the performance of applications and corporate networks, and that allow developers to write lightning-fast code running on our global network. We want to provide our customers with insights into every request, packet, and fetch that goes through Cloudflare’s network.

Monitoring and Notifying

Analytics are fantastic for summarizing data, but how do you know when to look at them? No one wants to sit on the dashboard clicking refresh over and over again just in case something looks off. That’s where notifications come in.

When we talk about something “looking off” on an analytics page, what we really mean is that there’s a significant change in your traffic or network which is reflected by spikes or drops in our analytics. Availability and performance directly affect end users, and our goal is to monitor and notify our customers as soon as we see things going wrong.

Today, we have many different types of notifications, from Origin Error Rates, Security Events, and Advanced Security Events to Usage-Based Billing and Health Checks. We’re continuously adding more notification types to have them correspond with our awesome analytics. As our analytics get more customizable, our notifications will as well.

There are tons of different algorithms that can be used to detect spikes, including burn rates and z-scores. We’re continuing to iterate on the algorithms we use for detection to offer more variations, make them smarter, and make sure that our notifications are both accurate and not too noisy.
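
For a rough feel of the z-score approach, the sketch below compares the latest per-minute request count against a sliding window of recent counts. This is a simplified illustration, not our production detector; the window size and the 3-standard-deviation threshold are assumptions.

// Flag a spike when the latest sample deviates far from the recent baseline.
function zScore(window: number[], current: number): number {
  const mean = window.reduce((a, b) => a + b, 0) / window.length;
  const variance = window.reduce((a, b) => a + (b - mean) ** 2, 0) / window.length;
  const stdDev = Math.sqrt(variance) || 1; // guard against perfectly flat traffic
  return (current - mean) / stdDev;
}

const recentCounts = [120, 118, 125, 122, 119, 121]; // requests per minute
const latest = 410;
if (zScore(recentCounts, latest) > 3) {
  console.log("Traffic spike detected: send a notification");
}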

Analytics

So, you’ve received an alert from Cloudflare. What comes next?

Analytics can be used to get a bird’s-eye view of traffic, or to focus on specific types of events by adding filters and time ranges. After you receive an alert, we want to show you exactly what triggered it through graphs, high-level metrics, and top Ns on the Cloudflare dashboard.

Whether you’re a developer, security analyst, or network engineer, the Cloudflare dashboard should be the spot for you to see everything you need. We want to make the dashboard more customizable to serve the diverse use cases of our customers. Analyze data by specifying a timeframe and filter through dropdowns on the dashboard, or build your own metrics and graphs that work alongside the raw logs to give you a clear picture of what’s happening.

Focusing on security, we believe analytics are the best tool to build confidence before deploying security policies. Moving forward, we plan to layer all of our security related detection signals on top of HTTP analytics so you can use the dashboard to answer questions such as: if I were to block all requests that the WAF identifies as an XSS attack, what would I block?

Customers using our enterprise Bot Management may already be familiar with this experience, and as we improve it and build upon it further, all of our other security products will follow.

Analytics are a powerful tool to see high level patterns and identify anomalies that indicate that something unusual is happening. We’re working on new dashboards, customizations, and features that widen the use cases for our customers. Stay tuned!

Logs

Logs are used when you want to examine specific details about an event. They consist of a timestamp and fields that describe the event and are used to get visibility on a granular level when you need a play-by-play.

In each of our datasets, an event measures something different. For example, in HTTP request logs, an event is when an end user requests content from or sends content to a server. For Firewall logs, an event occurs when the Firewall takes an action on an HTTP request. There can be multiple Firewall events for each HTTP request.

Today, our customers access logs using Logpull, Logpush, or Instant Logs. Logpull and Logpush are great for customers that want to send their logs to third parties (like our Analytics Partners) to store, analyze, and correlate with other data sources. With Instant Logs, our customers can monitor and troubleshoot their traffic in real time straight from the dashboard or CLI. We’re planning to build out more capabilities to dig into logs on Cloudflare. We’re hard at work on log storage on R2 – but what’s next?

We’ve heard from customers that the activity log on the Firewall analytics dashboard is incredibly useful. We want to continue to bring the power of logs to the dashboard by adding the same functionality across our products. For customers that will store their logs on Cloudflare R2, this means that we can minimize the use of sampled data.

If you’re looking for something very specific, querying logs is also important, which is where forensics comes in. The goal is to let you investigate from high-level analytics all the way down to the individual log lines that make them up. Given a unique identifier, such as the ray ID, you should be able to look up a single request and then correlate it with all other related activity. Find the client IP behind that ray ID and, from there, the use cases are plentiful: what other requests from this IP are malicious? What paths did the client follow?
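
The sketch below shows what that correlation can look like over exported logs, assuming newline-delimited JSON with Logpush-style field names such as RayID, ClientIP, and ClientRequestPath (treat the field names as assumptions for your own dataset).

// Given a ray ID, find the request and every other request from the same client IP.
type RequestLog = { RayID: string; ClientIP: string; ClientRequestPath: string };

function correlateByRayId(ndjson: string, rayId: string): RequestLog[] {
  const events = ndjson
    .split("\n")
    .filter(line => line.trim().length > 0)
    .map(line => JSON.parse(line) as RequestLog);

  // Step 1: look up the single request with the given ray ID.
  const target = events.find(e => e.RayID === rayId);
  if (!target) return [];

  // Step 2: pull all related activity from the same client IP,
  // e.g. to see which paths that client followed.
  return events.filter(e => e.ClientIP === target.ClientIP);
}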

Tracing

Logs are really useful, but they don’t capture the context around a request. Traces show the end-to-end life cycle of a request from when a user requests a resource to each of the systems that are involved in its delivery. They’re another way of applying forensics to help you find something very specific.

These are used to differentiate each part of the application to identify where errors or bottlenecks are occurring. Let’s say that you have a Worker that performs a fetch to your origin and to a third-party API. Analytics can show you average execution times and error rates for your Worker, but they don’t give you visibility into each of these operations.

wrangler dev and console.log statements are really helpful ways to test and debug your code. They bring some of the visibility that’s needed, but it can be tedious to instrument your code like this.

As a developer, you should have the tools to understand what’s going on in your applications so you can deliver the best experience to your end users. We can help you answer questions like: Where is my Worker execution failing? Which operation is causing a spike in latency in my application?
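
For now, hand-rolled instrumentation looks something like the sketch below: a Worker that times its origin fetch and a third-party API call separately, then logs the result. Both URLs are hypothetical, and in Workers the clock advances across await points, so this captures the I/O time of each operation.

// Manually timing each downstream operation in a Worker.
export default {
  async fetch(request: Request): Promise<Response> {
    const spans: Record<string, number> = {};

    let start = Date.now();
    const origin = await fetch("https://origin.example.com/data"); // hypothetical origin
    spans["origin_fetch_ms"] = Date.now() - start;

    start = Date.now();
    const api = await fetch("https://api.example.com/v1/info"); // hypothetical third-party API
    spans["third_party_fetch_ms"] = Date.now() - start;

    console.log(JSON.stringify(spans)); // visible via wrangler tail or dashboard logs
    return new Response(`origin=${origin.status} api=${api.status}`);
  },
};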

Putting it all together

Notifications, analytics, logs, and tracing each have their distinct use cases, but together, these are powerful tools to provide analysts and developers visibility. Looking forward, we’re excited to bring more and more of these capabilities on the Cloudflare dashboard.

We would love to hear from you as we build these features out. If you’re interested in sharing use cases and helping shape our roadmap, contact your account team!

Announcing the Cloudflare API Gateway

Post Syndicated from Ben Solomon original https://blog.cloudflare.com/api-gateway/

Over the past decade, the Internet has experienced a tectonic shift. It used to be composed of static websites: with text, images, and the occasional embedded movie. But the Internet has grown enormously. We now rely on API-driven applications to help with almost every aspect of life. Rather than just download files, we are able to engage with apps by exchanging rich data. We track workouts and send the results to the cloud. We use smart locks and all kinds of IoT devices. And we interact with our friends online.

This is all wonderful, but it comes with an explosion of complexity on the back end. Why? Developers need to manage APIs in order to support this functionality. They need to monitor and authenticate every single request. And because these tasks are so difficult, they’re usually outsourced to an API gateway provider.

Unfortunately, today’s gateways leave a lot to be desired. First: they’re not cheap. Then there’s the performance impact. And finally, there’s a data and privacy risk, since more than 50% of traffic reaches APIs (and is presumably sent through a third-party gateway). What a mess.

Today we’re announcing the Cloudflare API Gateway. We’re going to completely replace your existing gateway at a fraction of the cost. And our solution uses the technology behind Workers, Bot Management, Access, and Transform Rules to provide the most advanced API toolset on the market.

What is API Gateway?

In short, it’s a package of features that will do everything for your APIs. We break it down into three categories:

Security
These are the products we have already blogged about. Tools like Discovery, Schema Validation, Abuse Detection, and more. We’ve spent a lot of time applying our security expertise to the world of APIs.

Management & Monitoring
These are the foundational tools that keep your APIs in order. Some examples: analytics, routing, and authentication. We are already able to do these things with existing products like Cloudflare Access, and more features are on the way.

Everything Else
These are the small (but crucial) items that keep everything running. Cloudflare already offers SSL/TLS termination, load balancing, and proxy services that can run by default.

Today’s blog post describes each feature in detail. We’re excited to announce that all the security features are now generally available, so let’s start by discussing those.

Discovery

Our customers are eager to protect their APIs. Unfortunately, they don’t always have these endpoints documented—or worse, they think everything is documented, but have unknowingly lost or modified endpoints. These hidden endpoints are sometimes called shadow APIs. We need to begin our journey with an exhaustive (and accurate) picture of API surface area.

That’s where Discovery comes in. Head to the Cloudflare dashboard, select the Security tab, then choose “API Shield.” Activate the feature and tell us how you want to identify your API traffic. Most users provide a header (available today), but we can also use the request body or cookie (available soon).

We provide an exhaustive list of your API endpoints. Cloudflare lists each method, path, and additional metadata to help you understand your surface area. We even collapse endpoints that include variables (e.g., /account/217) to become generally applicable (e.g., /account/{var1}).

Discovery is a powerful countermeasure to entropy. Our customers often expect to find 30 endpoints, but are surprised to learn they have over 100 active endpoints.
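
The collapsing step can be pictured with a small sketch. This is an illustration of the idea, not Cloudflare’s actual algorithm (which handles more than numeric segments).

// Rewrite variable path segments into numbered placeholders, e.g.
// "/account/217" and "/account/982" both become "/account/{var1}".
function collapsePath(path: string): string {
  let varCount = 0;
  return path
    .split("/")
    .map(segment => (/^\d+$/.test(segment) ? `{var${++varCount}}` : segment))
    .join("/");
}

collapsePath("/account/217/devices/5"); // "/account/{var1}/devices/{var2}"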

Schema Validation

Perhaps you already have a schema for your API endpoints. A schema is like a template: it provides the paths, methods, and additional data you expect API requests to include. Many developers follow the OpenAPI standard to generate (and maintain) a schema.

To harden your security, we can validate incoming traffic against this schema. This is a great way to stop basic attacks. Cloudflare will turn away nonconforming requests, discarding nonsense traffic that ignored the dress code. Simply upload your schema to the dashboard, select the actions you want to take, and deploy:

Schema Validation has already vetted traffic for some of the world’s largest crypto sites, delivery services, and payment platforms. It’s available now, and we’ll add body validation soon.
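
Conceptually, the check behaves like the sketch below. This is an illustration, not the actual Schema Validation implementation; the endpoint list is made up, and a real deployment is driven by your uploaded OpenAPI schema.

// Turn away requests whose method and normalized path are not in the schema.
const allowedEndpoints = new Set([
  "GET /account/{var1}",
  "POST /account/{var1}/devices",
]);

// Same variable-collapsing helper sketched in the Discovery section above.
function collapsePath(path: string): string {
  let varCount = 0;
  return path.split("/").map(s => (/^\d+$/.test(s) ? `{var${++varCount}}` : s)).join("/");
}

function validateRequest(method: string, path: string): Response | null {
  const key = `${method} ${collapsePath(path)}`;
  return allowedEndpoints.has(key)
    ? null // conforming: let the request through
    : new Response("Request does not match schema", { status: 403 });
}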

Abuse Detection

A robust security approach will use Schema Validation and Discovery in tandem, ensuring traffic matches the expected format. But what about abusive traffic that makes it through?

As Cloudflare discovers new API endpoints, we actually suggest rate limits for each one. That’s the role of Abuse Detection, and it opens the door to a more sophisticated kind of security.

Consider an API endpoint that returns weather updates. Specifically, the endpoint will return “yes” if it is likely to snow in the next hour, and “no” otherwise. Our algorithm might detect that the average user requests this data once every 10 minutes. A small group of scrapers, however, makes 37 requests per 10 minutes. Cloudflare automatically recommends a threshold in between, weighted to provide normal users with some breathing room. This would prevent abusive scraping services from fetching the weather too often.
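
A toy version of that recommendation might look like the sketch below. The real system is driven by machine learning; the 25% weighting here is purely an assumption to show the “in between, with breathing room” idea.

// Recommend a rate limit between typical and abusive request rates,
// weighted toward giving normal users headroom.
function recommendThreshold(typicalPer10Min: number, abusivePer10Min: number): number {
  const weight = 0.25; // sit a quarter of the way from typical toward abusive
  return Math.ceil(typicalPer10Min + (abusivePer10Min - typicalPer10Min) * weight);
}

recommendThreshold(1, 37); // 10 requests per 10 minutes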

We provide the option to create a rule using our new Advanced Rate Limiting engine. You can use cookies, headers, and more to tune thresholds. We’ve been using Abuse Detection to protect api.cloudflare.com for months now.

Our favorite part of this feature: it relies on the machine learning approach we use for Bot Management. Just another way our products can feed into (and benefit from) each other.

Abuse Detection is available now. If you’re interested in Sequential Abuse Detection, which we use to flag anomalous request flows, check out our previous blog post. The sequential piece is in early access, and we’re continuing to tune it before an official launch.

mTLS

Mutual TLS takes security to a new level. You can use certificates to validate incoming traffic as it reaches your APIs—which is especially useful for mobile and IoT devices. Moreover, this is an excellent positive security model that can (and should) be adopted for most device ecosystems.

As an example, let’s return to our weather API. Perhaps this service includes a second endpoint that receives the current temperature from a thermometer. But there’s a problem: anyone can make fake requests, providing inaccurate readings to the endpoint. To prevent this, use mTLS to install a client certificate on the legitimate thermometer, then let Cloudflare validate that certificate. Any other requests will be turned away. Problem solved!
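
In a Worker, that enforcement can be sketched as below. With mTLS enabled for a hostname, the client certificate’s validation result is exposed on request.cf.tlsClientAuth; the exact field and the "SUCCESS" value reflect our understanding of the Workers runtime and are worth confirming against current documentation.

// Reject temperature readings that do not present a valid client certificate.
export default {
  async fetch(request: Request): Promise<Response> {
    const tlsAuth = (request.cf as any)?.tlsClientAuth;
    if (!tlsAuth || tlsAuth.certVerified !== "SUCCESS") {
      return new Response("Client certificate required", { status: 403 });
    }
    // The thermometer's certificate checks out: accept the reading.
    return new Response("Reading accepted", { status: 200 });
  },
};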

We already offer a set of free certificates to every Cloudflare customer. That will continue. But starting today, API Gateway customers get unlimited certificates by default.

Authentication

Many modern APIs require authentication. In fact, authentication unlocks all sorts of capabilities—it allows sessions (with login), personal data exchange, and infrastructure efficiency. And of course, Cloudflare protects authenticated traffic as it passes through our network.

But with API Gateway, Cloudflare plays a more active role in authenticating traffic, helping to issue and validate the following:

  • API keys
  • JSON web tokens (JWT)
  • OAuth 2.0 tokens

Using access control lists, we help you manage different user groups with varying permissions. And this matters—because your current provider is introducing tons of latency and unnecessary data exchange. If a request has to go somewhere outside the Cloudflare ecosystem, it’s traveling farther than it needs to:

Cloudflare can authenticate on our global network and handle requests in a fraction of the time. This kind of technology is difficult to implement, but we felt it was too important to ignore. How did we build it so quickly? Cloudflare Access. We took our experience working with identity providers and, once again, ported it over to the world of APIs. Our gateway includes unlimited authentication and token exchange. These features will be available soon.
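
To give a flavor of what token validation at the edge involves, here is a self-contained sketch that verifies an HS256-signed JWT with the Web Crypto API. It is an illustration, not API Gateway’s implementation, and a real check would also validate claims such as exp and iss.

// Verify the signature of an HS256 JWT (header.payload.signature).
function base64UrlToBytes(s: string): Uint8Array {
  const b64 = s.replace(/-/g, "+").replace(/_/g, "/");
  const padded = b64.padEnd(Math.ceil(b64.length / 4) * 4, "=");
  return Uint8Array.from(atob(padded), c => c.charCodeAt(0));
}

async function verifyJwtHS256(token: string, secret: string): Promise<boolean> {
  const [header, payload, signature] = token.split(".");
  if (!header || !payload || !signature) return false;

  const key = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["verify"]
  );
  return crypto.subtle.verify(
    "HMAC",
    key,
    base64UrlToBytes(signature),
    new TextEncoder().encode(`${header}.${payload}`)
  );
}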

Routing & Management

Let’s talk briefly about microservices. Modern applications are behemoths, so developers break them up into smaller chunks called “microservices.”

Consider an application that helps you book a hotel room. It might use a microservice to fetch available dates, another to fetch prices, and still another to fetch room types. Perhaps a different team manages each microservice, but they all need to be available from a single public entry point:

That single entry point—traditionally managed by an API gateway—is responsible for routing each request to the right microservice. Many of our customers have been paying standalone services to do this for years. That’s no longer necessary. We’ve built on our Transform Rules product to dynamically re-write and re-route at our edge. It’s easy to configure, fast to deploy, and natively built into API Gateway. Cloudflare can now be your API’s single point of entry.

That’s just the tip of the iceberg. API Gateway can actually replace your microservices through an integration with our Workers product. How? Consider writing a Worker that performs some action; perhaps return hotel prices, which are stored with Durable Objects on our network. With API Gateway, requests arrive at our network, are routed to the correct microservice with Transform Rules, and then are fully served with Workers (still on our network!). These Workers may contact your origin for additional information, where necessary.
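
The pattern reads something like the sketch below: one Worker as the public entry point, dispatching each path to a handler that could be backed by Durable Objects or fall back to an origin. All paths, payloads, and URLs are made up for illustration.

// A single entry point routing requests to edge-served "microservices".
export default {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    if (pathname.startsWith("/dates")) {
      return new Response(JSON.stringify({ available: ["2022-06-01", "2022-06-02"] }));
    }
    if (pathname.startsWith("/prices")) {
      // In the scenario above, prices could live in Durable Objects.
      return new Response(JSON.stringify({ standard: 120, deluxe: 200 }));
    }
    if (pathname.startsWith("/rooms")) {
      // Contact the origin for anything not served at the edge.
      return fetch("https://origin.example.com" + pathname); // hypothetical origin
    }
    return new Response("Not found", { status: 404 });
  },
};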

Workers are faster, cheaper, and simpler than microservice alternatives. This integration will be available soon.

API Analytics

Customers tell us that seeing API traffic is sometimes more important than even acting on it. In fact, this trend isn’t specific to APIs. We published another blog today that explores how one customer uses our bot intelligence to passively log information about threats.

With API Analytics, we’ve drawn on our other products to show useful data in real time. You can view popular endpoints, filter by ML-driven insights, see histograms of abuse thresholds, and capture trends.

API Analytics will be available soon. When this happens, you’ll also be able to export custom reports and share insights within your organization.

Logging, Quota Management, and More

All of our established features, like caching, load balancing, and log integrations, work natively with API Gateway. These shouldn’t be overlooked as primitive gateway features; they’re essential. And because Cloudflare performs all of these functions in the same place, you get the latency benefits without having to do a thing.

We are also expanding our Enterprise Logs functionality to perform real-time logging. If you choose to authenticate on Cloudflare’s network, you can view detailed logs of each user who has accessed an API. Similarly, we keep track of each request’s lifespan as it is received, validated, routed, and responded to. Everything is logged.

Finally, we are building Quota Management, a feature that counts API requests over a longer period of time (like a month) and allows you to manage thresholds for your users. We’ve also launched Advanced Rate Limiting to help with more sophisticated cases (including body inspection for GraphQL).

Conclusion

Our API security features—Discovery, Schema Validation, Abuse Detection, and mTLS—are available now! We call these features API Shield because they form the shield that protects the remaining gateway functions. Enterprise customers can ask their account teams for access today.

Many of the other portions of API Gateway are now in early access. According to Gartner®, “by 2025, less than 50% of enterprise APIs will be managed, as explosive growth in APIs surpasses the capabilities of API management tools.” Our goal is to offer an affordable gateway that will fight this trend. If you have a specific feature you want to test, let your account team know, so we can onboard you as soon as possible.

Source: Gartner, “Predicts 2022: APIs Demand Improved Security and Management”, Shameen Pillai, Jeremy D’Hoinne, John Santoro, Mark O’Neill, Sham Gill, 6 December 2021. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.