Tag Archives: Cloudflare Workers

Connecting to production: the architecture of remote bindings

Post Syndicated from Samuel Macleod original https://blog.cloudflare.com/connecting-to-production-the-architecture-of-remote-bindings/

Remote bindings are bindings that connect to a deployed resource on your Cloudflare account instead of a locally simulated resource – and recently, we announced that remote bindings are now generally available

With this launch, you can now connect to deployed resources like R2 buckets and D1 databases while running Worker code on your local machine. This means you can test your local code changes against real data and services, without the overhead of deploying for each iteration. 

In this blog post, we’ll dig into the technical details of how we built it, creating a seamless local development experience.

Developing on the Workers platform

A key part of the Cloudflare Workers platform has been the ability to develop your code locally without having to deploy it every time you wanted to test something – though the way we’ve supported this has changed greatly over the years. 

We started with wrangler dev running in remote mode. This works by deploying and connecting to a preview version of your Worker that runs on Cloudflare’s network every time you make a change to your code, allowing you to test things out as you develop. However, remote mode isn’t perfect — it’s complex and hard to maintain. And the developer experience leaves a lot to be desired: slow iteration speed, unstable debugging connections, and lack of support for multi-worker scenarios. 

Those issues and others motivated a significant investment in a fully local development environment for Workers, which was released in mid-2023 and became the default experience for wrangler dev. Since then, we’ve put a huge amount of work into the local dev experience with Wrangler, the Cloudflare Vite plugin (alongside @cloudflare/vitest-pool-workers) & Miniflare.

Still, the original remote mode remained accessible via a flag: wrangler dev --remote. When using remote mode, all the DX benefits of a fully local experience and the improvements we’ve made over the last few years are bypassed. So why do people still use it? It enables one key unique feature: binding to remote resources while locally developing. When you use local mode to develop a Worker locally, all of your bindings are simulated locally using local (initially empty) data. This is fantastic for iterating on your app’s logic with test data – but sometimes that’s not enough, whether you want to share resources across your team, reproduce bugs tied to real data, or just be confident that your app will work in production with real resources.

Given this, we saw an opportunity: If we could bring the best parts of remote mode (i.e. access to remote resources) to wrangler dev, there’d be one single flow for developing Workers that would enable many use cases, while not locking people out of the advancements we’ve made to local development. And that’s what we did! 

As of Wrangler v4.37.0 you can pick on a per-binding basis whether a binding should use remote or local resources, simply by specifying the remote option. It’s important to re-emphasise this—you only need to add remote: true! There’s no complex management of API keys and credentials involved, it all just works using Wrangler’s existing Oauth connection to the Cloudflare API.

{
  "name": "my-worker",
  "compatibility_date": "2025-01-01",
  "kv_namespaces": [{
    "binding": "KV",
    "id": "my-kv-id",
  },{
    "binding": "KV_2",
    "id": "other-kv-id",
    "remote": true
  }],
  "r2_buckets": [{
    "bucket_name": "my-r2-name",
    "binding": "R2"
  }]
}

The eagle-eyed among you might have realised that some bindings already worked like this, accessing remote resources from local dev. Most prominently, the AI binding was a trailblazer for what a general remote bindings solution could look like. From its introduction, the AI binding always connected to a remote resource, since a true local experience that supports all the different models you can use with Workers AI would be impractical and require a huge upfront download of AI models. 

As we realised different products within Workers needed something similar to remote bindings (Images and Hyperdrive, for instance), we ended up with a bit of a patchwork of different solutions. We’ve now unified under a single remote bindings solution that works for all binding types.

How we built it

We wanted to make it really easy for developers to access remote resources without having to change their production Workers code, and so we landed on a solution that required us to fetch data from the remote resource at the point of use in your Worker.

const value = await env.KV.get("some-key")

The above code snippet shows accessing the “some-key” value in the env.KV KV namespace, which is not available locally and needs to be fetched over the network.

So if that was our requirement, how would we get there? For instance, how would we get from a user calling env.KV.put(“key”, “value”) in their Worker to actually storing that in a remote KV store? The obvious solution was perhaps to use the Cloudflare API. We could have just replaced the entire env locally with stub objects that made API calls, transforming env.KV.put() into PUT http:///accounts/{account_id}/storage/kv/namespaces/{namespace_id}/values/{key_name}

This would’ve worked great for KV, R2, D1, and other bindings with mature HTTP APIs, but it would have been a pretty complex solution to implement and maintain. We would have had to replicate the entire bindings API surface and transform every possible operation on a binding to an equivalent API call. Additionally, some binding operations don’t have an equivalent API call, and wouldn’t be supportable using this strategy.

Instead, we realised that we already had a ready-made API waiting for us — the one we use in production! 

How bindings work under the hood in production

Most bindings on the Workers platform boil down to essentially a service binding. A service binding is a link between two Workers that allows them to communicate over HTTP or JSRPC (we’ll come back to JSRPC later). 

For example, the KV binding is implemented as a service binding between your authored Worker and a platform Worker, speaking HTTP. The JS API for the KV binding is implemented in the Workers runtime, and translates calls like env.KV.get() to HTTP calls to the Worker that implements the KV service. 


Diagram showing a simplified model of how a KV binding works in production

You may notice that there’s a natural async network boundary here — between the runtime translating the env.KV.get() call and the Worker that implements the KV service. We realised that we could use that natural network boundary to implement remote bindings. Instead of the production runtime translating env.KV.get() to an HTTP call, we could have the local runtime (workerd) translate env.KV.get() to an HTTP call, and then send it directly to the KV service, bypassing the production runtime. And so that’s what we did!


Diagram showing a locally run worker with a single KV binding, with a single remote proxy client that communicates to the remote proxy server, which in turn communicates with the remote KV

The above diagram shows a local Worker running with a remote KV binding. Instead of being handled by the local KV simulation, it’s now being handled by a remote proxy client. This Worker then communicates with a remote proxy server connected to the real remote KV resource, ultimately allowing the local Worker to communicate with the remote KV data seamlessly.

Each binding can independently either be handled by a remote proxy client (all connected to the same remote proxy server) or by a local simulation, allowing for very dynamic workflows where some bindings are locally simulated while others connect to the real remote resource, as illustrated in the example below:


The above diagram and config shows a Worker (running on your computer) bound to 3 different resources—two local (KV & R2), and one remote (KV_2)

How JSRPC fits in

The above section deals with bindings that are backed by HTTP connections (like KV and R2), but modern bindings use JSRPC. That means we needed a way for the locally running workerd to speak JSRPC to a production runtime instance. 

In a stroke of good luck, a parallel project was going on to make this possible, as detailed in the Cap’n Web blog. We integrated that by making the connection between the local workerd instance and the remote runtime instance communicate over websockets using Cap’n Web, enabling bindings backed by JSRPC to work. This includes newer bindings like Images, as well as JSRPC service bindings to your own Workers.

Remote bindings with Vite, Vitest and the JavaScript ecosystem

We didn’t want to limit this exciting new feature to only wrangler dev. We wanted to support it in our Cloudflare Vite Plugin and vitest-pool-workers packages, as well as allowing any other potential tools and use cases from the JavaScript ecosystem to also benefit from it.

In order to achieve this, the wrangler package now exports utilities such as startRemoteProxySession that allow tools not leveraging wrangler dev to also support remote bindings. You can find more details in the official remote bindings documentation.

How do I try this out?

Just use wrangler dev! As of Wrangler v4.37.0 (@cloudflare/vite-plugin v1.13.0, @cloudflare/vitest-pool-workers v0.9.0), remote bindings are available in all projects, and can be turned on a per-binding basis by adding remote: true to the binding definition in your Wrangler config file.

A closer look at Python Workflows, now in beta

Post Syndicated from Caio Nogueira original https://blog.cloudflare.com/python-workflows/

Developers can already use Cloudflare Workflows to build long-running, multi-step applications on Workers. Now, Python Workflows are here, meaning you can use your language of choice to orchestrate multi-step applications.

With Workflows, you can automate a sequence of idempotent steps in your application with built-in error handling and retry behavior. But Workflows were originally supported only in TypeScript. Since Python is the de facto language of choice for data pipelines, artificial intelligence/machine learning, and task automation – all of which heavily rely on orchestration – this created friction for many developers.

Over the years, we’ve been giving developers the tools to build these applications in Python, on Cloudflare. In 2020, we brought Python to Workers via Transcrypt before directly integrating Python into workerd in 2024. Earlier this year, we built support for CPython along with any packages built in Pyodide, like matplotlib and pandas, in Workers. Now, Python Workflows are supported as well, so developers can create robust applications using the language they know best.

Why Python for Workflows?

Imagine you’re training an LLM. You need to label the dataset, feed data, wait for the model to run, evaluate the loss, adjust the model, and repeat. Without automation, you’d need to start each step, monitor manually until completion, and then start the next one. Instead, you could use a workflow to orchestrate the training of the model, triggering each step pending the completion of its predecessor. For any manual adjustments needed, like evaluating the loss and adjusting the model accordingly, you can implement a step that notifies you and waits for the necessary input.

Consider data pipelines, which are a top Python use case for ingesting and processing data. By automating the data pipeline through a defined set of idempotent steps, developers can deploy a workflow that handles the entire data pipeline for them.

Take another example: building AI agents, such as an agent to manage your groceries. Each week, you input your list of recipes, and the agent (1) compiles the list of necessary ingredients, (2) checks what ingredients you have left over from previous weeks, and (3) orders the differential for pickup from your local grocery store. Using a Workflow, this could look like:

  1. await step.wait_for_event() the user inputs the grocery list

  2. step.do() compile list of necessary ingredients

  3. step.do() check list of necessary ingredients against left over ingredients

  4. step.do() make an API call to place the order

  5. step.do() proceed with payment

Using workflows as a tool to build agents on Cloudflare can simplify agents’ architecture and improve their odds for reaching completion through individual step retries and state persistence. Support for Python Workflows means building agents with Python is easier than ever.

How Python Workflows work

Cloudflare Workflows uses the underlying infrastructure that we created for durable execution, while providing an idiomatic way for Python users to write their workflows. In addition, we aimed for complete feature parity between the Javascript and the Python SDK. This is possible because Cloudflare Workers support Python directly in the runtime itself. 

Creating a Python Workflow

Cloudflare Workflows are fully built on top of Workers and Durable Objects. Each element plays a part in storing Workflow metadata, and instance level information. For more detail on how the Workflows platform works, check out this blog post.

At the very bottom of the Workflows control plane sits the user Worker, which is the WorkflowEntrypoint. When the Workflow instance is ready to run, the Workflow engine will call into the run method of the user worker via RPC, which in this case will be a Python Worker.

This is an example skeleton for a Workflow declaration, provided by the official documentation:

export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
    // Steps here
  }
}

The run method, as illustrated above, provides a WorkflowStep parameter that implements the durable execution APIs. This is what users rely on for at-most-once execution. These APIs are implemented in JavaScript and need to be accessed in the context of the Python Worker.

A WorkflowStep must cross the RPC barrier, meaning the engine (caller) exposes it as an RpcTarget. This setup allows the user’s Workflow (callee) to substitute the parameter with a stub. This stub then enables the use of durable execution APIs for Workflows by RPCing back to the engine. To read more about RPC serialization and how functions can be passed from caller and callee, read the Remote-Procedure call documentation.

All of this is true for both Python and JavaScript Workflows, since we don’t really change how the user Worker is called from the Workflows side. However, in the Python case, there is another barrier – language bridging between Python and the JavaScript module. When an RPC request targets a Python Worker, there is a Javascript entrypoint module responsible for proxying the request to be handled by the Python script, and then returned to the caller. This process typically involves type translation before and after handling the request.

Overcoming the language barrier

Python workers rely on Pyodide, which is a port of CPython to WebAssembly. Pyodide provides a foreign function interface (FFI) to JavaScript which allows for calling into JavaScript methods from Python. This is the mechanism that allows other bindings and Python packages to work within the Workers platform. Therefore, we use this FFI layer not only to allow using the Workflow binding directly, but also to provide WorkflowStep methods in Python. In other words, by considering that WorkflowEntrypoint is a special class for the runtime, the run method is manually wrapped so that WorkflowStep is exposed as a JsProxy instead of being type translated like other JavaScript objects. Moreover, by wrapping the APIs from the perspective of the user Worker, we allow ourselves to make some adjustments to the overall development experience, instead of simply exposing a JavaScript SDK to a different language with different semantics. 

Making the Python Workflows SDK Pythonic

A big part of porting Workflows to Python includes exposing an interface that Python users will be familiar with and have no problems using, similarly to what happens with our JavaScript APIs. Let’s take a step back and look at a snippet for a Workflow (written in Typescript) definition.

import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent} from 'cloudflare:workers';
 
export class MyWorkflow extends WorkflowEntrypoint {
    async run(event: WorkflowEvent<YourEventType>, step: WorkflowStep) {
        let state = step.do("my first step", async () => {
          // Access your properties via event.payload
          let userEmail = event.payload.userEmail
          let createdTimestamp = event.payload.createdTimestamp
          return {"userEmail": userEmail, "createdTimestamp": createdTimestamp}
	    })
 
        step.sleep("my first sleep", "30 minutes");
 
        await step.waitForEvent<EventType>("receive example event", { type: "simple-event", timeout: "1 hour" })
 
   	 const developerWeek = Date.parse("22 Sept 2025 13:00:00 UTC");
        await step.sleepUntil("sleep until X times out", developerWeek)
    }
}

The Python implementation of the workflows API requires modification of the do method. Unlike other languages, Python does not easily support anonymous callbacks. This behavior is typically achieved through the use of decorators, which in this case allow us to intercept the method and expose it idiomatically. In other words, all parameters maintain their original order, with the decorated method serving as the callback.

The methods waitForEvent, sleep, and sleepUntil can retain their original signatures, as long as their names are converted to snake case.

Here’s the corresponding Python version for the same workflow, achieving similar behavior:

from workers import WorkflowEntrypoint
 
class MyWorkflow(WorkflowEntrypoint):
    async def run(self, event, step):
        @step.do("my first step")
        async def my_first_step():
            user_email = event["payload"]["userEmail"]
            created_timestamp = event["payload"]["createdTimestamp"]
            return {
                "userEmail": user_email,
                "createdTimestamp": created_timestamp,
            }
 
        await my_first_step()
 
        step.sleep("my first sleep", "30 minutes")
 
         await step.wait_for_event(
            "receive example event",
            "simple-event",
            timeout="1 hour",
        )
 
        developer_week = datetime(2024, 10, 24, 13, 0, 0, tzinfo=timezone.utc)
        await step.sleep_until("sleep until X times out", developer_week)

DAG Workflows

When designing Workflows, we’re often managing dependencies between steps even when some of these tasks can be handled concurrently. Even though we’re not thinking about it, many Workflows have a directed acyclic graph (DAG) execution flow. Concurrency is achievable in the first iteration of Python Workflows (i.e.: minimal port to Python Workers) because Pyodide captures Javascript thenables and proxies them into Python awaitables.

Consequently, asyncio.gather works as a counterpart to Promise.all. Although this is perfectly fine and ready to be used in the SDK, we also support a declarative approach.

One of the advantages of decorating the do method is that we can essentially provide further abstractions on the original API, and have them work on the entrypoint wrapper. Here’s an example of a Python API making use of the DAG capabilities introduced:

from workers import Response, WorkflowEntrypoint

class PythonWorkflowDAG(WorkflowEntrypoint):
    async def run(self, event, step):

        @step.do('dependency 1')
        async def dep_1():
            # does stuff
            print('executing dep1')

        @step.do('dependency 2')
        async def dep_2():
            # does stuff
            print('executing dep2')

        @step.do('demo do', depends=[dep_1, dep_2], concurrent=True)
        async def final_step(res1=None, res2=None):
            # does stuff
            print('something')

        await final_step()

This kind of approach makes the Workflow declaration much cleaner, leaving state management to the Workflows engine data plane, as well as the Python workers Workflow wrapper. Note that even though multiple steps can run with the same name, the engine will slightly modify the name of each step to ensure uniqueness. In Python Workflows, a dependency is considered resolved once the initial step involving it has been successfully completed.

Try it out

Check out writing Workers in Python and create your first Python Workflow today! If you have any feature requests or notice any bugs, share your feedback directly with the Cloudflare team by joining the Cloudflare Developers community on Discord.

How Workers VPC Services connects to your regional private networks from anywhere in the world

Post Syndicated from Thomas Gauvin original https://blog.cloudflare.com/workers-vpc-open-beta/

In April, we shared our vision for a global virtual private cloud on Cloudflare, a way to unlock your applications from regionally constrained clouds and on-premise networks, enabling you to build truly cross-cloud applications.

Today, we’re announcing the first milestone of our Workers VPC initiative: VPC Services. VPC Services allow you to connect to your APIs, containers, virtual machines, serverless functions, databases and other services in regional private networks via Cloudflare Tunnels from your Workers running anywhere in the world. 

Once you set up a Tunnel in your desired network, you can register each service that you want to expose to Workers by configuring its host or IP address. Then, you can access the VPC Service as you would any other Workers service binding — Cloudflare’s network will automatically route to the VPC Service over Cloudflare’s network, regardless of where your Worker is executing:

export default {
  async fetch(request, env, ctx) {
    // Perform application logic in Workers here	

    // Call an external API running in a ECS in AWS when needed using the binding
    const response = await env.AWS_VPC_ECS_API.fetch("http://internal-host.com");

    // Additional application logic in Workers
    return new Response();
  },
};

Workers VPC is now available to everyone using Workers, at no additional cost during the beta, as is Cloudflare Tunnels. Try it out now. And read on to learn more about how it works under the hood.

Connecting the networks you trust, securely

Your applications span multiple networks, whether they are on-premise or in external clouds. But it’s been difficult to connect from Workers to your APIs and databases locked behind private networks. 

We have previously described how traditional virtual private clouds and networks entrench you into traditional clouds. While they provide you with workload isolation and security, traditional virtual private clouds make it difficult to build across clouds, access your own applications, and choose the right technology for your stack.

A significant part of the cloud lock-in is the inherent complexity of building secure, distributed workloads. VPC peering requires you to configure routing tables, security groups and network access-control lists, since it relies on networking across clouds to ensure connectivity. In many organizations, this means weeks of discussions and many teams involved to get approvals. This lock-in is also reflected in the solutions invented to wrangle this complexity: Each cloud provider has their own bespoke version of a “Private Link” to facilitate cross-network connectivity, further restricting you to that cloud and the vendors that have integrated with it.

With Workers VPC, we’re simplifying that dramatically. You set up your Cloudflare Tunnel once, with the necessary permissions to access your private network. Then, you can configure Workers VPC Services, with the tunnel and hostname (or IP address and port) of the service you want to expose to Workers. Any request made to that VPC Service will use this configuration to route to the given service within the network.

{
  "type": "http",
  "name": "vpc-service-name",
  "http_port": 80,
  "https_port": 443,
  "host": {
    "hostname": "internally-resolvable-hostname.com",
    "resolver_network": {
      "tunnel_id": "0191dce4-9ab4-7fce-b660-8e5dec5172da"
    }
  }
}

This ensures that, once represented as a Workers VPC Service, a service in your private network is secured in the same way other Cloudflare bindings are, using the Workers binding model. Let’s take a look at a simple VPC Service binding example:

{
  "name": "WORKER-NAME",
  "main": "./src/index.js",
  "vpc_services": [
    {
      "binding": "AWS_VPC2_ECS_API",
      "service_id": "5634563546"
    }
  ]
}

Like other Workers bindings, when you deploy a Worker project that tries to connect to a VPC Service, the access permissions are verified at deploy time to ensure that the Worker has access to the service in question. And once deployed, the Worker can use the VPC Service binding to make requests to that VPC Service — and only that service within the network. 

That’s significant: Instead of exposing the entire network to the Worker, only the specific VPC Service can be accessed by the Worker. This access is verified at deploy time to provide a more explicit and transparent service access control than traditional networks and access-control lists do.

This is a key factor in the design of Workers bindings: de facto security with simpler management and making Workers immune to Server-Side Request Forgery (SSRF) attacks. We’ve gone deep on the binding security model in the past, and it becomes that much more critical when accessing your private networks. 

Notably, the binding model is also important when considering what Workers are: scripts running on Cloudflare’s global network. They are not, in contrast to traditional clouds, individual machines with IP addresses, and do not exist within networks. Bindings provide secure access to other resources within your Cloudflare account – and the same applies to Workers VPC Services.

A peek under the hood

So how do VPC Services and their bindings route network requests from Workers anywhere on Cloudflare’s global network to regional networks using tunnels? Let’s look at the lifecycle of a sample HTTP Request made from a VPC Service’s dedicated fetch() request represented here:


It all starts in the Worker code, where the .fetch() function of the desired VPC Service is called with a standard JavaScript Request (as represented with Step 1). The Workers runtime will use a Cap’n Proto remote-procedure-call to send the original HTTP request alongside additional context, as it does for many other Workers bindings. 

The Binding Worker of the VPC Service System receives the HTTP request along with the binding context, in this case, the Service ID of the VPC Service being invoked. The Binding Worker will proxy this information to the Iris Service within an HTTP CONNECT connection, a standard pattern across Cloudflare’s bindings to place connection logic to Cloudflare’s edge services within Worker code rather than the Workers runtime itself (Step 2). 

The Iris Service is the main service for Workers VPC. Its responsibility is to accept requests for a VPC Service and route them to the network in which your VPC Service is located. It does this by integrating with Apollo, an internal service of Cloudflare One. Apollo provides a unified interface that abstracts away the complexity of securely connecting to networks and tunnels, across various layers of networking

To integrate with Apollo, Iris must complete two tasks. First, Iris will parse the VPC Service ID from the metadata and fetch the information of the tunnel associated with it from our configuration store. This includes the tunnel ID and type from the configuration store (Step 3), which is the information that Iris needs to send the original requests to the right tunnel.

Second, Iris will create the UDP datagrams containing DNS questions for the A and AAAA records of the VPC Service’s hostname. These datagrams will be sent first, via Apollo. Once DNS resolution is completed, the original request is sent along, with the resolved IP address and port (Step 4). That means that steps 4 through 7 happen in sequence twice for the first request: once for DNS resolution and a second time for the original HTTP Request. Subsequent requests benefit from Iris’ caching of DNS resolution information, minimizing request latency.

In Step 5, Apollo receives the metadata of the Cloudflare Tunnel that needs to be accessed, along with the DNS resolution UDP datagrams or the HTTP Request TCP packets. Using the tunnel ID, it determines which datacenter is connected to the Cloudflare Tunnel. This datacenter is in a region close to the Cloudflare Tunnel, and as such, Apollo will route the DNS resolution messages and the Original Request to the Tunnel Connector Service running in that datacenter (Step 5).


The Tunnel Connector Service is responsible for providing access to the Cloudflare Tunnel to the rest of Cloudflare’s network. It will relay the DNS resolution questions, and subsequently the original request to the tunnel over the QUIC protocol (Step 6).

Finally, the Cloudflare Tunnel will send the DNS resolution questions to the DNS resolver of the network it belongs to. It will then send the original HTTP Request from its own IP address to the destination IP and port (Step 7). The results of the request are then relayed all the way back to the original Worker, from the datacenter closest to the tunnel all the way to the original Cloudflare datacenter executing the Worker request.

What VPC Service allows you to build

This unlocks a whole new tranche of applications you can build on Cloudflare. For years, Workers have excelled at the edge, but they’ve largely been kept “outside” your core infrastructure. They could only call public endpoints, limiting their ability to interact with the most critical parts of your stack—like a private accounts API or an internal inventory database. Now, with VPC Services, Workers can securely access those private APIs, databases, and services, fundamentally changing what’s possible.


This immediately enables true cross-cloud applications that span Cloudflare Workers and any other cloud like AWS, GCP or Azure. We’ve seen many customers adopt this pattern over the course of our private beta, establishing private connectivity between their external clouds and Cloudflare Workers. We’ve even done so ourselves, connecting our Workers to Kubernetes services in our core datacenters to power the control plane APIs for many of our services. Now, you can build the same powerful, distributed architectures, using Workers for global scale while keeping stateful backends in the network you already trust.

It also means you can connect to your on-premise networks from Workers, allowing you to modernize legacy applications with the performance and infinite scale of Workers. More interesting still are some emerging use cases for developer workflows. We’ve seen developers run cloudflared on their laptops to connect a deployed Worker back to their local machine for real-time debugging. The full flexibility of Cloudflare Tunnels is now a programmable primitive accessible directly from your Worker, opening up a world of possibilities.

The path ahead of us

VPC Services is the first milestone within the larger Workers VPC initiative, but we’re just getting started. Our goal is to make connecting to any service and any network, anywhere in the world, a seamless part of the Workers experience. Here’s what we’re working on next:

Deeper network integration. Starting with Cloudflare Tunnels was a deliberate choice. It’s a highly available, flexible, and familiar solution, making it the perfect foundation to build upon. To provide more options for enterprise networking, we’re going to be adding support for standard IPsec tunnels, Cloudflare Network Interconnect (CNI), and AWS Transit Gateway, giving you and your teams more choices and potential optimizations. Crucially, these connections will also become truly bidirectional, allowing your private services to initiate connections back to Cloudflare resources such as pushing events to Queues or fetching from R2.

Expanded protocol and service support. The next step beyond HTTP is enabling access to TCP services. This will first be achieved by integrating with Hyperdrive. We’re evolving the previous Hyperdrive support for private databases to be simplified with VPC Services configuration, avoiding the need to add Cloudflare Access and manage security tokens. This creates a more native experience, complete with Hyperdrive’s powerful connection pooling. Following this, we will add broader support for raw TCP connections, unlocking direct connectivity to services like Redis caches and message queues from Workers ‘connect()’.

Ecosystem compatibility. We want to make connecting to a private service feel as natural as connecting to a public one. To do so, we will be providing a unique autogenerated hostname for each Workers VPC Service, similar to Hyperdrive’s connection strings. This will make it easier to use Workers VPC with existing libraries and object–relational mapping libraries that may require a hostname (e.g., in a global ‘fetch()’ call or a MongoDB connection string). Workers VPC Service hostname will automatically resolve and route to the correct VPC Service, just as the ‘fetch()’ command does.

Get started with Workers VPC

We’re excited to release Workers VPC Services into open beta today. We’ve spent months building out and testing our first milestone for Workers to private network access. And we’ve refined it further based on feedback from both internal teams and customers during the closed beta. 

Now, we’re looking forward to enabling everyone to build cross-cloud apps on Workers with Workers VPC, available for free during the open beta. With Workers VPC, you can bring your apps on private networks to region Earth, closer to your users and available to Workers across the globe.

Get started with Workers VPC Services for free now.

Building a better testing experience for Workflows, our durable execution engine for multi-step applications

Post Syndicated from Olga Silva original https://blog.cloudflare.com/better-testing-for-workflows/

Cloudflare Workflows is our take on “Durable Execution.” They provide a serverless engine, powered by the Cloudflare Developer Platform, for building long-running, multi-step applications that persist through failures. When Workflows became generally available earlier this year, they allowed developers to orchestrate complex processes that would be difficult or impossible to manage with traditional stateless functions. Workflows handle state, retries, and long waits, allowing you to focus on your business logic.

However, complex orchestrations require robust testing to be reliable. To date, testing Workflows was a black-box process. Although you could test if a Workflow instance reached completion through an await to its status, there was no visibility into the intermediate steps. This made debugging really difficult. Did the payment processing step succeed? Did the confirmation email step receive the correct data? You couldn’t be sure without inspecting external systems or logs. 

Why was this necessary?

As developers ourselves, we understand the need to ensure reliable code, and we heard your feedback loud and clear: the developer experience for testing Workflows needed to be better.

The black box nature of testing was one part of the problem. Beyond that, though, the limited testing offered came at a high cost. If you added a workflow to your project, even if you weren’t testing the workflow directly, you were required to disable isolated storage because we couldn’t guarantee isolation between tests. Isolated storage is a vitest-pool-workers feature to guarantee that each test runs in a clean, predictable environment, free from the side effects of other tests. Being forced to have it disabled meant that state could leak between tests, leading to flaky, unpredictable, and hard-to-debug failures.

This created a difficult choice for developers building complex applications. If your project used Workers, Durable Objects, and R2 alongside Workflows, you had to either abandon isolated testing for your entire project or skip testing. This friction resulted in a poor testing experience, which in turn discouraged the adoption of Workflows. Solving this wasn’t just an improvement, it was a critical step in making Workflows part of any well-tested Cloudflare application.

Introducing isolated testing for Workflows

We’re introducing a new set of APIs that enable comprehensive, granular, and isolated testing for your Workflows, all running locally and offline with vitest-pool-workers, our testing framework that supports running tests in the Workers runtime workerd. This enables fast, reliable, and cheap test runs that don’t depend on a network connection.

They are available through the cloudflare:test module, with @cloudflare/vitest-pool-workers version 0.9.0 and above. The new test module provides two primary functions to introspect your Workflows:

  • introspectWorkflowInstance: useful for unit tests with known instance IDs

  • introspectWorkflow: useful for integration tests where IDs are typically generated dynamically.

Let’s walk through a practical example.

A practical example: testing a blog moderation workflow

Imagine a simple Workflow for moderating a blog. When a user submits a comment, the Workflow requests a review from workers-ai. Based on the violation score returned, it then waits for a moderator to approve or deny the comment. If approved, it calls a step.do to publish the comment via an external API.

Testing this without our new APIs would be impossible. You’d have no direct way to simulate the step’s outcomes and simulate the moderator’s approval. Now, you can mock everything.

Here’s the test code using introspectWorkflowInstance with a known instance ID:

import { env, introspectWorkflowInstance } from "cloudflare:test";

it("should mock a an ambiguous score, approve comment and complete", async () => {
   // CONFIG
   await using instance = await introspectWorkflowInstance(
       env.MODERATOR,
       "my-workflow-instance-id-123"
   );
   await instance.modify(async (m) => {
       await m.mockStepResult({ name: "AI content scan" }, { violationScore: 50 });
       await m.mockEvent({ 
           type: "moderation-approval", 
           payload: { action: "approved" },
       });
       await m.mockStepResult({ name: "publish comment" }, { status: "published" });
   });

   await env.MODERATOR.create({ id: "my-workflow-instance-id-123" });
   
   // ASSERTIONS
   expect(await instance.waitForStepResult({ name: "AI content scan" })).toEqual(
       { violationScore: 50 }
   );
   expect(
       await instance.waitForStepResult({ name: "publish comment" })
   ).toEqual({ status: "published" });

   await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});

This test mocks the outcomes of steps that require external API calls, such as the ‘AI content scan’, which calls Workers AI, and the ‘publish comment’ step, which calls an external blog API.

If the instance ID is not known, because you are either making a worker request that starts one/multiple Workflow instances with random generated ids, you can call introspectWorkflow(env.MY_WORKFLOW). Here’s the test code for that scenario, where only one Workflow instance is created:

it("workflow mock a non-violation score and be successful", async () => {
   // CONFIG
   await using introspector = await introspectWorkflow(env.MODERATOR);
   await introspector.modifyAll(async (m) => {
       await m.disableSleeps();
       await m.mockStepResult({ name: "AI content scan" }, { violationScore: 0 });
   });

   await SELF.fetch(`https://mock-worker.local/moderate`);

   const instances = introspector.get();
   expect(instances.length).toBe(1);

   // ASSERTIONS
   const instance = instances[0];
   expect(await instance.waitForStepResult({ name: "AI content scan"  })).toEqual({ violationScore: 0 });
   await expect(instance.waitForStatus("complete")).resolves.not.toThrow();
});

Notice how in both examples we’re calling the introspectors with await using – this is the Explicit Resource Management syntax from modern JavaScript. It is crucial here because when the introspector objects go out of scope at the end of the test, its disposal method is automatically called. This is how we ensure each test works with its own isolated storage.

The modify and modifyAll functions are the gateway to controlling instances. Inside its callback, you get access to a modifier object with methods to inject behavior such as mocking step outcomes, events and disabling sleeps.

You can find detailed documentation on the Workers Cloudflare Docs.

How we connected Vitest to the Workflows Engine

To understand the solution, you first need to understand the local architecture. When you run wrangler dev, your Workflows are powered by Miniflare, a simulator for testing Cloudflare Workers, and workerd. Each running workflow instance is backed by its own SQLite Durable Object, which we call the “Engine DO”. This Engine DO is responsible for executing steps, persisting state, and managing the instance’s lifecycle. It lives inside the local isolated Workers runtime.

Meanwhile, the Vitest test runner is a separate Node.js process living outside of workerd. This is why we have a Vitest custom pool that allows tests to run inside workerd called vitest-pool-workers. Vitest-pool-workers has a Runner Worker, which is a worker to run the tests with bindings to everything specified in the user wrangler.json file. This worker has access to the APIs under the “cloudflare:test” module. It communicates with Node.js through a special DO called Runner Object via WebSocket/RPC.

The first approach we considered was to use the test runner worker. In its current state, Runner worker has access to Workflow bindings from Workflows defined on the wrangler file. We considered also binding each Workflow’s Engine DO namespace to this runner worker. This would give vitest-pool-workers direct access to the Engine DOs where it would be possible to directly call Engine methods. 


While promising, this approach would have required undesirable changes to the core of Miniflare and vitest-pool-workers, making it too invasive for this single feature. 

Firstly, we would have needed to add a new unsafe field to Miniflare’s Durable Objects. Its sole purpose would be to specify the service name of our Engines, preventing Miniflare from applying its default user prefix which would otherwise prevent the Durable Objects from being found.

Secondly, vitest-pool-workers would have been forced to bind every Engine DO from the Workflows in the project to its runner, even those not being tested. This would introduce unwanted bindings into the test environment, requiring an additional cleanup to ensure they were not exposed to the user’s tests env.

The breakthrough

The solution is a combination of privileged local-only APIs and Remote Procedure Calls (RPC).

First, we added a set of unsafe functions to the local implementation of the Workflows binding, functions that are not available in the production environment. They act as a controlled access point, accessible from the test environment, allowing the test runner to get a stub to a specific Engine DO by providing its instance ID.

Once the test runner has this stub, it uses RPC to call specific, trusted methods on the Engine DO via a special RpcTarget called WorkflowInstanceModifier. Any class that extends RpcTarget has its objects replaced by a stub. Calling a method on this stub, in turn, makes an RPC back to the original object.


This simpler approach is far less invasive because it’s confined to the Workflows environment, which also ensures any future feature changes are safely isolated.

Introspecting Workflows with unknown IDs

When creating Workflows instances (either by create() or createBatch()) developers can provide a specific ID or have it automatically generated for them. This ID identifies the Workflow instance and is then used to create the associated Engine DO ID.

The logical starting point for implementation was introspectWorkflowInstance(binding, instanceID), as the instance ID is known in advance. This allows us to generate the Engine DO ID required to identify the engine associated with that Workflow instance.

But often, one part of your application (like an HTTP endpoint) will create a Workflow instance with a randomly generated ID. How can we introspect an instance when we don’t know its ID until after it’s created?

The answer was to use a powerful feature of JavaScript: Proxy objects.

When you use introspectWorkflow(binding), we wrap the Workflow binding in a Proxy. This proxy non-destructively intercepts all calls to the binding, specifically looking for .create() and .createBatch(). When your test triggers a workflow creation, the proxy inspects the call. It captures the instance ID — either one you provided or the random one generated — and immediately sets up the introspection on that ID, applying all the modifications you defined in the modifyAll call. The original creation call then proceeds as normal.

env[workflow] = new Proxy(env[workflow], {
  get(target, prop) {
    if (prop === "create") {
      return new Proxy(target.create, {
        async apply(_fn, _this, [opts = {}]) {

          // 1. Ensure an ID exists 
          const optsWithId = "id" in opts ? opts : { id: crypto.randomUUID(), ...opts };

          // 2. Apply test modifications before creation
          await introspectAndModifyInstance(optsWithId.id);

          // 3. Call the original 'create' method 
          return target.create(optsWithId);
        },
      });
    }

    // Same logic for createBatch()
  }
}

When the await using block from introspectWorkflow() finishes, or the dispose() method is called at the end of the test, the introspector is disposed of, and the proxy is removed, leaving the binding in its original state. It’s a low-impact approach that prioritizes developer experience and long-term maintainability.

Get started with testing Workflows

Ready to add tests to your Workflows? Here’s how to get started:

  1. Update your dependencies: Make sure you are using @cloudflare/vitest-pool-workers version 0.9.0 or newer. Run the following command in your project: npm install @cloudflare/vitest-pool-workers@latest

  2. Configure your test environment: If you’re new to testing on Workers, follow our guide to write your first test.

Start writing tests: Import introspectWorkflowInstance or introspectWorkflow from cloudflare:test in your test files and use the patterns shown in this post to mock, control, and assert on your Workflow’s behavior. Also check out the official API reference.

Keeping the Internet fast and secure: introducing Merkle Tree Certificates

Post Syndicated from Luke Valenta original https://blog.cloudflare.com/bootstrap-mtc/

The world is in a race to build its first quantum computer capable of solving practical problems not feasible on even the largest conventional supercomputers. While the quantum computing paradigm promises many benefits, it also threatens the security of the Internet by breaking much of the cryptography we have come to rely on.

To mitigate this threat, Cloudflare is helping to migrate the Internet to Post-Quantum (PQ) cryptography. Today, about 50% of traffic to Cloudflare’s edge network is protected against the most urgent threat: an attacker who can intercept and store encrypted traffic today and then decrypt it in the future with the help of a quantum computer. This is referred to as the harvest now, decrypt later threat.

However, this is just one of the threats we need to address. A quantum computer can also be used to crack a server’s TLS certificate, allowing an attacker to impersonate the server to unsuspecting clients. The good news is that we already have PQ algorithms we can use for quantum-safe authentication. The bad news is that adoption of these algorithms in TLS will require significant changes to one of the most complex and security-critical systems on the Internet: the Web Public-Key Infrastructure (WebPKI).

The central problem is the sheer size of these new algorithms: signatures for ML-DSA-44, one of the most performant PQ algorithms standardized by NIST, are 2,420 bytes long, compared to just 64 bytes for ECDSA-P256, the most popular non-PQ signature in use today; and its public keys are 1,312 bytes long, compared to just 64 bytes for ECDSA. That’s a roughly 20-fold increase in size. Worse yet, the average TLS handshake includes a number of public keys and signatures, adding up to 10s of kilobytes of overhead per handshake. This is enough to have a noticeable impact on the performance of TLS.

That makes drop-in PQ certificates a tough sell to enable today: they don’t bring any security benefit before Q-day — the day a cryptographically relevant quantum computer arrives — but they do degrade performance. We could sit and wait until Q-day is a year away, but that’s playing with fire. Migrations always take longer than expected, and by waiting we risk the security and privacy of the Internet, which is dear to us.

It’s clear that we must find a way to make post-quantum certificates cheap enough to deploy today by default for everyone — not just those that can afford it. In this post, we’ll introduce you to the plan we’ve brought together with industry partners to the IETF to redesign the WebPKI in order to allow a smooth transition to PQ authentication with no performance impact (and perhaps a performance improvement!). We’ll provide an overview of one concrete proposal, called Merkle Tree Certificates (MTCs), whose goal is to whittle down the number of public keys and signatures in the TLS handshake to the bare minimum required.

But talk is cheap. We know from experience that, as with any change to the Internet, it’s crucial to test early and often. Today we’re announcing our intent to deploy MTCs on an experimental basis in collaboration with Chrome Security. In this post, we’ll describe the scope of this experiment, what we hope to learn from it, and how we’ll make sure it’s done safely.

The WebPKI today — an old system with many patches

Why does the TLS handshake have so many public keys and signatures?

Let’s start with Cryptography 101. When your browser connects to a website, it asks the server to authenticate itself to make sure it’s talking to the real server and not an impersonator. This is usually achieved with a cryptographic primitive known as a digital signature scheme (e.g., ECDSA or ML-DSA). In TLS, the server signs the messages exchanged between the client and server using its secret key, and the client verifies the signature using the server’s public key. In this way, the server confirms to the client that they’ve had the same conversation, since only the server could have produced a valid signature.

If the client already knows the server’s public key, then only 1 signature is required to authenticate the server. In practice, however, this is not really an option. The web today is made up of around a billion TLS servers, so it would be unrealistic to provision every client with the public key of every server. What’s more, the set of public keys will change over time as new servers come online and existing ones rotate their keys, so we would need some way of pushing these changes to clients.

This scaling problem is at the heart of the design of all PKIs.

Trust is transitive

Instead of expecting the client to know the server’s public key in advance, the server might just send its public key during the TLS handshake. But how does the client know that the public key actually belongs to the server? This is the job of a certificate.

A certificate binds a public key to the identity of the server — usually its DNS name, e.g., cloudflareresearch.com. The certificate is signed by a Certification Authority (CA) whose public key is known to the client. In addition to verifying the server’s handshake signature, the client verifies the signature of this certificate. This establishes a chain of trust: by accepting the certificate, the client is trusting that the CA verified that the public key actually belongs to the server with that identity.

Clients are typically configured to trust many CAs and must be provisioned with a public key for each. Things are much easier however, since there are only 100s of CAs instead of billions. In addition, new certificates can be created without having to update clients.

These efficiencies come at a relatively low cost: for those counting at home, that’s +1 signature and +1 public key, for a total of 2 signatures and 1 public key per TLS handshake.

That’s not the end of the story, however. As the WebPKI has evolved, so have these chains of trust grown a bit longer. These days it’s common for a chain to consist of two or more certificates rather than just one. This is because CAs sometimes need to rotate their keys, just as servers do. But before they can start using the new key, they must distribute the corresponding public key to clients. This takes time, since it requires billions of clients to update their trust stores. To bridge the gap, the CA will sometimes use the old key to issue a certificate for the new one and append this certificate to the end of the chain.

That’s +1 signature and +1 public key, which brings us to 3 signatures and 2 public keys. And we still have a little ways to go.

Trust but verify

The main job of a CA is to verify that a server has control over the domain for which it’s requesting a certificate. This process has evolved over the years from a high-touch, CA-specific process to a standardized, mostly automated process used for issuing most certificates on the web. (Not all CAs fully support automation, however.) This evolution is marked by a number of security incidents in which a certificate was mis-issued to a party other than the server, allowing that party to impersonate the server to any client that trusts the CA.

Automation helps, but attacks are still possible, and mistakes are almost inevitable. Earlier this year, several certificates for Cloudflare’s encrypted 1.1.1.1 resolver were issued without our involvement or authorization. This apparently occurred by accident, but it nonetheless put users of 1.1.1.1 at risk. (The mis-issued certificates have since been revoked.)

Ensuring mis-issuance is detectable is the job of the Certificate Transparency (CT) ecosystem. The basic idea is that each certificate issued by a CA gets added to a public log. Servers can audit these logs for certificates issued in their name. If ever a certificate is issued that they didn’t request itself, the server operator can prove the issuance happened, and the PKI ecosystem can take action to prevent the certificate from being trusted by clients.

Major browsers, including Firefox and Chrome and its derivatives, require certificates to be logged before they can be trusted. For example, Chrome, Safari, and Firefox will only accept the server’s certificate if it appears in at least two logs the browser is configured to trust. This policy is easy to state, but tricky to implement in practice:

  1. Operating a CT log has historically been fairly expensive. Logs ingest billions of certificates over their lifetimes: when an incident happens, or even just under high load, it can take some time for a log to make a new entry available for auditors.

  2. Clients can’t really audit logs themselves, since this would expose their browsing history (i.e., the servers they wanted to connect to) to the log operators.

The solution to both problems is to include a signature from the CT log along with the certificate. The signature is produced immediately in response to a request to log a certificate, and attests to the log’s intent to include the certificate in the log within 24 hours.

Per browser policy, certificate transparency adds +2 signatures to the TLS handshake, one for each log. This brings us to a total of 5 signatures and 2 public keys in a typical handshake on the public web.

The future WebPKI

The WebPKI is a living, breathing, and highly distributed system. We’ve had to patch it a number of times over the years to keep it going, but on balance it has served our needs quite well — until now.

Previously, whenever we needed to update something in the WebPKI, we would tack on another signature. This strategy has worked because conventional cryptography is so cheap. But 5 signatures and 2 public keys on average for each TLS handshake is simply too much to cope with for the larger PQ signatures that are coming.

The good news is that by moving what we already have around in clever ways, we can drastically reduce the number of signatures we need.

Crash course on Merkle Tree Certificates

Merkle Tree Certificates (MTCs) is a proposal for the next generation of the WebPKI that we are implementing and plan to deploy on an experimental basis. Its key features are as follows:

  1. All the information a client needs to validate a Merkle Tree Certificate can be disseminated out-of-band. If the client is sufficiently up-to-date, then the TLS handshake needs just 1 signature, 1 public key, and 1 Merkle tree inclusion proof. This is quite small, even if we use post-quantum algorithms.

  2. The MTC specification makes certificate transparency a first class feature of the PKI by having each CA run its own log of exactly the certificates they issue.

Let’s poke our head under the hood a little. Below we have an MTC generated by one of our internal tests. This would be transmitted from the server to the client in the TLS handshake:

-----BEGIN CERTIFICATE-----
MIICSzCCAUGgAwIBAgICAhMwDAYKKwYBBAGC2ksvADAcMRowGAYKKwYBBAGC2ksv
AQwKNDQzNjMuNDguMzAeFw0yNTEwMjExNTMzMjZaFw0yNTEwMjgxNTMzMjZaMCEx
HzAdBgNVBAMTFmNsb3VkZmxhcmVyZXNlYXJjaC5jb20wWTATBgcqhkjOPQIBBggq
hkjOPQMBBwNCAARw7eGWh7Qi7/vcqc2cXO8enqsbbdcRdHt2yDyhX5Q3RZnYgONc
JE8oRrW/hGDY/OuCWsROM5DHszZRDJJtv4gno2wwajAOBgNVHQ8BAf8EBAMCB4Aw
EwYDVR0lBAwwCgYIKwYBBQUHAwEwQwYDVR0RBDwwOoIWY2xvdWRmbGFyZXJlc2Vh
cmNoLmNvbYIgc3RhdGljLWN0LmNsb3VkZmxhcmVyZXNlYXJjaC5jb20wDAYKKwYB
BAGC2ksvAAOB9QAAAAAAAAACAAAAAAAAAAJYAOBEvgOlvWq38p45d0wWTPgG5eFV
wJMhxnmDPN1b5leJwHWzTOx1igtToMocBwwakt3HfKIjXYMO5CNDOK9DIKhmRDSV
h+or8A8WUrvqZ2ceiTZPkNQFVYlG8be2aITTVzGuK8N5MYaFnSTtzyWkXP2P9nYU
Vd1nLt/WjCUNUkjI4/75fOalMFKltcc6iaXB9ktble9wuJH8YQ9tFt456aBZSSs0
cXwqFtrHr973AZQQxGLR9QCHveii9N87NXknDvzMQ+dgWt/fBujTfuuzv3slQw80
mibA021dDCi8h1hYFQAA
-----END CERTIFICATE-----

Looks like your average PEM encoded certificate. Let’s decode it and look at the parameters:

$ openssl x509 -in merkle-tree-cert.pem -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 531 (0x213)
        Signature Algorithm: 1.3.6.1.4.1.44363.47.0
        Issuer: 1.3.6.1.4.1.44363.47.1=44363.48.3
        Validity
            Not Before: Oct 21 15:33:26 2025 GMT
            Not After : Oct 28 15:33:26 2025 GMT
        Subject: CN=cloudflareresearch.com
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:70:ed:e1:96:87:b4:22:ef:fb:dc:a9:cd:9c:5c:
                    ef:1e:9e:ab:1b:6d:d7:11:74:7b:76:c8:3c:a1:5f:
                    94:37:45:99:d8:80:e3:5c:24:4f:28:46:b5:bf:84:
                    60:d8:fc:eb:82:5a:c4:4e:33:90:c7:b3:36:51:0c:
                    92:6d:bf:88:27
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Subject Alternative Name:
                DNS:cloudflareresearch.com, DNS:static-ct.cloudflareresearch.com
    Signature Algorithm: 1.3.6.1.4.1.44363.47.0
    Signature Value:
        00:00:00:00:00:00:02:00:00:00:00:00:00:00:02:58:00:e0:
        44:be:03:a5:bd:6a:b7:f2:9e:39:77:4c:16:4c:f8:06:e5:e1:
        55:c0:93:21:c6:79:83:3c:dd:5b:e6:57:89:c0:75:b3:4c:ec:
        75:8a:0b:53:a0:ca:1c:07:0c:1a:92:dd:c7:7c:a2:23:5d:83:
        0e:e4:23:43:38:af:43:20:a8:66:44:34:95:87:ea:2b:f0:0f:
        16:52:bb:ea:67:67:1e:89:36:4f:90:d4:05:55:89:46:f1:b7:
        b6:68:84:d3:57:31:ae:2b:c3:79:31:86:85:9d:24:ed:cf:25:
        a4:5c:fd:8f:f6:76:14:55:dd:67:2e:df:d6:8c:25:0d:52:48:
        c8:e3:fe:f9:7c:e6:a5:30:52:a5:b5:c7:3a:89:a5:c1:f6:4b:
        5b:95:ef:70:b8:91:fc:61:0f:6d:16:de:39:e9:a0:59:49:2b:
        34:71:7c:2a:16:da:c7:af:de:f7:01:94:10:c4:62:d1:f5:00:
        87:bd:e8:a2:f4:df:3b:35:79:27:0e:fc:cc:43:e7:60:5a:df:
        df:06:e8:d3:7e:eb:b3:bf:7b:25:43:0f:34:9a:26:c0:d3:6d:
        5d:0c:28:bc:87:58:58:15:00:00

While some of the parameters probably look familiar, others will look unusual. On the familiar side, the subject and public key are exactly what we might expect: the DNS name is cloudflareresearch.com and the public key is for a familiar signature algorithm, ECDSA-P256. This algorithm is not PQ, of course — in the future we would put ML-DSA-44 there instead.

On the unusual side, OpenSSL appears to not recognize the signature algorithm of the issuer and just prints the raw OID and bytes of the signature. There’s a good reason for this: the MTC does not have a signature in it at all! So what exactly are we looking at?

The trick to leave out signatures is that a Merkle Tree Certification Authority (MTCA) produces its signatureless certificates in batches rather than individually. In place of a signature, the certificate has an inclusion proof of the certificate in a batch of certificates signed by the MTCA.

To understand how inclusion proofs work, let’s think about a slightly simplified version of the MTC specification. To issue a batch, the MTCA arranges the unsigned certificates into a data structure called a Merkle tree that looks like this:


Each leaf of the tree corresponds to a certificate, and each inner node is equal to the hash of its children. To sign the batch, the MTCA uses its secret key to sign the head of the tree. The structure of the tree guarantees that each certificate in the batch was signed by the MTCA: if we tried to tweak the bits of any one of the certificates, the treehead would end up having a different value, which would cause the signature to fail.

An inclusion proof for a certificate consists of the hash of each sibling node along the path from the certificate to the treehead:


Given a validated treehead, this sequence of hashes is sufficient to prove inclusion of the certificate in the tree. This means that, in order to validate an MTC, the client also needs to obtain the signed treehead from the MTCA.

This is the key to MTC’s efficiency:

  1. Signed treeheads can be disseminated to clients out-of-band and validated offline. Each validated treehead can then be used to validate any certificate in the corresponding batch, eliminating the need to obtain a signature for each server certificate.

  2. During the TLS handshake, the client tells the server which treeheads it has. If the server has a signatureless certificate covered by one of those treeheads, then it can use that certificate to authenticate itself. That’s 1 signature,1 public key and 1 inclusion proof per handshake, both for the server being authenticated.

Now, that’s the simplified version. MTC proper has some more bells and whistles. To start, it doesn’t create a separate Merkle tree for each batch, but it grows a single large tree, which is used for better transparency. As this tree grows, periodically (sub)tree heads are selected to be shipped to browsers, which we call landmarks. In the common case browsers will be able to fetch the most recent landmarks, and servers can wait for batch issuance, but we need a fallback: MTC also supports certificates that can be issued immediately and don’t require landmarks to be validated, but these are not as small. A server would provision both types of Merkle tree certificates, so that the common case is fast, and the exceptional case is slow, but at least it’ll work.

Experimental deployment

Ever since early designs for MTCs emerged, we’ve been eager to experiment with the idea. In line with the IETF principle of “running code”, it often takes implementing a protocol to work out kinks in the design. At the same time, we cannot risk the security of users. In this section, we describe our approach to experimenting with aspects of the Merkle Tree Certificates design without changing any trust relationships.

Let’s start with what we hope to learn. We have lots of questions whose answers can help to either validate the approach, or uncover pitfalls that require reshaping the protocol — in fact, an implementation of an early MTC draft by Maximilian Pohl and Mia Celeste did exactly this. We’d like to know:

What breaks? Protocol ossification (the tendency of implementation bugs to make it harder to change a protocol) is an ever-present issue with deploying protocol changes. For TLS in particular, despite having built-in flexibility, time after time we’ve found that if that flexibility is not regularly used, there will be buggy implementations and middleboxes that break when they see things they don’t recognize. TLS 1.3 deployment took years longer than we hoped for this very reason. And more recently, the rollout of PQ key exchange in TLS caused the Client Hello to be split over multiple TCP packets, something that many middleboxes weren’t ready for.

What is the performance impact? In fact, we expect MTCs to reduce the size of the handshake, even compared to today’s non-PQ certificates. They will also reduce CPU cost: ML-DSA signature verification is about as fast as ECDSA, and there will be far fewer signatures to verify. We therefore expect to see a reduction in latency. We would like to see if there is a measurable performance improvement.

What fraction of clients will stay up to date? Getting the performance benefit of MTCs requires the clients and servers to be roughly in sync with one another. We expect MTCs to have fairly short lifetimes, a week or so. This means that if the client’s latest landmark is older than a week, the server would have to fallback to a larger certificate. Knowing how often this fallback happens will help us tune the parameters of the protocol to make fallbacks less likely.

In order to answer these questions, we are implementing MTC support in our TLS stack and in our certificate issuance infrastructure. For their part, Chrome is implementing MTC support in their own TLS stack and will stand up infrastructure to disseminate landmarks to their users.

As we’ve done in past experiments, we plan to enable MTCs for a subset of our free customers with enough traffic that we will be able to get useful measurements. Chrome will control the experimental rollout: they can ramp up slowly, measuring as they go and rolling back if and when bugs are found.

Which leaves us with one last question: who will run the Merkle Tree CA?

Bootstrapping trust from the existing WebPKI

Standing up a proper CA is no small task: it takes years to be trusted by major browsers. That’s why Cloudflare isn’t going to become a “real” CA for this experiment, and Chrome isn’t going to trust us directly.

Instead, to make progress on a reasonable timeframe, without sacrificing due diligence, we plan to “mock” the role of the MTCA. We will run an MTCA (on Workers based on our StaticCT logs), but for each MTC we issue, we also publish an existing certificate from a trusted CA that agrees with it. We call this the bootstrap certificate. When Chrome’s infrastructure pulls updates from our MTCA log, they will also pull these bootstrap certificates, and check whether they agree. Only if they do, they’ll proceed to push the corresponding landmarks to Chrome clients. In other words, Cloudflare is effectively just “re-encoding” an existing certificate (with domain validation performed by a trusted CA) as an MTC, and Chrome is using certificate transparency to keep us honest.

Conclusion

With almost 50% of our traffic already protected by post-quantum encryption, we’re halfway to a fully post-quantum secure Internet. The second part of our journey, post-quantum certificates, is the hardest yet though. A simple drop-in upgrade has a noticeable performance impact and no security benefit before Q-day. This means it’s a hard sell to enable today by default. But here we are playing with fire: migrations always take longer than expected. If we want to keep an ubiquitously private and secure Internet, we need a post-quantum solution that’s performant enough to be enabled by default today.

Merkle Tree Certificates (MTCs) solves this problem by reducing the number of signatures and public keys to the bare minimum while maintaining the WebPKI’s essential properties. We plan to roll out MTCs to a fraction of free accounts by early next year. This does not affect any visitors that are not part of the Chrome experiment. For those that are, thanks to the bootstrap certificates, there is no impact on security.

We’re excited to keep the Internet fast and secure, and will report back soon on the results of this experiment: watch this space! MTC is evolving as we speak, if you want to get involved, please join the IETF PLANTS mailing list.

Announcing Workers automatic tracing, now in open beta

Post Syndicated from Nevi Shah original https://blog.cloudflare.com/workers-tracing-now-in-open-beta/

When your Worker slows down or starts throwing errors, finding the root cause shouldn’t require hours of log analysis and trial-and-error debugging. You should have clear visibility into what’s happening at every step of your application’s request flow. This is feedback we’ve heard loud and clear from developers using Workers, and today we’re excited to announce an Open Beta for tracing on Cloudflare Workers! You can now:  

  • Get automatic instrumentation for applications on the Workers platform: No manual setup, complex instrumentation, or code changes. It works out of the box. 

  • Explore and investigate traces in the Cloudflare dashboard: Your traces are processed and available in the Workers Observability dashboard alongside your existing logs.

  • Export logs and traces to OpenTelemetry-compatible providers: Send OpenTelemetry traces (and correlated logs) to your observability provider of choice. 

In 2024, we set out to build the best first-party observability of any cloud platform. We launched a new metrics dashboard to give better insights into how your Worker is performing, Workers Logs to automatically ingest and store logs for your Workers, a query builder to explore your data across any dimension and real-time logs to stream your logs in real time with advanced filtering capabilities. Starting today, you can get an even deeper understanding of your Workers applications by enabling automatic tracing!

What is Workers Tracing? 

Workers traces capture and emit OpenTelemetry-compliant spans to show you detailed metadata and timing information on every operation your Worker performs. It helps you identify performance bottlenecks, resolve errors, and understand how your Worker interacts with other services on the Workers platform. You can now answer questions like:

  • Which calls are slowing down my application?

  • Which queries to my database take the longest? 

  • What happened within a request that resulted in an error?

Tracing provides a visualization of each invocation’s journey through various operations. Each operation is captured as a span, a timed segment that shows what happened and how long it took. Child spans nest within parent spans to show sub-operations and dependencies, creating a hierarchical view of your invocation’s execution flow. Each span can include contextual metadata or attributes that provide details for debugging and filtering events.


Full automatic instrumentation, no code changes 

Previously, instrumenting your application typically required an understanding of the OpenTelemetry spec, multiple OTel libraries, and how they related to each other. Implementation was tedious and bloated your codebase with instrumentation code that obfuscated your application logic.

Setting up tracing typically meant spending hours integrating third-party SDKs, wrapping every database call and API request with instrumentation code, and debugging complex config files before you saw a single trace. This implementation overhead often makes observability an afterthought, leaving you without full visibility in production when issues arise.

What makes Workers Tracing truly magical is it’s completely automatic – no set up, no code changes, no wasted time. We took the approach of automatically instrumenting every I/O operation in your Workers, through a deep integration in workerd, our runtime, enabling us to capture the full extent of data flows through every invocation of your Workers.

You focus on your application logic. We take care of the instrumentation.

What you can trace today

The operations covered today are: 

  • Binding calls: Interactions with various Worker bindings. KV reads and writes, R2 object storage operations, Durable Object invocations, and many more binding calls are automatically traced. This gives you complete visibility into how your Worker uses other services.

  • Fetch calls: All outbound HTTP requests are automatically instrumented, capturing timing, status codes, and request metadata. This enables you to quickly identify which external dependencies are affecting your application’s performance.

  • Handler calls: Methods on a Worker that can receive and process external inputs, such as fetch handlers, scheduled handlers, and queue handlers. This gives you visibility into performance of how your Worker is being invoked.

Automatic attributes on every span 

Our automated instrumentation captures each operation as a span. For example, a span generated by an R2 binding call (like a get or put operation) will automatically contain any available attributes, such as the operation type, the error if applicable, the object key, and duration. These detailed attributes provide the context you need to answer precise questions about your application without needing to manually log every detail.


We will continue to add more detailed attributes to spans and add the ability to trace an invocation across multiple Workers or external services. Our documentation contains a complete list of all instrumented spans and their attributes.

Investigate traces in the Workers dashboard

You can easily view traces directly within a specific Worker application in the Cloudflare dashboard, giving you immediate visibility into your application’s performance. You’ll find a list of all trace events within your desired time frame and a trace visualization of each invocation including duration of each call and any available attributes. You can also query across all Workers on your account, letting you pinpoint issues occurring on multiple applications.


To get started viewing traces on your Workers application, you can set: 


Or if you already have observability.enabled=true configured, traces will be automatically emitted alongside your logs.


Export traces to OpenTelemetry compatible providers 

However, we realize that some development teams need Workers data to live alongside other telemetry data in the tools they are already using. That’s why we’re also adding tracing exports, letting your team send, visualize and query data with your existing observability stack! Starting today, you can export traces directly to providers like Honeycomb, Grafana or any other OpenTelemetry Protocol (OTLP) provider with an available endpoint.


Correlated logs and traces 

We also support exporting OTLP-formatted logs that share the same trace ID, enabling third-party platforms to automatically correlate log entries with their corresponding traces. This lets you easily jump between spans and related log messages.

Set up your destination, enable exports, and go! 

To start sending events to your destination of choice, first, configure your OTLP endpoint destination in the Cloudflare dashboard. For every destination you can specify a custom name and set custom headers to include API keys or app configuration. 

Once you have your destination set up (e.g. honeycomb-tracing), set the following in your wrangler.jsonc and deploy: 


Coming up for Workers observability

This is just the beginning of Workers providing the workflows and tools to get you the telemetry data you want, where you want it. We’re improving our support both for native tracing in the dashboard and for exporting other types of telemetry to 3rd parties. In the upcoming months we’ll be launching: 

  • Support for more spans and attributes: We are adding more automatic traces for every part of the Workers platform. While our first goal is to give you visibility into the duration of every operation within your request, we also want to add detailed attributes. Your feedback on what’s missing will be extremely valuable here. 

  • Trace context propagation: When building distributed applications, ensuring your traces connect across all of your services (even those outside of Cloudflare), automatically linking spans together to create complete, end-to-end visibility is critical. For example, a trace from Workers could be nested from a parent service or vice versa. When fully implemented, our automatic trace context propagation will follow W3C standards to ensure compatibility across your existing tools and services. 

  • Support for custom spans and attributes: While automatic instrumentation gives you visibility into what’s happening within the Workers platform, we know you need visibility into your own application logic too. So, we’ll give you the ability to manually add your own spans as well.

  • Ability to export metrics: Today, metrics, logs and traces are available for you to monitor and view within the Workers dashboard. But the final missing piece is giving you the ability to export both infrastructure metrics (like request volume, error rates, and execution duration) and custom application metrics to your preferred observability provider.

What you can expect from tracing pricing  

Today, at the start of beta, viewing traces in the Cloudflare dashboard and exporting traces to a 3rd party provider are both free. On January 15, 2026, tracing and log events will be charged the following pricing:

Viewing Workers traces in the Cloudflare dashboard

To view traces in the Cloudflare dashboard, you can do so on a Workers Free and Paid plan at the pricing shown below:

Workers Free Workers Paid
Included Volume 200K events per day 20M events per month
Additional Events N/A $0.60 per million logs
Retention 3 days 7 days

Exporting traces and logs 

To export traces to a 3rd-party OTLP-compatible destination, you will need a Workers Paid subscription. Pricing is based on total span or log events with the following inclusions:

Workers Free Workers Paid
Events Not available 10 million events per month
Additional events $0.05 per million batched events

Enable tracing today

Ready to get started with tracing on your Workers application? 

  • Check out our documentation: Learn how to get set up, read about current limitations and discover more about what’s coming up. 

  • Join the chatter in our GitHub discussion: Your feedback will be extremely valuable in our beta period on our automatic instrumentation, tracing dashboard, and OpenTelemetry export flow. Head to our GitHub discussion to raise issues, put in feature requests and get in touch with us!

Unpacking Cloudflare Workers CPU Performance Benchmarks

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/unpacking-cloudflare-workers-cpu-performance-benchmarks/

On October 4, independent developer Theo Browne published a series of benchmarks designed to compare server-side JavaScript execution speed between Cloudflare Workers and Vercel, a competing compute platform built on AWS Lambda. The initial results showed Cloudflare Workers performing worse than Node.js on Vercel at a variety of CPU-intensive tasks, by a factor of as much as 3.5x.

We were surprised by the results. The benchmarks were designed to compare JavaScript execution speed in a CPU-intensive workload that never waits on external services. But, Cloudflare Workers and Node.js both use the same underlying JavaScript engine: V8, the open source engine from Google Chrome. Hence, one would expect the benchmarks to be executing essentially identical code in each environment. Physical CPUs can vary in performance, but modern server CPUs do not vary by anywhere near 3.5x.

On investigation, we discovered a wide range of small problems that contributed to the disparity, ranging from some bad tuning in our infrastructure, to differences between the JavaScript libraries used on each platform, to some issues with the test itself. We spent the week working on many of these problems, which means over the past week Workers got better and faster for all of our customers. We even fixed some problems that affect other compute providers but not us, such as an issue that made trigonometry functions much slower on Vercel. This post will dig into all the gory details.

It’s important to note that the original benchmark was not representative of billable CPU usage on Cloudflare, nor did the issues involved impact most typical workloads. Most of the disparity was an artifact of the specific benchmark methodology. Read on to understand why.

With our fixes, the results now look much more like we’d expect:


There is still work to do, but we’re happy to say that after these changes, Cloudflare now performs on par with Vercel in every benchmark case except the one based on Next.js. On that benchmark, the gap has closed considerably, and we expect to be able to eliminate it with further improvements detailed later in this post.

We are grateful to Theo for highlighting areas where we could make improvements, which will now benefit all our customers, and even many who aren’t our customers.

Our benchmark methodology

We wanted to run Theo’s test with no major design changes, in order to keep numbers comparable. Benchmark cases are nearly identical to Theo’s original test but we made a couple changes in how we ran the test, in the hopes of making the results more accurate:

  • Theo ran the test client on a laptop connected by a Webpass internet connection in San Francisco, against Vercel instances running in its sfo1 region. In order to make our results easier to reproduce, we chose instead to run our test client directly in AWS’s us-east-1 datacenter, invoking Vercel instances running in its iad1 region (which we understand to be in the same building). We felt this would minimize any impact from network latency. Because of this, Vercel’s numbers are slightly better in our results than they were in Theo’s.

  • We chose to use Vercel instances with 1 vCPU instead of 2. All of the benchmarks are single-threaded workloads, meaning they cannot take advantage of a second CPU anyway. Vercel’s CTO, Malte Ubl, had stated publicly on X that using single-CPU instances would make no difference in this test, and indeed, we found this to be correct. Using 1 vCPU makes it easier to reason about pricing, since both Vercel and Cloudflare charge for CPU time ($0.128/hr for Vercel in iad1, and $0.072/hr for Cloudflare globally).

  • We made some changes to fix bugs in the test, for which we submitted a pull request. More on this below.

Cloudflare platform improvements

Theo’s benchmarks covered a variety of frameworks, making it clear that no single JavaScript library could be at fault for the general problem. Clearly, we needed to look first at the Workers Runtime itself. And so we did, and we found two problems – not bugs, but tuning and heuristic choices which interacted poorly with the benchmarks as written.

Sharding and warm isolate routing: A problem of scheduling, not CPU speed

Over the last year we shipped smarter routing that sends traffic to warm isolates more often. That cuts cold starts for large apps, which matters for frameworks with heavy initialization requirements like Next.js. The original policy optimized for latency and throughput across billions of requests, but was less optimal for heavily CPU-bound workloads for the same reason that such workloads cause performance issues in other platforms like Node.js: When the CPU is busy computing an expensive operation for one request, other requests sent to the same isolate must wait for it to finish before they can proceed.

The system uses heuristics to detect when requests are getting blocked behind each other, and automatically spin up more isolates to compensate. However, these heuristics are not precise, and the particular workload generated by Theo’s tests – in which a burst of expensive traffic would come from a single client – played poorly with our existing algorithm. As a result, the benchmarks showed much higher latency (and variability in latency) than would normally be expected.

It’s important to understand that, as a result of this problem, the benchmark was not really measuring CPU time. Pricing on the Workers platform is based on CPU time – that is, time spent actually executing JavaScript code, as opposed to time waiting for things. Time spent waiting for the isolate to become available makes the request take longer, but is not billed as CPU time against the waiting request. So, this problem would not have affected your bill.

After analyzing the benchmarks, we updated the algorithm to detect sustained CPU-heavy work earlier, then bias traffic so that new isolates spin up faster. The result is that Workers can more effectively and efficiently autoscale when different workloads are applied. I/O-bound workloads coalesce into individual already warm isolates while CPU-bound are directed so that they do not block each other. This change has already been rolled out globally and is enabled automatically for everyone. It should be pretty clear from the graph when the change was rolled out:


V8 garbage collector tuning

While this scheduling issue accounted for the majority of the disparity in the benchmark, we did find a minor issue affecting code execution performance during our testing.

The range of issues that we uncovered in the framework code in these benchmarks repeatedly pointed at garbage collection and memory management issues as being key contributors to the results. But, we would expect these to be an issue with the same frameworks running in Node.js as well. To see exactly what was going on differently with Workers and why it was causing such a significant degradation in performance, we had to look inwards at our own memory management configuration.

The V8 garbage collector has a huge number of knobs that can be tuned that directly impact performance. One of these is the size of the “young generation”. This is where newly created objects go initially. It’s a memory area that’s less compact, but optimized for short-lived objects. When objects have bounced around the “young space” for a few generations they get moved to the old space, which is more compact, but requires more CPU to reclaim.

V8 allows the embedding runtime to tune the size of the young generation. And it turns out, we had done so. Way back in June of 2017, just two months after the Workers project kicked off, we – or specifically, I, Kenton, as I was the only engineer on the project at the time – had configured this value according to V8’s recommendations at the time for environments with 512MB of memory or less. Since Workers defaults to a limit of 128MB per isolate, this seemed appropriate.

V8’s entire garbage collector has changed dramatically since 2017. When analyzing the benchmarks, it became apparent that the setting which made sense in 2017 no longer made sense in 2025, and we were now limiting V8’s young space too rigidly. Our configuration was causing V8’s garbage collection to work harder and more frequently than it otherwise needed to. As a result, we have backed off on the manual tuning and now allow V8 to pick its young space size more freely, based on its internal heuristics. This is already live on Cloudflare Workers, and it has given an approximately 25% boost to the benchmarks with only a small increase in memory usage. Of course, the benchmarks are not the only Workers that benefit: all Workers should now be faster. That said, for most Workers the difference has been much smaller.

Tuning OpenNext for performance

The platform changes solved most of the problem. Following the changes, our testing showed we were now even on all of the benchmarks save one: Next.js.

Next.js is a popular web application framework which, historically, has not had built-in support for hosting on a wide range of platforms. Recently, a project called OpenNext has arisen to fill the gap, making Next.js work well on many platforms, including Cloudflare. On investigation, we found several missing optimizations and other opportunities to improve performance, explaining much of why the benchmark performed poorly on Workers.

Unnecessary allocations and copies

When profiling the benchmark code, we noticed that garbage collection was dominating the timeline. From 10-25% of the request processing time was being spent reclaiming memory.



So we dug in and discovered that OpenNext, and in some cases Next.js and React itself, will often create unnecessary copies of internal data buffers at some of the worst times during the handling of the process. For instance, there’s one pipeThrough() operation in the rendering pipeline that we saw creating no less than 50 2048-byte Buffer instances, whether they are actually used or not.

We further discovered that on every request, the Cloudflare OpenNext adapter has been needlessly copying every chunk of streamed output data as it’s passed out of the renderer and into the Workers runtime to return to users. Given this benchmark returns a 5 MB result on every request, that’s a lot of data being copied!

In other places, we found that arrays of internal Buffer instances were being copied and concatenated using Buffer.concat for no other reason than to get the total number of bytes in the collection. That is, we spotted code of the form getBody().length. The function getBody() would concatenate a large number of buffers into a single buffer and return it, without storing the buffer anywhere. So, all that work was being done just to read the overall length. Obviously this was not intended, and fixing it was an easy win.

We’ve started opening a series of pull requests in OpenNext to fix these issues, and others in hot paths, removing some unnecessary allocations and copies:

We’re not done. We intend to keep iterating through OpenNext code, making improvements wherever they’re needed – not only in the parts that run on Workers. Many of these improvements apply to other OpenNext platforms. The shared goal of OpenNext is to make NextJS as fast as possible regardless of where you choose to run your code.

Inefficient Streams Adapters

Much of the Next.js code was written to use Node.js’s APIs for byte streams. Workers, however, prefers the web-standard Streams API, and uses it to represent HTTP request and response bodies. This necessitates using adapters to convert between the two APIs. When investigating the performance bottlenecks, we found a number of examples where inefficient streams adapters are being needlessly applied. For example:

const stream = Readable.toWeb(Readable.from(res.getBody()))

res.getBody() was performing a Buffer.concat(chunks) to copy accumulated chunks of data into a new Buffer, which was then passed as an iterable into a Node.js stream.Readable that was then wrapped by an adapter that returns a ReadableStream. While these utilities do serve a useful purpose, this becomes a data buffering nightmare since both Node.js streams and Web streams each apply their own internal buffers! Instead we can simply do:

const stream = ReadableStream.from(chunks);

This returns a ReadableStream directly from the accumulated chunks without additional copies, extraneous buffering, or passing everything through inefficient adaptation layers.

In other places we see that Next.js and React make extensive use of ReadableStream to pass bytes through, but the streams being created are value-oriented rather than byte-oriented! For example,

const readable = new ReadableStream({
  pull(controller) {
    controller.enqueue(chunks.shift());
    if (chunks.length === 0) {
      controller.close();
    }
});  // Default highWaterMark is 1!

Seems perfectly reasonable. However, there’s an issue here. If the chunks are Buffer or Uint8Array instances, every instance ends up being a separate read by default. So if the chunk is only a single byte, or 1000 bytes, that’s still always two reads. By converting this to a byte stream with a reasonable high water mark, we can make it possible to read this stream much more efficiently:

const readable = new ReadableStream({
  type: 'bytes',
  pull(controller) {
    controller.enqueue(chunks.shift());
    if (chunks.length === 0) {
      controller.close();
    }
}, { highWaterMark: 4096 });

Now, the stream can be read as a stream of bytes rather than a stream of distinct JavaScript values, and the individual chunks can be coalesced internally into 4096 byte chunks, making it possible to optimize the reads much more efficiently. Rather than reading each individual enqueued chunk one at a time, the ReadableStream will proactively call pull() repeatedly until the highWaterMark is reached. Reads then do not have to ask the stream for one chunk of data at a time.

While it would be best for the rendering pipeline to be using byte streams and paying attention to back pressure signals more, our implementation can still be tuned to better handle cases like this.

The bottom line? We’ve got some work to do! There are a number of improvements to make in the implementation of OpenNext and the adapters that allow it to work on Cloudflare that we will continue to investigate and iterate on. We’ve made a handful of these fixes already and we’re already seeing improvements. Soon we also plan to start submitting patches to Next.js and React to make further improvements upstream that will ideally benefit the entire ecosystem.

JSON parsing

Aside from buffer allocations and streams, one additional item stood out like a sore thumb in the profiles: JSON.parse() with a reviver function. This is used in both React and Next.js and in our profiling this was significantly slower than it should be. We built a microbenchmark and found that JSON.parse with a reviver argument recently got even slower when the standard added a third argument to the reviver callback to provide access to the JSON source context.

For those unfamiliar with the reviver function, it allows an application to effectively customize how JSON is parsed. But it has drawbacks. The function gets called on every key-value pair included in the JSON structure, including every individual element of an Array that gets serialized. In Theo’s NextJS benchmark, in any single request, it ends up being called well over 100,000 times!

Even though this problem affects all platforms, not just ours, we decided that we weren’t just going to accept it. After all, we have contributors to V8 on the Workers runtime team! We’ve upstreamed a V8 patch that can speed up JSON.parse() with revivers by roughly 33 percent. That should be in V8 starting with version 14.3 (Chrome 143) and can help everyone using V8, not just Cloudflare: Node.js, Chrome, Deno, the entire ecosystem.  If you are not using Cloudflare Workers or didn’t change the syntax of your reviver you are currently suffering under the red performance bar.

We will continue to work with framework authors to reduce overhead in hot paths. Some changes belong in the frameworks, some belong in the engine, some in our platform.

Node.js’s trigonometry problem

We are engineers, and we like to solve engineering problems — whether our own, or for the broader community.

Theo’s benchmarks were actually posted in response to a different benchmark by another author which compared Cloudflare Workers against Vercel. The original benchmark focused on calling trigonometry functions (e.g. sine and cosine) in a tight loop. In this benchmark, Cloudflare Workers performed 3x faster than Node.js running on Vercel.

The author of the original benchmark offered this as evidence that Cloudflare Workers are just faster. Theo disagreed, and so did we. We expect to be faster, but not by 3x! We don’t implement math functions ourselves; these come with V8. We weren’t happy to just accept the win, so we dug in.

It turns out that Node.js is not using the latest, fastest path for these functions. Node.js can be built with either the clang or gcc compilers, and is written to support a broader range of operating systems and architectures than Workers. This means that Node.js’ compilation often ends up using a lowest-common denominator for some things in order to provide support for the broadest range of platforms. V8 includes a compile-time flag that, in some configurations, allows it to use a faster implementation of the trig functions. In Workers, mostly by coincidence, that flag is enabled by default. In Node.js, it is not. We’ve opened a pull request to enable the flag in Node.js so that everyone benefits, at least on platforms where it can be supported.

Assuming that lands, and once AWS Lambda and Vercel are able to pick it up, we expect this specific gap to go away, making these operations faster for everyone. This change won’t benefit our customers, since Cloudflare Workers already uses the faster trig functions, but a bug is a bug and we like making everything faster.

Benchmarks are hard

Even the best benchmarks have bias and tradeoffs. It’s difficult to create a benchmark that is truly representative of real-world performance, and all too easy to misinterpret the results of benchmarks that are not. We particularly liked Planetscale’s take on this subject.

These specific CPU-bound tests are not an ideal choice to represent web applications. Theo even notes this in his video. Most real-world applications on Workers and Vercel are bound by databases, downstream services, network, and page size. End user experience is what matters. CPU is one piece of that picture. That said, if a benchmark shows us slower, we take it seriously.

While the benchmarks helped us find and fix many real problems, we also found a few problems with the benchmarks themselves, which contributed to the apparent disparity in speed:

Running locally

The benchmark is designed to be run on your laptop, from which it hits Cloudflare’s and Vercel’s servers over the Internet. It makes the assumption that latency observed from the client is a close enough approximation of server-side CPU time. The reasons are fair: As Theo notes, Cloudflare does not permit an application to measure its own CPU time, in order to prevent timing side channel attacks. Actual CPU time can be seen in logs after the fact, but gathering those may be a lot of work. It’s just easier to measure time from the client.

However, as Cloudflare and Vercel are hosted from different data centers, the network latency to each can be a factor in the benchmark, and this can skew the results. Typically, this effect will favor Cloudflare, because Cloudflare can run your Worker in locations spread across 330+ cities worldwide, and will tend to choose the closest one to you. Vercel, on the other hand, usually places compute in a central location, so latency will vary depending on your distance from that location.

For our own testing, to minimize this effect, we ran the benchmark client from a VM on AWS located in the same data center as our Vercel instances. Since Cloudflare is well-connected to every AWS location, we think this should have eliminated network latency from the picture. We chose AWS’s us-east-1 / Vercel’s iad1 for our test as it is widely seen as the default choice; any other choice could draw questions about cherry-picking.

Not all CPUs are equal

Cloudflare’s servers aren’t all identical. Although we refresh them aggressively, there will always be multiple generations of hardware in production at any particular time. Currently, this includes generations 10, 11, and 12 of our server hardware.

Other cloud providers are no different. No cloud provider simply throws away all their old servers every time a new version becomes available.

Of course, newer CPUs run faster, even for single-threaded workloads. The differences are not as large as they used to be 20-30 years ago, but they are not nothing. As such, an application may get (a little bit) lucky or unlucky depending on what machine it is assigned to.

In cloud environments, even identical CPUs can yield different performance depending on circumstances, due to multitenancy. The server your application is assigned to is running many others as well. In AWS Lambda, a server may be running hundreds of applications; in Cloudflare, with our ultra-efficient runtime, a server may be running thousands. These “noisy neighbors” won’t share the same CPU core as your app, but they may share other resources, such as memory bandwidth. As a result, performance can vary.

It’s important to note that these problems create correlated noise. That is, if you run the test again, the application is likely to remain assigned to the same machines as before – this is true of both Cloudflare and Vercel. So, this noise cannot be corrected by simply running more iterations. To correct for this type of noise on Cloudflare, one would need to initiate requests from a variety of geographic locations, in order to hit different Cloudflare data centers and therefore different machines. But, that is admittedly a lot of work. (We are not familiar with how best to get an application to switch machines on Vercel.)

A Next.js config bug

The Cloudflare version of the NextJS benchmark was not configured to use force-dynamic while the Vercel version was. This triggered curious behavior. Our understanding is that pages which are not “dynamic” should normally be rendered statically at build time. With OpenNext, however, it appears the pages are still rendered dynamically, but if multiple requests for the same page are received at the same time, OpenNext will only invoke the rendering once. Before we made the changes to fix our scheduling algorithm to avoid sending too many requests to the same isolate, this behavior may have somewhat counteracted that problem. Theo reports that he had disabled force-dynamic in the Cloudflare version specifically for this reason: with it on, our results were so bad as to appear outright broken, so he intentionally turned it off.

Ironically, though, once we fixed the scheduling issue, using “static” rendering (i.e. not enabling force-dynamic) hurt Cloudflare’s performance for other reasons. It seems that when OpenNext renders a “cacheable” page, streaming of the response body is inhibited. This interacted poorly with a property of the benchmark client: it measured time-to-first-byte (TTFB), rather than total request/response time. When running in dynamic mode – as the test did on Vercel – the first byte would be returned to the client before the full page had been rendered. The rest of the rendering would happen as bytes streamed out. But with OpenNext in non-dynamic mode, the entire payload was rendered into a giant buffer upfront, before any bytes were returned to the client.

Due to the TTFB behavior of the benchmark client, in dynamic mode, the benchmark actually does not measure the time needed to fully render the page. We became suspicious when we noticed that Vercel’s observability tools indicated more CPU time had been spent than the benchmark itself had reported.

One option would have been to change the benchmarks to use TTLB instead – that is, wait until the last byte is received before stopping the timer. However, this would make the benchmark even more affected by network differences: The responses are quite large, ranging from 2MB to 15MB, and so the results could vary depending on the bandwidth to the provider. Indeed, this would tend to favor Cloudflare, but as the point of the test is to measure CPU speed, not bandwidth, it would be an unfair advantage.

Once we changed the Cloudflare version of the test to use force-dynamic as well, matching the Vercel version, the streaming behavior then matched, making the request fair. This means that neither version is actually measuring the cost of rendering the full page to HTML, but at least they are now measuring the same thing.

As a side note, the original behavior allowed us to spot that OpenNext has a couple of performance bottlenecks in its implementation of the composable cache it uses to deduplicate rendering requests. While fixes to these aren’t going to impact the numbers for this particular set of benchmarks, we’re working on improving those pieces also.

A React SSR config bug

The React SSR benchmark contained a more basic configuration error. React inspects the environment variable NODE_ENV to decide whether the environment is “production” or a development environment. Many Node.js-based environments, including Vercel, set this variable automatically in production. Many frameworks, such as OpenNext, automatically set this variable for Workers in production as well. However, the React SSR benchmark was written against lower-level React APIs, not using any framework. In this case, the NODE_ENV variable wasn’t being set at all.

And, unfortunately, when NODE_ENV is not set, React defaults to “dev mode”, a mode that contains extra debugging checks and is therefore much slower than production mode. As a result, the numbers for Workers were much worse than they should have been.

Arguably, it may make sense for Workers to set this variable automatically for all deployed workers, particularly when Node.js compatibility is enabled. We are looking into doing this in the future, but for now we’ve updated the test to set it directly.

What we’re going to do next

Our improvements to the Workers Runtime are already live for all workers, so you do not need to change anything. Many apps will already see faster, steadier tail latency on compute heavy routes with less jitter during bursts. In places where garbage collection improved, some workloads will also use fewer billed CPU seconds.

We also sent Theo a pull request to update OpenNext with our improvements there, and with other test fixes.

But we’re far from done. We still have work to do to close the gap between OpenNext and Next.js on Vercel – but given the other benchmark results, it’s clear we can get there. We also have plans for further improvements to our scheduling algorithm, so that requests almost never block each other. We will continue to improve V8, and even Node.js – the Workers team employs multiple core contributors to each project. Our approach is simple: improve open source infrastructure so that everyone gets faster, then make sure our platform makes the most of those improvements.

And, obviously, we’ll be writing more benchmarks, to make sure we’re catching these kinds of issues ourselves in the future. If you have a benchmark that shows Workers being slower, send it to us with a repro. We will profile it, fix what we can upstream, and share back what we learn!

Payload on Workers: a full-fledged CMS, running entirely on Cloudflare’s stack

Post Syndicated from Jason Kincaid original https://blog.cloudflare.com/payload-cms-workers/

Tucked behind the administrator login screen of countless websites is one of the Internet’s unsung heroes: the Content Management System (CMS). This seemingly basic piece of software is used to draft and publish blog posts, organize media assets, manage user profiles, and perform countless other tasks across a dizzying array of use cases. One standout in this category is a vibrant open-source project called Payload, which has over 35,000 stars on GitHub and has generated so much community excitement that it was recently acquired by Figma.

Today we’re excited to showcase a new template from the Payload team, which makes it possible to deploy a full-fledged CMS to Cloudflare’s platform in a single click: just click the Deploy to Cloudflare button to generate a fully-configured Payload instance, complete with bindings to Cloudflare D1 and R2. Below we’ll dig into the technical work that enables this, some of the opportunities it unlocks, and how we’re using Payload to help power Cloudflare TV. But first, a look at why hosting a CMS on Workers is such a game changer.


Behind the scenes: Cloudflare TV’s Payload instance

Serverless by design

Most CMSs are designed to be hosted on a conventional server that runs 24/7. That means you need to provision hardware or virtual machines, install the CMS software and dependencies, manage ports and firewalls, and navigate ongoing maintenance and scaling hurdles.

This presents significant operational overhead, and can be costly if your server needs to handle high volumes (or spiky peaks) of traffic. What’s worse, you’re paying for that server whether you have any active users or not. One of the superpowers of Cloudflare Workers is that your application and data are accessible 24/7, without needing a server running all the time. When people use your application, it spins up at the closest Cloudflare server, ready to go. When your users are asleep, the Worker spins down, and you don’t pay for compute you aren’t using. 

With Payload running on Workers, you get the best of conventional CMSs — fully configurable asset management, custom webhooks, a library of community plugins, version history — all in a serverless form factor. We’ve been piloting the Payload-on-Workers template with an instance of our 24/7 video platform Cloudflare TV, which we use as a test bed for new technologies. Migrating from a conventional CMS was painless, thanks to its support for common features like conditional logic and an extensive set of components for building out our admin dashboard. Our content library has over 2,000 episodes and 70,000 assets, and Payload’s filtering and search features help us navigate them with ease.

It is worth reiterating just how many use cases CMSs can fulfill, from publishing to ecommerce to bespoke application dashboards whipped up by Claude Code or Codex. CMSs provide the sort of interface that less-technical users can pick up intuitively, and can be molded into whatever shape best fits the project. We’re excited to see what people get to building.

OpenNext opens doors

Payload first launched in 2022 as a Node/Express.js application and quickly began building steam. In 2024, it introduced native support for the popular Next.js framework, which helped pave the way for today’s announcement: this year, Cloudflare became the best place to host your applications built on Next.js, with the GA release of our OpenNext adapter.

Thanks to this adapter, porting Payload to OpenNext was relatively straightforward using the official OpenNext Get Started guide. Because we wanted the application to run seamlessly on Workers, with all the benefits of Workers Bindings, we set out to ensure support for Cloudflare’s database and storage products.

Database

For our initial approach, we began by connecting Payload to an external Postgres database, using the official @payloadcms/db-postgres adapter. Thanks to Workers support for the node-postgres package, everything worked pretty much straight away. As connections cannot be shared across requests, we just had to disable connection pooling:

import { buildConfig } from 'payload'
import { postgresAdapter } from '@payloadcms/db-postgres'

export default buildConfig({
  …
  db: postgresAdapter({
    pool: {
      connectionString: process.env.DATABASE_URI,
      max: 1,
      min: 0,
      idleTimeoutMillis: 1,
    },
  }),
  …
});

Of course, disabling connection pooling increases the overall latency, as each request needs to first establish a new connection with the database. To address this, we put Hyperdrive in front of it, which not only maintains a pool of connections across the Cloudflare network, by setting up a tunnel to the database server, but also adds a query cache, significantly improving the performance.

import { buildConfig } from 'payload'
import { postgresAdapter } from '@payloadcms/db-postgres'
import { getCloudflareContext } from '@opennextjs/cloudflare';

const cloudflare = await getCloudflareContext({ async: true });

export default buildConfig({
  …
  db: postgresAdapter({
    pool: {
      connectionString: cloudflare.env.HYPERDRIVE.connectionString,
      max: 1,
      min: 0,
      idleTimeoutMillis: 1,
    },
  }),
  …
});

Database with D1

With Postgres working, we next sought to add support for D1, Cloudflare’s managed serverless database, built on top of SQLite.

Payload doesn’t support D1 out of the box, but has support for SQLite via the @payloadcms/db-sqlite adapter, which uses Drizzle ORM alongside libSQL. Thankfully, Drizzle also has support for D1, so we decided to build a custom adapter for D1, using the SQLite one as a base.

The main difference between D1 and libSQL is on the result object, so we built a small method to map the result from D1 into the format expected by libSQL:

export const execute: Execute<any> = function execute({ db, drizzle, raw, sql: statement }) {
  const executeFrom = (db ?? drizzle)!
  const mapToLibSql = (query: SQLiteRaw<D1Result<unknown>>) => {
    const execute = query.execute
    query.execute = async () => {
      const result: D1Result = await execute()
      const resultLibSQL: Omit<ResultSet, 'toJSON'> = {
        columns: undefined,
        columnTypes: undefined,
        lastInsertRowid: BigInt(result.meta.last_row_id),
        rows: result.results as any[],
        rowsAffected: result.meta.rows_written,
      }

      return Object.assign(result, resultLibSQL)
    }

    return query
  }

  if (raw) {
    const result = mapToLibSql(executeFrom.run(sql.raw(raw)))
    return result
  } else {
    const result = mapToLibSql(executeFrom.run(statement!))
    return result
  }
}

Other than that, it was just a matter of passing the D1 binding directly into Drizzle’s constructor in order to get it working.

For applying database migrations during deployment, we used the newly released remote bindings feature of Wrangler to connect to the remote database, using the same binding. This way we didn’t need to configure any API tokens to be able to interact with the database.

Media storage with R2

Payload provides an official S3 storage adapter, via the @payloadcms/storage-s3 package. R2 is S3-compatible, which means we could have used the official adapter, but similar to the database, we wanted to use the R2 binding instead of having to create API tokens.

Therefore, we decided to also build a custom storage adapter for R2. This one was pretty straightforward, as the binding already handles most of the work:

import type { Adapter } from '@payloadcms/plugin-cloud-storage/types'
import path from 'path'

const isMiniflare = process.env.NODE_ENV === 'development';

export const r2Storage: (bucket: R2Bucket) => Adapter = (bucket) => ({ prefix = '' }) => {
  const key = (filename: string) => path.posix.join(prefix, filename)
  return {
    name: 'r2',
    handleDelete: ({ filename }) => bucket.delete(key(filename)),
    handleUpload: async ({ file }) => {
      // Read more: https://github.com/cloudflare/workers-sdk/issues/6047#issuecomment-2691217843
      const buffer = isMiniflare ? new Blob([file.buffer]) : file.buffer
      await bucket.put(key(file.filename), buffer)
    },
    staticHandler: async (req, { params }) => {
      // Due to https://github.com/cloudflare/workers-sdk/issues/6047
      // We cannot send a Headers instance to Miniflare
      const obj = await bucket?.get(key(params.filename), { range: isMiniflare ? undefined : req.headers })
      if (obj?.body == undefined) return new Response(null, { status: 404 })

      const headers = new Headers()
      if (!isMiniflare) obj.writeHttpMetadata(headers)

      return obj.etag === (req.headers.get('etag') || req.headers.get('if-none-match'))
        ? new Response(null, { headers, status: 304 })
        : new Response(obj.body, { headers, status: 200 })
    },
  }
}

Deployment

With the database and storage adapters in place, we were able to successfully launch an instance of Payload, running completely on Cloudflare’s Developer Platform.

The blank template consists of a simple database with just two tables, one for media and another for the users. In this template it’s possible to sign up, create new users and upload media files. Then, it’s quite easy to expand with additional collections, relationships and custom fields, by modifying Payload’s configuration.

Performance optimization with Read Replicas

By default, D1 is placed in a single location, customizable via a location hint. As Payload is deployed as a Worker, requests may be coming from any part of the world and so latency will be all over the place when connecting to the database.

To solve this, we can make use of D1’s global read replication, which deploys multiple read-only replicas across the globe. To select the correct replica and ensure sequential consistency, D1 uses sessions, with a bookmark that needs to be passed around.

Drizzle doesn’t support D1 sessions yet, but we can still use the “first-primary” type of session, in which the first query will always hit the primary instance and subsequent queries may hit one of the replicas. Updating the adapter to use replicas is just a matter of updating the Drizzle initialization to pass the D1 session directly:

this.drizzle = drizzle(this.binding.withSession("first-primary"), 
{ logger, schema: this.schema });

After this simple change, we saw immediate latency improvements, with the P50 wall-time for requests from across the globe reduced by 60% when connecting to a database located in Eastern North America. Read replicas, as the name implies, only affect read-only queries, so any write operations will always be forwarded to the primary instance, but for our use case, reads are most of the traffic.

No read replicas

Read replicas enabled

Improvement

P50

300ms

120ms

-60%

P90

480ms

250ms

-48%

P99

760ms

550ms

-28%

Wall time for requests to the Payload worker, each involving two database calls, as reported by Cloudflare Dash. Load was generated via 4 globally distributed uptime checks making a request every 60s to 4 distinct URLs.

Because we’ll be relying on Payload for managing Cloudflare TV’s enormous content library, we’re well positioned to test it at scale, and will continue to submit PRs with optimizations and improvements as they arise.

The right tool for the job

The potential use cases for CMSs are limitless, which is all the more reason it’s a good thing to have choices. We opted for Payload because of its extensive library of components, mature feature set, and large community — but it’s not the only Workers-compatible CMS in town.

Another exciting project is SonicJs (Docs), which is built from the ground up on Workers, D1, and Astro, promising blazing speeds and a malleable foundation. SonicJs is working on a version that’s well suited for collaborating with agentic AI assistants like Claude and Codex, and we’re excited to see how that develops. For lightweight use cases, microfeed is a self-hosted CMS on Cloudflare designed for managing podcasts, blogs, photos, and more.

These are each headless CMSs, which means you choose the frontend for your application. Don’t miss our recent announcement around sponsoring the powerful frameworks Astro and Tanstack, and find our complete guides to using these frameworks and others, including React + Vite, in the Workers Docs.

To get started using Payload right now, click the Deploy to Cloudflare button below, which will generate a fully functional Payload instance, including a D1 database and R2 bucket automatically bound to your worker. Find the README and more details in Payload’s template repository.

15 years of helping build a better Internet: a look back at Birthday Week 2025

Post Syndicated from Nikita Cano original https://blog.cloudflare.com/birthday-week-2025-wrap-up/

Cloudflare launched fifteen years ago with a mission to help build a better Internet. Over that time the Internet has changed and so has what it needs from teams like ours.  In this year’s Founder’s Letter, Matthew and Michelle discussed the role we have played in the evolution of the Internet, from helping encryption grow from 10% to 95% of Internet traffic to more recent challenges like how people consume content. 

We spend Birthday Week every year releasing the products and capabilities we believe the Internet needs at this moment and around the corner. Previous Birthday Weeks saw the launch of IPv6 gateway in 2011,  Universal SSL in 2014, Cloudflare Workers and unmetered DDoS protection in 2017, Cloudflare Radar in 2020, R2 Object Storage with zero egress fees in 2021,  post-quantum upgrades for Cloudflare Tunnel in 2022, Workers AI and Encrypted Client Hello in 2023. And those are just a sample of the launches.

This year’s themes focused on helping prepare the Internet for a new model of monetization that encourages great content to be published, fostering more opportunities to build community both inside and outside of Cloudflare, and evergreen missions like making more features available to everyone and constantly improving the speed and security of what we offer.

We shipped a lot of new things this year. In case you missed the dozens of blog posts, here is a breakdown of everything we announced during Birthday Week 2025. 

Monday, September 22

What

In a sentence …

Help build the future: announcing Cloudflare’s goal to hire 1,111 interns in 2026

To invest in the next generation of builders, we announced our most ambitious intern program yet with a goal to hire 1,111 interns in 2026.

Supporting the future of the open web: Cloudflare is sponsoring Ladybird and Omarchy

To support a diverse and open Internet, we are now sponsoring Ladybird (an independent browser) and Omarchy (an open-source Linux distribution and developer environment).

Come build with us: Cloudflare’s new hubs for startups

We are opening our office doors in four major cities (San Francisco, Austin, London, and Lisbon) as free hubs for startups to collaborate and connect with the builder community.

Free access to Cloudflare developer services for non-profit and civil society organizations

We extended our Cloudflare for Startups program to non-profits and public-interest organizations, offering free credits for our developer tools.

Introducing free access to Cloudflare developer features for students

We are removing cost as a barrier for the next generation by giving students with .edu emails 12 months of free access to our paid developer platform features.

Cap’n Web: a new RPC system for browsers and web servers

We open-sourced Cap’n Web, a new JavaScript-native RPC protocol that simplifies powerful, schema-free communication for web applications.

A lookback at Workers Launchpad and a warm welcome to Cohort #6

We announced Cohort #6 of the Workers Launchpad, our accelerator program for startups building on Cloudflare.

Tuesday, September 23

What

In a sentence …

Building unique, per-customer defenses against advanced bot threats in the AI era

New anomaly detection system that uses machine learning trained on each zone to build defenses against AI-driven bot attacks. 

Why Cloudflare, Netlify, and Webflow are collaborating to support Open Source tools

To support the open web, we joined forces with Webflow to sponsor Astro, and with Netlify to sponsor TanStack.

Launching the x402 Foundation with Coinbase, and support for x402 transactions

We are partnering with Coinbase to create the x402 Foundation, encouraging the adoption of the x402 protocol to allow clients and services to exchange value on the web using a common language

Helping protect journalists and local news from AI crawlers with Project Galileo

We are extending our free Bot Management and AI Crawl Control services to journalists and news organizations through Project Galileo.

Cloudflare Confidence Scorecards – making AI safer for the Internet

Automated evaluation of AI and SaaS tools, helping organizations to embrace AI without compromising security.

Wednesday, September 24

What

In a sentence …

Automatically Secure: how we upgraded 6,000,000 domains by default

Our Automatic SSL/TLS system has upgraded over 6 million domains to more secure encryption modes by default and will soon automatically enable post-quantum connections.

Giving users choice with Cloudflare’s new Content Signals Policy

The Content Signals Policy is a new standard for robots.txt that lets creators express clear preferences for how AI can use their content.

To build a better Internet in the age of AI, we need responsible AI bot principles

A proposed set of responsible AI bot principles to start a conversation around transparency and respect for content creators’ preferences.

Securing data in SaaS to SaaS applications

New security tools to give companies visibility and control over data flowing between SaaS applications.

Securing today for the quantum future: WARP client now supports post-quantum cryptography (PQC)

Cloudflare’s WARP client now supports post-quantum cryptography, providing quantum-resistant encryption for traffic. 

A simpler path to a safer Internet: an update to our CSAM scanning tool

We made our CSAM Scanning Tool easier to adopt by removing the need to create and provide unique credentials, helping more site owners protect their platforms.

Thursday, September 25

What

In a sentence …

Every Cloudflare feature, available to everyone

We are making every Cloudflare feature, starting with Single Sign On (SSO), available for anyone to purchase on any plan. 

Cloudflare’s developer platform keeps getting better, faster, and more powerful

Updates across Workers and beyond for a more powerful developer platform – such as support for larger and more concurrent Container images, support for external models from OpenAI and Anthropic in AI Search (previously AutoRAG), and more. 

Partnering to make full-stack fast: deploy PlanetScale databases directly from Workers

You can now connect Cloudflare Workers to PlanetScale databases directly, with connections automatically optimized by Hyperdrive.

Announcing the Cloudflare Data Platform

A complete solution for ingesting, storing, and querying analytical data tables using open standards like Apache Iceberg. 

R2 SQL: a deep dive into our new distributed query engine

A technical deep dive on R2 SQL, a serverless query engine for petabyte-scale datasets in R2.

Safe in the sandbox: security hardening for Cloudflare Workers

A deep-dive into how we’ve hardened the Workers runtime with new defense-in-depth security measures, including V8 sandboxes and hardware-assisted memory protection keys.

Choice: the path to AI sovereignty

To champion AI sovereignty, we’ve added locally-developed open-source models from India, Japan, and Southeast Asia to our Workers AI platform.

Announcing Cloudflare Email Service’s private beta

We announced the Cloudflare Email Service private beta, allowing developers to reliably send and receive transactional emails directly from Cloudflare Workers.

A year of improving Node.js compatibility in Cloudflare Workers

There are hundreds of new Node.js APIs now available that make it easier to run existing Node.js code on our platform. 

Friday, September 26

What

In a sentence …

Cloudflare just got faster and more secure, powered by Rust

We have re-engineered our core proxy with a new modular, Rust-based architecture, cutting median response time by 10ms for millions. 

Introducing Observatory and Smart Shield

New monitoring tools in the Cloudflare dashboard that provide actionable recommendations and one-click fixes for performance issues.

Monitoring AS-SETs and why they matter

Cloudflare Radar now includes Internet Routing Registry (IRR) data, allowing network operators to monitor AS-SETs to help prevent route leaks.

An AI Index for all our customers

We announced the private beta of AI Index, a new service that creates an AI-optimized search index for your domain that you control and can monetize.

Introducing new regional Internet traffic and Certificate Transparency insights on Cloudflare Radar

Sub-national traffic insights and Certificate Transparency dashboards for TLS monitoring.

Eliminating Cold Starts 2: shard and conquer

We have reduced Workers cold starts by 10x by implementing a new “worker sharding” system that routes requests to already-loaded Workers.

Network performance update: Birthday Week 2025

The TCP Connection Time (Trimean) graph shows that we are the fastest TCP connection time in 40% of measured ISPs – and the fastest across the top networks.

How Cloudflare uses performance data to make the world’s fastest global network even faster

We are using our network’s vast performance data to tune congestion control algorithms, improving speeds by an average of 10% for QUIC traffic.

Come build with us!

Helping build a better Internet has always been about more than just technology. Like the announcements about interns or working together in our offices, the community of people behind helping build a better Internet matters to its future. This week, we rolled out our most ambitious set of initiatives ever to support the builders, founders, and students who are creating the future.

For founders and startups, we are thrilled to welcome Cohort #6 to the Workers Launchpad, our accelerator program that gives early-stage companies the resources they need to scale. But we’re not stopping there. We’re opening our doors, literally, by launching new physical hubs for startups in our San Francisco, Austin, London, and Lisbon offices. These spaces will provide access to mentorship, resources, and a community of fellow builders.

We’re also investing in the next generation of talent. We announced free access to the Cloudflare developer platform for all students, giving them the tools to learn and experiment without limits. To provide a path from the classroom to the industry, we also announced our goal to hire 1,111 interns in 2026 — our biggest commitment yet to fostering future tech leaders.

And because a better Internet is for everyone, we’re extending our support to non-profits and public-interest organizations, offering them free access to our production-grade developer tools, so they can focus on their missions.

Whether you’re a founder with a big idea, a student just getting started, or a team working for a cause you believe in, we want to help you succeed.

Until next year

Thank you to our customers, our community, and the millions of developers who trust us to help them build, secure, and accelerate the Internet. Your curiosity and feedback drive our innovation.

It’s been an incredible 15 years. And as always, we’re just getting started!

Eliminating Cold Starts 2: shard and conquer

Post Syndicated from Harris Hancock original https://blog.cloudflare.com/eliminating-cold-starts-2-shard-and-conquer/

Five years ago, we announced that we were Eliminating Cold Starts with Cloudflare Workers. In that episode, we introduced a technique to pre-warm Workers during the TLS handshake of their first request. That technique takes advantage of the fact that the TLS Server Name Indication (SNI) is sent in the very first message of the TLS handshake. Armed with that SNI, we often have enough information to pre-warm the request’s target Worker.

Eliminating cold starts by pre-warming Workers during TLS handshakes was a huge step forward for us, but “eliminate” is a strong word. Back then, Workers were still relatively small, and had cold starts constrained by limits explained later in this post. We’ve relaxed those limits, and users routinely deploy complex applications on Workers, often replacing origin servers. Simultaneously, TLS handshakes haven’t gotten any slower. In fact, TLS 1.3 only requires a single round trip for a handshake – compared to three round trips for TLS 1.2 – and is more widely used than it was in 2021.

Earlier this month, we finished deploying a new technique intended to keep pushing the boundary on cold start reduction. The new technique (or old, depending on your perspective) uses a consistent hash ring to take advantage of our global network. We call this mechanism “Worker sharding”.


What’s in a cold start?

A Worker is the basic unit of compute in our serverless computing platform. It has a simple lifecycle. We instantiate it from source code (typically JavaScript), make it serve a bunch of requests (often HTTP, but not always), and eventually shut it down some time after it stops receiving traffic, to re-use its resources for other Workers. We call that shutdown process “eviction”.

The most expensive part of the Worker’s lifecycle is the initial instantiation and first request invocation. We call this part a “cold start”. Cold starts have several phases: fetching the script source code, compiling the source code, performing a top-level execution of the resulting JavaScript module, and finally, performing the initial invocation to serve the incoming HTTP request that triggered the whole sequence of events in the first place.


Cold starts have become longer than TLS handshakes

Fundamentally, our TLS handshake technique depends on the handshake lasting longer than the cold start. This is because the duration of the TLS handshake is time that the visitor must spend waiting, regardless, so it’s beneficial to everyone if we do as much work during that time as possible. If we can run the Worker’s cold start in the background while the handshake is still taking place, and if that cold start finishes before the handshake, then the request will ultimately see zero cold start delay. If, on the other hand, the cold start takes longer than the TLS handshake, then the request will see some part of the cold start delay – though the technique still helps reduce that visible delay.


In the early days, TLS handshakes lasting longer than Worker cold starts was a safe bet, and cold starts typically won the race. One of our early blog posts explaining how our platform works mentions 5 millisecond cold start times – and that was correct, at the time!

For every limit we have, our users have challenged us to relax them. Cold start times are no different. 

There are two crucial limits which affect cold start time: Worker script size and the startup CPU time limit. While we didn’t make big announcements at the time, we have quietly raised both of those limits since our last Eliminating Cold Starts blog post:

We relaxed these limits because our users wanted to deploy increasingly complex applications to our platform. And deploy they did! But the increases have a cost:

  • Increasing script size increases the amount of data we must transfer from script storage to the Workers runtime.

  • Increasing script size also increases the time complexity of the script compilation phase.

  • Increasing the startup CPU time limit increases the maximum top-level execution time.

Taken together, cold starts for complex applications began to lose the TLS handshake race.

Routing requests to an existing Worker

With relaxed script size and startup time limits, optimizing cold start time directly was a losing battle. Instead, we needed to figure out how to reduce the absolute number of cold starts, so that requests are simply less likely to incur one.

One option is to route requests to existing Worker instances, where before we might have chosen to start a new instance.

Previously, we weren’t particularly good at routing requests to existing Worker instances. We could trivially coalesce requests to a single Worker instance if they happened to land on a machine which already hosted a Worker, because in that case it’s not a distributed systems problem. But what if a Worker already existed in our data center on a different server, and some other server received a request for the Worker? We would always choose to cold start a new Worker on the machine which received the request, rather than forward the request to the machine with the already-existing Worker, even though forwarding the request would avoid the cold start.


To drive the point home: Imagine a visitor sends one request per minute to a data center with 300 servers, and that the traffic is load balanced evenly across all servers. On average, each server will receive one request every five hours. In particularly busy data centers, this span of time could be long enough that we need to evict the Worker to re-use its resources, resulting in a 100% cold start rate. That’s a terrible experience for the visitor.

Consequently, we found ourselves explaining to users, who saw high latency while prototyping their applications, that their latency would counterintuitively decrease once they put sufficient traffic on our network. This highlighted the inefficiency in our original, simple design.

If, instead, those requests were all coalesced onto one single server, we would notice multiple benefits. The Worker would receive one request per minute, which is short enough to virtually guarantee that it won’t be evicted. This would mean the visitor may experience a single cold start, and then have a 100% “warm request rate.” We would also use 99.7% (299 / 300) less memory serving this traffic. This makes room for other Workers, decreasing their eviction rate, and increasing their warm request rates, too – a virtuous cycle!

There’s a cost to coalescing requests to a single instance, though, right? After all, we’re adding latency to requests if we have to proxy them around the data center to a different server.

In practice, the added time-to-first-byte is less than one millisecond, and is the subject of continual optimization by our IPC and performance teams. One millisecond is far less than a typical cold start, meaning it’s always better, in every measurable way, to proxy a request to a warm Worker than it is to cold start a new one.

The consistent hash ring

A solution to this very problem lies at the heart of many of our products, including one of our oldest: the HTTP cache in our Content Delivery Network.

When a visitor requests a cacheable web asset through Cloudflare, the request gets routed through a pipeline of proxies. One of those proxies is a caching proxy, which stores the asset for later, so we can serve it to future requests without having to request it from the origin again.

A Worker cold start is analogous to an HTTP cache miss, in that a request to a warm Worker is like an HTTP cache hit.

When our standard HTTP proxy pipeline routes requests to the caching layer, it chooses a cache server based on the request’s cache key to optimize the HTTP cache hit rate. The cache key is the request’s URL, plus some other details. This technique is often called “sharding”. The servers are considered to be individual shards of a larger, logical system – in this case a data center’s HTTP cache. So, we can say things like, “Each data center contains one logical HTTP cache, and that cache is sharded across every server in the data center.”

Until recently, we could not make the same claim about the set of Workers in a data center. Instead, each server contained its own standalone set of Workers, and they could easily duplicate effort.

We borrow the cache’s trick to solve that. In fact, we even use the same type of data structure used by our HTTP cache to choose servers: a consistent hash ring. A naive sharding implementation might use a classic hash table mapping Worker script IDs to server addresses. That would work fine for a set of servers which never changes. But servers are actually ephemeral and have their own lifecycle. They can crash, get rebooted, taken out for maintenance, or decommissioned. New ones can come online. When these events occur, the size of the hash table would change, necessitating a re-hashing of the whole table. Every Worker’s home server would change, and all sharded Workers would be cold started again!

A consistent hash ring improves this scenario significantly. Instead of establishing a direct correspondence between script IDs and server addresses, we map them both to a number line whose end wraps around to its beginning, also known as a ring. To look up the home server of a Worker, first we hash its script, and then we find where it lies on the ring. Next, we take the server address which comes directly on or after that position on the ring, and consider that the Worker’s home.


If a new server appears for some reason, all the Workers that lie before it on the ring get re-homed, but none of the other Workers are disturbed. Similarly, if a server disappears, all the Workers which lay before it on the ring get re-homed.


We refer to the Worker’s home server as the “shard server”. In request flows involving sharding, there is also a “shard client”. It’s also a server! The shard client initially receives a request, and, using its consistent hash ring, looks up which shard server it should send the request to. I’ll be using these two terms – shard client and shard server – in the rest of this post.

Handling overload

The nature of HTTP assets lend themselves well to sharding. If they are cacheable, they are static, at least for their cache Time to Live (TTL) duration. So, serving them requires time and space complexity which scales linearly with their size.

But Workers aren’t JPEGs. They are live units of compute which can use up to five minutes of CPU time per request. Their time and space complexity do not necessarily scale with their input size, and can vastly outstrip the amount of computing power we must dedicate to serving even a huge file from cache.

This means that individual Workers can easily get overloaded when given sufficient traffic. So, no matter what we do, we need to keep in mind that we must be able to scale back up to infinity. We will never be able to guarantee that a data center has only one instance of a Worker, and we must always be able to horizontally scale at the drop of a hat to support burst traffic. Ideally this is all done without producing any errors.

This means that a shard server must have the ability to refuse requests to invoke Workers on it, and shard clients must always gracefully handle this scenario.

Two load shedding options

I am aware of two general solutions to shedding load gracefully, without serving errors.

In the first solution, the client asks politely if it may issue the request. It then sends the request if it  receives a positive response. If it instead receives a “go away” response, it handles the request differently, like serving it locally. In HTTP, this pattern can be found in Expect: 100-continue semantics. The main downside is that this introduces one round-trip of latency to set the expectation of success before the request can be sent. (Note that a common naive solution is to just retry requests. This works for some kinds of requests, but is not a general solution, as requests may carry arbitrarily large bodies.)


The second general solution is to send the request without confirming that it can be handled by the server, then count on the server to forward the request elsewhere if it needs to. This could even be back to the client. This avoids the round-trip of latency that the first solution incurs, but there is a tradeoff: It puts the shard server in the request path, pumping bytes back to the client. Fortunately, we have a trick to minimize the amount of bytes we actually have to send back in this fashion, which I’ll describe in the next section.


Optimistically sending sharded requests

There are a couple of reasons why we chose to optimistically send sharded requests without waiting for permission.

The first reason of note is that we expect to see very few of these refused requests in practice. The reason is simple: If a shard client receives a refusal for a Worker, then it must cold start the Worker locally. As a consequence, it can serve all future requests locally without incurring another cold start. So, after a single refusal, the shard client won’t shard that Worker any more (until traffic for the Worker tapers off enough for an eviction, at least).

Generally, this means we expect that if a request gets sharded to a different server, the shard server will most likely accept the request for invocation. Since we expect success, it makes a lot more sense to optimistically send the entire request to the shard server than it does to incur a round-trip penalty to establish permission first.

The second reason is that we have a trick to avoid paying too high a cost for proxying the request back to the client, as I mentioned above.

We implement our cross-instance communication in the Workers runtime using Cap’n Proto RPC, whose distributed object model enables some incredible features, like JavaScript-native RPC. It is also the elder, spiritual sibling to the just-released Cap’n Web.

In the case of sharding, Cap’n Proto makes it very easy to implement an optimal request refusal mechanism. When the shard client assembles the sharded request, it includes a handle (called a capability in Cap’n Proto) to a lazily-loaded local instance of the Worker. This lazily-loaded instance has the same exact interface as any other Worker exposed over RPC. The difference is just that it’s lazy – it doesn’t get cold started until invoked. In the event the shard server decides it must refuse the request, it does not return a “go away” response, but instead returns the shard client’s own lazy capability!

The shard client’s application code only sees that it received a capability from the shard server. It doesn’t know where that capability is actually implemented. But the shard client’s RPC system does know where the capability lives! Specifically, it recognizes that the returned capability is actually a local capability – the same one that it passed to the shard server. Once it realizes this, it also realizes that any request bytes it continues to send to the shard server will just come looping back. So, it stops sending more request bytes, waits to receive back from the shard server all the bytes it already sent, and shortens the request path as soon as possible. This takes the shard server entirely out of the loop, preventing a “trombone effect.”

Workers invoking Workers

With load shedding behavior figured out, we thought the hard part was over.

But, of course, Workers may invoke other Workers. There are many ways this could occur, most obviously via Service Bindings. Less obviously, many of our favorite features, such as Workers KV, are actually cross-Worker invocations. But there is one product, in particular, that stands out for its powerful ability to invoke other Workers: Workers for Platforms.

Workers for Platforms allows you to run your own functions-as-a-service on Cloudflare infrastructure. To use the product, you deploy three special types of Workers:

  • a dynamic dispatch Worker

  • any number of user Workers

  • an optional, parameterized outbound Worker

A typical request flow for Workers for Platforms goes like so: First, we invoke the dynamic dispatch Worker. The dynamic dispatch Worker chooses and invokes a user Worker. Then, the user Worker invokes the outbound Worker to intercept its subrequests. The dynamic dispatch Worker chose the outbound Worker’s arguments prior to invoking the user Worker.

To really amp up the fun, the dynamic dispatch Worker could have a tail Worker attached to it. This tail Worker would need to be invoked with traces related to all the preceding invocations. Importantly, it should be invoked one single time with all events related to the request flow, not invoked multiple times for different fragments of the request flow.

You might further ask, can you nest Workers for Platforms? I don’t know the official answer, but I can tell you that the code paths do exist, and they do get exercised.

To support this nesting doll of Workers, we keep a context stack during invocations. This context includes things like ownership overrides, resource limit overrides, trust levels, tail Worker configurations, outbound Worker configurations, feature flags, and so on. This context stack was manageable-ish when everything was executed on a single thread. For sharding to be truly useful, though, we needed to be able to move this context stack around to other machines.

Our choice of Cap’n Proto RPC as our primary communications medium helped us make sense of it all. To shard Workers deep within a stack of invocations, we serialize the context stack into a Cap’n Proto data structure and send it to the shard server. The shard server deserializes it into native objects, and continues the execution where things left off.

As with load shedding, Cap’n Proto’s distributed object model provides us simple answers to otherwise difficult questions. Take the tail Worker question – how do we coalesce tracing data from invocations which got fanned out across any number of other servers back to one single place? Easy: create a capability (a live Cap’n Proto object) for a reportTraces() callback on the dynamic dispatch Worker’s home server, and put that in the serialized context stack. Now, that context stack can be passed around at will. That context stack will end up in multiple places: At a minimum, it will end up on the user Worker’s shard server and the outbound Worker’s shard server. It may also find its way to other shard servers if any of those Workers invoked service bindings! Each of those shard servers can call the reportTraces() callback, and be confident that the data will make its way back to the right place: the dynamic dispatch Worker’s home server. None of those shard servers need to actually know where that home server is. Phew!

Eviction rates down, warm request rates up

Features like this are always satisfying to roll out, because they produce graphs showing huge efficiency gains.

Once fully rolled out, only about 4% of total requests from enterprise traffic ended up being sharded. To put that another way, 96% of all enterprise requests are to Workers which are sufficiently loaded that we must run multiple instances of them in a data center.


Despite that low total rate of sharding, we reduced our global Worker eviction rate by 10x. 


Our eviction rate is a measure of memory pressure within our system. You can think of it like garbage collection at a macro level, and it has the same implications. Fewer evictions means our system uses memory more efficiently. This has the happy consequence of using less CPU to clean up our memory. More relevant to Workers users, the increased efficiency means we can keep Workers in memory for an order of magnitude longer, improving their warm request rate and reducing their latency.

The high leverage shown – sharding just 4% of our traffic to improve memory efficiency by 10x – is a consequence of the power-law distribution of Internet traffic.

A power law distribution is a phenomenon which occurs across many fields of science, including linguistics, sociology, physics, and, of course, computer science. Events which follow power law distributions typically see a huge amount clustered in some small number of “buckets”, and the rest spread out across a large number of those “buckets”. Word frequency is a classic example: A small handful of words like “the”, “and”, and “it” occur in texts with extremely high frequency, while other words like “eviction” or “trombone” might occur only once or twice in a text.

In our case, the majority of Workers requests goes to a small handful of high-traffic Workers, while a very long tail goes to a huge number of low-traffic Workers. The 4% of requests which were sharded are all to low-traffic Workers, which are the ones that benefit the most from sharding.

So did we eliminate cold starts? Or will there be an Eliminating Cold Starts 3 in our future?


For enterprise traffic, our warm request rate increased from 99.9% to 99.99% – that’s three 9’s to four 9’s. Conversely, this means that the cold start rate went from 0.1% to 0.01% of requests, a 10x decrease. A moment’s thought, and you’ll realize that this is coherent with the eviction rate graph I shared above: A 10x decrease in the number of Workers we destroy over time must imply we’re creating 10x fewer to begin with.

Simultaneously, our warm request rate became less volatile throughout the course of the day.

Hmm.

I hate to admit this to you, but I still notice a little bit of space at the top of the graph. 😟

Can you help us get to five 9’s?

Code Mode: the better way to use MCP

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/code-mode/

It turns out we’ve all been using MCP wrong.

Most agents today use MCP by directly exposing the “tools” to the LLM.

We tried something different: Convert the MCP tools into a TypeScript API, and then ask an LLM to write code that calls that API.

The results are striking:

  1. We found agents are able to handle many more tools, and more complex tools, when those tools are presented as a TypeScript API rather than directly. Perhaps this is because LLMs have an enormous amount of real-world TypeScript in their training set, but only a small set of contrived examples of tool calls.

  2. The approach really shines when an agent needs to string together multiple calls. With the traditional approach, the output of each tool call must feed into the LLM’s neural network, just to be copied over to the inputs of the next call, wasting time, energy, and tokens. When the LLM can write code, it can skip all that, and only read back the final results it needs.

In short, LLMs are better at writing code to call MCP, than at calling MCP directly.

What’s MCP?

For those that aren’t familiar: Model Context Protocol is a standard protocol for giving AI agents access to external tools, so that they can directly perform work, rather than just chat with you.

Seen another way, MCP is a uniform way to:

  • expose an API for doing something,

  • along with documentation needed for an LLM to understand it,

  • with authorization handled out-of-band.

MCP has been making waves throughout 2025 as it has suddenly greatly expanded the capabilities of AI agents.

The “API” exposed by an MCP server is expressed as a set of “tools”. Each tool is essentially a remote procedure call (RPC) function – it is called with some parameters and returns a response. Most modern LLMs have the capability to use “tools” (sometimes called “function calling”), meaning they are trained to output text in a certain format when they want to invoke a tool. The program invoking the LLM sees this format and invokes the tool as specified, then feeds the results back into the LLM as input.

Anatomy of a tool call

Under the hood, an LLM generates a stream of “tokens” representing its output. A token might represent a word, a syllable, some sort of punctuation, or some other component of text.

A tool call, though, involves a token that does not have any textual equivalent. The LLM is trained (or, more often, fine-tuned) to understand a special token that it can output that means “the following should be interpreted as a tool call,” and another special token that means “this is the end of the tool call.” Between these two tokens, the LLM will typically write tokens corresponding to some sort of JSON message that describes the call.

For instance, imagine you have connected an agent to an MCP server that provides weather info, and you then ask the agent what the weather is like in Austin, TX. Under the hood, the LLM might generate output like the following. Note that here we’ve used words in <| and |> to represent our special tokens, but in fact, these tokens do not represent text at all; this is just for illustration.

I will use the Weather MCP server to find out the weather in Austin, TX.

I will use the Weather MCP server to find out the weather in Austin, TX.

<|tool_call|>
{
  "name": "get_current_weather",
  "arguments": {
    "location": "Austin, TX, USA"
  }
}
<|end_tool_call|>

Upon seeing these special tokens in the output, the LLM’s harness will interpret the sequence as a tool call. After seeing the end token, the harness pauses execution of the LLM. It parses the JSON message and returns it as a separate component of the structured API result. The agent calling the LLM API sees the tool call, invokes the relevant MCP server, and then sends the results back to the LLM API. The LLM’s harness will then use another set of special tokens to feed the result back into the LLM:

<|tool_result|>
{
  "location": "Austin, TX, USA",
  "temperature": 93,
  "unit": "fahrenheit",
  "conditions": "sunny"
}
<|end_tool_result|>

The LLM reads these tokens in exactly the same way it would read input from the user – except that the user cannot produce these special tokens, so the LLM knows it is the result of the tool call. The LLM then continues generating output like normal.

Different LLMs may use different formats for tool calling, but this is the basic idea.

What’s wrong with this?

The special tokens used in tool calls are things LLMs have never seen in the wild. They must be specially trained to use tools, based on synthetic training data. They aren’t always that good at it. If you present an LLM with too many tools, or overly complex tools, it may struggle to choose the right one or to use it correctly. As a result, MCP server designers are encouraged to present greatly simplified APIs as compared to the more traditional API they might expose to developers.

Meanwhile, LLMs are getting really good at writing code. In fact, LLMs asked to write code against the full, complex APIs normally exposed to developers don’t seem to have too much trouble with it. Why, then, do MCP interfaces have to “dumb it down”? Writing code and calling tools are almost the same thing, but it seems like LLMs can do one much better than the other?

The answer is simple: LLMs have seen a lot of code. They have not seen a lot of “tool calls”. In fact, the tool calls they have seen are probably limited to a contrived training set constructed by the LLM’s own developers, in order to try to train it. Whereas they have seen real-world code from millions of open source projects.

Making an LLM perform tasks with tool calling is like putting Shakespeare through a month-long class in Mandarin and then asking him to write a play in it. It’s just not going to be his best work.

But MCP is still useful, because it is uniform

MCP is designed for tool-calling, but it doesn’t actually have to be used that way.

The “tools” that an MCP server exposes are really just an RPC interface with attached documentation. We don’t really have to present them as tools. We can take the tools, and turn them into a programming language API instead.

But why would we do that, when the programming language APIs already exist independently? Almost every MCP server is just a wrapper around an existing traditional API – why not expose those APIs?

Well, it turns out MCP does something else that’s really useful: It provides a uniform way to connect to and learn about an API.

An AI agent can use an MCP server even if the agent’s developers never heard of the particular MCP server, and the MCP server’s developers never heard of the particular agent. This has rarely been true of traditional APIs in the past. Usually, the client developer always knows exactly what API they are coding for. As a result, every API is able to do things like basic connectivity, authorization, and documentation a little bit differently.

This uniformity is useful even when the AI agent is writing code. We’d like the AI agent to run in a sandbox such that it can only access the tools we give it. MCP makes it possible for the agentic framework to implement this, by handling connectivity and authorization in a standard way, independent of the AI code. We also don’t want the AI to have to search the Internet for documentation; MCP provides it directly in the protocol.

OK, how does it work?

We have already extended the Cloudflare Agents SDK to support this new model!

For example, say you have an app built with ai-sdk that looks like this:

const stream = streamText({
  model: openai("gpt-5"),
  system: "You are a helpful assistant",
  messages: [
    { role: "user", content: "Write a function that adds two numbers" }
  ],
  tools: {
    // tool definitions 
  }
})

You can wrap the tools and prompt with the codemode helper, and use them in your app: 

import { codemode } from "agents/codemode/ai";

const {system, tools} = codemode({
  system: "You are a helpful assistant",
  tools: {
    // tool definitions 
  },
  // ...config
})

const stream = streamText({
  model: openai("gpt-5"),
  system,
  tools,
  messages: [
    { role: "user", content: "Write a function that adds two numbers" }
  ]
})

With this change, your app will now start generating and running code that itself will make calls to the tools you defined, MCP servers included. We will introduce variants for other libraries in the very near future. Read the docs for more details and examples. 

Converting MCP to TypeScript

When you connect to an MCP server in “code mode”, the Agents SDK will fetch the MCP server’s schema, and then convert it into a TypeScript API, complete with doc comments based on the schema.

For example, connecting to the MCP server at https://gitmcp.io/cloudflare/agents, will generate a TypeScript definition like this:

interface FetchAgentsDocumentationInput {
  [k: string]: unknown;
}
interface FetchAgentsDocumentationOutput {
  [key: string]: any;
}

interface SearchAgentsDocumentationInput {
  /**
   * The search query to find relevant documentation
   */
  query: string;
}
interface SearchAgentsDocumentationOutput {
  [key: string]: any;
}

interface SearchAgentsCodeInput {
  /**
   * The search query to find relevant code files
   */
  query: string;
  /**
   * Page number to retrieve (starting from 1). Each page contains 30
   * results.
   */
  page?: number;
}
interface SearchAgentsCodeOutput {
  [key: string]: any;
}

interface FetchGenericUrlContentInput {
  /**
   * The URL of the document or page to fetch
   */
  url: string;
}
interface FetchGenericUrlContentOutput {
  [key: string]: any;
}

declare const codemode: {
  /**
   * Fetch entire documentation file from GitHub repository:
   * cloudflare/agents. Useful for general questions. Always call
   * this tool first if asked about cloudflare/agents.
   */
  fetch_agents_documentation: (
    input: FetchAgentsDocumentationInput
  ) => Promise<FetchAgentsDocumentationOutput>;

  /**
   * Semantically search within the fetched documentation from
   * GitHub repository: cloudflare/agents. Useful for specific queries.
   */
  search_agents_documentation: (
    input: SearchAgentsDocumentationInput
  ) => Promise<SearchAgentsDocumentationOutput>;

  /**
   * Search for code within the GitHub repository: "cloudflare/agents"
   * using the GitHub Search API (exact match). Returns matching files
   * for you to query further if relevant.
   */
  search_agents_code: (
    input: SearchAgentsCodeInput
  ) => Promise<SearchAgentsCodeOutput>;

  /**
   * Generic tool to fetch content from any absolute URL, respecting
   * robots.txt rules. Use this to retrieve referenced urls (absolute
   * urls) that were mentioned in previously fetched documentation.
   */
  fetch_generic_url_content: (
    input: FetchGenericUrlContentInput
  ) => Promise<FetchGenericUrlContentOutput>;
};

This TypeScript is then loaded into the agent’s context. Currently, the entire API is loaded, but future improvements could allow an agent to search and browse the API more dynamically – much like an agentic coding assistant would.

Running code in a sandbox

Instead of being presented with all the tools of all the connected MCP servers, our agent is presented with just one tool, which simply executes some TypeScript code.

The code is then executed in a secure sandbox. The sandbox is totally isolated from the Internet. Its only access to the outside world is through the TypeScript APIs representing its connected MCP servers.

These APIs are backed by RPC invocation which calls back to the agent loop. There, the Agents SDK dispatches the call to the appropriate MCP server.

The sandboxed code returns results to the agent in the obvious way: by invoking console.log(). When the script finishes, all the output logs are passed back to the agent.


Dynamic Worker loading: no containers here

This new approach requires access to a secure sandbox where arbitrary code can run. So where do we find one? Do we have to run containers? Is that expensive?

No. There are no containers. We have something much better: isolates.

The Cloudflare Workers platform has always been based on V8 isolates, that is, isolated JavaScript runtimes powered by the V8 JavaScript engine.

Isolates are far more lightweight than containers. An isolate can start in a handful of milliseconds using only a few megabytes of memory.

Isolates are so fast that we can just create a new one for every piece of code the agent runs. There’s no need to reuse them. There’s no need to prewarm them. Just create it, on demand, run the code, and throw it away. It all happens so fast that the overhead is negligible; it’s almost as if you were just eval()ing the code directly. But with security.

The Worker Loader API

Until now, though, there was no way for a Worker to directly load an isolate containing arbitrary code. All Worker code instead had to be uploaded via the Cloudflare API, which would then deploy it globally, so that it could run anywhere. That’s not what we want for Agents! We want the code to just run right where the agent is.

To that end, we’ve added a new API to the Workers platform: the Worker Loader API. With it, you can load Worker code on-demand. Here’s what it looks like:

// Gets the Worker with the given ID, creating it if no such Worker exists yet.
let worker = env.LOADER.get(id, async () => {
  // If the Worker does not already exist, this callback is invoked to fetch
  // its code.

  return {
    compatibilityDate: "2025-06-01",

    // Specify the worker's code (module files).
    mainModule: "foo.js",
    modules: {
      "foo.js":
        "export default {\n" +
        "  fetch(req, env, ctx) { return new Response('Hello'); }\n" +
        "}\n",
    },

    // Specify the dynamic Worker's environment (`env`).
    env: {
      // It can contain basic serializable data types...
      SOME_NUMBER: 123,

      // ... and bindings back to the parent worker's exported RPC
      // interfaces, using the new `ctx.exports` loopback bindings API.
      SOME_RPC_BINDING: ctx.exports.MyBindingImpl({props})
    },

    // Redirect the Worker's `fetch()` and `connect()` to proxy through
    // the parent worker, to monitor or filter all Internet access. You
    // can also block Internet access completely by passing `null`.
    globalOutbound: ctx.exports.OutboundProxy({props}),
  };
});

// Now you can get the Worker's entrypoint and send requests to it.
let defaultEntrypoint = worker.getEntrypoint();
await defaultEntrypoint.fetch("http://example.com");

// You can get non-default entrypoints as well, and specify the
// `ctx.props` value to be delivered to the entrypoint.
let someEntrypoint = worker.getEntrypoint("SomeEntrypointClass", {
  props: {someProp: 123}
});

You can start playing with this API right now when running workerd locally with Wrangler (check out the docs), and you can sign up for beta access to use it in production.

Workers are better sandboxes

The design of Workers makes it unusually good at sandboxing, especially for this use case, for a few reasons:

Faster, cheaper, disposable sandboxes

The Workers platform uses isolates instead of containers. Isolates are much lighter-weight and faster to start up. It takes mere milliseconds to start a fresh isolate, and it’s so cheap we can just create a new one for every single code snippet the agent generates. There’s no need to worry about pooling isolates for reuse, prewarming, etc.

We have not yet finalized pricing for the Worker Loader API, but because it is based on isolates, we will be able to offer it at a significantly lower cost than container-based solutions.

Isolated by default, but connected with bindings

Workers are just better at handling isolation.

In Code Mode, we prohibit the sandboxed worker from talking to the Internet. The global fetch() and connect() functions throw errors.

But on most platforms, this would be a problem. On most platforms, the way you get access to private resources is, you start with general network access. Then, using that network access, you send requests to specific services, passing them some sort of API key to authorize private access.

But Workers has always had a better answer. In Workers, the “environment” (env object) doesn’t just contain strings, it contains live objects, also known as “bindings”. These objects can provide direct access to private resources without involving generic network requests.

In Code Mode, we give the sandbox access to bindings representing the MCP servers it is connected to. Thus, the agent can specifically access those MCP servers without having network access in general.

Limiting access via bindings is much cleaner than doing it via, say, network-level filtering or HTTP proxies. Filtering is hard on both the LLM and the supervisor, because the boundaries are often unclear: the supervisor may have a hard time identifying exactly what traffic is legitimately necessary to talk to an API. Meanwhile, the LLM may have difficulty guessing what kinds of requests will be blocked. With the bindings approach, it’s well-defined: the binding provides a JavaScript interface, and that interface is allowed to be used. It’s just better this way.

No API keys to leak

An additional benefit of bindings is that they hide API keys. The binding itself provides an already-authorized client interface to the MCP server. All calls made on it go to the agent supervisor first, which holds the access tokens and adds them into requests sent on to MCP.

This means that the AI cannot possibly write code that leaks any keys, solving a common security problem seen in AI-authored code today.

Try it now!

Sign up for the production beta

The Dynamic Worker Loader API is in closed beta. To use it in production, sign up today.

Or try it locally

If you just want to play around, though, Dynamic Worker Loading is fully available today when developing locally with Wrangler and workerd – check out the docs for Dynamic Worker Loading and code mode in the Agents SDK to get started.

Choice: the path to AI sovereignty

Post Syndicated from Carly Ramsey original https://blog.cloudflare.com/sovereign-ai-and-choice/

Every government is laser-focused on the potential for national transformation by AI. Many view AI as an unparalleled opportunity to solve complex national challenges, drive economic growth, and improve the lives of their citizens. Others are concerned about the risks AI can bring to its society and economy. Some sit somewhere between these two perspectives. But as plans are drawn up by governments around the world to address the question of AI development and adoption, all are grappling with the critical question of sovereignty — how much of this technology, mostly centered in the United States and China, needs to be in their direct control? 

Each nation has their own response to that question — some seek ‘self-sufficiency’ and total authority. Others, particularly those that do not have the capacity to build the full AI technology stack, are approaching it layer-by-layer, seeking to build on the capacities their country does have and then forming strategic partnerships to fill the gaps. 

We believe AI sovereignty at its core is about choice. Each nation should have the ability to select the right tools for the task, to control its own data, and to deploy applications at will, all without being locked into a single provider or a single way of doing things. It’s about autonomy and options, realized through a diversified, resilient digital supply chain. 

Cloudflare’s mission is to help build a better Internet. We make tools for developers around the world to build Internet and AI applications that are widely, and in many cases, freely, available. We work on standards to improve interoperability and prevent vendor lock-in. And we are global — our network spans 330 cities in over 125 countries. By supporting local developers to build and deploy AI tools and services right where they are, Cloudflare can help each nation on their path to greater AI sovereignty. 

Creating a future that enables many AI options

Many nations recognize the practical challenge of realizing a robust AI-driven future that incorporates sovereignty — the significant cost and complexity of the infrastructure needed to set AI in action. Cloudflare believes that countries can achieve their objectives by creating vibrant marketplaces that allow multiple options, and we are creating a path for governments that provides maximum choice:

Infrastructure accessibility: Countries often focus on building large data centers that have the compute capacity to train general purpose AI models, neglecting the infrastructure needed to effectively deploy AI. Because of their proximity to end users, distributed edge networks are critical to ensuring that consumers can actually use AI technologies at scale. Although some AI technologies will be designed to work on-device, many will need more power to run AI inference, the tasks that users ask an AI engine to complete.  Distributed networks are equipped to run AI workloads at the edge, to help deliver the low latency and high performance needed for advanced technologies. Cloudflare’s distributed network gives developers a path to rapidly deploy their apps globally without massive upfront investments. 

Inclusivity: Nations want their entire economies, from the small businesses, to research institutions, to non-profits and enterprises, to benefit from AI transformation. Serverless models like Cloudflare’s make it easy to get started. Developers pay only for what they use, rather than being locked into paying for expensive and unnecessary compute, dramatically lowering the barrier to entry. Our free tier allows developers to experiment, build, and even launch applications without any cost, while our pay-as-you-go model for increased usage removes the significant financial barriers that might otherwise keep advanced AI out of reach. 

Control over data: An important part of sovereignty is the ability to control your own data. We believe countries should avoid equating this type of control with data locality, focusing instead on integrating security tools that provide visibility and the ability to restrict access to data.  Cloudflare’s global, distributed network ensures that developers can experiment, build, and deploy AI-powered applications right where they are, setting rules and controls at the Internet edge.

Multi-modal, dynamic markets: Building new applications with closed AI models can make it challenging to switch models later, and can make developers dependent on particular providers. AI strategies must embrace diversity — developers should have access to a wide variety of both open source and closed AI models. Cloudflare’s Workers AI platform, with over 50 open source models, is model agnostic, helping to create a competitive, dynamic environment where developers can swap models in and out as better, cheaper, or more specialized options become available. Cloudflare’s AI Gateway allows our customers to connect and control all their AI models, regardless of vendor, in a single, unified, interoperable platform. 

Underpinning all of this is the importance of open standards that encourage interoperability. Open standards and protocols throughout the AI technology stack help prevent dependency, create dynamic and competitive markets, and create choice for governments and their developers.

Championing regional AI innovation  

Many countries have started to put their own mark on how to spur innovation in their markets, starting with large language models (LLM).  AI development to date has mostly centered around LLMs  trained on English-centric data, and increasingly, Chinese-centric data, leaving behind those who can’t fully access this technology in these two languages. Recognizing this gap, these nations are building and freely offering AI models trained on local language datasets that are fine-tuned to the nuances of their own cultures and languages. This approach lowers the barrier to entry for local businesses, organizations, and governments to create customized AI solutions for their specific markets. Open-sourcing these LLMs is to recognize that AI sovereignty is a means to an end. The goal is innovation, economic growth, and the ability to solve meaningful problems.

Cloudflare is now supporting these sovereign AI initiatives in India, Japan, and Southeast Asia. We are bringing these locally-developed, open-source AI models to developers around the world through our serverless inference platform, Workers AI.

India: India’s national vision is “AI for All”, which focuses on AI driving inclusive growth and social empowerment. India will host the momentous global AI Impact Summit in 2026, and a key element of showcasing empowering technological advancements that are accessible to the Global South. With its immense linguistic diversity, India is at the forefront of creating models that serve its hundreds of millions of Internet users in their native tongues. A cornerstone in this endeavor is the Government of India’s Bhashini, a digital public good platform that enables all Indian citizens to access the Internet and digital services in 22 official languages.

Cloudflare is now offering AI4Bharat’s IndicTrans2 model, a key open source language model that is also part of the Bhashini initiative. The model is able to translate text across 22 Indic languages, including Bengali, Gujarati, Hindi, Tamil, Sanskrit and even traditionally low-resourced languages like Kashmiri, Manipuri and Sindhi. 

You can use the @cf/ai4bharat/indictrans2-en-indic-1B model on Workers AI as follows:

curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/ai/run/@cf/ai4bharat/indictrans2-en-indic-1B \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": ["What is your favourite food?", "I like pizza"],
    "target_language": "guj_Gujr"
}'

Japan: Japan has a very clear and expansive vision of AI development. Concerned about Japan’s slow AI uptake, the Japanese government aims to make the country “the world’s most friendly AI nation” by creating the ideal conditions for AI growth, both at home and abroad. A major initiative for Japan’s government is supporting AI that deeply understands the complexities and cultural context of the Japanese language. 

Cloudflare is offering Preferred Networks, Inc.(PFN)  PLaMo-Embedding-1B, a home-grown Japanese text embedding model, made freely and openly available. The Japanese government supported PFN through its Generative AI Accelerator Challenge (GENIAC) program, which supports local LLM development through subsidized access to compute resources for training. The PLaMo Embedding model enables users to generate high-quality embeddings for Japanese text, which is helpful for building RAG-powered applications and semantic search use cases.

You can use the @cf/pfnet/plamo-embedding-1b model on Workers AI as follows:

curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/ai/run/@cf/pfnet/plamo-embedding-1b \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
  	"text": [
            "PLaMo-Embedding-1Bは、Preferred Networks, Inc. によって開発された日本語テキスト埋め込みモデルです。",
            "最近は随分と暖かくなりましたね。"
        ]
}'

Southeast Asia: As Chair of the Association of Southeast Asian Nations (ASEAN) Working Group on AI Governance, Singapore’s ambitious National AI Strategy 2.0 aims to ensure that AI is a public good, both for Southeast Asia and the world. As a cornerstone of this strategy, Singapore is championing the development and adoption of SEA-LION, a family of open-source LLMs designed for Southeast Asia’s diverse languages and cultures. The initiative aims to establish the nation as an inclusive global AI leader, ensuring the technology is both accessible and regionally relevant to its multilingual and multicultural populaces. The models are adept in numerous regional languages, including Bahasa Indonesia, Bahasa Malaysia, Thai, Vietnamese, and Tamil, unlocking AI technologies for a significant portion of the Asian and global population.

SEA-LION model v4-27B is now available on the Workers AI platform. SEA-LION v4 stands out on the Singapore government’s leaderboard as its most powerful, efficient, multimodal and multilingual model yet. 

You can use the @cf/aisingapore/gemma-sea-lion-v4-27b-it model on Workers AI as follows:

curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/ai/run/@cf/aisingapore/gemma-sea-lion-v4-27b-it \
  --header 'Authorization: Bearer TOKEN' \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {
      "role": "user",
      "content": "แล้วทำผัดไทยอย่างไร"
    }
  ]
}'

Bringing AI models to the world

Singapore, India and Japan have all chosen to open-source many of their local language models, a strategy that champions an expansive vision of AI sovereignty. This approach demonstrates a crucial understanding: true AI sovereignty is ensuring you have choices. 

Supporting local language open source models is more than just supporting technology; this is a shared commitment to fostering an open, interoperable, and competitive AI ecosystem by empowering governments and developers to solve local problems, create economic opportunities, and preserve their digital and cultural heritages.

We are honored to support the initiatives of the governments of India, Japan, and Singapore on this journey. We believe that by putting their sovereign AI models into the hands of developers in their economies, we can help unlock a powerful wave of innovation that is more diverse, equitable, and representative of the world we live in. The future of AI is being built today, and we are proud to ensure that AI developers everywhere are at the forefront. 

Choice is the foundation of AI sovereignty. We’re starting with the models from India, Japan, and Singapore on our serverless inference platform, but it’s only the start. Come build with us! Take the first step for free on Workers AI.

Safe in the sandbox: security hardening for Cloudflare Workers

Post Syndicated from Erik Corry original https://blog.cloudflare.com/safe-in-the-sandbox-security-hardening-for-cloudflare-workers/

As a serverless cloud provider, we run your code on our globally distributed infrastructure. Being able to run customer code on our network means that anyone can take advantage of our global presence and low latency. Workers isn’t just efficient though, we also make it simple for our users. In short: You write code. We handle the rest.

Part of ‘handling the rest’ is making Workers as secure as possible. We have previously written about our security architecture. Making Workers secure is an interesting problem because the whole point of Workers is that we are running third party code on our hardware. This is one of the hardest security problems there is: any attacker has the full power available of a programming language running on the victim’s system when they are crafting their attacks.

This is why we are constantly updating and improving the Workers Runtime to take advantage of the latest improvements in both hardware and software. This post shares some of the latest work we have been doing to keep Workers secure.

Some background first: Workers is built around the V8 JavaScript runtime, originally developed for Chromium-based browsers like Chrome. This gives us a head start, because V8 was forged in an adversarial environment, where it has always been under intense attack and scrutiny. Like Workers, Chromium is built to run adversarial code safely. That’s why V8 is constantly being tested against the best fuzzers and sanitizers, and over the years, it has been hardened with new technologies like Oilpan/cppgc and improved static analysis.

We use V8 in a slightly different way, though, so we will be describing in this post how we have been making some changes to V8 to improve security in our use case.

Hardware-assisted security improvements from Memory Protection Keys

Modern CPUs from Intel, AMD, and ARM have support for memory protection keys, sometimes called PKU, Protection Keys for Userspace. This is a great security feature which increases the power of virtual memory and memory protection.

Traditionally, the memory protection features of the CPU in your PC or phone were mainly used to protect the kernel and to protect different processes from each other. Within each process, all threads had access to the same memory. Memory protection keys allow us to prevent specific threads from accessing memory regions they shouldn’t have access to.

V8 already uses memory protection keys for the JIT compilers. The JIT compilers for a language like JavaScript generate optimized, specialized versions of your code as it runs. Typically, the compiler is running on its own thread, and needs to be able to write data to the code area in order to install its optimized code. However, the compiler thread doesn’t need to be able to run this code. The regular execution thread, on the other hand, needs to be able to run, but not modify, the optimized code. Memory protection keys offer a way to give each thread the permissions it needs, but no more. And the V8 team in the Chromium project certainly aren’t standing still. They describe some of their future plans for memory protection keys here.

In Workers, we have some different requirements than Chromium. The security architecture for Workers uses V8 isolates to separate different scripts that are running on our servers. (In addition, we have extra mitigations to harden the system against Spectre attacks). If V8 is working as intended, this should be enough, but we believe in defense in depth: multiple, overlapping layers of security controls.

That’s why we have deployed internal modifications to V8 to use memory protection keys to isolate the isolates from each other. There are up to 15 different keys available on a modern x64 CPU and a few are used for other purposes in V8, so we have about 12 to work with. We give each isolate a random key which is used to protect its V8 heap data, the memory area containing the JavaScript objects a script creates as it runs. This means security bugs that might previously have allowed an attacker to read data from a different isolate would now hit a hardware trap in 92% of cases. (Assuming 12 keys, 92% is about 11/12.)


The illustration shows an attacker attempting to read from a different isolate. Most of the time this is detected by the mismatched memory protection key, which kills their script and notifies us, so we can investigate and remediate. The red arrow represents the case where the attacker got lucky by hitting an isolate with the same memory protection key, represented by the isolates having the same colors.

However, we can further improve on a 92% protection rate. In the last part of this blog post we’ll explain how we can lift that to 100% for a particular common scenario. But first, let’s look at a software hardening feature in V8 that we are taking advantage of.

The V8 sandbox, a software-based security boundary

Over the past few years, V8 has been gaining another defense in depth feature: the V8 sandbox. (Not to be confused with the layer 2 sandbox which Workers have been using since the beginning.) The V8 sandbox has been a multi-year project that has been gaining maturity for a while. The sandbox project stems from the observation that many V8 security vulnerabilities start by corrupting objects in the V8 heap memory. Attackers then leverage this corruption to reach other parts of the process, giving them the opportunity to escalate and gain more access to the victim’s browser, or even the entire system.

V8’s sandbox project is an ambitious software security mitigation that aims to thwart that escalation: to make it impossible for the attacker to progress from a corruption on the V8 heap to a compromise of the rest of the process. This means, among other things, removing all pointers from the heap. But first, let’s explain in as simple terms as possible, what a memory corruption attack is.

Memory corruption attacks

A memory corruption attack tricks a program into misusing its own memory. Computer memory is just a store of integers, where each integer is stored in a location. The locations each have an address, which is also just a number. Programs interpret the data in these locations in different ways, such as text, pixels, or pointers. Pointers are addresses that identify a different memory location, so they act as a sort of arrow that points to some other piece of data.

Here’s a concrete example, which uses a buffer overflow. This is a form of attack that was historically common and relatively simple to understand: Imagine a program has a small buffer (like a 16-character text field) followed immediately by an 8-byte pointer to some ordinary data. An attacker might send the program a 24-character string, causing a “buffer overflow.” Because of a vulnerability in the program, the first 16 characters fill the intended buffer, but the remaining 8 characters spill over and overwrite the adjacent pointer.


See below for how such an attack would now be thwarted.

Now the pointer has been redirected to point at sensitive data of the attacker’s choosing, rather than the normal data it was originally meant to access. When the program tries to use what it believes is its normal pointer, it’s actually accessing sensitive data chosen by the attacker.

This type of attack works in steps: first create a small confusion (like the buffer overflow), then use that confusion to create bigger problems, eventually gaining access to data or capabilities the attacker shouldn’t have.  The attacker can eventually use the misdirection to either steal information or plant malicious data that the program will treat as legitimate.

This was a somewhat abstract description of memory corruption attacks using a buffer overflow, one of the simpler techniques. For some much more detailed and recent examples, see this description from Google, or this breakdown of a V8 vulnerability.

Compressed pointers in V8

Many attacks are based on corrupting pointers, so ideally we would remove all pointers from the memory of the program.  Since an object-oriented language’s heap is absolutely full of pointers, that would seem, on its face, to be a hopeless task, but it is enabled by an earlier development. Starting in 2020, V8 has offered the option of saving memory by using compressed pointers. This means that, on a 64-bit system, the heap uses only 32 bit offsets, relative to a base address. This limits the total heap to maximally 4 GiB, a limitation that is acceptable for a browser, and also fine for individual scripts running in a V8 isolate on Cloudflare Workers.


An artificial object with various fields, showing how the layout differs in a compressed vs. an uncompressed heap. The boxes are 64 bits wide.

If the whole of the heap is in a single 4 GiB area then the first 32 bits of all pointers will be the same, and we don’t need to store them in every pointer field in every object. In the diagram we can see that the object pointers all start with 0x12345678, which is therefore redundant and doesn’t need to be stored. This means that object pointer fields and integer fields can be reduced from 64 to 32 bits.

We still need 64 bit fields for some fields like double precision floats and for the sandbox offsets of buffers, which are typically used by the script for input and output data. See below for details.

Integers in an uncompressed heap are stored in the high 32 bits of a 64 bit field. In the compressed heap, the top 31 bits of a 32 bit field are used. In both cases the lowest bit is set to 0 to indicate integers (as opposed to pointers or offsets).

Conceptually, we have two methods for compressing and decompressing, using a base address that is divisible by 4 GiB:

// Decompress a 32 bit offset to a 64 bit pointer by adding a base address.
void* Decompress(uint32_t offset) { return base + offset; }
// Compress a 64 bit pointer to a 32 bit offset by discarding the high bits.
uint32_t Compress(void* pointer) { return (intptr_t)pointer & 0xffffffff; }

This pointer compression feature, originally primarily designed to save memory, can be used as the basis of a sandbox.

From compressed pointers to the sandbox

The biggest 32-bit unsigned integer is about 4 billion, so the Decompress() function cannot generate any pointer that is outside the range [base, base + 4 GiB]. You could say the pointers are trapped in this area, so it is sometimes called the pointer cage. V8 can reserve 4 GiB of virtual address space for the pointer cage so that only V8 objects appear in this range. By eliminating all pointers from this range, and following some other strict rules, V8 can contain any memory corruption by an attacker to this cage. Even if an attacker corrupts a 32 bit offset within the cage, it is still only a 32 bit offset and can only be used to create new pointers that are still trapped within the pointer cage.


The buffer overflow attack from earlier no longer works because only the attacker’s own data is available in the pointer cage.

To construct the sandbox, we take the 4 GiB pointer cage and add another 4 GiB for buffers and other data structures to make the 8 GiB sandbox. This is why the buffer offsets above are 33 bits, so they can reach buffers in the second half of the sandbox (40 bits in Chromium with larger sandboxes). V8 stores these buffer offsets in the high 33 bits and shifts down by 31 bits before use, in case an attacker corrupted the low bits.

Cloudflare Workers have made use of compressed pointers in V8 for a while, but for us to get the full power of the sandbox we had to make some changes. Until recently, all isolates in a process had to be one single sandbox if you were using the sandboxed configuration of V8. This would have limited the total size of all V8 heaps to be less than 4 GiB, far too little for our architecture, which relies on serving 1000s of scripts at once.

That’s why we commissioned Igalia to add isolate groups to V8. Each isolate group has its own sandbox and can have 1 or more isolates within it. Building on this change we have been able to start using the sandbox, eliminating a whole class of potential security issues in one stroke. Although we can place multiple isolates in the same sandbox, we are currently only putting a single isolate in each sandbox.


The layout of the sandbox. In the sandbox there can be more than one isolate, but all their heap pages must be in the pointer cage: the first 4 GiB of the sandbox. Instead of pointers between the objects, we use 32 bit offsets. The offsets for the buffers are 33 bits, so they can reach the whole sandbox, but not outside it.

Virtual memory isn’t infinite, there’s a lot going on in a Linux process

At this point, we were not quite done, though. Each sandbox reserves 8 GiB of space in the virtual memory map of the process, and it must be 4 GiB aligned for efficiency. It uses much less physical memory, but the sandbox mechanism requires this much virtual space for its security properties. This presents us with a problem, since a Linux process ‘only’ has 128 TiB of virtual address space in a 4-level page table (another 128 TiB are reserved for the kernel, not available to user space).

At Cloudflare, we want to run Workers as efficiently as possible to keep costs and prices down, and to offer a generous free tier. That means that on each machine we have so many isolates running (one per sandbox) that it becomes hard to place them all in a 128 TiB space.

Knowing this, we have to place the sandboxes carefully in memory. Unfortunately, the Linux syscall, mmap, does not allow us to specify the alignment of an allocation unless you can guess a free location to request. To get an 8 GiB area that is 4 GiB aligned, we have to ask for 12 GiB, then find the aligned 8 GiB area that must exist within that, and return the unused (hatched) edges to the OS:


If we allow the Linux kernel to place sandboxes randomly, we end up with a layout like this with gaps. Especially after running for a while, there can be both 8 GiB and 4 GiB gaps between sandboxes:


Sadly, because of our 12 GiB alignment trick, we can’t even make use of the 8 GiB gaps. If we ask the OS for 12 GiB, it will never give us a gap like the 8 GiB gap between the green and blue sandboxes above. In addition, there are a host of other things going on in the virtual address space of a Linux process: the malloc implementation may want to grab pages at particular addresses, the executable and libraries are mapped at a random location by ASLR, and V8 has allocations outside the sandbox.

The latest generation of x64 CPUs supports a much bigger address space, which solves both problems, and Linux kernels are able to make use of the extra bits with five level page tables. A process has to opt into this, which is done by a single mmap call suggesting an address outside the 47 bit area. The reason this needs an opt-in is that some programs can’t cope with such high addresses. Curiously, V8 is one of them.

This isn’t hard to fix in V8, but not all of our fleet has been upgraded yet to have the necessary hardware. So for now, we need a solution that works with the existing hardware. We have modified V8 to be able to grab huge memory areas and then use mprotect syscalls to create tightly packed 8 GiB spaces for sandboxes, bypassing the inflexible mmap API.


Putting it all together

Taking control of the sandbox placement like this actually gives us a security benefit, but first we need to describe a particular threat model.

We assume for the purposes of this threat model that an attacker has an arbitrary way to corrupt data within the sandbox. This is historically the first step in many V8 exploits. So much so that there is a special tier in Google’s V8 bug bounty program where you may assume you have this ability to corrupt memory, and they will pay out if you can leverage that to a more serious exploit.

However, we assume that the attacker does not have the ability to execute arbitrary machine code. If they did, they could disable memory protection keys. Having access to the in-sandbox memory only gives the attacker access to their own data. So the attacker must attempt to escalate, by corrupting data inside the sandbox to access data outside the sandbox.

You will recall that the compressed, sandboxed V8 heap only contains 32 bit offsets. Therefore, no corruption there can reach outside the pointer cage. But there are also arrays in the sandbox — vectors of data with a given size that can be accessed with an index. In our threat model, the attacker can modify the sizes recorded for those arrays and the indexes used to access elements in the arrays. That means an attacker could potentially turn an array in the sandbox into a tool for accessing memory incorrectly. For this reason, the V8 sandbox normally has guard regions around it: These are 32 GiB virtual address ranges that have no virtual-to-physical address mappings. This helps guard against the worst case scenario: Indexing an array where the elements are 8 bytes in size (e.g. an array of double precision floats) using a maximal 32 bit index. Such an access could reach a distance of up to 32 GiB outside the sandbox: 8 times the maximal 32 bit index of four billion.

We want such accesses to trigger an alarm, rather than letting an attacker access nearby memory.  This happens automatically with guard regions, but we don’t have space for conventional 32 GiB guard regions around every sandbox.

Instead of using conventional guard regions, we can make use of memory protection keys. By carefully controlling which isolate group uses which key, we can ensure that no sandbox within 32 GiB has the same protection key. Essentially, the sandboxes are acting as each other’s guard regions, protected by memory protection keys. Now we only need a wasted 32 GiB guard region at the start and end of the huge packed sandbox areas.


With the new sandbox layout, we use strictly rotating memory protection keys. Because we are not using randomly chosen memory protection keys, for this threat model the 92% problem described above disappears. Any in-sandbox security issue is unable to reach a sandbox with the same memory protection key. In the diagram, we show that there is no memory within 32 GiB of a given sandbox that has the same memory protection key. Any attempt to access memory within 32 GiB of a sandbox will trigger an alarm, just like it would with unmapped guard regions.

The future

In a way, this whole blog post is about things our customers don’t need to do. They don’t need to upgrade their server software to get the latest patches, we do that for them. They don’t need to worry whether they are using the most secure or efficient configuration. So there’s no call to action here, except perhaps to sleep easy.

However, if you find work like this interesting, and especially if you have experience with the implementation of V8 or similar language runtimes, then you should consider coming to work for us. We are recruiting both in the US and in Europe. It’s a great place to work, and Cloudflare is going from strength to strength.

Partnering to make full-stack fast: deploy PlanetScale databases directly from Workers

Post Syndicated from Matt Silverlock original https://blog.cloudflare.com/planetscale-postgres-workers/

We’re not burying the lede on this one: you can now connect Cloudflare Workers to your PlanetScale databases directly and ship full-stack applications backed by Postgres or MySQL. 


We’ve teamed up with PlanetScale because we wanted to partner with a database provider that we could confidently recommend to our users: one that shares our obsession with performance, reliability and developer experience. These are all critical factors for any development team building a serious application. 

Now, when connecting to PlanetScale databases, your connections are automatically configured for optimal performance with Hyperdrive, ensuring that you have the fastest access from your Workers to your databases, regardless of where your Workers are running.

Building full-stack

As Workers has matured into a full-stack platform, we’ve introduced more options to facilitate your connectivity to data. With Workers KV, we made it easy to store configuration and cache unstructured data on the edge. With D1 and Durable Objects, we made it possible to build multi-tenant apps with simple, isolated SQL databases. And with Hyperdrive, we made connecting to external databases fast and scalable from Workers.

Today, we’re introducing a new choice for building on Cloudflare: Postgres and MySQL PlanetScale databases, directly accessible from within the Cloudflare dashboard. Link your Cloudflare and PlanetScale accounts, stop manually copying API keys back-and-forth, and connect Workers to any of your PlanetScale databases (production or otherwise!).


Connect to a PlanetScale database — no figuring things out on your own

Postgres and MySQL are the most popular options for building applications, and with good reason. Many large companies have built and scaled on these databases, providing for a robust ecosystem (like Cloudflare!). And you may want to have access to the power, familiarity, and functionality that these databases provide. 

Importantly, all of this builds on Hyperdrive, our distributed connection pooler and query caching infrastructure. Hyperdrive keeps connections to your databases warm to avoid incurring latency penalties for every new request, reduces the CPU load on your database by managing a connection pool, and can cache the results of your most frequent queries, removing load from your database altogether. Given that about 80% of queries for a typical transactional database are read-only, this can be substantial — we’ve observed this in reality!

No more copying credentials around

Starting today, you can connect to your PlanetScale databases from the Cloudflare dashboard in just a few clicks. Connecting is now secure by default with a one-click password rotation option, without needing to copy and manage credentials back and forth. A Hyperdrive configuration will be created for your PlanetScale database, providing you with the optimal setup to start building on Workers.

And the experience spans both Cloudflare and PlanetScale dashboards: you can also create and view attached Hyperdrive configurations for your databases from the PlanetScale dashboard.


By automatically integrating with Hyperdrive, your PlanetScale databases are optimally configured for access from Workers. When you connect your database via Hyperdrive, Hyperdrive’s Placement system automatically determines the location of the database and places its pool of database connections in Cloudflare data centers with the lowest possible latency. 

When one of your Workers connects to your Hyperdrive configuration for your PlanetScale database, Hyperdrive will ensure the fastest access to your database by eliminating the unnecessary roundtrips included in a typical database connection setup. Hyperdrive will resolve connection setup within the Hyperdrive client and use existing connections from the pool to quickly serve your queries. Better yet, Hyperdrive allows you to cache your query results in case you need to scale for high-read workloads. 

This is a peek under the hood of how Hyperdrive makes access to PlanetScale as fast as possible. We’ve previously blogged about Hyperdrive’s technical underpinnings — it’s worth a read. And with this integration with Hyperdrive, you can easily connect to your databases across different Workers applications or environments, without having to reconfigure your credentials. All in all, a perfect match.

Get started with PlanetScale and Workers

With this partnership, we’re making it trivially easy to build on Workers with PlanetScale. Want to build a new application on Workers that connects to your existing PlanetScale cluster? With just a few clicks, you can create a globally deployed app that can query your database, cache your hottest queries, and keep your database connections warmed for fast access from Workers.


Connect directly to your PlanetScale MySQL or Postgres databases from the Cloudflare dashboard, for optimal configuration with Hyperdrive.

To get started, you can:

  • Head to the Cloudflare dashboard and connect your PlanetScale account

  • … or head to PlanetScale and connect your Cloudflare account

  • … and then deploy a Worker

Review the Hyperdrive docs and/or the PlanetScale docs to learn more about how to connect Workers to PlanetScale and start shipping.

Cloudflare’s developer platform keeps getting better, faster, and more powerful. Here’s everything that’s new.

Post Syndicated from Brendan Irvine-Broque original https://blog.cloudflare.com/cloudflare-developer-platform-keeps-getting-better-faster-and-more-powerful/

When you build on Cloudflare, we consider it our job to do the heavy lifting for you. That’s been true since we introduced Cloudflare Workers in 2017, when we first provided a runtime for you where you could just focus on building. 

That commitment is still true today, and many of today’s announcements are focused on just that — removing friction where possible to free you up to build something great. 

There are only so many blog posts we can write (and that you can read)! We have been busy on a much longer list of new improvements, and many of them we’ve been rolling out consistently over the course of the year. Today’s announcement breaks down all the new capabilities in detail, in one single post. The features being released today include:

Alongside that, we’re constantly adding new building blocks, to make sure you have all the tools you need to build what you set out to. Those launches (that also went out today, but require a bit more explanation) include:

AI Search (formerly AutoRAG) — now with More Models To Choose From

AutoRAG is now AI Search! The new name marks a new and bigger mission: to make world-class search infrastructure available to every developer and business. AI Search is no longer just about retrieval for LLM apps: it’s about giving you a fast, flexible index for your content that is ready to power any AI experience. With recent additions like NLWeb support, we are expanding beyond simple retrieval to provide a foundation for top quality search experiences that are open and built for the future of the web.

With AI Search you can now use models from different providers like OpenAI and Anthropic. Last month during AI Week we announced BYO Provider Keys for AI Gateway. That capability now extends to AI Search. By attaching your keys to the AI Gateway linked to your AI Search instance, you can use many more models for both embedding and inference.


Once configured, your AI Search instance will be able to reference models available through your AI Gateway when making a /ai-search request:

export default {
  async fetch(request, env) {
    
    // Query your AI Search instance with a natural language question to an OpenAI model
    const result = await env.AI.autorag("my-ai-search").aiSearch({
      query: "What's new for Cloudflare Birthday Week?",
      model: "openai/gpt-5"
    });

    // Return only the generated answer as plain text
    return new Response(result.response, {
      headers: { "Content-Type": "text/plain" },
    });
  },
};

In the coming weeks we will also roll out updates to align the APIs with the new name. The existing APIs will continue to be supported for the time being. Stay tuned to the AI Search Changelog and Discord for more updates!

Connect to production services and resources from local development with Remote Bindings — now GA

Remote bindings for local development are generally available, supported in Wrangler v4.37.0, the Cloudflare Vite plugin, and the @cloudflare/vitest-pool-workers package. Remote bindings are bindings that are configured to connect to a deployed resource on your Cloudflare account instead of the locally simulated resource. 

For example, here’s how you can instruct Wrangler or Vite to send all requests to env.MY_BUCKET to hit the real, deployed R2 bucket instead of a locally simulated one: 

{
  "name": "my-worker",
  "compatibility_date": "2025-09-25",

  "r2_buckets": [
    {
      "bucket_name": "my-bucket",
      "binding": "MY_BUCKET",
      "remote": true
    },
  ],
}

With the above configuration, all requests to env.MY_BUCKET will be proxied to the remote resource, but the Worker code will still execute locally. This means you get all the benefits of local development like faster execution times – without having to seed local databases with data. 

You can pair remote bindings with environments, so that you can use staging data during local development and leave production data untouched. 

For example, here’s how you could point Wrangler or Vite to send all requests to env.MY_BUCKET to staging-storage-bucket when you run wrangler dev --env staging (CLOUDFLARE_ENV=staging vite dev if using Vite). 

{
  "name": "my-worker",
  "compatibility_date": "2025-09-25",

"env": {
    "staging": {
      "r2_buckets": [
        {
          "binding": "MY_BUCKET",
          "bucket_name": "staging-storage-bucket",
          "remote": true
        }
      ]
    },
    "production": {
      "r2_buckets": [
        {
          "binding": "MY_BUCKET",
          "bucket_name": "production-storage-bucket" 
        }
      ]
    }
  }
}

More Node.js APIs and packages “just work” on Workers

Over the past year, we have been hard at work to make Workers more compatible with Node.js packages and APIs.

Several weeks ago, we shared how node:http and node:https APIs are now supported on Workers. This means that you can run backend Express and Koa.js work with only a few additional lines of code:

import { httpServerHandler } from 'cloudflare:node';
import express from 'express';

const app = express();

app.get('/', (req, res) => {
  res.json({ message: 'Express.js running on Cloudflare Workers!' });
});

app.listen(3000);
export default httpServerHandler({ port: 3000 });

And there’s much, much more. You can now:

  • Read and write temporary files in Workers, using node:fs

  • Do DNS looking using 1.1.1.1 with node:dns

  • Use node:net and node:tls for first class Socket support

  • Use common hashing libraries with node:crypto

  • Access environment variables in a Node-like fashion on process.env

Read our full recap of the last year’s Node.js-related changes for all the details.

With these changes, Workers become even more powerful and easier to adopt, regardless of where you’re coming from. The APIs that you are familiar with are there, and more packages you need will just work.

Larger Container instances, more concurrent instances

Cloudflare Containers now has higher limits on concurrent instances and an upcoming new, larger instance type.

Previously you could run 50 instances of the dev instance type or 25 instances of the basic instance type concurrently. Now you can run concurrent containers with up to 400 GiB of memory, 100 vCPUs, and 2 TB of disk. This allows you to run up to 1000 dev instances or 400 basic instances concurrently. Enterprise customers can push far beyond these limits — contact us if you need more. If you are using Containers to power your app and it goes viral, you’ll have the ability to scale on Cloudflare.

Cloudflare Containers also now has a new instance type coming soon — standard-2 which includes 8 GiB of memory, 1 vCPU, and 12 GB of disk. This new instance type is an ideal default for workloads that need more resources, from AI Sandboxes to data processing jobs.

Workers Builds provides more disk and CPU — and is now GA

Last Birthday Week, we announced the launch of our integrated CI/CD pipeline, Workers Builds, in open beta. We also gave you a detailed look into how we built this system on our Workers platform using Containers, Durable Objects, Hyperdrive, Workers Logs, and Smart Placement.

This year, we are excited to announce that Workers Builds is now Generally Available. Here’s what’s new:

  • Increased disk space for all plans: We’ve increased the disk size from 8 GB to 20 GB for both free and paid plans, giving you more space for your projects and dependencies

  • More compute for paid plans: We’ve doubled the CPU power for paid plans from 2 vCPU to 4 vCPU, making your builds significantly faster

  • Faster single-core and multi-core performance: To ensure consistent, high performance builds, we now run your builds on the fastest available CPUs at the time your build runs

Haven’t used Workers Builds yet? You can try it by connecting a Git repository to an existing Worker, or try it out on a fresh new project by clicking any Deploy to Cloudflare button, like the one below that deploys a blog built with Astro to your Cloudflare account:

A more consistent look and feel for the Cloudflare dashboard

Durable Objects, R2, and Workers now all have a more consistent look with the rest of our developer platform. As you explore these pages you’ll find that things should load faster, feel smoother and are easier to use.

Across storage products, you can now customize the table that lists the resources on your account, choose which data you want to see, sort by any column, and hide columns you don’t need. In the Workers and Pages dashboard, we’ve reduced clutter and have modernized the design to make it faster for you to get the data you need.


And when you create a new Pipeline or a Hyperdrive configuration, you’ll find a new interface that helps you get started and guides you through each step.


This work is ongoing, and we’re excited to continue improving with the help of your feedback, so keep it coming!

Resize, clip and reformat video files on-demand with Media Transformations — now GA

In March 2025 we announced Media Transformations in open beta, which brings the magic of Image transformations to short-form video files — including video files stored outside of Cloudflare. Since then, we have increased input and output limits, and added support for audio-only extraction. Media Transformations is now generally available.

Media Transformations is ideal if you have a large existing volume of short videos, such as generative AI output, e-commerce product videos, social media clips, or short marketing content. Content like this should be fetched from your existing storage like R2 or S3 directly, optimized by Cloudflare quickly, and delivered efficiently as small MP4 files or used to extract still images and audio.

https://example.com/cdn-cgi/media/<OPTIONS>/<SOURCE-VIDEO>

EXAMPLE, RESIZE:



EXAMPLE, STILL THUMBNAIL:
https://example.com/cdn-cgi/media/mode=frame,time=3s,width=120,height=120,fit=cover/https://pub-d9fcbc1abcd244c1821f38b99017347f.r2.dev/aus-mobile.mp4

Media Transformations includes a free tier available to all customers and is included with Media Platform subscriptions. Check out the transform videos documentation for all the latest, then enable transformations for your zone today!

Infrequent Access in R2 is now GA

R2 Infrequent Access is now generally available. Last year, we introduced the Infrequent Access storage class designed for data that doesn’t need to be accessed frequently. It’s a great fit for use cases including long-tail user content, logs, or data backups.

Since launch, Infrequent Access has been proven in production by our customers running these types of workloads at scale. The results confirmed our goal: a storage class that reduces storage costs while maintaining performance and durability.

Pricing is simple. You pay less on data storage, while data retrievals are billed per GB to reflect the additional compute required to serve data from underlying storage optimized for less frequent access. And as with all of R2, there are no egress fees, so you don’t pay for the bandwidth to move data out.

Here’s how you can upload an object to R2 infrequent access class via Workers:

export default {
  async fetch(request, env) {

    // Upload the incoming request body to R2 in Infrequent Access class
    await env.MY_BUCKET.put("my-object", request.body, {
      storageClass: "InfrequentAccess",
    });

    return new Response("Object uploaded to Infrequent Access!", {
      headers: { "Content-Type": "text/plain" },
    });
  },
};

You can also monitor your Infrequent Access vs. Standard storage usage directly in your R2 dashboard for each bucket. Get started with R2 today!

Playwright in Browser Rendering is now GA

We’re excited to announce three updates to Browser Rendering:

  1. Our support for Playwright is now Generally Available, giving developers the stability and confidence to run critical browser tasks.

  2. We’re introducing support for Stagehand, enabling developers to build AI agents using natural language, powered by Cloudflare Workers AI.

  3. Finally, to help developers scale, we are tripling limits for paid plans, with more increases to come. 

The browser is no longer only used by humans. AI agents need to be able to reliably navigate browsers in the same way a human would, whether that’s booking flights, filling in customer info, or scraping structured data. Playwright gives AI agents the ability to interact with web pages and perform complex tasks on behalf of humans. However, running browsers at scale is a significant infrastructure challenge. Cloudflare Browser Rendering solves this by providing headless browsers on-demand. By moving Playwright support to Generally Available, and now synced with the latest version v1.55, customers have a production-ready foundation to build reliable, scalable applications on. 

To help AI agents better navigate the web, we’re introducing support for Stagehand, an open source browser automation framework.  Rather than dictating exact steps or specifying selectors, Stagehand enables developers to build more reliably and flexibly by combining code with natural-language instructions powered by AI. This makes it possible for AI agents to navigate and adapt if a website changes – just like a human would. 

To get started with Playwright and Stagehand, check our changelog with code examples and more.

A year of improving Node.js compatibility in Cloudflare Workers

Post Syndicated from James M Snell original https://blog.cloudflare.com/nodejs-workers-2025/

We’ve been busy.

Compatibility with the broad JavaScript developer ecosystem has always been a key strategic investment for us. We believe in open standards and an open web. We want you to see Workers as a powerful extension of your development platform with the ability to just drop code in that Just Works. To deliver on this goal, the Cloudflare Workers team has spent the past year significantly expanding compatibility with the Node.js ecosystem, enabling hundreds (if not thousands) of popular npm modules to now work seamlessly, including the ever popular express framework.

We have implemented a substantial subset of the Node.js standard library, focusing on the most commonly used, and asked for, APIs. These include:

Each of these has been carefully implemented to approximate Node.js’ behavior as closely as possible where feasible. Where matching Node.js‘ behavior is not possible, our implementations will throw a clear error when called, rather than silently failing or not being present at all. This ensures that packages that check for the presence of these APIs will not break, even if the functionality is not available.

In some cases, we had to implement entirely new capabilities within the runtime in order to provide the necessary functionality. For node:fs, we added a new virtual file system within the Workers environment. In other cases, such as with node:net, node:tls, and node:http, we wrapped the new Node.js APIs around existing Workers capabilities such as the Sockets API and fetch.

Most importantly, all of these implementations are done natively in the Workers runtime, using a combination of TypeScript and C++. Whereas our earlier Node.js compatibility efforts relied heavily on polyfills and shims injected at deployment time by developer tooling such as Wrangler, we are moving towards a model where future Workers will have these APIs available natively, without need for any additional dependencies. This not only improves performance and reduces memory usage, but also ensures that the behavior is as close to Node.js as possible.

The networking stack

Node.js has a rich set of networking APIs that allow applications to create servers, make HTTP requests, work with raw TCP and UDP sockets, send DNS queries, and more. Workers do not have direct access to raw kernel-level sockets though, so how can we support these Node.js APIs so packages still work as intended? We decided to build on top of the existing managed Sockets and fetch APIs. These implementations allow many popular Node.js packages that rely on networking APIs to work seamlessly in the Workers environment.

Let’s start with the HTTP APIs.

HTTP client and server support

From the moment we announced that we would be pursuing Node.js compatibility within Workers, users have been asking specifically for an implementation of the node:http module. There are countless modules in the ecosystem that depend directly on APIs like http.get(...) and http.createServer(...).

The node:http and node:https modules provide APIs for creating HTTP clients and servers. We have implemented both, allowing you to create HTTP clients using http.request() and servers using http.createServer(). The HTTP client implementation is built on top of the Fetch API, while the HTTP server implementation is built on top of the Workers runtime’s existing request handling capabilities.

The client side is fairly straightforward:

import http from 'node:http';

export default {
  async fetch(request) {
    return new Promise((resolve, reject) => {
      const req = http.request('http://example.com', (res) => {
        let data = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => {
          data += chunk;
        });
        res.on('end', () => {
          resolve(new Response(data));
        });
      });
      req.on('error', (err) => {
        reject(err);
      });
      req.end();
    });
  }
}

The server side is just as simple but likely even more exciting. We’ve often been asked about the possibility of supporting Express, or Koa, or Fastify within Workers, but it was difficult to do because these were so dependent on the Node.js APIs. With the new additions it is now possible to use both Express and Koa within Workers, and we’re hoping to be able to add Fastify support later. 

import { createServer } from "node:http";
import { httpServerHandler } from "cloudflare:node";

const server = createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Hello from Node.js HTTP server!");
});

export default httpServerHandler(server);

The httpServerHandler() function from the cloudflare:node module integrates the HTTP server with the Workers fetch event, allowing it to handle incoming requests.

The node:dns module

The node:dns module provides an API for performing DNS queries. 

At Cloudflare, we happen to have a DNS-over-HTTPS (DoH) service and our own DNS service called 1.1.1.1. We took advantage of this when exposing node:dns in Workers. When you use this module to perform a query, it will just make a subrequest to 1.1.1.1 to resolve the query. This way the user doesn’t have to think about DNS servers, and the query will just work.

The node:net and node:tls modules

The node:net module provides an API for creating TCP sockets, while the node:tls module provides an API for creating secure TLS sockets. As we mentioned before, both are built on top of the existing Workers Sockets API. Note that not all features of the node:net and node:tls modules are available in Workers. For instance, it is not yet possible to create a TCP server using net.createServer() yet (but maybe soon!), but we have implemented enough of the APIs to allow many popular packages that rely on these modules to work in Workers.

import net from 'node:net';
import tls from 'node:tls';

export default {
  async fetch(request) {
    const { promise, resolve } = Promise.withResolvers();
    const socket = net.connect({ host: 'example.com', port: 80 },
        () => {
      let buf = '';
      socket.setEncoding('utf8')
      socket.on('data', (chunk) => buf += chunk);
      socket.on('end', () => resolve(new Response('ok'));
      socket.end();
    });
    return promise;
  }
}

A new virtual file system and the node:fs module

What does supporting filesystem APIs mean in a serverless environment? When you deploy a Worker, it runs in Region:Earth and we don’t want you needing to think about individual servers with individual file systems. There are, however, countless existing applications and modules in the ecosystem that leverage the file system to store configuration data, read and write temporary data, and more.

Workers do not have access to a traditional file system like a Node.js process does, and for good reason! A Worker does not run on a single machine; a single request to one worker can run on any one of thousands of servers anywhere in Cloudflare’s global network. Coordinating and synchronizing access to shared physical resources such as a traditional file system harbor major technical challenges and risks of deadlocks and more; challenges that are inherent in any massively distributed system. Fortunately, Workers provide powerful tools like Durable Objects that provide a solution for coordinating access to shared, durable state at scale. To address the need for a file system in Workers, we built on what already makes Workers great.

We implemented a virtual file system that allows you to use the node:fs APIs to read and write temporary, in-memory files. This virtual file system is specific to each Worker. When using a stateless worker, files created in one request are not accessible in any other request. However, when using a Durable Object, this temporary file space can be shared across multiple requests from multiple users. This file system is ephemeral (for now), meaning that files are not persisted across Worker restarts or deployments, so it does not replace the use of the Durable Object Storage mechanism, but it provides a powerful new tool that greatly expands the capabilities of your Durable Objects.

The node:fs module provides a rich set of APIs for working with files and directories:

import fs from 'node:fs';

export default {
  async fetch(request) {
    // Write a temporary file
    await fs.promises.writeFile('/tmp/hello.txt', 'Hello, world!');

    // Read the file
    const data = await fs.promises.readFile('/tmp/hello.txt', 'utf-8');

    return new Response(`File contents: ${data}`);
  }
}

The virtual file system supports a wide range of file operations, including reading and writing files, creating and removing directories, and working with file descriptors. It also supports standard input/output/error streams via process.stdin, process.stdout, and process.stderr, symbolic links, streams, and more.

While the current implementation of the virtual file system is in-memory only, we are exploring options for adding persistent storage in the future that would link to existing Cloudflare storage solutions like R2 or Durable Objects. But you don’t have to wait on us! When combined with powerful tools like Durable Objects and JavaScript RPC, it’s certainly possible to create your own general purpose, durable file system abstraction backed by sqlite storage.

Cryptography with node:crypto

The node:crypto module provides a comprehensive set of cryptographic functionality, including hashing, encryption, decryption, and more. We have implemented a full version of the node:crypto module, allowing you to use familiar cryptographic APIs in your Workers applications. There will be some difference in behavior compared to Node.js due to the fact that Workers uses BoringSSL under the hood, while Node.js uses OpenSSL. However, we have strived to make the APIs as compatible as possible, and many popular packages that rely on node:crypto now work seamlessly in Workers.

To accomplish this, we didn’t just copy the implementation of these cryptographic operations from Node.js. Rather, we worked within the Node.js project to extract the core crypto functionality out into a separate dependency project called ncrypto that is used – not only by Workers but Bun as well – to implement Node.js compatible functionality by simply running the exact same code that Node.js is running.

import crypto from 'node:crypto';

export default {
  async fetch(request) {
    const hash = crypto.createHash('sha256');
    hash.update('Hello, world!');
    const digest = hash.digest('hex');

    return new Response(`SHA-256 hash: ${digest}`);
  }
}

All major capabilities of the node:crypto module are supported, including:

  • Hashing (e.g., SHA-256, SHA-512)

  • HMAC

  • Symmetric encryption/decryption

  • Asymmetric encryption/decryption

  • Digital signatures

  • Key generation and management

  • Random byte generation

  • Key derivation functions (e.g., PBKDF2, scrypt)

  • Cipher and Decipher streams

  • Sign and Verify streams

  • KeyObject class for managing keys

  • Certificate handling (e.g., X.509 certificates)

  • Support for various encoding formats (e.g., PEM, DER, base64)

  • and more…

Process & Environment

In Node.js, the node:process module provides a global object that gives information about, and control over, the current Node.js process. It includes properties and methods for accessing environment variables, command-line arguments, the current working directory, and more. It is one of the most fundamental modules in Node.js, and many packages rely on it for basic functionality and simply assume its presence. There are, however, some aspects of the node:process module that do not make sense in the Workers environment, such as process IDs and user/group IDs which are tied to the operating system and process model of a traditional server environment and have no equivalent in the Workers environment.

When nodejs_compat is enabled, the process global will be available in your Worker scripts or you can import it directly via import process from 'node:process'. Note that the process global is only available when the nodejs_compat flag is enabled. If you try to access process without the flag, it will be undefined and the import will throw an error.

Let’s take a look at the process APIs that do make sense in Workers, and that have been fully implemented, starting with process.env.

Environment variables

Workers have had support for environment variables for a while now, but previously they were only accessible via the env argument passed to the Worker function. Accessing the environment at the top-level of a Worker was not possible:

export default {
  async fetch(request, env) {
    const config = env.MY_ENVIRONMENT_VARIABLE;
    // ...
  }
}

With the new process.env implementation, you can now access environment variables in a more familiar way, just like in Node.js, and at any scope, including the top-level of your Worker:

import process from 'node:process';
const config = process.env.MY_ENVIRONMENT_VARIABLE;

export default {
  async fetch(request, env) {
    // You can still access env here if you need to
    const configFromEnv = env.MY_ENVIRONMENT_VARIABLE;
    // ...
  }
}

Environment variables are set in the same way as before, via the wrangler.toml or wrangler.jsonc configuration file, or via the Cloudflare dashboard or API. They may be set as simple key-value pairs or as JSON objects:

{
  "name": "my-worker-dev",
  "main": "src/index.js",
  "compatibility_date": "2025-09-15",
  "compatibility_flags": [
    "nodejs_compat"
  ],
  "vars": {
    "API_HOST": "example.com",
    "API_ACCOUNT_ID": "example_user",
    "SERVICE_X_DATA": {
      "URL": "service-x-api.dev.example",
      "MY_ID": 123
    }
  }
}

When accessed via process.env, all environment variable values are strings, just like in Node.js.

Because process.env is accessible at the global scope, it is important to note that environment variables are accessible from anywhere in your Worker script, including third-party libraries that you may be using. This is consistent with Node.js behavior, but it is something to be aware of from a security and configuration management perspective. The Cloudflare Secrets Store can provide enhanced handling around secrets within Workers as an alternative to using environment variables.

Importable environment and waitUntil

When not using the nodejs_compat flag, we decided to go a step further and make it possible to import both the environment, and the waitUntil mechanism, as a module, rather than forcing users to always access it via the env and ctx arguments passed to the Worker function. This can make it easier to access the environment in a more modular way, and can help to avoid passing the env argument through multiple layers of function calls. This is not a Node.js-compatibility feature, but we believe it is a useful addition to the Workers environment:

import { env, waitUntil } from 'cloudflare:workers';

const config = env.MY_ENVIRONMENT_VARIABLE;

export default {
  async fetch(request) {
    // You can still access env here if you need to
    const configFromEnv = env.MY_ENVIRONMENT_VARIABLE;
    // ...
  }
}

function doSomething() {
  // Bindings and waitUntil can now be accessed without
  // passing the env and ctx through every function call.
  waitUntil(env.RPC.doSomethingRemote());
}

One important note about process.env: changes to environment variables via process.env will not be reflected in the env argument passed to the Worker function, and vice versa. The process.env is populated at the start of the Worker execution and is not updated dynamically. This is consistent with Node.js behavior, where changes to process.env do not affect the actual environment variables of the running process. We did this to minimize the risk that a third-party library, originally meant to run in Node.js, could inadvertently modify the environment assumed by the rest of the Worker code.

Stdin, stdout, stderr

Workers do not have a traditional standard input/output/error streams like a Node.js process does. However, we have implemented process.stdin, process.stdout, and process.stderr as stream-like objects that can be used similarly. These streams are not connected to any actual process stdin and stdout, but they can be used to capture output that is written to the logs captured by the Worker in the same way as console.log and friends, just like them, they will show up in Workers Logs.

The process.stdout and process.stderr are Node.js writable streams:

import process from 'node:process';

export default {
  async fetch(request) {
    process.stdout.write('This will appear in the Worker logs\n');
    process.stderr.write('This will also appear in the Worker logs\n');
    return new Response('Hello, world!');
  }
}

Support for stdin, stdout, and stderr is also integrated with the virtual file system, allowing you to write to the standard file descriptors 0, 1, and 2 (representing stdin, stdout, and stderr respectively) using the node:fs APIs:

import fs from 'node:fs';
import process from 'node:process';

export default {
  async fetch(request) {
    // Write to stdout
    fs.writeSync(process.stdout.fd, 'Hello, stdout!\n');
    // Write to stderr
    fs.writeSync(process.stderr.fd, 'Hello, stderr!\n');

    return new Response('Check the logs for stdout and stderr output!');
  }
}

Other process APIs

We cannot cover every node:process API in detail here, but here are some of the other notable APIs that we have implemented:

  • process.nextTick(fn): Schedules a callback to be invoked after the current execution context completes. Our implementation uses the same microtask queue as promises so that it behaves exactly the same as queueMicrotask(fn).

  • process.cwd() and process.chdir(): Get and change the current virtual working directory. The current working directory is initialized to /bundle when the Worker starts, and every request has its own isolated view of the current working directory. Changing the working directory in one request does not affect the working directory in other requests.

  • process.exit(): Immediately terminates the current Worker request execution. This is unlike Node.js where process.exit() terminates the entire process. In Workers, calling process.exit() will stop execution of the current request and return an error response to the client.

Compression with node:zlib

The node:zlib module provides APIs for compressing and decompressing data using various algorithms such as gzip, deflate, and brotli. We have implemented the node:zlib module, allowing you to use familiar compression APIs in your Workers applications. This enables a wide range of use cases, including data compression for network transmission, response optimization, and archive handling.

import zlib from 'node:zlib';

export default {
  async fetch(request) {
    const input = 'Hello, world! Hello, world! Hello, world!';
    const compressed = zlib.gzipSync(input);
    const decompressed = zlib.gunzipSync(compressed).toString('utf-8');

    return new Response(`Decompressed data: ${decompressed}`);
  }
}

While Workers has had built-in support for gzip and deflate compression via the Web Platform Standard Compression API, the node:zlib module support brings additional support for the Brotli compression algorithm, as well as a more familiar API for Node.js developers.

Timing & scheduling

Node.js provides a set of timing and scheduling APIs via the node:timers module. We have implemented these in the runtime as well.

import timers from 'node:timers';

export default {
  async fetch(request) {
    timers.setInterval(() => {
      console.log('This will log every half-second');
    }, 500);

    timers.setImmediate(() => {
      console.log('This will log immediately after the current event loop');
    });

    return new Promise((resolve) => {
      timers.setTimeout(() => {
        resolve(new Response('Hello after 1 second!'));
      }, 1000);
    });
  }
}

The Node.js implementations of the timers APIs are very similar to the standard Web Platform with one key difference: the Node.js timers APIs return Timeout objects that can be used to manage the timers after they have been created. We have implemented the Timeout class in Workers to provide this functionality, allowing you to clear or re-fire timers as needed.

Console

The node:console module provides a set of console logging APIs that are similar to the standard console global, but with some additional features. We have implemented the node:console module as a thin wrapper around the existing globalThis.console that is already available in Workers.

How to enable the Node.js compatibility features

To enable the Node.js compatibility features as a whole within your Workers, you can set the nodejs_compat compatibility flag in your wrangler.jsonc or wrangler.toml configuration file. If you are not using Wrangler, you can also set the flag via the Cloudflare dashboard or API:

{
  "name": "my-worker",
  "main": "src/index.js",
  "compatibility_date": "2025-09-21",
  "compatibility_flags": [
    // Get everything Node.js compatibility related
    "nodejs_compat",
  ]
}

The compatibility date here is key! Update that to the most current date, and you’ll always be able to take advantage of the latest and greatest features.

The nodejs_compat flag is an umbrella flag that enables all the Node.js compatibility features at once. This is the recommended way to enable Node.js compatibility, as it ensures that all features are available and work together seamlessly. However, if you prefer, you can also enable or disable some features individually via their own compatibility flags:

Module Enable Flag (default) Disable Flag
node:console enable_nodejs_console_module disable_nodejs_console_module
node:fs enable_nodejs_fs_module disable_nodejs_fs_module
node:http (client) enable_nodejs_http_modules disable_nodejs_http_modules
node:http (server) enable_nodejs_http_server_modules disable_nodejs_http_server_modules
node:os enable_nodejs_os_module disable_nodejs_os_module
node:process enable_nodejs_process_v2
node:zlib nodejs_zlib no_nodejs_zlib
process.env nodejs_compat_populate_process_env nodejs_compat_do_not_populate_process_env

By separating these features, you can have more granular control over which Node.js APIs are available in your Workers. At first, we had started rolling out these features under the one nodejs_compat flag, but we quickly realized that some users perform feature detection based on the presence of certain modules and APIs and that by enabling everything all at once we were risking breaking some existing Workers. Users who are checking for the existence of these APIs manually can ensure new changes don’t break their workers by opting out of specific APIs:

{
  "name": "my-worker",
  "main": "src/index.js",
  "compatibility_date": "2025-09-15",
  "compatibility_flags": [
    // Get everything Node.js compatibility related
    "nodejs_compat",
    // But disable the `node:zlib` module if necessary
    "no_nodejs_zlib",
  ]
}

But, to keep things simple, we recommend starting with the nodejs_compat flag, which will enable everything. You can always disable individual features later if needed. There is no performance penalty to having the additional features enabled.

Handling end-of-life’d APIs

One important difference between Node.js and Workers is that Node.js has a defined long term support (LTS) schedule that allows it to make breaking changes at certain points in time. More specifically, Node.js can remove APIs and features when they reach end-of-life (EOL). On Workers, however, we have a rule that once a Worker is deployed, it will continue to run as-is indefinitely, without any breaking changes as long as the compatibility date does not change. This means that we cannot simply remove APIs when they reach EOL in Node.js, since this would break existing Workers. To address this, we have introduced a new set of compatibility flags that allow users to specify that they do not want the nodejs_compat features to include end-of-life APIs. These flags are based on the Node.js major version in which the APIs were removed:

The remove_nodejs_compat_eol flag will remove all APIs that have reached EOL up to your current compatibility date:

{
  "name": "my-worker",
  "main": "src/index.js",
  "compatibility_date": "2025-09-15",
  "compatibility_flags": [
    // Get everything Node.js compatibility related
    "nodejs_compat",
    // Remove Node.js APIs that have reached EOL up to your
    // current compatibility date
    "remove_nodejs_compat_eol",
  ]
}
  • The remove_nodejs_compat_eol_v22 flag will remove all APIs that reached EOL in Node.js v22. When using removenodejs_compat_eol, this flag will be automatically enabled if your compatibility date is set to a date after Node.js v22’s EOL date (April 30, 2027).

  • The remove_nodejs_compat_eol_v23 flag will remove all APIs that reached EOL in Node.js v23. When using removenodejs_compat_eol, this flag will be automatically enabled if your compatibility date is set to a date after Node.js v24’s EOL date (April 30, 2028).

  • The remove_nodejs_compat_eol_v24 flag will remove all APIs that reached EOL in Node.js v24. When using removenodejs_compat_eol, this flag will be automatically enabled if your compatibility date is set to a date after Node.js v24’s EOL date (April 30, 2028).

If you look at the date for remove_nodejs_compat_eol_v23 you’ll notice that it is the same as the date for remove_nodejs_compat_eol_v24. That is not a typo! Node.js v23 is not an LTS release, and as such it has a very short support window. It was released in October 2023 and reached EOL in May 2024. Accordingly, we have decided to group the end-of-life handling of non-LTS releases into the next LTS release. This means that when you set your compatibility date to a date after the EOL date for Node.js v24, you will also be opting out of the APIs that reached EOL in Node.js v23. Importantly, these flags will not be automatically enabled until your compatibility date is set to a date after the relevant Node.js version’s EOL date, ensuring that existing Workers will have plenty of time to migrate before any APIs are removed, or can choose to just simply keep using the older APIs indefinitely by using the reverse compatibility flags like add_nodejs_compat_eol_v24.

Giving back

One other important bit of work that we have been doing is expanding Cloudflare’s investment back into the Node.js ecosystem as a whole. There are now five members of the Workers runtime team (plus one summer intern) that are actively contributing to the Node.js project on GitHub, two of which are members of Node.js’ Technical Steering Committee. While we have made a number of new feature contributions such as an implementation of the Web Platform Standard URLPattern API and improved implementation of crypto operations, our primary focus has been on improving the ability for other runtimes to interoperate and be compatible with Node.js, fixing critical bugs, and improving performance. As we continue to grow our efforts around Node.js compatibility we will also grow our contributions back to the project and ecosystem as a whole.

Aaron Snell 2025 Summer Intern, Cloudflare Containers
Node.js Web Infrastructure Team
flakey5
Dario Piotrowicz Senior System Engineer
Node.js Collaborator
dario-piotrowicz
Guy Bedford Principal Systems Engineer
Node.js Collaborator
guybedford
James Snell Principal Systems Engineer
Node.js TSC
jasnell
Nicholas Paun Systems Engineer
Node.js Contributor
npaun
Yagiz Nizipli Principal Systems Engineer
Node.js TSC
anonrig

Cloudflare is also proud to continue supporting critical infrastructure for the Node.js project through its ongoing strategic partnership with the OpenJS Foundation, providing free access to the project to services such as Workers, R2, DNS, and more.

Give it a try!

Our vision for Node.js compatibility in Workers is not just about implementing individual APIs, but about creating a comprehensive platform that allows developers to run existing Node.js code seamlessly in the Workers environment. This involves not only implementing the APIs themselves, but also ensuring that they work together harmoniously, and that they integrate well with the unique aspects of the Workers platform.

In some cases, such as with node:fs and node:crypto, we have had to implement entirely new capabilities that were not previously available in Workers and did so at the native runtime level. This allows us to tailor the implementations to the unique aspects of the Workers environment and ensure both performance and security.

And we’re not done yet. We are continuing to work on implementing additional Node.js APIs, as well as improving the performance and compatibility of the existing implementations. We are also actively engaging with the community to understand their needs and priorities, and to gather feedback on our implementations. If there are specific Node.js APIs or npm packages that you would like to see supported in Workers, please let us know! If there are any issues or bugs you encounter, please report them on our GitHub repository. While we might not be able to implement every single Node.js API, nor match Node.js’ behavior exactly in every case, we are committed to providing a robust and comprehensive Node.js compatibility layer that meets the needs of the community.

All the Node.js compatibility features described in this post are available now. To get started, simply enable the nodejs_compat compatibility flag in your wrangler.toml or wrangler.jsonc file, or via the Cloudflare dashboard or API. You can then start using the Node.js APIs in your Workers applications right away.

Deploy your own AI vibe coding platform — in one click!

Post Syndicated from Ashish Kumar Singh original https://blog.cloudflare.com/deploy-your-own-ai-vibe-coding-platform/

It’s an exciting time to build applications. With the recent AI-powered “vibe coding” boom, anyone can build a website or application by simply describing what they want in a few sentences. We’re already seeing organizations expose this functionality to both their users and internal employees, empowering anyone to build out what they need.


Today, we’re excited to open-source an AI vibe coding platform, VibeSDK, to enable anyone to run an entire vibe coding platform themselves, end-to-end, with just one click.

Want to see it for yourself? Check out our demo platform that you can use to create and deploy applications. Or better yet, click the button below to deploy your own AI-powered platform, and dive into the repo to learn about how it’s built.

Deploying VibeSDK sets up everything you need to run your own AI-powered development platform:

  • Integration with LLM models to generate code, build applications, debug errors, and iterate in real-time, powered by Agents SDK

  • Isolated development environments that allow users to safely build and preview their applications in secure sandboxes.

  • Infinite scale that allows you to deploy thousands or even millions of applications that end users deploy, all served on Cloudflare’s global network

  • Observability and caching across multiple AI providers, giving you insight into costs and performance with built-in caching for popular responses. 

  • Project templates that the LLM can use as a starting point to build common applications and speed up development.

  • One-click project export to the user’s Cloudflare account or GitHub repo, so users can take their code and continue development on their own.


Building an AI vibe coding platform from start to finish

Step 0: Get started immediately with VibeSDK

We’re seeing companies build their own AI vibe coding platforms to enable both internal and external users. With a vibe coding platform, internal teams like marketing, product, and support can build their own landing pages, prototypes, or internal tools without having to rely on the engineering team. Similarly, SaaS companies can embed this capability into their product to allow users to build their own customizations. 

Every platform has unique requirements and specializations. By building your own, you can write custom logic to prompt LLMs for your specific needs, giving your users more relevant results. This also grants you complete control over the development environment and application hosting, giving you a secure platform that keeps your data private and within your control. 

We wanted to make it easy for anyone to build this themselves, which is why we built a complete platform that comes with project templates, previews, and project deployment. Developers can repurpose the whole platform, or simply take the components they need and customize them to fit their needs.

Step 1: Finding a safe, isolated environment for running untrusted, AI generated code

AI can now build entire applications, but there’s a catch: you need somewhere safe to run this untrusted, AI-generated code. Imagine if an LLM writes an application that needs to install packages, run build commands, and start a development server — you can’t just run this directly on your infrastructure where it might affect other users or systems.

With Cloudflare Sandboxes, you don’t have to worry about this. Every user gets their own isolated environment where the AI-generated code can do anything a normal development environment can do: install npm packages, run builds, start servers, but it’s fully contained in a secure, container-based environment that can’t affect anything outside its sandbox. 


The platform assigns each user to their own sandbox based on their session, so that if a user comes back, they can continue to access the same container with their files intact:

// Creating a sandbox client for a user session
const sandbox = getSandbox(env.Sandbox, sandboxId);

// Now AI can safely write and execute code in this isolated environment
await sandbox.writeFile('app.js', aiGeneratedCode);
await sandbox.exec('npm install express');
await sandbox.exec('node app.js');

Step 2: Generating the code

Once the sandbox is created, you have a development environment that can bring the code to life. VibeSDK orchestrates the whole workflow from writing the code, installing the necessary packages, and starting the development server. If you ask it to build a to-do app, it will generate the React application, write the component files, run bun install to get the dependencies, and start the server, so you can see the end result. 

Once the user submits their request, the AI will generate all the necessary files, whether it’s a React app, Node.js API, or full-stack application, and write them directly to the sandbox:

async function generateAndWriteCode(instanceId: string) {
    // AI generates the application structure
    const aiGeneratedFiles = await callAIModel("Create a React todo app");
    
    // Write all generated files to the sandbox
    for (const file of aiGeneratedFiles) {
        await sandbox.writeFile(
            `${instanceId}/${file.path}`,
            file.content
        );
        // User sees: "✓ Created src/App.tsx"
        notifyUser(`✓ Created ${file.path}`);
    }
}

To speed this up even more, we’ve provided a set of templates, stored in an R2 bucket, that the platform can use and quickly customize, instead of generating every file from scratch. This is just an initial set, but you can expand it and add more examples. 

Step 3: Getting a preview of your deployment

Once everything is ready, the platform starts the development server and uses the Sandbox SDK to expose it to the internet with a public preview URL which allows users to instantly see their AI-generated application running live:

// Start the development server in the sandbox
const processId = await sandbox.startProcess(
    `bun run dev`, 
    { cwd: instanceId }
);

// Create a public preview URL 
const preview = await sandbox.exposePort(3000, { 
    hostname: 'preview.example.com' 
});

// User instantly gets: "https://my-app-xyz.preview.example.com"
notifyUser(`✓ Preview ready at: ${preview.url}`);

Step 4: Test, log, fix, repeat

But that’s not all! Throughout this process, the platform will capture console output, build logs, and error messages and feed them back to the LLM for automatic fixes. As the platform makes any updates or fixes, the user can see it all happening live — the file editing, installation progress, and error resolution. 

Deploying applications: From Sandbox to Region Earth

Once the application is developed, it needs to be deployed. The platform packages everything in the sandbox and then uses a separate specialized “deployment sandbox” to deploy the application to Cloudflare Workers. This deployment sandbox runs wrangler deploy inside the secure environment to publish the application to Cloudflare’s global network. 

Since the platform may deploy up to thousands or millions of applications, Workers for Platforms is used to deploy the Workers at scale. Although all the Workers are deployed to the same Namespace, they are all isolated from one another by default, ensuring there’s no cross-tenant access. Once deployed, each application receives its own isolated Worker instance with a unique public URL like my-app.vibe-build.example.com

async function deployToWorkersForPlatforms(instanceId: string) {
    // 1. Package the app from development sandbox
    const devSandbox = getSandbox(env.Sandbox, instanceId);
    const packagedApp = await devSandbox.exec('zip -r app.zip .');
    
    // 2. Transfer to specialized deployment sandbox
    const deploymentSandbox = getSandbox(env.DeployerServiceObject, 'deployer');
    await deploymentSandbox.writeFile('app.zip', packagedApp);
    await deploymentSandbox.exec('unzip app.zip');
    
    // 3. Deploy using Workers for Platforms dispatch namespace
    const deployResult = await deploymentSandbox.exec(`
        bunx wrangler deploy \\\\
        --dispatch-namespace vibe-sdk-build-default-namespace
    `);
    
    // Each app gets its own isolated Worker and unique URL
    // e.g., https://my-app.example.com
    return `https://${instanceId}.example.com`;
}

Exportable Applications 

The platform also allows users to export their application to their own Cloudflare account and GitHub repo, so they can continue the development on their own. 


Observability, caching, and multi-model support built in! 

It’s no secret that LLM models have their specialties, which means that when building an AI-powered platform, you may end up using a few different models for different operations. By default, VibeSDK leverages Google’s Gemini models (gemini-2.5-pro, gemini-2.5-flash-lite, gemini-2.5-flash) for project planning, code generation, and debugging. 

VibeSDK is automatically set up with AI Gateway, so that by default, the platform is able to:

  • Use a unified access point to route requests across LLM providers, allowing you to use models from a range of providers (OpenAI, Anthropic, Google, and others)

  • Cache popular responses, so when someone asks to “build a to-do list app”, the gateway can serve a cached response instead of going to the provider (saving inference costs)

  • Get observability into the requests, tokens used, and response times across all providers in one place

  • Track costs across models and integrations

Open sourced, so you can build your own Platform! 

We’re open-sourcing VibeSDK for the same reason Cloudflare open-sourced the Workers runtime — we believe the best development happens in the open. That’s why we wanted to make it as easy as possible for anyone to build their own AI coding platform, whether it’s for internal company use, for your website builder, or for the next big vibe coding platform. We tied all the pieces together for you, so you can get started with the click of a button instead of spending months figuring out how to connect everything yourself. To learn more, check out our reference architecture for vibe coding platforms. 


A Lookback at Workers Launchpad and a Warm Welcome to Cohort #6

Post Syndicated from Christopher Rotas original https://blog.cloudflare.com/workers-launchpad-006/

Imagine you have an idea for an AI application that you’re really excited about — but the cost of GPU time and complex infrastructure stops you in your tracks before you even write a line of code. This is the problem founders everywhere face: balancing high infrastructure costs with the need to innovate and scale quickly.

Our startup programs remove those barriers, so founders can focus on what matters the most: building products, finding customers, and growing a business. Cloudflare for Startups launched in 2018 to provide enterprise-level application security and performance services to growing startups. As we built out our Developer Platform, we pivoted last year to offer founders up to $250,000 in cloud credits to build on our Developer Platform for up to one year.

During Birthday Week 2022, we announced our Cloudflare Workers Launchpad Program with an initial $1.25 billion in potential funding for startups building on Cloudflare Workers, made possible through partnerships with 26 leading venture capital (VC) firms. Within months, we expanded VC-backed funding to $2 billion.

Since 2022, we’ve welcomed 145 startups from 23 countries. These startups are solving problems across verticals such as AI and machine learning, developer tools, 3D design, cloud infrastructure, data tools, ad tech, media, logistics, finance, and other industries. We’re especially proud of the female founder representation in recent cohorts — with nearly a third of companies in Cohort #5 run by a female founder. 

Participants engaged in bootcamp sessions with Cloudflare leadership and product teams, covering key topics like product pricing and scaling sales. Startups received hands-on design support from our Solutions Architecture team, empowering these builders to build and scale their full-stack applications on the Cloudflare network. We facilitate countless introductions across the VC network, and are happy to see funding and M&A activity as these startups scale. Cloudflare also identified direct opportunities and acquired Nefeli Networks (Cohort #2) and Outerbase (Cohort #4).

Check out what Launchpad alumni have to say about their experience in the program:

Langbase (Cohort #3)
Ship hyper-personalized AI apps to any LLM, any data, any developer in seconds


“For Langbase, the best part about Workers Launchpad was the incredible support from Cloudflare’s internal teams. It wasn’t just about access to infrastructure; it was the hands-on migration help, rapid feedback loops, and genuine partnership from engineers, product folks, and the broader Cloudflare community. That human support empowered us to iterate faster, solve hard problems, and truly feel like we were building something impactful together. 

Langbase has quickly become one of the most powerful serverless AI clouds for building and deploying AI agents. We process 700 TB of agent memory and 1.2 billion AI agent runs a month. Langbase is an agent lab, and we’ve also launched a coding agent called Command.new, an “agent of agents” that can take your prompts and turn them into production-ready agents by provisioning infrastructure and writing the agent’s code in TypeScript.

My advice for anyone joining future Workers Launchpad cohorts is to use every resource offered. Engage deeply with the Cloudflare teams, ask for feedback early and often, and be open to sharing your challenges and wins, especially in the Discord community, which is super helpful. Cloudflare listens closely to participant feedback and genuinely wants to help startups succeed. Treat it as a two-way conversation and a collaborative growth opportunity. This mindset is what unlocks the real power of the program.”

-Ahmad Awais, Founder & CEO of Langbase

Sherpo.io (Cohort #4)
AI-first no-code platform to build and sell digital content


“Since joining Cohort #4, we’ve exited closed beta and expanded our product suite for content creators. Today, more than 3,000 creators worldwide power their digital product stores with Sherpo, while we continue building and scaling.

We learned as much from fellow startups as from Cloudflare during office hours and sessions, and we got to meet incredible people along the way, including Cloudflare’s CSO, Stephanie Cohen.

For anyone joining, attend every session, listen closely, and ask questions—they’re incredibly valuable. Building on Workers has given us a real advantage, and the team’s pace of innovation only compounds it.”

-Giacomo Di Pinto, Co-Founder & CEO of Sherpo.io

Tightknit AI (Cohort #4)
Embedded community engagement platform built for SaaS


“Beyond the cloud credits that Launchpad provided us to play with every Cloudflare product, the most important aspect of the program we found was our ability to access (and even contribute) to the product roadmap. We were able to connect with product managers and solutions architects that have helped us take our work to the next level.

We’ve recently passed half a million users on the platform and have started to close not just the top Saas businesses in the world, but the top AI companies in the world, including Clay, Gamma, Lindy, beehiiv, Amplitude, Mixpanel, and so many more. The best part is that 100% of application logic is still powered by Cloudflare!

The biggest piece of advice for anyone starting the cohort is attend office hours as much as you can. I can’t tell you how many times we were able to unblock ourselves or even provide real product feedback/bug reports. It was amazing to meet the rest of the cohort and solve problems together that ordinary Cloudflare developers just do not face. So my advice is don’t miss the office hours. They were by far the most valuable part of our experience.”

-Zach Hawtof, Co-Founder & CEO of Tightknit.ai

Render Better (Cohort #4)
Increase e-commerce revenue by automatically optimizing your site speed


“My favorite part of the Launchpad was the community and the leaders who brought us together. The startup and product teams provided expert advice on both business and technical questions through meetings, 1-on-1s, and Discord. Many of them were former founders, so they understood what we were going through and helped us get what we needed. They were crucial in helping us get unstuck, whether we were using obscure Cloudflare features or needed connections to the right people.

I met a lot of great founders who are on the same journey and face the same struggles. Watching them grow was motivating and gave us a morale boost to keep up the fast pace a startup needs.

Since Launchpad, Render Better has scaled to 60 automated site speed optimizations, helping e-commerce sites convert 20% higher powered by Cloudflare Workers. Our growth accelerated after the program, and we’re now optimizing traffic for some of the biggest e-commerce brands like PSD, Polywood, and Self-Portrait. Render Better now processes 5 billion requests each month, made possible by Cloudflare’s global edge network and Workers platform.

Launchpad is truly just that: Cloudflare gives you the resources and attention to help you grow from an idea into something big. Build fast and take as much advantage of the fuel they give you to fly your startup rocket!”

-James Koshigoe, Founder & CEO of Render Better

Launchpad is growing into more than just a program. It is a community of builders and innovators showing what is possible with Cloudflare’s network behind them. With that foundation, we are excited to introduce the next group of entrepreneurs taking the stage in Cohort #6.

Introducing Cohort #6

Before introducing Cohort #6, we want to give one last shout out to Cohort #5. As Launchpad alumni, we cannot wait to see what you achieve. If you didn’t get a chance to check out Cohort #5’s demo day, watch  the recording here.

With that, help us give a warm welcome to the participants of Workers Launchpad Cohort #6:


We’re excited to see what Cohort #6 accomplishes. Follow @CloudflareDev on X and join our Developer Discord to stay updated on their progress. If you’re a startup interested in joining Workers Launchpad, applications for Cohort #7 are now open.

Company

About

Allegory

AI-Powered platform connecting impact to funding

Apgio

Mobile app localization platform with smart AI translations and workflow tooling

Atlas

Building the operating system for restaurants

Bloctave

Configurable rights management platform with instantaneous royalty distribution

Byte

AI code auditor that translates codebases into natural language

Calljmp

Agentic AI backend for apps

Centian

MCP-powered AI Agent middleware for successful and compliant operations

Divinci AI

Release management and quality assurance for custom LLMs

DXOS

An extensible open-core super-app designed to be your team’s brain

Fidsy

AI-native, code-free orchestration platform providing automated data privacy for all AI & Data workflows

Fluentos

Create popups your customers won’t hate

Framebird

Media sharing solution for creatives featuring modern galleries and client review tools

GoPersonal

AI to build, personalize, and manage your ecommerce business

Horizon

Short-form & agentic experiences for apps and websites

Kenobi

Personalizing the Internet with custom web experiences

MonetizationOS

Intelligent decisioning at the edge, for monetising the human and machine web

Natively.dev

Build your dream mobile app using AI, enabling users to take directly to App Stores

Outhire

AI agent automating phone screens without bias

Outsession

Privacy-first AI tools that preserve the therapeutic relationship

Phleid

Direct-to-wallet mobile passes and notifications platform

PlaySafe (By Doge Labs)

Makes voice-chat communities safer by detecting and blocking harassment in real time

Ploton

Help small business businesses grow by building workflows through natural conversation, not complex tools

Project Karna

Multimodal, continuous identity for the post-GenAI enterprise

Schematic

Simplify monetization for GTM teams, allowing them to control pricing, packaging, and entitlements without code changes

SonicLinker

Turn AI-agent visits into revenue

SuiteOp

All-in-one platform to streamline hospitality operations and guest services

SuprSend

Multi-channel notification engine for product and platform teams

Yara AI

Ethical, memory-rich AI for mental health at scale

Zephyr Cloud

Fastest way to go from idea to production

Zero Email

AI native email client that manages your inbox so you don’t have to

Cap’n Web: a new RPC system for browsers and web servers

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/capnweb-javascript-rpc-library/

Allow us to introduce Cap’n Web, an RPC protocol and implementation in pure TypeScript.

Cap’n Web is a spiritual sibling to Cap’n Proto, an RPC protocol I (Kenton) created a decade ago, but designed to play nice in the web stack. That means:

  • Like Cap’n Proto, it is an object-capability protocol. (“Cap’n” is short for “capabilities and”.) We’ll get into this more below, but it’s incredibly powerful.

  • Unlike Cap’n Proto, Cap’n Web has no schemas. In fact, it has almost no boilerplate whatsoever. This means it works more like the JavaScript-native RPC system in Cloudflare Workers.

  • That said, it integrates nicely with TypeScript.

  • Also unlike Cap’n Proto, Cap’n Web’s underlying serialization is human-readable. In fact, it’s just JSON, with a little pre-/post-processing.

  • It works over HTTP, WebSocket, and postMessage() out-of-the-box, with the ability to extend it to other transports easily.

  • It works in all major browsers, Cloudflare Workers, Node.js, and other modern JavaScript runtimes.

  • The whole thing compresses (minify+gzip) to under 10 kB with no dependencies.

  • It’s open source under the MIT license.

Cap’n Web is more expressive than almost every other RPC system, because it implements an object-capability RPC model. That means it:

  • Supports bidirectional calling. The client can call the server, and the server can also call the client.

  • Supports passing functions by reference: If you pass a function over RPC, the recipient receives a “stub”. When they call the stub, they actually make an RPC back to you, invoking the function where it was created. This is how bidirectional calling happens: the client passes a callback to the server, and then the server can call it later.

  • Similarly, supports passing objects by reference: If a class extends the special marker type RpcTarget, then instances of that class are passed by reference, with method calls calling back to the location where the object was created.

  • Supports promise pipelining. When you start an RPC, you get back a promise. Instead of awaiting it, you can immediately use the promise in dependent RPCs, thus performing a chain of calls in a single network round trip.

  • Supports capability-based security patterns.

In short, Cap’n Web lets you design RPC interfaces the way you’d design regular JavaScript APIs – while still acknowledging and compensating for network latency.

The best part is, Cap’n Web is absolutely trivial to set up.

A client looks like this:

import { newWebSocketRpcSession } from "capnweb";

// One-line setup.
let api = newWebSocketRpcSession("wss://example.com/api");

// Call a method on the server!
let result = await api.hello("World");

console.log(result);

And here’s a complete Cloudflare Worker implementing an RPC server:

import { RpcTarget, newWorkersRpcResponse } from "capnweb";

// This is the server implementation.
class MyApiServer extends RpcTarget {
  hello(name) {
    return `Hello, ${name}!`
  }
}

// Standard Workers HTTP handler.
export default {
  fetch(request, env, ctx) {
    // Parse URL for routing.
    let url = new URL(request.url);

    // Serve API at `/api`.
    if (url.pathname === "/api") {
      return newWorkersRpcResponse(request, new MyApiServer());
    }

    // You could serve other endpoints here...
    return new Response("Not found", {status: 404});
  }
}

That’s it. That’s the app.

  • You can add more methods to MyApiServer, and call them from the client.

  • You can have the client pass a callback function to the server, and then the server can just call it.

  • You can define a TypeScript interface for your API, and easily apply it to the client and server.

It just works.

Why RPC? (And what is RPC anyway?)

Remote Procedure Calls (RPC) are a way of expressing communications between two programs over a network. Without RPC, you might communicate using a protocol like HTTP. With HTTP, though, you must format and parse your communications as an HTTP request and response, perhaps designed in REST style. RPC systems try to make communications look like a regular function call instead, as if you were calling a library rather than a remote service. The RPC system provides a “stub” object on the client side which stands in for the real server-side object. When a method is called on the stub, the RPC system figures out how to serialize and transmit the parameters to the server, invoke the method on the server, and then transmit the return value back.

The merits of RPC have been subject to a great deal of debate. RPC is often accused of committing many of the fallacies of distributed computing.

But this reputation is outdated. When RPC was first invented some 40 years ago, async programming barely existed. We did not have Promises, much less async and await. Early RPC was synchronous: calls would block the calling thread waiting for a reply. At best, latency made the program slow. At worst, network failures would hang or crash the program. No wonder it was deemed “broken”.

Things are different today. We have Promise and async and await, and we can throw exceptions on network failures. We even understand how RPCs can be pipelined so that a chain of calls takes only one network round trip. Many large distributed systems you likely use every day are built on RPC. It works.

The fact is, RPC fits the programming model we’re used to. Every programmer is trained to think in terms of APIs composed of function calls, not in terms of byte stream protocols nor even REST. Using RPC frees you from the need to constantly translate between mental models, allowing you to move faster.

When should you use Cap’n Web?

Cap’n Web is useful anywhere where you have two JavaScript applications speaking to each other over a network, including client-to-server and microservice-to-microservice scenarios. However, it is particularly well-suited to interactive web applications with real-time collaborative features, as well as modeling interactions over complex security boundaries.

Cap’n Web is still new and experimental, so for now, a willingness to live on the cutting edge may also be required!

Features, features, features…

Here’s some more things you can do with Cap’n Web.

HTTP batch mode

Sometimes a WebSocket connection is a bit too heavyweight. What if you just want to make a quick one-time batch of calls, but don’t need an ongoing connection?

For that, Cap’n Web supports HTTP batch mode:

import { newHttpBatchRpcSession } from "capnweb";

let batch = newHttpBatchRpcSession("https://example.com/api");

let result = await batch.hello("World");

console.log(result);

(The server is exactly the same as before.)

Note that once you’ve awaited an RPC in the batch, the batch is done, and all the remote references received through it become broken. To make more calls, you need to start over with a new batch. However, you can make multiple calls in a single batch:

let batch = newHttpBatchRpcSession("https://example.com/api");

// We can call make multiple calls, as long as we await them all at once.
let promise1 = batch.hello("Alice");
let promise2 = batch.hello("Bob");

let [result1, result2] = await Promise.all([promise1, promise2]);

console.log(result1);
console.log(result2);

And that brings us to another feature…

Chained calls (Promise Pipelining)

Here’s where things get magical.

In both batch mode and WebSocket mode, you can make a call that depends on the result of another call, without waiting for the first call to finish. In batch mode, that means you can, in a single batch, call a method, then use its result in another call. The entire batch still requires only one network round trip.

For example, say your API is:

class MyApiServer extends RpcTarget {
  getMyName() {
    return "Alice";
  }

  hello(name) {
    return `Hello, ${name}!`
  }
}

You can do:

let namePromise = batch.getMyName();
let result = await batch.hello(namePromise);

console.log(result);

Notice the initial call to getMyName() returned a promise, but we used the promise itself as the input to hello(), without awaiting it first. With Cap’n Web, this just works: The client sends a message to the server saying: “Please insert the result of the first call into the parameters of the second.”

Or perhaps the first call returns an object with methods. You can call the methods immediately, without awaiting the first promise, like:

let batch = newHttpBatchRpcSession("https://example.com/api");

// Authencitate the API key, returning a Session object.
let sessionPromise = batch.authenticate(apiKey);

// Get the user's name.
let name = await sessionPromise.whoami();

console.log(name);

This works because the promise returned by a Cap’n Web call is not a regular promise. Instead, it’s a JavaScript Proxy object. Any methods you call on it are interpreted as speculative method calls on the eventual result. These calls are sent to the server immediately, telling the server: “When you finish the call I sent earlier, call this method on what it returns.”

Did you spot the security?

This last example shows an important security pattern enabled by Cap’n Web’s object-capability model.

When we call the authenticate() method, after it has verified the provided API key, it returns an authenticated session object. The client can then make further RPCs on the session object to perform operations that require authorization as that user. The server code might look like this:

class MyApiServer extends RpcTarget {
  authenticate(apiKey) {
    let username = await checkApiKey(apiKey);
    return new AuthenticatedSession(username);
  }
}

class AuthenticatedSession extends RpcTarget {
  constructor(username) {
    super();
    this.username = username;
  }

  whoami() {
    return this.username;
  }

  // ...other methods requiring auth...
}

Here’s what makes this work: It is impossible for the client to “forge” a session object. The only way to get one is to call authenticate(), and have it return successfully.

In most RPC systems, it is not possible for one RPC to return a stub pointing at a new RPC object in this way. Instead, all functions are top-level, and can be called by anyone. In such a traditional RPC system, it would be necessary to pass the API key again to every function call, and check it again on the server each time. Or, you’d need to do authorization outside of the RPC system entirely.

This is a common pain point for WebSockets in particular. Due to the design of the web APIs for WebSocket, you generally cannot use headers nor cookies to authorize them. Instead, authorization must happen in-band, by sending a message over the WebSocket itself. But this can be annoying for RPC protocols, as it means the authentication message is “special” and changes the state of the connection itself, affecting later calls. This breaks the abstraction.

The authenticate() pattern shown above neatly makes authentication fit naturally into the RPC abstraction. It’s even type-safe: you can’t possibly forget to authenticate before calling a method requiring auth, because you wouldn’t have an object on which to make the call. Speaking of type-safety…

TypeScript

If you use TypeScript, Cap’n Web plays nicely with it. You can declare your RPC API once as a TypeScript interface, implement in on the server, and call it on the client:

// Shared interface declaration:
interface MyApi {
  hello(name: string): Promise<string>;
}

// On the client:
let api: RpcStub<MyApi> = newWebSocketRpcSession("wss://example.com/api");

// On the server:
class MyApiServer extends RpcTarget implements MyApi {
  hello(name) {
    return `Hello, ${name}!`
  }
}

Now you get end-to-end type checking, auto-completed method names, and so on.

Note that, as always with TypeScript, no type checks occur at runtime. The RPC system itself does not prevent a malicious client from calling an RPC with parameters of the wrong type. This is, of course, not a problem unique to Cap’n Web – JSON-based APIs have always had this problem. You may wish to use a runtime type-checking system like Zod to solve this. (Meanwhile, we hope to add type checking based directly on TypeScript types in the future.)

An alternative to GraphQL?

If you’ve used GraphQL before, you might notice some similarities. One benefit of GraphQL was to solve the “waterfall” problem of traditional REST APIs by allowing clients to ask for multiple pieces of data in one query. For example, instead of making three sequential HTTP calls:

GET /user
GET /user/friends
GET /user/friends/photos

…you can write one GraphQL query to fetch it all at once.

That’s a big improvement over REST, but GraphQL comes with its own tradeoffs:

  • New language and tooling. You have to adopt GraphQL’s schema language, servers, and client libraries. If your team is all-in on JavaScript, that’s a lot of extra machinery.

  • Limited composability. GraphQL queries are declarative, which makes them great for fetching data, but awkward for chaining operations or mutations. For example, you can’t easily say: “create a user, then immediately use that new user object to make a friend request, all-in-one round trip.”

  • Different abstraction model. GraphQL doesn’t look or feel like the JavaScript APIs you already know. You’re learning a new mental model rather than extending the one you use every day.

How Cap’n Web goes further

Cap’n Web solves the waterfall problem without introducing a new language or ecosystem. It’s just JavaScript. Because Cap’n Web supports promise pipelining and object references, you can write code that looks like this:

let user = api.createUser({ name: "Alice" });
let friendRequest = await user.sendFriendRequest("Bob");

What happens under the hood? Both calls are pipelined into a single network round trip:

  1. Create the user.

  2. Take the result of that call (a new User object).

  3. Immediately invoke sendFriendRequest() on that object.

All of this is expressed naturally in JavaScript, with no schemas, query languages, or special tooling required. You just call methods and pass objects around, like you would in any other JavaScript code.

In other words, GraphQL gave us a way to flatten REST’s waterfalls. Cap’n Web lets us go even further: it gives you the power to model complex interactions exactly the way you would in a normal program, with no impedance mismatch.

But how do we solve arrays?

With everything we’ve presented so far, there’s a critical missing piece to seriously consider Cap’n Web as an alternative to GraphQL: handling lists. Often, GraphQL is used to say: “Perform this query, and then, for every result, perform this other query.” For example: “List the user’s friends, and then for each one, fetch their profile photo.”

In short, we need an array.map() operation that can be performed without adding a round trip.

Cap’n Proto, historically, has never supported such a thing.

But with Cap’n Web, we’ve solved it. You can do:

let user = api.authenticate(token);

// Get the user's list of friends (an array).
let friendsPromise = user.listFriends();

// Do a .map() to annotate each friend record with their photo.
// This operates on the *promise* for the friends list, so does not
// add a round trip.
// (wait WHAT!?!?)
let friendsWithPhotos = friendsPromise.map(friend => {
  return {friend, photo: api.getUserPhoto(friend.id))};
}

// Await the friends list with attached photos -- one round trip!
let results = await friendsWithPhotos;

Wait… How!?

.map() takes a callback function, which needs to be applied to each element in the array. As we described earlier, normally when you pass a function to an RPC, the function is passed “by reference”, meaning that the remote side receives a stub, where calling that stub makes an RPC back to the client where the function was created.

But that is NOT what is happening here. That would defeat the purpose: we don’t want the server to have to round-trip to the client to process every member of the array. We want the server to just apply the transformation server-side.

To that end, .map() is special. It does not send JavaScript code to the server, but it does send something like “code”, restricted to a domain-specific, non-Turing-complete language. The “code” is a list of instructions that the server should carry out for each member of the array. In this case, the instructions are:

  1. Invoke api.getUserPhoto(friend.id).

  2. Return an object {friend, photo}, where friend is the original array element and photo is the result of step 1.

But the application code just specified a JavaScript method. How on Earth could we convert this into the narrow DSL?

The answer is record-replay: On the client side, we execute the callback once, passing in a special placeholder value. The parameter behaves like an RPC promise. However, the callback is required to be synchronous, so it cannot actually await this promise. The only thing it can do is use promise pipelining to make pipelined calls. These calls are intercepted by the implementation and recorded as instructions, which can then be sent to the server, where they can be replayed as needed.

And because the recording is based on promise pipelining, which is what the RPC protocol itself is designed to represent, it turns out that the “DSL” used to represent “instructions” for the map function is just the RPC protocol itself. 🤯

Implementation details

JSON-based serialization

Cap’n Web’s underlying protocol is based on JSON – but with a preprocessing step to handle special types. Arrays are treated as “escape sequences” that let us encode other values. For example, JSON does not have an encoding for Date objects, but Cap’n Web does. You might see a message that looks like this:

{
  event: "Birthday Week",
  timestamp: ["date", 1758499200000]
}

To encode a literal array, we simply double-wrap it in []:

{
  names: [["Alice", "Bob", "Carol"]]
}

In other words, an array with just one element which is itself an array, evaluates to the inner array literally. An array whose first element is a type name, evaluates to an instance of that type, where the remaining elements are parameters to the type.

Note that only a fixed set of types are supported: essentially, “structured clonable” types, and RPC stub types.

On top of this basic encoding, we define an RPC protocol inspired by Cap’n Proto – but greatly simplified.

RPC protocol

Since Cap’n Web is a symmetric protocol, there is no well-defined “client” or “server” at the protocol level. There are just two parties exchanging messages across a connection. Every kind of interaction can happen in either direction.

In order to make it easier to describe these interactions, I will refer to the two parties as “Alice” and “Bob”.

Alice and Bob start the connection by establishing some sort of bidirectional message stream. This may be a WebSocket, but Cap’n Web also allows applications to define their own transports. Each message in the stream is JSON-encoded, as described earlier.

Alice and Bob each maintain some state about the connection. In particular, each maintains an “export table”, describing all the pass-by-reference objects they have exposed to the other side, and an “import table”, describing the references they have received. Alice’s exports correspond to Bob’s imports, and vice versa. Each entry in the export table has a signed integer ID, which is used to reference it. You can think of these IDs like file descriptors in a POSIX system. Unlike file descriptors, though, IDs can be negative, and an ID is never reused over the lifetime of a connection.

At the start of the connection, Alice and Bob each populate their export tables with a single entry, numbered zero, representing their “main” interfaces. Typically, when one side is acting as the “server”, they will export their main public RPC interface as ID zero, whereas the “client” will export an empty interface. However, this is up to the application: either side can export whatever they want.

From there, new exports are added in two ways:

  • When Alice sends a message to Bob that contains within it an object or function reference, Alice adds the target object to her export table. IDs assigned in this case are always negative, starting from -1 and counting downwards.

  • Alice can send a “push” message to Bob to request that Bob add a value to his export table. The “push” message contains an expression which Bob evaluates, exporting the result. Usually, the expression describes a method call on one of Bob’s existing exports – this is how an RPC is made. Each “push” is assigned a positive ID on the export table, starting from 1 and counting upwards. Since positive IDs are only assigned as a result of pushes, Alice can predict the ID of each push she makes, and can immediately use that ID in subsequent messages. This is how promise pipelining is achieved.

After sending a push message, Alice can subsequently send a “pull” message, which tells Bob that once he is done evaluating the “push”, he should proactively serialize the result and send it back to Alice, as a “resolve” (or “reject”) message. However, this is optional: Alice may not actually care to receive the return value of an RPC, if Alice only wants to use it in promise pipelining. In fact, the Cap’n Web implementation will only send a “pull” message if the application has actually awaited the returned promise.

Putting it together, a code sequence like this:

{
  names: [["Alice", "Bob", "Carol"]]
}

Might produce a message exchange like this:

// Call api.getByName(). `api` is the server's main export, so has export ID 0.
-> ["push", ["pipeline", 0, "getMyName", []]
// Call api.hello(namePromise). `namePromise` refers to the result of the first push,
// so has ID 1.
-> ["push", ["pipeline", 0, "hello", [["pipeline", 1]]]]
// Ask that the result of the second push be proactively serialized and returned.
-> ["pull", 2]
// Server responds.
<- ["resolve", 2, "Hello, Alice!"]

For more details about the protocol, check out the docs.

Try it out!

Cap’n Web is new and still highly experimental. There may be bugs to shake out. But, we’re already using it today. Cap’n Web is the basis of the recently-launched “remote bindings” feature in Wrangler, allowing a local test instance of workerd to speak RPC to services in production. We’ve also begun to experiment with it in various frontend applications – expect more blog posts on this in the future.

In any case, Cap’n Web is open source, and you can start using it in your own projects now.

Check it out on GitHub.


Introducing free access to Cloudflare developer features for students

Post Syndicated from Veronica Marin original https://blog.cloudflare.com/workers-for-students/

I can recall countless late nights as a student spent building out ideas that felt like breakthroughs. My own thesis had significant costs associated with the tools and computational resources I needed. The reality for students is that turning ideas into working applications often requires production-grade tools, and having to pay for them can stop a great project before it even starts. We don’t think that cost should stand in the way of building out your ideas.

Cloudflare’s Developer Platform already makes it easy for anyone to go from idea to launch. It gives you all the tools you need in one place to work on that class project, build out your portfolio, and create full-stack applications. We want students to be able to use these tools without worrying about the cost, so starting today, students at least 18 years old in the United States with a verified .edu email can receive 12 months of free access to Cloudflare’s developer features. This is the first step for Cloudflare for Students, and we plan to continue expanding our support for the next generation of builders.


What’s included

12 months of our paid developer features plan at no upfront cost

Eligible student accounts will receive increased usage allotments for our developer features compared to our free plan. That includes Workers, Pages Functions, KV, Containers, Vectorize, Hyperdrive, Durable Objects, Workers Logpush, and Queues. With these, you can build everything from APIs and full-stack apps to data pipelines and websites.

After 12 months, you can easily renew your subscription by upgrading to our Workers Paid plan. If you choose not to, your account will automatically revert to the free plan, and you won’t be charged.

Here’s a look at the increased usage allotments students can receive today. Above those free allotments, our standard usage rates will apply.

Free Plan

Student Accounts (Paid developer features)

Workers

100,000 requests/day

10 million requests/month

+ $.30 per additional million requests

Workers KV

100,000 read operations/day

1,000 write, delete, list operations per day

10 million read operations/month

1 million write, delete, and list operations per month

Hyperdrive

100,000 database queries/day

Unlimited database queries / day

Durable Objects

100,000 requests/day

1 million requests / day

+ $0.15 / per additional million requests

Workers Logs

200,000 log events / day

3 Days of retention

20 million log events / month 

7 Days of retention

+$0.60 per additional million events

Workers Logpush

Not Included

10 million log events / month

+$0.05 per additional million log events

Queues

Not Included

1 million operations/month included 

+$0.40 per additional million operations

Access to a dedicated student developer community

You’ll also have access to a dedicated Discord channel just for students. We want to see what you’re building! This is a place to connect with peers, get support, and share ideas in a community of student developers.

What others have built with Cloudflare’s Developer Platform

Curious about what’s possible with Cloudflare’s developer features? Here are some projects from our community:

by Daniel Foldi


Adventure is a text-based adventure game running on Cloudflare Workers that uses Workers AI to generate the stories with the @cf/google/gemma-3-12b-it model. 

The project’s developer chose Workers AI with the OpenNext adapter because it made deployment simple and handled scaling automatically. It uses the Workers Paid plan mainly to enable Workers Logpush and get access to detailed logs for better monitoring and analysis.

When a new game starts, the server gives the AI a custom prompt to set the scene and explain how the adventure should work. From there, each time the player makes a choice, their story history is sent back to the server, which asks the AI to continue the narrative, allowing the story to evolve dynamically based on the player’s choices.

The code below shows how this logic is implemented:

"use server";
import { getCloudflareContext } from "@opennextjs/cloudflare";

async function prime(env: CloudflareEnv) {
  const id = Math.floor(Math.random() * 1000000);//unique ID for each game run
  const messages = [
    {
      role: "user",
      content:
        `The user is playing a text-based adventure game. Each game is different, this is game ${id}. Your first job is to create a short background story in 3-4 sentences. Scenarios may include interesting locations such as jungles, deserts, caves.
        After the first message, each of your messages will be responses to the user interaction. State three short options (A, B, C). The user responses will be the chosen action. Your responses should end by asking the user about their choice.
        Your message will be shown to the user directly, so avoid "Certainly", "Great", "Let's get started", and other filler content, and avoid bringing up technical details such as "this is game #id".
        The games should have a win condition that is actually feasible given the story, and if the player loses, the message should end with "Try again.".
        `,
    },
  ];
  //Call Workers AI to generate the first response (story intro)
  const { response } = await env.AI.run("@cf/google/gemma-3-12b-it", { messages });

  return [
    ...messages,
    { role: "assistant", content: response }
  ];
}

/**
 * Main server action for the adventure game.
 * If no input yet, it primes the game with the opening story
 * If there is input, it continues the story based on the full history
 * Uses getCloudflareContext from @opennextjs/cloudflare to access env.
 */
export async function adventureAction(input: any[]) {
  let { env } = await getCloudflareContext({ async: true });

  return input.length === 0
  ? await prime(env)
  : [...input,
      { role: "assistant", content: (await env.AI.run("@cf/google/gemma-3-12b-it", { messages: input })).response }
  ];
}

by Matt Cowley


DNS over Discord is a bot that lets you run DNS lookups right inside Discord. Instead of switching to a terminal or online tool, you can use simple slash commands to check records like A, AAAA, MX, TXT, and more.

The developer behind the project chose Cloudflare Workers because it’s a great platform for running small JavaScript apps that handle requests, which made it a good fit for Discord’s slash commands. Since every command translates into a request and the bot sees a lot of traffic, the free tier wasn’t enough, so it now runs on Workers Paid to keep up reliably without hitting request limits.

In this project, the Worker checks if the request is a Discord interaction, and if so, it sends it to the right command (e.g., /dig, /multi-dig, etc.), using a handler that calls out to a custom framework for Discord slash commands. If it’s not from Discord, it can also serve routes like the privacy page or terms of service.

Here’s what that looks like in code:

export default {
  // Process all requests to the Worker
  fetch: async (request, env, ctx) => {
    try {
      // Include the env in the context we pass to the handler
      ctx.env = env;

      // Check if it's a Discord interaction (or a health check)
      const resp = await handler(request, ctx);
      if (resp) return resp;

      // Otherwise, process the request
      const url = new URL(request.url);

      if (request.method === 'GET' && url.pathname === '/privacy')
        return new textResponse(Privacy);

      if (request.method === 'GET' && url.pathname === '/terms')
        return new textResponse(Terms);

      // Fallback if nothing matches
      return new textResponse(null, { status: 404 });
    } catch (err) {
      // Log any errors
      captureException(err);

      // Re-throw the error
      throw err;
    }
  },
};

by James Ross


placeholders.dev is a service that generates placeholder images, making it easy for developers to prototype and scaffold websites without dealing with hosting or asset management. Users can generate placeholders instantly with a simple URL, such as: https://images.placeholders.dev/350x150

Since placeholders are typically used in early development, speed and consistency matter, and images need to load instantly so the workflow isn’t interrupted. Running on Cloudflare Workers makes the service fast and consistent no matter where developers are.

This project uses the Workers Paid plan because it regularly exceeds the free-tier limits on requests and compute time. The Worker below shows the core of how the service works. When a request comes in, it looks at the URL path (like /300x150) to determine the size of the placeholder, applies some defaults for style, and then returns an SVG image on the fly.

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    try {
      const url = new URL(request.url);
      const cache = caches.default;

      // Handle requests for the placeholder API
      if (url.host === 'images.placeholders.dev' || url.pathname.startsWith('/api')) {
        // Try edge cache first
        const cached = await cache.match(url, { ignoreMethod: true });
        if (cached) return cached;

        // Default placeholder options
        const imageOptions: Options = {
   dataUri: false, // always return an unencoded SVG source
          width: 300,
          height: 150,
          fontFamily: 'sans-serif',
          fontWeight: 'bold',
          bgColor: '#ddd',
          textColor: 'rgba(0,0,0,0.5)',
        };

        // Parse sizes from path (e.g. /350 or /350x150)
        const sizeParts = url.pathname.replace('/api', '').replace('/', '').split('x');
        if (sizeParts[0]) {
          const width = sanitizeNumber(parseInt(sizeParts[0], 10));
          const height = sizeParts[1] ? sanitizeNumber(parseInt(sizeParts[1], 10)) : width;
          imageOptions.width = width;
          imageOptions.height = height;
        }

        // Generate SVG placeholder
        const response = new Response(simpleSvgPlaceholder(imageOptions), {
          headers: { 'content-type': 'image/svg+xml; charset=utf-8' },
        });

        // Cache result
        response.headers.set('Cache-Control', 'public, max-age=' + cacheTtl);
        ctx.waitUntil(cache.put(url, response.clone()));

        return response;
      }

      return new Response('Not Found', { status: 404 });
    } catch (err) {
      console.error(err);
      return new Response('Internal Error', { status: 500 });
    }
  },
};

Check out Built With Workers to see what other developers are building with our developer platform.


How do I get started?

This offering is available to United States students at least 18 years old with a verified .edu billing email address.

Based on when your account was created, you can redeem this offer either by signing up for a free Cloudflare account with your .edu email or by filling out a form to request access for your existing .edu account. Just make sure your verified .edu email address is your billing email address.

New .edu accounts

Existing .edu accounts

Creation Date 

Created on/after September 22, 2025

Created prior to September 22, 2025

How to Redeem

Sign up for a free Cloudflare account, add your credit card and ensure your verified .edu email address is added to your billing details.

Ensure your verified .edu email address is added to your billing details.

Fill out our form and a member of our team will help you get access

Note: in order to receive the credit, your verified .edu email address needs to be your billing email address 

Expanding Cloudflare for Students coverage

While our first offering is primarily for institutions in the US, we’re working on expanding support for our students in other countries and plan to add additional higher education domain names after launch. If you’re at an educational institution outside of the United States, please reach out to us and apply for your educational/academic domain to be added. We’ll let you know as soon as it becomes available in your region. Check our Cloudflare for Students page for updates and keep an eye out for emails if you have an account with a newly supported domain.

Whether you’re gearing up for your first hackathon, launching a side project, or looking to build the next big thing, you can get started today with free access and join a global developer community already building on Cloudflare.

Get started by signing up or requesting access today.