Tag Archives: Cloudflare Workers

Moving Baselime from AWS to Cloudflare: simpler architecture, improved performance, over 80% lower cloud costs

2024-10-31 Boris Tane

Post Syndicated from Boris Tane original https://blog.cloudflare.com/80-percent-lower-cloud-cost-how-baselime-moved-from-aws-to-cloudflare

Introduction

When Baselime joined Cloudflare in April 2024, our architecture had evolved to hundreds of AWS Lambda functions, dozens of databases, and just as many queues. We were drowning in complexity and our cloud costs were growing fast. We are now building Baselime and Workers Observability on Cloudflare and will save over 80% on our cloud compute bill. The estimated potential Cloudflare costs are for Baselime, which remains a stand-alone offering, and the estimate is based on the Workers Paid plan. Not only did we achieve huge cost savings, we also simplified our architecture and improved overall latency, scalability, and reliability.

Daily Cost	Before (AWS)	After (Cloudflare)
Compute	$650 – AWS Lambda	$25 – Cloudflare Workers
CDN	$140 – Cloudfront	$0 – Free
Data Stream + Analytics database	$1,150 – Kinesis Data Stream + EC2	$300 – Workers Analytics Engine
Total	$1,940	$325 (83% cost reduction)

^{Table 1: Daily Costs Comparison ($USD)}

When we joined Cloudflare, we immediately saw a surge in usage, and within the first week following the announcement, we were processing over a billion events daily and our weekly active users tripled.

As the platform grew, so did the challenges of managing real-time observability with new scalability, reliability, and cost considerations. This drove us to rebuild Baselime on the Cloudflare Developer Platform, where we could innovate quickly while reducing operational overhead.

Initial architecture — all on AWS

Our initial architecture was all on Amazon Web Services (AWS). We’ll focus here on the data pipeline, which covers ingestion, processing, and storage of tens of billions of events daily.

This pipeline was built on top of AWS Lambda, Cloudfront, Kinesis, EC2, DynamoDB, ECS, and ElastiCache.

^{Figure1: Initial data pipeline architecture}

The key elements are:

Data receptors: Responsible for receiving telemetry data from multiple sources, including OpenTelemetry, Cloudflare Logpush, CloudWatch, Vercel, etc. They cover validation, authentication, and transforming data from each source into a common internal format. The data receptors were deployed either on AWS Lambda (using function URLs and Cloudfront) or ECS Fargate depending on the data source.
Kinesis Data Stream: Responsible for transporting the data from the receptors to the next step: data processing.
Processor: A single AWS Lambda function responsible for enriching and transforming the data for storage. It also performed real-time error tracking and detecting patterns in logs.
ClickHouse cluster: All the telemetry data was ultimately indexed and stored in a self-hosted ClickHouse cluster on EC2.

In addition to these key elements, the existing stack also included orchestration with Firehose, S3 buckets, SQS, DynamoDB and RDS for error handling, retries, and storing metadata.

While this architecture served us well in the early days, it started to show major cracks as we scaled our solution to more and larger customers.

Handling retries at the interface between the data receptors and the Kinesis Data Stream was complex, requiring introducing and orchestrating Firehose, S3 buckets, SQS, and another Lambda function.

Self-hosting ClickHouse also introduced major challenges at scale, as we continuously had to plan our capacity and update our setup to keep pace with our growing user base whilst attempting to maintain control over costs.

Costs began scaling unpredictably with our growing workloads, especially in AWS Lambda, Kinesis, and EC2, but also in less obvious ways, such as in Cloudfront (required for a custom domain in front of Lambda function URLs) and DynamoDB. Specifically, the time spent on I/O operations in AWS Lambda was a particularly costly piece. At every step, from the data receptors to the ClickHouse cluster, moving data to the next stage required waiting for a network request to complete, accounting for over 70% of wall time in the Lambda function.

In a nutshell, we were continuously paged by our alerts, innovating at a slower pace, and our costs were out of control.

Additionally, the entire solution was deployed in a single AWS region: eu-west-1. As a result, all developers located outside continental Europe were experiencing high latency when emitting logs and traces to Baselime.

Modern architecture — transitioning to Cloudflare

The shift to the Cloudflare Developer Platform enabled us to rethink our architecture to be exceptionally fast, globally distributed, and highly scalable, without compromising on cost, complexity, or agility. This new architecture is built on top of Cloudflare primitives.

^{Figure 2: Modern data pipeline architecture}

Cloudflare Workers: the core of Baselime

Cloudflare Workers are now at the core of everything we do. All the data receptors and the processor run in Workers. Workers minimize cold-start times and are deployed globally by default. As such, developers always experience lower latency when emitting events to Baselime.

Additionally, we heavily use JavaScript-native RPC for data transfer between steps of the pipeline. It’s low-latency, lightweight, and simplifies communication between components. This further simplifies our architecture, as separate components behave more as functions within the same process, rather than completely separate applications.

export default {
  async fetch(request: Request, env: Bindings, ctx: ExecutionContext): Promise<Response> {
      try {
        const { err, apiKey } = auth(request);
        if (err) return err;

        const data = {
          workspaceId: apiKey.workspaceId,
          environmentId: apiKey.environmentId,
          events: request.body
        };
        await env.PROCESSOR.ingest(data);

        return success({ message: "Request Accepted" }, 202);
      } catch (error) {
        return failure({ message: "Internal Error" });
      }
  },
};

^{Code Block 1: Simplified data receptor using JavaScript-native RPC to execute the processor.}

Workers also expose a Rate Limiting binding that enables us to automatically add rate limiting to our services, which we previously had to build ourselves using a combination of DynamoDB and ElastiCache.

Moreover, we heavily use ctx.waitUntil within our Worker invocations, to offload data transformation outside the request / response path. This further reduces the latency of calls developers make to our data receptors.

Durable Objects: stateful data processing

Durable Objects is a unique service within the Cloudflare Developer Platform, as it enables building stateful applications in a serverless environment. We use Durable Objects in the data pipelines for both real-time error tracking and detecting log patterns.

For instance, to track errors in real-time, we create a durable object for each new type of error, and this durable object is responsible for keeping track of the frequency of the error, when to notify customers, and the notification channels for the error. This implementation with a single building block removes the need for ElastiCache, Kinesis, and multiple Lambda functions to coordinate protecting the RDS database from being overwhelmed by a high frequency error.

^{Figure 3: Real-time error detection architecture comparison}

Durable Objects gives us precise control over consistency and concurrency of managing state in the data pipeline.

In addition to the data pipeline, we use Durable Objects for alerting. Our previous architecture required orchestrating EventBridge Scheduler, SQS, DynamoDB and multiple AWS Lambda functions, whereas with Durable Objects, everything is handled within the alarm handler.

Workers Analytics Engine: high-cardinality analytics at scale

Though managing our own ClickHouse cluster was technically interesting and challenging, it took us away from building the best observability developer experience. With this migration, more of our time is spent enhancing our product and none is spent managing server instances.

Workers Analytics Engine lets us synchronously write events to a scalable high-cardinality analytics database. We built on top of the same technology that powers Workers Analytics Engine. We also made internal changes to Workers Analytics Engine to natively enable high dimensionality in addition to high cardinality.

Moreover, Workers Analytics Engine and our solution leverages Cloudflare’s ABR analytics. ABR stands for Adaptive Bit Rate, and enables us to store telemetry data in multiple tables with varying resolutions, from 100% to 0.0001% of the data. Querying the table with 0.0001% of the data will be several orders of magnitudes faster than the table with all the data, with a corresponding trade-off in accuracy. As such, when a query is sent to our systems, Workers Analytics Engine dynamically selects the most appropriate table to run the query, optimizing both query time and accuracy. Users always get the most accurate result with optimal query time, regardless of the size of their dataset or the timeframe of the query. Compared to our previous system, which was always running queries on the full dataset, the new system now delivers faster queries across our entire user base and use cases.

In addition to these core services (Workers, Durable Objects, Workers Analytics Engine), the new architecture leverages other building blocks from the Cloudflare Developer Platform. Queues for asynchronous messaging, decoupling services and enabling an event-driven architecture; D1 as our main database for transactional data (queries, alerts, dashboards, configurations, etc.); Workers KV for fast distributed storage; Hono for all our APIs, etc.

How did we migrate?

Baselime is built on an event-driven architecture, where every user action triggers an event. It operates on the principle that every user action is recorded as an event and emitted to the rest of the system — whether it’s creating a user, editing a dashboard, or performing any other action. Migrating to Cloudflare involved transitioning our event-driven architecture without compromising uptime and data consistency. Previously, this was powered by AWS EventBridge and SQS, and we moved entirely to Cloudflare Queues.

We followed the strangler fig pattern to incrementally migrate the solution from AWS to Cloudflare. It consists of gradually replacing specific parts of the system with newer services, with minimal disruption to the system. Early in the process, we created a central Cloudflare Queue which acted as the backbone for all transactional event processing during the migration. Every event, whether a new user signup or a dashboard edit, was funneled into this Queue. From there, events were dynamically routed, each event to the relevant part of the application. User actions were synced into D1 and KV, ensuring that all user actions were mirrored across both AWS and Cloudflare during the transition.

This syncing mechanism enabled us to maintain consistency and ensure that no data was lost as users continued to interact with Baselime.

Here’s an example of how events are processed:

export default {
  async queue(batch, env) {
    for (const message of batch.messages) {
      try {
        const event = message.body;
        switch (event.type) {
          case "WORKSPACE_CREATED":
            await workspaceHandler.create(env, event.data);
            break;
          case "QUERY_CREATED":
            await queryHandler.create(env, event.data);
            break;
          case "QUERY_DELETED":
            await queryHandler.remove(env, event.data);
            break;
          case "DASHBOARD_CREATED":
            await dashboardHandler.create(env, event.data);
            break;
          //
          // Many more events...
          //
          default:
            logger.info("Matched no events", { type: event.type });
        }
        message.ack();
      } catch (e) {
        if (message.attempts < 3) {
          message.retry({ delaySeconds: Math.ceil(30 ** message.attempts / 10), });
        } else {
          logger.error("Failed handling event - No more retrys", { event: message.body, attempts: message.attempts }, e);
        }
      }
    }
  },
} satisfies ExportedHandler<Env, InternalEvent>;

^{Code Block 2: Simplified internal events processing during migration.}

We migrated the data pipeline from AWS to Cloudflare with an outside-in method: we started with the data receptors and incrementally moved the data processor and the ClickHouse cluster to the new architecture. We began writing telemetry data (logs, metrics, traces, wide-events, etc.) to both ClickHouse (in AWS) and to Workers Analytics Engine simultaneously for the duration of the retention period (30 days).

The final step was rewriting all of our endpoints, previously hosted on AWS Lambda and ECS containers, into Cloudflare Workers. Once those Workers were ready, we simply switched the DNS records to point to the Workers instead of the existing Lambda functions.

Despite the complexity, the entire migration process, from the data pipeline to all re-writing API endpoints, took our then team of 3 engineers less than three months.

We ended up saving over 80% on our cloud bill

Savings on the data receptors

After switching the data receptors from AWS to Cloudflare in early June 2024, our AWS Lambda cost was reduced by over 85%. These costs were primarily driven by I/O time the receptors spent sending data to a Kinesis Data Stream in the same region.

^{Figure 4: Baselime daily AWS Lambda cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]}

Moreover, we used Cloudfront to enable custom domains pointing to the data receptors. When we migrated the data receptors to Cloudflare, there was no need for Cloudfront anymore. As such, our Cloudfront cost was reduced to $0.

^{Figure 5: Baselime daily Cloudfront cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]}

If we were a regular Cloudflare customer, we estimate that our daily Cloudflare Workers bill would be around \$25 after the switch, against \$790 on AWS: over 95% cost reduction. These savings are primarily driven by the Workers pricing model, since Workers charge for CPU time, and the receptors are primarily just moving data, and as such, are mostly I/O bound.

Savings on the ClickHouse cluster

To evaluate the cost impact of switching from self-hosting ClickHouse to using Workers Analytics Engine, we need to take into account not only the EC2 instances, but also the disk space, networking, and the Kinesis Data Stream cost.

We completed this switch in late August, achieving over 95% cost reduction in both the Kinesis Data Stream and all EC2 related costs.

^{Figure 6: Baselime daily Kinesis Data Stream cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]}

^{Figure 7: Baselime daily EC2 cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]}

If we were a regular Cloudflare customer, we estimate that our daily Workers Analytics Engine cost would be around \$300 after the switch, compared to \$1150 on AWS, a cost reduction of over 70%.

Not only did we significantly reduce costs by migrating to Cloudflare, but we also improved performance across the board. Responses to users are now faster, with real-time event ingestion happening across Cloudflare’s network, closer to our users. Responses to users querying their data are also much faster, thanks to Cloudflare’s deep expertise in operating ClickHouse at scale.

Most importantly, we’re no longer bound by limitations in throughput or scale. We launched Workers Logs on September 26, 2024, and our system now handles a much higher volume of events than before, with no sacrifices in speed or reliability.

These cost savings are outstanding as is, and do not include the total cost of ownership of those systems. We significantly simplified our systems and our codebase, as the platform is taking care of more for us. We’re paged less, we spend less time monitoring infrastructure, and we can focus on delivering product improvements.

Conclusion

Migrating Baselime to Cloudflare has transformed how we build and scale our platform. With Workers, Durable Objects, Workers Analytics Engine, and other services, we now run a fully serverless, globally distributed system that’s more cost-efficient and agile. This shift has significantly reduced our operational overhead and enabled us to iterate faster, delivering better observability tooling to our users.

You can start observing your Cloudflare Workers today with Workers Logs. Looking ahead, we’re excited about the features we will deliver directly in the Cloudflare Dashboard, including real-time error tracking, alerting, and a query builder for high-cardinality and dimensionality events. All coming by early 2025.

Elephants in tunnels: how Hyperdrive connects to databases inside your VPC networks

2024-10-25 Andrew Repp

Post Syndicated from Andrew Repp original https://blog.cloudflare.com/elephants-in-tunnels-how-hyperdrive-connects-to-databases-inside-your-vpc-networks

With September’s announcement of Hyperdrive’s ability to send database traffic from Workers over Cloudflare Tunnels, we wanted to dive into the details of what it took to make this happen.

Hyper-who?

Accessing your data from anywhere in Region Earth can be hard. Traditional databases are powerful, familiar, and feature-rich, but your users can be thousands of miles away from your database. This can cause slower connection startup times, slower queries, and connection exhaustion as everything takes longer to accomplish.

Cloudflare Workers is an incredibly lightweight runtime, which enables our customers to deploy their applications globally by default and renders the cold start problem almost irrelevant. The trade-off for these light, ephemeral execution contexts is the lack of persistence for things like database connections. Database connections are also notoriously expensive to spin up, with many round trips required between client and server before any query or result bytes can be exchanged.

Hyperdrive is designed to make the centralized databases you already have feel like they’re global while keeping connections to those databases hot. We use our global network to get faster routes to your database, keep connection pools primed, and cache your most frequently run queries as close to users as possible.

Why a Tunnel?

For something as sensitive as your database, exposing access to the public Internet can be uncomfortable. It is common to instead host your database on a private network, and allowlist known-safe IP addresses or configure GRE tunnels to permit traffic to it. This is complex, toilsome, and error-prone.

On Cloudflare’s Developer Platform, we strive for simplicity and ease-of-use. We cannot expect all of our customers to be experts in configuring networking solutions, and so we went in search of a simpler solution. Being your own customer is rarely a bad choice, and it so happens that Cloudflare offers an excellent option for this scenario: Tunnels.

Cloudflare Tunnel is a Zero Trust product that creates a secure connection between your private network and Cloudflare. Exposing services within your private network can be as simple as running a cloudflared binary, or deploying a Docker container running the cloudflared image we distribute.

A custom handler and generic streams

Integrating with Tunnels to support sending Postgres directly through them was a bit of a new challenge for us. Most of the time, when we use Tunnels internally (more on that later!), we rely on the excellent job cloudflared does of handling all of the mechanics, and we just treat them as pipes. That wouldn’t work for Hyperdrive, though, so we had to dig into how Tunnels actually ingress traffic to build a solution.

Hyperdrive handles Postgres traffic using an entirely custom implementation of the Postgres message protocol. This is necessary, because we sometimes have to alter the specific type or content of messages sent from client to server, or vice versa. Handling individual bytes gives us the flexibility to implement whatever logic any new feature might need.

An additional, perhaps less obvious, benefit of handling Postgres message traffic as just bytes is that we are not bound to the transport layer choices of some ORM or library. One of the nuances of running services in Cloudflare is that we may want to egress traffic over different services or protocols, for a variety of different reasons. In this case, being able to egress traffic via a Tunnel would be pretty challenging if we were stuck with whatever raw TCP socket a library had established for us.

The way we accomplish this relies on a mainstay of Rust: traits (which are how Rust lets developers apply logic across generic functions and types). In the Rust ecosystem, there are two traits that define the behavior Hyperdrive wants out of its transport layers: AsyncRead and AsyncWrite. There are a couple of others we also need, but we’re going to focus on just these two. These traits enable us to code our entire custom handler against a generic stream of data, without the handler needing to know anything about the underlying protocol used to implement the stream. So, we can pass around a WebSocket connection as a generic I/O stream, wherever it might be needed.

As an example, the code to create a generic TCP stream and send a Postgres startup message across it might look like this:

/// Send a startup message to a Postgres server, in the role of a PG client.
/// https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-STARTUPMESSAGE
pub async fn send_startup<S>(stream: &mut S, user_name: &str, db_name: &str, app_name: &str) -> Result<(), ConnectionError>
where
    S: AsyncWrite + Unpin,
{
    let protocol_number = 196608 as i32;
    let user_str = &b"user\0"[..];
    let user_bytes = user_name.as_bytes();
    let db_str = &b"database\0"[..];
    let db_bytes = db_name.as_bytes();
    let app_str = &b"application_name\0"[..];
    let app_bytes = app_name.as_bytes();
    let len = 4 + 4
        + user_str.len() + user_bytes.len() + 1
        + db_str.len() + db_bytes.len() + 1
        + app_str.len() + app_bytes.len() + 1 + 1;

    // Construct a BytesMut of our startup message, then send it
    let mut startup_message = BytesMut::with_capacity(len as usize);
    startup_message.put_i32(len as i32);
    startup_message.put_i32(protocol_number);
    startup_message.put(user_str);
    startup_message.put_slice(user_bytes);
    startup_message.put_u8(0);
    startup_message.put(db_str);
    startup_message.put_slice(db_bytes);
    startup_message.put_u8(0);
    startup_message.put(app_str);
    startup_message.put_slice(app_bytes);
    startup_message.put_u8(0);
    startup_message.put_u8(0);

    match stream.write_all(&startup_message).await {
        Ok(_) => Ok(()),
        Err(err) => {
            error!("Error writing startup to server: {}", err.to_string());
            ConnectionError::InternalError
        }
    }
}

/// Connect to a TCP socket
let stream = match TcpStream::connect(("localhost", 5432)).await {
    Ok(s) => s,
    Err(err) => {
        error!("Error connecting to address: {}", err.to_string());
        return ConnectionError::InternalError;
    }
};
let _ = send_startup(&mut stream, "db_user", "my_db").await;

With this approach, if we wanted to encrypt the stream using TLS before we write to it (upgrading our existing TcpStream connection in-place, to an SslStream), we would only have to change the code we use to create the stream, while generating and sending the traffic would remain unchanged. This is because SslStream also implements AsyncWrite!

/// We're handwaving the SSL setup here. You're welcome.
let conn_config = new_tls_client_config()?;

/// Encrypt the TcpStream, returning an SslStream
let ssl_stream = match tokio_boring::connect(conn_config, domain, stream).await {
    Ok(s) => s,
    Err(err) => {
        error!("Error during websocket TLS handshake: {}", err.to_string());
        return ConnectionError::InternalError;
    }
};
let _ = send_startup(&mut ssl_stream, "db_user", "my_db").await;

Whence WebSocket

WebSocket is an application layer protocol that enables bidirectional communication between a client and server. Typically, to establish a WebSocket connection, a client initiates an HTTP request and indicates they wish to upgrade the connection to WebSocket via the “Upgrade” header. Then, once the client and server complete the handshake, both parties can send messages over the connection until one of them terminates it.

Now, it turns out that the way Cloudflare Tunnels work under the hood is that both ends of the tunnel want to speak WebSocket, and rely on a translation layer to convert all traffic to or from WebSocket. The cloudflared daemon you spin up within your private network handles this for us! For Hyperdrive, however, we did not have a suitable translation layer to send Postgres messages across WebSocket, and had to write one.

One of the (many) fantastic things about Rust traits is that the contract they present is very clear. To be AsyncRead, you just need to implement poll_read. To be AsyncWrite, you need to implement only three functions (poll_write, poll_flush, and poll_shutdown). Further, there is excellent support for WebSocket in Rust built on top of the tungstenite-rs library.

Thus, building our custom WebSocket stream such that it can share the same machinery as all our other generic streams just means translating the existing WebSocket support into these poll functions. There are some existing OSS projects that do this, but for multiple reasons we could not use the existing options. The primary reason is that Hyperdrive operates across multiple threads (thanks to the tokio runtime), and so we rely on our connections to also handle Send, Sync, and Unpin. None of the available solutions had all five traits handled. It turns out that most of them went with the paradigm of Sink and Stream, which provide a solid base from which to translate to AsyncRead and AsyncWrite. In fact some of the functions overlap, and can be passed through almost unchanged. For example, poll_flush and poll_shutdown have 1-to-1 analogs, and require almost no engineering effort to convert from Sink to AsyncWrite.

/// We use this struct to implement the traits we need on top of a WebSocketStream
pub struct HyperSocket<S>
where
    S: AsyncRead + AsyncWrite + Send + Sync + Unpin,
{
    inner: WebSocketStream<S>,
    read_state: Option<ReadState>,
    write_err: Option<Error>,
}

impl<S> AsyncWrite for HyperSocket<S>
where
    S: AsyncRead + AsyncWrite + Send + Sync + Unpin,
{
    fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        match ready!(Pin::new(&mut self.inner).poll_flush(cx)) {
            Ok(_) => Poll::Ready(Ok(())),
            Err(err) => Poll::Ready(Err(Error::new(ErrorKind::Other, err))),
        }
    }

    fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        match ready!(Pin::new(&mut self.inner).poll_close(cx)) {
            Ok(_) => Poll::Ready(Ok(())),
            Err(err) => Poll::Ready(Err(Error::new(ErrorKind::Other, err))),
        }
    }
}

With that translation done, we can use an existing WebSocket library to upgrade our SslStream connection to a Cloudflare Tunnel, and wrap the result in our AsyncRead/AsyncWrite implementation. The result can then be used anywhere that our other transport streams would work, without any changes needed to the rest of our codebase!

That would look something like this:

let websocket = match tokio_tungstenite::client_async(request, ssl_stream).await {
    Ok(ws) => Ok(ws),
    Err(err) => {
        error!("Error during websocket conn setup: {}", err.to_string());
        return ConnectionError::InternalError;
    }
};
let websocket_stream = HyperSocket::new(websocket));
let _ = send_startup(&mut websocket_stream, "db_user", "my_db").await;

Access granted

An observant reader might have noticed that in the code example above we snuck in a variable named request that we passed in when upgrading from an SslStream to a WebSocketStream. This is for multiple reasons. The first reason is that Tunnels are assigned a hostname and use this hostname for routing. The second and more interesting reason is that (as mentioned above) when negotiating an upgrade from HTTP to WebSocket, a request must be sent to the server hosting the ingress side of the Tunnel to perform the upgrade. This is pretty universal, but we also add in an extra piece here.

At Cloudflare, we believe that secure defaults and defense in depth are the correct ways to build a better Internet. This is why traffic across Tunnels is encrypted, for example. However, that does not necessarily prevent unwanted traffic from being sent into your Tunnel, and therefore egressing out to your database. While Postgres offers a robust set of access control options for protecting your database, wouldn’t it be best if unwanted traffic never got into your private network in the first place?

To that end, all Tunnels set up for use with Hyperdrive should have a Zero Trust Access Application configured to protect them. These applications should use a Service Token to authorize connections. When setting up a new Hyperdrive, you have the option to provide the token’s ID and Secret, which will be encrypted and stored alongside the rest of your configuration. These will be presented as part of the WebSocket upgrade request to authorize the connection, allowing your database traffic through while preventing unwanted access.

This can be done within the request’s headers, and might look something like this:

let ws_url = format!("wss://{}", host);
let mut request = match ws_url.into_client_request() {
    Ok(req) => req,
    Err(err) => {
        error!(
            "Hostname {} could not be parsed into a valid request URL: {}", 
            host,
            err.to_string()
        );
        return ConnectionError::InternalError;
    }
};
request.headers_mut().insert(
    "CF-Access-Client-Id",
    http::header::HeaderValue::from_str(&client_id).unwrap(),
);
request.headers_mut().insert(
    "CF-Access-Client-Secret",
    http::header::HeaderValue::from_str(&client_secret).unwrap(),
);

Building for customer zero

If you’ve been reading the blog for a long time, some of this might sound a bit familiar. This isn’t the first time that we’ve sent Postgres traffic across a tunnel, it’s something most of us do from our laptops regularly. This works very well for interactive use cases with low traffic volume and a high tolerance for latency, but historically most of our products have not been able to employ the same approach.

Cloudflare operates many data centers around the world, and most services run in every one of those data centers. There are some tasks, however, that make the most sense to run in a more centralized fashion. These include tasks such as managing control plane operations, or storing configuration state. Nearly every Cloudflare product houses its control plane information in Postgres clusters run centrally in a handful of our data centers, and we use a variety of approaches for accessing that centralized data from elsewhere in our network. For example, many services currently use a push-based model to publish updates to Quicksilver, and work through the complexities implied by such a model. This has been a recurring challenge for any team looking to build a new product.

Hyperdrive’s entire reason for being is to make it easy to access such central databases from our global network. When we began exploring Tunnel integrations as a feature, many internal teams spoke up immediately and strongly suggested they’d be interested in using it themselves. This was an excellent opportunity for Cloudflare to scratch its own itch, while also getting a lot of traffic on a new feature before releasing it directly to the public. As always, being “customer zero” means that we get fast feedback, more reliability over time, stronger connections between teams, and an overall better suite of products. We jumped at the chance.

As we rolled out early versions of Tunnel integration, we worked closely with internal teams to get them access to it, and fixed any rough spots they encountered. We’re pleased to share that this first batch of teams have found great success building new or refactored products on Hyperdrive over Tunnels. For example: if you’ve already tried out Workers Builds, or recently submitted an abuse report, you’re among our first users! At the time of this writing, we have several more internal teams working to onboard, and we on the Hyperdrive team are very excited to see all the different ways in which fast and simple connections from Workers to a centralized database can help Cloudflare just as much as they’ve been helping our external customers.

Outro

Cloudflare is on a mission to make the Internet faster, safer, and more reliable. Hyperdrive was built to make connecting to centralized databases from the Workers runtime as quick and consistent as possible, and this latest development is designed to help all those who want to use Hyperdrive without directly exposing resources within their virtual private clouds (VPCs) on the public web.

To this end, we chose to build a solution around our suite of industry-leading Zero Trust tools, and were delighted to find how simple it was to implement in our runtime given the power and extensibility of the Rust trait system.

Without waiting for the ink to dry, multiple teams within Cloudflare have adopted this new feature to quickly and easily solve what have historically been complex challenges, and are happily operating it in production today.

And now, if you haven’t already, try setting up Hyperdrive across a Tunnel, and let us know what you think in the Hyperdrive Discord channel!

Durable Objects aren’t just durable, they’re fast: a 10x speedup for Cloudflare Queues

2024-10-24 Josh Wheeler

Post Syndicated from Josh Wheeler original https://blog.cloudflare.com/how-we-built-cloudflare-queues

Cloudflare Queues let a developer decouple their Workers into event-driven services. Producer Workers write events to a Queue, and consumer Workers are invoked to take actions on the events. For example, you can use a Queue to decouple an e-commerce website from a service which sends purchase confirmation emails to users. During 2024’s Birthday Week, we announced that Cloudflare Queues is now Generally Available, with significant performance improvements that enable larger workloads. To accomplish this, we switched to a new architecture for Queues that enabled the following improvements:

Median latency for sending messages has dropped from ~200ms to ~60ms
Maximum throughput for each Queue has increased over 10x, from 400 to 5000 messages per second
Maximum Consumer concurrency for each Queue has increased from 20 to 250 concurrent invocations

^{Median latency drops from ~200ms to ~60ms as Queues are migrated to the new architecture}

In this blog post, we’ll share details about how we built Queues using Durable Objects and the Cloudflare Developer Platform, and how we migrated from an initial Beta architecture to a geographically-distributed, horizontally-scalable architecture for General Availability.

v1 Beta architecture

When initially designing Cloudflare Queues, we decided to build something simple that we could get into users’ hands quickly. First, we considered leveraging an off-the-shelf messaging system such as Kafka or Pulsar. However, we decided that it would be too challenging to operate these systems at scale with the large number of isolated tenants that we wanted to support.

Instead of investing in new infrastructure, we decided to build on top of one of Cloudflare’s existing developer platform building blocks: Durable Objects. Durable Objects are a simple, yet powerful building block for coordination and storage in a distributed system. In our initial v1 architecture, each Queue was implemented using a single Durable Object. As shown below, clients would send messages to a Worker running in their region, which would be forwarded to the single Durable Object hosted in the WNAM (Western North America) region. We used a single Durable Object for simplicity, and hosted it in WNAM for proximity to our centralized configuration API service.

One of a Queue’s main responsibilities is to accept and store incoming messages. Sending a message to a v1 Queue used the following flow:

A client sends a POST request containing the message body to the Queues API at /accounts/:accountID/queues/:queueID/messages
The request is handled by an instance of the Queue Broker Worker in a Cloudflare data center running near the client.
The Worker performs authentication, and then uses Durable Objects idFromName API to route the request to the Queue Durable Object for the given queueID
The Queue Durable Object persists the message to storage before returning a success back to the client.

Durable Objects handled most of the heavy-lifting here: we did not need to set up any new servers, storage, or service discovery infrastructure. To route requests, we simply provided a queueID and the platform handled the rest. To store messages, we used the Durable Object storage API to put each message, and the platform handled reliably storing the data redundantly.

Consuming messages

The other main responsibility of a Queue is to deliver messages to a Consumer. Delivering messages in a v1 Queue used the following process:

Each Queue Durable Object maintained an alarm that was always set when there were undelivered messages in storage. The alarm guaranteed that the Durable Object would reliably wake up to deliver any messages in storage, even in the presence of failures. The alarm time was configured to fire after the user’s selected max wait time, if only a partial batch of messages was available. Whenever one or more full batches were available in storage, the alarm was scheduled to fire immediately.
The alarm would wake the Durable Object, which continually looked for batches of messages in storage to deliver.
Each batch of messages was sent to a “Dispatcher Worker” that used Workers for Platforms dynamic dispatch to pass the messages to the queue() function defined in a user’s Consumer Worker

This v1 architecture let us flesh out the initial version of the Queues Beta product and onboard users quickly. Using Durable Objects allowed us to focus on building application logic, instead of complex low-level systems challenges such as global routing and guaranteed durability for storage. Using a separate Durable Object for each Queue allowed us to host an essentially unlimited number of Queues, and provided isolation between them.

However, using only one Durable Object per queue had some significant limitations:

Latency: we created all of our v1 Queue Durable Objects in Western North America. Messages sent from distant regions incurred significant latency when traversing the globe.
Throughput: A single Durable Object is not scalable: it is single-threaded and has a fixed capacity for how many requests per second it can process. This is where the previous 400 messages per second limit came from.
Consumer Concurrency: Due to concurrent subrequest limits, a single Durable Object was limited in how many concurrent subrequests it could make to our Dispatcher Worker. This limited the number of queue() handler invocations that it could run simultaneously.

To solve these issues, we created a new v2 architecture that horizontally scales across multiple Durable Objects to implement each single high-performance Queue.

v2 Architecture

In the new v2 architecture for Queues, each Queue is implemented using multiple Durable Objects, instead of just one. Instead of a single region, we place Storage Shard Durable Objects in all available regions to enable lower latency. Within each region, we create multiple Storage Shards and load balance incoming requests amongst them. Just like that, we’ve multiplied message throughput.

Sending a message to a v2 Queue uses the following flow:

A client sends a POST request containing the message body to the Queues API at /accounts/:accountID/queues/:queueID/messages
The request is handled by an instance of the Queue Broker Worker running in a Cloudflare data center near the client.
The Worker:
- Performs authentication
- Reads from Workers KV to obtain a Shard Map that lists available storage shards for the given region and queueID
- Picks one of the region’s Storage Shards at random, and uses Durable Objects idFromName API to route the request to the chosen shard
The Storage Shard persists the message to storage before returning a success back to the client.

In this v2 architecture, messages are stored in the closest available Durable Object storage cluster near the user, greatly reducing latency since messages don’t need to be shipped all the way to WNAM. Using multiple shards within each region removes the bottleneck of a single Durable Object, and allows us to scale each Queue horizontally to accept even more messages per second. Workers KV acts as a fast metadata store: our Worker can quickly look up the shard map to perform load balancing across shards.

To improve the Consumer side of v2 Queues, we used a similar “scale out” approach. A single Durable Object can only perform a limited number of concurrent subrequests. In v1 Queues, this limited the number of concurrent subrequests we could make to our Dispatcher Worker. To work around this, we created a new Consumer Shard Durable Object class that we can scale horizontally, enabling us to execute many more concurrent instances of our users’ queue() handlers.

Consumer Durable Objects in v2 Queues use the following approach:

Each Consumer maintains an alarm that guarantees it will wake up to process any pending messages. v2 Consumers are notified by the Queue’s Coordinator (introduced below) when there are messages ready for consumption. Upon notification, the Consumer sets an alarm to go off immediately.
The Consumer looks at the shard map, which contains information about the storage shards that exist for the Queue, including the number of available messages on each shard.
The Consumer picks a random storage shard with available messages, and asks for a batch.
The Consumer sends the batch to the Dispatcher Worker, just like for v1 Queues.
After processing the messages, the Consumer sends another request to the Storage Shard to either “acknowledge” or “retry” the messages.

This scale-out approach enabled us to work around the subrequest limits of a single Durable Object, and increase the maximum supported concurrency level of a Queue from 20 to 250.

The Coordinator and “Control Plane”

So far, we have primarily discussed the “Data Plane” of a v2 Queue: how messages are load balanced amongst Storage Shards, and how Consumer Shards read and deliver messages. The other main piece of a v2 Queue is the “Control Plane”, which handles creating and managing all the individual Durable Objects in the system. In our v2 architecture, each Queue has a single Coordinator Durable Object that acts as the brain of the Queue. Requests to create a Queue, or change its settings, are sent to the Queue’s Coordinator.

The Coordinator maintains a Shard Map for the Queue, which includes metadata about all the Durable Objects in the Queue (including their region, number of available messages, current estimated load, etc.). The Coordinator periodically writes a fresh copy of the Shard Map into Workers KV, as pictured in step 1 of the diagram. Placing the shard map into Workers KV ensures that it is globally cached and available for our Worker to read quickly, so that it can pick a shard to accept the message.

Every shard in the system periodically sends a heartbeat to the Coordinator as shown in steps 2 and 3 of the diagram. Both Storage Shards and Consumer Shards send heartbeats, including information like the number of messages stored locally, and the current load (requests per second) that the shard is handling. The Coordinator uses this information to perform autoscaling. When it detects that the shards in a particular region are overloaded, it creates additional shards in the region, and adds them to the shard map in Workers KV. Our Worker sees the updated shard map and naturally load balances messages across the freshly added shards. Similarly, the Coordinator looks at the backlog of available messages in the Queue, and decides to add more Consumer shards to increase Consumer throughput when the backlog is growing. Consumer Shards pull messages from Storage Shards for processing as shown in step 4 of the diagram.

Switching to a new scalable architecture allowed us to meet our performance goals and take Queues to GA. As a recap, this new architecture delivered these significant improvements:

P50 latency for writing to a Queue has dropped from ~200ms to ~60ms.
Maximum throughput for a Queue has increased from 400 to 5000 messages per second.
Maximum consumer concurrency has increased from 20 to 250 invocations.

What’s next for Queues

We plan on leveraging the performance improvements in the new beta version of Durable Objects which use SQLite to continue to improve throughput/latency in Queues.
We will soon be adding message management features to Queues so that you can take actions to purge messages in a queue, pause consumption of messages, or “redrive”/move messages from one queue to another (for example messages that have been sent to a Dead Letter Queue could be “redriven” or moved back to the original queue).
Work to make Queues the “event hub” for the Cloudflare Developer Platform:
- Create a low-friction way for events emitted from other Cloudflare services with event schemas to be sent to Queues.
- Build multi-Consumer support for Queues so that Queues are no longer limited to one Consumer per queue.

To start using Queues, head over to our Getting Started guide.

Do distributed systems like Cloudflare Queues and Durable Objects interest you? Would you like to help build them at Cloudflare? We’re Hiring!

Billions and billions (of logs): scaling AI Gateway with the Cloudflare Developer Platform

2024-10-24 Catarina Pires Mota

Post Syndicated from Catarina Pires Mota original https://blog.cloudflare.com/billions-and-billions-of-logs-scaling-ai-gateway-with-the-cloudflare

With the rapid advancements occurring in the AI space, developers face significant challenges in keeping up with the ever-changing landscape. New models and providers are continuously emerging, and understandably, developers want to experiment and test these options to find the best fit for their use cases. This creates the need for a streamlined approach to managing multiple models and providers, as well as a centralized platform to efficiently monitor usage, implement controls, and gather data for optimization.

AI Gateway is specifically designed to address these pain points. Since its launch in September 2023, AI Gateway has empowered developers and organizations by successfully proxying over 2 billion requests in just one year, as we highlighted during September’s Birthday Week. With AI Gateway, developers can easily store, analyze, and optimize their AI inference requests and responses in real time.

With our initial architecture, AI Gateway faced a significant challenge: the logs, those critical trails of data interactions between applications and AI models, could only be retained for 30 minutes. This limitation was not just a minor inconvenience; it posed a substantial barrier for developers and businesses needing to analyze long-term patterns, ensure compliance, or simply debug over more extended periods.

In this post, we’ll explore the technical challenges and strategic decisions behind extending our log storage capabilities from 30 minutes to being able to store billions of logs indefinitely. We’ll discuss the challenges of scale, the intricacies of data management, and how we’ve engineered a system that not only meets the demands of today, but is also scalable for the future of AI development.

Background

AI Gateway is built on Cloudflare Workers, a serverless platform that runs on the Cloudflare network, allowing developers to write small JavaScript functions that can execute at the point of need, near the user, on Cloudflare’s vast network of data centers, without worrying about platform scalability.

Our customers use multiple providers and models and are always looking to optimize the way they do inference. And, of course, in order to evaluate their prompts, performance, cost, and to troubleshoot what’s going on, AI Gateway’s customers need to store requests and responses. New requests show up within 15 seconds and customers can check a request’s cost, duration, number of tokens, and provide their feedback (thumbs up or down).

This scales in a way where an account can have multiple gateways and each gateway has its own settings. In our first implementation, a backend worker was responsible for storing Real Time Logs and other background tasks. However, in the rapidly evolving domain of artificial intelligence, where real-time data is as precious as the insights it provides, managing log data efficiently becomes paramount. We recognized that to truly empower our users, we needed to offer a solution where logs weren’t just transient records but could be stored permanently. Permanent log storage means developers can now track the performance, security, and operational insights of their AI applications over time, enabling not only immediate troubleshooting but also longitudinal studies of AI behavior, usage trends, and system health.

The diagram above describes our old architecture, which could only store 30 minutes of data.

Tracing the path of a request through the AI Gateway, as depicted in the sequence above:

A developer sends a new inference request, which is first received by our Gateway Worker.
The Gateway Worker then performs several checks: it looks for cached results, enforces rate limits, and verifies any other configurations set by the user for their gateway. Provided all conditions are met, it forwards the request to the selected inference provider (in this diagram, OpenAI).
The inference provider processes the request and sends back the response.
Simultaneously, as the response is relayed back to the developer, the request and response details are also dispatched to our Backend Worker. This worker’s role is to manage and store the log of this transaction.

The challenge: Store two billion logs

First step: real-time logs

Initially, the AI Gateway project stored both request metadata and the actual request bodies in a D1 database. This approach facilitated rapid development in the project’s infancy. However, as customer engagement grew, the D1 database began to fill at an accelerating rate, eventually retaining logs for only 30 minutes at a time.

To mitigate this, we first optimized the database schema, which extended the log retention to one hour. However, we soon encountered diminishing returns due to the sheer volume of byte data from the request bodies. Post-launch, it became clear that a more scalable solution was necessary. We decided to migrate the request bodies to R2 storage, significantly alleviating the data load on D1. This adjustment allowed us to incrementally extend log retention to 24 hours.

Consequently, D1 functioned primarily as a log index, enabling users to search and filter logs efficiently. When users needed to view details or download a log, these actions were seamlessly proxied through to R2.

This dual-system approach provided us with the breathing room to contemplate and develop more sophisticated storage solutions for the future.

Second step: persistent logs and Durable Object transactional storage

As our traffic surged, we encountered a growing number of requests from customers wanting to access and compare older logs.

Upon learning that the Durable Objects team was seeking beta testers for their new Durable Objects with SQLite, we eagerly signed up.

Originally, we considered Durable Objects as the ideal solution for expanding our log storage capacity, which required us to shard the logs by a unique string. Initially, this string was the account ID, but during a mid-development load test, we hit a cap at 10 million logs per Durable Object. This limitation meant that each account could only support up to this number of logs.

Given our commitment to the DO migration, we saw an opportunity rather than a constraint. To overcome the 10 million log limit per account, we refined our approach to shard by both account ID and gateway name. This adjustment effectively raised the storage ceiling from 10 million logs per account to 10 million per gateway. With the default setting allowing each account up to 10 gateways, the potential storage for each account skyrocketed to 100 million logs.

This strategic pivot not only enabled us to store a significantly larger number of logs. But also enhanced our flexibility in gateway management. Now, when a gateway is deleted, we can simply remove the corresponding Durable Object.

Additionally, this sharding method isolates high-volume request scenarios. If one customer’s heavy usage slows down log insertion, it only impacts their specific Durable Object, thereby preserving performance for other customers.

Taking a glance at the revised architecture diagram, we replaced the Backend Worker with our newly integrated Durable Object. The rest of the request flow remains unchanged, including the concurrent response to the user and the interaction with the Durable Object, which occurs in the fourth step.

Leveraging Cloudflare’s network, our Gateway Worker operates near the user’s location, which in turn positions the user’s Durable Object close by. This proximity significantly enhances the speed of log insertion and query operations.

Third step: managing thousands of Durable Objects

As the number of users and requests on AI Gateway grows, managing each unique Durable Object (DO) becomes increasingly complex. New customers join continuously, and we needed an efficient method to track each DO, ensure users stay within their 10 gateway limit, and manage the storage capacity for free users.

To address these challenges, we introduced another layer of control with a new Durable Object we’ve named the Account Manager. The primary function of the Account Manager is straightforward yet crucial: it keeps user activities in check.

Here’s how it works: before any Gateway commits a new log to permanent storage, it consults the Account Manager. This check determines whether the gateway is allowed to insert the log based on the user’s current usage and entitlements. The Account Manager uses its own SQLite database to verify the total number of rows a user has and their service level. If all checks pass, it signals the Gateway that the log can be inserted. It was paramount to guarantee that this entire validation process occurred in the background, ensuring that the user experience remains seamless and uninterrupted.

The Account Manager stays updated by periodically receiving data from each Gateway’s Durable Object. Specifically, after every 1000 inference requests, the Gateway sends an update on its total rows to the Account Manager, which then updates its local records. This system ensures that the Account Manager has the most current data when making its decisions.

Additionally, the Account Manager is responsible for monitoring customer entitlements. It tracks whether an account is on a free or paid plan, how many gateways a user is permitted to create, and the log storage capacity allocated to each gateway.

Through these mechanisms, the Account Manager not only helps in maintaining system integrity but also ensures fair usage across all users of AI Gateway.

AI evaluations and Durable Objects sharding

As we continue to develop evaluations to fully automatic and, in the future, use Large Language Models (LLMs), we are now taking the first step towards this goal and launching the open beta phase of comprehensive AI evaluations, centered on Human-in-the-Loop feedback.

This feature empowers users to create bespoke datasets from their application logs, thereby enabling them to score and evaluate the performance, speed, and cost-effectiveness of their models, with a primary focus on LLMs and automated scoring, analyzing the performance of LLMs, providing developers with objective, data-driven insights to refine their models.

To do this, developers require a reliable logging mechanism that persists logs from multiple gateways, storing up to 100 million logs in total (10 million logs per gateway, across 10 gateways). This represents a significant volume of data, as each request made through the AI Gateway generates a log entry, with some log entries potentially exceeding 50 MB in size.

This necessity leads us to work on the expansion of log storage capabilities. Since log storage is limited to 10 million logs per gateway, in future iterations, we aim to scale this capacity by implementing sharded Durable Objects (DO), allowing multiple Durable Objects per gateway to handle and store logs. This scaling strategy will enable us to store significantly larger volumes of logs, providing richer data for evaluations (using LLMs as a judge or from user input), all through AI Gateway.

Coming Soon

We are working on improving our existing Universal Endpoint, the next step on an enhanced solution that builds on existing fallback mechanisms to offer greater resilience, flexibility, and intelligence in request management.

Currently, when a provider encounters an error or is unavailable, our system falls back to an alternative provider to ensure continuity. The improved Universal Endpoint takes this a step further by introducing automatic retry capabilities, allowing failed requests to be reattempted before fallback is triggered. This significantly improves reliability by handling transient errors and increasing the likelihood of successful request fulfillment. It will look something like this:

curl --location 'https://aig.example.com/' \
--header 'CF-AIG-TOKEN: Bearer XXXX' \
--header 'Content-Type: application/json' \
--data-raw '[
    {
        "id": "0001",
        "provider": "openai",
        "endpoint": "chat/completions",
        "headers": {
            "Authorization": "Bearer XXXX",
            "Content-Type": "application/json"
        },
        "query": {
            "model": "gpt-3.5-turbo",
            "messages": [
                {
                    "role": "user",
                    "content": "generate a prompt to create cloudflare random images"
                }
            ]
        },
        "option": {
            "retry": 2,
            "delay": 200,
            "onComplete": {
                "provider": "workers-ai",
                "endpoint": "@cf/stabilityai/stable-diffusion-xl-base-1.0",
                "headers": {
                    "Authorization": "Bearer A5UFQkHewHF1-sA3hTVQFaPxRuu5wmS0eJcCS_MC",
                    "Content-Type": "application/json"
                },
                "query": {
                    "messages": [
                        {
                            "role": "user",
                            "content": "<prompt-response id='\''0001'\'' />"
                        }
                    ]
                }
            }
        }
    },
    {
        "provider": "workers-ai",
        "endpoint": "@cf/stabilityai/stable-diffusion-xl-base-1.0",
        "headers": {
            "Authorization": "Bearer XXXXXX",
            "Content-Type": "application/json"
        },
        "query": {
            "messages": [
                {
                    "role": "user",
                    "content": "create a image of a missing cat"
                }
            ]
        }
    }
]'

The request to the improved Universal Endpoint system demonstrates how it handles multiple providers with integrated retry mechanisms and fallback logic. In this example, the first request is sent to a provider like OpenAI, asking it to generate a text-to-image prompt. The “retry” option ensures that transient issues don’t result in immediate failure.

The system’s ability to seamlessly switch between providers while applying retry strategies ensures higher reliability and robustness in managing requests. By leveraging fallback logic, the Improved Universal Endpoint can dynamically adapt to provider failures, ensuring that tasks are completed successfully even in complex, multi-step workflows.

In addition to retry logic, we will have the ability to inspect requests and responses and make dynamic decisions based on the content of the result. This enables developers to create conditional workflows where the system can adapt its behavior depending on the nature of the response, creating a highly flexible and intelligent decision-making process.

If you haven’t yet used AI Gateway, check out our developer documentation on how to get started. If you have any questions, reach out on our Discord channel.

Build durable applications on Cloudflare Workers: you write the Workflows, we take care of the rest

2024-10-24 Sid Chatterjee

Post Syndicated from Sid Chatterjee original https://blog.cloudflare.com/building-workflows-durable-execution-on-workers

Workflows, Cloudflare’s durable execution engine that allows you to build reliable, repeatable multi-step applications that scale for you, is now in open beta. Any developer with a free or paid Workers plan can build and deploy a Workflow right now: no waitlist, no sign-up form, no fake line around-the-block.

If you learn by doing, you can create your first Workflow via a single command (or visit the docs for the full guide):

npm create cloudflare@latest workflows-starter -- \
  --template "cloudflare/workflows-starter"

Open the src/index.ts file, poke around, start extending it, and deploy it with a quick wrangler deploy.

If you want to learn more about how Workflows works, how you can use it to build applications, and how we built it, read on.

Workflows? Durable Execution?

Workflows—which we announced back during Developer Week earlier this year—is our take on the concept of “Durable Execution”: the ability to build and execute applications that are durable in the face of errors, network issues, upstream API outages, rate limits, and (most importantly) infrastructure failure.

As over 2.4 million developers continue to build applications on top of Cloudflare Workers, R2, and Workers AI, we’ve noticed more developers building multi-step applications and workflows that process user data, transform unstructured data into structured, export metrics, persist state as they progress, and automatically retry & restart. But writing any non-trivial application and making it durable in the face of failure is hard: this is where Workflows comes in. Workflows manages the retries, emitting the metrics, and durably storing the state (without you having to stand up your own database) as the Workflow progresses.

What makes Workflows different from other takes on “Durable Execution” is that we manage the underlying compute and storage infrastructure for you. You’re not left managing a compute cluster and hoping it scales both up (on a Monday morning) and down (during quieter periods) to manage costs, or ensuring that you have compute running in the right locations. Workflows is built on Cloudflare Workers — our job is to run your code and operate the infrastructure for you.

As an example of how Workflows can help you build durable applications, assume you want to post-process file uploads from your users that were uploaded to an R2 bucket directly via a pre-signed URL. That post-processing could involve multiple actions: text extraction via a Workers AI model, calls to a third-party API to validate data, updating or querying rows in a database once the file has been processed… the list goes on.

But what each of these actions has in common is that it could fail. Maybe that upstream API is unavailable, maybe you get rate-limited, maybe your database is down. Having to write extensive retry logic around each action, manage backoffs, and (importantly) ensure your application doesn’t have to start from scratch when a later step fails is more boilerplate to write and more code to test and debug.

What’s a step, you ask? The core building block of every Workflow is the step: an individually retriable component of your application that can optionally emit state. That state is then persisted, even if subsequent steps were to fail. This means that your application doesn’t have to restart, allowing it to not only recover more quickly from failure scenarios, but it can also avoid doing redundant work. You don’t want your application hammering an expensive third-party API (or getting you rate limited) because it’s naively retrying an API call that you don’t have to.

export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
	async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
		const files = await step.do('my first step', async () => {
			return {
				inputParams: event,
				files: [
					'doc_7392_rev3.pdf',
					'report_x29_final.pdf',
					'memo_2024_05_12.pdf',
					'file_089_update.pdf',
					'proj_alpha_v2.pdf',
					'data_analysis_q2.pdf',
					'notes_meeting_52.pdf',
					'summary_fy24_draft.pdf',
				],
			};
		});

		// Other steps...
	}
}

Notably, a Workflow can have hundreds of steps: one of the Rules of Workflows is to encapsulate every API call or stateful action within your application into its own step. Each step can also define its own retry strategy, automatically backing off, adding a delay and/or (eventually) giving up after a set number of attempts.

await step.do(
	'make a call to write that could maybe, just might, fail',
	// Define a retry strategy
	{
		retries: {
			limit: 5,
			delay: '5 seconds',
			backoff: 'exponential',
		},
		timeout: '15 minutes',
	},
	async () => {
		// Do stuff here, with access to the state from our previous steps
		if (Math.random() > 0.5) {
			throw new Error('API call to $STORAGE_SYSTEM failed');
		}
	},
);

To illustrate this further, imagine you have an application that reads text files from an R2 storage bucket, pre-processes the text into chunks, generates text embeddings using Workers AI, and then inserts those into a vector database (like Vectorize) for semantic search.

In the Workflows programming model, each of those is a discrete step, and each can emit state. For example, each of the four actions below can be a discrete step.do call in a Workflow:

Reading the files from storage and emitting the list of filenames
Chunking the text and emitting the results
Generating text embeddings
Upserting them into Vectorize and capturing the result of a test query

You can also start to imagine that some steps, such as chunking text or generating text embeddings, can be broken down into even more steps — a step per file that we chunk, or a step per API call to our text embedding model, so that our application is even more resilient to failure.

Steps can be created programmatically or conditionally based on input, allowing you to dynamically create steps based on the number of inputs your application needs to process. You do not need to define all steps ahead of time, and each instance of a Workflow may choose to conditionally create steps on the fly.

Building Cloudflare on Cloudflare

As the Cloudflare Developer platform continues to grow, almost all of our own products are built on top of it. Workflows is yet another example of how we built a new product from scratch using nothing but Workers and its vast catalog of features and APIs. This section of the blog has two goals: to explain how we built it, and to demonstrate that anyone can create a complex application or platform with demanding requirements and multiple architectural layers on our stack, too.

If you’re wondering how Workflows manages to make durable execution easy, how it persists state, and how it automatically scales: it’s because we built it on Cloudflare Workers, including the brand-new zero-latency SQLite storage we recently introduced to Durable Objects.

To understand how Workflows uses Workers & Durable Objects, here’s the high-level overview of our architecture:

There are three main blocks in this diagram:

The user-facing APIs are where the user interacts with the platform, creating and deploying new workflows or instances, controlling them, and accessing their state and activity logs. These operations can be executed through our public API gateway using REST calls, a Worker script using bindings, Wrangler (Cloudflare’s developer platform command line tool), or via the Dashboard user interface.

The managed platform holds the internal configuration APIs running on a Worker implementing a catalog of REST endpoints, the binding shim, which is supported by another dedicated Worker, every account controller, and their correspondent workflow engines, all powered by SQLite-backed Durable Objects. This is where all the magic happens and what we are sharing more details about in this technical blog.

Finally, there are the workflow instances, essentially independent clones of the workflow application. Instances are user account-owned and have a one-to-one relationship with a managed engine that powers them. You can run as many instances and engines as you want concurrently.

Let’s get into more detail…

Configuration API and Binding Shim

The Configuration API and the Binding Shim are two stateless Workers; one receives REST API calls from clients calling our API Gateway directly, using Wrangler, or navigating the Dashboard UI, and the other is the endpoint for the Workflows binding, an efficient and authenticated interface to interact with the Cloudflare Developer Platform resources from a Workers script.

The configuration API worker uses HonoJS and Zod to implement the REST endpoints, which are declared in an OpenAPI schema and exported to our API Gateway, thus adding our methods to the Cloudflare API catalog.

import { swaggerUI } from '@hono/swagger-ui';
import { createRoute, OpenAPIHono, z } from '@hono/zod-openapi';
import { Hono } from 'hono';

...

api.openapi(
  createRoute({
    method: 'get',
    path: '/',
    request: {
      query: PaginationParams,
    },
    responses: {
      200: {
        content: {
          'application/json': {
             schema: APISchemaSuccess(z.array(WorkflowWithInstancesCountSchema)),
          },
        },
        description: 'List of all Workflows belonging to a account.',
      },
    },
  }),
  async (ctx) => {
    ...
  },
);

...

api.route('/:workflow_name', routes.workflows);
api.route('/:workflow_name/instances', routes.instances);
api.route('/:workflow_name/versions', routes.versions);

These Workers perform two different functions, but they share a large portion of their code and implement similar logic; once the request is authenticated and ready to travel to the next stage, they use the account ID to delegate the operation to a Durable Object called Account Controller.

// env.ACCOUNTS is the Account Controllers Durable Objects namespace
const accountStubId = c.env.ACCOUNTS.idFromName(accountId.toString());
const accountStub = c.env.ACCOUNTS.get(accountStubId);

As you can see, every account has its own Account Controller Durable Object.

Account Controllers

The Account Controller is a dedicated persisted database that stores the list of all the account’s workflows, versions, and instances. We scale to millions of account controllers, one per every Cloudflare account using Workflows, by leveraging the power of Durable Objects with SQLite backend.

Durable Objects (DOs) are single-threaded singletons that run in our data centers and are bound to a stateful storage API, in this case, SQLite. They are also Workers, just a special kind, and have access to all of our other APIs. This makes it easy to build consistent, highly available distributed applications with them.

Here’s what we get for free by using one Durable Object per Workflows account:

Sharding based on account boundaries aligns perfectly with the way we manage resources at Cloudflare internally. Also, due to the nature of DOs, there are other things that this model gets us for free: Not that we expect them, but eventual bugs or state inconsistencies during beta are confined to the affected account, and don’t impact everyone.
DO instances run close to the end user; Alice is in London and will call the config API through our LHR data center, while Bob is in Lisbon and will connect to LIS.
Because every account is a Worker, we can gradually upgrade them to new versions, starting with the internal users, thus derisking real customers.

Before SQLite, our only option was to use the Durable Object’s key-value storage API, but having a relational database at our fingertips and being able to create tables and do complex queries is a significant enabler. For example, take a look at how we implement the internal method getWorkflow():

async function getWorkflow(accountId: number, workflowName: string) {
  try {
    const res = this.ctx.storage.transactionSync(() => {
      const cursor = Array.from(
        this.ctx.storage.sql.exec(
          `
                    SELECT *,
                    (SELECT class_name
                        FROM   versions
                        WHERE  workflow_id = w.id
                        ORDER  BY created_on DESC
                        LIMIT  1) AS class_name
                    FROM   workflows w
                    WHERE  w.name = ? 
                    `,
          workflowName
        )
      )[0] as Workflow;

      return cursor;
    });

    this.sendAnalytics(accountId, begin, "getWorkflow");
    return res as Workflow | undefined;
  } catch (err) {
    this.sendErrorAnalytics(accountId, begin, "getWorkflow");
    throw err;
  }
}

The other thing we take advantage of in Workflows is using the recently announced JavaScript-native RPC feature when communicating between components.

Before RPC, we had to fetch() between components, make HTTP requests, and serialize and deserialize the parameters and the payload. Now, we can async call the remote object’s method as if it was local. Not only does this feel more natural and simplify our logic, but it’s also more efficient, and we can take advantage of TypeScript type-checking when writing code.

This is how the Configuration API would call the Account Controller’s countWorkflows() method before:

const resp = await accountStub.fetch(
      "https://controller/count-workflows",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/json; charset=utf-8",
        },
        body: JSON.stringify({ accountId }),
      },
    );

if (!resp.ok) {
  return new Response("Internal Server Error", { status: 500 });
}

const result = await resp.json();
const total_count = result.total_count;

This is how we do it using RPC:

const total_count = await accountStub.countWorkflows(accountId);

The other powerful feature of our RPC system is that it supports passing not only Structured Cloneable objects back and forth but also entire classes. More on this later.

Let’s move on to Engine.

Engine and instance

Every instance of a workflow runs alongside an Engine instance. The Engine is responsible for starting up the user’s workflow entry point, executing the steps on behalf of the user, handling their results, and tracking the workflow state until completion.

When we started thinking about the Engine, we thought about modeling it after a state machine, and that was what our initial prototypes looked like. However, state machines require an ahead-of-time understanding of the userland code, which implies having a build step before running them. This is costly at scale and introduces additional complexity.

A few iterations later, we had another idea. What if we could model the engine as a game loop?

Unlike other computer programs, games operate regardless of a user’s input. The game loop is essentially a sequence of tasks that implement the game’s logic and update the display, typically one loop per video frame. Here’s an example of a game loop in pseudo-code:

while (game in running)
    check for user input
    move graphics
    play sounds
end while

Well, an oversimplified version of our Workflow engine would look like this:

while (last step not completed)
    iterate every step
       use memoized cache as response if the step has run already
       continue running step or timer if it hasn't finished yet
end while

A workflow is indeed a loop that keeps on going, performing the same sequence of logical tasks until the last step completes.

The Engine and the instance run hand-in-hand in a one-to-one relationship. The first is managed, and part of the platform. It uses SQLite and other platform APIs internally, and we can constantly add new features, fix bugs, and deploy new versions, while keeping everything transparent to the end user. The second is the actual account-owned Worker script that declares the Workflow steps.

For example, when someone passes a callback into step.do():

export class MyWorkflow extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
    step.do('step1', () => { ... });
  }
}

We switch execution over to the Engine. Again, this is possible because of the power of JS RPC. Besides passing Structured Cloneable objects back and forth, JS RPC allows us to create and pass entire application-defined classes that extend the built-in RpcTarget. So this is what happens behind the scenes when your Instance calls step.do() (simplified):

export class Context extends RpcTarget {

  async do<T>(name: string, callback: () => Promise<T>): Promise<T> {

    // First we check we have a cache of this step.do() already
    const maybeResult = await this.#state.storage.get(name);

    // We return the cache if it exists
    if (maybeValue) { return maybeValue; }

    // Else we run the user callback
    return doWrapper(callback);
  }

}

Here’s a more complete diagram of the Engine’s step.do() lifecycle:

Again, this diagram only partially represents everything we do in the Engine; things like logging for observability or handling exceptions are missing, and we don’t get into the details of how queuing is implemented. However, it gives you a good idea of how the Engine abstracts and handles all the complexities of completing a step under the hood, allowing us to expose a simple-to-use API to end users.

Also, it’s worth reiterating that every workflow instance is an Engine behind the scenes, and every Engine is an SQLite-backed Durable Object. This ensures that every instance runtime and state are isolated and independent of each other and that we can effortlessly scale to run billions of workflow instances, a solved problem for Durable Objects.

Durability

Durable Execution is all the rage now when we talk about workflow engines, and ours is no exception. Workflows are typically long-lived processes that run multiple functions in sequence where anything can happen. Those functions can time out or fail because of a remote server error or a network issue and need to be retried. A workflow engine ensures that your application runs smoothly and completes regardless of the problems it encounters.

Durability means that if and when a workflow fails, the Engine can re-run it, resume from the last recorded step, and deterministically re-calculate the state from all the successful steps’ cached responses. This is possible because steps are stateful and idempotent; they produce the same result no matter how many times we run them, thus not causing unintended duplicate effects like sending the same invoice to a customer multiple times.

We ensure durability and handle failures and retries by sharing the same technique we use for a step.sleep() that requires sleeping for days or months: a combination of using scheduler.wait(), a method of the upcoming WICG Scheduling API that we already support, and Durable Objects alarms, which allow you to schedule the Durable Object to be woken up at a time in the future.

These two APIs allow us to overcome the lack of guarantees that a Durable Object runs forever, giving us complete control of its lifecycle. Since every state transition through userland code persists in the Engine’s strongly consistent SQLite, we track timestamps when a step begins execution, its attempts (if it needs retries), and its completion.

This means that steps pending if a Durable Object is evicted — perhaps due to a two-month-long timer — get rerun on the next lifetime of the Engine (with its cache from the previous lifetime hydrated) that is triggered by an alarm set with the timestamp of the next expected state transition.

Real-life workflow, step by step

Let’s walk through an example of a real-life application. You run an e-commerce website and would like to send email reminders to your customers for forgotten carts that haven’t been checked out in a few days.

What would typically have to be a combination of a queue, a cron job, and querying a database table periodically can now simply be a Workflow that we start on every new cart:

import {
  WorkflowEntrypoint,
  WorkflowEvent,
  WorkflowStep,
} from "cloudflare:workers";
import { sendEmail } from "./legacy-email-provider";

type Params = {
  cartId: string;
};

type Env = {
  DB: D1Database;
};

export class Purchase extends WorkflowEntrypoint<Env, Params> {
  async run(
    event: WorkflowEvent<Params>,
    step: WorkflowStep
  ): Promise<unknown> {
    await step.sleep("wait for three days", "3 days");

    // Retrieve cart from D1
    const cart = await step.do("retrieve cart from database", async () => {
      const { results } = await this.env.DB.prepare(`SELECT * FROM cart WHERE id = ?`)
        .bind(event.payload.cartId)
        .all();
      return results[0];
    });

    if (!cart.checkedOut) {
      await step.do("send an email", async () => {
        await sendEmail("reminder", cart);
      });
    }
  }
}

This works great. However, sometimes the sendEmail function fails due to an upstream provider erroring out. While step.do automatically retries with a reasonable default configuration, we can define our settings:

if (cart.isComplete) {
  await step.do(
    "send an email",
    {
      retries: {
        limit: 5,
        delay: "1 min",
        backoff: "exponential",
      },
    },
    async () => {
      await sendEmail("reminder", cart);
    }
  );
}

Managing Workflows

Workflows allows us to create and manage workflows using four different interfaces:

Using our REST HTTP API available on Cloudflare’s API catalog
Using Wrangler, Cloudflare’s developer platform command-line tool
Programmatically inside a Worker using bindings
Using our Web UI in the dashboard

The HTTP API makes it easy to trigger new instances of workflows from any system, even if it isn’t on Cloudflare, or from the command line. For example:

curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/workflows/purchase-workflow/instances/$CART_INSTANCE_ID \
  --header 'Authorization: Bearer $ACCOUNT_TOKEN \
  --header 'Content-Type: application/json' \
  --data '{
	"id": "$CART_INSTANCE_ID",
	"params": {
		"cartId": "f3bcc11b-2833-41fb-847f-1b19469139d1"
	}
  }'

Wrangler goes one step further and gives us a friendlier set of commands to interact with workflows with fancy formatted outputs without needing to authenticate with tokens. Type npx wrangler workflows for help, or:

npx wrangler workflows trigger purchase-workflow '{ "cartId": "f3bcc11b-2833-41fb-847f-1b19469139d1" }'

Furthermore, Workflows has first-party support in wrangler, and you can test your instances locally. A Workflow is similar to a regular WorkerEntrypoint in your Worker, which means that wrangler dev just naturally works.

❯ npx wrangler dev

 ⛅️ wrangler 3.82.0
----------------------------

Your worker has access to the following bindings:
- Workflows:
  - CART_WORKFLOW: EcommerceCartWorkflow
⎔ Starting local server...
[wrangler:inf] Ready on http://localhost:8787
╭───────────────────────────────────────────────╮
│  [b] open a browser, [d] open devtools        │
╰───────────────────────────────────────────────╯

Workflow APIs are also available as a Worker binding. You can interact with the platform programmatically from another Worker script in the same account without worrying about permissions or authentication. You can even have workflows that call and interact with other workflows.

import { WorkerEntrypoint } from "cloudflare:workers";

type Env = { DEMO_WORKFLOW: Workflow };
export default class extends WorkerEntrypoint<Env> {
  async fetch() {
    // Pass in a user defined name for this instance
    // In this case, we use the same as the cartId
    const instance = await this.env.DEMO_WORKFLOW.create({
      id: "f3bcc11b-2833-41fb-847f-1b19469139d1",
      params: {
          cartId: "f3bcc11b-2833-41fb-847f-1b19469139d1",
      }
    });
  }
  async scheduled() {
    // Restart errored out instances in a cron
    const instance = await this.env.DEMO_WORKFLOW.get(
      "f3bcc11b-2833-41fb-847f-1b19469139d1"
    );
    const status = await instance.status();
    if (status.error) {
      await instance.restart();
    }
  }
}

Observability

Having good observability and data on often long-lived asynchronous tasks is crucial to understanding how we’re doing under normal operation and, more importantly, when things go south, and we need to troubleshoot problems or when we are iterating on code changes.

We designed Workflows around the philosophy that there is no such thing as too much logging. You can get all the SQLite data for your workflow and its instances by calling the REST APIs. Here is the output of an instance:

{
  "success": true,
  "errors": [],
  "messages": [],
  "result": {
    "status": "running",
    "params": {},
    "trigger": { "source": "api" },
    "versionId": "ae042999-39ff-4d27-bbcd-22e03c7c4d02",
    "queued": "2024-10-21 17:15:09.350",
    "start": "2024-10-21 17:15:09.350",
    "end": null,
    "success": null,
    "steps": [
      {
        "name": "send email",
        "start": "2024-10-21 17:15:09.411",
        "end": "2024-10-21 17:15:09.678",
        "attempts": [
          {
            "start": "2024-10-21 17:15:09.411",
            "end": "2024-10-21 17:15:09.678",
            "success": true,
            "error": null
          }
        ],
        "config": {
          "retries": { "limit": 5, "delay": 1000, "backoff": "constant" },
          "timeout": "15 minutes"
        },
        "output": "[email protected]",
        "success": true,
        "type": "step"
      },
      {
        "name": "sleep-1",
        "start": "2024-10-21 17:15:09.763",
        "end": "2024-10-21 17:17:09.763",
        "finished": false,
        "type": "sleep",
        "error": null
      }
    ],
    "error": null,
    "output": null
  }
}

As you can see, this is essentially a dump of the instance engine SQLite in JSON. You have the errors, messages, current status, and what happened with every step, all time stamped to the millisecond.

It’s one thing to get data about a specific workflow instance, but it’s another to zoom out and look at aggregated statistics of all your workflows and instances over time. Workflows data is available through our GraphQL Analytics API, so you can query it in aggregate and generate valuable insights and reports. In this example we ask for aggregated analytics about the wall time of all the instances of the “e-commerce-carts” workflow:

{
  viewer {
    accounts(filter: { accountTag: "febf0b1a15b0ec222a614a1f9ac0f0123" }) {
      wallTime: workflowsAdaptiveGroups(
        limit: 10000
        filter: {
          datetimeHour_geq: "2024-10-20T12:00:00.000Z"
          datetimeHour_leq: "2024-10-21T12:00:00.000Z"
          workflowName: "e-commerce-carts"
        }
        orderBy: [count_DESC]
      ) {
        count
        sum {
          wallTime
        }
        dimensions {
          date: datetimeHour
        }
      }
    }
  }
}

For convenience, you can evidently also use Wrangler to describe a workflow or an instance and get an instant and beautifully formatted response:

sid ~ npx wrangler workflows instances describe purchase-workflow latest

 ⛅️ wrangler 3.80.4

Workflow Name:         purchase-workflow
Instance Id:           d4280218-7756-41d2-bccd-8d647b82d7ce
Version Id:            0c07dbc4-aaf3-44a9-9fd0-29437ed11ff6
Status:                ✅ Completed
Trigger:               🌎 API
Queued:                14/10/2024, 16:25:17
Success:               ✅ Yes
Start:                 14/10/2024, 16:25:17
End:                   14/10/2024, 16:26:17
Duration:              1 minute
Last Successful Step:  wait for three days
Output:                false
Steps:

  Name:      wait for three days
  Type:      💤 Sleeping
  Start:     14/10/2024, 16:25:17
  End:       17/10/2024, 16:25:17
  Duration:  3 day

And finally, we worked really hard to get you the best dashboard UI experience when navigating Workflows data.

So, how much does it cost?

It’d be painful if we introduced a powerful new way to build Workers applications but made it cost prohibitive.

Workflows is priced just like Cloudflare Workers, where we introduced CPU-based pricing: only on active CPU time and requests, not duration (aka: wall time).

^{Workers Standard pricing model}

This is especially advantageous when building the long-running, multi-step applications that Workflows enables: if you had to pay while your Workflow was sleeping, waiting on an event, or making a network call to an API, writing the “right” code would be at odds with writing affordable code.

There’s also no need to keep a Kubernetes cluster or a group of virtual machines running (and burning a hole in your wallet): we manage the infrastructure, and you only pay for the compute your Workflows consume.

What’s next?

Today, after months of developing the platform, we are announcing the open beta program, and we couldn’t be more excited to see how you will be using Workflows. Looking forward, we want to do things like triggering instances from queue messages and have other ideas, but at the same time, we are certain that your feedback will help us shape the roadmap ahead.

We hope that this blog post gets you thinking about how to use Workflows for your next application, but also that it inspires you on what you can build on top of Workers. Workflows as a platform is entirely built on top of Workers, its resources, and APIs. Anyone can do it, too.

To chat with the team and other developers building on Workflows, join the #workflows-beta channel on the Cloudflare Developer Discord, and keep an eye on the Workflows changelog during the beta. Otherwise, visit the Workflows tutorial to get started.

If you’re an engineer, look for opportunities to work with us and help us improve Workflows or build other products.

The story of web framework Hono, from the creator of Hono

2024-10-17 Yusuke Wada

Post Syndicated from Yusuke Wada original https://blog.cloudflare.com/the-story-of-web-framework-hono-from-the-creator-of-hono

Hono is a fast, lightweight web framework that runs anywhere JavaScript does, built with Web Standards. Of course, it runs on Cloudflare Workers.

It was three years ago, in December 2021. At that time, I wanted to create applications for Cloudflare Workers, but the code became verbose without using a framework, and couldn’t find a framework that suited my needs. Itty-router was very nice but too simple. Worktop and Sunder did the same things I wanted to do, but their APIs weren’t quite to my liking. I was also interested in creating a router — a program that determines which action is executed based on the HTTP method and URL path of the Request — made of a Trie tree structure because it’s fast. So, I started building a web framework with a Trie tree-based router.

“While trying to create my applications, I ended up creating my framework for them.” — a classic example of yak shaving. However, Hono is now used by many developers, including Cloudflare, which uses Hono in core products. So, this journey into the depths of yak shaving was ultimately meaningful.

Write once, run anywhere

Hono truly runs anywhere — not just on Cloudflare Workers. I’ll discuss why later in the post, but Hono also runs on Deno, Bun, and Node.js. This is because Hono does not depend on external libraries, but uses only the Web Standards API, and each runtime supports Web Standards.

It’s a delight for developers to know that the same code can run across different runtimes. For instance, the following src/index.ts code will run on Cloudflare Workers, Deno, and Bun.

import { Hono } from 'hono'

const app = new Hono()
app.get('/hello', (c) => c.text('Hello Hono!'))

export default app

To run it on Cloudflare Workers, you execute the Wrangler command:

wrangler dev src/index.ts

The same code works on Deno:

deno serve src/index.ts

And it works on Bun too:

bun run src/index.ts

This is only a simple “Hello World” example, but more complex applications with middleware and helpers that are discussed below can be run on Cloudflare Workers or the other runtimes. As proof of this, almost all our test code for Hono itself can run the same way on these runtimes. This is a genuine “write once, run anywhere” experience.

Who is using Hono?

Hono is now used by many developers and companies. For example, Unkey deploys their application built with Hono’s OpenAPI feature to Cloudflare Workers. The following is a list of companies using Hono, based on my survey “Who is using Hono in production?”.

There are many, many more companies not listed here. And major web services or libraries, such as Prisma, Resend, Vercel AI SDK, Supabase, and Upstash, use Hono in their examples. There are also several influencers who like Hono and use it as an alternative to Express.

Of course, at Cloudflare, we also use Hono. D1 uses Hono for the internal Web API running on Workers. Workers Logs is based on code from Baselime (acquired by Cloudflare) and uses Hono to migrate the applications from their original infrastructure to Cloudflare Workers. All Workers Logs internal or customer-facing APIs are run on Workers using Hono. We also use Hono as part of the internals of many other products, such as KV and Queues.

Why are you making a “multi-runtime” framework?

You might wonder “Why is an employee of Cloudflare creating a framework that runs everywhere?” Initially, Hono was designed to work exclusively with Cloudflare Workers. However, starting with version 2, I added support for Deno and Bun. This was a very wise decision. If Hono had been targeted only at Cloudflare Workers, it might not have attracted as many users. By running on more runtimes, it gains more users, leading to the discovery of bugs and receiving more feedback, which ultimately leads to higher quality software.

Hono and Cloudflare are a perfect combo

The combination of Hono and Cloudflare offers a delightful developer experience.

Many websites, including our Cloudflare Docs, introduce the following “vanilla” JavaScript as a “Hello World” for Cloudflare Workers:

export default {
  fetch: () => {
    return new Response('Hello World!')
  }
}

This is primitive and good for understanding the Workers principle. However, if you want to create an endpoint that “returns a JSON response for GET requests that come to /books“, you need to write something like this:

export default {
  fetch: (req) => {
    const url = new URL(req.url)
    if (req.method === 'GET' && url.pathname === '/books') {
      return Response.json({
        ok: true
      })
    }
    return Response.json(
      {
        ok: false
      },
      {
        status: 404
      }
    )
  }
}

If you use Hono, you can write it like the following:

import { Hono } from 'hono'

const app = new Hono()

app.get('/books', (c) => {
  return c.json({
    ok: true
  })
})

export default app

It is short. And you can understand that “it handles GET accesses to /books” intuitively.

If you want to handle GET requests to /authors/yusuke and get “yusuke” from the path — “yusuke” is variable, you have to add something more complicated. The below is “vanilla” JavaScript example:

if (req.method === 'GET') {
  const match = url.pathname.match(/^\/authors\/([^\/]+)/)
  if (match) {
    const author = match[1]
    return Response.json({
      Author: author
    })
  }
}

If you use Hono, you don’t need if statements. Just add the endpoint definition to the app. Also, you don’t need to write a regular expression to get “yusuke”. You can get it with the function c.req.param():

app.get('/authors/:name', (c) => {
  const author = c.req.param('name')
  return c.json({
    Author: author
  })
})

One or two routes may be fine, but any more than that and maintenance becomes tricky. Code becomes more complex and bugs are harder to find. Using Hono, the code is very neat.

It is also easy to handle bindings to Cloudflare products, such as KV, R2, D1, etc. as Hono uses a “context model”. A context is a container that holds the application’s state until a request is received, and a response is returned. You can use a context to retrieve a request object, set response headers, and create custom variables. It also holds Cloudflare bindings. For example, if you set up a Cloudflare KV namespace with the name MY_KV, you can access it as follows, with TypeScript type completion.

import { Hono } from 'hono'

type Env = {
  Bindings: {
    MY_KV: KVNamespace
  }
}

const app = new Hono<Env>()

app.post('/message', async (c) => {
  const message = c.req.query('message') ?? 'Hi'
  await c.env.MY_KV.put('message', message)
  return c.text(`message is set`, 201)
})

Hono lets you write code in a simple and intuitive way, but that doesn’t mean there are limitations. You can do everything possible with Cloudflare Workers using Hono.

Add it when you want to use it

Hono is tiny. With the smallest preset, hono/tiny, you can write a “Hello World” application in just 12 KB. This is because it uses only the Web Standards API built into the runtime and has minimal functions. In comparison, the bundle size of Express is 579 KB.

However, there is much that you can do.

You can easily add functions using middleware. For example, it is a bit tedious to implement Basic Authentication from scratch, but with the built-in Basic Auth middleware, you can apply Basic Authentication to the path /auth/page with just this:

import { Hono } from 'hono'
import { basicAuth } from 'hono/basic-auth'

const app = new Hono()

app.use(
  '/auth/*',
  basicAuth({
    username: 'hono',
    password: 'acoolproject',
  })
)

app.get('/auth/page', (c) => {
  return c.text('You are authorized')
})

Hono’s package also includes built-in middleware that allows Bearer and JWT authentication, and easy configuration of CORS, etc. These built-in middleware components do not depend on external libraries, but there is also many 3rd-party middleware that allow the use of external libraries, such as authentication middleware using Clerk and Auth.js, and validators using Zod and Valibot.

There are also a number of built-in helpers, including the Streaming helper, which is useful for implementing AI. These can be added when you want to use them, and the file size increases only when they are added.

In Cloudflare Workers, there is a limit to a file size of a Worker. Keeping the core small and extending functions with middleware and helpers makes a lot of sense.

Onion structure

The important concepts of Hono are ”handler” and “middleware”.

A handler is a place to write a function that receives a request and returns a response, as specified by the user. For example, you can write a handler that gets a value of a query parameter, retrieves data from a database, and returns the result in JSON. Middleware can handle the requests that come to the handler and the responses that the handler returns. You can combine middleware with other middleware to build more large and complex applications. It is structured like an onion.

In a remarkably simple way, you can create middleware. For example, a custom logger that logs the request can be written as follows:

app.use(async (c, next) => {
  console.log(`[${c.req.method}] ${c.req.path}`)
  await next()
})

If you want to add a custom header to the response, write the following:

app.use(async (c, next) => {
  await next()
  c.header('X-Message', 'Hi, this is Hono!')
})

It would be interesting to combine this with HTMLRewriter. If an endpoint returns HTML, the middleware that modifies the HTML tags in it can be written as follows:

app.get('/pages/*', async (c, next) => {
  await next()

  class AttributeRewriter {
    constructor(attributeName) {
      this.attributeName = attributeName
    }
    element(element) {
      const attribute = element.getAttribute(this.attributeName)
      if (attribute) {
        element.setAttribute(this.attributeName, attribute.replace('oldhost', 'newhost'))
      }
    }
  }
  const rewriter = new HTMLRewriter().on('a', new AttributeRewriter('href'))

  const contentType = c.res.headers.get('Content-Type')

  if (contentType!.startsWith('text/html')) {
    c.res = rewriter.transform(c.res)
  }
})

There is very little to remember to create middleware. All you have to do is to work with the context, which you should already know.

The RPC is like magic

Hono has a strong type system. One feature that uses this is RPC (Remote Procedure Call). With RPC, you can express server-side API specifications as TypeScript types. When these types are loaded as generics in a client, the paths, arguments, and return types of each API endpoint are inferred. It’s like magic.

For example, imagine an endpoint for creating a blog post. This endpoint takes a number type id and a string type title. Using Zod, one of the validator libraries that support TypeScript inference, you can define the schema like this:

import { z } from 'zod'

const schema = z.object({
  id: z.number(),
  title: z.string()
})

You create a handler that receives this object in JSON format via a POST request to the path /posts. Using Zod Validator, you check if it matches the schema. The response will have a property called message of type string.

import { zValidator } from '@hono/zod-validator'

const app = new Hono().basePath('/v1')

// ...

const routes = app.post('/posts', zValidator('json', schema), (c) => {
  const data = c.req.valid('json')
  return c.json({
    message: `${data.id.toString()} is ${data.title}`
  })
})

This is a “typical” Hono handler. However, the TypeScript type you can get from the typeof for the routes will contain the information about its Web API specification. In this case, it includes the endpoint for creating blog posts — sending a POST request to the path /posts returns a JSON object.

export type AppType = typeof routes

Now, let’s create a client. You pass the earlier AppType as generics to a Hono client object.

import { hc } from 'hono/client'
import { AppType } from '.'

const client = hc<AppType>('http://localhost:8787')

With this setup, you’re ready. It’s magic time.

Code completion works perfectly. When you write client-side code, you no longer need to know the API specifications completely, which also helps eliminate mistakes.

Server-side JSX is fun

Hono provides built-in JSX, a syntax that allows you to write code in JavaScript that looks like HTML tags. When you hear the term JSX, you may think of React, a front-end UI library. However, Hono’s JSX was initially developed to run only on the server side. When we first started developing Hono, we were looking for template engines to render HTML. Most template engines, such as Handlebars and EJS, use eval internally and are incompatible with Cloudflare Workers, which does not support it. Then we came up with the idea of using JSX.

Hono’s JSX is unique in that it treats the tags as a string. So the following strange code actually works.

console.log((<h1>Hello!</h1>).toString())

There is no need to do renderToString() as in React. If you want to render HTML, just return this as is.

app.get('/', (c) => c.html(<h1>Hello</h1>))

Very interesting is the creation of Suspense — a feature in React that allows you to display a fallback UI while waiting for an asynchronous component to load — without any client implementation. The asynchronous components are running in a server-only implementation.

Server-side JSX is a better developer experience than you might imagine. You can use the toolchains for React’s JSX in the same way for Hono’s JSX, including the ability to complete tags in the editor. They bring mature front-end technology to the server side.

Testing is important

Testing is important. Fortunately, you can write tests easily when using Hono.

For example, let’s write a test for an endpoint. To test for a 200 response status of a request coming to / with the GET method, you can write the following:

it('should return 200 response', async () => {
  const res = await app.request('/')
  expect(res.status).toBe(200)
})

Simple, right? The beauty of this test is that you don’t have to bring up the server. The Web Standard API black boxes the server layer. The internal tests of Hono have 20,000 lines of code, but most of them are written in the same style as above, without the server up and running.

Going to full-stack

We released a new major version 4 in February 2024. There are three main features that stand out:

Static site generation
Client components
File-based routing

With these features, you can create full-stack applications with a user interface in Hono.

The introduction of client components allows JSX to work in the client. Now you can add interactions to your pages. Static site generation allows you to create blogs, etc. without having to bundle them into a single JavaScript file. We have also started an experimental project called HonoX. This is a meta-framework using Hono and Vite that provides file-based routing and a mechanism to hydrate client-side components to server-side generated HTML. It is easier to create larger applications that are a great match for Cloudflare Pages or Workers.

In addition to that, plans are underway to run it as a base server for existing full-stack frameworks such as Remix and Qwik.

In contrast to the Next.js framework, which started from the client-side with React, Hono is trying to become a full-stack framework starting from the server-side.

Hono Conference

On June 22, 2024, I held the “Hono Conference” in Tokyo, the first event to consist entirely of Hono-focused talks. One hundred people attended, and the event was a great success.

It was my dream to do this event. Now, there are 200 contributors to the honojs/hono repository on GitHub. If you include other Hono related repositories, there are many more. Creating “the most invincible framework we could think of” is a lot of fun for contributors and users.

Below is a group photo taken at the end of the event. This is my treasure. I want to make the 2nd event a global event.

Hono is 炎

I haven’t mentioned the origin of the name Hono yet. The name Hono is from the Japanese word for “炎“. It is similar to the word “flare“. Hono now runs on a variety of runtimes, but I said that it was first created to create Cloud”flare” Workers applications. It is an honor for Cloudflare that it has remained in its name.

That is all that the creator of Hono has to say about Hono.

Just try it

Everyone who has experienced application development with Hono and Cloudflare Workers says “the developer experience is a great experience“. If you haven’t experienced it yet, just try it.

See the Hono website for how to get started. If you are interested in reporting issues or contributing, please see the GitHub project. Plus, you can watch my interview about Hono on the YouTube Cloudflare Developers channel.

Wrapping up another Birthday Week celebration

2024-09-30 Kelly May Johnston

Post Syndicated from Kelly May Johnston original https://blog.cloudflare.com/birthday-week-2024-wrap-up

2024 marks Cloudflare’s 14th birthday. Birthday Week each year is packed with major announcements and the release of innovative new offerings, all focused on giving back to our customers and the broader Internet community. Birthday Week has become a proud tradition at Cloudflare and our culture, to not just stay true to our mission, but to always stay close to our customers. We begin planning for this week of celebration earlier in the year and invite everyone at Cloudflare to participate.

Months before Birthday Week, we invited teams to submit ideas for what to announce. We were flooded with submissions, from proposals for implementing new standards to creating new products for developers. Our biggest challenge is finding space for it all in just one week — there is still so much to build. Good thing we have a birthday to celebrate each year, but we might need an extra day in Birthday Week next year!

In case you missed it, here’s everything we announced during 2024’s Birthday Week:

Monday

What	In a sentence…
Start auditing and controlling the AI models accessing your content	Understand which AI-related bots and crawlers can access your website, and which content you choose to allow them to consume.
Making zone management more efficient with batch DNS record updates	Customers using Cloudflare to manage DNS can create a whole batch of records, enable proxying on many records, update many records to point to a new target at the same time, or even delete all of their records.
Introducing Ephemeral IDs: a new tool for fraud detection	Taking the next step in advancing security with Ephemeral IDs, a new feature that generates a unique short-lived ID, without relying on any network-level information.

Tuesday

What	In a sentence…
Cloudflare partners to deliver safer browsing experience to homes	Internet service, network, and hardware equipment providers can sign up and partner with Cloudflare to deliver a safer browsing experience to homes.
A safer Internet with Cloudflare: free threat intelligence, analytics, and new threat detections	Free threat intelligence, analytics, new threat detections, and more.
Automatically generating Cloudflare’s Terraform provider	The last pieces of the OpenAPI schemas ecosystem to now be automatically generated — the Terraform provider and API reference documentation.
Cloudflare helps verify the security of end-to-end encrypted messages by auditing key transparency for WhatsApp	Cloudflare helps verify the security of end-to-end encrypted messages by auditing key transparency for WhatsApp.

Wednesday

What	In a sentence…
Introducing Speed Brain: helping web pages load 45% faster	Speed Brain, our latest leap forward in speed, uses the Speculation Rules API to prefetch content for users’ likely next navigations — downloading web pages before they navigate to them and making pages load 45% faster.
Instant Purge: invalidating cached content in under 150ms	Instant Purge invalidates cached content in under 150ms, offering the industry’s fastest cache purge with global latency for purges by tags, hostnames, and prefixes.
New standards for a faster and more private Internet	Zstandard compression, Encrypted Client Hello, and more speed and privacy announcements all released for free.
TURN and anycast: making peer connections work globally	Starting today, Cloudflare Calls’ TURN service is now generally available to all Cloudflare accounts.
Cloudflare’s 12th Generation servers — 145% more performant and 63% more efficient	Next generation servers focused on exceptional performance and security, enhanced support for AI/ML workloads, and significant strides in power efficiency.

Thursday

What	In a sentence…
Startup Program revamped: build and grow on Cloudflare with up to $250,000 in credits	Eligible startups can now apply to receive up to $250,000 in credits to build using Cloudflare’s Developer Platform.
Cloudflare’s bigger, better, faster AI platform	More powerful GPUs, expanded model support, enhanced logging and evaluations in AI Gateway, and Vectorize GA with larger index sizes and faster queries.
Builder Day 2024: 18 big updates to the Workers platform	Persistent and queryable Workers logs, Node.js compatibility GA, improved Next.js support via OpenNext, built-in CI/CD for Workers, Gradual Deployments, Queues, and R2 Event Notifications GA, and more — making building on Cloudflare easier, faster, and more affordable.
Faster Workers KV	A deep dive into how we made Workers KV up to 3x faster.
Zero-latency SQLite storage in every Durable Object	Putting your application code into the storage layer, so your code runs where the data is stored.
Making Workers AI faster and more efficient: Performance optimization with KV cache compression and speculative decoding	Using new optimization techniques such as KV cache compression and speculative decoding, we’ve made large language model (LLM) inference lightning-fast on the Cloudflare Workers AI platform.

Friday

What	In a sentence…
Our container platform is in production. It has GPUs. Here’s an early look.	We’ve been working on something new — a platform for running containers across Cloudflare’s network. We already use it in production, for AI inference and more.
Advancing cybersecurity: Cloudflare implements a new bug bounty VIP program as part of CISA Pledge commitment	We implemented a new bug bounty VIP program this year as part of our CISA Pledge commitment.
Empowering builders: introducing the Dev Alliance and Workers Launchpad Cohort #4	Get free and discounted access to essential developer tools and meet the latest set of incredible startups building on Cloudflare.
Expanding our support for open source projects with Project Alexandria	Expanding our open source program and helping projects have a sustainable and scalable future, providing tools and protection needed to thrive.
Network trends and natural language: Cloudflare Radar’s new Data Explorer & AI Assistant	A simple Web-based interface to build more complex API queries, including comparisons and filters, and visualize the results.
AI Everywhere with the WAF Rule Builder Assistant, Cloudflare Radar AI Insights, and updated AI bot protection	Extending our AI Assistant capabilities to help you build new WAF rules, added new AI bot and crawler traffic insights to Radar, and new AI bot blocking capabilities.
Reaffirming our commitment to Free	Our free plan is here to stay, and we reaffirm that commitment this week with 15 releases that make the Free plan even better.

One more thing…

Cloudflare serves millions of customers and their millions of domains across nearly every country on Earth. However, as a global company, the payment landscape can be complex — especially in regions outside of North America. While credit cards are very popular for online purchases in the US, the global picture is quite different. 60% of consumers across EMEA, APAC and LATAM choose alternative payment methods. For instance, European consumers often opt for SEPA Direct Debit, a bank transfer mechanism, while Chinese consumers frequently use Alipay, a digital wallet.

At Cloudflare, we saw this as an opportunity to meet customers where they are. Today, we’re thrilled to announce that we are expanding our payment system and launching a closed beta for a new payment method called Stripe Link. The checkout experience will be faster and more seamless, allowing our self-serve customers to pay using saved bank accounts or cards with Link. Customers who have saved their payment details at any business using Link can quickly check out without having to reenter their payment information.

These are the first steps in our efforts to expand our payment system to support global payment methods used by customers around the world. We’ll be rolling out new payment methods gradually, ensuring a smooth integration and gathering feedback from our customers every step of the way.

Until next year

That’s all for Birthday Week 2024. However, the innovation never stops at Cloudflare. Continue to follow the Cloudflare Blog all year long as we launch more products and features that help build a better Internet.

Empowering builders: introducing the Dev Alliance and Workers Launchpad Cohort #4

2024-09-27 Melissa Kargiannakis

Post Syndicated from Melissa Kargiannakis original https://blog.cloudflare.com/launchpad-cohort4-dev-starter-pack

Today we’re announcing the Dev Starter Pack, an alliance of innovative tools for developers to get started with discounts and free services. We’re also excited to share an update on our Workers Launchpad Program.

Creating from the ground up often means spending countless hours piecing together the right development stack, navigating different pricing models, and managing growing costs — all of which can take your focus away from what truly matters: building your product and growing your business.

Introducing Dev Starter Pack: the tools you need to start building your startup

Hey! Dani Grant here, one of the first PMs at Cloudflare and co-founder of Jam.dev. Ten years ago (during 2014’s Birthday Week), Cloudflare launched Universal SSL, making SSL free on the Internet for the first time, and in one night doubling the size of the encrypted web.

I was a college student back then, and I immediately became enraptured by Cloudflare’s mission: helping build a better Internet. As part of this mission, Cloudflare has developed powerful tools typically accessible only to Internet giants, oftentimes offering them for free to developers and individuals alike. Heck yeah! I joined Cloudflare in January 2015, and 5 years after that, co-founded a developer tool company called Jam, inspired by the impact that I saw building tools for developers could have while at Cloudflare.

It’s now 10 years later, and a lot has changed –– “software ate the world” and it’s now powering all aspects of our lives, from health to finances to how we work. It’s more important than ever to empower every developer with the best tools available, because the faster we build software, the sooner people’s experiences improve.

Today we’re thrilled to announce the Dev Starter Pack, an alliance of like-minded dev tool companies giving away their services for free, or heavily discounting them for developers who want to start companies and build the future.

Not only does this stack include all the tools you need to build a startup, it also includes all the tools you need to build AI-powered features. We believe that the next wave of startups will be AI-native, as AI becomes as ubiquitous as the electricity that powers the servers.

We haven’t even scratched the surface of what’s possible with AI, and we hope this launch gets developers closer to solving the challenges of building non-deterministic software.

If you’re a software engineer, and you want to build a project or a company and need an off the shelf stack of dev tools to get started, go to devstarterpack.io to start using all of these tools.

Each provider is offering developers a heavily discounted or even free plan to get started building. You can redeem these services by either using the special code “devstarterpack” or selecting “Dev Starter Pack” while applying to relevant programs.

We welcome more tools to join the alliance — this is just the beginning. If you are building a developer tool and would like to include your product in the Dev Starter Pack, let us know here, so we can include you.

What will you build?

We are very excited to see what you will build. Please share with us in Cloudflare’s Discord and community forum, so we can support you however it makes sense.

Software developers are changing the world, and we believe in providing support to help you make an even greater impact. If you’re looking for additional funding or support, check out Cloudflare’s Launchpad for developers turned founders building startups.

Introducing Workers Launchpad Cohort #4

Melissa and Chris from the Cloudflare for Startups team here. Our team is blown away by what customers are demonstrating on the Developer Platform. Just a few weeks ago, our Workers Launchpad Cohort #3 wrapped up. On Demo Day, customers demoed their applications built on Cloudflare, spanning AI, dev tools, IaaS, observability, SaaS, media, and beyond. We’re incredibly proud of Cohort #3 participants, and we look forward to their continued success with Cloudflare.

Following Demo Day of Workers Launchpad Cohort #3, we’ve been excited to receive a surge of new applications from startups around the world. These startups are pushing the boundaries of innovation, particularly in areas like observability, PaaS, AI, automation, e-commerce, and other industries. Many startups that applied this go-around demonstrated that they’ve built some great applications on Cloudflare, and today, we’re excited to announce the accepted participants for our upcoming Workers Launchpad Cohort #4.

Let’s take a look at what Cohort #4 participants are building in their own words:

Adster	Hyperscale revenue powered by real-time data intelligence and AI
Almeta	Predict customer behavior on your website
Best Parents	Disruptive educational travel marketplace for Gen Z under 18
Comigo	Companion app to make therapy an engaging daily practice
Datastrato	A unified data catalog for generative AI infrastructure
Equimake	Create professional 3D projects without technical experience
Evefan	Your own Internet scale events infrastructure
Eventuall	Connecting stars with their fans in paid meet & greets and virtual experiences
Fermat	No-code solution to deploy AI models as internal tools
Fiberplane	Development tool that uses observability data to help test and debug APIs
Firetiger	An engineering observability tool that operates at scale inside customer infrastructure
Flightcast	Video-first podcast hosting & distribution
FlightLevel Technologies	AI Analytics and Footage in the aviation industry.
Gitlip	Powerful, collaborative and lightweight computing platform based on Git
GrackerAI	AI-powered organic growth engine for cybersecurity B2B SaaS
Hackernoon	Community-driven blogging network read by millions of technologists
Hanabi.REST	Prompt to REST API with AI-driven building, testing, and deployment
Infrastack	Next-gen application intelligence and observability platform for developers
June	AI productivity companion
Leed AI	Combined marketing workflows, website, and customer journey for a seamless, AI-accelerated experience
lookbk	Make the Internet more shoppable, starting with fashion on socials
Materialized Intelligence	Data-intensive inference solutions
Maxint	Multi-platform money management powered by AI
Midio	Visual tool to build software and AI agents
NikaPlanet	Transformative geospatial analytics experience with Google Colab, QGIS, ChatGPT, and Miro in one solution
NotHotDog	AI-Powered API Testing Tool
Outerbase	View, edit, query, and visualize your data with AI
Procureezy	AI procurement platform to empower hardware engineers to source smarter and launch sooner
Proma	Process management and automation platform to get work done fast
Render Better	Increase e-commerce revenue by optimizing your site speed, automatically
Sherpo	AI-first no-code platform to build and sell digital content
Speak_	AI platform to surface top talent by evaluating candidates against custom criteria
Tightknit	Embedded community engagement platform built for SaaS
Tinfoil	Powerful analytics with cryptographic privacy guarantees
Velvet	AI gateway to monitor, evaluate, and optimize features
Webstudio	An advanced visual site builder that connects to any headless CMS
Zipr	Streamlined visitor management

The Cloudflare team is ecstatic to work with the amazing participants of Cohort #4. If you want to follow along on Cohort #4’s journey, be sure to follow @CloudflareDev on X and join our Developer Discord server.

Are you a startup building on Cloudflare? Apply for Cohort #5!

Builder Day 2024: 18 big updates to the Workers platform

2024-09-27 Tanushree Sharma

Post Syndicated from Tanushree Sharma original https://blog.cloudflare.com/builder-day-2024-announcements

To celebrate Builder Day 2024, we’re shipping 18 updates inspired by direct feedback from developers building on Cloudflare. Choosing a platform isn’t just about current technologies and services — it’s about betting on a partner that will evolve with your needs as your project grows and the tech landscape shifts. We’re in it for the long haul with you.

Starting today, you can:

Persist logs from your Worker and query them directly on the Cloudflare dashboard
Connect your Worker to private databases (isolated in VPCs) using Hyperdrive
Use a wider set of NPM packages on Cloudflare Workers, via improved Node.js compatibility
Deploy Next.js apps that use the Node.js runtime to Cloudflare, via OpenNext
Run Evals with AI Gateway, now in Open Beta
Read from and write to SQLite with zero-latency from every Durable Object

We’ve brought key features from Pages to Workers, allowing you to:

Upload and serve static assets as part of your Worker, and use popular frameworks with Workers
Automatically build and deploy each pull request to your Worker’s git repository
Get back a preview deployment URL for each version of your Worker

Four things are going GA and are officially production-ready:

Gradual Deployments: Deploy changes to your Worker gradually, on a percentage basis of traffic
Cloudflare Queues: Now with much higher throughput and concurrency limits
R2 Event Notifications: Tightly integrated with Queues for event-driven applications
Vectorize: Globally distributed vector database, now faster, with larger indexes, and new pricing

The Workers platform is getting faster:

We made Workers KV up to 3x faster. Which makes serving static assets from Workers and Pages faster!
Workers AI now has much faster Time-to-First-Token (TTFT), backed by more powerful GPUs

And we’re lowering the cost of building on Cloudflare:

Requests made through Service Bindings and to Tail Workers are now free
Cloudflare Images is introducing a free tier for everyone with a Cloudflare account
We’ve simplified Workers AI pricing to use industry standard units of measure

Everything in this post is available for you to use today. Keep reading to learn more, and watch the Builder Day Live Stream for demos and more.

Persistent Logs for every Worker

Starting today in open beta, you can automatically retain logs from your Worker, with full search, query, and filtering capabilities available directly within the Cloudflare dashboard. All newly created Workers will have this setting automatically enabled. This marks the first step in the development of our observability platform, following Cloudflare’s acquisition of Baselime.

Getting started is easy – just add two lines to your Worker’s wrangler.toml and redeploy:

[observability]
enabled = true

Workers Logs allow you to view all logs emitted from your Worker. When enabled, each console.log message, error, and exception is published as a separate event. Every Worker invocation (i.e. requests, alarms, rpc, etc.) also publishes an enriched execution log that contains invocation metadata. You can view logs in the Logs tab of your Worker in the dashboard, where you can filter on any event field, such as time, error code, message, or your own custom field.

If you’ve ever had to piece together the puzzle of unusual metrics, such as a spike in errors or latency, you know how frustrating it is to connect metrics to traces and logs that often live in independent data silos. Workers Logs is the first piece of a new observability platform we are building that helps you easily correlate telemetry data, and surfaces insights to help you understand. We’ll structure your telemetry data so you have the full context to ask the right questions, and can quickly and easily analyze the behavior of your applications. This is just the beginning for observability tools for Workers. We are already working on automatically emitting distributed traces from Workers, with real time errors and wide, high dimensionality events coming soon as well.

Starting November 1, 2024, Workers Logs will cost $0.60 per million log lines written after the included volume, as shown in the table below. Querying your logs is free. This makes it easy to estimate and forecast your costs — we think you shouldn’t have to calculate the number of ‘Gigabytes Ingested’ to understand what you’ll pay.

	Workers Free	Workers Paid
Included Volume	200,000 logs per day	20,000,000 logs per month
Additional Events	N/A	$0.60 per million logs
Retention	3 days	7 days

Try out Workers Logs today. You can learn more from our developer documentation, and give us feedback directly in the #workers-observability channel on Discord.

Connect to private databases from Workers

Starting today, you can now use Hyperdrive, Cloudflare Tunnels and Access together to securely connect to databases that are isolated in a private network.

Hyperdrive enables you to build on Workers with your existing regional databases. It accelerates database queries using Cloudflare’s network, caching data close to end users and pooling connections close to the database. But there’s been a major blocker preventing you from building with Hyperdrive: network isolation.

The majority of databases today aren’t publicly accessible on the Internet. Data is highly sensitive and placing databases within private networks like a virtual private cloud (VPC) keeps data secure. But to date, that has also meant that your data is held captive within your cloud provider, preventing you from building on Workers.

Today, we’re enabling Hyperdrive to securely connect to private databases using Cloudflare Tunnels and Cloudflare Access. With a Cloudflare Tunnel running in your private network, Hyperdrive can securely connect to your database and start speeding up your queries.

With this update, Hyperdrive makes it possible for you to build full-stack applications on Workers with your existing databases, network-isolated or not. Whether you’re using Amazon RDS, Amazon Aurora, Google Cloud SQL, Azure Database, or any other provider, Hyperdrive can connect to your databases and optimize your database connections to provide the fast performance you’ve come to expect with building on Workers.

Improved Node.js compatibility is now GA

Earlier this month, we overhauled our support for Node.js APIs in the Workers runtime. With twice as many Node APIs now supported on Workers, you can now use a wider set of NPM packages to build a broader range of applications. Today, we’re happy to announce that improved Node.js compatibility is GA.

To give it a try, enable the nodejs_compat compatibility flag, and set your compatibility date to on or after 2024-09-23:

compatibility_flags = ["nodejs_compat"]
compatibility_date = "2024-09-23"

Read the developer documentation to learn more about how to opt-in your Workers to try it today. If you encounter any bugs or want to report feedback, open an issue.

Build frontend applications on Workers with Static Asset Hosting

Starting today in open beta, you now can upload and serve HTML, CSS, and client-side JavaScript directly as part of your Worker. This means you can build dynamic, server-side rendered applications on Workers using popular frameworks such as Astro, Remix, Next.js and Svelte (full list here), with more coming soon.

You can now deploy applications to Workers that previously could only be deployed to Cloudflare Pages and use features that are not yet supported in Pages, including Logpush, Hyperdrive, Cron Triggers, Queue Consumers, and Gradual Deployments.

To get started, create a new project with create-cloudflare. For example, to create a new Astro project:

npm create cloudflare@latest -- my-astro-app --framework=astro --experimental

Visit our developer documentation to learn more about setting up a new front-end application on Workers and watch a quick demo to learn about how you can deploy an existing application to Workers. Static assets aren’t just for Workers written in JavaScript! You can serve static assets from Workers written in Python or even deploy a Leptos app using workers-rs.

If you’re wondering “What about Pages?” — rest assured, Pages will remain fully supported. We’ve heard from developers that as we’ve added new features to Workers and Pages, the choice of which product to use has become challenging. We’re closing this gap by bringing asset hosting, CI/CD and Preview URLs to Workers this Birthday Week.

To make the upfront choice Cloudflare Workers and Pages more transparent, we’ve created a compatibility matrix. Looking ahead, we plan to bridge the remaining gaps between Workers and Pages and provide ways to migrate your Pages projects to Workers.

Cloudflare joins OpenNext to deploy Next.js apps to Workers

Starting today, as an early developer preview, you can use OpenNext to deploy Next.js apps to Cloudflare Workers via @opennextjs/cloudflare, a new npm package that lets you use the Node.js “runtime” in Next.js on Workers.

This new adapter is powered by our new Node.js compatibility layer, newly introduced Static Assets for Workers, and Workers KV, which is now up to 3x faster. It unlocks support for Incremental Static Regeneration (ISR), custom error pages, and other Next.js features that our previous adapter, @cloudflare/next-on-pages, could not support, as it was only compatible with the Edge “runtime” in Next.js.

Cloud providers shouldn’t lock you in. Like cloud compute and storage, open source frameworks should be portable — you should be able to deploy them to different cloud providers. The goal of the OpenNext project is to make sure you can deploy Next.js apps to any cloud platform, originally to AWS, and now Cloudflare. We’re excited to contribute to the OpenNext community, and give developers the freedom to run on the cloud that fits their applications needs (and budget) best.

To get started by reading the OpenNext docs, which provide examples and a guide on how to add @opennextjs/cloudflare to your Next.js app.

We want your feedback! Report issues and contribute code at opennextjs/opennextjs-cloudflare on GitHub, and join the discussion on the OpenNext Discord.

npm create cloudflare@latest -- my-next-app --framework=next --experimental

We want your feedback! Report issues and contribute code at opennextjs/opennextjs-cloudflare on GitHub, and join the discussion on the OpenNext Discord.

Continuous Integration & Delivery (CI/CD) with Workers Builds

Now in open beta, you can connect a GitHub or GitLab repository to a Worker, and Cloudflare will automatically build and deploy your changes each time you push a commit. Workers Builds provides an integrated CI/CD workflow you can use to build and deploy everything from full-stack applications built with the most popular frameworks to simple static websites. Just add your build command and let Workers Builds take care of the rest.

While in open beta, Workers Builds is free to use, with a limit of one concurrent build per account, and unlimited build minutes per month. Once Workers Builds is Generally Available in early 2025, you will be billed based on the number of build minutes you use each month, and have a higher number of concurrent builds.

	Workers Free	Workers Paid
Build minutes, open beta	Unlimited	Unlimited
Concurrent builds, open beta	1	1
Build minutes, general availability	3,000 minutes included per month	6,000 minutes included per month +$0.005 per additional build minute
Concurrent builds, general availability	1	6

Read the docs to learn more about how to deploy your first project with Workers Builds.

Workers preview URLs

Each newly uploaded version of a Worker now automatically generates a preview URL. Preview URLs make it easier for you to collaborate with your team during development, and can be used to test and identify issues in a preview environment before they are deployed to production.

When you upload a version of your Worker via the Wrangler CLI, Wrangler will display the preview URL once your upload succeeds. You can also find preview URLs for each version of your Worker in the Cloudflare dashboard:

Preview URLs for Workers are similar to Pages preview deployments — they run on your Worker’s workers.dev subdomain and allow you to view changes applied on a new version of your application before the changes are deployed.

Learn more about preview URLs by visiting our developer documentation.

Safely release to production with Gradual Deployments

At Developer Week, we launched Gradual Deployments for Workers and Durable Objects to make it safer and easier to deploy changes to your applications. Gradual Deployments is now GA — we have been using it ourselves at Cloudflare for mission-critical services built on Workers since early 2024.

Gradual deployments can help you stay on top of availability SLAs and minimize application downtime by surfacing issues early. Internally at Cloudflare, every single service built on Workers uses gradual deployments to roll out new changes. Each new version gets released in stages —– 0.05%, 0.5%, 3%, 10%, 25%, 50%, 75% and 100% with time to soak between each stage. Throughout the roll-out, we keep an eye on metrics (which are often instrumented with Workers Analytics Engine!) and we roll back if we encounter issues.

Using gradual deployments is as simple as swapping out the wrangler commands, API endpoints, and/or using “Save version” in the code editor that is built into the Workers dashboard. Read the developer documentation to learn more and get started.

Queues is GA, with higher throughput and concurrency limits

Cloudflare Queues is now generally available with higher limits.

Queues let a developer decouple their Workers into event driven services. Producer Workers write events to a Queue, and consumer Workers are invoked to take actions on the events. For example, you can use a Queue to decouple an e-commerce website from a service which sends purchase confirmation emails to users.

Throughput and concurrency limits for Queues are now significantly higher, which means you can push more messages through a Queue, and consume them faster.

Throughput: Each queue can now process 5000 messages per second (previously 400 per second).
Concurrency: Each queue can now have up to 250 concurrent consumers (previously 20 concurrent consumers).

Since we announced Queues in beta, we’ve added the following functionality:

Batch sizes can be customized, to reduce the number of consumer Worker invocations and thus reduce cost.
Individual messages can be delayed, so you can back off due to external API rate limits.
HTTP Pull consumers allow messages to be consumed outside Workers, with zero data egress costs.

Queues can be used by any developer on a Workers Paid plan. Head over to our getting started guide to start building with Queues.

Event notifications for R2 is now GA

We’re excited to announce that event notifications for R2 is now generally available. Whether it’s kicking off image processing after a user uploads a file or triggering a sync to an external data warehouse when new analytics data is generated, many applications need to be able to reliably respond when events happen. Event notifications for Cloudflare R2 give you the ability to build event-driven applications and workflows that react to changes in your data.

Here’s how it works: When data in your R2 bucket changes, event notifications are sent to your queue. You can consume these notifications with a consumer Worker or pull them over HTTP from outside of Cloudflare Workers.

Since we introduced event notifications in open beta earlier this year, we’ve made significant improvements based on your feedback:

We increased reliability of event notifications with throughput improvements from Queues. R2 event notifications can now scale to thousands of writes per second.
You can now configure event notifications directly from the Cloudflare dashboard (in addition to Wrangler).
There is now support for receiving notifications triggered by object lifecycle deletes.
You can now set up multiple notification rules for a single queue on a bucket.

Visit our documentation to learn about how to set up event notifications for your R2 buckets.

Removing the serverless microservices tax: No more request fees for Service Bindings and Tail Workers

Earlier this year, we quietly changed Workers pricing to lower your costs. As of July 2024, you are no longer charged for requests between Workers on your account made via Service Bindings, or for invocations of Tail Workers. For example, let’s say you have the following chain of Workers:

Each request from a client results in three Workers invocations. Previously, we charged you for each of these invocations, plus the CPU time for each of these Workers. With this change, we only charge you for the first request from the client, plus the CPU time used by each Worker.

This eliminates the additional cost of breaking a monolithic serverless app into microservices. In 2023, we introduced new pricing based on CPU time, rather than duration, so you don’t have to worry about being billed for time spent waiting on I/O. This includes I/O to other Workers. With this change, you’re only billed for the first request in the chain, eliminating the other additional cost of using multiple Workers.

When you build microservices on Workers, you face fewer trade offs than on other compute platforms. Service bindings have zero network overhead by default, a built-in JavaScript RPC system, and a security model with fewer footguns and simpler configuration. We’re excited to improve this further with this pricing change.

Image optimization is available to everyone for free — no subscription needed

Starting today, you can use Cloudflare Images for free to optimize your images with up to 5,000 transformations per month.

Large, oversized images can throttle your application speed and page load times. We built Cloudflare Images to let you dynamically optimize images in the correct dimensions and formats for each use case, all while storing only the original image.

In the spirit of Birthday Week, we’re making image optimization available to everyone with a Cloudflare account, no subscription needed. You’ll be able to use Images to transform images that are stored outside of Images, such as in R2.

Transformations are served from your zone through a specially formatted URL with parameters that specify how an image should be optimized. For example, the transformation URL below uses the format parameter to automatically serve the image in the most optimal format for the requesting browser:

https://example.com/cdn-cgi/image/format=auto/thumbnail.png

This means that the original PNG image may be served as AVIF to one user and WebP to another. Without a subscription, transforming images from remote sources is free up to 5,000 unique transformations per month. Once you exceed this limit, any already cached transformations will continue to be served, but you’ll need a paid Images plan to request new transformations or to purchase storage within Images.

To get started, navigate to Images in the dashboard to enable transformations on your zone.

Dive deep into more announcements from Builder Day

We shipped so much that we couldn’t possibly fit it all in one blog post. These posts dive into the technical details of what we’re announcing at Builder Day:

Cloudflare’s Bigger, Better, Faster AI platform
Making Workers AI faster with KV cache compression, speculative decoding, and upgraded hardware
We made Workers KV up to 3x faster — here’s the data
Zero-latency SQLite storage in every Durable Object

Build the next big thing on Cloudflare

Cloudflare is for builders, and everything we’re announcing at Builder Day, you can start building with right away. We’re now offering $250,000 in credits to use on our Developer Platform to qualified startups, so that you can get going even faster, and become the next company to reach hypergrowth scale with a small team, and not waste time provisioning infrastructure and doing undifferentiated heavy lifting. Focus on shipping, and we’ll take care of the rest.

Apply to the startup program here, or stop by and say hello in the Cloudflare Developers Discord.

Zero-latency SQLite storage in every Durable Object

2024-09-26 Kenton Varda

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/sqlite-in-durable-objects

Traditional cloud storage is inherently slow, because it is normally accessed over a network and must carefully synchronize across many clients that could be accessing the same data. But what if we could instead put your application code deep into the storage layer, such that your code runs directly on the machine where the data is stored, and the database itself executes as a local library embedded inside your application?

Durable Objects (DO) are a novel approach to cloud computing which accomplishes just that: Your application code runs exactly where the data is stored. Not just on the same machine: your storage lives in the same thread as the application, requiring not even a context switch to access. With proper use of caching, storage latency is essentially zero, while nevertheless being durable and consistent.

Until today, DOs only offered key/value oriented storage. But now, they support a full SQL query interface with tables and indexes, through the power of SQLite.

SQLite is the most-used SQL database implementation in the world, with billions of installations. It’s on practically every phone and desktop computer, and many embedded devices use it as well. It’s known to be blazingly fast and rock solid. But it’s been less common on the server. This is because traditional cloud architecture favors large distributed databases that live separately from application servers, while SQLite is designed to run as an embedded library. In this post, we’ll show you how Durable Objects turn this architecture on its head and unlock the full power of SQLite in the cloud.

Refresher: what are Durable Objects?

Durable Objects (DOs) are a part of the Cloudflare Workers serverless platform. A DO is essentially a small server that can be addressed by a unique name and can keep state both in-memory and on-disk. Workers running anywhere on Cloudflare’s network can send messages to a DO by its name, and all messages addressed to the same name — from anywhere in the world — will find their way to the same DO instance.

DOs are intended to be small and numerous. A single application can create billions of DOs distributed across our global network. Cloudflare automatically decides where a DO should live based on where it is accessed, automatically starts it up as needed when requests arrive, and shuts it down when idle. A DO has in-memory state while running and can also optionally store long-lived durable state. Since there is exactly one DO for each name, a DO can be used to coordinate between operations on the same logical object.

For example, imagine a real-time collaborative document editor application. Many users may be editing the same document at the same time. Each user’s changes must be broadcast to other users in real time, and conflicts must be resolved. An application built on DOs would typically create one DO for each document. The DO would receive edits from users, resolve conflicts, broadcast the changes back out to other users, and keep the document content updated in its local storage.

DOs are especially good at real-time collaboration, but are by no means limited to this use case. They are general-purpose servers that can implement any logic you desire to serve requests. Even more generally, DOs are a basic building block for distributed systems.

When using Durable Objects, it’s important to remember that they are intended to scale out, not up. A single object is inherently limited in throughput since it runs on a single thread of a single machine. To handle more traffic, you create more objects. This is easiest when different objects can handle different logical units of state (like different documents, different users, or different “shards” of a database), where each unit of state has low enough traffic to be handled by a single object. But sometimes, a lot of traffic needs to modify the same state: consider a vote counter with a million users all trying to cast votes at once. To handle such cases with Durable Objects, you would need to create a set of objects that each handle a subset of traffic and then replicate state to each other. Perhaps they use CRDTs in a gossip network, or perhaps they implement a fan-in/fan-out approach to a single primary object. Whatever approach you take, Durable Objects make it fast and easy to create more stateful nodes as needed.

Why is SQLite-in-DO so fast?

In traditional cloud architecture, stateless application servers run business logic and communicate over the network to a database. Even if the network is local, database requests still incur latency, typically measured in milliseconds.

When a Durable Object uses SQLite, SQLite is invoked as a library. This means the database code runs not just on the same machine as the DO, not just in the same process, but in the very same thread. Latency is effectively zero, because there is no communication barrier between the application and SQLite. A query can complete in microseconds.

Reads and writes are synchronous

The SQL query API in DOs does not require you to await results — they are returned synchronously:

// No awaits!
let cursor = sql.exec("SELECT name, email FROM users");
for (let user of cursor) {
  console.log(user.name, user.email);
}

This may come as a surprise to some. Querying a database is I/O, right? I/O should always be asynchronous, right? Isn’t this a violation of the natural order of JavaScript?

It’s OK! The database content is probably cached in memory already, and SQLite is being called as a library in the same thread as the application, so the query often actually won’t spend any time at all waiting for I/O. Even if it does have to go to disk, it’s a local SSD. You might as well consider the local disk as just another layer in the memory cache hierarchy: L5 cache, if you will. In any case, it will respond quickly.

Meanwhile, synchronous queries provide some big benefits. First, the logistics of asynchronous event loops have a cost, so in the common case where the data is already in memory, a synchronous query will actually complete faster than an async one.

More importantly, though, synchronous queries help you avoid subtle bugs. Any time your application awaits a promise, it’s possible that some other code executes while you wait. The state of the world may have changed by the time your await completes. Maybe even other SQL queries were executed. This can lead to subtle bugs that are hard to reproduce because they require events to happen at just the wrong time. With a synchronous API, though, none of that can happen. Your code always executes in the order you wrote it, uninterrupted.

Fast writes with Output Gates

Database experts might have a deeper objection to synchronous queries: Yes, caching may mean we can perform reads and writes very fast. However, in the case of a write, just writing to cache isn’t good enough. Before we return success to our client, we must confirm that the write is actually durable, that is, it has actually made it onto disk or network storage such that it cannot be lost if the power suddenly goes out.

Normally, a database would confirm all writes before returning to the application. So if the query is successful, it is confirmed. But confirming writes can be slow, because it requires waiting for the underlying storage medium to respond. Normally, this is OK because the write is performed asynchronously, so the program can go on and work on other things while it waits for the write to finish. It looks kind of like this:

But I just told you that in Durable Objects, writes are synchronous. While a synchronous call is running, no other code in the program can run (because JavaScript does not have threads). This is convenient, as mentioned above, because it means you don’t need to worry that the state of the world may have changed while you were waiting. However, if write queries have to wait a while, and the whole program must pause and wait for them, then throughput will suffer.

Luckily, in Durable Objects, writes do not have to wait, due to a little trick we call “Output Gates”.

In DOs, when the application issues a write, it continues executing without waiting for confirmation. However, when the DO then responds to the client, the response is blocked by the “Output Gate”. This system holds the response until all storage writes relevant to the response have been confirmed, then sends the response on its way. In the rare case that the write fails, the response will be replaced with an error and the Durable Object itself will restart. So, even though the application constructed a “success” response, nobody can ever see that this happened, and thus nobody can be misled into believing that the data was stored.

Let’s see what this looks like with multiple requests:

If you compare this against the first diagram above, you should notice a few things:

The timing of requests and confirmations are the same.
But, all responses were sent to the client sooner than in the first diagram. Latency was reduced! This is because the application is able to work on constructing the response in parallel with the storage layer confirming the write.
Request handling is no longer interleaved between the three requests. Instead, each request runs to completion before the next begins. The application does not need to worry, during the handling of one request, that its state might change unexpectedly due to a concurrent request.

With Output Gates, we get the ease-of-use of synchronous writes, while also getting lower latency and no loss of throughput.

N+1 selects? No problem.

Zero-latency queries aren’t just faster, they allow you to structure your code differently, often making it simpler. A classic example is the “N+1 selects” or “N+1 queries” problem. Let’s illustrate this problem with an example:

// N+1 SELECTs example

// Get the 100 most-recently-modified docs.
let docs = sql.exec(`
  SELECT title, authorId FROM documents
  ORDER BY lastModified DESC
  LIMIT 100
`).toArray();

// For each returned document, get the author name from the users table.
for (let doc of docs) {
  doc.authorName = sql.exec(
      "SELECT name FROM users WHERE id = ?", doc.authorId).one().name;
}

If you are an experienced SQL user, you are probably cringing at this code, and for good reason: this code does 101 queries! If the application is talking to the database across a network with 5ms latency, this will take 505ms to run, which is slow enough for humans to notice.

// Do it all in one query with a join?
let docs = sql.exec(`
  SELECT documents.title, users.name
  FROM documents JOIN users ON documents.authorId = users.id
  ORDER BY documents.lastModified DESC
  LIMIT 100
`).toArray();

Here we’ve used SQL features to turn our 101 queries into one query. Great! Except, what does it mean? We used an inner join, which is not to be confused with a left, right, or cross join. What’s the difference? Honestly, I have no idea! I had to look up joins just to write this example and I’m already confused.

Well, good news: You don’t need to figure it out. Because when using SQLite as a library, the first example above works just fine. It’ll perform about the same as the second fancy version.

More generally, when using SQLite as a library, you don’t have to learn how to do fancy things in SQL syntax. Your logic can be in regular old application code in your programming language of choice, orchestrating the most basic SQL queries that are easy to learn. It’s fine. The creators of SQLite have made this point themselves.

Point-in-Time Recovery

While not necessarily related to speed, SQLite-backed Durable Objects offer another feature: any object can be reverted to the state it had at any point in time in the last 30 days. So if you accidentally execute a buggy query that corrupts all your data, don’t worry: you can recover. There’s no need to opt into this feature in advance; it’s on by default for all SQLite-backed DOs. See the docs for details.

How do I use it?

Let’s say we’re an airline, and we are implementing a way for users to choose their seats on a flight. We will create a new Durable Object for each flight. Within that DO, we will use a SQL table to track the assignments of seats to passengers. The code might look something like this:

import {DurableObject} from "cloudflare:workers";

// Manages seat assignment for a flight.
//
// This is an RPC interface. The methods can be called remotely by other Workers
// running anywhere in the world. All Workers that specify same object ID
// (probably based on the flight number and date) will reach the same instance of
// FlightSeating.
export class FlightSeating extends DurableObject {
  sql = this.ctx.storage.sql;

  // Application calls this when the flight is first created to set up the seat map.
  initializeFlight(seatList) {
    this.sql.exec(`
      CREATE TABLE seats (
        seatId TEXT PRIMARY KEY,  -- e.g. "3B"
        occupant TEXT             -- null if available
      )
    `);

    for (let seat of seatList) {
      this.sql.exec(`INSERT INTO seats VALUES (?, null)`, seat);
    }
  }

  // Get a list of available seats.
  getAvailable() {
    let results = [];

    // Query returns a cursor.
    let cursor = this.sql.exec(`SELECT seatId FROM seats WHERE occupant IS NULL`);

    // Cursors are iterable.
    for (let row of cursor) {
      // Each row is an object with a property for each column.
      results.push(row.seatId);
    }

    return results;
  }

  // Assign passenger to a seat.
  assignSeat(seatId, occupant) {
    // Check that seat isn't occupied.
    let cursor = this.sql.exec(`SELECT occupant FROM seats WHERE seatId = ?`, seatId);
    let result = [...cursor][0];  // Get the first result from the cursor.
    if (!result) {
      throw new Error("No such seat: " + seatId);
    }
    if (result.occupant !== null) {
      throw new Error("Seat is occupied: " + seatId);
    }

    // If the occupant is already in a different seat, remove them.
    this.sql.exec(`UPDATE seats SET occupant = null WHERE occupant = ?`, occupant);

    // Assign the seat. Note: We don't have to worry that a concurrent request may
    // have grabbed the seat between the two queries, because the code is synchronous
    // (no `await`s) and the database is private to this Durable Object. Nothing else
    // could have changed since we checked that the seat was available earlier!
    this.sql.exec(`UPDATE seats SET occupant = ? WHERE seatId = ?`, occupant, seatId);
  }
}

(With just a little more code, we could extend this example to allow clients to subscribe to seat changes with WebSockets, so that if multiple people are choosing their seats at the same time, they can see in real time as seats become unavailable. But, that’s outside the scope of this blog post, which is just about SQL storage.)

Then in wrangler.toml, define a migration setting up your DO class like usual, but instead of using new_classes, use new_sqlite_classes:

[[migrations]]
tag = "v1"
new_sqlite_classes = ["FlightSeating"]

SQLite-backed objects also support the existing key/value-based storage API: KV data is stored into a hidden table in the SQLite database. So, existing applications built on DOs will work when deployed using SQLite-backed objects.

However, because SQLite-backed objects are based on an all-new storage backend, it is currently not possible to switch an existing deployed DO class to use SQLite. You must ask for SQLite when initially deploying the new DO class; you cannot change it later. We plan to begin migrating existing DOs to the new storage backend in 2025.

Pricing

We’ve kept pricing for SQLite-in-DO similar to D1, Cloudflare’s serverless SQL database, by billing for SQL queries (based on rows) and SQL storage. SQL storage per object is limited to 1 GB during the beta period, and will be increased to 10 GB on general availability. DO requests and duration billing are unchanged and apply to all DOs regardless of storage backend.

During the initial beta, billing is not enabled for SQL queries (rows read and rows written) and SQL storage. SQLite-backed objects will incur charges for requests and duration. We plan to enable SQL billing in the first half of 2025 with advance notice.

	Workers Paid
Rows read	First 25 billion / month included + $0.001 / million rows
Rows written	First 50 million / month included + $1.00 / million rows
SQL storage	5 GB-month + $0.20/ GB-month

For more on how to use SQLite-in-Durable Objects, check out the documentation.

What about D1?

Cloudflare Workers already offers another SQLite-backed database product: D1. In fact, D1 is itself built on SQLite-in-DO. So, what’s the difference? Why use one or the other?

In short, you should think of D1 as a more “managed” database product, while SQLite-in-DO is more of a lower-level “compute with storage” building block.

D1 fits into a more traditional cloud architecture, where stateless application servers talk to a separate database over the network. Those application servers are typically Workers, but could also be clients running outside of Cloudflare. D1 also comes with a pre-built HTTP API and managed observability features like query insights. With D1, where your application code and SQL database queries are not colocated like in SQLite-in-DO, Workers has Smart Placement to dynamically run your Worker in the best location to reduce total request latency, considering everything your Worker talks to, including D1. By the end of 2024, D1 will support automatic read replication for scalability and low-latency access around the world. If this managed model appeals to you, use D1.

Durable Objects require a bit more effort, but in return, give you more power. With DO, you have two pieces of code that run in different places: a front-end Worker which routes incoming requests from the Internet to the correct DO, and the DO itself, which runs on the same machine as the SQLite database. You may need to think carefully about which code to run where, and you may need to build some of your own tooling that exists out-of-the-box with D1. But because you are in full control, you can tailor the solution to your application’s needs and potentially achieve more.

Under the hood: Storage Relay Service

When Durable Objects first launched in 2020, it offered only a simple key/value-based interface for durable storage. Under the hood, these keys and values were stored in a well-known off-the-shelf database, with regional instances of this database deployed to locations in our data centers around the world. Durable Objects in each region would store their data to the regional database.

For SQLite-backed Durable Objects, we have completely replaced the persistence layer with a new system built from scratch, called Storage Relay Service, or SRS. SRS has already been powering D1 for over a year, and can now be used more directly by applications through Durable Objects.

SRS is based on a simple idea:

Local disk is fast and randomly-accessible, but expensive and prone to disk failures. Object storage (like R2) is cheap and durable, but much slower than local disk and not designed for database-like access patterns. Can we get the best of both worlds by using a local disk as a cache on top of object storage?

So, how does it work?

The mismatch in functionality between local disk and object storage

A SQLite database on disk tends to undergo many small changes in rapid succession. Any row of the database might be updated by any particular query, but the database is designed to avoid rewriting parts that didn’t change. Read queries may randomly access any part of the database. Assuming the right indexes exist to support the query, they should not require reading parts of the database that aren’t relevant to the results, and should complete in microseconds.

Object storage, on the other hand, is designed for an entirely different usage model: you upload an entire “object” (blob of bytes) at a time, and download an entire blob at a time. Each blob has a different name. For maximum efficiency, blobs should be fairly large, from hundreds of kilobytes to gigabytes in size. Latency is relatively high, measured in tens or hundreds of milliseconds.

So how do we back up our SQLite database to object storage? An obviously naive strategy would be to simply make a copy of the database files from time to time and upload it as a new “object”. But, uploading the database on every change — and making the application wait for the upload to complete — would obviously be way too slow. We could choose to upload the database only occasionally — say, every 10 minutes — but this means in the case of a disk failure, we could lose up to 10 minutes of changes. Data loss is, uh, bad! And even then, for most databases, it’s likely that most of the data doesn’t change every 10 minutes, so we’d be uploading the same data over and over again.

Trick one: Upload a log of changes

Instead of uploading the entire database, SRS records a log of changes, and uploads those.

Conveniently, SQLite itself already has a concept of a change log: the Write-Ahead Log, or WAL. SRS always configures SQLite to use WAL mode. In this mode, any changes made to the database are first written to a separate log file. From time to time, the database is “checkpointed”, merging the changes back into the main database file. The WAL format is well-documented and easy to understand: it’s just a sequence of “frames”, where each frame is an instruction to write some bytes to a particular offset in the database file.

SRS monitors changes to the WAL file (by hooking SQLite’s VFS to intercept file writes) to discover the changes being made to the database, and uploads those to object storage.

Unfortunately, SRS cannot simply upload every single change as a separate “object”, as this would result in too many objects, each of which would be inefficiently small. Instead, SRS batches changes over a period of up to 10 seconds, or up to 16 MB worth, whichever happens first, then uploads the whole batch as a single object.

When reconstructing a database from object storage, we must download the series of change batches and replay them in order. Of course, if the database has undergone many changes over a long period of time, this can get expensive. In order to limit how far back it needs to look, SRS also occasionally uploads a snapshot of the entire content of the database. SRS will decide to upload a snapshot any time that the total size of logs since the last snapshot exceeds the size of the database itself. This heuristic implies that the total amount of data that SRS must download to reconstruct a database is limited to no more than twice the size of the database. Since we can delete data from object storage that is older than the latest snapshot, this also means that our total stored data is capped to 2x the database size.

Credit where credit is due: This idea — uploading WAL batches and snapshots to object storage — was inspired by Litestream, although our implementation is different.

Trick two: Relay through other servers in our global network

Batches are only uploaded to object storage every 10 seconds. But obviously, we cannot make the application wait for 10 whole seconds just to confirm a write. So what happens if the application writes some data, returns a success message to the user, and then the machine fails 9 seconds later, losing the data?

To solve this problem, we take advantage of our global network. Every time SQLite commits a transaction, SRS will immediately forward the change log to five “follower” machines across our network. Once at least three of these followers respond that they have received the change, SRS informs the application that the write is confirmed. (As discussed earlier, the write confirmation opens the Durable Object’s “output gate”, unblocking network communications to the rest of the world.)

When a follower receives a change, it temporarily stores it in a buffer on local disk, and then awaits further instructions. Later on, once SRS has successfully uploaded the change to object storage as part of a batch, it informs each follower that the change has been persisted. At that point, the follower can simply delete the change from its buffer.

However, if the follower never receives the persisted notification, then, after some timeout, the follower itself will upload the change to object storage. Thus, if the machine running the database suddenly fails, as long as at least one follower is still running, it will ensure that all confirmed writes are safely persisted.

Each of a database’s five followers is located in a different physical data center. Cloudflare’s network consists of hundreds of data centers around the world, which means it is always easy for us to find four other data centers nearby any Durable Object (in addition to the one it is running in). In order for a confirmed write to be lost, then, at least four different machines in at least three different physical buildings would have to fail simultaneously (three of the five followers, plus the Durable Object’s host machine). Of course, anything can happen, but this is exceedingly unlikely.

Followers also come in handy when a Durable Object’s host machine is unresponsive. We may not know for sure if the machine has died completely, or if it is still running and responding to some clients but not others. We cannot start up a new instance of the DO until we know for sure that the previous instance is dead – or, at least, that it can no longer confirm writes, since the old and new instances could then confirm contradictory writes. To deal with this situation, if we can’t reach the DO’s host, we can instead try to contact its followers. If we can contact at least three of the five followers, and tell them to stop confirming writes for the unreachable DO instance, then we know that instance is unable to confirm any more writes going forward. We can then safely start up a new instance to replace the unreachable one.

Bonus feature: Point-in-Time Recovery

I mentioned earlier that SQLite-backed Durable Objects can be asked to revert their state to any time in the last 30 days. How does this work?

This was actually an accidental feature that fell out of SRS’s design. Since SRS stores a complete log of changes made to the database, we can restore to any point in time by replaying the change log from the last snapshot. The only thing we have to do is make sure we don’t delete those logs too soon.

Normally, whenever a snapshot is uploaded, all previous logs and snapshots can then be deleted. But instead of deleting them immediately, SRS merely marks them for deletion 30 days later. In the meantime, if a point-in-time recovery is requested, the data is still there to work from.

For a database with a high volume of writes, this may mean we store a lot of data for a lot longer than needed. As it turns out, though, once data has been written at all, keeping it around for an extra month is pretty cheap — typically cheaper, even, than writing it in the first place. It’s a small price to pay for always-on disaster recovery.

Get started with SQLite-in-DO

SQLite-backed DOs are available in beta starting today. You can start building with SQLite-in-DO by visiting developer documentation and provide beta feedback via the #durable-objects channel on our Developer Discord.

Do distributed systems like SRS excite you? Would you like to be part of building them at Cloudflare? We’re hiring!

Making Workers AI faster and more efficient: Performance optimization with KV cache compression and speculative decoding

2024-09-26 Isaac Rehg

Post Syndicated from Isaac Rehg original https://blog.cloudflare.com/making-workers-ai-faster

During Birthday Week 2023, we launched Workers AI. Since then, we have been listening to your feedback, and one thing we’ve heard consistently is that our customers want Workers AI to be faster. In particular, we hear that large language model (LLM) generation needs to be faster. Users want their interactive chat and agents to go faster, developers want faster help, and users do not want to wait for applications and generated website content to load. Today, we’re announcing three upgrades we’ve made to Workers AI to bring faster and more efficient inference to our customers: upgraded hardware, KV cache compression, and speculative decoding.

Thanks to Cloudflare’s 12th generation compute servers, our network now supports a newer generation of GPUs capable of supporting larger models and faster inference. Customers can now use Meta Llama 3.2 11B, Meta’s newly released multi-modal model with vision support, as well as Meta Llama 3.1 70B on Workers AI. Depending on load and time of day, customers can expect to see two to three times the throughput for Llama 3.1 and 3.2 compared to our previous generation Workers AI hardware. More performance information for these models can be found in today’s post: Cloudflare’s Bigger, Better, Faster AI platform.

New KV cache compression methods, now open source

In our effort to deliver low-cost low-latency inference to the world, Workers AI has been developing novel methods to boost efficiency of LLM inference. Today, we’re excited to announce a technique for KV cache compression that can help increase throughput of an inference platform. And we’ve made it open source too, so that everyone can benefit from our research.

It’s all about memory

One of the main bottlenecks when running LLM inference is the amount of vRAM (memory) available. Every word that an LLM processes generates a set of vectors that encode the meaning of that word in the context of any earlier words in the input that are used to generate new tokens in the future. These vectors are stored in the KV cache, causing the memory required for inference to scale linearly with the total number of tokens of all sequences being processed. This makes memory a bottleneck for a lot of transformer-based models. Because of this, the amount of memory an instance has available limits the number of sequences it can generate concurrently, as well as the maximum token length of sequences it can generate.

So what is the KV cache anyway?

LLMs are made up of layers, with an attention operation occurring in each layer. Within each layer’s attention operation, information is collected from the representations of all previous tokens that are stored in cache. This means that vectors in the KV cache are organized into layers, so that the active layer’s attention operation can only query vectors from the corresponding layer of KV cache. Furthermore, since attention within each layer is parallelized across multiple attention “heads”, the KV cache vectors of a specific layer are further subdivided into groups corresponding to each attention head of that layer.

The diagram below shows the structure of an LLM’s KV cache for a single sequence being generated. Each cell represents a KV and the model’s representation for a token consists of all KV vectors for that token across all attention heads and layers. As you can see, the KV cache for a single layer is allocated as an M x N matrix of KV vectors where M is the number of attention heads and N is the sequence length. This will be important later!

For a deeper look at attention, see the original “Attention is All You Need” paper.

KV-cache compression — “use it or lose it”

Now that we know what the KV cache looks like, let’s dive into how we can shrink it!

The most common approach to compressing the KV cache involves identifying vectors within it that are unlikely to be queried by future attention operations and can therefore be removed without impacting the model’s outputs. This is commonly done by looking at the past attention weights for each pair of key and value vectors (a measure of the degree with which that KV’s representation has been queried during past attention operations) and selecting the KVs that have received the lowest total attention for eviction. This approach is conceptually similar to a LFU (least frequently used) cache management policy: the less a particular vector is queried, the more likely it is to be evicted in the future.

Different attention heads need different compression rates

As we saw earlier, the KV cache for each sequence in a particular layer is allocated on the GPU as a # attention heads X sequence length tensor. This means that the total memory allocation scales with the maximum sequence length for all attention heads of the KV cache. Usually this is not a problem, since each sequence generates the same number of KVs per attention head.

When we consider the problem of eviction-based KV cache compression, however, this forces us to remove an equal number of KVs from each attention head when doing the compression. If we remove more KVs from one attention head alone, those removed KVs won’t actually contribute to lowering the memory footprint of the KV cache on GPU, but will just add more empty “padding” to the corresponding rows of the tensor. You can see this in the diagram below (note the empty cells in the second row below):

The extra compression along the second head frees slots for two KVs, but the cache’s shape (and memory footprint) remains the same.

This forces us to use a fixed compression rate for all attention heads of KV cache, which is very limiting on the compression rates we can achieve before compromising performance.

Enter PagedAttention

The solution to this problem is to change how our KV cache is represented in physical memory. PagedAttention can represent N x M tensors with padding efficiently by using an N x M block table to index into a series of “blocks”.

This lets us retrieve the i^th element of a row by taking the i^th block number from that row in the block table and using the block number to lookup the corresponding block, so we avoid allocating space to padding elements in our physical memory representation. In our case, the elements in physical memory are the KV cache vectors, and the M and N that define the shape of our block table are the number of attention heads and sequence length, respectively. Since the block table is only storing integer indices (rather than high-dimensional KV vectors), its memory footprint is negligible in most cases.

Results

Using paged attention lets us apply different rates of compression to different heads in our KV cache, giving our compression strategy more flexibility than other methods. We tested our compression algorithm on LongBench (a collection of long-context LLM benchmarks) with Llama-3.1-8B and found that for most tasks we can retain over 95% task performance while reducing cache size by up to 8x (left figure below). Over 90% task performance can be retained while further compressing up to 64x. That means you have room in memory for 64 times as many tokens!

This lets us increase the number of requests we can process in parallel, increasing the total throughput (total tokens generated per second) by 3.44x and 5.18x for compression rates of 8x and 64x, respectively (right figure above).

Try it yourself!

If you’re interested in taking a deeper dive check out our vLLM fork and get compressing!!

Speculative decoding for faster throughput

A new inference strategy that we implemented is speculative decoding, which is a very popular way to get faster throughput (measured in tokens per second). LLMs work by predicting the next expected token (a token can be a word, word fragment or single character) in the sequence with each call to the model, based on everything that the model has seen before. For the first token generated, this means just the initial prompt, but after that each subsequent token is generated based on the prompt plus all other tokens that have been generated. Typically, this happens one token at a time, generating a single word, or even a single letter, depending on what comes next.

But what about this prompt:

Knock, knock!

If you are familiar with knock-knock jokes, you could very accurately predict more than one token ahead. For an English language speaker, what comes next is a very specific sequence that is four to five tokens long: “Who’s there?” or “Who is there?” Human language is full of these types of phrases where the next word has only one, or a few, high probability choices. Idioms, common expressions, and even basic grammar are all examples of this. So for each prediction the model makes, we can take it a step further with speculative decoding to predict the next n tokens. This allows us to speed up inference, as we’re not limited to predicting one token at a time.

There are several different implementations of speculative decoding, but each in some way uses a smaller, faster-to-run model to generate more than one token at a time. For Workers AI, we have applied prompt-lookup decoding to some of the LLMs we offer. This simple method matches the last n tokens of generated text against text in the prompt/output and predicts candidate tokens that continue these identified patterns as candidates for continuing the output. In the case of knock-knock jokes, it can predict all the tokens for “Who’s there” at once after seeing “Knock, knock!”, as long as this setup occurs somewhere in the prompt or previous dialogue already. Once these candidate tokens have been predicted, the model can verify them all with a single forward-pass and choose to either accept or reject them. This increases the generation speed of llama-3.1-8b-instruct by up to 40% and the 70B model by up to 70%.

Speculative decoding has tradeoffs, however. Typically, the results of a model using speculative decoding have a lower quality, both when measured using benchmarks like MMLU as well as when compared by humans. More aggressive speculation can speed up sequence generation, but generally comes with a greater impact to the quality of the result. Prompt lookup decoding offers one of the smallest overall quality impacts while still providing performance improvements, and we will be adding it to some language models on Workers AI including @cf/meta/llama-3.1-8b-instruct.

And, by the way, here is one of our favorite knock-knock jokes, can you guess the punchline?

Knock, knock!

Who’s there?

Figs!

Figs who?

Figs the doorbell, it’s broken!

Keep accelerating

As the AI industry continues to evolve, there will be new hardware and software that allows customers to get faster inference responses. Workers AI is committed to researching, implementing, and making upgrades to our services to help you get fast inference. As an Inference-as-a-Service platform, you’ll be able to benefit from all the optimizations we apply, without having to hire your own team of ML researchers and SREs to manage inference software and hardware deployments.

We’re excited for you to try out some of these new releases we have and let us know what you think! Check out our full-suite of AI announcements here and check out the developer docs to get started.

Startup Program revamped: build and grow on Cloudflare with up to $250,000 in credits

2024-09-26 Christopher Rotas

Post Syndicated from Christopher Rotas original https://blog.cloudflare.com/startup-program-250k-credits

Today, we’re pleased to offer startups up to $250,000 in credits to use on Cloudflare’s Developer Platform. This new credits system will allow you to clearly see usage and associated fees to plan for a predictable future after the $250,000 in credits have been used up or after one year, whichever happens first.

You can see eligibility criteria and apply to the start-up program here.

What can you use the credits for?

Credits can be applied to all Developer Platform products, as well as Argo and Cache Reserve. Moreover, we provide participants with up to three Enterprise-level domains, which includes CDN, DDoS, DNS, WAF, Zero Trust, and other security and performance products that a participant can enable for their website.

Developer tools and building on Cloudflare

You can use credits for Cloudflare Developer Platform products, including those listed in the table below.

^{Note: credits for the Cloudflare Startup Program apply to Cloudflare products only, this table is illustrative of similar products in the market.}

Speed and performance with Cloudflare

We know that founders need all the help they can get when starting their businesses. Beyond the Developer Platform, you can also use the Startup Program for our speed and performance products. Getting customers where they need to go within milliseconds on your website or application is the difference between closing a sale or not. You can test your speed here and learn how to optimize your speed and performance here with solutions like: Images, Argo, and Early Hints.

Security from Cloudflare

But, wait, there’s more: beyond the Developer Platform products and speed tools, you can also use Cloudflare’s many security features through the Startup Program as well. These include Web Application Firewall (WAF), DDoS Alerts, bundled protection plans, and more. The Startup Program also includes Zero Trust solutions. Learn how others are securing their technology and tools with Cloudflare Zero Trust.

For more inspiration, check out our Built with Cloudflare site, which highlights what other startups are building.

Who can use the credits?

Eligibility criteria can be found here and include:

Companies building a software-based product or service
Founded within the last 5 years (2019-2024)
Have between $50,000 - $5,000,000 in funding
- Note that for startups who have not yet raised at least $50,000, there may be other opportunities for lower credit amounts. Please apply with the promo code “BOOTSTRAPPED” if you haven’t raised $50,000 yet, but are interested in the Cloudflare Startup Program
Have a LinkedIn profile, valid website, and email address
Bonus criteria that adds to your application: being part of an approved accelerator

What will you build?

We’re excited to see what you will build. Please share what you’re up to with us so that we can help you however it makes sense. If you’re actively using Cloudflare’s Developer Platform, we’d love to hear more about what you’re building and share it on our Built with Cloudflare site.

Are you a startup looking for additional support, resources, or access to funding? Apply for our Workers Launchpad Program! The program runs for a few months, and in addition to the Startup Program, participants get access to hands-on bootcamp sessions, Solutions Architect office hours, introductions to VCs, and the opportunity to present at Demo Day.

Why does Cloudflare support founders and startups?

Founders and developers face enough challenges without having to worry about incurring egregious costs to test technology and start building in the earliest days. You have the world at your fingertips and should be empowered to build and create without limitations. Invest money in your innovation, not in the infrastructure and technology that supports it.

The Startup Program understands this founder experience deeply, as the team is made up of former founders. Cloudflare is committed to programs like this to empower founders building the next big thing. Offering up to $250,000 in credits will allow folks to leverage even more of what we have to offer: a developer experience that removes friction, saves money, and gets applications spun up in hours, not days.

We want to support founders from everywhere on earth.

Be bold and keep building! Follow @CloudflareDev and join our Developer Discord server.

Are you a startup building on Cloudflare? Apply here!

Supporting Postgres Named Prepared Statements in Hyperdrive

2024-06-28 Andrew Repp

Post Syndicated from Andrew Repp original https://blog.cloudflare.com/postgres-named-prepared-statements-supported-hyperdrive

Hyperdrive (Cloudflare’s globally distributed SQL connection pooler and cache) recently added support for Postgres protocol-level named prepared statements across pooled connections. Named prepared statements allow Postgres to cache query execution plans, providing potentially substantial performance improvements. Further, many popular drivers in the ecosystem use these by default, meaning that not having them is a bit of a footgun for developers. We are very excited that Hyperdrive’s users will now have access to better performance and a more seamless development experience, without needing to make any significant changes to their applications!

While we’re not the first connection pooler to add this support (PgBouncer got to it in October 2023 in version 1.21, for example), there were some unique challenges in how we implemented it. To that end, we wanted to do a deep dive on what it took for us to deliver this.

Hyper-what?

One of the classic problems of building on the web is that your users are everywhere, but your database tends to be in one spot. Combine that with pesky limitations like network routing, or the speed of light, and you can often run into situations where your users feel the pain of having your database so far away. This can look like slower queries, slower startup times, and connection exhaustion as everything takes longer to accomplish.

Hyperdrive is designed to make the centralized databases you already have feel like they’re global. We use our global network to get faster routes to your database, keep connection pools primed, and cache your most frequently run queries as close to users as possible.

Postgres Message Protocol

To understand exactly what the challenge with prepared statements is, it’s first necessary to dig in a bit to the Postgres Message Protocol. Specifically, we are going to take a look at the protocol for an “extended” query, which uses different message types and is a bit more complex than a “simple” query, but which is more powerful and thus more widely used.

A query using Hyperdrive might be coded something like this, but a lot goes on under the hood in order for Postgres to reliably return your response.

import postgres from "postgres";

// with Hyperdrive, we don't have to disable prepared statements anymore!
// const sql = postgres(env.HYPERDRIVE.connectionString, {prepare: false});

// make a connection, with the default postgres.js settings (prepare is set to true)
const sql = postgres(env.HYPERDRIVE.connectionString);

// This sends the query, and while it looks like a single action it contains several 
// messages implied within it
let [{ a, b, c, id }] = await sql`SELECT a, b, c, id FROM hyper_test WHERE id = ${target_id}`;

To prepare a statement, a Postgres client begins by sending a Parse message. This includes the query string, the number of parameters to be interpolated, and the statement’s name. The name is a key piece of this puzzle. If it is empty, then Postgres uses a special “unnamed” prepared statement slot that gets overwritten on each new Parse. These are relatively easy to support, as most drivers will keep the entirety of a message sequence for unnamed statements together, and will not try to get too aggressive about reusing the prepared statement because it is overwritten so often.

If the statement has a name, however, then it is kept prepared for the remainder of the Postgres session (unless it is explicitly removed with DEALLOCATE). This is convenient because parsing a query string and preparing the statement costs bytes sent on the wire and CPU cycles to process, so reusing a statement is quite a nice optimization.

Once done with Parse, there are a few remaining steps to (the simplest form of) an extended query:

A Bind message, which provides the specific values to be passed for the parameters in the statement (if any).
An Execute message, which tells the Postgres server to actually perform the data retrieval and processing.
And finally a Sync message, which causes the server to close the implicit transaction, return results, and provides a synchronization point for error handling.

While that is the core pattern for accomplishing an extended protocol query, there are many more complexities possible (named Portal, ErrorResponse, etc.).

We will briefly mention one other complexity we often encounter in this protocol, which is Describe messages. Many drivers leverage Postgres’ built-in types to help with deserialization of the results into structs or classes. This is accomplished by sending a Parse-Describe-Flush/Sync sequence, which will send a statement to be prepared, and will expect back information about the types and data the query will return. This complicates bookkeeping around named prepared statements, as now there are two separate queries, with two separate kinds of responses, that must be kept track of. We won’t go into much depth on the tradeoffs of an additional round-trip in exchange for advanced information about the results’ format, but suffice it to say that it must be handled explicitly in order for the overall system to gracefully support prepared statements.

So the basic query from our code above looks like this from a message perspective:

A more complete description and the full structure of each message type are well described in the Postgres documentation.

So, what’s so hard about that?

Buffering Messages

The first challenge that Hyperdrive must solve (that many other connection poolers don’t have) is that it’s also a cache.

The happiest path for a query on Hyperdrive never travels far, and we are quite proud of the low latency of our cache hits. However, this presents a particular challenge in the case of an extended protocol query. A Parse by itself is insufficient as a cache key, both because the parameter values in the Bind messages can alter the expected results, and because it might be followed up with either a Describe or an Execute message which will invoke drastically different responses.

So Hyperdrive cannot simply pass each message to the origin database, as we must buffer them in a message log until we have enough information to reliably distinguish between cache keys. It turns out that receiving a Sync is quite a natural point at which to check whether you have enough information to serve a response. For most scenarios, we buffer until we receive a Sync, and then (assuming the scenario is cacheable) we determine whether we can serve the response from cache or we need to take a connection to the origin database.

Taking a Connection From the Pool

Assuming we aren’t serving a response from cache, for whatever reason, we’ll need to take an origin connection from our pool. One of the key advantages any connection pooler offers is in allowing many client connections to share few database connections, so minimizing how often and for how long these connections are held is crucial to making Hyperdrive performant.

To this end, Hyperdrive operates in what is traditionally called “transaction mode”. This means that a connection taken from the pool for any given transaction is returned once that transaction concludes. This is in contrast to what is often called “session mode”, where once a connection is taken from the pool it is held by the client until the client disconnects.

For Hyperdrive, allowing any client to take any database connection is vital. This is because if we “pin” a client to a given database connection then we have one fewer available for every other possible client. You can run yourself out of database connections very quickly once you start down that path, especially when your clients are many small Workers spread around the world.

The challenge prepared statements present to this scenario is that they exist at the “session” scope, which is to say, at the scope of one connection. If a client prepares a statement on connection A, but tries to reuse it and gets assigned connection B, Postgres will naturally throw an error claiming the statement doesn’t exist in the given session. No results will be returned, the client is unhappy, and all that’s left is to retry with a Parse message included. This causes extra round-trips between client and server, defeating the whole purpose of what is meant to be an optimization.

One of the goals of a connection pooler is to be as transparent to the client and server as possible. There are limitations, as Postgres will let you do some powerful things to session state that cannot be reasonably shared across arbitrary client connections, but to the extent possible the endpoints should not have to know or care about any multiplexing happening between them.

This means that when a client sends a Parse message on its connection, it should expect that the statement will be available for reuse when it wants to send a Bind-Execute-Sync sequence later on. It also means that the server should not get Bind messages for statements that only exist on some other session. Maintaining this illusion is the crux of providing support for this feature.

Putting it all together

So, what does the solution look like? If a client sends Parse-Bind-Execute-Sync with a named prepared statement, then later sends Bind-Execute-Sync to reuse it, how can we make sure that everything happens as expected? The solution, it turns out, needs just a few built-in Rust data structures for efficiently capturing what we need (a HashMap, some LruCaches and a VecDeque), and some straightforward business logic to keep track of when to intervene in the messages being passed back and forth.

Whenever a named Parse comes in, we store it in an in-memory HashMap on the server that handles message processing for that client’s connection. This persists until the client is disconnected. This means that whenever we see anything referencing the statement, we can go retrieve the complete message defining it. We’ll come back to this in a moment.

Once we’ve buffered all the messages we can and gotten to the point where it’s time to return results (let’s say because the client sent a Sync), we need to start applying some logic. For the sake of brevity we’re going to omit talking through error handling here, as it does add some significant complexity but is somewhat out of scope for this discussion.

There are two main questions that determine how we should proceed:

Does our message sequence include a Parse, or are we trying to reuse a pre-existing statement?
Do we have a cache hit or are we serving from the origin database?

This gives us four scenarios to consider:

Parse with cache hit
Parse with cache miss
Reuse with cache hit
Reuse with cache miss

A Parse with a cache hit is the easiest path to address, as we don’t need to do anything special. We use the messages sent as a cache key, and serve the results back to the client. We will still keep the Parse in our HashMap in case we want it later (#2 below), but otherwise we’re good to go.

A Parse with a cache miss is a bit more complicated, as now we need to send these messages to the origin server. We take a connection at random from our pool and do so, passing the results back to the client. With that, we’ve begun to make changes to session state such that all our database connections are no longer identical to each other. To keep track of what we’ve done to muddy up our state, we keep a LruCache on each connection of which statements it already has prepared. In the case where we need to evict from such a cache, we will also DEALLOCATE the statement on the connection to keep things tracked correctly.

Reuse with a cache hit is yet more tricky, but still straightforward enough. In the example below, we are sent a Bind with the same parameters twice (#1 and #9). We must identify that we received a Bind without a preceding Parse, we must go retrieve that Parse (#10), and we must use the information from it to build our cache key. Once all that is accomplished, we can serve our results from cache, needing only to trim out the ParseComplete within the cached results before returning them to the client.

Reuse with a cache miss is the hardest scenario, as it may require us to lie in both directions. In the example below, we cache results for one set of parameters (#8), but are sent a Bind with different parameters (#9). As in the cache hit scenario, we must identify that we were not sent a Parse as part of the current message sequence, retrieve it from our HashMap (#10), and build our cache key to GET from cache and confirm the miss (#11). Once we take a connection from the pool, though, we then need to check if it already has the statement we want prepared. If not, we must take our saved Parse and prepend it to our message log to be sent along to the origin database (#13). Thus, what the server receives looks like a perfectly valid Parse-Bind-Execute-Sync sequence. This is where our VecDeque (mentioned above) comes in, as converting our message log to that structure allowed us to very ergonomically make such changes without needing to rebuild the whole byte sequence. Once we receive the response from the server, all that’s needed is to trim out the initial ParseComplete response from the server, as a well-made client would likely be very confused receiving such a response to a Parse it didn’t send. With that message trimmed out, however, the client is in the position of getting exactly what it asked for, and both sides of the conversation are happy.

Dénouement

Now that we’ve got a working solution, where all parties are functioning well, let’s review! Our solution lets us share database connections across arbitrary clients with no “pinning”, no custom handling on either client or server, and supports reuse of prepared statements to reduce CPU load on re-parsing queries and reduce network traffic on re-sending Parse messages. Engineering always involves tradeoffs, so the cost of this is that we will sometimes still need to sneak in a Parse because a client got assigned a different connection on reuse, and in those scenarios there is a small amount of additional memory overhead because the same statement is prepared on multiple connections.

And now, if you haven’t already, go give Hyperdrive a spin, and let us know what you think in the Hyperdrive Discord channel!

Disrupting FlyingYeti’s campaign targeting Ukraine

2024-05-30 Cloudforce One

Post Syndicated from Cloudforce One original https://blog.cloudflare.com/disrupting-flyingyeti-campaign-targeting-ukraine

Cloudforce One is publishing the results of our investigation and real-time effort to detect, deny, degrade, disrupt, and delay threat activity by the Russia-aligned threat actor FlyingYeti during their latest phishing campaign targeting Ukraine. At the onset of Russia’s invasion of Ukraine on February 24, 2022, Ukraine introduced a moratorium on evictions and termination of utility services for unpaid debt. The moratorium ended in January 2024, resulting in significant debt liability and increased financial stress for Ukrainian citizens. The FlyingYeti campaign capitalized on anxiety over the potential loss of access to housing and utilities by enticing targets to open malicious files via debt-themed lures. If opened, the files would result in infection with the PowerShell malware known as COOKBOX, allowing FlyingYeti to support follow-on objectives, such as installation of additional payloads and control over the victim’s system.

Since April 26, 2024, Cloudforce One has taken measures to prevent FlyingYeti from launching their phishing campaign – a campaign involving the use of Cloudflare Workers and GitHub, as well as exploitation of the WinRAR vulnerability CVE-2023-38831. Our countermeasures included internal actions, such as detections and code takedowns, as well as external collaboration with third parties to remove the actor’s cloud-hosted malware. Our effectiveness against this actor prolonged their operational timeline from days to weeks. For example, in a single instance, FlyingYeti spent almost eight hours debugging their code as a result of our mitigations. By employing proactive defense measures, we successfully stopped this determined threat actor from achieving their objectives.

Executive Summary

On April 18, 2024, Cloudforce One detected the Russia-aligned threat actor FlyingYeti preparing to launch a phishing espionage campaign targeting individuals in Ukraine.
We discovered the actor used similar tactics, techniques, and procedures (TTPs) as those detailed in Ukranian CERT’s article on UAC-0149, a threat group that has primarily targeted Ukrainian defense entities with COOKBOX malware since at least the fall of 2023.
From mid-April to mid-May, we observed FlyingYeti conduct reconnaissance activity, create lure content for use in their phishing campaign, and develop various iterations of their malware. We assessed that the threat actor intended to launch their campaign in early May, likely following Orthodox Easter.
After several weeks of monitoring actor reconnaissance and weaponization activity (Cyber Kill Chain Stages 1 and 2), we successfully disrupted FlyingYeti’s operation moments after the final COOKBOX payload was built.
The payload included an exploit for the WinRAR vulnerability CVE-2023-38831, which FlyingYeti will likely continue to use in their phishing campaigns to infect targets with malware.
We offer steps users can take to defend themselves against FlyingYeti phishing operations, and also provide recommendations, detections, and indicators of compromise.

Who is FlyingYeti?

FlyingYeti is the cryptonym given by Cloudforce One to the threat group behind this phishing campaign, which overlaps with UAC-0149 activity tracked by CERT-UA in February and April 2024. The threat actor uses dynamic DNS (DDNS) for their infrastructure and leverages cloud-based platforms for hosting malicious content and for malware command and control (C2). Our investigation of FlyingYeti TTPs suggests this is likely a Russia-aligned threat group. The actor appears to primarily focus on targeting Ukrainian military entities. Additionally, we observed Russian-language comments in FlyingYeti’s code, and the actor’s operational hours falling within the UTC+3 time zone.

Campaign background

In the days leading up to the start of the campaign, Cloudforce One observed FlyingYeti conducting reconnaissance on payment processes for Ukrainian communal housing and utility services:

April 22, 2024 – research into changes made in 2016 that introduced the use of QR codes in payment notices
April 22, 2024 – research on current developments concerning housing and utility debt in Ukraine
April 25, 2024 – research on the legal basis for restructuring housing debt in Ukraine as well as debt involving utilities, such as gas and electricity

Cloudforce One judges that the observed reconnaissance is likely due to the Ukrainian government’s payment moratorium introduced at the start of the full-fledged invasion in February 2022. Under this moratorium, outstanding debt would not lead to evictions or termination of provision of utility services. However, on January 9, 2024, the government lifted this ban, resulting in increased pressure on Ukrainian citizens with outstanding debt. FlyingYeti sought to capitalize on that pressure, leveraging debt restructuring and payment-related lures in an attempt to increase their chances of successfully targeting Ukrainian individuals.

Analysis of the Komunalka-themed phishing site

The disrupted phishing campaign would have directed FlyingYeti targets to an actor-controlled GitHub page at hxxps[:]//komunalka[.]github[.]io, which is a spoofed version of the Kyiv Komunalka communal housing site https://www.komunalka.ua. Komunalka functions as a payment processor for residents in the Kyiv region and allows for payment of utilities, such as gas, electricity, telephone, and Internet. Additionally, users can pay other fees and fines, and even donate to Ukraine’s defense forces.

Based on past FlyingYeti operations, targets may be directed to the actor’s Github page via a link in a phishing email or an encrypted Signal message. If a target accesses the spoofed Komunalka platform at hxxps[:]//komunalka[.]github[.]io, the page displays a large green button with a prompt to download the document “Рахунок.docx” (“Invoice.docx”), as shown in Figure 1. This button masquerades as a link to an overdue payment invoice but actually results in the download of the malicious archive “Заборгованість по ЖКП.rar” (“Debt for housing and utility services.rar”).

Figure 1: Prompt to download malicious archive “Заборгованість по ЖКП.rar”

A series of steps must take place for the download to successfully occur:

The target clicks the green button on the actor’s GitHub page hxxps[:]//komunalka.github[.]io
The target’s device sends an HTTP POST request to the Cloudflare Worker worker-polished-union-f396[.]vqu89698[.]workers[.]dev with the HTTP request body set to “user=Iahhdr”
The Cloudflare Worker processes the request and evaluates the HTTP request body
If the request conditions are met, the Worker fetches the RAR file from hxxps[:]//raw[.]githubusercontent[.]com/kudoc8989/project/main/Заборгованість по ЖКП.rar, which is then downloaded on the target’s device

Cloudforce One identified the infrastructure responsible for facilitating the download of the malicious RAR file and remediated the actor-associated Worker, preventing FlyingYeti from delivering its malicious tooling. In an effort to circumvent Cloudforce One’s mitigation measures, FlyingYeti later changed their malware delivery method. Instead of the Workers domain fetching the malicious RAR file, it was loaded directly from GitHub.

Analysis of the malicious RAR file

During remediation, Cloudforce One recovered the RAR file “Заборгованість по ЖКП.rar” and performed analysis of the malicious payload. The downloaded RAR archive contains multiple files, including a file with a name that contains the unicode character “U+201F”. This character appears as whitespace on Windows devices and can be used to “hide” file extensions by adding excessive whitespace between the filename and the file extension. As highlighted in blue in Figure 2, this cleverly named file within the RAR archive appears to be a PDF document but is actually a malicious CMD file (“Рахунок на оплату.pdf[unicode character U+201F].cmd”).

Figure 2: Files contained in the malicious RAR archive “Заборгованість по ЖКП.rar” (“Housing Debt.rar”)

FlyingYeti included a benign PDF in the archive with the same name as the CMD file but without the unicode character, “Рахунок на оплату.pdf” (“Invoice for payment.pdf”). Additionally, the directory name for the archive once decompressed also contained the name “Рахунок на оплату.pdf”. This overlap in names of the benign PDF and the directory allows the actor to exploit the WinRAR vulnerability CVE-2023-38831. More specifically, when an archive includes a benign file with the same name as the directory, the entire contents of the directory are opened by the WinRAR application, resulting in the execution of the malicious CMD. In other words, when the target believes they are opening the benign PDF “Рахунок на оплату.pdf”, the malicious CMD file is executed.

The CMD file contains the FlyingYeti PowerShell malware known as COOKBOX. The malware is designed to persist on a host, serving as a foothold in the infected device. Once installed, this variant of COOKBOX will make requests to the DDNS domain postdock[.]serveftp[.]com for C2, awaiting PowerShell cmdlets that the malware will subsequently run.

Alongside COOKBOX, several decoy documents are opened, which contain hidden tracking links using the Canary Tokens service. The first document, shown in Figure 3 below, poses as an agreement under which debt for housing and utility services will be restructured.

Figure 3: Decoy document Реструктуризація боргу за житлово комунальні послуги.docx

The second document (Figure 4) is a user agreement outlining the terms and conditions for the usage of the payment platform komunalka[.]ua.

Figure 4: Decoy document Угода користувача.docx (User Agreement.docx)

The use of relevant decoy documents as part of the phishing and delivery activity are likely an effort by FlyingYeti operators to increase the appearance of legitimacy of their activities.

The phishing theme we identified in this campaign is likely one of many themes leveraged by this actor in a larger operation to target Ukrainian entities, in particular their defense forces. In fact, the threat activity we detailed in this blog uses many of the same techniques outlined in a recent FlyingYeti campaign disclosed by CERT-UA in mid-April 2024, where the actor leveraged United Nations-themed lures involving Peace Support Operations to target Ukraine’s military. Due to Cloudforce One’s defensive actions covered in the next section, this latest FlyingYeti campaign was prevented as of the time of publication.

Mitigating FlyingYeti activity

Cloudforce One mitigated FlyingYeti’s campaign through a series of actions. Each action was taken to increase the actor’s cost of continuing their operations. When assessing which action to take and why, we carefully weighed the pros and cons in order to provide an effective active defense strategy against this actor. Our general goal was to increase the amount of time the threat actor spent trying to develop and weaponize their campaign.

We were able to successfully extend the timeline of the threat actor’s operations from hours to weeks. At each interdiction point, we assessed the impact of our mitigation to ensure the actor would spend more time attempting to launch their campaign. Our mitigation measures disrupted the actor’s activity, in one instance resulting in eight additional hours spent on debugging code.

Due to our proactive defense efforts, FlyingYeti operators adapted their tactics multiple times in their attempts to launch the campaign. The actor originally intended to have the Cloudflare Worker fetch the malicious RAR file from GitHub. After Cloudforce One interdiction of the Worker, the actor attempted to create additional Workers via a new account. In response, we disabled all Workers, leading the actor to load the RAR file directly from GitHub. Cloudforce One notified GitHub, resulting in the takedown of the RAR file, the GitHub project, and suspension of the account used to host the RAR file. In return, FlyingYeti began testing the option to host the RAR file on the file sharing sites pixeldrain and Filemail, where we observed the actor alternating the link on the Komunalka phishing site between the following:

hxxps://pixeldrain[.]com/api/file/ZAJxwFFX?download=one
hxxps://1014.filemail[.]com/api/file/get?filekey=e_8S1HEnM5Rzhy_jpN6nL-GF4UAP533VrXzgXjxH1GzbVQZvmpFzrFA&pk_vid=a3d82455433c8ad11715865826cf18f6

We notified GitHub of the actor’s evolving tactics, and in response GitHub removed the Komunalka phishing site. After analyzing the files hosted on pixeldrain and Filemail, we determined the actor uploaded dummy payloads, likely to monitor access to their phishing infrastructure (FileMail logs IP addresses, and both file hosting sites provide view and download counts). At the time of publication, we did not observe FlyingYeti upload the malicious RAR file to either file hosting site, nor did we identify the use of alternative phishing or malware delivery methods.

A timeline of FlyingYeti’s activity and our corresponding mitigations can be found below.

Event timeline

Date	Event Description
2024-04-18 12:18	Threat Actor (TA) creates a Worker to handle requests from a phishing site
2024-04-18 14:16	TA creates phishing site komunalka[.]github[.]io on GitHub
2024-04-25 12:25	TA creates a GitHub repo to host a RAR file
2024-04-26 07:46	TA updates the first Worker to handle requests from users visiting komunalka[.]github[.]io
2024-04-26 08:24	TA uploads a benign test RAR to the GitHub repo
2024-04-26 13:38	Cloudforce One identifies a Worker receiving requests from users visiting komunalka[.]github[.]io, observes its use as a phishing page
2024-04-26 13:46	Cloudforce One identifies that the Worker fetches a RAR file from GitHub (the malicious RAR payload is not yet hosted on the site)
2024-04-26 19:22	Cloudforce One creates a detection to identify the Worker that fetches the RAR
2024-04-26 21:13	Cloudforce One deploys real-time monitoring of the RAR file on GitHub
2024-05-02 06:35	TA deploys a weaponized RAR (CVE-2023-38831) to GitHub with their COOKBOX malware packaged in the archive
2024-05-06 10:03	TA attempts to update the Worker with link to weaponized RAR, the Worker is immediately blocked
2024-05-06 10:38	TA creates a new Worker, the Worker is immediately blocked
2024-05-06 11:04	TA creates a new account (#2) on Cloudflare
2024-05-06 11:06	TA creates a new Worker on account #2 (blocked)
2024-05-06 11:50	TA creates a new Worker on account #2 (blocked)
2024-05-06 12:22	TA creates a new modified Worker on account #2
2024-05-06 16:05	Cloudforce One disables the running Worker on account #2
2024-05-07 22:16	TA notices the Worker is blocked, ceases all operations
2024-05-07 22:18	TA deletes original Worker first created to fetch the RAR file from the GitHub phishing page
2024-05-09 19:28	Cloudforce One adds phishing page komunalka[.]github[.]io to real-time monitoring
2024-05-13 07:36	TA updates the github.io phishing site to point directly to the GitHub RAR link
2024-05-13 17:47	Cloudforce One adds COOKBOX C2 postdock[.]serveftp[.]com to real-time monitoring for DNS resolution
2024-05-14 00:04	Cloudforce One notifies GitHub to take down the RAR file
2024-05-15 09:00	GitHub user, project, and link for RAR are no longer accessible
2024-05-21 08:23	TA updates Komunalka phishing site on github.io to link to pixeldrain URL for dummy payload (pixeldrain only tracks view and download counts)
2024-05-21 08:25	TA updates Komunalka phishing site to link to FileMail URL for dummy payload (FileMail tracks not only view and download counts, but also IP addresses)
2024-05-21 12:21	Cloudforce One downloads PixelDrain document to evaluate payload
2024-05-21 12:47	Cloudforce One downloads FileMail document to evaluate payload
2024-05-29 23:59	GitHub takes down Komunalka phishing site
2024-05-30 13:00	Cloudforce One publishes the results of this investigation

Coordinating our FlyingYeti response

Cloudforce One leveraged industry relationships to provide advanced warning and to mitigate the actor’s activity. To further protect the intended targets from this phishing threat, Cloudforce One notified and collaborated closely with GitHub’s Threat Intelligence and Trust and Safety Teams. We also notified CERT-UA and Cloudflare industry partners such as CrowdStrike, Mandiant/Google Threat Intelligence, and Microsoft Threat Intelligence.

Hunting FlyingYeti operations

There are several ways to hunt FlyingYeti in your environment. These include using PowerShell to hunt for WinRAR files, deploying Microsoft Sentinel analytics rules, and running Splunk scripts as detailed below. Note that these detections may identify activity related to this threat, but may also trigger unrelated threat activity.

PowerShell hunting

Consider running a PowerShell script such as this one in your environment to identify exploitation of CVE-2023-38831. This script will interrogate WinRAR files for evidence of the exploit.

CVE-2023-38831
Description:winrar exploit detection 
open suspios (.tar / .zip / .rar) and run this script to check it 

function winrar-exploit-detect(){
$targetExtensions = @(".cmd" , ".ps1" , ".bat")
$tempDir = [System.Environment]::GetEnvironmentVariable("TEMP")
$dirsToCheck = Get-ChildItem -Path $tempDir -Directory -Filter "Rar*"
foreach ($dir in $dirsToCheck) {
    $files = Get-ChildItem -Path $dir.FullName -File
    foreach ($file in $files) {
        $fileName = $file.Name
        $fileExtension = [System.IO.Path]::GetExtension($fileName)
        if ($targetExtensions -contains $fileExtension) {
            $fileWithoutExtension = [System.IO.Path]::GetFileNameWithoutExtension($fileName); $filename.TrimEnd() -replace '\.$'
            $cmdFileName = "$fileWithoutExtension"
            $secondFile = Join-Path -Path $dir.FullName -ChildPath $cmdFileName
            
            if (Test-Path $secondFile -PathType Leaf) {
                Write-Host "[!] Suspicious pair detected "
                Write-Host "[*]  Original File:$($secondFile)" -ForegroundColor Green 
                Write-Host "[*] Suspicious File:$($file.FullName)" -ForegroundColor Red

                # Read and display the content of the command file
                $cmdFileContent = Get-Content -Path $($file.FullName)
                Write-Host "[+] Command File Content:$cmdFileContent"
            }
        }
    }
}
}
winrar-exploit-detect

Microsoft Sentinel

In Microsoft Sentinel, consider deploying the rule provided below, which identifies WinRAR execution via cmd.exe. Results generated by this rule may be indicative of attack activity on the endpoint and should be analyzed.

DeviceProcessEvents
| where InitiatingProcessParentFileName has @"winrar.exe"
| where InitiatingProcessFileName has @"cmd.exe"
| project Timestamp, DeviceName, FileName, FolderPath, ProcessCommandLine, AccountName
| sort by Timestamp desc

Splunk

Consider using this script in your Splunk environment to look for WinRAR CVE-2023-38831 execution on your Microsoft endpoints. Results generated by this script may be indicative of attack activity on the endpoint and should be analyzed.

| tstats `security_content_summariesonly` count min(_time) as firstTime max(_time) as lastTime from datamodel=Endpoint.Processes where Processes.parent_process_name=winrar.exe `windows_shells` OR Processes.process_name IN ("certutil.exe","mshta.exe","bitsadmin.exe") by Processes.dest Processes.user Processes.parent_process_name Processes.parent_process Processes.process_name Processes.process Processes.process_id Processes.parent_process_id 
| `drop_dm_object_name(Processes)` 
| `security_content_ctime(firstTime)` 
| `security_content_ctime(lastTime)` 
| `winrar_spawning_shell_application_filter`

Cloudflare product detections

Cloudflare Email Security

Cloudflare Email Security (CES) customers can identify FlyingYeti threat activity with the following detections.

CVE-2023-38831
FLYINGYETI.COOKBOX
FLYINGYETI.COOKBOX.Launcher
FLYINGYETI.Rar

Recommendations

Cloudflare recommends taking the following steps to mitigate this type of activity:

Implement Zero Trust architecture foundations:
Deploy Cloud Email Security to ensure that email services are protected against phishing, BEC and other threats
Leverage browser isolation to separate messaging applications like LinkedIn, email, and Signal from your main network
Scan, monitor and/or enforce controls on specific or sensitive data moving through your network environment with data loss prevention policies
Ensure your systems have the latest WinRAR and Microsoft security updates installed
Consider preventing WinRAR files from entering your environment, both at your Cloud Email Security solution and your Internet Traffic Gateway
Run an Endpoint Detection and Response (EDR) tool such as CrowdStrike or Microsoft Defender for Endpoint to get visibility into binary execution on hosts
Search your environment for the FlyingYeti indicators of compromise (IOCs) shown below to identify potential actor activity within your network.

If you’re looking to uncover additional Threat Intelligence insights for your organization or need bespoke Threat Intelligence information for an incident, consider engaging with Cloudforce One by contacting your Customer Success manager or filling out this form.

Indicators of Compromise

Filename	SHA256 Hash	Description
Заборгованість по ЖКП.rar	a0a294f85c8a19be048ffcc05ede6fd5a7ac5e2f0032a3ca0050dc1ae960c314	RAR archive
Рахунок на оплату.pdf                                                                                  .cmd	0cca8f795c7a81d33d36d5204fcd9bc73bdc2af7de315c1449cbc3551ef4fb59	COOKBOX Sample (contained in RAR archive)
Реструктуризація боргу за житлово комунальні послуги.docx	915721b94e3dffa6cef3664532b586be6cf989fec923b26c62fdaf201ee81d2c	Benign Word Document with Tracking Link (contained in RAR archive)
Угода користувача.docx	79a9740f5e5ea4aa2157d9d96df34ee49a32e2d386fe55fedfd1aa33e151c06d	Benign Word Document with Tracking Link (contained in RAR archive)
Рахунок на оплату.pdf	19e25456c2996ded3e29577b609de54a2bef90dad8f868cdad795c18df05a79b	Random Binary Data (contained in RAR archive)
Заборгованість по ЖКП станом на 26.04.24.docx	e0d65e2d36afd3db1b603f10e0488cee3f58ade24d8abc6bee240314d8696708	Random Binary Data (contained in RAR archive)

Domain / URL	Description
komunalka[.]github[.]io	Phishing page
hxxps[:]//github[.]com/komunalka/komunalka[.]github[.]io	Phishing page
hxxps[:]//worker-polished-union-f396[.]vqu89698[.]workers[.]dev	Worker that fetches malicious RAR file
hxxps[:]//raw[.]githubusercontent[.]com/kudoc8989/project/main/Заборгованість по ЖКП.rar	Delivery of malicious RAR file
hxxps[:]//1014[.]filemail[.]com/api/file/get?filekey=e_8S1HEnM5Rzhy_jpN6nL-GF4UAP533VrXzgXjxH1GzbVQZvmpFzrFA&pk_vid=a3d82455433c8ad11715865826cf18f6	Dummy payload
hxxps[:]//pixeldrain[.]com/api/file/ZAJxwFFX?download=	Dummy payload
hxxp[:]//canarytokens[.]com/stuff/tags/ni1cknk2yq3xfcw2al3efs37m/payments.js	Tracking link
hxxp[:]//canarytokens[.]com/stuff/terms/images/k22r2dnjrvjsme8680ojf5ccs/index.html	Tracking link
postdock[.]serveftp[.]com	COOKBOX C2

Using Fortran on Cloudflare Workers

2024-05-07 John Graham-Cumming

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/using-fortran-on-cloudflare-workers

In April 2020, we blogged about how to get COBOL running on Cloudflare Workers by compiling to WebAssembly. The ecosystem around WebAssembly has grown significantly since then, and it has become a solid foundation for all types of projects, be they client-side or server-side.

As WebAssembly support has grown, more and more languages are able to compile to WebAssembly for execution on servers and in browsers. As Cloudflare Workers uses the V8 engine and supports WebAssembly natively, we’re able to support languages that compile to WebAssembly on the platform.

Recently, work on LLVM has enabled Fortran to compile to WebAssembly. So, today, we’re writing about running Fortran code on Cloudflare Workers.

Before we dive into how to do this, here’s a little demonstration of number recognition in Fortran. Draw a number from 0 to 9 and Fortran code running somewhere on Cloudflare’s network will predict the number you drew.

Try yourself on handwritten-digit-classifier.fortran.demos.cloudflare.com.

This is taken from the wonderful Fortran on WebAssembly post but instead of running client-side, the Fortran code is running on Cloudflare Workers. Read on to find out how you can use Fortran on Cloudflare Workers and how that demonstration works.

Wait, Fortran? No one uses that!

Not so fast! Or rather, actually pretty darn fast if you’re doing a lot of numerical programming or have scientific data to work with. Fortran (originally FORmula TRANslator) is very well suited for scientific workloads because of its native functionality for things like arithmetic and handling large arrays and matrices.

If you look at the ranking of the fastest supercomputers in the world you’ll discover that the measurement of “fast” is based on these supercomputers running a piece of software called LINPACK that was originally written in Fortran. LINPACK is designed to help with problems solvable using linear algebra.

The LINPACK benchmarks use LINPACK to solve an n x n system of linear equations using matrix operations and, in doing so, determine how fast supercomputers are. The code is available in Fortran, C and Java.

A related Fortran package, BLAS, also does linear algebra and forms the basis of the number identifying code above. But other Fortran packages are still relevant. Back in 2017, NASA ran a competition to make FUN3D (used to perform calculations of airflow over simulated aircraft). FUN3D is written in Fortran.

So, although Fortran (or at the time FORTRAN) first came to life in 1957, it’s alive and well and being used widely for scientific applications (there’s even Fortran for CUDA). One particular application left Earth 20 years after Fortran was born: Voyager. The Voyager probes use a combination of assembly language and Fortran to keep chugging along.

But back in our solar system, and back on Region: Earth, you can now use Fortran on Cloudflare Workers. Here’s how.

How to get your Fortran code running on Cloudflare Workers

To make it easy to run your Fortran code on Cloudflare Workers, we created a tool called Fortiche (translates to smart in French). It uses Flang and Emscripten under the hood.

Flang is a frontend in LLVM and, if you read the Fortran on WebAssembly blog post, we currently have to patch LLVM to work around a few issues.

Emscripten is used to compile LLVM output and produce code that is compatible with Cloudflare Workers.

This is all packaged in the Fortiche Docker image. Let’s see a simple example.

add.f90:

SUBROUTINE add(a, b, res)
    INTEGER, INTENT(IN) :: a, b
    INTEGER, INTENT(OUT) :: res

    res = a + b
END

Here we defined a subroutine called add that takes a and b, sums them together and places the result in res.

Compile with Fortiche:

docker run -v $PWD:/input -v $PWD/output:/output xtuc/fortiche --export-func=add add.f90

Passing --export-func=add to Fortiche makes the Fortran add subroutine available to JavaScript.

The output folder contains the compiled WebAssembly module and JavaScript from Emscripten, and a JavaScript endpoint generated by Fortiche:

$ ls -lh ./output
total 84K
-rw-r--r-- 1 root root 392 avril 22 12:00 index.mjs
-rw-r--r-- 1 root root 27K avril 22 12:00 out.mjs
-rwxr-xr-x 1 root root 49K avril 22 12:00 out.wasm

And finally the Cloudflare Worker:

// Import what Fortiche generated
import {load} from "../output/index.mjs"

export default {
    async fetch(request: Request): Promise<Response> {
        // Load the Fortran program
        const program = await load();

        // Allocate space in memory for the arguments and result
        const aPtr = program.malloc(4);
        const bPtr = program.malloc(4);
        const outPtr = program.malloc(4);

        // Set argument values
        program.HEAP32[aPtr / 4] = 123;
        program.HEAP32[bPtr / 4] = 321;

        // Run the Fortran add subroutine
        program.add(aPtr, bPtr, outPtr);

        // Read the result
        const res = program.HEAP32[outPtr / 4];

        // Free everything
        program.free(aPtr);
        program.free(bPtr);
        program.free(outPtr);

        return Response.json({ res });
    },
};

Interestingly, the values we pass to Fortran are all pointers, therefore we have to allocate space for each argument and result (the Fortran integer type is four bytes wide), and pass the pointers to the `add` subroutine.

Running the Worker gives us the right answer:

$ curl https://fortran-add.cfdemos.workers.dev

{"res":444}

You can find the full example here.

Handwritten digit classifier

This example is taken from https://gws.phd/posts/fortran_wasm/#mnist. It relies on the BLAS library, which is available in Fortiche with the flag: --with-BLAS-3-12-0.

Note that the LAPACK library is also available in Fortiche with the flag: --with-LAPACK-3-12-0.

You can try on https://handwritten-digit-classifier.fortran.demos.cloudflare.com and find the source code here.

Let us know what you write using Fortran and Cloudflare Workers!

Meta Llama 3 available on Cloudflare Workers AI

2024-04-18 Michelle Chen

Post Syndicated from Michelle Chen original https://blog.cloudflare.com/meta-llama-3-available-on-cloudflare-workers-ai

Workers AI

Workers AI’s initial launch in beta included support for Llama 2, as it was one of the most requested open source models from the developer community. Since that initial launch, we’ve seen developers build all kinds of innovative applications including knowledge sharing chatbots, creative content generation, and automation for various workflows.

At Cloudflare, we know developers want simplicity and flexibility, with the ability to build with multiple AI models while optimizing for accuracy, performance, and cost, among other factors. Our goal is to make it as easy as possible for developers to use their models of choice without having to worry about the complexities of hosting or deploying models.

As soon as we learned about the development of Llama 3 from our partners at Meta, we knew developers would want to start building with it as quickly as possible. Workers AI’s serverless inference platform makes it extremely easy and cost effective to start using the latest large language models (LLMs). Meta’s commitment to developing and growing an open AI-ecosystem makes it possible for customers of all sizes to use AI at scale in production. All it takes is a few lines of code to run inference to Llama 3:

export interface Env {
  // If you set another name in wrangler.toml as the value for 'binding',
  // replace "AI" with the variable name you defined.
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    const response = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
        messages: [
{role: "user", content: "What is the origin of the phrase Hello, World?"}
	 ]
      }
    );

    return new Response(JSON.stringify(response));
  },
};

Built with Meta Llama 3

Llama 3 offers leading performance on a wide range of industry benchmarks. You can learn more about the architecture and improvements on Meta’s blog post. Cloudflare Workers AI supports Llama 3 8B, including the instruction fine-tuned model.

Meta’s testing shows that Llama 3 is the most advanced open LLM today on evaluation benchmarks such as MMLU, GPQA, HumanEval, GSM-8K, and MATH. Llama 3 was trained on an increased number of training tokens (15T), allowing the model to have a better grasp on language intricacies. Larger context windows doubles the capacity of Llama 2, and allows the model to better understand lengthy passages with rich contextual data. Although the model supports a context window of 8k, we currently only support 2.8k but are looking to support 8k context windows through quantized models soon. As well, the new model introduces an efficient new tiktoken-based tokenizer with a vocabulary of 128k tokens, encoding more characters per token, and achieving better performance on English and multilingual benchmarks. This means that there are 4 times as many parameters in the embedding and output layers, making the model larger than the previous Llama 2 generation of models.

Under the hood, Llama 3 uses grouped-query attention (GQA), which improves inference efficiency for longer sequences and also renders their 8B model architecturally equivalent to Mistral-7B. For tokenization, it uses byte-level byte-pair encoding (BPE), similar to OpenAI’s GPT tokenizers. This allows tokens to represent any arbitrary byte sequence — even those without a valid utf-8 encoding. This makes the end-to-end model much more flexible in its representation of language, and leads to improved performance.

Along with the base Llama 3 models, Meta has released a suite of offerings with tools such as Llama Guard 2, Code Shield, and CyberSec Eval 2, which we are hoping to release on our Workers AI platform shortly.

Try it out now

Meta Llama 3 8B is available in the Workers AI Model Catalog today! Check out the documentation here and as always if you want to share your experiences or learn more, join us in the Developer Discord.

Developer Week 2024 wrap-up

2024-04-08 Phillip Jones

Post Syndicated from Phillip Jones original https://blog.cloudflare.com/developer-week-2024-wrap-up

Developer Week 2024 has officially come to a close. Each day last week, we shipped new products and functionality geared towards giving developers the components they need to build full-stack applications on Cloudflare.

Even though Developer Week is now over, we are continuing to innovate with the over two million developers who build on our platform. Building a platform is only as exciting as seeing what developers build on it. Before we dive into a recap of the announcements, to send off the week, we wanted to share how a couple of companies are using Cloudflare to power their applications:

We have been using Workers for image delivery using R2 and have been able to maintain stable operations for a year after implementation. The speed of deployment and the flexibility of detailed configurations have greatly reduced the time and effort required for traditional server management. In particular, we have seen a noticeable cost savings and are deeply appreciative of the support we have received from Cloudflare Workers.
– FAN Communications

Milkshake helps creators, influencers, and business owners create engaging web pages directly from their phone, to simply and creatively promote their projects and passions. Cloudflare has helped us migrate data quickly and affordably with R2. We use Workers as a routing layer between our users’ websites and their images and assets, and to build a personalized analytics offering affordably. Cloudflare’s innovations have consistently allowed us to run infrastructure at a fraction of the cost of other developer platforms and we have been eagerly awaiting updates to D1 and Queues to sustainably scale Milkshake as the product continues to grow.
– Milkshake

In case you missed anything, here’s a quick recap of the announcements and in-depth technical explorations that went out last week:

Summary of announcements

Monday

Announcement	Summary
Making state easy with D1 GA, Hyperdrive, Queues and Workers Analytics Engine updates	A core part of any full-stack application is storing and persisting data! We kicked off the week with announcements that help developers build stateful applications on top of Cloudflare, including making D1, Cloudflare’s SQL database and Hyperdrive, our database accelerating service, generally available.
Building D1: a Global Database	D1, Cloudflare’s SQL database, is now generally available. With new support for 10GB databases, data export, and enhanced query debugging, we empower developers to build production-ready applications with D1 to meet all their relational SQL needs. To support Workers in global applications, we’re sharing a sneak peek of our design and API for D1 global read replication to demonstrate how developers scale their workloads with D1.
Why Workers environment variables contain live objects	Bindings don’t just reduce boilerplate. They are a core design feature of the Workers platform which simultaneously improve developer experience and application security in several ways. Usually these two goals are in opposition to each other, but bindings elegantly solve for both at the same time.

Tuesday

Announcement	Summary
Leveling up Workers AI: General Availability and more new capabilities	We made a series of AI-related announcements, including Workers AI, Cloudflare’s inference platform becoming GA, support for fine-tuned models with LoRAs, one-click deploys from HuggingFace, Python support for Cloudflare Workers, and more.
Running fine-tuned models on Workers AI with LoRAs	Workers AI now supports fine-tuned models using LoRAs. But what is a LoRA and how does it work? In this post, we dive into fine-tuning, LoRAs and even some math to share the details of how it all works under the hood.
Bringing Python to Workers using Pyodide and WebAssembly	We introduced Python support for Cloudflare Workers, now in open beta. We’ve revamped our systems to support Python, from the Workers runtime itself to the way Workers are deployed to Cloudflare’s network. Learn about a Python Worker’s lifecycle, Pyodide, dynamic linking, and memory snapshots in this post.

Wednesday

Announcement	Summary
R2 adds event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier	We announced three new features for Cloudflare R2: event notifications, support for migrations from Google Cloud Storage, and an infrequent access storage tier.
Data Anywhere with Pipelines, Event Notifications, and Workflows	We’re making it easier to build scalable, reliable, data-driven applications on top of our global network, and so we announced a new Event Notifications framework; our take on durable execution, Workflows; and an upcoming streaming ingestion service, Pipelines.
Improving Cloudflare Workers and D1 developer experience with Prisma ORM	Together, Cloudflare and Prisma make it easier than ever to deploy globally available apps with a focus on developer experience. To further that goal, Prisma ORM now natively supports Cloudflare Workers and D1 in Preview. With version 5.12.0 of Prisma ORM you can now interact with your data stored in D1 from your Cloudflare Workers with the convenience of the Prisma Client API. Learn more and try it out now.
How Picsart leverages Cloudflare’s Developer Platform to build globally performant services	Picsart, one of the world’s largest digital creation platforms, encountered performance challenges in catering to its global audience. Adopting Cloudflare’s global-by-default Developer Platform emerged as the optimal solution, empowering Picsart to enhance performance and scalability substantially.

Thursday

Announcement	Summary
Announcing Pages support for monorepos, wrangler.toml, database integrations and more!	We launched four improvements to Pages that bring functionality previously restricted to Workers, with the goal of unifying the development experience between the two. Support for monorepos, wrangler.toml, new additions to Next.js support and database integrations!
New tools for production safety — Gradual Deployments, Stack Traces, Rate Limiting, and API SDKs	Production readiness isn’t just about scale and reliability of the services you build with. We announced five updates that put more power in your hands – Gradual Deployments, Source mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects – each built with mission-critical production services in mind.
What’s new with Cloudflare Media: updates for Calls, Stream, and Images	With Cloudflare Calls in open beta, you can build real-time, serverless video and audio applications. Cloudflare Stream lets your viewers instantly clip from ongoing streams. Finally, Cloudflare Images now supports automatic face cropping and has an upload widget that lets you easily integrate into your application.
Cloudflare Calls: millions of cascading trees all the way down	Cloudflare Calls is a serverless SFU and TURN service running at Cloudflare’s edge. It’s now in open beta and costs $0.05/ real-time GB. It’s 100% anycast WebRTC.

Friday

Announcement	Summary
Browser Rendering API GA, rolling out Cloudflare Snippets, SWR, and bringing Workers for Platforms to all users	Browser Rendering API is now available to all paid Workers customers with improved session management.
Cloudflare acquires Baselime to expand serverless application observability capabilities	We announced that Cloudflare has acquired Baselime, a serverless observability company.
Cloudflare acquires PartyKit to allow developers to build real-time multi-user applications	We announced that PartyKit, a trailblazer in enabling developers to craft ambitious real-time, collaborative, multiplayer applications, is now a part of Cloudflare. This acquisition marks a significant milestone in our journey to redefine the boundaries of serverless computing, making it more dynamic, interactive, and, importantly, stateful.
Blazing fast development with full-stack frameworks and Cloudflare	Full-stack web development with Cloudflare is now faster and easier! You can now use your framework’s development server while accessing D1 databases, R2 object stores, AI models, and more. Iterate locally in milliseconds to build sophisticated web apps that run on Cloudflare. Let’s dev together!
We’ve added JavaScript-native RPC to Cloudflare Workers	Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system for use in Worker-to-Worker and Worker-to-Durable Object communication, with absolutely minimal boilerplate. We’ve designed an RPC system so expressive that calling a remote service can feel like using a library.
Community Update: empowering startups building on Cloudflare and creating an inclusive community	We closed out Developer Week by sharing updates on our Workers Launchpad program, our latest Developer Challenge, and the work we’re doing to ensure our community spaces – like our Discord and Community forums – are safe and inclusive for all developers.

Continue the conversation

Thank you for being a part of Developer Week! Want to continue the conversation and share what you’re building? Join us on Discord. To get started building on Workers, check out our developer documentation.

Cloudflare acquires Baselime to expand serverless application observability capabilities

2024-04-05 Boris Tane

Post Syndicated from Boris Tane original https://blog.cloudflare.com/cloudflare-acquires-baselime-expands-observability-capabilities

Today, we’re thrilled to announce that Cloudflare has acquired Baselime.

The cloud is changing. Just a few years ago, serverless functions were revolutionary. Today, entire applications are built on serverless architectures, from compute to databases, storage, queues, etc. — with Cloudflare leading the way in making it easier than ever for developers to build, without having to think about their architecture. And while the adoption of serverless has made it simple for developers to run fast, it has also made one of the most difficult problems in software even harder: how the heck do you unravel the behavior of distributed systems?

When I started Baselime 2 years ago, our goal was simple: enable every developer to build, ship, and learn from their serverless applications such that they can resolve issues before they become problems.

Since then, we built an observability platform that enables developers to understand the behaviour of their cloud applications. It’s designed for high cardinality and dimensionality data, from logs to distributed tracing with OpenTelemetry. With this data, we automatically surface insights from your applications, and enable you to quickly detect, troubleshoot, and resolve issues in production.

In parallel, Cloudflare has been busy the past few years building the next frontier of cloud computing: the connectivity cloud. The team is building primitives that enable developers to build applications with a completely new set of paradigms, from Workers to D1, R2, Queues, KV, Durable Objects, AI, and all the other services available on the Cloudflare Developers Platform.

This synergy makes Cloudflare the perfect home for Baselime. Our core mission has always been to simplify and innovate around observability for the future of the cloud, and Cloudflare’s ecosystem offers the ideal ground to further this cause. With Cloudflare, we’re positioned to deeply integrate into a platform that tens of thousands of developers trust and use daily, enabling them to quickly build, ship, and troubleshoot applications. We believe that every Worker, Queue, KV, Durable Object, AI call, etc. should have built-in observability by default.

That’s why we’re incredibly excited about the potential of what we can build together and the impact it will have on developers around the world.

To give you a preview into what’s ahead, I wanted to dive deeper into the 3 core concepts we followed while building Baselime.

High Cardinality and Dimensionality

Cardinality and dimensionality are best described using examples. Imagine you’re playing a board game with a deck of cards. High cardinality is like playing a game where every card is a unique character, making it hard to remember or match them. And high dimensionality is like each card has tons of details like strength, speed, magic, aura, etc., making the game’s strategy complex because there’s so much to consider.

This also applies to the data your application emits. For example, when you log an HTTP request that makes database calls.

High cardinality means that your logs can have a unique userId or requestId (which can take millions of distinct values). Those are high cardinality fields.
High dimensionality means that your logs can have thousands of possible fields. You can record each HTTP header of your request and the details of each database call. Any log can be a key-value object with thousands of individual keys.

The ability to query on high cardinality and dimensionality fields is key to modern observability. You can surface all errors or requests for a specific user, compute the duration of each of those requests, and group by location. You can answer all of those questions with a single tool.

OpenTelemetry

OpenTelemetry provides a common set of tools, APIs, SDKs, and standards for instrumenting applications. It is a game-changer for debugging and understanding cloud applications. You get to see the big picture: how fast your HTTP APIs are, which routes are experiencing the most errors, or which database queries are slowest. You can also get into the details by following the path of a single request or user across your entire application.

Baselime is OpenTelemetry native, and it is built from the ground up to leverage OpenTelemetry data. To support this, we built a set of OpenTelemetry SDKs compatible with several serverless runtimes.

Cloudflare is building the cloud of tomorrow and has developed workerd, a modern JavaScript runtime for Workers. With Cloudflare, we are considering embedding OpenTelemetry directly in the Workers’ runtime. That’s one more reason we’re excited to grow further at Cloudflare, enabling more developers to understand their applications, even in the most unconventional scenarios.

Developer Experience

Observability without action is just storage. I have seen too many developers pay for tools to store logs and metrics they never use, and the key reason is how opaque these tools are.

The crux of the issue in modern observability isn’t the technology itself, but rather the developer experience. Many tools are complex, with a significant learning curve. This friction reduces the speed at which developers can identify and resolve issues, ultimately affecting the reliability of their applications. Improving developer experience is key to unlocking the full potential of observability.

We built Baselime to be an exploratory solution that surfaces insights to you rather than requiring you to dig for them. For example, we notify you in real time when errors are discovered in your application, based on your logs and traces. You can quickly search through all of your data with full-text search, or using our powerful query engine, which makes it easy to correlate logs and traces for increased visibility, or ask our AI debugging assistant for insights on the issue you’re investigating.

It is always possible to go from one insight to another, asking questions about the state of your app iteratively until you get to the root cause of the issue you are troubleshooting.

Cloudflare has always prioritised the developer experience of its developer platform, especially with Wrangler, and we are convinced it’s the right place to solve the developer experience problem of observability.

What’s next?

Over the next few months, we’ll work to bring the core of Baselime into the Cloudflare ecosystem, starting with OpenTelemetry, real-time error tracking, and all the developer experience capabilities that make a great observability solution. We will keep building and improving observability for applications deployed outside Cloudflare because we understand that observability should work across providers.

But we don’t want to stop there. We want to push the boundaries of what modern observability looks like. For instance, directly connecting to your codebase and correlating insights from your logs and traces to functions and classes in your codebase. We also want to enable more AI capabilities beyond our debugging assistant. We want to deeply integrate with your repositories such that you can go from an error in your logs and traces to a Pull Request in your codebase within minutes.

We also want to enable everyone building on top of Large Language Models to do all your LLM observability directly within Cloudflare, such that you can optimise your prompts, improve latencies and reduce error rates directly within your cloud provider. These are just a handful of capabilities we can now build with the support of the Cloudflare platform.

Thanks

We are incredibly thankful to our community for its continued support, from day 0 to today. With your continuous feedback, you’ve helped us build something we’re incredibly proud of.

To all the developers currently using Baselime, you’ll be able to keep using the product and will receive ongoing support. Also, we are now making all the paid Baselime features completely free.

Baselime products remain available to sign up for while we work on integrating with the Cloudflare platform. We anticipate sunsetting the Baselime products towards the end of 2024 when you will be able to observe all of your applications within the Cloudflare dashboard. If you’re interested in staying up-to-date on our work with Cloudflare, we will release a signup link in the coming weeks!

We are looking forward to continuing to innovate with you.

Cloudflare acquires PartyKit to allow developers to build real-time multi-user applications

2024-04-05 Sunil Pai

Post Syndicated from Sunil Pai original https://blog.cloudflare.com/cloudflare-acquires-partykit

We’re thrilled to announce that PartyKit, an open source platform for deploying real-time, collaborative, multiplayer applications, is now a part of Cloudflare. This acquisition marks a significant milestone in our journey to redefine the boundaries of serverless computing, making it more dynamic, interactive, and, importantly, stateful.

Defining the future of serverless compute around state

Building real-time applications on the web have always been difficult. Not only is it a distributed systems problem, but you need to provision and manage infrastructure, databases, and other services to maintain state across multiple clients. This complexity has traditionally been a barrier to entry for many developers, especially those who are just starting out.

We announced Durable Objects in 2020 as a way of building synchronized real time experiences for the web. Unlike regular serverless functions that are ephemeral and stateless, Durable Objects are stateful, allowing developers to build applications that maintain state across requests. They also act as an ideal synchronization point for building real-time applications that need to maintain state across multiple clients. Combined with WebSockets, Durable Objects can be used to build a wide range of applications, from multiplayer games to collaborative drawing tools.

In 2022, PartyKit began as a project to further explore the capabilities of Durable Objects and make them more accessible to developers by exposing them through familiar components. In seconds, you could create a project that configured behavior for these objects, and deploy it to Cloudflare. By integrating with popular libraries such as Yjs (the gold standard in collaborative editing) and React, PartyKit made it possible for developers to build a wide range of use cases, from multiplayer games to collaborative drawing tools, into their applications.

Building experiences with real-time components was previously only accessible to multi-billion dollar companies, but new computing primitives like Durable Objects on the edge make this accessible to regular developers and teams. With PartyKit now under our roof, we’re doubling down on our commitment to this future — a future where serverless is stateful.

We’re excited to give you a preview into our shared vision for applications, and the use cases we’re excited to simplify together.

Making state for serverless easy

Unlike conventional approaches that rely on external databases to maintain state, thereby complicating scalability and increasing costs, PartyKit leverages Cloudflare’s Durable Objects to offer a seamless model where stateful serverless functions can operate as if they were running on a single machine, maintaining state across requests. This innovation not only simplifies development but also opens up a broader range of use cases, including real-time computing, collaborative editing, and multiplayer gaming, by allowing thousands of these “machines” to be spun up globally, each maintaining its own state. PartyKit aims to be a complement to traditional serverless computing, providing a more intuitive and efficient method for developing applications that require stateful behavior, thereby marking the “next evolution” of serverless computing.

Simplifying WebSockets for Real-Time Interaction

WebSockets have revolutionized how we think about bidirectional communication on the web. Yet, the challenge has always been about scaling these interactions to millions without a hitch. Cloudflare Workers step in as the hero, providing a serverless framework that makes real-time applications like chat services, multiplayer games, and collaborative tools not just possible but scalable and efficient.

Powering Games and Multiplayer Applications Without Limits

Imagine building multiplayer platforms where the game never lags, the collaboration is seamless, and video conferences are crystal clear. Cloudflare’s Durable Objects morph the stateless serverless landscape into a realm where persistent connections thrive. PartyKit’s integration into this ecosystem means developers now have a powerhouse toolkit to bring ambitious multiplayer visions to life, without the traditional overheads.

This is especially critical in gaming — there are few areas where low-latency and real-time interaction matter more. Every millisecond, every lag, every delay defines the entire experience. With PartyKit’s capabilities integrated into Cloudflare, developers will be able to leverage our combined technologies to create gaming experiences that are not just about playing but living the game, thanks to scalable, immersive, and interactive platforms.

The toolkit for building Local-First applications

The Internet is great, and increasingly always available, but there are still a few situations where we are forced to disconnect — whether on a plane, a train, or a beach.

The premise of local-first applications is that work doesn’t stop when the Internet does. Wherever you left off in your doc, you can keep working on it, assuming the state will be restored when you come back online. By storing data on the client and syncing when back online, these applications offer resilience and responsiveness that’s unmatched. Cloudflare’s vision, enhanced by PartyKit’s technology, aims to make local-first not just an option but the standard for application development.

What’s next for PartyKit users?

Users can expect their existing projects to continue working as expected. We will be adding more features to the platform, including the ability to create and use PartyKit projects inside existing Workers and Pages projects. There will be no extra charges to use PartyKit for commercial purposes, other than the standard usage charges for Cloudflare Workers and other services. Further, we’re going to expand the roadmap to begin working on integrations with popular frameworks and libraries, such as React, Vue, and Angular. We’re deeply committed to executing on the PartyKit vision and roadmap, and we’re excited to see what you build with it.

The Beginning of a New Chapter

The acquisition of PartyKit by Cloudflare isn’t just a milestone for our two teams; it’s a leap forward for developers everywhere. Together, we’re not just building tools; we’re crafting the foundation for the next generation of Internet applications. The future of serverless is stateful, and with PartyKit’s expertise now part of our arsenal, we’re more ready than ever to make that future a reality.

Welcome to the Cloudflare team, PartyKit. Look forward to building something remarkable together.

We’ve added JavaScript-native RPC to Cloudflare Workers

2024-04-05 Kenton Varda

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/javascript-native-rpc

Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system enabling seamless Worker-to-Worker and Worker-to-Durable Object communication, with almost no boilerplate. You just define a class:

export class MyService extends WorkerEntrypoint {
  sum(a, b) {
    return a + b;
  }
}

And then you call it:

let three = await env.MY_SERVICE.sum(1, 2);

No schemas. No routers. Just define methods of a class. Then call them. That’s it.

But that’s not it

This isn’t just any old RPC. We’ve designed an RPC system so expressive that calling a remote service can feel like using a library – without any need to actually import a library! This is important not just for ease of use, but also security: fewer dependencies means fewer critical security updates and less exposure to supply-chain attacks.

To this end, here are some of the features of Workers RPC:

For starters, you can pass Structured Clonable types as the params or return value of an RPC. (That means that, unlike JSON, Dates just work, and you can even have cycles.)
You can additionally pass functions in the params or return value of other functions. When the other side calls the function you passed to it, they make a new RPC back to you.
Similarly, you can pass objects with methods. Method calls become further RPCs.
RPC to another Worker (over a Service Binding) usually does not even cross a network. In fact, the other Worker usually runs in the very same thread as the caller, reducing latency to zero. Performance-wise, it’s almost as fast as an actual function call.
When RPC does cross a network (e.g. to a Durable Object), you can invoke a method and then speculatively invoke further methods on the result in a single network round trip.
You can send a byte stream over RPC, and the system will automatically stream the bytes with proper flow control.
All of this is secure, based on the object-capability model.
The protocol and implementation are fully open source as part of workerd.

Workers RPC is a JavaScript-native RPC system. Under the hood, it is built on Cap’n Proto. However, unlike Cap’n Proto, Workers RPC does not require you to write a schema. (Of course, you can use TypeScript if you like, and we provide tools to help with this.)

In general, Workers RPC is designed to “just work” using idiomatic JavaScript code, so you shouldn’t have to spend too much time looking at docs. We’ll give you an overview in this blog post. But if you want to understand the full feature set, check out the documentation.

Why RPC? (And what is RPC anyway?)

Remote Procedure Calls (RPC) are a way of expressing communications between two programs over a network. Without RPC, you might communicate using a protocol like HTTP. With HTTP, though, you must format and parse your communications as an HTTP request and response, perhaps designed in REST style. RPC systems try to make communications look like a regular function call instead, as if you were calling a library rather than a remote service. The RPC system provides a “stub” object on the client side which stands in for the real server-side object. When a method is called on the stub, the RPC system figures out how to serialize and transmit the parameters to the server, invoke the method on the server, and then transmit the return value back.

The merits of RPC have been subject to a great deal of debate. RPC is often accused of committing many of the fallacies of distributed computing.

But this reputation is outdated. When RPC was first invented some 40 years ago, async programming barely existed. We did not have Promises, much less async and await. Early RPC was synchronous: calls would block the calling thread waiting for a reply. At best, latency made the program slow. At worst, network failures would hang or crash the program. No wonder it was deemed “broken”.

Things are different today. We have Promise and async and await, and we can throw exceptions on network failures. We even understand how RPCs can be pipelined so that a chain of calls takes only one network round trip. Many large distributed systems you likely use every day are built on RPC. It works.

The fact is, RPC fits the programming model we’re used to. Every programmer is trained to think in terms of APIs composed of function calls, not in terms of byte stream protocols nor even REST. Using RPC frees you from the need to constantly translate between mental models, allowing you to move faster.

Example: Authentication Service

Here’s a common scenario: You have one Worker that implements an application, and another Worker that is responsible for authenticating user credentials. The app Worker needs to call the auth Worker on each request to check the user’s cookie.

This example uses a Service Binding, which is a way of configuring one Worker with a private channel to talk to another, without going through a public URL. Here, we have an application Worker that has been configured with a service binding to the Auth worker.

Before RPC, all communications between Workers needed to use HTTP. So, you might write code like this:

// OLD STYLE: HTTP-based service bindings.
export default {
  async fetch(req, env, ctx) {
    // Call the auth service to authenticate the user's cookie.
    // We send it an HTTP request using a service binding.

    // Construct a JSON request to the auth service.
    let authRequest = {
      cookie: req.headers.get("Cookie")
    };

    // Send it to env.AUTH_SERVICE, which is our service binding
    // to the auth worker.
    let resp = await env.AUTH_SERVICE.fetch(
        "https://auth/check-cookie", {
      method: "POST",
      headers: {
        "Content-Type": "application/json; charset=utf-8",
      },
      body: JSON.stringify(authRequest)
    });

    if (!resp.ok) {
      return new Response("Internal Server Error", {status: 500});
    }

    // Parse the JSON result.
    let authResult = await resp.json();

    // Use the result.
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }
    let username = authResult.username;

    return new Response(`Hello, ${username}!`);
  }
}

Meanwhile, your auth server might look like:

// OLD STYLE: HTTP-based auth server.
export default {
  async fetch(req, env, ctx) {
    // Parse URL to decide what endpoint is being called.
    let url = new URL(req.url);
    if (url.pathname == "/check-cookie") {
      // Parse the request.
      let authRequest = await req.json();

      // Look up cookie in Workers KV.
      let cookieInfo = await env.COOKIE_MAP.get(
          hash(authRequest.cookie), "json");

      // Construct the response.
      let result;
      if (cookieInfo) {
        result = {
          authorized: true,
          username: cookieInfo.username
        };
      } else {
        result = { authorized: false };
      }

      return Response.json(result);
    } else {
      return new Response("Not found", {status: 404});
    }
  }
}

This code has a lot of boilerplate involved in setting up an HTTP request to the auth service. With RPC, we can instead express this as a function call:

// NEW STYLE: RPC-based service bindings
export default {
  async fetch(req, env, ctx) {
    // Call the auth service to authenticate the user's cookie.
    // We invoke it using a service binding.
    let authResult = await env.AUTH_SERVICE.checkCookie(
        req.headers.get("Cookie"));

    // Use the result.
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }
    let username = authResult.username;

    return new Response(`Hello, ${username}!`);
  }
}

And the server side becomes:

// NEW STYLE: RPC-based auth server.
import { WorkerEntrypoint } from "cloudflare:workers";

export class AuthService extends WorkerEntrypoint {
  async checkCookie(cookie) {
    // Look up cookie in Workers KV.
    let cookieInfo = await this.env.COOKIE_MAP.get(
        hash(cookie), "json");

    // Return result.
    if (cookieInfo) {
      return {
        authorized: true,
        username: cookieInfo.username
      };
    } else {
      return { authorized: false };
    }
  }
}

This is a pretty nice simplification… but we can do much more!

Let’s get fancy! Or should I say… classy?

Let’s say we want our auth service to do a little more. Instead of just checking cookies, it provides a whole API around user accounts. In particular, it should let you:

Get or update the user’s profile info.
Send the user an email notification.
Append to the user’s activity log.

But, these operations should only be allowed after presenting the user’s credentials.

Here’s what the server might look like:

import { WorkerEntrypoint, RpcTarget } from "cloudflare:workers";

// `User` is an RPC interface to perform operations on a particular
// user. This class is NOT exported as an entrypoint; it must be
// received as the result of the checkCookie() RPC.
class User extends RpcTarget {
  constructor(uid, env) {
    super();

    // Note: Instance members like these are NOT exposed over RPC.
    // Only class (prototype) methods and properties are exposed.
    this.uid = uid;
    this.env = env;
  }

  // Get/set user profile, backed by Worker KV.
  async getProfile() {
    return await this.env.PROFILES.get(this.uid, "json");
  }
  async setProfile(profile) {
    await this.env.PROFILES.put(this.uid, JSON.stringify(profile));
  }

  // Send the user a notification email.
  async sendNotification(message) {
    let addr = await this.env.EMAILS.get(this.uid);
    await this.env.EMAIL_SERVICE.send(addr, message);
  }

  // Append to the user's activity log.
  async logActivity(description) {
    // (Please excuse this somewhat problematic implementation,
    // this is just a dumb example.)
    let timestamp = new Date().toISOString();
    await this.env.ACTIVITY.put(
        `${this.uid}/${timestamp}`, description);
  }
}

// Now we define the entrypoint service, which can be used to
// get User instances -- but only by presenting the cookie.
export class AuthService extends WorkerEntrypoint {
  async checkCookie(cookie) {
    // Look up cookie in Workers KV.
    let cookieInfo = await this.env.COOKIE_MAP.get(
        hash(cookie), "json");

    if (cookieInfo) {
      return {
        authorized: true,
        user: new User(cookieInfo.uid, this.env),
      };
    } else {
      return { authorized: false };
    }
  }
}

Now we can write a Worker that uses this API while displaying a web page:

export default {
  async fetch(req, env, ctx) {
    // `using` is a new JavaScript feature. Check out the
    // docs for more on this:
    // https://developers.cloudflare.com/workers/runtime-apis/rpc/lifecycle/
    using authResult = await env.AUTH_SERVICE.checkCookie(
        req.headers.get("Cookie"));
    if (!authResult.authorized) {
      return new Response("Not authorized", {status: 403});
    }

    let user = authResult.user;
    let profile = await user.getProfile();

    await user.logActivity("You visited the site!");
    await user.sendNotification(
        `Thanks for visiting, ${profile.name}!`);

    return new Response(`Hello, ${profile.name}!`);
  }
}

Finally, this worker needs to be configured with a service binding pointing at the AuthService class. Its wrangler.toml may look like:

name = "app-worker"
main = "./src/app.js"

# Declare a service binding to the auth service.
[[services]]
binding = "AUTH_SERVICE"    # name of the binding in `env`
service = "auth-service"    # name of the worker in the dashboard
entrypoint = "AuthService"  # name of the exported RPC class

Wait, how?

What exactly happened here? The Server created an instance of the class User and returned it to the client. It has methods that the client can then just call? Are we somehow transferring code over the wire?

No, absolutely not! All code runs strictly in the isolate where it was originally loaded. What actually happens is, when the return value is passed over RPC, all class instances are replaced with RPC stubs. The stub, when called, makes a new RPC back to the server, where it calls the method on the original User object that was created there:

But then you might ask: how does the RPC stub know what methods are available? Is a list of methods passed over the wire?

In fact, no. The RPC stub is a special object called a “Proxy“. It implements a “wildcard method”, that is, it appears to have an infinite number of methods of every possible name. When you try to call a method, the name you called is sent to the server. If the original object has no such method, an exception is thrown.

Did you spot the security?

In the above example, we see that RPC is easy to use. We made an object! We called it! It all just felt natural, like calling a local API! Hooray!

But there’s another extremely important property that the AuthService API has which you may have missed: As designed, you cannot perform any operation on a user without first checking the cookie. This is true despite the fact that the individual method calls do not require sending the cookie again, and the User object itself doesn’t store the cookie.

The trick is, the initial checkCookie() RPC is what returns a User object in the first place. The AuthService API does not provide any other way to obtain a User instance. The RPC client cannot create a User object out of thin air, and cannot call methods of an object without first explicitly receiving a reference to it.

This is called capability-based security: we say that the User reference received by the client is a “capability”, because receiving it grants the client the ability to perform operations on the user. The getProfile() method grants this capability only when the client has presented the correct cookie.

Capability-based security is often like this: security can be woven naturally into your APIs, rather than feel like an additional concern bolted on top.

More security: Named entrypoints

Another subtle but important detail to call out: in the above example, the auth service’s RPC API is exported as a named class:

export class AuthService extends WorkerEntrypoint {

And in our wrangler.toml for the calling worker, we had to specify an “entrypoint”, matching the class name:

entrypoint = "AuthService"  # name of the exported RPC class

In the past, service bindings would bind to the “default” entrypoint, declared with export default {. But, the default entrypoint is also typically exposed to the Internet, e.g. automatically mapped to a hostname under workers.dev (unless you explicitly turn that off). It can be tricky to safely assume that requests arriving at this entrypoint are in any way trusted.

With named entrypoints, this all changes. A named entrypoint is only accessible to Workers which have explicitly declared a binding to it. By default, only Workers on your own account can declare such bindings. Moreover, the binding must be declared at deploy time; a Worker cannot create new service bindings at runtime.

Thus, you can trust that requests arriving at a named entrypoint can only have come from Workers on your account and for which you explicitly created a service binding. In the future, we plan to extend this pattern further with the ability to lock down entrypoints, audit which Workers have bindings to them, tell the callee information about who is calling at runtime, and so on. With these tools, there is no need to write code in your app itself to authenticate access to internal APIs; the system does it for you.

What about type safety?

Workers RPC works in an entirely dynamically-typed way, just as JavaScript itself does. But just as you can apply TypeScript on top of JavaScript in general, you can apply it to Workers RPC.

The @cloudflare/workers-types package defines the type Service<MyEntrypointType>, which describes the type of a service binding. MyEntrypointType is the type of your server-side interface. Service<MyEntrypointType> applies all the necessary transformations to turn this into a client-side type, such as converting all methods to async, replacing functions and RpcTargets with (properly-typed) stubs, and so on.

It is up to you to share the definition of MyEntrypointType between your server app and its clients. You might do this by defining the interface in a separate shared TypeScript file, or by extracting a .d.ts type declaration file from your server code using tsc --declaration.

With that done, you can apply types to your client:

import { WorkerEntrypoint } from "cloudflare:workers";

// The interface that your server-side entrypoint implements.
// (This would probably be imported from a .d.ts file generated
// from your server code.)
declare class MyEntrypointType extends WorkerEntrypoint {
  sum(a: number, b: number): number;
}

// Define an interface Env specifying the bindings your client-side
// worker expects.
interface Env {
  MY_SERVICE: Service<MyEntrypointType>;
}

// Define the client worker's fetch handler with typed Env.
export default <ExportedHandler<Env>> {
  async fetch(req, env, ctx) {
    // Now env.MY_SERVICE is properly typed!
    const result = await env.MY_SERVICE.sum(1, 2);
    return new Response(result.toString());
  }
}

RPC to Durable Objects

Durable Objects allow you to create a “named” worker instance somewhere on the network that multiple other workers can then talk to, in order to coordinate between them. Each Durable Object also has its own private on-disk storage where it can store state long-term.

Previously, communications with a Durable Object had to take the form of HTTP requests and responses. With RPC, you can now just declare methods on your Durable Object class, and call them on the stub. One catch: to opt into RPC, you must declare your Durable Object class with extends DurableObject, like so:

import { DurableObject } from "cloudflare:workers";

export class Counter extends DurableObject {
  async increment() {
    // Increment our stored value and return it.
    let value = await this.ctx.storage.get("value");
    value = (value || 0) + 1;
    this.ctx.storage.put("value", value);
    return value;
  }
}

Now we can call it like:

let stub = env.COUNTER_NAMEPSACE.get(id);
let value = await stub.increment();

TypeScript is supported here too, by defining your binding with type DurableObjectNamespace<ServerType>:

interface Env {
  COUNTER_NAMESPACE: DurableObjectNamespace<Counter>;
}

Eliding awaits with speculative calls

When talking to a Durable Object, the object may be somewhere else in the world from the caller. RPCs must cross the network. This takes time: despite our best efforts, we still haven’t figured out how to make information travel faster than the speed of light.

When you have a complex RPC interface where one call returns an object on which you wish to make further method calls, it’s easy to end up with slow code that makes too many round trips over the network.

// Makes three round trips.
let foo = await stub.foo();
let baz = await foo.bar.baz();
let corge = await baz.qux[3].corge();

Workers RPC features a way to avoid this: If you know that a call will return a value containing a stub, and all you want to do with it is invoke a method on that stub, you can skip awaiting it:

// Same thing, only one round trip.
let foo = stub.foo();
let baz = foo.bar.baz();
let corge = await baz.qux[3].corge();

Whoa! How does this work?

RPC methods do not return normal promises. Instead, they return special RPC promises. These objects are “custom thenables“, which means you can use them in all the ways you’d use a regular Promise, like awaiting it or calling .then() on it.

But an RPC promise is more than just a thenable. It is also a proxy. Like an RPC stub, it has a wildcard property. You can use this to express speculative RPC calls on the eventual result, before it has actually resolved. These speculative calls will be sent to the server immediately, so that they can begin executing as soon as the first RPC has finished there, before the result has actually made its way back over the network to the client.

This feature is also known as “Promise Pipelining”. Although it isn’t explicitly a security feature, it is commonly provided by object-capability RPC systems like Cap’n Proto.

The future: Custom Bindings Marketplace?

For now, Service Bindings and Durable Objects only allow communication between Workers running on the same account. So, RPC can only be used to talk between your own Workers.

But we’d like to take it further.

We have previously explained why Workers environments contain live objects, also known as “bindings”. But today, only Cloudflare can add new binding types to the Workers platform – like Queues, KV, or D1. But what if anyone could invent their own binding type, and give it to other people?

Previously, we thought this would require creating a way to automatically load client libraries into the calling Workers. That seemed scary: it meant using someone’s binding would require trusting their code to run inside your isolate. With RPC, there’s no such trust. The binding only sees exactly what you explicitly pass to it. It cannot compromise the rest of your Worker.

Could Workers RPC provide the basis for a “bindings marketplace”, where people can offer rich JavaScript APIs to each other in an easy and secure way? We’re excited to explore and find out.

Try it now

Workers RPC is available today for all Workers users. To get started, check out the docs.