All posts by Ben Yule

Announcing support for WASI on Cloudflare Workers

Post Syndicated from Ben Yule original https://blog.cloudflare.com/announcing-wasi-on-workers/


Today, we are announcing experimental support for WASI (the WebAssembly System Interface) on Cloudflare Workers and support within wrangler2 to make it a joy to work with. We continue to be incredibly excited about the entire WebAssembly ecosystem and are eager to adopt the standards as they are developed.

A Quick Primer on WebAssembly

So what is WASI anyway? To understand WASI, and why we’re excited about it, it’s worth a quick recap of WebAssembly, and the ecosystem around it.

WebAssembly promised us a future in which code written in compiled languages could be compiled to a common binary format and run in a secure sandbox, at near-native speeds. While WebAssembly was designed with the browser in mind, the model rapidly extended to server-side platforms such as Cloudflare Workers (which has supported WebAssembly since 2017).

WebAssembly was originally designed to run alongside JavaScript, and requires developers to interface directly with JavaScript in order to access the world outside the sandbox. To put it another way, WebAssembly does not provide any standard interface for I/O tasks such as interacting with files, accessing the network, or reading the system clock. This means that if you want to respond to an event from the outside world, it’s up to the developer to handle that event in JavaScript and directly call functions exported from the WebAssembly module. Similarly, if you want to perform I/O from within WebAssembly, you need to implement that logic in JavaScript and import it into the WebAssembly module.
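To make that interplay concrete, here is a minimal sketch (the module path, the env.log import, and the run export are hypothetical names chosen for illustration; real modules define their own imports and exports):

// I/O is implemented on the JavaScript side and handed to the module
// through the import object; the module can only reach the outside
// world through functions imported this way.
const bytes = await fetch("./module.wasm").then((r) => r.arrayBuffer());
const { instance } = await WebAssembly.instantiate(bytes, {
  env: {
    log: (value) => console.log("from wasm:", value),
  },
});

// Responding to an outside event means calling an exported function directly.
instance.exports.run();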

Custom toolchains such as Emscripten or libraries such as wasm-bindgen have emerged to make this easier, but they are language-specific and add a tremendous amount of complexity and bloat. We’ve even built our own library, workers-rs, using wasm-bindgen, which attempts to make writing applications in Rust feel native within a Worker – but this has proven difficult to maintain, and it requires developers to write Workers-specific code that is not portable outside the Workers ecosystem.

We need more.

The WebAssembly System Interface (WASI)

WASI aims to provide a standard interface that any language compiling to WebAssembly can target. You can read the original post by Lin Clark here, which gives an excellent introduction – code cartoons and all. In a nutshell, Lin describes WebAssembly as an assembly language for a ‘conceptual machine’, whereas WASI is a systems interface for a ‘conceptual operating system.’

This standardization of the system interface has paved the way for existing toolchains to cross-compile existing codebases to the wasm32-wasi target. A tremendous amount of progress has already been made, specifically within Clang/LLVM via the wasi-sdk and within the Rust toolchain. These toolchains leverage a version of libc, built on top of WASI ‘system calls,’ that provides the POSIX standard API. There are even basic implementations in more fringe toolchains such as TinyGo and SwiftWasm.

Practically speaking, this means that you can now write applications that not only interoperate with any WebAssembly runtime implementing the standard, but also with any POSIX-compliant system! This means the exact same ‘Hello World!’ that runs on your local Linux/Mac/Windows WSL machine will run just the same on any runtime that implements WASI.

Show me the code

WASI sounds great, but does it actually make my life easier? You tell us. Let’s run through an example of how this would work in practice.

First, let’s generate a basic Rust “Hello, world!” application, compile, and run it.

$ cargo new hello_world
$ cd ./hello_world
$ cargo build --release
   Compiling hello_world v0.1.0 (/Users/benyule/hello_world)
    Finished release [optimized] target(s) in 0.28s
$ ./target/release/hello_world
Hello, world!

It doesn’t get much simpler than this. You’ll notice we only define a main() function containing a single println! to stdout.

fn main() {
    println!("Hello, world!");
}

Now, let’s take the exact same program, compile it against the wasm32-wasi target, and run it in an ‘off the shelf’ Wasm runtime such as Wasmtime (if the target isn’t installed yet, rustup target add wasm32-wasi will add it).

$ cargo build --target wasm32-wasi --release
$ wasmtime target/wasm32-wasi/release/hello_world.wasm

Hello, world!

Neat! The same code compiles and runs in multiple POSIX environments.

Finally, let’s take the binary we just generated for Wasmtime, but this time run it on Workers using wrangler2’s dev mode.

$ npx wrangler@wasm dev target/wasm32-wasi/release/hello_world.wasm
$ curl http://localhost:8787/

Hello, world!

Unsurprisingly, it works! The same code is compatible in multiple POSIX environments and the same binary is compatible across multiple WASM runtimes.

Running your CLI apps in the cloud

The attentive reader may notice that we played a small trick with the HTTP request made via cURL. In this example, we actually stream stdin and stdout to/from the Worker using the HTTP request and response bodies respectively. This pattern enables some really interesting use cases; in particular, programs designed to run on the command line can be deployed as ‘services’ to the cloud.

‘Hexyl’ is an example that works completely out of the box. Here, we ‘cat’ a binary file on our local machine and ‘pipe’ the output to curl, which will then POST that output to our service and stream the result back. Following the steps we used to compile our ‘Hello World!’, we can compile hexyl.

$ git clone git@github.com:sharkdp/hexyl.git
$ cd ./hexyl
$ cargo build --target wasm32-wasi --release

And without further modification we were able to take a real-world program and create something we can now run or deploy. Again, let’s tell wrangler2 to preview hexyl, but this time give it some input.

$ npx wrangler@wasm dev target/wasm32-wasi/release/hexyl.wasm
$ echo "Hello, world\!" | curl -X POST --data-binary @- http://localhost:8787

┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 48 65 6c 6c 6f 20 77 6f ┊ 72 6c 64 21 0a          │Hello wo┊rld!_   │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

Give it a try yourself by hitting https://hexyl.examples.workers.dev.

echo "Hello world\!" | curl https://hexyl.examples.workers.dev/ -X POST --data-binary @- --output -

A more useful example, though one that requires a bit more work, is deploying a utility such as swc (swc.rs) to the cloud and using it as an on-demand JavaScript/TypeScript transpilation service. Here, we take a few extra steps to ensure that the compiled output is as small as possible, but it otherwise runs out of the box. Those steps are detailed in https://github.com/zebp/wasi-example-swc, but for now let’s gloss over them and interact with the hosted example.

$ echo "const x = (x, y) => x * y;" | curl -X POST --data-binary @- https://swc-wasi.examples.workers.dev/ --output -

var x=function(a,b){return a*b}

Finally, we can do the same with C/C++, though it requires a little more lifting to get our Makefile right. Here we show an example of compiling zstd and deploying it as a streaming compression service.

https://github.com/zebp/wasi-example-zstd

$ echo "Hello world\!" | curl https://zstd.examples.workers.dev/ -s -X POST --data-binary @- | file -

What if I want to use WASI from within a JavaScript Worker?

Wrangler can make it really easy to deploy code without having to worry about the Workers ecosystem, but in some cases you may actually want to invoke your WASI-based Wasm module from JavaScript. This can be achieved with the following simple boilerplate. An updated README will be kept at https://github.com/cloudflare/workers-wasi.

import { WASI } from "@cloudflare/workers-wasi";
import demoWasm from "./demo.wasm";

export default {
  async fetch(request, _env, ctx) {
    // Creates a TransformStream we can use to pipe our stdout to our response body.
    const stdout = new TransformStream();
    const wasi = new WASI({
      args: [],
      stdin: request.body,
      stdout: stdout.writable,
    });

    // Instantiate our WASM with our demo module and our configured WASI import.
    const instance = new WebAssembly.Instance(demoWasm, {
      wasi_snapshot_preview1: wasi.wasiImport,
    });

    // Keep our worker alive until the WASM has finished executing.
    ctx.waitUntil(wasi.start(instance));

    // Finally, let's reply with the WASM's output.
    return new Response(stdout.readable);
  },
};

Now, with our JavaScript boilerplate and Wasm module in hand, we can easily deploy our Worker with Wrangler’s Wasm feature.

$ npx wrangler publish
Total Upload: 473.89 KiB / gzip: 163.79 KiB
Uploaded wasi-javascript (2.75 sec)
Published wasi-javascript (0.30 sec)
  wasi-javascript.zeb.workers.dev

Back to the future

For those of you who have been around for the better part of the past couple of decades, you may notice this looks very similar to RFC 3875, better known as CGI (the Common Gateway Interface). While our example here certainly does not conform to the specification, you can imagine how this pattern can be extended to turn the stdin of a basic ‘command line’ application into a full-blown HTTP handler.

We are thrilled to learn where developers take this from here. Share what you build with us on Discord or Twitter!

How we built Instant Logs

Post Syndicated from Ben Yule original https://blog.cloudflare.com/how-we-built-instant-logs/


As a developer, you may be all too familiar with the stress of responding to a major service outage, becoming aware of an ongoing security breach, or simply dealing with the frustration of setting up a new service for the first time. When confronted with these situations, you want a real-time view into the events flowing through your network, so you can receive, process, and act on information as quickly as possible.

If you have a UNIX mindset, you’ll be familiar with tailing web service logs and searching for patterns using grep. With distributed systems like Cloudflare’s edge network, this task becomes much more complex because you’ll either need to log in to thousands of servers, or ship all the logs to a single place.

This is why we built Instant Logs. Instant Logs removes all barriers to accessing your Cloudflare logs, giving you a complete platform to view your HTTP logs in real time, with just a single click, right from within Cloudflare’s dashboard. Powerful filters then let you drill into specific events or search for patterns, and act on them instantly.

The Challenge

Today, Cloudflare’s Logpush product already gives customers the ability to ship their logs to a third-party analytics or storage provider of their choosing. While this system is already exceptionally fast, delivering logs in about 15 seconds on average, it is optimized for completeness and the utmost certainty that your data reliably makes it to its destination. It is the ideal solution for when things have settled down and you want to perform a forensic deep dive or retrospective.

We originally aimed to extend this system to provide our real-time logging capabilities, but we soon realized the objectives were inherently at odds with each other. In order to get all of your data to a single place, all the time, the laws of the universe require that latencies be introduced into the system. We needed a complementary solution, with its own unique set of objectives.

This ultimately boiled down to the following:

  1. It has to be extremely fast, in human terms. This means average latencies between an event occurring at the edge and being received by the client should be under three seconds.
  2. We wanted the system design to be simple, and communication to be as direct to the client as possible. This meant operating the dataplane entirely at the edge, eliminating unnecessary round trips to a core data center.
  3. The pipeline needs to provide sensible results on properties of all sizes, ranging from a few requests per day to hundreds of thousands of requests per second.
  4. The pipeline must support a broad set of user-definable filters that are applied before any sampling occurs, such that a user can target and receive exactly what they want.

Workers and Durable Objects

Our existing Logpush pipeline relies heavily on Kafka to provide sharding, buffering, and aggregation at a single, central location. While we’ve had excellent results using Kafka for these pipelines, the clusters are optimized to run only within our core data centers. Using Kafka would require extra hops to far away data centers, adding a latency penalty we were not willing to incur.

In order to keep the data plane running on the edge, we needed primitives that would allow us to perform some of the same key functions we needed out of Kafka. This is where Workers and the recently released Durable Objects come in. Workers provide an incredibly simple to use, highly elastic, edge-native compute platform we can use to receive events and perform transformations. Durable Objects, through their global uniqueness, allow us to coordinate messages streaming from thousands of servers and route them to a singular object. This is where aggregation and buffering are performed, before finally pushing to a client over a thin WebSocket. We get all of this, without ever having to leave the edge!

Let’s walk through what this looks like in practice.

A Simple Start

Imagine a simple scenario in which we have a single web server which produces log messages, and a single client which wants to consume them. This can be implemented by creating a Durable Object, which we will refer to as a Durable Session, that serves as the point of coordination between the server and client. In this case, the client initiates a WebSocket connection with the Durable Object, and the server sends messages to the Durable Object over HTTP, which are then forwarded directly to the client.

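A minimal sketch of such a Durable Session might look like the following, using the standard Durable Objects API (the class name, routes, and single-client assumption are simplifications for illustration):

// The globally unique object where servers and the client meet.
export class DurableSession {
  constructor(state, env) {
    this.client = null; // the single connected WebSocket client
  }

  async fetch(request) {
    const url = new URL(request.url);

    // The client attaches once, over a WebSocket upgrade.
    if (url.pathname === "/connect") {
      const { 0: client, 1: server } = new WebSocketPair();
      server.accept();
      this.client = server;
      return new Response(null, { status: 101, webSocket: client });
    }

    // Every web server POSTs its log lines here over plain HTTP; because
    // all servers route to this same object, the messages merge into a
    // single stream for the client.
    if (url.pathname === "/ingest" && request.method === "POST") {
      const lines = await request.text();
      if (this.client) this.client.send(lines);
      return new Response("ok");
    }

    return new Response("not found", { status: 404 });
  }
}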

This model is quite quick and introduces very little additional latency other than what would be required to send a payload directly from the web server to the client. This is thanks to the fact that Durable Objects are generally located at or near the data center where they are first requested. At least in human terms, it’s instant. Adding more servers to our model is also trivial. As the additional servers produce events, they will all be routed to the same Durable Object, which merges them into a single stream, and sends them to the client over the same WebSocket.


Durable Objects are inherently single-threaded. As the number of servers in our simple example increases, the Durable Object will eventually saturate its CPU time and start to reject incoming requests. And even if it didn’t, as data volumes increase, we risk overwhelming a client’s ability to download and render log lines. We’ll handle this in a few different ways.

Homing in on specific events

Filtering is the simplest and most obvious way to reduce data volume before it reaches the client. If we can filter out the noise and stream only the events of interest, we can substantially reduce volume. Performing this transformation in the Durable Object itself will provide no relief from CPU saturation concerns. Instead, we can push this filtering out to an invoking Worker, which will run many filter operations in parallel as it elastically scales to process all the incoming requests to the Durable Object. At this point, our architecture starts to look a lot like the MapReduce pattern!

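As a sketch, the invoking Worker might apply the filter like this before anything reaches the Durable Object (the event shape, the filter itself, and the SESSION binding are hypothetical; in the real product, filters are user-defined):

export default {
  async fetch(request, env) {
    const events = await request.json();

    // The filter runs here, in the elastically scaled Worker, so the
    // single-threaded Durable Object only ever sees matching events.
    const matched = events.filter((event) => event.status >= 500);
    if (matched.length === 0) return new Response("dropped");

    // Forward the survivors to the session object.
    const id = env.SESSION.idFromName("session-id");
    return env.SESSION.get(id).fetch("https://session/ingest", {
      method: "POST",
      body: JSON.stringify(matched),
    });
  },
};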

Scaling up with shards

Ok, so filtering may be great in some situations, but it’s not going to save us under all scenarios. We still need a solution to help us coordinate between potentially thousands of servers that are sending events every single second. Durable Objects will come to the rescue, yet again. We can implement a sharding layer consisting of Durable Objects, which we will call Durable Shards, that effectively allows us to reduce the number of requests being sent to our primary object.


But how do we implement this layer if Durable Objects are globally unique? We first need to decide on a shard key, which is used to determine which Durable Object a given message should first be routed to. When the Worker processes a message, the key is added to the name of the downstream Durable Object. Assuming our keys are well-balanced, this should effectively reduce the load on the primary Durable Object to approximately 1/N of what it would otherwise be.
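A sketch of that routing decision, with a hypothetical SHARD binding and shard count (the production hash and naming scheme may well differ):

const NUM_SHARDS = 16; // N, chosen to spread load; illustrative only

function shardFor(env, sessionId, message) {
  // Any stable, well-balanced hash of the message works as a shard key.
  let h = 0;
  for (const c of message) h = (h * 31 + c.charCodeAt(0)) | 0;
  const key = Math.abs(h) % NUM_SHARDS;

  // Durable Objects are globally unique per name, so appending the key
  // to the name yields N distinct shards in front of the primary object.
  const id = env.SHARD.idFromName(`${sessionId}-${key}`);
  return env.SHARD.get(id);
}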

Reaching the moon by sampling

But wait, there’s more to do. Going back to our original product requirements, “The pipeline needs to provide sensible results on properties of all sizes, ranging from a few requests per day to hundreds of thousands of requests per second.” With the system as designed so far, we have the technical headroom to process an almost arbitrary number of logs. However, we’ve done nothing to reduce the absolute volume of messages that need to be processed and sent to the client, and at high log volumes, clients would quickly be overwhelmed. To deliver the interactive, instant user experience customers expect, we need to roll up our sleeves one more time.

This is where our final trick, sampling, comes into play.

Up to this point, when our pipeline saturates, it still makes forward progress by dropping excess data as the Durable Object starts to refuse connections. However, this form of ‘uncontrolled shedding’ is dangerous because it causes us to lose information. When we drop data in this way, we can’t keep a record of the data we dropped, and we cannot infer things about the original shape of the traffic from the messages that we do receive. Instead, we implement a form of ‘controlled’ sampling, which still preserves the statistics, and information about the original traffic.

For Instant Logs, we implement a sampling technique called reservoir sampling. Reservoir sampling is a form of dynamic sampling that has the amazing property of letting us pick a fixed number k of items from a stream of unknown length n, in a single pass through the data. By buffering data in the reservoir and flushing it on a short (sub-second) time interval, we can output random samples to the client at the maximum data rate of our choosing. Sampling is implemented in both layers of Durable Objects.
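Here is a minimal reservoir sampler, as a sketch (the classic ‘Algorithm R’; the field names and flush policy are illustrative, not our production code):

class Reservoir {
  constructor(k) {
    this.k = k;       // maximum samples to keep per flush interval
    this.seen = 0;    // n: total items observed since the last flush
    this.items = [];
  }

  add(item) {
    this.seen++;
    if (this.items.length < this.k) {
      this.items.push(item);
      return;
    }
    // Keep the new item with probability k/n by overwriting a random
    // slot; this keeps the reservoir a uniform sample of the stream.
    const j = Math.floor(Math.random() * this.seen);
    if (j < this.k) this.items[j] = item;
  }

  // Called on a short (sub-second) timer: emit the samples, each tagged
  // with the interval (n/k) used to reconstruct original traffic volumes.
  flush() {
    const interval = this.seen / Math.max(this.items.length, 1);
    const out = this.items.map((item) => ({ item, sampleInterval: interval }));
    this.seen = 0;
    this.items = [];
    return out;
  }
}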


Information about the original traffic shape is preserved by assigning a sample interval to each line: the number of original events that the delivered sample represents, or 1/probability. The actual number of requests can then be estimated by taking the sum of all sample intervals within a time window. This technique adds a slight amount of latency to the pipeline to account for buffering, but enables us to point an event source of nearly any size at the pipeline and expect it to be handled in a sensible, controlled way.
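In code, the client-side estimate is just a sum, assuming each delivered line carries the sampleInterval field described above:

// Three samples that each survived roughly 1-in-10 sampling.
const samples = [
  { sampleInterval: 10 },
  { sampleInterval: 12 },
  { sampleInterval: 8 },
];

// Estimated number of original requests in this window: 30.
const estimatedTotal = samples.reduce((sum, s) => sum + s.sampleInterval, 0);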

Putting it all together

What we are left with is a pipeline that sensibly handles wildly different volumes of traffic, from single digits to hundreds of thousands of requests a second. It allows the user to pinpoint an exact event in a sea of millions, or calculate summaries over every single one. It delivers insight within seconds, all without ever having to do more than click a button.

Best of all? Workers and Durable Objects handle this workload with aplomb and no tuning, and the available developer tooling allowed me to be productive from my first day writing code targeting the Workers ecosystem.

How to get involved?

We’ll be starting our beta for Instant Logs in a couple of weeks. Join the waitlist to get notified when you can get access!

If you want to be part of building the future of data at Cloudflare, we’re hiring engineers for our data team in Lisbon, London, Austin, and San Francisco!