Building Cloudflare Images in Rust and Cloudflare Workers

Post Syndicated from Yevgen Safronov original https://blog.cloudflare.com/building-cloudflare-images-in-rust-and-cloudflare-workers/

This post explains how we implemented the Cloudflare Images product with reusable Rust libraries and Cloudflare Workers. It covers the technical design of Cloudflare Image Resizing and Cloudflare Images. Using Rust and Cloudflare Workers helps us quickly iterate and deliver product improvements over the coming weeks and months.

Reuse of code in Rusty image projects

We developed Image Resizing in Rust. It’s a web server that receives HTTP requests for images along with resizing options, fetches the full-size images from the origin, applies resizing and other image processing operations, compresses, and returns the HTTP response with the optimized image.

Rust makes it easy to split projects into libraries (called crates). The image processing and compression parts of Image Resizing are usable as libraries.

We also have a product called Polish, which is a Golang-based service that recompresses images in our cache. Polish was initially designed to run command-line programs like jpegtran and pngcrush. We took the core of Image Resizing and wrapped it in a command-line executable. This way, when Polish needs to apply lossy compression or generate WebP images or animations, it can use Image Resizing via a command-line tool instead of a third-party tool.

Reusing libraries has allowed us to easily unify processing between Image Resizing and Polish (for example, to ensure that both handle metadata and color profiles in the same way).

Cloudflare Images is another product we’ve built in Rust. It added support for a custom storage back-end, variants (size presets), signed URLs, and more. We made it as a collection of Rust crates, so we can reuse pieces of it in other services running anywhere in our network. Image Resizing provides image processing for Cloudflare Images and shares libraries with Images to understand the new URL scheme, access the storage back-end, and read the variants database.

How Image Resizing works

The Image Resizing service runs at the edge and is deployed on every server of the Cloudflare global network. Thanks to Cloudflare’s global Anycast network, the closest Cloudflare data center will handle eyeball image resizing requests. Image Resizing is tightly integrated with the Cloudflare cache and handles eyeball requests only on a cache miss.

There are two ways to use Image Resizing. The default URL scheme provides an easy, declarative way of specifying image dimensions and other options. The other way is to use a JavaScript API in a Worker. Cloudflare Workers give powerful programmatic control over every image resizing request.

How Cloudflare Images work

Cloudflare Images consists of the following components:

  • The Images core service that powers the public API to manage image assets.
  • The Image Resizing service responsible for image transformations and caching.
  • The Image delivery Cloudflare Worker responsible for serving images and passing corresponding parameters through to the Image Resizing service.
  • Image storage that provides access and storage for original image assets.

To support Cloudflare Images scenarios for image transformations, we made several changes to the Image Resizing service:

  • Added access to Cloudflare storage with original image assets.
  • Added access to variant definitions (size presets).
  • Added support for signing URLs.

Image delivery

The primary use case for Cloudflare Images is to provide a simple and easy-to-use way of managing image assets. To cover egress costs, we provide image delivery through the Cloudflare-managed imagedelivery.net domain. It is configured with Tiered Caching to maximize the cache hit ratio for image assets. imagedelivery.net provides image hosting without the need to configure a custom domain to proxy through Cloudflare.

A Cloudflare Worker powers image delivery. It parses image URLs and passes the corresponding parameters to the image resizing service.

How we store Cloudflare Images

There are several places we store information on Cloudflare Images:

  • image metadata in Cloudflare’s core data centers
  • variant definitions in Cloudflare’s edge data centers
  • original images in core data centers
  • optimized images in Cloudflare cache, physically close to eyeballs.

Image variant definitions are stored and delivered to the edge using Cloudflare’s distributed key-value store called Quicksilver. We use a single source of truth for variants. The Images core service makes calls to Quicksilver to read and update variant definitions.

The rest of the information about the image is stored in the image URL itself:
https://imagedelivery.net/<encoded account id>/<image id>/<variant name>

<image id> contains a flag indicating whether the image is publicly available or requires access verification. It’s not feasible to store any image metadata in Quicksilver, as the data volume would grow linearly with the number of images we host. Instead, we only allow a finite number of variants per account, so we use the available disk space at the edge responsibly. The downside of storing image metadata as part of <image id> is that the <image id> changes whenever the image’s access settings change.

How we keep Cloudflare Images up to date

The only way to access images is through the use of variants. Each variant is a named image resizing configuration. Once the image asset is fetched, we cache the transformed image in the Cloudflare cache. The critical question is how we keep processed images up to date. The answer is by purging the Cloudflare cache when necessary. There are two use cases:

  • access to the image is changed
  • the variant definition is updated

In the first instance, we purge the cache by calling a URL:
https://imagedelivery.net/<encoded account id>/<image id>

When the customer updates a variant, we issue a cache purge request by tag:
account-id/variant-name

To support cache purge by tag, the image resizing service adds the necessary tags for all transformed images.
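
Purge by tag is the same mechanism exposed by Cloudflare’s public cache purge API. As an illustration only (a sketch, not the internal code path), a purge request for a variant’s tag could look like this, with the zone ID and API token as placeholders:

// Illustrative purge-by-tag request against the Cloudflare API.
// ZONE_ID and API_TOKEN are placeholders; the tag follows the
// account-id/variant-name convention described above.
declare const ZONE_ID: string;
declare const API_TOKEN: string;

async function purgeVariantCache(accountId: string, variantName: string): Promise<boolean> {
  const res = await fetch(`https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ tags: [`${accountId}/${variantName}`] }),
  });
  return res.ok;
}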

How we restrict access to Cloudflare Images

The Image Resizing service supports restricted access to images by using URL signatures with expiration. URLs are signed with a SHA-256 HMAC key. The steps to produce a valid signature are as follows (a code sketch follows the list):

  1. Take the path and query string (the path starts with /).
  2. Compute the path’s SHA-256 HMAC with the query string, using the Images’ URL signing key as the secret. The key is configured in the Dashboard.
  3. If the URL is meant to expire, compute the Unix timestamp (number of seconds since 1970) of the expiration time, and append ?exp= and the timestamp as an integer to the URL.
  4. Append ? or & to the URL as appropriate (? if it had no query string; & if it had a query string).
  5. Append sig= and the HMAC as hex-encoded 64 characters.
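
To make the steps concrete, here is a minimal sketch of a signer using the Web Crypto API available in Workers. It is an illustration rather than Cloudflare’s implementation; it assumes the exp parameter is included in the signed string, and secretKey stands in for the URL signing key from the Dashboard:

// Minimal sketch of the signing steps above, using Web Crypto.
async function signUrl(url: URL, secretKey: string, expiry?: number): Promise<URL> {
  const encoder = new TextEncoder();
  const key = await crypto.subtle.importKey(
    'raw', encoder.encode(secretKey),
    { name: 'HMAC', hash: 'SHA-256' },
    false, ['sign']
  );

  // Step 3: append exp=<unix timestamp> if the URL should expire
  // (assumed here to be part of the signed string).
  if (expiry) url.searchParams.set('exp', String(expiry));

  // Steps 1-2: HMAC over the path plus query string.
  const data = url.pathname + url.search;
  const mac = await crypto.subtle.sign('HMAC', key, encoder.encode(data));

  // Step 5: hex-encode the 32-byte MAC into 64 characters.
  const sig = [...new Uint8Array(mac)]
    .map(b => b.toString(16).padStart(2, '0')).join('');

  // Step 4: searchParams handles the ?-versus-& distinction for us.
  url.searchParams.set('sig', sig);
  return url;
}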

A signed URL looks like this:
https://imagedelivery.net/<encoded account id>/<image id>/<variant name>?sig=<hex-encoded signature>

A signed URL with an expiration timestamp looks like this:
https://imagedelivery.net/<encoded account id>/<image id>/<variant name>?exp=<unix timestamp>&sig=<hex-encoded signature>

Signature of /hello/world URL with a secret ‘this is a secret’ is 6293f9144b4e9adc83416d1b059abcac750bf05b2c5c99ea72fd47cc9c2ace34.

https://imagedelivery.net/hello/world?sig=6293f9144b4e9adc83416d1b059abcac750bf05b2c5c99ea72fd47cc9c2ace34


Direct creator uploads with Cloudflare Worker and KV

Similar to Cloudflare Stream, Images supports direct creator uploads, which allow users to upload images without API tokens. Direct creator uploads are commonly used by web apps, client-side applications, or mobile apps where users upload content directly to Cloudflare Images.

Once again, we used our serverless platform to support direct creator uploads. A successful API call stores the account’s information in Workers KV with the specified expiration date. A simple Cloudflare Worker then handles the upload URL: it reads the value from KV and grants upload access only if the KV lookup succeeds.
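
A minimal sketch of what such a Worker could look like; the UPLOADS KV binding, URL shape, and responses are assumptions for illustration:

// Hypothetical one-time upload URL handler.
// UPLOADS is an assumed KV binding; the API call that created the upload
// URL is expected to have stored the account info under the token, with a TTL.
declare const UPLOADS: KVNamespace;

export default {
  async fetch(request: Request): Promise<Response> {
    // e.g. POST /upload/<token>
    const token = new URL(request.url).pathname.split('/').pop();
    if (!token) return new Response('Not found', { status: 404 });

    // The KV read doubles as access control: expired or unknown tokens
    // return null, so the upload is rejected.
    const account = await UPLOADS.get(token, 'json');
    if (!account) return new Response('Upload URL expired', { status: 403 });

    // The token is valid: accept the image and hand it to storage.
    // (Storage details omitted; this sketch is the gatekeeping logic only.)
    return new Response('OK', { status: 200 });
  }
};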

Future Work

The Cloudflare Images product has an exciting roadmap. Let’s review what’s possible with the current architecture of Cloudflare Images.

Resizing hints on upload

At the moment, no image transformations happen on upload. That means we can serve the image globally once it is uploaded to Image storage. We are considering adding resizing hints on image upload. That won’t necessarily schedule image processing in all cases but could provide a valuable signal to resize the most critical image variants. An example could be to generate an AVIF variant for the most vital image assets.

Serving images from custom domains

We think serving images from a domain we manage (with Tiered Caching) is a great default option for many customers. The downside is that loading Cloudflare Images requires an additional TLS negotiation on the client side, adding latency and impacting loading performance. On the other hand, serving Cloudflare Images from custom domains will be a viable option for customers who set up a website through Cloudflare. The good news is that we can support such functionality with the current architecture, without radical changes.

Conclusion

The Cloudflare Images product runs on top of the Cloudflare global network. We built Cloudflare Images in Rust and Cloudflare Workers, which lets us reuse Rust libraries across several products such as Cloudflare Images, Image Resizing, and Polish. Cloudflare’s serverless platform is an indispensable tool for building Cloudflare products internally. If you are interested in building innovative products in Rust and Cloudflare Workers, we’re hiring.

Pin, Unpin, and why Rust needs them

Post Syndicated from Adam Chalmers original https://blog.cloudflare.com/pin-and-unpin-in-rust/

Using async Rust libraries is usually easy. It’s just like using normal Rust code, with a little async or .await here and there. But writing your own async libraries can be hard. The first time I tried this, I got really confused by arcane, esoteric syntax like T: ?Unpin and Pin<&mut Self>. I had never seen these types before, and I didn’t understand what they were doing. Now that I understand them, I’ve written the explainer I wish I could have read back then. In this post, we’re gonna learn

  • What Futures are
  • What self-referential types are
  • Why they were unsafe
  • How Pin/Unpin made them safe
  • Using Pin/Unpin to write tricky nested futures

What are Futures?

A few years ago, I needed to write some code which would take some async function, run it and collect some metrics about it, e.g. how long it took to resolve. I wanted to write a type TimedWrapper that would work like this:

// Some async function, e.g. polling a URL with reqwest (https://docs.rs/reqwest).
// Remember, Rust async functions do nothing until you .await them, so this isn't
// actually making an HTTP request yet.
let async_fn = reqwest::get("http://adamchalmers.com");

// Wrap the async function in my hypothetical wrapper.
let timed_async_fn = TimedWrapper::new(async_fn);

// Call the async function, which will send an HTTP request and time it.
let (resp, time) = timed_async_fn.await;
println!("Got an HTTP {} in {}ms", resp.unwrap().status(), time.as_millis());

I like this interface, it’s simple and should be easy for the other programmers on my team to use. OK, let’s implement it! I know that, under the hood, Rust’s async functions are just regular functions that return a Future. The Future trait is pretty simple. It just means a type which:

  • Can be polled
  • When it’s polled, it might return “Pending” or “Ready”
  • If it’s pending, you should poll it again later
  • If it’s ready, it responds with a value. We call this “resolving”.

Here’s a really easy example of implementing a Future. Let’s make a Future that returns a random u16.

use std::{future::Future, pin::Pin, task::{Context, Poll}};

/// A future which returns a random number when it resolves.
#[derive(Default)]
struct RandFuture;

impl Future for RandFuture {
	// Every future has to specify what type of value it returns when it resolves.
	// This particular future will return a u16.
	type Output = u16;

	// The `Future` trait has only one method, named "poll".
	fn poll(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Self::Output> {
		Poll::Ready(rand::random())
	}
}

Not too hard! I think we’re ready to implement TimedWrapper.

Trying and failing to use nested Futures

Let’s start by defining the type.

pub struct TimedWrapper<Fut: Future> {
	start: Option<Instant>,
	future: Fut,
}

OK, so a TimedWrapper is generic over a type Fut, which must be a Future. And it will store a future of that type as a field. It’ll also have a start field, which will record when it was first polled. Let’s write a constructor:

impl<Fut: Future> TimedWrapper<Fut> {
	pub fn new(future: Fut) -> Self {
		Self { future, start: None }
	}
}

Nothing too complicated here. The new function takes a future and wraps it in the TimedWrapper. Of course, we have to set start to None, because it hasn’t been polled yet. So, let’s implement the poll method, which is the only thing we need to implement Future and make it .awaitable.

impl<Fut: Future> Future for TimedWrapper<Fut> {
	// This future will output a pair of values:
	// 1. The value from the inner future
	// 2. How long it took for the inner future to resolve
	type Output = (Fut::Output, Duration);

	fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
		// Call the inner poll, measuring how long it took.
		let start = self.start.get_or_insert_with(Instant::now);
		let inner_poll = self.future.poll(cx);
		let elapsed = start.elapsed();

		match inner_poll {
			// The inner future needs more time, so this future needs more time too
			Poll::Pending => Poll::Pending,
			// Success!
			Poll::Ready(output) => Poll::Ready((output, elapsed)),
		}
	}
}

OK, that wasn’t too hard. There’s just one problem: this doesn’t work.

So, the Rust compiler reports an error on self.future.poll(cx), which is “no method named poll found for type parameter Fut in the current scope”. This is confusing, because we know Fut is a Future, so surely it has a poll method? OK, but Rust continues: Fut doesn’t have a poll method, but Pin<&mut Fut> has one. What is this weird type?

Well, we know that methods have a “receiver”, which is some way of accessing self. The receiver might be self, &self or &mut self, which mean “take ownership of self,” “borrow self,” and “mutably borrow self” respectively. So this is just a new, unfamiliar kind of receiver. Rust is complaining because we have Fut and we really need a Pin<&mut Fut>. At this point I have two questions:

  1. What is Pin?
  2. If I have a T value, how do I get a Pin<&mut T>?

The rest of this post is going to be answering those questions. I’ll explain some problems in Rust that could lead to unsafe code, and why Pin safely solves them.

Self-reference is unsafe

Pin exists to solve a very specific problem: self-referential datatypes, i.e. data structures which have pointers into themselves. For example, a binary search tree might have self-referential pointers, which point to other nodes in the same struct.

Self-referential types can be really useful, but they’re also hard to make memory-safe. To see why, let’s use this example type with two fields, an i32 called val and a pointer to an i32 called pointer.

So far, everything is OK. The pointer field points to the val field in memory address A, which contains a valid i32. All the pointers are valid, i.e. they point to memory that does indeed encode a value of the right type (in this case, an i32). But the Rust compiler often moves values around in memory. For example, if we pass this struct into another function, it might get moved to a different memory address. Or we might Box it and put it on the heap. Or if this struct was in a Vec<MyStruct>, and we pushed more values in, the Vec might outgrow its capacity and need to move its elements into a new, larger buffer.

When we move it, the struct’s fields change their address, but not their value. So the pointer field is still pointing at address A, but address A now doesn’t have a valid i32. The data that was there was moved to address B, and some other value might have been written there instead! So now the pointer is invalid. This is bad — at best, invalid pointers cause crashes, at worst they cause hackable vulnerabilities. We only want to allow memory-unsafe behaviour in unsafe blocks, and we should be very careful to document this type and tell users to update the pointers after moves.

Unpin and !Unpin

To recap, all Rust types fall into two categories.

  1. Types that are safe to move around in memory. This is the default, the norm. For example, this includes primitives like numbers, strings, bools, as well as structs or enums entirely made of them. Most types fall into this category!
  2. Self-referential types, which are not safe to move around in memory. These are pretty rare. An example is the intrusive linked list inside some Tokio internals. Another example is most types which implement Future and also borrow data, for reasons explained in the Rust async book.

Types in category (1) are totally safe to move around in memory. You won’t invalidate any pointers by moving them around. But if you move a type in (2), then you invalidate pointers and can get undefined behaviour, as we saw before. In earlier versions of Rust, you had to be really careful using these types to not move them, or if you moved them, to use unsafe and update all the pointers. But since Rust 1.33, the compiler can automatically figure out which category any type is in, and make sure you only use it safely.

Any type in (1) implements a special auto trait called Unpin. Weird name, but its meaning will become clear soon. Again, most “normal” types implement Unpin, and because it’s an auto trait (like Send or Sync or Sized), you don’t have to worry about implementing it yourself. If you’re unsure whether a type can be safely moved, just check it on docs.rs and see if it impls Unpin!

Types in (2) are creatively named !Unpin (the ! in a trait means “does not implement”). To use these types safely, we can’t use regular pointers for self-reference. Instead, we use special pointers that “pin” their values into place, ensuring they can’t be moved. This is exactly what the Pin type does.

Pin wraps a pointer and stops its value from moving. The only exception is if the value impls Unpin — then we know it’s safe to move. Voila! Now we can write self-referential structs safely! This is really important, because as discussed above, many Futures are self-referential, and we need them for async/await.

Using Pin

So now we understand why Pin exists, and why our Future’s poll method takes a Pin<&mut Self> instead of a regular &mut self. So let’s get back to the problem we had before: I need a pinned reference to the inner future. More generally: given a pinned struct, how do we access its fields?

The solution is to write helper functions which give you references to the fields. These references might be normal Rust references like &mut, or they might also be pinned. You can choose whichever one you need. This is called projection: if you have a pinned struct, you can write a projection method that gives you access to all its fields.

Projecting is really just getting data into and out of Pins. For example, we get the start: Option<Instant> field from the Pin<&mut Self>, and we need to put the future: Fut into a Pin so we can call its poll method. If you read the Pin methods you’ll see this is always safe if it points to an Unpin value, but requires unsafe otherwise.

// Putting data into Pin
pub        fn new          <P: Deref<Target: Unpin>>(pointer: P) -> Pin<P>;
pub unsafe fn new_unchecked<P>                     (pointer: P) -> Pin<P>;

// Getting data from Pin
pub        fn into_inner          <P: Deref<Target: Unpin>>(pin: Pin<P>) -> P;
pub unsafe fn into_inner_unchecked<P>                      (pin: Pin<P>) -> P;

I know unsafe can be a bit scary, but it’s OK to write unsafe code! I think of unsafe as the compiler saying “hey, I can’t tell if this code follows the rules here, so I’m going to rely on you to check for me.” The Rust compiler does so much work for us, it’s only fair that we do some of the work every now and then. If you want to learn how to write your own projection methods, I can highly recommend this fasterthanli.me blog post on the topic. But we’re going to take a little shortcut.

Using pin-project instead

So, OK, look, it’s time for a confession: I don’t like using unsafe. I know I just explained why it’s OK, but still, given the option, I would rather not.

I didn’t start writing Rust because I wanted to carefully think about the consequences of my actions, damnit, I just want to go fast and not break things. Luckily, someone sympathized with me and made a crate which generates totally safe projections! It’s called pin-project and it’s awesome. All we need to do is change our definition:

#[pin_project::pin_project] // This generates a `project` method
pub struct TimedWrapper<Fut: Future> {
	// For each field, we need to choose whether `project` returns an
	// unpinned (&mut T) or pinned (Pin<&mut T>) reference to the field.
	// By default, it assumes unpinned:
	start: Option<Instant>,
	// Opt into pinned references with this attribute:
	#[pin]
	future: Fut,
}

For each field, you have to choose whether its projection should be pinned or not. By default, you should use a normal reference, just because they’re easier and simpler. But if you know you need a pinned reference — for example, because you want to call .poll(), whose receiver is Pin<&mut Self> — then you can do that with #[pin].

Now we can finally poll the inner future!

fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
	// This returns a type with all the same fields, with all the same types,
	// except that the fields defined with #[pin] will be pinned.
	let mut this = self.project();

	// Call the inner poll, measuring how long it took.
	let start = this.start.get_or_insert_with(Instant::now);
	let inner_poll = this.future.as_mut().poll(cx);
	let elapsed = start.elapsed();

	match inner_poll {
		// The inner future needs more time, so this future needs more time too
		Poll::Pending => Poll::Pending,
		// Success!
		Poll::Ready(output) => Poll::Ready((output, elapsed)),
	}
}

Finally, our goal is complete — and we did it all without any unsafe code.

Summary

If a Rust type has self-referential pointers, it can’t be moved safely. After all, moving doesn’t update the pointers, so they’ll still be pointing at the old memory address, so they’re now invalid. Rust can automatically tell which types are safe to move (and will auto impl the Unpin trait for them). If you have a Pin-ned pointer to some data, Rust can guarantee that nothing unsafe will happen (if it’s safe to move, you can move it, if it’s unsafe to move, then you can’t). This is important because many Future types are self-referential, so we need Pin to safely poll a Future. You probably won’t have to poll a future yourself (just use async/await instead), but if you do, use the pin-project crate to simplify things.

I hope this helped — if you have any questions, please ask me on Twitter. And if you want to get paid to talk to me about Rust and networking protocols, my team at Cloudflare is hiring, so be sure to visit careers.cloudflare.com.

Introducing logs from the dashboard for Cloudflare Workers

Post Syndicated from Ashcon Partovi original https://blog.cloudflare.com/workers-dashboard-logs/

If you’re writing code: what can go wrong, will go wrong.

Many developers know the feeling: “It worked in the local testing suite, it worked in our staging environment, but… it’s broken in production?” Testing can reduce mistakes and debugging can help find them, but logs give us the tools to understand and improve what we are creating.

if (this === undefined) {
  console.log("there’s no way… right?") // Narrator: there was.
}

While logging can help you understand when the seemingly impossible is actually possible, it’s something that no developer really wants to set up or maintain on their own. That’s why we’re excited to launch a new addition to the Cloudflare Workers platform: logs and exceptions from the dashboard.

Starting today, you can view and filter the console.log output and exceptions from a Worker… at no additional cost with no configuration needed!

View logs, just a click away

When you view a Worker in the dashboard, you’ll now see a “Logs” tab which you can click on to view a detailed stream of logs and exceptions.

Each log entry contains an event with a list of logs, exceptions, and request headers if it was triggered by an HTTP request. We also automatically redact sensitive URLs and headers such as Authorization, Cookie, or anything else that appears to have a sensitive name.
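
As a toy illustration of what “appears to have a sensitive name” could mean, redaction logic along these lines would do the job; the actual rule set used by the service is more thorough:

// Toy sketch of header redaction; the name list is illustrative only.
const SENSITIVE = /auth|cookie|key|token|secret|password/i;

function redactHeaders(headers: Headers): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of headers) {
    // Replace the value of anything that looks sensitive by name.
    out[name] = SENSITIVE.test(name) ? '(redacted)' : value;
  }
  return out;
}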

If you are in the Durable Objects open beta, you will also be able to view the logs and requests sent to each Durable Object. This is a great tool to help you understand and debug the interactions between your Worker and a Durable Object.

For now, we support filtering by event status and type, though you can expect more filters to be added to the dashboard very soon! Today, we support advanced filtering with the wrangler CLI, which is discussed later in this post.

console.log(), and you’re all set

It’s really simple to get started with logging for Workers. Simply invoke one of the standard console APIs, such as console.log(), and we handle the rest. That’s it! There’s no extra setup, no configuration needed, and no hidden logging fees.

function logRequest (request) {
  const { cf, headers } = request
  const { city, region, country, colo, clientTcpRtt } = cf
  
  console.log("Detected location:", [city, region, country].filter(Boolean).join(", "))
  if (clientTcpRtt) {
     console.debug("Round-trip time from client to", colo, "is", clientTcpRtt, "ms")
  }

  // You can also pass an object, which will be interpreted as JSON.
  // This is great if you want to define your own structured log schema.
  console.log({ headers })
}

In fact, you don’t even need to use console.log to view an event from the dashboard. If your Worker doesn’t generate any logs or exceptions, you will still be able to see the request headers from the event.

Advanced filters, from your terminal

If you need more advanced filters, you can use wrangler, our command-line tool for deploying Workers. We’ve updated the wrangler tail command to support sampling and a new set of advanced filters. You also no longer need to install or configure cloudflared to use the command. Not to mention it’s much faster: no more waiting around for logs to appear. Here are a few examples:

# Filter by your own IP address, and if there was an uncaught exception.
wrangler tail --format=pretty --ip-address=self --status=error

# Filter by HTTP method, then apply a 10% sampling rate.
wrangler tail --format=pretty --method=GET --sampling-rate=0.1

# Filter using a generic search query.
wrangler tail --format=pretty --search="TypeError"

We recommend using the “pretty” format, since wrangler will output your logs in a colored, human-readable format. (We’re also working on a similar display for the dashboard.)

However, if you want to access structured logs, you can use the “json” format. This is great if you want to pipe your logs to another tool, such as jq, or save them to a file. Here are a few more examples:

# Parses each log event, but only outputs the url.
wrangler tail --format=json | jq .event.request?.url

# You can also specify --once to disconnect the tail after receiving the first log.
# This is useful if you want to run tests in a CI/CD environment.
wrangler tail --format=json --once > event.json

Try it out!

Both logs from the dashboard and wrangler tail are available and free for existing Workers customers. If you would like more information or a step-by-step guide, check out the Workers documentation.

Cloudflare Developer Summer Challenge

Post Syndicated from Jenanne Vaccaro original https://blog.cloudflare.com/developer-summer-challenge/

There are a lot of experiences we have all grown to miss over the last year and a half. After hearing from our community, two of the top experiences they miss are collaborating with peers, and getting Cloudflare swag. Perhaps even in the reverse order! In-person events like conferences were once a key channel to satisfy both these interests, however today’s remote world makes that much harder. But does it have to?

Today, we are excited to introduce the Cloudflare Developer Summer Challenge. We will be rewarding 300 participants with boxes of our most popular swag, while enabling collaboration with other participants through our Workers Discord channel.

To participate, you have to build a project that uses Cloudflare Workers and at least one other product in our rapidly expanding developer platform. We will judge submissions and award swag boxes to those with the most innovative projects, limited to one box per person. The Challenge is open for submissions from today until November 1, 2021. See a full list of terms and conditions here.

What are the details?

Cloudflare’s developer platform offers all the building blocks to create end-to-end applications with products across compute, storage, and frontend services. To successfully participate in the Cloudflare Developer Summer Challenge, you need to build a project with at least two of the following products in the Cloudflare developer platform. Bonus points for using more! These products include:

  • Cloudflare Workers: an edge-based serverless computing platform where you can deploy code automatically worldwide, with speed, security, and scale baked in.
  • Workers KV: a global low-latency key value data store for exceptionally high read volumes, making it possible to build highly dynamic APIs and websites which respond as quickly as a cached static file would.
  • Durable Objects: a storage platform providing low-latency coordination and consistent storage at the edge, enabling stateful serverless use cases.
  • Cloudflare Pages: a Jamstack web development platform to collaborate on and deploy high performance sites quickly across the world.

How will submissions be judged?

Submissions will be based on three criteria:

  • The first criterion is meeting the basic requirements of the challenge. This includes using at least two products in the platform, and submitting the live link of your project and your code repository. You must submit before November 1, 2021.
  • The second criterion is innovation. Projects that are unique and useful to users will be more likely to win the swag boxes.
  • The third criterion is the breadth of the Cloudflare platform you use. The more products you integrate, the better!

How do I participate?

Step 1: Get Started: You can get started with the developer platform by visiting the Cloudflare Workers Quickstart Guide, and reviewing the Workers KV and/or Durable Objects documentation. To host your frontend site on Cloudflare Pages, you can visit the Pages Quickstart Guide.

Step 2: Build your Project: If you would like inspiration, you can view our tutorials, and examples. You can also view our Built With Workers page. If you want to try and build something more advanced, you can see some additional examples below.

Step 3: Share your Project: A successful submission includes a link to your live project and a link to your code repository. We also encourage you to share your work, or pictures of you unboxing/using your swag, by tagging @CloudflareDev on Twitter with the hashtag #CloudflareSummerChallenge.

Optional: Engage with the Community

To promote collaboration and interaction with peers, we will have a dedicated channel in the Cloudflare Workers Discord server. For those who want to, you can learn what others are building, share your project, or join Q&A sessions if you have questions or need help getting started. You can also meet developers participating around the world.

What are examples of advanced projects I can build?

The Cloudflare developer platform is ideal for a wide range of use cases – from augmenting existing applications to building entirely new ones without needing to maintain underlying infrastructure. Essentially, you write the code, and we handle the rest. Running on the Cloudflare edge network, applications scale automatically and run worldwide within seconds of users.

Build a Serverless API for your Frontend

Cloudflare Workers’ high performance and edge network make it well-suited for building APIs. It is also a great companion to your frontend applications on Cloudflare Pages. You can use Workers as the backend, and build your frontend with frameworks such as React, Gatsby, Hugo, Svelte, and more — and then deploy your site onto Pages.

You can easily begin building a serverless API for your frontend by creating a new Workers project with the Wrangler CLI:

wrangler generate serverless-api https://github.com/cloudflare/worker-typescript-template

To complete this type of project, you can follow the rest of the steps outlined in our Pages documentation.

Build an Interactive Game

Combining Cloudflare Workers, Durable Objects, and WebSockets makes a powerful platform for managing state at the edge. Running on Cloudflare’s global network enables exceptionally low latency, so users can interact instantly worldwide. Examples of interactive applications can range from chat rooms to multiplayer video games.

To build interactive video games, you can integrate with popular tools such as Unity and WebGL using an authoritative client model (we have an example of how to achieve this). You do not even need expertise in building video games. Following this example above, the client can run a compiled game directly in the browser with WebAssembly. The server, running on Cloudflare Workers, can be interacted with via WebSockets, and uses Durable Objects to manage game state.
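
To make the server side concrete, here is a minimal sketch of a Durable Object that accepts WebSocket connections and keeps per-room game state. The class name, message handling, and state shape are illustrative, not taken from the example project:

// Illustrative Durable Object: one instance per game room.
// (Worker routing and wrangler bindings are omitted.)
export class GameRoom {
  sessions: WebSocket[] = [];

  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    // Upgrade the connection; the Worker routes clients here by room ID.
    const { 0: client, 1: server } = new WebSocketPair();
    server.accept();
    this.sessions.push(server);

    server.addEventListener('message', async (msg) => {
      // Persist a move counter with Durable Object storage, then
      // broadcast the message to every connected player.
      const moves = ((await this.state.storage.get<number>('moves')) || 0) + 1;
      await this.state.storage.put('moves', moves);
      for (const ws of this.sessions) ws.send(msg.data as string);
    });

    return new Response(null, { status: 101, webSocket: client });
  }
}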

Build an E-Commerce Experience

Whether you are building a small e-commerce app or an online ordering site with millions of requests per month, your users can experience exceptional performance and reliability with Cloudflare’s developer platform.

At a small scale, you can instantly personalize a user’s e-commerce experience by setting up A/B testing, performing localization, or providing geo-specific targeting, such as local currency or exchange rates. You can see the linked code samples to quickly get started.
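
As a small illustration of geo-specific targeting, a Worker can read the request.cf object that Cloudflare populates at the edge; the currency table here is made up for the example:

// Illustrative geo-targeting: pick a display currency from request.cf.
const CURRENCIES: Record<string, string> = { GB: 'GBP', JP: 'JPY', DE: 'EUR' };

export default {
  async fetch(request: Request): Promise<Response> {
    // request.cf.country is filled in by Cloudflare at the edge.
    const country = (request as { cf?: { country?: string } }).cf?.country;
    const currency = (country && CURRENCIES[country]) || 'USD';
    return new Response(JSON.stringify({ currency }), {
      headers: { 'Content-Type': 'application/json' },
    });
  }
};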

Integration with Workers KV and other third party tools can also streamline the development process. Instead of having to build out an entire database for your application, you can use Workers KV to store product information such as product ID, name, description, price, etc. Outside the Workers platform, you can integrate with popular tools such as Stripe or Shopify. To learn more about how to build an e-commerce application with Workers, Workers KV, and Stripe, you can read this blog post on building an e-commerce experience.
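
To sketch that Workers KV idea (the PRODUCTS binding name and the product:<id> key convention are assumptions for illustration):

// Illustrative product lookup from Workers KV instead of a full database.
declare const PRODUCTS: KVNamespace;

interface Product {
  id: string;
  name: string;
  description: string;
  price: number;
}

async function getProduct(id: string): Promise<Product | null> {
  // Values are stored as JSON under keys like "product:123".
  return PRODUCTS.get<Product>(`product:${id}`, 'json');
}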

At a larger scale, we have customers building their entire online ordering site on Workers and Pages. Dig, a popular American restaurant chain with nearly 50 locations nationwide, decided to run their ordering site on Cloudflare’s developer platform. They needed high performance and reliability as traffic spiked during the pandemic. The team used a headless React application that is entirely hosted on Cloudflare Pages. JavaScript calls an underlying API to get and handle dynamic ordering logic. You can learn more about their story in this case study.

Conclusion

We have heard from our community, and we want to help bring back some of their favorite pre-pandemic experiences: collaboration and swag. The Cloudflare Developer Summer Challenge is meant to do just that. While the last year and a half has transformed the way we all live, it has also been a period of significant expansion of the Cloudflare developer platform. With new products across compute, storage, and frontend services, you now have all the building blocks to quickly create powerful applications end-to-end on our edge network. We hope you enjoy the experience (and the swag!). We cannot wait to see what you build!

Building the Cloudflare Summer Challenge Application

Post Syndicated from Luke Edwards original https://blog.cloudflare.com/building-the-cloudflare-summer-challenge-application/

If you haven’t already heard, we’re hosting the Cloudflare Summer Developer Challenge, a contest for the Cloudflare community at large. Anybody – yes, including you – can sign up for free and compete for a chance to win one of 300 available prizes. To submit, you need to use at least two products from the Cloudflare developer platform — which makes this contest a great opportunity to give them a try if you haven’t already! The top 300 submissions will receive a box of our most popular swag, so you should give it a go!

Coincidentally, the Cloudflare Summer Developer Challenge’s landing page and signup workflow qualifies as a valid project submission (so meta), so if you’re looking for some inspiration, this walkthrough will shed some light on how it was built.

Overview

At its core, the application is a series of static HTML pages, most of which have a form to submit, with a backend API to handle those submissions, and a storage layer to persist the data. In a Cloudflare lens, this would point towards using Pages, a Worker, and Workers KV. And while this should be the preferred stack for a project like this, truthfully, this “application” was originally intended to be a single HTML page with a single form, but its list of requirements grew over time, as things tend to do. So instead, this project began as–and remains–a Workers Site project, comprised of a single Worker and a single Workers KV namespace.

Workers Sites, the precursor to our Pages product, is a pattern where your Worker handles all the requests for your site’s assets. While doing this, your Worker Site can still include backend-y things, like offering a collection of JSON API endpoints. Basically, Workers Sites is a coined term for building monoliths within a Worker, but without the negative associations that the word “monolith” can bring. Given that a Workers Site is still a Worker, this means your monolith is deployed globally – tough to beat!

As with all Workers Sites, routing is the primary concern. For this, I used the worktop web framework, which includes a router among many other utilities. (Disclosure: I am also the author of worktop.) This allowed me to quickly structure the layout of the entire application:

import { Router } from 'worktop';
import * as Cache from 'worktop/cache';

const API = new Router;

API.add('GET', '/', (req, res) => {
  res.send(200, 'TODO: send HTML for landing page');
});

API.add('GET', '/rules', (req, res) => {
  res.send(200, 'TODO: send HTML for terms & conditions');
});

API.add('POST', '/signup', (req, res) => {
  res.send(201, 'TODO: parse & save initial registration');
});

API.add('GET', '/submit', (req, res) => {
  res.send(200, 'TODO: render the unique submission form');
});

API.add('POST', '/submit', (req, res) => {
  res.send(201, 'TODO: parse, validate, save submission data');
});

// init; w/ Cache API
Cache.listen(API.run);

At this point, nothing useful is happening, but having an application skeleton laid out like this is my preferred format for a TODO list. It’s very satisfying to go through and fill out the handler bodies as development progresses. Additionally, the Cache.listen helper at the bottom of the file integrates the entire application with the Cache API, which I know I’ll want since most of the requests will be for the static HTML pages anyway.

Building and Optimizing the Client pages

Historically, deploying a Workers Site meant uploading all of your assets into a KV namespace. Then you would include something like @cloudflare/kv-asset-handler into your Worker so that incoming requests would seamlessly route to keys within the namespace. However, I chose to go a different route.

Knowing that each of my static pages would – at most – have one CSS stylesheet and sometimes only one JavaScript file, I thought it would be pretty nifty to include a build system that would inline these assets into the built HTML page. This would mean that my static HTML pages would have absolutely zero network requests for additional resources, which is generally good news for performance.

And while I would love to say that I did this purely for performance reasons, I must also admit that the lazy-me appreciated that I didn’t have to set up additional URL routing, deal with KV asset uploading, or deal with additional Cache lifespans. A win-win in this case!

The trouble is: avoiding any external assets is not a common goal. In fact, this is very much a side quest I bestowed upon myself. And since no frameworks (that I know of, at least) can do this, I had to assemble my own miniature toolkit to accommodate my needs.

In the end, it proved to be a fun detour and didn’t take very long at all to put together. I incorporated Stylus, my preferred CSS preprocessor, and came up with a rather simple convention to inline CSS and/or JS files where needed. Instead of fancy AST parsers and transformers, I opted to simply read the HTML file contents as strings and search for HTML comments that matched the <!-- inject:(path) --> format:

<!-- submit/index.html -->
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8"/>
    <title>Submit Project | Cloudflare Developer Summer Challenge</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="icon" type="image/png" href="https://www.cloudflare.com/favicon-128.png">
    <!-- inject:submit/index.styl -->
    <!-- inject:index.js -->
  </head>
  <body>
    <!-- ... -->
  </body>
</html>

In this example, the submit/index.html file is injecting the submit/index.styl, which is its own stylesheet, and the index.js script, which does not live within the `submit` directory because it’s used by other pages. The toolkit looks at both asset paths, converts the Stylus to plain CSS, and then embeds the contents into the appropriate <script> or <style> HTML tags.

Finally, for production builds, the setup will pass the final HTML source through a minifier, which compresses the entire document, including any CSS or JavaScript that was injected. This step is optional, but it never hurts to send fewer bytes down the wire.
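
For a rough idea of how that injection step can work, here is a condensed Node sketch; the real toolkit also compiles the Stylus and runs the minifier, both elided here:

// Sketch of the inline-injection step: replace each <!-- inject:(path) -->
// comment with the referenced file's contents, wrapped in the right tag.
import { readFileSync } from 'fs';
import { join } from 'path';

function inject(html: string, srcdir: string): string {
  return html.replace(/<!--\s*inject:(\S+)\s*-->/g, (_match: string, path: string) => {
    const contents = readFileSync(join(srcdir, path), 'utf8');
    if (path.endsWith('.js')) return `<script>${contents}</script>`;
    // A .styl file would be compiled to plain CSS first; elided here.
    return `<style>${contents}</style>`;
  });
}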

Once these pages were built, I was satisfied with the Network Activity panel when loading the main page.

When loading the main page, the localhost document dispatched only a single request, for the favicon-128.png file, which is hosted externally. The three data:image/* requests are Blob URLs and don’t actually transfer network packets. All in all, this means that the HTML document is fully self-contained.

Including HTML into the Worker

Workers can send anything in a Response. Of course, this includes an HTML string. If I wanted to make things incredibly difficult for myself, I could have skipped the /src directory with its own build system, and instead written the HTML, CSS, and JS entirely within a JS string. This would certainly work, but it would be a nightmare to maintain and (for me, at least) be extremely error prone:

API.add('GET', '/', (req, res) => {
  // Note: Worktop APIs
  res.setHeader('Content-Type', 'text/html;charset=utf-8');
  res.send(200, `
    <!doctype html>
    <html lang="en">
      <head>
        <title>Demo | Insanity</title>
        <style>
          body {
            background: #fff;
            color: #424242;
          }
          /* more */
        </style>
        <script>
          $('form').onsubmit = function (ev) {
            ev.preventDefault();
            // ...
          };
        </script>
      </head>
      <body>
        <!-- my entire page content -->
      </body>
    </html>
  `);
});

Thankfully, I planned ahead and already have a build system that produces better HTML files anyway. So now I just needed a way to load those built outputs into my Worker code.

Now for the second half of this project’s toolkit; I find it perfectly acceptable to have a two-step build pipeline. Here, this means that the static site should be built first, followed by building the Worker. I was planning to use TypeScript to author my Worker anyway, which meant I was already going to need a build step – the only change here is that these build steps would now have to be sequential and ordered.

The Worker is built using esbuild, which is an extremely quick JavaScript bundler and compiler that is capable of translating TypeScript, too. It also has its own plugin system, which gave me the opportunity to add the “inline my HTML files” behavior I needed. The Worker’s build script actually isn’t too intimidating and allows the Worker to `import` HTML files directly. This allows the insanity from above to be safely replaced with this pattern:

import { Router } from 'worktop';
import * as Cache from 'worktop/cache';

// loaded via esbuild plugin
import LANDING from 'index.html';
import RULES from 'rules/index.html';

const API = new Router;

API.add('GET', '/', (req, res) => {
  res.setHeader('Content-Type', 'text/html;charset=utf-8');
  res.setHeader('Cache-Control', 'public,max-age=60');
  res.send(200, LANDING);
});

API.add('GET', '/rules', (req, res) => {
  res.setHeader('Content-Type', 'text/html;charset=utf-8');
  res.setHeader('Cache-Control', 'public,max-age=1800');
  res.send(200, RULES);
});

// ...

// init; w/ Cache API
Cache.listen(API.run);
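
For completeness, here is a sketch of the kind of esbuild plugin that makes those HTML imports possible; the filter and path resolution are simplified, and the project’s actual build script may differ:

// Sketch of an esbuild build script that loads imported .html files as strings.
import * as esbuild from 'esbuild';
import { readFile } from 'fs/promises';

esbuild.build({
  entryPoints: ['worker/index.ts'],
  bundle: true,
  outfile: 'dist/worker.js',
  plugins: [{
    name: 'html-loader',
    setup(build) {
      // Serve any imported .html file as a plain string export.
      build.onLoad({ filter: /\.html$/ }, async (args) => ({
        contents: await readFile(args.path, 'utf8'),
        loader: 'text',
      }));
    },
  }],
});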

Of course, this import pattern is much cleaner and more sensible in the long run. Clarity makes it easier to identify and extract common patterns into utility functions. I took the opportunity to introduce a render function, the first of many reusable helpers this project would encounter:

// worker/utils.ts
import type { ServerResponse } from 'worktop/response';

export function render(res: ServerResponse, template: string) {
  res.setHeader('Content-Type', 'text/html;charset=UTF-8');
  res.send(200, template);
}

// worker/index.ts
import * as utils from './utils';

API.add('GET', '/', (req, res) => {
  res.setHeader('Cache-Control', 'public,max-age=60');
  return utils.render(res, LANDING);
});

API.add('GET', '/rules', (req, res) => {
  res.setHeader('Cache-Control', 'public,max-age=1800');
  return utils.render(res, RULES);
});

Finally, most of the pages need to dynamically insert values into the HTML markup. For example, the submission form should render with the participant’s name and email address, and the landing page needs to reflect the current number of remaining prizes. Much like any other monolithic application, the Worker Site is fully aware – and capable – of injecting these values where needed.

To do this, I standardized on a {{ variable }} syntax in my project’s HTML. Each of these variables would be replaced during the Worker request with the appropriate value. Of course, it also requires that each endpoint actually provide the correct information to make the substitutions. With this in mind, I modified the `render` utility and updated the landing page’s route handler:

// worker/utils.ts
import type { KV } from 'worktop/kv';
import type { ServerResponse } from 'worktop/response';

// TypeScript placeholder
// Defines the `DATA` KV binding
declare const DATA: KV.Namespace;

export function render(res: ServerResponse, template: string, values: Record<string, string> = {}) {
  for (let key in values) {
    template = template.replace('{{ ' + key + ' }}', values[key]);
  }
  res.setHeader('Content-Type', 'text/html;charset=UTF-8');
  res.send(200, template);
}
  
export function toCount(): Promise<string> {
  return DATA.get('::remain', 'text').then(v => v || '300+');
}
  
// worker/index.ts
import * as utils from './utils';

API.add('GET', '/', async (req, res) => {
  // Get the "::remain" count from KV
  const count = await utils.toCount();
  
  // Short-term TTL for remaining swag updates
  res.setHeader('Cache-Control', 'public,max-age=60');
  
  // Render the HTML, passing in `count` variable
  return utils.render(res, LANDING, { count });
});

With these changes, the landing page will always check the KV namespace for the latest ::remain value and inject it into the correct location. If you’re interested in checking out the project’s source code, you’ll find that this pattern is used in nearly every HTML response.

Accepting Form Submissions

As expected, this application made heavy use of form submissions. Luckily, the Fetch API offers a variety of built-in body parsers to make retrieval of the data trivial. Additionally, worktop offers a convenience function that will automatically invoke the correct parser based on the request’s Content-Type header. It’s aptly named req.body().

It’s easy to parse and retrieve user data, but it still has to be validated. There are a number of ways to do this, all of which boil down to an input object, a group of rules, and a loop through those rules, collecting any error messages into an errors object. This is precisely what my utils.validate helper does, allowing me to clearly define and manage my rules inline.
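
The helper itself can be tiny. Here is a sketch of roughly how a utils.validate-style helper works; the project’s actual implementation may differ in its details:

// Sketch of a validate helper: run each rule against its input value;
// a rule returns `true` on success or an error message string.
type Rule = (value: string) => true | string;

export function validate(
  input: Record<string, string>,
  rules: Record<string, Rule>
) {
  let invalid = false;
  const errors: Record<string, string> = {};
  for (const key in rules) {
    const result = rules[key](input[key]);
    if (result !== true) {
      invalid = true;
      errors[key] = result;
    }
  }
  return { errors, invalid };
}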

Let’s see how this looks within the POST /signup handler, which accepts the initial registration form:

// worker/index.ts
import * as utils from './utils';

API.add('POST', '/signup', async (req, res) => {
  try {
    var input = await req.body<Entry>();
  } catch (err) {
    return toError(res, 400, 'Error parsing input');
  }

  let { email, firstname, lastname } = input || {};
  firstname = String(firstname||'').trim();
  lastname = String(lastname||'').trim();
  email = String(email||'').trim();

  let { errors, invalid } = utils.validate({
    email, firstname, lastname
  }, {
    email(val: string) {
      if (val.length < 1) return 'Required';
      return utils.isEmail(val) || 'Invalid email address';
    },
    firstname(val: string) {
      return val.length > 1 || 'Required';
    },
    lastname(val: string) {
      return val.length > 1 || 'Required';
    }
  });

  if (invalid) {
    return res.send(422, errors);
  }
      
  // The `input` is valid!
  
  return res.send(200, 'TODO: finish me');
});

Only after the data is considered valid can data be stored into KV for future use. For the initial registration, a number of things need to happen:

  1. Ensure that the input.email hasn’t already been registered;
  2. Persist the new registration using the `input` values, identifying each document with the user:<email> key;
  3. Generate and save a unique code for the registration, which will be used later to ensure (a) that unregistered persons cannot submit projects and (b) that a registrant can only submit once;
  4. Send the user an email, containing their unique submission link; and
  5. Render a confirmation page, reminding the user to check their inbox for their link.

It can seem like a lot, but after piecing together a few utility helpers and abstractions, it can actually feel quite approachable:

// worker/index.ts
import * as utils from './utils';
import * as Sparkpost from './emails';
import * as Signup from './signup';
import * as Code from './code';

function toError(res: ServerResponse, status: number, reason: string) {
  return res.send(status, { status, reason });
}

API.add('POST', '/signup', async (req, res) => {
  try {
    var input = await req.body<Entry>();
  } catch (err) {
    return toError(res, 400, 'Error parsing input');
  }
  
  let { email, firstname, lastname } = input || {};
  firstname = String(firstname||'').trim();
  lastname = String(lastname||'').trim();
  email = String(email||'').trim();
  
  // truncated: validation
  
  // Ensure email is not already in use
  let exists = await Signup.find(email);
  if (exists) return toError(res, 400, 'You have already signed up');

  // Generate new `Entry` record
  let entry = Signup.prepare({ email, firstname, lastname });

  // create "user:<unique email>" document
  let isOK = await Signup.save(entry);
  if (!isOK) return toError(res, 500, 'Error persisting entry');

  // create "code:<unique value>" document
  isOK = await Code.save(entry);
  if (!isOK) return toError(res, 500, 'Error saving unique code');

  // dispatch "We received your registration" email
  let sent = await Sparkpost.confirm(entry);
  if (!sent) return toError(res, 500, 'Error sending confirmation email');

  // render "Thank you, check your {{ email }} for next steps" page
  return utils.render(res, CONFIRM, { email: entry.email });
});

A full HTML response is returned, which means that the client-side form handler should be able to see this content and render it directly in the browser window. This can be seen in the following index.js snippet, which was referenced earlier in the submit/index.html as an injected asset:

// (client) index.js

$('form').onsubmit = async function (ev) {
  ev.preventDefault();

  var form = ev.target;
  var res = await fetch(form.action, {
    method: form.method || 'POST',
    body: new FormData(form),
  });

  // truncate: clear existing errors

  if (res.ok) {
    form.reset();
    // Receive HTML response
    let html = await res.text();
    // Force-write the new HTML into this window
    document.documentElement.innerHTML = html;
  } else {
    // truncate: render errors
  }
};

BONUS: Because a full HTML response is returned, and all the client-side <form> elements are semantically correct, the form submission workflow will work with JavaScript disabled! The client-side validation will remain functional, but be a degraded experience – the error dialog won’t pop up and any error messages will not appear beneath their respective form inputs.

Sending Transactional Emails

It should (hopefully) come as no surprise that programmatically sending an email is pretty straightforward these days. We chose to use SparkPost, but practically every service has the same API mechanics:

  • Obtain an API Token
  • Send a POST request to an endpoint with:
    • your API Token as an Authorization header
    • your recipient, sender identity, and text and/or HTML content as the POST body
  • Wait for a 200-level response, or deal with any API errors

Most email-as-a-service providers allow you to define templates, which let you replace variables with unique values per email – essentially the same thing our utils.render function is doing with our HTML contents. The benefit of this is that you only have to write your emails once; then you’re just POST’ing new values to the API endpoint.

SparkPost allows templates to be referenced by a custom name rather than a randomly generated identifier, which makes it easy to track and debug templates over time.

// worker/emails.ts
import type { Entry } from './signup';

// wrangler secret
// @see https://developers.sparkpost.com/api/#header-authentication
declare const SPARKPOST_KEY: string;

/**
 * Assemble the POST request for all SparkPost email triggers
 * @see https://developers.sparkpost.com/api/transmissions/#transmissions-post-send-a-template
 */
async function send(
  templateid: string,
  recipient: Entry,
  values?: Record<string, string>
): Promise<boolean> {
  const res = await fetch('https://api.sparkpost.com/api/v1/transmissions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': SPARKPOST_KEY,
    },
    body: JSON.stringify({
      content: {
        template_id: templateid,
      },
      recipients: [{
        address: {
          email: recipient.email,
          name: recipient.firstname + ' ' + recipient.lastname,
        },
        substitution_data: values || {},
      }]
    })
  });

  let data = await res.json() as {
    results: {
      id: string;
      total_rejected_recipients: number;
      total_accepted_recipients: number;
    }
  };

  return res.ok && data.results.total_accepted_recipients === 1;
}
    
/**
 * Confirming user's signup
 * Sending unique submission form
 */
export function confirm(entry: Entry): Promise<boolean> {
  return send('devchallenge-confirm', entry, {
    firstname: entry.firstname,
    code: entry.code,
  });
}

The above snippet includes the entire POST request formatter – there’s nearly more type-hinting than there is code! Also shown is an example confirm method, which is responsible for sending the unique submission link to the newly-registered user. You’ll notice that firstname and code are the injected variables, required by the “devchallenge-confirm” template.

Overall Performance

I’d call this a success!

Even though this certainly wasn’t my first Worker project – and definitely won’t be my last – I’m consistently amazed how much the Workers runtime lets me get away with. I mean, if you could only take away two points from this article, they should be:

  1. I was able to build a moderately complex application, from scratch, while incorporating a Cache layer, a globally-replicated storage layer, and a super-performant JS runtime, all of which live under the same roof.
  2. I (probably) spent more time fussing with a custom client-side build pipeline than I did piecing together the mission-critical API form handlers.

The cherry on top: Should this contest go viral and lure in millions of visitors, I’d only be paying a couple of dollars at the end of the month. Obviously I have a bias here, but it’s pretty amazing really.

Finally, performance-wise, the results may justify the time spent fiddling with the HTML build output.

Lessons Learned

As I alluded to earlier, if I were to rebuild this application, or if I were to add more to it down the road, I would replace the Workers Sites architecture with a Pages project and deploy a Worker in front of it for my API requirements and dynamic KV injections.

Since the static assets would no longer be embedded into the Worker’s source, I would need to replace the `utils.render` approach with another utility that fetches the URL from Pages (which becomes my “origin server”) and then uses HTMLRewriter to inject the variables. Also, while I was nowhere near the 1MB size limit, the largest contributor to my Worker’s bytesize would disappear.
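
Sketching it out, that replacement utility might look something like this (the Pages hostname, the data-var attribute convention, and the render name are all hypothetical):

// (hypothetical) worker/render.js
const PAGES_ORIGIN = 'https://my-site.pages.dev'; // stand-in for the real Pages deployment

async function render(request, values) {
  // Fetch the prebuilt page from Pages, which acts as the "origin server"
  const page = await fetch(PAGES_ORIGIN + new URL(request.url).pathname);

  // Inject values into any element marked with a `data-var` attribute
  return new HTMLRewriter()
    .on('[data-var]', {
      element(el) {
        const key = el.getAttribute('data-var');
        if (values[key] != null) el.setInnerContent(values[key]);
      },
    })
    .transform(page);
}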

But, more significantly, this refactor would also reduce my total tooling since the majority of the project’s complexity lies in the custom build system for the frontend assets. In other words, the entire /src directory could have been built and deployed like a normal static website, which would allow me to make use of existing frameworks and/or toolkits instead of taking my self-imposed detour. There would have been no need to create a custom frontend toolkit and its bridge to get the static assets loaded into my Worker.

However, none of this is to say that Workers Sites was a bad approach for this application. It’s quite the contrary! This is all to highlight the flexibility of Workers Sites – and the Workers platform at large. Cloudflare Pages exists so that I, the developer, can lean into existing, well-traveled paths and let the experts worry about toolkits, build pipelines, and deployments… But that doesn’t prevent you, the resident expert, from customizing every aspect if that’s your desire.

Announcing Green Compute on Cloudflare Workers

Post Syndicated from Aly Cabral original https://blog.cloudflare.com/announcing-green-compute/

All too often we are confronted with the choice to move quickly or act responsibly. Whether the topic is safety, security, or in this case sustainability, we’re asked to make the trade off of halting innovation to protect ourselves, our users, or the planet. But what if that didn’t always need to be the case? At Cloudflare, our goal is to bring sustainable computing to you without the need for any additional time, work, or complexity.

Enter Green Compute on Cloudflare Workers.

Green Compute can be enabled for any Cron-triggered Worker. The concept is simple: when turned on, we’ll take your compute workload and run it exclusively on parts of our edge network located in facilities powered by renewable energy. Even though all of Cloudflare’s edge network is powered by renewable energy already, some of our data centers are located in third-party facilities that are not 100% powered by renewable energy. Green Compute takes our commitment to sustainability one step further by ensuring that not only our network equipment but also the building facility as a whole is powered by renewable energy. There are absolutely no code changes needed. Now, whether you need to update a leaderboard every five minutes or do DNA sequencing directly on our edge (yes, that’s a real use case!), you can minimize the impact of any scheduled work, regardless of how complex or energy intensive.

How it works

Cron triggers allow developers to set time-based invocations for their Workers. These invocations happen on a recurring schedule, as opposed to being triggered by application users via HTTP requests. Developers specify a job schedule in familiar cron syntax, either through wrangler or within the Workers Dashboard. To set up a scheduled job, first create a Worker that performs a periodic task, then navigate to the ‘Triggers’ tab to define a Cron Trigger.

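For example, a cron-triggered Worker that refreshes a leaderboard every five minutes might look like the sketch below (the task itself, the API URL, and the LEADERBOARD KV binding are hypothetical):

# wrangler.toml – run this Worker every five minutes
[triggers]
crons = ["*/5 * * * *"]

// worker.js
addEventListener('scheduled', (event) => {
  event.waitUntil(updateLeaderboard(event.scheduledTime));
});

async function updateLeaderboard(scheduledTime) {
  // Hypothetical periodic task: fetch fresh scores and store them in KV
  const res = await fetch('https://my.api.com/scores');
  const scores = await res.json();
  await LEADERBOARD.put('latest', JSON.stringify({ scores, updatedAt: scheduledTime }));
}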

The great thing about cron-triggered Workers is that there is no human on the other side waiting for a response in real time. There is no end user we need to run the job close to. Instead, these Workers are scheduled to run as (often computationally expensive) background jobs, making them a no-brainer candidate to run exclusively on sustainable hardware, even when that hardware isn’t the closest to your user base.

Cloudflare’s massive global network is logically one distributed system with all the parts connected, secured, and trusted. Because our network works as a single system, as opposed to a system with logically isolated regions, we have the flexibility to seamlessly move workloads around the world keeping your impact goals in mind without any additional management complexity for you.

When you set up a Cron Trigger with Green Compute enabled, the Cloudflare network will route all scheduled jobs to green energy hardware automatically, without any application changes needed. To turn on Green Compute today, sign up for our beta.

Real world use

If you haven’t ever had the pleasure of writing a cron job yourself, you might be wondering — what do you use scheduled compute for anyway?

There are a wide range of periodic maintenance tasks necessary to power any application. In my working life, I’ve built a scheduled job that ran every minute to monitor the availability of the system I was responsible for, texting me if any service was unavailable. In another instance, a job ran every five minutes, keeping the core database and search feature in sync by pulling all new application data, transforming it, then inserting it into a search database. In yet another example, a periodic job ran every half hour to iterate over all user sessions and clean up sessions that were no longer active.

Scheduled jobs are the backbone of real-world systems. Now, with Green Compute on Cloudflare Workers, all these real-world systems and their computationally expensive background maintenance tasks can take advantage of running compute exclusively on machines powered by renewable energy.

The Green Network

Our mission at Cloudflare is to help you tackle your sustainability goals. Today, with the launch of the Carbon Impact Report, we gave you visibility into your environmental impact. Our collaboration with the Green Web Foundation brought green hosting certification to Cloudflare Pages. And our launch of Green Compute on Cloudflare Workers allows you to run exclusively on hardware powered by renewable energy. And the best part? No additional system complexity is required for any of the above.

Cloudflare is focused on making it easy to hit your ambitious goals. We are just getting started.

Introducing Workers Usage Notifications

Post Syndicated from Aly Cabral original https://blog.cloudflare.com/introducing-workers-usage-notifications/

So you’ve built an application on the Workers platform. The first thing you might be wondering after pushing your code out into the world is “what does my production traffic look like?” How many requests is my Worker handling? How long are those requests taking? And as your production traffic evolves over time, it can be a lot to keep up with. The last thing you want is to be surprised by the traffic your serverless application is handling. But you have a million things to do in your day job, and having to log in to the Workers dashboard every day to check usage statistics is one extra thing you shouldn’t need to worry about.

Today we’re excited to launch Workers usage notifications that proactively send relevant usage information directly to your inbox. Usage notifications come in two flavors. The first is a weekly summary of your Workers usage with a breakdown of your most popular Workers. The second is an on-demand usage notification, triggered when a Worker’s CPU usage is 25% above its average CPU usage over the previous seven days. This on-demand notification helps you proactively catch large changes in Workers usage as soon as they occur, whether from a huge spike in traffic or a change in your code.

As of today, if you create a new free account with Workers, we’ll enable both the weekly summary and the CPU usage notification by default. All other account types will not have Workers usage notifications turned on by default, but the notifications can be enabled as needed. Once we collect substantial user feedback, our goal is to turn these notifications on by default for all accounts.

The Worker Weekly Summary

The mission of the Worker Weekly Summary is to give you a high-level view of your Workers usage automatically, in your inbox, without needing to sign in to the Worker’s dashboard. The summary includes valuable information such as total request counts, duration and egress data transfer, aggregated across your account, with breakouts for your most popular Workers that include median CPU time. Where duration accounts for the entire time your Worker spends responding to a request, CPU time is the time a script spends in computational work on the Cloudflare edge, discounting any time spent waiting on network requests, including requests to third-party APIs.

Workers Usage Report

Where the Workers Weekly Summary provides a high-level view of your account usage statistics, the Workers Usage Report is targeted, event-driven and potentially actionable. It identifies those Workers with greater than a 25% increase in CPU time compared to the average of the previous 7 days (i.e., those Workers taking significantly more CPU resources now than in the recent past).

While sometimes these increases may be no surprise, perhaps because you’ve recently pushed a new deployment that you expected to do more CPU heavy work, at other times they may indicate a bug or reveal that a script is unintentionally expensive. The Workers Usage Report gives us an opportunity to let you know when a noticeable change in your compute footprint occurs, so that you can remedy any potential problems right away.

Enabling Notifications

If you’d like to explicitly opt in to notifications, you can start off by clicking Add in the Notifications section of the Cloudflare zone dashboard.

After clicking Add, note the two new entries below “Create Notifications” under the “Workers” Product:

Click on the Select box in line with “Weekly Summary” for the weekly roll-up, which will then allow you to configure email recipients, webhooks, or connect the notification to PagerDuty.

Clicking Select next to “Usage Report” for CPU threshold notifications will send you to a similar configuration experience where you can customize email recipients and other integrations.

What’s next?

As mentioned above, we’re enabling notifications by default for all new free plans on Workers. Before rolling out these notifications by default to all our users, we want to hear from you. Let us know your experience with our Workers usage notifications by joining our Developer Community Discord or by sending feedback via the survey in the email notifications.

Our Workers Weekly Summary and on-demand CPU usage notification are just the beginning of our journey to support a wide range of useful, relevant notifications that help you get visibility into your deployments. We want to surface the right usage information, exactly when you need it.

Explaining Cloudflare’s ABR Analytics

Post Syndicated from Jamie Herre original https://blog.cloudflare.com/explaining-cloudflares-abr-analytics/

Cloudflare’s analytics products help customers answer questions about their traffic by analyzing the mind-boggling, ever-increasing number of events (HTTP requests, Workers requests, Spectrum events) logged by Cloudflare products every day.  The answers to these questions depend on the point of view of the question being asked, and we’ve come up with a way to exploit this fact to improve the quality and responsiveness of our analytics.

Useful Accuracy

Consider the following questions and answers:

What is the length of the coastline of Great Britain? 12.4K km
What is the total world population? 7.8B
How many stars are in the Milky Way? 250B
What is the total volume of the Antarctic ice shelf? 25.4M km³
What is the worldwide production of lentils? 6.3M tonnes
How many HTTP requests hit my site in the last week? 22.6M

Useful answers do not benefit from being overly exact.  For large quantities, knowing the correct order of magnitude and a few significant digits gives the most useful answer.  At Cloudflare, the difference in traffic between different sites or when a single site is under attack can cross nine orders of magnitude and, in general, all our traffic follows a Pareto distribution, meaning that what’s appropriate for one site or one moment in time might not work for another.

Because of this distribution, a query that scans a few hundred records for one customer will need to scan billions for another.  A report that needs to load a handful of rows under normal operation might need to load millions when a site is under attack.

To get a sense of the relative difference of each of these numbers, remember “Powers of Ten”, an amazing visualization that Ray and Charles Eames produced in 1977.  Notice that the scale of an image determines what resolution is practical for recording and displaying it.

Using ABR to determine resolution

This basic fact informed our design and implementation of ABR for Cloudflare analytics.  ABR stands for “Adaptive Bit Rate”, a name borrowed from video streaming such as Cloudflare’s own Stream Delivery.  In that setting, the server selects the best resolution for a video stream to match your client and network connection.

In our case, every analytics query that supports ABR will be calculated at a resolution matching the query.  For example, if you want to know which country generated the most firewall events in the past week, the system might opt to use a lower resolution version of the firewall data than if you had opted to look at the last hour. The lower resolution version will provide the same answer but take less time and fewer resources.  By using multiple, different resolutions of the same data, our analytics can provide consistent response times and a better user experience.

You might be aware that we use a columnar store called ClickHouse to store and process our analytics data.  When using ABR with ClickHouse, we write the same data at multiple resolutions into separate tables.  Usually, we cover seven orders of magnitude – from 100% to 0.0001% of the original events.  We wind up using an additional 12% of disk storage but enable very fast ad hoc queries on the reduced resolution tables.
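
To sketch the idea (in illustrative JavaScript; the real system lives in ClickHouse, and these table names and thresholds are invented), the selection logic amounts to choosing the coarsest table that still scans enough sampled rows for a stable estimate:

// Tables ordered from full resolution down to a 0.0001% sample
const TABLES = [
  { name: 'requests_sample_1',        rate: 1 },
  { name: 'requests_sample_0_1',      rate: 0.1 },
  { name: 'requests_sample_0_01',     rate: 0.01 },
  // ... and so on, down to ...
  { name: 'requests_sample_0_000001', rate: 0.000001 },
];

function pickTable(estimatedRows, minSampledRows = 10000) {
  // Walk from the coarsest sample upward and stop at the first table
  // that would still scan enough rows to produce a stable answer.
  for (const table of [...TABLES].reverse()) {
    if (estimatedRows * table.rate >= minSampledRows) return table;
  }
  return TABLES[0]; // small queries just use the full-resolution table
}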

Aggregations and Rollups

The ABR technique facilitates aggregations by making compact estimates of every dimension.  Another way to achieve the same ends is with a system that computes “rollups”.  Rollups save space by computing either complete or partial aggregations of the data as it arrives.  

For example, suppose we wanted to count a total number of lentils. (Lentils are legumes and among the oldest and most widely cultivated crops.  They are a staple food in many parts of the world.)  We could just count each lentil as it passed through the processing system. Of course, because there are a lot of lentils, that system is distributed – meaning that there are hundreds of separate machines.  Therefore we’ll actually have hundreds of separate counters.

Also, we’ll want to include more information than just the count, so we’ll also include the weight of each lentil and maybe 10 or 20 other attributes. And of course, we don’t want just a total for each attribute, but we’ll want to be able to break it down by color, origin, distributor and many other things, and also we’ll want to break these down by slices of time.

In the end, we’ll have tens of thousands or possibly millions of aggregations to be tabulated and saved every minute.  These aggregations are expensive to compute, especially when using aggregations more complicated than simple counters and sums.  They also destroy some information.  For example, once we’ve processed all the lentils through the rollups, we can’t say for sure that we’ve counted them all, and most importantly, whichever attributes we neglected to aggregate are unavailable.

The number we’re counting, 6.3M tonnes, only includes two significant digits, which can easily be achieved by counting a sample.  Most of the rollup computations performed on each lentil (on the order of 10^13 lentils to account for 6.3M tonnes) are wasted.

Other forms of aggregations

So far, we’ve discussed ABR and its application to aggregations, but we’ve only given examples involving “counts” and “sums”.  There are other, more complex forms of aggregations we use quite heavily.  Two examples are “topK” and “count-distinct”.

A “topK” aggregation attempts to show the K most frequent items in a set.  For example, the most frequent IP address, or country.  To compute topK, just count the frequency of each item in the set and return the K items with the highest frequencies. Under ABR, we compute topK based on the set found in the matching resolution sample. Using a sample makes this computation a lot faster and less complex, but there are problems.

The estimate of topK derived from a sample is biased and dependent on the distribution of the underlying data. This can result in overestimating the significance of elements in the set as compared to their frequency in the full set. In practice, this effect is only noticeable when the cardinality of the set is very high, and you won’t see it on a Cloudflare dashboard.  If your site has a lot of traffic and you’re looking at the Top K URLs or browser types, there will be no difference visible at different resolutions.  Also keep in mind that as long as we’re estimating the “proportion” of the element in the set and the set is large, the results will be quite accurate.
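
Sketched in JavaScript for illustration (our production implementation is in ClickHouse), the sampled computation looks roughly like this:

// Estimate the K most frequent keys from a uniform sample of events
function topKFromSample(sampleKeys, k, sampleRate) {
  const counts = new Map();
  for (const key of sampleKeys) {
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    // Scale the sampled counts back up to estimate the true frequencies
    .map(([key, count]) => ({ key, estimatedCount: Math.round(count / sampleRate) }));
}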

The other fascinating aggregation we support is known as “count-distinct”, or number of uniques.  In this case we want to know the number of unique values in a set.  For example, how many unique cache keys have been used.  We can safely say that a uniform random sample of the set cannot be used to estimate this number.  However, we do have a solution.

We can generate another, alternate sample based on the value in question.  For example, instead of taking a random sample of all requests, we take a random sample of IP addresses.  This is sometimes called distinct reservoir sampling, and it allows us to estimate the true number of distinct IPs based on the cardinality of the sampled set. Again, there are techniques available to improve these estimates, and we’ll be implementing some of those.
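
A simplified sketch of that estimate (again in illustrative JavaScript, with a toy 32-bit hash so the example is self-contained):

// Keep only values whose hash lands in a fixed fraction `p` of the hash
// space. Every occurrence of a kept value is kept, so the number of
// distinct values in the sample scales back up by 1/p.
function estimateCountDistinct(values, p) {
  const limit = p * 2 ** 32;
  const kept = new Set();
  for (const v of values) {
    if (fnv1a(v) < limit) kept.add(v);
  }
  return Math.round(kept.size / p);
}

function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}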

ABR improves resilience and scalability

Using ABR saves us resources.  Even better, it allows us to query all the attributes in the original data, not just those included in rollups.  Better still, because the original events are preserved, it allows us to compare results across the different sample intervals in separate tables to verify that the system is working correctly.

However, the greatest benefits of employing ABR are the ones that aren’t directly visible. Even under ideal conditions, a large distributed system such as Cloudflare’s data pipeline is subject to high tail latency.  This occurs when any single part of the system takes longer than usual for any of a long list of reasons.  In these cases, the ABR system will adapt to provide the best results available at that moment in time.

For example, compare this chart showing Cache Performance for a site under attack with the same chart generated a moment later while we simulate a failure of some of the servers in our cluster.  In the days before ABR, your Cloudflare dashboard would fail to load in this scenario.  Now, with ABR analytics, you won’t see significant degradation.

Stretching the analogy to ABR in video streaming, we want you to be able to enjoy your analytics dashboard without being bothered by issues related to faulty servers, or network latency, or long running queries.  With ABR you can get appropriate answers to your questions reliably and within a predictable amount of time.

In the coming months, we’re going to be releasing a variety of new dashboards and analytics products based on this simple but profound technology.  Watch your Cloudflare dashboard for increasingly useful and interactive analytics.

Rendering React on the Edge with Flareact and Cloudflare Workers

Post Syndicated from Guest Author original https://blog.cloudflare.com/rendering-react-on-the-edge-with-flareact-and-cloudflare-workers/

The following is a guest post from Josh Larson, Engineer at Vox Media.

Imagine you’re the maintainer of a high-traffic media website, and your DNS is already hosted on Cloudflare.

Page speed is critical. You need to get content to your audience as quickly as possible on every device. You also need to render ads in a speedy way to maintain a good user experience and make money to support your journalism.

One solution would be to render your site statically and cache it at the edge. This would help ensure you have top-notch delivery speed because you don’t need a server to return a response. However, your site has decades’ worth of content. If you wanted to make even a small change to the site design, you would need to regenerate every single page during your next deploy. This would take ages.

Another issue is that your site would be static — and future updates to content or new articles would not be available until you deploy again.

That’s not going to work.

Another solution would be to render each page dynamically on your server. This ensures you can return a dynamic response for new or updated articles.

However, you’re going to need to pay for some beefy servers to be able to handle spikes in traffic and respond to requests in a timely manner. You’ll also probably need to implement a system of internal caches to optimize the performance of your app, which could lead to a more complicated development experience. That also means you’ll be at risk of a thundering herd problem if, for any reason, your cache becomes invalidated.

Neither of these solutions is great, and you’re forced to make a tradeoff between the two approaches.

Thankfully, you’ve recently come across a project like Next.js which offers a hybrid approach: static-site generation along with incremental regeneration. You’re in love with the patterns and developer experience in Next.js, but you’d also love to take advantage of the Cloudflare Workers platform to host your site.

Cloudflare Workers allow you to run your code on the edge quickly, efficiently and at scale. Instead of paying for a server to host your code, you can host it directly inside the datacenter — reducing the number of network trips required to load your application. In a perfect world, we wouldn’t need to find hosting for a Next.js site, because Cloudflare offers the same JavaScript hosting functionality with the Workers platform. With their dynamic runtime and edge caching capabilities, we wouldn’t need to worry about making a tradeoff between static and dynamic for our site.

Unfortunately, frameworks like Next.js and Cloudflare Workers don’t mesh together particularly well due to technical constraints. Until now:

I’m excited to announce Flareact, a new open-source React framework built for Cloudflare Workers.

With Flareact, you don’t need to make the tradeoff between a static site and a dynamic application.

Flareact allows you to render your React apps at the edge rather than on the server. It is modeled after Next.js, which means it supports file-based page routing, dynamic page paths and edge-side data fetching APIs.

Not only are Flareact pages rendered at the edge — they’re also cached at the edge using the Cache API. This allows you to provide a dynamic content source for your app without worrying about traffic spikes or response times.

With no servers or origins to deal with, your site is instantly available to your audience. Cloudflare Workers gives you a 0ms cold start and responses from the edge within milliseconds.

You can check out the docs and get started now.

To get started manually, install the latest wrangler, and use the handy wrangler generate command below to create your first project:

npm i @cloudflare/wrangler -g
wrangler generate my-project https://github.com/flareact/flareact-template

What’s the big deal?

Hosting React apps on Cloudflare Workers Sites is not a new concept. In fact, you’ve always been able to deploy a create-react-app project to Workers Sites in addition to static versions of other frameworks like Gatsby and Next.js.

However, Flareact renders your React application at the edge. This allows you to provide an initial server response with HTML markup — which can be helpful for search engine crawlers. You can also cache the response at the edge and optionally invalidate that cache on a timed basis — meaning your static markup will be regenerated if you need it to be fresh.

This isn’t a new pattern: Next.js has done the hard work in defining the shape of this API with SSG support and Incremental Static Regeneration. While there are nuanced differences in the implementation between Flareact and Next.js, they serve a similar purpose: to get your application to your end-user in the quickest and most-scalable way possible.

A focus on developer experience

A magical developer experience is a crucial ingredient to any successful product.

As a longtime fan and user of Next.js, I wanted to experiment with running the framework on Cloudflare Workers. However, Next.js and its APIs are framed around the Node.js HTTP Server API, while Cloudflare Workers use V8 isolates and are modeled after the FetchEvent type.

Since we don’t have typical access to a filesystem inside V8 isolates, it’s tough to mimic the environment required to run a dynamic Next.js server at the edge. Though projects like Fab have come up with workarounds, I decided to approach the project with a clean slate and use existing patterns established in Next.js in a brand-new framework.

As a developer, I absolutely love the simplicity of exporting an asynchronous function from my page to have it supply props to the component. Flareact implements this pattern by allowing you to export a getEdgeProps function. This is similar to getStaticProps in Next.js, and it matches the expected return shape of that function in Next.js — including a revalidate parameter. Learn more about data fetching in Flareact.
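
For example, a page that fetches data at the edge and revalidates every 60 seconds might look like this (the API URL and data shape are placeholders):

// pages/index.js
export async function getEdgeProps() {
  const res = await fetch("https://my.api.com/posts");
  const posts = await res.json();

  return {
    props: { posts },
    // Cache the rendered page at the edge for 60 seconds before regenerating
    revalidate: 60,
  };
}

export default function Index({ posts }) {
  return (
    <ul>
      {posts.map((post) => (
        <li key={post.id}>{post.title}</li>
      ))}
    </ul>
  );
}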

I was also inspired by the API Routes feature of Next.js when I implemented the API Routes feature of Flareact — enabling you to write standard Cloudflare Worker scripts directly within your React app.

I hope porting over an existing Next.js project to Flareact is a breeze!

How it works

When a FetchEvent request comes in, Flareact inspects the URL pathname to decide how to handle it:

If the request is for a page or for page props, it checks the cache for that request and returns it if there’s a hit. If there is a cache miss, it generates the page request or props function, stores the result in the cache, and returns the response.

If the request is for an API route, it sends the entire FetchEvent along to the user-defined API function, allowing the user to respond as they see fit.

If you want your cached page to be revalidated after a certain amount of time, you can return an additional revalidate property from getEdgeProps(). This instructs Flareact to cache the endpoint for that number of seconds before generating a new response.

Finally, if the request is for a static asset, it returns it directly from the Workers KV.
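
Put together, the dispatch logic is roughly the following sketch (the helper names are invented; getAssetFromKV comes from the @cloudflare/kv-asset-handler package):

async function handleEvent(event) {
  const { pathname } = new URL(event.request.url);

  // 1. Pages and page props: consult the edge cache first
  if (isPageOrProps(pathname)) {
    const cached = await caches.default.match(event.request);
    if (cached) return cached;

    const response = await renderPageOrProps(event.request);
    event.waitUntil(caches.default.put(event.request, response.clone()));
    return response;
  }

  // 2. API routes: hand the entire FetchEvent to the user-defined function
  if (isApiRoute(pathname)) {
    return callApiRoute(event);
  }

  // 3. Everything else is a static asset served from Workers KV
  return getAssetFromKV(event);
}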

The Worker

The core responsibilities of the Worker — or, in a traditional SSR framework, the server — are to:

  1. Render the initial React page component into static HTML markup.
  2. Provide the initial page props as a JSON object, embedded into the static markup in a script tag.
  3. Load the client-side JavaScript bundles and stylesheets necessary to render the interactive page.
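
Sketched out with invented element ids and bundle path (the real renderer also handles routing and caching), those responsibilities look something like:

import React from "react";
import { renderToString } from "react-dom/server";

function renderPage(Page, props) {
  // 1. Render the page component into static HTML markup
  const markup = renderToString(<Page {...props} />);

  // 2. Embed the initial props as JSON; 3. load the client-side bundle
  return `<!DOCTYPE html>
<html>
  <body>
    <div id="__flareact">${markup}</div>
    <script type="application/json" id="__flareact_props">${JSON.stringify(props)}</script>
    <script src="/client.js"></script>
  </body>
</html>`;
}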

One challenge with building Flareact is that Webpack targets the webworker output rather than the node output. This makes it difficult to inform the worker which pages exist in the filesystem, since there is no access to the filesystem.

To get around this, Flareact leverages require.context, a Webpack-specific API, to inspect the project and build a manifest of pages on the client and the worker. I’d love to replace this with a smarter bundling strategy on the client-side eventually.
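
As a sketch of the technique (require.context is a real Webpack API; the paths here are illustrative):

// Ask Webpack to bundle every page module and expose the list at runtime
const context = require.context("./pages", true, /\.js$/);

const pages = {};
for (const key of context.keys()) {
  // "./about.js" -> "/about"
  const route = key.replace(/^\./, "").replace(/\.js$/, "");
  pages[route] = context(key);
}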

The Client

In addition to handling incoming Worker requests, Flareact compiles a client bundle containing the code necessary for routing, data fetching and more from the browser.

The core responsibilities of the client are to:

  1. Listen for routing events
  2. Fetch the necessary page component and its props from the worker over AJAX

Building a client router from scratch has been a challenge. It listens for changes to the internal route state, updates the URL pathname with pushState, makes an AJAX request to the worker for the page props, and then updates the current component in the render tree with the requested page.

It was fun building a flareact/link component similar to next/link:

import Link from "flareact/link";

export default function Index() {
  return (
    <div>
      <Link href="/about">
        <a>Go to About</a>
      </Link>
    </div>
  );
}

I also set out to build a custom version of next/head for Flareact. As it turns out, this was non-trivial! With lots of interesting stuff going on behind the scenes to support SSR and client-side routing events, I decided to make flareact/head a simple wrapper around react-helmet instead:

import Head from "flareact/head";

export default function Index() {
  return (
    <div>
      <Head>
        <title>My page title</title>
      </Head>
      <h1>Hello, world.</h1>
    </div>
  );
}

Local Development

The local developer experience of Flareact leverages the new wrangler dev command, sending server requests through a local tunnel to the Cloudflare edge and back to your machine.


This is a huge win for productivity, since you don’t need to manually build and deploy your application to see how it will perform in a production environment.

It’s also a really exciting update to the serverless toolchain. Running a robust development environment in a serverless world has always been a challenge, since your code is executing in a non-traditional context. Tunneling local code to the edge and back is such a great addition to Cloudflare’s developer experience.

Use cases

Flareact is a great candidate for a lot of Jamstack-adjacent applications, like blogs or static marketing sites.

It could also be used for more dynamic applications, with robust API functions and authentication mechanisms — all implemented using Cloudflare Workers.

Imagine building a high-traffic e-commerce site with Flareact, where both site reliability and dynamic rendering for things like price changes and stock availability are crucial.

There are also untold possibilities for integrating the Workers KV into your edge props or API functions as a first-class database solution. No need to reach for an externally-hosted database!

While the project is still in its early days, here are a couple real-world examples:

The road ahead

I have to be honest: creating a server-side rendered React framework with little prior knowledge was very difficult. There’s still a ton to learn, and Flareact has a long way to go to reach parity with Next.js in the areas of optimization and production-readiness.

Here’s what I’m hoping to add to Flareact in the near future:

  • Smarter client bundling and Webpack chunks to reduce individual page weight
  • A more feature-complete client-side router
  • The ability to extend and customize the root document of the app
  • Support for more style frameworks (CSS-in-JS, Sass, CSS modules, etc)
  • A more stable development environment
  • Documentation and support for environment variables, secrets and KV namespaces
  • A guide for deploying from GitHub Actions and other CI tools

If the project sounds interesting to you, be sure to check out the source code on GitHub. Contributors are welcome!

Asynchronous HTMLRewriter for Cloudflare Workers

Post Syndicated from Ashcon Partovi original https://blog.cloudflare.com/asynchronous-htmlrewriter-for-cloudflare-workers/

Last year, we launched HTMLRewriter for Cloudflare Workers, which enables developers to make streaming changes to HTML on the edge. Unlike a traditional DOM parser that loads the entire HTML document into memory, we developed a streaming parser written in Rust. Today, we’re announcing support for asynchronous handlers in HTMLRewriter. Now you can perform asynchronous tasks based on the content of the HTML document: from prefetching fonts and image assets to fetching user-specific content from a CMS.

How can I use HTMLRewriter?

We designed HTMLRewriter to have a jQuery-like experience. First, you define a handler, then you assign it to a CSS selector; Workers does the rest for you. You can look at our new and improved documentation to see our supported list of selectors, which now include nth-child selectors. The example below changes the alternative text for every second image in a document.

async function editHtml(request) {
  return new HTMLRewriter()
     .on("img:nth-child(2)", new ElementHandler())
     .transform(await fetch(request))
}

class ElementHandler {
   element(e) {
      e.setAttribute("alt", "A very interesting image")
   }
}

Since these changes are applied using streams, we maintain a low TTFB (time to first byte) and users never know the HTML was transformed. If you’re interested in how we’re able to accomplish this technically, you can read our blog post about HTML parsing.

What’s new with HTMLRewriter?

Now you can define an async handler which allows any code that uses await. This means you can perform dynamic HTML injection, based on the contents of the document, without having prior knowledge of what it contains. This allows you to customize HTML based on a particular user, feature flag, or even an integration with a CMS.

class UserCustomizer {
   // Remember to add the `async` keyword to the handler method
   async element(e) {
      const user = await fetch(`https://my.api.com/user/${e.getAttribute("user-id")}/online`)
      if (user.ok) {
         // Add the user’s name to the element
         e.setAttribute("user-name", await user.text())
      } else {
         // Remove the element, since this user is not online
         e.remove()
      }
   }
}

What can I build with HTMLRewriter?

To illustrate the flexibility of HTMLRewriter, I wrote an example that you can deploy on your own website. If you manage a website, you know that old links and images can expire with time. Here’s an excerpt from a years-old post I wrote on the Cloudflare Blog:

As you can see, that missing image is not the prettiest sight. However, we can easily fix this using async handlers in HTMLRewriter. Using a service like the Internet Archive API, we can check if an image no longer exists and rewrite the URL to use the latest archive. That means users don’t see an ugly placeholder and won’t even know the image was replaced.

async function fetchAndFixImages(request) {
   return new HTMLRewriter()
      .on("img", new ImageFixer())
      .transform(await fetch(request))
}

class ImageFixer {
   async element(e) {
    var url = e.getAttribute("src")
    var response = await fetch(url)
    if (!response.ok) {
       var archive = await fetch(`https://archive.org/wayback/available?url=${url}`)
       if (archive.ok) {
          var snapshot = await archive.json()
          e.setAttribute("src", snapshot.archived_snapshots.closest.url)
       } else {
          e.remove()
       }
    }
  }
}

Using the Workers Playground, you can view a working sample of the above code. A more complex example could even alert a service like Sentry when a missing image is detected. Using the previous missing image, now you can see the image is restored and users are none the wiser.

If you’re interested in deploying this to your own website, start from the working sample in the Workers Playground.

What else can I build with HTMLRewriter?

We’ve been blown away by developer projects using HTMLRewriter. Here are a few projects that caught our eye and are great examples of the power of Cloudflare Workers and HTMLRewriter:

If you’re interested in using HTMLRewriter, check out our documentation. Also be sure to share any creations you’ve made with @CloudflareDev, we love looking at the awesome projects you build.

Announcing wrangler dev — the Edge on localhost

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/announcing-wrangler-dev-the-edge-on-localhost/

Cloudflare Workers — our serverless platform — allows developers around the world to run their applications from our network of 200 datacenters, as close as possible to their users.

A few weeks ago we announced a release candidate for wrangler dev — today, we’re excited to take wrangler dev, the world’s first edge-based development environment, to GA with the release of wrangler 1.11.

Think locally, develop globally

It was once assumed that to successfully run an application on the web, one had to go and acquire a server, set it up (in a data center that hopefully you had access to), and then maintain it on an ongoing basis. Luckily for most of us, that assumption was challenged with the emergence of the cloud. The cloud was always assumed to be centralized — large data centers in a single region (“us-east-1”), reserved for compute. The edge? That was for caching static content.

Again, assumptions are being challenged.

Cloudflare Workers is about moving compute from a centralized location to the edge. And it makes sense: if users are distributed all over the globe, why should all of them be routed to us-east-1, on the opposite side of the world, causing latency and degrading user experience?

But challenging one assumption caused others to come into view. One of the most obvious ones was: would a local development environment actually provide the best experience for someone looking to test their Worker code? Trying to fit the entire Cloudflare edge, with all its dependencies onto a developer’s machine didn’t seem to be the best approach. Especially given that the place the developer was going to run that code in production was mere milliseconds away from the computer they were running on.

When I was in college, getting started with programming, one of the biggest barriers to entry was installing all the dependencies required to run a single library. I would go as far as to say that the third, and often forgotten, hardest problem in computer science is dependency management.

We’re not the first to try and unify development environments across machines — tools such as Docker aim to solve this exact problem by providing a prepackaged development environment.

Yet, packaging up the Workers runtime is not quite so simple.

Beyond the Workers runtime, there are many components that make up Cloudflare’s edge, including DNS resolution, the Cloudflare cache — all of those parts are what makes Cloudflare Workers so powerful. That means that without those components, a standalone runtime is insufficient to represent the behavior of Worker request handling. The reason to develop locally first is to have the opportunity to experiment without affecting production. Thus, having a local development environment that truly reflects production is a requirement.

wrangler dev

wrangler dev provides all the convenience of a local development environment, without the headache of trying to reproduce the reality of production locally — and then having to keep the two environments in sync.

By running at the edge, it provides a high fidelity, consistent experience for all developers, without sacrificing the speedy feedback loop of a local development environment.

Live reloading

As you update your code, wrangler dev will detect changes, and push the new version of your code to the edge.

console.log() at your fingertips

Previously, to extract your console logs from the Workers runtime, you had to have the Workers Preview open in a browser window at all times. With wrangler dev, you can receive your logs directly in your terminal of choice.

Cache API, KV, and more!

Since wrangler dev runs on the edge, you can now easily test the state of a cache.put(), without having to deploy your Worker to production.

wrangler dev will spin up a new KV namespace for development, so you don’t have to worry about affecting your production data.

And if you’re looking to test out some of the fields on request.cf that expose rich information about the request, such as geo-location — they will all be provided from the Cloudflare data center.
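
For example, a Worker like the following sketch exercises both the Cache API and request.cf, and can be tested end-to-end with wrangler dev:

addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event));
});

async function handleRequest(event) {
  const cache = caches.default;
  let response = await cache.match(event.request);

  if (!response) {
    // request.cf is populated by the Cloudflare data center, even in dev
    const colo = event.request.cf ? event.request.cf.colo : 'unknown';
    response = new Response(`Hello from ${colo}!`, {
      headers: { 'Cache-Control': 'max-age=60' },
    });
    event.waitUntil(cache.put(event.request, response.clone()));
  }

  return response;
}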

Get started

wrangler dev is now available in the latest version of Wrangler, the official Cloudflare Workers CLI.

To get started, follow our installation instructions here.

What’s next?

wrangler dev is just our first foray into giving our developers more visibility and agility with their development process.

We recognize that we have a lot more work to do to meet our developers’ needs, including providing an easy testing framework for Workers, and allowing our customers to observe their Workers’ behavior in production.

Just as wrangler dev provides a quick feedback loop between our developers and their code, we love to have a tight feedback loop between our developers and our product. We love to hear what you’re building, how you’re building it, and how we can help you build it better.

Making magic: Reimagining Developer Experience for the World of Serverless

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/making-magic-reimagining-developer-experiences-for-the-world-of-serverless/

This week we’ve talked about how Workers provides a step function improvement in the TTFB (time to first byte) of applications, by running lightweight isolates in over 200 cities around the world, free of cold starts. Today I’m going to talk about another metric, one that’s arguably even more important: TTFD, or time to first dopamine, and announce a huge improvement to the Workers development experience — wrangler dev, our edge-based development environment with all the perks of a local environment.

There’s nothing quite like the rush of getting your first few lines of code to work — no matter how many times you’ve done it before, there’s something so magical about the computer understanding exactly what you wanted it to do and doing it!

This is the kind of magic I expected of “serverless”, and while it’s true that most serverless offerings today get you to that feeling faster than setting up a virtual server ever would, I still can’t help but be disappointed with how lackluster developing with most serverless platforms is today.

Some of my disappointment can be attributed to the leaky nature of the abstraction: the journey to getting you to the point of writing code is drawn out by forced decision making about servers (regions, memory allocation, etc). Servers, however, are not the only thing holding developers back from getting to the delightful magical feeling in the serverless world today.

The “serverless” experience on AWS Lambda today looks like this: between configuring the right access policy to invoke my own test application, and deciding whether an HTTP or REST API was better suited for my needs, 30 minutes had easily passed, and I still didn’t have a URL I could call to invoke my application. I did, however, spin up five different services, and was already worrying about cleaning them up lest I be charged for them.

That doesn’t feel like magic!

In building what we believe to be the serverless platform of the future — a promise that feels very magical —  we wanted to bring back that magical feeling to every step of the development journey. If serverless is about empowering developers, then they should be empowered every step of the way: from proof of concept to MVP and beyond.

We’re excited to share with you today our approach to making our developer experience delightful — we recognize we still have plenty of room to continue to grow and innovate (and we can’t wait to tell you about everything we have currently in the works as well!), but we’re proud of all the progress we’ve made in making Workers the easiest development platform for developers to use.

Defining “developer experience”

To get us started, let’s look at what the journey of a developer entails. Today, we’ll be defining the developer experience as the following four stages:

  • Getting started: All the steps we have to take before putting in some keystrokes
  • Iteration: Does my code do what I expect it to do? What do I need to do to get it there?
  • Release: I’ve tested what I can — time to hit the big red button!
  • Observe: Is anything broken? And how do I fix it?

When approaching each stage of development, we wanted to reimagine the experience, the way that we’ve always wanted our development flow to work, and fix places along the way where existing platforms have let us down.

Zero to Hello World

With Workers, we want to get you to that aforementioned delightful feeling as quickly as possible, and remove every obstacle in the way of writing and deploying your code. The first deployment experience is really important — if you’ve done it once and haven’t given up along the way, you can do it again.

We’re very proud to say our TTFD — even for a new user without a Cloudflare account — is as low as three minutes. If you’re an existing customer, you can have your first Worker running in seconds. No regions to choose, no IAM rules to configure, and no API Gateways to set up or worry about paying for.

If you’re new to Workers and still trying to get a feel for it, you can instantly deploy your Worker to 200 cities around the world within seconds, with the simple click of a button.

If you’ve already decided on Workers as the choice for building your next application, we want to make you feel at home by allowing you to use all of your favorite IDEs, be it vim or emacs or VSCode (we don’t care!).

With the release of wrangler — the official command-line tool for Workers — getting started is just as easy as:

wrangler generate hello
cd hello
wrangler publish

Again, in seconds your code is up and running, and easily accessible all over the world.

“Hello, World!”, of course, doesn’t have to be quite so literal. We provide a range of tutorials to help get you started and get familiar with developing with Workers.

To save you that last bit of time in getting started, our template gallery provides starter templates so you can dive straight into building the products you’re excited about — whether it’s a new GraphQL server or a brand new static site, we’ve got you covered.

Local(ish) development: code, test, repeat

We can’t promise to get the code right on your behalf, but we can promise to do everything we can to get you the feedback you need to help you get your code right.

The development journey requires lots of experimentation, trial and error, and debugging. If my Computer Science degree came with instructions on the back of the bottle, they would read: “code, print, repeat.”

Getting code right is an extremely iterative, feedback-driven process. We would all love to get code right the first time around and move on, but the reality is, computers are bad mind-readers, and you’ve ended up with an extraneous parenthesis or a stray comma in your JSON, so your code is not going to run. Found where the loose parenthesis was introduced? Great! Now your code is running, but the output is not right — time to go find that off-by-one error.

Local development has traditionally been the way for developers to get a tight feedback loop during the development process. The crucial components that make up an effective local development environment and make it a great testing ground are: fast feedback loop, its sandboxed nature (ability to develop without affecting production), and accuracy.

As we started thinking about accomplishing all three of those goals, we realized that being local actually wasn’t itself a requirement — speed is the real requirement, and running on the client seemed like the only way to achieve acceptable speed for a good-enough feedback loop.

One option was to provide a traditional local development environment for the Workers runtime, but one thing didn’t sit well with us: we knew there was more to handling a request than just the runtime, and leaving those pieces out would compromise accuracy. We didn’t want to set our users up to fail with code that works on their machine but not ours.

Shipping the rest of our edge infrastructure to the user would pose its own challenges of keeping it up to date, and it would require the user to install hundreds of unnecessary dependencies, all potentially to end up with the most frustrating experience of all: running into some installation bug whose explanation couldn’t be found on StackOverflow. That experience didn’t sit right with us either.

As it turns out, this is a very similar problem to one we commonly solve for our customers: Running code on the client is fast, but it doesn’t give me the control I need; running code on the server gives me the control I need, but it requires a slow round-trip to the origin. All we had to do was take our own advice and run it on the edge! It’s the best of both worlds: your code runs so close to your end user that you get the same performance as running it on the client, without having to lose control.

To provide developers access to this tight feedback loop, we introduced wrangler dev earlier this year!

wrangler dev has the look and feel of a local development environment: it runs on localhost but tunnels to the edge, and provides output directly to your IDE of choice. Since wrangler dev runs on the edge, it works on your machine and ours exactly the same!

Our release candidate for wrangler dev is live and waiting for you to take it for a test drive, as easily as:

npm i @cloudflare/wrangler@beta -g

Let us know what you think.

Release

After writing all the code, testing every edge case imaginable, and going through code review, at some point the code needs to be released for the rest of the world to reap the fruits of your hard labor and enjoy the features you’ve built.

For smaller, quick applications, it’s exciting to hit the “Save & deploy” button and let fate take the wheel.

For production level projects, however, the process of deploying to production may be a bit different. Different organizations adopt different processes for code release. For those using GitHub, last year we introduced our GitHub Action, to make it easy to configure an integrated release process.

With Wrangler, you can configure Workers to deploy using your existing CI, to automate deployments, and minimize human intervention.

When deploying to production, again, feedback becomes extremely important. Some platforms today still take as long as a few minutes to deploy your code. A few minutes may seem trivial, but a few minutes of nervously refreshing, wondering whether your code is live yet, and which version of your code your users are seeing is stressful. This is especially true in a rollback or a bug-fix situation where you want the new version to be live ASAP.

New Workers are deployed globally in less than five seconds, which means new changes are instantaneous. Better yet, since Workers runs on lightweight isolates, newly deployed Workers don’t experience dreaded cold starts, which means you can release code as frequently as you’re able to ship it, without having to invest additional time in auxiliary gadgets to pre-warm your Worker — more time for you to start working on your next feature!

Observe & Resolve

The big red button has been pushed. Dopamine has been replaced with adrenaline: the instant question on your mind is: “Did I break anything? And if so, what, and how do I fix it?” These questions are at the core of what the industry calls “observability”.

There are different ways things can break and incidents can manifest themselves: increases in errors, drops in traffic, even a drop in performance could be considered a regression.

To identify these kinds of issues, you need to be able to spot a trend. Raw data, however, is not a very useful medium for spotting trends — humans simply cannot parse raw lines of logs to identify a subtle increase in errors.

This is why we took a one-two punch approach to helping developers identify and fix issues: exposing trend data through analytics, while also providing the ability to tail production logs for forensics and investigation.

Earlier this year, we introduced Workers Metrics: an easy way for developers to identify trends in their production traffic.

With requests metrics, you can easily spot any increases in errors, or drastic changes in traffic patterns after a given release:

Making magic: Reimagining Developer Experience for the World of Serverless

Additionally, sometimes new code can introduce unforeseen regressions in the overall performance of the application. With CPU time metrics, our developers are now able to spot changes in the performance of their Worker, as well as use that information to guide and optimize their code.

Making magic: Reimagining Developer Experience for the World of Serverless

Once you’ve identified a regression, we wanted to provide the tools needed to find your bug and fix it, which is why we also recently launched `wrangler tail`: production logs in a single command.

wrangler tail can help diagnose where code is failing or why certain customers are getting unexpected outcomes, because it exposes console.log() output and exceptions. With access to this output, developers can immediately diagnose and resolve any issues occurring in production.
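For example, given a hypothetical handler like this:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  try {
    const data = await request.json()
    console.log('payload keys:', Object.keys(data).join(',')) // shows up in the tail
    return new Response('ok')
  } catch (e) {
    console.log('bad payload:', e.message) // so do errors on this path
    return new Response('Invalid JSON', { status: 400 })
  }
}

running wrangler tail in a terminal streams those messages from production as requests arrive, letting you watch a live deployment misbehave (or behave) in real time.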

Making magic: Reimagining Developer Experience for the World of Serverless

We know how precious every moment can be when a bad code deploy impacts customer traffic. Luckily, once you’ve found and fixed your bug, it takes only seconds for users to start benefiting from the fix: unlike other platforms, which can make you wait as long as five minutes, Workers get deployed globally within five seconds.

Repeat

Making magic: Reimagining Developer Experience for the World of Serverless

As you’re thinking about your next feature, you check out a new branch, and the cycle begins all over again. We’re excited for you to explore all the improvements we’ve made to the development experience with Workers, all to reduce your time to first dopamine (TTFD).

We are always working to improve it further, looking for every additional bit of friction we can remove, and we’d love to hear your feedback as we do so.

Workers Security

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/workers-security/

Workers Security

Workers Security
Hello, I’m an engineer on the Workers team, and today I want to talk to you about security.

Cloudflare is a security company, and the heart of Workers is, in my view, a security project. Running code written by third parties is always a scary proposition, and the primary concern of the Workers team is to make that safe.

For a project like this, it is not enough to pass a security review and say "ok, we’re secure" and move on. It’s not even enough to consider security at every stage of design and implementation. For Workers, security in and of itself is an ongoing project, and that work is never done. There are always things we can do to reduce the risk and impact of future vulnerabilities.

Today, I want to give you an overview of our security architecture, and then address two specific issues that we are frequently asked about: V8 bugs, and Spectre.

Architectural Overview

Let’s start with a quick overview of the Workers Runtime architecture.

Workers Security

There are two fundamental parts of designing a code sandbox: secure isolation and API design.

Isolation

First, we need to create an execution environment where code can’t access anything it’s not supposed to.

For this, our primary tool is V8, the JavaScript engine developed by Google for use in Chrome. V8 executes code inside "isolates", which prevent that code from accessing memory outside the isolate — even within the same process. Importantly, this means we can run many isolates within a single process. This is essential for an edge compute platform like Workers where we must host many thousands of guest apps on every machine, and rapidly switch between these guests thousands of times per second with minimal overhead. If we had to run a separate process for every guest, the number of tenants we could support would be drastically reduced, and we’d have to limit edge compute to a small number of big enterprise customers who could pay a lot of money. With isolate technology, we can make edge compute available to everyone.

Sometimes, though, we do decide to schedule a worker in its own private process. We do this if it uses certain features that we feel need an extra layer of isolation. For example, when a developer uses the devtools debugger to inspect their worker, we run that worker in a separate process. This is because historically, in the browser, the inspector protocol has only been usable by the browser’s trusted operator, and therefore has not received as much security scrutiny as the rest of V8. In order to hedge against the increased risk of bugs in the inspector protocol, we move inspected workers into a separate process with a process-level sandbox. We also use process isolation as an extra defense against Spectre, which I’ll describe later in this post.

Additionally, even for isolates that run in a shared process with other isolates, we run multiple instances of the whole runtime on each machine, which we call "cordons". Workers are distributed among cordons by assigning each worker a level of trust and separating less-trusted workers from those we trust more highly. As one example of this in operation: a customer who signs up for our free plan will not be scheduled in the same process as an enterprise customer. This provides some defense in depth in case a zero-day security vulnerability is found in V8. But I’ll talk more about V8 bugs, and how we address them, later in this post.

At the whole-process level, we apply another layer of sandboxing for defense in depth. The "layer 2" sandbox uses Linux namespaces and seccomp to prohibit all access to the filesystem and network. Namespaces and seccomp are commonly used to implement containers. However, our use of these technologies is much stricter than what is usually possible in container engines, because we configure namespaces and seccomp after the process has started (but before any isolates have been loaded). This means, for example, we can (and do) use a totally empty filesystem (mount namespace) and use seccomp to block absolutely all filesystem-related system calls. Container engines can’t normally prohibit all filesystem access because doing so would make it impossible to use exec() to start the guest program from disk; in our case, our guest programs are not native binaries, and the Workers runtime itself has already finished loading before we block filesystem access.

The layer 2 sandbox also totally prohibits network access. Instead, the process is limited to communicating only over local Unix domain sockets, to talk to other processes on the same system. Any communication to the outside world must be mediated by some other local process outside the sandbox.

One such process in particular, which we call the "supervisor", is responsible for fetching worker code and configuration from disk or from other internal services. The supervisor ensures that the sandbox process cannot read any configuration except that which is relevant to the workers that it should be running.

For example, when the sandbox process receives a request for a worker it hasn’t seen before, that request includes the encryption key for that worker’s code (including attached secrets). The sandbox can then pass that key to the supervisor in order to request the code. The sandbox cannot request any worker for which it has not received the appropriate key. It cannot enumerate known workers. It also cannot request configuration it doesn’t need; for example, it cannot request the TLS key used for HTTPS traffic to the worker.

Aside from reading configuration, the other reason for the sandbox to talk to other processes on the system is to implement APIs exposed to Workers. Which brings us to API design.

API Design

There is a saying: "If a tree falls in the forest, but no one is there to hear it, does it make a sound?" I have a related saying: "If a Worker executes in a fully-isolated environment in which it is totally prevented from communicating with the outside world, does it actually run?"

Complete code isolation is, in fact, useless. In order for Workers to do anything useful, they have to be allowed to communicate with users. At the very least, a Worker needs to be able to receive requests and respond to them. It would also be nice if it could send requests to the world, safely. For that, we need APIs.

In the context of sandboxing, API design takes on a new level of responsibility. Our APIs define exactly what a Worker can and cannot do. We must be very careful to design each API so that it can only express operations which we want to allow, and no more. For example, we want to allow Workers to make and receive HTTP requests, while we do not want them to be able to access the local filesystem or internal network services.

Let’s dig into the easier example first. Currently, Workers does not allow any access to the local filesystem. Therefore, we do not expose a filesystem API at all. No API means no access.

But, imagine if we did want to support local filesystem access in the future. How would we do that? We obviously wouldn’t want Workers to see the whole filesystem. Imagine, though, that we wanted each Worker to have its own private directory on the filesystem where it can store whatever it wants.

To do this, we would use a design based on capability-based security. Capabilities are a big topic, but in this case, it would mean giving the worker an object of type Directory, representing a directory on the filesystem. This object would have an API that allows creating and opening files and subdirectories, but does not permit traversing "up" the tree to the parent directory. Effectively, each worker would see its private Directory as if it were the root of its own filesystem.
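To make that concrete, here is a sketch of how such a hypothetical API might look from a Worker’s point of view. None of these types exist in Workers today; the point is that the Directory handle itself is the permission:

// Hypothetical capability-based filesystem API (illustrative only).
async function handleRequest(request, directory) {
  // Holding 'directory' grants access to exactly this subtree and nothing else.
  const logs = await directory.createDirectory('logs')
  const file = await logs.createFile('latest.txt')
  await file.write('handled: ' + request.url)
  // There is deliberately no directory.parent(): a Worker cannot even name,
  // let alone open, anything outside the subtree it was handed.
  return new Response('saved')
}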

How would such an API be implemented? As described above, the sandbox process cannot access the real filesystem, and we’d prefer to keep it that way. Instead, file access would be mediated by the supervisor process. The sandbox talks to the supervisor using Cap’n Proto RPC, a capability-based RPC protocol. (Cap’n Proto is an open source project currently maintained by the Cloudflare Workers team.) This protocol makes it very easy to implement capability-based APIs, so that we can strictly limit the sandbox to accessing only the files that belong to the Workers it is running.

Now what about network access? Today, Workers are allowed to talk to the rest of the world only via HTTP — both incoming and outgoing. There is no API for other forms of network access, therefore it is prohibited (though we plan to support other protocols in the future).

As mentioned before, the sandbox process cannot connect directly to the network. Instead, all outbound HTTP requests are sent over a Unix domain socket to a local proxy service. That service implements restrictions on the request. For example, it verifies that the request is either addressed to a public Internet service, or to the Worker’s zone’s own origin server, not to internal services that might be visible on the local machine or network. It also adds a header to every request identifying the worker from which it originates, so that abusive requests can be traced and blocked. Once everything is in order, the request is sent on to our HTTP caching layer, and then out to the Internet.

Similarly, inbound HTTP requests do not go directly to the Workers Runtime. They are first received by an inbound proxy service. That service is responsible for TLS termination (the Workers Runtime never sees TLS keys), as well as identifying the correct Worker script to run for a particular request URL. Once everything is in order, the request is passed over a Unix domain socket to the sandbox process.

V8 bugs and the "patch gap"

Every non-trivial piece of software has bugs, and sandboxing technologies are no exception. Virtual machines have bugs, containers have bugs, and yes, isolates (which we use) also have bugs. We can’t live life pretending that no further bugs will ever be discovered; instead, we must assume they will and plan accordingly.

We rely heavily on isolation provided by V8, the JavaScript engine built by Google for use in Chrome. This has good sides and bad sides. On one hand, V8 is an extraordinarily complicated piece of technology, creating a wider "attack surface" than virtual machines. More complexity means more opportunities for something to go wrong. On the bright side, though, an extraordinary amount of effort goes into finding and fixing V8 bugs, owing to its position as arguably the most popular sandboxing technology in the world. Google regularly pays out 5-figure bounties to anyone finding a V8 sandbox escape. Google also operates fuzzing infrastructure that automatically finds bugs faster than most humans can. Google’s investment does a lot to minimize the danger of V8 "zero-days" — bugs that are found by the bad guys and not known to Google.

But, what happens after a bug is found and reported by the good guys? V8 is open source, so fixes for security bugs are developed in the open and released to everyone at the same time — good guys and bad guys. It’s important that any patch be rolled out to production as fast as possible, before the bad guys can develop an exploit.

The time between publishing the fix and deploying it is known as the "patch gap". Earlier this year, Google announced that Chrome’s patch gap had been reduced from 33 days to 15 days.

Fortunately, we have an advantage here, in that we directly control the machines on which our system runs. We have automated almost our entire build and release process, so the moment a V8 patch is published, our systems automatically build a new release of the Workers Runtime and, after one-click sign-off from the necessary (human) reviewers, automatically push that release out to production.

As a result, our patch gap is now under 24 hours. A patch published by V8’s team in Munich during their work day will usually be in production before the end of our work day in the US.

Spectre: Introduction

Workers Security
We get a lot of questions about Spectre. The V8 team at Google has stated in no uncertain terms that V8 itself cannot defend against Spectre. Since Workers relies on V8 for sandboxing, many have asked if that leaves Workers vulnerable. However, we do not need to depend on V8 for this; the Workers environment presents many alternative approaches to mitigating Spectre.

Spectre is complicated and nuanced, and there’s no way I can cover everything there is to know about it or how Workers addresses it in a single blog post. But, hopefully I can clear up some of the confusion and concern.

What is it?

Spectre is a class of attacks in which a malicious program can trick the CPU into "speculatively" performing computation using data that the program is not supposed to have access to. The CPU eventually realizes the problem and does not allow the program to see the results of the speculative computation. However, the program may be able to derive bits of the secret data by looking at subtle side effects of the computation, such as the effects on cache.

For more background about Spectre, check out our Learning Center page on the topic.

Why does it matter for Workers?

Spectre encompasses a wide variety of vulnerabilities present in modern CPUs. The specific vulnerabilities vary by architecture and model, and it is likely that many vulnerabilities exist which haven’t yet been discovered.

These vulnerabilities are a problem for every cloud compute platform. Any time you have more than one tenant running code on the same machine, Spectre attacks come into play. However, the "closer together" the tenants are, the more difficult it can be to mitigate specific vulnerabilities. Many of the known issues can be mitigated at the kernel level (protecting processes from each other) or at the hypervisor level (protecting VMs), often with the help of CPU microcode updates and various tricks (many of which, unfortunately, come with serious performance impact).

In Cloudflare Workers, we isolate tenants from each other using V8 isolates — not processes nor VMs. This means that we cannot necessarily rely on OS or hypervisor patches to "solve" Spectre for us. We need our own strategy.

Why not use process isolation?

Cloudflare Workers is designed to run your code in every single Cloudflare location, of which there are currently 200 worldwide and growing.

We wanted Workers to be a platform that is accessible to everyone — not just big enterprise customers who can pay megabucks for it. We need to handle a huge number of tenants, where many tenants get very little traffic.

Combine these two points, and things get tricky.

A typical, non-edge serverless provider could handle a low-traffic tenant by sending all of that tenant’s traffic to a single machine, so that only one copy of the application needs to be loaded. If the machine can handle, say, a dozen tenants, that’s plenty. That machine can be hosted in a mega-datacenter with literally millions of machines, achieving economies of scale. However, this centralization incurs latency and worldwide bandwidth costs when the users don’t happen to be nearby.

With Workers, on the other hand, every tenant, regardless of traffic level, currently runs in every Cloudflare location. And in our quest to get as close to the end user as possible, we sometimes choose locations that only have space for a limited number of machines. The net result is that we need to be able to host thousands of active tenants per machine, with the ability to rapidly spin up inactive ones on-demand. That means that each guest cannot take more than a couple megabytes of memory — hardly enough space for a call stack, much less everything else that a process needs.

Moreover, we need context switching to be extremely cheap. Many Workers resident in memory will only handle an event every now and then, and many spend only a fraction of a millisecond on any particular event. In this environment, a single core can easily find itself switching between thousands of different tenants every second. On top of that, handling a single event requires a significant amount of communication between the guest application and its host, meaning still more switching and communications overhead. If each tenant lives in its own process, all this overhead is orders of magnitude larger than if many tenants live in a single process. When using strict process isolation in Workers, we find the CPU cost can easily be 10x what it is with a shared process.

In order to keep Workers inexpensive, fast, and accessible to everyone, we must solve these issues, and that means we must find a way to host multiple tenants in a single process.

There is no "fix" for Spectre

A dirty secret that the industry doesn’t like to admit: no one has "fixed" Spectre. Not even when using heavyweight virtual machines. Everyone is still vulnerable.

The current approach being taken by most of the industry is essentially a game of whack-a-mole. Every couple months, researchers uncover a new Spectre vulnerability. CPU vendors release some new microcode, OS vendors release kernel patches, and everyone has to update.

But is it enough to merely deploy the latest patches?

It is abundantly clear that many more vulnerabilities exist, but haven’t yet been publicized. Who might know about those vulnerabilities? Most of the bugs being published are being found by (very smart) graduate students on a shoestring budget. Imagine, for a minute, how many more bugs a well-funded government agency, able to buy the very best talent in the world, could be uncovering.

To truly defend against Spectre, we need to take a different approach. It’s not enough to block individual known vulnerabilities. We must address the entire class of vulnerabilities at once.

We can’t stop it, but we can slow it down

Unfortunately, it’s unlikely that any catch-all "fix" for Spectre will be found. But for the sake of argument, let’s try to design one.

Fundamentally, all Spectre vulnerabilities use side channels to detect hidden processor state. Side channels, by definition, involve observing some non-deterministic behavior of a system. Conveniently, most software execution environments try hard to eliminate non-determinism, because non-deterministic execution makes applications unreliable.

However, there are a few sorts of non-determinism that are still common. The most obvious among these is timing. The industry long ago gave up on the idea that a program should take the same amount of time every time it runs, because deterministic timing is fundamentally at odds with heuristic performance optimization. Sure enough, most Spectre attacks focus on timing as a way to detect the hidden microarchitectural state of the CPU.

Some have proposed that we can solve this by making timers inaccurate or adding random noise. However, it turns out that this does not stop attacks; it only makes them slower. If the timer tracks real time at all, then anything you can do to make it inaccurate can be overcome by running an attack multiple times and using statistics to filter out the noise.
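A toy simulation shows why (the jitter model here is invented; only the statistics matter):

// Even a timer with heavy random error leaks a 1-unit difference once averaged.
function noisyMeasure(trueTime) {
  return trueTime + (Math.random() - 0.5) * 100 // +/-50 units of jitter
}

function averaged(trueTime, runs) {
  let sum = 0
  for (let i = 0; i < runs; i++) sum += noisyMeasure(trueTime)
  return sum / runs
}

console.log(averaged(10, 1e6).toFixed(2)) // ~10.00
console.log(averaged(11, 1e6).toFixed(2)) // ~11.00: still clearly distinguishable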

Many security researchers see this as the end of the story. What good is slowing down an attack, if the attack is still possible? Once the attacker gets your private key, it’s game over, right? What difference does it make if it takes them a minute or a month?

Cascading Slow-downs

We find that, actually, measures that slow down an attack can be powerful.

Our key insight is this: as an attack becomes slower, new techniques become practical to make it even slower still. The goal, then, is to chain together enough techniques that an attack becomes so slow as to be uninteresting.

Much of cryptography, after all, is technically vulnerable to "brute force" attacks — technically, with enough time, you can break it. But when the time required is thousands (or even billions) of years, we decide that this is good enough.

So, what do we do to slow down Spectre attacks to the point of meaninglessness?

Freezing a Spectre Attack

Workers Security

Step 0: Don’t allow native code

We do not allow our customers to upload native-code binaries to run on our network. We only accept JavaScript and WebAssembly. Of course, many other languages, like Python, Rust, or even Cobol, can be compiled or transpiled to one of these two formats; the important point is that we do another pass on our end, using V8, to convert these formats into true native code.

This, in itself, doesn’t necessarily make Spectre attacks harder. However, I present this as step 0 because it is fundamental to enabling everything else below.

Accepting native code programs implies being beholden to an existing CPU architecture (typically, x86). In order to execute code with reasonable performance, it is usually necessary to run the code directly on real hardware, severely limiting the host’s control over how that execution plays out. For example, a kernel or hypervisor has no ability to prohibit applications from invoking the CLFLUSH instruction, an instruction which is very useful in side channel attacks and almost nothing else.

Moreover, supporting native code typically implies supporting whole existing operating systems and software stacks, which bring with them decades of expectations about how the architecture works under them. For example, x86 CPUs allow a kernel or hypervisor to disable the RDTSC instruction, which reads a high-precision timer. Realistically, though, disabling it will break many programs because they are implemented to use RDTSC any time they want to know the current time.

Supporting native code would bind our hands in terms of mitigation techniques. By using an abstract intermediate format, we have much greater freedom.

Step 1: Disallow timers and multi-threading

In Workers, you can get the current time using the JavaScript Date API, for example by calling Date.now(). However, the time value returned by this is not really the current time. Instead, it is the time at which the network message was received which caused the application to begin executing. While the application executes, time is locked in place. For example, say an attacker writes:

let start = Date.now();
for (let i = 0; i < 1e6; i++) {
  doSpectreAttack();
}
let end = Date.now();

The values of start and end will always be exactly the same. The attacker cannot use Date to measure the execution time of their code, which they would need to do to carry out an attack.

As an aside: This is a measure we actually implemented in mid-2017, long before Spectre was announced (and before we knew about it). We implemented this measure because we were worried about timing side channels in general. Side channels have been a concern of the Workers team from day one, and we have designed our system from the ground up with this concern in mind.

Related to our taming of Date, we also do not permit multi-threading or shared memory in Workers. Everything related to the processing of one event happens on the same thread — otherwise, it would be possible to "race" threads in order to "MacGyver" an implicit timer. We don’t even allow multiple Workers operating on the same request to run concurrently. For example, if you have installed a Cloudflare App on your zone which is implemented using Workers, and your zone itself also uses Workers, then a request to your zone may actually be processed by two Workers in sequence. These run in the same thread.
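For the curious, here is a sketch of the kind of makeshift timer that shared memory enables, written for Node.js since Workers deliberately provides neither worker_threads nor SharedArrayBuffer:

// One thread spins incrementing a shared counter; another reads it as a clock.
const { Worker, isMainThread, workerData } = require('worker_threads')

if (isMainThread) {
  const sab = new SharedArrayBuffer(4)
  const ticks = new Int32Array(sab)
  new Worker(__filename, { workerData: sab }) // start the "clock" thread
  setTimeout(() => {
    const before = Atomics.load(ticks, 0)
    for (let i = 0; i < 1e6; i++) {} // stand-in for secret-dependent work
    const elapsed = Atomics.load(ticks, 0) - before
    console.log('measured ~' + elapsed + ' ticks without ever calling Date')
    process.exit(0)
  }, 100)
} else {
  const ticks = new Int32Array(workerData)
  while (true) Atomics.add(ticks, 0, 1)
}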

So, we have prevented code execution time from being measured locally. However, that doesn’t actually prevent it from being measured: it can still be measured remotely. For example, the HTTP client that is sending a request to trigger the execution of the Worker can measure how long it takes for the Worker to respond. Of course, such a measurement is likely to be very noisy, since it would have to traverse the Internet. Such noise can be overcome, in theory, by executing the attack many times and taking an average.

Another aside: Some people have suggested that if a serverless platform like Workers were to completely reset an application’s state between requests, so that every request "starts fresh", this would make attacks harder. That is, imagine that a Worker’s global variables were reset after every request, meaning you cannot store state in globals in one request and then read that state in the next. Then, doesn’t that mean the attack has to start over from scratch for every request? If each request is limited to, say, 50ms of CPU time, does that mean that a Spectre attack isn’t possible, because there’s not enough time to carry it out? Unfortunately, it’s not so simple. State doesn’t have to be stored in the Worker; it could instead be stored in a conspiring client. The server can return its state to the client in each response, and the client can send it back to the server in the next request.

But is an attack based on remote timers really feasible in practice? In adversarial testing, with help from leading Spectre experts, we have not been able to develop an attack that actually works in production.

However, we don’t feel the lack of a working attack means we should stop building defenses. Instead, we’re currently testing some more advanced measures, which we plan to roll out in the coming weeks.

Step 2: Dynamic Process Isolation

We know that if an attack is possible at all, it would take a very long time to run — hours at the very least, maybe as long as weeks. But once an attack has been running even for a second, we have a huge amount of new data that we can use to trigger further measures.

Spectre attacks, you see, do a lot of "weird stuff" that you wouldn’t usually expect to see in a normal program. These attacks intentionally try to create pathological performance scenarios in order to amplify microarchitectural effects. This is especially true when the attack has already been forced to run billions of times in a loop in order to overcome other mitigations, like those discussed above. This tends to show up in metrics like CPU performance counters.

Now, the usual problem with using performance metrics to detect Spectre attacks is that sometimes you get false positives. Sometimes, a legitimate program behaves really badly. You can’t go around shutting down every app that has bad performance.

Luckily, we don’t have to. Instead, we can choose to reschedule any Worker with suspicious performance metrics into its own process. As I described above, we can’t do this with every Worker, because the overhead would be too high. But, it’s totally fine to process-isolate just a few Workers, defensively. If the Worker is legitimate, it will keep operating just fine, albeit with a little more overhead. Fortunately for us, the nature of our platform is such that we can reschedule a Worker into its own process at basically any time.

In fact, fancy performance-counter based triggering may not even be necessary here. If a Worker merely uses a large amount of CPU time per event, then the overhead of isolating it in its own process is relatively less, because it switches context less often. So, we might as well use process isolation for any Worker that is CPU-hungry.

Once a Worker is isolated, we can rely on the operating system’s Spectre defenses, just as, for example, most desktop web browsers now do.

Over the past year we’ve been working with the experts at Graz Technical University to develop this approach. (TU Graz’s team co-discovered Spectre itself, and has been responsible for a huge number of the follow-on discoveries since then.) We have developed the ability to dynamically isolate workers, and we have identified metrics which reliably detect attacks. The whole system is currently undergoing testing to work out any remaining bugs, and we expect to roll it out fully within the next several weeks.

But wait, didn’t I say earlier that even process isolation isn’t a complete defense, because it only addresses known vulnerabilities? Yes, this is still true. However, the trend over time is that new Spectre attacks tend to be slower and slower to carry out, and hence we can reasonably guess that by imposing process isolation we have further slowed down even attacks that we don’t know about yet.

Step 3: Periodic Whole-Memory Shuffling

After Step 2, we already think we’ve prevented all known attacks, and we’re only worried about hypothetical unknown attacks. How long does a hypothetical unknown attack take to carry out? Well, obviously, nobody knows. But with all the mitigations in place so far, and considering that new attacks have generally been slower than older ones, we think it’s reasonable to guess attacks will take days or longer.

On a time scale of a day, we have new things we can do. In particular, it’s totally reasonable to restart the entire Workers runtime on a daily basis, which resets the locations of everything in memory, forcing attacks to restart the process of discovering the locations of secrets.

We can also reschedule Workers across physical machines or cordons, so that the window to attack any particular neighbor is limited.

In general, because Workers are fundamentally preemptible (unlike containers or VMs), we have a lot of freedom to frustrate attacks.

Once we have dynamic process isolation fully deployed, we plan to develop these ideas next. We see this as an ongoing investment, not something that will ever be "done".

Conclusion

Phew. You just read twelve pages about Workers security. Hopefully I’ve convinced you that designing a secure sandbox is only the beginning of building a secure compute platform, and the real work is never done. Popular security culture often dwells on clever hacks and clean fixes. But for the difficult real-world problems, often there is no right answer or simple fix, only the hard work of building defenses thicker and thicker.

The Migration of Legacy Applications to Workers

Post Syndicated from Kirk Schwenkler original https://blog.cloudflare.com/the-migration-of-legacy-applications-to-workers/

The Migration of Legacy Applications to Workers

The Migration of Legacy Applications to Workers

As Cloudflare Workers, and other Serverless platforms, continue to drive down costs while making it easier for developers to stand up globally scaled applications, the migration of legacy applications is becoming increasingly common. In this post, I want to show how easy it is to migrate such an application onto Workers. To demonstrate, I’m going to use a common migration scenario: moving a legacy application — on an old compute platform behind a VPN or in a private cloud — to a serverless compute platform behind zero-trust security.

Wait but why?

Before we dive further into the technical work, however, let me just address up front: why would someone want to do this? What benefits would they get from such a migration? In my experience, there are two sets of reasons: (1) factors that are “pushing” off legacy platforms, or the constraints and problems of the legacy approach; and (2) factors that are “pulling” onto serverless platforms like Workers, which speaks to the many benefits of this new approach. In terms of the push factors, we often see three core ones:

  • Legacy compute resources are not flexible and must be constantly maintained, often leading to capacity constraints or excess cost;
  • Maintaining VPN credentials is cumbersome, and can introduce security risks if not done properly;
  • VPN client software can be challenging for non-technical users to operate.

Similarly, there are some very key benefits “pulling” folks onto Serverless applications and zero-trust security:

  • Instant scaling, up or down, depending on usage. No capacity constraints, and no excess cost;
  • No separate credentials to maintain, users can use Single Sign On (SSO) across many applications;
  • VPN hardware, private cloud, and existing compute can be retired to simplify operations and reduce cost

While the benefits to this more modern end-state are clear, there’s one other thing that causes organizations to pause: the costs in time and migration effort seem daunting. Often what organizations find is that migration is not as difficult as they fear. In the rest of this post, I will show you how Cloudflare Workers, and the rest of the Cloudflare platform, can greatly simplify migrations and help you modernize all of your applications.

Getting Started

To take you through this, we will use a contrived application I’ve written in Node.js to illustrate the steps we would take with a real, and far more complex, example. The goal is to show the different tools and features you can use at each step, and how our platform design supports development and cutover of an application. We’ll use four key Cloudflare technologies as we see how to move this application off my laptop and into the cloud:

  1. Serverless Compute through Workers
  2. Robust Developer-focused Tooling for Workers via Wrangler
  3. Zero-Trust security through Access
  4. Instant, Secure Origin Tunnels through Argo Tunnels

Our example application for today is called Post Process, and it performs business logic on input provided in an HTTP POST body. It takes the input data from authenticated clients, performs a processing task, and responds with the result in the body of an HTTP response. The server runs in Node.js on my laptop.

Since the example application is written in Node.js, we will be able to directly copy some of the JavaScript assets for our new application. You could follow this “direct port” method not only for JavaScript applications, but also for applications in any of our other WASM-supported languages. For languages without WASM support, you would first need to rewrite or transpile the code into one that is supported.

Into our Application

Our basic example will perform only simple text processing, so that we can focus on the broad features of the migration. I’ve set up an unauthenticated copy (using Workers, to give us a scalable and reliable place to host it) at https://postprocess-workers.kirk.workers.dev/postprocess where you can see how it operates. Here is an example cURL:

curl -X POST https://postprocess-workers.kirk.workers.dev/postprocess --data '{"operation":"2","data":"Data-Gram!"}'

The relevant takeaways from the code itself are pretty simple:

  • There are two code modules, which conveniently split the application logic completely from the Preprocessing / HTTP interface.
  • The application logic module exposes one function postProcess(object) where object is the parsed JSON of the POST body. It returns a JavaScript object, ready to be encoded into a string in the JSON HTTP response. This module can be run on Workers JavaScript, with no changes. It only needs a new preprocessing / HTTP interface.
  • The Preprocessing / HTTP interface runs on raw Node.js; and exposes a local HTTPS server. The server does not directly take inbound traffic from the Internet, but sits behind a gateway which controls access to the service.

Code snippet from Node.js HTTP module

const server = http.createServer((req, res) => {
    if (req.url == '/postprocess') {
        if (req.method == 'POST') {
            gatherPost(req, data => {
                let jsonData
                try {
                    jsonData = JSON.parse(data)
                } catch (e) {
                    res.end('Invalid JSON payload! \n')
                    return
                }
                const result = postProcess(jsonData)
                res.write(JSON.stringify(result) + '\n');
                res.end();
            })
        } else {
            res.end('Invalid method, only POST is supported! \nPlease send a POST with data in format {"operation":"1","data":"Data-Gram!"} \n')
        }
    } else {
        res.end('Invalid request. Did you mean to POST to /postprocess? \n');
    }
});

Code snippet from Node.js logic module

function postProcess (postJson) {
        const ServerVersion = "2.5.17"
        if(postJson != null && 'operation' in postJson && 'data' in postJson){
                var output
                var operation = postJson['operation']
                var data = postJson['data']
                switch(operation){
                        case "1":
                              output = String(data).toLowerCase()
                              break
                        case "2":
                              var d = data + "\n"
                              output = d + d + d
                              break
                        case "3":
                              output = ServerVersion
                              break
                        default:
                              output = "Invalid Operation"
                }
                return {'Output': output}
        }
        else{
                return {'Error':'Invalid request, invalid JSON format'}
        }
}
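To make the switch behavior concrete, here is what each operation returns when run against the hosted copy (the responses follow directly from the code above):

$ curl -X POST https://postprocess-workers.kirk.workers.dev/postprocess --data '{"operation":"1","data":"Data-Gram!"}'
{"Output":"data-gram!"}

$ curl -X POST https://postprocess-workers.kirk.workers.dev/postprocess --data '{"operation":"3","data":""}'
{"Output":"2.5.17"}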

Current State Application Architecture

The Migration of Legacy Applications to Workers

Design Decisions

With all this information in hand, we can arrive at the details of our new Cloudflare-based design:

  1. Keep the business logic completely intact, and specifically use the same .js asset
  2. Build a new preprocessing layer in Workers to replace the Node.js module
  3. Use Cloudflare Access to authenticate users to our application

Target State Application Architecture

The Migration of Legacy Applications to Workers

Finding the first win

One good way to make a migration successful is to find a quick win early on; a useful task which can be executed while other work is still ongoing. It is even better if the quick win also benefits the eventual cutover. We can find a quick win here, if we solve the zero-trust security problem ahead of the compute problem by putting Cloudflare’s security in front of the existing application.

We will do this by using Cloudflare’s Argo Tunnel feature to securely connect to the existing application, and Access for zero-trust authentication. Below, you can see how easy this process is for any command-line user, with our cloudflared tool.

I open up a terminal and use cloudflared tunnel login, which presents me with an authentication flow. I then use the cloudflared tunnel --hostname postprocess.kschwenkler.com --url localhost:8080 command to connect an Argo Tunnel between the “url” (my local server) and the “hostname” (the new, public address we will use on my Cloudflare zone).

The Migration of Legacy Applications to Workers

Next I flip over to my Cloudflare dashboard and attach an Access Policy to the “hostname” I specified before. We will be using the Service Token mode of Access, which generates a client-specific security token that the client can attach to each HTTP POST. Other modes are better suited to interactive browser use cases.

The Migration of Legacy Applications to Workers

Now, without using the VPN, I can send a POST to the service, still running on Node.js on my laptop, from any Internet-connected device that has the correct token! It took only a few minutes to add zero-trust security to this application and safely expose it to the Internet while still running on a legacy compute platform (my laptop!).
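With a service token in hand, a request looks something like this (the header names are the ones Cloudflare Access expects for service tokens; the token values here are placeholders):

curl -X POST https://postprocess.kschwenkler.com/postprocess \
  -H "CF-Access-Client-Id: <client-id>" \
  -H "CF-Access-Client-Secret: <client-secret>" \
  --data '{"operation":"2","data":"Data-Gram!"}'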

The Migration of Legacy Applications to Workers

“Quick Win” Architecture

The Migration of Legacy Applications to Workers

Beyond the direct benefit of a huge security upgrade, we’ve also made our eventual application migration much easier by already routing the traffic through the target-state API gateway. We will see later how, in this state, we can surgically move traffic to the new application for testing.

Lift to the Cloud

With our zero-trust security benefits in hand, and our traffic running through Cloudflare, we can now proceed with the migration of the application itself to Workers. We’ll be using the Wrangler tooling to make this process very easy.

As noted when we first looked at the code, this contrived application exposes a very clean interface between the Node.js-specific HTTP module, which we need to replace, and the business logic postprocess module which we can use as is with Workers. We’ll first need to re-write the HTTP module, and then bundle it with the existing business logic into a new Workers application.

Here is a handwritten example of the basic pattern we’ll use for the HTTP module. We can see how the Service Workers API makes it very easy to grab the POST body with await, and how the JSON interface lets us easily pass the data to the postprocess module we took directly from the initial Node.js app.

addEventListener('fetch', event => {
 event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  let requestData
  try {
    requestData = await request.json()
  } catch (e) {
    return new Response("Invalid JSON", { status: 500 })
  }
  return new Response(JSON.stringify(postProcess(requestData)))
}

For our work on the mock application, we will go a slightly different route; more in line with a real application which would be more complex. Instead of writing this by hand, we will use Wrangler and our Router template, to build the new front end from a robust framework.

We’ll run wrangler generate post-process-workers https://github.com/cloudflare/worker-template-router to initialize a new Wrangler project with the Router template. Most of the configuration for this template works as is; we just have to update account_id in our wrangler.toml and make a few small edits to the code in index.js.

Below is our index.js after my edits. Note the line const postProcess = require('./postProcess.js') at the start of the new HTTP module; this tells Wrangler to include the original business logic from the legacy app’s postProcess.js module, which I will copy into our working directory.

const Router = require('./router')
const postProcess = require('./postProcess.js')

addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
})

async function handler(request) {
    const init = {
        headers: { 'content-type': 'application/json' },
    }
    const body = JSON.stringify(postProcess(await request.json()))
    return new Response(body, init)
}

async function handleRequest(request) {
    const r = new Router()
    r.post('.*/postprocess*', request => handler(request))
    r.get('/', () => new Response('Hello worker!')) // return a default message for the root route

    const resp = await r.route(request)
    return resp
}

Now we can simply run wrangler publish to put our application on workers.dev for testing! The Router template’s defaults, plus the small edits made above, are all we need. Since Wrangler automatically exposes the test application to the Internet (note that we can *also* put the test application behind Access, with a slightly modified method), we can easily send test traffic from any device.

The Migration of Legacy Applications to Workers

Shift, Safely!

With our application up for testing on workers.dev, we finally come to the last and most daunting migration step: cutting over traffic from the legacy application to the new one without any service interruption.

Luckily, we had our quick win earlier and are already routing our production traffic through the Cloudflare network (to the legacy application via Argo Tunnels). This provides huge benefits now that we are at the cutover step. Without changing our IP address, SSL configuration, or any other client-facing properties, we can route traffic to the new application with just one wrangler command.

Seamless cutover from Transition to Target state

The Migration of Legacy Applications to Workers

We simply modify wrangler.toml to indicate the production domain/route we’d like the application to operate on, and run wrangler publish. As soon as Cloudflare receives this update, it will send production traffic to our new application instead of the Argo Tunnel. We have configured the application to send a ‘version’ header, which lets us verify the cutover easily using curl.
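For wrangler 1.x, that change amounts to something like the following sketch (the account and zone IDs and the route pattern are placeholders for illustration):

# wrangler.toml — placeholder values, not real IDs
name = "post-process-workers"
type = "webpack"
account_id = "<your-account-id>"
workers_dev = false
zone_id = "<your-zone-id>"
route = "postprocess.kschwenkler.com/postprocess*"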

The Migration of Legacy Applications to Workers
The Migration of Legacy Applications to Workers

Rollback, if it is needed, is also very easy. We can either set wrangler.toml back to workers.dev-only mode and run wrangler publish again, or delete our route manually. Either will send traffic back to the Argo Tunnel.

The Migration of Legacy Applications to Workers
The Migration of Legacy Applications to Workers
The Migration of Legacy Applications to Workers

In Conclusion

Clearly, a real application will be more complex than our example above. It may have multiple components, with complex interactions, which must each be handled in turn. Argo Tunnel might remain in use, to connect to a data store or other application outside of our network. We might use WASM to support modules written in other languages. In any of these scenarios, Cloudflare’s Wrangler tooling and Serverless capabilities will help us work through the complexities and achieve success.

I hope that this simple example has helped you to see how Wrangler, cloudflared, Workers, and our entire global network can work together to make migrations as quick and hassle-free as possible. Whether for this case of an old application behind a VPN, or another application that has outgrown its current home – our Workers platform, Wrangler tooling, and underlying platform will scale to meet your business needs.

Trailblazing a Development Environment for Workers

Post Syndicated from Avery Harnish original https://blog.cloudflare.com/trailblazing-a-development-environment-for-workers/

Trailblazing a Development Environment for Workers

Trailblazing a Development Environment for Workers

When I arrived at Cloudflare for an internship in the summer of 2018, I was taken on a tour, introduced to my mentor who took me out for coffee (shoutout to Preston), and given a quick whiteboard overview of how Cloudflare works. Each of the interns would work on a small project of their own and they’d try to finish them by the end of the summer. The description of the project I was given on my very first day read something along the lines of “implementing signed exchanges in a Cloudflare Worker to fix the AMP URL attribution problem,” which was a lot to take in at once. I asked so many questions those first couple of weeks. What are signed exchanges? Can I put these stickers on my laptop? What’s a Cloudflare Worker? Is there a limit to how much Topo Chico I can take from the fridge? What’s the AMP URL attribution problem? Where’s the bathroom?

I got the answers to all of those questions (and more!) and eventually landed a full-time job at Cloudflare. Here’s the story of my internship and working on the Workers Developer Experience team at Cloudflare.

Getting Started with Workers in 2018

After doing a lot of reading, and asking a lot more questions, it was time to start coding. I set up a Cloudflare account with a Workers subscription, and was greeted with a page that looked something like this:

Trailblazing a Development Environment for Workers

I was able to change the code in the text area on the left, click “Update”, and the changes would be reflected on the right — fairly self-explanatory. There was also a testing tab which allowed me to handcraft HTTP requests with different methods and custom headers. So far so good.

As my project evolved, it became clear that I needed to leave the Workers editor behind. Anything more than a one-off script tends to require JavaScript modules and multiple files. I spent some time setting up a local development environment for myself with npm and webpack (see, purgatory: a place or state of temporary suffering. merriam-webster.com).

After I finally got everything working, my iteration cycle looked a bit like this:

  1. Make a change to my code
  2. Run npm run build (which ran webpack and bundled my code in a single script)
  3. Open ./dist/worker.min.js (the output from my build step)
  4. Copy the entire contents of the built Worker to my clipboard
  5. Switch to the Cloudflare Workers Dashboard
  6. Paste my script into the Workers editor
  7. Click update
  8. Investigate the behavior of my recently modified script
  9. Rinse and repeat

There were two main things here that were decidedly not a fantastic developer experience:

  1. Inspecting the value of a variable by adding a console.log statement would take me ~2-3 minutes and involved lots of manual steps to perform a full rebuild.
  2. I was unable to use familiar HTTP clients such as cURL and Postman without deploying to production. This was because the Workers Preview UI was an iframe nested in the dashboard.

Luckily for me, Cloudflare Workers deploy globally incredibly quickly, so I could push the latest iteration of my Worker, wait just a few seconds for it to go live, and cURL away.

A Better Workers Developer Experience in 2019

Shortly after we shipped AMP Real URL, Cloudflare released Wrangler, the official CLI tool for developing Workers, and I was hired full time to work on it. Wrangler came with a feature that automated steps 2-7 of my workflow by running the command wrangler preview, which was a significant improvement. Running the command would build my Worker and open the browser automatically for me so I could see log messages and test out HTTP requests. That summer, our intern Matt Alonso created wrangler preview --watch. This command automatically updates the Workers preview window when changes are made to your code. You can read more about that here. This was, yet again, another improvement over my old friend Build and Open and Copy and Switch Windows and Paste Forever and Ever, Amen. But there was still no way that I could test my Worker with any HTTP client I wanted without deploying to production — I was still locked in to using the nested iframe.

A few months ago we decided it was time to do something about it. To the whiteboard!

Enter wrangler dev

Most web developers are familiar with developing their applications on localhost, and since Wrangler is written in Rust, it means we could start up a server on localhost that would handle requests to a Worker. The idea was to somehow start a server on localhost and then transform incoming requests and send them off to a preview session running on a Cloudflare server.

Proof of Concept

What we came up with ended up looking a little something like this — when a developer runs wrangler dev, do the following:

Trailblazing a Development Environment for Workers

  1. Build the Worker
  2. Upload the Worker via the Cloudflare API as a previewable Worker
  3. The Cloudflare API takes the uploaded script and creates a preview session, and returns an access token
  4. Start listening for incoming HTTP requests at localhost:8787

Top secret fact: 8787 spells out Rust on a phone numpad. Happy Easter!

  5. All incoming requests to localhost:8787 are modified:

  • All headers are prepended with cf-ew-raw- (for instance, X-Auth-Header would become cf-ew-raw-X-Auth-Header)
  • The URL is changed to https://rawhttp.cloudflareworkers.com/${path}
  • The Host header is changed to rawhttp.cloudflareworkers.com
  • The cf-ew-preview header is added with the access token returned from the API in step 3

  6. After sending this request, the response is modified:

  • All headers not prefixed with cf-ew-raw- are discarded and headers with the prefix have it removed (for instance, cf-ew-raw-X-Auth-Success would become X-Auth-Success)
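In Service Workers API terms, the round trip looks roughly like this sketch (Wrangler does the equivalent in Rust with hyper; the function names here are just for illustration):

// Rewrite a localhost request into a preview-session request.
function toPreviewRequest(original, accessToken) {
  const headers = new Headers()
  for (const [name, value] of original.headers) {
    headers.set('cf-ew-raw-' + name, value) // prefix every original header
  }
  headers.set('cf-ew-preview', accessToken) // token returned by the preview API
  const path = new URL(original.url).pathname
  // Targeting rawhttp.cloudflareworkers.com also takes care of the Host header.
  return new Request('https://rawhttp.cloudflareworkers.com' + path, {
    method: original.method,
    headers: headers,
    body: original.body,
  })
}

// Strip the prefix from response headers; drop everything unprefixed.
function fromPreviewResponse(preview) {
  const headers = new Headers()
  for (const [name, value] of preview.headers) {
    if (name.startsWith('cf-ew-raw-')) {
      headers.set(name.slice('cf-ew-raw-'.length), value)
    }
  }
  return new Response(preview.body, { status: preview.status, headers: headers })
}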

The hard part here was already done — the Workers Core team had already implemented the API to support the Preview UI. We just needed to gently nudge Wrangler and the API to be the best of friends. After some investigation into Rust’s HTTP ecosystem, we settled on using the HTTP library hyper, which I highly recommend if you’re in need of a low level HTTP library — it’s fast, correct, and the ergonomics are constantly improving. After a bit of work, we got a prototype working and carved Wrangler ❤️ Cloudflare API into the old oak tree down by Lady Bird Lake.

Usage

Let’s say I have a Workers script that looks like this:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  let message = "Hello, World!"
  return new Response(message)
}

If I created a Wrangler project with this code and ran wrangler dev, this is what it looked like:

$ wrangler dev
👂  Listening on http://127.0.0.1:8787

In another terminal session, I could run the following:

$ curl localhost:8787
Hello, World!

It worked! Hooray!

Just the Right Amount of Scope Creep

At this point, our initial goal was complete: any HTTP client could test out a Worker before it was deployed. However, wrangler dev was still missing crucial functionality. When running wrangler preview, it’s possible to view console.log output in the browser editor. This is incredibly useful for debugging Workers applications, and something with a name like wrangler dev should include a way to view those logs as well. “This will be easy,” I said, not yet knowing what I was signing up for. Buckle up!

console.log, V8, and the Chrome Devtools Protocol, Oh My!

My first goal was to get a Hello, World! message streamed to my terminal session so that developers can debug their applications using wrangler dev. Let’s take the script from earlier and add a console.log statement to it:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  let message = "Hello, World!"
  console.log(message) // this line is new
  return new Response(message)
}

If you’d like to follow along, you can paste that script into the editor at cloudflareworkers.com using Google Chrome.

This is what the Preview editor looks like when that script is run:

Trailblazing a Development Environment for Workers

You can see that Hello, World! has been printed to the console. This may not be the most useful example, but in more complex applications logging different variables is helpful for debugging. If you’re following along, try changing console.log(message) to something more interesting, like console.log(request.url).

The console may look familiar to you if you’re a web developer because it’s the same interface you see when you open the Developer Tools in Google Chrome. Since Cloudflare Workers is built on top of V8 (more info about that here and here), the Workers runtime is able to create a WebSocket that speaks the Chrome Devtools Protocol. This protocol allows the client (your browser, Wrangler, or anything else that supports WebSockets) to send and receive messages that contain information about the script that is running.

In order to see the messages that are being sent back and forth between our browser and the Workers runtime:

  1. Open Chrome Devtools
  2. Click the Network tab at the top of the inspector
  3. Click the filter icon underneath the Network tab (it looks like a funnel and is nested between the cancel icon and the search icon)
  4. Click WS to filter out all requests but WebSocket connections

Your inspector should look like this:

Trailblazing a Development Environment for Workers

Then, reload the page, and select the /inspect item to view its messages. It should look like this:

Trailblazing a Development Environment for Workers

Hey look at that! We can see messages that our browser sent to the Workers runtime to enable different portions of the developer tools for this Worker, and we can see that the runtime sent back our Hello, World! Pretty cool!

On the Wrangler side of things, all we had to do to get started was initialize a WebSocket connection for the current Worker, and send a message with the method Runtime.enable so the Workers runtime would enable the Runtime domain and start sending console.log messages from our script.
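
To make this concrete, here’s a rough sketch in Node.js of that first step. The inspector URL is a placeholder, and the message shapes come from the Chrome Devtools Protocol’s Runtime domain:

const WebSocket = require('ws') // third-party WebSocket client for Node

// Placeholder URL: Wrangler obtains the real one when it starts a preview session
const ws = new WebSocket('wss://<preview-inspector-endpoint>/inspect')

ws.on('open', () => {
  // Enabling the Runtime domain tells the runtime to start streaming
  // console API calls and exceptions over this connection
  ws.send(JSON.stringify({ id: 1, method: 'Runtime.enable' }))
})

ws.on('message', raw => {
  const msg = JSON.parse(raw)
  // Every console.log in the Worker arrives as a Runtime.consoleAPICalled event
  if (msg.method === 'Runtime.consoleAPICalled') {
    const text = msg.params.args.map(arg => arg.value).join(' ')
    console.log(`[worker log] ${text}`)
  }
})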

After those initial steps, it quickly became clear that a lot more work was needed to get to a useful developer tool. There’s a lot that goes into the Chrome Devtools Inspector, and most of the libraries for interacting with it are written in languages other than Rust (which we use for Wrangler). We spent a lot of time switching WebSocket libraries due to incompatibilities across operating systems (turns out TLS is hard) and implementing the parts of the Chrome Devtools Protocol that we needed in Rust. There’s a lot of work that still needs to be done to make wrangler dev a top-notch developer tool, but we wanted to get it into the hands of developers as quickly as possible.

Try it Out!

wrangler dev is currently in alpha, and we’d love it if you could try it out! You should first check out the Quick Start and then move on to wrangler dev. If you run into issues or have any feedback, please let us know!

Signing Off

I’ve come a long way from where I started in 2018 and so has the Workers ecosystem. It’s been awesome helping to improve the developer experience of Workers for future interns, internal Cloudflare teams, and of course our customers. I can’t wait to see what we do next. I have some ideas for what’s next with Wrangler, so stay posted!

P.S. Wrangler is also open source, and we are more than happy to field bug reports, feedback, and community PRs. Check out our Contribution Guide if you want to help out!

Migrating to React land: Gatsby

Post Syndicated from Victoria Bernard original https://blog.cloudflare.com/migrating-to-react-land-gatsby/

Migrating to React land: Gatsby

Migrating to React land: Gatsby

I am an engineer who loves docs. Well, OK, I don’t love all docs, but I believe docs are a crucial, yet often neglected, element of a great developer experience. I work on the developer experience team for Cloudflare Workers, focusing on several components of Workers, particularly on the docs that we recently migrated to Gatsby.


Through porting our documentation site to Gatsby I learned a lot. In this post, I share some of the learnings that could’ve saved my former self from several headaches. This will hopefully help others considering a move to Gatsby or another static site generator.

Why Gatsby?

Prior to our migration to Gatsby, we used Hugo for our developer documentation. There is a lot to like about Hugo – fast build times, fast load times – which made a simple static site a great use case for it. Things started to turn sour when we began making our docs more interactive and expanding the content being generated.

Going from writing JSX with TypeScript back to string-based templating languages is difficult. Trying to perform complicated tasks, like generating a sidebar, cost me – a developer who knew nothing about Liquid-style code or Go templating (though I have Golang experience) – several tears, not to implement, but just to understand what was happening.

Here is the code to template an item in the sidebar in Hugo:

<!-- templates -->
{{ define "section-tree-nav" }}
{{ $currentNode := .currentnode }}
{{ with .sect }}
 {{ if not .Params.Hidden }}
  {{ if .IsSection }}
    {{safeHTML .Params.head}}
    <li data-nav-id="{{.URL}}" class="dd-item
        {{ if .IsAncestor $currentNode }}parent{{ end }}
        {{ if eq .UniqueID $currentNode.UniqueID}}active{{ end }}
        {{ if .Params.alwaysopen}}parent{{ end }}
        {{ if .Params.alwaysopen}}always-open{{ end }}
        ">
      <a href="{{ .RelPermalink}}">
        <span>{{safeHTML .Params.Pre}}{{.Title}}{{safeHTML .Params.Post}}</span>
 
        {{ if .Params.new }}
          <span class="new-badge">NEW</span>
        {{ end }}
 
        {{ $numberOfPages := (add (len .Pages) (len .Sections)) }}
        {{ if ne $numberOfPages 0 }}
 
          {{ if or (.IsAncestor $currentNode) (.Params.alwaysopen)  }}
            <i class="triangle-up"></i>
          {{ else }}
            <i class="triangle-down"></i>
          {{ end }}
 
        {{ end }}
      </a>
      {{ if ne $numberOfPages 0 }}
        <ul>
          {{ .Scratch.Set "pages" .Pages }}
          {{ if .Sections}}
          {{ .Scratch.Set "pages" (.Pages | union .Sections) }}
          {{ end }}
          {{ $pages := (.Scratch.Get "pages") }}
 
        {{ if eq .Site.Params.ordersectionsby "title" }}
          {{ range $pages.ByTitle }}
            {{ if and .Params.hidden (not $.showhidden) }}
            {{ else }}
            {{ template "section-tree-nav" dict "sect" . "currentnode" $currentNode }}
            {{ end }}
          {{ end }}
        {{ else }}
          {{ range $pages.ByWeight }}
            {{ if and .Params.hidden (not $.showhidden) }}
            {{ else }}
            {{ template "section-tree-nav" dict "sect" . "currentnode" $currentNode }}
            {{ end }}
          {{ end }}
        {{ end }}
        </ul>
      {{ end }}
    </li>
  {{ else }}
    {{ if not .Params.Hidden }}
      <li data-nav-id="{{.URL}}" class="dd-item
     {{ if eq .UniqueID $currentNode.UniqueID}}active{{ end }}
      ">
        <a href="{{.RelPermalink}}">
        <span>{{safeHTML .Params.Pre}}{{.Title}}{{safeHTML .Params.Post}}</span>
        {{ if .Params.new }}
          <span class="new-badge">NEW</span>
        {{ end }}
 
        </a></li>
     {{ end }}
  {{ end }}
 {{ end }}
{{ end }}
{{ end }}

Whoa. I may be exceptionally oblivious, but I had to squint at the snippet above for an hour before I realized this was the code for a sidebar item (the li element was the eventual giveaway, but it took some parsing to discover where the logic actually started).

(Disclaimer: I am in no way a pro at Hugo, and there are always several ways to code a solution. I am not claiming this was the only way to write the template, nor am I chastising the author of the code. I am just showing the differences between the pieces of code I came across.)

Now, here is what the TSX (I will get into the JS later in the article) for the Gatsby project using the exact same styling would look like:

 <li data-nav-id={pathToServe} className={'dd-item ' + ddClass}>
   <Link className="" to={pathToServe} title="Docs Home" activeClassName="active">
     {title || 'No title'}
     {numberOfPages ? <Triangle isAncestor={isAncestor} alwaysopen={showChildren} /> : ''}
     {showNew ? <span className="new-badge">NEW</span> : ''}
   </Link>
   {showChildren ? (
     <ul>
       {' '}
       {myChildren.map((child: mdx) => {
         return (
           <SidebarLi
             frontmatter={child.frontmatter}
             fields={child.fields}
             depth={++depth}
             key={child.frontmatter.title}
           />
         )
       })}
     </ul>
   ) : (
     ''
   )}
 </li>

This code is clean and compact because Gatsby is a static content generation tool based on React. It’s loved for a myriad of reasons, but my honest main reason to migrate to it was to make the Hugo code above much less ugly.

For our purposes, less ugly was important because we had dreams of redesigning our docs to be interactive with support for multiple coding languages and other features.

For example, the template gallery would be a place to go to for how-to recipes and examples. The templates themselves would live in a template registry service and turn into static pages via an API.

We wanted the docs to not be constrained by Go templating. The Hugo docs admit their templates aren’t the best for complicated logic:

Go Templates provide an extremely simple template language that adheres to the belief that only the most basic of logic belongs in the template or view layer.

Gatsby and React enable the more complex logic we were looking for. After our team built workers.cloudflare.com and Built with Workers on Gatsby, I figured this was my shot to really give Gatsby a try on our Workers developer docs.

Decision to Migrate over Starting from Scratch

I’m normally not a fan of fixing things that aren’t broken. I didn’t like working with Hugo, I did love working in React, and I had every reason to switch; still, I was timid about being the one in charge of switching from Hugo. I was scared. I hated looking at the Liquid-like code of Go templates. I didn’t want to have to port all the existing templates to React without truly understanding what I might be missing.

There comes a point with tech debt though where you have to tackle the tech debt you are most scared of.

The easiest solution would of course be to throw the Hugo code away. Start from scratch. A clean slate. But this means taking something that was not broken and breaking it. The styling, SEO, tagging, and analytics of the site took small iterations over the course of a few years to get right, and I didn’t want to be the one to break them. Instead of throwing away all the styling and the logic tied in for search, SEO, and more, our plan was to maintain as much of the current design and logic as possible while converting it to React piece by piece, component by component.

There were also existing developer docs at Cloudflare still using Hugo, maintained by other teams (e.g. Access, Argo Tunnel). I wanted any team at Cloudflare to be able to import their existing markdown files with frontmatter into the Gatsby repo and preserve the existing design.

I wanted to migrate instead of teleport to Gatsby.

How-to: Hugo to Gatsby

In this blog post, I go through some but not all of the steps of how I ported to Gatsby from Hugo for our complex doc site. The few examples here help to convey the issues that caused the most pain.

Let’s start with getting the markdown files to turn into HTML pages.

Markdown

One goal was to keep all the existing markdown and frontmatter we had set up in Hugo as similar as possible. The reasoning for this was to not break existing content and also maintain the version history of each doc.

Gatsby is built on top of GraphQL. All the data and most of the content for Gatsby is loaded into GraphQL during startup, usually via a plugin; Gatsby then queries for this data when it actually creates pages. This is quite different from Hugo’s much more abstract model of putting all your content in a folder named content and letting Hugo figure out which template to apply based on the logic in the templates.

MDX is a sophisticated tool that parses markdown into Gatsby so it can later be rendered as HTML (it can actually do much more than that, but I won’t get into it here). I started with Gatsby’s MDX plugin to create nodes from my markdown files. Here is the code to set up the plugin to get all the markdown files (files ending in .md and .mdx) I had in the src/content folder into GraphQL:

gatsby-config.js

const path = require('path')
 
module.exports = {
 plugins: [
   {
     resolve: `gatsby-source-filesystem`,
     options: {
       name: `mdx-pages`,
       path: `${__dirname}/src/content`,
       ignore: [`**/CONTRIBUTING*`, '/styles/**'],
     },
   },
   {
     resolve: `gatsby-plugin-mdx`,
     options: {
       extensions: [`.mdx`, `.md`],
     },
   }, 
]}

Now that Gatsby knows about these files as nodes, we can create pages for them. In gatsby-node.js, I tell Gatsby to grab these MDX pages and use a template markdownTemplate.tsx to create pages for them:

const path = require(`path`)
const { createFilePath } = require(`gatsby-source-filesystem`)
exports.createPages = async ({ actions, graphql, reporter }) => {
 const { createPage } = actions
 
 const markdownTemplate = path.resolve(`src/templates/markdownTemplate.tsx`)
 
 const result = await graphql(`
   {
     allMdx(limit: 1000) {
       edges {
         node {
           fields {
             pathToServe
           }
           frontmatter {
             alwaysopen
             weight
           }
           fileAbsolutePath
         }
       }
     }
   }
 `)
 // Handle errors
 if (result.errors) {
   reporter.panicOnBuild(`Error while running GraphQL query.`)
   return
 }
 result.data.allMdx.edges.forEach(({ node }) => {
   return createPage({
     path: node.fields.pathToServe,
     component: markdownTemplate,
     context: {
       parent: node.fields.parent,
       weight: node.frontmatter.weight,
     }, // additional data can be passed via context, can use as variable on query
   })
 })
}
exports.onCreateNode = ({ node, getNode, actions }) => {
 const { createNodeField } = actions
 // Ensures we are processing only markdown files
 if (node.internal.type === 'Mdx') {
    // Use `createFilePath` to turn markdown files in our `content` directory into paths served under `/workers/`
   const originalPath = node.fileAbsolutePath.replace(
     node.fileAbsolutePath.match(/.*content/)[0],
     ''
   )
   let pathToServe = createFilePath({
     node,
     getNode,
     basePath: 'content/',
   })
   let parentDir = path.dirname(pathToServe)
   if (pathToServe.includes('index')) {
     pathToServe = parentDir
     parentDir = path.dirname(parentDir) // "/" dirname will = "/"
   }
    pathToServe = pathToServe.replace(/\/+$/, '/') // collapse trailing slashes so the path ends with exactly one
    // Create new queryable fields ('pathToServe', 'parent', 'filePath')
    // on allMdx edge nodes
   createNodeField({
     node,
     name: 'pathToServe',
     value: `/workers${pathToServe}`,
   })
   createNodeField({
     node,
     name: 'parent',
     value: parentDir,
   })
   createNodeField({
     node,
     name: 'filePath',
     value: originalPath,
   })
 }
}

Now every time Gatsby runs, it starts running through each node on onCreateNode. If the node is MDX, it passes the node’s content (the markdown, fileAbsolutePath, etc.) and all the node fields (filePath, parent and pathToServe) to the markdownTemplate.tsx component so that the component can render the appropriate information for that markdown file.

The barebone component for a page that renders a React component from the MDX node looks like this:

markdownTemplate.tsx

import React from "react"
import { graphql } from "gatsby"
import { MDXRenderer } from "gatsby-plugin-mdx"
 
export default function PageTemplate({ data: { mdx } }) {
 return (
   <div>
     <h1>{mdx.frontmatter.title}</h1>
     <MDXRenderer>{mdx.body}</MDXRenderer>
   </div>
 )
}
 
export const pageQuery = graphql`
 query BlogPostQuery($id: String) {
   mdx(id: { eq: $id }) {
     id
     body
     frontmatter {
       title
     }
   }
 }
`

A Complex Component: Sidebar

Now let’s get into where I wasted the most time, but learned hard lessons upfront: turning the Hugo template into a React component. At the beginning of this article, I showed that scary sidebar.

To set up the li element, the Hugo logic looks like this:

{{ define "section-tree-nav" }}
{{ $currentNode := .currentnode }}
{{ with .sect }}
 {{ if not .Params.Hidden }}
  {{ if .IsSection }}
    {{safeHTML .Params.head}}
    <li data-nav-id="{{.URL}}" class="dd-item
        {{ if .IsAncestor $currentNode }}parent{{ end }}
        {{ if eq .UniqueID $currentNode.UniqueID}}active{{ end }}
        {{ if .Params.alwaysopen}}parent{{ end }}
        {{ if .Params.alwaysopen}}always-open{{ end }}
        ">

I see that the code is defining some section-tree-nav component-like thing and taking in some currentNode. To be honest, I still don’t know exactly what the variables .sect, IsSection, Params.head, Params.Hidden mean. Although I can take a wild guess, they’re not that important for understanding what the logic is doing. The logic is setting the classes on the li element which is all I really care about: parent, always-open and active.

When focusing on those three classes, we can port them to React in a much more readable way by defining a variable string ddClass:

 let ddClass = ''
 let isAncestor = numberOfPages > 0
 if (isAncestor) {
   ddClass += ' parent'
 }
 if (frontmatter.alwaysopen) {
   ddClass += ' parent alwaysOpen'
 }
 return (
   <Location>
     {({ location }) => {
       const currentPathActive = location.pathname === pathToServe
       if (currentPathActive) {
         ddClass += ' active'
       }
       return (
         <li data-nav-id={pathToServe} className={'dd-item ' + ddClass}>

There are actually a few nice things about the Hugo code, I admit. Using the Location component in React was probably less intuitive than Hugo’s ability to access currentNode to get the active page. Also, isAncestor is predefined in Hugo as “whether the current page is an ancestor of the given page.” For me, though, having to track down the definitions of the predefined variables was frustrating, and I appreciate the local explicitness of the React version, though I admit I’m a bit jaded.

Children

The most complex part of the sidebar is getting the children. This is where I really started to appreciate GraphQL.

Here’s getting the children for the sidebar in Hugo:

    {{ $numberOfPages := (add (len .Pages) (len .Sections)) }}
        {{ if ne $numberOfPages 0 }}
 
          {{ if or (.IsAncestor $currentNode) (.Params.alwaysopen)  }}
            <i class="triangle-up"></i>
          {{ else }}
            <i class="triangle-down"></i>
          {{ end }}
 
        {{ end }}
      </a>
      {{ if ne $numberOfPages 0 }}
        <ul>
          {{ .Scratch.Set "pages" .Pages }}
          {{ if .Sections}}
          {{ .Scratch.Set "pages" (.Pages | union .Sections) }}
          {{ end }}
          {{ $pages := (.Scratch.Get "pages") }}
 
        {{ if eq .Site.Params.ordersectionsby "title" }}
          {{ range $pages.ByTitle }}
            {{ if and .Params.hidden (not $.showhidden) }}
            {{ else }}
            {{ template "section-tree-nav" dict "sect" . "currentnode" $currentNode }}
            {{ end }}
          {{ end }}
        {{ else }}
          {{ range $pages.ByWeight }}
            {{ if and .Params.hidden (not $.showhidden) }}
            {{ else }}
            {{ template "section-tree-nav" dict "sect" . "currentnode" $currentNode }}
            {{ end }}
          {{ end }}
        {{ end }}
        </ul>
      {{ end }}
    </li>
  {{ else }}
    {{ if not .Params.Hidden }}
      <li data-nav-id="{{.URL}}" class="dd-item
     {{ if eq .UniqueID $currentNode.UniqueID}}active{{ end }}
      ">
        <a href="{{.RelPermalink}}">
        <span>{{safeHTML .Params.Pre}}{{.Title}}{{safeHTML .Params.Post}}</span>
        {{ if .Params.new }}
          <span class="new-badge">NEW</span>
        {{ end }}
 
        </a></li>
     {{ end }}
  {{ end }}
 {{ end }}
{{ end }}
{{ end }}

This is just the first layer of children. No grandbabies, sorry. And I won’t even get into everything that’s going on there. When I started porting this over, I realized a lot of that logic was not even being used.

In React, we grab all the markdown pages and see which have parents that match the current page:

 const topLevelMarkdown: markdownRemarkEdge[] = useStaticQuery(
   graphql`
     {
       allMdx(limit: 1000) {
         edges {
           node {
             frontmatter {
               title
               alwaysopen
               hidden
               showNew
               weight
             }
             fileAbsolutePath
             fields {
               pathToServe
               parent
               filePath
             }
           }
         }
       }
     }
   `
 ).allMdx.edges
 const myChildren: mdx[] = topLevelMarkdown
   .filter(
     edge =>
       fields.pathToServe === '/workers' + edge.node.fields.parent &&
       fields.pathToServe !== edge.node.fields.pathToServe
   )
   .map(child => child.node)
   .filter(child => !child.frontmatter.hidden)
   .sort(sortByWeight)
 const numberOfPages = myChildren.length

And then we render the children, so the full JSX becomes:

<li data-nav-id={pathToServe} className={'dd-item ' + ddClass}>
   <Link
     to={pathToServe}
     title="Docs Home"
     activeClassName="active"
   >
     {title || 'No title'}
     {numberOfPages ? (
       <Triangle isAncestor={isAncestor} alwaysopen={showChildren} />
     ) : (
       ''
     )}
     {showNew ? <span className="new-badge">NEW</span> : ''}
   </Link>
   {showChildren ? (
     <ul>
       {' '}
       {myChildren.map((child: mdx) => {
         return (
           <SidebarLi
             frontmatter={child.frontmatter}
             fields={child.fields}
             depth={++depth}
             key={child.frontmatter.title}
           />
         )
       })}
     </ul>
   ) : (
     ''
   )}
 </li>

OK, now that we have a component, and we have Gatsby creating the pages off the markdown, I can go back to my PageTemplate component and render the sidebar:

import React from 'react'
import { MDXRenderer } from 'gatsby-plugin-mdx'
import Sidebar from './Sidebar'

export default function PageTemplate({ data: { mdx } }) {
 return (
   <div>
     <Sidebar />
     <h1>{mdx.frontmatter.title}</h1>
     <MDXRenderer>{mdx.body}</MDXRenderer>
   </div>
 )
}

I don’t have to pass any props to Sidebar because the GraphQL static query in Sidebar.tsx gets all the data about all the pages that I need. I don’t even maintain state because Location is used to determine which path is active. Gatsby generates pages using the above component for each page that’s a markdown MDX node.

Wrapping up

This was just the beginning of the full migration to Gatsby. I repeated the process above for turning templates, partials, and other HTML component-like parts in Hugo into React, which was actually pretty fun, though turning vanilla JS that once manipulated the DOM into React would probably be a nightmare if I wasn’t somewhat comfortable working in React.

Main lessons learned:

  • Being careful about breaking things and being scared to break things are two very different things. Being careful is good; being scared is bad. If I were to complete this migration again, I would’ve used the Hugo templates as a reference but not as a source of truth. Staging environments are what testing is for. Don’t sacrifice writing things the right way to comply with the old way.
  • When doing a migration like this on a static site, get just a few pages working before moving the content over to avoid intermediate PRs from breaking. It seems obvious but, with the large amounts of content we had, a lot of things broke when porting over content. Get everything polished with each type of page before moving all your content over.
  • When doing a migration like this, it’s OK to compromise some features of the old design until you determine whether to add them back in; just make sure to test this with real users first. For example, I made the mistake of assuming others wouldn’t mind being without anchor tags. (Note: Hugo creates anchor tags for headers automatically, whereas in Gatsby you have to use MDX to customize markdown components.) Test this on a single, popular page with real users first to see if it matters before giving it up.
  • Even for those with React background, the ramp up with GraphQL and setting up Gatsby isn’t as simple as it seems at first. But once you’re set up it’s pretty dang nice.

Overall the process of moving to Gatsby was well worth the effort. As we implement a redesign in React, it’s much easier to apply the designs in this cleaner code base. And though Hugo was already very performant with a nice SEO score, Gatsby’s flexibility lets us push performance and SEO even further.

Lastly, working with the Gatsby team was awesome and they even give free T-shirts for your first PR!

An Update on CDNJS

Post Syndicated from Zack Bloom original https://blog.cloudflare.com/an-update-on-cdnjs/

An Update on CDNJS

When you loaded this blog, a file was delivered to your browser called jquery-3.2.1.min.js. jQuery is a library which makes it easier to build websites, and was at one point included on as many as 74.1% of all websites. A full eighteen million sites include jQuery and other libraries using one of the most popular tools on Earth: CDNJS. About a month ago, Cloudflare began to take a more active role in the operation of CDNJS. This post is here to tell you more about CDNJS’ history and explain why we are helping to manage CDNJS.

What CDNJS Does

Virtually every site is composed of not just the code written by its developers, but also dozens or hundreds of libraries. These libraries make it possible for websites to extend what a web browser can do on its own. For example, libraries can allow a site to include powerful data visualizations, respond to user input, or even get more performant.

These libraries created wondrous and magical new capabilities for web browsers, but they can also cause the size of a site to explode. Particularly a decade ago, connections were not always fast enough to permit the use of many libraries while maintaining performance. But if so many websites are all including the same libraries, why was it necessary for each of them to load their own copy?

If we all load jQuery from the same place the browser can do a much better job of not actually needing to download it for every site. When the user visits the first jQuery-powered site it will have to be downloaded, but it will already be cached on the user’s computer for any subsequent jQuery-powered site they might visit.

An Update on CDNJS

The first visit might take time to load:

An Update on CDNJS

But any future visit to any website pointing to this common URL would already be cached:

An Update on CDNJS

<!-- Loaded only on my site, will need to be downloaded by every user -->
<script src="./jquery.js"></script>

<!-- Loaded from a common location across many sites -->
<script src="https://cdnjs.cloudflare.com/jquery.js"></script>

Beyond the performance advantage, including files this way also made it very easy for users to experiment and create. When using a web browser as a creation tool, users often didn’t have elaborate build systems (this was also before npm), so being able to include a simple script tag was a boon. It’s worth noting that it’s not clear a massive performance advantage was ever actually provided by this scheme. It is becoming even less of a performance advantage now that browser vendors are beginning to use separate caches for each website you visit, but with millions of sites using CDNJS there’s no doubt it is a critical part of the web.

A CDN for all of us

My first Pull Request into the CDNJS project was in 2013. Back then if you created a JavaScript project it wasn’t possible to have it included in the jQuery CDN, or the ones provided by large companies like Google and Microsoft. They were only for big, important, projects. Of course, however, even the biggest project starts small. The community needed a CDN which would agree to host nearly all JavaScript projects, even the ones which weren’t world-changing (yet). In 2011, that project was launched by Ryan Kirkman and Thomas Davis as CDNJS.

The project was quickly wildly successful, far beyond their expectations. Their CDN bill began to skyrocket (it would now be over a million dollars a year on AWS). Facing the prospect of having to shut the service down, the CDNJS team approached Cloudflare to see if we could help. We agreed to support their efforts and created cdnjs.cloudflare.com, which serves the CDNJS project free of charge.

CDNJS has been astonishingly successful. The project is currently installed on over eighteen million websites (10% of the Internet!), offers files totaling over 1.5 billion lines of code, and serves over 173 billion requests a month. CDNJS only gets more popular as sites get larger, with 34% of the top 10k websites using the service. Each month we serve almost three petabytes of JavaScript, CSS, and other resources which power the web via cdnjs.cloudflare.com.

An Update on CDNJS
Spikes can happen when a very large or popular site installs CDNJS, or when a misbehaving web crawler discovers a CDNJS link.

The future value of CDNJS is now in doubt, as web browsers are beginning to use a separate cache for every website you visit. It is currently used on such a wide swath of the web, however, it is unlikely it will be disappearing any time soon.

How CDNJS Works

CDNJS starts with a Github repo. That project contains every file served by CDNJS, at every version which it has ever offered. That’s 182 GB without the commit history, over five million files, and over 1.5 billion lines of code.

Given that it stores and delivers versioned code files, in many ways it was the Internet’s first JavaScript package manager. Unlike other package managers, and even other CDNs, everything CDNJS serves is publicly versioned. All 67,724 commits! This means you as a user can verify that you are being served files which haven’t been tampered with.
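
For example, here’s one way to check a served file yourself (an illustrative sketch, not an official CDNJS tool): hash what the CDN returns and compare it with a digest computed from the repo’s copy.

// Fetch a file from CDNJS and compute its SHA-256 digest.
// Works in modern browsers and in Node 18+ run as an ES module,
// where fetch and crypto.subtle are available globally.
const res = await fetch('https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js')
const bytes = await res.arrayBuffer()
const digest = await crypto.subtle.digest('SHA-256', bytes)
const hex = [...new Uint8Array(digest)]
  .map(b => b.toString(16).padStart(2, '0'))
  .join('')
console.log(hex) // compare against the digest of the same file in the CDNJS repo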

To make changes to CDNJS a commit has to be made. For new projects being added to CDNJS, or when projects change significantly, these commits are made by humans, and get reviewed by other humans. When projects just release new versions there is a bot made by Peter and maintained by Sven which sucks up changes from npm and automatically creates commits.

Within Cloudflare’s infrastructure there is a set of machines which are responsible for pulling the latest version of the repo periodically. Those machines then become the origin for cdnjs.cloudflare.com, with Cloudflare’s Global Load Balancer automatically handling failures. Cloudflare’s cache automatically stores copies of many of the projects making it possible for us to deliver them quickly from all 195 of our data centers.

An Update on CDNJS

The Internet on a Shoestring Budget

The CDNJS project has always been administered independently of Cloudflare. In addition to the founders, the project has been maintained by exceptionally hard-working caretakers like Peter and Matt Cowley. Maintaining a single repo of nearly every frontend project on Earth is no small task, and it has required a substantial amount of both manual work and bot development.

Unfortunately approximately thirty days ago one of those bots stopped working, preventing updated projects from appearing in CDNJS. The bot’s open-source maintainer was not able to invest the time necessary to keep the bot running. After several weeks we were asked by the community and the CDNJS founders to take over maintenance of the CDNJS repo itself. This means the Cloudflare engineering team is taking responsibility for keeping the contents of github.com/cdnjs/cdnjs up to date, in addition to ensuring it is correctly served on cdnjs.cloudflare.com.

We agreed to do this because we were, frankly, scared. Like so many open-source projects, CDNJS was a critical part of our world, but wasn’t getting the attention it needed to survive. The Internet relies on CDNJS as much as on any other single project; losing it or allowing it to be commandeered would be catastrophic to millions of websites and their visitors. If it began to fail, some sites would adapt and update, others would be broken forever.

CDNJS has always been, and remains, a project for and by the community. We are invested in making all decisions in a transparent and inclusive manner. If you are interested in contributing to CDNJS or in the topics we’re currently discussing please visit the CDNJS Github Issues page.

An Update on CDNJS

A Plan for the Future

One example of an area where we could use your help is in charting a path towards a CDNJS which requires less manual moderation. Nothing can replace the intelligence and creativity of a human (yet), but for a task like managing what resources go into a CDN, human review is error prone and time consuming. At present a human has to review every new project to be included, and often has to take additional steps to include new versions of a project.

As a part of our analysis of the project we examined a snapshot of the still-open PRs made against CDNJS for several months:

An Update on CDNJS

The vast majority of these PRs were changes which ultimately passed the automated review but nevertheless couldn’t be merged without manual review.

There is consensus that we should move to a model which does not require human involvement in most cases. We would love your input and collaboration on the best way to solve this. If this is something you are passionate about, please contribute here.

Our plan is to support the CDNJS project in whichever ways it requires for as long as the Internet relies upon it. We invite you to use CDNJS in your next project with the full assurance that it is backed by the same network and team who protect and accelerate over twenty million of your favorite websites across the Internet. We are also planning more posts diving further into the CDNJS data; subscribe to this blog if you would like to be notified upon their release.

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Post Syndicated from Filipp Nisenzoun original https://blog.cloudflare.com/introducing-the-graphql-analytics-api-exactly-the-data-you-need-all-in-one-place/

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Today we’re excited to announce a powerful and flexible new way to explore your Cloudflare metrics and logs, with an API conforming to the industry-standard GraphQL specification. With our new GraphQL Analytics API, all of your performance, security, and reliability data is available from one endpoint, and you can select exactly what you need, whether it’s one metric for one domain or multiple metrics aggregated for all of your domains. You can ask questions like “How many cached bytes have been returned for these three domains?” Or, “How many requests have all the domains under my account received?” Or even, “What effect did changing my firewall rule an hour ago have on the responses my users were seeing?”

The GraphQL standard also has strong community resources, from extensive documentation to front-end clients, making it easy to start creating simple queries and progress to building your own sophisticated analytics dashboards.

From many APIs…

Providing insights has always been a core part of Cloudflare’s offering. After all, by using Cloudflare, you’re relying on us for key parts of your infrastructure, and so we need to make sure you have the data to manage, monitor, and troubleshoot your website, app, or service. Over time, we developed a few key data APIs, including ones providing information regarding your domain’s traffic, DNS queries, and firewall events. This multi-API approach was acceptable while we had only a few products, but we started to run into some challenges as we added more products and analytics. We couldn’t expect users to adopt a new analytics API every time they started using a new product. In fact, some of the customers and partners that were relying on many of our products were already becoming confused by the various APIs.

Following the multi-API approach was also affecting how quickly we could develop new analytics within the Cloudflare dashboard, which is used by more people for data exploration than our APIs. Each time we built a new product, our product engineering teams had to implement a corresponding analytics API, which our user interface engineering team then had to learn to use. This process could take up to several months for each new set of analytics dashboards.

…to one

Our new GraphQL Analytics API solves these problems by providing access to all Cloudflare analytics. It offers a standard, flexible syntax for describing exactly the data you need and provides predictable, matching responses. This approach makes it an ideal tool for:

  1. Data exploration. You can think of it as a way to query your own virtual data warehouse, full of metrics and logs regarding the performance, security, and reliability of your Internet property.
  2. Building amazing dashboards, which allow for flexible filtering, sorting, and drilling down or rolling up. Creating these kinds of dashboards would normally require paying thousands of dollars for a specialized analytics tool. You get them as part of our product and can customize them for yourself using the API.

In a companion post that was also published today, my colleague Nick discusses using the GraphQL Analytics API to build dashboards. So, in this post, I’ll focus on examples of how you can use the API to explore your data. To make the queries, I’ll be using GraphiQL, a popular open-source querying tool that takes advantage of GraphQL’s capabilities.

Introspection: what data is available?

The first thing you may be wondering: if the GraphQL Analytics API offers access to so much data, how do I figure out what exactly is available, and how can I ask for it? GraphQL makes this easy by offering “introspection,” meaning you can query the API itself to see the available data sets, the fields and their types, and the operations you can perform. GraphiQL uses this functionality to provide a “Documentation Explorer,” query auto-completion, and syntax validation. For example, here is how I can see all the data sets available for a zone (domain):

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

If I’m writing a query, and I’m interested in data on firewall events, auto-complete will help me quickly find relevant data sets and fields:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Querying: examples of questions you can ask

Let’s say you’ve made a major product announcement and expect a surge in requests to your blog, your application, and several other zones (domains) under your account. You can check if this surge materializes by asking for the requests aggregated under your account, in the 30 minutes after your announcement post, broken down by the minute:

{
  viewer {
    accounts(filter: {accountTag: $accountTag}) {
      httpRequests1mGroups(
        limit: 30
        filter: {datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T20:30:00Z"}
        orderBy: [datetimeMinute_ASC]
      ) {
        dimensions {
          datetimeMinute
        }
        sum {
          requests
        }
      }
    }
  }
}

Here is the first part of the response, showing requests for your account, by the minute:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place
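
Outside of GraphiQL, the same query can be sent as a plain HTTP POST with any client. Here is a minimal JavaScript sketch, assuming the standard Cloudflare API GraphQL endpoint and an API token with analytics read permissions; the account tag and token are placeholders you would fill in:

// POST a GraphQL Analytics query with fetch
// (Node 18+ run as an ES module, or a browser)
const query = `{
  viewer {
    accounts(filter: {accountTag: "YOUR_ACCOUNT_TAG"}) {
      httpRequests1mGroups(
        limit: 30
        filter: {datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T20:30:00Z"}
        orderBy: [datetimeMinute_ASC]
      ) {
        dimensions { datetimeMinute }
        sum { requests }
      }
    }
  }
}`

const response = await fetch('https://api.cloudflare.com/client/v4/graphql', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.CF_API_TOKEN}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ query }),
})

const { data, errors } = await response.json()
if (errors) console.error(errors)
console.log(JSON.stringify(data, null, 2))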

Now, let’s say you want to compare the traffic coming to your blog versus your marketing site over the last hour. You can do this in one query, asking for the number of requests to each zone:

{
  viewer {
    zones(filter: {zoneTag_in: [$zoneTag1, $zoneTag2]}) {
      httpRequests1hGroups(
        limit: 2
        filter: {datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T21:00:00Z"}
      ) {
        sum {
          requests
        }
      }
    }
  }
}

Here is the response:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Finally, let’s say you’re seeing an increase in error responses. Could this be correlated to an attack? You can look at error codes and firewall events over the last 15 minutes, for example:

{
  viewer {
    zones(filter: {zoneTag: $zoneTag}) {
      httpRequests1mGroups(
        limit: 100
        filter: {datetime_geq: "2019-09-16T21:00:00Z", datetime_lt: "2019-09-16T21:15:00Z"}
      ) {
        sum {
          responseStatusMap {
            edgeResponseStatus
            requests
          }
        }
      }
      firewallEventsAdaptiveGroups(
        limit: 100
        filter: {datetime_geq: "2019-09-16T21:00:00Z", datetime_lt: "2019-09-16T21:15:00Z"}
      ) {
        dimensions {
          action
        }
        count
      }
    }
  }
}

Notice that, in this query, we’re looking at multiple datasets at once, using a common zone identifier to “join” them. Here are the results:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

By examining both data sets in parallel, we can see a correlation: 31 requests were “dropped” or blocked by the Firewall, which is exactly the same as the number of “403” responses. So, the 403 responses were a result of Firewall actions.

Try it today

To learn more about the GraphQL Analytics API and start exploring your Cloudflare data, follow the “Getting started” guide in our developer documentation, which also has details regarding the current data sets and time periods available. We’ll be adding more data sets over time, so take advantage of the introspection feature to see the latest available.

Finally, to make way for the new API, the Zone Analytics API is now deprecated and will be sunset on May 31, 2020. The data that Zone Analytics provides is available from the GraphQL Analytics API. If you’re currently using the API directly, please follow our migration guide to change your API calls. If you get your analytics using the Cloudflare dashboard or our Datadog integration, you don’t need to take any action.

One more thing….

In the API examples above, if you find it helpful to get analytics aggregated for all the domains under your account, we have something else you may like: a brand new Analytics dashboard (in beta) that provides this same information. If your account has many zones, the dashboard is helpful for knowing summary information on metrics such as requests, bandwidth, cache rate, and error rate. Give it a try and let us know what you think using the feedback link above the new dashboard.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner

Post Syndicated from Nadin El-Yabroudi original https://blog.cloudflare.com/introducing-flan-scan/

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner

Today, we’re excited to open source Flan Scan, Cloudflare’s in-house lightweight network vulnerability scanner. Flan Scan is a thin wrapper around Nmap that converts this popular open source tool into a vulnerability scanner with the added benefit of easy deployment.

We created Flan Scan after two unsuccessful attempts at using “industry standard” scanners for our compliance scans. A little over a year ago, we were paying a big vendor for their scanner until we realized it was one of our highest security costs and many of its features were not relevant to our setup. It became clear we were not getting our money’s worth. Soon after, we switched to an open source scanner and took on the task of managing its complicated setup. That made it difficult to deploy to our entire fleet of more than 190 data centers.

We had a deadline at the end of Q3 to complete an internal scan for our compliance requirements but no tool that met our needs. Given our history with existing scanners, we decided to set off on our own and build a scanner that worked for our setup. To design Flan Scan, we worked closely with our auditors to understand the requirements of such a tool. We needed a scanner that could accurately detect the services on our network and then lookup those services in a database of CVEs to find vulnerabilities relevant to our services. Additionally, unlike other scanners we had tried, our tool had to be easy to deploy across our entire network.

We chose Nmap as our base scanner because, unlike other network scanners which sacrifice accuracy for speed, it prioritizes detecting services thereby reducing false positives. We also liked Nmap because of the Nmap Scripting Engine (NSE), which allows scripts to be run against the scan results. We found that the “vulners” script, available on NSE, mapped the detected services to relevant CVEs from a database, which is exactly what we needed.

The next step was to make the scanner easy to deploy while ensuring it outputted actionable and valuable results. We added three features to Flan Scan which helped package up Nmap into a user-friendly scanner that can be deployed across a large network.

  • Easy Deployment and Configuration – To create a lightweight scanner with easy configuration, we chose to run Flan Scan inside a Docker container. As a result, Flan Scan can be built and pushed to a Docker registry and maintains the flexibility to be configured at runtime. Flan Scan also includes sample Kubernetes configuration and deployment files with a few placeholders so you can get up and scanning quickly.
  • Pushing results to the Cloud – Flan Scan adds support for pushing results to a Google Cloud Storage Bucket or an S3 bucket. All you need to do is set a few environment variables and Flan Scan will do the rest. This makes it possible to run many scans across a large network and collect the results in one central location for processing.
  • Actionable Reports – Flan Scan generates actionable reports from Nmap’s output so you can quickly identify vulnerable services on your network, the applicable CVEs, and the IP addresses and ports where these services were found. The reports are useful for engineers following up on the results of the scan as well as auditors looking for evidence of compliance scans.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner
Sample run of Flan Scan from start to finish. 

How has Flan Scan improved Cloudflare’s network security?

By the end of Q3, not only had we completed our compliance scans, we also used Flan Scan to tangibly improve the security of our network. At Cloudflare, we pin the software version of some services in production because it allows us to prioritize upgrades by weighing the operational cost of upgrading against the improvements of the latest version. Flan Scan’s results revealed that our FreeIPA nodes, used to manage Linux users and hosts, were running an outdated version of Apache with several medium severity vulnerabilities. As a result, we prioritized their update. Flan Scan also found a vulnerable instance of PostgreSQL leftover from a performance dashboard that no longer exists.

Flan Scan is part of a larger effort to expand our vulnerability management program. We recently deployed osquery to our entire network to perform host-based vulnerability tracking. By complementing osquery’s findings with Flan Scan’s network scans, we are working towards comprehensive visibility of the services running at our edge and their vulnerabilities. With two vulnerability trackers in place, we decided to build a tool to manage the increasing number of vulnerability sources. Our tool sends alerts on new vulnerabilities, filters out false positives, and tracks remediated vulnerabilities. Flan Scan’s valuable security insights were a major impetus for creating this vulnerability tracking tool.

How does Flan Scan work?

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner

The first step of Flan Scan is running an Nmap scan with service detection. Flan Scan’s default Nmap scan runs the following scans:

  1. ICMP ping scan – Nmap determines which of the IP addresses given are online.
  2. SYN scan – Nmap scans the 1000 most common ports of the IP addresses which responded to the ICMP ping. Nmap marks ports as open, closed, or filtered.
  3. Service detection scan – To detect which services are running on open ports Nmap performs TCP handshake and banner grabbing scans.

Other types of scanning, such as UDP scans and scans of IPv6 addresses, are also possible with Nmap. Flan Scan allows users to run these and any other extended features of Nmap by passing in Nmap flags at runtime.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner
Sample Nmap output

Flan Scan adds the “vulners” script tag in its default Nmap command to include in the output a list of vulnerabilities applicable to the services detected. The vulners script works by making API calls to a service run by vulners.com which returns any known vulnerabilities for the given service.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner
Sample Nmap output with Vulners script

The next step of Flan Scan uses a Python script to convert the structured XML of Nmap’s output into an actionable report. The reports of the previous scanner we used listed each of the IP addresses scanned and presented the vulnerabilities applicable to that location. Since we had multiple IP addresses running the same service, the report would repeat the same list of vulnerabilities under each of these IP addresses. This meant scrolling back and forth on documents hundreds of pages long to obtain a list of all IP addresses with the same vulnerabilities. The results were impossible to digest.

Flan Scan’s results are structured around services. The report enumerates all vulnerable services, with a list beneath each one of the relevant vulnerabilities and all IP addresses running that service. This structure makes the report shorter and more actionable, since the services that need to be remediated can be clearly identified. Flan Scan reports are made using LaTeX because who doesn’t like nicely formatted reports that can be generated with a script? The raw LaTeX file that Flan Scan outputs can be converted to a beautiful PDF by using tools like pdflatex or TeXShop.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner
Sample Flan Scan report

What’s next?

Cloudflare’s mission is to help build a better Internet for everyone, not just Internet giants who can afford to buy expensive tools. We’re open sourcing Flan Scan because we believe it shouldn’t cost tons of money to have strong network security.

You can get started running a vulnerability scan on your network in a few minutes by following the instructions on the README. We welcome contributions and suggestions from the community.

Improve Your App Testing With Amplify Console’s Pull Requests Previews and Cypress Testing

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/improve-your-app-testing-with-amplify-consoles-pull-requests-previews-and-cypress-testing/

Amplify Console allows developers to easily configure a Git-based workflow for continuous deployment and hosting of fullstack serverless web apps. Fullstack serverless apps comprise backend resources such as GraphQL APIs, Data and File Storage, Authentication, or Analytics, integrated with a frontend framework such as React, Gatsby, or Angular. You can read more about the Amplify Console in a previous article I wrote.

Today, we are announcing the ability to create preview URLs and to run end-to-end tests on pull requests before releasing code to production.

Pull Request previews
You can now configure Amplify Console to deploy your application to a unique URL every time a developer submits a pull request to your Git repository. The preview URL is completely different from the one used by the production site. You can see how changes look before merging the pull request into the main branch of your code repository, triggering a new release in the Amplify Console. For fullstack apps with backend environments provisioned via the Amplify CLI, every pull request spins up an ephemeral backend that is deleted when the pull request is closed. You can test changes in complete isolation from the production environment. Amplify Console creates backend infrastructure for pull requests on private Git repositories only, which avoids incurring extra costs from unsolicited pull requests.

To learn how it works, let’s start a web application with a cloud-based authentication backend, and deploy it on Amplify Console. I first create a React application (check here to learn how to install React).

npx create-react-app amplify-console-demo                                                
cd amplify-console-demo

I initialize the Amplify environment (learn how to install the Amplify CLI first). I add a cloud-based authentication backend powered by Amazon Cognito. I accept all the default answers proposed by the Amplify CLI.

npm install aws-amplify aws-amplify-react
amplify init
amplify add auth
amplify push

I then modify src/App.js to add the front-end authentication user interface. The full code is available in the AWS Amplify documentation; a rough sketch of the change is shown below.
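
The following is a minimal sketch, assuming the withAuthenticator higher-order component from aws-amplify-react and the aws-exports file generated by the Amplify CLI (the version in the documentation may differ):

import React from 'react'
import Amplify from 'aws-amplify'
import { withAuthenticator } from 'aws-amplify-react'
import awsconfig from './aws-exports' // generated by the Amplify CLI

Amplify.configure(awsconfig)

function App() {
  // Rendered only after the user has signed in
  return <div>Amplify Console demo</div>
}

// withAuthenticator wraps App with the SignUp / SignIn flow;
// the second argument adds a greeting bar with a sign-out button
export default withAuthenticator(App, true)

Once ready, I start the local development server to test the application locally.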

npm run start

I point my browser to http://localhost:8080 to verify the scaffolding (the below screenshot is taken from my AWS Cloud9 development environment). I click Create account to create a user, verify the SignUp flow, and authenticate to the app.

After signing up, I see the application page.

There are two important details to note. First, I am using a private GitHub repository. Amplify Console only creates backend infrastructure on pull requests for private repositories, to avoid creating unnecessary infrastructure for unsolicited pull requests. Second, the Amplify Console build process looks for dependencies in package-lock.json only. This is why I added the amplify packages with npm and not with yarn.

When I am happy with my app, I push the code to a GitHub repo (let’s assume I already did git remote add origin ...).

git add amplify
git commit -am "initial commit"
git push origin master

The next step consists of configuring Amplify Console to build and deploy my app on every git commit. I log in to the Amplify Console, click Connect App, choose GitHub as the repository, and click Continue (the first time I do this, I need to authenticate on GitHub, using my GitHub username and password).

I select my repository and the branch I want to use as source:

Amplify Console detects the type of project and proposes a build file. I select the name of my environment (dev). The first time I use Amplify Console, I follow the instructions to create a new service role. This role authorizes Amplify Console to access AWS backend services on my behalf.

I click Next. I review the settings and click Save and Deploy. After a few seconds or minutes, my application is ready. I can point my browser to the deployment URL and verify the app is working correctly.

Now, let’s enable previews for pull requests. I click Preview on the left menu, then Enable Previews. To enable previews, Amplify Console requires an app to be installed in my GitHub account. I follow the instructions provided by the console to configure my GitHub account. Once set up, I select a branch and click Manage to enable or disable pull request previews. (At any time, I can uninstall the Amplify app from my GitHub account by visiting the Applications section of my GitHub account’s settings.)

Now that the mechanism is in place, let’s create a pull request.

I edit App.js directly on GitHub. I customize the withAuthenticator component to change the color of the Sign In button from orange to green (a sketch of the change follows below). I save the changes and I create a pull request.
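
Concretely, that change might look like this hypothetical sketch, using aws-amplify-react’s theming support (the exact theme keys and the withAuthenticator signature can vary between versions):

// In src/App.js, replacing the default export from the earlier sketch
import { withAuthenticator, AmplifyTheme } from 'aws-amplify-react'

// Override only the button color; everything else keeps the default theme
const greenTheme = {
  ...AmplifyTheme,
  button: { ...AmplifyTheme.button, backgroundColor: 'green' },
}

// Arguments: (component, includeGreetings, authenticatorComponents, federated, theme)
export default withAuthenticator(App, true, [], null, greenTheme)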

On the Pull Request detail page, I click Show all checks to get the status of the Amplify Console test. I see AWS Amplify Console Web Preview in progress. Amplify Console creates a full backend environment to test the pull request, then builds and deploys the frontend.

Eventually, I see All checks have passed and a green mark. I click Details to get the preview URL. In case of an error, you can see the detailed log file of the build phase in the Amplify Console.

I can also check the status of the preview in the Amplify Console.

I point my browser to the preview URL to test my change. I can see the green Sign In button instead of the orange one.

When I try to authenticate using the username and password I created previously, I receive a “User does not exist” error message, because this preview URL points to a different backend than the main application. I can see two Cognito user pools in the Cognito console, one for each environment.

I can control who can access the preview URL using access control settings similar to those I use for the main URL.

When I am happy with the proposed changes, I merge the pull request on GitHub to trigger a new build and to deploy the change to the production environment. Amplify Console deletes the preview environment upon merging. The ephemeral backend environment created for the pull request also gets deleted.

Cypress testing
In addition to previewing changes before merging them into the main branch, we also added the capability to run end-to-end tests during your build process. You can use your favorite test framework to add unit or end-to-end tests to your application and automatically run the tests during the build phase. When you use the Cypress test framework, Amplify Console detects the tests in your source tree and automatically adds the testing phase to your application build process.

Only projects that pass all tests are pushed down your pipeline to the deployment phase. You can learn more about this and follow the step-by-step instructions we posted a few weeks ago.

These two additions to Amplify Console allow you to gain higher confidence in the robustness of your pipeline and the quality of the code delivered to your production environment.

Availability
Web previews are available in all Regions where AWS Amplify Console is available today, at no additional cost on top of the regular Amplify Console pricing. With the AWS Free Usage Tier, you can get started for free. Upon sign up, new AWS customers receive 1,000 build minutes per month for the build and deploy feature, and 15 GB served per month and 5 GB data storage per month for the hosting.

— seb