Tag Archives: Developers

Explaining Cloudflare’s ABR Analytics

Post Syndicated from Jamie Herre original https://blog.cloudflare.com/explaining-cloudflares-abr-analytics/


Cloudflare’s analytics products help customers answer questions about their traffic by analyzing the mind-boggling, ever-increasing number of events (HTTP requests, Workers requests, Spectrum events) logged by Cloudflare products every day.  The answers to these questions depend on the point of view of the question being asked, and we’ve come up with a way to exploit this fact to improve the quality and responsiveness of our analytics.

Useful Accuracy

Consider the following questions and answers:

What is the length of the coastline of Great Britain? 12.4K km
What is the total world population? 7.8B
How many stars are in the Milky Way? 250B
What is the total volume of the Antarctic ice shelf? 25.4M km³
What is the worldwide production of lentils? 6.3M tonnes
How many HTTP requests hit my site in the last week? 22.6M

Useful answers do not benefit from being overly exact.  For large quantities, knowing the correct order of magnitude and a few significant digits gives the most useful answer.  At Cloudflare, the difference in traffic between different sites or when a single site is under attack can cross nine orders of magnitude and, in general, all our traffic follows a Pareto distribution, meaning that what’s appropriate for one site or one moment in time might not work for another.


Because of this distribution, a query that scans a few hundred records for one customer will need to scan billions for another.  A report that needs to load a handful of rows under normal operation might need to load millions when a site is under attack.

To get a sense of the relative difference of each of these numbers, remember “Powers of Ten”, an amazing visualization that Ray and Charles Eames produced in 1977.  Notice that the scale of an image determines what resolution is practical for recording and displaying it.


Using ABR to determine resolution

This basic fact informed our design and implementation of ABR for Cloudflare analytics.  ABR stands for “Adaptive Bit Rate”; we borrowed the name from video streaming, such as Cloudflare’s own Stream Delivery, where the server selects the best resolution for a video stream to match your client and network connection.

In our case, every analytics query that supports ABR will be calculated at a resolution matching the query.  For example, if you’re interested to know from which country the most firewall events were generated in the past week, the system might opt to use a lower resolution version of the firewall data than if you had opted to look at the last hour. The lower resolution version will provide the same answer but take less time and fewer resources.  By using multiple, different resolutions of the same data, our analytics can provide consistent response times and a better user experience.

You might be aware that we use a columnar store called ClickHouse to store and process our analytics data.  When using ABR with ClickHouse, we write the same data at multiple resolutions into separate tables.  Usually, we cover seven orders of magnitude – from 100% to 0.0001% of the original events.  We wind up using an additional 12% of disk storage but enable very fast ad hoc queries on the reduced resolution tables.
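
To sketch the idea (a simplified illustration, not our actual query planner), a query can be routed to the coarsest sample table that still contains enough rows for the requested time range, and any counts read from it scaled back up by the inverse of the sample rate:

// Hypothetical sample rates for the seven resolution tables,
// from 100% of events down to 0.0001%.
const RESOLUTIONS = [1, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001];

// Pick the coarsest sample that should still yield enough rows to answer
// with a couple of significant digits.
function pickResolution(estimatedEventsInRange, minSampledRows = 10000) {
  for (const rate of [...RESOLUTIONS].reverse()) {
    if (estimatedEventsInRange * rate >= minSampledRows) return rate;
  }
  return 1; // small sites fall back to the full-resolution table
}

// Counts read from a sampled table are scaled by 1 / sampleRate.
function estimateTotal(sampledCount, sampleRate) {
  return Math.round(sampledCount / sampleRate);
}

// e.g. pickResolution(22.6e6) returns 0.001, i.e. the 0.1% table.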


Aggregations and Rollups

The ABR technique facilitates aggregations by making compact estimates of every dimension.  Another way to achieve the same ends is with a system that computes “rollups”.  Rollups save space by computing either complete or partial aggregations of the data as it arrives.  

For example, suppose we wanted to count a total number of lentils. (Lentils are legumes and among the oldest and most widely cultivated crops.  They are a staple food in many parts of the world.)  We could just count each lentil as it passed through the processing system. Of course, because there are a lot of lentils, that system is distributed – meaning that there are hundreds of separate machines.  Therefore we’ll actually have hundreds of separate counters.

Also, we’ll want to include more information than just the count, so we’ll also include the weight of each lentil and maybe 10 or 20 other attributes. And of course, we don’t want just a total for each attribute, but we’ll want to be able to break it down by color, origin, distributor and many other things, and also we’ll want to break these down by slices of time.

In the end, we’ll have tens of thousands or possibly millions of aggregations to be tabulated and saved every minute.  These aggregations are expensive to compute, especially when using aggregations more complicated than simple counters and sums.  They also destroy some information.  For example, once we’ve processed all the lentils through the rollups, we can’t say for sure that we’ve counted them all, and most importantly, whichever attributes we neglected to aggregate are unavailable.

The number we’re counting, 6.3M tonnes, has only two significant digits, which can easily be achieved by counting a sample.  Most of the rollup computations performed on each lentil (on the order of 10¹³ lentils to account for 6.3M tonnes) are wasted.

Other forms of aggregations

So far, we’ve discussed ABR and its application to aggregations, but we’ve only given examples involving “counts” and “sums”.  There are other, more complex forms of aggregations we use quite heavily.  Two examples are “topK” and “count-distinct”.

A “topK” aggregation attempts to show the K most frequent items in a set.  For example, the most frequent IP address, or country.  To compute topK, just count the frequency of each item in the set and return the K items with the highest frequencies. Under ABR, we compute topK based on the set found in the matching resolution sample. Using a sample makes this computation a lot faster and less complex, but there are problems.
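
As a concrete sketch (not our production code), computing topK from a sample and reporting proportions rather than absolute counts looks like this:

// values: a sampled list of items (e.g. country codes); k: how many to return.
function topKFromSample(values, k) {
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    // Report each item's share of the sample; for a large set this is a
    // good estimate of its share of the full data.
    .map(([value, count]) => ({ value, share: count / values.length }));
}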

The estimate of topK derived from a sample is biased and dependent on the distribution of the underlying data. This can result in overestimating the significance of elements in the set as compared to their frequency in the full set. In practice, this effect is only noticeable when the cardinality of the set is very high, and you’re not going to see it on a Cloudflare dashboard.  If your site has a lot of traffic and you’re looking at the top K URLs or browser types, there will be no visible difference between resolutions.  Also keep in mind that as long as we’re estimating the “proportion” of an element in the set and the set is large, the results will be quite accurate.

The other fascinating aggregation we support is known as “count-distinct”, or number of uniques.  In this case we want to know the number of unique values in a set.  For example, how many unique cache keys have been used.  We can safely say that a uniform random sample of the set cannot be used to estimate this number.  However, we do have a solution.

We can generate another, alternate sample based on the value in question.  For example, instead of taking a random sample of all requests, we take a random sample of IP addresses.  This is sometimes called distinct reservoir sampling, and it allows us to estimate the true number of distinct IPs based on the cardinality of the sampled set. Again, there are techniques available to improve these estimates, and we’ll be implementing some of those.
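
Here is a minimal sketch of that idea, assuming we sample by hashing the value itself so that a given IP address is either always in the sample or never in it; real implementations use more sophisticated estimators:

// A given IP either always passes the hash test or never does, so the sample
// contains a fixed fraction of all distinct IPs, not a fraction of requests.
function hash01(value) {
  let h = 2166136261;
  for (let i = 0; i < value.length; i++) {
    h = Math.imul(h ^ value.charCodeAt(i), 16777619) >>> 0;
  }
  return h / 2 ** 32; // map the 32-bit hash into [0, 1)
}

function estimateDistinct(values, sampleRate = 0.01) {
  const sampled = new Set();
  for (const v of values) {
    if (hash01(v) < sampleRate) sampled.add(v);
  }
  // Each distinct value had probability `sampleRate` of being kept.
  return Math.round(sampled.size / sampleRate);
}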

ABR improves resilience and scalability

Using ABR saves us resources.  Even better, it allows us to query all the attributes in the original data, not just those included in rollups.  Better still, because the original events are preserved, we can compare results across the different sample intervals in separate tables to check that the system is working correctly.

However, the greatest benefits of employing ABR are the ones that aren’t directly visible. Even under ideal conditions, a large distributed system such as Cloudflare’s data pipeline is subject to high tail latency.  This occurs when any single part of the system takes longer than usual for any number of a long list of reasons.  In these cases, the ABR system will adapt to provide the best results available at that moment in time.

For example, compare this chart showing Cache Performance for a site under attack with the same chart generated a moment later while we simulate a failure of some of the servers in our cluster.  In the days before ABR, your Cloudflare dashboard would fail to load in this scenario.  Now, with ABR analytics, you won’t see significant degradation.


Stretching the analogy to ABR in video streaming, we want you to be able to enjoy your analytics dashboard without being bothered by issues related to faulty servers, or network latency, or long running queries.  With ABR you can get appropriate answers to your questions reliably and within a predictable amount of time.

In the coming months, we’re going to be releasing a variety of new dashboards and analytics products based on this simple but profound technology.  Watch your Cloudflare dashboard for increasingly useful and interactive analytics.

Rendering React on the Edge with Flareact and Cloudflare Workers

Post Syndicated from Guest Author original https://blog.cloudflare.com/rendering-react-on-the-edge-with-flareact-and-cloudflare-workers/


The following is a guest post from Josh Larson, Engineer at Vox Media.

Imagine you’re the maintainer of a high-traffic media website, and your DNS is already hosted on Cloudflare.

Page speed is critical. You need to get content to your audience as quickly as possible on every device. You also need to render ads in a speedy way to maintain a good user experience and make money to support your journalism.

One solution would be to render your site statically and cache it at the edge. This would help ensure you have top-notch delivery speed because you don’t need a server to return a response. However, your site has decades worth of content. If you wanted to make even a small change to the site design, you would need to regenerate every single page during your next deploy. This would take ages.

Another issue is that your site would be static — and future updates to content or new articles would not be available until you deploy again.

That’s not going to work.

Another solution would be to render each page dynamically on your server. This ensures you can return a dynamic response for new or updated articles.

However, you’re going to need to pay for some beefy servers to be able to handle spikes in traffic and respond to requests in a timely manner. You’ll also probably need to implement a system of internal caches to optimize the performance of your app, which could lead to a more complicated development experience. That also means you’ll be at risk of a thundering herd problem if, for any reason, your cache becomes invalidated.

Neither of these solutions is great, and you’re forced to choose between these two approaches.

Thankfully, you’ve recently come across a project like Next.js which offers a hybrid approach: static-site generation along with incremental regeneration. You’re in love with the patterns and developer experience in Next.js, but you’d also love to take advantage of the Cloudflare Workers platform to host your site.

Cloudflare Workers allow you to run your code on the edge quickly, efficiently and at scale. Instead of paying for a server to host your code, you can host it directly inside the datacenter — reducing the number of network trips required to load your application. In a perfect world, we wouldn’t need to find hosting for a Next.js site, because Cloudflare offers the same JavaScript hosting functionality with the Workers platform. With its dynamic runtime and edge caching capabilities, we wouldn’t need to worry about making a tradeoff between static and dynamic for our site.

Unfortunately, frameworks like Next.js and Cloudflare Workers don’t mesh together particularly well due to technical constraints. Until now:

I’m excited to announce Flareact, a new open-source React framework built for Cloudflare Workers.


With Flareact, you don’t need to make the tradeoff between a static site and a dynamic application.

Flareact allows you to render your React apps at the edge rather than on the server. It is modeled after Next.js, which means it supports file-based page routing, dynamic page paths and edge-side data fetching APIs.

Not only are Flareact pages rendered at the edge — they’re also cached at the edge using the Cache API. This allows you to provide a dynamic content source for your app without worrying about traffic spikes or response times.

With no servers or origins to deal with, your site is instantly available to your audience. Cloudflare Workers gives you a 0ms cold start and responses from the edge within milliseconds.

You can check out the docs and get started now.


To get started manually, install the latest wrangler, and use the handy wrangler generate command below to create your first project:

npm i @cloudflare/wrangler -g
wrangler generate my-project https://github.com/flareact/flareact-template

What’s the big deal?

Hosting React apps on Cloudflare Workers Sites is not a new concept. In fact, you’ve always been able to deploy a create-react-app project to Workers Sites in addition to static versions of other frameworks like Gatsby and Next.js.

However, Flareact renders your React application at the edge. This allows you to provide an initial server response with HTML markup — which can be helpful for search engine crawlers. You can also cache the response at the edge and optionally invalidate that cache on a timed basis — meaning your static markup will be regenerated if you need it to be fresh.

This isn’t a new pattern: Next.js has done the hard work in defining the shape of this API with SSG support and Incremental Static Regeneration. While there are nuanced differences in the implementation between Flareact and Next.js, they serve a similar purpose: to get your application to your end-user in the quickest and most-scalable way possible.

A focus on developer experience

A magical developer experience is a crucial ingredient to any successful product.

As a longtime fan and user of Next.js, I wanted to experiment with running the framework on Cloudflare Workers. However, Next.js and its APIs are framed around the Node.js HTTP Server API, while Cloudflare Workers use V8 isolates and are modeled after the FetchEvent type.

Since we don’t have typical access to a filesystem inside V8 isolates, it’s tough to mimic the environment required to run a dynamic Next.js server at the edge. Though projects like Fab have come up with workarounds, I decided to approach the project with a clean slate and use existing patterns established in Next.js in a brand-new framework.

As a developer, I absolutely love the simplicity of exporting an asynchronous function from my page to have it supply props to the component. Flareact implements this pattern by allowing you to export a getEdgeProps function. This is similar to getStaticProps in Next.js, and it matches the expected return shape of that function in Next.js — including a revalidate parameter. Learn more about data fetching in Flareact.
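
Based on that description, a Flareact page might look roughly like this; the content API URL is a made-up placeholder:

// pages/posts.js
export async function getEdgeProps() {
  // Placeholder content API; substitute your own data source.
  const res = await fetch("https://example.com/api/posts");
  const posts = await res.json();

  return {
    props: { posts },
    // Ask Flareact to regenerate the cached result after 60 seconds.
    revalidate: 60,
  };
}

export default function Posts({ posts }) {
  return (
    <ul>
      {posts.map((post) => (
        <li key={post.id}>{post.title}</li>
      ))}
    </ul>
  );
}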

I was also inspired by the API Routes feature of Next.js when I implemented the API Routes feature of Flareact — enabling you to write standard Cloudflare Worker scripts directly within your React app.

I hope porting over an existing Next.js project to Flareact is a breeze!

How it works

When a FetchEvent request comes in, Flareact inspects the URL pathname to decide how to handle it:

If the request is for a page or for page props, it checks the cache for that request and returns the cached response if there’s a hit. If there is a cache miss, it renders the page (or runs the props function), stores the result in the cache, and returns the response.

If the request is for an API route, it sends the entire FetchEvent along to the user-defined API function, allowing the user to respond as they see fit.


If you want your cached page to be revalidated after a certain amount of time, you can return an additional revalidate property from getEdgeProps(). This instructs Flareact to cache the endpoint for that number of seconds before generating a new response.


Finally, if the request is for a static asset, it returns it directly from Workers KV.
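
The page and props branch maps naturally onto the Workers Cache API. This is only a sketch of the flow described above, not Flareact’s actual source:

async function handlePage(event, render) {
  const cache = caches.default;

  const cached = await cache.match(event.request);
  if (cached) return cached; // cache hit: serve the stored copy from the edge

  // Cache miss: render the page (or its props JSON), store it, and return it.
  const response = await render(event.request);
  event.waitUntil(cache.put(event.request, response.clone()));
  return response;
}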

The Worker

The core responsibilities of the Worker (or, in a traditional SSR framework, the server) are to:

  1. Render the initial React page component into static HTML markup.
  2. Provide the initial page props as a JSON object, embedded into the static markup in a script tag.
  3. Load the client-side JavaScript bundles and stylesheets necessary to render the interactive page.
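
In React terms, a stripped-down version of those three responsibilities might look like the sketch below; the element IDs and bundle path are illustrative placeholders, not Flareact’s actual markup:

import { renderToString } from "react-dom/server";

// `Page` is the matched page component, `props` the result of getEdgeProps.
function renderDocument(Page, props) {
  const markup = renderToString(<Page {...props} />);

  return `<!DOCTYPE html>
<html>
  <body>
    <div id="root">${markup}</div>
    <script id="page-props" type="application/json">${JSON.stringify(props)}</script>
    <script src="/client.js"></script>
  </body>
</html>`;
}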

One challenge with building Flareact is that Webpack targets the webworker output rather than the node output. This makes it difficult to inform the worker which pages exist, since there is no access to the filesystem.

To get around this, Flareact leverages require.context, a Webpack-specific API, to inspect the project and build a manifest of pages on the client and the worker. I’d love to replace this with a smarter bundling strategy on the client-side eventually.
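
For reference, require.context asks Webpack to bundle every file matching a pattern and exposes the list of matched paths, so a page manifest can be assembled along these lines (a sketch, not Flareact’s exact code):

// Ask Webpack to include every file under ./pages and expose its keys.
const context = require.context("./pages", true, /\.js$/);

// Build a map from route path ("/about") to the page module.
const pages = {};
for (const key of context.keys()) {
  const route = key
    .replace(/^\./, "")        // "./about.js" -> "/about.js"
    .replace(/\.js$/, "")      // "/about.js"  -> "/about"
    .replace(/\/index$/, "/"); // "/index"     -> "/"
  pages[route] = context(key);
}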

The Client

In addition to handling incoming Worker requests, Flareact compiles a client bundle containing the code necessary for routing, data fetching and more from the browser.

The core responsibilities of the client are to:

  1. Listen for routing events
  2. Fetch the necessary page component and its props from the worker over AJAX

Building a client router from scratch has been a challenge. It listens for changes to the internal route state, updates the URL pathname with pushState, makes an AJAX request to the worker for the page props, and then updates the current component in the render tree with the requested page.
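
Stripped to its essentials, that loop looks something like the following; the props endpoint URL is a made-up placeholder for whatever the framework actually requests:

// Placeholder for the real work of swapping the page component into the tree.
function renderPage(pathname, props) {
  console.log("rendering", pathname, props);
}

async function loadPage(pathname) {
  // Ask the worker for the next page's props; this URL shape is made up.
  const res = await fetch(`/_props${pathname}`);
  renderPage(pathname, await res.json());
}

// Link clicks add a history entry, then load the new page.
function navigate(pathname) {
  window.history.pushState({}, "", pathname);
  loadPage(pathname);
}

// Back and forward buttons replay the load without pushing a new entry.
window.addEventListener("popstate", () => loadPage(window.location.pathname));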

It was fun building a flareact/link component similar to next/link:

import Link from "flareact/link";

export default function Index() {
  return (
    <div>
      <Link href="/about">
        <a>Go to About</a>
      </Link>
    </div>
  );
}

I also set out to build a custom version of next/head for Flareact. As it turns out, this was non-trivial! With lots of interesting stuff going on behind the scenes to support SSR and client-side routing events, I decided to make flareact/head a simple wrapper around react-helmet instead:

import Head from "flareact/head";

export default function Index() {
  return (
    <div>
      <Head>
        <title>My page title</title>
      </Head>
      <h1>Hello, world.</h1>
    </div>
  );
}

Local Development

The local developer experience of Flareact leverages the new wrangler dev command, sending server requests through a local tunnel to the Cloudflare edge and back to your machine.


This is a huge win for productivity, since you don’t need to manually build and deploy your application to see how it will perform in a production environment.

It’s also a really exciting update to the serverless toolchain. Running a robust development environment in a serverless world has always been a challenge, since your code is executing in a non-traditional context. Tunneling local code to the edge and back is such a great addition to Cloudflare’s developer experience.

Use cases

Flareact is a great candidate for a lot of Jamstack-adjacent applications, like blogs or static marketing sites.

It could also be used for more dynamic applications, with robust API functions and authentication mechanisms — all implemented using Cloudflare Workers.

Imagine building a high-traffic e-commerce site with Flareact, where both site reliability and dynamic rendering for things like price changes and stock availability are crucial.

There are also untold possibilities for integrating Workers KV into your edge props or API functions as a first-class database solution. No need to reach for an externally-hosted database!

While the project is still in its early days, here are a couple of real-world examples:

The road ahead

I have to be honest: creating a server-side rendered React framework with little prior knowledge was very difficult. There’s still a ton to learn, and Flareact has a long way to go to reach parity with Next.js in the areas of optimization and production-readiness.

Here’s what I’m hoping to add to Flareact in the near future:

  • Smarter client bundling and Webpack chunks to reduce individual page weight
  • A more feature-complete client-side router
  • The ability to extend and customize the root document of the app
  • Support for more style frameworks (CSS-in-JS, Sass, CSS modules, etc)
  • A more stable development environment
  • Documentation and support for environment variables, secrets and KV namespaces
  • A guide for deploying from GitHub Actions and other CI tools

If the project sounds interesting to you, be sure to check out the source code on GitHub. Contributors are welcome!

Asynchronous HTMLRewriter for Cloudflare Workers

Post Syndicated from Ashcon Partovi original https://blog.cloudflare.com/asynchronous-htmlrewriter-for-cloudflare-workers/


Last year, we launched HTMLRewriter for Cloudflare Workers, which enables developers to make streaming changes to HTML on the edge. Unlike a traditional DOM parser that loads the entire HTML document into memory, we developed a streaming parser written in Rust. Today, we’re announcing support for asynchronous handlers in HTMLRewriter. Now you can perform asynchronous tasks based on the content of the HTML document: from prefetching fonts and image assets to fetching user-specific content from a CMS.

How can I use HTMLRewriter?

We designed HTMLRewriter to have a jQuery-like experience. First, you define a handler, then you assign it to a CSS selector; Workers does the rest for you. You can look at our new and improved documentation to see our supported list of selectors, which now include nth-child selectors. The example below changes the alternative text for every second image in a document.

async function editHtml(request) {
  return new HTMLRewriter()
     .on("img:nth-child(2)", new ElementHandler())
     .transform(await fetch(request))
}

class ElementHandler {
   element(e) {
      e.setAttribute("alt", "A very interesting image")
   }
}

Since these changes are applied using streams, we maintain a low TTFB (time to first byte) and users never know the HTML was transformed. If you’re interested in how we’re able to accomplish this technically, you can read our blog post about HTML parsing.

What’s new with HTMLRewriter?

Now you can define an async handler, which allows any code that uses await. This means you can perform dynamic HTML injection based on the contents of the document, without having prior knowledge of what it contains. This allows you to customize HTML based on a particular user, feature flag, or even an integration with a CMS.

class UserCustomizer {
   // Remember to add the `async` keyword to the handler method
   async element(e) {
      const user = await fetch(`https://my.api.com/user/${e.getAttribute("user-id")}/online`)
      if (user.ok) {
         // Add the user’s name to the element
         e.setAttribute("user-name", await user.text())
      } else {
         // Remove the element, since this user is not online
         e.remove()
      }
   }
}

What can I build with HTMLRewriter?

To illustrate the flexibility of HTMLRewriter, I wrote an example that you can deploy on your own website. If you manage a website, you know that old links and images can expire with time. Here’s an excerpt from a years-old post I wrote on the Cloudflare Blog:


As you might see, that missing image is not the prettiest sight. However, we can easily fix this using async handlers in HTMLRewriter. Using a service like the Internet Archive API, we can check if an image no longer exists and rewrite the URL to use the latest archive. That means users don’t see an ugly placeholder and won’t even know the image was replaced.

async function fetchAndFixImages(request) {
   return new HTMLRewriter()
      .on("img", new ImageFixer())
      .transform(await fetch(request))
}

class ImageFixer {
   async element(e) {
    var url = e.getAttribute("src")
    var response = await fetch(url)
    if (!response.ok) {
       var archive = await fetch(`https://archive.org/wayback/available?url=${url}`)
       if (archive.ok) {
          var snapshot = await archive.json()
          e.setAttribute("src", snapshot.archived_snapshots.closest.url)
       } else {
          e.remove()
       }
    }
  }
}

Using the Workers Playground, you can view a working sample of the above code. A more complex example could even alert a service like Sentry when a missing image is detected. Using the previous missing image, now you can see the image is restored and users are none the wiser.


If you’re interested in deploying this to your own website, you can start from the Workers Playground sample above.


What else can I build with HTMLRewriter?

We’ve been blown away by developer projects using HTMLRewriter. Here are a few projects that caught our eye and are great examples of the power of Cloudflare Workers and HTMLRewriter:

If you’re interested in using HTMLRewriter, check out our documentation. Also be sure to share any creations you’ve made with @CloudflareDev, we love looking at the awesome projects you build.

Announcing wrangler dev — the Edge on localhost

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/announcing-wrangler-dev-the-edge-on-localhost/


Cloudflare Workers — our serverless platform — allows developers around the world to run their applications from our network of 200 datacenters, as close as possible to their users.

A few weeks ago we announced a release candidate for wrangler dev — today, we’re excited to take wrangler dev, the world’s first edge-based development environment, to GA with the release of wrangler 1.11.

Think locally, develop globally

It was once assumed that to successfully run an application on the web, one had to go and acquire a server, set it up (in a data center that hopefully you had access to), and then maintain it on an ongoing basis. Luckily for most of us, that assumption was challenged with the emergence of the cloud. The cloud was always assumed to be centralized — large data centers in a single region (“us-east-1”), reserved for compute. The edge? That was for caching static content.

Again, assumptions are being challenged.

Cloudflare Workers is about moving compute from a centralized location to the edge. And it makes sense: if users are distributed all over the globe, why should all of them be routed to us-east-1, on the opposite side of the world, causing latency and degrading user experience?

But challenging one assumption caused others to come into view. One of the most obvious ones was: would a local development environment actually provide the best experience for someone looking to test their Worker code? Trying to fit the entire Cloudflare edge, with all its dependencies onto a developer’s machine didn’t seem to be the best approach. Especially given that the place the developer was going to run that code in production was mere milliseconds away from the computer they were running on.

When I was in college, getting started with programming, one of the biggest barriers to entry was installing all the dependencies required to run a single library. I would go so far as to say that the third, and often forgotten, hardest problem in computer science is dependency management.

We’re not the first to try to unify development environments across machines — tools such as Docker aim to solve this exact problem by providing a prepackaged development environment.

Yet, packaging up the Workers runtime is not quite so simple.

Beyond the Workers runtime, there are many components that make up Cloudflare’s edge, including DNS resolution, the Cloudflare cache — all of those parts are what makes Cloudflare Workers so powerful. That means that without those components, a standalone runtime is insufficient to represent the behavior of Worker request handling. The reason to develop locally first is to have the opportunity to experiment without affecting production. Thus, having a local development environment that truly reflects production is a requirement.

wrangler dev

wrangler dev provides all the convenience of a local development environment, without the headache of trying to reproduce the reality of production locally — and then having to keep the two environments in sync.

By running at the edge, it provides a high fidelity, consistent experience for all developers, without sacrificing the speedy feedback loop of a local development environment.

Live reloading


As you update your code, wrangler dev will detect changes, and push the new version of your code to the edge.

console.log() at your fingertips


Previously, to extract your console logs from the Workers runtime, you had to have the Workers Preview open in a browser window at all times. With wrangler dev, you can receive your own logs directly in your terminal of choice.

Cache API, KV, and more!

Since wrangler dev runs on the edge, you can now easily test the state of a cache.put(), without having to deploy your Worker to production.
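
For example, a Worker like this exercises the real edge cache, and under wrangler dev you can watch the (purely illustrative) X-Cache-Status header flip from MISS to HIT between requests:

addEventListener("fetch", (event) => {
  event.respondWith(handle(event));
});

async function handle(event) {
  const cache = caches.default;

  const cached = await cache.match(event.request);
  if (cached) {
    return new Response(await cached.text(), {
      headers: { "X-Cache-Status": "HIT" },
    });
  }

  const response = new Response(`Generated at ${new Date().toISOString()}`, {
    headers: { "Cache-Control": "max-age=60", "X-Cache-Status": "MISS" },
  });
  event.waitUntil(cache.put(event.request, response.clone()));
  return response;
}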

wrangler dev will spin up a new KV namespace for development, so you don’t have to worry about affecting your production data.

And if you’re looking to test out some of the features provided on request.cf that provide rich information about the request such as geo-location — they will all be provided from the Cloudflare data center.

Get started

wrangler dev is now available in the latest version of Wrangler, the official Cloudflare Workers CLI.

To get started, follow our installation instructions here.

What’s next?

wrangler dev is just our first foray into giving our developers more visibility and agility with their development process.

We recognize that we have a lot more work to do to meet our developers’ needs, including providing an easy testing framework for Workers, and allowing our customers to observe their Workers’ behavior in production.

Just as wrangler dev provides a quick feedback loop between our developers and their code, we love to have a tight feedback loop between our developers and our product. We love to hear what you’re building, how you’re building it, and how we can help you build it better.

Making magic: Reimagining Developer Experience for the World of Serverless

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/making-magic-reimagining-developer-experiences-for-the-world-of-serverless/


This week we’ve talked about how Workers provides a step function improvement in the TTFB (time to first byte) of applications, by running lightweight isolates in over 200 cities around the world, free of cold starts. Today I’m going to talk about another metric, one that’s arguably even more important: TTFD, or time to first dopamine, and announce a huge improvement to the Workers development experience — wrangler dev, our edge-based development environment with all the perks of a local environment.

There’s nothing quite like the rush of getting your first few lines of code to work — no matter how many times you’ve done it before, there’s something so magical about the computer understanding exactly what you wanted it to do and doing it!


This is the kind of magic I expected of “serverless”, and while it’s true that most serverless offerings today get you to that feeling faster than setting up a virtual server ever would, I still can’t help but be disappointed with how lackluster developing with most serverless platforms is today.

Some of my disappointment can be attributed to the leaky nature of the abstraction: the journey to getting you to the point of writing code is drawn out by forced decision making about servers (regions, memory allocation, etc). Servers, however, are not the only thing holding developers back from getting to the delightful magical feeling in the serverless world today.

The “serverless” experience on AWS Lambda today looks like this: between configuring the right access policy to invoke my own test application, and deciding whether an HTTP or REST API was better suited for my needs, 30 minutes had easily passed, and I still didn’t have a URL I could call to invoke my application. I did, however, spin up five different services, and was already worrying about cleaning them up lest I be charged for them.

That doesn’t feel like magic!

In building what we believe to be the serverless platform of the future — a promise that feels very magical —  we wanted to bring back that magical feeling to every step of the development journey. If serverless is about empowering developers, then they should be empowered every step of the way: from proof of concept to MVP and beyond.

We’re excited to share with you today our approach to making our developer experience delightful — we recognize we still have plenty of room to continue to grow and innovate (and we can’t wait to tell you about everything we have currently in the works as well!), but we’re proud of all the progress we’ve made in making Workers the easiest development platform for developers to use.

Defining “developer experience”

To get us started, let’s look at what the journey of a developer entails. Today, we’ll be defining the developer experience as the following four stages:

  • Getting started: All the steps we have to take before putting in some keystrokes
  • Iteration: Does my code do what I expect it to do? What do I need to do to get it there?
  • Release: I’ve tested what I can — time to hit the big red button!
  • Observe: Is anything broken? And how do I fix it?

When approaching each stage of development, we wanted to reimagine the experience, the way that we’ve always wanted our development flow to work, and fix places along the way where existing platforms have let us down.

Zero to Hello World


With Workers, we want to get you to that aforementioned delightful feeling as quickly as possible, and remove every obstacle in the way of writing and deploying your code. The first deployment experience is really important — if you’ve done it once and haven’t given up along the way, you can do it again.

We’re very proud to say our TTFD — even for a new user without a Cloudflare account — is as low as three minutes. If you’re an existing customer, you can have your first Worker running in seconds. No regions to choose, no IAM rules to configure, and no API Gateways to set up or worry about paying for.

If you’re new to Workers and still trying to get a feel for it, you can instantly deploy your Worker to 200 cities around the world within seconds, with the simple click of a button.

If you’ve already decided on Workers as the choice for building your next application, we want to make you feel at home by allowing you to use all of your favorite IDEs, be it vim or emacs or VSCode (we don’t care!).

With the release of wrangler, the official command-line tool for Workers, getting started is just as easy as:

wrangler generate hello
cd hello
wrangler publish

Again, in seconds your code is up and running, and easily accessible all over the world.

“Hello, World!”, of course, doesn’t have to be quite so literal. We provide a range of tutorials to help get you started and get familiar with developing with Workers.

To save you that last bit of time in getting started, our template gallery provides starter templates so you can dive straight into building the products you’re excited about — whether it’s a new GraphQL server or a brand new static site, we’ve got you covered.

Local(ish) development: code, test, repeat


We can’t promise to get the code right on your behalf, but we can promise to do everything we can to get you the feedback you need to help you get your code right.

The development journey requires lots of experimentation, trial and error, and debugging. If my Computer Science degree came with instructions on the back of the bottle, they would read: “code, print, repeat.”

Getting code right is an extremely iterative, feedback-driven process. We would all love to get code right the first time around and move on, but the reality is, computers are bad mind-readers, and you’ve ended up with an extraneous parenthesis or a stray comma in your JSON, so your code is not going to run. Found where the loose parenthesis was introduced? Great! Now your code is running, but the output is not right — time to go find that off-by-one error.

Local development has traditionally been the way for developers to get a tight feedback loop during the development process. The crucial components that make up an effective local development environment and make it a great testing ground are: fast feedback loop, its sandboxed nature (ability to develop without affecting production), and accuracy.

As we started thinking about accomplishing all three of those goals, we realized that being local actually wasn’t itself a requirement — speed is the real requirement, and running on the client was the only way to achieve acceptable speed for a good-enough feedback loop.

One option was to provide a traditional local development environment, but one thing didn’t sit well with us: we wanted to provide a local development environment for the Workers runtime, however, we knew there was more to handling a request than just the runtime, which could compromise accuracy. We didn’t want to set our users up to fail with code that works on their machine but not ours.

Shipping the rest of our edge infrastructure to the user would pose its own challenges of keeping it up to date, and it would require the user to install hundreds of unnecessary dependencies, all potentially to end up with the most frustrating experience of all: running into some installation bug the explanation to which couldn’t be found on StackOverflow. This experience didn’t sit right with us.

As it turns out, this is a very similar problem to one we commonly solve for our customers: Running code on the client is fast, but it doesn’t give me the control I need; running code on the server gives me the control I need, but it requires a slow round-trip to the origin. All we had to do was take our own advice and run it on the edge! It’s the best of both worlds: your code runs so close to your end user that you get the same performance as running it on the client, without having to lose control.

To provide developers access to this tight feedback loop, we introduced wrangler dev earlier this year!


wrangler dev  has the look and feel of a local development environment: it runs on localhost but tunnels to the edge, and provides output directly to your favorite IDE of choice. Since wrangler dev now runs on the edge, it works on your machine and ours exactly the same!

Our release candidate for wrangler dev is live and waiting for you to take it for a test drive, as easily as:

npm i @cloudflare/[email protected] -g

Let us know what you think.

Release


After writing all the code, testing every edge case imaginable, and going through code review, at some point the code needs to be released for the rest of the world to reap the fruits of your hard labor and enjoy the features you’ve built.

For smaller, quick applications, it’s exciting to hit the “Save & deploy” button and let fate take the wheel.

For production level projects, however, the process of deploying to production may be a bit different. Different organizations adopt different processes for code release. For those using GitHub, last year we introduced our GitHub Action, to make it easy to configure an integrated release process.

With Wrangler, you can configure Workers to deploy using your existing CI, to automate deployments, and minimize human intervention.

When deploying to production, again, feedback becomes extremely important. Some platforms today still take as long as a few minutes to deploy your code. A few minutes may seem trivial, but a few minutes of nervously refreshing, wondering whether your code is live yet, and which version of your code your users are seeing is stressful. This is especially true in a rollback or a bug-fix situation where you want the new version to be live ASAP.

New Workers are deployed globally in less than five seconds, which means new changes are instantaneous. Better yet, since Workers runs on lightweight isolates, newly deployed Workers don’t experience dreaded cold starts, which means you can release code as frequently as you’re able to ship it, without having to invest additional time in auxiliary gadgets to pre-warm your Worker — more time for you to start working on your next feature!

Observe & Resolve


The big red button has been pushed. Dopamine has been replaced with adrenaline: the instant question on your mind is: “Did I break anything? And if so, what, and how do I fix it?” These questions are at the core of what the industry calls “observability”.

There are different ways things can break and incidents can manifest themselves: increases in errors, drops in traffic, even a drop in performance could be considered a regression.

To identify these kinds of issues, you need to be able to spot a trend. Raw data, however, is not a very useful medium for spotting trends — humans simply cannot parse raw lines of logs to identify a subtle increase in errors.

This is why we took a two-pronged approach to helping developers identify and fix issues: exposing trend data through analytics, while also providing the ability to tail production logs for forensics and investigation.

Earlier this year, we introduced Workers Metrics: an easy way for developers to identify trends in their production traffic.

With requests metrics, you can easily spot any increases in errors, or drastic changes in traffic patterns after a given release.


Additionally, sometimes new code can introduce unforeseen regressions in the overall performance of the application. With CPU time metrics, our developers are now able to spot changes in the performance of their Worker, as well as use that information to guide and optimize their code.


Once you’ve identified a regression, we wanted to provide the tools needed to find your bug and fix it, which is why we also recently launched `wrangler tail`: production logs in a single command.

wrangler tail can help diagnose where code is failing or why certain customers are getting unexpected outcomes because it exposes console.log() output and exceptions. By having access to this output, developers can immediately diagnose, fix, and resolve any issues occurring in production.


We know how precious every moment can be when a bad code deploy impacts customer traffic. Luckily, once you’ve found and fixed your bug, it’s only a matter of seconds for users to start benefiting from the fix — unlike other platforms which make you wait as long as 5 minutes, Workers get deployed globally within five seconds.

Repeat


As you’re thinking about your next feature, you check out a new branch, and the cycle begins all over again. We’re excited for you to check out all the improvements we’ve made to the development experience with Workers, all to reduce your time to first dopamine (TTFD).

We are always working to improve it further, looking for every additional bit of friction we can remove, and we would love to hear your feedback as we do so.

Workers Security

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/workers-security/

Hello, I’m an engineer on the Workers team, and today I want to talk to you about security.

Cloudflare is a security company, and the heart of Workers is, in my view, a security project. Running code written by third parties is always a scary proposition, and the primary concern of the Workers team is to make that safe.

For a project like this, it is not enough to pass a security review and say "ok, we’re secure" and move on. It’s not even enough to consider security at every stage of design and implementation. For Workers, security in and of itself is an ongoing project, and that work is never done. There are always things we can do to reduce the risk and impact of future vulnerabilities.

Today, I want to give you an overview of our security architecture, and then address two specific issues that we are frequently asked about: V8 bugs, and Spectre.

Architectural Overview

Let’s start with a quick overview of the Workers Runtime architecture.


There are two fundamental parts of designing a code sandbox: secure isolation and API design.

Isolation

First, we need to create an execution environment where code can’t access anything it’s not supposed to.

For this, our primary tool is V8, the JavaScript engine developed by Google for use in Chrome. V8 executes code inside "isolates", which prevent that code from accessing memory outside the isolate — even within the same process. Importantly, this means we can run many isolates within a single process. This is essential for an edge compute platform like Workers where we must host many thousands of guest apps on every machine, and rapidly switch between these guests thousands of times per second with minimal overhead. If we had to run a separate process for every guest, the number of tenants we could support would be drastically reduced, and we’d have to limit edge compute to a small number of big enterprise customers who could pay a lot of money. With isolate technology, we can make edge compute available to everyone.

Sometimes, though, we do decide to schedule a worker in its own private process. We do this if it uses certain features that we feel need an extra layer of isolation. For example, when a developer uses the devtools debugger to inspect their worker, we run that worker in a separate process. This is because historically, in the browser, the inspector protocol has only been usable by the browser’s trusted operator, and therefore has not received as much security scrutiny as the rest of V8. In order to hedge against the increased risk of bugs in the inspector protocol, we move inspected workers into a separate process with a process-level sandbox. We also use process isolation as an extra defense against Spectre, which I’ll describe later in this post.

Additionally, even for isolates that run in a shared process with other isolates, we run multiple instances of the whole runtime on each machine, which we call "cordons". Workers are distributed among cordons by assigning each worker a level of trust and separating low-trusted workers from those we trust more highly. As one example of this in operation: a customer who signs up for our free plan will not be scheduled in the same process as an enterprise customer. This provides some defense-in-depth in the case a zero-day security vulnerability is found in V8. But I’ll talk more about V8 bugs, and how we address them, later in this post.

At the whole-process level, we apply another layer of sandboxing for defense in depth. The "layer 2" sandbox uses Linux namespaces and seccomp to prohibit all access to the filesystem and network. Namespaces and seccomp are commonly used to implement containers. However, our use of these technologies is much stricter than what is usually possible in container engines, because we configure namespaces and seccomp after the process has started (but before any isolates have been loaded). This means, for example, we can (and do) use a totally empty filesystem (mount namespace) and use seccomp to block absolutely all filesystem-related system calls. Container engines can’t normally prohibit all filesystem access because doing so would make it impossible to use exec() to start the guest program from disk; in our case, our guest programs are not native binaries, and the Workers runtime itself has already finished loading before we block filesystem access.

The layer 2 sandbox also totally prohibits network access. Instead, the process is limited to communicating only over local Unix domain sockets, to talk to other processes on the same system. Any communication to the outside world must be mediated by some other local process outside the sandbox.

One such process in particular, which we call the "supervisor", is responsible for fetching worker code and configuration from disk or from other internal services. The supervisor ensures that the sandbox process cannot read any configuration except that which is relevant to the workers that it should be running.

For example, when the sandbox process receives a request for a worker it hasn’t seen before, that request includes the encryption key for that worker’s code (including attached secrets). The sandbox can then pass that key to the supervisor in order to request the code. The sandbox cannot request any worker for which it has not received the appropriate key. It cannot enumerate known workers. It also cannot request configuration it doesn’t need; for example, it cannot request the TLS key used for HTTPS traffic to the worker.

Aside from reading configuration, the other reason for the sandbox to talk to other processes on the system is to implement APIs exposed to Workers. Which brings us to API design.

API Design

There is a saying: "If a tree falls in the forest, but no one is there to hear it, does it make a sound?" I have a related saying: "If a Worker executes in a fully-isolated environment in which it is totally prevented from communicating with the outside world, does it actually run?"

Complete code isolation is, in fact, useless. In order for Workers to do anything useful, they have to be allowed to communicate with users. At the very least, a Worker needs to be able to receive requests and respond to them. It would also be nice if it could send requests to the world, safely. For that, we need APIs.

In the context of sandboxing, API design takes on a new level of responsibility. Our APIs define exactly what a Worker can and cannot do. We must be very careful to design each API so that it can only express operations which we want to allow, and no more. For example, we want to allow Workers to make and receive HTTP requests, while we do not want them to be able to access the local filesystem or internal network services.

Let’s dig into the easier example first. Currently, Workers does not allow any access to the local filesystem. Therefore, we do not expose a filesystem API at all. No API means no access.

But, imagine if we did want to support local filesystem access in the future. How would we do that? We obviously wouldn’t want Workers to see the whole filesystem. Imagine, though, that we wanted each Worker to have its own private directory on the filesystem where it can store whatever it wants.

To do this, we would use a design based on capability-based security. Capabilities are a big topic, but in this case, what it would mean is that we would give the worker an object of type Directory, representing a directory on the filesystem. This object would have an API that allows creating and opening files and subdirectories, but does not permit traversing "up" the tree to the parent directory. Effectively, each worker would see its private Directory as if it were the root of their own filesystem.
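
To make that concrete, the capability handed to a Worker might have a shape like this; it is purely illustrative, since no such API exists in Workers today:

// Hypothetical capability object. Note what is absent: no parent() method and
// no absolute paths, so a Worker holding this object cannot name anything
// outside the directory it was given.
class Directory {
  async listEntries() { /* names of files and subdirectories */ }
  async openFile(name) { /* returns a File capability */ }
  async createFile(name) { /* returns a new File capability */ }
  async openDirectory(name) { /* returns a child Directory capability */ }
}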

How would such an API be implemented? As described above, the sandbox process cannot access the real filesystem, and we’d prefer to keep it that way. Instead, file access would be mediated by the supervisor process. The sandbox talks to the supervisor using Cap’n Proto RPC, a capability-based RPC protocol. (Cap’n Proto is an open source project currently maintained by the Cloudflare Workers team.) This protocol makes it very easy to implement capability-based APIs, so that we can strictly limit the sandbox to accessing only the files that belong to the Workers it is running.

Now what about network access? Today, Workers are allowed to talk to the rest of the world only via HTTP — both incoming and outgoing. There is no API for other forms of network access, therefore it is prohibited (though we plan to support other protocols in the future).

As mentioned before, the sandbox process cannot connect directly to the network. Instead, all outbound HTTP requests are sent over a Unix domain socket to a local proxy service. That service implements restrictions on the request. For example, it verifies that the request is either addressed to a public Internet service, or to the Worker’s zone’s own origin server, not to internal services that might be visible on the local machine or network. It also adds a header to every request identifying the worker from which it originates, so that abusive requests can be traced and blocked. Once everything is in order, the request is sent on to our HTTP caching layer, and then out to the Internet.

Similarly, inbound HTTP requests do not go directly to the Workers Runtime. They are first received by an inbound proxy service. That service is responsible for TLS termination (the Workers Runtime never sees TLS keys), as well as identifying the correct Worker script to run for a particular request URL. Once everything is in order, the request is passed over a Unix domain socket to the sandbox process.

V8 bugs and the "patch gap"

Every non-trivial piece of software has bugs, and sandboxing technologies are no exception. Virtual machines have bugs, containers have bugs, and yes, isolates (which we use) also have bugs. We can’t live life pretending that no further bugs will ever be discovered; instead, we must assume they will and plan accordingly.

We rely heavily on isolation provided by V8, the JavaScript engine built by Google for use in Chrome. This has good sides and bad sides. On one hand, V8 is an extraordinarily complicated piece of technology, creating a wider "attack surface" than virtual machines. More complexity means more opportunities for something to go wrong. On the bright side, though, an extraordinary amount of effort goes into finding and fixing V8 bugs, owing to its position as arguably the most popular sandboxing technology in the world. Google regularly pays out 5-figure bounties to anyone finding a V8 sandbox escape. Google also operates fuzzing infrastructure that automatically finds bugs faster than most humans can. Google’s investment does a lot to minimize the danger of V8 "zero-days" — bugs that are found by the bad guys and not known to Google.

But, what happens after a bug is found and reported by the good guys? V8 is open source, so fixes for security bugs are developed in the open and released to everyone at the same time — good guys and bad guys. It’s important that any patch be rolled out to production as fast as possible, before the bad guys can develop an exploit.

The time between publishing the fix and deploying it is known as the "patch gap". Earlier this year, Google announced that Chrome’s patch gap had been reduced from 33 days to 15 days.

Fortunately, we have an advantage here, in that we directly control the machines on which our system runs. We have automated almost our entire build and release process, so the moment a V8 patch is published, our systems automatically build a new release of the Workers Runtime and, after one-click sign-off from the necessary (human) reviewers, automatically push that release out to production.

As a result, our patch gap is now under 24 hours. A patch published by V8’s team in Munich during their work day will usually be in production before the end of our work day in the US.

Spectre: Introduction

Workers Security

We get a lot of questions about Spectre. The V8 team at Google has stated in no uncertain terms that V8 itself cannot defend against Spectre. Since Workers relies on V8 for sandboxing, many have asked if that leaves Workers vulnerable. However, we do not need to depend on V8 for this; the Workers environment presents many alternative approaches to mitigating Spectre.

Spectre is complicated and nuanced, and there’s no way I can cover everything there is to know about it or how Workers addresses it in a single blog post. But, hopefully I can clear up some of the confusion and concern.

What is it?

Spectre is a class of attacks in which a malicious program can trick the CPU into "speculatively" performing computation using data that the program is not supposed to have access to. The CPU eventually realizes the problem and does not allow the program to see the results of the speculative computation. However, the program may be able to derive bits of the secret data by looking at subtle side effects of the computation, such as the effects on cache.

For more background about Spectre, check out our Learning Center page on the topic.

Why does it matter for Workers?

Spectre encompasses a wide variety of vulnerabilities present in modern CPUs. The specific vulnerabilities vary by architecture and model, and it is likely that many vulnerabilities exist which haven’t yet been discovered.

These vulnerabilities are a problem for every cloud compute platform. Any time you have more than one tenant running code on the same machine, Spectre attacks come into play. However, the "closer together" the tenants are, the more difficult it can be to mitigate specific vulnerabilities. Many of the known issues can be mitigated at the kernel level (protecting processes from each other) or at the hypervisor level (protecting VMs), often with the help of CPU microcode updates and various tricks (many of which, unfortunately, come with serious performance impact).

In Cloudflare Workers, we isolate tenants from each other using V8 isolates — not processes nor VMs. This means that we cannot necessarily rely on OS or hypervisor patches to "solve" Spectre for us. We need our own strategy.

Why not use process isolation?

Cloudflare Workers is designed to run your code in every single Cloudflare location, of which there are currently 200 worldwide and growing.

We wanted Workers to be a platform that is accessible to everyone — not just big enterprise customers who can pay megabucks for it. We need to handle a huge number of tenants, where many tenants get very little traffic.

Combine these two points, and things get tricky.

A typical, non-edge serverless provider could handle a low-traffic tenant by sending all of that tenant’s traffic to a single machine, so that only one copy of the application needs to be loaded. If the machine can handle, say, a dozen tenants, that’s plenty. That machine can be hosted in a mega-datacenter with literally millions of machines, achieving economies of scale. However, this centralization incurs latency and worldwide bandwidth costs when the users don’t happen to be nearby.

With Workers, on the other hand, every tenant, regardless of traffic level, currently runs in every Cloudflare location. And in our quest to get as close to the end user as possible, we sometimes choose locations that only have space for a limited number of machines. The net result is that we need to be able to host thousands of active tenants per machine, with the ability to rapidly spin up inactive ones on-demand. That means that each guest cannot take more than a couple megabytes of memory — hardly enough space for a call stack, much less everything else that a process needs.

Moreover, we need context switching to be extremely cheap. Many Workers resident in memory will only handle an event every now and then, and many Workers spend only a fraction of a millisecond on any particular event. In this environment, a single core can easily find itself switching between thousands of different tenants every second. On top of that, to handle one event, a significant amount of communication needs to happen between the guest application and its host, meaning still more switching and communications overhead. If each tenant lives in its own process, all this overhead is orders of magnitude larger than if many tenants live in a single process. When using strict process isolation in Workers, we find the CPU cost can easily be 10x what it is with a shared process.

In order to keep Workers inexpensive, fast, and accessible to everyone, we must solve these issues, and that means we must find a way to host multiple tenants in a single process.

There is no "fix" for Spectre

A dirty secret that the industry doesn’t like to admit: no one has "fixed" Spectre. Not even when using heavyweight virtual machines. Everyone is still vulnerable.

The current approach being taken by most of the industry is essentially a game of whack-a-mole. Every couple months, researchers uncover a new Spectre vulnerability. CPU vendors release some new microcode, OS vendors release kernel patches, and everyone has to update.

But is it enough to merely deploy the latest patches?

It is abundantly clear that many more vulnerabilities exist, but haven’t yet been publicized. Who might know about those vulnerabilities? Most of the bugs being published are being found by (very smart) graduate students on a shoestring budget. Imagine, for a minute, how many more bugs a well-funded government agency, able to buy the very best talent in the world, could be uncovering.

To truly defend against Spectre, we need to take a different approach. It’s not enough to block individual known vulnerabilities. We must address the entire class of vulnerabilities at once.

We can’t stop it, but we can slow it down

Unfortunately, it’s unlikely that any catch-all "fix" for Spectre will be found. But for the sake of argument, let’s try.

Fundamentally, all Spectre vulnerabilities use side channels to detect hidden processor state. Side channels, by definition, involve observing some non-deterministic behavior of a system. Conveniently, most software execution environments try hard to eliminate non-determinism, because non-deterministic execution makes applications unreliable.

However, there are a few sorts of non-determinism that are still common. The most obvious among these is timing. The industry long ago gave up on the idea that a program should take the same amount of time every time it runs, because deterministic timing is fundamentally at odds with heuristic performance optimization. Sure enough, most Spectre attacks focus on timing as a way to detect the hidden microarchitectural state of the CPU.

Some have proposed that we can solve this by making timers inaccurate or adding random noise. However, it turns out that this does not stop attacks; it only makes them slower. If the timer tracks real time at all, then anything you can do to make it inaccurate can be overcome by running an attack multiple times and using statistics to filter out the noise.

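A toy JavaScript example shows why: given enough repetitions, averaging washes the jitter out. The numbers here are made up purely to illustrate the statistics:

// Toy illustration: a noisy timer only slows the measurement down.
function noisyMeasure(trueDurationMs) {
  return trueDurationMs + (Math.random() - 0.5) * 10;   // +/- 5 ms of artificial jitter
}

function estimate(trueDurationMs, samples) {
  let total = 0;
  for (let i = 0; i < samples; i++) total += noisyMeasure(trueDurationMs);
  return total / samples;                                // converges on the true value
}

console.log(estimate(1.0, 10));       // wildly noisy
console.log(estimate(1.0, 100000));   // close to 1.0, despite the jitter
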
Many security researchers see this as the end of the story. What good is slowing down an attack, if the attack is still possible? Once the attacker gets your private key, it’s game over, right? What difference does it make if it takes them a minute or a month?

Cascading Slow-downs

We find that, actually, measures that slow down an attack can be powerful.

Our key insight is this: as an attack becomes slower, new techniques become practical to make it even slower still. The goal, then, is to chain together enough techniques that an attack becomes so slow as to be uninteresting.

Much of cryptography, after all, is technically vulnerable to "brute force" attacks — technically, with enough time, you can break it. But when the time required is thousands (or even billions) of years, we decide that this is good enough.

So, what do we do to slow down Spectre attacks to the point of meaninglessness?

Freezing a Spectre Attack

Workers Security

Step 0: Don’t allow native code

We do not allow our customers to upload native-code binaries to run on our network. We only accept JavaScript and WebAssembly. Of course, many other languages, like Python, Rust, or even Cobol, can be compiled or transpiled to one of these two formats; the important point is that we do another pass on our end, using V8, to convert these formats into true native code.

This, in itself, doesn’t necessarily make Spectre attacks harder. However, I present this as step 0 because it is fundamental to enabling everything else below.

Accepting native code programs implies being beholden to an existing CPU architecture (typically, x86). In order to execute code with reasonable performance, it is usually necessary to run the code directly on real hardware, severely limiting the host’s control over how that execution plays out. For example, a kernel or hypervisor has no ability to prohibit applications from invoking the CLFLUSH instruction, an instruction which is very useful in side channel attacks and almost nothing else.

Moreover, supporting native code typically implies supporting whole existing operating systems and software stacks, which bring with them decades of expectations about how the architecture works under them. For example, x86 CPUs allow a kernel or hypervisor to disable the RDTSC instruction, which reads a high-precision timer. Realistically, though, disabling it will break many programs because they are implemented to use RDTSC any time they want to know the current time.

Supporting native code would bind our hands in terms of mitigation techniques. By using an abstract intermediate format, we have much greater freedom.

Step 1: Disallow timers and multi-threading

In Workers, you can get the current time using the JavaScript Date API, for example by calling Date.now(). However, the time value returned by this is not really the current time. Instead, it is the time at which the network message was received which caused the application to begin executing. While the application executes, time is locked in place. For example, say an attacker writes:

let start = Date.now();
for (let i = 0; i < 1e6; i++) {
  doSpectreAttack();
}
let end = Date.now();

The values of start and end will always be exactly the same. The attacker cannot use Date to measure the execution time of their code, which they would need to do to carry out an attack.

As an aside: This is a measure we actually implemented in mid-2017, long before Spectre was announced (and before we knew about it). We implemented this measure because we were worried about timing side channels in general. Side channels have been a concern of the Workers team from day one, and we have designed our system from the ground up with this concern in mind.

Related to our taming of Date, we also do not permit multi-threading or shared memory in Workers. Everything related to the processing of one event happens on the same thread — otherwise, it would be possible to "race" threads in order to "MacGyver" an implicit timer. We don’t even allow multiple Workers operating on the same request to run concurrently. For example, if you have installed a Cloudflare App on your zone which is implemented using Workers, and your zone itself also uses Workers, then a request to your zone may actually be processed by two Workers in sequence. These run in the same thread.

So, we have prevented code execution time from being measured locally. However, that doesn’t actually prevent it from being measured: it can still be measured remotely. For example, the HTTP client that is sending a request to trigger the execution of the Worker can measure how long it takes for the Worker to respond. Of course, such a measurement is likely to be very noisy, since it would have to traverse the Internet. Such noise can be overcome, in theory, by executing the attack many times and taking an average.

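For instance, a client could use curl's built-in timing variables to measure end-to-end response time (the URL below is a placeholder):

curl -s -o /dev/null -w "%{time_total}\n" https://my-worker.example.workers.dev/

Each run prints the total request time in seconds; recovering any signal through that much network noise would require an enormous number of repetitions.
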
Another aside: Some people have suggested that if a serverless platform like Workers were to completely reset an application’s state between requests, so that every request "starts fresh", this would make attacks harder. That is, imagine that a Worker’s global variables were reset after every request, meaning you cannot store state in globals in one request and then read that state in the next. Then, doesn’t that mean the attack has to start over from scratch for every request? If each request is limited to, say, 50ms of CPU time, does that mean that a Spectre attack isn’t possible, because there’s not enough time to carry it out? Unfortunately, it’s not so simple. State doesn’t have to be stored in the Worker; it could instead be stored in a conspiring client. The server can return its state to the client in each response, and the client can send it back to the server in the next request.

But is an attack based on remote timers really feasible in practice? In adversarial testing, with help from leading Spectre experts, we have not been able to develop an attack that actually works in production.

However, we don’t feel the lack of a working attack means we should stop building defenses. Instead, we’re currently testing some more advanced measures, which we plan to roll out in the coming weeks.

Step 2: Dynamic Process Isolation

We know that if an attack is possible at all, it would take a very long time to run — hours at the very least, maybe as long as weeks. But once an attack has been running even for a second, we have a huge amount of new data that we can use to trigger further measures.

Spectre attacks, you see, do a lot of "weird stuff" that you wouldn’t usually expect to see in a normal program. These attacks intentionally try to create pathological performance scenarios in order to amplify microarchitectural effects. This is especially true when the attack has already been forced to run billions of times in a loop in order to overcome other mitigations, like those discussed above. This tends to show up in metrics like CPU performance counters.

Now, the usual problem with using performance metrics to detect Spectre attacks is that sometimes you get false positives. Sometimes, a legitimate program behaves really badly. You can’t go around shutting down every app that has bad performance.

Luckily, we don’t have to. Instead, we can choose to reschedule any Worker with suspicious performance metrics into its own process. As I described above, we can’t do this with every Worker, because the overhead would be too high. But, it’s totally fine to process-isolate just a few Workers, defensively. If the Worker is legitimate, it will keep operating just fine, albeit with a little more overhead. Fortunately for us, the nature of our platform is such that we can reschedule a Worker into its own process at basically any time.

In fact, fancy performance-counter based triggering may not even be necessary here. If a Worker merely uses a large amount of CPU time per event, then the overhead of isolating it in its own process is relatively less, because it switches context less often. So, we might as well use process isolation for any Worker that is CPU-hungry.

Once a Worker is isolated, we can rely on the operating system’s Spectre defenses, just as, for example, most desktop web browsers now do.

Over the past year we’ve been working with the experts at Graz Technical University to develop this approach. (TU Graz’s team co-discovered Spectre itself, and has been responsible for a huge number of the follow-on discoveries since then.) We have developed the ability to dynamically isolate workers, and we have identified metrics which reliably detect attacks. The whole system is currently undergoing testing to work out any remaining bugs, and we expect to roll it out fully within the next several weeks.

But wait, didn’t I say earlier that even process isolation isn’t a complete defense, because it only addresses known vulnerabilities? Yes, this is still true. However, the trend over time is that new Spectre attacks tend to be slower and slower to carry out, and hence we can reasonably guess that by imposing process isolation we have further slowed down even attacks that we don’t know about yet.

Step 3: Periodic Whole-Memory Shuffling

After Step 2, we already think we’ve prevented all known attacks, and we’re only worried about hypothetical unknown attacks. How long does a hypothetical unknown attack take to carry out? Well, obviously, nobody knows. But with all the mitigations in place so far, and considering that new attacks have generally been slower than older ones, we think it’s reasonable to guess attacks will take days or longer.

On a time scale of a day, we have new things we can do. In particular, it’s totally reasonable to restart the entire Workers runtime on a daily basis, which resets the locations of everything in memory, forcing attacks to restart the process of discovering the locations of secrets.

We can also reschedule Workers across physical machines or cordons, so that the window to attack any particular neighbor is limited.

In general, because Workers are fundamentally preemptible (unlike containers or VMs), we have a lot of freedom to frustrate attacks.

Once we have dynamic process isolation fully deployed, we plan to develop these ideas next. We see this as an ongoing investment, not something that will ever be "done".

Conclusion

Phew. You just read twelve pages about Workers security. Hopefully I’ve convinced you that designing a secure sandbox is only the beginning of building a secure compute platform, and the real work is never done. Popular security culture often dwells on clever hacks and clean fixes. But for the difficult real-world problems, often there is no right answer or simple fix, only the hard work of building defenses thicker and thicker.

The Migration of Legacy Applications to Workers

Post Syndicated from Kirk Schwenkler original https://blog.cloudflare.com/the-migration-of-legacy-applications-to-workers/

The Migration of Legacy Applications to Workers

The Migration of Legacy Applications to Workers

As Cloudflare Workers, and other Serverless platforms, continue to drive down costs while making it easier for developers to stand up globally scaled applications, the migration of legacy applications is becoming increasingly common. In this post, I want to show how easy it is to migrate such an application onto Workers. To demonstrate, I’m going to use a common migration scenario: moving a legacy application — on an old compute platform behind a VPN or in a private cloud — to a serverless compute platform behind zero-trust security.

Wait but why?

Before we dive further into the technical work, however, let me just address up front: why would someone want to do this? What benefits would they get from such a migration? In my experience, there are two sets of reasons: (1) factors that are “pushing” off legacy platforms, or the constraints and problems of the legacy approach; and (2) factors that are “pulling” onto serverless platforms like Workers, which speak to the many benefits of this new approach. In terms of the push factors, we often see three core ones:

  • Legacy compute resources are not flexible and must be constantly maintained, often leading to capacity constraints or excess cost;
  • Maintaining VPN credentials is cumbersome, and can introduce security risks if not done properly;
  • VPN client software can be challenging for non-technical users to operate.

Similarly, there are some very key benefits “pulling” folks onto Serverless applications and zero-trust security:

  • Instant scaling, up or down, depending on usage. No capacity constraints, and no excess cost;
  • No separate credentials to maintain, users can use Single Sign On (SSO) across many applications;
  • VPN hardware, private cloud, and existing compute can be retired to simplify operations and reduce cost

While the benefits to this more modern end-state are clear, there’s one other thing that causes organizations to pause: the costs in time and migration effort seem daunting. In practice, though, organizations often find that migration is not as difficult as they fear. In the rest of this post, I will show you how Cloudflare Workers, and the rest of the Cloudflare platform, can greatly simplify migrations and help you modernize all of your applications.

Getting Started

To take you through this, we will use a contrived application I’ve written in Node.js to illustrate the steps we would take with a real, and far more complex, example. The goal is to show the different tools and features you can use at each step, and how our platform design supports development and cutover of an application. We’ll use four key Cloudflare technologies as we see how to move this application off of my laptop and into the cloud:

  1. Serverless Compute through Workers
  2. Robust Developer-focused Tooling for Workers via Wrangler
  3. Zero-Trust security through Access
  4. Instant, Secure Origin Tunnels through Argo Tunnels

Our example application for today is called Post Process, and it performs business logic on input provided in an HTTP POST body. It takes the input data from authenticated clients, performs a processing task, and responds with the result in the body of an HTTP response. The server runs in Node.js on my laptop.

Since the example application is written in Node.js, we will be able to directly copy some of the JavaScript assets for our new application. You could follow this “direct port” method not only for JavaScript applications, but even for applications in our other WASM-supported languages. For other languages, you first need to rewrite or transpile into one with WASM support.

Into our Application

Our basic example will perform only simple text processing, so that we can focus on the broad features of the migration. I’ve set up an unauthenticated copy (using Workers, to give us a scalable and reliable place to host it) at https://postprocess-workers.kirk.workers.dev/postprocess where you can see how it operates. Here is an example cURL:

curl -X POST https://postprocess-workers.kirk.workers.dev/postprocess --data '{"operation":"2","data":"Data-Gram!"}'

The relevant takeaways from the code itself are pretty simple:

  • There are two code modules, which conveniently split the application logic completely from the Preprocessing / HTTP interface.
  • The application logic module exposes one function postProcess(object) where object is the parsed JSON of the POST body. It returns a JavaScript object, ready to be encoded into a string in the JSON HTTP response. This module can be run on Workers JavaScript, with no changes. It only needs a new preprocessing / HTTP interface.
  • The Preprocessing / HTTP interface runs on raw Node.js and exposes a local HTTP server. The server does not directly take inbound traffic from the Internet, but sits behind a gateway which controls access to the service.

Code snippet from Node.js HTTP module

const server = http.createServer((req, res) => {
    if (req.url == '/postprocess') {
        if(req.method == 'POST') {
                gatherPost(req, data => {
                        try{
                                jsonData = JSON.parse(data)
                        } catch (e) {
                                res.end('Invalid JSON payload! \n')
                                return
                        }
                        result = postProcess(jsonData)
                        res.write(JSON.stringify(result) + '\n');
                        res.end();
                })
        } else {
                res.end('Invalid Method, only POST is supported! \nPlease send a POST with data in format {"operation":"1","data":"Data-Gram!"} \n')
        }
    } else {
        res.end('Invalid request. Did you mean to POST to /postprocess? \n');
    }
});

Code snippet from Node.js logic module

function postProcess (postJson) {
        const ServerVersion = "2.5.17"
        if(postJson != null && 'operation' in postJson && 'data' in postJson){
                var output
                var operation = postJson['operation']
                var data = postJson['data']
                switch(operation){
                        case "1":
                              output = String(data).toLowerCase()
                              break
                        case "2":
                              d = data + "\n"
                              output = d + d + d
                              break
                        case "3":
                              output = ServerVersion
                              break
                        default:
                              output = "Invalid Operation"
                }
                return {'Output': output}
        }
        else{
                return {'Error':'Invalid request, invalid JSON format'}
        }
}

Current State Application Architecture

The Migration of Legacy Applications to Workers

Design Decisions

With all this information in hand, we can arrive at the details of our new Cloudflare-based design:

  1. Keep the business logic completely intact, and specifically use the same .js asset
  2. Build a new preprocessing layer in Workers to replace the Node.js module
  3. Use Cloudflare Access to authenticate users to our application

Target State Application Architecture

The Migration of Legacy Applications to Workers

Finding the first win

One good way to make a migration successful is to find a quick win early on: a useful task which can be executed while other work is still ongoing. It is even better if the quick win also benefits the eventual cutover. We can find a quick win here if we solve the zero-trust security problem ahead of the compute problem by putting Cloudflare’s security in front of the existing application.

We will do this by using Cloudflare’s Argo Tunnel feature to securely connect to the existing application, and Access for zero-trust authentication. Below, you can see how easy this process is for any command-line user with our cloudflared tool.

I open up a terminal and use cloudflared tunnel login, which presents me with an authentication flow. I then use the cloudflared tunnel --hostname postprocess.kschwenkler.com --url localhost:8080 command to connect an Argo Tunnel between the “url” (my local server) and the “hostname” (the new, public address we will use on my Cloudflare zone).

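For reference, the two commands described above look like this in the terminal (the hostname is the one used throughout this example):

cloudflared tunnel login
cloudflared tunnel --hostname postprocess.kschwenkler.com --url localhost:8080
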
The Migration of Legacy Applications to Workers

Next I flip over to my Cloudflare dashboard and attach an Access Policy to the “hostname” I specified before. We will be using the Service Token mode of Access, which generates a client-specific security token that the client can attach to each HTTP POST. Other modes are better suited to interactive browser use cases.

The Migration of Legacy Applications to Workers

Now, without using the VPN, I can send a POST to the service, still running on Node.js on my laptop, from any Internet-connected device which has the correct token! It has taken only a few minutes to add zero-trust security to this application and safely expose it to the Internet while still running on a legacy compute platform (my laptop!).

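Assuming the standard Access service token headers, such a request would look roughly like the cURL below, with the placeholders replaced by the Client ID and Client Secret generated for the service token:

curl -X POST https://postprocess.kschwenkler.com/postprocess \
  -H "CF-Access-Client-Id: <service-token-client-id>" \
  -H "CF-Access-Client-Secret: <service-token-client-secret>" \
  --data '{"operation":"2","data":"Data-Gram!"}'
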
The Migration of Legacy Applications to Workers

“Quick Win” Architecture

The Migration of Legacy Applications to Workers

Beyond the direct benefit of a huge security upgrade, we’ve also made our eventual application migration much easier by putting the traffic through the target-state API gateway already. We will see later how we can surgically move traffic to the new application for testing in this state.

Lift to the Cloud

With our zero-trust security benefits in hand and our traffic running through Cloudflare, we can now proceed with the migration of the application itself to Workers. We’ll be using the Wrangler tooling to make this process very easy.

As noted when we first looked at the code, this contrived application exposes a very clean interface between the Node.js-specific HTTP module, which we need to replace, and the business logic postprocess module which we can use as is with Workers. We’ll first need to re-write the HTTP module, and then bundle it with the existing business logic into a new Workers application.

Here is a handwritten example of the basic pattern we’ll use for the HTTP module. We can see how the Service Workers API makes it very easy to grab the POST body with await, and how the JSON interface lets us easily pass the data to the postprocess module we took directly from the initial Node.js app.

addEventListener('fetch', event => {
 event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
 let requestData
 try {
   requestData = await request.json()
 } catch (e) {
   return new Response("Invalid JSON", {status:500})
 }
 const response = new Response(JSON.stringify(postProcess (requestData)))
 return response
}

For our work on the mock application, we will go a slightly different route, more in line with a real application, which would be more complex. Instead of writing this by hand, we will use Wrangler and our Router template to build the new front end from a robust framework.

We’ll run wrangler generate post-process-workers https://github.com/cloudflare/worker-template-router to initialize a new Wrangler project with the Router template. Most of the configurations for this template will work as is, and we just have to update account_id in our wrangler.toml and make a few small edits to the code in index.js.

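For orientation, a wrangler.toml for this project would look roughly like the sketch below. The exact fields depend on the Wrangler version and the template defaults, and the account ID is a placeholder:

name = "post-process-workers"
type = "webpack"
account_id = "<your-account-id>"
workers_dev = true
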
Below is our index.js after my edits. Note the line const postProcess = require('./postProcess.js') at the start of the new HTTP module. This will tell Wrangler to include the original business logic from the legacy app’s postProcess.js module, which I will copy to our working directory.

const Router = require('./router')
const postProcess = require('./postProcess.js')

addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request))
})

async function handler(request) {
    const init = {
        headers: { 'content-type': 'application/json' },
    }
    const body = JSON.stringify(postProcess(await request.json()))
    return new Response(body, init)
}

async function handleRequest(request) {
    const r = new Router()
    r.post('.*/postprocess*', request => handler(request))
    r.get('/', () => new Response('Hello worker!')) // return a default message for the root route

    const resp = await r.route(request)
    return resp
}

Now we can simply run wrangler publish to put our application on workers.dev for testing! The Router template’s defaults, plus the small edits made above, are all we need. Since Wrangler automatically exposes the test application to the Internet (note that we can *also* put the test application behind Access, with a slightly modified method), we can easily send test traffic from any device.

The Migration of Legacy Applications to Workers

Shift, Safely!

With our application up for testing on workers.dev, we finally come to the last and most daunting migration step: cutting over traffic from the legacy application to the new one without any service interruption.

Luckily, we had our quick win earlier and are already routing our production traffic through the Cloudflare network (to the legacy application via Argo Tunnels). This provides huge benefits now that we are at the cutover step. Without changing our IP address, SSL configuration, or any other client-facing properties, we can route traffic to the new application with just one wrangler command.

Seamless cutover from Transition to Target state

The Migration of Legacy Applications to Workers

We simply modify wrangler.toml to indicate the production domain / route we’d like the application to operate on, and run wrangler publish. As soon as Cloudflare receives this update, it will send production traffic to our new application instead of the Argo Tunnel. We have configured the application to send a ‘version’ header which lets us verify this easily using curl.

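The change itself is small. Assuming standard Wrangler 1.x configuration fields, the production section of wrangler.toml would look roughly like this, with a placeholder for the real zone ID:

workers_dev = false
route = "postprocess.kschwenkler.com/postprocess*"
zone_id = "<your-zone-id>"
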
The Migration of Legacy Applications to Workers
The Migration of Legacy Applications to Workers

Rollback, if it is needed, is also very easy. We can either set wrangler.toml back to the workers.dev-only mode and run wrangler publish again, or delete our route manually. Either will send traffic back to the Argo Tunnel.

The Migration of Legacy Applications to Workers
The Migration of Legacy Applications to Workers
The Migration of Legacy Applications to Workers

In Conclusion

Clearly, a real application will be more complex than our example above. It may have multiple components, with complex interactions, which must each be handled in turn. Argo Tunnel might remain in use, to connect to a data store or other application outside of our network. We might use WASM to support modules written in other languages. In any of these scenarios, Cloudflare’s Wrangler tooling and Serverless capabilities will help us work through the complexities and achieve success.

I hope that this simple example has helped you to see how Wrangler, cloudflared, Workers, and our entire global network can work together to make migrations as quick and hassle-free as possible. Whether for this case of an old application behind a VPN, or another application that has outgrown its current home – our Workers platform, Wrangler tooling, and underlying platform will scale to meet your business needs.

Trailblazing a Development Environment for Workers

Post Syndicated from Avery Harnish original https://blog.cloudflare.com/trailblazing-a-development-environment-for-workers/

Trailblazing a Development Environment for Workers

Trailblazing a Development Environment for Workers

When I arrived at Cloudflare for an internship in the summer of 2018, I was taken on a tour, introduced to my mentor who took me out for coffee (shoutout to Preston), and given a quick whiteboard overview of how Cloudflare works. Each of the interns would work on a small project of their own and they’d try to finish them by the end of the summer. The description of the project I was given on my very first day read something along the lines of “implementing signed exchanges in a Cloudflare Worker to fix the AMP URL attribution problem,” which was a lot to take in at once. I asked so many questions those first couple of weeks. What are signed exchanges? Can I put these stickers on my laptop? What’s a Cloudflare Worker? Is there a limit to how much Topo Chico I can take from the fridge? What’s the AMP URL attribution problem? Where’s the bathroom?

I got the answers to all of those questions (and more!) and eventually landed a full-time job at Cloudflare. Here’s the story of my internship and working on the Workers Developer Experience team at Cloudflare.

Getting Started with Workers in 2018

After doing a lot of reading, and asking a lot more questions, it was time to start coding. I set up a Cloudflare account with a Workers subscription, and was greeted with a page that looked something like this:

Trailblazing a Development Environment for Workers

I was able to change the code in the text area on the left, click “Update”, and the changes would be reflected on the right — fairly self-explanatory. There was also a testing tab which allowed me to handcraft HTTP requests with different methods and custom headers. So far so good.

As my project evolved, it became clear that I needed to leave the Workers editor behind. Anything more than a one-off script tends to require JavaScript modules and multiple files. I spent some time setting up a local development environment for myself with npm and webpack (see, purgatory: a place or state of temporary suffering. merriam-webster.com).

After I finally got everything working, my iteration cycle looked a bit like this:

  1. Make a change to my code
  2. Run npm run build (which ran webpack and bundled my code in a single script)
  3. Open ./dist/worker.min.js (the output from my build step)
  4. Copy the entire contents of the built Worker to my clipboard
  5. Switch to the Cloudflare Workers Dashboard
  6. Paste my script into the Workers editor
  7. Click update
  8. Investigate the behavior of my recently modified script
  9. Rinse and repeat

There were two main things here that were decidedly not a fantastic developer experience:

  1. Inspecting the value of a variable by adding a console.log statement would take me ~2-3 minutes and involve lots of manual steps to perform a full rebuild.
  2. I was unable to use familiar HTTP clients such as cURL and Postman without deploying to production. This was because the Workers Preview UI was an iframe nested in the dashboard.

Luckily for me, Cloudflare Workers deploy globally incredibly quickly, so I could push the latest iteration of my Worker, wait just a few seconds for it to go live, and cURL away.

A Better Workers Developer Experience in 2019

Shortly after we shipped AMP Real URL, Cloudflare released Wrangler, the official CLI tool for developing Workers, and I was hired full time to work on it. Wrangler came with a feature that automated steps 2-7 of my workflow by running the command wrangler preview, which was a significant improvement. Running the command would build my Worker and open the browser automatically for me so I could see log messages and test out HTTP requests. That summer, our intern Matt Alonso created wrangler preview --watch. This command automatically updates the Workers preview window when changes are made to your code. You can read more about that here. This was, yet again, another improvement over my old friend Build and Open and Copy and Switch Windows and Paste Forever and Ever, Amen. But there was still no way that I could test my Worker with any HTTP client I wanted without deploying to production — I was still locked in to using the nested iframe.

A few months ago we decided it was time to do something about it. To the whiteboard!

Enter wrangler dev

Most web developers are familiar with developing their applications on localhost, and since Wrangler is written in Rust, it means we could start up a server on localhost that would handle requests to a Worker. The idea was to somehow start a server on localhost and then transform incoming requests and send them off to a preview session running on a Cloudflare server.

Proof of Concept

What we came up with ended up looking a little something like this — when a developer runs wrangler dev, do the following:

Trailblazing a Development Environment for Workers

  1. Build the Worker
  2. Upload the Worker via the Cloudflare API as a previewable Worker
  3. The Cloudflare API takes the uploaded script and creates a preview session, and returns an access token
  4. Start listening for incoming HTTP requests at localhost:8787

Top secret fact: 8787 spells out Rust on a phone numpad. Happy Easter!

  5. All incoming requests to localhost:8787 are modified (see the sketch after this list):

  • All headers are prepended with cf-ew-raw- (for instance, X-Auth-Header would become cf-ew-raw-X-Auth-Header)
  • The URL is changed to https://rawhttp.cloudflareworkers.com/${path}
  • The Host header is changed to rawhttp.cloudflareworkers.com
  • The cf-ew-preview header is added with the access token returned from the API in step 3

  6. After sending this request, the response is modified:

  • All headers not prefixed with cf-ew-raw- are discarded and headers with the prefix have it removed (for instance, cf-ew-raw-X-Auth-Success would become X-Auth-Success)

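Putting those steps together, here is a rough JavaScript sketch of the rewriting. The real implementation lives in Wrangler's Rust code, and the function names here are invented for illustration:

// Outbound: wrap the local request for the preview session.
function toPreviewRequest(request, previewToken) {
  const url = new URL(request.url);
  const headers = new Headers();
  for (const [name, value] of request.headers) {
    headers.set(`cf-ew-raw-${name}`, value);               // prefix every original header
  }
  headers.set('cf-ew-preview', previewToken);              // token from the preview session API
  // Pointing the URL at rawhttp.cloudflareworkers.com also takes care of the Host header.
  return new Request(`https://rawhttp.cloudflareworkers.com${url.pathname}`, {
    method: request.method,
    headers,
    body: ['GET', 'HEAD'].includes(request.method) ? undefined : request.body,
  });
}

// Inbound: unwrap the preview response before handing it back to the local client.
function fromPreviewResponse(response) {
  const headers = new Headers();
  for (const [name, value] of response.headers) {
    if (name.startsWith('cf-ew-raw-')) {
      headers.set(name.slice('cf-ew-raw-'.length), value); // strip the prefix, keep the header
    }                                                      // unprefixed headers are dropped
  }
  return new Response(response.body, { status: response.status, headers });
}
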
The hard part here was already done — the Workers Core team had already implemented the API to support the Preview UI. We just needed to gently nudge Wrangler and the API to be the best of friends. After some investigation into Rust’s HTTP ecosystem, we settled on using the HTTP library hyper, which I highly recommend if you’re in need of a low level HTTP library — it’s fast, correct, and the ergonomics are constantly improving. After a bit of work, we got a prototype working and carved Wrangler ❤️ Cloudflare API into the old oak tree down by Lady Bird Lake.

Usage

Let’s say I have a Workers script that looks like this:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  let message = "Hello, World!"
  return new Response(message)
}

If I created a Wrangler project with this code and ran wrangler dev, this is what it looked like:

$ wrangler dev
👂  Listening on http://127.0.0.1:8787

In another terminal session, I could run the following:

$ curl localhost:8787
Hello, World!

It worked! Hooray!

Just the Right Amount of Scope Creep

At this point, our initial goal was complete: any HTTP client could test out a Worker before it was deployed. However, wrangler dev was still missing crucial functionality. When running wrangler preview, it’s possible to view console.log output in the browser editor. This is incredibly useful for debugging Workers applications, and something with a name like wrangler dev should include a way to view those logs as well. “This will be easy,” I said, not yet knowing what I was signing up for. Buckle up!

console.log, V8, and the Chrome Devtools Protocol, Oh My!

My first goal was to get a Hello, World! message streamed to my terminal session so that developers can debug their applications using wrangler dev. Let’s take the script from earlier and add a console.log statement to it:

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  let message = "Hello, World!"
  console.log(message) // this line is new
  return new Response(message)
}

If you’d like to follow along, you can paste that script into the editor at cloudflareworkers.com using Google Chrome.

This is what the Preview editor looks like when that script is run:

Trailblazing a Development Environment for Workers

You can see that Hello, World! has been printed to the console. This may not be the most useful example, but in more complex applications logging different variables is helpful for debugging. If you’re following along, try changing console.log(message) to something more interesting, like console.log(request.url).

The console may look familiar to you if you’re a web developer because it’s the same interface you see when you open the Developer Tools in Google Chrome. Since Cloudflare Workers is built on top of V8 (more info about that here and here), the Workers runtime is able to create a WebSocket that speaks the Chrome Devtools Protocol. This protocol allows the client (your browser, Wrangler, or anything else that supports WebSockets) to send and receive messages that contain information about the script that is running.

In order to see the messages that are being sent back and forth between our browser and the Workers runtime:

  1. Open Chrome Devtools
  2. Click the Network tab at the top of the inspector
  3. Click the filter icon underneath the Network tab (it looks like a funnel and is nested between the cancel icon and the search icon)
  4. Click WS to filter out all requests but WebSocket connections

Your inspector should look like this:

Trailblazing a Development Environment for Workers

Then, reload the page, and select the /inspect item to view its messages. It should look like this:

Trailblazing a Development Environment for Workers

Hey look at that! We can see messages that our browser sent to the Workers runtime to enable different portions of the developer tools for this Worker, and we can see that the runtime sent back our Hello, World! Pretty cool!

On the Wrangler side of things, all we had to do to get started was initialize a WebSocket connection for the current Worker, and send a message with the method Runtime.enable so the Workers runtime would enable the Runtime domain and start sending console.log messages from our script.

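In JavaScript terms, that initial exchange looks roughly like the sketch below. The inspector URL is a placeholder, and the message shapes follow the public Chrome Devtools Protocol Runtime domain:

// Hedged sketch of the Devtools Protocol handshake (placeholder URL).
const ws = new WebSocket('wss://example-inspect-endpoint/inspect');

ws.addEventListener('open', () => {
  // Ask the runtime to start emitting Runtime events, including console API calls.
  ws.send(JSON.stringify({ id: 1, method: 'Runtime.enable' }));
});

ws.addEventListener('message', event => {
  const message = JSON.parse(event.data);
  if (message.method === 'Runtime.consoleAPICalled') {
    // Each console.log from the Worker arrives as an event carrying its arguments.
    console.log(message.params.args.map(arg => arg.value).join(' '));
  }
});
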
After those initial steps, it quickly became clear that a lot more work was needed to get to a useful developer tool. There’s a lot that goes into the Chrome Devtools Inspector and most of the libraries for interacting with it are written in languages other than Rust (which we use for Wrangler). We spent a lot of time switching WebSocket libraries due to incompatibilities across operating systems (turns out TLS is hard) and implementing the part of the Chrome Devtools Protocol in Rust that we needed to. There’s a lot of work that still needs to be done in order to make wrangler dev a top notch developer tool, but we wanted to get it into the hands of developers as quickly as possible.

Try it Out!

wrangler dev is currently in alpha, and we’d love it if you could try it out! You should first check out the Quick Start and then move on to wrangler dev. If you run into issues or have any feedback, please let us know!

Signing Off

I’ve come a long way from where I started in 2018 and so has the Workers ecosystem. It’s been awesome helping to improve the developer experience of Workers for future interns, internal Cloudflare teams, and of course our customers. I can’t wait to see what we do next. I have some ideas for what’s next with Wrangler, so stay posted!

P.S. Wrangler is also open source, and we are more than happy to field bug reports, feedback, and community PRs. Check out our Contribution Guide if you want to help out!

Migrating to React land: Gatsby

Post Syndicated from Victoria Bernard original https://blog.cloudflare.com/migrating-to-react-land-gatsby/

Migrating to React land: Gatsby

Migrating to React land: Gatsby

I am an engineer that loves docs. Well, OK, I don’t love all docs but I believe docs are a crucial, yet often neglected element to a great developer experience. I work on the developer experience team for Cloudflare Workers focusing on several components of Workers, particularly on the docs that we recently migrated to Gatsby.


Through porting our documentation site to Gatsby I learned a lot. In this post, I share some of the learnings that could’ve saved my former self from several headaches. This will hopefully help others considering a move to Gatsby or another static site generator.

Why Gatsby?

Prior to our migration to Gatsby, we used Hugo for our developer documentation. There are a lot of positives about working with Hugo – fast build times, fast load times – that made building a simple static site a great use case for Hugo. Things started to turn sour when we started making our docs more interactive and expanding the content being generated.

Going from writing JSX with TypeScript back to string-based templating languages is difficult. Trying to perform complicated tasks, like generating a sidebar, cost me – a developer who knows nothing about liquid code or Go templating (though with Golang experience) – several tears not even to implement but to just understand what was happening.

Here is the code to template an item in the sidebar in Hugo:

<!-- templates -->
{{ define "section-tree-nav" }}
{{ $currentNode := .currentnode }}
{{ with .sect }}
 {{ if not .Params.Hidden }}
  {{ if .IsSection }}
    {{safeHTML .Params.head}}
    <li data-nav-id="{{.URL}}" class="dd-item
        {{ if .IsAncestor $currentNode }}parent{{ end }}
        {{ if eq .UniqueID $currentNode.UniqueID}}active{{ end }}
        {{ if .Params.alwaysopen}}parent{{ end }}
        {{ if .Params.alwaysopen}}always-open{{ end }}
        ">
      <a href="{{ .RelPermalink}}">
        <span>{{safeHTML .Params.Pre}}{{.Title}}{{safeHTML .Params.Post}}</span>
 
        {{ if .Params.new }}
          <span class="new-badge">NEW</span>
        {{ end }}
 
        {{ $numberOfPages := (add (len .Pages) (len .Sections)) }}
        {{ if ne $numberOfPages 0 }}
 
          {{ if or (.IsAncestor $currentNode) (.Params.alwaysopen)  }}
            <i class="triangle-up"></i>
          {{ else }}
            <i class="triangle-down"></i>
          {{ end }}
 
        {{ end }}
      </a>
      {{ if ne $numberOfPages 0 }}
        <ul>
          {{ .Scratch.Set "pages" .Pages }}
          {{ if .Sections}}
          {{ .Scratch.Set "pages" (.Pages | union .Sections) }}
          {{ end }}
          {{ $pages := (.Scratch.Get "pages") }}
 
        {{ if eq .Site.Params.ordersectionsby "title" }}
          {{ range $pages.ByTitle }}
            {{ if and .Params.hidden (not $.showhidden) }}
            {{ else }}
            {{ template "section-tree-nav" dict "sect" . "currentnode" $currentNode }}
            {{ end }}
          {{ end }}
        {{ else }}
          {{ range $pages.ByWeight }}
            {{ if and .Params.hidden (not $.showhidden) }}
            {{ else }}
            {{ template "section-tree-nav" dict "sect" . "currentnode" $currentNode }}
            {{ end }}
          {{ end }}
        {{ end }}
        </ul>
      {{ end }}
    </li>
  {{ else }}
    {{ if not .Params.Hidden }}
      <li data-nav-id="{{.URL}}" class="dd-item
     {{ if eq .UniqueID $currentNode.UniqueID}}active{{ end }}
      ">
        <a href="{{.RelPermalink}}">
        <span>{{safeHTML .Params.Pre}}{{.Title}}{{safeHTML .Params.Post}}</span>
        {{ if .Params.new }}
          <span class="new-badge">NEW</span>
        {{ end }}
 
        </a></li>
     {{ end }}
  {{ end }}
 {{ end }}
{{ end }}
{{ end }}

Whoa. I may be exceptionally oblivious, but I had to squint at the snippet above for an hour before I realized this was the code for a sidebar item (the li element was the eventual giveaway, but took some parsing to discover where the logic actually started).

(Disclaimer: I am in no way a pro at Hugo and in any situation there are always several ways to code a solution; thus I am in no way claiming this was the only way to write the template nor am I chastising the author of the code. I am just displaying the differences in pieces of code I came across)

Now, here is what the TSX (I will get into the JS later in the article) for the Gatsby project using the exact same styling would look like:

 <li data-nav-id={pathToServe} className={'dd-item ' + ddClass}>
   <Link className="" to={pathToServe} title="Docs Home" activeClassName="active">
     {title || 'No title'}
     {numberOfPages ? <Triangle isAncestor={isAncestor} alwaysopen={showChildren} /> : ''}
     {showNew ? <span className="new-badge">NEW</span> : ''}
   </Link>
   {showChildren ? (
     <ul>
       {' '}
       {myChildren.map((child: mdx) => {
         return (
           <SidebarLi
             frontmatter={child.frontmatter}
             fields={child.fields}
             depth={++depth}
             key={child.frontmatter.title}
           />
         )
       })}
     </ul>
   ) : (
     ''
   )}
 </li>

This code is clean and compact because Gatsby is a static content generation tool based on React. It’s loved for a myriad of reasons, but my honest main reason to migrate to it was to make the Hugo code above much less ugly.

For our purposes, less ugly was important because we had dreams of redesigning our docs to be interactive with support for multiple coding languages and other features.

For example, the template gallery would be a place to go to for how-to recipes and examples. The templates themselves would live in a template registry service and turn into static pages via an API.

We wanted the docs to not be constrained by Go templating. The Hugo docs admit their templates aren’t the best for complicated logic:

Go Templates provide an extremely simple template language that adheres to the belief that only the most basic of logic belongs in the template or view layer.

Gatsby and React enable the more complex logic we were looking for. After our team built workers.cloudflare.com and Built with Workers on Gatsby, I figured this was my shot to really give Gatsby a try on our Workers developer docs.

Decision to Migrate over Starting from Scratch

I’m normally not a fan of fixing things that aren’t broken. Though I didn’t like working with Hugo, did love working in React, and had all the reasons to make the switch, I was timid about being the one in charge of switching from Hugo. I was scared. I hated looking at the liquid code of Go templates. I didn’t want to have to port all the existing templates to React without truly understanding what I might be missing.

There comes a point with tech debt though where you have to tackle the tech debt you are most scared of.

The easiest solution would be of course to throw the Hugo code away. Start from scratch. A clean slate. But this means taking something that was not broken and breaking it. The styling, SEO, tagging, and analytics of the site took small iterations over the course of a few years to get right, and I didn’t want to be the one to break them. Instead of throwing away all the styling and logic tied in for search, SEO, etc., our plan was to maintain as much of the current design and logic as possible while converting it to React piece-by-piece, component-by-component.

Also there were existing developer docs still using Hugo on Cloudflare by other teams (e.g. Access, Argo Tunnel, etc…). I wanted a team at Cloudflare to be able to import their existing markdown files with frontmatter into the Gatsby repo and preserve the existing design.

I wanted to migrate instead of teleport to Gatsby.

How-to: Hugo to Gatsby

In this blog post, I go through some but not all of the steps of how I ported to Gatsby from Hugo for our complex doc site. The few examples here help to convey the issues that caused the most pain.

Let’s start with getting the markdown files to turn into HTML pages.

Markdown

One goal was to keep all the existing markdown and frontmatter we had set up in Hugo as similar as possible. The reasoning for this was to not break existing content and also maintain the version history of each doc.

Gatsby is built on top of GraphQL. All the data and almost all of the content for Gatsby is put into GraphQL during startup, likely via a plugin; Gatsby then queries for this data upon actual page creation. This is quite different from Hugo’s much more abstract model of putting all your content in a folder named content and letting Hugo figure out which template to apply based on the logic in the template.

MDX is a sophisticated tool that parses markdown into Gatsby so it can later be represented as HTML (it can actually do much more than that, but I won’t get into it here). I started with Gatsby’s MDX plugin to create nodes from my markdown files. Here is the code to set up the plugin to get all the markdown files (files ending in .md and .mdx) I had in the src/content folder into GraphQL:

gatsby-config.js

const path = require('path')
 
module.exports = {
 plugins: [
   {
     resolve: `gatsby-source-filesystem`,
     options: {
       name: `mdx-pages`,
       path: `${__dirname}/src/content`,
       ignore: [`**/CONTRIBUTING*`, '/styles/**'],
     },
   },
   {
     resolve: `gatsby-plugin-mdx`,
     options: {
       extensions: [`.mdx`, `.md`],
     },
   }, 
]}

Now that Gatsby knows about these files as nodes, we can create pages for them. In gatsby-node.js, I tell Gatsby to grab these MDX pages and use a template markdownTemplate.tsx to create pages for them:

const path = require(`path`)
const { createFilePath } = require(`gatsby-source-filesystem`)
exports.createPages = async ({ actions, graphql, reporter }) => {
 const { createPage } = actions
 
 const markdownTemplate = path.resolve(`src/templates/markdownTemplate.tsx`)
 
 const result = await graphql(`
   {
     allMdx(limit: 1000) {
       edges {
         node {
           fields {
             pathToServe
           }
           frontmatter {
             alwaysopen
             weight
           }
           fileAbsolutePath
         }
       }
     }
   }
 `)
 // Handle errors
 if (result.errors) {
   reporter.panicOnBuild(`Error while running GraphQL query.`)
   return
 }
 result.data.allMdx.edges.forEach(({ node }) => {
   return createPage({
     path: node.fields.pathToServe,
     component: markdownTemplate,
     context: {
       parent: node.fields.parent,
       weight: node.frontmatter.weight,
     }, // additional data can be passed via context, can use as variable on query
   })
 })
}
exports.onCreateNode = ({ node, getNode, actions }) => {
 const { createNodeField } = actions
 // Ensures we are processing only markdown files
 if (node.internal.type === 'Mdx') {
   // Use `createFilePath` to turn markdown files in our `content` directory into `/workers/`pathToServe
   const originalPath = node.fileAbsolutePath.replace(
     node.fileAbsolutePath.match(/.*content/)[0],
     ''
   )
   let pathToServe = createFilePath({
     node,
     getNode,
     basePath: 'content/',
   })
   let parentDir = path.dirname(pathToServe)
   if (pathToServe.includes('index')) {
     pathToServe = parentDir
     parentDir = path.dirname(parentDir) // "/" dirname will = "/"
   }
   pathToServe = pathToServe.replace(/\/+$/, '/') // always end the path with a slash
   // Creates new query'able field with name of 'pathToServe', 'parent'..
   // for allMdx edge nodes
   createNodeField({
     node,
     name: 'pathToServe',
     value: `/workers${pathToServe}`,
   })
   createNodeField({
     node,
     name: 'parent',
     value: parentDir,
   })
   createNodeField({
     node,
     name: 'filePath',
     value: originalPath,
   })
 }
}

Now every time Gatsby runs, it calls onCreateNode for each node. If the node is MDX, it passes the node’s content (the markdown, fileAbsolutePath, etc.) and all the node fields (filePath, parent, and pathToServe) to the markdownTemplate.tsx component so that the component can render the appropriate information for that markdown file.

The barebones component for a page that renders a React component from the MDX node looks like this:

markdownTemplate.tsx

import React from "react"
import { graphql } from "gatsby"
import { MDXRenderer } from "gatsby-plugin-mdx"
 
export default function PageTemplate({ data: { mdx } }) {
 return (
   <div>
     <h1>{mdx.frontmatter.title}</h1>
     <MDXRenderer>{mdx.body}</MDXRenderer>
   </div>
 )
}
 
export const pageQuery = graphql`
 query BlogPostQuery($id: String) {
   mdx(id: { eq: $id }) {
     id
     body
     frontmatter {
       title
     }
   }
 }
`

A Complex Component: Sidebar

Now let’s get into where I wasted the most time, but learned hard lessons upfront: turning the Hugo template into a React component. At the beginning of this article, I showed that scary sidebar.

To set up the li element, the Hugo logic looks like this:

{{ define "section-tree-nav" }}
{{ $currentNode := .currentnode }}
{{ with .sect }}
 {{ if not .Params.Hidden }}
  {{ if .IsSection }}
    {{safeHTML .Params.head}}
    <li data-nav-id="{{.URL}}" class="dd-item
        {{ if .IsAncestor $currentNode }}parent{{ end }}
        {{ if eq .UniqueID $currentNode.UniqueID}}active{{ end }}
        {{ if .Params.alwaysopen}}parent{{ end }}
        {{ if .Params.alwaysopen}}always-open{{ end }}
        ">

I see that the code is defining some section-tree-nav component-like thing and taking in some currentNode. To be honest, I still don’t know exactly what the variables .sect, IsSection, Params.head, Params.Hidden mean. Although I can take a wild guess, they’re not that important for understanding what the logic is doing. The logic is setting the classes on the li element which is all I really care about: parent, always-open and active.

When focusing on those three classes, we can port them to React in a much more readable way by defining a variable string ddClass:

 let ddClass = ''
 let isAncestor = numberOfPages > 0
 if (isAncestor) {
   ddClass += ' parent'
 }
 if (frontmatter.alwaysopen) {
   ddClass += ' parent alwaysOpen'
 }
 return (
   <Location>
     {({ location }) => {
       const currentPathActive = location.pathname === pathToServe
       if (currentPathActive) {
         ddClass += ' active'
       }
       return (
         <li data-nav-id={pathToServe} className={'dd-item ' + ddClass}>

There are actually a few nice things about the Hugo code, I admit. Using the Location component in React was probably less intuitive than Hugo’s ability to access currentNode to get the active page. Also, isAncestor is predefined in Hugo as “whether the current page is an ancestor of the given page.” For me though, having to track down the definitions of the predefined variables was frustrating; I appreciate the local explicitness of the React version, but I admit I’m a bit jaded.

Children

The most complex part of the sidebar is getting the children. This is where I really started to appreciate GraphQL.

Here’s getting the children for the sidebar in Hugo:

    {{ $numberOfPages := (add (len .Pages) (len .Sections)) }}
        {{ if ne $numberOfPages 0 }}
 
          {{ if or (.IsAncestor $currentNode) (.Params.alwaysopen)  }}
            <i class="triangle-up"></i>
          {{ else }}
            <i class="triangle-down"></i>
          {{ end }}
 
        {{ end }}
      </a>
      {{ if ne $numberOfPages 0 }}
        <ul>
          {{ .Scratch.Set "pages" .Pages }}
          {{ if .Sections}}
          {{ .Scratch.Set "pages" (.Pages | union .Sections) }}
          {{ end }}
          {{ $pages := (.Scratch.Get "pages") }}
 
        {{ if eq .Site.Params.ordersectionsby "title" }}
          {{ range $pages.ByTitle }}
            {{ if and .Params.hidden (not $.showhidden) }}
            {{ else }}
            {{ template "section-tree-nav" dict "sect" . "currentnode" $currentNode }}
            {{ end }}
          {{ end }}
        {{ else }}
          {{ range $pages.ByWeight }}
            {{ if and .Params.hidden (not $.showhidden) }}
            {{ else }}
            {{ template "section-tree-nav" dict "sect" . "currentnode" $currentNode }}
            {{ end }}
          {{ end }}
        {{ end }}
        </ul>
      {{ end }}
    </li>
  {{ else }}
    {{ if not .Params.Hidden }}
      <li data-nav-id="{{.URL}}" class="dd-item
     {{ if eq .UniqueID $currentNode.UniqueID}}active{{ end }}
      ">
        <a href="{{.RelPermalink}}">
        <span>{{safeHTML .Params.Pre}}{{.Title}}{{safeHTML .Params.Post}}</span>
        {{ if .Params.new }}
          <span class="new-badge">NEW</span>
        {{ end }}
 
        </a></li>
     {{ end }}
  {{ end }}
 {{ end }}
{{ end }}
{{ end }}

This is just the first layer of children. No grandbabies, sorry. And I won’t even get into all that is going on there exactly. When I started porting this over, I realized a lot of that logic was not even being used.

In React, we grab all the markdown pages and see which have parents that match the current page:

 const topLevelMarkdown: markdownRemarkEdge[] = useStaticQuery(
   graphql`
     {
       allMdx(limit: 1000) {
         edges {
           node {
             frontmatter {
               title
               alwaysopen
               hidden
               showNew
               weight
             }
             fileAbsolutePath
             fields {
               pathToServe
               parent
               filePath
             }
           }
         }
       }
     }
   `
 ).allMdx.edges
 const myChildren: mdx[] = topLevelMarkdown
   .filter(
     edge =>
       fields.pathToServe === '/workers' + edge.node.fields.parent &&
       fields.pathToServe !== edge.node.fields.pathToServe
   )
   .map(child => child.node)
   .filter(child => !child.frontmatter.hidden)
   .sort(sortByWeight)
 const numberOfPages = myChildren.length
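
The sortByWeight helper used above isn’t shown in this post. A minimal sketch of what such a comparator could look like, assuming only the weight and title frontmatter fields from the snippets above (everything else here is an illustration, not the exact code we shipped):

const sortByWeight = (a: mdx, b: mdx) => {
  // Pages without a weight sort after weighted ones
  const weightA = a.frontmatter.weight != null ? a.frontmatter.weight : Number.MAX_SAFE_INTEGER
  const weightB = b.frontmatter.weight != null ? b.frontmatter.weight : Number.MAX_SAFE_INTEGER
  if (weightA !== weightB) return weightA - weightB
  // Tie-break alphabetically so sibling order stays stable
  return (a.frontmatter.title || '').localeCompare(b.frontmatter.title || '')
}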

And then we render the children, so the full JSX becomes:

<li data-nav-id={pathToServe} className={'dd-item ' + ddClass}>
   <Link
     to={pathToServe}
     title="Docs Home"
     activeClassName="active"
   >
     {title || 'No title'}
     {numberOfPages ? (
       <Triangle isAncestor={isAncestor} alwaysopen={showChildren} />
     ) : (
       ''
     )}
     {showNew ? <span className="new-badge">NEW</span> : ''}
   </Link>
   {showChildren ? (
     <ul>
       {' '}
       {myChildren.map((child: mdx) => {
         return (
           <SidebarLi
             frontmatter={child.frontmatter}
             fields={child.fields}
             depth={++depth}
             key={child.frontmatter.title}
           />
         )
       })}
     </ul>
   ) : (
     ''
   )}
 </li>

OK, now that we have a component and Gatsby creating pages from the markdown, I can go back to my PageTemplate component and render the sidebar:

import Sidebar from './Sidebar'
export default function PageTemplate({ data: { mdx } }) {
 return (
   <div>
     <Sidebar />
     <h1>{mdx.frontmatter.title}</h1>
     <MDXRenderer>{mdx.body}</MDXRenderer>
   </div>
 )
}

I don’t have to pass any props to Sidebar because the GraphQL static query in Sidebar.tsx gets all the data about all the pages that I need. I don’t even maintain state because Location is used to determine which path is active. Gatsby generates pages using the above component for each page that’s a markdown MDX node.

Wrapping up

This was just the beginning of the full migration to Gatsby. I repeated the process above for turning templates, partials, and other HTML component-like parts in Hugo into React, which was actually pretty fun, though turning vanilla JS that once manipulated the DOM into React would probably be a nightmare if I wasn’t somewhat comfortable working in React.

Main lessons learned:

  • Being careful about breaking things and being scared to break things are two very different things. Being careful is good; being scared is bad. If I were to do this migration again, I would use the Hugo templates as a reference but not as a source of truth. Staging environments are what testing is for. Don’t sacrifice writing things the right way to comply with the old way.
  • When doing a migration like this on a static site, get just a few pages working before moving the content over, to keep intermediate PRs from breaking things. It seems obvious, but with the large amount of content we had, a lot of things broke when porting over content. Get everything polished with each type of page before moving all your content over.
  • When doing a migration like this, it’s OK to compromise some features of the old design until you determine whether to add them back in; just make sure to test this with real users first. For example, I made the mistake of assuming others wouldn’t mind being without anchor tags. (Note: Hugo templates create anchor tags for headers automatically, whereas in Gatsby you have to use MDX to customize markdown components.) Test this on a single, popular page with real users first to see if it matters before giving it up.
  • Even for those with a React background, the ramp-up with GraphQL and setting up Gatsby isn’t as simple as it seems at first. But once you’re set up, it’s pretty dang nice.

Overall, the process of moving to Gatsby was well worth the effort. As we implement a redesign in React, it’s much easier to apply the designs in this cleaner code base. Also, though Hugo was already very performant with a nice SEO score, Gatsby’s flexibility lets us improve performance and SEO even further.

Lastly, working with the Gatsby team was awesome and they even give free T-shirts for your first PR!

An Update on CDNJS

Post Syndicated from Zack Bloom original https://blog.cloudflare.com/an-update-on-cdnjs/

An Update on CDNJS

When you loaded this blog, a file was delivered to your browser called jquery-3.2.1.min.js. jQuery is a library which makes it easier to build websites, and was at one point included on as many as 74.1% of all websites. A full eighteen million sites include jQuery and other libraries using one of the most popular tools on Earth: CDNJS. About a month ago, Cloudflare began to take a more active role in the operation of CDNJS. This post is here to tell you more about CDNJS’ history and explain why we are helping to manage CDNJS.

What CDNJS Does

Virtually every site is composed of not just the code written by its developers, but also dozens or hundreds of libraries. These libraries make it possible for websites to extend what a web browser can do on its own. For example, libraries can allow a site to include powerful data visualizations, respond to user input, or even get more performant.

These libraries created wondrous and magical new capabilities for web browsers, but they can also cause the size of a site to explode. Particularly a decade ago, connections were not always fast enough to permit the use of many libraries while maintaining performance. But if so many websites are all including the same libraries, why was it necessary for each of them to load their own copy?

If we all load jQuery from the same place the browser can do a much better job of not actually needing to download it for every site. When the user visits the first jQuery-powered site it will have to be downloaded, but it will already be cached on the user’s computer for any subsequent jQuery-powered site they might visit.

An Update on CDNJS

The first visit might take time to load:

An Update on CDNJS

But any future visit to any website pointing to this common URL would already be cached:

An Update on CDNJS

<!-- Loaded only on my site, will need to be downloaded by every user -->
<script src="./jquery.js"></script>

<!-- Loaded from a common location across many sites -->
<script src="https://cdnjs.cloudflare.com/jquery.js"></script>

Beyond the performance advantage, including files this way also made it very easy for users to experiment and create. When using a web browser as a creation tool, users often didn’t have elaborate build systems (this was also before npm), so being able to include a simple script tag was a boon. It’s worth noting that it’s not clear a massive performance advantage was ever actually provided by this scheme. It is becoming even less of a performance advantage now that browser vendors are beginning to use separate caches for each website you visit, but with millions of sites using CDNJS there’s no doubt it is a critical part of the web.

A CDN for all of us

My first Pull Request into the CDNJS project was in 2013. Back then if you created a JavaScript project it wasn’t possible to have it included in the jQuery CDN, or the ones provided by large companies like Google and Microsoft. They were only for big, important projects. Of course, even the biggest project starts small. The community needed a CDN which would agree to host nearly all JavaScript projects, even the ones which weren’t world-changing (yet). In 2011, that project was launched by Ryan Kirkman and Thomas Davis as CDNJS.

The project quickly became wildly successful, far beyond their expectations. Their CDN bill quickly began to skyrocket (it would now be over a million dollars a year on AWS). Under the threat of having to shut down the service, the CDNJS team approached Cloudflare to see if we could help. We agreed to support their efforts and created cdnjs.cloudflare.com, which serves the CDNJS project free of charge.

CDNJS has been astonishingly successful. The project is currently installed on over eighteen million websites (10% of the Internet!), offers files totaling over 1.5 billion lines of code, and serves over 173 billion requests a month. CDNJS only gets more popular as sites get larger, with 34% of the top 10k websites using the service. Each month we serve almost three petabytes of JavaScript, CSS, and other resources which power the web via cdnjs.cloudflare.com.

An Update on CDNJS
Spikes can happen when a very large or popular site installs CDNJS, or when a misbehaving web crawler discovers a CDNJS link.

The future value of CDNJS is now in doubt, as web browsers are beginning to use a separate cache for every website you visit. It is currently used on such a wide swath of the web, however, it is unlikely it will be disappearing any time soon.

How CDNJS Works

CDNJS starts with a GitHub repo. That project contains every file served by CDNJS, at every version which it has ever offered. That’s 182 GB without the commit history, over five million files, and over 1.5 billion lines of code.

Given that it stores and delivers versioned code files, in many ways it was the Internet’s first JavaScript package manager. Unlike other package managers and even other CDNs, everything CDNJS serves is publicly versioned. All 67,724 commits! This means you as a user can verify that you are being served files which haven’t been tampered with.
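
One concrete way to check this on the consumer side is Subresource Integrity: compute a hash of the file you expect to receive and pin it in the integrity attribute of your script tag. Here is a minimal sketch in TypeScript (assuming Node 18+ for the built-in fetch; the jQuery path below only illustrates the cdnjs.cloudflare.com URL style and is not an exact reference):

import { createHash } from 'node:crypto'

// Download a file and print the SRI value you would pin in a <script integrity="..."> attribute
async function sriHash(url: string): Promise<string> {
  const body = Buffer.from(await (await fetch(url)).arrayBuffer())
  return 'sha384-' + createHash('sha384').update(body).digest('base64')
}

sriHash('https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js').then(console.log)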

To make changes to CDNJS a commit has to be made. For new projects being added to CDNJS, or when projects change significantly, these commits are made by humans, and get reviewed by other humans. When projects just release new versions there is a bot made by Peter and maintained by Sven which sucks up changes from npm and automatically creates commits.

Within Cloudflare’s infrastructure there is a set of machines which are responsible for pulling the latest version of the repo periodically. Those machines then become the origin for cdnjs.cloudflare.com, with Cloudflare’s Global Load Balancer automatically handling failures. Cloudflare’s cache automatically stores copies of many of the projects making it possible for us to deliver them quickly from all 195 of our data centers.

An Update on CDNJS

The Internet on a Shoestring Budget

The CDNJS project has always been administered independently of Cloudflare. In addition to the founders, the project has been maintained by exceptionally hard-working caretakers like Peter and Matt Cowley. Maintaining a single repo of nearly every frontend project on Earth is no small task, and it has required a substantial amount of both manual work and bot development.

Unfortunately, approximately thirty days ago one of those bots stopped working, preventing updated projects from appearing in CDNJS. The bot’s open-source maintainer was not able to invest the time necessary to keep it running. After several weeks we were asked by the community and the CDNJS founders to take over maintenance of the CDNJS repo itself. This means the Cloudflare engineering team is taking responsibility for keeping the contents of github.com/cdnjs/cdnjs up to date, in addition to ensuring it is correctly served on cdnjs.cloudflare.com.

We agreed to do this because we were, frankly, scared. Like so many open-source projects, CDNJS was a critical part of our world, but wasn’t getting the attention it needed to survive. The Internet relies on CDNJS as much as on any other single project; losing it or allowing it to be commandeered would be catastrophic to millions of websites and their visitors. If it began to fail, some sites would adapt and update, others would be broken forever.

CDNJS has always been, and remains, a project for and by the community. We are invested in making all decisions in a transparent and inclusive manner. If you are interested in contributing to CDNJS or in the topics we’re currently discussing please visit the CDNJS Github Issues page.

An Update on CDNJS

A Plan for the Future

One example of an area where we could use your help is in charting a path towards a CDNJS which requires less manual moderation. Nothing can replace the intelligence and creativity of a human (yet), but for a task like managing what resources go into a CDN, relying on humans is error prone and time consuming. At present a human has to review every new project to be included, and often has to take additional steps to include new versions of a project.

As a part of our analysis of the project we examined a snapshot of the still-open PRs made against CDNJS for several months:

An Update on CDNJS

The vast majority of these PRs were changes which ultimately passed the automated review but nevertheless couldn’t be merged without manual review.

There is consensus that we should move to a model which does not require human involvement in most cases. We would love your input and collaboration on the best way for that to be solved. If this is something you are passionate about, please contribute here.

Our plan is to support the CDNJS project in whichever ways it requires for as long as the Internet relies upon it. We invite you to use CDNJS in your next project with the full assurance that it is backed by the same network and team who protect and accelerate over twenty million of your favorite websites across the Internet. We are also planning more posts diving further into the CDNJS data; subscribe to this blog if you would like to be notified upon their release.

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Post Syndicated from Filipp Nisenzoun original https://blog.cloudflare.com/introducing-the-graphql-analytics-api-exactly-the-data-you-need-all-in-one-place/

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Today we’re excited to announce a powerful and flexible new way to explore your Cloudflare metrics and logs, with an API conforming to the industry-standard GraphQL specification. With our new GraphQL Analytics API, all of your performance, security, and reliability data is available from one endpoint, and you can select exactly what you need, whether it’s one metric for one domain or multiple metrics aggregated for all of your domains. You can ask questions like “How many cached bytes have been returned for these three domains?” Or, “How many requests have all the domains under my account received?” Or even, “What effect did changing my firewall rule an hour ago have on the responses my users were seeing?”

The GraphQL standard also has strong community resources, from extensive documentation to front-end clients, making it easy to start creating simple queries and progress to building your own sophisticated analytics dashboards.

From many APIs…

Providing insights has always been a core part of Cloudflare’s offering. After all, by using Cloudflare, you’re relying on us for key parts of your infrastructure, and so we need to make sure you have the data to manage, monitor, and troubleshoot your website, app, or service. Over time, we developed a few key data APIs, including ones providing information regarding your domain’s traffic, DNS queries, and firewall events. This multi-API approach was acceptable while we had only a few products, but we started to run into some challenges as we added more products and analytics. We couldn’t expect users to adopt a new analytics API every time they started using a new product. In fact, some of the customers and partners that were relying on many of our products were already becoming confused by the various APIs.

Following the multi-API approach was also affecting how quickly we could develop new analytics within the Cloudflare dashboard, which is used by more people for data exploration than our APIs. Each time we built a new product, our product engineering teams had to implement a corresponding analytics API, which our user interface engineering team then had to learn to use. This process could take up to several months for each new set of analytics dashboards.

…to one

Our new GraphQL Analytics API solves these problems by providing access to all Cloudflare analytics. It offers a standard, flexible syntax for describing exactly the data you need and provides predictable, matching responses. This approach makes it an ideal tool for:

  1. Data exploration. You can think of it as a way to query your own virtual data warehouse, full of metrics and logs regarding the performance, security, and reliability of your Internet property.
  2. Building amazing dashboards, which allow for flexible filtering, sorting, and drilling down or rolling up. Creating these kinds of dashboards would normally require paying thousands of dollars for a specialized analytics tool. You get them as part of our product and can customize them for yourself using the API.

In a companion post that was also published today, my colleague Nick discusses using the GraphQL Analytics API to build dashboards. So, in this post, I’ll focus on examples of how you can use the API to explore your data. To make the queries, I’ll be using GraphiQL, a popular open-source querying tool that takes advantage of GraphQL’s capabilities.
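
GraphiQL isn’t required, though; since a GraphQL query is just an HTTP POST of JSON, you can also run queries from a small script. Here is a sketch in TypeScript (Node 18+ for the built-in fetch). The endpoint URL and Bearer-token header below follow Cloudflare’s v4 API conventions, so treat them as assumptions and confirm them against the developer docs:

const API_URL = 'https://api.cloudflare.com/client/v4/graphql'

// Total requests for one zone over a fixed hour, mirroring the zone queries shown below
async function requestsForHour(zoneTag: string, apiToken: string): Promise<number> {
  const query = `{
    viewer {
      zones(filter: { zoneTag: "${zoneTag}" }) {
        httpRequests1hGroups(
          limit: 1,
          filter: { datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T21:00:00Z" }
        ) {
          sum { requests }
        }
      }
    }
  }`

  const response = await fetch(API_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${apiToken}` },
    body: JSON.stringify({ query }),
  })
  const { data, errors } = await response.json()
  if (errors) throw new Error(JSON.stringify(errors))
  return data.viewer.zones[0].httpRequests1hGroups[0].sum.requests
}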

Introspection: what data is available?

The first thing you may be wondering: if the GraphQL Analytics API offers access to so much data, how do I figure out what exactly is available, and how I can ask for it? GraphQL makes this easy by offering “introspection,” meaning you can query the API itself to see the available data sets, the fields and their types, and the operations you can perform. GraphiQL uses this functionality to provide a “Documentation Explorer,” query auto-completion, and syntax validation. For example, here is how I can see all the data sets available for a zone (domain):

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

If I’m writing a query, and I’m interested in data on firewall events, auto-complete will help me quickly find relevant data sets and fields:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Querying: examples of questions you can ask

Let’s say you’ve made a major product announcement and expect a surge in requests to your blog, your application, and several other zones (domains) under your account. You can check if this surge materializes by asking for the requests aggregated under your account, in the 30 minutes after your announcement post, broken down by the minute:

{
  viewer {
    accounts(filter: { accountTag: $accountTag }) {
      httpRequests1mGroups(
        limit: 30,
        filter: { datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T20:30:00Z" },
        orderBy: [datetimeMinute_ASC]
      ) {
        dimensions {
          datetimeMinute
        }
        sum {
          requests
        }
      }
    }
  }
}

Here is the first part of the response, showing requests for your account, by the minute:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Now, let’s say you want to compare the traffic coming to your blog versus your marketing site over the last hour. You can do this in one query, asking for the number of requests to each zone:

{
  viewer {
    zones(filter: { zoneTag_in: [$zoneTag1, $zoneTag2] }) {
      httpRequests1hGroups(
        limit: 2,
        filter: { datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T21:00:00Z" }
      ) {
        sum {
          requests
        }
      }
    }
  }
}

Here is the response:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Finally, let’s say you’re seeing an increase in error responses. Could this be correlated to an attack? You can look at error codes and firewall events over the last 15 minutes, for example:

{
  viewer {
    zones(filter: { zoneTag: $zoneTag }) {
      httpRequests1mGroups(
        limit: 100,
        filter: { datetime_geq: "2019-09-16T21:00:00Z", datetime_lt: "2019-09-16T21:15:00Z" }
      ) {
        sum {
          responseStatusMap {
            edgeResponseStatus
            requests
          }
        }
      }
      firewallEventsAdaptiveGroups(
        limit: 100,
        filter: { datetime_geq: "2019-09-16T21:00:00Z", datetime_lt: "2019-09-16T21:15:00Z" }
      ) {
        dimensions {
          action
        }
        count
      }
    }
  }
}

Notice that, in this query, we’re looking at multiple datasets at once, using a common zone identifier to “join” them. Here are the results:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

By examining both data sets in parallel, we can see a correlation: 31 requests were “dropped” or blocked by the Firewall, which is exactly the same as the number of “403” responses. So, the 403 responses were a result of Firewall actions.
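
If you are doing this programmatically rather than eyeballing the response, the same comparison is a few lines of code. A sketch in TypeScript, assuming only that the response shape mirrors the query above (which is how GraphQL responses are structured); the firewall action names are assumptions based on the wording in this post:

type ZoneResult = {
  httpRequests1mGroups: { sum: { responseStatusMap: { edgeResponseStatus: number; requests: number }[] } }[]
  firewallEventsAdaptiveGroups: { dimensions: { action: string }; count: number }[]
}

// Total 403 responses vs. firewall events that blocked or dropped requests
function compareErrorsToFirewall(zone: ZoneResult) {
  const responses403 = zone.httpRequests1mGroups
    .flatMap(group => group.sum.responseStatusMap)
    .filter(status => status.edgeResponseStatus === 403)
    .reduce((total, status) => total + status.requests, 0)

  // "block" and "drop" are illustrative action values, not an exhaustive list
  const blocked = zone.firewallEventsAdaptiveGroups
    .filter(group => ['block', 'drop'].includes(group.dimensions.action))
    .reduce((total, group) => total + group.count, 0)

  return { responses403, blocked }
}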

Try it today

To learn more about the GraphQL Analytics API and start exploring your Cloudflare data, follow the “Getting started” guide in our developer documentation, which also has details regarding the current data sets and time periods available. We’ll be adding more data sets over time, so take advantage of the introspection feature to see the latest available.

Finally, to make way for the new API, the Zone Analytics API is now deprecated and will be sunset on May 31, 2020. The data that Zone Analytics provides is available from the GraphQL Analytics API. If you’re currently using the API directly, please follow our migration guide to change your API calls. If you get your analytics using the Cloudflare dashboard or our Datadog integration, you don’t need to take any action.

One more thing….

In the API examples above, if you find it helpful to get analytics aggregated for all the domains under your account, we have something else you may like: a brand new Analytics dashboard (in beta) that provides this same information. If your account has many zones, the dashboard is helpful for knowing summary information on metrics such as requests, bandwidth, cache rate, and error rate. Give it a try and let us know what you think using the feedback link above the new dashboard.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner

Post Syndicated from Nadin El-Yabroudi original https://blog.cloudflare.com/introducing-flan-scan/

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner

Today, we’re excited to open source Flan Scan, Cloudflare’s in-house lightweight network vulnerability scanner. Flan Scan is a thin wrapper around Nmap that converts this popular open source tool into a vulnerability scanner with the added benefit of easy deployment.

We created Flan Scan after two unsuccessful attempts at using “industry standard” scanners for our compliance scans. A little over a year ago, we were paying a big vendor for their scanner until we realized it was one of our highest security costs and many of its features were not relevant to our setup. It became clear we were not getting our money’s worth. Soon after, we switched to an open source scanner and took on the task of managing its complicated setup. That made it difficult to deploy to our entire fleet of more than 190 data centers.

We had a deadline at the end of Q3 to complete an internal scan for our compliance requirements but no tool that met our needs. Given our history with existing scanners, we decided to set off on our own and build a scanner that worked for our setup. To design Flan Scan, we worked closely with our auditors to understand the requirements of such a tool. We needed a scanner that could accurately detect the services on our network and then lookup those services in a database of CVEs to find vulnerabilities relevant to our services. Additionally, unlike other scanners we had tried, our tool had to be easy to deploy across our entire network.

We chose Nmap as our base scanner because, unlike other network scanners which sacrifice accuracy for speed, it prioritizes detecting services thereby reducing false positives. We also liked Nmap because of the Nmap Scripting Engine (NSE), which allows scripts to be run against the scan results. We found that the “vulners” script, available on NSE, mapped the detected services to relevant CVEs from a database, which is exactly what we needed.

The next step was to make the scanner easy to deploy while ensuring it outputted actionable and valuable results. We added three features to Flan Scan which helped package up Nmap into a user-friendly scanner that can be deployed across a large network.

  • Easy Deployment and Configuration – To create a lightweight scanner with easy configuration, we chose to run Flan Scan inside a Docker container. As a result, Flan Scan can be built and pushed to a Docker registry and maintains the flexibility to be configured at runtime. Flan Scan also includes sample Kubernetes configuration and deployment files with a few placeholders so you can get up and scanning quickly.
  • Pushing results to the Cloud – Flan Scan adds support for pushing results to a Google Cloud Storage Bucket or an S3 bucket. All you need to do is set a few environment variables and Flan Scan will do the rest. This makes it possible to run many scans across a large network and collect the results in one central location for processing.
  • Actionable Reports – Flan Scan generates actionable reports from Nmap’s output so you can quickly identify vulnerable services on your network, the applicable CVEs, and the IP addresses and ports where these services were found. The reports are useful for engineers following up on the results of the scan as well as auditors looking for evidence of compliance scans.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner
Sample run of Flan Scan from start to finish. 

How has Flan Scan improved Cloudflare’s network security?

By the end of Q3, not only had we completed our compliance scans, we also used Flan Scan to tangibly improve the security of our network. At Cloudflare, we pin the software version of some services in production because it allows us to prioritize upgrades by weighing the operational cost of upgrading against the improvements of the latest version. Flan Scan’s results revealed that our FreeIPA nodes, used to manage Linux users and hosts, were running an outdated version of Apache with several medium severity vulnerabilities. As a result, we prioritized their update. Flan Scan also found a vulnerable instance of PostgreSQL leftover from a performance dashboard that no longer exists.

Flan Scan is part of a larger effort to expand our vulnerability management program. We recently deployed osquery to our entire network to perform host-based vulnerability tracking. By complementing osquery’s findings with Flan Scan’s network scans we are working towards comprehensive visibility of the services running at our edge and their vulnerabilities. With two vulnerability trackers in place, we decided to build a tool to manage the increasing number of vulnerability sources. Our tool sends alerts on new vulnerabilities, filters out false positives, and tracks remediated vulnerabilities. Flan Scan’s valuable security insights were a major impetus for creating this vulnerability tracking tool.

How does Flan Scan work?

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner

The first step of Flan Scan is running an Nmap scan with service detection. Flan Scan’s default Nmap scan runs the following scans:

  1. ICMP ping scan – Nmap determines which of the IP addresses given are online.
  2. SYN scan – Nmap scans the 1000 most common ports of the IP addresses which responded to the ICMP ping. Nmap marks ports as open, closed, or filtered.
  3. Service detection scan – To detect which services are running on open ports Nmap performs TCP handshake and banner grabbing scans.

Other types of scans, such as UDP scans and scans of IPv6 addresses, are also possible with Nmap. Flan Scan allows users to run these and any other extended features of Nmap by passing in Nmap flags at runtime.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner
Sample Nmap output

Flan Scan adds the “vulners” script tag in its default Nmap command to include in the output a list of vulnerabilities applicable to the services detected. The vulners script works by making API calls to a service run by vulners.com which returns any known vulnerabilities for the given service.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner
Sample Nmap output with Vulners script

The next step of Flan Scan uses a Python script to convert the structured XML of Nmap’s output into an actionable report. The reports of the previous scanner we used listed each of the IP addresses scanned and presented the vulnerabilities applicable to that location. Since we had multiple IP addresses running the same service, the report would repeat the same list of vulnerabilities under each of these IP addresses. This meant scrolling back and forth on documents hundreds of pages long to obtain a list of all IP addresses with the same vulnerabilities. The results were impossible to digest.

Flan Scan’s results are structured around services. The report enumerates all vulnerable services, with a list beneath each one of the relevant vulnerabilities and all IP addresses running this service. This structure makes the report shorter and actionable, since the services that need to be remediated can be clearly identified. Flan Scan reports are made using LaTeX because who doesn’t like nicely formatted reports that can be generated with a script? The raw LaTeX file that Flan Scan outputs can be converted to a beautiful PDF by using tools like pdflatex or TeXShop.

Introducing Flan Scan: Cloudflare’s Lightweight Network Vulnerability Scanner
Sample Flan Scan report

What’s next?

Cloudflare’s mission is to help build a better Internet for everyone, not just Internet giants who can afford to buy expensive tools. We’re open sourcing Flan Scan because we believe it shouldn’t cost tons of money to have strong network security.

You can get started running a vulnerability scan on your network in a few minutes by following the instructions on the README. We welcome contributions and suggestions from the community.

Improve Your App Testing With Amplify Console’s Pull Requests Previews and Cypress Testing

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/improve-your-app-testing-with-amplify-consoles-pull-requests-previews-and-cypress-testing/

Amplify Console allows developers to easily configure a Git-based workflow for continuous deployment and hosting of fullstack serverless web apps. Fullstack serverless apps are composed of backend resources such as GraphQL APIs, Data and File Storage, Authentication, or Analytics, integrated with a frontend framework such as React, Gatsby, or Angular. You can read more about the Amplify Console in a previous article I wrote.

Today, we are announcing the ability to create preview URLs and to run end-to-end tests on pull requests before releasing code to production.

Pull Request previews
You can now configure Amplify Console to deploy your application to a unique URL every time a developer submits a pull request to your Git repository. The preview URL is completely different from the one used by the production site. You can see how changes look before merging the pull request into the main branch of your code repository, triggering a new release in the Amplify Console. For fullstack apps with backend environments provisioned via the Amplify CLI, every pull request spins up an ephemeral backend that is deleted when the pull request is closed. You can test changes in complete isolation from the production environment. Amplify Console creates backend infrastructure for pull requests on private Git repositories only, to avoid incurring extra costs from unsolicited pull requests.

To learn how it works, let’s start a web application with a cloud-based authentication backend, and deploy it on Amplify Console. I first create a React application (check here to learn how to install React).

npx create-react-app amplify-console-demo                                                
cd amplify-console-demo

I initialize the Amplify environment (learn how to install the Amplify CLI first). I add a cloud-based authentication backend powered by Amazon Cognito. I accept all the default answers proposed by the Amplify CLI.

npm install aws-amplify aws-amplify-react
amplify init
amplify add auth
amplify push

I then modify src/App.js to add the front end authentication user interface. The code is available in the AWS Amplify documentation. Once ready, I start the local development server to test the application locally.

npm run start

I point my browser to http://localhost:8080 to verify the scaffolding (the below screenshot is taken from my AWS Cloud9 development environment). I click Create account to create a user, verify the SignUp flow, and authenticate to the app.

After signing up, I see the application page.
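
For reference, the change to src/App.js essentially boils down to wrapping the App component with withAuthenticator. A minimal sketch only; the authoritative snippet is in the Amplify documentation, and the includeGreetings option and placeholder content here are assumptions:

import React from 'react'
import Amplify from 'aws-amplify'
import { withAuthenticator } from 'aws-amplify-react'
// aws-exports.js is generated by the Amplify CLI and points the app at the Cognito backend
import awsconfig from './aws-exports'

Amplify.configure(awsconfig)

function App() {
  return <div className="App">You are signed in.</div>
}

// The higher-order component renders the Sign In / Sign Up screens for anonymous visitors
export default withAuthenticator(App, { includeGreetings: true })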

There are two important details to note. First, I am using a private GitHub repository. Amplify Console only creates backend infrastructure on pull requests for private repositories, to avoid creating unnecessary infrastructure for unsolicited pull requests. Second, the Amplify Console build process looks for dependencies in package-lock.json only. This is why I added the amplify packages with npm and not with yarn.

When I am happy with my app, I push the code to a GitHub repo (let’s assume I already did git remote add origin ...).

git add amplify
git commit -am "initial commit"
git push origin master

The next step consists of configuring Amplify Console to build and deploy my app on every git commit. I log in to the Amplify Console, click Connect App, choose GitHub as the repository and click Continue (the first time I do this, I need to authenticate on GitHub, using my GitHub username and password).

I select my repository and the branch I want to use as source:

Amplify Console detects the type of project and proposes a build file. I select the name of my environment (dev). The first time I use Amplify Console, I follow the instructions to create a new service role. This role authorises Amplify Console to access AWS backend services on my behalf.

I click Next. I review the settings and click Save and Deploy. After a few seconds or minutes, my application is ready. I can point my browser to the deployment URL and verify the app is working correctly.

Now, let’s enable previews for pull requests. Click Preview on the left menu and Enable Previews. To enable the previews, Amplify Console requires an app to be installed in my GitHub account. I follow the instructions provided by the console to configure my GitHub account. Once set up, I select a branch, click Manage to enable / disable the pull request previews. (At anytime, I can uninstall the Amplify app from my GitHub account by visiting the Applications section of my GitHub account’s settings.)

Now that the mechanism is in place, let’s create a pull request.

I edit App.js directly on GitHub. I customize the withAuthenticator component to change the color of the Sign In button from orange to green. I save the changes and I create a pull request.

On the Pull Request detail page, I click Show all checks to get the status of the Amplify Console test. I see AWS Amplify Console Web Preview in progress. Amplify Console creates a full backend environment to test the pull request, to build and to deploy the frontend.

Eventually, I see All checks have passed and a green mark. I click Details to get the preview URL. In case of an error, you can see the detailed log file of the build phase in the Amplify Console.

I can also check the status of the preview in the Amplify Console.

I point my browser to the preview URL to test my change. I can see the green Sign In button instead of the orange one.

When I try to authenticate using the username and password I created previously, I receive a User does not exist error message because this preview URL points to a different backend than the main application. I can see two Cognito user pools in the Cognito console, one for each environment.

I can control who can access the preview URL using similar access control settings that I use for the main URL.

When I am happy with the proposed changes, I merge the pull request on GitHub to trigger a new build and to deploy the change to the production environment. Amplify Console deletes the preview environment upon merging. The ephemeral backend environment created for the pull request also gets deleted.

Cypress testing
In addition to previewing changes before merging them to the main branch, we also added the capability to run end-to-end tests during your build process. You can use your favorite test framework to add unit or end-to-end tests to your application and automatically run the tests during the build phase. When you use the Cypress test framework, Amplify Console detects the tests in your source tree and automatically adds the testing phase to your application’s build process.
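
As a quick illustration, a minimal Cypress spec could look like the sketch below. The file location and the button text it asserts on are assumptions (the Sign In button is the one customized earlier in this post), so adapt them to your own app:

// cypress/integration/auth.spec.js — checks that anonymous visitors land on the auth screen
describe('authentication screen', () => {
  it('shows the Sign In button to anonymous visitors', () => {
    cy.visit('/')
    cy.contains('Sign In')
  })
})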

Only projects that pass all tests are pushed down your pipeline to the deployment phase. You can learn more about this and follow the step-by-step instructions we posted a few weeks ago.

These two additions to Amplify Console allow you to gain higher confidence in the robustness of your pipeline and the quality of the code delivered to your production environment.

Availability
Web previews are available in all Regions where AWS Amplify Console is available today, at no additional cost on top of the regular Amplify Console pricing. With the AWS Free Usage Tier, you can get started for free. Upon sign up, new AWS customers receive 1,000 build minutes per month for the build and deploy feature, and 15 GB served per month and 5 GB data storage per month for the hosting.

— seb

Experiment with HTTP/3 using NGINX and quiche

Post Syndicated from Alessandro Ghedini original https://blog.cloudflare.com/experiment-with-http-3-using-nginx-and-quiche/

Experiment with HTTP/3 using NGINX and quiche

Experiment with HTTP/3 using NGINX and quiche

Just a few weeks ago we announced the availability on our edge network of HTTP/3, the new revision of HTTP intended to improve security and performance on the Internet. Everyone can now enable HTTP/3 on their Cloudflare zone and experiment with it using Chrome Canary as well as curl, among other clients.

We have previously made available an example HTTP/3 server as part of the quiche project to allow people to experiment with the protocol, but it’s quite limited in the functionality that it offers, and was never intended to replace other general-purpose web servers.

We are now happy to announce that our implementation of HTTP/3 and QUIC can be integrated into your own installation of NGINX as well. This is made available as a patch to NGINX that can be applied to and built directly against the upstream NGINX codebase.

Experiment with HTTP/3 using NGINX and quiche

It’s important to note that this is not officially supported or endorsed by the NGINX project, it is just something that we, Cloudflare, want to make available to the wider community to help push adoption of QUIC and HTTP/3.

Building

The first step is to download and unpack the NGINX source code. Note that the HTTP/3 and QUIC patch only works with the 1.16.x release branch (the latest stable release being 1.16.1).

 % curl -O https://nginx.org/download/nginx-1.16.1.tar.gz
 % tar xvzf nginx-1.16.1.tar.gz

As well as quiche, the underlying implementation of HTTP/3 and QUIC:

 % git clone --recursive https://github.com/cloudflare/quiche

Next you’ll need to apply the patch to NGINX:

 % cd nginx-1.16.1
 % patch -p01 < ../quiche/extras/nginx/nginx-1.16.patch

And finally build NGINX with HTTP/3 support enabled:

 % ./configure                          	\
   	--prefix=$PWD                       	\
   	--with-http_ssl_module              	\
   	--with-http_v2_module               	\
   	--with-http_v3_module               	\
   	--with-openssl=../quiche/deps/boringssl \
   	--with-quiche=../quiche
 % make

The above command instructs the NGINX build system to enable the HTTP/3 support (--with-http_v3_module) by using the quiche library found in the path it was previously downloaded into (--with-quiche=../quiche), as well as TLS and HTTP/2. Additional build options can be added as needed.

You can check out the full instructions here.

Running

Once built, NGINX can be configured to accept incoming HTTP/3 connections by adding the quic and reuseport options to the listen configuration directive.

Here is a minimal configuration example that you can start from:

events {
    worker_connections  1024;
}

http {
    server {
        # Enable QUIC and HTTP/3.
        listen 443 quic reuseport;

        # Enable HTTP/2 (optional).
        listen 443 ssl http2;

        ssl_certificate      cert.crt;
        ssl_certificate_key  cert.key;

        # Enable all TLS versions (TLSv1.3 is required for QUIC).
        ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
    }
}

This will enable both HTTP/2 and HTTP/3 on the TCP/443 and UDP/443 ports respectively.

You can then use one of the available HTTP/3 clients (such as Chrome Canary, curl or even the example HTTP/3 client provided as part of quiche) to connect to your NGINX instance using HTTP/3.

We are excited to make this available for everyone to experiment and play with HTTP/3, but it’s important to note that the implementation is still experimental and it’s likely to have bugs as well as limitations in functionality. Feel free to submit a ticket to the quiche project if you run into problems or find any bugs.

Terraforming Cloudflare: in quest of the optimal setup

Post Syndicated from Guest Author original https://blog.cloudflare.com/terraforming-cloudflare/

Terraforming Cloudflare: in quest of the optimal setup

This is a guest post by Dimitris Koutsourelis and Alexis Dimitriadis, working for the Security Team at Workable, a company that makes software to help companies find and hire great people.

Terraforming Cloudflare: in quest of the optimal setup

This post is about our introductory journey into the infrastructure-as-code practice: managing Cloudflare configuration in a declarative and version-controlled way. We’d like to share the experience we’ve gained during this process: our pain points, the limitations we faced, the different approaches we took, and parts of our solution and experimentations.

Terraform world

Terraform is a great tool that fulfills our requirements, and fortunately, Cloudflare maintains its own provider that allows us to manage its service configuration hassle-free.

On top of that, Terragrunt is a thin wrapper that provides extra commands and functionality for keeping Terraform configurations DRY and managing remote state.

The combination of both leads to a more modular and re-usable structure for Cloudflare resources (configuration), by utilizing Terraform and Terragrunt modules.
We’ve chosen to use the latest version of both tools (Terraform v0.12 & Terragrunt v0.19 respectively) and constantly upgrade to take advantage of the valuable new features and functionality, which at this point in time remove important limitations.

Workable context

Our setup includes multiple domains that are grouped in two distinct Cloudflare organisations: production & staging. Our environments have their own purposes and technical requirements (i.e.: QA, development, sandbox and production), which translate to slightly different sets of Cloudflare zone configuration.

Our approach

Our main goal was to have a modular setup with the ability to manage any configuration for any zone, while keeping code repetition to a minimum. This is more complex than it sounds; we repeatedly changed our Terraform folder structure – and other technical aspects – during the development period. The following sections illustrate a set of alternatives along our path, with their pros & cons.

Structure

Terraform configuration is based on the project’s directory structure, so this is the place to start.
Instead of retaining the Cloudflare organisation structure (production & staging as root-level directories containing the zones that belong in each organisation), our decision was to group zones that share common configuration under the same directory. This helps keep the code DRY and the setup consistent and readable.
On the downside, this structure adds an extra layer of complexity, as two different sets of credentials need to be handled conditionally and two state files (at the environments/ root level) must be managed and isolated using workspaces.
On top of that, we used Terraform modules to keep sets of common configuration across zone groups in a single place.
Terraform modules repository

modules/
│    ├── firewall/
│        ├── main.tf
│        ├── variables.tf
│    ├── zone_settings/
│        ├── main.tf
│        ├── variables.tf
│    └── [...]
└──

Terragrunt modules repository

environments/
│    ├── [...]
│    ├── dev/
│    ├── qa/
│    ├── demo/
│        ├── zone-8/ (production)
│            └── terragrunt.hcl
│        ├── zone-9/ (staging)
│            └── terragrunt.hcl
│        ├── config.tfvars
│        ├── main.tf
│        └── variables.tf
│    ├── config.tfvars
│    ├── secrets.tfvars
│    ├── main.tf
│    ├── variables.tf
│    └── terragrunt.hcl
└──

The Terragrunt modules tree gives flexibility, since we are able to apply configuration at the zone, zone group, or organisation level (which is in line with Cloudflare configuration capabilities – i.e.: custom error pages can also be configured at the organisation level).

Resource types

We decided to implement Terraform resources in different ways, to cover our requirements more efficiently.

1. Static resource

The first thought that came to mind was having one or multiple .tf files implementing all the resources with hardcoded values assigned to each attribute. It’s simple and straightforward, but can have a high maintenance cost if it leads to code copy/paste between environments.
So, common settings seem to be a good use case; we chose to implement access_rules Terraform resources accordingly:
modules/access_rules/main.tf

resource "cloudflare_access_rule" "no_17" {
notes		= "this is a description"
mode 	= "blacklist"
configuration = {
target	= "ip"
value 	= "x.x.x.x"
}
}
[...]

2. Parametrized resources

Our next step was to add variables to gain flexibility. This is useful when a few attributes of a shared resource configuration differ between multiple zones. Most of the configuration remains the same (as described above) and the variable instantiation is added in the Terraform module, while their values are fed through the Terragrunt module as input variables or entries inside .tfvars files. The zone_settings_override resource was implemented accordingly:
modules/zone_settings/main.tf

resource "cloudflare_zone_settings_override" "zone_settings" {
zone_id = var.zone_id
settings {
always_online		= "on"
always_use_https		= "on"
[...]
browser_check		= var.browser_check
mobile_redirect {
mobile_subdomain	= var.mobile_redirect_subdomain
status			= var.mobile_redirect_status
strip_uri			= var.mobile_redirect_uri
}
[...]
waf			= "on"
webp		= "off"
websockets		= "on"
}
}

environments/qa/main.tf

module "zone_settings" {
source		= "[email protected]:foo/modules/zone_settings"
zone_name		= var.zone_name
browser_check	= var.zone_settings_browser_check
[...]
}

environments/qa/config.tfvars

#zone settings
zone_settings_browser_check = "off"
[...]

3. Dynamic resource

At that point, we thought that a more interesting approach would be to create generic resource templates to manage all instances of a given resource in one place. A template is implemented as a Terraform module and creates each resource dynamically, based on its input: data fed through the Terragrunt modules (/environments in our case), or entries in the tfvars files.
We chose to implement the account_member resource this way.
modules/account_members/variables.tf

variable "users" {
description	= "map of users - roles"
type        	= map(list(string))
}
variable "member_roles" {
description 	= "account role ids"
type        	= map(string)
}

modules/account_members/main.tf


resource "cloudflare_account_member" "account_member" {
for_each     		= var.users
email_address	= each.key
role_ids     		= [for role in each.value : lookup(var.member_roles, role)]
lifecycle {
prevent_destroy = true
}
}

We feed the template with a list of users (list of maps). Each member is assigned a number of roles. To make code more readable, we mapped users to role names instead of role ids:
environments/config.tfvars


member_roles = {
  admin       = "000013091sds0193jdskd01d1dsdjhsd1"
  admin_ro    = "0000ds81hd131bdsjd813hh173hds8adh"
  analytics   = "0000hdsa8137djahd81y37318hshdsjhd"
  [...]
  super_admin = "00001534sd1a2123781j5gj18gj511321"
}

users = {
  "[email protected]" = ["super_admin"]
  "[email protected]" = ["analytics", "audit_logs", "cache_purge", "cf_workers"]
  "[email protected]" = ["cf_stream"]
  [...]
  "[email protected]" = ["cf_stream"]
}

Another interesting case we dealt with was the rate_limit resource; the variable declaration (list of objects) & implementation go as follows:
modules/rate_limit/variables.tf

variable "rate_limits" {
description	= "list of rate limits"
default	= []
type		= list(object(
{
disabled	= bool,
threshold	= number,
description	= string,
period	= number,
match	= object({
request	= object({
url_pattern	= map(string),
schemes		= list(string),
methods 		= list(string)
}),
response 		= object({
statuses		= list(number),
origin_traffic	= bool
})
}),
action	= object({
mode	= string,
timeout	= number
})
}))
}

modules/rate_limit/main.tf

locals {
  […]
}

data "cloudflare_zones" "zone" {
  filter {
    name   = var.zone_name
    status = "active"
    paused = false
  }
}

resource "cloudflare_rate_limit" "rate_limit" {
  count       = length(var.rate_limits)
  zone_id     = lookup(data.cloudflare_zones.zone.zones[0], "id")
  disabled    = var.rate_limits[count.index].disabled
  threshold   = var.rate_limits[count.index].threshold
  description = var.rate_limits[count.index].description
  period      = var.rate_limits[count.index].period

  match {
    request {
      url_pattern = local.url_patterns[count.index]
      schemes     = var.rate_limits[count.index].match.request.schemes
      methods     = var.rate_limits[count.index].match.request.methods
    }
    response {
      statuses       = var.rate_limits[count.index].match.response.statuses
      origin_traffic = var.rate_limits[count.index].match.response.origin_traffic
    }
  }

  action {
    mode    = var.rate_limits[count.index].action.mode
    timeout = var.rate_limits[count.index].action.timeout
  }
}

environments/qa/rate_limit.tfvars

[
  {
    #1
    disabled    = false
    threshold   = 50
    description = "sample description"
    period      = 60
    match = {
      request = {
        url_pattern = {
          "subdomain" = "foo"
          "path"      = "/api/v1/bar"
        }
        schemes = ["_ALL_"]
        methods = ["GET", "POST"]
      }
      response = {
        statuses       = []
        origin_traffic = true
      }
    }
    action = {
      mode    = "simulate"
      timeout = 3600
    }
  },
  [...]
]

The biggest advantage of this approach is that all common rate_limit rules live in one place, and each environment can add its own rules in its .tfvars. Combining the two lists (common and environment-specific rules) with Terraform's built-in concat() function achieves a two-layer join. So we wanted to give it a try:

locals {
  rate_limits = concat(var.common_rate_limits, var.unique_rate_limits)
}
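The two inputs being joined can be declared in the module as plain list variables. A minimal sketch follows; the variable names simply mirror the locals above, and the full list(object) schema shown earlier is omitted here for brevity:

modules/rate_limit/variables.tf

variable "common_rate_limits" {
  description = "rate limit rules shared across environments"
  default     = []
}

variable "unique_rate_limits" {
  description = "rate limit rules specific to a single environment"
  default     = []
}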

There is, however, a drawback: .tfvars files can only contain static values. Since all url attributes – including the zone name itself – have to be set explicitly in each environment's data, every time a url needs to change, the value has to be copied across all environments and the zone name edited to match each one.
The solution we came up with to make the zone name dynamic was to split the url attribute into three parts: subdomain, domain, and path. This works well for the .tfvars files, but the added complexity of handling the new variables is non-negligible. The corresponding code illustrates the issue:
modules/rate_limit/main.tf

locals {
  rate_limits  = concat(var.common_rate_limits, var.unique_rate_limits)
  url_patterns = [for rate_limit in local.rate_limits : "${lookup(rate_limit.match.request.url_pattern, "subdomain", null) != null ? "${lookup(rate_limit.match.request.url_pattern, "subdomain")}." : ""}${lookup(rate_limit.match.request.url_pattern, "domain", null) != null ? lookup(rate_limit.match.request.url_pattern, "domain") : var.zone_name}${lookup(rate_limit.match.request.url_pattern, "path", null) != null ? lookup(rate_limit.match.request.url_pattern, "path") : ""}"]
}

Readability vs functionality: although flexibility is increased and code duplication is reduced, the url transformations hurt the code's readability and ease of debugging (it took us several minutes to spot a typo). You can imagine this gets even worse if you attempt to implement a more complex resource this way (such as page_rule, which is a list of maps with four url attributes).
The underlying issue is that, at the time we implemented our resources, we had to choose maps over objects because maps allow attributes to be omitted and read back with a default via the lookup() function, as sketched below. This is a requirement for certain resources such as page_rules, where only some attributes need to be defined (and the rest ignored).
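To illustrate the point (the variable and attribute names here are hypothetical), a map lets a caller simply leave an attribute out while lookup() supplies the fallback, whereas an object() type would force every attribute to be set:

locals {
  # "cache_level" may be absent from the map; lookup() falls back to null
  cache_level = lookup(var.page_rule_actions, "cache_level", null)

  # "ssl" falls back to "flexible" when not provided
  ssl = lookup(var.page_rule_actions, "ssl", "flexible")
}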
In the end, the context will determine if more complex resources can be implemented with dynamic resources.

4. Sequential resources

The Cloudflare page_rule resource has a peculiarity that differentiates it from the other resource types: the priority attribute.
When a page rule is applied, it gets a unique id and a priority number corresponding to the order in which it was submitted. Although the Cloudflare API and Terraform provider allow you to specify the priority explicitly, there is a catch.
Terraform doesn't respect the order of resources inside a .tf file (even in a for_each loop!); independent resources are applied to the provider in no particular order. So, if page_rule priority matters – as in our case – the submission order counts. The solution is to lock the sequence in which the resources are created through the depends_on meta-argument:

resource "cloudflare_page_rule" "no_3" {
depends_on 	= [cloudflare_page_rule.no_2]
zone_id    	= lookup(data.cloudflare_zones.zone.zones[0], "id")
target     	= "www.${var.zone_name}/foo"
status     	= "active"
priority   	= 3
actions {
forwarding_url {
status_code 	= 301
url        		 = "https://www.${var.zone_name}"
}
}
}
resource "cloudflare_page_rule" "no_2" {
depends_on = [cloudflare_page_rule.no_1]
zone_id   	= lookup(data.cloudflare_zones.zone.zones[0], "id")
target    	= "www.${var.zone_name}/lala*"
status     	= "active"
priority   	= 24
actions {
ssl                 		= "flexible"
cache_level         		= "simplified"
resolve_override    		= "bar.${var.zone_name}"
host_header_override 	= "new.domain.com"
}
}
resource "cloudflare_page_rule" "page_rule_1" {
zone_id    	= lookup(data.cloudflare_zones.zone.zones[0], "id")
target   	= "*.${var.zone_name}/foo/*"
status   	= "active"
priority 	= 1
actions {
forwarding_url {
status_code 	= 301
url         		= "https://foo.${var.zone_name}/$1/$2"
}
}
}

So we had to fall back to a more static resource configuration, because depends_on only accepts static references (not values calculated dynamically at runtime).

Conclusion

After changing our minds several times along the way on the Terraform structure and other technical details, we believe there isn't a single best solution. It all comes down to the requirements and to keeping a balance between complexity and simplicity. In our case, a mixed approach is a good middle ground.
Terraform is evolving quickly, but at this point it still lacks some common coding capabilities, so over-engineering is a trap (one we fell into too many times). Keep it simple and as DRY as possible. 🙂

Learn about AWS Services & Solutions – September AWS Online Tech Talks

Post Syndicated from Jenny Hang original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-september-aws-online-tech-talks/

Learn about AWS Services & Solutions – September AWS Online Tech Talks

AWS Tech Talks

Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

 

Compute:

September 23, 2019 | 11:00 AM – 12:00 PM PT – Build Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case.

September 26, 2019 | 1:00 PM – 2:00 PM PT – Self-Hosted WordPress: It’s Easier Than You Think – Learn how you can easily build a fault-tolerant WordPress site using Amazon Lightsail.

October 3, 2019 | 11:00 AM – 12:00 PM PT – Lower Costs by Right Sizing Your Instance with Amazon EC2 T3 General Purpose Burstable Instances – Get an overview of T3 instances, understand what workloads are ideal for them, and understand how the T3 credit system works so that you can lower your EC2 instance costs today.

 

Containers:

September 26, 2019 | 11:00 AM – 12:00 PM PT – Develop a Web App Using Amazon ECS and AWS Cloud Development Kit (CDK) – Learn how to build your first app using CDK and AWS container services.

 

Data Lakes & Analytics:

September 26, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Provisioning Amazon MSK Clusters and Using Popular Apache Kafka-Compatible Tooling – Learn best practices on running Apache Kafka production workloads at a lower cost on Amazon MSK.

 

Databases:

September 25, 2019 | 1:00 PM – 2:00 PM PT – What’s New in Amazon DocumentDB (with MongoDB compatibility) – Learn what’s new in Amazon DocumentDB, a fully managed MongoDB compatible database service designed from the ground up to be fast, scalable, and highly available.

October 3, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Enterprise-Class Security, High-Availability, and Scalability with Amazon ElastiCache – Learn about new enterprise-friendly Amazon ElastiCache enhancements like customer managed key and online scaling up or down to make your critical workloads more secure, scalable and available.

 

DevOps:

October 1, 2019 | 9:00 AM – 10:00 AM PT – CI/CD for Containers: A Way Forward for Your DevOps Pipeline – Learn how to build CI/CD pipelines using AWS services to get the most out of the agility afforded by containers.

 

Enterprise & Hybrid:

September 24, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: How to Monitor and Manage Your AWS Costs – Learn how to visualize and manage your AWS cost and usage in this virtual hands-on workshop.

October 2, 2019 | 1:00 PM – 2:00 PM PT – Accelerate Cloud Adoption and Reduce Operational Risk with AWS Managed Services – Learn how AMS accelerates your migration to AWS, reduces your operating costs, improves security and compliance, and enables you to focus on your differentiating business priorities.

 

IoT:

September 25, 2019 | 9:00 AM – 10:00 AM PT – Complex Monitoring for Industrial with AWS IoT Data Services – Learn how to solve your complex event monitoring challenges with AWS IoT Data Services.

 

Machine Learning:

September 23, 2019 | 9:00 AM – 10:00 AM PT – Training Machine Learning Models Faster – Learn how to train machine learning models quickly and with a single click using Amazon SageMaker.

September 30, 2019 | 11:00 AM – 12:00 PM PT – Using Containers for Deep Learning Workflows – Learn how containers can help address challenges in deploying deep learning environments.

October 3, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: Getting Hands-On with Machine Learning and Ready to Race in the AWS DeepRacer League – Join DeClercq Wentzel, Senior Product Manager for AWS DeepRacer, for a presentation on the basics of machine learning and how to build a reinforcement learning model that you can use to join the AWS DeepRacer League.

 

AWS Marketplace:

September 30, 2019 | 9:00 AM – 10:00 AM PT – Advancing Software Procurement in a Containerized World – Learn how to deploy applications faster with third-party container products.

 

Migration:

September 24, 2019 | 11:00 AM – 12:00 PM PT – Application Migrations Using AWS Server Migration Service (SMS) – Learn how to use AWS Server Migration Service (SMS) for automating application migration and scheduling continuous replication, from your on-premises data centers or Microsoft Azure to AWS.

 

Networking & Content Delivery:

September 25, 2019 | 11:00 AM – 12:00 PM PT – Building Highly Available and Performant Applications using AWS Global Accelerator – Learn how to build highly available and performant architectures for your applications with AWS Global Accelerator, now with source IP preservation.

September 30, 2019 | 1:00 PM – 2:00 PM PT – AWS Office Hours: Amazon CloudFront – Just getting started with Amazon CloudFront and Lambda@Edge? Get answers directly from our experts during AWS Office Hours.

 

Robotics:

October 1, 2019 | 11:00 AM – 12:00 PM PT – Robots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.

 

Security, Identity & Compliance:

October 1, 2019 | 1:00 PM – 2:00 PM PT – Deep Dive on Running Active Directory on AWS – Learn how to deploy Active Directory on AWS and start migrating your Windows workloads.

 

Serverless:

October 2, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.

 

Storage:

September 24, 2019 | 9:00 AM – 10:00 AM PT – Optimize Your Amazon S3 Data Lake with S3 Storage Classes and Management Tools – Learn how to use the Amazon S3 Storage Classes and management tools to better manage your data lake at scale and to optimize storage costs and resources.

October 2, 2019 | 11:00 AM – 12:00 PM PT – The Great Migration to Cloud Storage: Choosing the Right Storage Solution for Your Workload – Learn more about AWS storage services and identify which service is the right fit for your business.

 

 

Amplify Console – Hosting for Fullstack Serverless Web Apps

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amplify-console-hosting-for-fullstack-serverless-web-apps/

AWS Amplify Console is a fullstack web app hosting service with continuous deployment from your preferred source code repository. Amplify Console was introduced at AWS re:Invent in November 2018. Since then, the team has been listening to customer feedback and iterating quickly to release several new features; here is a short re:Cap.

Instant Cache Invalidation
Amplify Console lets you host single-page web apps or static sites with serverless backends via a content delivery network, or CDN. A CDN is a network of distributed servers that cache files at edge locations across the world, enabling low-latency distribution of your web assets.

Previously, updating content on the CDN required manually invalidating the cache and waiting 15-20 minutes for changes to propagate globally. To make frequent updates, developers found workarounds such as setting lower time-to-live (TTL) values on asset headers, which enables faster updates but adversely impacts performance. Now, you no longer have to make a tradeoff between faster deployments and faster performance. Every time you commit code to your repository, Amplify Console builds and deploys the changes to the CDN, and they are viewable immediately in the browser.

“Deploy To Amplify Console” Button

Deploy To Amplify Console

When publishing your project source code on GitHub, you can make it easy for other developers to build and deploy your application by providing a “Deploy To Amplify Console” button in the Readme document. Clicking that button opens Amplify Console and proposes a three-step process to deploy your code.

You can test this yourself with these example projects and have a look at the documentation. Adding a button to your own code repository is as easy as adding this line to your Readme document (be sure to replace the username and repository name in the GitHub URL):

[![amplifybutton](https://oneclick.amplifyapp.com/button.svg)](https://console.aws.amazon.com/amplify/home#/deploy?repo=https://github.com/username/repository)

Manual Deploy
I think it is a good idea to version control everything, including a simple website where you are the only developer. But in case you do not want to use a source code repository as the source for your deployment, Amplify Console also lets you deploy a zip file, a local folder on your laptop, an Amazon S3 bucket, or any HTTPS URL, such as a shared repository on Dropbox.

When creating a new Amplify Console project, select the Deploy without Git provider option.
Then choose your source (your laptop, Amazon S3, or an HTTPS URI).

AWS CloudFormation Integration
Developers love automation. Deploying code or infrastructure is no different: you must ensure your infrastructure deployments are automated and repeatable. AWS CloudFormation allows you to automate the creation of infrastructure in the cloud based on a YAML or JSON description. Amplify Console added three new resource types to AWS CloudFormation:

  • AWS::Amplify::App
  • AWS::Amplify::Branch
  • AWS::Amplify::Domain

These allow you, respectively, to create a new Amplify Console app, to define a Git branch, and to configure the DNS domain name to use.

AWS CloudFormation connects to your source code repository to add a webhook to it. You need to include your GitHub personal access token to allow this to happen; this blog post has all the details. Remember not to hardcode credentials (or OAuth tokens) into your CloudFormation templates; use parameters instead.

Deploy Multiple Git Branches
We believe your CI/CD tools must adapt to your team’s workflow, not the other way around. Amplify Console supports branch pattern deployments, allowing you to automatically deploy branches that match a specific pattern without any extra configuration. Pattern matching is based on regular expressions.

When you want to test a new feature, you typically create a new branch in Git. Amplify Console and the Amplify CLI now detect this and provision a separate backend and hosting infrastructure for your serverless app.

To enable branch detection, use the left menu, click on General > Edit and turn on Branch Autodetection:

Custom HTTP Headers
You can configure Amplify Console to send custom HTTP response headers. Response headers can be used for debugging, security, or informational purposes. To add your custom headers, select App Settings > Build Settings and then edit the buildspec. For example, to enforce TLS transport and prevent XSS attacks, you can add the following headers:

customHeaders:
  - pattern: '**/*'
    headers:
      - key: 'Strict-Transport-Security'
        value: 'max-age=31536000; includeSubDomains'
      - key: 'X-Frame-Options'
        value: 'SAMEORIGIN'
      - key: 'X-XSS-Protection'
        value: '1; mode=block'
      - key: 'X-Content-Type-Options'
        value: 'nosniff'
      - key: 'Content-Security-Policy'
        value: "default-src 'self'"

The documentation has more details.

Custom Containers for Build
Last but not least, we made several changes to the build environment. Amplify Console uses AWS CodeBuild behind the scenes. The default build container image is now based on Amazon Linux 2 and has the AWS Serverless Application Model (SAM) CLI pre-installed. If, for whatever reason, you want to use your own container for the build, you can configure Amplify Console to do so. Select App Settings > Build Settings:

Then edit the build image setting.

There are a few requirements on the container image: it must include cURL, Git, OpenSSH and, if you are building Node.js projects, Node and npm. As usual, the details are in the documentation.

Each of these new features has been driven by your feedback, so please continue to tell us what is important to you by submitting feedback, and expect to see more changes coming in the second half of the year and beyond.

— seb

Building a GraphQL server on the edge with Cloudflare Workers

Post Syndicated from Kristian Freeman original https://blog.cloudflare.com/building-a-graphql-server-on-the-edge-with-cloudflare-workers/

Building a GraphQL server on the edge with Cloudflare Workers

Today, we’re open-sourcing an exciting project that showcases the strengths of our Cloudflare Workers platform: workers-graphql-server is a batteries-included Apollo GraphQL server, designed to get you up and running quickly with GraphQL.

Testing GraphQL queries in the GraphQL Playground

As a full-stack developer, I’m really excited about GraphQL. I love building user interfaces with React, but as a project gets more complex, it can become really difficult to manage how data is handled inside an application. GraphQL makes that really easy – instead of having to recall the REST URL structure of your backend API, or remember when your backend server doesn’t quite follow REST conventions – you just tell GraphQL what data you want, and it takes care of the rest.

Cloudflare Workers is uniquely suited to hosting a GraphQL server. Because your code runs on Cloudflare’s servers around the world, the average latency for your requests is extremely low, and by using Wrangler, our open-source command line tool for building and managing Workers projects, you can deploy new versions of your GraphQL server around the world within seconds.

If you’d like to try the GraphQL server, check out a demo GraphQL playground, deployed on Workers.dev. This optional add-on to the GraphQL server allows you to experiment with GraphQL queries and mutations, giving you a super powerful way to understand how to interface with your data, without having to hop into a codebase.

If you’re ready to get started building your own GraphQL server with our new open-source project, we’ve added a new tutorial to our Workers documentation to help you get up and running – check it out here!

Finally, if you’re interested in how the project works, or want to help contribute – it’s open-source! We’d love to hear your feedback and see your contributions. Check out the project on GitHub.

Amazon Transcribe Streaming Now Supports WebSockets

Post Syndicated from Brandon West original https://aws.amazon.com/blogs/aws/amazon-transcribe-streaming-now-supports-websockets/

I love services like Amazon Transcribe. They are the kind of just-futuristic-enough technology that excites my imagination the same way that magic does. It’s incredible that we have accurate, automatic speech recognition for a variety of languages and accents, in real-time. There are so many use-cases, and nearly all of them are intriguing. Until now, the Amazon Transcribe Streaming API has been available using HTTP/2 streaming. Today, we’re adding WebSockets as another integration option for bringing real-time voice capabilities to the things you build.

In this post, we are going to transcribe speech in real-time using only client-side JavaScript in a browser. But before we can build, we need a foundation. We’ll review just enough information about Amazon Transcribe, WebSockets, and the Amazon Transcribe Streaming API to broadly explain the demo. For more detailed information, check out the Amazon Transcribe docs.

If you are itching to see things in action, you can head directly to the demo, but I recommend taking a quick read through this post first.

What is Amazon Transcribe?

Amazon Transcribe applies machine learning models to convert speech in audio to text transcriptions. One of the most powerful features of Amazon Transcribe is the ability to perform real-time transcription of audio. Until now, this functionality has been available via HTTP/2 streams. Today, we’re announcing the ability to connect to Amazon Transcribe using WebSockets as well.

For real-time transcription, Amazon Transcribe currently supports British English (en-GB), US English (en-US), French (fr-FR), Canadian French (fr-CA), and US Spanish (es-US).

What are WebSockets?

WebSockets are a protocol built on top of TCP, like HTTP. While HTTP is great for short-lived requests, it hasn’t historically been good at handling situations that require persistent, real-time communications. While an HTTP connection is normally closed at the end of the message, a WebSocket connection remains open. This means that messages can be sent bi-directionally with no bandwidth or latency added by handshaking and negotiating a connection. WebSocket connections are full-duplex, meaning that the server and the client can both transmit data at the same time. They were also designed for cross-domain usage, so there’s no messing around with cross-origin resource sharing (CORS) as there is with HTTP.

HTTP/2 streams solve a lot of the issues that HTTP had with real-time communications, and the first Amazon Transcribe Streaming API available uses HTTP/2. WebSocket support opens Amazon Transcribe Streaming up to a wider audience, and makes integrations easier for customers that might have existing WebSocket-based integrations or knowledge.

How the Amazon Transcribe Streaming API Works

Authorization

The first thing we need to do is authorize an IAM user to use Amazon Transcribe Streaming WebSockets. In the AWS Management Console, attach the following policy to your user:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "transcribestreaming",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

Authentication

Transcribe uses AWS Signature Version 4 to authenticate requests. For WebSocket connections, we use a pre-signed URL: all of the necessary information is passed as query parameters in the URL. This gives us an authenticated endpoint that we can use to establish our WebSocket.

Required Parameters

All of the required parameters are included in our pre-signed URL as part of the query string. These are:

  • language-code: The language code. One of en-US, en-GB, fr-FR, fr-CA, es-US.
  • sample-rate: The sample rate of the audio, in Hz. Max of 16000 for en-US and es-US, and 8000 for the other languages.
  • media-encoding: Currently only pcm is valid.
  • vocabulary-name: Amazon Transcribe allows you to define custom vocabularies for uncommon or unique words that you expect to see in your data. To use a custom vocabulary, reference it here.

Audio Data Requirements

There are a few things that we need to know before we start sending data. First, Transcribe expects audio to be encoded as PCM data. The sample rate of a digital audio file relates to the quality of the captured audio. It is the number of times per second (Hz) that the analog signal is checked in order to generate the digital signal. For high-quality data, a sample rate of 16,000 Hz or higher is recommended. For lower-quality audio, such as a phone conversation, use a sample rate of 8,000 Hz. Currently, US English (en-US) and US Spanish (es-US) support sample rates up to 48,000 Hz. Other languages support rates up to 16,000 Hz.

In our demo, the file lib/audioUtils.js contains a downsampleBuffer() function for reducing the sample rate of the incoming audio bytes from the browser, and a pcmEncode() function that takes the raw audio bytes and converts them to PCM.

Request Format

Once we’ve got our audio encoded as PCM data with the right sample rate, we need to wrap it in an envelope before we send it across the WebSocket connection. Each message consists of three headers, followed by the PCM-encoded audio bytes in the message body. The entire message is then encoded as a binary event stream message and sent. If you’ve used the HTTP/2 API before, there’s one difference that I think makes using WebSockets a bit more straightforward: you don’t need to cryptographically sign each chunk of audio data you send.

Response Format

The messages we receive follow the same general format: they are binary-encoded event stream messages, with three headers and a body. But instead of audio bytes, the message body contains a Transcript object. Partial responses are returned until a natural stopping point in the audio is determined. For more details on how this response is formatted, check out the docs and have a look at the handleEventStreamMessage() function in main.js.

Let’s See the Demo!

Now that we’ve got some context, let’s try out a demo. I’ve deployed it using AWS Amplify Console – take a look, or push the button to deploy your own copy. Enter the access key ID and secret access key for the IAM user you authorized earlier, hit the Start Transcription button, and start speaking into your microphone.

Deploy to Amplify Console

The complete project is available on GitHub. The most important file is lib/main.js. This file defines all our required dependencies, wires up the buttons and form fields in index.html, accesses the microphone stream, and pushes the data to Transcribe over the WebSocket. The code has been thoroughly commented and will hopefully be easy to understand, but if you have questions, feel free to open issues on the GitHub repo and I’ll be happy to help. I’d like to extend a special thanks to Karan Grover, Software Development Engineer on the Transcribe team, for providing the code that formed the basis of this demo.

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Post Syndicated from Giuliana DeAngelis original https://blog.cloudflare.com/join-cloudflare-moz-at-our-next-meetup-serverless-in-seattle/

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!
Photo by oakie / Unsplash

Cloudflare is organizing a meetup in Seattle on Tuesday, June 25th and we hope you can join. We’ll be bringing together members of the developer community and Cloudflare users for an evening of discussion about serverless compute and the infinite number of use cases for deploying code at the edge.

To kick things off, our guest speaker Devin Ellis will share how Moz uses Cloudflare Workers to reduce time to first byte by 30-70% by caching dynamic content at the edge. Kirk Schwenkler, Solutions Engineering Lead at Cloudflare, will facilitate this discussion and share his perspective on how to grow and secure businesses at scale.

Next up, Developer Advocate Kristian Freeman will take you through a live demo of Workers and highlight new features of the platform. This will be an interactive session where you can try out Workers for free and develop your own applications using our new command-line tool.

Food and drinks will be served until close, so grab your laptop and a friend and come on by!

View Event Details & Register Here

Agenda:

  • 5:00 pm Doors open, food and drinks
  • 5:30 pm Customer use case by Devin and Kirk
  • 6:00 pm Workers deep dive with Kristian
  • 6:30 – 8:30 pm Networking, food and drinks