Tag Archives: Product News

How to test HTTP/3 and QUIC with Firefox Nightly

Post Syndicated from Lucas Pardue original https://blog.cloudflare.com/how-to-test-http-3-and-quic-with-firefox-nightly/

HTTP/3 is the third major version of the Hypertext Transfer Protocol, which takes the bold step of moving away from TCP to the new transport protocol QUIC in order to provide performance and security improvements.

During Cloudflare’s Birthday Week 2019, we were delighted to announce that we had enabled QUIC and HTTP/3 support on the Cloudflare edge network. This was joined by support from Google Chrome and Mozilla Firefox, two of the leading browser vendors and partners in our effort to make the web faster and more reliable for all. A big part of developing new standards is interoperability, which typically means different people analysing, implementing and testing a written specification in order to prove that it is precise, unambiguous, and actually implementable.

At the time of our announcement, Chrome Canary had experimental HTTP/3 support and we were eagerly awaiting a release of Firefox Nightly. Now that Firefox supports HTTP/3 we thought we’d share some instructions to help you enable and test it yourselves.

How do I enable HTTP/3 for my domain?

Simply go to the Cloudflare dashboard and flip the switch in the “Network” tab.

Using Firefox Nightly as an HTTP/3 client

Firefox Nightly has experimental support for HTTP/3. In our experience things are pretty good but be aware that you might experience some teething issues, so bear that in mind if you decide to enable and experiment with HTTP/3. If you’re happy with that responsibility, you’ll first need to download and install the latest Firefox Nightly build. Then open Firefox and enable HTTP/3 by visiting “about:config” and setting “network.http.http3.enabled” to true. There are some other parameters that can be tweaked but the defaults should suffice.

about:config can be filtered by using a search term like “http3”.

Once HTTP/3 is enabled, you can visit your site to test it out. A straightforward way to check if HTTP/3 was negotiated is to check the Developer Tools “Protocol” column in the “Network” tab (on Windows and Linux the Developer Tools keyboard shortcut is Ctrl+Shift+I; on macOS it’s Command+Option+I). This “Protocol” column might not be visible at first; to enable it, right-click one of the column headers and check “Protocol”.

Then reload the page and you should see that “HTTP/3” is reported.

The aforementioned teething issues might cause HTTP/3 not to show up initially. When you enable HTTP/3 on a zone, we add a header field such as alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400 to all responses for that zone. Clients see this as an advertisement to try HTTP/3 out and will take up the offer on the next request. To make this happen, reload the page, but make sure that you bypass the local browser cache (via the “Disable Cache” checkbox, or the Shift-F5 key combo), or else you’ll just see the protocol used to fetch the resource the first time around. Firefox also provides the “about:networking” page, which lists visited zones and the HTTP version that was used to load them; for example, this very blog.
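The shape of that advertisement is simple enough to pull apart mechanically. The sketch below is our own helper, not a full RFC 7838 Alt-Svc parser; it assumes the simple comma-delimited form Cloudflare sends and extracts each advertised protocol version, port, and max-age:

```javascript
// Minimal sketch of parsing an Alt-Svc header value like the one above.
// Not a full RFC 7838 parser – it only handles this simple shape.
function parseAltSvc(header) {
  return header.split(/,\s*/).map(entry => {
    const [svc, ...params] = entry.split(/;\s*/);
    const [protocol, authority] = svc.split('=');
    const port = Number(authority.replace(/"/g, '').split(':')[1]);
    const ma = params.find(p => p.startsWith('ma='));
    return { protocol, port, maxAge: ma ? Number(ma.slice(3)) : null };
  });
}

const offers = parseAltSvc('h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400');
console.log(offers.map(o => o.protocol)); // → [ 'h3-27', 'h3-28', 'h3-29' ]
```

Each entry says “this origin also speaks this HTTP/3 draft version on port 443, and you may remember that for 86400 seconds (one day)” — which is why the switch to HTTP/3 only happens on a subsequent request.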

about:networking contains a table of all visited zones and the connection properties.

Browsers can sometimes stick to an existing HTTP connection and refuse to start an HTTP/3 one. This is hard to detect, so sometimes the best option is to close the browser completely and reopen it. We’ve also seen some interactions with Service Workers that make it appear that a resource was fetched from the network using HTTP/1.1, when in fact it was fetched from the local Service Worker cache. In such cases, if you’re keen to see HTTP/3 in action, you’ll need to deregister the Service Worker. If you’re in doubt about what is happening on the network, it is often useful to verify things independently, for example by capturing a packet trace and dissecting it with Wireshark.

What’s next?

The QUIC Working Group recently announced a “Working Group Last Call”, which marks an important milestone in the continued maturity of the standards. From the announcement:

After more than three and a half years and substantial discussion, all 845 of the design issues raised against the QUIC protocol drafts have gained consensus or have a proposed resolution. In that time the protocol has been considerably transformed; it has become more secure, much more widely implemented, and has been shown to be interoperable. Both the Chairs and the Editors feel that it is ready to proceed in standardisation.

The coming months will see the specifications settle and we anticipate that implementations will continue to improve their QUIC and HTTP/3 support, eventually enabling it in their stable channels. We’re pleased to continue working with industry partners such as Mozilla to help build a better Internet together.

In the meantime, you might want to check out our guides to testing with other implementations such as Chrome Canary or curl. As compatibility becomes proven, implementations will shift towards optimizing their performance; you can read about Cloudflare’s efforts on comparing HTTP/3 to HTTP/2 and the work we’ve done to improve performance by adding support for CUBIC and HyStart++ to our congestion control module.

Health Check Analytics and how you can use it

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/health-check-analytics-and-how-you-can-use-it/

At the end of last year, we introduced Standalone Health Checks – a service that lets you monitor the health of your origin servers without purchasing additional third-party services. The more that can be controlled from Cloudflare, the lower your maintenance cost, vendor management overhead, and infrastructure complexity, ensuring you can scale your infrastructure seamlessly as your company grows. Today, we are introducing Standalone Health Check Analytics to help decrease your time to resolution for any potential issues. You can find Health Check Analytics in the sub-menu under the Traffic tab in your Cloudflare Dashboard.

As a refresher, Standalone Health Checks is a service that monitors an IP address or hostname for your origin servers or application and notifies you in near real-time if there happens to be a problem. These Health Checks support fine-tuned configurations based on expected response codes, interval, protocol, timeout and more, enabling you to properly target your checks based on the unique setup of your infrastructure. For example, a Health Check might monitor an origin server in a staging environment, with a notification set via email.

Once you set up a notification, you will be alerted when there is a change in the health of your origin server. In the example above, if your staging environment starts responding with anything other than a 200 OK response code, we will send you an email within seconds so you can take the necessary action before customers are impacted.

Introducing Standalone Health Check Analytics

Once you get the notification email, we provide tools that help to quickly debug the possible cause of the issue with detailed logs as well as data visualizations enabling you to better understand the context around the issue. Let’s walk through a real-world scenario and see how Health Check Analytics helps decrease our time to resolution.

A notification email has been sent to you letting you know that Staging is unhealthy. You log into your dashboard and go into Health Check Analytics for this particular Health Check, where you can see that Staging is up 76% of the time, versus 100% of the time for Production. Now that the analytics validate the email notification that there is indeed a problem, we need to dig in further. Below the availability graph is a breakdown of the types of errors that have taken place for both the Staging and Production addresses over the specified time period. We see there is only one error in the staging environment – a TCP Connection Failed error – but it occurs very frequently, leading to the lower availability.

This starts to narrow the funnel for what the issue could be. We know that there is something wrong with the Staging server’s ability to receive connections – maybe an issue during the TCP three-way handshake, or possibly an issue with the router in front of it, making the failure a downstream consequence rather than a problem with the origin server itself. With this information, you can quickly make the necessary checks to validate your hypothesis and minimize your time to resolution. Instead of having to dig through endless logs, or make educated guesses at where the issue could stem from, Health Check Analytics allows you to quickly home in on the likely root cause. This in turn maximizes your application reliability and, more importantly, keeps trust and brand expectations with your customers.

Being able to quickly get an overview of your infrastructure is important, but sometimes digging deeper into each Health Check is more valuable for understanding what is happening at a granular level. In addition to general information like address, response code, round trip time (RTT) and failure reason, we are adding more features to help you understand each Health Check result. We have also added extra information to the event table so you can quickly understand a given problem. In the case of a Response Code Mismatch Error, we now provide the expected response code for a given Health Check along with the received code. This removes the need to go back and check a configuration that may have been set up long ago, and keeps the focus on the problem at hand.

The availability of different portions of your infrastructure is very important, but this does not provide the complete view. Performance is continuing to skyrocket in importance and value to customers. If an application is not performant, they will quickly go to a competitor without a second thought. Sometimes RTT is not enough to understand why requests have higher latency and where the root of the issue may reside. To better understand where time is spent for a given request, we are introducing the waterfall view of a request within the Event Log. With this view, you can understand the time taken for the TCP connection, time taken for the TLS handshake, and the time to first byte (TTFB) for the request. The waterfall will give you a chronological idea about time spent in different stages of the request.

  1. Time taken to establish the initial TCP connection (in dark blue, approx 41ms).
  2. Once the TCP connection is established, time is spent doing the TLS handshake. This is another component that takes up time for HTTPS websites (in light blue, approx 80ms).
  3. Once the TLS handshake is complete, the time taken for the first byte to be received (TTFB) is also exposed (in dark orange, approx 222ms).
  4. The total round trip time (RTT) is the time taken to load the complete page. The difference between the RTT and the TTFB gives you the time spent downloading content from the page; if your page has a large amount of content, this difference will be high (in light orange, approx 302ms). The page load time is approximately 80ms for this address.
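The subtraction in step 4 can be made concrete with the example numbers above (these are illustrative values read off the waterfall, not live data):

```javascript
// Cumulative timings (ms) from the example waterfall above.
const ttfb = 222; // time to first byte
const rtt  = 302; // total round trip time for the complete page

// The difference is the time spent downloading the page content.
const contentDownload = rtt - ttfb;
console.log(contentDownload); // → 80 (ms), the page load time quoted above
```

A large value here points at page content (size, compression) rather than connection setup, which is what makes the waterfall breakdown useful for deciding where to optimize.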

The information above suggests a number of steps the website owner could take. The delay in the initial TCP connection time could be decreased by making the website available in different geographic locations around the world; this would also reduce the TLS handshake time, as each round trip would be faster. Another thing that is visible is the page load time of 80ms. This might be because of the contents of the page – perhaps compression can be applied on the server side to improve load time, or unnecessary content can be removed. The information in the waterfall view can also tell you if an additional external library increases the time to load the website after a release.

Cloudflare has over 200 edge locations around the world making it one of the largest Anycast networks on the planet. When a health check is configured, it can be run across the different regions on the Cloudflare infrastructure, enabling you to see the variation in latency around the world for specific Health Checks.

Waterfall from India
Waterfall from Western North America

Based on the new information provided from Health Check Analytics, users can definitively validate that the address performs better from Western North America compared to India due to the page load time and overall RTT.

How do health checks run?

To understand and decipher the logs found in the analytics dashboard, it is important to understand how Cloudflare runs Health Checks. Cloudflare has data centers in more than 200 cities across 90+ countries throughout the world [more]. We don’t run health checks from every single one of these data centers (that would be a lot of requests to your servers!). Instead, we let you pick between one and thirteen regions from which to run health checks [Regions].

The Internet is not the same everywhere around the world. So your users may not have the same experience on your application according to where they are. Running Health Checks from different regions lets you know the health of your application from the point of view of the Cloudflare network in each of these regions.

Imagine you configure a Health Check from two regions, Western North America and South East Asia, at an interval of 10 seconds. You may have been expecting to get two requests to your origin server every 10 seconds, but if you look at your server’s logs you will see that you are actually getting six. That is because we send Health Checks not just from one location in each region but three.

For a health check configured from All Regions (thirteen regions) there will be 39 requests to your server per configured interval.
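The arithmetic behind these request counts is simply three probing locations per selected region:

```javascript
// Each selected region probes from three separate Cloudflare locations,
// so your origin sees regions × 3 requests per configured interval.
function probesPerInterval(regionCount) {
  return regionCount * 3;
}

console.log(probesPerInterval(2));  // → 6  (the two-region example above)
console.log(probesPerInterval(13)); // → 39 (All Regions)
```

Keep this in mind when choosing a short interval across many regions, since the probe volume multiplies accordingly.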

You may wonder: ‘Why do you probe from multiple locations within a region?’ We do this to make sure the health we report represents the overall performance of your service as seen from that region. Before we report a change, we check that at least two locations agree on the status. We added a third one to make sure that the system keeps running even if there is an issue at one of our locations.
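The two-out-of-three agreement rule described above can be modeled as a simple quorum check. This is a hypothetical sketch of the behavior, not Cloudflare's internal implementation:

```javascript
// Hypothetical model of the quorum rule: a region reports a status only
// when at least two of its three probing locations agree on it.
function regionStatus(locationResults) {
  const healthy = locationResults.filter(s => s === 'healthy').length;
  return healthy >= 2 ? 'healthy' : 'unhealthy';
}

console.log(regionStatus(['healthy', 'healthy', 'unhealthy']));   // → healthy
console.log(regionStatus(['healthy', 'unhealthy', 'unhealthy'])); // → unhealthy
```

With three voters, a single misbehaving location can never flip the reported status on its own, which is exactly the resilience property described above.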

Conclusion

Health Check Analytics is now live and available to all Pro, Business and Enterprise customers! We are very excited to help decrease your time to resolution and ensure your application reliability is maximized.

The Bandwidth Alliance Charges Forward with New Partners – Alibaba, Zenlayer, and Cherry Servers

Post Syndicated from Arjunan Rajeswaran original https://blog.cloudflare.com/bandwidth-alliance-charges-forward/

We started the Bandwidth Alliance in 2018 with a group of like-minded cloud and networking partners. Our common goal was to help our mutual customers reduce or eliminate data transfer charges, sometimes known as “bandwidth” or “egress” fees, between the cloud and the consumer. By reducing or eliminating these costs, our customers can more easily choose a best-of-breed set of solutions, because they don’t have to worry about data charges from moving workloads between vendors or about becoming locked in to a single provider for all their needs. Today we’re announcing an important milestone: the addition of Alibaba, Zenlayer, and Cherry Servers to the Bandwidth Alliance, expanding it to a total of 20 partners. These partners offer our customers a wide choice of cloud services and products, each suited to different needs.

In addition, we are working with our existing partners, including Microsoft Azure, DigitalOcean and several others, to onboard customers and provide them the benefits of the Bandwidth Alliance. Contact us at [email protected] if you are interested.

Customer savings  

Over the past year we have seen several customers take advantage of the Bandwidth Alliance and wanted to highlight two examples.

Nodecraft, which gives users an easy way to set up their own game servers, is a perfect example of how the Bandwidth Alliance helped cut down egress costs. Nodecraft supports games like Minecraft, ARK: Survival Evolved, and Counter-Strike. As Nodecraft’s popularity increased, so did their AWS bill. They were being charged not only for the storage they were using, but also for ‘egress’, or data transfer, fees out of AWS. They made the decision to move their storage to Backblaze. Now they use Backblaze’s B2 Storage and Cloudflare’s extensive network to deliver content to customers without any egress charges. Read more about their journey here and here.

Pippa.io provides simple and smart solutions for podcasting, including hosting, analytics, and ads. The most important part of Pippa’s business is rapid and reliable asset delivery. As Pippa grew, they were pushing millions of large audio files to listeners worldwide. This resulted in significantly increased costs, including excessive data egress fees for retrieving data from a cloud storage service such as AWS S3. DigitalOcean waives egress fees for transferring data to Cloudflare, effectively creating a zero-cost data bridge from DigitalOcean to Cloudflare’s global network. Pippa moved to DigitalOcean storage and Cloudflare’s global cloud security and delivery network. With the combination of lower-cost storage and zero egress fees, Pippa saw a 50% savings on their cloud bill.

What could your savings be?

Nodecraft and Pippa are just two examples of small businesses who are seeing significant cost savings from the Bandwidth Alliance. They both chose the best storage and cloud solution for their use case and a global cloud network by Cloudflare without any taxation of transferring data between these two products. With our newly added partners we expect many more customers to benefit.

You may be asking, ‘How much can I save?’ To help you get a sense of your potential savings from moving to the Bandwidth Alliance, we have put together a calculator. Fill in your egress details to figure out how much you could be saving. We hope this is a helpful resource as you evaluate your cloud platform choices and spend.
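As a rough illustration of the arithmetic such a calculator performs: the savings are essentially the egress fees that disappear when both your storage provider and Cloudflare are in the Bandwidth Alliance. The $0.09/GB rate below is an assumed example price for illustration, not any provider's actual fee:

```javascript
// Hypothetical back-of-the-envelope savings estimate: monthly egress volume
// times the per-GB transfer price that would otherwise be charged.
// pricePerGB = 0.09 is an assumed illustrative rate, not a quoted price.
function monthlyEgressSavings(egressGB, pricePerGB = 0.09) {
  return egressGB * pricePerGB;
}

console.log(monthlyEgressSavings(10000)); // → 900 (USD per month for 10 TB of egress)
```

Real pricing is tiered and varies by provider and region, which is exactly why the calculator asks for your specific egress details.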

Adding the Fallback Pool to the Load Balancing UI and other significant UI enhancements

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/adding-the-fallback-pool-to-the-load-balancing-ui/

The Cloudflare Load Balancer was introduced over three years ago to provide our customers with a powerful, easy to use tool to intelligently route traffic to their origins across the world. During the initial design process, one of the questions we had to answer was ‘where do we send traffic if all pools are down?’ We did not think it made sense just to drop the traffic, so we used the concept of a ‘fallback pool’ to send traffic to a ‘pool of last resort’ in the case that no pools were detected as available. While this may still result in an error, it gave an eyeball request a chance at being served successfully in case the pool was still up.

As a brief reminder, a load balancer helps route traffic across your origin servers to ensure your overall infrastructure stays healthy and available. Load Balancers are made up of pools, which can be thought of as collections of servers in a particular location.

Over the past three years, we’ve made many updates to the dashboard. The new designs now support the fallback pool addition to the dashboard UI. The use of a fallback pool is incredibly helpful in a tight spot, but not having it viewable in the dashboard led to confusion around which pool was set as the fallback. Was there a fallback pool set at all? We want to be sure you have the tools to support your day-to-day work, while also ensuring our dashboard is usable and intuitive.

You can now check which pool is set as the fallback in any given Load Balancer, along with being able to easily designate any pool in the Load Balancer as the fallback. If no fallback pool is set, then the last pool in the list will automatically be chosen. We made the decision to auto-set a pool to be sure that customers are always covered in case the worst scenario happens. You can access the fallback pool within the Traffic App of the Cloudflare dashboard when creating or editing a Load Balancer.

Load Balancing UI Improvements

Not only did we add the fallback pool to the UI, but we saw this as an opportunity to update other areas of the Load Balancing app that have caused some confusion in the past.

Facelift and De-modaling

As a start, we gave the main Load Balancing page a facelift and de-modaled (moved content out of small modal windows into the larger page area) the majority of the Load Balancing UI. We felt moving this content out of a small web element would let users more easily understand the content on the page, and allow us to make better use of the available space rather than being limited to the small area of a modal. This change applies when you create or edit a Load Balancer and when you manage monitors and/or pools.

The updated UI has combined the health status and icon to declutter the available space and make it clear at a glance what the status is for a particular Load Balancer or Pool. We have also updated to a smaller toggle button across the Load Balancing UI, which allows us to update the action buttons with the added margin space gained. Now that we are utilizing the page surface area more efficiently, we moved forward to add more information in our tables so users are more aware of the shared aspects of their Load Balancer.

Shared Objects and Editing

Shared objects have caused some level of concern for companies who have teams across the world – all leveraging the Cloudflare dashboard.

Some of the shared objects, Monitors and Pools, have a new column outlining which Pools or Load Balancers are currently using a particular Monitor or Pool. This brings more clarity around what will be affected by any changes made by someone in your organization, and supports users in being more autonomous and confident when they make an update in the dashboard. If someone from team A wants to update a monitor for a production server, they can do so without worrying about breaking monitoring for another pool, or having to speak to team B first. The time saved and the empowerment to make updates as things change in your business are incredibly valuable: they support the velocity you want to achieve while maintaining a safe environment to operate in. The days of having to worry about unforeseen consequences cropping up later down the road are swiftly coming to a close.

This helps teams understand the impact of a given change and what else would be affected. But we did not feel this was enough; we want to be sure that everyone is confident in the changes they are making. On top of the additional columns, we added a number of confirmation modals to drive confidence about a particular change, each listing the other Load Balancers or Pools that would be impacted. To really drive home which objects are shared, we made a final change: monitors can now be edited only within the Manage Monitors page. We felt that having users navigate to the manage page itself conveys that these items are shared. For example, allowing edits to a Monitor in the same view as editing a Load Balancer can make it seem like those changes apply only to that Load Balancer, which is not always the case.

Updated CTAs/Buttons

Lastly, when users would expand the Manage Load Balancer table to view more details about their Pools or Origins within that specific Load Balancer, they would click the large X icon in the top right of that expanded card to close it – seems reasonable in the expanded context.

But the X icon did not close the expanded card – it deleted the Load Balancer altogether. This is dangerous, and we want to prevent users from making such mistakes. With the space gained from de-modaling large areas of the UI, we have replaced these icon buttons with clickable text buttons that read ‘Edit’ or ‘Delete’. The difference is clearly defined text describing the action that will take place, rather than leaving it up to a user’s interpretation of what the icon means. We felt this was much clearer and prevents users from being met with unwanted changes.

We are very excited about the updates to the Load Balancing dashboard and look forward to improving day in and day out.


Remote Work Isn’t Just Video Conferencing: How We Built CloudflareTV

Post Syndicated from Zaid Farooqui original https://blog.cloudflare.com/remote-work-isnt-just-video-conferencing-how-we-built-cloudflaretv/

At Cloudflare, we produce all types of video content, ranging from recordings of our Weekly All-Hands to product demos. Being able to stream video on demand has two major advantages when compared to live video:

  1. It encourages asynchronous communication within the organization
  2. It extends the lifetime value of the shared knowledge

Historically, we haven’t had a central, secure repository of all video content that could be easily accessed from the browser. Various teams chose their own platforms to share content. If I wanted to find a recording of a product demo, for example, I’d need to search Google Drive, Gmail and Google Chat with creative keywords. Very often, I would need to reach out to individual teams to finally locate the content.

So we decided we wanted to build CloudflareTV, an internal Netflix-like application that can only be accessed by Cloudflare employees and has all of our videos neatly organized and immediately watchable from the browser.

We wanted to achieve the following when building CloudflareTV:

  • Security: make sure the videos are access controlled and not publicly accessible
  • Authentication: ensure the application can only be accessed by Cloudflare employees
  • Tagging: allow the videos to be categorized so they can be found easily
  • Originless: build the entire backend using Workers and Stream so we don’t need separate infrastructure for encoding, storage and delivery

Securing the videos using signed URLs

Every video uploaded to Cloudflare Stream can be locked down by requiring signed URLs. A Stream video can be marked as requiring signed URLs using the UI or by making an API call.

Once locked down in this way, videos can’t be accessed directly. Instead, they can only be accessed using a temporary token.

In order to create signed tokens, we must first make an API call to create a key:

curl -X POST -H "X-Auth-Email: {$EMAIL}" -H "X-Auth-Key: {$AUTH_KEY}"  "https://api.cloudflare.com/client/v4/accounts/{$ACCOUNT_ID}/media/keys"

The API call will return a JSON object similar to this:

{
  "result": {
    "id": "...",
    "pem": "...",
    "jwk": "...",
    "created": "2020-03-10T18:17:00.075188052Z"
  },
  "success": true,
  "errors": [],
  "messages": []
}

We can use the id and pem values in a Workers script that takes a video ID and returns a signed token that expires after 1 hour:

async function generateToken(video_id) {
    // Token expiry: one hour from now (Unix timestamp, in seconds)
    const exp_time = Math.round((new Date()).getTime() / 1000) + 3600;

    const key_data = {
        'id': '{$KEY_ID}',
        'pem': '{$PEM}',
        'exp': exp_time
    };

    let response = await fetch('https://util.cloudflarestream.com/sign/' + video_id, {
        method: 'POST',
        body: JSON.stringify(key_data)
    });
    let token_value = await response.text();
    return token_value;
}

The returned signed token should look something like this:

eyJhbGciOiJSUzI1NiIsImtpZCI6IjExZDM5ZjEwY2M0NGY1NGE4ZDJlMjM5OGY3YWVlOGYzIn0.eyJzdWIiOiJiODdjOWYzOTkwYjE4ODI0ZTYzMTZlMThkOWYwY2I1ZiIsImtpZCI6IjExZDM5ZjEwY2M0NGY1NGE4ZDJlMjM5OGY3YWVlOGYzIiwiZXhwIjoiMTUzNzQ2MDM2NSIsIm5iZiI6IjE1Mzc0NTMxNjUifQ.C1BEveKi4XVeZk781K8eCGsMJrhbvj4RUB-FjybSm2xiQntFi7AqJHmj_ws591JguzOqM1q-Bz5e2dIEpllFf6JKK4DMK8S8B11Vf-bRmaIqXQ-QcpizJfewNxaBx9JdWRt8bR00DG_AaYPrMPWi9eH3w8Oim6AhfBiIAudU6qeyUXRKiolyXDle0jaP9bjsKQpqJ10K5oPWbCJ4Nf2QHBzl7Aasu6GK72hBsvPjdwTxdD5neazdxViMwqGKw6M8x_L2j2bj93X0xjiFTyHeVwyTJyj6jyPwdcOT5Bpuj6raS5Zq35qgvffXWAy_bfrWqXNHiQdSMOCNa8MsV8hljQsh
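Since this is a standard JWT, you can sanity-check a token (for example, its exp expiry claim) by decoding the payload segment. This is our own debugging helper, not part of the Stream API; it uses Node’s Buffer (in a Worker you would use atob() instead) and it does not verify the signature:

```javascript
// Decode (NOT verify) the claims in a JWT's payload segment – handy for
// checking why a signed video URL stopped working (e.g. an expired token).
function decodeJwtPayload(token) {
  // Payload is the second dot-separated segment, base64url-encoded.
  const b64 = token.split('.')[1].replace(/-/g, '+').replace(/_/g, '/');
  return JSON.parse(Buffer.from(b64, 'base64').toString('utf8'));
}

// e.g. decodeJwtPayload(signedToken).exp is the Unix expiry timestamp
```

If the decoded exp is in the past, the player will be rejected and a fresh token needs to be generated.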

Stream provides an embed code for each video. The “src” attribute of the embed code typically contains the video ID. But if the video is private, instead of setting the “src” attribute to the video ID, you set it to the signed token value:

<stream src="eyJhbGciOiJSUzI1NiIsImtpZCI6IjExZDM5ZjEwY2M0NGY1NGE4ZDJlMjM5OGY3YWVlOGYzIn0.eyJzdWIiOiJiODdjOWYzOTkwYjE4ODI0ZTYzMTZlMThkOWYwY2I1ZiIsImtpZCI6IjExZDM5ZjEwY2M0NGY1NGE4ZDJlMjM5OGY3YWVlOGYzIiwiZXhwIjoiMTUzNzQ2MDM2NSIsIm5iZiI6IjE1Mzc0NTMxNjUifQ.C1BEveKi4XVeZk781K8eCGsMJrhbvj4RUB-FjybSm2xiQntFi7AqJHmj_ws591JguzOqM1q-Bz5e2dIEpllFf6JKK4DMK8S8B11Vf-bRmaIqXQ-QcpizJfewNxaBx9JdWRt8bR00DG_AaYPrMPWi9eH3w8Oim6AhfBiIAudU6qeyUXRKiolyXDle0jaP9bjsKQpqJ10K5oPWbCJ4Nf2QHBzl7Aasu6GK72hBsvPjdwTxdD5neazdxViMwqGKw6M8x_L2j2bj93X0xjiFTyHeVwyTJyj6jyPwdcOT5Bpuj6raS5Zq35qgvffXWAy_bfrWqXNHiQdSMOCNa8MsV8hljQsh" controls></stream>
<script data-cfasync="false" defer type="text/javascript" src="https://embed.videodelivery.net/embed/r4xu.fla9.latest.js"></script>

Tagging videos

We would like to categorize videos uploaded to Stream by tagging them. This can be done by updating the video object’s meta field and passing it arbitrary JSON data. To categorize a video, we simply update the meta field with a comma-delimited list of tags:

curl -X POST  -d '{"uid": "VIDEO_ID", "meta": {"tags": "All Hands,Stream"}}' "https://api.cloudflare.com/client/v4/accounts/{$ACCOUNT_ID}/stream/{$VIDEO_ID}"  -H "X-Auth-Email: {$EMAIL}"  -H "X-Auth-Key: {$ACCOUNT_KEY}"  -H "Content-Type: application/json"

Later, we will create a getVideos Worker function to fetch a list of videos and all associated data so we can render the UI. The tagging data we just set for this video will be included in the video data returned by the Worker.

Fetching Video Data using Workers

The heart of the UI is a list of videos. How do we get this list of videos programmatically? Stream provides an endpoint that returns all the videos and any metadata associated with them.

First, we set up environment variables for our Worker:

Remote Work Isn’t Just Video Conferencing: How We Built CloudflareTV
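The screenshot above shows the dashboard flow. As a sketch, the same bindings could instead be declared with the Wrangler CLI; the variable names match what the Worker expects, and the values are placeholders:

```toml
# wrangler.toml (sketch): plain-text bindings for the Worker below.
# The API key should not live here; set it as a secret instead with:
#   wrangler secret put CF_KEY
name = "cloudflare-tv"
[vars]
CF_ACCOUNT_ID = "your-account-id"  # placeholder
CF_EMAIL = "you@example.com"       # placeholder
```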

Next, we wrote a simple Workers function to call the Stream API and return a list of videos, eliminating the need for an origin:

// CF_KEY, CF_EMAIL and CF_ACCOUNT_ID are bound to the Worker as
// environment variables/secrets (see the previous step).
async function getVideos() {
    const headers = {
        'X-Auth-Key': CF_KEY,
        'X-Auth-Email': CF_EMAIL
    };

    let response = await fetch('https://api.cloudflare.com/client/v4/accounts/' + CF_ACCOUNT_ID + '/stream', {
        headers: headers
    });
    let video_list = await response.text();
    return video_list;
}

// Serve the video list for any request routed to this Worker.
addEventListener('fetch', event => {
    event.respondWith(
        getVideos().then(body => new Response(body, {
            headers: { 'Content-Type': 'application/json' }
        }))
    );
});

Lastly, we set up a zone and, within the zone, a Worker route pointing to our Workers script. This can be done from the Workers tab:

Remote Work Isn’t Just Video Conferencing: How We Built CloudflareTV

Authenticating using Cloudflare Access

Finally, we want to restrict access to CloudflareTV to people within the organization. We can do this using Cloudflare Access, available under the Access tab.

To restrict access to CloudflareTV, we must do two things:

  1. Add a new login method
  2. Add an access policy

To add a new login method, click the “+” icon and choose your identity provider. In our case, we chose Google:

Remote Work Isn’t Just Video Conferencing: How We Built CloudflareTV

You will see a pop-up asking for information, including the Client ID and Client Secret, both key pieces of information required to set up Google as the identity provider.

Once we add an identity provider, we want to tell Access “who specifically should be allowed to access our application?” This is done by creating an Access Policy.

Remote Work Isn’t Just Video Conferencing: How We Built CloudflareTV
Remote Work Isn’t Just Video Conferencing: How We Built CloudflareTV

We set up an Access Policy to only allow emails ending in our domain name. This effectively makes CloudflareTV only accessible by our team!

What’s next?

If you have interesting ideas around video, Cloudflare Stream lets you focus on your idea while it handles storage, encoding, and the viewing experience for your users. Couple that with Access and Workers, and you can build powerful applications. Here are the docs to help you get started:

Protect your team with Cloudflare Gateway

Post Syndicated from Irtefa original https://blog.cloudflare.com/protect-your-team-with-cloudflare-gateway/

Protect your team with Cloudflare Gateway

On January 7th, we announced Cloudflare for Teams, a new way to protect organizations and their employees globally, without sacrificing performance. Cloudflare for Teams centers around two core products – Cloudflare Access and Cloudflare Gateway. Cloudflare Access is already available and used by thousands of teams around the world to secure internal applications. Cloudflare Gateway solves the other end of the problem by protecting those teams from security threats without sacrificing performance.

Today, we’re excited to announce new secure DNS filtering capabilities in Cloudflare Gateway. Cloudflare Gateway protects teams from threats like malware, phishing, ransomware, crypto-mining and other security threats. You can start using Cloudflare Gateway at dash.teams.cloudflare.com. Getting started takes less than five minutes.

Why Cloudflare Gateway?

We built Cloudflare Gateway to address key challenges our customers experience managing and securing global networks. The root cause of these challenges is legacy architecture and its inability to scale. Legacy network security models solved the problems of the 1990s, but teams have continued to force the Internet of the 2020s through them.

Historically, branch offices sent all of their Internet-bound traffic to one centralized data center at or near corporate headquarters. Administrators configured that network to make sure all requests passed through a secure hardware firewall. The hardware firewall observed each request, performed inline SSL inspection, applied DNS filtering, and made sure that the corporate network was safe from security threats. This solution worked when employees accessed business-critical applications from the office, and when applications were not in the cloud.

Protect your team with Cloudflare Gateway
Average SaaS spending per company since 2008 (source)

SaaS broke this model when cloud-delivered applications became the new normal for workforce applications. As business-critical applications moved to the cloud, the number of Internet-bound requests from all the offices went up. Costs went up, too. In the last 10 years, SaaS spending across all company size segments grew by more than 1615%. The legacy model of backhauling all Internet traffic through centralized locations could not keep up with the digital transformation that all businesses are still going through.

Protect your team with Cloudflare Gateway

The challenge of backhauling traffic for a global workforce

Expensive and slow

SaaS adoption is only one element that is breaking traditional network models. Geographically distributed offices and remote workers are playing a role, too.

Cloudflare Gateway has been in beta use for some of our customers over the last few months. One of those customers had more than 50 branch offices, and sent all of their DNS traffic through one location. The customer’s headquarters is in New York, but they have offices all over the world, including in India. When someone from the office in India visits google.com, DNS requests travel all the way to New York.

As a result, employees in India have a terrible experience using the Internet. The legacy approach to solve this problem is to add MPLS links from branch offices to the headquarters. But MPLS links are expensive, and can take a long time to configure and deploy. Businesses end up spending millions of dollars on legacy solutions, or they remain slow, driving down employee productivity.

Protect your team with Cloudflare Gateway

Slow to react to security threats

Businesses backhaul traffic to a single location to inspect and filter malicious traffic using a hardware firewall. But legacy hardware appliances were not built for the modern Internet, where the threat landscape is constantly changing.

For example: about 84% of phishing sites exist for less than 24 hours (source) and legacy hardware firewalls are not fast enough to update their static rules to thwart phishing attacks. When security threats on the Internet act like moving targets, legacy hardware appliances that rely on static models to filter malicious traffic cannot keep up. As a result, employees remain vulnerable to new threats even when businesses backhaul Internet bound traffic to a single location.

Cloudflare Gateway

Starting today, businesses of all sizes can secure all their Internet-bound traffic and make it faster with Cloudflare Gateway. Cloudflare has data centers in more than 200 cities around the world, and all of our services run in every single data center. Therefore, when a business uses Cloudflare Gateway, instead of backhauling traffic to a single location (slow), all Internet-bound requests travel from the end user to the nearest data center (fast), where Cloudflare Gateway applies security policies to protect businesses from security threats. All of this is done without the need for expensive MPLS links.

Protect your team with Cloudflare Gateway
(Source)

Gateway’s secure DNS filtering capabilities are built on top of 1.1.1.1, the fastest public DNS resolver in the world. We took the pieces that made the 1.1.1.1 public DNS resolver the fastest and built Cloudflare Gateway’s secure DNS filtering capabilities for customers who want to secure their connection to the Internet. Combined with Cloudflare’s global presence of data centers in more than 200 cities and the fastest public DNS resolver in the world, Cloudflare Gateway secures every connection from every device to every destination on the Internet without sacrificing performance.

Protect your team with Cloudflare Gateway

Why Secure DNS Filtering?

More than 90% of malware uses DNS to perform command & control attacks and exfiltrate sensitive data. Here’s an example of how malware can infect a device or a data center and perform a command & control (also known as C2C or C&C) attack:

Protect your team with Cloudflare Gateway

  1. Imagine Bob receives an email from someone impersonating his manager with a link to ‘Box’ that looks harmless. The email looks legitimate, but in reality it is a phishing email intended to steal valuable information from Bob’s computer or to infect it with malware.
  2. When Bob clicks on the link, the phishing website impersonating ‘Box’ delivers an exploit and installs malware onto Bob’s computer.
  3. The downloaded malware sends a request to the Command & Control server signaling that the malware is ready to receive instructions from the server.
  4. Once the connection between the malware and Command & Control server is established, the server sends instructions to the malware to steal proprietary data, control the state of the machine to reboot it, shut it down or perform DDoS attacks against other websites.

If Bob’s computer was using DNS filtering, it could have prevented the attack in two places.

First, when Bob clicked on the phishing link (step 2), the browser sent a DNS request to resolve the domain of the phishing link. If that domain had been identified by DNS filtering as a phishing domain, the request would have been blocked right away.

Second, when the malware initiated the connection with the Command & Control server, it also needed to make a DNS request to learn the Command & Control server’s IP address. This is another place where a secure DNS filtering service can identify the domain as malware and block access to it.

Secure DNS filtering acts as the first layer of defense against most security threats and prevents corporate networks and devices from getting infected by malicious software in the first place. According to a security report by the Global Cyber Alliance, companies could have prevented losses of more than $200B using DNS filtering.

How does Gateway’s secure DNS filtering work?

The primary difference between the 1.1.1.1 public DNS resolver and Gateway’s secure DNS filtering is that the 1.1.1.1 public DNS resolver does not block any DNS queries. When a browser requests example.com, the 1.1.1.1 public DNS resolver simply looks up the answer for the DNS query either in cache or by performing a full recursive query.

Cloudflare Gateway adds one new step to introduce security into this flow. Instead of allowing all DNS queries, Gateway first checks the name being queried against the intelligence Cloudflare has about threats on the Internet. If that query matches a known threat, or requests a blocked category, Gateway stops it before the site can load for the user and potentially execute code or phish that team member.

Protect your team with Cloudflare Gateway

For example, if a customer using Cloudflare Gateway sends a DNS query to example.com, Gateway first checks that the DNS query is coming from a customer. If it is, Gateway then checks whether the query matches any of the policies the customer has set up. The policy could be a domain the customer is manually blocking, or part of a broader security category the customer has enabled. If the domain matches one of those cases, Cloudflare Gateway blocks access to the domain, preventing the end user from reaching example.com.
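As a toy illustration of that decision step (this is not Gateway’s actual implementation, and the domains are made up), the core of a DNS filter is a lookup against blocked names before any resolution happens:

```shell
# Toy sketch of the filtering decision: check the queried name against
# a customer's blocklist before resolving it. Domains are placeholders.
BLOCKLIST="malware-example.test
phishing-example.test"

filter_query() {
  if printf '%s\n' "$BLOCKLIST" | grep -qxF "$1"; then
    echo "BLOCK $1"     # would return a block page or NXDOMAIN
  else
    echo "RESOLVE $1"   # would fall through to normal recursive resolution
  fi
}

filter_query malware-example.test   # BLOCK malware-example.test
filter_query example.com            # RESOLVE example.com
```

The real product also attributes the query to a customer location and consults category intelligence, but the allow-or-block fork happens at this same point in the flow.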

Encrypted DNS from day one

Gateway supports DNS over HTTPS today and will also support DNS over TLS in the future. You can use Firefox to start sending DNS queries to Gateway in an encrypted fashion. Gateway also works with other DNS over HTTPS clients, as long as you can change the hostname in your preferred client.

Here’s how DNS over HTTPS for Cloudflare Gateway works:

Protect your team with Cloudflare Gateway

The DNS over HTTPS client encrypts the DNS request and sends it to the closest Cloudflare data center. Upon receiving the encrypted DNS request, our edge decrypts it and sends it to Cloudflare Gateway. Cloudflare Gateway applies the required security policies and returns the response to our edge. Our edge encrypts the response and sends it back to the DNS over HTTPS client.

By encrypting your DNS queries, you ensure that ISPs cannot snoop on them, while Gateway still filters out the DNS requests that are malicious.
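To see what an individual DoH query looks like, you can exercise the same JSON API with curl against Cloudflare’s public 1.1.1.1 endpoint; a Gateway location gets its own unique DoH hostname (shown in the Teams dashboard), which you would substitute for cloudflare-dns.com below:

```shell
# Construct a DNS-over-HTTPS query URL using the JSON API.
# cloudflare-dns.com is the public 1.1.1.1 endpoint; swap in your
# per-location Gateway DoH hostname to have your policies applied.
DOH_ENDPOINT="https://cloudflare-dns.com/dns-query"
QUERY_URL="${DOH_ENDPOINT}?name=example.com&type=A"
echo "$QUERY_URL"
# Uncomment to send the query (requires network access):
# curl -s -H 'accept: application/dns-json' "$QUERY_URL"
```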

Cloudflare Gateway is for everyone

One of our customers, Algolia, is a fast growing startup. Algolia grew by 1005% in 2019 (source). As the company experienced rapid growth, Cloudflare Gateway helped maintain their corporate security without slowing them down:

“Algolia is growing pretty fast. At Algolia, we needed a way to have visibility across our corporate network without slowing things down for our employees. Cloudflare Gateway gave us a simple way to do that.”
Adam Surak, Director of Infrastructure & Security, Algolia

But Gateway isn’t just for fast-growing startups. Anyone with a Cloudflare account can start using Cloudflare Gateway today. Gateway has a free tier because we wanted to make sure that even small businesses, teams, and households that cannot afford expensive security solutions can use Cloudflare Gateway to protect themselves from threats on the Internet. The paid tier adds functionality suited to power users: longer data retention for analytics, more granular security and content categories, individual DNS query logs, Logpush to a cloud storage bucket, and more. You can learn more on the Gateway product page.

How can you get started?

If you already have a Cloudflare account get started by visiting the Teams dashboard.

The onboarding will walk you through configuring your router or device to send DNS queries to Gateway, and will help you set up a location. A location is usually a physical entity like your office, retail location, data center, or home.

Protect your team with Cloudflare Gateway

Once you finish onboarding, start by configuring a policy. A policy will allow you to block access to malicious websites when anyone is using the Internet from the location that you just created.

Protect your team with Cloudflare Gateway

You can choose from the categories of policy that we have created. You can also manually add a domain to block it using Gateway.

Protect your team with Cloudflare Gateway

Once you start sending DNS queries to Gateway, you will see analytics on the Teams dashboard. The analytics dashboard will help you understand if there are any anomalies in your network.

What’s next

Cloudflare’s mission is to help create a better Internet. We have achieved this by protecting millions of websites around the world and securing millions of devices using WARP. With Cloudflare Access, we helped secure and protect internal applications. Today, with Cloudflare Gateway’s secure DNS filtering capabilities we have extended our mission to also protect the people who use the Internet every day. The product you are seeing today is a glimpse of what we are building for the future. Our team is incredibly proud of what we have built and we are just getting started.

Seamless remote work with Cloudflare Access

Post Syndicated from Sam Rhea original https://blog.cloudflare.com/seamless-remote-work-with-cloudflare-access/

Seamless remote work with Cloudflare Access

The novel coronavirus is actively changing how organizations work in real-time. According to Fortune, the virus has led to the “world’s largest work-from-home experiment.” As the epidemic crosses borders, employees are staying home and putting new stress on how companies manage remote work.

This is only accelerating an existing trend, however. Remote work has gained real traction in the last decade and Gartner projects that it will only continue. However, teams which are moving to a distributed model tend to do so slowly. When those timelines are accelerated, IT and security administrators need to be able to help their workforce respond without disrupting their team members.

Cloudflare Access can help teams migrate to a model that makes it seamless for users to work from any location or any device, without the need for lengthy migrations or onboarding sessions. Cloudflare Access can be deployed in less than one hour and brings SaaS-like convenience and speed to the self-hosted applications that previously lived behind a VPN.

Leaving the castle-and-moat

When users share a physical space, working on a private network is easy. Users do not need clunky VPN clients to connect to the resources they need. Team members physically sit close to the origin servers and code repositories that power their corporate apps.

Seamless remote work with Cloudflare Access

In this castle-and-moat model, every team member is assumed to be trusted simply by their presence inside the walls of the office. They can silently attempt to connect to any resource without any default checks. Administrators must build complex network segmentation to avoid breaches, and logging is mostly absent.

Seamless remote work with Cloudflare Access

This model has begun to fall apart for two reasons: the shift to cloud-hosted applications and the distribution of employees around the world.

The first trend, cloud-hosted applications, shifts resources outside of the castle-and-moat. Corporate apps no longer live in on-premise data centers but operate from centralized cloud providers. Those environments can sit hundreds or thousands of miles away from users, slowing down the connections to the applications hosted in those providers.

The second shift, users working outside of the office or from branch offices, introduces a performance challenge in addition to a security concern. Organizations need to poke holes in their perimeter to allow users to connect back into their private network, before sending those users on to their target destination.

The spread of the coronavirus has accelerated the trend of users working away from the office. Remote workers are putting new strain on the VPN appliances that sit in corporate headquarters, adding to the burden of IT teams attempting to manage a workplace shift that is happening much faster than planned.

Cloudflare Access

Cloudflare Access is one-half of Cloudflare for Teams, a security platform that runs on Cloudflare’s network and focuses on keeping users, devices, and data safe without compromising for performance. We built Cloudflare Access to solve our own headaches with private networks as we grew from a team concentrated in a single office to a globally distributed organization.

Cloudflare Access replaces corporate VPNs with Cloudflare’s network. Instead of placing internal tools on a private network, teams deploy them in any environment, including hybrid or multi-cloud models, and secure them consistently with Cloudflare’s network.

Administrators build rules to decide who should be able to reach the tools protected by Access. In turn, when users need to connect to those tools, they are prompted to authenticate with their team’s identity provider. Cloudflare Access checks their login against the list of allowed users and, if permitted, allows the request to proceed.

Seamless remote work with Cloudflare Access

Work from any device

The coronavirus is not only changing where employees work, but also the devices they use to do their work. Digitimes reports that the demand for tablets continues to grow as workers find alternatives to the desktops sitting in corporate offices, a trend they attribute to the rise in cases of coronavirus and increasing percentages of employees working outside of the office.

Tablets and other mobile devices introduce new challenges for teams. Users need to install and configure a VPN profile to connect, if they can connect at all.

Cloudflare Access offers an alternative that requires no user action or IT administration. End users can log in and reach their corporate apps from any device, with no client or agent required.

Rapid remote development

Working remotely can be difficult for users doing their job on browser-based applications. It becomes much more difficult for engineers and developers who need to do their work over RDP or SSH.

In the event that teams need to connect to the desktops back inside of the office, Access also supports RDP connections. Team members can reach desktops over Cloudflare’s global network, reducing the latency of traditional VPN-based RDP clients. Organizations do not need to deploy new credentials or run the risk of leaving remote desktops open to the Internet. Cloudflare Access integrates with a team’s identity provider to bring SSO login to remote desktops.

Cloudflare Access also includes support for native SSH workflows. With Access, developers and engineers can connect over SSH to the code repositories or build systems they need to stay productive. Users can connect remotely, from low-end devices, to powerful servers and machines hosted in cloud environments.

Seamless remote work with Cloudflare Access

Additionally, with the SSH feature in Cloudflare Access, organizations can replace the static SSH keys that live on user devices with short-lived certificates generated when a user logs in to Okta, AzureAD, or any other supported identity provider. If team members working from home are using personal devices, organizations can prevent those devices from ever storing long-lived keys that can reach production systems or code repositories.
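On the client side, a sketch of what that flow requires: cloudflared installed locally and a short ProxyCommand stanza in ~/.ssh/config (the hostname below is a placeholder for your Access-protected host):

```
# ~/.ssh/config (sketch) -- ssh.widgetcorp.tld is a placeholder hostname.
# cloudflared proxies the SSH connection through Cloudflare Access.
Host ssh.widgetcorp.tld
  ProxyCommand cloudflared access ssh --hostname %h
```

With that in place, a plain `ssh user@ssh.widgetcorp.tld` triggers the Access login flow in the browser before the connection is established.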

One-click logging and compliance

When users leave the office, security teams can lose a real layer of a defense-in-depth strategy. Employees do not badge into a front desk when they work remotely.

Cloudflare Access addresses remote work blindspots by adding additional visibility into how applications are used. Access logs every authentication event and, if enabled, every user request made to a resource protected by the platform. Administrators can capture every request and attribute it to a user and IP address without any code changes. Cloudflare Access can help teams meet compliance and regulatory requirements for distributed users without any additional development time.

Onboard users without onboarding sessions

When IT departments change how users do their work, even to faster and safer models, those shifts can still require teams to invest time in training employees. Discoverability becomes a real problem: if users cannot find the applications they need, teams lose the benefits of faster connections and reduced maintenance overhead.

Cloudflare Access includes an application launchpad, available to every user with additional configuration. With the Access App Launch, administrators can skip sending custom emails or lists of links to new contractors and replace them with a single URL. When external users log in with LinkedIn, GitHub, or any other provider, the Access App Launch will display only the applications they can reach. In a single view, users can find and launch the tools that they need.

Whether those users are employees or contractors and partners, every team member can quickly find the tools they need to avoid losing a step as they shift from working on a private network to a model built on Cloudflare’s global network.

Seamless remote work with Cloudflare Access

How to get started

It’s really very simple. To find out more about Cloudflare for Teams, visit teams.cloudflare.com.

If you’re looking to get started with Cloudflare Access today, it’s available on any Cloudflare plan. The first five seats are free. Follow the link here to get started.

Finally, need help getting set up? A quick start guide is available here.

Using your devices as the key to your apps

Post Syndicated from Sam Rhea original https://blog.cloudflare.com/using-your-devices-as-the-key-to-your-apps/

Using your devices as the key to your apps

I keep a very detailed budget. I have for the last 7 years. I manually input every expense into a spreadsheet app and use a combination of sumifs functions to track spending.

Opening the spreadsheet app, and then the specific spreadsheet, every time that I want to submit an expense is a little clunky. I’m working on a new project to make that easier. I’m building a simple web app, with a very basic form, into which I will enter one-off expenses. This form will then append those expenses as rows into the budget workbook.

I want to lock down this project; I prefer that I am the only person with the power to wreck my budget. To do that, I’m going to use Cloudflare Access. With Access, I can require a login to reach the page – no server-side changes required.

Except, I don’t want to allow logins from any device. For this project, I want to turn my iPhone into the only device that can reach this app.

To do that, I’ll use Cloudflare Access in combination with an open source toolkit from Cloudflare, cfssl. Together, I can convert my device into a secure key for this application in about 45 minutes.

While this is just one phone and a simple project, a larger organization could scale this up to hundreds of thousands or millions – without spending 45 minutes per device. Authentication occurs in the Cloudflare network and lets teams focus on securely deploying devices, from IoT sensors to corporate laptops, that solve new problems.


🎯 I have a few goals for this project:

  • Protect my prototype budget-entry app with authentication
  • Avoid building a custom login flow into the app itself
  • Use mutual TLS (mTLS) authentication so that only requests from my iPhone are allowed

🗺️ This walkthrough covers how to:

  • Build an Access policy to enforce mutual TLS authentication
  • Use Cloudflare’s PKI toolkit to create a Root CA and then generate a client certificate
  • Use OpenSSL to convert that client certificate into a format for iPhone usage
  • Place that client certificate on my iPhone

⏲️Time to complete: ~45 minutes


Cloudflare Access

Cloudflare Access is a bouncer that checks ID at the door. Any and every door.

Old models of security built on private networks operate like a guard at the front door of a large apartment building, except this apartment building does not have locks on any of the individual units. If you can walk through the front door, you could walk into any home. By default, private networks assume that a user on that network is trusted until proven malicious – you’re free to roam the building until someone reports you. None of us want to live in that complex.

Access replaces that model with a bouncer in front of each apartment unit. Cloudflare checks every attempt to reach a protected app, machine, or remote desktop against rules that define who is allowed in.

To perform that check, Access needs to confirm a user’s identity. To do that, teams can integrate Access with identity providers like G Suite, AzureAD, Okta or even Facebook and GitHub.

Using your devices as the key to your apps

For this project, I want to limit not just who can reach the app, but also what can reach it. I want to only allow my particular iPhone to connect. Since my iPhone does not have its own GitHub account, I need to use a workflow that allows devices to authenticate: certificates, specifically mutual TLS (mTLS) certificate authentication.

📃 Please reach out. Today, the mTLS feature in Access is only available to Enterprise plans. Are you on a self-serve plan and working on a project where you want to use mTLS (IoT, service-to-service, and corporate security included)? If so, please reach out to me at [email protected] and let’s chat.

mTLS and cfssl

Public key infrastructure (PKI) makes it possible for your browser to trust that this blog really is blog.cloudflare.com. When you visit this blog, the site presents a certificate to tell your browser that it is the real blog.cloudflare.com.

Your browser is skeptical. It keeps a short list of root certificates that it will trust. Your browser will only trust certificates signed by authorities in that list. Cloudflare offers free certificates for hostnames using its reverse proxy. You can also get origin certificates from other services like Let’s Encrypt. Either way, when you visit a web page with a certificate, you can ensure you are on the authentic site and that the traffic between you and the blog is encrypted.

For this project, I want to go the other direction. I want my device to present a certificate to Cloudflare Access demonstrating that it is my authentic iPhone. To do that, I need to create a chain that can issue a certificate to my device.

Cloudflare publishes an open source PKI toolkit, cfssl, which can solve that problem for me. cfssl lets me quickly create a Root CA and then use that root to generate a client certificate, which will ultimately live on my phone.

To begin, I’ll follow the instructions here to set up cfssl on my laptop. Once installed, I can start creating certificates.

Generating a Root CA and an allegory about Texas

First, I need to create the Root CA. This root will give Access a basis for trusting client certificates. Think of the root as the Department of Motor Vehicles (DMV) in Texas. Only the State of Texas, through the DMV, can issue Texas driver licenses. Bouncers do not need to know about every driver license issued, but they do know to trust the State of Texas and how to validate Texas-issued licenses.

In this case, Access does not need to know about every client cert issued by this Root CA. The product only needs to know to trust this Root CA and how to validate if client certificates were issued by this root.

I’m going to start by creating a new directory, cert-auth, to keep things organized. Inside of that directory, I’ll create a folder, root, where I’ll store the Root CA materials.

Next, I’ll define some details about the Root CA. I’ll create a file, ca-csr.json and give it some specifics that relate to my deployment.

{
    "CN": "Sam Money App",
    "key": {
      "algo": "rsa",
      "size": 4096
    },
    "names": [
      {
        "C": "PT",
        "L": "Lisboa",
        "O": "Money App Test",
        "OU": "Sam Projects",
        "ST": "Lisboa"
      }
    ]
  }

Now I need to configure how the CA will be used. I’ll create another new file, ca-config.json, and add the following details.

{
    "signing": {
      "default": {
        "expiry": "8760h"
      },
      "profiles": {
        "server": {
          "usages": ["signing", "key encipherment", "server auth"],
          "expiry": "8760h"
        },
        "client": {
          "usages": ["signing","key encipherment","client auth"],
          "expiry": "8760h"
        }
      }
    }
  }

The ca-csr.json file gives the Root CA a sense of identity and the ca-config.json will later define the configuration details when signing new client certificates.

With that in place, I can go ahead and create the Root CA. I’ll run the following command in my terminal from within the root folder.

$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca

The “Root CA” here is really a composition of three files, all of which are created by that command. cfssl generates a private key, a certificate signing request, and the certificate itself. The output should resemble this screenshot:

Using your devices as the key to your apps

I need to guard the private key like it’s the only thing that matters. In real production deployments, most organizations will create an intermediate certificate and sign client certificates with that intermediate. This allows administrators to keep the root locked down even further; they only need to handle it when creating new intermediates (and those intermediates can be quickly revoked). For this test, I’m just going to use the root to create the client certificates.

Now that I have the Root CA, I can upload the certificate in PEM format to Cloudflare Access. Cloudflare can then use that certificate to authenticate incoming requests for a valid client certificate.

In the Cloudflare Access dashboard, I’ll use the card titled “Mutual TLS Root Certificates”. I can click “Add A New Certificate” and then paste the content of the ca.pem file directly into it.


I need to associate this certificate with a fully qualified domain name (FQDN). In this case, I’m going to use the certificate to authenticate requests for money.samrhea.com, so I’ll just input that subdomain, but I could associate this cert with multiple FQDNs if needed.

Once saved, the Access dashboard will list the new Root CA.


Building an Access Policy

Before I deploy the budget app prototype to money.samrhea.com, I need to lock down that subdomain with an Access policy.

In the Cloudflare dashboard, I’ll select the zone samrhea.com and navigate to the Access tab. Once there, I can click Create Access Policy in the Access Policies card. That card will launch an editor where I can build out the rule(s) for reaching this subdomain.


This policy will be applied to just the subdomain money.samrhea.com. I could make it more granular with path-based rules, but I’ll keep it simple for now.

In the Policies section, I’m going to create a rule to allow client certificates signed by the Root CA I generated to reach the application. In this case, I’ll pick “Non Identity” from the Decision drop-down. I’ll then choose “Valid Certificate” under the Include details.

This will allow any valid certificate signed by the “Money App Test” CA I uploaded earlier. I could also build a rule using Common Names, but I’ll stick with valid cert for now. I’ll hit Save and finish the certificate deployment.

Issuing client certs and converting to PKCS #12

So far, I have a Root CA and an Access policy that enforces mTLS with client certs issued by that Root CA. I’ve stationed a bouncer outside of my app and told them to only trust ID cards issued by the State of Texas. Now I need to issue a license in the form of a client certificate.

To avoid confusion, I’m going to create a new folder in the same directory as the root folder, this one called client. Inside of this directory, I’ll create a new file, client-csr.json, with the following JSON blob:

{
    "CN": "Rhea Group",
    "hosts": [""],
    "key": {
      "algo": "rsa",
      "size": 4096
    },
    "names": [
      {
        "C": "PT",
        "L": "Lisboa",
        "O": "Money App Test",
        "OU": "Sam Projects",
        "ST": "Lisboa"
      }
    ]
  }

This sets configuration details for the client certificate that I’m about to request.

I can now use cfssl to generate a client certificate against my Root CA. The command below uses the -profile flag to create the client cert using the JSON configuration I just saved. This also names the output files iphone-client.

$ cfssl gencert -ca=../root/ca.pem -ca-key=../root/ca-key.pem -config=../root/ca-config.json -profile=client client-csr.json | cfssljson -bare iphone-client

The combined output should resemble the following:


  • client-csr.json: The JSON configuration created earlier to specify client cert details.
  • iphone-client-key.pem: The private key for the generated client certificate.
  • iphone-client.csr: The certificate signing request used to request the client cert.
  • iphone-client.pem: The client certificate created.
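Before involving Access at all, the chain can be checked locally. This is an optional extra step, not part of the original flow; it assumes the folder layout used above and runs from the client directory:

```shell
# Confirm the client certificate was signed by the Root CA.
# Prints "iphone-client.pem: OK" when the chain is valid.
openssl verify -CAfile ../root/ca.pem iphone-client.pem
```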

With my freshly minted client certificate and key, I can go ahead and test that it works with my Access policy with a quick cURL command.

$ curl -v --cert iphone-client.pem --key iphone-client-key.pem https://money.samrhea.com

That works, but I’m not done yet. I need to get this client certificate on my iPhone. To do so, I need to convert the certificate and key into a format that my iPhone understands, PKCS #12.

PKCS #12 is a file format used for storing cryptographic objects. To convert the two .pem files, the certificate and the key, into PKCS #12, I’m going to use the OpenSSL command-line tool.

OpenSSL is a popular toolkit for TLS and SSL protocols that can solve a wide variety of certificate use cases. In my example, I just need it for one command:

$ openssl pkcs12 -export -out sam-iphone.p12 -inkey iphone-client-key.pem -in iphone-client.pem -certfile ../root/ca.pem

The command above takes the key and certificate generated previously and converts them into a single .p12 file. I’ll also be prompted to create an “Export Password”. I’ll use something that I can remember, because I’m going to need it in the next section.
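If you want to double-check what went into the bundle before sending it anywhere, OpenSSL can list its contents. This is an optional inspection step; it will prompt for the Export Password just set:

```shell
# List the certificates inside the .p12 bundle without printing private keys.
openssl pkcs12 -info -in sam-iphone.p12 -nokeys
```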


Authenticating from my iPhone

I now need to get the .p12 file on my iPhone. In corporate environments, organizations distribute client certificates via mobile device management (MDM) programs or other tools. I’m just doing this for a personal test project, so I’m going to use AirDrop.


Once my iPhone receives the file, I’ll be prompted to select a device where the certificate will be installed as a device profile.


I’ll then be prompted to enter my device password and the password set in the “Export” step above. Once complete, I can view the certificate under Profiles in Settings.


Now, when I visit money.samrhea.com for the first time from my phone, I’ll be prompted to use the profile created.


Browsers can exhibit strange behavior when handling client certificate prompts. This should be the only time I need to confirm this profile should be used, but it might happen again.

What’s next?

My prototype personal finance app is now only accessible from my iPhone. This also makes it easy to log in through Access from my device.

Access policies can be pretty flexible. If I want to reach it from a different device, I could build a rule to allow logins through Google as an alternative. I can also create a policy to require both a certificate and SSO login.

Beyond just authentication, I can also build something with this client cert flow now. Cloudflare Access makes the details from the client cert, the ones I created earlier in this tutorial, available to Cloudflare Workers. I can start to create routing rules or trigger actions based on the details about this client cert.

Introducing Cloudflare for Campaigns

Post Syndicated from Alissa Starzak original https://blog.cloudflare.com/introducing-cloudflare-for-campaigns/


During the past year, we saw nearly 2 billion global citizens go to the polls to vote in democratic elections. There were major elections in more than 50 countries, including India, Nigeria, and the United Kingdom, as well as elections for the European Parliament. In 2020, we will see a similar number of elections in countries from Peru to Myanmar. In November, U.S. citizens will cast their votes for the 46th President, 435 seats in the U.S. House of Representatives, 35 of the 100 seats in the U.S. Senate, and many state and local elections.

Recognizing the importance of maintaining public access to election information, Cloudflare launched the Athenian Project in 2017, providing U.S. state and local government entities with the tools needed to secure their election websites for free. As we’ve seen, however, political parties and candidates for office all over the world are also frequent targets for cyberattack. Cybersecurity needs for campaign websites and internal tools are at an all time high.

Although Cloudflare has helped improve the security and performance of political parties and candidates for office all over the world for years, we’ve long felt that we could do more. So today, we’re announcing Cloudflare for Campaigns, a suite of Cloudflare services tailored to campaign needs. Cloudflare for Campaigns is designed to make it easier for all political campaigns and parties, especially those with small teams and limited resources, to get access to cybersecurity services.

Risks faced by political campaigns

Since Russians attempted to use cyberattacks to interfere in the U.S. Presidential election in 2016, the news has been filled with reports of cyber threats against political campaigns, in both the United States and around the world. Hackers targeted the Presidential campaigns of Emmanuel Macron in France and Angela Merkel in Germany with phishing attacks, the main political parties in the UK with DDoS attacks, and congressional campaigns in California with a combination of malware, DDoS attacks and brute force login attempts.

Both because of our services to state and local government election websites through the Athenian Project and because a significant number of political parties and candidates for office use our services, Cloudflare has seen many attacks on election infrastructure and political campaigns firsthand.

During the 2020 U.S. election cycle, Cloudflare has provided services to 18 major presidential campaigns, as well as a range of congressional campaigns. On a typical day, Cloudflare blocks 400,000 attacks against political campaigns, and, on a busy day, Cloudflare blocks more than 40 million attacks against campaigns.

What is Cloudflare for Campaigns?

Cloudflare for Campaigns is a suite of Cloudflare products focused on the needs of political campaigns, particularly smaller campaigns that don’t have the resources to bring significant cybersecurity resources in house. To ensure the security of a campaign website, the Cloudflare for Campaigns package includes Business-level service, as well as security tools particularly helpful for political campaign websites, such as the web application firewall, rate limiting, load balancing, Enterprise level “I am Under Attack Support”, bot management, and multi-user account enablement.


To ensure the security of internal campaign teams, Cloudflare for Campaigns will also provide Cloudflare Access, allowing campaigns to secure, authenticate, and monitor user access to any domain, application, or path on Cloudflare, without using a VPN. Along with Access, we will be providing Cloudflare Gateway with DNS-based filtering at multiple locations to protect campaign staff as they navigate the Internet, keeping malicious content off the campaign’s network and helping prevent users from running into phishing scams or malware sites. Campaigns can use Gateway after the product’s public release.

Cloudflare for Campaigns also includes the Cloudflare reliability and security guide, a set of best practices for political campaigns to maintain their campaign sites and secure their internal teams.

Regulatory Challenges

Although there is widespread agreement that campaigns and political parties face threats of cyberattack, there is less consensus on how best to get political campaigns the help they need.  Many political campaigns and political parties operate under resource constraints, without the technological capability and financial resources to dedicate to cybersecurity. At the same time, campaigns around the world are the subject of a variety of different regulations intended to prevent corruption of democratic processes. As a practical matter, that means that, although campaigns may not have the resources needed to access cybersecurity services, donation of cybersecurity services to campaigns may not always be allowed.

In the U.S., campaign finance regulations prohibit corporations from providing any contributions of either money or services to federal candidates or political party organizations. These rules prevent companies from offering free or discounted services if those services are not provided on the same terms and conditions to similarly situated members of the general public. The Federal Election Commission (FEC), which enforces U.S. campaign finance laws, has struggled with the issue of how best to apply those rules to the provision of free or discounted cybersecurity services to campaigns. In considering a number of advisory opinion requests, it has publicly wrestled with the competing priorities of securing campaigns from cyberattack while not opening a backdoor to the donation of goods and services intended to curry favor with particular candidates.

The FEC has issued two advisory opinions to tech companies seeking to provide free or discounted cybersecurity services to campaigns. In 2018, the FEC approved a request by Microsoft to offer a package of enhanced online account security protections for “election-sensitive” users. The FEC reasoned that Microsoft was offering the services to its paid users “based on commercial rather than political considerations, in the ordinary course of its business and not merely for promotional consideration or to generate goodwill.” In July 2019, the FEC approved a request by a cybersecurity company to provide low-cost anti-phishing services to campaigns because those services would be provided in the ordinary course of business and on the same terms and conditions as offered to similarly situated non-political clients.

In September 2018, a month after Microsoft submitted its request, Defending Digital Campaigns (DDC), a nonprofit established with the mission to “secure our democratic campaign process by providing eligible campaigns and political parties, committees, and related organizations with knowledge, training, and resources to defend themselves from cyber threats,” submitted a request to the FEC to offer free or reduced-cost cybersecurity services, including from technology corporations, to federal candidates and parties. Over the following months, the FEC issued and requested comment on multiple draft opinions on whether the donation was permissible and, if so, on what basis. As described by the FEC, to support its position, DDC represented that “federal candidates and parties are singularly ill-equipped to counteract these threats.” The FEC’s advisory opinion to DDC noted:

“You [DDC] state that presidential campaign committees and national party committees require expert guidance on cybersecurity and you contend that the ‘vast majority of campaigns’ cannot afford full-time cybersecurity staff and that ‘even basic cybersecurity consulting software and services’ can overextend the budgets of most congressional campaigns. AOR004. For instance, you note that a congressional candidate in California reported a breach to the Federal Bureau of Investigation (FBI) in March of this year but did not have the resources to hire a professional cybersecurity firm to investigate the attack, or to replace infected computers. AOR003.”

In May 2019, the FEC approved DDC’s request to partner with technology companies to provide free and discounted cybersecurity services “[u]nder the unusual and exigent circumstances” presented by the request and “in light of the demonstrated, currently enhanced threat of foreign cyberattacks against party and candidate committees.”

All of these opinions demonstrate the FEC’s desire to allow campaigns to access affordable cybersecurity services because of the heightened threat of cyberattack, while still being cautious to ensure that those services are offered transparently and consistent with the goals of campaign finance laws.

Partnering with DDC to Provide Free Services to US Candidates

We share the view of both DDC and the FEC that political campaigns — which are central to our democracy — must have the tools to protect themselves against foreign cyberattack. Cloudflare is therefore excited to announce a new partnership with DDC to provide Cloudflare for Campaigns for free to candidates and parties that meet DDC’s criteria.


To receive free services under DDC, political campaigns must meet the following criteria, as the DDC laid out to the FEC:

  • A House candidate’s committee that has at least $50,000 in receipts for the current election cycle, and a Senate candidate’s committee that has at least $100,000 in receipts for the current election cycle;
  • A House or Senate candidate’s committee for candidates who have qualified for the general election ballot in their respective elections; or
  • Any presidential candidate’s committee whose candidate is polling above five percent in national polls.

For more information on eligibility for these services under DDC and the next steps, please visit cloudflare.com/campaigns/usa.

Election package

Although political campaigns are regulated differently all around the world, Cloudflare believes that the integrity of all political campaigns should be protected against powerful adversaries. With this in mind, Cloudflare will therefore also be offering Cloudflare for Campaigns as a paid service, designed to help campaigns all around the world as we attempt to address regulatory hurdles. For more information on how to sign up for the Cloudflare election package, please visit cloudflare.com/campaigns.

Introducing Cloudflare for Teams

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/introducing-cloudflare-for-teams/


Ten years ago, when Cloudflare was created, the Internet was a place that people visited. People still talked about ‘surfing the web’ and the iPhone was less than two years old. Then, on July 4, 2009, large-scale DDoS attacks were launched against websites in the US and South Korea.

Those attacks highlighted how fragile the Internet was and how all of us were becoming dependent on access to the web as part of our daily lives.

Fast forward ten years and the speed, reliability and safety of the Internet is paramount as our private and work lives depend on it.

We started Cloudflare to solve one half of every IT organization’s challenge: ensuring that the resources and infrastructure you expose to the Internet are safe from attack, fast, and reliable. We saw that the world was moving away from hardware and software to solve these problems and instead wanted a scalable service that would work around the world.

To deliver that, we built one of the world’s largest networks. Today our network spans more than 200 cities worldwide and is within milliseconds of nearly everyone connected to the Internet. We have built the capacity to stand up to nation-state scale cyberattacks and a threat intelligence system powered by the immense amount of Internet traffic that we see.


Today we’re expanding Cloudflare’s product offerings to solve the other half of every IT organization’s challenge: ensuring the people and teams within an organization can access the tools they need to do their job and are safe from malware and other online threats.

The speed, reliability, and protection we’ve brought to public infrastructure is extended today to everything your team does on the Internet.

In addition to protecting an organization’s infrastructure, IT organizations are charged with ensuring that employees of an organization can access the tools they need safely. Traditionally, these problems would be solved by hardware products like VPNs and Firewalls. VPNs let authorized users access the tools they needed and Firewalls kept malware out.

Castle and Moat


The dominant model was the idea of a castle and a moat. You put all your valuable assets inside the castle. Your Firewall created the moat around the castle to keep anything malicious out. When you needed to let someone in, a VPN acted as the drawbridge over the moat.

This is still the model most businesses use today, but it’s showing its age. The first challenge is that if attackers find their way over the moat and into the castle, they can cause significant damage. Unfortunately, few weeks go by without a news story about an organization that had significant data compromised because an employee fell for a phishing email, a contractor was compromised, or someone was able to sneak into an office and plug in a rogue device.

The second challenge of the model is the rise of cloud and SaaS. Increasingly, an organization’s resources aren’t just in one castle anymore, but instead spread across different public cloud and SaaS vendors.

Services like Box, for instance, provide better storage and collaboration tools than most organizations could ever hope to build and manage themselves. But there’s literally nowhere you can ship a hardware box to Box in order to build your own moat around their SaaS castle. Box provides some great security tools themselves, but they are different from the tools provided by every other SaaS and public cloud vendor. Where IT organizations used to try to have a single pane of glass with a complex mess of hardware to see who was getting stopped by their moats and who was crossing their drawbridges, SaaS and cloud make that visibility increasingly difficult.

The third challenge to the traditional castle and moat strategy of IT is the rise of mobile. Where once upon a time your employees would all show up to work in your castle, now people are working around the world. Requiring everyone to log in to a limited number of central VPNs becomes obviously absurd when you picture it as villagers having to sprint back from wherever they are across a drawbridge whenever they want to get work done. It’s no wonder VPN support is one of the top IT organization tickets and likely always will be for organizations that maintain a castle and moat approach.


But it’s worse than that. Mobile has also introduced a culture where employees bring their own devices to work. Or, even if on a company-managed device, work from the road or home — beyond the protected walls of the castle and without the security provided by a moat.

If you’d looked at how we managed our own IT systems at Cloudflare four years ago, you’d have seen us following this same model. We used firewalls to keep threats out and required every employee to log in through our VPN to get their work done. Personally, as someone who travels extensively for my job, it was especially painful.

Regularly, someone would send me a link to an internal wiki article asking for my input. I’d almost certainly be working from my mobile phone in the back of a cab running between meetings. I’d try to access the link and be prompted to log in to our VPN in San Francisco. That’s when the frustration would start.

Corporate mobile VPN clients, in my experience, all seem to be powered by some 100-sided die that only will allow you to connect if the number of miles you are from your home office is less than 25 times whatever number is rolled. Much frustration, and several IT tickets later, with a little luck I may be able to connect. And, even then, the experience was horribly slow and unreliable.

When we audited our own system, we found that the frustration with the process had caused multiple teams to create workarounds that were, effectively, unauthorized drawbridges over our carefully constructed moat. And, as we increasingly adopted SaaS tools like Salesforce and Workday, we lost much of our visibility into how these tools were being used.

Around the same time we were realizing the traditional approach to IT security was untenable for an organization like Cloudflare, Google published their paper titled “BeyondCorp: A New Approach to Enterprise Security.” The core idea was that a company’s intranet should be no more trusted than the Internet. And, rather than the perimeter being enforced by a singular moat, instead each application and data source should authenticate the individual and device each time it is accessed.

The BeyondCorp idea, which has come to be known as a Zero Trust model for IT security, was influential for how we thought about our own systems. Powerfully, because Cloudflare had a flexible global network, we were able to use it both to enforce policies as our team accessed tools as well as to protect ourselves from malware as we did our jobs.

Cloudflare for Teams

Today, we’re excited to announce Cloudflare for Teams™: the suite of tools we built to protect ourselves, now available to help any IT organization, from the smallest to the largest.

Cloudflare for Teams is built around two complementary products: Access and Gateway. Cloudflare Access™ is the modern VPN — a way to ensure your team members get fast access to the resources they need to do their job while keeping threats out. Cloudflare Gateway™ is the modern Next Generation Firewall — a way to ensure that your team members are protected from malware and follow your organization’s policies wherever they go online.

Powerfully, both Cloudflare Access and Cloudflare Gateway are built atop the existing Cloudflare network. That means they are fast, reliable, scalable to the largest organizations, DDoS resistant, and located everywhere your team members are today and wherever they may travel. Have a senior executive going on a photo safari to see giraffes in Kenya, gorillas in Rwanda, and lemurs in Madagascar — don’t worry, we have Cloudflare data centers in all those countries (and many more) and they all support Cloudflare for Teams.


All Cloudflare for Teams products are informed by the threat intelligence we see across all of Cloudflare’s products. We see such a large diversity of Internet traffic that we often see new threats and malware before anyone else. We’ve supplemented our own proprietary data with additional data sources from leading security vendors, ensuring Cloudflare for Teams provides a broad set of protections against malware and other online threats.

Moreover, because Cloudflare for Teams runs atop the same network we built for our infrastructure protection products, we can deliver them very efficiently. That means that we can offer these products to our customers at extremely competitive prices. Our goal is to make the return on investment (ROI) for all Cloudflare for Teams customers nothing short of a no brainer. If you’re considering another solution, contact us before you decide.

Both Cloudflare Access and Cloudflare Gateway also build off products we’ve launched and battle tested already. For example, Gateway builds, in part, off our 1.1.1.1 Public DNS resolver. Today, more than 40 million people trust 1.1.1.1 as the fastest public DNS resolver globally. By adding malware scanning, we were able to create our entry-level Cloudflare Gateway product.

Cloudflare Access and Cloudflare Gateway build off our WARP and WARP+ products. We intentionally built a consumer mobile VPN service because we knew it would be hard. The millions of WARP and WARP+ users who have put the product through its paces have ensured that it’s ready for the enterprise. That we have 4.5 stars across more than 200,000 ratings, just on iOS, is a testament to how reliable the underlying WARP and WARP+ engines have become. Compare that with the ratings of any corporate mobile VPN client, which are unsurprisingly abysmal.

We’ve partnered with some incredible organizations to create the ecosystem around Cloudflare for Teams. These include endpoint security solutions including VMware Carbon Black, Malwarebytes, and Tanium; SIEM and analytics solutions including Datadog, Sumo Logic, and Splunk; and identity platforms including Okta, OneLogin, and Ping Identity. Feedback from these partners and more is at the end of this post.

If you’re curious about more of the technical details about Cloudflare for Teams, I encourage you to read Sam Rhea’s post.

Serving Everyone

Cloudflare has always believed in the power of serving everyone. That’s why we’ve offered a free version of Cloudflare for Infrastructure since we launched in 2010. That belief doesn’t change with our launch of Cloudflare for Teams. For both Cloudflare Access and Cloudflare Gateway, there will be free versions to protect individuals, home networks, and small businesses. We remember what it was like to be a startup and believe that everyone deserves to be safe online, regardless of their budget.

With both Cloudflare Access and Gateway, the products are segmented along a Good, Better, Best framework. That breaks out into Access Basic, Access Pro, and Access Enterprise. You can see the features available with each tier, including Access Enterprise features that will roll out over the coming months.


We wanted a similar Good, Better, Best framework for Cloudflare Gateway. Gateway Basic can be provisioned in minutes through a simple change to your network’s recursive DNS settings. Once in place, network administrators can set rules on what domains should be allowed and filtered on the network. Cloudflare Gateway is informed both by the malware data gathered from our global sensor network as well as a rich corpus of domain categorization, allowing network operators to set whatever policy makes sense for them. Gateway Basic leverages the speed of 1.1.1.1 with granular network controls.

Gateway Pro, which we’re announcing today and you can sign up to beta test as its features roll out over the coming months, extends the DNS-provisioned protection to a full proxy. Gateway Pro can be provisioned via the WARP client — which we are extending beyond iOS and Android mobile devices to also support Windows, MacOS, and Linux — or network policies including MDM-provisioned proxy settings or GRE tunnels from office routers. This allows a network operator to filter on policies not merely by the domain but by the specific URL.


Building the Best-in-Class Network Gateway

While Gateway Basic (provisioned via DNS) and Gateway Pro (provisioned as a proxy) made sense, we wanted to imagine what the best-in-class network gateway would be for Enterprises that valued the highest level of performance and security. As we talked to these organizations we heard an ever-present concern: just surfing the Internet created risk of unauthorized code compromising devices. With every page that every user visited, third party code (JavaScript, etc.) was being downloaded and executed on their devices.

The solution, they suggested, was to isolate the local browser from third party code and have websites render in the network. This technology is known as browser isolation. And, in theory, it’s a great idea. Unfortunately, in practice with current technology, it doesn’t perform well. The most common way browser isolation technology works is to render the page on a server and then push a bitmap of the page down to the browser. This is known as pixel pushing. The challenge is that this can be slow, bandwidth intensive, and break many sophisticated web applications.

We were hopeful that we could solve some of these problems by moving the rendering of the pages to Cloudflare’s network, which would be closer to end users. So we talked with many of the leading browser isolation companies about potentially partnering. Unfortunately, as we experimented with their technologies, even with our vast network, we couldn’t overcome the sluggish feel that plagues existing browser isolation solutions.

Enter S2 Systems

That’s when we were introduced to S2 Systems. I clearly remember first trying the S2 demo because my first reaction was: “This can’t be working correctly, it’s too fast.” The S2 team had taken a different approach to browser isolation. Rather than pushing down a bitmap of what the screen looked like, they pushed down the vector commands to draw what’s on the screen. The result was an experience that was typically at least as fast as browsing locally and without broken pages.
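
A rough back-of-the-envelope comparison shows why this matters for bandwidth. The JSON command shape below is invented for illustration and is not S2’s actual protocol:

```python
# Back-of-the-envelope comparison of pixel pushing vs. vector draw commands.
# The JSON command shape is invented for illustration, not S2's protocol.
import json

width, height = 1920, 1080
bitmap_bytes = width * height * 4  # one uncompressed RGBA frame

# A vector command describing the same filled rectangle:
draw_command = json.dumps({"op": "fill_rect", "x": 0, "y": 0,
                           "w": width, "h": height, "color": "#ffffff"})
vector_bytes = len(draw_command)

print(bitmap_bytes)   # 8294400
print(vector_bytes)   # well under 100 bytes
```

Even with compression narrowing the gap, describing what to draw is orders of magnitude cheaper than shipping the drawn pixels.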

The best, albeit imperfect, analogy I’ve come up with to describe the difference between S2’s technology and other browser isolation companies is the difference between Windows XP and Mac OS X when they were both launched in 2001. Windows XP’s original graphics were based on bitmapped images; Mac OS X’s were based on vectors. Remember the magic of watching an application “genie” in and out of the Mac OS X Dock? Check it out in a video from the launch…

At the time watching a window slide in and out of the dock seemed like magic compared with what you could do with bitmapped user interfaces. You can hear the awe in the reaction from the audience. That awe that we’ve all gotten used to in UIs today comes from the power of vector images. And, if you’ve been underwhelmed by the pixel-pushed bitmaps of existing browser isolation technologies, just wait until you see what is possible with S2’s technology.

We were so impressed with the team and the technology that we acquired the company. We will be integrating the S2 technology into Cloudflare Gateway Enterprise. The browser isolation technology will run across Cloudflare’s entire global network, bringing it within milliseconds of virtually every Internet user. You can learn more about this approach in Darren Remington’s blog post.

Once the rollout is complete in the second half of 2020 we expect we will be able to offer the first full browser isolation technology that doesn’t force you to sacrifice performance. In the meantime, if you’d like a demo of the S2 technology in action, let us know.

The Promise of a Faster Internet for Everyone

Cloudflare’s mission is to help build a better Internet. With Cloudflare for Teams, we’ve extended that network to protect the people and organizations that use the Internet to do their jobs. We’re excited to help a more modern, mobile, and cloud-enabled Internet be safer and faster than it ever was with traditional hardware appliances.

But the same technology we’re deploying now to improve enterprise security holds further promise. The most interesting Internet applications keep getting more complicated and, in turn, requiring more bandwidth and processing power to use.

For those of us fortunate enough to be able to afford the latest iPhone, we continue to reap the benefits of an increasingly powerful set of Internet-enabled tools. But try to use the Internet on a mobile phone from a few generations back, and you can see how quickly the latest Internet applications leave legacy devices behind. That’s a problem if we want to bring the next 4 billion Internet users online.

We need a paradigm shift if the sophistication of applications and complexity of interfaces continue to keep pace with the latest generation of devices. To make the best of the Internet available to everyone, we may need to shift the work of the Internet off the end devices we all carry around in our pockets and let the network — where power, bandwidth, and CPU are relatively plentiful — carry more of the load.

That’s the long term promise of what S2’s technology combined with Cloudflare’s network may someday power. If we can make it so a less expensive device can run the latest Internet applications — using less battery, bandwidth, and CPU than ever before possible — then we can make the Internet more affordable and accessible for everyone.

We started with Cloudflare for Infrastructure. Today we’re announcing Cloudflare for Teams. But our ambition is nothing short of Cloudflare for Everyone.

Early Feedback on Cloudflare for Teams from Customers and Partners

“Cloudflare Access has enabled Ziff Media Group to seamlessly and securely deliver our suite of internal tools to employees around the world on any device, without the need for complicated network configurations,” said Josh Butts, SVP Product & Technology, Ziff Media Group.

“VPNs are frustrating and lead to countless wasted cycles for employees and the IT staff supporting them,” said Amod Malviya, Cofounder and CTO, Udaan. “Furthermore, conventional VPNs can lull people into a false sense of security. With Cloudflare Access, we have a far more reliable, intuitive, secure solution that operates on a per user, per access basis. I think of it as Authentication 2.0 — even 3.0.”

“Roman makes healthcare accessible and convenient,” said Ricky Lindenhovius, Engineering Director, Roman Health. “Part of that mission includes connecting patients to physicians, and Cloudflare helps Roman securely and conveniently connect doctors to internally managed tools. With Cloudflare, Roman can evaluate every request made to internal applications for permission and identity, while also improving speed and user experience.”

“We’re excited to partner with Cloudflare to provide our customers an innovative approach to enterprise security that combines the benefits of endpoint protection and network security,” said Tom Barsi, VP Business Development, VMware. “VMware Carbon Black is a leading endpoint protection platform (EPP) and offers visibility and control of laptops, servers, virtual machines, and cloud infrastructure at scale. In partnering with Cloudflare, customers will have the ability to use VMware Carbon Black’s device health as a signal in enforcing granular authentication to a team’s internally managed application via Access, Cloudflare’s Zero Trust solution. Our joint solution combines the benefits of endpoint protection and a zero trust authentication solution to keep teams working on the Internet more secure.”

“Rackspace is a leading global technology services company accelerating the value of the cloud during every phase of our customers’ digital transformation,” said Lisa McLin, vice president of alliances and channel chief at Rackspace. “Our partnership with Cloudflare enables us to deliver cutting edge networking performance to our customers and helps them leverage a software defined networking architecture in their journey to the cloud.”

“Employees are increasingly working outside of the traditional corporate headquarters. Distributed and remote users need to connect to the Internet, but today’s security solutions often require they backhaul those connections through headquarters to have the same level of security,” said Michael Kenney, head of strategy and business development for Ingram Micro Cloud. “We’re excited to work with Cloudflare whose global network helps teams of any size reach internally managed applications and securely use the Internet, protecting the data, devices, and team members that power a business.”

“At Okta, we’re on a mission to enable any organization to securely use any technology. As a leading provider of identity for the enterprise, Okta helps organizations remove the friction of managing their corporate identity for every connection and request that their users make to applications. We’re excited about our partnership with Cloudflare and bringing seamless authentication and connection to teams of any size,” said Chuck Fontana, VP, Corporate & Business Development, Okta.

“Organizations need one unified place to see, secure, and manage their endpoints,” said Matt Hastings, Senior Director of Product Management at Tanium. “We are excited to partner with Cloudflare to help teams secure their data, off-network devices, and applications. Tanium’s platform provides customers with a risk-based approach to operations and security with instant visibility and control into their endpoints. Cloudflare helps extend that protection by incorporating device data to enforce security for every connection made to protected resources.”

“OneLogin is happy to partner with Cloudflare to advance security teams’ identity control in any environment, whether on-premise or in the cloud, without compromising user performance,” said Gary Gwin, Senior Director of Product at OneLogin. “OneLogin’s identity and access management platform securely connects people and technology for every user, every app, and every device. The OneLogin and Cloudflare for Teams integration provides a comprehensive identity and network control solution for teams of all sizes.”

“Ping Identity helps enterprises improve security and user experience across their digital businesses,” said Loren Russon, Vice President of Product Management, Ping Identity. “Cloudflare for Teams integrates with Ping Identity to provide a comprehensive identity and network control solution to teams of any size, and ensures that only the right people get the right access to applications, seamlessly and securely.”

“Our customers increasingly leverage deep observability data to address both operational and security use cases, which is why we launched Datadog Security Monitoring,” said Marc Tremsal, Director of Product Management at Datadog. “Our integration with Cloudflare already provides our customers with visibility into their web and DNS traffic; we’re excited to work together as Cloudflare for Teams expands this visibility to corporate environments.”

“As more companies support employees who work on corporate applications from outside of the office, it is vital that they understand each request users are making. They need real-time insights and intelligence to react to incidents and audit secure connections,” said John Coyle, VP of Business Development, Sumo Logic. “With our partnership with Cloudflare, customers can now log every request made to internal applications and automatically push them directly to Sumo Logic for retention and analysis.”

“CloudGenix is excited to partner with Cloudflare to provide an end-to-end security solution from the branch to the cloud. As enterprises move off of expensive legacy MPLS networks and adopt branch to internet breakout policies, the CloudGenix CloudBlade platform and Cloudflare for Teams together can make this transition seamless and secure. We’re looking forward to Cloudflare’s roadmap with this announcement and partnership opportunities in the near term,” said Aaron Edwards, Field CTO, CloudGenix.

“In the face of limited cybersecurity resources, organizations are looking for highly automated solutions that work together to reduce the likelihood and impact of today’s cyber risks,” said Akshay Bhargava, Chief Product Officer, Malwarebytes. “With Malwarebytes and Cloudflare together, organizations are deploying more than twenty layers of security defense-in-depth. Using just two solutions, teams can secure their entire enterprise from device, to the network, to their internal and external applications.”

“Organizations’ sensitive data is vulnerable in-transit over the Internet and when it’s stored at its destination in public cloud, SaaS applications and endpoints,” said Pravin Kothari, CEO of CipherCloud. “CipherCloud is excited to partner with Cloudflare to secure data in all stages, wherever it goes. Cloudflare’s global network secures data in-transit without slowing down performance. CipherCloud CASB+ provides a powerful cloud security platform with end-to-end data protection and adaptive controls for cloud environments, SaaS applications and BYOD endpoints. Working together, teams can rely on integrated Cloudflare and CipherCloud solution to keep data always protected without compromising user experience.”

Security on the Internet with Cloudflare for Teams

Post Syndicated from Sam Rhea original https://blog.cloudflare.com/cloudflare-for-teams-products/

Your experience using the Internet has continued to improve over time. It’s gotten faster, safer, and more reliable. However, you probably have to use a different, worse equivalent of it when you do your work. While the Internet kept getting better, businesses and their employees were stuck using their own private networks.

In those networks, teams hosted their own applications, stored their own data, and protected all of it by building a castle and moat around that private world. This model hid internally managed resources behind VPN appliances and on-premise firewall hardware. The experience was awful, for users and administrators alike. While the rest of the Internet became more performant and more reliable, business users were stuck in an alternate universe.

That legacy approach was less secure and slower than teams wanted, but the corporate perimeter mostly worked for a time. However, that began to fall apart with the rise of cloud-delivered applications. Businesses migrated to SaaS versions of software that previously lived in that castle and behind that moat. Users needed to connect to the public Internet to do their jobs, and attackers made the Internet unsafe in sophisticated, unpredictable ways – which opened up every business to a new world of never-ending risks.

How did enterprise security respond? By trying to solve a new problem with a legacy solution, and forcing the Internet into equipment that was only designed for private, corporate networks. Instead of benefiting from the speed and availability of SaaS applications, users had to backhaul Internet-bound traffic through the same legacy boxes that made their private network miserable.

Teams then watched as their bandwidth bills increased. More traffic to the Internet from branch offices forced more traffic over expensive, dedicated links. Administrators now had to manage a private network and the connections to the entire Internet for their users, all with the same hardware. More traffic required more hardware and the cycle became unsustainable.

Cloudflare’s first wave of products secured and improved the speed of those sites by letting customers, from free users to some of the largest properties on the Internet, replace that hardware stack with Cloudflare’s network. We could deliver capacity at a scale that would be impossible for nearly any company to build themselves. We deployed data centers in over 200 cities around the world that help us reach users wherever they are.

We built a unique network to let sites scale how they secured infrastructure on the Internet with their own growth. But internally, businesses and their employees were stuck using their own private networks.

Just as we helped organizations secure their infrastructure by replacing boxes, we can do the same for their teams and their data. Today, we’re announcing a new platform that applies our network, and everything we’ve learned, to make the Internet faster and safer for teams.
Cloudflare for Teams protects enterprises, devices, and data by securing every connection without compromising user performance. The speed, reliability and protection we brought to securing infrastructure is extended to everything your team does on the Internet.

The legacy world of corporate security

Organizations all share three problems they need to solve at the network level:

  1. Secure team member access to internally managed applications
  2. Secure team members from threats on the Internet
  3. Secure the corporate data that lives in both environments

Each of these challenges poses a real risk to any team. If any component is compromised, the entire business becomes vulnerable.

Internally managed applications

Solving the first bucket, internally managed applications, started by building a perimeter around those internal resources. Administrators deployed applications on a private network and users outside of the office connected to them with client VPN agents through VPN appliances that lived back on-site.

Users hated it, and they still do, because it made it harder to get their jobs done. A sales team member traveling to a customer visit in the back of a taxi had to start a VPN client on their phone just to review details about the meeting. An engineer working remotely had to sit and wait as every connection they made to developer tools was backhauled through a central VPN appliance.

Administrators and security teams also had issues with this model. Once a user connects to the private network, they’re typically able to reach multiple resources without having to prove they’re authorized to do so. Just because I’m able to enter the front door of an apartment building doesn’t mean I should be able to walk into any individual apartment. However, enforcing additional security within the bounds of the private network required complicated microsegmentation, if it was done at all.
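
The apartment-building analogy can be sketched as a per-application policy check instead of a single network-level gate. The applications and groups below are invented for illustration:

```python
# Contrast between network-level access (reach the network, reach everything)
# and per-application checks. Applications and groups are invented examples.
POLICIES = {
    "jira.internal": {"engineering", "product"},
    "payroll.internal": {"finance"},
}

def can_access(user_groups, app):
    """Allow only if the user belongs to a group the app's policy permits."""
    return bool(user_groups & POLICIES.get(app, set()))

# Entering the "building" (network) no longer opens every "apartment" (app):
print(can_access({"engineering"}, "jira.internal"))     # True
print(can_access({"engineering"}, "payroll.internal"))  # False
```

In the flat-network model there is effectively one check at the front door; here every application enforces its own.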

Threats on the Internet

The second challenge, securing users connecting to SaaS tools on the public Internet and applications in the public cloud, required security teams to protect against known threats and potential zero-day attacks as their users left the castle and moat.

How did most companies respond? By forcing all traffic leaving branch offices or remote users back through headquarters and using the same hardware that secured their private network to try and build a perimeter around the Internet, at least the Internet their users accessed. All of the Internet-bound traffic leaving a branch office in Asia, for example, would be sent back through a central location in Europe, even if the destination was just down the street.

Organizations needed those connections to be stable, and to prioritize certain functions like voice and video, so they paid carriers to support dedicated multiprotocol label switching (MPLS) links. MPLS improved performance by attaching labels to traffic that downstream routers could forward without performing an IP lookup, but it was eye-wateringly expensive.
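
As a toy sketch of the label-switching idea (labels, interfaces, and the table below are invented for illustration), each router forwards by a single exact-match lookup rather than a longest-prefix IP route lookup:

```python
# Toy model of MPLS label switching: a downstream router forwards by swapping
# the label via an exact-match table instead of a longest-prefix IP lookup.
# Labels and interface names are invented for this example.
LABEL_TABLE = {
    100: (200, "eth1"),  # incoming label -> (outgoing label, egress interface)
    101: (201, "eth2"),
}

def forward(packet):
    """Swap the label and choose the egress interface with one dict lookup."""
    out_label, egress = LABEL_TABLE[packet["label"]]
    return {**packet, "label": out_label, "egress": egress}

print(forward({"label": 100, "payload": "voice frame"}))
```

The payload is untouched; only the label changes hop by hop, which is what makes the per-packet work so cheap.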

Securing data

The third challenge, keeping data safe, became a moving target. Organizations had to keep data secure in a consistent way as it lived and moved between private tools on corporate networks and SaaS applications like Salesforce or Office 365.

The answer? More of the same. Teams backhauled traffic over MPLS links to a place where data could be inspected, adding more latency and introducing more hardware that had to be maintained.

What changed?

The balance of internal versus external traffic began to shift as SaaS applications became the new default for small businesses and Fortune 500s alike. Users now do most of their work on the Internet, with tools like Office 365 continuing to gain adoption. As those tools become more popular, more data leaves the moat and lives on the public Internet.

User behavior also changed. Users left the office and worked from multiple devices, both managed and unmanaged. Teams became more distributed and the perimeter was stretched to its limit.

This caused legacy approaches to fail

Legacy approaches to corporate security pushed the castle and moat model further out. However, that model simply cannot scale with how users do work on the Internet today.

Internally managed applications

Private networks give users headaches, but they’re also a constant and complex chore to maintain. VPNs require expensive equipment that must be upgraded or expanded and, as more users leave the office, that equipment must try and scale up.

The result is a backlog of IT help desk tickets as users struggle with their VPN and, on the other side of the house, administrators and security teams try to put band-aids on the approach.

Threats on the Internet

Organizations initially saved money by moving to SaaS tools, but wound up spending more money over time as their traffic increased and bandwidth bills climbed.

Additionally, threats evolve. The traffic sent back to headquarters was secured with static models of scanning and filtering using hardware gateways. Users were still vulnerable to new types of threats that these on-premise boxes did not block yet.

Securing data

The cost of keeping data secure in both environments also grew. Security teams attempted to inspect Internet-bound traffic for threats and data loss by backhauling branch office traffic through on-premise hardware, degrading speed and increasing bandwidth fees.

Even more dangerous, data now lived permanently outside of that castle and moat model. Organizations were now vulnerable to attacks that bypassed their perimeter and targeted SaaS applications directly.

How will Cloudflare solve these problems?

Cloudflare for Teams consists of two products, Cloudflare Access and Cloudflare Gateway.

We launched Access last year and are excited to bring it into Cloudflare for Teams. We built Cloudflare Access to solve the first challenge that corporate security teams face: protecting internally managed applications.

Cloudflare Access replaces corporate VPNs with Cloudflare’s network. Instead of placing internal tools on a private network, teams deploy them in any environment, including hybrid or multi-cloud models, and secure them consistently with Cloudflare’s network.

Deploying Access does not require exposing new holes in corporate firewalls. Teams connect their resources through a secure outbound connection, Argo Tunnel, which runs in your infrastructure to connect the applications and machines to Cloudflare. That tunnel makes outbound-only calls to the Cloudflare network and organizations can replace complex firewall rules with just one: disable all inbound connections.

Administrators then build rules to decide who should authenticate to and reach the tools protected by Access. Whether those resources are virtual machines powering business operations or internal web applications, like Jira or iManage, when a user needs to connect, they pass through Cloudflare first.

When users need to connect to the tools behind Access, they are prompted to authenticate with their team’s SSO and, if valid, are instantly connected to the application without being slowed down. Internally-managed apps suddenly feel like SaaS products, and the login experience is seamless and familiar.

Behind the scenes, every request made to those internal tools hits Cloudflare first where we enforce identity-based policies. Access evaluates and logs every request to those apps for identity, to give administrators more visibility and to offer more security than a traditional VPN.
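
A minimal sketch of that evaluate-and-log flow follows; the request shape, claim names, and policy are invented for illustration and are not Cloudflare’s actual API:

```python
# Minimal sketch of evaluating and logging each request for identity.
# The request shape, claim names, and policy are invented for this example.
audit_log = []

def evaluate(request, allowed_emails):
    """Allow or deny a request by identity, recording every decision."""
    decision = "allow" if request["email"] in allowed_emails else "deny"
    audit_log.append({"email": request["email"],
                      "app": request["app"],
                      "decision": decision})
    return decision

policy = {"ana@example.com"}
print(evaluate({"email": "ana@example.com", "app": "jira"}, policy))  # allow
print(evaluate({"email": "eve@example.com", "app": "jira"}, policy))  # deny
```

The key property is that every request, allowed or denied, leaves an audit record, which a network-level VPN check does not provide.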

Every Cloudflare data center, in 200 cities around the world, performs the entire authentication check. Users connect faster, wherever they are working, versus having to backhaul traffic to a home office.

Access also saves time for administrators. Instead of configuring complex and error-prone network policies, IT teams build policies that enforce authentication using their identity provider. Security leaders can control who can reach internal applications in a single pane of glass and audit comprehensive logs from one source.

In the last year, we’ve released features that expand how teams can use Access so they can fully eliminate their VPN. We’ve added support for RDP, SSH, and released support for short-lived certificates that replace static keys. However, teams also use applications that do not run in infrastructure they control, such as SaaS applications like Box and Office 365. To solve that challenge, we’re releasing a new product, Cloudflare Gateway.

Cloudflare Gateway secures teams by making the first destination a Cloudflare data center located near them, for all outbound traffic. The product places Cloudflare’s global network between users and the Internet, rather than forcing the Internet through legacy hardware on-site.

Cloudflare Gateway’s first feature begins by preventing users from running into phishing scams or malware sites by combining the world’s fastest DNS resolver with Cloudflare’s threat intelligence. Gateway resolver can be deployed to office networks and user devices in a matter of minutes. Once configured, Gateway actively blocks potential malware and phishing sites while also applying content filtering based on policies administrators configure.
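
The filtering decision can be sketched as a blocklist lookup ahead of normal resolution. The hostnames and categories below are invented for illustration:

```python
# Sketch of a filtering resolver: check the queried hostname against threat
# categories before resolving. Hostnames and categories are invented examples.
BLOCKLIST = {
    "phishing-site.example": "phishing",
    "malware-cdn.example": "malware",
}

def resolve(hostname):
    """Block known-bad hostnames; otherwise hand off to normal resolution."""
    category = BLOCKLIST.get(hostname)
    if category is not None:
        return {"hostname": hostname, "action": "block", "reason": category}
    return {"hostname": hostname, "action": "resolve"}

print(resolve("phishing-site.example"))  # blocked as phishing
print(resolve("intranet.example"))       # resolved normally
```

Because the check happens at resolution time, it protects every application on the device or network without per-app configuration.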

However, threats can be hidden in otherwise healthy hostnames. To protect users from more advanced threats, Gateway will audit URLs and, if enabled, inspect packets to find potential attacks before they compromise a device or office network. That same deep packet inspection can then be applied to prevent the accidental or malicious export of data.

Organizations can add Gateway’s advanced threat prevention in two models:

  1. by connecting office networks to the Cloudflare security fabric through GRE tunnels and
  2. by distributing forward proxy clients to mobile devices.

The first model, delivered through Cloudflare Magic Transit, will give enterprises a way to migrate to Gateway without disrupting their current workflow. Instead of backhauling office traffic to centralized on-premise hardware, teams will point traffic to Cloudflare over GRE tunnels. Once the outbound traffic arrives at Cloudflare, Gateway can apply file type controls, in-line inspection, and data loss protection without impacting connection performance. Simultaneously, Magic Transit protects a corporate IP network from inbound attacks.

When users leave the office, Gateway’s client application will deliver the same level of Internet security. Every connection from the device will pass through Cloudflare first, where Gateway can apply threat prevention policies. Cloudflare can also deliver that security without compromising user experience, building on new technologies like the WireGuard protocol and integrating features from Cloudflare Warp, our popular individual forward proxy.

In both environments, one of the most common vectors for attacks is still the browser. Zero-day threats can compromise devices by using the browser as a vehicle to execute code.

Existing browser isolation solutions attempt to solve this challenge in one of two approaches: 1) pixel pushing and 2) DOM reconstruction. Both approaches lead to tradeoffs in performance and security. Pixel pushing degrades speed while also driving up the cost to stream sessions to users. DOM reconstruction attempts to strip potentially harmful content before sending it to the user. That tactic relies on known vulnerabilities and is still exposed to the zero day threats that isolation tools were meant to solve.

Cloudflare Gateway will feature always-on browser isolation that not only protects users from zero day threats, but can also make browsing the Internet faster. The solution will apply a patented approach to send vector commands that a browser can render without the need for an agent on the device. A user’s browser session will instead run in a Cloudflare data center where Gateway destroys the instance at the end of each session, keeping malware away from user devices without compromising performance.

When deployed, remote browser sessions will run in one of Cloudflare’s 200 data centers, connecting users to a faster, safer model of navigating the Internet without the compromises of legacy approaches. If you would like to learn more about this approach to browser isolation, I’d encourage you to read Darren Remington’s blog post on the topic.

Why Cloudflare?

To make infrastructure safer, and web properties faster, Cloudflare built out one of the world’s largest and most sophisticated networks. Cloudflare for Teams builds on that same platform, and all of its unique advantages.

Fast

Security should always be bundled with performance. Cloudflare’s infrastructure products delivered better protection while also improving speed. That’s possible because of the network we’ve built: both its distribution and the data we have about it allow Cloudflare to optimize requests and connections.

Cloudflare for Teams brings that same speed to end users by using that same network and route optimization. Additionally, Cloudflare has built industry-leading components that will become features of this new platform. All of these components leverage Cloudflare’s network and scale to improve user performance.

Gateway’s DNS-filtering features build on Cloudflare’s 1.1.1.1 public DNS resolver, the world’s fastest resolver according to DNSPerf. To protect entire connections, Cloudflare for Teams will deploy the same technology that underpins Warp, a new type of VPN with consistently better reviews than competitors.

Massive scalability

Cloudflare’s 30 Tbps of network capacity can scale to meet the needs of nearly any enterprise. Customers can stop worrying about buying enough hardware to meet their organization’s needs and, instead, replace it with Cloudflare.

Near users, wherever they are — literally

Cloudflare’s network operates in 200 cities and more than 90 countries around the world, putting Cloudflare’s security and performance close to users, wherever they work.

That network includes presence in global headquarters, like London and New York, but also in traditionally underserved regions around the world.

Cloudflare data centers operate within 100 milliseconds of 99% of the Internet-connected population in the developed world, and within 100 milliseconds of 94% of the Internet-connected population globally. All of your end users should feel like they have the performance traditionally only available to those in headquarters.

Easier for administrators

When security products are confusing, teams make mistakes that become incidents. Cloudflare’s solution is straightforward and easy to deploy. Most security providers in this market built features first and never considered usability or implementation.

Cloudflare Access can be deployed in less than an hour; Gateway features will build on top of that dashboard and workflow. Cloudflare for Teams brings the same ease of use from the tools that protect infrastructure to the new products that secure users, devices, and data.

Better threat intelligence

Cloudflare’s network already secures more than 20 million Internet properties and blocks 72 billion cyber threats each day. We build products using the threat data we gather from protecting 11 million HTTP requests per second on average.

What’s next?

Cloudflare Access is available right now. You can start replacing your team’s VPN with Cloudflare’s network today. Certain features of Cloudflare Gateway are available in beta now, and others will be added in beta over time. You can sign up to be notified about Gateway now.

Cloudflare + Remote Browser Isolation

Post Syndicated from Darren Remington original https://blog.cloudflare.com/cloudflare-and-remote-browser-isolation/

Cloudflare announced today that it has purchased S2 Systems Corporation, a Seattle-area startup that has built an innovative remote browser isolation solution unlike any other currently in the market. The majority of endpoint compromises involve web browsers — by putting space between users’ devices and where web code executes, browser isolation makes endpoints substantially more secure. In this blog post, I’ll discuss what browser isolation is, why it is important, how the S2 Systems cloud browser works, and how it fits with Cloudflare’s mission to help build a better Internet.

What’s wrong with web browsing?

It’s been more than 30 years since Tim Berners-Lee wrote the project proposal defining the technology underlying what we now call the world wide web. What Berners-Lee envisioned as being useful for “several thousand people, many of them very creative, all working toward common goals”[1] has grown to become a fundamental part of commerce, business, the global economy, and an integral part of society used by more than 58% of the world’s population[2].

The world wide web and web browsers have unequivocally become the platform for much of the productive work (and play) people do every day. However, as the pervasiveness of the web grew, so did opportunities for bad actors. Hardly a day passes without a major new cybersecurity breach in the news. Several contributing factors have helped propel cybercrime to unprecedented levels: the commercialization of hacking tools, the emergence of malware-as-a-service, the presence of well-financed nation states and organized crime, and the development of cryptocurrencies which enable malicious actors of all stripes to anonymously monetize their activities.

The vast majority of security breaches originate from the web. Gartner calls the public Internet a “cesspool of attacks” and identifies web browsers as the primary culprit responsible for 70% of endpoint compromises.[3] This should not be surprising. Although modern web browsers are remarkable, many fundamental architectural decisions were made in the 1990’s before concepts like security, privacy, corporate oversight, and compliance were issues or even considerations. Core web browsing functionality (including the entire underlying WWW architecture) was designed and built for a different era and circumstances.

In today’s world, several web browsing assumptions are outdated or even dangerous. Web browsers and the underlying server technologies encompass an extensive – and growing – list of complex interrelated technologies. These technologies are constantly in flux, driven by vibrant open source communities, content publishers, search engines, advertisers, and competition between browser companies. As a result of this underlying complexity, web browsers have become primary attack vectors. According to Gartner, “the very act of users browsing the internet and clicking on URL links opens the enterprise to significant risk. […] Attacking thru the browser is too easy, and the targets too rich.”[4] Even “ostensibly ‘good’ websites are easily compromised and can be used to attack visitors” (Gartner[5]) with more than 40% of malicious URLs found on good domains (Webroot[6]). (A complete list of vulnerabilities is beyond the scope of this post.)

The very structure and underlying technologies that power the web are inherently difficult to secure. Some browser vulnerabilities result from illegitimate use of legitimate functionality: enabling browsers to download files and documents is good, but allowing downloading of files infected with malware is bad; dynamic loading of content across multiple sites within a single webpage is good, but cross-site scripting is bad; enabling an extensive advertising ecosystem is good, but the inability to detect hijacked links or malicious redirects to malware or phishing sites is bad; etc.

Enterprise Browsing Issues

Enterprises have additional challenges with traditional browsers.

Paradoxically, IT departments have the least amount of control over the most ubiquitous app in the enterprise – the web browser. The most common complaints about web browsers from enterprise security and IT professionals are:

  1. Security (obviously). The public internet is a constant source of security breaches and the problem is growing given an 11x escalation in attacks since 2016 (Meeker[7]). Costs of detection and remediation are escalating and the reputational damage and financial losses for breaches can be substantial.
  2. Control. IT departments have little visibility into user activity and limited ability to leverage content disarm and reconstruction (CDR) and data loss prevention (DLP) mechanisms, including when, where, or by whom files are downloaded or uploaded.
  3. Compliance. The inability to control data and activity across geographies or capture required audit telemetry to meet increasingly strict regulatory requirements. This results in significant exposure to penalties and fines.

Given vulnerabilities exposed through everyday user activities such as email and web browsing, some organizations attempt to restrict these activities. As both are legitimate and critical business functions, efforts to limit or curtail web browser use inevitably fail or have a substantive negative impact on business productivity and employee morale.

Current approaches to mitigating security issues inherent in browsing the web are largely based on signature technology for data files and executables, and lists of known good/bad URLs and DNS addresses. The challenge with these approaches is the difficulty of keeping current with known attacks (file signatures, URLs and DNS addresses) and their inherent vulnerability to zero-day attacks. Hackers have devised automated tools to defeat signature-based approaches (e.g. generating hordes of files with unknown signatures) and create millions of transient websites in order to defeat URL/DNS blacklists.
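To see why signature-based detection is so easy to defeat, consider a toy sketch (illustrative Python with invented sample data, not any real scanner): any change to a file, however trivial, produces a brand-new signature that the blocklist has never seen.

```python
import hashlib

def signature(data: bytes) -> str:
    # A classic file signature: a hash of the file contents.
    return hashlib.sha256(data).hexdigest()

# A blocklist built from previously observed malware samples.
known_bad = {signature(b"MALWARE-SAMPLE-v1")}

def is_blocked(data: bytes) -> bool:
    return signature(data) in known_bad

original = b"MALWARE-SAMPLE-v1"
mutated = original + b"\x00"  # one appended byte changes the hash entirely

assert is_blocked(original) is True
assert is_blocked(mutated) is False  # same behaviour, "unknown" signature
```

Automated tooling can churn out such mutations by the million, which is why signature and blacklist approaches struggle against zero-day and transient-site attacks.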

While these approaches certainly prevent some attacks, the growing number of incidents and severity of security breaches clearly indicate more effective alternatives are needed.

What is browser isolation?

The core concept behind browser isolation is security-through-physical-isolation to create a “gap” between a user’s web browser and the endpoint device thereby protecting the device (and the enterprise network) from exploits and attacks. Unlike secure web gateways, antivirus software, or firewalls which rely on known threat patterns or signatures, this is a zero-trust approach.

There are two primary browser isolation architectures: (1) client-based local isolation and (2) remote isolation.

Local browser isolation attempts to isolate a browser running on a local endpoint using app-level or OS-level sandboxing. In addition to leaving the endpoint at risk when there is an isolation failure, these systems require significant endpoint resources (memory + compute), tend to be brittle, and are difficult for IT to manage as they depend on support from specific hardware and software components.

Further, local browser isolation does nothing to address the control and compliance issues mentioned above.

Remote browser isolation (RBI) protects the endpoint by moving the browser to a remote service in the cloud or to a separate on-premises server within the enterprise network:

  • On-premises isolation simply relocates the risk from the endpoint to another location within the enterprise without actually eliminating the risk.
  • Cloud-based remote browsing isolates the end-user device and the enterprise’s network while fully enabling IT control and compliance solutions.

Given the inherent advantages, most browser isolation solutions – including S2 Systems – leverage cloud-based remote isolation. Properly implemented, remote browser isolation can protect the organization from browser exploits, plug-ins, zero-day vulnerabilities, malware and other attacks embedded in web content.

How does Remote Browser Isolation (RBI) work?

In a typical cloud-based RBI system (the red-dashed box ❶ below), individual remote browsers ❷ are run in the cloud as disposable containerized instances – typically, one instance per user. The remote browser sends the rendered contents of a web page to the user endpoint device ❹ using a specific protocol and data format ❸. Actions by the user, such as keystrokes, mouse and scroll commands, are sent back to the isolation service over a secure encrypted channel where they are processed by the remote browser and any resulting changes to the remote browser webpage are sent back to the endpoint device.
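The round trip above can be sketched in a few lines of illustrative Python. The class and method names here are invented for this toy model, not taken from any real RBI product; the point is only that untrusted web content executes remotely while the endpoint merely forwards input and displays output.

```python
# Toy model of the RBI round trip: the endpoint "remote controls" a cloud
# browser by sending input events and receiving rendered page updates.

class RemoteBrowser:
    """Disposable per-user browser instance inside the isolation service."""
    def __init__(self):
        self.url = "about:blank"

    def handle_event(self, event: dict) -> dict:
        # Untrusted web content only ever executes here, never on the endpoint.
        if event["type"] == "navigate":
            self.url = event["url"]
        # The service sends back only rendering data, not active web code.
        return {"rendered": f"<rendering of {self.url}>"}

class Endpoint:
    """The user's device: forwards input, displays whatever comes back."""
    def __init__(self, remote: RemoteBrowser):
        self.remote = remote
        self.display = ""

    def send(self, event: dict):
        self.display = self.remote.handle_event(event)["rendered"]

endpoint = Endpoint(RemoteBrowser())
endpoint.send({"type": "navigate", "url": "https://example.com"})
assert endpoint.display == "<rendering of https://example.com>"
```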

Cloudflare + Remote Browser Isolation

In effect, the endpoint device is “remote controlling” the cloud browser. Some RBI systems use proprietary clients installed on the local endpoint while others leverage existing HTML5-compatible browsers on the endpoint and are considered ‘clientless.’

Data breaches that occur in the remote browser are isolated from the local endpoint and enterprise network. Every remote browser instance is treated as if compromised and terminated after each session. New browser sessions start with a fresh instance. Obviously, the RBI service must prevent browser breaches from leaking outside the browser containers to the service itself. Most RBI systems provide remote file viewers negating the need to download files but also have the ability to inspect files for malware before allowing them to be downloaded.

A critical component in the above architecture is the specific remoting technology employed by the cloud RBI service. The remoting technology has a significant impact on the operating cost and scalability of the RBI service, website fidelity and compatibility, bandwidth requirements, endpoint hardware/software requirements and even the user experience. Remoting technology also determines the effective level of security provided by the RBI system.

All current cloud RBI systems employ one of two remoting technologies:

(1)    Pixel pushing is a video-based approach which captures pixel images of the remote browser ‘window’ and transmits a sequence of images to the client endpoint browser or proprietary client. This is similar to how remote desktop and VNC systems work. Although considered to be relatively secure, there are several inherent challenges with this approach:

  • Continuously encoding and transmitting video streams of remote webpages to user endpoint devices is very costly. Scaling this approach to millions of users is financially prohibitive and logistically complex.
  • Requires significant bandwidth. Even when highly optimized, pushing pixels is bandwidth intensive.
  • Unavoidable latency results in an unsatisfactory user experience. These systems tend to be slow and generate a lot of user complaints.
  • Mobile support is degraded by high bandwidth requirements compounded by inconsistent connectivity.
  • HiDPI displays may render at lower resolutions. Pixel density increases quadratically with resolution, which means remote browser sessions (particularly fonts) on HiDPI devices can appear fuzzy or out of focus.

(2) DOM reconstruction emerged as a response to the shortcomings of pixel pushing. DOM reconstruction attempts to clean webpage HTML, CSS, etc. before forwarding the content to the local endpoint browser. The underlying HTML, CSS, etc., are reconstructed in an attempt to eliminate active code, known exploits, and other potentially malicious content. While addressing the latency, operational cost, and user experience issues of pixel pushing, it introduces two significant new issues:

  • Security. The underlying technologies – HTML, CSS, web fonts, etc. – are the attack vectors hackers leverage to breach endpoints. Attempting to remove malicious content or code is like washing mosquitos: you can attempt to clean them, but they remain inherent carriers of dangerous and malicious material. It is impossible to identify, in advance, all the means of exploiting these technologies even through an RBI system.
  • Website fidelity. Inevitably, attempting to remove malicious active code and reconstructing HTML, CSS and other aspects of modern websites results in broken pages that don’t render properly or don’t render at all. Websites that work today may not work tomorrow as site publishers make daily changes that may break DOM reconstruction functionality. The result is an infinite tail of issues requiring significant resources in an endless game of whack-a-mole. Some RBI solutions struggle to support common enterprise-wide services like Google G Suite or Microsoft Office 365 even as malware-laden web email continues to be a significant source of breaches.
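A toy example of why content cleaning is so fragile: this deliberately simplistic sanitizer (not taken from any real product) strips script tags, yet an inline event handler sails straight through. Active code can ride on attributes, CSS, fonts, and countless other channels a cleaner never modeled.

```python
import re

def naive_clean(html: str) -> str:
    # A simplistic DOM-reconstruction pass: strip <script> blocks only.
    return re.sub(r"<script\b.*?</script>", "", html, flags=re.S | re.I)

page = '<p onmouseover="steal()">hi</p><script>steal()</script>'
cleaned = naive_clean(page)

assert "<script>" not in cleaned
# The inline event handler survives the "cleaning":
assert 'onmouseover="steal()"' in cleaned
```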

Cloudflare + Remote Browser Isolation

Customers are left to choose between a secure solution with a bad user experience and high operating costs, or a faster, much less secure solution that breaks websites. These tradeoffs have driven some RBI providers to implement both remoting technologies into their products. However, this leaves customers to pick their poison without addressing the fundamental issues.

Given the significant tradeoffs in RBI systems today, one common optimization for current customers is to deploy remote browsing capabilities to only the most vulnerable users in an organization such as high-risk executives, finance, business development, or HR employees. Like vaccinating half the pupils in a classroom, this results in a false sense of security that does little to protect the larger organization.

Unfortunately, the largest “gap” created by current remote browser isolation systems is the void between the potential of the underlying isolation concept and the implementation reality of currently available RBI systems.

S2 Systems Remote Browser Isolation

S2 Systems remote browser isolation is a fundamentally different approach based on S2-patented technology called Network Vector Rendering (NVR).

The S2 remote browser is based on the open-source Chromium engine on which Google Chrome is built. In addition to powering Google Chrome, which has a ~70% market share[8], Chromium powers twenty-one other web browsers including the new Microsoft Edge browser.[9] As a result, significant ongoing investment in the Chromium engine ensures the highest levels of website support, compatibility and a continuous stream of improvements.

A key architectural feature of the Chromium browser is its use of the Skia graphics library. Skia is a widely-used cross-platform graphics engine for Android, Google Chrome, Chrome OS, Mozilla Firefox, Firefox OS, FitbitOS, Flutter, the Electron application framework and many other products. Like Chromium, the pervasiveness of Skia ensures ongoing broad hardware and platform support.

Cloudflare + Remote Browser Isolation
Skia code fragment

Everything visible in a Chromium browser window is rendered through the Skia rendering layer. This includes application window UI such as menus, but more importantly, the entire contents of the webpage window are rendered through Skia. Chromium compositing, layout and rendering are extremely complex with multiple parallel paths optimized for different content types, device contexts, etc. The following figure is an egregious simplification for illustration purposes of how S2 works (apologies to Chromium experts):

Cloudflare + Remote Browser Isolation

S2 Systems NVR technology intercepts the remote Chromium browser’s Skia draw commands ❶, tokenizes and compresses them, then encrypts and transmits them across the wire ❷ to any HTML5 compliant web browser ❸ (Chrome, Firefox, Safari, etc.) running locally on the user endpoint desktop or mobile device. The Skia API commands captured by NVR are pre-rasterization which means they are highly compact.

On first use, the S2 RBI service transparently pushes an NVR WebAssembly (Wasm) library ❹ to the local HTML5 web browser on the endpoint device where it is cached for subsequent use. The NVR Wasm code contains an embedded Skia library and the necessary code to unpack, decrypt and “replay” the Skia draw commands from the remote RBI server to the local browser window. A WebAssembly’s ability to “execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms[10] results in near-native drawing performance.
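As a rough illustration of that pipeline, here is a sketch with an invented command format (real NVR tokenization, compression and encryption are far more sophisticated, and the encryption step is omitted entirely): serialized draw commands go over the wire, and the endpoint side recovers and replays them.

```python
import json, zlib

# Hypothetical draw commands standing in for intercepted Skia calls.
draw_commands = [
    {"op": "drawRect", "x": 0, "y": 0, "w": 100, "h": 40},
    {"op": "drawText", "x": 8, "y": 24, "text": "Hello"},
]

def encode(commands: list) -> bytes:
    # Tokenize (serialize) and compress; the real service would also
    # encrypt before putting the commands on the wire.
    return zlib.compress(json.dumps(commands).encode())

def replay(wire: bytes) -> list:
    # The endpoint-side Wasm library unpacks the stream and re-issues each
    # command against a local Skia instance; here we just recover the list.
    return json.loads(zlib.decompress(wire).decode())

assert replay(encode(draw_commands)) == draw_commands
```

Because the commands are captured pre-rasterization, the wire format carries drawing intent rather than pixels, which is what keeps it compact.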

The S2 remote browser isolation service uses headless Chromium-based browsers in the cloud, transparently intercepts draw layer output, transmits the draw commands efficiently and securely over the web, and redraws them in the windows of local HTML5 browsers. This architecture has a number of technical advantages:

(1)    Security: the underlying data transport is not an existing attack vector and customers aren’t forced to make a tradeoff between security and performance.

(2)    Website compatibility: there are no website compatibility issues nor long tail chasing evolving web technologies or emerging vulnerabilities.

(3)    Performance: the system is very fast, typically faster than local browsing (subject of a future blog post).

(4)    Transparent user experience: S2 remote browsing feels like native browsing; users are generally unaware when they are browsing remotely.

(5)    Requires less bandwidth than local browsing for most websites. Enables advanced caching and other proprietary optimizations unique to web browsers and the nature of web content and technologies.

(6)    Clientless: leverages existing HTML5 compatible browsers already installed on user endpoint desktop and mobile devices.

(7)    Cost-effective scalability: although the details are beyond the scope of this post, the S2 backend and NVR technology have substantially lower operating costs than existing RBI technologies. Operating costs translate directly to customer costs. The S2 system was designed to make deployment to an entire enterprise and not just targeted users (aka: vaccinating half the class) both feasible and attractive for customers.

(8)    RBI-as-a-platform: enables implementation of related/adjacent services such as DLP, content disarm & reconstruction (CDR), phishing detection and prevention, etc.

S2 Systems Remote Browser Isolation Service and underlying NVR technology eliminates the disconnect between the conceptual potential and promise of browser isolation and the unsatisfying reality of current RBI technologies.

Cloudflare + S2 Systems Remote Browser Isolation

Cloudflare’s global cloud platform is uniquely suited to remote browsing isolation. Seamless integration with our cloud-native performance, reliability and advanced security products and services provides powerful capabilities for our customers.

Our Cloudflare Workers architecture enables edge computing in 200 cities in more than 90 countries and will put a remote browser within 100 milliseconds of 99% of the Internet-connected population in the developed world. With more than 20 million Internet properties directly connected to our network, Cloudflare remote browser isolation will benefit from locally cached data and builds on the impressive connectivity and performance of our network. Our Argo Smart Routing capability leverages our communications backbone to route traffic across faster and more reliable network paths resulting in an average 30% faster access to web assets.

Once it has been integrated with our Cloudflare for Teams suite of advanced security products, remote browser isolation will provide protection from browser exploits, zero-day vulnerabilities, malware and other attacks embedded in web content. Enterprises will be able to secure the browsers of all employees without having to make trade-offs between security and user experience. The service will enable IT control of browser-conveyed enterprise data and compliance oversight. Seamless integration across our products and services will enable users and enterprises to browse the web without fear or consequence.

Cloudflare’s mission is to help build a better Internet. This means protecting users and enterprises as they work and play on the Internet; it means making Internet access fast, reliable and transparent. Reimagining and modernizing how web browsing works is an important part of helping build a better Internet.


[1] https://www.w3.org/History/1989/proposal.html

[2] “Internet World Stats,”https://www.internetworldstats.com/, retrieved 12/21/2019.

[3] Gartner, Inc., Neil MacDonald, “Innovation Insight for Remote Browser Isolation” (report ID: G00350577), 8 March 2018

[4] Gartner, Inc., Neil MacDonald, “Innovation Insight for Remote Browser Isolation”, 8 March 2018

[5] Gartner, Inc., Neil MacDonald, “Innovation Insight for Remote Browser Isolation”, 8 March 2018

[6] “2019 Webroot Threat Report: Forty Percent of Malicious URLs Found on Good Domains”, February 28, 2019

[7] “Kleiner Perkins 2018 Internet Trends”, Mary Meeker.

[8] https://www.statista.com/statistics/544400/market-share-of-internet-browsers-desktop/, retrieved December 21, 2019

[9] https://en.wikipedia.org/wiki/Chromium_(web_browser), retrieved December 29, 2019

[10] https://webassembly.org/, retrieved December 30, 2019

Adopting a new approach to HTTP prioritization

Post Syndicated from Lucas Pardue original https://blog.cloudflare.com/adopting-a-new-approach-to-http-prioritization/

Adopting a new approach to HTTP prioritization

Adopting a new approach to HTTP prioritization

Friday the 13th is a lucky day for Cloudflare for many reasons. On December 13, 2019 Tommy Pauly, co-chair of the IETF HTTP Working Group, announced the adoption of the “Extensible Prioritization Scheme for HTTP” -- a new approach to HTTP prioritization.

Web pages are made up of many resources that must be downloaded before they can be presented to the user. The role of HTTP prioritization is to load the right bytes at the right time in order to achieve the best performance. This is a collaborative process between client and server: a client sends priority signals that the server can use to schedule the delivery of response data. In HTTP/1.1 the signal is basic; clients order requests smartly across a pool of about 6 connections. In HTTP/2 a single connection is used and clients send a signal per request, as a frame, which describes the relative dependency and weighting of the response. HTTP/3 tried to use the same approach but dependencies don’t work well when signals can be delivered out of order.

HTTP/3 is being standardised as part of the QUIC effort. As a Working Group (WG) we’ve been trying to fix the problems that non-deterministic ordering poses for HTTP priorities. However, in parallel some of us have been working on an alternative solution, the Extensible Prioritization Scheme, which fixes problems by dropping dependencies and using an absolute weighting. This is signalled in an HTTP header field meaning it can be backported to work with HTTP/2 or carried over HTTP/1.1 hops. The alternative proposal is documented in the individual draft draft-kazuho-httpbis-priority-04, co-authored by Kazuho Oku (Fastly) and myself. This has now been adopted by the IETF HTTP WG as the basis of further work; its adopted name will be draft-ietf-httpbis-priority-00.

To some extent document adoption is the end of one journey and the start of the next; sometimes the authors of the original work are not the best people to oversee the next phase. However, I’m pleased to say that Kazuho and I have been selected as co-editors of this new document. In this role we will reflect the consensus of the WG and help steward the next chapter of HTTP prioritization standardisation. Before the next journey begins in earnest, I wanted to take the opportunity to share my thoughts on the story of developing the alternative prioritization scheme through 2019.

I’d love to explain all the details of this new approach to HTTP prioritization but the truth is I expect the standardization process to refine the design and for things to go stale quickly. However, it doesn’t hurt to give a taste of what’s in store, just be aware that it is all subject to change.

A recap on priorities

The essence of HTTP prioritization comes down to trying to download many things over constrained connectivity. To borrow some text from Pat Meenan: Web pages are made up of dozens (sometimes hundreds) of separate resources that are loaded and assembled by a browser into the final displayed content. Since it is not possible to download everything immediately, we prefer to fetch more important things before less important ones. The challenge comes in signalling the importance from client to server.

In HTTP/2, every connection has a priority tree that expresses the relative importance between requests. Servers use this to determine how to schedule sending response data. The tree starts with a single root node and as requests are made they either depend on the root or each other. Servers may use the tree to decide how to schedule sending resources but clients cannot force a server to behave in any particular way.

To illustrate, imagine a client that makes three simple GET requests that all depend on root. As the server receives each request it grows its view of the priority tree:

Adopting a new approach to HTTP prioritization
The server starts with only the root node of the priority tree. As requests arrive, the tree grows. In this case all requests depend on the root, so the requests are priority siblings.

Once all requests are received, the server determines all requests have equal priority and that it should send response data using round-robin scheduling: send some fraction of response 1, then a fraction of response 2, then a fraction of response 3, and repeat until all responses are complete.
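That round-robin schedule can be sketched in a few lines (a toy model for illustration, not server code), interleaving equal-priority responses a chunk at a time until each is fully sent:

```python
from collections import deque

def round_robin(responses: dict, chunk: int) -> list:
    """Interleave equal-priority responses, `chunk` bytes at a time."""
    queue = deque(responses.items())
    schedule = []
    while queue:
        name, body = queue.popleft()
        schedule.append((name, body[:chunk]))
        if body[chunk:]:
            queue.append((name, body[chunk:]))  # still unfinished: requeue
    return schedule

order = round_robin({"r1": "aaaa", "r2": "bb", "r3": "cccc"}, chunk=2)
assert order == [("r1", "aa"), ("r2", "bb"), ("r3", "cc"),
                 ("r1", "aa"), ("r3", "cc")]
```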

A single HTTP/2 request-response exchange is made up of frames that are sent on a stream. A simple GET request would be sent using a single HEADERS frame:

Adopting a new approach to HTTP prioritization
HTTP/2 HEADERS frame, Each region of a frame is a named field

Each region of a frame is a named field, a ‘?’ indicates the field is optional and the value in parenthesis is the length in bytes with ‘*’ meaning variable length. The Header Block Fragment field holds compressed HTTP header fields (using HPACK), Pad Length and Padding relate to optional padding, and E, Stream Dependency and Weight combined are the priority signal that controls the priority tree.

The Stream Dependency and Weight fields are optional but their absence is interpreted as a signal to use the default values: dependency on the root with a weight of 16, meaning that the default priority scheduling strategy is round-robin. However, this is often a bad choice because important resources like HTML, CSS and JavaScript are tied up with things like large images. The following animation demonstrates this in the Edge browser, causing the page to be blank for 19 seconds. Our deep dive blog post explains the problem further.
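For the curious, the 5-byte priority block is straightforward to decode by hand. This sketch follows the layout in RFC 7540: the E bit is the top bit of the 4-byte word, the remaining 31 bits are the Stream Dependency, and the wire weight value is one less than the effective weight (so the default weight of 16 appears as 15 on the wire).

```python
def decode_priority(block: bytes):
    """Decode the optional 5-byte priority block of an HTTP/2 HEADERS frame."""
    assert len(block) == 5
    word = int.from_bytes(block[:4], "big")
    exclusive = bool(word >> 31)           # E: the top bit
    stream_dependency = word & 0x7FFFFFFF  # the remaining 31 bits
    weight = block[4] + 1                  # wire value 0..255 means weight 1..256
    return exclusive, stream_dependency, weight

# E=1, depend on stream 3, wire weight 15 (effective weight 16, the default)
block = (0x80000003).to_bytes(4, "big") + bytes([15])
assert decode_priority(block) == (True, 3, 16)
```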

Adopting a new approach to HTTP prioritization

The HEADERS frame E field is the interesting bit (pun intended). A request with the field set to 1 (true) means that the dependency is exclusive and nothing else can depend on the indicated node. To illustrate, imagine a client that sends three requests which set the E field to 1. As the server receives each request, it interprets this as an exclusive dependency on the root node. Because all requests have the same dependency on root, the tree has to be shuffled around to satisfy the exclusivity rules.

Adopting a new approach to HTTP prioritization
Each request has an exclusive dependency on the root node. The tree is shuffled as each request is received by the server.

The final version of the tree looks very different from our previous example. The server would schedule all of response 3, then all of response 2, then all of response 1. This could help load all of an HTML file before an image and thus improve the visual load behaviour.
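The reshuffling rule itself is simple to model: on an exclusive insertion, the new stream adopts all of the parent's existing children and becomes the parent's sole child (per RFC 7540, Section 5.3.3). A toy sketch, with stream names invented for illustration:

```python
# Minimal model of an HTTP/2 priority tree with exclusive insertion.
children = {"root": []}  # parent -> ordered list of child streams

def add_exclusive(parent: str, stream: str):
    # The new stream adopts the parent's existing children, then becomes
    # the parent's sole child (RFC 7540, Section 5.3.3).
    children[stream] = children[parent]
    children[parent] = [stream]

for request in ["1", "2", "3"]:
    add_exclusive("root", request)

# Each request displaced the previous ones, so the tree is now a chain
# root -> 3 -> 2 -> 1, scheduled in that order:
assert children["root"] == ["3"]
assert children["3"] == ["2"]
assert children["2"] == ["1"]
assert children["1"] == []
```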

In reality, clients load a lot more than three resources and use a mix of priority signals. To understand the priority of any single request, we need to understand all requests. That presents some technological challenges, especially for servers that act like proxies such as the Cloudflare edge network. Some servers have problems applying prioritization effectively.

Because not all clients send the most optimal priority signals we were motivated to develop Cloudflare’s Enhanced HTTP/2 Prioritization, announced last May during Speed Week. This was a joint project between the Speed team (Andrew Galloni, Pat Meenan, Kornel Lesiński) and Protocols team (Nick Jones, Shih-Chiang Chien) and others. It replaces the complicated priority tree with a simpler scheme that is well suited to web resources. Because the feature is implemented on the server side, we avoid requiring any modification of clients or the HTTP/2 protocol itself. Be sure to check out my colleague Nick’s blog post that details some of the technical challenges and changes needed to let our servers deliver smarter priorities.

The Extensible Prioritization Scheme proposal

The scheme specified in draft-kazuho-httpbis-priority-04 defines a way for priorities to be expressed in absolute terms. It replaces HTTP/2’s dependency-based relative prioritization: the priority of a request is independent of others, which makes it easier to reason about and easier to schedule.

Rather than send the priority signal in a frame, the scheme defines an HTTP header -- tentatively named “Priority” -- that can carry an urgency on a scale of 0 (highest) to 7 (lowest). For example, a client could express the priority of an important resource by sending a request with:

Priority: u=0

And a less important background resource could be requested with:

Priority: u=7
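A minimal parser for the urgency parameter might look like the sketch below. It is illustrative only: the draft uses Structured Field syntax and defines more than just urgency, and the fallback value of 3 used here when no `u` parameter is present is an assumption, not a claim about the draft's default.

```python
def parse_priority(header: str) -> int:
    """Extract the urgency (0 = highest .. 7 = lowest) from a Priority header."""
    urgency = 3  # fallback assumed for this sketch; the draft defines its own default
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "u":
            urgency = int(value)
    return urgency

assert parse_priority("u=0") == 0  # important resource
assert parse_priority("u=7") == 7  # background resource
assert parse_priority("") == 3     # no signal: use the fallback
```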

While Kazuho and I are the main authors of this specification, we were inspired by several ideas in the Internet community, and we have incorporated feedback or direct input from many of our peers in the Internet community over several drafts. The text today reflects the efforts-so-far of cross-industry work involving many engineers and researchers including organizations such as Adobe, Akamai, Apple, Cloudflare, Fastly, Facebook, Google, Microsoft, Mozilla and UHasselt. Adoption in the HTTP Working Group means that we can help improve the design and specification by spending some IETF time and resources for broader discussion, feedback and implementation experience.

The backstory

I work in Cloudflare’s Protocols team which is responsible for terminating HTTP at the edge. We deal with things like TCP, TLS, QUIC, HTTP/1.x, HTTP/2 and HTTP/3 and since joining the company I’ve worked with Alessandro Ghedini, Junho Choi and Lohith Bellad to make QUIC and HTTP/3 generally available last September.

Working on emerging standards is fun. It involves an eclectic mix of engineering, meetings, document review, specification writing, time zones, personalities, and organizational boundaries. So while working on the codebase of quiche, our open source implementation of QUIC and HTTP/3, I am also mulling over design details of the protocols and discussing them in cross-industry venues like the IETF.

Because of HTTP/3’s lineage, it carries over a lot of features from HTTP/2 including the priority signals and tree described earlier in the post.

One of the key benefits of HTTP/3 is that it is more resilient to the effect of lossy network conditions on performance; head-of-line blocking is limited because requests and responses can progress independently. This is, however, a double-edged sword because sometimes ordering is important. In HTTP/3 there is no guarantee that the requests are received in the same order that they were sent, so the priority tree can get out of sync between client and server. Imagine a client that makes two requests that include priority signals stating request 1 depends on root, request 2 depends on request 1. If request 2 arrives before request 1, the dependency cannot be resolved and becomes dangling. In such a case what is the best thing for a server to do? Ambiguity in behaviour leads to assumptions and disappointment. We should try to avoid that.

Adopting a new approach to HTTP prioritization
Request 1 depends on root and request 2 depends on request 1. If an HTTP/3 server receives request 2 first, the dependency cannot be resolved.
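The failure mode is easy to reproduce in a toy model (illustrative Python, not quiche code): when a request's parent hasn't arrived yet, the server is left holding a dangling reference and must guess what the client intended.

```python
# Toy model of the HTTP/3 reordering problem: dependency-based priority
# signals break when requests arrive out of order.

tree = {}      # stream -> the parent it depends on
dangling = []  # dependencies the server could not resolve

def receive(stream: str, depends_on: str):
    if depends_on == "root" or depends_on in tree:
        tree[stream] = depends_on
    else:
        # The parent hasn't arrived yet: the dependency is dangling and
        # the server must guess what the client intended.
        dangling.append((stream, depends_on))

# Client sent: request 1 (depends on root), then request 2 (depends on 1).
# The network delivers them in the opposite order:
receive("2", depends_on="1")
receive("1", depends_on="root")

assert tree == {"1": "root"}
assert dangling == [("2", "1")]
```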

This is just one example where things get tricky quickly. Unfortunately the WG kept finding edge case upon edge case with the priority tree model. We tried to find solutions but each additional fix seemed to create further complexity to the HTTP/3 design. This is a problem because it makes it hard to implement a server that handles priority correctly.

In parallel to Cloudflare’s work on implementing a better prioritization for HTTP/2, in January 2019 Pat posted his proposal for an alternative prioritization scheme for HTTP/3 in a message to the IETF HTTP WG.

Arguably HTTP/2 prioritization never lived up to its hype. However, replacing it with something else in HTTP/3 is a challenge because the QUIC WG charter required us to try and maintain parity between the protocols. Mark Nottingham, co-chair of the HTTP and QUIC WGs responded with a good summary of the situation. To quote part of that response:

My sense is that people know that we need to do something about prioritisation, but we’re not yet confident about any particular solution. Experimentation with new schemes as HTTP/2 extensions would be very helpful, as it would give us some data to work with. If you’d like to propose such an extension, this is the right place to do it.

And so started a very interesting year of cross-industry discussion on the future of HTTP prioritization.

A year of prioritization

The following is an account of my personal experiences during 2019. It’s been a busy year and there may be unintentional errors or omissions; please let me know if you think that is the case. But I hope it gives you a taste of the standardization process and a look behind the scenes of how new Internet protocols that benefit everyone come to life.

January

Pat’s email came at the same time that I was attending the QUIC WG Tokyo interim meeting hosted at Akamai (thanks to Mike Bishop for arrangements). So I was able to speak to a few people face-to-face on the topic. There was a bit of mailing list chatter but it tailed off after a few days.

February to April

Things remained quiet in terms of prioritization discussion. I knew the next best opportunity to get the ball rolling would be the HTTP Workshop 2019 held in April. The workshop is a multi-day event not associated with a standards-defining-organization (even if many of the attendees also go to meetings such as the IETF or W3C). It is structured in a way that allows the agenda to be more fluid than a typical standards meeting and gives plenty of time for organic conversation. This sometimes helps overcome gnarly problems, such as the community finding a path forward for WebSockets over HTTP/2 due to a productive discussion during the 2017 workshop. HTTP prioritization is a gnarly problem, so I was inspired to pitch it as a talk idea. It was selected and you can find the full slide deck here.

During the presentation I recounted the history of HTTP prioritization. The great thing about working on open standards is that many email threads, presentation materials and meeting materials are publicly archived. It’s fun digging through this history. Did you know that HTTP/2 is based on SPDY and inherited its weight-based prioritization scheme, and that the tree-based scheme we are familiar with today was only introduced in draft-ietf-httpbis-http2-11? One of the reasons for the more-complicated tree was to help HTTP intermediaries (a.k.a. proxies) implement clever resource management. However, it became clear during the discussion that no intermediaries implement this, and none seem to plan to. I also explained a bit more about Pat’s alternative scheme and Nick described his implementation experiences. Despite some interesting discussion around the topic, however, we didn’t come to any definitive solution. There were a lot of other interesting topics to discover that week.

May

In early May, Ian Swett (Google) restarted interest in Pat’s mailing list thread. Unfortunately he was not present at the HTTP Workshop so had some catching up to do. A little while later Ian submitted a Pull Request to the HTTP/3 specification called “Strict Priorities”. This incorporated Pat’s proposal and attempted to fix a number of those prioritization edge cases that I mentioned earlier.

In late May, another QUIC WG interim meeting was held in London at the new Cloudflare offices, here is the view from the meeting room window. Credit to Alessandro for handling the meeting arrangements.


Mike, the editor of the HTTP/3 specification, presented some of the issues with prioritization and we attempted to solve them with the conventional tree-based scheme. Ian, with contribution from Robin Marx (UHasselt), also presented an explanation about his “Strict Priorities” proposal. I recommend taking a look at Robin’s priority tree visualisations which do a great job of explaining things. From that presentation I particularly liked “The prioritization spectrum”, it’s a concise snapshot of the state of things at that time:

Adopting a new approach to HTTP prioritization
An overview of HTTP/3 prioritization issues, fixes and possible alternatives. Presented by Ian Swett at the QUIC Interim Meeting May 2019.

June and July

Following the interim meeting, the prioritization “debate” continued electronically across GitHub and email. Some time in June Kazuho started work on a proposal that would use a scheme similar to Pat and Ian’s absolute priorities. The major difference was that rather than send the priority signal in an HTTP frame, it would use a header field. This isn’t a new concept; Roy Fielding proposed something similar at IETF 83.

In HTTP/2 and HTTP/3 requests are made up of frames that are sent on streams. Using a simple GET request as an example: a client sends a HEADERS frame that contains the scheme, method, path, and other request header fields. A server responds with a HEADERS frame that contains the status and response header fields, followed by DATA frame(s) that contain the payload.

To signal priority, a client could also send a PRIORITY frame. In the tree-based scheme the frame carries several fields that express dependencies and weights. Pat and Ian’s proposals changed the contents of the PRIORITY frame. Kazuho’s proposal encodes the priority as a header field that can be carried in the HEADERS frame as normal metadata, removing the need for the PRIORITY frame altogether.
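For illustration only (the exact parameter names and syntax varied across draft revisions of the proposal), a header-based priority signal on a request might look something like:

```
:method: GET
:path: /main.css
priority: u=1, i
```

Here a small `u` value indicates high urgency and `i` marks the response as suitable for incremental (interleaved) delivery; because the signal rides in the HEADERS frame, no separate PRIORITY frame is needed.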

I liked the simplification of Kazuho’s approach and the new opportunities it might create for application developers. HTTP/2 and HTTP/3 implementations (in particular browsers) abstract away a lot of connection-level details such as streams and frames. That makes it hard to understand what is happening, or to tune it.

The lingua franca of the Web is HTTP requests and responses, which are formed of header fields and payload data. In browsers, APIs such as Fetch and Service Worker allow handling of these primitives. In servers, there may be ways to interact with the primitives via configuration or programming languages. As part of Enhanced HTTP/2 Prioritization, we have exposed prioritization to Cloudflare Workers to allow rich behavioural customization. If a Worker adds the “cf-priority” header to a response, Cloudflare’s edge servers use the specified priority to serve the response. This might be used to boost the priority of a resource that is important to the load time of a page. To help inform this decision making, the incoming browser priority signal is encapsulated in the request object passed to a Worker’s fetch event listener (request.cf.requestPriority).

Standardising approaches to problems is part of helping to build a better Internet. Because of the resonance between Cloudflare’s work and Kazuho’s proposal, I asked if he would consider letting me come aboard as a co-author. He kindly accepted and on July 8th we published the first version as an Internet-Draft.

Meanwhile, Ian was helping to drive the overall prioritization discussion and proposed that we use time during IETF 105 in Montreal to speak to a wider group of people. We kicked off the week with a short presentation to the HTTP WG from Ian, and Kazuho and I presented our draft in a side-meeting that saw a healthy discussion. There was a realization that the concepts of prioritization scheme, priority signalling and server resource scheduling (enacting prioritization) were conflated and made effective communication and progress difficult. HTTP/2’s model was seen as one aspect, and two different I-Ds were created to deprecate it in some way (draft-lassey-priority-setting, draft-peon-httpbis-h2-priority-one-less). Martin Thomson (Mozilla) also created a Pull Request that simply removed the PRIORITY frame from HTTP/3.

To round off the week, in the second HTTP session it was decided that there was sufficient interest in resolving the prioritization debate via the creation of a design team. I joined the team led by Ian Swett along with others from Adobe, Akamai, Apple, Cloudflare, Fastly, Facebook, Google, Microsoft, and UHasselt.

August to October

Martin’s PR generated a lot of conversation. It was merged under the proviso that some solution be found before the HTTP/3 specification was finalized. Between May and August we went from something very complicated (e.g. orphan placeholders, with PRIORITY only on the control stream, plus exclusive priorities) to a blank canvas. The pressure was now on!

The design team held several teleconference meetings across the months. Logistics are a bit difficult when you have team members distributed across West Coast America, East Coast America, Western Europe, Central Europe, and Japan. However, thanks to some late nights and early mornings we managed to all get on the call at the same time.

In October most of us travelled to Cupertino, CA to attend another QUIC interim meeting hosted at Apple’s Infinite Loop (Eric Kinnear helping with arrangements). The first two days of the meeting were used for interop testing and were loosely structured, so the design team took the opportunity to hold the first face-to-face meeting. We made some progress and helped Ian to form up some new slides to present later in the week. Again, there was some useful discussion and signs that we should put some time in the agenda in IETF 106.

November

The design team came to agreement that draft-kazuho-httpbis-priority was a good basis for a new prioritization scheme. We decided to consolidate the various I-Ds that had sprung up during IETF 105 into the document, making it a single source that was easier for people to track progress and open issues if required. This is why, even though Kazuho and I are the named authors, the document reflects a broad input from the community. We published draft 03 in November, just ahead of the deadline for IETF 106 in Singapore.

Many of us travelled to Singapore ahead of the actual start of IETF 106. This wasn’t to squeeze in some sightseeing (sadly) but rather to attend the IETF Hackathon. These are events where engineers and researchers can really put the concept of “running code” to the test. I really enjoy attending and I’m grateful to Charles Eckel and the team that organised it. If you’d like to read more about the event, Charles wrote up a nice blog post that, through some strange coincidence, features a picture of me, Kazuho and Robin talking at the QUIC table.


The design team held another face-to-face during a Hackathon lunch break and decided that we wanted to make some tweaks to the design written up in draft 03. Unfortunately the freeze was still in effect so we could not issue a new draft. Instead, we presented the most recent thinking to the HTTP session on Monday where Ian put forward draft-kazuho-httpbis-priority as the group’s proposed design solution. Ian and Robin also shared results of prioritization experiments. We received some great feedback in the meeting and during the week pulled out all the stops to issue a new draft 04 before the next HTTP session on Thursday. The question now was: Did the WG think this was suitable to adopt as the basis of an alternative prioritization scheme? I think we addressed a lot of the feedback in this draft and there was a general feeling of support in the room. However, in the IETF consensus is declared via mailing lists and so Tommy Pauly, co-chair of the HTTP WG, put out a Call for Adoption on November 21st.

December

In the Cloudflare London office, preparations begin for mince pie acquisition and assessment.

The HTTP priorities team played the waiting game and watched the mailing list discussion. On the whole people supported the concept but there was one topic that divided opinion. Some people loved the use of headers to express priorities, some people didn’t and wanted to stick to frames.

On December 13th Tommy announced that the group had decided to adopt our document and assign Kazuho and me as editors. The header/frame divide was noted as something that needed to be resolved.

The next step of the journey

Just because the document has been adopted does not mean we are done. In some ways we are just getting started. Perfection is often the enemy of getting things done and so sometimes adoption occurs at the first incarnation of a “good enough” proposal.

Today HTTP/3 has no prioritization signal. Without priority information there is a small danger that servers pick a scheduling strategy that is not optimal, which could cause the web performance of HTTP/3 to be worse than HTTP/2. To avoid that happening we’ll refine and complete the design of the Extensible Priority Scheme. To do so there are open issues that we have to resolve; we’ll need to square the circle on headers vs. frames, and we’ll no doubt hit unknown unknowns. We’ll need the input of the WG to make progress and their help to document the design that fits the need, and so I look forward to continued collaboration across the Internet community.

2019 was quite a ride and I’m excited to see what 2020 brings.

If working on protocols is your interest and you like what Cloudflare is doing, please visit our careers page. Our journey isn’t finished; in fact, it is far from it.

How we used our new GraphQL Analytics API to build Firewall Analytics

Post Syndicated from Nick Downie original https://blog.cloudflare.com/how-we-used-our-new-graphql-api-to-build-firewall-analytics/

How we used our new GraphQL Analytics API to build Firewall Analytics

How we used our new GraphQL Analytics API to build Firewall Analytics

Firewall Analytics is the first product in the Cloudflare dashboard to utilize the new GraphQL Analytics API. All Cloudflare dashboard products are built using the same public APIs that we provide to our customers, allowing us to understand the challenges they face when interfacing with our APIs. This parity helps us build and shape our products, most recently the new GraphQL Analytics API that we’re thrilled to release today.

By defining the data we want, along with the response format, our GraphQL Analytics API has enabled us to prototype new functionality and iterate quickly from our beta user feedback. It is helping us deliver more insightful analytics tools within the Cloudflare dashboard to our customers.

Our user research and testing for Firewall Analytics surfaced common use cases in our customers’ workflow:

  • Identifying spikes in firewall activity over time
  • Understanding the common attributes of threats
  • Drilling down into granular details of an individual event to identify potential false positives

We can address all of these use cases using our new GraphQL Analytics API.

GraphQL Basics

Before we look into how to address each of these use cases, let’s take a look at the format of a GraphQL query and how our schema is structured.

A GraphQL query is composed of a structured set of fields, for which the server provides corresponding values in its response. The schema defines which fields are available and their type. You can find more information about the GraphQL query syntax and format in the official GraphQL documentation.

To run some GraphQL queries, we recommend downloading a GraphQL client, such as GraphiQL, to explore our schema and run some queries. You can find documentation on getting started with this in our developer docs.

At the top level of the schema is the viewer field. This represents the top level node of the user running the query. Within this, we can query the zones field to find zones the current user has access to, providing a filter argument with a zoneTag set to the identifier of the zone we’d like to narrow down to.

{
  viewer {
    zones(filter: { zoneTag: "YOUR_ZONE_ID" }) {
      # Here is where we'll query our firewall events
    }
  }
}

Now that we have a query that finds our zone, we can start querying the firewall events which have occurred in that zone, to help solve some of the use cases we’ve identified.

Visualising spikes in firewall activity

It’s important for customers to be able to visualise and understand anomalies and spikes in their firewall activity, as these could indicate an attack or be the result of a misconfiguration.

Plotting events in a timeseries chart, by their respective action, provides users with a visual overview of the trend of their firewall events.

Within the zones field in the query we created earlier, we can further query our firewall event aggregates using the firewallEventsAdaptiveGroups field and providing arguments for:

  • A limit for the count of groups
  • A filter for the date range we’re looking for (combined with any user-entered filters)
  • A list of fields to orderBy (in this case, just the datetimeHour field that we’re grouping by).

By adding the dimensions field, we’re querying for groups of firewall events, aggregated by the fields nested within dimensions. In this case, our query includes the action and datetimeHour fields, meaning the response will be groups of firewall events which share the same action, and fall within the same hour. We also add a count field, to get a numeric count of how many events fall within each group.

query FirewallEventsByTime($zoneTag: string, $filter: FirewallEventsAdaptiveGroupsFilter_InputObject) {
  viewer {
    zones(filter: { zoneTag: $zoneTag }) {
      firewallEventsAdaptiveGroups(
        limit: 576
        filter: $filter
        orderBy: [datetimeHour_DESC]
      ) {
        count
        dimensions {
          action
          datetimeHour
        }
      }
    }
  }
}

Note – Each of our groups queries requires a limit to be set. A firewall event can have one of 8 possible actions, and we are querying over a 72-hour period. At most, we’ll end up with 576 groups (8 × 72), so we can set that as the limit for our query.

This query would return a response in the following format:

{
  "viewer": {
    "zones": [
      {
        "firewallEventsAdaptiveGroups": [
          {
            "count": 5,
            "dimensions": {
              "action": "jschallenge",
              "datetimeHour": "2019-09-12T18:00:00Z"
            }
          }
          ...
        ]
      }
    ]
  }
}

We can then take these groups and plot each as a point on a time series chart. Mapping over the firewallEventsAdaptiveGroups array, we can use the group’s count property on the y-axis for our chart, then use the nested fields within the dimensions object: action as the unique series and datetimeHour as the timestamp on the x-axis.
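The dashboard’s actual code isn’t shown here, but as a rough sketch (in Python rather than the dashboard’s own language, with a second sample group added for illustration), that mapping might look like:

```python
from collections import defaultdict

# Response body shaped like the example above (second group is illustrative).
response = {
    "viewer": {
        "zones": [
            {
                "firewallEventsAdaptiveGroups": [
                    {"count": 5,
                     "dimensions": {"action": "jschallenge",
                                    "datetimeHour": "2019-09-12T18:00:00Z"}},
                    {"count": 11,
                     "dimensions": {"action": "block",
                                    "datetimeHour": "2019-09-12T18:00:00Z"}},
                ]
            }
        ]
    }
}

def to_series(body):
    """Reshape groups into per-action series of (timestamp, count) points."""
    series = defaultdict(list)
    groups = body["viewer"]["zones"][0]["firewallEventsAdaptiveGroups"]
    for group in groups:
        dims = group["dimensions"]
        series[dims["action"]].append((dims["datetimeHour"], group["count"]))
    return dict(series)

print(to_series(response))
```

Each key of the result becomes one series on the chart, with the tuples supplying the x (time) and y (count) values.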

How we used our new GraphQL Analytics API to build Firewall Analytics

Top Ns

After identifying a spike in activity, our next step is to highlight events with commonality in their attributes. For example, if a certain IP address or individual user agent is causing many firewall events, this could be a sign of an individual attacker, or could be surfacing a false positive.

Similarly to before, we can query aggregate groups of firewall events using the firewallEventsAdaptiveGroups field. However, in this case, instead of supplying action and datetimeHour to the group’s dimensions, we can add individual fields that we want to find common groups of.

By ordering by descending count, we’ll retrieve groups with the highest commonality first, limiting to the top 5 of each. We can add a single field nested within dimensions to group by it. For example, adding clientIP will give five groups with the IP addresses causing the most events.

We can also add a firewallEventsAdaptiveGroups field with no nested dimensions. This will create a single group which allows us to find the total count of events matching our filter.

query FirewallEventsTopNs($zoneTag: string, $filter: FirewallEventsAdaptiveGroupsFilter_InputObject) {
  viewer {
    zones(filter: { zoneTag: $zoneTag }) {
      topIPs: firewallEventsAdaptiveGroups(
        limit: 5
        filter: $filter
        orderBy: [count_DESC]
      ) {
        count
        dimensions {
          clientIP
        }
      }
      topUserAgents: firewallEventsAdaptiveGroups(
        limit: 5
        filter: $filter
        orderBy: [count_DESC]
      ) {
        count
        dimensions {
          userAgent
        }
      }
      total: firewallEventsAdaptiveGroups(
        limit: 1
        filter: $filter
      ) {
        count
      }
    }
  }
}

Note – we can add the firewallEventsAdaptiveGroups field multiple times within a single query, each aliased differently. This allows us to fetch multiple different groupings by different fields, or with no groupings at all. In this case, getting a list of top IP addresses, top user agents, and the total events.

How we used our new GraphQL Analytics API to build Firewall Analytics

We can then reference each of these aliases in the UI, mapping over their respective groups to render each row with its count and a bar representing that row’s proportion of total events.
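As a sketch of that proportion calculation (the counts and IP addresses below are hypothetical sample data, not API output):

```python
# Hypothetical groups from the "topIPs" alias, plus the count from the
# un-dimensioned "total" group.
top_ips = [
    {"count": 120, "dimensions": {"clientIP": "203.0.113.7"}},
    {"count": 45,  "dimensions": {"clientIP": "198.51.100.2"}},
]
total = 300  # count from the "total" alias

def rows_with_share(groups, total_count):
    """Attach each group's proportion of all events, for rendering its bar."""
    return [
        {
            "value": g["dimensions"]["clientIP"],
            "count": g["count"],
            "share": g["count"] / total_count,
        }
        for g in groups
    ]

for row in rows_with_share(top_ips, total):
    print(f'{row["value"]}: {row["count"]} ({row["share"]:.0%})')
```

The same helper works for the user-agent rows by swapping which dimension field it reads.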

Are these firewall events false positives?

After users have identified spikes, anomalies and common attributes, we wanted to surface more information as to whether these have been caused by malicious traffic, or are false positives.

To do this, we wanted to provide additional context on the events themselves, rather than just counts. We can do this by querying the firewallEventsAdaptive field for these events.

Our GraphQL schema uses the same filter format for both the aggregate firewallEventsAdaptiveGroups field and the raw firewallEventsAdaptive field. This allows us to use the same filters to fetch the individual events which summate to the counts and aggregates in the visualisations above.

query FirewallEventsList($zoneTag: string, $filter: FirewallEventsAdaptiveFilter_InputObject) {
  viewer {
    zones(filter: { zoneTag: $zoneTag }) {
      firewallEventsAdaptive(
        filter: $filter
        limit: 10
        orderBy: [datetime_DESC]
      ) {
        action
        clientAsn
        clientCountryName
        clientIP
        clientRequestPath
        clientRequestQuery
        datetime
        rayName
        source
        userAgent
      }
    }
  }
}

How we used our new GraphQL Analytics API to build Firewall Analytics

Once we have our individual events, we can render all of the individual fields we’ve requested, providing users with the additional context they need to determine whether an event is a false positive or not.

That’s how we used our new GraphQL Analytics API to build Firewall Analytics, helping solve some of our customers’ most common security workflow use cases. We’re excited to see what you build with it, and the problems you can help tackle.

You can find out how to get started querying our GraphQL Analytics API using GraphiQL in our developer documentation, or learn more about writing GraphQL queries on the official GraphQL Foundation documentation.

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Post Syndicated from Filipp Nisenzoun original https://blog.cloudflare.com/introducing-the-graphql-analytics-api-exactly-the-data-you-need-all-in-one-place/

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Today we’re excited to announce a powerful and flexible new way to explore your Cloudflare metrics and logs, with an API conforming to the industry-standard GraphQL specification. With our new GraphQL Analytics API, all of your performance, security, and reliability data is available from one endpoint, and you can select exactly what you need, whether it’s one metric for one domain or multiple metrics aggregated for all of your domains. You can ask questions like “How many cached bytes have been returned for these three domains?” Or, “How many requests have all the domains under my account received?” Or even, “What effect did changing my firewall rule an hour ago have on the responses my users were seeing?”

The GraphQL standard also has strong community resources, from extensive documentation to front-end clients, making it easy to start creating simple queries and progress to building your own sophisticated analytics dashboards.

From many APIs…

Providing insights has always been a core part of Cloudflare’s offering. After all, by using Cloudflare, you’re relying on us for key parts of your infrastructure, and so we need to make sure you have the data to manage, monitor, and troubleshoot your website, app, or service. Over time, we developed a few key data APIs, including ones providing information regarding your domain’s traffic, DNS queries, and firewall events. This multi-API approach was acceptable while we had only a few products, but we started to run into some challenges as we added more products and analytics. We couldn’t expect users to adopt a new analytics API every time they started using a new product. In fact, some of the customers and partners that were relying on many of our products were already becoming confused by the various APIs.

Following the multi-API approach was also affecting how quickly we could develop new analytics within the Cloudflare dashboard, which is used by more people for data exploration than our APIs. Each time we built a new product, our product engineering teams had to implement a corresponding analytics API, which our user interface engineering team then had to learn to use. This process could take up to several months for each new set of analytics dashboards.

…to one

Our new GraphQL Analytics API solves these problems by providing access to all Cloudflare analytics. It offers a standard, flexible syntax for describing exactly the data you need and provides predictable, matching responses. This approach makes it an ideal tool for:

  1. Data exploration. You can think of it as a way to query your own virtual data warehouse, full of metrics and logs regarding the performance, security, and reliability of your Internet property.
  2. Building amazing dashboards, which allow for flexible filtering, sorting, and drilling down or rolling up. Creating these kinds of dashboards would normally require paying thousands of dollars for a specialized analytics tool. You get them as part of our product and can customize them for yourself using the API.

In a companion post that was also published today, my colleague Nick discusses using the GraphQL Analytics API to build dashboards. So, in this post, I’ll focus on examples of how you can use the API to explore your data. To make the queries, I’ll be using GraphiQL, a popular open-source querying tool that takes advantage of GraphQL’s capabilities.

Introspection: what data is available?

The first thing you may be wondering: if the GraphQL Analytics API offers access to so much data, how do I figure out what exactly is available, and how can I ask for it? GraphQL makes this easy by offering “introspection,” meaning you can query the API itself to see the available data sets, the fields and their types, and the operations you can perform. GraphiQL uses this functionality to provide a “Documentation Explorer,” query auto-completion, and syntax validation. For example, here is how I can see all the data sets available for a zone (domain):

Introducing the GraphQL Analytics API: exactly the data you need, all in one place
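Under the hood, explorers like this are driven by introspection queries such as the following (standard GraphQL, not specific to Cloudflare’s API), which lists every type in the schema:

```graphql
{
  __schema {
    types {
      name
      kind
    }
  }
}
```

You can run this yourself in GraphiQL against any GraphQL endpoint to get a raw view of the same schema information.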

If I’m writing a query, and I’m interested in data on firewall events, auto-complete will help me quickly find relevant data sets and fields:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Querying: examples of questions you can ask

Let’s say you’ve made a major product announcement and expect a surge in requests to your blog, your application, and several other zones (domains) under your account. You can check if this surge materializes by asking for the requests aggregated under your account, in the 30 minutes after your announcement post, broken down by the minute:

{
  viewer {
    accounts(filter: {accountTag: $accountTag}) {
      httpRequests1mGroups(
        limit: 30,
        filter: {datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T20:30:00Z"},
        orderBy: [datetimeMinute_ASC]
      ) {
        dimensions {
          datetimeMinute
        }
        sum {
          requests
        }
      }
    }
  }
}

Here is the first part of the response, showing requests for your account, by the minute:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Now, let’s say you want to compare the traffic coming to your blog versus your marketing site over the last hour. You can do this in one query, asking for the number of requests to each zone:

{
  viewer {
    zones(filter: {zoneTag_in: [$zoneTag1, $zoneTag2]}) {
      httpRequests1hGroups(
        limit: 2,
        filter: {datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T21:00:00Z"}
      ) {
        sum {
          requests
        }
      }
    }
  }
}

Here is the response:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Finally, let’s say you’re seeing an increase in error responses. Could this be correlated to an attack? You can look at error codes and firewall events over the last 15 minutes, for example:

{
  viewer {
    zones(filter: {zoneTag: $zoneTag}) {
      httpRequests1mGroups(
        limit: 100,
        filter: {datetime_geq: "2019-09-16T21:00:00Z", datetime_lt: "2019-09-16T21:15:00Z"}
      ) {
        sum {
          responseStatusMap {
            edgeResponseStatus
            requests
          }
        }
      }
      firewallEventsAdaptiveGroups(
        limit: 100,
        filter: {datetime_geq: "2019-09-16T21:00:00Z", datetime_lt: "2019-09-16T21:15:00Z"}
      ) {
        dimensions {
          action
        }
        count
      }
    }
  }
}

Notice that, in this query, we’re looking at multiple datasets at once, using a common zone identifier to “join” them. Here are the results:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

By examining both data sets in parallel, we can see a correlation: 31 requests were “dropped” or blocked by the Firewall, which is exactly the same as the number of “403” responses. So, the 403 responses were a result of Firewall actions.
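That cross-check can be sketched in Python over sample data shaped like the query’s response (field types and surrounding counts here are assumed for illustration; the 31/31 match mirrors the example above):

```python
# Sample zone data shaped like the combined query's response.
zone = {
    "httpRequests1mGroups": [
        {"sum": {"responseStatusMap": [
            {"edgeResponseStatus": 200, "requests": 2840},
            {"edgeResponseStatus": 403, "requests": 31},
        ]}}
    ],
    "firewallEventsAdaptiveGroups": [
        {"dimensions": {"action": "drop"}, "count": 31},
    ],
}

def requests_with_status(zone_data, status):
    """Total requests that received a given edge response status."""
    return sum(
        entry["requests"]
        for group in zone_data["httpRequests1mGroups"]
        for entry in group["sum"]["responseStatusMap"]
        if entry["edgeResponseStatus"] == status
    )

def firewall_count(zone_data, action):
    """Total firewall events with a given action."""
    return sum(
        g["count"]
        for g in zone_data["firewallEventsAdaptiveGroups"]
        if g["dimensions"]["action"] == action
    )

# The 403 responses line up exactly with the firewall drops.
print(requests_with_status(zone, 403) == firewall_count(zone, "drop"))
```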

Try it today

To learn more about the GraphQL Analytics API and start exploring your Cloudflare data, follow the “Getting started” guide in our developer documentation, which also has details regarding the current data sets and time periods available. We’ll be adding more data sets over time, so take advantage of the introspection feature to see the latest available.

Finally, to make way for the new API, the Zone Analytics API is now deprecated and will be sunset on May 31, 2020. The data that Zone Analytics provides is available from the GraphQL Analytics API. If you’re currently using the API directly, please follow our migration guide to change your API calls. If you get your analytics using the Cloudflare dashboard or our Datadog integration, you don’t need to take any action.

One more thing….

In the API examples above, if you find it helpful to get analytics aggregated for all the domains under your account, we have something else you may like: a brand new Analytics dashboard (in beta) that provides this same information. If your account has many zones, the dashboard is helpful for knowing summary information on metrics such as requests, bandwidth, cache rate, and error rate. Give it a try and let us know what you think using the feedback link above the new dashboard.

New tools to monitor your server and avoid downtime

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/new-tools-to-monitor-your-server-and-avoid-downtime/

New tools to monitor your server and avoid downtime

New tools to monitor your server and avoid downtime

When your server goes down, it’s a big problem. Today, Cloudflare is introducing two new tools to help you understand and respond faster to origin downtime — plus, a new service to automatically avoid downtime.

The new features are:

  • Standalone Health Checks, which notify you as soon as we detect problems at your origin server, without needing a Cloudflare Load Balancer.
  • Passive Origin Monitoring, which lets you know when your origin cannot be reached, with no configuration required.
  • Zero-Downtime Failover, which can automatically avert failures by retrying requests to origin.

Standalone Health Checks

Our first new tool is Standalone Health Checks, which will notify you as soon as we detect problems at your origin server — without needing a Cloudflare Load Balancer.

A Health Check is a service that runs on our edge network to monitor whether your origin server is online. Health Checks are a key part of our load balancing service because they allow us to quickly and actively route traffic to origin servers that are live and ready to serve requests. Standalone Health Checks allow you to monitor the health of your origin even if you only have one origin or do not yet need to balance traffic across your infrastructure.

We’ve provided many dimensions for you to home in on exactly what you’d like to check, including response code, protocol type, and interval. You can specify a particular path if your origin serves multiple applications, or you can check a larger subset of response codes for your staging environment. All of these options allow you to properly target your Health Check, giving you a precise picture of what is wrong with your origin.

New tools to monitor your server and avoid downtime

If one of your origin servers becomes unavailable, you will receive a notification letting you know of the health change, along with detailed information about the failure so you can take action to restore your origin’s health.  

Lastly, once you’ve set up Health Checks across your different origin servers, you may want to see trends or identify the top unhealthy origins. With Health Check Analytics, you’ll be able to view all the change events for a given Health Check, isolate origins that may be top offenders or underperforming, and move forward with a fix. On top of this, in the near future we are working to give you access to all raw Health Check events, ensuring you have the detail needed to compare Cloudflare Health Check event logs against your internal server logs.

New tools to monitor your server and avoid downtime

Users on the Pro, Business, or Enterprise plan will have access to Standalone Health Checks and Health Check Analytics to promote top-tier application reliability and help maximize brand trust with their customers. You can access Standalone Health Checks and Health Check Analytics through the Traffic app in the dashboard.

Passive Origin Monitoring

Standalone Health Checks are a super flexible way to understand what’s happening at your origin server. However, they require some forethought to configure before an outage happens. That’s why we’re excited to introduce Passive Origin Monitoring, which will automatically notify you when a problem occurs — no configuration required.

Cloudflare knows when your origin is down, because we’re the ones trying to reach it to serve traffic! When we detect downtime lasting longer than a few minutes, we’ll send you an email.

Starting today, you can configure origin monitoring alerts to go to multiple email addresses. Origin Monitoring alerts are available in the new Notification Center (more on that below!) in the Cloudflare dashboard:

New tools to monitor your server and avoid downtime

Passive Origin Monitoring is available to customers on all Cloudflare plans.

Zero-Downtime Failover

What’s better than getting notified about downtime? Never having downtime in the first place! With Zero-Downtime Failover, we can automatically retry requests to origin, even before Load Balancing kicks in.

How does it work? If a request to your origin fails, and Cloudflare has another record for your origin server, we’ll just try another origin within the same HTTP request. The alternate record could be either an A/AAAA record configured via Cloudflare DNS, or another origin server in the same Load Balancing pool.

Consider a website, example.com, that has web servers at two different IP addresses: 203.0.113.1 and 203.0.113.2. Before Zero-Downtime Failover, if 203.0.113.1 became unavailable, Cloudflare would attempt to connect, fail, and ultimately serve an error page to the user. With Zero-Downtime Failover, if 203.0.113.1 cannot be reached, then Cloudflare’s proxy will seamlessly attempt to connect to 203.0.113.2. If the second server can respond, then Cloudflare can avert serving an error to example.com’s user.
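The retry logic above can be sketched in a few lines. This is an illustration only, not Cloudflare’s actual proxy code; connection attempts are modeled as plain functions so the sketch stands alone:

```rust
// Illustrative sketch: model each origin as a "connect" attempt and
// fall through to the next origin when one fails. Cloudflare's real
// proxy does this at the TCP/TLS connection phase, within one HTTP request.
fn connect_with_failover<T>(
    origins: &[fn() -> Result<T, String>],
) -> Result<T, String> {
    let mut last_err = String::from("no origins configured");
    for connect in origins {
        match connect() {
            Ok(conn) => return Ok(conn), // first reachable origin wins
            Err(e) => last_err = e,      // remember the error, try the next
        }
    }
    Err(last_err) // all origins failed: an error page would be served
}
```

With origins like 203.0.113.1 and 203.0.113.2, a failed attempt against the first simply falls through to the second, and the user never sees an error page.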

Since we rolled out Zero-Downtime Failover a few weeks ago, we’ve prevented tens of millions of requests per day from failing!

Zero-Downtime Failover works in conjunction with Load Balancing, Standalone Health Checks, and Passive Origin Monitoring to keep your website running without a hitch. Health Checks and Load Balancing can avert failure, but take time to kick in. Zero-Downtime Failover works instantly, but adds latency on each connection attempt. In practice, Zero-Downtime Failover is helpful at the start of an event, when it can instantly recover from errors; once a Health Check has detected a problem, a Load Balancer can then kick in and properly re-route traffic. And if no origin is available, we’ll send an alert via Passive Origin Monitoring.

To see an example of this in practice, consider a recent incident at one of our customers. They saw a spike in errors at their origin that would ordinarily cause availability to plummet (red line), but thanks to Zero-Downtime Failover, their actual availability stayed flat (blue line).

New tools to monitor your server and avoid downtime

During a 30 minute time period, Zero-Downtime Failover improved overall availability from 99.53% to 99.98%, and prevented 140,000 HTTP requests from resulting in an error.

It’s important to note that we only attempt to retry requests that have failed during the TCP or TLS connection phase, which ensures that HTTP headers and payload have not been transmitted yet. Thanks to this safety mechanism, we’re able to make Zero-Downtime Failover Cloudflare’s default behavior for Pro, Business, and Enterprise plans. In other words, Zero-Downtime Failover makes connections to your origins more reliable with no configuration or action required.

Coming soon: more notifications, more flexibility

Our customers are always asking us for more insights into the health of their critical edge infrastructure. Health Checks and Passive Origin Monitoring are a significant step towards Cloudflare taking a proactive instead of reactive approach to insights.

To support this work, today we’re announcing the Notification Center as the central place to manage notifications. This is available in the dashboard today, accessible from your Account Home.

From here, you can create new notifications, as well as view any existing notifications you’ve already set up. Today’s release allows you to configure Passive Origin Monitoring notifications, and set multiple email recipients.

New tools to monitor your server and avoid downtime

We’re excited about today’s launches and how they’ll help our customers avoid downtime. Based on your feedback, we have lots of improvements planned to help you get the timely insights you need:

  • New notification delivery mechanisms
  • More events that can trigger notifications
  • Advanced configuration options for Health Checks, including additional protocols, threshold-based notifications, and threshold-based status changes
  • More ways to configure Passive Origin Monitoring, like the ability to add thresholds and filter to specific status codes

Introducing Load Balancing Analytics

Post Syndicated from Brian Batraski original https://blog.cloudflare.com/introducing-load-balancing-analytics/

Introducing Load Balancing Analytics

Introducing Load Balancing Analytics

Cloudflare aspires to make Internet properties everywhere faster, more secure, and more reliable. Load Balancing helps with speed and reliability and has been evolving over the past three years.

Let’s go through a scenario that highlights a bit more of what a Load Balancer is and the value it can provide. A standard load balancer comprises a set of pools, each of which has origin servers that are hostnames and/or IP addresses. A routing policy is assigned to each load balancer, which determines the origin pool selection process.

Let’s say you build an API that is using cloud provider ACME Web Services. Unfortunately, ACME had a rough week, and their service had a regional outage in their Eastern US region. Consequently, your website was unable to serve traffic during this period, which resulted in reduced brand trust from users and missed revenue. To prevent this from happening again, you decide to take two steps: use a secondary cloud provider (in order to avoid having ACME as a single point of failure) and use Cloudflare’s Load Balancing to take advantage of the multi-cloud architecture.

Cloudflare’s Load Balancing can help you maximize your API’s availability for your new architecture. For example, you can assign health checks to each of your origin pools. These health checks can monitor your origin servers’ health by checking HTTP status codes, response bodies, and more. If an origin pool’s response doesn’t match what is expected, then traffic will stop being steered there. This will reduce downtime for your API when ACME has a regional outage, because traffic in that region will seamlessly be rerouted to your fallback origin pool(s). In this scenario, you can set the fallback pool to be origin servers in your secondary cloud provider.

In addition to health checks, you can use the ‘random’ routing policy in order to distribute your customers’ API requests evenly across your backend. If you want to optimize your response time instead, you can use ‘dynamic steering’, which will send traffic to the origin determined to be closest to your customer.
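The failover behaviour in this scenario can be sketched as follows. The pool names and fields are invented for illustration; real pools and health checks are configured through the dashboard or API:

```rust
// Hypothetical model: a pool is healthy when its health checks pass.
#[derive(Debug)]
struct Pool {
    name: &'static str,
    healthy: bool, // as reported by the pool's health checks
}

// Failover-style steering: steer traffic to the first pool in priority
// order whose health checks are passing; None means a total outage.
fn select_pool<'a>(pools: &'a [Pool]) -> Option<&'a Pool> {
    pools.iter().find(|p| p.healthy)
}
```

If the primary provider’s pool stops passing its health checks, traffic is simply steered to the next healthy pool in the list, which here would be the secondary cloud provider.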

Our customers love Cloudflare Load Balancing, and we’re always looking to improve and make our customers’ lives easier. Since Cloudflare’s Load Balancing was first released, the most popular customer request was for an analytics service that would provide insights on traffic steering decisions.

Today, we are rolling out Load Balancing Analytics in the Traffic tab of the Cloudflare dashboard. The three major components in the analytics service are:

  • An overview of traffic flow that can be filtered by load balancer, pool, origin, and region.
  • A latency map that indicates origin health status and latency metrics from Cloudflare’s global network spanning 194 cities and growing!
  • Event logs denoting changes in origin health. This feature was released in 2018 and tracks pool and origin transitions between healthy and unhealthy states. We’ve moved these logs under the new Load Balancing Analytics subtab. See the documentation to learn more.

In this blog post, we’ll discuss the traffic flow distribution and the latency map.

Traffic Flow Overview

Our users want a detailed view into where their traffic is going, why it is going there, and insights into what changes may optimize their infrastructure. With Load Balancing Analytics, users can graphically view traffic demands on load balancers, pools, and origins over variable time ranges.

Understanding how traffic flow is distributed informs the process of creating new origin pools, adapting to peak traffic demands, and observing failover response during origin pool failures.

Introducing Load Balancing Analytics
Figure 1

In Figure 1, we can see an overview of traffic for a given domain. On Tuesday, the 24th, the red pool was created and added to the load balancer. In the following 36 hours, as the red pool handled more traffic, the blue and green pools both saw a reduced workload. In this scenario, the traffic distribution graph provided the customer with new insights. First, it demonstrated that traffic was being steered to the new red pool. It also allowed the customer to understand the new level of traffic distribution across their network. Finally, it allowed the customer to confirm whether traffic decreased in the expected pools. Over time, these graphs can be used to better manage capacity and plan for upcoming infrastructure needs.

Latency Map

The traffic distribution overview is only one part of the puzzle. Another essential component is understanding request performance around the world. This is useful because customers can ensure user requests are handled as fast as possible, regardless of where in the world the request originates.

The standard Load Balancing configuration contains monitors that probe the health of customer origins. These monitors can be configured to run from one or more particular regions or, for Enterprise customers, from all Cloudflare locations. They collect useful information, such as round-trip time, that can be aggregated to create the latency map.

The map provides a summary of how responsive origins are from around the world, so customers can see regions where requests are underperforming and may need further investigation. A common metric used to identify performance is request latency. We found that the p90 latency for all Load Balancing origins being monitored is 300 milliseconds, which means that 90% of all monitors’ health checks had a round trip time faster than 300 milliseconds. We used this value to identify locations where latency was slower than the p90 latency seen by other Load Balancing customers.
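As a rough sketch of the p90 computation described above (using the nearest-rank method; the exact percentile definition used by Cloudflare is not specified in this post):

```rust
// Nearest-rank p90: sort the round-trip-time samples and take the value
// at rank ceil(0.9 * n), so 90% of samples are at or below it.
fn p90_ms(samples: &mut [u32]) -> u32 {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_unstable();
    let rank = ((samples.len() as f64) * 0.9).ceil() as usize;
    samples[rank - 1]
}
```

Applied to the health-check round-trip times across all monitored origins, this is the 300 millisecond figure the text refers to.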

Introducing Load Balancing Analytics
Figure 2

In Figure 2, we can see the responsiveness of the Northeast Asia pool. The Northeast Asia pool is slow specifically for monitors in South America, the Middle East, and Southern Africa, but fast for monitors that are probing closer to the origin pool. Unfortunately, this means users of the pool in countries like Paraguay are seeing high request latency. High page load times have many unfortunate consequences: a higher visitor bounce rate, decreased visitor satisfaction, and a lower search engine ranking. In order to avoid these repercussions, a site administrator could consider adding a new origin pool in a region closer to the underserved regions. In Figure 3, we can see the result of adding a new origin pool in Eastern North America. The number of locations where the domain was found to be unhealthy drops to zero, and the number of slow locations is cut by more than 50%.

Introducing Load Balancing Analytics
Figure 3

Tied with the traffic flow metrics from the Overview page, the latency map arms users with insights to optimize their internal systems, reduce their costs, and increase their application availability.

GraphQL Analytics API

Behind the scenes, Load Balancing Analytics is powered by the GraphQL Analytics API. As you’ll learn later this week, GraphQL provides many benefits to us at Cloudflare. Customers now only need to learn a single API format that will allow them to extract only the data they require. For internal development, GraphQL eliminates the need for customized analytics APIs for each service, reduces query cost by increasing cache hits, and reduces developer fatigue by using a straightforward query language with standardized input and output formats. Very soon, all Load Balancing customers on paid plans will be given the opportunity to extract insights from the GraphQL API.  Let’s walk through some examples of how you can utilize the GraphQL API to understand your Load Balancing logs.

Suppose you want to understand the number of requests the pools for a load balancer are seeing from the different locations in Cloudflare’s global network. The query in Figure 4 counts the number of unique (location, pool ID) combinations every fifteen minutes over the course of a week.

Introducing Load Balancing Analytics
Figure 4

For context, our example load balancer, lb.example.com, utilizes dynamic steering. Dynamic steering directs requests to the most responsive available origin pool, which is often the closest. It does so using a weighted round-trip time measurement. Let’s try to understand why all traffic from Singapore (SIN) is being steered to our pool in Northeast Asia (asia-ne). We can run the query in Figure 5. This query shows us that the asia-ne pool has an avgRttMs value of 67ms, whereas the other two pools have avgRttMs values that exceed 150ms. The lower avgRttMs value explains why traffic in Singapore is being routed to the asia-ne pool.

Introducing Load Balancing Analytics
Figure 5
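Because the queries in Figures 4 and 5 are shown as images, here is a rough textual sketch of a Figure 5-style query against the loadBalancingRequests dataset. Apart from avgRttMs and the dataset name, the field and filter names below are illustrative, so use the API’s introspection support to confirm the exact schema:

```graphql
{
  viewer {
    zones(filter: { zoneTag: "<your-zone-id>" }) {
      loadBalancingRequests(
        limit: 10
        filter: { coloCode: "SIN", lbName: "lb.example.com" }
      ) {
        pools {
          poolName
          avgRttMs
        }
      }
    }
  }
}
```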

Notice how the query in Figure 4 uses the loadBalancingRequestsGroups schema, whereas the query in Figure 5 uses the loadBalancingRequests schema. loadBalancingRequestsGroups queries aggregate data over the requested query interval, whereas loadBalancingRequests provides granular information on individual requests. For those ready to get started, Cloudflare has written a helpful guide. The GraphQL website is also a great resource. We recommend you use an IDE like GraphiQL to make your queries. GraphiQL embeds the schema documentation into the IDE, autocompletes, saves your queries, and manages your custom headers, all of which help make the developer experience smoother.

Conclusion

Now that the Load Balancing Analytics solution is live and available to all Pro, Business, and Enterprise customers, we’re excited for you to start using it! We’ve attached a survey to the Traffic overview page, and we’d love to hear your feedback.

Firewall Analytics: Now available to all paid plans

Post Syndicated from Alex Cruz Farmer original https://blog.cloudflare.com/updates-to-firewall-analytics/

Firewall Analytics: Now available to all paid plans

Firewall Analytics: Now available to all paid plans

Our Firewall Analytics tool enables customers to quickly identify and investigate security threats using an intuitive interface. Until now, this tool had only been available to our Enterprise customers, who have been using it to get detailed insights into their traffic and better tailor their security configurations. Today, we are excited to make Firewall Analytics available to all paid plans and share details on several recent improvements we have made.

All paid plans are now able to take advantage of these capabilities, along with several important enhancements we’ve made to improve our customers’ workflow and productivity.

Firewall Analytics: Now available to all paid plans

Increased Data Retention and Adaptive Sampling

Previously, Enterprise customers could view 14 days of Firewall Analytics for their domains. Today we’re increasing that retention to 30 days, and again to 90 days in the coming months. Business and Professional plan zones will get 30 and 3 days of retention, respectively.

In addition to the extended retention, we are introducing adaptive sampling to guarantee that Firewall Analytics results are displayed in the Cloudflare Dashboard quickly and reliably, even when you are under a massive attack or otherwise receiving a large volume of requests.

Adaptive sampling works much like Netflix streaming: when your internet connection runs low on bandwidth, you receive a slightly downscaled version of the video stream you are watching. When your bandwidth recovers, Netflix then upscales back to the highest quality available.

Firewall Analytics does this sampling on each query, ensuring that customers see the best precision available in the UI given current load on the zone. When results are sampled, the sampling rate will be displayed as shown below:

Firewall Analytics: Now available to all paid plans
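When reading sampled results, the underlying totals can be estimated by scaling by the sampling rate. A minimal sketch (our own illustration, not the dashboard’s internal code):

```rust
// If events were sampled at rate `r` (e.g. 10% => r = 0.1), an unbiased
// estimate of the true event count is sampled_count / r.
fn estimated_total(sampled_count: u64, sample_rate: f64) -> u64 {
    assert!(sample_rate > 0.0 && sample_rate <= 1.0);
    (sampled_count as f64 / sample_rate).round() as u64
}
```

So 500 events observed at a 10% sampling rate correspond to roughly 5,000 actual events.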

Event-Based Logging

As adoption of our expressive Firewall Rules engine has grown, one consistent ask we’ve heard from customers is for a more streamlined way to see all Firewall Events generated by a specific rule. Until today, if a malicious request matched multiple rules, only the last one to execute was shown in the Activity Log, requiring customers to click into the request to see if the rule they’re investigating was listed as an “Additional match”.

To streamline this process, we’ve changed how the Firewall Analytics UI interacts with the Activity Log. Customers can now filter by a specific rule (or any other criteria) and see a row for each event generated by that rule. This change also makes it easier to review all requests that would have been blocked by a rule by creating it in Log mode first before changing it to Block.

Firewall Analytics: Now available to all paid plans

Challenge Solve Rates to help reduce False Positives

When our customers write rules to block undesired, automated traffic they want to make sure they’re not blocking or challenging desired traffic, e.g., humans wanting to make a purchase should be allowed but not bots scraping pricing.

To help customers determine what percentage of CAPTCHA challenges returned to users may have been unnecessary, i.e., false positives, we are now showing the Challenge Solve Rate (CSR) for each rule. If you’re seeing rates higher than expected, e.g., for your Bot Management rules, you may want to relax the rule criteria. If the rate you see is 0%, indicating that no CAPTCHAs are being solved, you may want to change the rule to Block outright rather than challenge.

Firewall Analytics: Now available to all paid plans

Hovering over the CSR rate will reveal the number of CAPTCHAs issued vs. solved:

Firewall Analytics: Now available to all paid plans
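For clarity, the CSR itself is just the ratio of solved to issued challenges, as in this illustrative sketch (not part of the Cloudflare product):

```rust
// Challenge Solve Rate, as a percentage: CAPTCHAs solved / CAPTCHAs issued.
// High values hint at false positives (real users being challenged);
// 0% suggests only bots are challenged and the rule could Block outright.
fn challenge_solve_rate(issued: u64, solved: u64) -> f64 {
    assert!(solved <= issued, "cannot solve more CAPTCHAs than were issued");
    if issued == 0 {
        return 0.0;
    }
    100.0 * solved as f64 / issued as f64
}
```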

Exporting Firewall Events

Business and Enterprise customers can now export a set of 500 events from the Activity Log. The data exported are those events that remain after any selected filters have been applied.

Firewall Analytics: Now available to all paid plans

Column Customization

Sometimes the columns shown in the Activity Log do not contain the details you want to see to analyze the threat. When this happens, you can now click “Edit Columns” to select the fields you want to see. For example, a customer diagnosing a Bot related issue may want to also view the User-Agent and the source country whereas a customer investigating a DDoS attack may want to see IP addresses, ASNs, Path, and other attributes. You can now customize what you’d like to see as shown below.

Firewall Analytics: Now available to all paid plans

We would love to hear your feedback and suggestions, so feel free to reach out to us via our Community forums or through your Customer Success team.

If you’d like to receive more updates like this one directly to your inbox, please subscribe to our Blog!

Announcing deeper insights and new monitoring capabilities from Cloudflare Analytics

Post Syndicated from Filipp Nisenzoun original https://blog.cloudflare.com/announcing-deeper-insights-and-new-monitoring-capabilities/

Announcing deeper insights and new monitoring capabilities from Cloudflare Analytics

Announcing deeper insights and new monitoring capabilities from Cloudflare Analytics

This week we’re excited to announce a number of new products and features that provide deeper security and reliability insights, “proactive” analytics when there’s a problem, and more powerful ways to explore your data.

If you’ve been a user or follower of Cloudflare for a little while, you might have noticed that we take pride in turning technical challenges into easy solutions. Flip a switch or run a few API commands, and the attack you’re facing is now under control or your site is now 20% faster. However, this ease of use is even more helpful if it’s complemented by analytics. Before you make a change, you want to be sure that you understand your current situation. After the change, you want to confirm that it worked as intended, ideally as fast as possible.

Because of the front-line position of Cloudflare’s network, we can provide comprehensive metrics regarding both your traffic and the security and performance of your Internet property. And best of all, there’s nothing to set up or enable. Cloudflare Analytics is automatically available to all Cloudflare users and doesn’t rely on Javascript trackers, meaning that our metrics include traffic from APIs and bots and are not skewed by ad blockers.

Here’s a sneak peek of the product launches. Look out for individual blog posts this week for more details on each of these announcements.

  • Product Analytics:
    • Today, we’re making Firewall Analytics available to Business and Pro plans, so that more customers understand how well Cloudflare mitigates attacks and handles malicious traffic. And we’re highlighting some new metrics, such as the rate of solved captchas (useful for Bot Management), and features, such as customizable reports to facilitate sharing and archiving attack information.
    • We’re introducing Load Balancing Analytics, which shows traffic flows by load balancer, pool, origin, and region, and helps explain why a particular origin was selected to receive traffic.
  • Monitoring:
    • We’re announcing tools to help you monitor your origin either actively or passively and automatically reroute your traffic to a different server when needed. Because Cloudflare sits between your end users and your origin, we can spot problems with your servers without the use of external monitoring services.
  • Data tools:
    • The product analytics we’ll be featuring this week use a new API behind the scenes. We’re making this API generally available, allowing you to build custom dashboards and explore all of your Cloudflare data the same way we do, so you can easily gain insights and identify and debug issues.
  • Account Analytics:
    • We’re releasing (in beta) a new dashboard that shows aggregated information for all of the domains under your account, allowing you to know what’s happening at a glance.

We’re excited to tell you about all of these new products in this week’s posts and would love to hear your thoughts. If you’re not already subscribing to the blog, sign up now to receive daily updates in your inbox.

A History of HTML Parsing at Cloudflare: Part 2

Post Syndicated from Andrew Galloni original https://blog.cloudflare.com/html-parsing-2/

A History of HTML Parsing at Cloudflare: Part 2

A History of HTML Parsing at Cloudflare: Part 2

The second blog post in the series on HTML rewriters picks up the story in 2017, after the launch of Cloudflare Workers, the Cloudflare edge compute platform. It became clear that the developers using Workers wanted the same HTML rewriting capabilities that we used internally, but accessible via a JavaScript API.

This blog post describes the building of a streaming HTML rewriter/parser with a CSS-selector based API in Rust. It is used as the back-end for the Cloudflare Workers HTMLRewriter. We have open-sourced the library (LOL HTML) as it can also be used as a stand-alone HTML rewriting/parsing library.

The major change compared to LazyHTML, the previous rewriter, is the dual-parser architecture required to overcome the additional performance overhead of wrapping/unwrapping each token when propagating tokens to the workers runtime. The remainder of the post describes a CSS selector matching engine inspired by a Virtual Machine approach to regular expression matching.

v2 : Give it to everyone and make it faster

In 2017, Cloudflare introduced an edge compute platform – Cloudflare Workers. It was no surprise that customers quickly required the same HTML rewriting capabilities that we were using internally. Our team was impressed with the platform and decided to migrate some of our features to Workers. The goal was to improve our developer experience working with modern JavaScript rather than statically linked NGINX modules implemented in C with a Lua API.

It is possible to rewrite HTML in Workers, though for that you need a third-party JavaScript package (such as Cheerio). These packages are not designed for HTML rewriting on the edge due to the latency, speed, and memory considerations described in the previous post.

JavaScript is really fast, but it still can’t always produce performance comparable to native code for some tasks – parsing being one of them. Customers typically needed to buffer the whole content of the page to do the rewriting, resulting in considerable output latency and memory consumption that often exceeded the memory limits enforced by the Workers runtime.

We started to think about how we could reuse the technology in Workers. LazyHTML was a perfect fit in terms of parsing performance, but it had two issues:

  1. API ergonomics: LazyHTML produces a stream of HTML tokens. This is sufficient for our internal needs. However, for an average user, it is not as convenient as the jQuery-like API of Cheerio.
  2. Performance: Even though LazyHTML is tremendously fast, integration with the Workers runtime adds even more limitations. LazyHTML operates as a simple parse-modify-serialize pipeline, which means that it produces tokens for the whole content of the page. All of these tokens then have to be propagated to the Workers runtime and wrapped inside a JavaScript object and then unwrapped and fed back to LazyHTML for serialization. This is an extremely expensive operation which would nullify the performance benefit of LazyHTML.

A History of HTML Parsing at Cloudflare: Part 2
LazyHTML with V8

LOL HTML

We needed something new, designed with Workers requirements in mind, using a language with native speed and safety guarantees (it’s incredibly easy to shoot yourself in the foot doing parsing). Rust was the obvious choice as it provides the native speed and the best guarantee of memory safety, which minimises the attack surface of untrusted input. Wherever possible, the Low Output Latency HTML rewriter (LOL HTML) uses all the previous optimizations developed for LazyHTML, such as tag name hashing.

Dual-parser architecture

Most developers are familiar with and prefer to use CSS selector-based APIs (as in Cheerio, jQuery, or the DOM itself) for HTML mutation tasks. We decided to base our API on CSS selectors as well. Although this meant additional implementation complexity, the decision created even more opportunities for parsing optimizations.

As selectors define the scope of the content that should be rewritten, we realised we could skip the content that is not in this scope and not produce tokens for it. This not only significantly speeds up the parsing itself, but also avoids the performance burden of the back-and-forth interactions with the JavaScript VM. As ever, the best optimization is not to do something.

A History of HTML Parsing at Cloudflare: Part 2

Considering the tasks required, LOL HTML’s parser consists of two internal parsers:

  • Lexer – a regular full parser, that produces output for all types of content that it encounters;
  • Tag scanner – looks for start and end tags and skips parsing the rest of the content. The tag scanner parses only the tag name and feeds it to the selector matcher. The matcher will switch the parser to the lexer if there is a match or if additional information about the tag (such as attributes) is required for matching.

The parser switches back to the tag scanner as soon as the input leaves the scope of all selector matches. The tag scanner may also sometimes switch the parser to the lexer – if it requires additional tag information for the parsing feedback simulation.
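That switching policy can be summarized with a toy decision function. This is a deliberate simplification; the real logic also drives switches from the parsing feedback simulation:

```rust
#[derive(Debug, PartialEq)]
enum ActiveParser {
    TagScanner, // cheap: parses tag names only, skips the rest
    Lexer,      // full parser: produces complete tokens
}

// Use the full lexer only while a selector matches or while the matcher
// needs more tag information (e.g. attributes); otherwise stay in the
// cheap tag scanner and avoid producing tokens at all.
fn choose_parser(in_matched_scope: bool, matcher_needs_attributes: bool) -> ActiveParser {
    if in_matched_scope || matcher_needs_attributes {
        ActiveParser::Lexer
    } else {
        ActiveParser::TagScanner
    }
}
```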

A History of HTML Parsing at Cloudflare: Part 2
LOL HTML architecture

Having two different parser implementations for the same grammar would increase development costs and be error-prone due to implementation inconsistencies. We minimize these risks by implementing a small Rust macro-based DSL which is similar in spirit to Ragel. The DSL program describes nondeterministic finite automaton (NFA) states and the actions associated with each state transition and matched input byte.

An example of a DSL state definition:

tag_name_state {
   whitespace => ( finish_tag_name?; --> before_attribute_name_state )
   b'/'       => ( finish_tag_name?; --> self_closing_start_tag_state )
   b'>'       => ( finish_tag_name?; emit_tag?; --> data_state )
   eof        => ( emit_raw_without_token_and_eof?; )
   _          => ( update_tag_name_hash; )
}

The DSL program gets expanded by the Rust compiler into not quite as beautiful, but extremely efficient Rust code.

We no longer need to reimplement the code that drives the parsing process for each of our parsers. All we need to do is to define different action implementations for each. In the case of the tag scanner, the majority of these actions are a no-op, so the Rust compiler does the NFA optimization job for us: it optimizes away state branches with no-op actions and even whole states if all of the branches have no-op actions. Now that’s cool.
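To illustrate why the no-op defaults cost nothing, here is a minimal Rust sketch (all names are ours, not the actual LOL HTML internals): a single generic driver calls action hooks, the tag scanner leaves the per-byte hook as a default no-op, and after monomorphization the compiler optimizes it away entirely.

```rust
// Sketch: one state-machine driver serving two parsers via action hooks.
// Hooks left as default no-ops vanish after monomorphization.

trait Actions {
    fn update_tag_name_hash(&mut self, _byte: u8) {} // default: no-op
    fn emit_tag(&mut self) {}
}

struct TagScanner {
    tags_seen: usize,
}
impl Actions for TagScanner {
    // Only cares about tag boundaries; the per-byte hook stays a no-op.
    fn emit_tag(&mut self) {
        self.tags_seen += 1;
    }
}

struct Lexer {
    hash: u64,
    tags_seen: usize,
}
impl Actions for Lexer {
    fn update_tag_name_hash(&mut self, byte: u8) {
        self.hash = self.hash.wrapping_mul(31).wrapping_add(byte as u64);
    }
    fn emit_tag(&mut self) {
        self.tags_seen += 1;
    }
}

// The shared driver: one copy of the parsing logic for both parsers.
fn drive<A: Actions>(actions: &mut A, tag_name: &[u8]) {
    for &b in tag_name {
        actions.update_tag_name_hash(b);
    }
    actions.emit_tag();
}

fn main() {
    let mut scanner = TagScanner { tags_seen: 0 };
    drive(&mut scanner, b"div");
    assert_eq!(scanner.tags_seen, 1);

    let mut lexer = Lexer { hash: 0, tags_seen: 0 };
    drive(&mut lexer, b"div");
    assert!(lexer.hash != 0);
}
```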

Byte slice processing optimisations

Moving to a memory-safe language presented new challenges. Rust has great memory safety mechanisms, but sometimes they come with a runtime performance cost.

The task of the parser is to scan through the input and find the boundaries of lexical units of the language – tokens and their internal parts. For example, an HTML start tag token consists of multiple parts: a byte slice of input that represents the tag name and multiple pairs of input slices that represent attributes and values:

struct StartTagToken<'i> {
   name: &'i [u8],
   attributes: Vec<(&'i [u8], &'i [u8])>,
   self_closing: bool
}

As Rust uses bounds checks on memory access, constructing a token can be a relatively expensive operation. We need to be able to construct thousands of them in a fraction of a second, so every CPU instruction counts.

Following the principle of doing as little as possible to improve performance we use a “token outline” representation of tokens: instead of having memory slices for token parts we use numeric ranges which are lazily transformed into a byte slice when required.

struct StartTagTokenOutline {
   name: Range<usize>,
   attributes: Vec<(Range<usize>, Range<usize>)>,
   self_closing: bool
}

As you might have noticed, with this approach we are no longer bound to the lifetime of the input chunk, which turns out to be very useful. If a start tag is spread across multiple input chunks, we can easily update the token currently under construction as new chunks of input arrive, by just adjusting the integer indices. This allows us to avoid constructing a new token with slices from the new input memory region (which could be the input chunk itself or the parser’s internal buffer).
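As a rough sketch of this technique (the accessor method here is hypothetical, not the actual LOL HTML API), the outline can materialize a byte slice on demand from whichever buffer currently holds the input:

```rust
// Sketch: a token outline stores ranges into the input and turns them into
// byte slices only when required, against whatever buffer the caller passes.

use std::ops::Range;

struct StartTagTokenOutline {
    name: Range<usize>,
    self_closing: bool,
}

impl StartTagTokenOutline {
    // The borrow is tied to the buffer supplied by the caller, so the same
    // outline keeps working after data moves into the parser's own buffer.
    fn name<'i>(&self, input: &'i [u8]) -> &'i [u8] {
        &input[self.name.clone()]
    }
}

fn main() {
    let chunk = b"<div id=x>";
    let outline = StartTagTokenOutline {
        name: 1..4, // the "div" part of the chunk
        self_closing: false,
    };
    assert_eq!(outline.name(chunk), b"div");
    assert!(!outline.self_closing);
}
```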

This time we can’t get away with avoiding the conversion of the input character encoding: we expose a user-facing API that operates on JavaScript strings, and the input HTML can be in any encoding. Luckily, we can still parse without decoding, and only encode and decode within token bounds on request (though we still can’t do that for the UTF-16 encoding).

So, when a user requests an element’s tag name in the API, internally it is still represented as a byte slice in the character encoding of the input, but when provided to the user it gets dynamically decoded. The opposite process happens when a user sets a new tag name.
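A minimal sketch of this lazy decode/encode round-trip, under the simplifying assumption that the input encoding is UTF-8 so the standard library suffices (the real implementation supports many encodings, and the type and method names here are made up):

```rust
// Sketch: the tag name stays as bytes in the input's encoding; decoding
// happens only when the user asks for it, and setting a new name re-encodes.
// Assumption: input encoding is UTF-8 (the real code handles other encodings).

struct TagName {
    bytes: Vec<u8>, // stored in the input's character encoding
}

impl TagName {
    // Decode lazily, only on access.
    fn get(&self) -> String {
        String::from_utf8_lossy(&self.bytes).into_owned()
    }

    // The opposite process: encode the user-supplied name back.
    fn set(&mut self, name: &str) {
        self.bytes = name.as_bytes().to_vec();
    }
}

fn main() {
    let mut tag = TagName { bytes: b"div".to_vec() };
    assert_eq!(tag.get(), "div");
    tag.set("span");
    assert_eq!(tag.bytes, b"span");
}
```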

For selector matching we can still operate on the original encoding representation – because we know the input encoding ahead of time we preemptively convert values in a selector to the page’s character encoding, so comparisons can be done without decoding fields of each token.

As you can see, the new parser architecture along with all these optimizations produced great performance results:

Average parsing time depending on the input size – lower is better

LOL HTML’s tag scanner is typically twice as fast as LazyHTML, and the lexer has comparable performance, outperforming LazyHTML on bigger inputs. Both are a few times faster than the tokenizer from html5ever – another parser implemented in Rust, used in Mozilla’s Servo browser engine.

CSS selector matching VM

With an impressively fast parser on our hands, the only thing missing was the CSS selector matcher. Initially we thought we could just use Servo’s CSS selector matching engine for this purpose, but after a couple of days of experimentation it turned out to be not quite suitable for our task.

It did not work well with our dual-parser architecture: we first need to match just a tag name from the tag scanner and then, if that is insufficient, query the lexer for the attributes. The selectors library wasn’t designed with this architecture in mind, so we needed ugly hacks to bail out from matching in case of insufficient information. It was also inefficient, as we had to restart matching after the bailout, doing twice the work. There were other problems too, such as integrating lazy character decoding and tag name comparison using tag name hashes.

Matching direction

The main problem we encountered was the need to backtrack over all the open elements during matching. Browsers match selectors from right to left and traverse all ancestors of an element. This StackOverflow answer has a good explanation of why they do it this way. We would need to store information about all open elements and their attributes – something we can’t do while operating under tight memory constraints. This matching approach would also be inefficient in our case: unlike browsers, we expect to have just a few selectors and a lot of elements. Here it is much more efficient to match selectors from left to right.

And this is when we had a revelation. Consider the following CSS selector:

body > div.foo  img[alt] > div.foo ul

It can be split into individual components attributed to a particular element with hierarchical combinators in between:

body > div.foo img[alt] > div.foo  ul
---    ------- --------   -------  --

Each component is easy to match given a start tag token – it’s just a matter of comparing the token’s fields with the values in the component. Let’s dive into abstract thinking and imagine that each such component is a character in the infinite alphabet of all possible components:

Selector component    Character
body                  a
div.foo               b
img[alt]              c
ul                    d

Let’s rewrite our selector with selector components replaced by our imaginary characters:

a > b c > b d

Does this remind you of something?

The `>` combinator can be read as “immediately followed by” – the component on its right matches a child of the element matched by the component on its left.

The ` ` (space) combinator denotes a descendant element and can be read as “followed by, with zero or more elements in between”.

There is a very well known abstraction to express these relations – regular expressions. The combinators in the selector can be replaced with regular expression syntax:

ab.*cb.*d

We transformed our CSS selector into a regular expression that can be executed on the sequence of start tag tokens. Note that not all CSS selectors can be converted to such a regular grammar and the input on which we match has some specifics, which we’ll discuss later. However, it was a good starting point: it allowed us to express a significant subset of selectors.
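The transformation can be illustrated with a toy function (ours, not LOL HTML’s actual selector compiler) that maps each component to a character of the imaginary alphabet and each combinator to regex syntax:

```rust
// Sketch: translate a CSS selector into the "regular expression over
// selector components" described above. Component parsing is greatly
// simplified for illustration.

fn selector_to_regex(selector: &str, alphabet: &[(&str, char)]) -> String {
    let mut out = String::new();
    // The `>` combinator contributes nothing: its characters end up adjacent.
    for child_group in selector.split('>') {
        // The descendant (space) combinator becomes `.*`.
        for (j, component) in child_group.split_whitespace().enumerate() {
            if j > 0 {
                out.push_str(".*");
            }
            let ch = alphabet
                .iter()
                .find(|(name, _)| *name == component)
                .expect("component missing from alphabet")
                .1;
            out.push(ch);
        }
    }
    out
}

fn main() {
    let alphabet = [("body", 'a'), ("div.foo", 'b'), ("img[alt]", 'c'), ("ul", 'd')];
    let regex = selector_to_regex("body > div.foo img[alt] > div.foo ul", &alphabet);
    assert_eq!(regex, "ab.*cb.*d");
}
```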

Implementing a Virtual Machine

Next, we started looking at non-backtracking algorithms for regular expressions. The virtual machine approach seemed suitable for our task as it was possible to have a non-backtracking implementation that was flexible enough to work around differences between real regular expression matching on strings and our abstraction.

VM-based regular expression matching is implemented as one of the engines in many regular expression libraries, such as regexp2 and Rust’s regex. The basic idea is that instead of building an NFA or DFA for a regular expression, the expression is converted into a DSL assembly language whose instructions are later executed by the virtual machine – regular expressions are treated as programs that accept strings for matching.

Since the VM program is just a representation of an NFA with ε-transitions, it can exist in multiple states simultaneously during execution, or, in other words, it spawns multiple threads. The regular expression matches if one or more threads succeed.

For example, consider the following VM instructions:

  • expect c – waits for the next input character and aborts the thread if it doesn’t equal the instruction’s operand;
  • jmp L – jumps to label ‘L’;
  • thread L1, L2 – spawns threads for labels L1 and L2, effectively splitting the execution;
  • match – succeeds the thread with a match.

Using this instruction set, the regular expression “ab*c” can be translated into:

    expect a
L1: thread L2, L3
L2: expect b
    jmp L1
L3: expect c
    match
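To make the execution model concrete, here is a toy non-backtracking VM for this instruction set (a sketch of the general approach, not the actual implementation): it keeps the set of live thread positions and advances all of them on each input character.

```rust
// Sketch: a minimal thread-based (Pike-style) VM for the instruction set
// above, with labels replaced by instruction indices.

#[derive(Clone, Copy)]
enum Inst {
    Expect(u8),           // consume the next char; kill the thread on mismatch
    Jmp(usize),           // jump to an instruction index
    Thread(usize, usize), // split execution into two threads
    Match,                // this thread succeeded
}

fn run(program: &[Inst], input: &[u8]) -> bool {
    // Follow Jmp/Thread/Match eagerly until every live thread sits on Expect.
    fn add_thread(program: &[Inst], pc: usize, threads: &mut Vec<usize>, matched: &mut bool) {
        match program[pc] {
            Inst::Jmp(l) => add_thread(program, l, threads, matched),
            Inst::Thread(l1, l2) => {
                add_thread(program, l1, threads, matched);
                add_thread(program, l2, threads, matched);
            }
            Inst::Match => *matched = true,
            Inst::Expect(_) => threads.push(pc),
        }
    }

    let mut matched = false;
    let mut threads = Vec::new();
    add_thread(program, 0, &mut threads, &mut matched);
    for &ch in input {
        let mut next = Vec::new();
        for &pc in &threads {
            if let Inst::Expect(expected) = program[pc] {
                if expected == ch {
                    add_thread(program, pc + 1, &mut next, &mut matched);
                }
            }
        }
        threads = next;
    }
    matched
}

fn main() {
    // "ab*c" compiled as in the listing above:
    // 0: expect a; 1: thread 2,4; 2: expect b; 3: jmp 1; 4: expect c; 5: match
    let prog = [
        Inst::Expect(b'a'),
        Inst::Thread(2, 4),
        Inst::Expect(b'b'),
        Inst::Jmp(1),
        Inst::Expect(b'c'),
        Inst::Match,
    ];
    assert!(run(&prog, b"abbbc"));
    assert!(run(&prog, b"ac"));
    assert!(!run(&prog, b"abd"));
}
```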

Let’s try to translate the regular expression ab.*cb.*d from the selector we saw earlier:

    expect a
    expect b
L1: thread L2, L3
L2: expect [any]
    jmp L1
L3: expect c
    expect b
L4: thread L5, L6
L5: expect [any]
    jmp L4
L6: expect d
    match

That looks complex! However, this assembly language is designed for regular expressions in general, and regular expressions can be much more complex than our case. For us, the only kind of repetition that matters is “.*”, so instead of expressing it with multiple instructions we can use a single instruction called hereditary_jmp:

    expect a
    expect b
    hereditary_jmp L1
L1: expect c
    expect b
    hereditary_jmp L2
L2: expect d
    match

This instruction tells the VM to memoize its label operand and to unconditionally spawn a thread with a jump to this label on each input character.

There is one significant distinction between the string input of regular expressions and the input provided to our VM. The input can shrink!

A regular string is just a contiguous sequence of characters, whereas we operate on a sequence of open elements, and as new tokens arrive this sequence can grow as well as shrink. Assume we represent <div> as the character ‘a’ in our imaginary language; then the input <div><div><div> can be represented as aaa. If the next token in the input is </div>, our “string” shrinks to aa.

You might think at this point that our abstraction doesn’t work and we should try something else. But what we have as input for our machine is a stack of open elements, and we need a stack-like structure to store the hereditary_jmp instruction labels that the VM has seen so far. So, why not store them on the open element stack? If we store the next instruction pointer on each stack item for which the expect instruction was successfully executed, we’ll have a full snapshot of the VM state, so we can easily roll back to it if our stack shrinks.

With this implementation we don’t need to store anything except a tag name on the stack, and, since we can use the tag name hashing algorithm, that is just a 64-bit integer per open element. As an additional small optimization, to avoid traversing the whole stack in search of active hereditary jumps on each new input, we store on each stack item the index of the first ancestor with a hereditary jump.
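The per-element bookkeeping might be sketched like this (field and function names are ours, not the actual LOL HTML internals): each stack item carries the tag name hash, an optional memoized hereditary jump label, and the index of the nearest ancestor that holds one, so finding active hereditary jumps never requires scanning the whole stack.

```rust
// Sketch: open element stack items carrying the memoized hereditary jump
// labels and a shortcut index to the nearest ancestor that has one.

struct StackItem {
    tag_name_hash: u64,
    hereditary_jump: Option<usize>, // VM label memoized on this element
    jump_ancestor: Option<usize>,   // nearest ancestor with a hereditary jump
}

fn push(stack: &mut Vec<StackItem>, tag_name_hash: u64, hereditary_jump: Option<usize>) {
    // The new item points at the parent itself if the parent carries a
    // hereditary jump, otherwise it inherits the parent's ancestor index.
    let jump_ancestor = match stack.last() {
        Some(top) if top.hereditary_jump.is_some() => Some(stack.len() - 1),
        Some(top) => top.jump_ancestor,
        None => None,
    };
    stack.push(StackItem { tag_name_hash, hereditary_jump, jump_ancestor });
}

fn main() {
    // Mirrors the "body > div span" example: the hereditary_jmp label is
    // memoized on the <div> that matched (hashes are placeholders).
    let mut stack = Vec::new();
    push(&mut stack, 1, None);    // <body>
    push(&mut stack, 2, Some(3)); // <div>, hereditary_jmp 3 memoized here
    push(&mut stack, 2, None);    // <div>
    push(&mut stack, 4, None);    // <a>
    // Both inner elements find the active hereditary jump without a full scan.
    assert_eq!(stack[2].jump_ancestor, Some(1));
    assert_eq!(stack[3].jump_ancestor, Some(1));
}
```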

For example, for the selector “body > div span” we’ll have the following VM program (let’s get rid of labels and just use instruction indices instead):

0| expect <body>
1| expect <div>
2| hereditary_jmp 3
3| expect <span>
4| match

Given the input “<body><div><div><a>”, we’ll have the following stack:

[Figure: the resulting open element stack]

Now, if the next token is a <span> start tag, the VM will first try to execute the selector program from the beginning and will fail on the first instruction. However, it will also look for any active hereditary jumps on the stack. We have one, which jumps to the instruction at index 3. After jumping to this instruction, the VM successfully produces a match. If we get another <span> start tag later, it will match as well, following the same steps – which is exactly what we expect for the descendant selector.

If we then receive the sequence of end tags “</span></span></a></div></div>”, our stack will contain only one item:

[Figure: the open element stack with a single remaining item]

which instructs the VM to jump to the instruction at index 1, effectively rolling back to matching the div component of the selector.

We mentioned earlier that we can bail out from the matching process if we only have a tag name from the tag scanner and need to obtain more information by running the lexer. With a VM approach this is as easy as stopping the execution of the current instruction and resuming it later, when we get the required information.

Duplicate selectors

As we need a separate program for each selector we match, how can we avoid having the same simple components do the same job multiple times? The AST of our selector matching program is a radix tree-like structure whose edge labels are simple selector components and whose nodes are hierarchical combinators.
For example, for the following selectors:

body > div > link[rel]
body > span
body > span a

we’ll get the following AST:

[Figure: the resulting selector AST]

If selectors have common prefixes, we can match those prefixes just once for all of the selectors. In the compilation process, we flatten this structure into a vector of instructions.

[not] JIT-compilation

For performance reasons, the compiled instructions are macro-instructions – they incorporate multiple basic VM instructions. This way the VM can execute only one macro-instruction per input token. Each macro-instruction is compiled using the so-called “[not] JIT-compilation” (the same compilation approach is used in our other Rust project – wirefilter).

Internally, a macro-instruction contains an expect instruction and the jmp, hereditary_jmp and match basic instructions that follow it. In that sense, macro-instructions resemble microcode, which makes it easy to suspend execution of a macro-instruction if we need to request attribute information from the lexer.

What’s next

It is obviously not the end of the road, but hopefully we’ve got a bit closer to it. There are still multiple bits of functionality that need to be implemented, and there is certainly space for more optimizations.

If you are interested in the topic, don’t hesitate to join us in the development of LazyHTML and LOL HTML on GitHub. And, of course, we are always happy to see people passionate about technology here at Cloudflare, so don’t hesitate to contact us if you are too :).