Tag Archives: security

Introducing for Families

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/introducing-1-1-1-1-for-families/

Introducing for Families

Two years ago today we announced, a secure, fast, privacy-first DNS resolver free for anyone to use. In those two years, has grown beyond our wildest imagination. Today, we process more than 200 billion DNS requests per day making us the second largest public DNS resolver in the world behind only Google.

Introducing for Families

Yesterday, we announced the results of the privacy examination. Cloudflare’s business has never involved selling user data or targeted advertising, so it was easy for us to commit to strong privacy protections for We’ve also led the way supporting encrypted DNS technologies including DNS over TLS and DNS over HTTPS. It is long past time to stop transmitting DNS in plaintext and we’re excited that we see more and more encrypted DNS traffic every day. for Families

Introducing for Families

Since launching, the number one request we have received is to provide a version of the product that automatically filters out bad sites. While can safeguard user privacy and optimize efficiency, it is designed for direct, fast DNS resolution, not for blocking or filtering content. The requests we’ve received largely come from home users who want to ensure that they have a measure of protection from security threats and can keep adult content from being accessed by their kids. Today, we’re happy to answer those requests.

Introducing for Families — the easiest way to add a layer of protection to your home network and protect it from malware and adult content. for Families leverages Cloudflare’s global network to ensure that it is fast and secure around the world. And it includes the same strong privacy guarantees that we committed to when we launched two years ago. And, just like, we’re providing it for free and it’s for any home anywhere in the world.

Two Flavors: (No Malware) & (No Malware or Adult Content)

Introducing for Families for Families is easy to set up and install, requiring just changing two numbers in the settings of your home devices or network router: your primary DNS and your secondary DNS. Setting up for Families usually takes less than a minute and we’ve provided instructions for common devices and routers through the installation guide. for Families has two default options: one that blocks malware and the other that blocks malware and adult content. You choose which setting you want depending on which IP address you configure.

Malware Blocking Only
Primary DNS:
Secondary DNS:

Malware and Adult Content
Primary DNS:
Secondary DNS:

Additional Configuration

Introducing for Families

In the coming months, we will provide the ability to define additional configuration settings for for Families. This will include options to create specific whitelists and blacklists of certain sites. You will be able to set the times of the day when categories, such as social media, are blocked and get reports on your household’s Internet usage. for Families is built on top of the same site categorization and filtering technology that powers Cloudflare’s Gateway product. With the success of Gateway, we wanted to provide an easy-to-use service that can help any home network be fast, reliable, secure, and protected from potentially harmful content.

Not A Joke

Most of Cloudflare’s business involves selling services to businesses. However, we’ve made it a tradition every April 1 to launch a new consumer product that leverages our network to bring more speed, reliability, and security to every Internet user. While we make money selling to businesses, the products we launch at this time of the year are close to our hearts because of the broad impact they have for every Internet user.

Introducing for Families

This year, while many of us are confined to our homes, protecting our communities from COVID-19, and relying on our home networks more than ever it seemed especially important to launch for Families. We hope during these troubled times it will help provide a bit of peace of mind for households everywhere.

Announcing the Beta for WARP for macOS and Windows

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/announcing-the-beta-for-warp-for-macos-and-windows/

Announcing the Beta for WARP for macOS and Windows

Announcing the Beta for WARP for macOS and Windows

Last April 1 we announced WARP — an option within the iOS and Android app to secure and speed up Internet connections. Today, millions of users have secured their mobile Internet connections with WARP.

While WARP started as an option within the app, it’s really a technology that can benefit any device connected to the Internet. In fact, one of the most common requests we’ve gotten over the last year is support for WARP for macOS and Windows. Today we’re announcing exactly that: the start of the WARP beta for macOS and Windows.

What’s The Same: Fast, Secure, and Free

We always wanted to build a WARP client for macOS and Windows. We started with mobile because it was the hardest challenge. And it turned out to be a lot harder than we anticipated. While we announced the beta of with WARP on April 1, 2019 it took us until late September before we were able to open it up to general availability. We don’t expect the wait for macOS and Windows WARP to be nearly as long.

The WARP client for macOS and Windows relies on the same fast, efficient Wireguard protocol to secure Internet connections and keep them safe from being spied on by your ISP. Also, just like WARP on the mobile app, the basic service will be free on macOS and Windows.

Announcing the Beta for WARP for macOS and Windows

WARP+ Gets You There Faster

We plan to add WARP+ support in the coming months to allow you to leverage Cloudflare’s Argo network for even faster Internet performance. We will provide a plan option for existing WARP+ subscribers to add additional devices at a discount. In the meantime, existing WARP+ users will be among the first to be invited to try WARP for macOS and Windows. If you are a WARP+ subscriber, check your app over the coming weeks for a link to an invitation to try the new WARP for macOS and Windows clients.

If you’re not a WARP+ subscriber, you can add yourself to the waitlist by signing up on the page linked below. We’ll email as soon as it’s ready for you to try.


Linux Support

We haven’t forgotten about Linux. About 10% of Cloudflare’s employees run Linux on their desktops. As soon as we get the macOS and Windows clients out we’ll turn our attention to building a WARP client for Linux.

Thank you to everyone who helped us make WARP fast, efficient, and reliable on mobile. It’s incredible how far it’s come over the last year. If you tried it early in the beta last year but aren’t using it now, I encourage you to give it another try. We’re looking forward to bringing WARP speed and security to even more devices.

Cloudflare now supports security keys with Web Authentication (WebAuthn)!

Post Syndicated from Anita Tenjarla original https://blog.cloudflare.com/cloudflare-now-supports-security-keys-with-web-authentication-webauthn/

Cloudflare now supports security keys with Web Authentication (WebAuthn)!

Cloudflare now supports security keys with Web Authentication (WebAuthn)!

We’re excited to announce that Cloudflare now supports security keys as a two factor authentication (2FA) method for all users. Cloudflare customers now have the ability to use security keys on WebAuthn-supported browsers to log into their user accounts. We strongly suggest users configure multiple security keys and 2FA methods on their account in order to access their apps from various devices and browsers. If you want to get started with security keys, visit your account’s 2FA settings.

Cloudflare now supports security keys with Web Authentication (WebAuthn)!

What is WebAuthn?

WebAuthn is a standardized protocol for authentication online using public key cryptography. It is part of the FIDO2 Project and is backwards compatible with FIDO U2F. Depending on your device and browser, you can use hardware security keys (like YubiKeys) or built-in biometric support (like Apple Touch ID) to authenticate to your Cloudflare user account as a second factor. WebAuthn support is rapidly increasing among browsers and devices, and we’re proud to join the growing list of services that offer this feature.

To use WebAuthn, a user registers their security key, or “authenticator”, to a supporting application, or “relying party” (in this case Cloudflare). The authenticator then generates and securely stores a public/private keypair on the device. The keypair is scoped to a specific domain and user account. The authenticator then sends the public key to the relying party, who stores it. A user may have multiple authenticators registered with the same relying party. In fact, it’s strongly encouraged for a user to do so in case an authenticator is lost or broken.

When a user logs into their account, the relying party will issue a randomly generated byte sequence called a “challenge”. The authenticator will prompt the user for “interaction” in the form of a tap, touch or PIN before signing the challenge with the stored private key and sending it back to the relying party. The relying party evaluates the signed challenge against the public key(s) it has stored associated with the user, and if the math adds up the user is authenticated! To learn more about how WebAuthn works, take a look at the official documentation.

Cloudflare now supports security keys with Web Authentication (WebAuthn)!

How is WebAuthn different from other 2FA methods?

There’s a lot of hype about WebAuthn, and rightfully so. But there are some common misconceptions about how WebAuthn actually works, so I wanted to take some time to explain why it’s so effective against various credential-based attacks.

First, WebAuthn relies on a “physical thing you have” rather than an app or a phone number, which makes it a lot harder for a remote attacker to impersonate a victim. This assumption prevents common exploits like SIM swapping, which is an attack used to bypass SMS-based verification. In contrast, an attacker physically (and cryptographically) cannot “impersonate” a hardware security key unless they have physical access to a victim’s unlocked device.

WebAuthn is also simpler and quicker to use compared to mobile app-based 2FA methods. Users often complain about the amount of time it takes to reach for their phone, open an app, and copy over an expiring passcode every time they want to log into an account. By contrast, security keys require a simple touch or tap on a piece of hardware that’s often attached to a device.

But where WebAuthn really shines is its particular resistance to phishing attacks. Phishing often requires an attacker to construct a believable fake replica of a target site. For example, an attacker could try to register cloudfare[.]com (notice the typo!) and construct a site that looks similar to the genuine cloudflare[.]com. The attacker might then try to trick a victim into logging into the fake site and disclosing their credentials. Even if the victim has mobile app TOTP authentication enabled, a sophisticated attacker can still proxy requests from the fake site to the genuine site and successfully authenticate as the victim. This is the assumption behind powerful man-in-the-middle tools like evilginx.

WebAuthn prevents users from falling victim to common phishing and man-in-the-middle attacks because it takes the domain name into consideration when creating user credentials. When an authenticator creates the public/private keypair, it is specifically scoped to a particular account and domain. So let’s say a user with WebAuthn configured navigates to the phishy cloudfare[.]com site. When the phishy site prompts the authenticator to sign its challenge, the authenticator will attempt to find credentials for that phishy site’s domain and, upon failing to find any, will error and prevent the user from logging in. This is why hardware security keys are among the most secure authentication methods in existence today according to research by Google.

Cloudflare now supports security keys with Web Authentication (WebAuthn)!

WebAuthn also has very strict privacy guarantees.  If a user authenticates with a biometric key (like Apple TouchID or Windows Hello), the relying party never receives any of that biometric data. The communication between authenticator and client browser is completely separate from the communication between client browser and relying party. WebAuthn also urges relying parties to not disclose user-identifiable information (like email addresses) during registration or authentication. This helps prevent replay or user enumeration attacks. And because credentials are strictly scoped to a particular relying party and domain, a malicious relying party won’t be able to gain information about other relying parties an authenticator has created credentials for in order to track a user’s various accounts.

Finally, WebAuthn is great for relying parties because they don’t have to store anything additionally sensitive about a user. The relying party simply stores a user’s public key. An attacker who gains access to the public key can’t do much with it because they won’t know the associated private key. This is markedly less risky than TOTP, where a relying party must use proper hygiene to store a TOTP secret seed from which all subsequent time-based user passcodes are generated.

Security isn’t always intuitive

Sometimes in the security industry we have the tendency to fixate on new and sophisticated attacks. But often it’s the same old “simple” problems that have the highest impact. Two factor authentication is a textbook case where the security industry largely believes a concept is trivial, but the average user still finds it confusing or annoying. WebAuthn addresses this problem because it’s quicker and more secure for the end user compared to other authentication methods. We think the trend towards security key adoption will continue to grow, and we’re looking forward to doing our part to help the effort.

Note: If you login to your Cloudflare user account with Single Sign-On (SSO), you will not have the option to use two factor authentication (2FA). This is because your SSO provider manages your 2FA methods. To learn more about Cloudflare’s 2FA offerings, please visit our support center.

Amazon Detective – Rapid Security Investigation and Analysis

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-detective-rapid-security-investigation-and-analysis/

Almost five years ago, I blogged about a solution that automatically analyzes AWS CloudTrail data to generate alerts upon sensitive API usage. It was a simple and basic solution for security analysis and automation. But demanding AWS customers have multiple AWS accounts, collect data from multiple sources, and simple searches based on regular expressions are not enough to conduct in-depth analysis of suspected security-related events. Today, when a security issue is detected, such as compromised credentials or unauthorized access to a resource, security analysts cross-analyze several data logs to understand the root cause of the issue and its impact on the environment. In-depth analysis often requires scripting and ETL to connect the dots between data generated by multiple siloed systems. It requires skilled data engineers to answer basic questions such as “is this normal?”. Analysts use Security Information and Event Management (SIEM) tools, third-party libraries, and data visualization tools to validate, compare, and correlate data to reach their conclusions. To further complicate the matters, new AWS accounts and new applications are constantly introduced, forcing analysts to constantly reestablish baselines of normal behavior, and to understand new patterns of activities every time they evaluate a new security issue.

Amazon Detective is a fully managed service that empowers users to automate the heavy lifting involved in processing large quantities of AWS log data to determine the cause and impact of a security issue. Once enabled, Detective automatically begins distilling and organizing data from AWS Guard Duty, AWS CloudTrail, and Amazon Virtual Private Cloud Flow Logs into a graph model that summarizes the resource behaviors and interactions observed across your entire AWS environment.

At re:invent 2019, we announced a preview of Amazon Detective. Today, it is our pleasure to announce its availability for all AWS Customers.

Amazon Detective uses machine learning models to produce graphical representations of your account behavior and helps you to answer questions such as “is this an unusual API call for this role?” or “is this spike in traffic from this instance expected?”. You do not need to write code, to configure or to tune your own queries.

To get started with Amazon Detective, I open the AWS Management Console, I type “detective” in the search bar and I select Amazon Detective from the provided results to launch the service. I enable the service and I let the console guide me to configure “member” accounts to monitor and the “master” account in which to aggregate the data. After this one-time setup, Amazon Detective immediately starts analyzing AWS telemetry data and, within a few minutes, I have access to a set of visual interfaces that summarize my AWS resources and their associated behaviors such as logins, API calls, and network traffic. I search for a finding or resource from the Amazon Detective Search bar and, after a short while, I am able to visualize the baseline and current value for a set of metrics.

I select the resource type and ID and start to browse the various graphs.

I can also investigate a AWS Guard Duty finding by using the native integrations within the Guard Duty and AWS Security Hub consoles. I click the “Investigate” link from any finding from AWS Guard Duty and jump directly into a Amazon Detective console that provides related details, context, and guidance to investigate and to respond to the issue. In the example below, Guard Duty reports an unauthorized access that I decide to investigate:

Amazon Detective console opens:

I scroll down the page to check the graph of failed API calls. I click a bar in the graph to get the details, such as the IP addresses where the calls originated:

Once I know the source IP addresses, I click New behavior: AWS role and observe where these calls originated from to compare with the automatically discovered baseline.

Amazon Detective works across your AWS accounts, it is a multi-account solution that aggregates data and findings from up to 1000 AWS accounts into a single security-owned “master” account making it easy to view behavioral patterns and connections across your entire AWS environment.

There are no agents, sensors, or additional software to deploy in order to use the service. Amazon Detective retrieves, aggregates and analyzes data from AWS Guard Duty, AWS CloudTrail and Amazon Virtual Private Cloud Flow Logs. Amazon Detective collects existing logs directly from AWS without touching your infrastructure, thereby not causing any impact to cost or performance.

Amazon Detective can be administered via the AWS Management Console or via the Amazon Detective management APIs. The management APIs enable you to build Amazon Detective into your standard account registration, enablement, and deployment processes.

Amazon Detective is a regional service. I activate the service in every AWS Regions in which I want to analyze findings. All data are processed in the AWS Region where they are generated. Amazon Detective maintains data analytics and log summaries in the behavior graph for a 1-year rolling period from the date of log ingestion. This allows for visual analysis and deep dives over a large data set for a long period of time. When I disable the service, all data is expunged to ensure no data remains.

There are no additional charges or upfront commitments required to use Amazon Detective. We charge per GB of data ingested from AWS AWS CloudTrail, Amazon Virtual Private Cloud Flow Logs, and AWS Guard Duty findings. Amazon Detective offers a 30-day free trial. As usual, check the pricing page for the details.

Amazon Detective is available in all commercial AWS Regions, except China. You can start to use it today.

— seb

Using Cloudflare to secure your cardholder data environment

Post Syndicated from Jacob Zollinger original https://blog.cloudflare.com/using-cloudflare-to-secure-your-cardholder-data-environment/

Using Cloudflare to secure your cardholder data environment

Using Cloudflare to secure your cardholder data environment

As part of our ongoing compliance efforts Cloudflare’s PCI scope is reviewed quarterly and after any significant changes to ensure all in-scope systems are operating in accordance with the PCI DSS. This review also allows us to periodically review each product we offer as a PCI validated service provider and identify where there might be opportunities to provide greater value to our customers.

With our customers in mind, we completed our latest assessment and have increased our PCI certified product offering!

Building trust in our products is one critical component that allows Cloudflare’s mission of “Building a Better Internet” to succeed. We reaffirm our dedication to building trust in our products by obtaining industry standard security compliance certifications and complying with regulations.

Cloudflare is a Level 1 Merchant, the highest level, and also provides services to organizations to help secure their cardholder data environment. Maintaining PCI DSS compliance is important for Cloudflare because (1) we must ensure that our transmission and processing of cardholder data is secure for our own customers, (2) that our customers know they can trust Cloudflare’s products to transmit cardholder data securely, and (3) that anyone who interacts with Cloudflare’s services know that their information is transmitted securely.

The PCI standard applies to any company or organization that accepts credit cards, debit cards, or even prepaid cards for payment. The purpose of this compliance standard is to help protect financial institutions and customers from having their payment card information compromised. Each major payment card brand has merchants sorted into different tiers based on the number of transactions made per year, and each tier requires varying requirements to satisfy their compliance obligations. Annually, Cloudflare undergoes an assessment by a Qualified Security Assessor. This assessor conducts a thorough review of Cloudflare’s technical environment and validates that Cloudflare’s controls related to securing the transmission, processing, and storage of cardholder data meet the requirements in the PCI Data Security Standard (PCI DSS).

Cloudflare has been PCI compliant since 2014 as both a merchant and as a service provider, but this year we have expanded our Service Provider scope to include more products that will help our customers become more secure and meet their own compliance obligations.

How can Cloudflare Help You?

In addition to our WAF, we are proud to announce that Cloudflare’s Content Delivery Network, Cloudflare Access, and the Cloudflare Time Service are also certified under our latest Attestation of Compliance!

Our Attestation of Compliance is applicable for all Pro, Business, and Enterprise accounts. This designation can be used to simplify your PCI audits and remove the pressure on you to manage these services or appliances locally.

If you use our WAF, enable the OWASP ruleset, and tune rules for your environment you will meet the need to protect web-facing applications and satisfy PCI requirement 6.6.

As detailed by several recent blog posts, Cloudflare Access is changing the game and your relationship with your corporate VPN. Many organizations rely on VPNs and other segmentation tools to reduce the scope of their cardholder data environment. Cloudflare Access provides another means of segmentation by using Cloudflare’s global network as a VPN service to access internal resources. Additionally, these sessions can be configured to time out after 15 minutes of inactivity to help customers meet requirement 8.1.8!

There are several large providers of time services that most organizations use. However, in 2019 Cloudflare announced our time.cloudflare.com NTP service. The benefits of using our time service rely on the use of our CDN and our global network to provide an advantage in latency and accuracy. Our 200 locations around the world all use anycast to route your packets to our closest server. All of our servers are synchronized with stratum 1 time service providers, and then offer NTP to the general public, similar to how other public NTP providers function. Accurate time services are critical to maintaining accurate audit logging and being able to respond to incidents. By changing your time source to time.cloudflare.com we can help you meet requirement 10.4.3.

Finally, Cloudflare has given our customers the opportunity to configure higher levels of TLS. Currently, you can configure connections between the client and your origin server to use up to TLS 1.3, which exceeds the requirement to use the latest versions of TLS 1.1 or higher  referenced in requirement 4.1!

We use our own products to secure our cardholder data environment and hope that our customers will find these product additions as beneficial and easy to implement as we have.

Learn more about Compliance at Cloudflare

Cloudflare is committed to helping our customers earn their user’s trust by ensuring our products are secure. The Security team is committed to adhering to security compliance certifications and regulations that maintain the security, confidentiality, and availability of company and client information.

In order to help our customers keep track of the latest certifications, Cloudflare continually updates our Compliance certification page – www.cloudflare.com/compliance. Today, you can view our status on all compliance certifications and download our SOC 3 report.

Six years of the GitHub Security Bug Bounty program

Post Syndicated from Brian Anglin original https://github.blog/2020-03-25-six-years-of-the-github-security-bug-bounty-program/

Last month GitHub reached some big milestones for our Security Bug Bounty program. As of February 2020, it’s been six years since we started accepting submissions. Over the years we’ve been able to invest in the bug bounty community through live events, private bug bounties, feature previews, and of course through cash bounties.

We’re excited to announce that we recently passed $1,000,000 in total payments to researchers since we moved our program to HackerOne in 2016. We paid out over half of our total awards in the last year alone, reaching almost $590,000 in total bounty rewards across our programs. We’ve also been able to provide quick response times to an increasingly large amount of submissions—maintaining an average response time of 17 hours. This is all while seeing a 40 percent increase in submissions since last year. We’re sharing some highlights from the past year along with our upcoming plans for the future.

2019 highlights

Cool bugs

One of my favorite parts of working on the bug bounty program is getting to see the amazing submissions we get from the community. Many of the best submissions show an understanding of GitHub and our technology that rivals that of our own engineering teams. We’ve offered very competitive bounties so we can attract those talented individuals and provide them an incentive to spend time digging deep into our codebase. The community in 2019 did not disappoint.

OAuth flow bypass using cross-site HEAD requests

@not-an-ardvark has a lot of great submissions to our program but this was particularly impactful. He wrote a great post about it in detail that I’ll quickly recap.

GitHub provides a few ways for integrators to interact with our ecosystem. One of the ways integrators can use GitHub is via OAuth applications which allow the application to take actions on behalf of a GitHub user. Before allowing access to a user’s data, an OAuth application must redirect the user to GitHub.com, allowing them to review the requested permissions and explicitly authorize the application. @not-an-ardvark found a way to bypass our controls to authorize OAuth applications without any user interaction. Let’s get into how this happened.

When we process state changing requests on GitHub.com, such as authorizing an OAuth application, we rely on Ruby on Rails’ Cross Site Request Forgery (CSRF) protection. We inject a special token into the DOM of every form element that we validate when receiving POST requests. The OAuth application authorization flow uses POST requests which require a valid CSRF token. However, the OAuth controller incorrectly allowed both POST and HEAD requests to trigger the authorization logic. We skip CSRF validation when processing HEAD requests since they’re not typically state changing. This allowed a malicious site to automatically authorize an OAuth application without any user interaction.

Due to the severity of the vulnerability, we needed to patch it as quickly as possible. We worked closely with the engineering team and shipped a fix to GitHub users within three hours of receiving the submission. We also conducted a full investigation with SIRT engineers and confirmed that this vulnerability wasn’t exploited in the wild. Additionally, we rolled out patches for GitHub Enterprise Server for all supported versions. We rewarded @not-an-aardvark with $25,000 for the severity of the vulnerability and their detailed writeup in their submission.

This bug demonstrates the important role that researchers play in our overall security. By identifying this issue via our bug bounty program, we were able to protect our users by patching the issue and validating that it wasn’t previously exploited.

GitHub.com remote code execution through command injection

@ajxchapman achieved remote code execution in GitHub.com by triggering command injection in our Mercurial import feature. The import logic didn’t correctly sanitize branch names which allowed a maliciously crafted branch name to execute code on our servers. Since the import feature is quite complicated, we’ve traditionally run the import code in a sandbox on dedicated servers isolated from our production network. This isolation limited the impact of the vulnerability, and we were able to quickly release a fix for GitHub.com and backported the fix for GitHub Enterprise Server customers. We also audited the import logic for similar issues and confirmed from our logging systems that this wasn’t exploited in the wild.

What makes this bug particularly interesting is the root cause: it was ultimately caused by an outdated dependency. The bug existed in a dependency that handles code imports and was previously fixed upstream. However, we failed to keep up with the latest version and were ultimately vulnerable to this issue. This issue highlights how critical dependency management is to the overall success of a security program. GitHub continues to invest in dependency management tooling to keep us and our customers secure. Find more of Alex’s work on his personal blog.

Expanded scope

GitHub released many new features in 2019 that were added to our Security Bug Bounty scope:

  • Pull reminders added functionality to help keep engineers informed of new pull requests that need attention. We included the solution into our core application and existing Slack integration.
  • Automated security updates (formerly Dependabot) added a better way to track vulnerabilities in dependencies since it automatically opens new pull requests updating the version of a dependency when it finds a new security fix.
  • GitHub for mobile is GitHub’s first presence in the App Store. This brought new requirements of our API and new security concerns in our application. We’re delivering the same security and functionality that’s available on GitHub.com.
  • GitHub Actions is one of GitHub’s biggest releases since pull requests and whole classes of new security corner cases. Through close collaboration with our engineering partners, we’ve provided users the ability to run their code right on GitHub.com.
  • Semmle’s LGTM tool was a significant addition to our suite of security tools, like Dependabot and the Maintainer Security Advisories. LGTM allows our users to scan for potential security issues in their code on every pull request.

We’ve had several valuable submissions that influenced the development of these products significantly. We paid out over $20,000 in bounties for vulnerabilities affecting the products in this expanded scope, and we’re excited to continue expanding our Bug Bounty scope as GitHub grows.


In August 2019, we returned to Las Vegas to participate in our second H1-702 event. This event invited the top hackers from HackerOne’s platform to join us along with two other companies for three nights of live hacking. We were excited to participate and wanted to give researchers every incentive to dig deep into our application. We even added a bunch of bonuses on top of our base payouts, including bonuses for Best Proof of Concept, Longest Exploit Chain, and RCE. We also set up a CTF on GitHub.com to direct researchers to some of our newest attack surfaces. Lastly, we hid flags in a Maintainer Security Advisory and GitHub Package Registry with bonuses for every flag. We received positive feedback from some of our researchers about our CTF and will continue to include a CTF component in future events.

Overall, we paid out over $155,000 to researchers in one night, with half those rewards for high or critical severity issues. We can’t express how important live-hacking events, like H1-702, are to our bug bounty program. We look forward to more live-hacking events in the future and other new and innovative ways to engage the community.

Private bug bounty

Beyond the wide scope of our public program, we conducted an invite-only program where we preview features to researchers before they’re launched to everyone. These private programs allow us to work closely with a small group, and give us the opportunity to find bugs before they can affect the majority of our users. We’ve paid out just over $37,000 via our private program this year, and many of these findings were fixed before new features reached a significant number of our customers.

Actions CI/CD

Following the success of our first private bug bounty targeting GitHub Actions, we wanted to re-run the private program to target the most recent iteration of our GitHub Actions product. We used what we learned in our first bug bounty to secure the product against similar issues. The community accepted the challenge and found novel bugs in our second iteration.

Automated security updates (formerly Dependabot)

Just like any combination of two complex systems, the acquisition of Dependabot presented a unique challenge for our security team in integrating these two separate architectures. We used the private bug bounty to supplement our own security review of these new services. The findings from the private bug bounty program greatly informed how we integrated Dependabot with GitHub.com. We were also able to surface a few issues before rolling it out.

Pull reminders

Like Dependabot, pull reminders required the same care and attention to ensure a secure transition from an integration to a first-party GitHub product. Pull reminders also added more complexity through its connection to Slack. Our own Slack integration provided a foundation for this feature, but there was significant re-architecture and development to tie these two features together. Again, we turned to our bug bounty community to test our pull reminder integration before releasing the feature widely.

2020 initiatives

We have a lot of plans for 2020 and want to highlight some of our upcoming changes.

Security Lab bounty program

We launched the GitHub Security Lab bounty program to incentivize researchers to help us secure all open source software. The new program rewards community members who write CodeQL queries that detect entire vulnerability classes so that the rest of the community can run those queries against their own projects. This results in removing vulnerabilities at scale.

Making a contribution to this program not only influences the global state of software security,  but also prevents similar vulnerabilities from being released in the future. This is an exciting twist on our traditional bug bounty program, and we’re excited to see researchers using our new CodeQL tooling. To date, we received 20 submissions and awarded almost $21,000, with hundreds of vulnerabilities fixed across the OSS ecosystem as a direct result.

CVEs and disclosure

This year, we’re assigning CVEs to bounty submissions which affect GitHub Enterprise Server. This is a big step forward in consistently communicating the state of our software to our customers, but also provides another accolade for our researchers who identify vulnerabilities in GitHub Enterprise Server.

Get involved

Are you excited by the new additions to our program? Get involved! Visit the GitHub Security Bug Bounty page for details of our scope, rules, and rewards. We can’t wait to make GitHub better for everyone with the help of your submissions.

Learn more about the GitHub Security Bug Bounty

The post Six years of the GitHub Security Bug Bounty program appeared first on The GitHub Blog.

Speeding up Linux disk encryption

Post Syndicated from Ignat Korchagin original https://blog.cloudflare.com/speeding-up-linux-disk-encryption/

Speeding up Linux disk encryption

Data encryption at rest is a must-have for any modern Internet company. Many companies, however, don’t encrypt their disks, because they fear the potential performance penalty caused by encryption overhead.

Encrypting data at rest is vital for Cloudflare with more than 200 data centres across the world. In this post, we will investigate the performance of disk encryption on Linux and explain how we made it at least two times faster for ourselves and our customers!

Encrypting data at rest

When it comes to encrypting data at rest there are several ways it can be implemented on a modern operating system (OS). Available techniques are tightly coupled with a typical OS storage stack. A simplified version of the storage stack and encryption solutions can be found on the diagram below:

Speeding up Linux disk encryption

On the top of the stack are applications, which read and write data in files (or streams). The file system in the OS kernel keeps track of which blocks of the underlying block device belong to which files and translates these file reads and writes into block reads and writes, however the hardware specifics of the underlying storage device is abstracted away from the filesystem. Finally, the block subsystem actually passes the block reads and writes to the underlying hardware using appropriate device drivers.

The concept of the storage stack is actually similar to the well-known network OSI model, where each layer has a more high-level view of the information and the implementation details of the lower layers are abstracted away from the upper layers. And, similar to the OSI model, one can apply encryption at different layers (think about TLS vs IPsec or a VPN).

For data at rest we can apply encryption either at the block layers (either in hardware or in software) or at the file level (either directly in applications or in the filesystem).

Block vs file encryption

Generally, the higher in the stack we apply encryption, the more flexibility we have. With application level encryption the application maintainers can apply any encryption code they please to any particular data they need. The downside of this approach is they actually have to implement it themselves and encryption in general is not very developer-friendly: one has to know the ins and outs of a specific cryptographic algorithm, properly generate keys, nonces, IVs etc. Additionally, application level encryption does not leverage OS-level caching and Linux page cache in particular: each time the application needs to use the data, it has to either decrypt it again, wasting CPU cycles, or implement its own decrypted “cache”, which introduces more complexity to the code.

File system level encryption makes data encryption transparent to applications, because the file system itself encrypts the data before passing it to the block subsystem, so files are encrypted regardless if the application has crypto support or not. Also, file systems can be configured to encrypt only a particular directory or have different keys for different files. This flexibility, however, comes at a cost of a more complex configuration. File system encryption is also considered less secure than block device encryption as only the contents of the files are encrypted. Files also have associated metadata, like file size, the number of files, the directory tree layout etc., which are still visible to a potential adversary.

Encryption down at the block layer (often referred to as disk encryption or full disk encryption) also makes data encryption transparent to applications and even whole file systems. Unlike file system level encryption it encrypts all data on the disk including file metadata and even free space. It is less flexible though – one can only encrypt the whole disk with a single key, so there is no per-directory, per-file or per-user configuration. From the crypto perspective, not all cryptographic algorithms can be used as the block layer doesn’t have a high-level overview of the data anymore, so it needs to process each block independently. Most common algorithms require some sort of block chaining to be secure, so are not applicable to disk encryption. Instead, special modes were developed just for this specific use-case.

So which layer to choose? As always, it depends… Application and file system level encryption are usually the preferred choice for client systems because of the flexibility. For example, each user on a multi-user desktop may want to encrypt their home directory with a key they own and leave some shared directories unencrypted. On the contrary, on server systems, managed by SaaS/PaaS/IaaS companies (including Cloudflare) the preferred choice is configuration simplicity and security – with full disk encryption enabled any data from any application is automatically encrypted with no exceptions or overrides. We believe that all data needs to be protected without sorting it into "important" vs "not important" buckets, so the selective flexibility the upper layers provide is not needed.

Hardware vs software disk encryption

When encrypting data at the block layer it is possible to do it directly in the storage hardware, if the hardware supports it. Doing so usually gives better read/write performance and consumes less resources from the host. However, since most hardware firmware is proprietary, it does not receive as much attention and review from the security community. In the past this led to flaws in some implementations of hardware disk encryption, which render the whole security model useless. Microsoft, for example, started to prefer software-based disk encryption since then.

We didn’t want to put our data and our customers’ data to the risk of using potentially insecure solutions and we strongly believe in open-source. That’s why we rely only on software disk encryption in the Linux kernel, which is open and has been audited by many security professionals across the world.

Linux disk encryption performance

We aim not only to save bandwidth costs for our customers, but to deliver content to Internet users as fast as possible.

At one point we noticed that our disks were not as fast as we would like them to be. Some profiling as well as a quick A/B test pointed to Linux disk encryption. Because not encrypting the data (even if it is supposed-to-be a public Internet cache) is not a sustainable option, we decided to take a closer look into Linux disk encryption performance.

Device mapper and dm-crypt

Linux implements transparent disk encryption via a dm-crypt module and dm-crypt itself is part of device mapper kernel framework. In a nutshell, the device mapper allows pre/post-process IO requests as they travel between the file system and the underlying block device.

dm-crypt in particular encrypts "write" IO requests before sending them further down the stack to the actual block device and decrypts "read" IO requests before sending them up to the file system driver. Simple and easy! Or is it?

Benchmarking setup

For the record, the numbers in this post were obtained by running specified commands on an idle Cloudflare G9 server out of production. However, the setup should be easily reproducible on any modern x86 laptop.

Generally, benchmarking anything around a storage stack is hard because of the noise introduced by the storage hardware itself. Not all disks are created equal, so for the purpose of this post we will use the fastest disks available out there – that is no disks.

Instead Linux has an option to emulate a disk directly in RAM. Since RAM is much faster than any persistent storage, it should introduce little bias in our results.

The following command creates a 4GB ramdisk:

$ sudo modprobe brd rd_nr=1 rd_size=4194304
$ ls /dev/ram0

Now we can set up a dm-crypt instance on top of it thus enabling encryption for the disk. First, we need to generate the disk encryption key, "format" the disk and specify a password to unlock the newly generated key.

$ fallocate -l 2M crypthdr.img
$ sudo cryptsetup luksFormat /dev/ram0 --header crypthdr.img

This will overwrite data on crypthdr.img irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase:
Verify passphrase:

Those who are familiar with LUKS/dm-crypt might have noticed we used a LUKS detached header here. Normally, LUKS stores the password-encrypted disk encryption key on the same disk as the data, but since we want to compare read/write performance between encrypted and unencrypted devices, we might accidentally overwrite the encrypted key during our benchmarking later. Keeping the encrypted key in a separate file avoids this problem for the purposes of this post.

Now, we can actually "unlock" the encrypted device for our testing:

$ sudo cryptsetup open --header crypthdr.img /dev/ram0 encrypted-ram0
Enter passphrase for /dev/ram0:
$ ls /dev/mapper/encrypted-ram0

At this point we can now compare the performance of encrypted vs unencrypted ramdisk: if we read/write data to /dev/ram0, it will be stored in plaintext. Likewise, if we read/write data to /dev/mapper/encrypted-ram0, it will be decrypted/encrypted on the way by dm-crypt and stored in ciphertext.

It’s worth noting that we’re not creating any file system on top of our block devices to avoid biasing results with a file system overhead.

Measuring throughput

When it comes to storage testing/benchmarking Flexible I/O tester is the usual go-to solution. Let’s simulate simple sequential read/write load with 4K block size on the ramdisk without encryption:

$ sudo fio --filename=/dev/ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=plain
plain: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
Starting 1 process
Run status group 0 (all jobs):
   READ: io=21013MB, aggrb=1126.5MB/s, minb=1126.5MB/s, maxb=1126.5MB/s, mint=18655msec, maxt=18655msec
  WRITE: io=21023MB, aggrb=1126.1MB/s, minb=1126.1MB/s, maxb=1126.1MB/s, mint=18655msec, maxt=18655msec

Disk stats (read/write):
  ram0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

The above command will run for a long time, so we just stop it after a while. As we can see from the stats, we’re able to read and write roughly with the same throughput around 1126 MB/s. Let’s repeat the test with the encrypted ramdisk:

$ sudo fio --filename=/dev/mapper/encrypted-ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=crypt
crypt: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
Starting 1 process
Run status group 0 (all jobs):
   READ: io=1693.7MB, aggrb=150874KB/s, minb=150874KB/s, maxb=150874KB/s, mint=11491msec, maxt=11491msec
  WRITE: io=1696.4MB, aggrb=151170KB/s, minb=151170KB/s, maxb=151170KB/s, mint=11491msec, maxt=11491msec

Whoa, that’s a drop! We only get ~147 MB/s now, which is more than 7 times slower! And this is on a totally idle machine!

Maybe, crypto is just slow

The first thing we considered is to ensure we use the fastest crypto. cryptsetup allows us to benchmark all the available crypto implementations on the system to select the best one:

$ sudo cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1      1340890 iterations per second for 256-bit key
PBKDF2-sha256    1539759 iterations per second for 256-bit key
PBKDF2-sha512    1205259 iterations per second for 256-bit key
PBKDF2-ripemd160  967321 iterations per second for 256-bit key
PBKDF2-whirlpool  720175 iterations per second for 256-bit key
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b   969.7 MiB/s  3110.0 MiB/s
 serpent-cbc   128b           N/A           N/A
 twofish-cbc   128b           N/A           N/A
     aes-cbc   256b   756.1 MiB/s  2474.7 MiB/s
 serpent-cbc   256b           N/A           N/A
 twofish-cbc   256b           N/A           N/A
     aes-xts   256b  1823.1 MiB/s  1900.3 MiB/s
 serpent-xts   256b           N/A           N/A
 twofish-xts   256b           N/A           N/A
     aes-xts   512b  1724.4 MiB/s  1765.8 MiB/s
 serpent-xts   512b           N/A           N/A
 twofish-xts   512b           N/A           N/A

It seems aes-xts with a 256-bit data encryption key is the fastest here. But which one are we actually using for our encrypted ramdisk?

$ sudo dmsetup table /dev/mapper/encrypted-ram0
0 8388608 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 1:0 0

We do use aes-xts with a 256-bit data encryption key (count all the zeroes conveniently masked by dmsetup tool – if you want to see the actual bytes, add the --showkeys option to the above command). The numbers do not add up however: cryptsetup benchmark tells us above not to rely on the results, as "Tests are approximate using memory only (no storage IO)", but that is exactly how we’ve set up our experiment using the ramdisk. In a somewhat worse case (assuming we’re reading all the data and then encrypting/decrypting it sequentially with no parallelism) doing back-of-the-envelope calculation we should be getting around (1126 * 1823) / (1126 + 1823) =~696 MB/s, which is still quite far from the actual 147 * 2 = 294 MB/s (total for reads and writes).

dm-crypt performance flags

While reading the cryptsetup man page we noticed that it has two options prefixed with --perf-, which are probably related to performance tuning. The first one is --perf-same_cpu_crypt with a rather cryptic description:

Perform encryption using the same cpu that IO was submitted on.  The default is to use an unbound workqueue so that encryption work is automatically balanced between available CPUs.  This option is only relevant for open action.

So we enable the option

$ sudo cryptsetup close encrypted-ram0
$ sudo cryptsetup open --header crypthdr.img --perf-same_cpu_crypt /dev/ram0 encrypted-ram0

Note: according to the latest man page there is also a cryptsetup refresh command, which can be used to enable these options live without having to "close" and "re-open" the encrypted device. Our cryptsetup however didn’t support it yet.

Verifying if the option has been really enabled:

$ sudo dmsetup table encrypted-ram0
0 8388608 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 1:0 0 1 same_cpu_crypt

Yes, we can now see same_cpu_crypt in the output, which is what we wanted. Let’s rerun the benchmark:

$ sudo fio --filename=/dev/mapper/encrypted-ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=crypt
crypt: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
Starting 1 process
Run status group 0 (all jobs):
   READ: io=1596.6MB, aggrb=139811KB/s, minb=139811KB/s, maxb=139811KB/s, mint=11693msec, maxt=11693msec
  WRITE: io=1600.9MB, aggrb=140192KB/s, minb=140192KB/s, maxb=140192KB/s, mint=11693msec, maxt=11693msec

Hmm, now it is ~136 MB/s which is slightly worse than before, so no good. What about the second option --perf-submit_from_crypt_cpus:

Disable offloading writes to a separate thread after encryption.  There are some situations where offloading write bios from the encryption threads to a single thread degrades performance significantly.  The default is to offload write bios to the same thread.  This option is only relevant for open action.

Maybe, we are in the "some situation" here, so let’s try it out:

$ sudo cryptsetup close encrypted-ram0
$ sudo cryptsetup open --header crypthdr.img --perf-submit_from_crypt_cpus /dev/ram0 encrypted-ram0
Enter passphrase for /dev/ram0:
$ sudo dmsetup table encrypted-ram0
0 8388608 crypt aes-xts-plain64 0000000000000000000000000000000000000000000000000000000000000000 0 1:0 0 1 submit_from_crypt_cpus

And now the benchmark:

$ sudo fio --filename=/dev/mapper/encrypted-ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=crypt
crypt: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
Starting 1 process
Run status group 0 (all jobs):
   READ: io=2066.6MB, aggrb=169835KB/s, minb=169835KB/s, maxb=169835KB/s, mint=12457msec, maxt=12457msec
  WRITE: io=2067.7MB, aggrb=169965KB/s, minb=169965KB/s, maxb=169965KB/s, mint=12457msec, maxt=12457msec

~166 MB/s, which is a bit better, but still not good…

Asking the community

Being desperate we decided to seek support from the Internet and posted our findings to the dm-crypt mailing list, but the response we got was not very encouraging:

If the numbers disturb you, then this is from lack of understanding on your side. You are probably unaware that encryption is a heavy-weight operation…

We decided to make a scientific research on this topic by typing "is encryption expensive" into Google Search and one of the top results, which actually contains meaningful measurements, is… our own post about cost of encryption, but in the context of TLS! This is a fascinating read on its own, but the gist is: modern crypto on modern hardware is very cheap even at Cloudflare scale (doing millions of encrypted HTTP requests per second). In fact, it is so cheap that Cloudflare was the first provider to offer free SSL/TLS for everyone.

Digging into the source code

When trying to use the custom dm-crypt options described above we were curious why they exist in the first place and what is that "offloading" all about. Originally we expected dm-crypt to be a simple "proxy", which just encrypts/decrypts data as it flows through the stack. Turns out dm-crypt does more than just encrypting memory buffers and a (simplified) IO traverse path diagram is presented below:

Speeding up Linux disk encryption

When the file system issues a write request, dm-crypt does not process it immediately – instead it puts it into a workqueue named "kcryptd". In a nutshell, a kernel workqueue just schedules some work (encryption in this case) to be performed at some later time, when it is more convenient. When "the time" comes, dm-crypt sends the request to Linux Crypto API for actual encryption. However, modern Linux Crypto API is asynchronous as well, so depending on which particular implementation your system will use, most likely it will not be processed immediately, but queued again for "later time". When Linux Crypto API will finally do the encryption, dm-crypt may try to sort pending write requests by putting each request into a red-black tree. Then a separate kernel thread again at "some time later" actually takes all IO requests in the tree and sends them down the stack.

Now for read requests: this time we need to get the encrypted data first from the hardware, but dm-crypt does not just ask for the driver for the data, but queues the request into a different workqueue named "kcryptd_io". At some point later, when we actually have the encrypted data, we schedule it for decryption using the now familiar "kcryptd" workqueue. "kcryptd" will send the request to Linux Crypto API, which may decrypt the data asynchronously as well.

To be fair the request does not always traverse all these queues, but the important part here is that write requests may be queued up to 4 times in dm-crypt and read requests up to 3 times. At this point we were wondering if all this extra queueing can cause any performance issues. For example, there is a nice presentation from Google about the relationship between queueing and tail latency. One key takeaway from the presentation is:

A significant amount of tail latency is due to queueing effects

So, why are all these queues there and can we remove them?

Git archeology

No-one writes more complex code just for fun, especially for the OS kernel. So all these queues must have been put there for a reason. Luckily, the Linux kernel source is managed by git, so we can try to retrace the changes and the decisions around them.

The "kcryptd" workqueue was in the source since the beginning of the available history with the following comment:

Needed because it would be very unwise to do decryption in an interrupt context, so bios returning from read requests get queued here.

So it was for reads only, but even then – why do we care if it is interrupt context or not, if Linux Crypto API will likely use a dedicated thread/queue for encryption anyway? Well, back in 2005 Crypto API was not asynchronous, so this made perfect sense.

In 2006 dm-crypt started to use the "kcryptd" workqueue not only for encryption, but for submitting IO requests:

This patch is designed to help dm-crypt comply with the new constraints imposed by the following patch in -mm: md-dm-reduce-stack-usage-with-stacked-block-devices.patch

It seems the goal here was not to add more concurrency, but rather reduce kernel stack usage, which makes sense again as the kernel has a common stack across all the code, so it is a quite limited resource. It is worth noting, however, that the Linux kernel stack has been expanded in 2014 for x86 platforms, so this might not be a problem anymore.

A first version of "kcryptd_io" workqueue was added in 2007 with the intent to avoid:

starvation caused by many requests waiting for memory allocation…

The request processing was bottlenecking on a single workqueue here, so the solution was to add another one. Makes sense.

We are definitely not the first ones experiencing performance degradation because of extensive queueing: in 2011 a change was introduced to conditionally revert some of the queueing for read requests:

If there is enough memory, code can directly submit bio instead queuing this operation in a separate thread.

Unfortunately, at that time Linux kernel commit messages were not as verbose as today, so there is no performance data available.

In 2015 dm-crypt started to sort writes in a separate "dmcrypt_write" thread before sending them down the stack:

On a multiprocessor machine, encryption requests finish in a different order than they were submitted. Consequently, write requests would be submitted in a different order and it could cause severe performance degradation.

It does make sense as sequential disk access used to be much faster than the random one and dm-crypt was breaking the pattern. But this mostly applies to spinning disks, which were still dominant in 2015. It may not be as important with modern fast SSDs (including NVME SSDs).

Another part of the commit message is worth mentioning:

…in particular it enables IO schedulers like CFQ to sort more effectively…

It mentions the performance benefits for the CFQ IO scheduler, but Linux schedulers have improved since then to the point that CFQ scheduler has been removed from the kernel in 2018.

The same patchset replaces the sorting list with a red-black tree:

In theory the sorting should be performed by the underlying disk scheduler, however, in practice the disk scheduler only accepts and sorts a finite number of requests. To allow the sorting of all requests, dm-crypt needs to implement its own sorting.

The overhead associated with rbtree-based sorting is considered negligible so it is not used conditionally.

All that make sense, but it would be nice to have some backing data.

Interestingly, in the same patchset we see the introduction of our familiar "submit_from_crypt_cpus" option:

There are some situations where offloading write bios from the encryption threads to a single thread degrades performance significantly

Overall, we can see that every change was reasonable and needed, however things have changed since then:

  • hardware became faster and smarter
  • Linux resource allocation was revisited
  • coupled Linux subsystems were rearchitected

And many of the design choices above may not be applicable to modern Linux.

The "clean-up"

Based on the research above we decided to try to remove all the extra queueing and asynchronous behaviour and revert dm-crypt to its original purpose: simply encrypt/decrypt IO requests as they pass through. But for the sake of stability and further benchmarking we ended up not removing the actual code, but rather adding yet another dm-crypt option, which bypasses all the queues/threads, if enabled. The flag allows us to switch between the current and new behaviour at runtime under full production load, so we can easily revert our changes should we see any side-effects. The resulting patch can be found on the Cloudflare GitHub Linux repository.

Synchronous Linux Crypto API

From the diagram above we remember that not all queueing is implemented in dm-crypt. Modern Linux Crypto API may also be asynchronous and for the sake of this experiment we want to eliminate queues there as well. What does "may be" mean, though? The OS may contain different implementations of the same algorithm (for example, hardware-accelerated AES-NI on x86 platforms and generic C-code AES implementations). By default the system chooses the "best" one based on the configured algorithm priority. dm-crypt allows overriding this behaviour and request a particular cipher implementation using the capi: prefix. However, there is one problem. Let us actually check the available AES-XTS (this is our disk encryption cipher, remember?) implementations on our system:

$ grep -A 11 'xts(aes)' /proc/crypto
name         : xts(aes)
driver       : xts(ecb(aes-generic))
module       : kernel
priority     : 100
refcnt       : 7
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 32
max keysize  : 64
name         : __xts(aes)
driver       : cryptd(__xts-aes-aesni)
module       : cryptd
priority     : 451
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 32
max keysize  : 64
name         : xts(aes)
driver       : xts-aes-aesni
module       : aesni_intel
priority     : 401
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 32
max keysize  : 64
name         : __xts(aes)
driver       : __xts-aes-aesni
module       : aesni_intel
priority     : 401
refcnt       : 7
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 32
max keysize  : 64

We want to explicitly select a synchronous cipher from the above list to avoid queueing effects in threads, but the only two supported are xts(ecb(aes-generic)) (the generic C implementation) and __xts-aes-aesni (the x86 hardware-accelerated implementation). We definitely want the latter as it is much faster (we’re aiming for performance here), but it is suspiciously marked as internal (see internal: yes). If we check the source code:

Mark a cipher as a service implementation only usable by another cipher and never by a normal user of the kernel crypto API

So this cipher is meant to be used only by other wrapper code in the Crypto API and not outside it. In practice this means, that the caller of the Crypto API needs to explicitly specify this flag, when requesting a particular cipher implementation, but dm-crypt does not do it, because by design it is not part of the Linux Crypto API, rather an "external" user. We already patch the dm-crypt module, so we could as well just add the relevant flag. However, there is another problem with AES-NI in particular: x86 FPU. "Floating point" you say? Why do we need floating point math to do symmetric encryption which should only be about bit shifts and XOR operations? We don’t need the math, but AES-NI instructions use some of the CPU registers, which are dedicated to the FPU. Unfortunately the Linux kernel does not always preserve these registers in interrupt context for performance reasons (saving/restoring FPU is expensive). But dm-crypt may execute code in interrupt context, so we risk corrupting some other process data and we go back to "it would be very unwise to do decryption in an interrupt context" statement in the original code.

Our solution to address the above was to create another somewhat "smart" Crypto API module. This module is synchronous and does not roll its own crypto, but is just a "router" of encryption requests:

  • if we can use the FPU (and thus AES-NI) in the current execution context, we just forward the encryption request to the faster, "internal" __xts-aes-aesni implementation (and we can use it here, because now we are part of the Crypto API)
  • otherwise, we just forward the encryption request to the slower, generic C-based xts(ecb(aes-generic)) implementation

Using the whole lot

Let’s walk through the process of using it all together. The first step is to grab the patches and recompile the kernel (or just compile dm-crypt and our xtsproxy modules).

Next, let’s restart our IO workload in a separate terminal, so we can make sure we can reconfigure the kernel at runtime under load:

$ sudo fio --filename=/dev/mapper/encrypted-ram0 --readwrite=readwrite --bs=4k --direct=1 --loops=1000000 --name=crypt
crypt: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
Starting 1 process

In the main terminal make sure our new Crypto API module is loaded and available:

$ sudo modprobe xtsproxy
$ grep -A 11 'xtsproxy' /proc/crypto
driver       : xts-aes-xtsproxy
module       : xtsproxy
priority     : 0
refcnt       : 0
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 32
max keysize  : 64
ivsize       : 16
chunksize    : 16

Reconfigure the encrypted disk to use our newly loaded module and enable our patched dm-crypt flag (we have to use low-level dmsetup tool and cryptsetup obviously is not aware of our modifications):

$ sudo dmsetup table encrypted-ram0 --showkeys | sed 's/aes-xts-plain64/capi:xts-aes-xtsproxy-plain64/' | sed 's/$/ 1 force_inline/' | sudo dmsetup reload encrypted-ram0

We just "loaded" the new configuration, but for it to take effect, we need to suspend/resume the encrypted device:

$ sudo dmsetup suspend encrypted-ram0 && sudo dmsetup resume encrypted-ram0

And now observe the result. We may go back to the other terminal running the fio job and look at the output, but to make things nicer, here’s a snapshot of the observed read/write throughput in Grafana:

Speeding up Linux disk encryption
Speeding up Linux disk encryption

Wow, we have more than doubled the throughput! With the total throughput of ~640 MB/s we’re now much closer to the expected ~696 MB/s from above. What about the IO latency? (The await statistic from the iostat reporting tool):

Speeding up Linux disk encryption

The latency has been cut in half as well!

To production

So far we have been using a synthetic setup with some parts of the full production stack missing, like file systems, real hardware and most importantly, production workload. To ensure we’re not optimising imaginary things, here is a snapshot of the production impact these changes bring to the caching part of our stack:

Speeding up Linux disk encryption

This graph represents a three-way comparison of the worst-case response times (99th percentile) for a cache hit in one of our servers. The green line is from a server with unencrypted disks, which we will use as baseline. The red line is from a server with encrypted disks with the default Linux disk encryption implementation and the blue line is from a server with encrypted disks and our optimisations enabled. As we can see the default Linux disk encryption implementation has a significant impact on our cache latency in worst case scenarios, whereas the patched implementation is indistinguishable from not using encryption at all. In other words the improved encryption implementation does not have any impact at all on our cache response speed, so we basically get it for free! That’s a win!

We’re just getting started

This post shows how an architecture review can double the performance of a system. Also we reconfirmed that modern cryptography is not expensive and there is usually no excuse not to protect your data.

We are going to submit this work for inclusion in the main kernel source tree, but most likely not in its current form. Although the results look encouraging we have to remember that Linux is a highly portable operating system: it runs on powerful servers as well as small resource constrained IoT devices and on many other CPU architectures as well. The current version of the patches just optimises disk encryption for a particular workload on a particular architecture, but Linux needs a solution which runs smoothly everywhere.

That said, if you think your case is similar and you want to take advantage of the performance improvements now, you may grab the patches and hopefully provide feedback. The runtime flag makes it easy to toggle the functionality on the fly and a simple A/B test may be performed to see if it benefits any particular case or setup. These patches have been running across our wide network of more than 200 data centres on five generations of hardware, so can be reasonably considered stable. Enjoy both performance and security from Cloudflare for all!

Speeding up Linux disk encryption

Deploying security.txt: how Cloudflare’s security team builds on Workers

Post Syndicated from David Haynes original https://blog.cloudflare.com/security-dot-txt/

Deploying security.txt: how Cloudflare’s security team builds on Workers

Deploying security.txt: how Cloudflare’s security team builds on Workers

When the security team at Cloudflare takes on new projects, we approach them with the goal of achieving the “builder first mindset” whereby we design, develop, and deploy solutions just as any standard engineering team would. Additionally, we aim to dogfood our products wherever possible. Cloudflare as a security platform offers a lot of functionality that is vitally important to us, including, but not limited to, our WAF, Workers platform, and Cloudflare Access. We get a lot of value out of using Cloudflare to secure Cloudflare. Not only does this allow us to test the security of our products; it provides us an avenue of direct feedback to help improve the roadmaps for engineering projects.

One specific product that we get a lot of use out of is our serverless platform, Cloudflare Workers. With it, we can have incredible flexibility in the types of applications that we are able to build and deploy to our edge. An added bonus here is that our team does not have to manage a single server that our code runs on.

Today, we’re launching support for the security.txt initiative through Workers to help give security researchers a common location to learn about how to communicate with our team. In this post, I’m going to focus on some of the benefits of managing security projects through Workers, detail how we built out the mechanism that we’re using to manage security.txt on cloudflare.com, and highlight some other use cases that our team has built on Workers in the past few months.

Building on Workers

The Workers platform is designed to let anyone deploy code to our edge network which currently spans across 200 cities in 90 countries. As a result of this, applications deployed on the platform perform exceptionally well, while maintaining high reliability. You can implement nearly any logic imaginable on top of zones in your Cloudflare account with almost no effort.

One of the biggest immediate benefits of this is the idea that there are no servers for our team to maintain. Far too often I have seen situations where a long running application is deployed onto an instanced machine within a cloud environment and immediately forgotten about. In some cases, outdated software or poorly provisioned permissions can open up dangerous vectors for malicious actors if a compromise is occurring. Our team can deploy new projects with the confidence that they will remain secure. The Workers runtime environment has many wins such as the fact that it receives automatic security updates and that we don’t need to maintain the software stack, just the application logic.

Another benefit is that since Workers executes JavaScript, we can dream up complex applications and rapidly deploy them as full applications in production without a huge investment in engineering time. This encourages rapid experimentation and iteration while achieving high impact with consistent performance. A couple of examples:

  • Secure code review – On every pull request made to a code repository, a Worker runs a set of security curated rules and posts comments if there are matches.
  • CSP nonces and HTML rewriting – A Worker generates random nonces and uses the HTMLRewriter API to mutate responses with dynamic content security policies on our dashboard.
  • Authentication on legacy applications –  Using the Web Crypto API, a Worker sits in front of legacy origins and issues, validates, and signs JWTs to control access.

Stay tuned for future blog posts where we plan to dive deeper on these initiatives and more!


Our team regularly engages with external security researchers through our HackerOne program and since about 2014, the details of our program can be found on a dedicated page on our marketing site. This has worked quite well in starting up our program and allows anyone to contact our team if they have a vulnerability to report. However, every so often we’d see that there were issues with this approach. Specifically, there were cases of vulnerabilities being submitted to our support staff who would then tell the reporter about the disclosure page which then directs them to HackerOne resulting in an overall error prone experience for everyone involved. Other times, the team that manages our social media would direct researchers to us.

Security researchers, having done the hard work to find a vulnerability, face challenges to responsibly disclose it. They need to be able to quickly locate contact information, consume it and disclose without delay. The security.txt initiative addresses this problem by defining a common location and format for organizations to provide this information to researchers. A specification for this was submitted to the IETF with the description:

“When security vulnerabilities are discovered by researchers, proper reporting channels are often lacking.  As a result, vulnerabilities may be left unreported.  This document defines a format (“security.txt”) to help organizations describe their vulnerability disclosure practices to make it easier for researchers to report vulnerabilities.”

Employing the guidance in security.txt doesn’t solve the problem entirely but it is becoming common best practice amongst many other security conscious companies. Over time I expect that researchers will rely on security.txt for information gathering. In fact, some integrations are already being built by the community into security tools to automatically fetch information from a company’s security.txt!

Our security.txt can be found here: https://cloudflare.com/.well-known/security.txt

security.txt as a service — built on Cloudflare Workers

Deploying security.txt: how Cloudflare’s security team builds on Workers
Cloudflare’s security.txt

When scoping out the work for deploying security.txt we quickly realized that there were a few areas we wanted to address when deploying the service:

  • Automation – Deploys should involve as few humans as possible.
  • Ease of maintenance – Necessary changes (specification updates, key rotations, etc) should be a single commit and deploy away.
  • Version control security.txt is inherently a sensitive file, retaining attribution of every change made is valuable from an auditing perspective.

Certainly, we could manage this project through a manual process involving extensive documentation and coordination for manual deployments but we felt it was important to build this out as a full service that was easily maintainable. Manual maintenance and deployments do not necessarily scale well over time, especially for something that isn’t touched that often. We wanted to make that process as easy as possible since security.txt is meant to be a regularly maintained, living document.

We quickly turned to Workers for the task. Using the Wrangler CLI we can achieve the ease of deployment and automation requirements as well as track the project in git. In under half an hour I was able to build out a full prototype that addressed all of the requirements of the Internet-Draft and deploy it to a staging instance of our website. Since then, members from our team made some revisions to the text itself, updated the expiration on our PGP key, and bumped up to the latest draft version.

One cool decision we made is that the security.txt file is created from a template at build time which allows us to perform dynamic operations on our baseline security.txt. For example, Section 3.5.5 of the draft calls for a date and time after which the data contained in the “security.txt” file is considered stale and should not be used. To address this field, we wrote a short node.js script that automatically sets an expiration of 365 days after the point of deployment, encouraging regular updates:

const dayjs = require('dayjs')
const fs = require('fs')

const main = async () => {
    `\nExpires: ${dayjs()
      .add(365, 'day')
      // Thu, 31 Dec 2020 18:37:07 -0800
      .format('ddd, D MMM YYYY HH:mm:ss ZZ')}\n`,
    function(err) {
      if (err) throw err
      console.log('Wrote expiration field!')

We’ve also leveraged this benefit in our Make targets where at build time we ensure that the deployed security.txt is clearsigned with the [email protected] PGP key:

	gpg --local-user [email protected] -o src/txt/security.txt --clearsign src/txt/security.txt.temp
	rm src/txt/security.txt.temp

And finally, leveraging the multi-route support in wrangler 1.8 alongside the request object that gets passed into our Worker, we can serve our PGP public key and security.txt at the same time on two routes using one Worker, differentiated based on what the eyeball is asking for:

import pubKey from './txt/security-cloudflare-public-06A67236.txt'
import securityTxt from './txt/security.txt'

const handleRequest = async request => {
  const { url } = request
  if (url.includes('/.well-known/security.txt')) {
    return new Response(securityTxt, {
      headers: { 'content-type': 'text/plain; charset=utf-8' }, // security.txt
  } else if (url.includes('/gpg/security-cloudflare-public-06A67236.txt')) {
    return new Response(pubKey, {
      headers: { 'content-type': 'text/plain; charset=utf-8' }, // GPG Public key
  return fetch(request) // Pass to origin

In the interest of transparency and to allow anyone to easily achieve the same wins we did, we’ve open sourced the Worker itself for anyone who wants to deploy this service onto their Cloudflare zone. It just takes a few minutes of time.

You can find the project on GitHub: https://github.com/cloudflare/securitytxt-worker

What’s next?

The security team at Cloudflare has grown a significant amount in the past year. This is most evident in terms of our headcount (and still growing!) but also in how we take on new projects. We’re hoping to share more stories like this in the future and open source some of the other security services that we are building on Workers to help others achieve the same security wins that we are achieving with the platform.

Using Cloudflare Gateway to Stay Productive (and turn off distractions) While Working Remotely

Post Syndicated from Irtefa original https://blog.cloudflare.com/using-cloudflare-gateway-to-stay-productive-and-turn-off-distractions-while-working-remotely/

Using Cloudflare Gateway to Stay Productive (and turn off distractions) While Working Remotely

This week, like many of you reading this article, I am working from home. I don’t know about you, but I’ve found it hard to stay focused when the Internet is full of news related to the coronavirus.

CNN. Twitter. Fox News. It doesn’t matter where you look, everyone is vying for your attention. It’s totally riveting…

… and it’s really hard not to get distracted.

It got me annoyed enough that I decided to do something about it. Using Cloudflare’s new product, Cloudflare Gateway, I removed all the online distractions I normally get snared by — at least during working hours.

This blog post isn’t very long, but that’s a function of how easy it is to get Gateway up and running!

Getting Started

To get started, you’ll want to set up Gateway under your Cloudflare account. Head to the Cloudflare for Teams dashboard to set it up for free (if you don’t already have a Cloudflare account, hit the ‘Sign up’ button beneath the login form).

If you are using Gateway for the first time, the dashboard will take you through an onboarding experience:

Using Cloudflare Gateway to Stay Productive (and turn off distractions) While Working Remotely

The onboarding flow will help you set up your first location. A location is usually a physical entity like your home, office, store or a data center.

When you are setting up your location, the dashboard will automatically identify your IP address and create a location using that IP. Gateway will associate requests from your router or device by matching requests with your location by using the linked IP address of your location (for an IPv4 network). If you are curious, you can read more about how Gateway determines your location here.

Before you complete the setup you will have to change your router’s DNS settings by removing the existing DNS resolvers and adding Cloudflare Gateway’s recursive DNS resolvers:


How you configure your DNS settings may vary by router or a device, so we created a page to show you how to change DNS settings for different devices.

You can also watch this video to learn how to setup Gateway:

Deep Work

Next up, in the dashboard, I am going to go to my policies and create a policy that will block my access to distracting sites. You can call your policy anything you want, but I am going to call mine “Deep work.”

Using Cloudflare Gateway to Stay Productive (and turn off distractions) While Working Remotely

And I will add a few websites that I don’t want to get distracted by, like CNN, Fox News and Twitter.

Using Cloudflare Gateway to Stay Productive (and turn off distractions) While Working Remotely

After I add the domains, I hit Save.

If you find the prospect of blocking all of these websites cumbersome, you can use category-based DNS filtering to block all domains that are associated with a category (‘Content categories’ have limited capabilities on Gateway’s free tier).

Using Cloudflare Gateway to Stay Productive (and turn off distractions) While Working Remotely

So if I select Sports, all websites that are related to Sports will now be blocked by Gateway. This will take most people a few minutes to complete.

And once you set the rules by hitting ‘Save’, it will take just seconds for the selected policies to propagate across all of Cloudflare’s data centers, spread across more than 200 cities around the world.

How can I test if Gateway is blocking the websites?

If you now try to go to one of the blocked websites, you will see the following page on your browser:

Using Cloudflare Gateway to Stay Productive (and turn off distractions) While Working Remotely

Cloudflare Gateway is letting your browser know that the website you blocked is unreachable. You can also test if Gateway is working by using dig or nslookup on your machine:

Using Cloudflare Gateway to Stay Productive (and turn off distractions) While Working Remotely

If a domain is blocked, you will see the following in the DNS response status: REFUSED.

This means that the policy you created is working!

And once working hours are over, it’s back to being glued to the latest news.

If you’d rather watch this in video format, here’s one I recorded earlier:

And to everyone dealing with the challenges of COVID-19 and working from home — stay safe!

CERT partners with GitHub Security Lab for automated remediation

Post Syndicated from Nico Waisman original https://github.blog/2020-03-18-cert-partners-with-github-security-lab-for-automated-remediation/

As security researchers, the GitHub Security Lab team constantly embarks on an emotional journey with each new vulnerability challenge. The excitement of starting new research, the disappointment that comes with hitting a plateau, the energy it takes to stay focused and on target…and hopefully, the sheer joy of achieving a tangible result after weeks or months of working on a problem that seemed unsolvable.

Regardless of how proud you are of the results, do you ever get a nagging feeling that maybe you didn’t make enough of an impact? While single bug fixes are worthwhile in improving code, it’s not sufficient enough to improve the state of security of the open source software (OSS) ecosystem as a whole. This holds true especially when you consider that software is always growing and changing—and as vulnerabilities are fixed, new ones are introduced.

Beyond single bug fixes

At GitHub, we host millions of OSS projects which puts us in a unique position to take a different approach with OSS security. We have the power and responsibility to make an impact beyond single bug fixes. This is why a big part of the GitHub Security Lab mission is to find ways to scale our vulnerability hunting efforts and empower others to do the same.

Our goal is to turn single vulnerabilities into hundreds, if not thousands, of bug fixes at a time. Enabled by the GitHub engineering teams, we aim to establish workflows that are open to the community that tackle vulnerabilities at scale on the GitHub platform.

Ultimately, we want to establish feedback loops with the developer and security communities, and act as security facilitators, all while working with the OSS community to secure the software we all depend upon.

We’re taking a deep-dive in the remediation of a security vulnerability with CERT. Learn more about how we found ways to scale our vulnerability hunting efforts and empower others to do the same.

Continue reading on the Security Lab blog

The post CERT partners with GitHub Security Lab for automated remediation appeared first on The GitHub Blog.

Announcing Network Analytics

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/announcing-network-analytics/

Our Analytics Platform

Announcing Network Analytics

Back in March 2019, we released Firewall Analytics which provides insights into HTTP security events across all of Cloudflare’s protection suite; Firewall rule matches, HTTP DDoS Attacks, Site Security Level which harnesses Cloudflare’s threat intelligence, and more. It helps customers tailor their security configurations more effectively. The initial release was for Enterprise customers, however we believe that everyone should have access to powerful tools, not just large enterprises, and so in December 2019 we extended those same enterprise-level analytics to our Business and Pro customers.

Announcing Network Analytics
Source: https://imgflip.com/memegenerator

Since then, we’ve built on top of our analytics platform; improved the usability, added more functionality and extended it to additional Cloudflare services in the form of Account Analytics, DNS Analytics, Load Balancing Analytics, Monitoring Analytics and more.

Our entire analytics platform harnesses the powerful GraphQL framework which is also available to customers that want to build, export and share their own custom reports and dashboards.

Extending Visibility From L7 To L3

Until recently, all of our dashboards were mostly HTTP-oriented and provided visibility into HTTP attributes such as the user agent, hosts, cached resources, etc. This is valuable to customers that use Cloudflare to protect and accelerate HTTP applications, mobile apps, or similar. We’re able to provide them visibility into the application layer (Layer 7 in the OSI model) because we proxy their traffic at L7.

Announcing Network Analytics
DDoS Protection for Layer 3-7

However with Magic Transit, we don’t proxy traffic at L7 but rather route it at L3 (network layer). Using BGP Anycast, customer traffic is routed to the closest point of presence of Cloudflare’s network edge where it is filtered by customer-defined network firewall rules and automatic DDoS mitigation systems. Clean traffic is then routed via dynamic GRE Anycast tunnels to the customer’s data-centers. Routing at L3 means that we have limited visibility into the higher layers. So in order to provide Magic Transit customers visibility into traffic and attacks, we needed to extend our analytics platform to the packet-layer.

Announcing Network Analytics
Magic Transit Traffic Flow

On January 16, 2020, we released the Network Analytics dashboard for Magic Transit customers and Bring Your Own IP (BYOIP) customers. This packet and bit oriented dashboard provides near real-time visibility into network- and transport-layer traffic patterns and DDoS attacks that are blocked at the Cloudflare edge in over 200 cities around the world.

Announcing Network Analytics
Network Analytics – Packets & Bits Over Time

Analytics For A Year

The way we’ve architected the analytics data-stores enables us to provide our customers one year’s worth of insights. Traffic is sampled at the edge data-centers. From those samples we structure IP flow logs which are similar to SFlow and bucket one minute’s worth of traffic, grouped by destination IP, port and protocol. IP flows includes multiple packet attributes such as TCP flags, source IPs and ports, Cloudflare data-center, etc. The source IP is considered PII data and is therefore only stored for 30 days, after which the source IPs are discarded and logs are rolled up into one hour groups, and then one day groups. The one hour roll-ups are stored for 6 months and the one day roll-ups for 1 year.

Similarly, attack logs are also stored efficiently. Attacks are stored as summaries with start/end timestamps, min/max/average/total bits and packets per second, attack type, action taken and more. A DDoS attack could easily consist of billions of packets which could impact performance due to the number of read/write calls to the data-store. By storing attacks as summary logs, we’re able to overcome these challenges and therefore provide attack logs for up to 1 year back.

Network Analytics via GraphQL API

We built this dashboard on the same analytics platform, meaning that our packet-level analytics are also available by GraphQL. As an example, below is an attack report query that would show the top attacker IPs, the data-center cities and countries where the attack was observed, the IP version distribution, the ASNs that were used by the attackers and the ports. The query is done at the account level, meaning it would provide a report for all of your IP ranges. In order to narrow the report down to a specific destination IP or port range, you can simply add additional filters. The same filters also exist in the UI.

  viewer {
    accounts(filter: { accountTag: $accountTag }) {
      topNPorts: ipFlows1mGroups(
        limit: 5
        filter: $portFilter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
        dimensions {
          metric: sourcePort
      topNASN: ipFlows1mGroups(
        limit: 5
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
        dimensions {
          metric: sourceIPAsn
          description: sourceIPASNDescription
      topNIPs: ipFlows1mGroups(
        limit: 5
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
        dimensions {
          metric: sourceIP
      topNColos: ipFlows1mGroups(
        limit: 10
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
        dimensions {
          metric: coloCity
      topNCountries: ipFlows1mGroups(
        limit: 10
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
        dimensions {
          metric: coloCountry
      topNIPVersions: ipFlows1mGroups(
        limit: 2
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
        dimensions {
          metric: ipVersion
Attack Report Query Example

After running the query using Altair GraphQL Client, the response is returned in a JSON format:

Announcing Network Analytics

What Do Customers Want?

As part of our product definition and design research stages, we interviewed internal customer-facing teams including Customer Support, Solution Engineering and more. I consider these stakeholders as super-user-aggregators because they’re customer-facing teams and are constantly engaging and helping our users. After the internal research phase, we expanded externally to customers and prospects; particularly network and security engineers and leaders. We wanted to know how they expect the dashboard to fit in their work-flow, what are their use cases and how we can tailor the dashboard to their needs. Long story short, we identified two main use cases: Incident Response and Reporting. Let’s go into each of these use cases in more detail.

Incident Response

I started off by asking them a simple question – “what do you do when you’re paged?” We wanted to better understand their incident response process; specifically, how they’d expect to use this dashboard when responding to an incident and what are the metrics that matter to them to help them make quick calculated decisions.

You’ve Just Been Paged

Let’s say that you’re a security operations engineer. It’s Black Friday. You’re on call. You’ve just been paged. Traffic levels to one of your data-centers has exceeded a safe threshold. Boom. What do you do? Responding quickly and resolving the issue as soon as possible is key.

If your workflows are similar to our customers’, then your objective is to resolve the page as soon as possible. However, before you can resolve it, you need to determine if there is any action that you need to take. For instance, is this a legitimate rise in traffic from excited Black Friday shoppers, perhaps a new game release or maybe an attack that hasn’t been mitigated? Do you need to shift traffic to another data-center or are levels still stable? Our customers tell us that these are the metrics that matter the most:

  1. Top Destination IP and port – helps understand what services are being impacted
  2. Top source IPs, port, ASN, data-center and data-center Country – helps identify the source of the traffic
  3. Real-time packet and bit rates – helps understand the traffic levels
  4. Protocol distribution – helps understand what type of traffic is abnormal
  5. TCP flag distribution – an abnormal distribution could indicate an attack
  6. Attack Log – shows what types of traffic is being dropped/rate-limited
Announcing Network Analytics
Customizable DDoS Attack Log

As network and transport layer attacks can be highly distributed and the packet attributes can be easily spoofed, it’s usually not practical to block IPs. Instead, the dashboard enables you to quickly identify patterns such as an increased frequency of a specific TCP flag or increased traffic from a specific country. Identifying these patterns brings you one step closer to resolving the issue. After you’ve identified the patterns, packet-level filtering can be applied to drop or rate-limit the malicious traffic. If the attack was automatically mitigated by Cloudflare’s systems, you’ll be able to see it immediately along with the attack attributes in the activity log. By filtering by the Attack ID, the entire dashboard becomes your attack report.

Announcing Network Analytics
Packet/Bit Distribution by Source & Destination
Announcing Network Analytics
TCP Flag Distribution


During our interviews with security and network engineers, we also asked them what metrics and insights they need when creating reports for their managers, C-levels, colleagues and providing evidence to law-enforcement agencies. After all, processing data and creating reports can consume over a third (36%) of a security team’s time (~3 hours a day) and is also one of the most frequent DDoS asks by our customers.

Announcing Network Analytics
Add filters, select-time range, print and share

On top of all of these customizable insights, we wanted to also provide a one-line summary that would reflect your recent activity. The one-liner is dynamic and changes based on your activity. It tells you whether you’re currently under attack, and how many attacks were blocked. If your CISO is asking for a security update, you can simply copy-paste it and convey the efficiency of the service:

Announcing Network Analytics
Dynamic Summary

Our customers say that they want to reflect the value of Cloudflare to their managers and peers:

  1. How much potential downtime and bandwidth did Cloudflare spare me?
  2. What are my top attacked IPs and ports?
  3. Where are the attacks coming from? What types and what are the trends?

The Secret To Creating Good Reports

What does everyone love? Cool Maps! The key to a good report is adding a map; showing where the attack came from. But given that packet attributes can be easily spoofed, including the source IP, it won’t do us any good to plot a map based on the locations of the source IPs. It would result in a spoofed source country and is therefore useless data. Instead, we decided to show the geographic distribution of packets and bits based on the Cloudflare data-center in which they were ingested. As opposed to legacy scrubbing center solutions with limited network infrastructures, Cloudflare has data-centers in more than 200 cities around the world. This enables us to provide precise geographic distribution with high confidence, making your reports accurate.

Announcing Network Analytics
Packet/Bit Distribution by geography: Data-center City & Country

Tailored For You

One of the main challenges both our customers and we struggle with is how to process and generate actionable insights from all of the data points. This is especially critical when responding to an incident. Under this assumption, we built this dashboard with the purpose of speeding up your reporting and investigation processes. By tailoring it to your needs, we hope to make you more efficient and make the most out of Cloudflare’s services. Got any feedback or questions? Post them below in the comments section.

If you’re an existing Magic Transit or BYOIP customer, then the dashboard is already available to you. Not a customer yet? Click here to learn more.

Cloudflare’s COVID-19 FAQs

Post Syndicated from Janet Van Huysse original https://blog.cloudflare.com/cloudflare-covid-19-faqs/

Cloudflare's COVID-19 FAQs

As the status of COVID-19 continues to impact people and businesses around the world, Cloudflare is committed to providing awareness and transparency to our customers, employees, and partners about how we are responding. We do not anticipate any significant disruptions in Cloudflare services.

Our Business Continuity Team is monitoring the situation closely and all company personnel are kept up to date via multiple internal communication channels including a live chat room. Customers and the public are encouraged to visit this blog post for the latest information.

You can check the status of our network at www.cloudflarestatus.com. For COVID-19-related questions that aren’t answered below, please contact our Customer Support Team.

Does Cloudflare have a Business Continuity Team (BCT)?

Yes, Cloudflare’s Business Continuity Team is a cross-functional, geographically diverse group dedicated to navigating through a health crisis like COVID-19 as well as a variety of other scenarios that may impact employee safety and business continuity.

What is Cloudflare’s Business Continuity Plan in the light of COVID-19?

In addition to Cloudflare’s existing Disaster Recovery Plan we have implemented the following strategies:

  • Daily Business Continuity Team meetings to determine if updates, changes, or communication need to be provided to customers, partners, and employees.
  • Global monitoring of COVID-19 related events and impact.
  • Tailored business continuity plans per office and function, including work from home policies and regional resources.

Can the essential aspects of the product or service, requiring employee interaction, be performed by employees working from alternate locations or at their homes?

Fortunately the nature of Cloudflare business is digital and rarely requires in-person activity. At this time we do not anticipate significant impact to products or services. Some teams are adjusting to teleworking but at this time we have not identified a service-level impact.

Which components of the product or service are reliant on employees performing a specific action vs. which ones are automated activities?

Troubleshooting and maintenance of the platform is performed by Cloudflare employees in globally dispersed locations. On-prem support is not required for the vast majority of our products and services.

Products can be used without the need for manual interaction from Cloudflare employees.

What is the response for your customer support team? Do you have call centers?

Cloudflare does not have call centers. Our support personnel will continue to provide assistance 24 hours a day to customers no matter their location as usual.

Have you implemented any travel restrictions or social distancing protocols?

Yes, Cloudflare has implemented travel restrictions to countries as recommended by government agencies. Employees are encouraged to postpone all non-essential business travel at this time including inter-office travel. We are monitoring regional guidance from health authorities and updating our requirements as needed.

Do any of Cloudflare’s vendors have any new or emerging concerns about their ability to deliver goods or services during a pandemic?

Cloudflare has taken an extra step to work with critical business partners and suppliers to ensure that there will be minimal to no impact to the business or our customers.

Are Cloudflare offices closed?

No, our offices are open and operational. Employees are encouraged to work from home and must if they meet local health authority-advised populations for quarantine.

What are your plans to ensure minimal impact to services?

Cloudflare’s business continuity team has worked with organization leaders to prepare for the challenges of COVID-19 and many other scenarios. We are confident in our ability to limit impact to services because of our preparation.

Do you anticipate any service disruption or support by either yourself or your subcontractors due to COVID-19?

At this time we do not anticipate any service disruptions due to COVID-19. We are monitoring the situation closely and will update as information becomes available.

What happens if one of our data centers goes down? Who will remedy it? Does it require a person to be on-prem?

Due to the nature of the Anycast network, we have over 200 Points of Presence (PoPs) that manage failover traffic. Traffic would simply be rerouted to other locations. Learn about the Anycast network here: https://www.cloudflare.com/network/.

The Infrastructure and Engineering teams are working proactively to ensure that enough capacity is available at our most critical PoPs. We feel confident in our ability to service our most critical facilities with our approved partner.

Which vendor contact is responsible for communicating any disruption in their service to customers?

Our Customer Support Team is fully operational and will reach out as they would with any other outage or incident. Methods vary based on contract.

What is the communication method they will be using to inform customers of an interruption? (For example, if they would normally call your office phone, and you are working remotely, your desk phone may no longer be the best option)

You can check the status of our services at www.cloudflarestatus.com. Additionally, our Customer Support Team is fully operational and will reach out as they would with any other outage or incident. Methods vary based on contract.

Who can I reach out to for comments or concerns?

For questions not answered above, customers can reach out to our Customer Support Team via normal means.

Cloudflare During the Coronavirus Emergency

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/cloudflare-during-the-coronavirus-emergency/

Cloudflare During the Coronavirus Emergency

This email was sent to all Cloudflare customers a short while ago

From: Matthew Prince
Date: Thu, Mar 12, 2020 at 4:20 PM
Subject: Cloudflare During the Coronavirus Emergency

We know that organizations and individuals around the world depend on Cloudflare and our network. I wanted to send you a personal note to let you know how Cloudflare is dealing with the Coronavirus emergency.

First, the health and safety of our employees and customers is our top priority. We have implemented a number of sensible policies to this end, including encouraging many employees to work from home. This, however, hasn’t slowed our operations. Our network operations center (NOC), security operations center (SOC), and customer support teams will remain fully operational and can do their jobs entirely remote as needed.

Second, we are tracking Internet usage patterns globally. As more people work from home, peak traffic in impacted regions has increased, on average, approximately 10%. In Italy, which has imposed a nationwide quarantine, peak Internet traffic is up 30%. Traffic patterns have also shifted so peak traffic is occurring earlier in the day in impacted regions. None of these traffic changes raise any concern for us. Cloudflare’s network is well provisioned to handle significant spikes in traffic. We have not seen, and do not anticipate, any impact to our network’s performance, reliability, or security globally.

Third, we are monitoring for any changes in cyberthreats. While we have seen more phishing attacks using the Coronavirus as a lure, we have not seen any significant increase in attack traffic or new threats. Again, our SOC remains fully operational and is continuously monitoring for any new security threats that may emerge.

Finally, we recognize that this emergency has put strain on the infrastructure of companies around the world as more employees work from home. On Monday, I wrote about how we are making our Cloudflare for Teams product, which helps support secure and efficient remote work, free for small businesses for at least the next six months:


As the severity of the emergency has become clearer over the course of this week, we decided to extend this offer to help any business, regardless of size. The healthy functioning of our economy globally depends on work continuing to get done, even as people need to do that work remotely. If Cloudflare can do anything to help ensure that happens, I believe it is our duty to do so.

If you are already a Cloudflare for Teams customer, we have removed the caps on usage during this emergency so you can scale to whatever number of seats you need without additional cost. If you are not yet using Cloudflare for Teams, and if you or your employer are struggling with limits on the capacity of your existing VPN or Firewall, we stand ready to help and have removed the limits on the free trials of our Access and Gateway products for at least the next six months. Cloudflare employees around the world have volunteered to run no-cost onboarding sessions so you can get set up quickly and ensure your business’ continuity.

Details: https://developers.cloudflare.com/access/about/coronavirus-emergency/
Sign up for an onboarding session: https://calendly.com/cloudflare-for-teams/onboarding

Thank you for being a Cloudflare customer. These are challenging times but I want you to know that we stand ready to help however we can. We understand the critical role we play in the functioning of the Internet and we are continually humbled by the trust you place in us. Together, we can get through this.

Matthew Prince
Co-founder & CEO


Protect your team with Cloudflare Gateway

Post Syndicated from Irtefa original https://blog.cloudflare.com/protect-your-team-with-cloudflare-gateway/

Protect your team with Cloudflare Gateway

On January 7th, we announced Cloudflare for Teams, a new way to protect organizations and their employees globally, without sacrificing performance. Cloudflare for Teams centers around two core products – Cloudflare Access and Cloudflare Gateway. Cloudflare Access is already available and used by thousands of teams around the world to secure internal applications. Cloudflare Gateway solves the other end of the problem by protecting those teams from security threats without sacrificing performance.

Today, we’re excited to announce new secure DNS filtering capabilities in Cloudflare Gateway. Cloudflare Gateway protects teams from threats like malware, phishing, ransomware, crypto-mining and other security threats. You can start using Cloudflare Gateway at dash.teams.cloudflare.com. Getting started takes less than five minutes.

Why Cloudflare Gateway?

We built Cloudflare Gateway to address key challenges our customers experience with managing and securing global networks. The root cause of these challenges is architecture and inability to scale. Legacy network security models solved problems in the 1990s, but teams have continued to attempt to force the Internet of the 2020s through them.

Historically, branch offices sent all of their Internet-bound traffic to one centralized data center at or  near corporate headquarters. Administrators configured that to make sure all requests passed through a secure hardware firewall. The hardware firewall observed each request, performed inline SSL inspection, applied DNS filtering and made sure that the corporate network was safe from security threats. This solution worked when employees accessed business critical applications from the office, and when applications were not on the cloud.

Protect your team with Cloudflare Gateway
Average SaaS spending per company since 2008 (source)

SaaS broke this model when cloud-delivered applications became the new normal for workforce applications. As business critical applications moved to the cloud, the number of Internet bound requests from all the offices went up. Costs went up, too. In the last 10 years, SaaS spending across all company size segments  grew by more than 1615%. The legacy model of backhauling all Internet traffic through centralized locations could not keep up with the digital transformation that all businesses are still going through.

Protect your team with Cloudflare Gateway

The challenge of backhauling traffic for a global workforce

Expensive and slow

SaaS adoption is only one element that is breaking traditional network models. Geographically distributed offices and remote workers are playing a role, too.

Cloudflare Gateway has been in beta use for some of our customers over the last few months. One of those customers had more than 50 branch offices, and sent all of their DNS traffic through one location. The customer’s headquarters is in New York, but they have offices all over the world, including in India. When someone from the office in India visits google.com, DNS requests travel all the way to New York.

As a result, employees in India have a terrible experience using the Internet. The legacy approach to solve this problem is to add MPLS links from branch offices to the headquarters. But MPLS links are expensive, and can take a long time to configure and deploy. Businesses end up spending millions of dollars on legacy solutions, or they remain slow, driving down employee productivity.

Protect your team with Cloudflare Gateway

Slow to react to security threats

When businesses backhaul traffic to a single location to inspect and filter malicious traffic using a hardware firewall. But, the legacy hardware appliances were not built for the modern Internet. The threat landscape for the Internet is constantly changing.

For example: about 84% of phishing sites exist for less than 24 hours (source) and legacy hardware firewalls are not fast enough to update their static rules to thwart phishing attacks. When security threats on the Internet act like moving targets, legacy hardware appliances that rely on static models to filter malicious traffic cannot keep up. As a result, employees remain vulnerable to new threats even when businesses backhaul Internet bound traffic to a single location.

Cloudflare Gateway

Starting today, businesses of all sizes can secure all their Internet-bound traffic and make it faster with  Cloudflare Gateway. Cloudflare has data centers in more than 200 cities around the world and all of our services run in every single data center. Therefore, when a business uses Cloudflare Gateway, instead of backhauling traffic to a single location (slow), all Internet-bound requests travel to the nearest data center (fast) from the end user where Cloudflare Gateway applies security policies to protect businesses from security threats. All of this is done without the need for expensive MPLS links.

Protect your team with Cloudflare Gateway

Gateway’s secure DNS filtering capabilities are built on top of, the fastest public DNS resolver in the world. We took the pieces that made the public DNS resolver the fastest and built Cloudflare Gateway’s secure DNS filtering capabilities for customers who want to secure their connection to the Internet. Combined with Cloudflare’s global presence of data centers in more than 200 cities and the fastest public DNS resolver in the world, Cloudflare Gateway secures every connection from every device to every destination on the Internet without sacrificing performance.

Protect your team with Cloudflare Gateway

Why Secure DNS Filtering?

More than 90% of malware use DNS to perform command & control attacks and exfiltrate sensitive data. Here’s an example of how a malware can infect a device or a data center and perform a command & control (also known as C2C or C&C) attack:

Protect your team with Cloudflare Gateway

  1. Imagine Bob receives an email from someone impersonating his manager with a link to ‘Box’ that looks harmless. The email looks legitimate but in reality it is a phishing email intended to steal valuable information from Bob’s computer or infected with malware.
  2. When Bob clicks on the link, the website phishing ‘Box’ delivers an exploit and installs malware onto Bob’s computer.
  3. The downloaded malware sends a request to the Command & Control server signaling that the malware is ready to receive instructions from the server.
  4. Once the connection between the malware and Command & Control server is established, the server sends instructions to the malware to steal proprietary data, control the state of the machine to reboot it, shut it down or perform DDoS attacks against other websites.

If Bob’s computer was using DNS filtering, it could have prevented the attack in two places.

First, when Bob clicked on the phishing link (2). The browser sends a DNS request to resolve the domain of the phishing link. If that domain was identified by DNS filtering as a phishing domain, it would have blocked it right away.

Second, when malware initiated the connection with the Command & Control server, the malware also needed to make a DNS request to learn about the Command & Control server’s IP address. This is another place where a secure DNS filtering service can detect the domain as malware and block access to it.

Secure DNS filtering acts as the first layer of defence against most security threats and prevents corporate networks and devices from getting infected by malicious software in the first place. According to a security report by Global Cyber Alliance, companies could have prevented losses of more than $200B using DNS filtering.

How does Gateway’s secure DNS filtering work?

The primary difference between the public DNS resolver and Gateway’s secure DNS filtering is that the public DNS resolver does not block any DNS queries. When a browser requests example.com, the public DNS resolver simply looks up the answer for the DNS query either in cache or by performing a full recursive query.

Cloudflare Gateway adds one new step to introduce security into this flow. Instead of allowing all DNS queries, Gateway first checks the name being queried against the intelligence Cloudflare has about threats on the Internet. If that query matches a known threat, or is requesting a blocked category, Gateway stops it before the site could load for the user – and potentially execute code or phish that team member.

Protect your team with Cloudflare Gateway

For example, if a customer is using Cloudflare Gateway, and sends a DNS query to example.com, first, Gateway checks if the DNS query is coming from a customer. Second, if it is coming from a customer Gateway checks if the DNS query matches with any of the policies setup by the customer. The policy could be a domain that the customer is manually blocking or it could be part of a broader security category that the customer enabled. If the domain matches one of those cases, Cloudflare Gateway will block access to the domain. This will prevent the end user from going to example.com.

Encrypted DNS from day one

Gateway supports DNS over HTTPS today and will also support DNS over TLS in the future. You can use Firefox to start sending DNS queries to Gateway in an encrypted fashion. It will also support other DNS over HTTPS clients as long as you can change the hostname in your preferred DNS over HTTPS client.

Here’s how DNS over HTTPS for Cloudflare Gateway works:

Protect your team with Cloudflare Gateway

The DNS over HTTPS client encrypts the DNS request and sends it to the closest Cloudflare’s data center. Upon receiving the encrypted DNS request, it will decrypt it and send it to Cloudflare Gateway. Cloudflare Gateway will apply the required security policies and return the response to our edge. Our edge will encrypt the response and send it back to the DNS over HTTPS client.

By encrypting your DNS queries you will make sure that ISPs cannot snoop on your DNS queries and at the same time filter DNS requests that are malicious.

Cloudflare Gateway is for everyone

One of our customers, Algolia, is a fast growing startup. Algolia grew by 1005% in 2019 (source). As the company experienced rapid growth, Cloudflare Gateway helped maintain their corporate security without slowing them down:

Algolia is growing pretty fast. At Algolia, we needed a way to have visibility across our corporate network without slowing things down for our employees. Cloudflare Gateway gave us a simple way to do that
Adam Surak (Director of Infrastructure & Security Algolia)

But Gateway isn’t just for fast growing startups. Anyone with a Cloudflare account can start using Cloudflare Gateway today. Gateway has a free tier where we wanted to make sure even small businesses, teams and households who cannot afford expensive security solutions can use Cloudflare Gateway to protect themselves from security threats on the Internet. We offer a free plan to our customers because we have a paid tier for this product with additional functionality that are more suited towards super users. Features like longer data retention for analytics, more granular security and content categories, individual DNS query logs, logpush to a cloud storage bucket etc. are features that are only available to our paid customers. You can learn more about Gateway in our product page.

How can you get started?

If you already have a Cloudflare account get started by visiting the Teams dashboard.

The onboarding will walk you through how to configure your router, or device to send DNS queries to Gateway. The onboarding will help you setup a location. A location is usually a physical entity like your office, retail location, data center or home.

Protect your team with Cloudflare Gateway

Once you finish onboarding, start by configuring a policy. A policy will allow you to block access to malicious websites when anyone is using the Internet from the location that you just created.

Protect your team with Cloudflare Gateway

You can choose from the categories of policy that we have created. You can also manually add a domain to block it using Gateway.

Protect your team with Cloudflare Gateway

Once you start sending DNS queries to Gateway, you will see analytics on the team’s dashboard. The analytics dashboard will help you understand if there are any anomalies in your network.

What’s next

Cloudflare’s mission is to help create a better Internet. We have achieved this by protecting millions of websites around the world and securing millions of devices using WARP. With Cloudflare Access, we helped secure and protect internal applications. Today, with Cloudflare Gateway’s secure DNS filtering capabilities we have extended our mission to also protect the people who use the Internet every day. The product you are seeing today is a glimpse of what we are building for the future. Our team is incredibly proud of what we have built and we are just getting started.

Open sourcing our Sentry SSO plugin

Post Syndicated from Sam Rhea original https://blog.cloudflare.com/open-sourcing-our-sentry-sso-plugin/

Open sourcing our Sentry SSO plugin

Cloudflare Access, part of Cloudflare for Teams, replaces legacy corporate VPNs with Cloudflare’s global network. Using your existing identity provider, Access enables your end users to login from anywhere — without a clunky agent or traffic backhaul through a centralized appliance or VPN.

Today, we are open sourcing a plugin that continues to improve that experience by making it easier for teams to use Cloudflare Access with one of the software industry’s most popular engineering tools, Sentry.

What is Sentry?

Sentry is an application that helps software teams find and diagnose errors in their products. We use Sentry here at Cloudflare. When you encounter an error when using a Cloudflare product, like our dashboard, we log that event. We then use Sentry to determine what went wrong.

Sentry can categorize and roll up errors, making it easy to identify new problems before investigating them with the tool’s event logging. Engineering managers here can use the dashboards to monitor the health of a new release. Product managers often use those reports as part of prioritizing what to fix next. Engineers on our team can dig into the individual errors as they release a fix.

Sentry is available in two forms: a SaaS model and a self-hosted version. Both modes give engineering teams comprehensive insight into the behavior of their deployed applications and the issues their users encounter.

Connecting users to Sentry

Organizations can deploy the self-hosted version on-premise or in a cloud environment they control. However, they still need to create a secure way to allow their teams to connect to the app.

Historically, most opt for a VPN to solve that challenge. End users outside of the office need to configure a VPN client on their laptop and try to login with credentials that are often different from the ones used for a corporate SSO. Administrators had to make sure their VPN appliance could scale for a few users, but with most in the office, the VPN was a serious inconvenience for a smaller set of users.

Over the last few years, that group of users working outside of the office has grown. The outbreak of COVID-19 is accelerating that growth significantly. Users are working from BYOD laptops, mobile phones, and in unfamiliar networks that all struggle with a VPN. Even worse, a VPN has a load limit because it relies on an actual appliance (whether virtual or physical hardware). Organizations can attempt to stress test their VPN, but will always have a limit that administrators need to continuously monitor.

Cloudflare Access gives administrators the scale of Cloudflare’s global network and provides end users with a SaaS-like experience that just works from any device or network. When teams secure Sentry with Cloudflare Access, end users visit the hostname of the application, login with their identity provider, and are redirected from Cloudflare’s edge to the app if they have permission to reach it.

However, in the case of an app like Sentry, end users need to login one more time to the application itself. That small step adds real friction, which Access can now solve through this open source plugin.

JWT Security with Cloudflare for Teams

When a user logs in to their identity provider when connecting to an application protected by Access, Cloudflare signs a JSON Web Token (JWT).

Open sourcing our Sentry SSO plugin

Cloudflare Access uses that JWT, and its contents, to confirm a user identity before allowing or denying access to sensitive resources. Cloudflare securely creates these through the OAUTH or SAML integration between Cloudflare Access and the configured identity provider. Each JWT consists of three Base64-URL strings: the header, the payload, and the signature.

  • The header defines the cryptographic operation that encrypts the data in the JWT.
  • The payload consists of name-value pairs for at least one and typically multiple claims, encoded in JSON. For example, the payload can contain the identity of a user
  • The signature allows the receiving party to confirm that the payload is authentic.

The token is signed using a public private key pair and saved in the user’s browser. Inside of that token, we store the following details in addition to some general metadata:

  • User identity: typically the email address of the user retrieved from your identity provider.
  • Authentication domain: the domain that signs the token. For Access, we use “example.cloudflareaccess.com” where “example” is a subdomain you can configure.
  • Audience: The domain of the application you are attempting to reach.
  • Expiration: the time at which the token is no longer valid for use.

When a request is made to an application behind Access, Cloudflare looks for the presence of that token. If available, we decrypt it, validate its authenticity, and then read the payload. If the payload contains information about a user who should be able to reach the application, we send their request to an origin.

The Sentry plugin takes that JWT and reuses it, instead of prompting the visitor to login again with separate credentials. The plugin parses the user identity, checks it against the directory of users in Sentry, and maps that token to a Sentry profile and its assigned permissions.

All of this is seamless to the end user and takes just a few milliseconds. The user is instantly redirected to the application, fully authenticated, and only needs to remember their SSO login. Administrators now have one fewer set of credentials to worry about managing and the associated onboarding and offboarding.

Building your own SSO plugin

We believe that the JSON Web Token is a simple and efficient method for sending identity. Applications that use JWTs for authorization only need to support the JWT standard, instead of attempting to integrate with different versions of SAML or other formats like OIDC and OAUTH. A JWT is also information dense and built in a format, JSON, that can be easily parsed by the target application.

Some products, like Redash, already have native support for JWT integration. The Sentry plugin we built joins our Atlassian plugin as both options to extend support to those apps, but also examples that can be used for integration with other products. Other teams, like Auth0, have also published materials to add JWT integration to legacy apps.

What’s next?

Cloudflare Access is available on every Cloudflare account and 5 free seats are included by default. You can follow these instructions to get started.

If you are a small business, you can sign up for the Cloudflare for Teams program right now at the link below.


How Replicated Developers Develop Remotely

Post Syndicated from Guest Author original https://blog.cloudflare.com/how-replicated-secured-our-remote-dev-environment-with-cloudflare-access/

How Replicated Developers Develop Remotely

This is a guest post by Marc Campbell and Grant Miller, co-founders of Replicated.

How Replicated Developers Develop Remotely

Replicated is a 5-year old infrastructure software company working to make it easy for businesses to install and operate third party software. We don’t want you to have to send your data to a multi-tenant SaaS provider just to use their services. Our team is made up of twenty-two people distributed throughout the US. One thing that’s different about Replicated is our developers don’t actually store or execute code on their laptops; all of our development happens on remote instances in the cloud.

Our product, KOTS, runs in Kubernetes and manages the lifecycle of 3rd-party applications in the Kubernetes cluster. Building and validating the product requires a developer to have access to a cluster. But as we started to hire more and more engineers it became ridiculous to ask everyone to run their own local Kubernetes cluster. We needed to both simplify and secure our setup to allow every engineer to run their environment in the cloud, and we needed to do it in a way which was seamless and secure.

Previous Dev Environments with Docker for Mac

We started with each developer building their own local environments, using whatever tools they were comfortable with. Our first attempt to build a standard development environment that works for our engineering team was to use Docker for Mac and its built-in Kubernetes distribution. We would buy the best MacBook Pros available (16 GB, then 32 GB, now 64 GB), and everyone would have the entire stack running on their laptop.

This worked pretty well, except that there was a set of problems that our engineers would continue to hit–battery life was terrible because of the constant CPU usage, Docker For Mac was different from “real Kubernetes” in some meaningful ways, and Docker for Mac’s built-in K8s regularly would just sometimes stop working and the developer would need to uninstall and reinstall the entire stack. It was miserable.

We’d lose hours every week from engineers troubleshooting their local environments. When a front end engineer (who wasn’t expected to be a Kubernetes expert) would have issues, they’d need to pair and get help from a backend engineer; consuming not just one but two people’s valuable time.

We needed something better.

To The Cloud

Rather than running Docker locally, we now create an instance in Google Cloud for each developer. These instances have no public IP and are based on our machine image which has all of our prerequisites installed. This includes many tools, including a Kubernetes distribution that’s completely local to the server. We run a docker registry in each developer’s cluster as a cluster add-on. The cloud server has a magical tool called cloudflared running on it that replaces all of the network configuration and security work we would otherwise have had to do.‌‌

Cloudflared powers Argo Tunnel. When it starts, cloudflared creates four secure HTTP/2 tunnels to two Cloudflare data centers. When a request comes in for a development machine, Cloudflare routes that request over one of those tunnels directly to the machine running that developer’s environment. For example, my hostname is “marc.repl.dev”. Whenever I connect to that, from anywhere on earth, Cloudflare will see that I reach my development environment securely. If I need to spin up a new development environment, there is no configuration to do, wherever is running cloudflared with the appropriate credentials will receive the traffic. This all works on any cloud and in any cloud region.

‌‌This configuration has several advantages over a traditional deployment. For one, the server does not have a public IP and we don’t need to have any ports open in the Google Load Balancer, including for SSH. The only way to connect to these servers is through the Argo Tunnel, secured by Cloudflare Access. Access provides a BeyondCorp-style method of authentication, this ensures that the environment can be reached from anywhere in the world without the use of a VPN.

How Replicated Developers Develop Remotely

BeyondCorp is an elaborate way of saying that all our authentication is managed in a single place. We can write a policy which defines which machines a user should have access to and trust it will be applied everywhere. This means rather than managing SSH certificates which are hard to revoke and long-living, we can allow developers to login with the same Google credentials we use everywhere else! Should, knock on wood, a developer leave, we can revoke those credentials instantly; no more worrying what public keys they still might have lying around.

What happens on the developer’s machines?

Through Argo Tunnel and Access we now have the ability to connect to our new development instances, but that isn’t enough to allow our engineers to work. They need to be able to write and execute code on that remote machine in a seamless way. To solve that problem we turned to the Remote SSH extension for VS Code. In the words of the documentation for that project:

The Visual Studio Code Remote SSH extension allows you to open a remote folder on any remote machine, virtual machine, or container with a running SSH server and take full advantage of VS Code’s feature set. Once connected to a server, you can interact with files and folders anywhere on the remote filesystem.

With Remote SSH, VS Code seamlessly reads and writes files to the developer’s remote server. When a developer opens a project, it feels local and seamless, but everything is authenticated by Access and proxied through Argo over SSH. Our developers can travel anywhere in the world, and trust their development environment will be accessible and fast.

Locally, a developer has a .ssh/config file to define local ports to forward through the SSH connection to a port that’s only available on the remote server. For example, my .ssh/config file contains:‌‌

Host marc.repl.dev

HostName marc.repl.dev

User marc‌‌

LocalForward 8080

LocalForward 8005

To build and execute code our developers open the embedded terminal in VS Code. This automatically connects them to the remote server. We use skaffold, a Kubernetes CLI for local development. A simple skaffold dev starts the stack on their remote machine which feels local because it’s all happening inside VS Code. Once it’s started, the developer can access localhost in their browser to view the results of their work by visiting http://localhost:8080. The SSH config above will forward this traffic to port 30080 on the remote server. Port 30080 on the remote server is a NodePort configured in the local cluster, that has the web server running in it. All of our APIs and web servers have static NodePorts for local development environments.

Now, when a developer starts at Replicated, their first day (or even week) isn’t consumed by setting up the development environment–now it takes less than an hour. We have a Terraform script that makes it easy to replace any one of our developer’s machines in seconds.

The Aftermath

All developers at Replicated have now been using this environment for nine months. We haven’t eliminated the problems that occasionally come up where Kubernetes isn’t playing nicely, or Docker uses too much disk space. However, these problems do occur much less frequently than they did on Docker for Mac. We now have two new options that weren’t easily available when everyone ran their environment locally.

First, a backend engineer can just ssh through the Argo Tunnel into the other developers server to troubleshoot and help. Every development environment has become a collaborative place. This is great when two engineers aren’t in the same room.  Also, we’re less attached to our development environments–if my server isn’t working properly for unknown reasons, instead of troubleshooting it for hours, I can delete it and get a new clean one.

Some additional benefits include:

  • Developers can have multiple envs easily (to try out a new k8s version, for example)
  • Battery life is awesome again on laptops
  • We don’t need the biggest and most powerful laptops anymore (Hello Chromebooks and Tablets)
  • Developers can choose their local OS and environment (MacOS, Windows, Linux) because they are all supported, as long as SSH is supported.
  • Code does not live on a developer laptop; it doesn’t travel with them to coffee shops and other insecure places. This is great for security purposes–a lost laptop no longer means the codebase is out there with it.

How To

Beyond just telling you what we did, we’d like to show you how to replicate it for yourself! This assumes you have a domain which is already configured to use Cloudflare.

  1. Create an instance to represent your development environment in the cloud of your choice.


gcloud compute instances create my-dev-universe


2.   Configure your instance to run cloudflared when it starts up, and give it a helpful hostname like dev.mysite.com.‌‌


sudo apt-get install cloudflare/cloudflare/cloudflared

cat “hostname: dev.mysite.com\n” > ~/.cloudflared/config.yml

cloudflared login

sudo cloudflared service install


3.  Write an Access policy to allow only you to access your machine‌‌

How Replicated Developers Develop Remotely

‌4. Configure your local machine to SSH via Cloudflare:‌‌


sudo apt-get install cloudflare/cloudflare/cloudflared

cloudflared access ssh-config –hostname dev.mysite.com –short-lived-cert >> ~/.ssh/config


4. Install VS Code and the Remote Development extension pack

5. In VS Code select ‘Remote-SSH: Connect to Host…’ from the Command Palette and enter [email protected]. A browser window will open where you will be prompted to login with the identity provider you configured with Cloudflare.

6. You’re done! If you select File > Open you will be seeing files on your remote machine. The embedded terminal will also execute code on that remote machine.

7. Once you’re ready to get a production-ready setup for your team, take a look at the instructions we share with our team.


There is no doubt that the world is becoming more Internet-connected, and that deployment environments are becoming more complex. It stands to reason that it’s only a matter of time before all software development happens through and in concert with the Internet.

While it might not be the best solution for every team, it has resulted in a dramatically better experience for Replicated and we hope it does for you as well.

How to get started‌‌

‌‌Replicated develops remotely with Cloudflare Access, a remote access gateway that helps you secure access to internal applications and infrastructure without a VPN.

Effective until September 1, 2020, Cloudflare is making Access and other Cloudflare for Teams products free to small businesses. We’re doing this to help ensure that small businesses that implement work from home policies in order to combat the spread of the Coronavirus (COVID-19) can ensure business continuity.

‌You can learn more and apply at cloudflare.com/smallbusiness now.

How to Build a Highly Productive Remote Team (or Team of Contractors) with Cloudflare for Teams

Post Syndicated from Lane Billings original https://blog.cloudflare.com/how-to-build-a-highly-productive-remote-team-or-team-of-contractors-with-cloudflare-for-teams/

How to Build a Highly Productive Remote Team (or Team of Contractors) with Cloudflare for Teams

Much of IT has been built on two outdated assumptions about how work is done. First, that employees all sit in the same building or branch offices. Second, that those employees will work full-time at the same company for years.

Both of these assumptions are no longer true.

Employees now work from anywhere. In the course of writing this blog post, I opened review tickets in our internal JIRA from my dining table at home. I reviewed internal wiki pages on my phone during my commute on the train. And I spent time reviewing some marketing materials in staging in our CMS.

In a past job, I would have suffered trying to connect to these tools through a VPN. That would have slowed down my work on a laptop and made it nearly impossible to use a phone to catch up on my commute.

The second challenge is ramp-up. I joined Cloudflare a few months ago. As a member of the marketing team, I work closely with our product organization and there are several dozen tools that I need to do that.

I’m hardly alone. The rise of SaaS and custom internal applications means that employees need access to all kinds of tools to effectively do their job. The increasing prevalence of contractors and part-time employees is compounding the challenge of how to get employees productive. On-boarding (and off-boarding) is now not an occasional thing, but has become a regular rhythm of how companies operate.

All these factors are combining to cause a bigger question: how can I make teams that reflect the new modern workforce — often remote, and increasingly not the traditional full time, permanent employee — as productive as possible?

Step one: put the VPN on a performance improvement plan

We’ve blogged extensively about our own troubles with VPNs. As we became a complex, multinational organization made up of contractors and full-time employees, the private network we deployed to host internal applications began to slow our teams down. We built Cloudflare Access to address our own challenges with the VPN, and since then hundreds of customers have used it to accelerate access for their remote workforces.

How to Build a Highly Productive Remote Team (or Team of Contractors) with Cloudflare for Teams

India’s largest B2B e-commerce platform, Udaan, is one example. They used Access to avoid ever having to deploy a VPN in the first place. As Udaan grew to new locations around the world, their IT team needed fast ways to give access to the thousands of users — including contractors, employees, interns and vendors — that needed to connect to their internal systems.

Now that their internal applications are protected with Cloudflare, Udaan’s IT team doesn’t need to spend time manually onboarding contractors and issuing them corporate accounts. And logging into Udaan’s tools, whether they’re SaaS apps or private applications, looks and feels the same every time, for every user.

How to Build a Highly Productive Remote Team (or Team of Contractors) with Cloudflare for Teams

“VPNs are frustrating and lead to countless wasted cycles for employees and the IT staff supporting them,” said Amod Malviya, Cofounder and CTO, Udaan.  Furthermore, conventional VPNs can lull people into a false sense of security. With Cloudflare Access, we have a far more reliable, intuitive, secure solution that operates on a per user, per access basis. I think of it as Authentication 2.0 — even 3.0”

Cloudflare Access can help speed up remote teams by replacing VPNs with Cloudflare’s network. Instead of placing internal tools on a private network, teams can deploy them in any environment, including hybrid or multi-cloud models, and secure them consistently with Cloudflare’s network.  Remote teams get work done faster without having to deal with a VPN client, and IT teams spend less time troubleshooting their VPN issues.

Step two: make tools easier to find

High performing remote teams get new employees started fast. That starts with day-one access to the right tools. If you’re like me and you recently joined a new organization, you know how hard it can be to find the right applications you need to do your job. I am drowning in an ocean of new productivity tools.

App launchpads were designed to be a life-raft in the tools ocean. They make discovering apps easier by bringing every application a user can access into one easy, graphical dashboard. But they’re hard for IT teams to customize for different types of users with different permission levels (intern/contractor/full-time), and often not comprehensive of every kind of app (internal/SaaS).

Cloudflare Access’ App Launch is a dashboard for all the applications protected by Access. Once enabled, end users can login and connect to every app behind Access with a single click.  IT teams that are setting up contractors can send contractors a custom launchpad of everything they need to access on day one.

How to Build a Highly Productive Remote Team (or Team of Contractors) with Cloudflare for Teams

When administrators secure an application with Access, any request to the hostname of that application stops at Cloudflare’s network first. Once there, Cloudflare Access checks the request against the list of users who have permission to reach the application.

To check identity, Access relies on the identity provider that the team already uses. Access integrates with providers like OneLogin, Okta, AzureAD, G Suite and others to determine who a user is. If the user has not logged in yet, Access will prompt them to do so at the identity provider configured.

Step 3: fast-track your contractors

Modern remote teams are made up of whatever combination of people can get online and get the work done. That means many different kinds of users are working together in the same tools –full-time employees, contractors, freelancers, vendors and partners.

IT models predicated on full-time workforces imagined identifying users as a straight line process, where users could be identified and validated against one source of truth – the corporate directory. The old model breaks down when users join organizations temporarily to work on isolated projects, and you need to figure out how to authenticate them based on their organizational identity, not yours.

In response, many organizations deploy VPNs to temporary users, scotch-tape together federations between multiple SSO providers, or even have administrators spend hours issuing contracted users new corporate identities to complete one-off projects.

Meanwhile, contractors waste valuable cycles getting set up with the tools they need, and feel like second-class citizens in the IT hierarchy.

How to Build a Highly Productive Remote Team (or Team of Contractors) with Cloudflare for Teams

Cloudflare Access prevents contractor onboarding slowdown by simultaneously integrating with multiple identity providers, including popular services like Gmail or GitHub that do not require corporate subscriptions.

External users login with these accounts and still benefit from the same ease-of-use available to internal employees. Meanwhile, administrators avoid the burden in legacy deployments that require onboarding and offboarding new accounts for each project.

How to get started – at no cost

Cloudflare for Teams lets your team use all the same features to stay productive from anywhere in the world.

Effective until September 1, 2020, we’re making Cloudflare for Teams products free to small businesses. We’re doing this to help ensure that small businesses that implement work from home policies in order to combat the spread of the Coronavirus (COVID-19) can ensure business continuity.

You can learn more and apply at cloudflare.com/smallbusiness now.

How Cloudflare keeps employees productive from any location

Post Syndicated from Sam Rhea original https://blog.cloudflare.com/how-cloudflare-keeps-employees-productive-from-any-location/

How Cloudflare keeps employees productive from any location

Cloudflare employs more than 1,200 people in 13 different offices and maintains a network that operates in 200 cities. To do that, we used to suffer through a traditional corporate VPN that backhauled traffic through a physical VPN appliance. It was, frankly, horrible to work with as a user or IT person.

With today’s mix of on-prem, public cloud and SaaS and a workforce that needs to work from anywhere, be it a coffee shop or home, that model is no longer sustainable. As we grew in headcount, we were spending too much time resolving VPN helpdesk tickets. As offices around the world opened, we could not ask our workforce to sit as every connection had to go back through a central location.

We also had to be ready to scale. Some organizations are currently scrambling to load test their own VPN in the event that their entire workforce needs to work remotely during the COVID-19 outbreak. We could not let a single physical appliance constrain our ability to deliver 26M Internet properties to audiences around the world.

To run a network like Cloudflare, we needed to use Cloudflare’s network to stay fast and secure.

We built Cloudflare Access, part of Cloudflare for Teams, as an internal project several years ago to start replacing our VPN with a faster, safer, alternative that made internal applications, no matter where they live ,seamless for our users.

To address the scale challenge, we built Cloudflare Access to run on Workers, Cloudflare’s serverless platform. Each data center in the Cloudflare network becomes a comprehensive identity proxy node, giving us the scale to stay productive from any location – and to do it for our customers as well.

Over the last two years, we’ve continued to expand its feature set by prioritizing the use cases we had to address to remove our reliance on a VPN. We’re excited to help customers stay online and productive with the same tools and services we use to run Cloudflare.

How does Cloudflare Access work?

Cloudflare Access is one-half of Cloudflare for Teams, a security platform that runs on Cloudflare’s network and focuses on keeping users, devices, and data safe without compromising experience or  performance. We built Cloudflare Access to solve our own headaches with private networks as we grew from a team concentrated in a single office to a globally distributed organization.

How Cloudflare keeps employees productive from any location

Cloudflare Access replaces corporate VPNs with Cloudflare’s network. Instead of placing internal tools on a private network, teams deploy them in any environment, including hybrid or multi-cloud models, and secure them consistently with Cloudflare’s network.

Administrators build rules to decide who should be able to reach the tools protected by Access. In turn, when users need to connect to those tools, they are prompted to authenticate with their team’s identity provider. Cloudflare Access checks their login against the list of allowed users and, if permitted, allows the request to proceed.

Deploying Access does not require exposing new holes in corporate firewalls. Teams connect their resources through a secure outbound connection, Argo Tunnel, which runs in your infrastructure to connect the applications and machines to Cloudflare. That tunnel makes outbound-only calls to the Cloudflare network and organizations can replace complex firewall rules with just one: disable all inbound connections.

To defend against attackers addressing IPs directly, Argo Tunnel can help secure the interface and force outbound requests through Cloudflare Access. With Argo Tunnel, and firewall rules preventing inbound traffic, no request can reach those IPs without first hitting Cloudflare, where Access can evaluate the request for authentication.

Administrators then build rules to decide who should authenticate to and reach the tools protected by Access. Whether those resources are virtual machines powering business operations or internal web applications, like Jira or iManage, when a user needs to connect, they pass through Cloudflare first.

When users need to connect to the tools behind Access, they are prompted to authenticate with their team’s SSO and, if valid, instantly connected to the application without being slowed down. Internally managed apps suddenly feel like SaaS products, and the login experience is seamless and familiar.

Behind the scenes, every request made to those internal tools hits Cloudflare first where we enforce identity-based policies. Access evaluates and logs every request to those apps for identity, giving administrators more visibility and security than a traditional VPN.

Our team members SSO into the Atlassian suite with one-click

We rely on a set of productivity tools built by Atlassian, including Jira and Confluence. We secure them with Cloudflare Access.

In the past, when our team members wanted to reach those applications, they first logged into the VPN with a separate set of credentials unique to their VPN client. They navigated to one of the applications, and then broke out a second set of credentials, specific to the Atlassian suite, to reach Jira or Wiki.

All of this was clunky, reliant on the VPN, and not integrated with our SSO provider.

We decided to put the Atlassian suite behind Access and to build a plugin that could use the login from Access to SSO the end user into the application. Users login with their SSO provider and are instantly redirected into Jira or Wiki or Bitbucket, authorized without managing extra credentials.

We selected Atlassian because nearly every member of our global team uses the product each day. Saving the time to input a second set of credentials, daily, has real impact. Additionally, removing the extra step makes reaching these critical tools easier from mobile devices.

When we rolled this out at Cloudflare, team members had one fewer disruption in their day. We all became accustomed to it. We only received real feedback when we disabled it, briefly, to test a new release. And that response was loud. When we returned momentarily to the old world of multiple login flows, we started to appreciate just how convenient SSO is for a team. The lesson motivated us to make this available, quickly, to our customers.

You can read more about using our Atlassian plugin in your organization, check out the announcement here.

Our engineers can SSH to the resources they need

When we launched Cloudflare Access, we started with browser-based applications. We built a command-line tool to make CLI operations a bit easier, but SSH connections still held us back from killing the VPN altogether.

To solve that challenge, we released support for SSH connections through Cloudflare Access. The feature builds on top of our Argo Tunnel and Argo Smart Routing products.

Argo Smart Routing intelligently routes traffic around Cloudflare’s network, so that our engineers can connect to any data center in our fleet without suffering from Internet congestion. The Argo Tunnel product creates secure, outbound-only, connections from our data centers back to our network.

Team members can then use their SSH client to connect without any special wrappers or alternate commands. Our command-line tool, `cloudflared`, generates a single config file change and our engineers are ready to reach servers around the world.

We started by making our internal code repositories available in this flow. Users login with our SSO and can pull and submit new code without the friction of a private network. We then expanded the deployment to make it possible for our reliability engineering team to connect to the data centers that power Cloudflare’s network without a VPN.

You can read more about using our SSH workflow in your organization in the post here.

We can onboard users rapidly

Cloudflare continues to grow as we add new team members in locations around the world. Keeping a manual list of bookmarks for new users no longer scales.

With Cloudflare Access, we have the pieces that we need to remove that step in the onboarding flow. We released a feature, the Access App Launch, that gives our users a single location from which they can launch any application they should be able to reach with a single click.

For administrators, the App Launch does not require additional configuration for each app. The dashboard reads an organization’s Access policies and only presents apps to the end user that they already have permission to reach. Each team member has a personalized dashboard, out of the box, that they can use to navigate and find the tools they need. No onboarding sessions required.

How Cloudflare keeps employees productive from any location

You can read more about using our App Launch feature in your organization in the post here.

Our security team can add logging everywhere with one-click

When users leave the office, security teams can lose a real layer of a defense-in-depth strategy. Employees do not badge into a front desk when they work remotely.

Cloudflare Access addresses remote work blindspots by adding additional visibility into how applications are used. Access logs every authentication event and, if enabled, every user request made to a resource protected by the platform. Administrators can capture every request and attribute it to a user and IP address without any code changes. Cloudflare Access can help teams meet compliance and regulatory requirements for distributed users without any additional development time.

Our Security team uses this data to audit every request made to internal resources without interrupting any application owners.

You can read more about using our per-request logging in your organization in the post here.

How to get started

Your team can use all of the same features to stay online and secure from any location. To find out more about Cloudflare for Teams, visit teams.cloudflare.com.

If you’re looking to get started with Cloudflare Access today, it’s available on any Cloudflare plan. The first five seats are free. Follow the link here to get started.
Finally, need help in getting it up? A quick start guide is available here.

Pwned Passwords Padding (ft. Lava Lamps and Workers)

Post Syndicated from Junade Ali original https://blog.cloudflare.com/pwned-passwords-padding-ft-lava-lamps-and-workers/

Pwned Passwords Padding (ft. Lava Lamps and Workers)

Pwned Passwords Padding (ft. Lava Lamps and Workers)

The Pwned Passwords API (part of Troy Hunt’s Have I Been Pwned service) is used tens of millions of times each day, to alert users if their credentials are breached in a variety of online services, browser extensions and applications. Using Cloudflare, the API cached around 99% of requests, making it very efficient to run.

From today, we are offering a new security advancement in the Pwned Passwords API – API clients can receive responses padded with random data. This exists to effectively protect from any potential attack vectors which seek to use passive analysis of the size of API responses to identify which anonymised bucket a user is querying. I am hugely grateful to security researcher Matt Weir who I met at PasswordsCon in Stockholm and has explored proof-of-concept analysis of unpadded API responses in Pwned Passwords and has driven some of the work to consider the addition of padded responses.

Now, by passing a header of “Add-Padding” with a value of “true”, Pwned Passwords API users are able to request padded API responses (to a minimum of 800 entries with additional padding of a further 0-200 entries). The padding consists of randomly generated hash suffixes with the usage count field set to “0”.

Clients using this approach should seek to exclude 0-usage hash suffixes from breach validation. Given most implementations of PwnedPasswords simply do string matching on the suffix of a hash, there is no real performance implication of searching through the padding data. The false positive risk if a hash suffix matches a randomly generated response is very low, 619/(235*4) ≈ 4.44 x 10-40. This means you’d need to do about 1040 queries (roughly a query for every two atoms in the universe) to have a 44.4% probability of a collision.

In the future, non-padded responses will be deprecated outright (and all responses will be padded) once clients have had a chance to update.

You can see an example padded request by running the following curl request:

curl -H Add-Padding:true https://api.pwnedpasswords.com/range/FFFFF

API Structure

The high level structure of the Pwned Passwords API is discussed in my original blog post “Validating Leaked Passwords with k-Anonymity”. In essence, a client queries the API for the first 5 hexadecimal characters of a SHA-1 hashed password (amounting to 20 bits), a list of responses is returned with the remaining 35 hexadecimal characters of the hash (140 bits) of every breached password in the dataset. Each hash suffix is appended with a colon (“:”) and the number of times that given hash is found in the breached data.

An example query for FFFFF can be seen below, with the structure represented:

Pwned Passwords Padding (ft. Lava Lamps and Workers)

Without padding, the message length varies given the amount of hash suffixes in the bucket that is queried. It is known that it is possible to fingerprint TLS traffic based on the encrypted message length – fortunately this padding can be inserted in the API responses themselves (in the HTTP content). We can see the difference in download size between two unpadded buckets by running:

$ curl -so /dev/null https://api.pwnedpasswords.com/range/E0812 -w '%{size_download} bytes\n'
17022 bytes
$ curl -so /dev/null https://api.pwnedpasswords.com/range/834EF -w '%{size_download} bytes\n'
25118 bytes

The randomised padded entries can be found with with the “:0” suffix (indicating usage count); for example, below the top three entries are real entries whilst the last 3 represent padding data:


Compression and Randomisation

Cloudflare supports both GZip and Brotli for compression. Compression benefits the PwnedPasswords API as responses are hexadecimal represented in ASCII. That said, compression is somewhat limited given the Avalanche Effect in hashing algorithms (that a small change in an input results in a completely different hash output) – each range searched has dramatically different input passwords and the remaining 35 characters of the SHA-1 hash are similarly different and have no expected similarity between them.

Accordingly, if one were to simply pad messages with null messages (say “000…”), the compression could mean that values padded to the same could be differentiated after compression. Similarly, even without compression, padding messages with the same data could still yield credible attacks.

Accordingly, padding is instead generated with randomly generated entries. In order to not break clients, such padding is generated to effectively look like legitimate hash suffixes. It is possible, however, to identify such messages as randomised padding. As the PwnedPasswords API contains a count field (distinguished by a colon after the remainder of the hex followed by a numerical count), randomised entries can be distinguished with a 0 usage.

Lava Lamps and Workers

I’ve written before about how cache optimisation of Pwned Passwords (including using Cloudflare Workers). Cloudflare Workers has an additional benefit that Workers run before elements are pulled from cache.

This allows for randomised entries to be generated dynamically on a request-to-request basis instead of being cached. This means the resulting randomised padding can differ from request-to-request (thus the amount of entries in a given response and the size of the response).

Cloudflare Workers supports the Web Crypto API, providing for exposure of a cryptographically sound random number generator. This random number generator is used to decide the variable amount of padding added to each response. Whilst a cryptographically secure random number generator is used for determining the amount of padding, as the random hexadecimal padding does not need to be indistinguishable from the real hashes, for computational performance we use the non-cryptographically secure Math.random() to generate the actual content of the padding.

Famously, one of the sources of entropy used in Cloudflare servers is sourced from Lava Lamps. By filming a wall of lava lamps in our San Francisco office (with individual photoreceptors picking up on random noise beyond the movement of the lava), we are able to generate random seed data used in servers (complimented by other sources of entropy along the way). This lava lamp entropy is used alongside the randomness sources on individual servers. This entropy is used to seed cryptographically secure pseudorandom number generators (CSPRNG) algorithms when generating random numbers. Cloudflare Workers runtime uses the v8 engine for JavaScript, with randomness sourced from /dev/urandom on the server itself.

Each response is padded to a minimum of 800 hash suffixes and a randomly generated amount of additional padding (from 200 entries).

This can be seen in two ways, firstly we can see that repeating the same responses to the same endpoint (with the underlying response being cached), yields a randomised amount of lines between 800 and 1000:

$ for run in {1..10}; do curl -s -H Add-Padding:true https://api.pwnedpasswords.com/range/FFFFF | wc -l; done

Secondly, we can see a randomised download size in each response:

$ for run in {1..10}; do curl -so /dev/null -H Add-Padding:true https://api.pwnedpasswords.com/range/FFFFF -w '%{size_download} bytes\n'; done
35572 bytes
37358 bytes
38194 bytes
33596 bytes
32304 bytes
37168 bytes
32532 bytes
37928 bytes
35154 bytes
33178 bytes

Future Work and Conclusion

There has been a considerable amount of research that has complemented the anonymity approach in Pwned Passwords. For example; Google and Stanford have written a paper about their approach implemented in Google Password Checkup, “Protecting accounts from credential stuffing with password breach alerting” [Usenix].

We have done a significant amount of work exploring more advanced protocols for Pwned Passwords, some of this work can be found in a paper we worked on with academics at Cornell University, “Protocols for Checking Compromised Credentials” [ACM or arXiv preprint]. This research offers two new protocols (FSB, frequency smoothing bucketization, and IDB, identifier-based bucketization) to further reduce information leakage in the APIs.

Further work is needed before these protocols gain the production worthiness that we’d like before they are shipped – but, as always, we’ll keep you updated here on our blog.