On 2023-10-04 at 13:00 UTC, Atlassian released details of CVE-2023-22515, described as “Privilege Escalation Vulnerability in Confluence Data Center and Server”, a zero-day vulnerability impacting Confluence Server and Data Center products.
Cloudflare was warned about the vulnerability before the advisory was published and worked with Atlassian to proactively apply protective WAF rules for all customers. All Cloudflare customers, including those on the Free plan, received the protection enabled by default. On 2023-10-03 at 14:00 UTC the Cloudflare WAF team released the following managed rules to protect against the first variant of the vulnerability observed in real traffic.
When CVE-2023-22515 is exploited, an attacker can access public-facing Confluence Data Center and Server instances and create unauthorized Confluence administrator accounts to gain access to the instance. According to the advisory, Atlassian assesses the vulnerability as critical. At the time of writing, a CVSS score is not yet known. More information can be found in the security advisory, including which versions of Confluence Server are affected.
We are constantly researching ways to improve our products. For the Web Application Firewall (WAF), the goal is simple: keep customer web applications safe by building the best solution available on the market.
In this blog post we talk about our approach and ongoing research into detecting novel web attack vectors in our WAF before they are seen by a security researcher. If you are interested in learning about our secret sauce, read on.
This post is the written form of a presentation first delivered at Black Hat USA 2023.
The value of a WAF
Many companies offer web application firewalls and application security products with a total addressable market forecasted to increase for the foreseeable future.
In this space, vendors, including ourselves, often like to tout the importance of their solution by presenting ever-growing statistics around threats to web applications. Bigger numbers and scarier stats are great ways to justify expensive investments in web security. Taking a few examples from our very own application security report research (see our latest report here):
The numbers above all translate to real value: yes, a large portion of Internet HTTP traffic is malicious, therefore you could mitigate a non-negligible amount of traffic reaching your applications if you deployed a WAF. It is also true that we are seeing a drastic increase in global API traffic, therefore, you should look into the security of your APIs as you are likely serving API traffic you are not aware of. You need a WAF with API protection capabilities. And so on.
There is, however, one statistic often presented that hides a concept more directly tied to the value of a web application firewall:
This brings us to zero-days. The definition of a zero-day may vary depending on who you ask, but it is generally understood to be an exploit that is not yet, or has only very recently become, widely known, with no patch available. High impact zero-days will get assigned a CVE number. These happen relatively frequently, and their importance can be inferred from how often we see exploit attempts in the wild. Yes, you need a WAF to make sure you are protected from zero-day exploits.
But herein hides the real value: how quickly can a WAF mitigate a new zero-day/CVE?
By definition a zero-day is not well known, and a single malicious payload could be the one that compromises your application. From a purist standpoint, if your WAF is not fast at detecting new attack vectors, it is not providing sufficient value.
The faster the mitigation, the better. We refer to this as “time to mitigate”. Any WAF evaluation should focus on this metric.
How fast is fast enough?
24 hours? 6 hours? 30 minutes? Luckily we run one of the world's largest networks, and we can look at some real examples to understand how quickly a WAF really needs to be to protect most environments. I specifically mention “most” here as not everyone is the target of a highly sophisticated attack, and therefore, most companies should seek to be protected at least by the time a zero-day is widely known. Anything better is a plus.
Our first example is Log4Shell (CVE-2021-44228). A high and wide impacting vulnerability that affected Log4J, a popular logging software maintained by the Apache Software Foundation. The vulnerability was disclosed back in December 2021. If you are a security practitioner, you have certainly heard of this exploit.
The proof of concept of this attack was published on GitHub on December 9, 2021, at 15:27 UTC. A tweet followed shortly after. We started observing a substantial amount of attack payloads matching the signatures from about December 10 at 10:00 UTC, roughly 19 hours after the PoC was published.
Our second example is a little more recent: Atlassian Confluence CVE-2022-26134 from June 2, 2022. In this instance Atlassian published a security advisory pertaining to the vulnerability at 20:00 UTC. We were very fast at deploying mitigations and had rules globally deployed protecting customers at 23:38 UTC, before the four-hour mark.
Although potentially matching payloads were observed before the rules were deployed, these were not confirmed. Exact matches were only observed on 2022-06-03 at 10:30 UTC, over 10 hours after rule deployment. Even in this instance, we provided our observations on our blog.
The list of examples could go on, but the data tells the same story: for most, as long as you have mitigations in place within a few hours, you are likely to be fine.
That, however, is a dangerous statement to make. Cloudflare protects applications that have some of the most stringent security requirements due to the data they hold and the importance of the service they provide. They could be the one application that is first targeted with the zero-day well before it is widely known. Also, we are a WAF vendor and I would not be writing this post if I thought “a few hours” was fast enough.
Zero (time) is the only acceptable time to mitigate!
Signatures are not enough, but are here to stay
All WAFs on the market today will have a signature based component. Signatures are great: they can be built to minimize false positives (FPs), their behavior is predictable, and they can be improved over time.
We build and maintain our own signatures provided in the WAF as the Cloudflare Managed Ruleset. This is a set of over 320 signatures (at time of writing) that have been fine-tuned and optimized over the 13 years of Cloudflare’s existence.
Signatures tend to be written in ModSecurity syntax, regex-like notation, or other proprietary languages. At Cloudflare, we use wirefilter, a language understood by our global proxy. To use the same example as above, here is what one of our Log4Shell signatures looks like:
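(The production rule itself is not reproduced here. As a simplified sketch only, a Log4Shell signature in wirefilter-style syntax could look like the line below; the real signatures also cover obfuscated and encoded variants of the payload.)

any(http.request.headers.values[*] contains "${jndi:") or http.request.uri.query contains "${jndi:" or http.request.body.raw contains "${jndi:"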
Our network, which runs our WAF, also gives us an additional superpower: the ability to test new signatures (or updates to existing ones) on over 64M HTTP/S requests per second at peak. We can tell pretty quickly if a signature is well written or not.
But one of their qualities (low false positive rates), along with the fact that humans have to write them, is also the reason we cannot rely solely on signatures to reach zero time to mitigate. Ultimately a signature is limited by the speed at which we can write it, and, combined with our goal to keep FPs low, signatures only match things we know and are 100% sure about. Our WAF security analyst team is, after all, limited by human speed while balancing the effectiveness of the rules.
The good news: signatures are a vital component to reach zero time to mitigate, and will always be needed, so the investment remains vital.
Getting to zero time to mitigation
To reach zero time to mitigate we need to rely on machine learning algorithms. It turns out that WAFs are a great application for this type of technology, especially when combined with existing signature based systems. In this post I won’t describe the algorithms themselves (a subject for another post) but will provide the high level concepts of the system and the steps we took to build it.
Step 1: create the training set
It is a well known fact in data science that the quality of any classification system, including the latest generative AI systems, is highly dependent on the quality of the training set. The old saying “garbage in, garbage out” resonates well.
And this is where our signatures come into play. As these were always written with a low false positive rate in mind, combined with our horizontal WAF deployment on our network, we essentially have access to millions of true positive examples per second to create what is likely one of the best WAF training sets available today.
We also, due to customer configurations and other tools such as Bot Management, have a pretty clear idea of what true negatives look like. In summary, we have a constant flow of training data. Additionally due to our self-service plans and the globally distributed nature of Cloudflare’s service and customer base, our data tends to be very diverse, removing a number of biases that may otherwise be present.
Simply relying on real traffic data is good, but with a few artificial enhancements the training set can become a lot better, leading to much higher detection efficacy.
In a nutshell, we generated artificial (but realistic) data to increase the diversity of our data even further, guided by the statistical distribution of existing real-world data: for example, mutating benign content with random character noise, injecting language-specific keywords, generating new benign content, and so on.
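As a rough illustration of this kind of augmentation (the helper names and keyword list below are made up for the example, not our production tooling), benign samples can be mutated with character noise and sprinkled with language-specific keywords while keeping their benign label:

```python
import random
import string

SQL_KEYWORDS = ["select", "union", "order by", "sleep"]  # language-specific tokens to sprinkle in

def add_character_noise(sample: str, rate: float = 0.05) -> str:
    """Randomly insert characters into a benign sample to diversify it."""
    out = []
    for ch in sample:
        out.append(ch)
        if random.random() < rate:
            out.append(random.choice(string.ascii_letters + string.digits + "%$_-"))
    return "".join(out)

def inject_keyword(sample: str) -> str:
    """Insert a language-specific keyword into an otherwise benign sample.

    The result stays labelled as benign: it teaches the model that a keyword
    alone does not make a request malicious.
    """
    pos = random.randint(0, len(sample))
    return sample[:pos] + " " + random.choice(SQL_KEYWORDS) + " " + sample[pos:]

def augment(benign_samples: list[str], copies: int = 3) -> list[str]:
    """Produce several noisy variants of each benign sample."""
    augmented = []
    for s in benign_samples:
        for _ in range(copies):
            augmented.append(inject_keyword(add_character_noise(s)))
    return augmented

print(augment(["name=alice&page=2"], copies=2))
```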
One restriction that often applies to machine learning based classifiers running on inline traffic, like the Cloudflare proxy, is latency. To be useful, we need to be able to compute the classification “inline” without affecting the user experience of legitimate end users. We don’t want security to be associated with “slowness”.
This required us to fine tune not only the feature set used by the classification system, but also the underlying tooling, so it was both fast and lightweight. The classifier is built using TensorFlow Lite.
At the time of writing, our classification model is able to provide a classification output in under 1ms at the 50th percentile. We believe we can reach 1ms at the 90th percentile with ongoing efforts.
Step 4: deploy on the network
Once the classifier is ready, there is still a large amount of additional work needed to deploy on live production HTTP traffic, especially at our scale. Quite a few additional steps need to be implemented starting from a fully formed live HTTP request and ending with a classification output.
The diagram below is a good summary of each step. First and foremost, starting from the raw HTTP request, we normalize it so it can be easily parsed and processed, without unintended consequences, by the following steps in the pipeline. Second, we extract the relevant features, identified through experimentation and research as the most beneficial for our use case. To date we extract over 6k features. We then run inference on the resulting features (the actual classification) and generate outputs for the various attack types we have trained the model for. To date we classify cross-site scripting (XSS), SQL injection (SQLi) and remote code execution (RCE) payloads. The final step is to consolidate the output into a single WAF Attack Score.
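As a minimal sketch of such a pipeline, assuming a hypothetical feature extractor and a placeholder TensorFlow Lite model file (the real normalization, feature set, and model are internal to Cloudflare), the flow could look like this:

```python
import numpy as np
import tensorflow as tf

def normalize(raw_request: str) -> str:
    # Placeholder normalization: lowercase and collapse whitespace so later
    # steps see a canonical form of the request.
    return " ".join(raw_request.lower().split())

def extract_features(request: str, n_features: int = 6000) -> np.ndarray:
    # Hypothetical stand-in for the real feature extraction: hash character
    # trigrams into a fixed-size float vector.
    vec = np.zeros(n_features, dtype=np.float32)
    for i in range(len(request) - 2):
        vec[hash(request[i:i + 3]) % n_features] += 1.0
    return vec

def classify(raw_request: str, model_path: str = "waf_model.tflite") -> dict:
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    features = extract_features(normalize(raw_request))
    interpreter.set_tensor(inp["index"], features.reshape(1, -1))
    interpreter.invoke()
    # Assume the (hypothetical) model emits one probability per attack type.
    xss, sqli, rce = interpreter.get_tensor(out["index"])[0]

    # Consolidate the per-attack-type outputs into a single 1-99 score,
    # where lower means more likely malicious.
    attack_score = int(round(1 + 98 * (1.0 - max(xss, sqli, rce))))
    return {"xss": xss, "sqli": sqli, "rce": rce, "waf_attack_score": attack_score}
```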
Step 5: expose output as a simple interface
To make the system usable, we decided the output should be in the same format as our Bot Management system output: a single score that ranges from 1 to 99. Lower scores indicate a higher probability that the request is malicious, while higher scores indicate the request is clean.
There are two main benefits of representing the output within a fixed range. First, using the output to BLOCK traffic becomes very easy. It is sufficient to deploy a WAF rule that blocks all HTTP requests with a score lower than $x; for example, a rule that blocks all traffic with a score lower than 10 would look like this:
cf.waf.score < 10 then BLOCK
Second, deciding on the threshold becomes easy: the score distribution of your live traffic is represented in colored “buckets”, and you can zoom in where relevant to validate the correct classification. For example, the graph below shows an attack we observed against blog.cloudflare.com when we initially started testing the system. This graph is available to all Business and Enterprise users.
All that remains is to actually use the score!
Success in the wild
The classifier has been deployed for just over a year on Cloudflare’s network. The main question stated at the start of this post remains: does it work? Have we been able to detect attacks before we’ve seen them? Have we achieved zero time to mitigate?
To answer this we track the classification output for new CVEs that fail to be detected by existing Cloudflare Managed Rules. Of course our rule improvement work is always ongoing, but this gives us an idea of how well the system is performing.
And the answer is: YES. For all CVEs or bypasses that rely on syntax similar to existing vulnerabilities, the classifier performs very well, and we have observed several instances of it blocking valid malicious payloads that were not detected by our signatures. All of this while keeping false positives very low, at a threshold of 15 or below. XSS variations and SQLi CVEs are, in most cases, a problem fully solved when the classifier is deployed.
One recent example is a set of Sitecore vulnerabilities that were disclosed in June 2023 listed below:
The CVEs listed above were not detected by Cloudflare Managed Rules, but were correctly detected and classified by our model. Customers that had the score deployed in a rule in June 2023 would have been protected in zero time.
This does not mean there isn’t space for further improvement.
The classification works very well for attack types that are aligned with, or somewhat similar to, existing attack types. If the payload uses a brand new, never-before-seen syntax, then we still have some work to do. Log4Shell is actually a very good example of this. If another zero-day vulnerability were discovered that leveraged the JNDI Java syntax, we are confident that customers who have deployed WAF rules using the WAF Attack Score would be safe against it.
We are already working on adding more detection capabilities including web shell detection and open redirects/path traversal.
The perfect feedback loop
I mentioned earlier that our security analyst driven improvements to our Cloudflare Managed Rulesets are not going to stop. Our public changelog is full of activity and there is no sign of slowing down.
There is a good reason for this: the signature based system will remain, and will likely eventually become our training set generation tool. But not only that: it also provides an opportunity to speed up improvements by focusing review on malicious traffic that is classified by our machine learning system but not detected by our signatures. The delta between the two systems is now one of the main focuses of attention for our security analyst team. The diagram below visualizes this concept.
It is this delta that is helping our team to further fine tune and optimize the signatures themselves. Both to match malicious traffic that is bypassing the signatures, and to reduce false positives. You can now probably see where this is going as we are starting to build the perfect feedback loop.
Better signatures provide a better training set (data). In turn, we can create a better model. The model will provide us with a more interesting delta, which, once reviewed by humans, will allow us to create better signatures. And start over.
We are now working to automate this entire process with the goal of having humans simply review and click to deploy. This is the leading edge for WAF zero-day mitigation in the industry.
Summary
One of the main value propositions of any web application security product is the ability to detect novel attack vectors before they can cause an issue, allowing internal teams time to patch and remediate the underlying codebase. We call this time to mitigate. The ideal value is zero.
We’ve put a lot of effort and research into a machine learning system that augments our existing signature based system to yield very good classification results for new attack vectors the first time they are seen. The system outputs a score that we call the WAF Attack Score. We have validated that for many CVEs we are indeed able to correctly classify malicious payloads on the first attempt, and we provide the Sitecore CVEs as an example.
Moving forward, we are now automating a feedback loop that will allow us to improve our signatures faster, then iterate on the model and provide even better detection.
The system is live and available to all our customers in the business or enterprise plan. Log in to the Cloudflare dashboard today to receive instant zero-day mitigation.
Rate Limiting rules are essential in the toolbox of security professionals as they are very effective in managing targeted volumetric attacks, takeover attempts, scraping bots, or API abuse. Over the years we have received a lot of feature requests from users, but two stand out: suggesting rate limiting thresholds and implementing a throttle behavior. Today we released both to Enterprise customers!
When creating a rate limit rule, one of the common questions is: “what rate should I put in to block malicious traffic without affecting legitimate users?”. If your traffic is authenticated, API Shield will suggest thresholds based on auth IDs (such as a session ID, cookie, or API key). However, when you don’t have authentication headers, you will need to create IP-based rules (for example, for a ‘/login’ endpoint) and you are left guessing the threshold. From today, we provide analytics tools to determine what request rate to use in your rule.
Until now, a rate limit rule could be created with a log, challenge, or block action. When ‘block’ is selected, all requests from the same source (for example, IP) are blocked for the timeout period. Sometimes this is not ideal: you would rather selectively block or allow requests to enforce a maximum request rate without an outright temporary ban. When using throttle, a rule lets through just enough requests to keep the request rate from individual clients below a customer-defined threshold.
Continue reading to learn more about each feature.
Introducing Rate Limit Analysis in Security Analytics
The Security Analytics view was designed to offer complete visibility into HTTP traffic while adding an extra layer of security on top. It has proven to be of great value when crafting custom rules. Nevertheless, when it comes to creating rate limiting rules, relying solely on Security Analytics can be somewhat challenging.
To create a rate limiting rule, you can leverage Security Analytics to determine the filter, that is, which requests are evaluated by the rule (for example, by filtering on mitigated traffic, or selecting other security signals like Bot scores). However, you’ll also need to determine the maximum rate you want to enforce, and that depends on the specific application, traffic pattern, time of day, endpoint, etc. What’s the typical rate of legitimate users trying to access the login page at peak time? What’s the rate of requests generated by a botnet with the same JA3 fingerprint scraping prices from an ecommerce site? Until today, you couldn’t answer these questions from the analytics view.
That’s why we made the decision to integrate a rate limit helper into Security Analytics as a new tab called "Rate Limit Analysis," which concentrates on providing a tool to answer rate-related questions.
High level top statistics vs. granular Rate Limit Analysis
In Security Analytics, users can analyze traffic data by creating filters combining what we call top statistics. These statistics reveal the total volume of requests associated with a specific attribute of the HTTP requests. For example, you can filter on the traffic from the ASNs that generated the most requests in the last 24 hours, or slice the data to look only at traffic reaching the most popular paths of your application. This tool is handy when creating rules based on traffic analysis.
However, for rate limits, a more detailed approach is required.
The new Rate Limit Analysis tab displays data on the request rate for traffic matching the selected filter and time period. You can select a rate defined over different time intervals, like one or five minutes, and the attribute of the request used to identify the rate, such as IP address, JA3 fingerprint, or a combination of both, as this often improves accuracy. Once the attributes are selected, the chart displays the distribution of request rates for the top 50 unique clients (identified as unique IPs or JA3s) observed during the chosen time interval, in descending order.
You can use the slider to determine the impact of a rule with different thresholds. How many clients would have been caught by the rule and rate limited? Can I visually identify abusers with above-average rate vs. the long tail of average users? This information will guide you in assessing what’s the most appropriate rate for the selected filter.
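The computation behind that chart is simple to reason about. As a rough offline sketch (the log format below is made up for illustration and is not Cloudflare’s), this is how a per-client rate distribution and the impact of a candidate threshold can be derived:

```python
from collections import Counter

# Hypothetical request log: (epoch_seconds, client_id) where client_id could be
# an IP address, a JA3 fingerprint, or a combination of both.
requests = [
    (1690000000, "203.0.113.7"),
    (1690000005, "203.0.113.7"),
    (1690000012, "198.51.100.9"),
    # ...
]

def rate_distribution(requests, window_start, window_seconds=60, top_n=50):
    """Count requests per client inside one window, highest rates first."""
    counts = Counter(
        client for ts, client in requests
        if window_start <= ts < window_start + window_seconds
    )
    return counts.most_common(top_n)

def clients_over_threshold(distribution, threshold):
    """List the clients a rule with this threshold would have rate limited."""
    return [client for client, n in distribution if n > threshold]

dist = rate_distribution(requests, window_start=1690000000)
print(dist)
print(clients_over_threshold(dist, threshold=20))
```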
Using Rate Limit Analysis to define rate thresholds
Building your rate limit rule now takes only a few minutes. Let’s apply this to one of the common use cases: identifying the /login endpoint and creating an IP-based rate limit rule with a logging action.
Define a scope and rate.
In the HTTP requests tab (the default view), start by selecting a specific time period. If you’re looking for the normal rate distribution you can specify a period with non-peak traffic. Alternatively, you can analyze the rate of offending users by selecting a period when an attack was carried out.
Using the filters in the top statistics, select a specific endpoint (e.g., /login). We can also focus on non-automated/human traffic using the bot score quick filter on the right sidebar or the filter button on top of the chart. In the Rate Limit Analysis tab, you can choose the characteristic (JA3, IP, or both) and duration (1 min, 5 mins, or 1 hour) for your rate limit rule. At this point, moving the dotted line up and down can help you choose an appropriate rate for the rule. JA3 is only available to customers using Bot Management.
Looking at the distribution, we can exclude any IPs or ASNs that are known to us, to get a better view of end user traffic. One way to do this is to filter out the outliers right before the long tail begins. A rule with this setting will block the IPs/JA3 fingerprints with a higher rate of requests.
Validate your rate. You can validate the rate by repeating this process but selecting a portion of traffic where you know there was an attack or traffic peak. The rate you've chosen should block the outliers during the attack and allow traffic during normal times. In addition to that, looking at the sampled logs can be helpful in verifying the fingerprints and filters chosen.
Create a rule. Selecting “Create rate limit rule” will take you to the rate limiting tab in the WAF with your filters pre-populated.
Choose your action and behavior in the rule. Depending on your needs you can choose to log, challenge, or block requests exceeding the selected threshold. It’s often a good idea to first deploy the rule with a log action to validate the threshold and then change the action to block or challenge when you are confident with the result. With every action, you can also choose between two behaviors: fixed action or throttle. Learn more about the difference in the next section.
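To make the end result of the steps above concrete, the sketch below shows the /login rule expressed as the kind of rate limiting rule object you could send to Cloudflare’s Rulesets API. The field names are our best approximation and are only illustrative; check the current API documentation before relying on them.

```python
# Illustrative rate limiting rule object (field names approximate).
login_rate_limit_rule = {
    "description": "Log clients exceeding 20 requests/min to /login",
    "expression": '(http.request.uri.path eq "/login")',
    "action": "log",  # switch to "block" or a challenge once the threshold is validated
    "ratelimit": {
        "characteristics": ["cf.colo.id", "ip.src"],  # count per IP, per data center
        "period": 60,                 # sampling period in seconds
        "requests_per_period": 20,    # threshold chosen from the Rate Limit Analysis view
        "mitigation_timeout": 600,    # only applies to the fixed action behavior
    },
}
```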
Introducing the new throttle behavior
Until today, the only available behavior for Rate Limiting has been fixed action, where an action is triggered for a selected time period (also known as timeout). For example, did the IP 192.0.2.23 exceed the rate of 20 requests per minute? Then block (or log) all requests from this IP for, let’s say, 10 minutes.
In some situations, this type of penalty is too severe and risks affecting legitimate traffic. For example, if a device in a corporate network (think about NAT) exceeds the threshold, all devices sharing the same IP will be blocked outright.
With throttling, rate limiting selectively drops requests to maintain the rate within the specified threshold. It’s like a leaky bucket behavior (with the only difference that we do not implement a queuing system). For example, throttling a client to 20 requests per minute means that when a request comes from this client, we look at the last 60 seconds and see if (on average) we have received less than 20 requests. If this is true, the rule won’t perform any action. If the average is already at 20 requests then we will take action on that request. When another request comes in, we will check again. Since some time has passed the average rate might have dropped, making room for more requests.
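A minimal sketch of this behavior, assuming a simple in-memory sliding-window counter with no queue (an approximation of the description above, not Cloudflare’s actual implementation), looks like this:

```python
import time
from collections import defaultdict, deque

class Throttle:
    """Allow at most `limit` requests per `window` seconds per client.

    Requests over the limit are dropped individually; the client is never
    banned outright, so it can resume as soon as its average rate falls
    back under the threshold.
    """

    def __init__(self, limit: int = 20, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # client -> timestamps of allowed requests

    def allow(self, client: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.history[client]
        # Drop timestamps that fell out of the sliding window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False  # take action (block/log/challenge) on this request only

throttle = Throttle(limit=20, window=60.0)
# One request per second for 25 seconds: the first 20 pass, the last 5 are dropped.
print([throttle.allow("192.0.2.23", now=float(t)) for t in range(25)])
```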
Throttling can be used with all actions: block, log, or challenge. When creating a rule, you can select the behavior after choosing the action.
When using any challenge action, we recommend using the fixed action behavior. As a result, when a client exceeds the threshold, we will challenge all requests until a challenge is passed. The client will then be able to reach the origin again until the threshold is breached again.
Throttle behavior is available to Enterprise rate limiting plans.
Try it out!
Today we are introducing a new Rate Limiting analytics experience along with the throttle behavior for all Rate Limiting users on Enterprise plans. We will continue to work actively on providing a better experience to save our customers' time. Log in to the dashboard, try out the new experience, and let us know your feedback using the feedback button located on the top right side of the Analytics page or by reaching out to your account team directly.
Cloudflare has a unique vantage point on the Internet. From this position, we are able to see, explore, and identify trends that would otherwise go unnoticed. In this report we are doing just that and sharing our insights into Internet-wide application security trends.
Since the last report, our network is bigger and faster: we are now processing an average of 46 million HTTP requests/second and 63 million at peak. We consistently handle approximately 25 million DNS queries per second. That's around 2.1 trillion DNS queries per day, and 65 trillion queries a month. This is the sum of authoritative and resolver requests served by our infrastructure. Summing up both HTTP and DNS requests, we get to see a lot of malicious traffic. Focusing on HTTP requests only, in Q2 2023 Cloudflare blocked an average of 112 billion cyber threats each day, and this is the data that powers this report.
But as usual, before we dive in, we need to define our terms.
Definitions
Throughout this report, we will refer to the following terms:
Mitigated traffic: any eyeball HTTP* request that had a “terminating” action applied to it by the Cloudflare platform. These include the following actions: BLOCK, CHALLENGE, JS_CHALLENGE and MANAGED_CHALLENGE. This does not include requests that had the following actions applied: LOG, SKIP, ALLOW. In contrast to last year, we now exclude requests that had CONNECTION_CLOSE and FORCE_CONNECTION_CLOSE actions applied by our DDoS mitigation system, as these technically only slow down connection initiation. They also accounted for a relatively small percentage of requests. Additionally, we improved our calculation regarding the CHALLENGE type actions to ensure that only unsolved challenges are counted as mitigated. A detailed description of actions can be found in our developer documentation.
Bot traffic/automated traffic: any HTTP* request identified by Cloudflare’s Bot Management system as being generated by a bot. This includes requests with a bot score between 1 and 29 inclusive. This has not changed from last year’s report.
API traffic: any HTTP* request with a response content type of XML or JSON. Where the response content type is not available, such as for mitigated requests, the equivalent Accept content type (specified by the user agent) is used instead. In this latter case, API traffic won’t be fully accounted for, but it still provides a good representation for the purposes of gaining insights.
Unless otherwise stated, the time frame evaluated in this post is the 3 month period from April 2023 through June 2023 inclusive.
Finally, please note that the data is calculated based only on traffic observed across the Cloudflare network and does not necessarily represent overall HTTP traffic patterns across the Internet.
* When referring to HTTP traffic we mean both HTTP and HTTPS.
Global traffic insights
Mitigated daily traffic stable at 6%, spikes reach 8%
Although daily mitigated HTTP requests decreased by 2 percentage points to 6% on average from 2021 to 2022, days with larger than usual malicious activity can be clearly seen across the network. One clear example is shown in the graph below: towards the end of May 2023, a spike reaching nearly 8% can be seen. This is attributable to large DDoS events and other activity that does not follow standard daily or weekly cycles and is a constant reminder that large malicious events can still have a visible impact at a global level, even at Cloudflare scale.
75% of mitigated HTTP requests were outright BLOCKed. This is a 6 percentage point decrease compared to the previous report. The majority of other requests are mitigated with the various CHALLENGE type actions, with managed challenges leading with ~20% of this subset.
Shields up: customer configured rules now biggest contributor to mitigated traffic
In our previous report, our automated DDoS mitigation system accounted for, on average, more than 50% of mitigated traffic. Over the past two quarters, due both to increased WAF adoption and, most likely, to organizations better configuring and locking down their applications from unwanted traffic, we’ve seen a new trend emerge, with WAF mitigated traffic surpassing DDoS mitigation. Most of the increase has been driven by WAF Custom Rule BLOCKs rather than our WAF Managed Rules, indicating that these mitigations are generated by customer configured rules for business logic or related purposes. This can be clearly seen in the chart below.
Note that our WAF Managed Rules mitigations (yellow line) are negligible compared to overall WAF mitigated traffic, also indicating that customers are adopting positive security models by allowing known good traffic as opposed to only blocking known bad traffic. Having said that, WAF Managed Rules mitigations reached as much as 1.5 billion/day during the quarter.
Our DDoS mitigation is, of course, volumetric and the amount of traffic matching our DDoS layer 7 rules should not be underestimated, especially given that we are observing a number of novel attacks and botnets being spun up across the web. You can read a deep dive on DDoS attack trends in our Q2 DDoS threat report.
Aggregating the source of mitigated traffic, the WAF now accounts for approximately 57% of all mitigations. Tabular format below with other sources for reference.
Source            Percentage %
WAF               57%
DDoS Mitigation   34%
IP Reputation     6%
Access Rules      2%
Other             1%
Application owners are increasingly relying on geo location blocks
Given the increase in mitigated traffic from customer defined WAF rules, we thought it would be interesting to dive one level deeper and better understand what customers are blocking and how they are doing it. We can do this by reviewing rule field usage across our WAF Custom Rules to identify common themes. Of course, the data needs to be interpreted correctly, as not all customers have access to all fields as that varies by contract and plan level, but we can still make some inferences based on field “categories”. By reviewing all ~7M WAF Custom Rules deployed across the network and focusing on main groupings only, we get the following field usage distribution:
Field                               Used in % of rules
Geolocation fields                  40%
HTTP URI                            31%
IP address                          21%
Other HTTP fields (excluding URI)   34%
Bot Management fields               11%
IP reputation score                 4%
Notably, 40% of all deployed WAF Custom Rules use geolocation-related fields to decide how to treat traffic. This is a common technique used to implement business logic or to exclude geographies from which no traffic is expected. While these are coarse controls unlikely to stop a sophisticated attacker, they are still effective at reducing the attack surface.
Another notable observation is the usage of Bot Management related fields in 11% of WAF Custom Rules. This number has been steadily increasing over time as more customers adopt machine learning-based classification strategies to protect their applications.
Old CVEs are still exploited en masse
Contributing ~32% of WAF Managed Rules mitigated traffic overall, HTTP Anomaly is still the most common attack category blocked by the WAF Managed Rules. SQLi moved up to second position, surpassing Directory Traversal with 12.7% and 9.9% respectively.
If we look at the start of April 2023, we notice the DoS category far exceeding the HTTP Anomaly category. Rules in the DoS category are WAF layer 7 HTTP signatures that are sufficiently specific to match (and block) single requests without looking at cross request behavior and that can be attributed to either specific botnets or payloads that cause denial of service (DoS). Normally, as is the case here, these requests are not part of “distributed” attacks, hence the lack of the first “D” for “distributed” in the category name.
Tabular format for reference (top 10 categories):
Category                Percentage %
HTTP Anomaly            32%
SQLi                    13%
Directory Traversal     10%
File Inclusion          9%
DoS                     9%
XSS                     9%
Software Specific       7%
Broken Authentication   6%
Common Injection        3%
CVE                     1%
Zooming in, and filtering on the DoS category only, we find that most of the mitigated traffic is attributable to one rule: 100031 / ce02fd… (old WAF and new WAF rule ID respectively). This rule, with a description of “Microsoft IIS – DoS, Anomaly:Header:Range – CVE:CVE-2015-1635” pertains to a CVE dating back to 2015 that affected a number of Microsoft Windows components resulting in remote code execution*. This is a good reminder that old CVEs, even those dating back more than 8 years, are still actively exploited to compromise machines that may be unpatched and still running vulnerable software.
* Due to rule categorisation, some CVE specific rules are still assigned to a broader category such as DoS in this example. Rules are assigned to a CVE category only when the attack payload does not clearly overlap with another more generic category.
Another interesting observation is the increase in Broken Authentication rule matches starting in June. This increase is also attributable to a single rule deployed across all our customers, including our FREE users: “WordPress – Broken Access Control, File Inclusion”. This rule is blocking attempts to access wp-config.php – the WordPress default configuration file which is normally found in the web server document root directory, but of course should never be accessed directly via HTTP.
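If you wanted to replicate this protection yourself with a WAF Custom Rule, an illustrative expression in the same style as the score example earlier (not the managed rule itself) would be:

http.request.uri.path contains "wp-config.php" then BLOCK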
On a similar note, CISA/CSA recently published a report highlighting the 2022 Top Routinely Exploited Vulnerabilities. We took this opportunity to explore how each CVE mentioned in CISA’s report was reflected in Cloudflare’s own data. The CISA/CSA report discusses 12 vulnerabilities that malicious cyber actors routinely exploited in 2022. However, based on our analysis, two CVEs mentioned in the report are responsible for the vast majority of attack traffic we have seen in the wild: Log4J and the Atlassian Confluence Code Injection. Our data clearly suggests a major difference in exploit volume between the top two and the rest of the list. The following chart compares the attack volume (in logarithmic scale) of the top 6 vulnerabilities of the CISA list according to our logs.
Bot traffic insights
Cloudflare’s Bot Management continues to see significant investment, with the addition of JavaScript Verified URLs for greater protection against browser-based bots, Detection IDs now available in Custom Rules for additional configurability, and an improved UI for easier onboarding. For self-serve customers, we’ve added the ability to “Skip” Super Bot Fight Mode rules and support for WordPress Loopback requests, to better integrate with our customers’ applications and give them the protection they need.
Our confidence in the Bot Management classification output remains very high. If we plot the bot scores across the analyzed time frame, we find a very clear distribution, with most requests either being classified as definitely bot (score below 30) or definitely human (score greater than 80), with most requests actually scoring less than 2 or greater than 95. This equates, over the same time period, to 33% of traffic being classified as automated (generated by a bot). Over longer time periods we do see the overall bot traffic percentage stable at 29%, and this reflects the data shown on Cloudflare Radar.
On average, more than 10% of non-verified bot traffic is mitigated
Compared to the last report, non-verified bot HTTP traffic mitigation is currently on a downward trend (down 6 percentage points). However, the Bot Management field usage within WAF Custom Rules is non negligible, standing at 11%. This means that there are more than 700k WAF Custom Rules deployed on Cloudflare that are relying on bot signals to perform some action. The most common field used is cf.client.bot, an alias to cf.bot_management.verified_bot which is powered by our list of verified bots and allows customers to make a distinction between “good” bots and potentially “malicious” non-verified ones.
Enterprise customers have access to the more powerful cf.bot_management.score which provides direct access to the score computed on each request, the same score used to generate the bot score distribution graph in the prior section.
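For illustration, a custom rule combining the two fields, written in the same style as the earlier score example (an example expression, not a recommended policy), could be:

not cf.bot_management.verified_bot and cf.bot_management.score < 30 then BLOCK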
The above data is also validated by looking at what Cloudflare service is mitigating unverified bot traffic. Although our DDoS mitigation system is automatically blocking HTTP traffic across all customers, this only accounts for 13% of non-verified bot mitigations. On the other hand, WAF, and mostly customer defined rules, account for 77% of such mitigations, much higher than mitigations across all traffic (57%) discussed at the start of the report. Note that Bot Management is specifically called out but refers to our “default” one-click rules, which are counted separately from the bot fields used in WAF Custom Rules.
Tabular format for reference:
Source            Percentage %
WAF               77%
DDoS Mitigation   13%
IP reputation     5%
Access Rules      3%
Other             1%
API traffic insights
58% of dynamic (non cacheable) traffic is API related
The growth of overall API traffic observed by Cloudflare is not slowing down. Compared to last quarter, we now see 58% of total dynamic traffic being classified as API related. This is a 3 percentage point increase compared to Q1.
Our investment in API Gateway is also following a similar growth trend. Over the last quarter we have released several new API security features.
First, we’ve made API Discovery easier to use with a new inbox view. API Discovery inventories your APIs to prevent shadow IT and zombie APIs, and now customers can easily filter to show only new endpoints found by API Discovery. Saving endpoints from API Discovery places them into our Endpoint Management system.
Next, we’ve added a brand new API security feature offered only at Cloudflare: the ability to control API access by client behavior. We call it Sequence Mitigation. Customers can now create positive or negative security models based on the order of API paths accessed by clients. You can now ensure that your application’s users are the only ones accessing your API, rather than automated brute-force attempts that ignore normal application functionality. For example, in a banking application you can now enforce that the funds transfer endpoint can only be accessed after a user has also accessed the account balance check endpoint.
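Conceptually, the enforcement reduces to checking a client’s per-session history before allowing a sensitive endpoint. The toy sketch below illustrates the idea only; it is not how API Gateway implements Sequence Mitigation, and the endpoint paths are made up:

```python
REQUIRED_PREDECESSORS = {
    # Sensitive endpoint -> endpoints the same session must have visited first.
    "/api/v1/transfer-funds": {"/api/v1/account-balance"},
}

session_history: dict[str, set[str]] = {}

def allow_request(session_id: str, path: str) -> bool:
    """Allow the request only if the session already visited the required endpoints."""
    visited = session_history.setdefault(session_id, set())
    required = REQUIRED_PREDECESSORS.get(path, set())
    allowed = required.issubset(visited)
    if allowed:
        visited.add(path)
    return allowed

print(allow_request("s1", "/api/v1/transfer-funds"))   # False: balance check not seen yet
print(allow_request("s1", "/api/v1/account-balance"))  # True
print(allow_request("s1", "/api/v1/transfer-funds"))   # True now
```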
We’re excited to continue releasing API security and API management features for the remainder of 2023 and beyond.
65% of global API traffic is generated by browsers
The percentage of API traffic generated by browsers has remained very stable over the past quarter. By this, we mean HTTP requests generated by the browser that do not serve HTML content rendered directly by the browser, such as AJAX calls, which would normally serve JSON based responses.
HTTP Anomalies are the most common attack vector on API endpoints
Just like last quarter, HTTP Anomalies remain the most common mitigated attack vector on API traffic. SQL injection (SQLi) attacks, however, are non negligible, contributing approximately 11% of the total mitigated traffic, closely followed by XSS attacks at around 9%.
Tabular format for reference (top 5):
Category            Percentage %
HTTP Anomaly        64%
SQLi                11%
XSS                 9%
Software Specific   5%
Command Injection   4%
Looking forward
As we move our application security report to a quarterly cadence, we plan to deepen some of the insights and to provide additional data from some of our newer products such as Page Shield, allowing us to look beyond HTTP traffic, and explore the state of third party dependencies online.
Stay tuned and keep an eye on Cloudflare Radar for more frequent application security reports and insights.
The Cybersecurity and Infrastructure Security Agency (CISA) just released a report highlighting the most commonly exploited vulnerabilities of 2022. With our role as a reverse proxy to a large portion of the Internet, Cloudflare is in a unique position to observe how the Common Vulnerabilities and Exposures (CVEs) mentioned by CISA are being exploited on the Internet.
We wanted to share a bit of what we’ve learned.
Based on our analysis, two CVEs mentioned in the CISA report are responsible for the vast majority of attack traffic seen in the wild: Log4J and Atlassian Confluence Code Injection. Although CISA/CSA discuss a larger number of vulnerabilities in the same report, our data clearly suggests a major difference in exploit volume between the top two and the rest of the list.
The top CVEs for 2022
Looking at the volume of requests detected by WAF Managed Rules that were created for the specific CVEs listed in the CISA report, we rank the vulnerabilities in order of prevalence:
Topping the list is Log4J (CVE-2021-44228). This isn’t surprising, as this is likely one of the most high impact exploits we have seen in decades — leading to full remote compromise. The second most exploited vulnerability is the Atlassian Confluence Code Injection (CVE-2022-26134).
When comparing the attack volume for these five groups, we immediately notice that one vulnerability stands out. Log4J is more than an order of magnitude more exploited than the runner up (the Atlassian Confluence Code Injection), and all the remaining CVEs are even lower. Although the CISA/CSA report groups all these vulnerabilities together, we think that there are really two groups: one dominant CVE (Log4J), and a secondary group of comparable zero-days. Each of the two groups has similar attack volume.
The chart below, in logarithmic scale, clearly shows the difference in popularity.
CVE-2021-44228: Log4J
The first on our list is the notorious CVE-2021-44228 — better known as the Log4j vulnerability. This flaw caused significant disturbance in the cyber world in 2021, and continues to be exploited extensively.
Cloudflare released new managed rules within hours after the vulnerability was made public. We also released updated detections in the following days (blog). Overall, we released rules in three stages:
The rules we deployed detect the exploit in four categories:
Log4j Headers: Attack pattern in HTTP header
Log4j Body: Attack pattern in HTTP Body
Log4j URI: Attack Pattern in URI
Log4j Body Obfuscation: Obfuscated Attack pattern
We have found that Log4J attacks in HTTP Headers are more common than in HTTP bodies. The graph below shows the persistence of exploit attempts for this vulnerability over time, with clear peaks and growth into July 2023 (time of writing).
Due to the high impact of this vulnerability, to step up and lead the charge for a safer, better Internet, on March 15, 2022 Cloudflare announced that all plans (including Free) would get WAF Managed Rules for high-impact vulnerabilities. These free rules tackle high-impact vulnerabilities such as the Log4J exploit, the Shellshock vulnerability, and various widespread WordPress exploits. Every business or personal website, regardless of size or budget, can protect their digital assets using Cloudflare’s WAF.
CVE-2022-26134: Atlassian Confluence Code Injection
A code injection vulnerability that afflicted Atlassian Confluence was the second most exploited CVE in 2022. This exploit posed a threat to entire systems, leaving many businesses at the mercy of attackers, and is an indication of how critical knowledge-based systems have become for managing information within organizations. Attackers target these systems because they recognize how important they are. In response, the Cloudflare WAF team rolled out two emergency releases to protect its customers:
The graph below displays the number of hits received each day, showing a clear peak followed by a gradual decline as systems were patched and secured.
Both Log4J and the Confluence Code Injection show some seasonality, with a higher volume of attacks carried out from September/November 2022 until March 2023. This likely reflects campaigns managed by attackers who are still attempting to exploit these vulnerabilities (an ongoing campaign is visible towards the end of July 2023).
CVE-2021-34473, CVE-2021-31207, and CVE-2021-34523: Microsoft Exchange SSRF and RCE Vulnerabilities
Three previously unknown bugs were chained together to achieve a Remote Code Execution (RCE) 0-day attack. Given how widely adopted Microsoft Exchange servers are, these exploits posed serious threats to data security and business operations across all industries, geographies and sectors.
Cloudflare WAF published a rule for this vulnerability in the emergency release of March 3, 2022, which contained the rule “Microsoft Exchange SSRF and RCE vulnerability – CVE:CVE-2022-41040, CVE:CVE-2022-41082”.
The trend of these attacks over the past year can be seen in the graph below.
CVE-2022-1388: F5 BIG-IP Remote Code Execution Vulnerability
This particular security vulnerability can be exploited when an unauthenticated adversary has network connectivity to the BIG-IP system (F5’s family of application security and performance products). Either via the management interface or self-assigned IP addresses, the attacker can execute unrestricted system commands.
Cloudflare did an emergency release to detect this issue (Emergency Release: May 5, 2022) with the rule Command Injection – RCE in BIG-IP – CVE:CVE-2022-1388.
There is a relatively persistent pattern of exploitation without signs of specific campaigns, with the exception of a spike occurring in late June 2023.
CVE-2022-22954: VMware Workspace ONE Access and Identity Manager Server-side Template Injection Remote Code Execution Vulnerability
With this vulnerability, an attacker can remotely trigger a server-side template injection that may result in remote code execution. Successful exploitation allows an unauthenticated attacker with network access to the web interface to execute an arbitrary shell command as the VMware user. Later, this issue was combined with CVE-2022-22960 (which was a Local Privilege Escalation Vulnerability (LPE) issue). In combination, these two vulnerabilities allowed remote attackers to execute commands with root privileges.
Cloudflare WAF published a rule for this vulnerability: Release: May 5, 2022. Exploit attempt graph over time shown below.
CVE-2021-26084: Confluence Server Webwork OGNL injection
An OGNL injection vulnerability was found that allows an unauthenticated attacker to execute arbitrary code on a Confluence Server or Data Center instance. Cloudflare WAF performed an emergency release for this vulnerability on September 9, 2022. When compared to the other CVEs discussed in this post, we have not observed a lot of exploits over the past year.
We recommend that all server admins keep their software updated when fixes become available. Cloudflare customers, including those on our free tier, can leverage new rules addressing CVEs and zero-day threats, updated weekly in the Managed Ruleset. High-risk CVEs may even prompt emergency releases. In addition, Enterprise customers have access to the WAF Attack Score: an AI-powered detection feature that supplements traditional signature-based rules, identifying unknown threats and bypass attempts. With the combined strength of rule-based and AI detection, Cloudflare offers robust defense against known and emerging threats.
Conclusions
Cloudflare’s data augments CISA’s vulnerability report: notably, we see exploit attempts against the top two vulnerabilities at volumes several orders of magnitude higher than for the remainder of the list. Organizations should prioritize their software patching efforts based on the list provided. It is, of course, important to note that all software should be patched, and a good WAF implementation will provide additional security and “buy time” for underlying systems to be secured against both existing and future vulnerabilities.
Many of our customers use Amazon Cognito user pools to add authentication, authorization, and user management capabilities to their web and mobile applications. You can enable the built-in advanced security in Amazon Cognito to detect and block the use of credentials that have been compromised elsewhere, and to detect unusual sign-in activity and then prompt users for additional verification or block sign-ins. Additionally, you can associate an AWS WAF web access control list (web ACL) with your user pool to allow or block requests to Amazon Cognito user pools, based on security rules.
In this post, we’ll show how you can use AWS WAF with Amazon Cognito user pools and provide a sample set of rate-based rules and advanced AWS WAF rule groups. We’ll also show you how to test and tune the rules to help protect your user pools from common threats.
Rate-based rules for Amazon Cognito user pool endpoints
The following are endpoints exposed publicly by an Amazon Cognito user pool that you can protect with AWS WAF:
Public API operations — These generate a request to Cognito API actions that are either unauthenticated or authenticated with a session string or access token, but not with AWS credentials.
A good way to protect these endpoints is to deploy rate-based AWS WAF rules. These rules will detect and block requests with high rates that could indicate an attempt to exceed your Amazon Cognito API request rate quotas and that could subsequently impact requests from legitimate users.
When you apply rate limits, it helps to group Amazon Cognito API actions into four action categories. You can set specific rate limits per action category, giving you traffic visibility for each category.
User Creation — This category includes operations that create new users in Cognito. Setting a rate limit for this category provides visibility for traffic of these operations and threats such as fake users being created in Cognito, which drives up your Monthly Active User (MAU) costs for Cognito.
Sign-in — This category includes operations to initiate a sign-in operation. Setting a rate limit for this category can provide visibility into the abuse of these operations. This could indicate high frequency, automated attempts to guess user credentials, sometimes referred to as credential stuffing.
Account Recovery — This category includes operations to recover accounts, including “forgot password” flows. Setting a rate limit for this category can provide visibility into the abuse of these operations. Malicious activity can include sending fake reset attempts, which might result in emails and SMS messages being sent to users.
Default — This is a catch-all rate limit that applies to an operation that is not in one of the prior categories. Setting a default rate limit can provide visibility and mitigation from request flooding attacks.
Table 1 below shows selected Hosted UI endpoint paths (the equivalent of individual API actions) and the recommended rate-based rule limit category for each.
Additionally, the rate-based rules we provide in this post include the following:
Two IP sets that represent allow lists for IPv4 and IPv6. You can add IPs that represent your trusted source IP addresses to these IP sets so that other AWS WAF rules don’t apply to requests that originate from these IP addresses.
Two IP sets that represent deny lists for IPv4 and IPv6. Add IPs to these IP sets that you want to block in all cases, regardless of the result of other rules.
An AWS managed IP reputation rule group: The AWS managed IP reputation list rule group contains rules that are based on Amazon internal threat intelligence, to identify IP addresses typically associated with bots or other threats. You can limit requests that match rules in this rule group to a specific rate limit.
Deploy rate-based rules
You can deploy the rate-based rules described in the previous section by using the AWS CloudFormation template that we provide here.
To deploy rate-based rules using the template
(Optional but recommended) If you want to enable AWS WAF logging and resources to analyze request rates, create an Amazon Simple Storage Service (Amazon S3) bucket in the same AWS Region as your Amazon Cognito user pool, with a bucket name starting with the prefix aws-waf-logs-. If you previously created an S3 bucket for AWS WAF logs, you can choose to reuse it, or you can create a new bucket to store AWS WAF logs for Amazon Cognito.
Choose the following Launch Stack button to launch a CloudFormation stack in your account.
Note: The stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution into other AWS Regions, download the solution’s CloudFormation template and deploy it to the selected Region.
This template creates the following resources in your AWS account:
A rule group for the rate-based rules, according to the limits shown in Tables 1 and 2.
Four IP sets for an allow list and deny list for IPv4 and IPv6 addresses.
A web ACL that includes the rule group that is created, IP set based rules, and the AWS managed IP reputation rule group.
(Optional) The template enables AWS WAF logging for the web ACL to an S3 bucket that you specify.
(Optional) The template creates resources to help you analyze AWS WAF logs in S3 to calculate peak request rates that you can use to set rate limits for the rate-based rules.
Set the template parameters as needed. The following table shows the default values for the parameters. We recommend that you deploy the template with the default values and with TestMode set to Yes so that all rules are set to Count. This allows all requests but emits Amazon CloudWatch metrics and AWS WAF log events for each rule that matches. You can then follow the guidance in the next section to analyze the logs and tune the rate limits to match the traffic patterns to your user pool. When you are satisfied with the unique rate limits for each parameter, you can update the stack and set TestMode to No to start blocking requests that exceed the rate limits.
The rate limits for AWS WAF rate-based rules are configured as the number of requests per 5-minute period per unique source IP. The value of the rate limit can be between 100 and 2,000,000,000 (2 billion).
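For reference, a single rate-based rule in such a rule group has roughly the following shape, shown here as a Python dict of the kind accepted by the boto3 wafv2 client (the rule name, priority, scope-down path, and limit below are illustrative, not the exact values used by the template):

```python
sign_in_rate_rule = {
    "Name": "SignInRateLimit",
    "Priority": 2,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 4000,                # requests per 5-minute window per source IP
            "AggregateKeyType": "IP",
            "ScopeDownStatement": {       # only count requests in the Sign-in category
                "ByteMatchStatement": {
                    "SearchString": b"/login",
                    "FieldToMatch": {"UriPath": {}},
                    "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
                    "PositionalConstraint": "STARTS_WITH",
                }
            },
        }
    },
    "Action": {"Count": {}},  # test mode; switch to {"Block": {}} once the limit is tuned
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "SignInRateLimit",
    },
}
```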
Table 3: Default values for template parameters

Request rate limits by action category
UserCreationRateLimit: Rate limit applied to User Creation actions. Default: 2000. Allowed values: 100–2,000,000,000.
SignInRateLimit: Rate limit applied to Sign-in actions. Default: 4000. Allowed values: 100–2,000,000,000.
AccountRecoveryRateLimit: Rate limit applied to Account Recovery actions. Default: 1000. Allowed values: 100–2,000,000,000.
IPReputationRateLimit: Rate limit applied to requests that match the AWS Managed IP reputation list. Default: 1000. Allowed values: 100–2,000,000,000.
DefaultRateLimit: Default rate limit applied to actions that are not in any of the prior categories. Default: 6000. Allowed values: 100–2,000,000,000.

Test mode
TestMode: Set to Yes to test rules by overriding rule actions to Count. Set to No to apply the default actions for rules after you’ve tested the impact of these rules. Default: Yes. Allowed values: Yes or No.

AWS WAF logging and rate analysis
EnableWAFLogsAndRateAnalysis: Set to Yes to enable logging for the AWS WAF web ACL to an S3 bucket and create resources for request rate analysis. Set to No to disable AWS WAF logging and skip creating resources for rate analysis. If No, the rest of the parameter values in this section are ignored. Default: Yes. Allowed values: Yes or No.
WAFLogsS3Bucket: The name of an existing S3 bucket where AWS WAF logs are delivered. The bucket name must start with aws-waf-logs- and can end with any suffix. Only used if EnableWAFLogsAndRateAnalysis is set to Yes. Default: None. Allowed values: the name of an existing S3 bucket that starts with the prefix aws-waf-logs-.
DatabaseName: The name of the AWS Glue database to create, which will contain the request rate analysis tables created by this template. (Important: The name cannot contain hyphens.) Only used if EnableWAFLogsAndRateAnalysis is set to Yes. Default: rate_analysis.
WorkgroupName: The name of the Amazon Athena workgroup to create for rate analysis. Only used if EnableWAFLogsAndRateAnalysis is set to Yes. Default: rate_analysis.
WAFLogsTableName: The name of the AWS Glue table for AWS WAF logs. Only used if EnableWAFLogsAndRateAnalysis is set to Yes. Default: waf_logs.
WAFLogsProjectionStartDate: The earliest date to analyze AWS WAF logs, in the format YYYY/MM/DD (example: 2023/02/28). Only used if EnableWAFLogsAndRateAnalysis is set to Yes. Default: None. Allowed values: set this to the current date, in the format YYYY/MM/DD.
Wait for the CloudFormation template to be created successfully.
Go to the AWS WAF console and choose the web ACL created by the template. It will have a name ending with CognitoWebACL.
Choose the Associated AWS resources tab, and then choose Add AWS resource.
For Resource type, choose Amazon Cognito user pool, and then select the Amazon Cognito user pools that you want to protect with this web ACL.
Choose Add.
Now that your user pool is being protected by the rate-based rules in the web ACL you created, you can proceed to tune the rate-based rule limits by analyzing AWS WAF logs.
Tune AWS WAF rate-based rule limits
As described in the previous section, the rate-based rules give you the ability to set separate rate limit values for each category of Amazon Cognito API actions.
Although the CloudFormation template has default starting values for these rate limits, it is important that you tune these values to match the traffic patterns for your user pool. To begin the tuning process, deploy the template with default values for all parameters, including Yes for TestMode. This overrides all rule actions to Count, allowing all requests but emitting CloudWatch metrics and AWS WAF log events for each rule that matches.
After you collect AWS WAF logs for a period of time (this period can vary depending on your traffic, from a couple of hours to a couple of days), you can analyze them, as shown in the next section, to get peak request rates to tune the rate limits to match observed traffic patterns for your user pool.
Query AWS WAF logs to calculate peak request rates by request type
You can calculate peak request rates by analyzing information that is present in AWS WAF logs. One way to analyze these is to send AWS WAF logs to S3 and to analyze the logs by using SQL queries in Amazon Athena. If you deploy the template in this post with default values, it creates the resources you need to analyze AWS WAF logs in S3 to calculate peak request rates by request type.
If you are instead ingesting AWS WAF logs into your security information and event management (SIEM) system or a different analytics environment, you can create equivalent queries by using the query language for your SIEM or analytics environment to get similar results.
To access and edit the queries built by the CloudFormation template for use
Open the Athena console and switch to the Athena workgroup that was created by the template (the default name is rate_analysis).
On the Saved queries tab, choose the query named Peak request rate per 5-minute period by source IP and request category. The following SQL query will be loaded into the edit panel.
-- Gets the top 5 source IPs sending the most requests in a 5-minute period per request category
-- NOTE: change the start and end timestamps to match the duration of interest
SELECT request_category, from_unixtime(time_bin*60*5) AS date_time, client_ip, request_count FROM (
SELECT *, row_number() OVER (PARTITION BY request_category ORDER BY request_count DESC, time_bin DESC) AS row_num FROM (
SELECT
CASE
WHEN ip_reputation_labels.name IN (
'awswaf:managed:aws:amazon-ip-list:AWSManagedIPReputationList',
'awswaf:managed:aws:amazon-ip-list:AWSManagedReconnaissanceList',
'awswaf:managed:aws:amazon-ip-list:AWSManagedIPDDoSList'
) THEN 'IPReputation'
WHEN target.value IN (
'AWSCognitoIdentityProviderService.InitiateAuth',
'AWSCognitoIdentityProviderService.RespondToAuthChallenge'
) THEN 'SignIn'
WHEN target.value IN (
'AWSCognitoIdentityProviderService.ResendConfirmationCode',
'AWSCognitoIdentityProviderService.SignUp',
'AWSCognitoIdentityProviderService.ConfirmSignUp'
) THEN 'UserCreation'
WHEN target.value IN (
'AWSCognitoIdentityProviderService.ForgotPassword',
'AWSCognitoIdentityProviderService.ConfirmForgotPassword'
) THEN 'AccountRecovery'
WHEN httprequest.uri IN (
'/login',
'/oauth2/authorize'
) THEN 'SignIn'
WHEN httprequest.uri IN (
'/signup',
'/confirmUser',
'/resendcode'
) THEN 'UserCreation'
WHEN httprequest.uri IN (
'/forgotPassword',
'/confirmForgotPassword'
) THEN 'AccountRecovery'
ELSE 'Default'
END AS request_category,
httprequest.clientip AS client_ip,
FLOOR("timestamp"/(1000*60*5)) AS time_bin,
COUNT(*) AS request_count
FROM waf_logs
LEFT OUTER JOIN UNNEST(FILTER(httprequest.headers, h -> h.name = 'x-amz-target')) AS t(target) ON TRUE
LEFT OUTER JOIN UNNEST(FILTER(labels, l -> l.name like 'awswaf:managed:aws:amazon-ip-list:%')) AS t(ip_reputation_labels) ON TRUE
WHERE
from_unixtime("timestamp"/1000) BETWEEN TIMESTAMP '2022-01-01 00:00:00' AND TIMESTAMP '2023-01-01 00:00:00'
GROUP BY 1, 2, 3
ORDER BY 1, 4 DESC
)
) WHERE row_num <= 5 ORDER BY request_category ASC, row_num ASC
Scroll down to Line 48 in the Query Editor and edit the timestamps to match the start and end time of the time window of interest.
Run the query to calculate the top 5 peak request rates per 5-minute period by source IP and by action category.
The results show the action category, source IP, time, and count of requests. You can use the request count to tune the rate limits for each action category.
The lowest rate limit you can set for AWS WAF rate-based rules is 100 requests per 5-minute period. If your query results show that the peak request count is less than 100, set the rate limit to 100 or higher.
After you have tuned the rate limits, you can apply the changes to your web ACL by updating the CloudFormation stack.
To update the CloudFormation stack
On the CloudFormation console, choose the stack you created earlier.
Choose Update. For Prepare template, choose Use current template, and then choose Next.
Update the values of the parameters with rate limits to match the tuned values from your analysis.
You can choose to enable blocking of requests by setting TestMode to No. This will set the action to Block for the rate-based rules in the web ACL and start blocking traffic that exceeds the rate limits you have chosen.
Choose Next and then Next again to update the stack.
Now the rate-based rules are updated with your tuned limits, and requests will be blocked if you set TestMode to No.
Protect endpoints with user interaction
Now that we’ve covered the bases with rate-based rules, we’ll show you some more advanced AWS WAF rules that further help protect your user pool. We’ll explore two sample scenarios in detail, and provide AWS WAF rules for each. You can use the rules provided as a guideline to build others that can help with similar use cases.
Rules to verify human activity
The first scenario is protecting endpoints where users have interaction with the page. This will be a browser-based interaction, and a human is expected to be behind the keyboard. This scenario applies to the Hosted UI endpoints such as /login, /signup, and /forgotPassword, where a CAPTCHA can be rendered on the user’s browser for the user to solve. Let’s take the login (sign-in) endpoint as an example, and imagine you want to make sure that only actual human users are attempting to sign in and you want to block bots that might try to guess passwords.
To illustrate how to protect this endpoint with AWS WAF, we’re sharing a sample rule, shown in Figure 1. In this rule, you can take input from prior rules like the Amazon IP reputation list or the Anonymous IP list (which are configured to Count requests and add labels) and combine that with a CAPTCHA action. The logic of the rule says that if the request matches the reputation rules (and has received the corresponding labels) and is going to the /login endpoint, then the AWS WAF action should be to respond with a CAPTCHA challenge. This will present a challenge that increases the confidence that a human is performing the action, and it also adds a custom label so you can efficiently identify and have metrics on how many requests were matched by this rule. The rule is provided in the CloudFormation template and is in JSON format, because it has advanced logic that cannot be displayed by the console. Learn more about labels and CAPTCHA actions in the AWS WAF documentation.
Figure 1: Login sample rule flow
Note that the rate-based rules you created in the previous section are evaluated before the advanced rules. The rate-based rules will block requests to the /login endpoint that exceed the rate limit you have configured, while this advanced rule will match requests that are below the rate limit but match the other conditions in the rule.
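As a rough illustration of the logic shown in Figure 1 (not the exact rule shipped in the template), the combination of a label match and a URI path match can be expressed in AWS WAF JSON along these lines; the rule name, priority, and custom label are assumptions, and the real rule may also match additional labels such as the Anonymous IP list:
{
  "Name": "LoginIPReputationCaptchaRule",
  "Priority": 10,
  "Statement": {
    "AndStatement": {
      "Statements": [
        {
          "LabelMatchStatement": {
            "Scope": "LABEL",
            "Key": "awswaf:managed:aws:amazon-ip-list:AWSManagedIPReputationList"
          }
        },
        {
          "ByteMatchStatement": {
            "SearchString": "/login",
            "FieldToMatch": { "UriPath": {} },
            "TextTransformations": [{ "Priority": 0, "Type": "NONE" }],
            "PositionalConstraint": "STARTS_WITH"
          }
        }
      ]
    }
  },
  "Action": { "Captcha": {} },
  "RuleLabels": [{ "Name": "custom:login-ip-reputation-captcha" }],
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "LoginIPReputationCaptchaRule"
  }
}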
Rules for specific activity
The second scenario explores activity on specific application clients within the user pool. You can spot this activity by monitoring the logs provided by AWS WAF, or other traffic logs like Application Load Balancer (ALB) logs. The application client information is provided in the call to the service.
In the Amazon Cognito user pool in this scenario, we have different application clients and they’re constrained by geography. For example, for one of the application clients, requests are expected to come from the United States at or below a certain rate. We can create a rule that combines the rate and geographical criteria to block requests that don’t meet the conditions defined.
The flow of this rule is shown in Figure 2. The logic of the rule will evaluate the application client information provided in the request and the geographic information identified by the service, and apply the selected rate limit. If blocked, the rule will provide a custom response code, HTTP 429 Too Many Requests, which can help the sender understand the reason for the block. For requests that you make with the Amazon Cognito API, you could also customize the response body of a request that receives a Block response. Adding a custom response gives the sender context so that they can adjust the rate or the information that is sent.
Figure 2: AppClientId sample rule flow
AWS WAF can detect geo location with Region accuracy and add specific labels for the location. These can then be used in other rule evaluations. This rule is also provided as a sample in the CloudFormation template.
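As an illustrative sketch of one way to express this logic (the sample rule in the template may differ), a rate-based rule can be scoped down to a specific application client and to countries other than the United States, and return a custom 429 response when it blocks; the client ID, limit, and names below are placeholders:
{
  "Name": "AppClientGeoRateRule",
  "Priority": 20,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 1000,
      "AggregateKeyType": "IP",
      "ScopeDownStatement": {
        "AndStatement": {
          "Statements": [
            {
              "ByteMatchStatement": {
                "SearchString": "example-app-client-id",
                "FieldToMatch": { "Body": { "OversizeHandling": "CONTINUE" } },
                "TextTransformations": [{ "Priority": 0, "Type": "NONE" }],
                "PositionalConstraint": "CONTAINS"
              }
            },
            {
              "NotStatement": {
                "Statement": { "GeoMatchStatement": { "CountryCodes": ["US"] } }
              }
            }
          ]
        }
      }
    }
  },
  "Action": {
    "Block": { "CustomResponse": { "ResponseCode": 429 } }
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "AppClientGeoRateRule"
  }
}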
Advanced protections
To build on the rules we’ve shared so far, you can consider using some of the other intelligent threat mitigation rules that are available as managed rules—namely, bot control for common or targeted bots. These rules offer advanced capabilities to detect bots in sensitive endpoints where automation or non-browser user agents are not expected or allowed. If you receive machine traffic to the endpoint, these rules will result in false positives that would need to be tuned. For more information, see Options for intelligent threat mitigation.
The sample rule flow in Figure 3 shows an example for our Hosted UI, which builds on the first rule we built for specific activity and adds signals coming from the Bot Control common bots managed rule, in this case the non-browser-user-agent label.
Figure 3: Login sample rule with advanced protections
Adding the bot detection label will also add accuracy to the evaluation, because AWS WAF will consider multiple different sources of information when analyzing the request. This can also block attacks that come from a small set of IPs or easily recognizable bots.
We’ve shared this rule in the CloudFormation template sample. The rule requires you to add AWS WAF Bot Control (ABC) before the custom rule evaluation. ABC has additional costs associated with it and should only be used for specific use cases. For more information on ABC and how to enable it, see this blog post.
After adding these protections, we have a complete set of rules for our Hosted UI–specific needs; consider that your traffic and needs might be different. Figure 4 shows you what the rule priority looks like. All rules except the last are included in the provided CloudFormation template. Managed rule evaluations need to have higher priority and be in Count mode; this way, a matching request can get labels that can be evaluated further down the priority list by using the custom rules that were created. For more information, see How labeling works.
Figure 4: Summary of the rules discussed in this post
Conclusion
In this post, we examined the different protections provided by the integration between AWS WAF and Amazon Cognito. This integration makes it simpler for you to view and monitor the activity in the different Amazon Cognito endpoints and APIs, while also adding rate-based rules and IP reputation evaluations. For more specific use cases and advanced protections, we provided sample custom rules that use labels, as well as an advanced rule that uses bot control for common bots. You can use these advanced rules as examples to create similar rules that apply to your use cases.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the re:Post with tag AWS WAF or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
In December 2022 we announced the general availability of the WAF Attack Score. The initial release was for our Enterprise customers, but we have always believed that this product should be available to more users. Today we’re announcing “WAF Attack Score Lite” and “Security Analytics” for our Business plan customers.
Looking back on “What is WAF Attack Score and Security Analytics?”
Vulnerabilities on the Internet appear almost on a daily basis. The CVE (common vulnerabilities and exposures) program has a list with over 197,000 records to track disclosed vulnerabilities.
That makes it really hard for web application owners to harden and update their systems regularly, especially when it comes to critical libraries, where exploitation can lead to information leaks and other serious damage. That’s why web application owners tend to use WAFs (Web Application Firewalls) to protect their online presence.
Most WAFs use signature-based detections, which are rules created based on specific attacks that we know about. The signature-based method is very fast, has a low rate of false positives (requests that are categorized as attacks when they are actually legitimate), and is very efficient against most of the attack categories we know. However, signatures sometimes have a blind spot when a new attack, often called a zero-day attack, appears. As soon as a new vulnerability is found, our security analysts take fast action to stop it in a matter of hours and update the WAF Managed Rules, yet we want to protect our customers during this time gap as well.
This is the main reason Cloudflare created a complementary feature to the WAF managed rules: a smart machine learning layer to help detect unknown attacks, and protect customers even during the time gap until rules are updated.
Early detection + Powerful mitigation = Safer Internet
The performance of any machine learning model depends heavily on the data it was trained on. Our machine learning uses a supervised model that was trained on hundreds of millions of requests generated by WAF Managed Rules; the data varies between clean and malicious, and some of it was blended with fuzzing techniques to enable catching similar patterns, as covered in our blog “Improving the accuracy of our machine learning WAF”. At the moment, there are three types of attacks our machine learning model is optimized to find: SQL Injection (SQLi), Cross Site Scripting (XSS), and a wide range of Remote Code Execution (RCE) attacks such as shell injection, PHP injection, Apache Struts type compromises, Apache log4j, and similar attacks that result in RCE.
We started with these categories based on Cloudflare’s Application Security Report: they represent more than 24% of the layer 7 attacks mitigated by our WAF over the last year, and are therefore among the most frequently attempted exploits.
In the full Enterprise WAF Attack Score version we offer more granularity on the attack categories and we provide scores for each class where they can be configured freely per domain.
WAF Attack Score Lite Features for Business Plan
WAF Attack Score Lite and the Security Analytics view offer three main functions:
1- Attack detection: This happens through inspecting every incoming HTTP request, bucketing or classifying the requests into 4 types: Attacks, Likely Attacks, Likely Clean and Clean. At the moment there are three types of attacks our machine learning model is optimized to find: SQL Injection (SQLi), Cross Site Scripting (XSS), and a wide range of Remote Code Execution (RCE) attacks.
2- Attack mitigation: The ability to create WAF Custom Rules or WAF Rate Limiting Rules to mitigate requests. We’re exposing a new field cf.waf.score.class that has preset values: attack, likely_attack, likely_clean and clean. Customers can use this field in rule expressions and apply the needed actions (see the example sketch after this list).
3- Visibility over your entire traffic: Security Analytics is a new dashboard currently in beta. It provides a comprehensive view across all your HTTP traffic, which displays all requests whether they match rules or not. Security Analytics is a great tool for investigating false negatives and hardening your security configurations. Security Events is still available in (Security > Events) and Security Analytics is available in a separate tab (Security > Analytics).
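For example, a WAF Custom Rule that blocks requests classified as attacks could use an expression similar to the following (a minimal sketch; choose whichever action fits your needs):
if: (cf.waf.score.class eq "attack")
then: BLOCK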
Deployment and configuration
In order to enable WAF Attack Score Lite and Security Analytics, you don’t need to take any action. The HTTP machine learning inspection rollout will start today, and Security Analytics will appear automatically to all Business plan customers by the time the rollout is completed in the upcoming weeks.
It’s worth mentioning that having the detection on and viewing the attack analysis in Security Analytics does not mean you’re blocking traffic. It only offers insights and provides the freedom to create rules and mitigate the desired requests. To actually block or challenge unwanted traffic, you still need to create a rule.
A common use case
Consider an attacker executing an attack using automated web requests to manipulate or disrupt web applications. One of the best ways to identify this type of traffic and mitigate these requests is by combining bot score with WAF Attack Score.
1- Go to the Security Analytics dashboard under Security > Analytics. On the right-hand side the Attack Analysis indicates the attack class. In this case, I can select “Attack” to apply a single filter, or use the quick filters under Insights to propagate multiple filters at once. In addition to the attack class, I can also select the Bot “Automated” filter.
2- After filtering, Security Analytics provides the capability of scrolling down to see the logs and validate the results:
3- Once the selected requests are confirmed, I can select the Create WAF Custom Rules option which will direct me to the Security Events with the pre-assigned filters to deploy a rule. In this case, I want to challenge the requests matched by the rule:
And voila! You have a new rule that challenges traffic matching any automated attack variation.
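As a sketch, the resulting rule could look similar to the following, assuming access to the bot score field (cf.bot_management.score) and using a managed challenge action; the exact filters generated from the dashboard may differ:
if: (cf.waf.score.class eq "attack" and cf.bot_management.score lt 30)
then: MANAGED_CHALLENGE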
Next steps
We have been working hard to provide maximum security and visibility for all our customers. This is only one step on this road! We will keep adding more product-focused analytics, and providing additional security against unknown attacks. Try it out, create a rule, and don’t hesitate to contact our sales team if you need the full version of WAF Attack Score.
One year ago we published our first Application Security Report. For Security Week 2023, we are providing updated insights and trends around mitigated traffic, bot and API traffic, and account takeover attacks.
Cloudflare has grown significantly over the last year. In February 2023, Netcraft noted that Cloudflare had become the most commonly used web server vendor within the top million sites at the start of 2023, and continues to grow, reaching a 21.71% market share, up from 19.4% in February 2022.
This continued growth now equates to Cloudflare handling over 45 million HTTP requests/second on average (up from 32 million last year), with more than 61 million HTTP requests/second at peak. DNS queries handled by the network are also growing and stand at approximately 24.6 million queries/second. All of this traffic flow gives us an unprecedented view into Internet trends.
Before we dive in, we need to define our terms.
Definitions
Throughout this report, we will refer to the following terms:
Mitigated traffic: any eyeball HTTP* request that had a “terminating” action applied to it by the Cloudflare platform. These include the following actions: BLOCK, CHALLENGE, JS_CHALLENGE and MANAGED_CHALLENGE. This does not include requests that had the following actions applied: LOG, SKIP, ALLOW. In contrast to last year, we now exclude requests that had CONNECTION_CLOSE and FORCE_CONNECTION_CLOSE actions applied by our DDoS mitigation system, as these technically only slow down connection initiation. They also accounted for a relatively small percentage of requests. Additionally, we improved our calculation regarding the CHALLENGE type actions to ensure that only unsolved challenges are counted as mitigated. A detailed description of actions can be found in our developer documentation.
Bot traffic/automated traffic: any HTTP* request identified by Cloudflare’s Bot Management system as being generated by a bot. This includes requests with a bot score between 1 and 29 inclusive. This has not changed from last year’s report.
API traffic: any HTTP* request with a response content type of XML or JSON. Where the response content type is not available, such as for mitigated requests, the equivalent Accept content type (specified by the user agent) is used instead. In this latter case, API traffic won’t be fully accounted for, but it still provides a good representation for the purposes of gaining insights.
Unless otherwise stated, the time frame evaluated in this post is the 12 month period from March 2022 through February 2023 inclusive.
Finally, please note that the data is calculated based only on traffic observed across the Cloudflare network and does not necessarily represent overall HTTP traffic patterns across the Internet.
*When referring to HTTP traffic we mean both HTTP and HTTPS.
Global traffic insights
6% of daily HTTP requests are mitigated on average
In looking at all HTTP requests proxied by the Cloudflare network, we find that the share of requests that are mitigated has dropped to 6%, down two percentage points compared to last year. Looking at 2023 to date, we see that mitigated request share has fallen even further, to between 4-5%. Large spikes visible in the chart below, such as those seen in June and October, often correlate with large DDoS attacks mitigated by Cloudflare. It is interesting to note that although the percentage of mitigated traffic has decreased over time, the total mitigated request volume has been relatively stable as shown in the second chart below, indicating an increase in overall clean traffic globally rather than an absolute decrease in malicious traffic.
81% of mitigated HTTP requests were outright BLOCKed, with mitigations for the remaining set split across the various CHALLENGE type actions.
DDoS mitigation accounts for more than 50% of all mitigated traffic
Cloudflare provides various security features that customers can configure to keep their applications safe. Unsurprisingly, DDoS mitigation is still the largest contributor to mitigated layer 7 (application layer) HTTP requests. Just last month (February 2023), we reported the largest known mitigated DDoS attack by HTTP requests/second volume (This particular attack is not visible in the graphs above because they are aggregated at a daily level, and the attack only lasted for ~5 minutes).
Compared to last year, however, mitigation by the Cloudflare WAF has grown significantly, and now accounts for nearly 41% of mitigated requests. This can be partially attributed to advances in our WAF technology that enables it to detect and block a larger range of attacks.
Tabular format for reference:
DDoS Mitigation: 52%
WAF: 41%
IP reputation: 4%
Access Rules: 2%
Other: 1%
Please note that in the table above, in contrast to last year, we are now grouping our products to match our marketing materials and the groupings used in the 2022 Radar Year in Review. This mostly affects our WAF product that comprises the combination of WAF Custom Rules, WAF Rate Limiting Rules, and WAF Managed Rules. In last year’s report, these three features accounted for an aggregate 31% of mitigations.
To understand the growth in WAF mitigated requests over time, we can look one level deeper where it becomes clear that Cloudflare customers are increasingly relying on WAF Custom Rules (historically referred to as Firewall Rules) to mitigate malicious traffic or implement business logic blocks. Observe how the orange line (firewallrules) in the chart below shows a gradual increase over time while the blue line (l7ddos) clearly trends lower.
HTTP Anomaly is the most frequent layer 7 attack vector mitigated by the WAF
Contributing 30% of WAF Managed Rules mitigated traffic overall in March 2023, HTTP Anomaly’s share has decreased by nearly 25 percentage points as compared to the same time last year. Examples of HTTP anomalies include malformed method names, null byte characters in headers, non-standard ports or content length of zero with a POST request. This can be attributed to botnets matching HTTP anomaly signatures slowly changing their traffic patterns.
Removing the HTTP anomaly line from the graph, we can see that in early 2023, the attack vector distribution looks a lot more balanced.
Tabular format for reference (top 10 categories, percentage over the last 12 months):
HTTP Anomaly: 30%
Directory Traversal: 16%
SQLi: 14%
File Inclusion: 12%
Software Specific: 10%
XSS: 9%
Broken Authentication: 3%
Command Injection: 3%
Common Attack: 1%
CVE: 1%
Of particular note is the orange line spike seen towards the end of February 2023 (CVE category). The spike relates to a sudden increase in matches against two of our WAF Managed Rules.
These two rules are also tagged against CVE-2018-14774, indicating that even relatively old and known vulnerabilities are still often targeted in an effort to exploit potentially unpatched software.
Bot traffic insights
Cloudflare’s Bot Management solution has seen significant investment over the last twelve months. New features such as configurable heuristics, hardened JavaScript detections, automatic machine learning model updates, and Turnstile, Cloudflare’s free CAPTCHA replacement, make our classification of human vs. bot traffic improve daily.
Our confidence in the classification output is very high. If we plot the bot scores across the traffic from the last week of February 2023, we find a very clear distribution, with most requests classified as either definitely bot (less than 30) or definitely human (greater than 80), and the bulk of requests actually scoring less than 2 or greater than 95.
30% of HTTP traffic is automated
Over the last week of February 2023, 30% of Cloudflare HTTP traffic was classified as automated, equivalent to about 13 million HTTP requests/second on the Cloudflare network. This is 8 percentage points less than at the same time last year.
Looking at bot traffic only, we find that only 8% is generated by verified bots, comprising 2% of total traffic. Cloudflare maintains a list of known good (verified) bots to allow customers to easily distinguish between well-behaved bot providers like Google and Facebook and potentially lesser known or unwanted bots. There are currently 171 bots in the list.
16% of non-verified bot HTTP traffic is mitigated
Non-verified bot traffic often includes vulnerability scanners that are constantly looking for exploits on the web, and as a result, nearly one-sixth of this traffic is mitigated because some customers prefer to restrict the insights such tools can potentially gain.
Although verified bots like googlebot and bingbot are generally seen as beneficial and most customers want to allow them, we also see a small percentage (1.5%) of verified bot traffic being mitigated. This is because some site administrators don’t want portions of their site to be crawled, and customers often rely on WAF Custom Rules to enforce this business logic.
The most common action used by customers is to BLOCK these requests (13%), although we do have some customers configuring CHALLENGE actions (3%) to ensure any human false positives can still complete the request if necessary.
On a similar note, it is also interesting that nearly 80% of all mitigated traffic is classified as a bot, as illustrated in the figure below. Some may note that 20% of mitigated traffic being classified as human is still extremely high, but most mitigations of human traffic are generated by WAF Custom Rules, and are frequently due to customers implementing country-level or other related legal blocks on their applications. This is common, for example, in the context of US-based companies blocking access to European users for GDPR compliance reasons.
API traffic insights
55% of dynamic (non cacheable) traffic is API related
Just like our Bot Management solution, we are also investing heavily in tools to protect API endpoints. This is because a lot of HTTP traffic is API related. In fact, if you count only HTTP requests that reach the origin and are not cacheable, up to 55% of traffic is API related, as per the definition stated earlier. This is the same methodology used in last year’s report, and the 55% figure remains unchanged year-over-year.
If we look at cached HTTP requests only (those with a cache status of HIT, UPDATING, REVALIDATED and EXPIRED) we find that, maybe surprisingly, nearly 7% is API related. Modern API endpoint implementations and proxy systems, including our own API Gateway/caching feature set, in fact allow for very flexible cache logic: caching on custom keys and quick cache revalidation (as often as every second) let developers reduce the load on back-end endpoints.
If we include cacheable assets and other requests, such as redirects, in the total count, the number goes down, but is still 25% of traffic. In the graph below we provide both perspectives on API traffic:
Yellow line: % of API traffic against all HTTP requests. This will include redirects, cached assets and all other HTTP requests in the total count;
Blue line: % of API traffic against dynamic traffic returning HTTP 200 OK response code only;
65% of global API traffic is generated by browsers
A growing number of web applications nowadays are built “API first”. This means that the initial HTML page load only provides the skeleton layout, and most dynamic components and data are loaded via separate API calls (for example, via AJAX). This is the case for Cloudflare’s own dashboard. This growing implementation paradigm is visible when analyzing the bot scores for API traffic. We can see in the figure below that a large amount of API traffic is generated by user-driven browsers classified as “human” by our system, with nearly two-thirds of it clustered at the high end of the “human” range.
Calculating mitigated API traffic is challenging, as we don’t forward the request to origin servers, and therefore cannot rely on the response content type. Applying the same calculation that was used last year, a little more than 2% of API traffic is mitigated, down from 10.2% last year.
HTTP Anomaly surpasses SQLi as most common attack vector on API endpoints
Compared to last year, HTTP anomalies now surpass SQLi as the most popular attack vector attempted against API endpoints (note the blue line being higher at the start of the graph just when last year’s report was published). Attack vectors on API traffic are not consistent throughout the year and show more variation as compared to global HTTP traffic. For example, note the spike in file inclusion attack attempts in early 2023.
Exploring account takeover attacks
Since March 2021, Cloudflare has provided a leaked credential check feature as part of its WAF. This allows customers to be notified (via an HTTP request header) whenever an authentication request is detected with a username/password pair that is known to be leaked. This tends to be an extremely effective signal at detecting botnets performing account takeover brute force attacks.
Customers also use this signal, on valid username/password pair login attempts, to issue two factor authentication, password reset, or in some cases, increased logging in the event the user is not the legitimate owner of the credentials.
Brute force account takeover attacks are increasing
If we look at the trend of matched requests over the past 12 months, an increase is noticeable starting in the latter half of 2022, indicating growing fraudulent activity against login endpoints. During large brute force attacks we have observed matches against HTTP requests with leaked credentials at a rate higher than 12k per minute.
Our leaked credential check feature has rules matching authentication requests for the following systems:
Drupal
Ghost
Joomla
Magento
Plone
WordPress
Microsoft Exchange
Generic rules matching common authentication endpoint formats
This allows us to compare activity from malicious actors, normally in the form of botnets, attempting to “break into” potentially compromised accounts.
Microsoft Exchange is attacked more than WordPress
Mostly due to its popularity, you might expect WordPress to be the application most at risk and/or observing most brute force account takeover traffic. However, looking at rule matches from the supported systems listed above, we find that after our generic signatures, the Microsoft Exchange signature is the most frequent match.
Applications experiencing brute force attacks tend to be high value assets, and the fact that Exchange accounts are the most likely target according to our data reflects this trend.
If we look at leaked credential match traffic by source country, the United States leads by a fair margin. Potentially notable is the absence of China in top contenders given network size. The only exception is Ukraine leading during the first half of 2022 towards the start of the war — the yellow line seen in the figure below.
Looking forward
Given the amount of web traffic carried by Cloudflare, we observe a broad spectrum of attacks. From HTTP anomalies, SQL injection attacks, and cross-site scripting (XSS) to account takeover attempts and malicious bots, the threat landscape is constantly changing. As such, it is critical that any business operating online is investing in visibility, detection, and mitigation technologies so that they can ensure their applications, and more importantly, their end user’s data, remains safe.
We hope that you found the findings in this report interesting and that, at the very least, they gave you an appreciation of the state of application security on the Internet. There are a lot of bad actors online, and there is no indication that Internet security is getting easier.
We are already planning an update to this report including additional data and insights across our product portfolio. Keep an eye on Cloudflare Radar for more frequent application security reports and insights.
AWS WAF is a web application firewall service that helps you protect your applications from common exploits that could affect your application’s availability and your security posture. One of the most useful ways to detect and respond to malicious web activity is to collect and analyze AWS WAF logs. You can perform this task conveniently by sending your AWS WAF logs to Amazon CloudWatch Logs and visualizing them through an Amazon CloudWatch dashboard.
This blog post builds on the concepts introduced in the blog post Analyzing AWS WAF Logs in Amazon CloudWatch Logs. There we introduced how to natively set up AWS WAF logging to Amazon CloudWatch logs, and discussed the basic options that are available for visualizing and analyzing the data provided in the logs.
The only AWS services that you need to turn on for this solution are Amazon CloudWatch and AWS WAF. The solution assumes that you’ve previously set up AWS WAF log delivery to Amazon CloudWatch Logs. If you have not done so, follow the instructions for AWS WAF logging destinations – CloudWatch Logs.
You will need to provide the following parameters for the CloudFormation template:
CloudWatch log group name for the AWS WAF logs
The AWS Region for the logs
The name of the AWS WAF web access control list (web ACL)
Solution overview
The architecture of the solution is outlined in Figure 1. The solution takes advantage of the native integration available between AWS WAF and CloudWatch, which simplifies the setup and management of this solution.
Figure 1: Solution architecture
In the solution, the logs are sent to CloudWatch (when you enable log delivery). From there, they’re ready to be consumed by all the different service options that CloudWatch offers, including the ones that we’ll use in this solution: CloudWatch Logs Insights and Contributor Insights.
Deploy the solution
Choose the following Launch stack button to launch the CloudFormation stack in your account.
You’ll be redirected to the CloudFormation service in the AWS US East (N. Virginia) Region, which is the default Region to deploy this solution, although this can vary depending on where your web ACL is located. You can change the Region as preferred. The template will spin up multiple cloud resources, such as the following:
CloudWatch Logs Insights queries
CloudWatch Contributor Insights visuals
CloudWatch dashboard
The solution is quickly deployed to your account and is ready to use in less than 30 minutes. You can use the solution when the status of the stack changes to CREATE_COMPLETE.
As a measure to control costs, you can also choose whether to create the Contributor Insights rules and enable them by default. For more information on costs, see the Cost considerations section later in this post.
Explore and validate the dashboard
When the CloudFormation stack is complete, you can choose the Outputs tab in the CloudFormation console and then choose the dashboard link. This will take you to the CloudWatch service in the AWS Management Console. The dashboard time range presents information for the last hour of activity by default, and can go up to one week, but keep in mind that Contributor Insights has a maximum time range of 24 hours. You can also select a different dashboard refresh interval from 10 seconds up to 15 minutes.
The dashboard provides the following information from CloudWatch.
WAF_top_terminating_rules: This rule shows the top rules where the requests are being terminated by AWS WAF. This can help you understand the main cause of blocked requests.
WAF_top_ips: This rule shows the top source IPs for requests. This can help you understand if the traffic and activity that you see is spread across many IPs or concentrated in a small group of IPs.
WAF_top_countries: This rule shows the main source countries for the IPs in the requests. This can help you visualize where the traffic is originating.
WAF_top_user_agents: This rule shows the main user agents that are being used to generate the requests. This will help you isolate problematic devices or identify potential false positives.
WAF_top_uri: This rule shows the main URIs in the requests that are being evaluated. This can help you identify if one specific path is the target of activity.
WAF_top_http: This rule shows the HTTP methods used for the requests examined by AWS WAF. This can help you understand the pattern of behavior of the traffic.
WAF_top_referrer_hosts: This rule shows the main referrer from which requests are being sent. This can help you identify incorrect or suspicious origins of requests based on the known application flow.
WAF_top_rate_rules: This rule shows the main rate rules being applied to traffic. It helps understand volumetric activity identified by AWS WAF.
WAF_top_labels: This rule shows the top labels found in logs. This can help you visualize the main rules that are matching on the requests evaluated by AWS WAF.
The dashboard also provides the following information from the default CloudWatch metrics sent by AWS WAF.
AllowedvsBlockedRequests: This metric shows the number of all blocked and allowed requests. This can help you understand the number of requests that AWS WAF is actively blocking.
Bot Requests vs non-Bot requests: This visual shows the number of requests identified as bots versus non-bots (if you’re using AWS WAF Bot Control).
All Requests: This metric shows the number of all requests, separated by bot and non-bot origin. This can help you understand all requests that AWS WAF is evaluating.
CountedRequests: This metric shows the number of all counted requests. This can help you understand the requests that are matching a rule but not being blocked, and aid the decision of a configuration change during the testing phase.
CaptchaRequests: This metric shows requests that go through the CAPTCHA rule.
Figure 2 shows an example of how the CloudWatch dashboard displays the data within this solution. You can rearrange and customize the elements within the dashboard as needed.
You can review each of the queries and rules deployed with this solution. You can also customize these baseline queries and rules to provide more detailed information or to add custom queries and rules to the solution code. For more information on how to build queries and use CloudWatch Logs and Contributor Insights, see the CloudWatch documentation.
Use the dashboard for monitoring
After you’ve set up the dashboard, you can monitor the activity of the sites that are protected by AWS WAF. If suspicious activity is reported, you can use the visuals to understand the traffic in more detail, and drive incident response actions as needed.
Let’s consider an example of how to use your new dashboard and its data to drive security operations decisions. Suppose that you have a website that sells custom clothing at a bargain price. It has a sign-up link to receive offers, and you’re getting reports of unusual activity from the application team. By looking at the metrics for the web ACL that protects the site, you can see the main country for source traffic and the contributing URIs, as shown in Figure 3. You can also see that most of the activity is being detected by rules that you have in place, so you can set the rules to block traffic, or if they are already blocking, you can just monitor the activity.
You can use the same visuals to decide whether an AWS WAF rule with high activity can be changed to autoblock suspicious web traffic without affecting valid customer traffic. By looking at the top terminating rules and cross-referencing information, such as source IPs, user agents, top URIs, and other request identifiers, you can understand the traffic pattern and activity of different applications and endpoints. From here, you can investigate further by using specific queries with CloudWatch Logs Insights.
Operational and security management with CloudWatch Logs Insights
You can use CloudWatch Logs Insights to interactively search and analyze log data in Amazon CloudWatch Logs using advanced queries to effectively investigate operational issues and security incidents.
Examine a bot reported as a false positive
You can use CloudWatch Logs Insights to identify requests that have specific labels to understand where the traffic is originating from based on source IP address and other essential event details. A simple example is investigating requests flagged as potential false positives.
Imagine that you have a reported false positive request that was flagged as a non-browser by AWS WAF Bot Control. You can run the non-browser user agent query that was created by the provided template on CloudWatch Logs Insights, as shown in the following example, and then verify the source IPs for the top hits for this rule group. Or you can look for a specific request that has been flagged as a false positive, in order to review the details and make adjustments as needed.
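The query text itself is not shown here; the following is a sketch of what such a query might look like, assuming the Bot Control non-browser signal label name and a simple substring match on the log message:
filter @message like "awswaf:managed:aws:bot-control:signal:non_browser_user_agent"
| stats count(*) as requestCount by httpRequest.clientIp, httpRequest.country
| sort requestCount desc
| limit 10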
The non-browser user agent query also allows you to confirm whether this request has other rule hits that were in count mode and were non-terminating; you can do this by examining the labels. If there are multiple rules matching the requests, that can be an indicator of suspicious activity.
If you have a CAPTCHA challenge configured on the endpoint, you can also look at CAPTCHA responses. The CaptchaToken query definition provided in this solution uses a variation of the preceding format, and can display the main IPs from which bad tokens are being sent. An example query is shown following, along with the query results in Figure 4. If you have signals from non-browser user agents and CAPTCHA tokens missing, then that is a strong indicator of suspicious activity.
fields @timestamp, httpRequest.clientIp
| filter captchaResponse.failureReason = "TOKEN_MISSING"
| stats count(*) as requestCount by httpRequest.clientIp, httpRequest.country
| sort requestCount desc
| limit 10
Figure 4: Main IP addresses and number of counts for CAPTCHA responses
This information can provide an indication of the main source of activity. You can then use other visuals, like top user agents or top referrers, to provide more context to the information and inform further actions, such as adding new rules to the AWS WAF configuration.
You can adapt the queries provided in the sample solution to other use cases by using the fields provided in the left-hand pane of CloudWatch Logs Insights.
Cost considerations
Configuring AWS WAF to send logs to Amazon CloudWatch logs doesn’t have an additional cost. The cost incurred is for the use of the CloudWatch features and services, such as log storage and retention, Contributor Insights rules enabled, Logs Insights queries run, matched log events, and CloudWatch dashboards. For detailed information on the pricing of these features, see the CloudWatch Logs pricing information. You can also get an estimate of potential costs by using the AWS pricing calculator for CloudWatch.
One way to help offset the cost of CloudWatch features and services is to restrict the use of the dashboard and enforce a log retention policy for AWS WAF that makes it cost effective. If you use the queries and monitoring only as-needed, this can also help reduce costs. By limiting the running of queries and the matched log events for the Contributor Insights rules, you can enable the rules only when you need them. AWS WAF also provides the option to filter the logs that are sent when logging is enabled. For more information, see AWS WAF log filtering.
Conclusion
In this post, you learned how to use a pre-built CloudWatch dashboard to monitor AWS WAF activity by using metrics and Contributor Insights rules. The dashboard can help you identify traffic patterns and activity, and you can use the sample Logs Insights queries to explore the log information in more detail and examine false positives and suspicious activity, for rule tuning.
For more information on AWS WAF and the features mentioned in this post, see the AWS WAF documentation.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on AWS WAF re:Post.
Want more AWS Security news? Follow us on Twitter.
In this blog post, you’ll learn how you can use a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) with other AWS WAF controls as part of a layered approach to provide comprehensive protection against bot traffic. We’ll describe a workflow that tracks the number of incoming requests to a site’s store page. The workflow then limits those requests if they exceed a certain threshold. Requests from IP addresses that exceed the threshold will be presented a CAPTCHA challenge to prove that the requests are being made by a human.
Amazon Web Services (AWS) offers many tools and recommendations that companies can use as they face challenges with bot traffic on their websites. Web applications can be compromised through a variety of vectors, including cross-site scripting, SQL injection, path traversal, local file inclusion, and distributed denial-of-service (DDoS) attacks. AWS WAF offers managed rules that are designed to provide protection against common application vulnerabilities or other unwanted web traffic, without requiring you to write your own rules.
There are some web attacks like web scraping, credential stuffing, and layer 7 DDoS attempts conducted by bots (as well as by humans) that target sensitive areas of your website, such as your store page. A CAPTCHA mitigates undesirable traffic by requiring the visitor to complete challenges before they are allowed to access protected resources. You can implement CAPTCHA to help prevent unwanted activities. Last year, AWS introduced AWS WAF CAPTCHA, which allows customers to set up AWS WAF rules that require CAPTCHA challenges to be completed for common targets such as forms (for example, search forms).
Scenario
Consider an attack where the unauthorized user is attempting to overwhelm a site’s store page by repeatedly sending search requests for different items.
Assume that traffic visits a website that is hosted through Amazon CloudFront and attempts the above behavior on the /store URL. In this scenario, there is a rate-based rule in place that will track the number of requests coming in from each IP. This rate-based rule tracks the rate of requests for each originating IP address and invokes the rule action on IPs with rates that go over the limit. With CAPTCHA implemented as the rule action, excessive attempts to search within a 5-minute window will result in a CAPTCHA challenge being presented to the user. This workflow is shown in Figure 1.
Figure 1: User visits a store page and is evaluated by a rate-based rule
When a user solves a CAPTCHA challenge, AWS automatically generates and encrypts a token and sends it to the client as a cookie. The client requests aren’t challenged again until the token has expired. AWS WAF calculates token expiration by using the immunity time configuration. You can configure the immunity time in a web access control list (web ACL) CAPTCHA configuration and in the configuration for a rule’s action setting. When a user provides an incorrect answer to a CAPTCHA challenge, the challenge informs the user and loads a new puzzle. When the user solves the challenge, the challenge automatically submits the original web request, updated with the CAPTCHA token from the successful puzzle completion.
Walkthrough
This workflow will require an AWS WAF rule within a new or existing rule group or web ACL. The rule will define how web requests are inspected and the action to take.
To create an AWS WAF rate-based rule
Open the AWS WAF console and in the left navigation pane, choose Web ACLs.
Choose an existing web ACL, or choose Create web ACL at the top right to create a new web ACL.
Under Rules, choose Add rules, and then in the drop-down list, choose Add my own rules and rule groups.
For Rule type, choose Rule builder.
In the Rule builder section, for Name, enter your rule name. For Type, choose Rate-based rule.
In the Request rate details section, enter your rate limit (for example, 100). For IP address to use for rate limiting, choose Source IP address, and for Criteria to count requests toward rate limit, choose Only consider requests that match criteria in a rule statement.
For Count only the requests that match the following statement, choose Matches the statement from the drop-down list.
In the Statement section, for Inspect, choose URI path. For Match type, choose Contains string.
For String to match, enter the URI path of your web page (for example, /store).
In the Action section, choose CAPTCHA.
(Optional) For Immunity time, choose Set a custom immunity time for this rule, or keep the default value (300 seconds).
To finish, choose Add rule, and then choose Save to add the rule to your web ACL.
After you add the rule, go to the Rules tab of your web ACL and navigate to your rule. Confirm that the output resembles what is shown in Figure 2. You should have a rate-based rule with a scope-down statement that matches the store URI path you entered earlier, and the action should be set to CAPTCHA.
The following is the JSON for the CAPTCHA rule that you just created. You can use this to validate your configuration. You can also use this JSON in the rule builder while creating the rule.
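The JSON itself is not reproduced here, so the following is a representative sketch based on the settings described in the preceding steps; the rule name and metric name are illustrative:
{
  "Name": "StoreCaptchaRateRule",
  "Priority": 0,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 100,
      "AggregateKeyType": "IP",
      "ScopeDownStatement": {
        "ByteMatchStatement": {
          "SearchString": "/store",
          "FieldToMatch": { "UriPath": {} },
          "TextTransformations": [{ "Priority": 0, "Type": "NONE" }],
          "PositionalConstraint": "CONTAINS"
        }
      }
    }
  },
  "Action": { "Captcha": {} },
  "CaptchaConfig": {
    "ImmunityTimeProperty": { "ImmunityTime": 300 }
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "StoreCaptchaRateRule"
  }
}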
After you complete this configuration, the rule will be invoked when an IP address unsuccessfully attempts to search the store at a rate that exceeds the threshold. This user will be presented with a CAPTCHA challenge, as shown in Figure 3. If the user is successful, they will be routed back to the store page. Otherwise, they will be served a new puzzle until it is solved.
Figure 3: CAPTCHA challenge presented to a request that exceeded the threshold
Implementing rate-based rules and CAPTCHA also allows you to track IP addresses, limit the number of invalid search attempts, and use the specific IP information available to you within sampled requests and AWS WAF logs to help prevent that traffic from affecting your resources. Additionally, you have visibility into IP addresses blocked by rate-based rules so that you can later add these addresses to a block list or create custom logic as needed to mitigate false positives.
Conclusion
In this blog post, you learned how to configure and deploy a CAPTCHA challenge with AWS WAF that checks for web requests that exceed a certain rate threshold and requires the client sending such requests to solve a challenge. Please note the additional charge for enabling CAPTCHA on your web ACL (pricing can be found here). Although CAPTCHA challenges are simple for humans to complete, they should be harder for common bots to complete with any meaningful rate of success. You can use a CAPTCHA challenge when a block action would stop too many legitimate requests, but letting all traffic through would result in unacceptably high levels of unwanted requests, such as from bots.
If you have feedback about this blog post, submit comments in the Comments section below. You can also start a new thread on AWS WAF re:Post to get answers from the community.
Want more AWS Security news? Follow us on Twitter.
Let’s assume you manage a job advert site. On a daily basis job-seekers will be uploading their CVs, cover letters and other supplementary documents to your servers. What if someone tried to upload malware instead?
Today we’re making your security team job easier by providing a file content scanning engine integrated with our Web Application Firewall (WAF), so that malicious files being uploaded by end users get blocked before they reach application servers.
Enter WAF Content Scanning.
If you are an enterprise customer, reach out to your account team to get access.
Making content scanning easy
At Cloudflare, we pride ourselves on making our products very easy to use. WAF Content Scanning was built with that goal in mind. The main requirement to use the Cloudflare WAF is that application traffic is proxied via the Cloudflare network. Once that is done, turning on Content Scanning requires a single API call.
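As an illustrative sketch only (the exact endpoint path is an assumption; check the developer documentation for the current API call), enabling the feature could look like this:
# Enable WAF Content Scanning for a zone (endpoint path is an assumption)
curl -X POST "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/content-upload-scan/enable" \
  -H "Authorization: Bearer <API_TOKEN>"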
Once on, the WAF will automatically detect any content being uploaded, and when found, scan it and provide the results for you to use when writing WAF Custom Rules or reviewing security analytics dashboards.
The entire process runs inline with your HTTP traffic and requires no change to your application.
As of today, we scan files up to 1 MB. You can easily block files that exceed this size or perform other actions such as log the upload.
To block a malicious file, you could write a simple WAF Custom Rule like the following:
if: (cf.waf.content_scan.has_malicious_obj)
then: BLOCK
In the dashboard the rule would look like this:
Many other use cases can be achieved by leveraging the metadata exposed by the WAF Content Scanning engine. For example, let’s say you only wanted to allow PDF files to be uploaded on a given endpoint. You would achieve this by deploying the following WAF Custom Rule:
if: any(cf.waf.content_scan.obj_types[*] != "application/pdf") and http.request.uri.path eq "/upload"
then: BLOCK
For any content being uploaded to the /upload endpoint, this rule blocks the HTTP request if at least one file is not a PDF.
More generally, let’s assume your application does not expect content to be uploaded at all. In this case, you can block any upload attempts with:
if: (cf.waf.content_scan.has_obj)
then: BLOCK
Another very common use case is supporting file upload endpoints that accept JSON content. In this instance files are normally embedded into a JSON payload after being base64-encoded. If your application has such an endpoint, you can provide additional metadata to the scanning engine to recognise the traffic by submitting a custom scan expression. Once submitted, files within JSON payloads will be parsed, decoded, and scanned automatically. In the event you want to issue a block action, you can use a JSON custom response type so that your web application front end can easily parse and display error messages.
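As a hedged example of such a custom scan expression: assuming your application posts a JSON body with the base64-encoded file stored under a key named "file" (the key name is illustrative, not a product default), the expression points the engine at that field:
lookup_json_string(http.request.body.raw, "file")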
The full list of fields exposed by the WAF Content Scanning engine can be found on our developer documentation.
The engine
Scanned content objects
A lot of time designing the system was spent on defining what should be scanned. Defining this properly helps us ensure that we are not scanning unnecessary content, which reduces latency impact and CPU usage, and that there are no bypasses, keeping the system complementary to existing WAF functionality.
The complexity stems from the fact that there is no clear definition of a “file” in HTTP terms. That’s why in this blog post and in the system design, we refer to “content object” instead.
At a high level, although this can loosely be defined as a “file”, not all “content objects” may end up being stored in the file system of an application server! Therefore, we need a definition that applies to HTTP. Additional complexity comes from the fact that this is a security product, and attackers will always try to abuse HTTP to obfuscate or hide their true intentions. For example, although a Content-Type header may indicate that the request body is a JPEG image, it may actually be a PDF.
With the above in mind, a “content object”, as of today, is any request payload that is detected by heuristics (that is, without relying on the Content-Type header) to be anything that is not text/html, text/x-shellscript, application/json, or text/xml. All other content types are considered a content object.
Detecting via heuristics the content type of an HTTP request body is not enough, as content objects might be found within portions of the HTTP body or encoded following certain rules, such as when using multipart/form-data, which is the most common encoding used when creating standard HTML file input forms.
So when certain payload formats are found, additional parsing applies. As of today, the engine will automatically parse and perform content type heuristics on the individual components of the payload when the payload is either encoded using multipart/form-data or multipart/mixed, or is a JSON string that may have “content objects” embedded in base64 format, as defined by the customer.
In these cases, we don’t scan the entire payload as a single content object, but we parse it following the relevant standard and apply scanning, if necessary, to the individual portions of the payload. That allows us to support scanning of more than one content object per request, such as an HTML form that has multiple file inputs. We plan to add additional automatic detections for complex payloads in the future.
In the event we end up finding a malicious match, but we were not able to detect the content type correctly, we will default to reporting a content type of application/octet-stream in the Cloudflare logs/dashboards.
Finally, it is worth noting that we explicitly avoid scanning anything that is plain text (HTML, JSON, XML etc.) as finding attack vectors in these payloads is already covered by the WAF, API Gateway and other web application security solutions already present in Cloudflare’s portfolio.
Local scans
At Cloudflare, we try to leverage our horizontal architecture to build scalable software. This means the underlying scanner is deployed on every server that handles customer HTTP/S traffic. The diagram below describes the setup:
Having each server perform the scanning locally helps ensure latency impact is reduced to a minimum to applicable HTTP requests. The actual scanning engine is the same one used by the Cloudflare Web Gateway, our forward proxy solution that among many other things, helps keep end user devices safe by blocking attempts to download malware.
Consequently, the scanning capabilities provided match those exposed by the Web Gateway AV scanning. The main difference, as of today, is the maximum file size: currently 1 MB versus 15 MB in Web Gateway. We are working on increasing this to match the Web Gateway in the coming months.
Separating detection from mitigation
A new approach that we are adopting within our application security portfolio is the separation of detection from mitigation. The WAF Content Scanning feature follows this approach: once turned on, it simply enriches all available data and fields with scan results. The benefits here are twofold.
First, this allows us to provide visibility into your application traffic without you having to deploy any mitigation. This automatically opens up a great use case: discovery. For large enterprise applications, security teams may not be aware of which paths or endpoints might be expecting file uploads from the Internet. Using our WAF Content Scanning feature in conjunction with our new Security Analytics, they can now filter on request traffic that has a file content object (a file being uploaded) to observe the top N paths and hostnames, exposing such endpoints.
Second, as mentioned in the prior section, exposing the intelligence provided by Cloudflare as fields that can be used in our WAF Custom Rules allows us to provide a very flexible platform. As a plus, you don’t need to learn how to use a new feature, as you are likely already familiar with our WAF Custom Rule builder.
This is not a novel idea, and our Bot Management solution was the first to trial it with great success. Any customer who uses Bot Management today gains access to a bot score field that indicates the likelihood of a request coming from automation or a human. Customers use this field to deploy rules that block bots.
To that point, let’s assume you run a job applications site, and you do not wish for bots and crawlers to automatically submit job applications. You can now block file uploads coming from bots!
if: (cf.bot_management.score lt 10 and cf.waf.content_scan.has_obj)
then: BLOCK
And that’s the power we wish to provide at your fingertips.
Next steps
Our WAF Content Scanning is a new feature, and we have several improvements planned, including increasing the max content size scanned, exposing the “rewrite” action, so you can send malicious files to a quarantine server, and exposing better analytics that allow you to explore the data more easily without deploying rules. Stay tuned!
Cloudflare’s WAF helps site owners keep their application safe from attackers. It does this by analyzing traffic with the Cloudflare Managed Rules: handwritten highly specialized rules that detect and stop malicious payloads. But they have a problem: if a rule is not written for a specific attack, it will not detect it.
Today, we are solving this problem by making our WAF smarter and announcing our WAF attack scoring system in general availability.
Customers on our Enterprise Core and Advanced Security bundles will have gradual access to this new feature. All remaining Enterprise customers will gain access over the coming months.
Our WAF attack scoring system, fully complementary to our Cloudflare Managed Rules, classifies all requests using a model trained on observed true positives across the Cloudflare network, allowing you to detect (and block) evasion, bypass and new attack techniques before they are publicly known.
The problem with signature based WAFs
Attackers trying to infiltrate web applications often use known or recently disclosed payloads. The Cloudflare WAF has been built to handle these attacks very well. The Cloudflare Managed Ruleset and the Cloudflare OWASP Managed Ruleset are in fact continuously updated and aimed at protecting web applications against known threats while minimizing false positives.
Things become harder with attacks that are not publicly known, often referred to as zero-days. While our teams do their best to research new threat vectors and keep the Cloudflare Managed rules updated, human speed becomes a limiting factor. Every time a new vector is found, a window of opportunity opens for attackers to bypass mitigations.
One well-known example was the Log4j RCE attack, where we had to deploy frequent rule updates as attackers discovered new bypasses by tweaking the known attack patterns.
The solution: complement signatures with a machine learning scoring model
Our WAF attack scoring system is a machine-learning-powered enhancement to Cloudflare’s WAF. It scores every request with a probability of it being malicious. You can then use this score when implementing WAF Custom Rules to keep your application safe alongside existing Cloudflare Managed Rules.
How do we use machine learning in Cloudflare’s WAF?
In any classification problem, the quality of the training set directly relates to the quality of the classification output, so a lot of effort was put into preparing the training data.
And this is where we used a Cloudflare superpower: we took advantage of Cloudflare’s network visibility by gathering millions of true positive samples generated by our existing signature-based WAF, and further enhanced them by using techniques covered in “Improving the accuracy of our machine learning WAF”.
This allowed us to train a model that is able to classify, given an HTTP request, the probability that the request contains a malicious payload, but more importantly, to classify when a request is very similar to a known true positive but yet sufficiently different to avoid a managed rule match.
The model runs inline to HTTP traffic and as of today it is optimized for three attack categories: SQL Injection (SQLi), Cross Site Scripting (XSS), and a wide range of Remote Code Execution (RCE) attacks such as shell injection, PHP injection, Apache Struts type compromises, Apache log4j, and similar attacks that result in RCE. We plan to add additional attack types in the future.
The output scores are similar to the Bot Management scores; they range between 1 and 99, where low scores indicate a malicious or likely malicious request and high scores indicate a clean or likely clean HTTP request.
Proving immediate value
As one example of the effectiveness of this new system, on October 13, 2022, CVE-2022-42889 was identified as a critical severity vulnerability in Apache Commons Text, affecting versions 1.5 through 1.9.
The payload used in the attack, although not immediately blocked by our Cloudflare Managed Rules, was correctly identified (by scoring very low) by our attack scoring system. This allowed us to protect endpoints and identify the attack with zero time to deploy. Of course, we also still updated the Cloudflare Managed Rules to cover the new attack vector, as this allows us to improve our training data further, completing our feedback loop.
Know what you don’t know with the new Security Analytics
In addition to the attack scoring system, we have another big announcement: our new Security Analytics! You can read more about this in the official announcement.
Using the new Security Analytics you can view the attack score distribution regardless of whether the requests were blocked or not, allowing you to explore potentially malicious attacks before deploying any rules.
The view won’t only show the WAF Attack Score but also Bot Management and Content Scanning with the ability to mix and match filters as you desire.
How to use the WAF Attack Score and Security Analytics
Let’s go on a tour to spot attacks using the new Security Analytics, and then use the WAF Attack Scores to mitigate them.
Starting with Security Analytics
This new view has the power to show you everything in one place about your traffic. You have tens of filters to mix and match from, top statistics, multiple interactive graph distributions, as well as the log samples to verify your insights. In essence this gives you the ability to preview a number of filters without the need to create WAF Custom Rules in the first place.
Step 1 – access the new Security Analytics: To access the new Security Analytics in the dashboard, head over to the “Security” tab (Security > Analytics); the previous overview (Security > Overview) now lives under (Security > Events). For the time being, you must have access to at least the WAF Attack Score to be able to see the new Security Analytics.
Step 2 – explore insights: On the new analytics page, you will see the time distribution of your entire traffic, along with many filters on the right side showing distributions for several signals, including the WAF Attack Score and the Bot Management score. To make it easy to apply interesting filters, we added the “Insights” section.
By choosing the “Attack Analysis” option you see a stacked chart overview of how your traffic looks from the WAF Attack Score perspective.
Step 3 – filter on attack traffic: A good place to start is to look for unmitigated HTTP requests classified as attacks. You can do this by using the attack score sliders on the right-hand side or by selecting any of the insights’ filters, which are easy-to-use, one-click shortcuts. All charts will be updated automatically according to the selected filters.
Step 4 – verify the attack traffic: This can be done by expanding the sampled logs below the traffic distribution graph. For instance, in the expanded log below, you can see a very low RCE score indicating an “Attack”, along with a Bot score indicating that the request was “Likely Automated”. Looking at the “Path” field, we can confirm that this is indeed a malicious request. Note that not all fields are currently logged or shown; for example, a request might receive a low score due to a malicious payload in the HTTP body, which cannot be easily verified in the sample logs today.
Step 5 – create a rule to mitigate the attack traffic: Once you have verified that your filter is not matching false positives, by using a single click on the “Create custom rule” button, you will be directed to the WAF Custom Rules builder with all your filters pre-populated and ready for you to “Deploy”.
Attack scores in Security Event logs
WAF Attack Scores are also available in HTTP logs, and by navigating to (Security > Events) when expanding any of the event log samples:
Note that all the new fields are available in WAF Custom Rules and WAF Rate Limiting Rules. These are documented in our developer docs: cf.waf.score, cf.waf.score.xss, cf.waf.score.sqli, and cf.waf.score.rce.
Although the easiest way to use these fields is by starting from our new Security Analytics dashboard as described above, you can use them as is when building rules and, of course, mix them with any other available field. The following example deploys a “Log” action rule for any request with an aggregate WAF Attack Score (cf.waf.score) of less than 40.
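Expressed in the same rule style used earlier in this post, that example looks roughly like this:
if: (cf.waf.score lt 40)
then: LOG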
What’s next?
This is just step one of many to make our Cloudflare WAF truly “intelligent”. In addition to rolling this new technology out to more customers, we are already working on providing even better visibility and cover additional attack vectors. For all that and more, stay tuned!
As your organization looks to improve your security posture and practices, early detection and prevention of unauthorized activity quickly becomes one of your main priorities. The behaviors associated with unauthorized activity commonly follow patterns that you can analyze in order to create specific mitigations or feed data into your security monitoring systems.
This post shows you how you can analyze security intelligence from Amazon Cognito advanced security features logs by using AWS native services. You can use the intelligence data provided by the logs to increase your visibility into sign-in and sign-up activities from users. This can help you with monitoring and decision making, and it can feed other security services in your organization, such as a web application firewall or a security information and event management (SIEM) tool. The data can also enrich available security feeds like fraud detection systems, increasing protection for the workloads that you run on AWS.
Amazon Cognito advanced security features overview
Amazon Cognito provides authentication, authorization, and user management for your web and mobile apps. Your users can sign in to apps directly with a user name and password, or through a third party such as social providers or standard enterprise providers through SAML 2.0/OpenID Connect (OIDC). Amazon Cognito includes additional protections for users that you manage in Amazon Cognito user pools. In particular, Amazon Cognito can add risk-based adaptive authentication and also flag the use of compromised credentials. For more information, see Checking for compromised credentials in the Amazon Cognito Developer Guide.
With adaptive authentication, Amazon Cognito examines each user pool sign-in attempt and generates a risk score for how likely the sign-in request is from an unauthorized user. Amazon Cognito examines a number of factors, including whether the user has used the same device before or has signed in from the same location or IP address. A detected risk is rated as low, medium, or high, and you can determine what actions should be taken at each risk level. You can choose to allow or block the request, require a second authentication factor, or notify the user of the risk by email. Security teams and administrators can also submit feedback on the risk through the API, and users can submit feedback by using a link that is sent to the user’s email. This feedback can improve the risk calculation for future attempts.
To add advanced security features to your existing Amazon Cognito configuration, you can get started by using the steps for Adding advanced security to a user pool in the Amazon Cognito Developer Guide. Note that there is an additional charge for advanced security features, as described on our pricing page. These features are applicable only to native Amazon Cognito users; they aren’t applicable to federated users who sign in with an external provider.
Prerequisites and considerations for this solution
This solution assumes that you are using Amazon Cognito with advanced security features already enabled; the solution does not create a user pool and does not activate the advanced security features on an existing one.
The following list describes some limitations that you should be aware of for this solution:
This solution does not apply to events in the hosted UI, but the same architecture can be adapted for that environment, with some changes to the events processor.
The Amazon Cognito advanced security features support only native users. This solution is not applicable to federated users.
The admin API used in this solution has a default rate limit of 30 requests per second (RPS). If you have a higher rate of authentication attempts, this API call might be throttled and you will need to implement a retry pattern to confirm that your requests are processed (see the sketch after this list).
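As a hedged illustration of one way to handle that throttling, the following Python sketch relies on the AWS SDK’s built-in retry behavior rather than a hand-rolled loop; the retry values, user pool ID, and username are illustrative and not values from this solution.
import boto3
from botocore.config import Config

# Let the SDK retry throttled calls with adaptive backoff (values are illustrative).
retry_config = Config(retries={"max_attempts": 8, "mode": "adaptive"})
cognito_idp = boto3.client("cognito-idp", config=retry_config)

# This call is subject to the default 30 RPS limit mentioned above.
response = cognito_idp.admin_list_user_auth_events(
    UserPoolId="us-east-1_EXAMPLE",  # hypothetical user pool ID
    Username="example-user",         # hypothetical username
    MaxResults=5,
)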
Implement the solution
You can deploy the solution automatically by using the following AWS CloudFormation template.
Choose the following Launch Stack button to launch a CloudFormation stack in your account and deploy the solution.
You’ll be redirected to the CloudFormation service in the US East (N. Virginia) Region, which is the default AWS Region, to deploy this solution. You can change the Region to align it to where your Cognito User Pool is running.
This template will create multiple cloud resources including, but not limited to, the following:
An EventBridge rule for sending the Amazon Cognito events
An Amazon SQS queue for sending the events to Lambda
A Lambda function for getting the advanced security information based on the authentication events from CloudTrail
An S3 bucket to store the logs
In the wizard, you’ll be asked to modify or provide one parameter, the existing Cognito user pool ID. You can get this value from the Amazon Cognito console or the Cognito API.
Now, let’s break down each component of the solution in detail.
Sending the authentication events from CloudTrail to Lambda
Cognito advanced security features support the following CloudTrail events: SignUp, ConfirmSignUp, ForgotPassword, ResendConfirmationCode, InitiateAuth, and RespondToAuthChallenge. This solution focuses on the sign-in event, InitiateAuth, as an example.
The solution creates an EventBridge rule that will run when an event is identified in CloudTrail and send the event to an SQS queue. This is useful so that events can be batched up and decoupled for Lambda to process.
The EventBridge rule uses Amazon SQS as a target. The queue is created by the solution and uses the default settings, with the exception that Receive message wait time is set to 20 seconds for long polling. For more information about long polling and how to manually set up an SQS queue, see Consuming messages using long polling in the Amazon SQS Developer Guide.
When the SQS queue receives the messages from EventBridge, these are sent to Lambda for processing. Let’s now focus on understanding how this information is processed by the Lambda function.
Using Lambda to process Amazon Cognito advanced security features information
In order to get the advanced security features evaluation information, you need authentication details that can only be obtained by using the Amazon Cognito identity provider (IdP) API call admin_list_user_auth_events. This API call requires a username to fetch all the authentication event details for a specific user. For security reasons, the username is not logged in CloudTrail and must be obtained by using other event information.
You can use the Lambda function in the sample solution to get this information. It’s composed of three main sequential actions (a minimal sketch follows the list):
1. The Lambda function gets the sub identifiers from the authentication events recorded by CloudTrail.
2. Each sub identifier is used to get the user name through an API call to list_users.
3. The sample function retrieves the last five authentication event details from advanced security features for each of these users by using the admin_list_user_auth_events API call. You can modify the function to retrieve a different number of events, or use other criteria such as a timestamp or a specific time period.
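A minimal Python sketch of those three steps might look like the following; it assumes boto3, a hypothetical user pool ID, and a simplified view of the CloudTrail event, and is not the solution’s actual Lambda code.
import boto3

cognito_idp = boto3.client("cognito-idp")
USER_POOL_ID = "us-east-1_EXAMPLE"  # hypothetical user pool ID

def auth_events_for_sub(sub):
    # Step 2: resolve the sub identifier from CloudTrail to a user name.
    users = cognito_idp.list_users(
        UserPoolId=USER_POOL_ID,
        Filter=f'sub = "{sub}"',
        Limit=1,
    )["Users"]
    if not users:
        return []
    username = users[0]["Username"]

    # Step 3: fetch the last five advanced security evaluations for that user.
    return cognito_idp.admin_list_user_auth_events(
        UserPoolId=USER_POOL_ID,
        Username=username,
        MaxResults=5,
    )["AuthEvents"]

def handle_cloudtrail_event(event_detail):
    # Step 1 (simplified): the sub identifier sits under additionalEventData
    # in the CloudTrail record delivered via EventBridge and SQS.
    sub = event_detail["additionalEventData"]["sub"]
    return auth_events_for_sub(sub)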
Getting the user name information from a CloudTrail event
The following sample authentication event shows a sub identifier in the CloudTrail event information, shown as sub under additionalEventData. With this sub identifier, you can use the ListUsers API call from the Cognito IdP SDK to get the user name details.
After the Lambda function obtains the username, it can then use the Cognito IdP API call admin_list_user_auth_events to get the advanced security feature risk evaluation information for each of the authentication events for that user. Let’s look into the details of that evaluation.
The authentication event information from Amazon Cognito advanced security provides details for each of the categories evaluated and logs the results. Those results can then be used to decide whether the security team should be notified of the authentication attempt or should take action on it. It’s recommended that you limit the number of events returned, in order to keep performance optimized.
The following sample event shows some of the risk information provided by advanced security features; the options for the response syntax can be found in the CognitoIdentityProvider API documentation.
The event information that is returned includes the details that are highlighted in this sample event, such as CompromisedCredentialsDetected, RiskDecision, and RiskLevel, which you can evaluate to decide whether the information can be used to enrich other security monitoring services.
Logging the authentication events information
You can use a Lambda extensions layer to send logs to an S3 bucket. Lambda still sends logs to Amazon CloudWatch Logs, but you can disable this activity by removing the required permissions to CloudWatch on the Lambda execution role. For more details on how to set this up, see Using AWS Lambda extensions to send logs to custom destinations.
Figure 2 shows an example of a log sent by Lambda. It includes execution information that is logged by the extension, as well as the information returned from the authentication evaluation by advanced security features.
Figure 2: Sample log information sent to S3
Note that the detailed authentication information in the Lambda execution log is the same as the preceding sample event. You can further enhance the information provided by the Lambda function by modifying the function code and logging more information during the execution, or by filtering the logs and focusing only on high-risk or compromised login attempts.
After the logs are in the S3 bucket, different applications and tools can use this information to perform automated security actions and configuration updates or provide further visibility. You can query the data from Amazon S3 by using Amazon Athena, feed the data to other services such as Amazon Fraud Detector as described in this post, mine the data by using artificial intelligence/machine learning (AI/ML) managed tools like AWS Lookout for Metrics, or enhance visibility with AWS WAF.
Sample scenarios
You can start to gain insights into the security information provided by this solution in an existing environment by querying and visualizing the log data directly by using CloudWatch Logs Insights. For detailed information about how you can use CloudWatch Logs Insights with Lambda logs, see the blog post Operating Lambda: Using CloudWatch Logs Insights.
The CloudFormation template deploys the CloudWatch Logs Insights queries. You can view the queries for the sample solution in the Amazon CloudWatch console, under Queries.
Choose Select log group(s). In the drop-down list, select the Lambda log group.
The query box should show the pre-created query. Choose Run query. You should then see the query results in the bottom-right panel.
(Optional) Choose Add to dashboard to add the widget to a dashboard.
CloudWatch Logs Insights discovers the fields in the auth event log automatically. As shown in Figure 3, you can see the available fields in the right-hand side Discovered fields pane, which includes the Amazon Cognito information in the event.
Figure 3: The fields available in CloudWatch Logs Insights
The first query, shown in the following code snippet, will help you get a view of the number of requests per IP, where the advanced security features have determined the risk decision as Account Takeover and the CompromisedCredentialsDetected as true.
fields @message
| filter @message like /INFO/
| filter AuthEvents.0.EventType like 'SignIn'
| filter AuthEvents.0.EventRisk.RiskDecision like "AccountTakeover" and
  AuthEvents.0.EventRisk.CompromisedCredentialsDetected != "false"
| stats count(*) as RequestsperIP by AuthEvents.2.EventContextData.IpAddress as IP
| sort desc
You can view the results of the query as a table or graph, as shown in Figure 4.
Figure 4: Sample query results for CompromisedCredentialsDetected
Using the same approach and the convenient access to the fields for query, you can explore another use case, using the following query, to view the number of requests per IP for each type of event (SignIn, SignUp, and forgot password) where the risk level was high.
fields @message
| filter @message like /INFO/
| filter AuthEvents.0.EventRisk.RiskLevel like "High"
| stats count(*) as RequestsperIP by AuthEvents.0.EventContextData.IpAddress as IP,
  AuthEvents.0.EventType as EventType
| sort desc
Figure 5 shows the results for this EventType query.
Figure 5: The sample results for the EventType query
In the final sample scenario, you can look at event context data and query for the source of the events for which the risk level was high.
fields @message
| filter @message like /INFO/
| filter AuthEvents.0.EventRisk.RiskLevel like 'High'
| stats count(*) as RequestsperCountry by AuthEvents.0.EventContextData.Country as Country
| sort desc
Figure 6 shows the results for this RiskLevel query.
Figure 6: Sample results for the RiskLevel query
As you can see, there are many ways to mix and match the filters to extract deep insights, depending on your specific needs. You can use these examples as a base to build your own queries.
Conclusion
In this post, you learned how to use the security intelligence information provided by Amazon Cognito through its advanced security features to improve your security posture and practices. You used a solution that retrieves valuable authentication information, with CloudTrail logs as a source and a Lambda function to process the events, and that sends this evaluation information in the form of a log to CloudWatch Logs and S3 for use as an additional security feed for wider organizational monitoring and visibility. In a set of sample use cases, you explored how to use CloudWatch Logs Insights to quickly and conveniently access this information, aggregate it, gain deep insights, and use it to take action.
Forrester has recognised Cloudflare as a Leader in The Forrester Wave™: Web Application Firewalls, Q3 2022 report. The report evaluated 12 Web Application Firewall (WAF) providers on 24 criteria across current offering, strategy and market presence.
You can register for a complimentary copy of the report here. The report helps security and risk professionals select the correct offering for their needs.
We believe this achievement, along with recent WAF developments, reinforces our commitment and continued investment in the Cloudflare Web Application Firewall (WAF), one of our core product offerings.
The WAF, along with our DDoS Mitigation and CDN services, has in fact been an offering since Cloudflare’s founding, and we could not think of a better time to receive this recognition: Birthday Week.
We’d also like to take this opportunity to thank Forrester.
Leading WAF in strategy
Cloudflare received the highest score of all assessed vendors in the strategy category. We also received the highest possible scores in 10 criteria, including:
Innovation
Management UI
Rule creation and modification
Log4Shell response
Incident investigation
Security operations feedback loops
According to Forrester, “Cloudflare Web Application Firewall shines in configuration and rule creation”, “Cloudflare stands out for its active online user community and its associated response time metrics”, and “Cloudflare is a top choice for those prioritizing usability and looking for a unified application security platform.”
Protecting web applications
The core value of any WAF is to keep web applications safe from external attacks by stopping any compromise attempt. Compromises can in fact lead to complete application takeover and data exfiltration, resulting in financial and reputational damage to the targeted organization.
The Log4Shell criterion in the Forrester Wave report is an excellent example of a real world use case to demonstrate this value.
Log4Shell was a high severity vulnerability discovered in December 2021 that affected the popular Apache Log4J software commonly used by applications to implement logging functionality. The vulnerability, when exploited, allows an attacker to perform remote code execution and consequently take over the target application.
Due to the popularity of this software component, many organizations worldwide were potentially at risk after the immediate public announcement of the vulnerability on December 9, 2021.
We believe that we scored the highest possible score in the Log4Shell criterion due to our fast response to the announcement, by ensuring that all customers using the Cloudflare WAF were protected against the exploit in less than 17 hours globally.
We did this by deploying new managed rules (virtual patching) that were made available to all customers. The rules were deployed with a block action ensuring exploit attempts never reached customer applications.
The Cloudflare WAF ultimately “bought” valuable time for our customers to patch their back end systems before attackers were able to find and attempt to compromise vulnerable applications.
You can read about our response and our actions following the Log4Shell announcement in great detail on our blog.
Use the Cloudflare WAF today
Cloudflare WAF keeps organizations safer while they focus on improving their applications and APIs. We integrate leading application security capabilities into a single console to protect applications with our WAF while also securing APIs, stopping DDoS attacks, blocking unwanted bots, and monitoring for third-party JavaScript attacks.
When an EC2 instance is suspected to have been compromised, it’s strongly recommended to investigate what happened to the instance. You should look for activities such as:
Open network connections
List of running processes
Processes that contain injected code
Memory-resident infections
Other forensic artifacts
When an EC2 instance is compromised, it’s important to take action as quickly as possible. Before you shut down the EC2 instance, you first need to capture the contents of its volatile memory (RAM) in a memory dump because it contains the instance’s in-progress operations. This is key in determining the root cause of compromise.
In order to capture volatile memory in Linux, you can use a tool like Linux Memory Extractor (LiME). This requires you to have the kernel modules that are specific to the kernel version of the instance for which you want to capture volatile memory. We also recommend that you limit the actions you take on the instance where you are trying to capture the volatile memory in order to minimize the set of artifacts created as part of the capture process, so you need a method to build the tools for capturing volatile memory outside the instance under investigation. After you capture the volatile memory, you can use a tool like Volatility2 to analyze it in a dedicated forensics environment. You can use tools like LiME and Volatility2 on EC2 instances that use x86, x64, and Graviton instance types.
An AWS Identity and Access Management (IAM) role with permissions to deploy the required resources in an AWS account. More details about these permissions follow in the next section.
Solution overview
The EC2 forensic module factory solution consists of the following resources:
Important: The SSM document clones the LiME and Volatility2 GitHub repositories, and these tools use version 2.0 of the GNU General Public License. This SSM document can be updated to include your preferred tools, like fmem or Volatility3, for forensic analysis and capture.
One security group for the EC2 instance that is provisioned during the automation
The solution uses the following VPC endpoints for AWS services:
ec2_endpoint
ec2_msg_endpoint
kms_endpoint
ssm_endpoint
ssm_msg_endpoint
s3_endpoint
Figure 1 shows an overview of the EC2 forensic module factory solution workflow.
Figure 1: Automation to build forensic kernel modules for an Amazon Linux EC2 instance
The EC2 forensic module factory solution workflow in Figure 1 includes the following numbered steps (a simplified sketch of the createEC2module flow follows the list):
A Step Functions workflow is started, which creates a Step Functions task token and invokes the first Lambda function, createEC2module, to create EC2 forensic modules.
A Step Functions task token is used to allow long-running processes to complete and to avoid a Lambda timeout error. The createEC2module function runs for approximately 9 minutes. The run time for the function can vary depending on any customizations to the createEC2module function or the SSM document.
The createEC2module function launches an EC2 instance based on the Amazon Machine Image (AMI) provided.
Once the EC2 instance is running, an SSM document is run, which includes the following steps:
If a specific kernel version is provided in step 1, this kernel version will be installed on the EC2 instance. If no kernel version is provided, the default kernel version on the EC2 instance will be used to create the modules.
If a specific kernel version was selected and installed, the system is rebooted to use this kernel version.
The prerequisite build tools are installed, as well as the LiME and Volatility2 packages.
The LiME kernel module and the Volatility2 profile are built.
The kernel modules for LiME and Volatility2 are put into the S3 bucket.
Upon completion, the Step Functions task token is sent to the Step Functions workflow to invoke the second cleanupEC2module Lambda function to terminate the EC2 instance that was launched in step 2.
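The following is a simplified, hypothetical Python sketch of that flow, not the project’s actual code; the instance type, SSM document name, and output shape are illustrative.
import json
import boto3

ec2 = boto3.client("ec2")
ssm = boto3.client("ssm")
sfn = boto3.client("stepfunctions")

def create_ec2_module(ami_id, kernel_version, task_token):
    # Launch a temporary build instance from the supplied AMI.
    instance_id = ec2.run_instances(
        ImageId=ami_id, InstanceType="t3.medium", MinCount=1, MaxCount=1
    )["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_status_ok").wait(InstanceIds=[instance_id])

    # Run the SSM document that builds the LiME module and Volatility2 profile.
    command_id = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="ec2-forensic-module-build",  # hypothetical document name
        Parameters={"kernelversion": [kernel_version]},
    )["Command"]["CommandId"]
    ssm.get_waiter("command_executed").wait(CommandId=command_id, InstanceId=instance_id)

    # Return the task token so the workflow can invoke the cleanupEC2module function.
    sfn.send_task_success(
        taskToken=task_token,
        output=json.dumps({"InstanceId": instance_id}),
    )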
Option 1: Deploy the solution with AWS CloudFormation (console)
Sign in to your preferred security tooling account in the AWS Management Console, and choose the following Launch Stack button to open the AWS CloudFormation console pre-loaded with the template for this solution. It will take approximately 10 minutes for the CloudFormation stack to complete.
Option 2: Deploy the solution by using the AWS CDK
To build the app, navigate to the project’s root folder and use the following commands.
npm install -g aws-cdk
npm install
Run the following commands in your terminal while authenticated in your preferred security tooling AWS account. Be sure to replace <INSERT_AWS_ACCOUNT> with your account number, and replace <INSERT_REGION> with the AWS Region that you want the solution deployed to.
cdk bootstrap aws://<INSERT_AWS_ACCOUNT>/<INSERT_REGION>
cdk deploy
Run the solution to build forensic kernel objects
Now that you’ve deployed the EC2 forensic module factory solution, you need to invoke the Step Functions workflow in order to create the forensic kernel objects. The following is an example of manually invoking the workflow, to help you understand what actions are being performed. These actions can also be integrated and automated with an EC2 incident response solution.
To manually invoke the workflow to create the forensic kernel objects (console)
At the input prompt, enter the following JSON values.
{
  "AMI_ID": "ami-0022f774911c1d690",
  "kernelversion": "kernel-4.14.104-95.84.amzn2.x86_64"
}
Choose Start execution to start the workflow, as shown in Figure 2.
Figure 2: Step Functions step input example to build custom kernel version using Amazon Linux 2 AMI ID
Workflow progress
You can use the AWS Management Console to follow the progress of the Step Functions workflow. If the workflow is successful, you should see the image when you view the status of the Step Functions workflow, as shown in Figure 3.
Figure 3: Step Functions workflow success example
Note: The Step Functions workflow run time depends on the commands that are being run in the SSM document. The example SSM document included in this post runs for approximately 9 minutes. For information about possible Step Functions errors, see Error handling in Step Functions.
To verify that the artifacts are built
After the Step Functions workflow has successfully completed, go to the S3 bucket that was provisioned in the EC2 forensic module factory solution.
Look for two prefixes in the bucket for LiME and Volatility2, as shown in Figure 4.
Figure 4: S3 bucket prefix for forensic kernel modules
Open each tool name prefix in S3 to find the actual module, such as in the following examples:
Now that the objects have been created, the solution has successfully completed.
Incorporate forensic module builds into an EC2 AMI pipeline
Each organization has specific requirements for allowing application teams to use various EC2 AMIs, and organizations commonly implement an EC2 image pipeline using tools like EC2 Image Builder. EC2 Image Builder uses recipes to install and configure required components in the AMI before application teams can launch EC2 instances in their environment.
The EC2 forensic module factory solution we implemented here makes use of an existing EC2 instance AMI. As mentioned, the solution uses an SSM document to create forensic modules. The logic in the SSM document could be incorporated into your EC2 image pipeline to create the forensic modules and store them in an S3 bucket. S3 also allows additional layers of protection such as enforcing default bucket encryption with an AWS Key Management Service Customer Managed Key (CMK), verifying S3 object integrity with checksum, S3 Object Lock, and restrictive S3 bucket policies. These protections can help you to ensure that your forensic modules have not been modified and are only accessible by authorized entities.
It is important to note that incorporating forensic module creation into an EC2 AMI pipeline will build forensic modules for the specific kernel version used in that AMI. You would still need to employ this EC2 forensic module solution to build a specific forensic module version if it is missing from the S3 bucket where you are creating and storing these forensic modules. The need to do this can arise if the EC2 instance is updated after the initial creation of the AMI.
Incorporate the solution into existing EC2 incident response automation
There are many existing solutions to automate incident response workflow for quarantining and capturing forensic evidence for EC2 instances, but the majority of EC2 incident response automation solutions have a single dependency in common, which is the use of specific forensic modules for the target EC2 instance kernel version. The EC2 forensic module factory solution in this post enables you to be both proactive and reactive when building forensic kernel modules for your EC2 instances.
You can use the EC2 forensic module factory solution in two different ways:
Ad-hoc – In this post, you walked through the solution by running the Step Functions workflow with specific parameters. You can do this to build a repository of kernel modules.
Automated – Alternatively, you can incorporate this solution into existing automation by invoking the Step Functions workflow and passing the AMI ID and kernel version. An example could be the following (a sketch of the hand-off appears after this list):
An existing EC2 incident response solution attempts to get the forensic modules to capture the volatile memory from an S3 bucket.
If the specific kernel version is missing in the S3 bucket, the solution updates the automation to StartExecution on the create_ec2_volatile_memory_modules state machine.
The Step Functions workflow builds the specific forensic modules.
After the Step Functions workflow is complete, the EC2 incident response solution restarts its workflow to get the forensic modules to capture the volatile memory on the EC2 instance.
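For that automated hand-off, starting the state machine from Python could look roughly like the following sketch; the account ID and Region in the ARN are placeholders, while the state machine name and input keys come from this post.
import json
import boto3

sfn = boto3.client("stepfunctions")

def build_missing_modules(ami_id, kernel_version):
    # Start the module factory when the required kernel module is not in the S3 bucket.
    return sfn.start_execution(
        stateMachineArn=(
            "arn:aws:states:us-east-1:111122223333:"
            "stateMachine:create_ec2_volatile_memory_modules"  # placeholder account/Region
        ),
        input=json.dumps({"AMI_ID": ami_id, "kernelversion": kernel_version}),
    )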
Now that you have the kernel modules, you can capture the volatile memory by using LiME and then conduct analysis on the memory dump by using a Volatility2 profile. Illustrative commands follow the high-level steps below.
To capture and analyze volatile memory on the target EC2 instance (high-level steps)
Copy the LiME module from the S3 bucket holding the module repository to the target EC2 instance.
Capture the volatile memory by using the LiME module.
Stream the volatile memory dump to an S3 bucket.
Launch an EC2 forensic workstation instance, with Volatility2 installed.
Copy the Volatility2 profile from the S3 bucket to the appropriate location.
Copy the volatile memory dump to the EC2 forensic workstation.
Run analysis on the volatile memory with Volatility2 by using the specific Volatility2 profile created for the target EC2 instance.
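As a hedged illustration of the capture and analysis steps, the commands below show typical LiME and Volatility2 usage; the module file name, profile name, and paths are placeholders that depend on the exact kernel version you built for.
sudo insmod ./lime-4.14.104-95.84.amzn2.x86_64.ko "path=/tmp/linux_memory.lime format=lime"
python vol.py --plugins=/path/to/profiles --profile=LinuxAmazonLinux2x64 -f /tmp/linux_memory.lime linux_pslist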
Automated self-service AWS solution
AWS has also released the Automated Forensics Orchestrator for Amazon EC2 solution that you can use to quickly set up and configure a dedicated forensics orchestration automation solution for your security teams. The Automated Forensics Orchestrator for Amazon EC2 allows you to capture and examine the data from EC2 instances and attached Amazon Elastic Block Store (Amazon EBS) volumes in your AWS environment. This data is collected as forensic evidence for analysis by the security team.
The Automated Forensics Orchestrator for Amazon EC2 creates the foundational components to enable the EC2 forensic module factory solution’s memory forensic acquisition workflow and forensic investigation and reporting service. The Automated Forensics Orchestrator for Amazon EC2 and the EC2 forensic module factory are hosted in different GitHub projects, and you will need to reconcile the expected S3 bucket locations for the associated modules.
Customize the EC2 forensic module factory solution
The SSM document pulls open-source packages to build tools for the specific Linux kernel version. You can update the SSM document to your specific requirements for forensic analysis, including expanding support for other operating systems, versions, and tools.
You can also update the S3 object naming convention and object tagging, to allow external solutions to reference and copy the appropriate kernel module versions to enable the forensic workflow.
Clean up
If you deployed the EC2 forensic module factory solution by using the Launch Stack button in the AWS Management Console or the CloudFormation template ec2_module_factory_cfn, do the following to clean up:
In the AWS CloudFormation console for the account and Region where you deployed the solution, choose the Ec2VolModules stack.
Choose the option to Delete the stack.
If you deployed the solution by using the AWS CDK, run the following command.
cdk destroy
Conclusion
In this blog post, we walked you through the deployment and use of the EC2 forensic module factory solution to use AWS Step Functions, AWS Lambda, AWS Systems Manager, and Amazon EC2 to create specific versions of forensic kernel modules for Amazon Linux EC2 instances.
The solution provides a framework to create the foundational components required in an EC2 incident response automation solution. You can customize the solution to your needs to fit into an existing EC2 automation, or you can deploy this solution in tandem with the Automated Forensics Orchestrator for Amazon EC2.
If you have feedback about this post, submit comments in the Comments section below. If you have any questions about this post, start a thread on re:Post.
Want more AWS Security news? Follow us on Twitter.
Do you manage more than a single domain? If the answer is yes, now you can manage a single WAF configuration for all your enterprise domains.
Cloudflare has been built around the concept of a zone, which is broadly equivalent to a domain. Customers can add multiple domains to a Cloudflare account, and every domain has its own independent security configuration. If you deploy a rule to block bots on example.com, you will need to rewrite the same rule on example.org. You’ll then need to visit the dashboard of every zone when you want to update it. This applies to all WAF products including Managed, Firewall and Rate Limiting rules.
If you have just two domains that’s not a big deal. But if you manage hundreds or thousands of domains, as most large organizations do, dealing with individual domains becomes time-consuming, expensive, or outright impractical. Of course, you could build automation relying on our API or Terraform. This will work seamlessly, but not all organizations have the capabilities to manage this level of complexity. Furthermore, having a Terraform integration doesn’t fully replicate the experience or give the confidence provided by interacting with a well-designed UI.
Following Cloudflare philosophy of making it easy to deploy security products, we are launching Account WAF.
Customers can now have a single WAF deployment for all their enterprise domains.
Welcome to the simpler world of Account WAF
You might wonder why an organization might have thousands of domains, but this is actually very common.
For example, an e-commerce business can have tens of marketing domains for all its brands, localized for different countries; APIs that power its e-commerce sites and mobile applications; applications integrated with partners, logistics services, or payment systems; domains used by employees; and so on. The structure of these accounts can be very complex.
Now, let’s imagine that you need to deal with the simple use case of deploying Cloudflare Managed ruleset across all your production domains.
Without Account WAF you’d need to track down all the correct domains, visit the WAF page of each one of them, deploy the ruleset, and possibly add overrides to select only the attack vectors you are interested in. This is messy, and mistakes are easy to make.
With Account WAF, you can now deploy a managed ruleset just once while providing the list of hostnames you want it to apply to. By deploying, we mean writing a filter that defines which requests the ruleset should run (or execute) on. The filter works like a normal WAF Custom Rule, where you can take advantage of the power of the Wirefilter syntax and use any parameter of the HTTP request, metadata, and computed values, such as Bot Score or our new WAF Attack Score. For example, you can run a ruleset only on traffic with a specific User Agent, or only on your API traffic (see the example below).
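For example, a deployment filter scoping a ruleset to a group of production hostnames and to API traffic might look like the following; the hostnames and path are illustrative.
(http.host in {"www.example.com" "shop.example.com"}) or (http.host eq "api.example.com" and http.request.uri.path contains "/v1/")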
You can deploy these rulesets multiple times on your account, so you can have different settings for different groups of domains. For example, you might want to deploy OWASP with different sensitivity levels for your staging domains versus your production domains, or enforce a minimum level of security across all zones (e.g. for legal protection or compliance) before tailoring the security posture of the most sensitive domains. Furthermore, if you later add a new domain to your production environment, you can simply add it to the rule filter, and we will start protecting those requests too.
It works for all WAF features
You can follow a similar flow if you want to deploy WAF Custom or Rate Limiting rules. However, in this case, to simplify management of large numbers of rules, we introduced the concept of Custom Rulesets. As with managed rules, a ruleset is a group of rules; this time they are user defined. As in the example above, you can deploy a custom ruleset with a user-defined filter to scope which portion of your traffic you want to run these rules on.
For example, consider the situation where you want to create two rules for all your domains: one that blocks traffic from a set of countries, and one that only allows requests with a non-malicious WAF Attack Score. You would create a custom ruleset with these two rules and then deploy it across your entire account, roughly as sketched below.
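In the rule style used earlier in this post, that custom ruleset might contain rules roughly like these; the country codes and the score threshold are illustrative, with the second rule blocking requests whose WAF Attack Score indicates a likely attack.
if: (ip.geoip.country in {"XX" "YY"})
then: BLOCK
if: (cf.waf.score lt 20)
then: BLOCK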
One thing to note is that Account WAF rulesets (Managed, Custom and Rate Limiting) can be deployed on traffic to domains on Enterprise plans. You won’t be able to run rulesets on traffic of Free, Pro or Biz domains. This condition is enforced by the UI when writing a deployment filter.
Finally, you can follow the same flow to deploy custom rulesets that contain rate limiting rules. Custom rulesets are designed to contain either custom or rate limiting rules; at this stage, these rules cannot be combined in the same ruleset. Please note that the Rate Limiting section will be available in October.
Who gets it?
Account WAF is an Enterprise only feature. If you are an Enterprise customer on our new Advanced plan, you will get access to the new feature automatically this week. If you are not on our Advanced plan, please reach out to your account team to learn more.
Gartner has recognised Cloudflare as a Leader in the 2022 “Gartner® Magic Quadrant™ for Web Application and API Protection (WAAP)” report that evaluated 11 vendors for their ‘ability to execute’ and ‘completeness of vision’.
You can register for a complimentary copy of the report here.
We believe this achievement highlights our continued commitment and investment in this space as we aim to provide better and more effective security solutions to our users and customers.
Keeping up with application security
With over 36 million HTTP requests per second being processed by the Cloudflare global network we get unprecedented visibility into network patterns and attack vectors. This scale allows us to effectively differentiate clean traffic from malicious, resulting in about 1 in every 10 HTTP requests proxied by Cloudflare being mitigated at the edge by our WAAP portfolio.
Visibility is not enough, and as new use cases and patterns emerge, we invest in research and new product development. For example, API traffic is increasing (55%+ of total traffic) and we don’t expect this trend to slow down. To help customers with these new workloads, our API Gateway builds upon our WAF to provide better visibility and mitigations for well-structured API traffic for which we’ve observed different attack profiles compared to standard web based applications.
We believe our continued investment in application security has helped us gain our position in this space, and we’d like to thank Gartner for the recognition.
Cloudflare WAAP
At Cloudflare, we have built several features that fall under the Web Application and API Protection (WAAP) umbrella.
DDoS protection & mitigation
Our network, which spans more than 275 cities in over 100 countries, is the backbone of our platform and a core component that allows us to mitigate DDoS attacks of any size.
To help with this, our network is intentionally anycasted and advertises the same IP addresses from all locations, allowing us to “split” incoming traffic into manageable chunks that each location can handle with ease. This is especially important when mitigating large volumetric Distributed Denial of Service (DDoS) attacks.
The system is designed to require little to no configuration while also being “always-on” ensuring attacks are mitigated instantly. Add to that some very smart software such as our new location aware mitigation, and DDoS attacks become a solved problem.
Our WAF is a core component of our application security and ensures hackers and vulnerability scanners have a hard time trying to find potential vulnerabilities in web applications.
This is very important when zero-day vulnerabilities become publicly available as we’ve seen bad actors attempt to leverage new vectors within hours of them becoming public. Log4J, and even more recently the Confluence CVE, are just two examples where we observed this behavior. That’s why our WAF is also backed by a team of security experts who constantly monitor and develop/improve signatures to ensure we “buy” precious time for our customers to harden and patch their backend systems when necessary. Additionally, and complementary to signatures, our WAF machine learning system classifies each request providing a much wider view in traffic patterns.
Our Bot Management product works in parallel to our WAF and scores every request with the likelihood of it being generated by a bot, allowing you to easily filter unwanted traffic by deploying a WAF Custom Rule, all this backed by powerful analytics. We make this easy by also maintaining a list of verified bots that can be used to further improve a security policy.
In the event you want to block automated traffic, Cloudflare’s managed challenge ensures that only bots receive a hard time without impacting the experience of real users.
API Gateway
API traffic, by definition, is very well-structured relative to standard web pages consumed by browsers. At the same time, APIs tend to be closer abstractions over back end databases and services, which attracts increased attention from malicious actors, and they often go unnoticed even by internal security teams (shadow APIs).
API Gateway, that can be layered on top of our WAF, helps you both discover API endpoints served by your infrastructure, as well detect potential anomalies in traffic flows that may indicate compromise, both from a volumetric and sequential perspective.
The nature of APIs also allows API Gateway to much more easily provide a positive security model contrary to our WAF: only allow known good traffic and block everything else. Customers can leverage schema protection and mutual TLS authentication (mTLS) to achieve this with ease.
Page Shield
Attacks that leverage the browser environment directly can go unnoticed for some time, as they don’t necessarily require the back end application to be compromised. For example, if any third party JavaScript library used by a web application is performing malicious behavior, application administrators and users may be none the wiser while credit card details are being leaked to a third party endpoint controlled by an attacker. This is a common vector for Magecart, one of many client side security attacks.
Page Shield, just like our other WAAP products, is fully integrated on the Cloudflare platform and requires one single click to turn on.
Security Center
Cloudflare’s new Security Center is the home of the WAAP portfolio. A single place for security professionals to get a broad view across both network and infrastructure assets protected by Cloudflare.
Moving forward we plan for the Security Center to be the starting point for forensics and analysis, allowing you to also leverage Cloudflare threat intelligence when investigating incidents.
The Cloudflare advantage
Our WAAP portfolio is delivered from a single horizontal platform, allowing you to leverage all security features without additional deployments. Additionally, scaling, maintenance and updates are fully managed by Cloudflare, allowing you to focus on delivering business value with your application.
This applies even beyond WAAP, as, although we started building products and services for web applications, our position in the network allows us to protect anything connected to the Internet, including teams, offices and internal facing applications. All from the same single platform. Our Zero Trust portfolio is now an integral part of our business and WAAP customers can start leveraging our secure access service edge (SASE) with just a few clicks.
If you are looking to consolidate your security posture, both from a management and budget perspective, application services teams can use the same platform that internal IT services teams use, to protect staff and internal networks.
Continuous innovation
We did not build our WAAP portfolio overnight, and over just the past year we’ve shipped more than five major security product releases across the portfolio. To showcase our speed of innovation, here is a selection of our top picks:
API Shield Schema Protection: traditional signature based WAF approaches (negative security model) don’t always work well with well-structured data such as API traffic. Given the fast growth in API traffic across the network we built a new incremental product that allows you to enforce API schemas directly at the edge using a positive security model: only let well-formed data through to your origin web servers;
API Abuse Detection: complementary to API Schema Protection, API Abuse Detection warns you whenever anomalies are detected on your API endpoints. These can be triggered by unusual traffic flows or patterns that don’t follow normal traffic activity;
Our new Web Application Firewall: built on top of our new Edge Rules Engine, the core Web Application Firewall received a complete overhaul, all the way from engine internals to the UI. It delivers better performance, both in terms of latency and efficacy at blocking malicious payloads, along with brand-new capabilities including but not limited to Exposed Credential Checks, account-wide configurations and payload logging;
DDoS customizable Managed Rules: to provide additional configuration flexibility, we started exposing some of our internal DDoS mitigation managed rules for custom configurations to further reduce false positives and allow customers to increase thresholds / detections as required;
Security Center: Cloudflare’s view of infrastructure and network assets, along with alerts and notifications for misconfigurations and potential security issues;
Page Shield: based on growing customer demand and the rise of attack vectors focusing on the end user browser environment, Page Shield helps you detect whenever malicious JavaScript may have made its way into your application’s code;
API Gateway: full API management, including routing directly from the Cloudflare edge, with API Security baked in, including encryption and mutual TLS authentication (mTLS);
Machine Learning WAF: complementary to our WAF Managed Rulesets, our new ML WAF engine scores every single request from 1 (likely malicious) to 99 (likely clean), giving you additional visibility into both valid and non-valid malicious payloads and increasing our ability to detect targeted attacks and scans towards your application;
Looking forward
Our roadmap is packed with both new application security features and improvements to existing systems. As we learn more about the Internet we find ourselves better equipped to keep your applications safe. Stay tuned for more.
Gartner, “Magic Quadrant for Web Application and API Protection”, Analyst(s): Jeremy D’Hoinne, Rajpreet Kaur, John Watts, Adam Hils, August 30, 2022.
Gartner and Magic Quadrant are registered trademarks of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation.
Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
At Cloudflare, we are always looking for ways to make our customers’ applications faster and more secure. A key part of that commitment is our ongoing investment in research and development of new technologies, such as the work on our machine learning-based Web Application Firewall (WAF) solution we announced during Security Week.
In this blog, we’ll be discussing some of the data challenges we encountered during the machine learning development process, and how we addressed them with a combination of data augmentation and generation techniques.
Let’s jump right in!
Introduction
The purpose of a WAF is to analyze the characteristics of an HTTP request and determine whether the request contains any data which may cause damage to destination server systems, or was generated by an entity with malicious intent. A WAF typically protects applications from common attack vectors such as cross-site scripting (XSS), file inclusion and SQL injection, to name a few. These attacks can result in the loss of sensitive user data and damage to critical software infrastructure, leading to monetary loss and reputation risk, along with direct harm to customers.
How do we use machine learning for the WAF?
The Cloudflare ML solution, at a high level, trains a classifier to distinguish between various traffic types and attack vectors, such as SQLi, XSS, Command Injection, etc. based on structural or statistical properties of the content. This is achieved by performing the following operations:
We inspect the raw HTTP input and perform some number of transformations on it such as normalization, content substitutions, or de-duplication.
Decompose or partition it via some process of tokenization, generate statistical information about the content, or extract structural data.
Compute optimal internal numerical representations of the inputs via the process of training the model. The nature of these internal representations depends on the class of model and architecture.
Learn to map internal content representations against classes (XSS, SQLi or others), scores or some other target of interest.
At run-time, use previously learned representations and mappings to analyze a new input and provide the most likely label or score for it. The score ranges from 1 to 99, with 1 indicating that the request is almost certainly malicious and 99 indicating that the request is probably clean.
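As a purely illustrative sketch (not Cloudflare’s implementation), the inference flow above could be expressed as follows; the normalization, tokenizer, and learnedWeights values here are invented placeholders:

// Illustrative only: normalize, tokenize, extract simple token features,
// then map a learned "risk" value into the 1 (malicious) to 99 (clean) range.
function normalize(raw: string): string {
  return decodeURIComponent(raw).toLowerCase();
}

function tokenize(text: string): string[] {
  return text.split(/[^a-z0-9_]+/).filter(Boolean);
}

// Hypothetical learned token weights; a real model learns its own internal representations.
const learnedWeights: Record<string, number> = { select: 0.9, union: 0.8, script: 0.7 };

function scoreRequest(raw: string): number {
  const risk = tokenize(normalize(raw)).reduce((acc, t) => acc + (learnedWeights[t] ?? 0), 0);
  const pMalicious = 1 - Math.exp(-risk); // squash risk into (0, 1)
  return Math.round(1 + (1 - pMalicious) * 98); // 1 = almost certainly malicious, 99 = probably clean
}

console.log(scoreRequest("/search?q=flowers"));             // close to 99
console.log(scoreRequest("/search?q=1' UNION SELECT 1,2")); // much lower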
This reasonable starting point stumbles upon a critical challenge right from the start: we need high-quality labeled data, and lots of it, as data quality and quantity have the biggest impact on model performance. Unlike well-researched fields like image recognition, text sentiment analysis, or classification, large datasets of HTTP requests with malicious payloads embedded are difficult to get.
To make matters even harder, strict implementation requirements for a production-quality WAF restrict the complexity of our potential ML models or architectures to ones that are relatively simple and lightweight, which means we cannot simply pave over shortcomings of the data.
Data and challenges
The selection of a dataset is likely the most difficult of all the aspects that contribute to the final set of attributes of a machine learning model. In most cases, the model is tasked with learning the distribution of the data in some statistical sense, so choosing and curating the dataset to ensure that the desired properties of the final solution are even possible to learn is crucial. ML models are only as reliable as the data used to train them. If we train an ML model on an incomplete dataset, or on data that doesn’t accurately represent the population, predictions might be inaccurate as they will be a direct reflection of the data.
To build a strong ML WAF, a good dataset must have large volumes of heterogeneous data covering malicious samples for all attack categories, a diverse set of negative/benign samples, and samples representing a broad spectrum of obfuscation techniques.
Due to those constraints, creating a solid dataset has a number of challenges:
Privacy
Privacy requirements limit data availability and how it can be used. Cloudflare has strict privacy guidelines and does not keep all request data – it simply isn’t available, and what is available must be carefully selected, anonymised, and stripped of sensitive information.
Heterogeneity of samples
Due to the wide assortment of potential request content types and forms, finding enough benign samples is difficult. Furthermore, it is challenging to collect data that represents requests with various charsets and content-encodings. Covering all attack configurations is also important because some attacks can be inserted into essentially any kind of request (e.g. five bytes in a huge “regular” request).
Sample difficulty
We want a dataset that has a good mix of attack techniques and isn’t dominated by ones that are easily generated by fuzzing tools, which simply swap out constants, transform expressions through invariants, and so on (e.g. sqli-fuzzer). Additionally, the vast majority of freely available samples in the wild are fairly trivial auto-generated payloads produced by indiscriminate scanning and discovery tools. They have very similar structural and statistical characteristics. Some of them are fairly old as well and do not reflect the current software landscape. How to “grade” sample difficulty is not immediately obvious! What’s easy for a human may not be easy for a particular preprocessor/model, and vice-versa.
Noisy labels
Label noise affects results a lot, especially when it comes to esoteric, specific, or unusual attacks which are likely to be classified as benign by a rules-based WAF.
What’s the strategy to overcome this?
Data augmentation
In simple terms, data augmentation is the process of generating artificial (but realistic) data to increase the diversity of our dataset by studying the statistical distribution of existing real-world data.
This is crucial for us because one of the biggest concerns with rules-based WAFs is false positives. False positives are a serious challenge for WAFs because the risk of accidentally filtering legitimate traffic deters users from employing very strict rulesets. Data augmentation is used to build a solution that does not rely on observing specific high-risk keywords or character sequences, but instead uses a more holistic analysis of content and context, making it considerably less likely to block legitimate requests.
There are many sequences of characters which appear almost exclusively in payloads, but are themselves not dangerous. In order to reduce false positives and improve overall performance, we focused on generating a lot of heterogeneous negative samples to force the model to consider the structural, semantic, and statistical properties of the content when making a classification decision.
In the context of our data and use cases, data augmentation means that we mutate benign content in a variety of ways, since the content will remain benign (with probability essentially 1, these mutations won’t accidentally turn it into a valid payload). For instance, we can add random character noise, permute keywords, merge benign content together from multiple sources, and so on. Alternatively, we can seed benign content with ‘dangerous’ keywords or ngrams frequently occurring in payloads – this results in a benign sample, but ideally will teach the model not to be too sensitive to the presence of malicious tokens lacking the proper semantics and structure. A minimal sketch of these mutations follows.
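The helper names below are invented for illustration; this is only a sketch of the kinds of benign-preserving mutations described above, not the production pipeline:

// Sketch: mutate benign samples in ways that keep them benign.
const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

function addCharacterNoise(sample: string, rate = 0.02): string {
  return [...sample]
    .map((c) => (Math.random() < rate ? ALPHABET[Math.floor(Math.random() * ALPHABET.length)] : c))
    .join("");
}

function mergeBenign(a: string, b: string): string {
  return `${a}&${b}`; // concatenating benign content yields more benign content
}

function seedWithKeywords(sample: string, keywords: string[]): string {
  // Inserting "dangerous" tokens without payload structure keeps the label benign,
  // teaching the model not to over-react to isolated keywords.
  return `${sample} ${keywords[Math.floor(Math.random() * keywords.length)]}`;
}

const benign = "name=Alice&city=Lisbon";
console.log(addCharacterNoise(benign));
console.log(mergeBenign(benign, "q=coffee+shops"));
console.log(seedWithKeywords(benign, ["select", "union", "onerror"]));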
Benign content
First and foremost, generating benign content is far easier. Mutating a malicious block of content into different malicious blocks is difficult because malicious payloads have a stricter grammar and syntax than general HTTP content (they contain code), and therefore must be manipulated in a specific manner.
However, there are a few options if we want to do this in the future. Tools like sqli-fuzzer automate the process of fuzzing a given payload by applying transformations which preserve the underlying semantics while changing the representation or adding obfuscation. Outside existing third-party tools, it’s possible to generate our own malicious payloads using various “append malicious content to non-malicious content” techniques, with the trade-off that this doesn’t actually generate *new* malicious content, it just puts existing payloads into a different context.
Pseudo-random noise samples
A useful approach we identified for bolstering the number of negative training samples was to generate large quantities of pseudo-random strings of increasing complexity.
The probability of any pseudo-random string (drawn from essentially any token distribution) being a valid payload or malicious attack is essentially zero, but we can build a series of token sampling distributions that make it increasingly difficult for the model to distinguish them from a real payload, and we discovered that this resulted in dramatically better performance in terms of false positive rate, robustness, and overall model properties.
This approach works by taking a collection of tokens and a probability distribution over these tokens, and independently sampling a stream of tokens from it to create our ‘sample’. Each sample length is selected from a separate discrete sample length distribution.
For an extremely simple example, we could take a token collection consisting of ASCII characters and a uniform sampling distribution; a minimal sketch of such a generator is shown below.
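The token collections and lengths here are arbitrary examples, intended only to illustrate the sampling scheme described above:

// Sketch: sample independent tokens from a collection, with the sample length
// drawn from its own distribution (uniform here for simplicity).
const printableAscii: string[] = [...Array(94)].map((_, i) => String.fromCharCode(33 + i));
const sqlFlavouredTokens: string[] = [...printableAscii, "SELECT", "UNION", "FROM", "--", "' OR "];

function sampleNoise(tokens: string[], maxLength: number): string {
  const length = 1 + Math.floor(Math.random() * maxLength); // sample length distribution
  let out = "";
  for (let i = 0; i < length; i++) {
    out += tokens[Math.floor(Math.random() * tokens.length)]; // uniform token distribution
  }
  return out;
}

console.log(sampleNoise(printableAscii, 64));     // easy negative sample
console.log(sampleNoise(sqlFlavouredTokens, 64)); // harder, SQL-flavoured negative sample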
We wouldn’t expect even a very simple model to struggle to learn that these samples are benign, but as we increase the complexity of the token collections, we can move towards much more ‘difficult’ noise examples, including elements such as: fragments of valid URIs, user agents, XML/XSLT content or even restricted language identifiers, or keywords.
Here are some examples of more complex token collections and the kinds of random strings they produce as our negative samples:
alphanumerics, plus special characters, plus a variant of full javascript or sql keywords and (multi-character) sub-token fragments
It’s fairly straightforward to construct a suite of these noise generators of varying complexity, and targeting different types of content: JSON, XML, URIs with SQL-esque ‘noise’, and so on. As the strings get sufficiently long, the probability that they will contain at least some dangerous looking subsequences grows, so it’s also an excellent test of model robustness.
We make extensive use of noise strings to enhance the core dataset used for training and testing the model: we directly train the model on increasingly difficult noise before fine-tuning on exclusively real data, append noise of varying complexity to malicious (real) or benign samples to both induce and test robustness against padding attacks, and estimate the false positive rate for certain classes of benign content.
Beyond independent sampling of random strings?
A natural extension to the above method for generating pseudo-random strings is to drop the ‘independence’ assumption for sampling tokens. This means that we’re starting to emulate the process by which real data is generated, to some extent, yielding samples with increasingly realistic local (and eventually global) structure. Some approaches for this might include a simple Markov chain, and extend all the way to state-of-the-art Large Language Models.
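As a small illustration of dropping the independence assumption, a first-order Markov chain over tokens (learned here from a toy, made-up corpus) already produces noise with more realistic local structure:

// Sketch: learn next-token candidates from a corpus, then walk the chain.
function buildTransitions(corpus: string[][]): Map<string, string[]> {
  const table = new Map<string, string[]>();
  for (const seq of corpus) {
    for (let i = 0; i + 1 < seq.length; i++) {
      const next = table.get(seq[i]) ?? [];
      next.push(seq[i + 1]);
      table.set(seq[i], next);
    }
  }
  return table;
}

function sampleChain(table: Map<string, string[]>, start: string, steps: number): string {
  const out = [start];
  for (let i = 0; i < steps; i++) {
    const candidates = table.get(out[out.length - 1]);
    if (!candidates || candidates.length === 0) break;
    out.push(candidates[Math.floor(Math.random() * candidates.length)]);
  }
  return out.join("");
}

const toyCorpus = [["id", "=", "1", "&", "page", "=", "2"], ["q", "=", "shoes", "&", "page", "=", "1"]];
console.log(sampleChain(buildTransitions(toyCorpus), "id", 10));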
We experimented with using contemporary autoregressive language models trained on our corpus of real malicious payloads and found them extremely effective at generating novel payloads, as well as transforming payloads into sophisticated obfuscated representations. As the language models approached convergence on the data, the likelihood of each sample being a valid payload approached 100%, allowing us to use early samples as ‘extremely strong negatives’ and later samples as positive samples. The success of this work has suggested that deeper investigation into the use of language models for security analysis may be fruitful, not only for training classifiers, but also for creating powerful adversarial pen-testing agents.
Results summary
Let’s see a comparative summary of results and improvements, before and after the augmentation:
Model performance on evaluation metrics
The effectiveness of machine learning models for classification problems can be evaluated using a wide range of metrics, including accuracy, precision, recall, F1 score, and others. It is important to note that in addition to using quantitative metrics, we also consider the model’s general properties and behavioral constraints. This criteria- and metrics-based approach is especially important in our domain, where data is inherently noisy, labels are not trustworthy, and the domain of the inputs is extremely large and hard to cover with samples.
For this post, we will concentrate on key quantitative metrics like the F1 score, even though we examine a variety of metrics to assess model performance. The F1 score is the harmonic mean of precision and recall. We can represent the F1 score with the formula:
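F1 = 2 × (Precision × Recall) / (Precision + Recall) = 2·TP / (2·TP + FP + FN)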
Where,
True Positives (TP): malicious content classified correctly by the model
False Positives (FP): benign content that the model classified as malicious
True Negatives (TN): benign content classified correctly by the model
False Negatives (FN): malicious content that the model classified as benign
Since this formula takes false positives and false negatives into consideration, this score is more reliable than accuracy alone. There are a few methods to calculate this for multi-class problems, like the Macro F1 Score, Micro F1 Score and Weighted F1 Score. Although each method has advantages and disadvantages, we obtained nearly identical results with all three methods. Below are the numbers:
Class                    | Without Augmentation          | With Augmentation
                         | Precision  Recall  F1 Score   | Precision  Recall  F1 Score
Benign                   | 0.69       0.17    0.27       | 0.98       1.00    0.99
SQLi                     | 0.77       0.96    0.85       | 1.00       1.00    1.00
XSS                      | 0.56       0.94    0.70       | 1.00       0.98    0.99
Total (Micro Average)    | –          –       0.67       | –          –       0.99
Total (Macro Average)    | 0.67       0.69    0.61       | 0.99       0.99    0.99
Total (Weighted Average) | 0.68       0.67    0.60       | 0.99       0.99    0.99
The important takeaway is that F1 scores range from 0 (worst) to 1 (best).
After augmentation, the model achieves balanced precision and recall and strong overall performance, with a macro F1 score of 0.99 compared to 0.61 before augmentation.
So far in the results summary we have only discussed the F1 score; however, there are other improvements we have observed in the model’s characteristics, listed below:
False positive characteristics
Estimated false positive rate reduced by approximately 80% on test data sets. There are significantly fewer false positives involving PromQL and other SQL-structured analogues; PromQL examples now receive high scores and are correctly classified as benign.
Today, the only major category of false positives are literal SQL or JavaScript files.
General false positive rate on noise from JSON-esque, XML/SOAP-esque, and SQL-esque content generators reduced to roughly 1 in 100,000, down from somewhere between 1 in 50 and 1 in 1.
True positive characteristics
True positive rate for highly fuzzed content is vastly improved. Models trained solely on real data were easily bypassed by advanced fuzzing tools, whereas models trained on real plus augmented data are extremely resistant, with many payloads receiving higher risk scores as fuzzing increases. Fuzzed variants of a payload that differ by only a few byte alterations yield approximately the same scores.
On client-provided test sets consisting primarily of XSS/SQLi payloads not blocked by the rules-based WAF, the proportion successfully classified is about 97.5% (with the remaining 2.5% being arguable), up from about 91%.
Padding a payload with almost any amount of ASCII, JSON-esque, special-character, or other content will not reduce the risk score substantially. Thanks to the addition of hard-noise, long-length augmented training samples, even a six-byte payload in a 100 kilobyte string will be caught, and padded and unpadded variants of a payload generate similar scores.
Execution performance
Runtime characteristics are unchanged for inference.
On top of that, we validated the model against Cloudflare’s highly mature signature-based WAF and confirmed that the machine learning WAF performs comparably to the signature-based WAF, with the ML WAF demonstrating its strength particularly in correctly handling highly obfuscated or irregularly fuzzed content (as well as avoiding some rules-based engine false positives). Finally, we were able to conclude that augmentation helps improve model performance and induce the right set of properties.
Conclusion
We built a machine learning-powered WAF, and the substantial challenge was gathering a diversified training set under constraints that rule out using sensitive real customer data, for privacy and regulatory reasons. To create a broader and more diversified dataset without requiring vast amounts of sensitive data, we used techniques such as fuzzing, data augmentation, and synthetic data generation. This allowed us to improve the solution’s false positive robustness and overall model performance.
Furthermore, these techniques reduced the time and effort required to retrieve and clean real data, and helped induce the correct model behavior. In the future, we intend to investigate autoregressive language models to generate synthetic pseudo-valid payloads.
Today we are excited to announce thresholds for our Security Event Alerts: a new and improved way of detecting anomalous spikes of security events on your Internet properties. Previously, our calculations were based on z-score methodology alone, which was able to determine most of the significant spikes. By introducing a threshold, we are able to make alerts more accurate and only notify you when it truly matters. One can think of it as a romance between the two strategies. This is the story of how they met.
Author’s note: as an intern at Cloudflare I got to work on this project from start to finish from investigation all the way to the final product.
Once upon a time
In the beginning, there were Security Event Alerts. Security Event Alerts are notifications that are sent whenever we detect a threat to your Internet property. As the name suggests, they track the number of security events, which are requests to your application that match security rules. For example, you can configure a security rule that blocks access from certain countries. Every time a user from that country tries to access your Internet property, it will log as a security event. While a security event may be harmless and fired as a result of the natural flow of traffic, it is important to alert on instances when a rule is fired more times than usual. Anomalous spikes of too many security events in a short period of time can indicate an attack. To find these anomalies and distinguish between the natural number of security events and that which poses a threat, we need a good strategy.
The lonely life of a z-score
Before a threshold entered the picture, our strategy worked only on the basis of a z-score. A z-score measures the number of standard deviations a certain data point is from the mean. In our current configuration, if a spike crosses the z-score value of 3.5, we send you an alert. This value was decided on after careful analysis of our customers’ data, finding it the most effective in determining a legitimate alert. Any lower and notifications would get noisy for smaller spikes. Any higher and we might miss significant events. You can read more about our z-score methodology in this blog post.
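In other words, for a new count of security events x, with mean μ and standard deviation σ computed over the preceding data points:

z = (x − μ) / σ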
The following graphs are an example of how the z-score method works. The first graph shows the number of security events over time, with a recent spike.
To determine whether this spike is significant, we calculate the z-score and check if the value is above 3.5:
As the graph shows, the deviation is above 3.5 and so an alert is triggered.
However, relying on z-score becomes tricky for domains that experience no security events for a long period of time. With many security events at zero, the mean and standard deviation depress to zero as well. When a non-zero value finally appears, it will always be infinite standard deviations away from the mean. As a result, it will always trigger an alert even on spikes that do not pose any threat to your domain, such as the below:
With five security events, you are likely going to ignore this spike, as it is too low to indicate a meaningful threat. However, the z-score in this instance will be infinite:
Since a z-score of infinity is greater than 3.5, an alert will be triggered. This means that customers with few security events would often be overwhelmed by event alerts that are not worth worrying about.
Letting go of zeros
To avoid the mean and standard deviation becoming zero and thus alerting on every non-zero spike, zero values can be ignored in the calculation. In other words, to calculate the mean and standard deviation, only data points that are higher than zero will be considered.
With those conditions, the same spike to five security events will now generate a different z-score:
Great! With the z-score at zero, it will no longer trigger an alert on the harmless spike!
But what about spikes that could be harmful? When calculations ignore zeros, we need enough non-zero data points to accurately determine the mean and standard deviation. If only one non-zero value is present, that data point determines the mean and standard deviation. As such, the mean will always be equal to the spike, z-score will always be zero and an alert will never be triggered:
For a spike of 1000 events, we can tell that there is something wrong and we should trigger an alert. However, because there is only one non-zero data point, the z-score will remain zero:
The z-score does not cross the value 3.5 and an alert will not be triggered.
So what’s better? Including zeros in our calculations can skew the results for domains with too many zero events and alert them every time a spike appears. Not including zeros is mathematically wrong and will never alert on these spikes.
Threshold, the prince charming
Clearly, a z-score is not enough on its own.
Instead, we paired up the z-score with a threshold. The threshold represents the raw number of security events an Internet property can have, below which an alert will not be sent. While z-score checks whether the spike is at least 3.5 standard deviations above the mean, the threshold makes sure it is above a certain static value. If both of these conditions are met, we will send you an alert:
The above spike crosses the threshold of 200 security events. We now have to check that the z-score is above 3.5:
The z-score value crosses 3.5 and an alert will be sent.
A threshold for the number of security events comes as the perfect complement. By itself, the threshold cannot determine whether something is a spike, and would simply alert on any value crossing it. This blog post describes in more detail why thresholds alone do not work. However, when paired with z-score, they are able to share their strengths and cover for each other’s weaknesses. If the z-score falsely detects an insignificant spike, the threshold will stop the alert from triggering. Conversely, if a value does cross the security events threshold, the z-score ensures there is a reasonable variance from the data average before allowing an alert to be sent.
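As a minimal sketch of the combined check (the exact windowing and guard conditions here are assumptions, not the production implementation):

// Sketch: alert only when the new count crosses the static threshold AND its
// z-score over recent non-zero counts exceeds 3.5.
function shouldAlert(history: number[], current: number, threshold = 200, zCutoff = 3.5): boolean {
  const nonZero = history.filter((v) => v > 0); // zeros are ignored, as described above
  if (current <= threshold || nonZero.length < 2) return false;
  const mean = nonZero.reduce((a, b) => a + b, 0) / nonZero.length;
  const std = Math.sqrt(nonZero.reduce((a, b) => a + (b - mean) ** 2, 0) / nonZero.length);
  const zScore = std === 0 ? 0 : (current - mean) / std;
  return zScore > zCutoff;
}

console.log(shouldAlert([0, 0, 3, 0, 5], 5));       // false: below the 200-event threshold
console.log(shouldAlert([120, 90, 110, 100], 900)); // true: crosses both threshold and z-score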
The invaluable value
To foster a successful relationship between the z-score and security events threshold, we needed to determine the most effective threshold value. After careful analysis of our previous attacks on customers, we set the value to 200. This number is high enough to filter out the smaller, noisier spikes, but low enough to expose any threats.
Am I invited to the wedding?
Yes, you are! The z-score and threshold relationship is already enabled for all WAF customers, so all you need to do is sit back and relax. For enterprise customers, the threshold will be applied to each type of alert enabled on your domain.
Happily ever after
The story certainly does not end here. We are constantly iterating on our alerts, so keep an eye out for future updates on the road to make our algorithms even more personalized for your Internet properties!
Security is a shared responsibility between AWS and the customer. When we use infrastructure as code (IaC), we want to describe workloads holistically, and that includes the configuration of firewalls alongside the entrypoints to web applications. As we evolve the infrastructure that our application is built upon, we can adjust firewall rules in the same place.
To accomplish this, we’ll use AWS WAFv2. Although it’s usually complex to write your own firewall rules, we can simply use AWS Managed Rules. No tedious setup required!
What is AWS WAFv2?
AWS WAFv2 is a managed web application firewall. It can be natively enabled on CloudFront, API Gateway, Application Load Balancer, or AWS AppSync and is deployed alongside these services. AWS services terminate the TCP/TLS connection, process incoming HTTP requests, and then pass the request to AWS WAF for inspection and filtering.
For example, you can use AWS WAFv2 to protect against attacks, such as cross-site request forgery (CSRF), cross-site scripting (XSS), and SQL injection (SQLi) among other threats in the OWASP Top 10.
AWS Managed Rules for AWS WAF is a set of AWS WAF rules curated and maintained by the AWS Threat Research Team that provides protection against common application vulnerabilities or other unwanted traffic, without having to write your own rules.
Prerequisites
For this walkthrough, you should have the following prerequisites:
An application fronted by one or more of the following services: Amazon CloudFront, Amazon API Gateway, Application Load Balancer or AWS AppSync. From here on, these are called ‘entrypoints’.
At least one of the above-mentioned ‘entrypoints’ defined in AWS CDK.
Solution overview
Figure 1. AWS WAFv2 can protect endpoints built by Amazon CloudFront, Amazon API Gateway, Application Load Balancer and AWS AppSync
Given that you have an existing web application defined in AWS CDK, we want to add a WAFv2 web ACL to its entrypoint. Instead of writing our own firewall rules to inspect and filter requests, we want to leverage an AWS Managed Rules rule group. Simultaneously, we must be able to disable or reconfigure some of the rules in the case that they cause undesirable behavior in the application.
A good first rule group to use is the core rule set (CRS) managed rule group, also named AWSManagedRulesCommonRuleSet. It contains rules that are generally applicable to web applications and provides protection against exploitation of various vulnerabilities, such as the ones described in the OWASP Top 10. You can later add more managed rule groups or write your own rules, which are specific to your application (e.g., for Windows, Linux, or WordPress).
Define the AWS WAFv2 web ACL
First, let’s give the AWS WAF module a nicely readable name:
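A sketch of what this definition might look like follows; the construct ID, web ACL name, and metric names are placeholders, and the import alias gives the aws_wafv2 module a readable name:

import { aws_wafv2 as wafv2 } from 'aws-cdk-lib';

// Sketch: a web ACL whose only rule references the CRS managed rule group.
const cfnWebACL = new wafv2.CfnWebACL(this, 'MyCDKWebAcl', {
  defaultAction: { allow: {} },
  scope: 'REGIONAL', // use 'CLOUDFRONT' (defined in us-east-1) for CloudFront distributions
  visibilityConfig: {
    cloudWatchMetricsEnabled: true,
    metricName: 'MetricForWebACLCDK',
    sampledRequestsEnabled: true,
  },
  name: 'MyCDKWebAcl',
  rules: [
    {
      name: 'CRSRule',
      priority: 0,
      statement: {
        managedRuleGroupStatement: {
          name: 'AWSManagedRulesCommonRuleSet', // the core rule set (CRS)
          vendorName: 'AWS',
        },
      },
      visibilityConfig: {
        cloudWatchMetricsEnabled: true,
        metricName: 'MetricForWebACLCDK-CRS',
        sampledRequestsEnabled: true,
      },
      overrideAction: { none: {} },
    },
  ],
});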
The managedRuleGroupStatement above references the CRS managed rule group as one Rule in the list. You could add more Rule elements, either referencing other managed rule groups or defining custom rules.
Note the scope attribute. If you want to attach this web ACL to an API Gateway, AWS AppSync API, or Application Load Balancer, then it will be REGIONAL. If you want to attach it to a CloudFront distribution, then make sure that your AWS WAFv2 web ACL is defined in the US East (N. Virginia) Region and the scope is CLOUDFRONT.
Attach the AWS WAFv2 web ACL to an Application Load Balancer, AWS AppSync API, or API Gateway
Now that we have a web ACL defined, we must attach it to a resource. This works exactly the same across API Gateway API’s, an AWS AppSync API, or an Application Load Balancer. We must create a CfnWebACLAssociation and point it to the previously created web ACL and the resource to protect:
const cfnWebACLAssociation = new wafv2.CfnWebACLAssociation(this, 'MyCDKWebACLAssociation', {
  resourceArn: <ARN of resource to protect>,
  webAclArn: cfnWebACL.attrArn,
});
Amazon Resource Names (ARNs) uniquely identify AWS resources. The webAclArn property shows how AWS CDK lets you get the ARN of the previously defined CfnWebACL through its attrArn attribute.
Depending on what type of service you’re using, jump to one of the three following sections to learn how to retrieve the resourceArn of API Gateway, AWS AppSync, or Application Load Balancers.
Retrieving the ARN for AWS AppSync APIs
To retrieve the ARN of an AWS AppSync API, call the .arn property:
const api = new appsync.GraphqlApi(…)
const cfnWebACLAssociation = new wafv2.CfnWebACLAssociation(this, 'MyCDKWebACLAssociation', {
  resourceArn: api.arn,
  webAclArn: cfnWebACL.attrArn,
});
Retrieving the ARN for Amazon API Gateway REST APIs
In this case, we must specify which stage of the REST API we want to protect with the web ACL. Then, we reference the ARN of the stage:
const api = new apigateway.RestApi(…)
const deployment = new apigateway.Deployment(…)
const stage = new apigateway.Stage(…)
const cfnWebACLAssociation = new wafv2.CfnWebACLAssociation(this, 'MyCDKWebACLAssociation', {
  resourceArn: stage.stageArn,
  webAclArn: cfnWebACL.attrArn,
});
Retrieving the ARN for Application Load Balancers
If you’re dealing with an Application Load Balancer, then this is how you can retrieve its ARN:
const lb = new elbv2.ApplicationLoadBalancer(…)
const cfnWebACLAssociation = new wafv2.CfnWebACLAssociation(this, 'MyCDKWebACLAssociation', {
  resourceArn: lb.loadBalancerArn,
  webAclArn: cfnWebACL.attrArn,
});
Attach the AWS WAFv2 web ACL to a CloudFront distribution
Attaching a web ACL to CloudFront follows a different approach. Instead of defining a cfnWebACLAssociation, we reference the web ACL inside of the Distribution definition:
const distribution = new cloudfront.Distribution(this, 'distro', {
  defaultBehavior: {
    origin: new origins.S3Origin(s3Bucket),
  },
  webAclId: cfnWebACL.attrArn,
});
Note that even though the property is called webAclId, because we’re using AWS WAFv2, we must supply the ARN of the web ACL.
Exclude rules from the web ACL
Lastly, let’s understand how we can customize the web ACL further. If a rule of the managed rule group causes undesired behavior in the application, then we can exclude it from the webACL. Assume that we want to exclude the SizeRestrictions_BODY rule, which limits the request body size to 8 KB.
Go back to the definition of the web ACL, and adjust the managed rule group statement as shown below:
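A sketch of the change, assuming the web ACL definition from earlier, is to add an excludedRules entry to the managed rule group statement (an excluded rule is evaluated in count mode instead of its default blocking action):

statement: {
  managedRuleGroupStatement: {
    name: 'AWSManagedRulesCommonRuleSet',
    vendorName: 'AWS',
    // The excluded rule is counted rather than enforced
    excludedRules: [
      { name: 'SizeRestrictions_BODY' },
    ],
  },
},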
Other customizations you can do include pinning the version of the rule group and narrowing the scope of the request that the rule evaluates, using Scope-down statements.
Conclusion
In this post, you’ve seen how an AWS WAFv2 web ACL can be added to your existing infrastructure defined in AWS CDK. By using Managed Rules, your application benefits from a layer of protection that is curated and maintained by AWS security experts.
As a next step, you can learn how to include AWS WAFv2 metrics from Amazon CloudWatch into your application dashboards. This will give you perspective on how your web application is performing in conjunction with the AWS WAFv2 web ACL.
To learn more about AWS WAFv2 and how to manage web ACLs, check out the official developer guide.