All posts by Dmitriy Novikov

How HashiCorp made cross-Region switchover seamless with Amazon Application Recovery Controller

2025-07-25 Dmitriy Novikov

Post Syndicated from Dmitriy Novikov original https://aws.amazon.com/blogs/architecture/how-hashicorp-made-cross-region-switchover-seamless-with-amazon-application-recovery-controller/

This blog was co-authored by Brandon Raabe, Sr. Site Reliability Engineer at HashiCorp.

In cloud-based systems, minutes of downtime can translate to significant business impact and eroded customer trust. HashiCorp, a leader in multicloud infrastructure automation software, faced this critical challenge as their HashiCorp Cloud Platform (HCP) scaled to serve enterprise customers with stringent availability requirements. When Regional outages threatened service continuity, the complex dance of failing over DNS entries, workloads, and databases across AWS Regions had become an error-prone process requiring intense coordination. This post chronicles how HashiCorp’s Site Reliability Engineering (SRE) team transformed their disaster recovery capabilities by implementing Amazon Application Recovery Controller (ARC), creating a solution that not only dramatically simplified cross-Region failovers but also provided a standardized way to signal Regional context to their distributed services.

In this post, we discuss HashiCorp’s journey from manual, stress-inducing failover procedures to a streamlined, confident approach that fundamentally changed how they deliver on their enterprise-grade resilience promises.

Challenges with disaster recovery in a multicloud infrastructure

HashiCorp’s SRE team recognized that as their cloud platform scaled to serve mission-critical enterprise workloads, their disaster recovery approach needed an upgrade. The existing manual processes required precise coordination across multiple systems during already stressful outage scenarios, which could lead to potential complications when speed and accuracy matter most. Regional outages posed particular challenges: if the control planes for critical services became unavailable, the very tools needed to execute recovery might be inaccessible.

ARC emerged as the ideal solution with its unique architecture: a highly available data plane accessible through endpoints in five distinct Regions, so the recovery mechanism remains operational even during significant Regional disruptions. By using the AWS SDK to interface with ARC, HashiCorp gained several critical advantages. They could apply infrastructure as code (IaC) practices to disaster recovery workflows, automate testing of failover procedures, and integrate resilience seamlessly with their existing operational tooling. This solution transformed their disaster recovery from a specialized manual procedure into a codified, repeatable process embedded within their platform operations.

Requirements and architectural considerations

After evaluating multiple disaster recovery approaches, HashiCorp established three core requirements for their solution. First, while maintaining human judgment for initiating failovers, the execution needed to proceed without additional operator interventions after it was triggered. This human-in-the-loop design preserved deliberate decision-making while reducing error-prone manual steps during implementation.

Second, the architecture needed exceptional resilience against the very failures it was designed to mitigate. Traditional DNS failover solutions presented a critical vulnerability: dependency on single-Region control planes that might be unavailable during an outage. ARC solved this problem through its distributed architecture, connecting Amazon Route 53 to a resilient control mechanism, enabled by Route 53 health checks, accessible through multiple Regional endpoints. This means the failover system itself remained available even if the primary Region went offline.

Third, the solution needed to meet or exceed HashiCorp’s existing Recovery Point Objective (RPO) and Recovery Time Objective (RTO) metrics—the maximum acceptable data loss and downtime thresholds. Using ARC, the SRE team planned to not just reach these targets but make substantial improvements, reducing potential customer impact during Regional events and strengthening HashiCorp’s enterprise-grade resilience.

Solution overview

To transform their disaster recovery posture, HashiCorp’s SRE team designed an architecture centered around ARC and complemented by a purpose-built orchestration service. This architecture seamlessly bridges the human decision to initiate failover with the complex technical operations required to shift traffic between Regions with minimal disruption.

At the heart of the solution is a custom failover service that serves as the orchestration layer for Regional transitions. This service maintains configuration details for the ARC cluster and provides a single, controlled interface for initiating Regional switchovers. When activated, the service establishes a secure connection to the ARC API endpoints and executes a two-step workflow: first disabling routing controls for the primary Region, then enabling those for the secondary Region. This sequential approach provides a clean traffic transition without split-brain scenarios or dropped connections.
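The two-step workflow described above can be sketched as follows. This is a minimal illustration, not HashiCorp's actual service: the function names and the retry-across-endpoints loop are assumptions. The only real API it relies on is the ARC cluster operation `update_routing_control_state`, which takes `RoutingControlArn` and `RoutingControlState` ("On" or "Off"). In practice each client would be created with something like `boto3.client("route53-recovery-cluster", region_name=..., endpoint_url=...)`, one per cluster endpoint.

```python
from typing import Protocol, Sequence


class ClusterClient(Protocol):
    """Subset of the ARC cluster data-plane client used in this sketch."""

    def update_routing_control_state(
        self, *, RoutingControlArn: str, RoutingControlState: str
    ) -> dict: ...


def set_state(clients: Sequence[ClusterClient], arn: str, state: str) -> None:
    """Try each Regional cluster endpoint in turn until one accepts the update."""
    last_error = None
    for client in clients:
        try:
            client.update_routing_control_state(
                RoutingControlArn=arn, RoutingControlState=state
            )
            return
        except Exception as exc:  # endpoint unreachable or request rejected
            last_error = exc
    raise RuntimeError(f"no cluster endpoint accepted {state} for {arn}") from last_error


def switch_region(
    clients: Sequence[ClusterClient],
    primary_arns: Sequence[str],
    secondary_arns: Sequence[str],
) -> None:
    # Step 1: turn off the routing controls for the primary Region.
    for arn in primary_arns:
        set_state(clients, arn, "Off")
    # Step 2: turn on the routing controls for the secondary Region.
    for arn in secondary_arns:
        set_state(clients, arn, "On")
```

Passing a list of clients, one per cluster endpoint, lets the update succeed even if some endpoints are unreachable during a Regional event.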

The DNS architecture underwent a strategic evolution to support this new capability. HashiCorp reconfigured their critical ingress endpoints as Route 53 failover record pairs, with each pair consisting of a primary and secondary record. Each record is linked to a health check that monitors the state of an ARC routing control—effectively connecting AWS’s global DNS service to the ARC routing control. The primary records resolve to endpoints in the primary Region, and secondary records point to corresponding infrastructure in the standby Region. When routing controls change state, the associated health checks automatically trigger Route 53 to adjust DNS resolution patterns, redirecting traffic to the appropriate Regional infrastructure.
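As a rough illustration of a failover record pair, the sketch below builds the request payloads that Route 53 expects: a health check configuration of type `RECOVERY_CONTROL` tied to a routing control, and a change batch with one `PRIMARY` and one `SECONDARY` record. The helper names, the CNAME record type, the 60-second TTL, and the `SetIdentifier` scheme are assumptions chosen for the example, not details from HashiCorp's setup.

```python
def health_check_config(routing_control_arn: str) -> dict:
    """HealthCheckConfig for a health check that mirrors an ARC routing control."""
    return {"Type": "RECOVERY_CONTROL", "RoutingControlArn": routing_control_arn}


def failover_record(name: str, role: str, target: str, health_check_id: str) -> dict:
    """One change entry for a failover record; role is 'PRIMARY' or 'SECONDARY'."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "CNAME",  # assumed record type for this sketch
            "SetIdentifier": f"{name}-{role.lower()}",
            "Failover": role,
            "TTL": 60,  # assumed TTL
            "HealthCheckId": health_check_id,
            "ResourceRecords": [{"Value": target}],
        },
    }


def failover_pair(
    name: str,
    primary_target: str,
    primary_check_id: str,
    secondary_target: str,
    secondary_check_id: str,
) -> dict:
    """ChangeBatch containing the primary and secondary records for one endpoint."""
    return {
        "Changes": [
            failover_record(name, "PRIMARY", primary_target, primary_check_id),
            failover_record(name, "SECONDARY", secondary_target, secondary_check_id),
        ]
    }
```

The health check configuration would be passed to Route 53 `create_health_check`, and the change batch to `change_resource_record_sets` together with the hosted zone ID.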

HashiCorp maintains their secondary Region in a warm standby configuration, with essential services running but not actively serving client traffic until a failover event occurs. To enable seamless awareness of Region status across their distributed system, the team implemented a signaling mechanism using specially crafted TXT DNS records. These records are tied to the same ARC routing controls as the primary service endpoints, effectively creating a discoverable, global state indicator. Services can query these TXT records to dynamically determine the currently active Region and adjust their internal routing, replication, and operational behaviors accordingly — alleviating the need for a separate configuration distribution system and making sure all components have a consistent view of the current Regional state.
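A service consuming this signal might look something like the sketch below. The post doesn't describe the TXT record format, so this example assumes each TXT value is simply the name of the active Region. The function name and the set of known Regions are hypothetical. The TXT values themselves would come from whatever DNS resolver library the service already uses.

```python
from typing import Iterable, Optional, Set


def active_region(txt_values: Iterable[str], known_regions: Set[str]) -> Optional[str]:
    """Return the first TXT value that names a known Region, or None.

    Assumes (hypothetically) that the record value is the bare Region name,
    possibly wrapped in the double quotes that DNS tools often display.
    """
    for value in txt_values:
        candidate = value.strip().strip('"')
        if candidate in known_regions:
            return candidate
    return None
```

Returning `None` for unrecognized values lets callers keep their last known Region rather than act on a malformed answer.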

The following diagram illustrates the disaster recovery workflow.

This architecture combines human oversight for initiating critical Regional transitions with fully automated execution after the decision is made. The use of ARC’s globally distributed control plane removes single-Region dependencies that might otherwise compromise the failover mechanism itself during a Regional outage event.

Operational decision framework for Regional failover

HashiCorp’s Regional failover process balances automated monitoring with deliberate human decision-making. Their comprehensive observability platform continuously monitors Regional health, automatically alerting the incident response team when anomalies are detected. When alerts trigger, the incident management protocol activates, with an incident commander quickly assembling experts to assess the situation.

The team follows a structured evaluation framework to determine if failover is warranted: confirming the issue is Region-specific, verifying that redundant intra-Region components can’t mitigate the problem, and assessing whether the projected Regional recovery time exceeds acceptable customer impact thresholds. This approach prevents unnecessary Regional transitions while providing rapid action when genuinely needed.

After the decision to fail over is made, an authorized operator initiates the process through a single API call to their orchestration service, which then interfaces with ARC to execute the complex sequence of routing control changes. This design preserves human judgment for the critical decision while using automation for precise execution, so HashiCorp can respond confidently and consistently during high-pressure Regional outage scenarios.

Disaster recovery testing

HashiCorp maintains operational readiness through a disciplined monthly disaster recovery testing program in their integration environment. One week before each scheduled test, the team notifies all stakeholders to confirm organization-wide awareness and participation. On test day, they follow formal incident protocols, creating dedicated communication channels for transparent observation and collaboration.

The test execution mirrors their production failover process: an operator initiates the recovery sequence through their API, activating the ARC routing controls to shift traffic to the secondary Region. What sets HashiCorp’s approach apart is their comprehensive validation methodology. The team verifies critical services in the secondary Region and then fails back to the primary Region with subsequent validation. This bidirectional testing confirms both failover and failback procedures work reliably.

Each exercise concludes with a structured retrospective where the team documents observations and identifies improvement opportunities. By treating these tests as learning experiences rather than compliance activities, HashiCorp has established a continuous improvement cycle for their disaster recovery capabilities. The insights from these regular drills have led to numerous refinements in their ARC implementation and operational procedures, so their team can respond confidently during actual outages with practiced, predictable procedures.

Conclusion

The collaboration between HashiCorp and AWS through ARC has revolutionized HashiCorp’s disaster recovery capabilities. Regional transitions that once required careful DNS record manipulation by specialized operators now execute through a single API call, with traffic shifting within seconds and full propagation completing in approximately 2 minutes. This dramatic simplification, achieved by integrating the resilient ARC architecture with HashiCorp’s custom orchestration service, has not only improved recovery metrics but has also strengthened their enterprise-grade resilience promises.

ARC has solved a fundamental distributed systems challenge by providing a reliable mechanism for services to determine the active Region. By linking ARC routing controls to specialized TXT records, HashiCorp created a consistent global indicator that allows services to automatically adjust their behavior without additional coordination systems—simplifying their architecture and reducing dependencies.

Most significantly, this implementation has democratized disaster recovery within HashiCorp, transforming it from a specialized capability to a standardized procedure executable by their regular on-call rotation. The solution’s highly available endpoints across multiple Regions make sure the recovery mechanism itself remains operational even during severe outages, addressing a critical vulnerability in their previous approach.

For HashiCorp’s enterprise customers, these improvements translate directly to business value: reduced recovery times during Regional events, increased operational confidence, and assurance that their critical infrastructure management tools will remain available even during major cloud disruptions. As HashiCorp continues to refine their approach through rigorous testing and continuous improvement, their ARC implementation demonstrates how thoughtfully architected disaster recovery can evolve from merely an insurance policy into a strategic competitive advantage.

To learn more, visit Amazon Application Recovery Controller, AWS Multi-Region Capabilities, and AWS multi-Region fundamentals.


Introducing the AWS WAF traffic overview dashboard

2024-03-01 Dmitriy Novikov

Post Syndicated from Dmitriy Novikov original https://aws.amazon.com/blogs/security/introducing-the-aws-waf-traffic-overview-dashboard/

For many network security operators, protecting application uptime can be a time-consuming challenge of baselining network traffic, investigating suspicious senders, and determining how best to mitigate risks. Simplifying this process and understanding network security posture at all times is the goal of most IT organizations that are trying to scale their applications without also needing to scale their security operations center staff. To help you with this challenge, AWS WAF introduced traffic overview dashboards so that you can make informed decisions about your security posture when your application is protected by AWS WAF.

In this post, we introduce the new dashboards and delve into a few use cases to help you gain better visibility into the overall security of your applications using AWS WAF and make informed decisions based on insights from the dashboards.

Introduction to traffic overview dashboards

The traffic overview dashboard in AWS WAF displays an overview of security-focused metrics so that you can identify and take action on security risks in a few clicks, such as adding rate-based rules during distributed denial of service (DDoS) events. The dashboards include near real-time summaries of the Amazon CloudWatch metrics that AWS WAF collects when it evaluates your application web traffic.

These dashboards are available by default and require no additional setup. They show metrics—total requests, blocked requests, allowed requests, bot compared to non-bot requests, bot categories, CAPTCHA solve rate, top 10 matched rules, and more—for each web access control list (web ACL) that you monitor with AWS WAF.

You can access default metrics such as the total number of requests, blocked requests, and common attacks blocked, or you can customize your dashboard with the metrics and visualizations that are most important to you.

These dashboards provide enhanced visibility and help you answer questions such as these:

What percent of the traffic that AWS WAF inspected is getting blocked?
What are the top originating countries for the traffic that’s getting blocked?
What are common attacks that AWS WAF detects and protects me from?
How do my traffic patterns from this week compare with last week?

The dashboard has native and out-of-the-box integration with CloudWatch. Using this integration, you can navigate back and forth between the dashboard and CloudWatch; for example, you can get a more granular metric overview by viewing the dashboard in CloudWatch. You can also add existing CloudWatch widgets and metrics to the traffic overview dashboard, bringing your tried-and-tested visibility structure into the dashboard.

With the introduction of the traffic overview dashboard, one AWS WAF tool—Sampled requests—is now a standalone tab inside a web ACL. In this tab, you can view a graph of the rule matches for web requests that AWS WAF has inspected. Additionally, if you have enabled request sampling, you can see a table view of a sample of the web requests that AWS WAF has inspected.

The sample contains up to 100 requests that matched the criteria for a rule in the web ACL and another 100 requests that didn’t match any rule and thus had the default action for the web ACL applied. The requests in the sample come from the protected resources that have received requests for your content in the previous three hours.
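If you want to pull the same kind of sample programmatically, the AWS WAF `GetSampledRequests` API accepts a web ACL ARN, a rule metric name, a scope, a time window, and a maximum item count. The sketch below only builds the parameters for the previous three hours. The helper name and the default of 100 items are assumptions made to mirror the console behavior described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sampled_requests_params(
    web_acl_arn: str,
    rule_metric_name: str,
    scope: str = "REGIONAL",  # use "CLOUDFRONT" for CloudFront web ACLs
    now: Optional[datetime] = None,
    max_items: int = 100,
) -> dict:
    """Build GetSampledRequests parameters covering the previous three hours."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=3)
    return {
        "WebAclArn": web_acl_arn,
        "RuleMetricName": rule_metric_name,
        "Scope": scope,
        "TimeWindow": {"StartTime": start, "EndTime": end},
        "MaxItems": max_items,
    }
```

The resulting dictionary can be unpacked into a `wafv2` client call, for example `client.get_sampled_requests(**params)`.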

The following figure shows a typical layout for the traffic overview dashboard. It categorizes inspected requests with a breakdown of each of the categories that display actionable insights, such as attack types, client device types, and countries. Using this information and comparing it with your expected traffic profile, you can decide whether to investigate further or block the traffic right away. For the example in Figure 1, you might want to block France-originating requests from mobile devices if your web application isn’t supposed to receive traffic from France and is a desktop-only application.

Figure 1: Dashboard with sections showing multiple categories serves as a single pane of glass

Use case 1: Analyze traffic patterns with the dashboard

In addition to visibility into your web traffic, you can use the new dashboard to analyze patterns that could indicate potential threats or issues. By reviewing the dashboard’s graphs and metrics, you can spot unusual spikes or drops in traffic that deserve further investigation.

The top-level overview shows the high-level traffic volume and patterns. From there, you can drill down into the web ACL metrics to see traffic trends and metrics for specific rules and rule groups. The dashboard displays metrics such as allowed requests, blocked requests, and more.

Notifications or alerts about a deviation from expected traffic patterns provide you a signal to explore the event. During your exploration, you can use the dashboard to understand the broader context and not just the event in isolation. This makes it simpler to detect a trend in anomalies that could signify a security event or misconfigured rules. For example, if you normally get 2,000 requests per minute from a particular country, but suddenly see 10,000 requests per minute from it, you should investigate. Using the dashboard, you can look at the traffic across various dimensions. The spike in requests alone might not be a clear indication of a threat, but if you see an additional indicator, such as an unexpected device type, this could be a strong reason for you to take follow-up action.
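The reasoning in this example can be expressed as a small check. The threshold factor and the function names below are assumptions for illustration; the dashboard itself doesn't expose such a function.

```python
def is_spike(current_rpm: float, baseline_rpm: float, factor: float = 3.0) -> bool:
    """True when the current request rate is at least `factor` times the baseline."""
    if baseline_rpm <= 0:
        # No established baseline: treat any traffic as worth a look.
        return current_rpm > 0
    return current_rpm / baseline_rpm >= factor


def needs_follow_up(current_rpm: float, baseline_rpm: float, unexpected_device: bool) -> bool:
    """A spike alone is a weak signal; a spike plus a second indicator is a strong one."""
    return is_spike(current_rpm, baseline_rpm) and unexpected_device
```

With the numbers from the text, 10,000 requests per minute against a 2,000 baseline is a fivefold increase, so `is_spike(10_000, 2_000)` is true, and an unexpected device type would tip it into follow-up.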

The following figure shows the actions taken by rules in a web ACL and which rule matched the most.

Figure 2: Multidimensional overview of the web requests

The dashboard also shows the top blocked and allowed requests over time. Check whether unusual spikes in blocked requests correspond to spikes in traffic from a particular IP address, country, or user agent. That could indicate attempted malicious activity or bot traffic.

The following figure shows a disproportionately larger number of matches to a rule indicating that a particular vector is used against a protected web application.

Figure 3: The top terminating rule could indicate a particular vector of an attack

Likewise, review the top allowed requests. If you see a spike in traffic to a specific URL, you should investigate whether your application is working properly.

Next steps after you analyze traffic

After you’ve analyzed the traffic patterns, here are some next steps to consider:

Tune your AWS WAF rules to better match legitimate or malicious traffic based on your findings. You might be able to fine-tune rules to reduce false positives or false negatives. Tune rules that are blocking legitimate traffic by adjusting regular expressions or conditions.
Configure AWS WAF logging, and if you have a dedicated security information and event management (SIEM) solution, integrate the logging to enable automated alerting for anomalies.
Set up AWS WAF to automatically block known malicious IPs. You can maintain an IP block list based on identified threat actors. Additionally, you can use the Amazon IP reputation list managed rule group, which the Amazon Threat Research Team regularly updates.
If you see spikes in traffic to specific pages, check that your web applications are functioning properly to rule out application issues driving unusual patterns.
Add new rules to block new attack patterns that you spot in the traffic flows. Then review the metrics to help confirm the impact of the new rules.
Monitor source IPs for DDoS events and other malicious spikes. Use AWS WAF rate-based rules to help mitigate these spikes.
If you experience traffic floods, implement additional layers of protection by using CloudFront with DDoS protection.
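As one concrete example of the steps above, a rate-based rule that blocks source IPs exceeding a request limit could be expressed as the following web ACL rule definition. The rule name, the limit of 2,000 requests, and the priority are placeholder values, not recommendations.

```python
def rate_limit_rule(name: str, priority: int, limit: int = 2000) -> dict:
    """Web ACL rule that blocks a source IP once it exceeds `limit` requests
    within the rate-based rule's evaluation window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,
                "AggregateKeyType": "IP",  # count requests per source IP
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

The dictionary follows the shape of an entry in the `Rules` list of a web ACL, so it can be added when you create or update the web ACL through the API.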

The new dashboard gives you valuable insight into the traffic that reaches your applications and takes the guesswork out of traffic analysis. Using the insights that it provides, you can fine-tune your AWS WAF protections and block threats before they affect availability or data. Analyze the data regularly to help detect potential threats and make informed decisions about optimizing.

As an example, if the dashboard shows a traffic spike that stands out against historical patterns and comes from a country where you don’t anticipate traffic originating, you can create a geographic match rule statement in your web ACL to block this traffic and prevent it from reaching your web application.
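A geographic match rule of this kind could look roughly like the sketch below. The rule name and the country code in the usage note are placeholders.

```python
from typing import Sequence


def geo_block_rule(name: str, priority: int, country_codes: Sequence[str]) -> dict:
    """Web ACL rule that blocks requests originating from the given countries.

    Country codes are two-letter ISO 3166 codes, for example "FR".
    """
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {"GeoMatchStatement": {"CountryCodes": list(country_codes)}},
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

For instance, `geo_block_rule("block-unexpected-countries", 10, ["FR"])` produces a rule entry that blocks France-originating requests. You could start with a Count action instead of Block to confirm the match volume before enforcing it.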

The dashboard is a great tool to gain insights and to understand how AWS WAF managed rules help protect your traffic.

Use case 2: Understand bot traffic during onboarding and fine-tune your bot control rule group

With AWS WAF Bot Control, you can monitor, block, or rate limit bots such as scrapers, scanners, crawlers, status monitors, and search engines. If you use the targeted inspection level of the rule group, you can also challenge bots that don’t self-identify, making it harder and more expensive for malicious bots to operate against your website.

On the traffic overview dashboard, under the Bot Control overview tab, you can see how much of your current traffic is coming from bots, based on request sampling (if you don’t have Bot Control enabled) and real-time CloudWatch metrics (if you do have Bot Control enabled).

During your onboarding phase, use this dashboard to monitor your traffic and understand how much of it comes from various types of bots. You can use this as a starting point to customize your bot management. For example, you can enable common bot control rule groups in count mode and see if desired traffic is being mislabeled. Then you can add rule exceptions, as described in AWS WAF Bot Control example: Allow a specific blocked bot.

The following figure shows a collection of widgets that visualize various dimensions of requests detected as generated by bots. By understanding categories and volumes, you can make an informed decision to either investigate by further delving into logs or block a specific category if it’s clear that it’s unwanted traffic.

Figure 4: Collection of bot-related metrics on the dashboard

After you get started, you can use the same dashboard to monitor your bot traffic and evaluate adding targeted detection for sophisticated bots that don’t self-identify. Targeted protections use detection techniques such as browser interrogation, fingerprinting, and behavior heuristics to identify bad bot traffic. AWS WAF tokens are an integral part of these enhanced protections.

AWS WAF creates, updates, and encrypts tokens for clients that successfully respond to silent challenges and CAPTCHA puzzles. When a client with a token sends a web request, it includes the encrypted token, and AWS WAF decrypts the token and verifies its contents.

In the Bot Control dashboard, the token status pane shows counts for the various token status labels, paired with the rule action that was applied to the request. The IP token absent thresholds pane shows data for requests from IPs that sent too many requests without a token. You can use this information to fine-tune your AWS WAF configuration.

For example, within a Bot Control rule group, it’s possible for a request without a valid token to exit the rule group evaluation and continue to be evaluated by the web ACL. To block requests that are missing their token or for which the token is rejected, you can add a rule to run immediately after the managed rule group to capture and block requests that the rule group doesn’t handle for you. Using the Token status pane, illustrated in Figure 5, you can also monitor the volume of requests that acquire tokens and decide if you want to rate limit or block such requests.
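A follow-up rule of this kind might be sketched as below. It matches on token labels and blocks the request. The label keys `awswaf:managed:token:absent` and `awswaf:managed:token:rejected` are assumptions based on AWS WAF token labeling, so confirm them against the current documentation before relying on them. The rule name is hypothetical, and the priority must place the rule after the Bot Control rule group.

```python
def block_missing_token_rule(priority: int) -> dict:
    """Rule that blocks requests labeled as having no token or a rejected token.

    The label keys below are assumed; verify them for your configuration.
    """
    label_keys = [
        "awswaf:managed:token:absent",
        "awswaf:managed:token:rejected",
    ]
    return {
        "Name": "block-missing-or-rejected-token",
        "Priority": priority,
        "Statement": {
            "OrStatement": {
                "Statements": [
                    {"LabelMatchStatement": {"Scope": "LABEL", "Key": key}}
                    for key in label_keys
                ]
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "block-missing-or-rejected-token",
        },
    }
```

Because labels are only visible to rules that run later in the same web ACL, this rule needs a higher priority number than the managed rule group that produces the labels.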

Figure 5: Token status enables monitoring of the volume of requests that acquire tokens

Comparison with CloudFront security dashboard

The AWS WAF traffic overview dashboard provides enhanced overall visibility into web traffic reaching resources that are protected with AWS WAF. In contrast, the CloudFront security dashboard brings AWS WAF visibility and controls directly to your CloudFront distribution. If you want the detailed visibility and analysis of patterns that could indicate potential threats or issues, then the AWS WAF traffic overview dashboard is the best fit. However, if your goal is to manage application delivery and security in one place without navigating between service consoles and to gain visibility into your application’s top security trends, allowed and blocked traffic, and bot activity, then the CloudFront security dashboard could be a better option.

Availability and pricing

The new dashboards are available in the AWS WAF console, and you can use them to better monitor your traffic. These dashboards are available by default, at no cost, and require no additional setup. CloudWatch logging has a separate pricing model, and if you have full logging enabled, you will incur CloudWatch charges. For more information about these charges, see the Amazon CloudWatch pricing page. You can customize the dashboards if you want to tailor the displayed data to the needs of your environment.

Conclusion

With the AWS WAF traffic overview dashboard, you can get actionable insights on your web security posture and traffic patterns that might need your attention to improve your perimeter protection.

In this post, you learned how to use the dashboard to help secure your web application. You walked through traffic patterns analysis and possible next steps. Additionally, you learned how to observe traffic from bots and follow up with actions related to them according to the needs of your application.

The AWS WAF traffic overview dashboard is designed to meet most use cases and be a go-to default option for security visibility over web traffic. However, if you’d prefer to create a custom solution, see the guidance in the blog post Deploy a dashboard for AWS WAF with minimal effort.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Fine-tune and optimize AWS WAF Bot Control mitigation capability

2022-02-22 Dmitriy Novikov

Post Syndicated from Dmitriy Novikov original https://aws.amazon.com/blogs/security/fine-tune-and-optimize-aws-waf-bot-control-mitigation-capability/

Introduction

A few years ago at Sydney Summit, I had an excellent question from one of our attendees. She asked me to help her design a cost-effective, reliable, and not overcomplicated solution for protection against simple bots for her web-facing resources on Amazon Web Services (AWS). I remember the occasion because with the release of AWS WAF Bot Control, I can now address the question with an elegant solution. The Bot Control feature now makes this a matter of switching it on to start filtering out common and pervasive bots that generate over 50 percent of the traffic against typical web applications.

Reduce Unwanted Traffic on Your Website with New AWS WAF Bot Control introduced AWS WAF Bot Control and some of its capabilities. That blog post covers everything you need to know about where to start and what elements it uses for configuration and protection. This post unpacks closely related functionality and shares key considerations, best practices, and ways to customize for common use cases. Use cases covered include:

Limiting the crawling rate of a bot leveraging labels and AWS WAF response headers
Enabling Bot Control only for certain parts of your application with scope down statements
Prioritizing verified bots or allowing only specific ones using labels
Inserting custom headers into requests from certain bots based on their labels

Key elements of AWS WAF Bot Control fine-tuning

Before moving on to precise configuration of the bot mitigation capability, it is important to understand the components that go into the process.

Labels

Although labels aren’t unique to Bot Control, the feature takes advantage of them, and many configurations use labels as the main input. A label is a string value that is applied to a request based on matching a rule statement. One way of thinking about them is as tags that belong to the specific request. The request acquires them after being processed by a rule statement, and they can be used to identify similar requests in all subsequent rules within the same web ACL. Labels enable you to act on a group of requests that meets specific criteria, because the subsequent rules in the same web ACL have access to the generated labels and can match against them.

Labels go beyond just a mechanism for matching a rule. Labels are independent of a rule’s action, as they can be generated for Block, Allow, and Count. That opens up opportunities to filter or construct queries against records in AWS WAF logs based on labels, and so implement sophisticated analytics.

A label is a string made up of a prefix, an optional namespace, and a name, delimited by colons. For example: prefix:[namespace:]name. The prefix is automatically added by AWS WAF.

AWS WAF Bot Control includes various labels and namespaces:

bot:category: Type of bot. For example, search_engine, content_fetcher
bot:name: Name of a specific bot (if available). For example, scrapy, mauibot, crawler4j
bot:verified: Verified bots are generally safe for web applications. For example, googlebot and linkedin. Bot Control performs validation to confirm that such bots come from the source that they claim, using the bot confirmation detection logic described later in this section.
By default, verified bots are not blocked by Bot Control, but you can use a label to block them with a custom rule.
signal: Attributes of the request that indicate bot activity. For example, non_browser_user_agent, automated_browser

These labels are added through managed bot detection logic, and Bot Control uses them to perform the following:

Known bot categorization: Comparing the request user-agent to known bots to categorize and allow customers to block by category. Bots are categorized by their function, such as scrapers, search engines, social media.

Bot confirmation: Most respectable bots provide a way to validate beyond the user-agent, typically by doing a reverse DNS lookup of the IP address to confirm the validity of domain and host names. These automatic checks will help you to ensure that only legitimate bots are allowed, and provide a signal to flag requests to downstream systems for bot detection.

Header validation: Request headers validation is performed against a series of checks to look for missing headers, malformed headers, or invalid headers.

Browser signature matching: TLS handshake data and request headers can be deconstructed and partially recombined to create a browser signature that identifies browser and OS combinations. This signature can be validated against the user-agent to confirm they match, and checked against lists of known-good and known-bad browser signatures.
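To make the bot confirmation idea more concrete, here is a sketch of the general forward-confirmed reverse DNS technique. It illustrates the concept only and is not how Bot Control is implemented internally. The function name and the injectable `reverse` and `forward` lookups are assumptions added so the logic can be exercised without network access. The default forward lookup uses `socket.gethostbyname_ex`, which handles IPv4 only.

```python
import socket
from typing import Callable, Iterable, List, Optional


def confirm_bot(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Optional[Callable[[str], str]] = None,
    forward: Optional[Callable[[str], List[str]]] = None,
) -> bool:
    """Forward-confirmed reverse DNS: the IP must map to a host in an allowed
    domain, and that host must resolve back to the same IP."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    in_domain = any(
        host == suffix or host.endswith("." + suffix) for suffix in allowed_suffixes
    )
    if not in_domain:
        return False
    try:
        return ip in forward(host)
    except OSError:  # forward lookup failure
        return False
```

The forward lookup matters because anyone who controls the reverse zone for their own IP range can publish a PTR record claiming to be a well-known crawler. Only the owner of the crawler's domain can make the hostname resolve back to that IP.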

Below are a few examples of labels that Bot Control has. You can obtain the full list by calling the DescribeManagedRuleGroup API.

awswaf:managed:aws:bot-control:bot:category:search_engine
awswaf:managed:aws:bot-control:bot:name:scrapy
awswaf:managed:aws:bot-control:bot:verified
awswaf:managed:aws:bot-control:signal:non_browser_user_agent
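The label structure shown in these examples can be taken apart with a few lines of code. In this sketch, the helper names are hypothetical, and the prefix constant is taken from the example labels above. The `AvailableLabels` field is part of the `DescribeManagedRuleGroup` response, where each entry carries a `Name`.

```python
from typing import List, Tuple

# Prefix shared by the Bot Control labels listed above.
BOT_CONTROL_PREFIX = "awswaf:managed:aws:bot-control"


def parse_label(label: str, prefix: str = BOT_CONTROL_PREFIX) -> Tuple[List[str], str]:
    """Split a label into (namespaces, name) after removing the known prefix."""
    if not label.startswith(prefix + ":"):
        raise ValueError(f"label does not start with {prefix!r}: {label!r}")
    *namespaces, name = label[len(prefix) + 1:].split(":")
    return namespaces, name


def labels_from_response(response: dict) -> List[str]:
    """Extract label names from a DescribeManagedRuleGroup response."""
    return [entry["Name"] for entry in response.get("AvailableLabels", [])]
```

For example, `parse_label("awswaf:managed:aws:bot-control:bot:category:search_engine")` yields the namespaces `["bot", "category"]` and the name `"search_engine"`.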

Best practice to start with Bot Control

Although Bot Control can be enabled and start protecting your web resources with the default Block action, you can switch all rules in the rule group into a Count action at the beginning. This accomplishes the following:

Avoids false positives with requests that might match one of the rules in Bot Control but still be a valid bot for your resource.
Allows you to accumulate enough data points, in the form of labels and the actions taken on labeled requests, whenever requests match rules in Bot Control. That data enables you to make informed decisions about constructing rules for each desired bot or category, and about when it's appropriate to switch them back to their default action.

Labels can be looked up in Amazon CloudWatch metrics and AWS WAF logs, and as soon as you have them, you can start planning whether exceptions or any custom rules are needed to cater for a specific scenario. This blog post explores examples of such use cases in the Common use cases sections below.

Additionally, as AWS WAF processes rules in sequential order, you should consider where the Bot Control rule group is located in your web ACL. To filter out requests that you confidently consider unwanted, you can place AWS Managed Rules rule groups—such as the Amazon IP reputation list—before the Bot Control rule group in the evaluation order. This decreases the number of requests processed by Bot Control, and makes it more cost effective. Simultaneously, Bot Control should be early enough in the rules to:

Enable label generation for downstream rules. That also provides higher visibility as a side benefit.
Decrease false positives by not blocking desired bots before they reach Bot Control.
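
The ordering requirement can be checked mechanically before you deploy a web ACL. The sketch below is one possible approach, assuming your rules are available as a list of dictionaries in the web ACL JSON shape (Name, Priority, Statement) and that lower Priority numbers are evaluated first. The helper names are hypothetical.

```python
from typing import Any, Dict, List

BOT_CONTROL_GROUP = "AWSManagedRulesBotControlRuleSet"
BOT_CONTROL_LABEL_PREFIX = "awswaf:managed:aws:bot-control:"


def uses_bot_control_label(statement: Any) -> bool:
    """Recursively look for a LabelMatchStatement keyed on a Bot Control label."""
    if isinstance(statement, dict):
        match = statement.get("LabelMatchStatement")
        if isinstance(match, dict) and str(match.get("Key", "")).startswith(
            BOT_CONTROL_LABEL_PREFIX
        ):
            return True
        return any(uses_bot_control_label(v) for v in statement.values())
    if isinstance(statement, list):
        return any(uses_bot_control_label(v) for v in statement)
    return False


def misplaced_label_rules(rules: List[Dict[str, Any]]) -> List[str]:
    """Names of rules that match Bot Control labels but run before the group adds them.

    Returns an empty list when the web ACL has no Bot Control rule group.
    """
    bot_control_priority = None
    for rule in rules:
        group = rule.get("Statement", {}).get("ManagedRuleGroupStatement", {})
        if group.get("Name") == BOT_CONTROL_GROUP:
            bot_control_priority = rule["Priority"]
    if bot_control_priority is None:
        return []
    return [
        rule["Name"]
        for rule in rules
        if rule["Priority"] < bot_control_priority
        and uses_bot_control_label(rule.get("Statement", {}))
    ]
```

A non-empty result means at least one label-matching rule would never see the labels it depends on.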

AWS WAF Bot Control fine-tuning wouldn’t be complete and configurable without a set of recently released features and capabilities of AWS WAF. Let’s unpack them.

How to work with labels in CloudWatch metrics and AWS WAF logs

Generated labels produce CloudWatch metrics and are written to AWS WAF logs. This lets you see which bots and categories hit your website, along with the associated labels that you can use for fine-tuning.

CloudWatch metrics are generated with the following dimensions and metrics.

Region dimension is available for all Regions except Amazon CloudFront. When a web ACL is associated with CloudFront, metrics are in the Northern Virginia Region.
WebACL dimension is the name of the WebACL
Namespace is the fully qualified namespace, including the prefix
LabelValue is the label name
Action is the terminating action (for example, Allow, Block, Count)

AWS WAF includes a shortcut to associated CloudWatch metrics at the top of the Overview page, as shown in Figure 1.

Figure 1: Title and description of the chart in AWS WAF with a shortcut to CloudWatch

Alternatively, you can find them in the WAFV2 service category of the CloudWatch Metrics section.

CloudWatch displays generated labels and the volume across dates and times, so you can evaluate and make informed decisions to structure the rules or address false positives. Figure 2 illustrates what labels were generated for requests from bots that hit my website. This example configured only a couple of explicit Allow actions, so most of them were blocked. The top section of Figure 2 shows the load from two selected labels.

Figure 2: WAFV2 CloudWatch metrics for generated Label Namespaces

In AWS WAF logs, generated labels are included in an array under the field labels. Figure 3 shows an example request with the labels array at the bottom.

Figure 3: An example of an AWS WAF log record

This example shows three labels generated for the same request. Uptimerobot follows the monitoring category label, and combining these two labels is useful to provide flexibility for configurations based on them. You can use the whole category, or be laser-focused using the label of the specific bot. You will see how and why that matters later in this blog post. The third label, non_browser_user_agent, is a signal of forwarded requests that have extra headers. For protection from bots in conjunction with labels, you can construct extra scanning in your application for certain requests.
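
If you export AWS WAF logs as JSON lines, a short script can summarize which labels appear and what action was taken on the requests that carried them. The following is a minimal sketch, assuming each record has an action field and a labels array of objects with a name key, as described above. The sample record is made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_labels(log_lines: Iterable[str]) -> Counter:
    """Count (label name, terminating action) pairs across WAF log records."""
    counts: Counter = Counter()
    for line in log_lines:
        record = json.loads(line)
        action = record.get("action", "UNKNOWN")
        # Records without any labels simply contribute nothing.
        for label in record.get("labels", []):
            counts[(label["name"], action)] += 1
    return counts


# Hypothetical record, trimmed to the fields used here.
sample = json.dumps({
    "action": "BLOCK",
    "labels": [
        {"name": "awswaf:managed:aws:bot-control:bot:category:monitoring"},
        {"name": "awswaf:managed:aws:bot-control:bot:name:uptimerobot"},
        {"name": "awswaf:managed:aws:bot-control:signal:non_browser_user_agent"},
    ],
})

for (name, action), total in count_labels([sample]).items():
    print(action, total, name)
```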

Scope-down statements

Given that Bot Control is a premium feature and a paid AWS Managed Rules rule group, the ability to keep your costs under control is crucial. The scope-down statement allows you to optimize for cost by filtering out any traffic that doesn’t require inspection by Bot Control.

To address this goal, you can use scope-down statements that can be applied to two broad scenarios.

You can exclude certain parts of your resource from scanning by Bot Control. Think of parts of your website that you don’t mind being accessed by bots. Typically, that would be static content, such as images and CSS files, while you leave protection on everything else, such as APIs and login pages. You can also exclude IP ranges that can be considered safe from bot management. For example, traffic that’s known to come from your organization or viewers that belong to your partners or customers.

Alternatively, you can look at this from a different angle, and only apply bot management to a small section of your resources. For example, you can use Bot Control to protect a login page, or certain sensitive APIs, leaving everything else outside of your bot management.

With all of these tools in our toolkit, let’s put them into perspective and dive deep into use cases and scenarios.

Common use cases for AWS WAF Bot Control fine-tuning

There are several methods for fine tuning Bot Control to better meet your needs. In this section, you’ll see some of the methods you can use.

Limit the crawling rate

In some cases, it is necessary to allow bots access to your websites. A good example is search engine bots, which crawl the web and create an index. If optimization for search engines is important for your business, but you notice excessive load from too many requests hitting your web resource, you might face a dilemma of how to slow crawlers down without unnecessarily blocking them. You can solve this with a combination of Bot Control detection logic and a rate-based rule with a response status code and header to communicate your intention back to crawlers. Most crawlers that are deemed useful have a built-in mechanism to decrease their crawl rate when you detect and respond to increased load.

To customize bot mitigation and set the crawl rate below limits that might negatively affect your web resource

In the AWS WAF console, select Web ACLs from the left menu. Open your web ACL or follow the steps to create a web ACL.
Choose the Rules tab and select Add rules. Select Add managed rule groups and proceed with the following settings:
1. In the AWS managed rule groups section, select the switch Add to web ACL to enable Bot Control in the web ACL. This also gives you labels that you can use in other rules later in the evaluation process inside the web ACL.
2. Select Add rules and choose Save
In the same web ACL, select Add rules menu and select Add my own rules and rule groups.
Using the provided Rule builder, configure the following settings:
1. Enter a preferred name for the rule and select Rate-based rule.
2. Enter a preferred rate limit for the rule. For example, 500.
  
  Note: The rate limit is the maximum number of requests allowed from a single IP address in a five-minute period.
3. Select Only consider requests that match the criteria in a rule statement to enable the scope-down statement to narrow the scope of the requests that the rule evaluates.
4. Under the Inspect menu, select Has a label to focus only on certain types of bots.
5. In the Match key field, enter one of the following labels to match based on broad categories, such as verified bots or all bots identified as scraping, as illustrated in Figure 4:
  awswaf:managed:aws:bot-control:bot:verified
  awswaf:managed:aws:bot-control:bot:category:scraping_framework
6. Alternatively, you can narrow down to a specific bot using its label:
  awswaf:managed:aws:bot-control:bot:name:Googlebot
  
  Figure 4: Label match rule statement in a rule builder with a specific match key
In the Action section, configure the following settings:
1. Select Custom response to enable it.
2. Enter 429 as the Response code to indicate and communicate back to the bot that it has sent too many requests in a given amount of time.
3. Select Add new custom header and enter Retry-After in the Key field and a value in seconds for the Value field. The value indicates how many seconds a bot must wait before making a new request.
Select Add rule.
It’s important to place the rule after the Bot Control rule group inside your web ACL, so that the label is available in this custom rule.
1. In the Set rule priority section, check that the new rate-based rule is under the existing Bot Control rule set and if not, choose the newly created rule and select Move up or Move down until the rule is located after it.
2. Select Save.

Figure 5: AWS WAF rule action with a custom response code

With the preceding configuration, Bot Control sets required labels, which you then use in the scope-down statement in a rate-based rule to not only establish a ceiling of how many requests you will allow from specific bots, but also communicate to bots when their crawling rate is too high. If they don’t respect the response and lower their rate, the rule will temporarily block them, protecting your web resource from being overwhelmed.

Note: If you use a category label, such as scraping_framework, all bots that have that label will be counted by your rate-based rule. To avoid unintentional blocking of bots that use the same label, you can either narrow down to a specific bot with a precise bot:name: label, or select a higher rate limit to allow a greater margin for the aggregate.
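
The same rule can be expressed in the JSON representation of the web ACL. The following Python sketch builds that structure as a dictionary. The function name, rule name, and the 60-second Retry-After value are illustrative assumptions, and the priority you pass in must place the rule after the Bot Control rule group.

```python
import json
from typing import Any, Dict


def rate_limit_rule(name: str, label_key: str, limit: int,
                    retry_after_seconds: int, priority: int) -> Dict[str, Any]:
    """Build a rate-based rule that only counts requests carrying label_key."""
    return {
        "Name": name,
        "Priority": priority,  # must be evaluated after the Bot Control rule group
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,  # maximum requests per IP address in the rate window
                "AggregateKeyType": "IP",
                "ScopeDownStatement": {
                    "LabelMatchStatement": {"Scope": "LABEL", "Key": label_key}
                },
            }
        },
        "Action": {
            "Block": {
                "CustomResponse": {
                    "ResponseCode": 429,
                    "ResponseHeaders": [
                        {"Name": "Retry-After", "Value": str(retry_after_seconds)}
                    ],
                }
            }
        },
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


rule = rate_limit_rule(
    name="limit-verified-bots",  # hypothetical rule name
    label_key="awswaf:managed:aws:bot-control:bot:verified",
    limit=500,
    retry_after_seconds=60,
    priority=1,
)
print(json.dumps(rule, indent=2))
```

Keeping the rule as data makes it straightforward to review in version control before you paste it into the console's JSON editor or send it through the API.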

Enable Bot Control only for certain parts of your application

As mentioned earlier, excluding parts of your web resource from Bot Control protection is a mechanism to reduce the cost of running the feature by focusing only on a subset of the requests reaching a resource. There are a few common scenarios that take advantage of this approach.

To run Bot Control only on dynamic parts of your traffic

In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL.
Choose the Rules tab and select Add rules. Then select Add managed rule groups to proceed with the following settings:
1. In the AWS managed rule groups section, select Add to web ACL to enable Bot Control in the web ACL.
2. Select Edit.
Select Scope-down statement – optional and select Enable Scope-down statement.
In If a request, select doesn’t match the statement (NOT).
In the Statement section, configure the following settings:
1. Choose URI path in the Inspect field.
2. For the Match type, choose Starts with string.
3. Depending on the structure of your resource, you can enter a whole URI string (such as /images/) in the String to match field. Requests whose URI path starts with that string will be excluded from Bot Control evaluation.
Figure 6: A scope-down statement to match based on a string that a URI path starts with
Select Save rule.
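
For reference, the negated scope-down statement from this procedure can be sketched in the web ACL JSON shape as follows. This is a minimal example built in Python, not a complete rule group configuration: optional fields are omitted, the helper names are hypothetical, and the /images/ prefix assumes that URI paths are matched with their leading slash.

```python
from typing import Any, Dict


def bot_control_excluding_prefix(prefix: str, priority: int = 0) -> Dict[str, Any]:
    """Bot Control rule group that skips requests whose URI path starts with prefix."""
    return {
        "Name": "AWS-AWSManagedRulesBotControlRuleSet",
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {
                "VendorName": "AWS",
                "Name": "AWSManagedRulesBotControlRuleSet",
                # NOT (URI path starts with prefix): only non-matching
                # requests are passed on to Bot Control for inspection.
                "ScopeDownStatement": {
                    "NotStatement": {
                        "Statement": {
                            "ByteMatchStatement": {
                                "SearchString": prefix,
                                "FieldToMatch": {"UriPath": {}},
                                "TextTransformations": [
                                    {"Priority": 0, "Type": "NONE"}
                                ],
                                "PositionalConstraint": "STARTS_WITH",
                            }
                        }
                    }
                },
            }
        },
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "bot-control",
        },
    }


def is_inspected(uri_path: str, prefix: str) -> bool:
    """Local illustration of which paths the scope-down statement lets through."""
    return not uri_path.startswith(prefix)


rule = bot_control_excluding_prefix("/images/")
print(is_inspected("/images/logo.png", "/images/"))  # static content is skipped
print(is_inspected("/api/orders", "/images/"))       # dynamic content is inspected
```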

An alternative to using string matching

As an alternative to a string match type, you can use a regex pattern set. If you don’t have a regex pattern set, create one using the following guide.

Note: This pattern matches most common file extensions associated with static files for typical web resources. You can customize the pattern set if you have different file types.

Follow steps 1-4 of the previous procedure.
In the Statement section, configure the following settings:
1. Choose URI path in the Inspect field.
2. For the Match type, choose Matches pattern from regex pattern set, and select the set that you created in Regex pattern set, as illustrated in Figure 7.
3. In Regex pattern set, enter the pattern
  (?i)\.(jpe?g|gif|png|svg|ico|css|js|woff2?)$
  
  Figure 7: A scope-down statement to match based on a regex pattern set as part of a URI path
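
Before you add the pattern to a regex pattern set, you can sanity-check it locally against sample paths. The sketch below uses Python's re module, which is an assumption for illustration only: the AWS WAF regex engine isn't identical to Python's, so treat this as a quick check rather than a guarantee. The sample paths are made up.

```python
import re

# Same pattern as above: a case-insensitive match on common static file extensions.
STATIC_ASSET = re.compile(r"(?i)\.(jpe?g|gif|png|svg|ico|css|js|woff2?)$")


def is_static_asset(uri_path: str) -> bool:
    """True when the URI path ends with one of the listed static extensions."""
    return STATIC_ASSET.search(uri_path) is not None


for path in ("/img/logo.PNG", "/fonts/site.woff2", "/api/login", "/data/app.json"):
    print(path, is_static_asset(path))
```

Note that .json doesn't match, because the pattern anchors js to the end of the path.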

To run Bot Control only on the most sensitive parts of your application

Another option is to exclude almost everything by enabling Bot Control only on the most sensitive part of your application, such as a login page.

Note: The actual URI path depends on the structure of your application.

Inside the Scope-down statement, in the If a request menu, select matches the statement.
In the Statement section:
1. In the Inspect field, select URI path.
2. For the Match type, select Contains string.
3. In the String to match field, enter the string you want to match. For example, login as shown in Figure 8.
Choose Save rule.

Figure 8: A scope-down statement to match based on a string within a URI path

To exclude more than one part of your application from Bot Control

If you have more than one part to exclude, you can use an OR logical statement to list each part in a scope-down statement.

Inside the Scope-down statement, in the If a request menu, select matches at least one of the statements (OR).
In the Statement 1 section, configure the following settings:
1. Choose URI path in the Inspect field.
2. For the Match type choose Contains string.
3. In the String to match field enter a preferred value. For example, login.
In the Statement 2 section, configure the following settings:
1. Choose URI path in the Inspect field.
2. For the Match type choose Starts with string.
3. In the String to match field enter a preferred URI value. For example, /payment/.
Select Save rule.

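Figure 9 builds on the previous example of a string match by adding an OR statement to also protect an API named payment. Because the OR statement isn't negated, Bot Control inspects only requests that match at least one of the listed paths.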

Figure 9: A scope-down statement with OR logic for more sophisticated matching

Note: The visual editor on the console supports up to five statements. To add more, edit the JSON representation of the rule on the console or use the APIs.
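
When you move beyond the visual editor, small helper functions can keep the JSON for nested statements readable. The following Python sketch composes the OR statement from this procedure. The helper names are hypothetical, and the negate helper is included only to show how the same list of paths could be excluded instead of protected.

```python
import json
from typing import Any, Dict


def uri_match(search: str, constraint: str) -> Dict[str, Any]:
    """URI path match; constraint is, for example, CONTAINS or STARTS_WITH."""
    return {
        "ByteMatchStatement": {
            "SearchString": search,
            "FieldToMatch": {"UriPath": {}},
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
            "PositionalConstraint": constraint,
        }
    }


def any_of(*statements: Dict[str, Any]) -> Dict[str, Any]:
    """Combine statements with OR logic; at least two are needed."""
    if len(statements) < 2:
        raise ValueError("an OR statement needs at least two nested statements")
    return {"OrStatement": {"Statements": list(statements)}}


def negate(statement: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a statement in NOT."""
    return {"NotStatement": {"Statement": statement}}


# Inspect only login and payment traffic with Bot Control.
protect_only = any_of(
    uri_match("login", "CONTAINS"),
    uri_match("/payment/", "STARTS_WITH"),
)

# Inspect everything except those paths.
exclude_these = negate(protect_only)

print(json.dumps(protect_only, indent=2))
```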

Prioritize verified bots that you don’t want to block

Since verified bots aren’t blocked by default, in most cases there is no need to apply extra logic to allow them through. However, there are scenarios where other AWS WAF rules might match some aspects of requests from verified bots and block them. That can hurt some metrics for SEO, or prevent links from your website from properly propagating and displaying in social media resources. If this is important for your business, then you might want to ensure you protect verified bots by explicitly allowing them in AWS WAF.

To prioritize the verified bots category

In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. The next steps assume you already have a Bot Control rule group enabled inside the web ACL.
In the web ACL, select Add rules, and then select Add my own rules and rule groups.
Using the provided Rule builder, configure the following settings:
1. Enter a name for the rule in the Name field.
2. Under the Inspect menu, select Has a label.
3. In the Match key field, enter the following label to match based on the label that each verified bot has:
  awswaf:managed:aws:bot-control:bot:verified
4. In the Action section, select Allow to confirm the action on a request match.
Select Add rule. It’s important to place the rule after the Bot Control rule group inside your web ACL, so that the bot:verified label is available in this custom rule. To complete this, configure the following steps:
1. In the Set rule priority section, check that the rule you just created is listed immediately after the existing Bot Control rule set. If it’s not, choose the newly created rule and select Move up or Move down until the rule is located immediately after the existing Bot Control rule set.
2. Select Save.

Figure 10: Label match rule statement in a Rule builder with a specific match key
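
If you manage the web ACL as JSON, the Move up and Move down step corresponds to renumbering rule priorities. The sketch below shows one way to place an Allow rule immediately after the Bot Control rule group. The function and rule names are hypothetical, and the rules are reduced to the fields needed for the illustration.

```python
from typing import Any, Dict, List


def insert_after(rules: List[Dict[str, Any]], anchor_name: str,
                 new_rule: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return rules with new_rule placed right after anchor_name, priorities renumbered."""
    ordered = sorted(rules, key=lambda rule: rule["Priority"])
    names = [rule["Name"] for rule in ordered]
    if anchor_name not in names:
        raise ValueError(f"rule {anchor_name!r} not found in web ACL")
    ordered.insert(names.index(anchor_name) + 1, new_rule)
    # Priorities only need to be unique and ordered; 0, 1, 2, ... is one choice.
    return [{**rule, "Priority": position} for position, rule in enumerate(ordered)]


allow_verified = {
    "Name": "allow-verified-bots",  # hypothetical rule name
    "Priority": 0,  # placeholder, overwritten by insert_after
    "Statement": {
        "LabelMatchStatement": {
            "Scope": "LABEL",
            "Key": "awswaf:managed:aws:bot-control:bot:verified",
        }
    },
    "Action": {"Allow": {}},
}

existing = [
    {"Name": "AWS-AWSManagedRulesBotControlRuleSet", "Priority": 0},
    {"Name": "some-other-blocking-rule", "Priority": 1},
]

updated = insert_after(existing, "AWS-AWSManagedRulesBotControlRuleSet", allow_verified)
print([(rule["Priority"], rule["Name"]) for rule in updated])
```

Placing the Allow rule directly after Bot Control means verified bots are let through before any later rule has a chance to block them.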

Allow a specific bot

Labels also enable you to single out the bot you don’t want to block from the category that is blocked. A common example is third-party bots that perform monitoring of your web resources.

Let’s take a look at a scenario where UptimeRobot is the specific bot that you want to allow. The bot falls into a category that’s being blocked by default: bot:category:monitoring. You can either exclude the whole category, which can have a wider impact on your resource than you want, or allow only UptimeRobot.

To explicitly allow a specific bot

Analyze CloudWatch metrics or AWS WAF logs to find the bot that is being blocked and its associated labels. Unless you want to allow the whole category, the label you're looking for starts with bot:name:. The example that follows is based on the label awswaf:managed:aws:bot-control:bot:name:uptimerobot.
From the logs, you can also verify which category the bot belongs to, which is useful for configuring Scope-down statements.
In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. For the next steps, it’s assumed that you already have a Bot Control rule group enabled inside the web ACL.
Open the Bot Control rule set in the list inside your web ACL and choose Edit.
From the list of Rules, find CategoryMonitoring and set it to Count. This prevents the default Block action for the category.
Select Scope-down statement – optional and select Scope-down statement. Then configure the following settings:
1. Inside the Scope-down statement, in the If a request menu, choose matches all the statements (AND). This will allow you to construct the complex logic necessary to block the category but allow a specified bot.
2. In the Statement 1 section under the Inspect menu select Has a label.
3. In the Match key field, enter the label of the broad category that you set to Count earlier in this procedure. In this example, it is monitoring. This configuration will keep other bots from the category blocked:
  awswaf:managed:aws:bot-control:bot:category:monitoring
4. In the Statement 2 section, select Negate statement results to allow you to exclude a specific bot.
5. Under the Inspect menu, select Has a label.
6. In the Match key field, enter the label that will uniquely identify the bot you want to explicitly allow. In this example, it’s uptimerobot with the following label:
  awswaf:managed:aws:bot-control:bot:name:uptimerobot
Choose Save rule.

Figure 11: Label match rule statement with AND logic to single out a specific bot name from a category

Note: This approach is the best practice for analyzing and, if necessary, addressing false positives. You can apply an exclusion to any bot, or multiple bots, based on the unique bot:name: label.
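
The AND and NOT logic can be sketched in JSON form as shown below. This example assumes the statement lives in a custom rule with a Block action that is evaluated after the Bot Control rule group, so that the labels already exist when the rule runs. The rule name is hypothetical, and the small matches function is only a local illustration of the logic, not how AWS WAF evaluates rules internally.

```python
from typing import Any, Dict, Set

MONITORING = "awswaf:managed:aws:bot-control:bot:category:monitoring"
UPTIMEROBOT = "awswaf:managed:aws:bot-control:bot:name:uptimerobot"


def label(key: str) -> Dict[str, Any]:
    return {"LabelMatchStatement": {"Scope": "LABEL", "Key": key}}


block_monitoring_except_uptimerobot = {
    "Name": "block-monitoring-except-uptimerobot",  # hypothetical rule name
    "Priority": 1,  # assumed to be after the Bot Control rule group
    "Statement": {
        "AndStatement": {
            "Statements": [
                label(MONITORING),
                {"NotStatement": {"Statement": label(UPTIMEROBOT)}},
            ]
        }
    },
    "Action": {"Block": {}},
}


def matches(statement: Dict[str, Any], labels: Set[str]) -> bool:
    """Evaluate the label logic above against the labels on one request."""
    if "LabelMatchStatement" in statement:
        return statement["LabelMatchStatement"]["Key"] in labels
    if "AndStatement" in statement:
        return all(matches(s, labels) for s in statement["AndStatement"]["Statements"])
    if "NotStatement" in statement:
        return not matches(statement["NotStatement"]["Statement"], labels)
    raise ValueError("statement type not covered by this sketch")


statement = block_monitoring_except_uptimerobot["Statement"]
print(matches(statement, {MONITORING, UPTIMEROBOT}))  # UptimeRobot request
print(matches(statement, {MONITORING}))               # any other monitoring bot
```

To allow several bots from the same category, you can add one negated label match per bot to the AND statement.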

Insert custom headers into requests from certain bots

There are situations when you want to further process or analyze certain requests, or implement logic that is provided by downstream systems. In such cases, you can use AWS WAF Bot Control to categorize the requests. Applications later in the process can then apply the intended logic to either a broad group of requests, such as all bots within a category, or a narrow one, such as a certain bot.

To insert a custom header

In the AWS WAF console, select Web ACLs from the left menu. Open a web ACL that you have, or follow the steps to create a web ACL. The next steps assume that you already have a Bot Control rule group enabled inside the web ACL.
Open the Bot Control rule set in the list inside your web ACL and choose Edit.
From the list of Rules set the targeted category to Count.
Choose Save rule.
In the same web ACL, choose the Add rules menu and select Add my own rules and rule groups.
Using the provided Rule builder, configure the following settings:
1. Enter a name for the rule in the Name field.
2. Under the Inspect menu, select Has a label.
3. In the Match key field, enter the label to match either a targeted category or a bot. This example uses the security category label:
  awswaf:managed:aws:bot-control:bot:category:security
4. In the Action section, select Count.
5. Open Custom request – optional and select Add new custom header
6. Enter values in the Key and Value fields that correspond to the inserted custom header key-value pair that you want to use in downstream systems. The example in Figure 12 shows this configuration.
7. Choose Add rule.
AWS WAF prefixes your custom header names with x-amzn-waf- when it inserts them, so when you add abc-category, your downstream system sees it as x-amzn-waf-abc-category.

Figure 12: AWS WAF rule action with a custom header inserted by the service

The custom rule located after Bot Control now inserts the header into any request that Bot Control labeled as coming from bots within the security category. Then the security appliance that is after AWS WAF acts on the requests based on the header, and processes them accordingly.

This implementation can serve other scenarios. For example, you can use your custom headers to tell your Origin to append headers that explicitly prevent caching of certain content, so that bots always get it from the Origin. Inserted headers are also accessible within AWS Lambda@Edge functions and CloudFront Functions, which opens up advanced processing scenarios.
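
To make both sides of this flow concrete, the following Python sketch shows the Count rule with an inserted header in JSON form, followed by a function that a downstream application could use to read it. The rule name, the abc-category header, and its value are illustrative assumptions. The downstream lookup relies on the x-amzn-waf- prefix described earlier.

```python
from typing import Any, Dict, Mapping, Optional

HEADER_PREFIX = "x-amzn-waf-"  # added by AWS WAF to inserted header names

tag_security_bots = {
    "Name": "tag-security-bots",  # hypothetical rule name
    "Priority": 1,  # assumed to be after the Bot Control rule group
    "Statement": {
        "LabelMatchStatement": {
            "Scope": "LABEL",
            "Key": "awswaf:managed:aws:bot-control:bot:category:security",
        }
    },
    "Action": {
        "Count": {
            "CustomRequestHandling": {
                "InsertHeaders": [{"Name": "abc-category", "Value": "security"}]
            }
        }
    },
}


def inserted_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Return the value of a WAF-inserted header, or None when it's absent.

    Header names are compared case-insensitively.
    """
    wanted = (HEADER_PREFIX + name).lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


# Hypothetical request headers as seen by the downstream system.
request_headers = {"Host": "example.com", "X-Amzn-Waf-Abc-Category": "security"}
print(inserted_header(request_headers, "abc-category"))
```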

Conclusion

This post describes the primary building blocks for using Bot Control, and how you can combine and customize them to address different scenarios. It’s not an exhaustive list of the use cases that Bot Control can be fine-tuned for, but hopefully the examples provided here inspire and provide you with ideas for other implementations.

If you already have AWS WAF associated with any of your web-facing resources, you can view current bot traffic estimates for your applications based on a sample of requests currently processed by the service. Visit the AWS WAF console to view the bot overview dashboard. That’s a good starting point to consider implementing learnings from this blog to improve your bot protection.

It is early days for the feature, and it will keep gaining more capabilities. Stay tuned!

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on AWS WAF re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.
