34.2 WAF Rate-Based Rules and Bot Control

Alright, let’s talk about stopping the digital barbarians at the gate without slowing down your actual users to a crawl. This is where WAF’s Rate-Based Rules (RBRs) and the paid-upgrade Bot Control come in. Think of RBRs as the bouncer who counts how many times you’ve tried to get in, and Bot Control as the bouncer with a fancy gadget that can spot a fake ID from a mile away.

The Brutal Simplicity of Rate-Based Rules

An RBR is gloriously simple. In its basic form it counts requests from a single IP address over a rolling window, 5 minutes by default. If that IP exceeds the limit you set, the rule’s action (usually block) applies to it until its request rate drops back under the limit; it doesn’t get a clean slate just because the clock ticked over. It’s the “stop hitting yourself” of web security. You define the limit, and AWS handles the counting and the temporary blocklisting.

Why is this your first line of defense? Because it’s cheap, effective, and takes the sting out of brute-force attacks, single-source credential stuffing, and basic scrapers. The key thing to remember is that it counts only the requests that fall within the scope of your rule. You can count every request, or only those that match a scope-down statement nested inside the rule, which is incredibly powerful.

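Here’s a Terraform example of the strict half of that idea: an RBR that allows only 100 requests per 5 minutes to your login endpoint, because that’s where the bad guys love to focus. A more generous limit for everything else (say 2000 requests per 5 minutes) would be a second rule in the same web ACL.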

resource "aws_wafv2_web_acl" "main" {
  name        = "my-web-acl"
  scope       = "REGIONAL" # For ALB/API Gateway
  description = "Main Web ACL with RBR"

  # Nested blocks can't share a line with their parent in HCL
  default_action {
    allow {}
  }

  rule {
    name     = "LoginRateLimit"
    priority = 10

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 100
        aggregate_key_type = "IP"

        # This is the magic: only count requests to /login
        scope_down_statement {
          byte_match_statement {
            field_to_match {
              uri_path {}
            }
            positional_constraint = "STARTS_WITH"
            search_string         = "/login"
            text_transformation {
              priority = 0
              type     = "NONE"
            }
          }
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "LoginRateLimit"
      sampled_requests_enabled   = true
    }
  }

  # ... other rules ...

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "MyWebACL"
    sampled_requests_enabled   = true
  }
}

The scope_down_statement is what makes this precise. Without it, you’re counting every request from that IP, which might accidentally block a legitimate user behind a NAT gateway (like an office network).
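
The broader 2000-request limit mentioned before the example isn’t shown in the web ACL above. A minimal sketch of what it could look like follows; the rule name, metric name, and priority are assumptions chosen for illustration, and the block would sit next to LoginRateLimit inside the same aws_wafv2_web_acl resource.

```hcl
  # Hypothetical catch-all rule (name and priority are illustrative).
  # Place it inside resource "aws_wafv2_web_acl" "main", beside LoginRateLimit.
  rule {
    name     = "GeneralRateLimit"
    priority = 11 # must be unique within the web ACL

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 2000
        aggregate_key_type = "IP"
        # No scope_down_statement here: every request from the IP counts.
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "GeneralRateLimit"
      sampled_requests_enabled   = true
    }
  }
```

Because the login rule has the lower priority number, it is evaluated first, so a client hammering /login is stopped by the tight limit long before the loose one matters.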

The Common RBR Pitfall: The NAT Problem

This is the big one. You set a limit of 2000 requests, thinking it’s impossibly high for one user. Then you get a frantic call from an entire office building whose traffic all comes from one public IP. They’ve been blocked. Whoops.

The mitigation is to use scope_down_statement to be more selective (like only counting POSTs to /login) and to set limits higher than you think a single user could generate, but lower than a scraper would. It’s a balancing act. Monitor your blocks in CloudWatch like a hawk after you turn this on.
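
To make “only counting POSTs to /login” concrete, here is a sketch of a tighter variant of the login rule that combines two conditions with an and_statement. Treat it as one possible shape rather than the definitive one: the rule name and metric name are made up for this example, and it is meant to replace the LoginRateLimit rule rather than sit alongside it.

```hcl
  # Hypothetical replacement for LoginRateLimit: count only POSTs to /login.
  rule {
    name     = "LoginPostRateLimit"
    priority = 10

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 100
        aggregate_key_type = "IP"

        scope_down_statement {
          and_statement {
            # Condition 1: the HTTP method is POST
            statement {
              byte_match_statement {
                field_to_match {
                  method {}
                }
                positional_constraint = "EXACTLY"
                search_string         = "POST"
                text_transformation {
                  priority = 0
                  type     = "NONE"
                }
              }
            }
            # Condition 2: the path starts with /login
            statement {
              byte_match_statement {
                field_to_match {
                  uri_path {}
                }
                positional_constraint = "STARTS_WITH"
                search_string         = "/login"
                text_transformation {
                  priority = 0
                  type     = "NONE"
                }
              }
            }
          }
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "LoginPostRateLimit"
      sampled_requests_enabled   = true
    }
  }
```

With this scope, an office full of people loading the login page (GET) doesn’t eat into the budget; only actual login attempts do.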

Bringing Out the Big Guns: Bot Control

Now, RBRs are dumb. They see an IP, they count. They don’t know if it’s a human, a good bot (Googlebot), or a malicious script. This is where Bot Control comes in. It’s an AWS managed rule set you pay extra for that actually inspects the request patterns and signatures to classify the client.

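It tags requests with labels in the awswaf:managed:aws:bot-control: namespace, such as awswaf:managed:aws:bot-control:bot:category:scraping_framework for a bot category or awswaf:managed:aws:bot-control:bot:verified for a bot whose identity AWS could confirm. Your job is then to create rules that handle these labels. You can block unwanted bots outright, force good bots to use a CAPTCHA (which is hilarious to do to Googlebot, please don’t), or simply monitor them.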

Why use it? Because it’s intelligent. It can spot sneaky bots that stay under your rate limit. It’s your fancy ID-spotting bouncer.

rule {
  name     = "AWS-AWSManagedRulesBotControlRuleSet"
  priority = 20

  override_action {
    # Count instead of none: with none {}, the rule group's own block actions
    # terminate evaluation before our label rules below ever run.
    count {}
  }

  statement {
    managed_rule_group_statement {
      name        = "AWSManagedRulesBotControlRuleSet"
      vendor_name = "AWS"
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "AWSManagedRulesBotControlRuleSet"
    sampled_requests_enabled   = true
  }
}

# Now, a custom rule to block confirmed bad bots
rule {
  name     = "Block-Bad-Bots"
  priority = 21

  action {
    block {
      custom_response {
        response_code = 403
        response_header {
          name  = "Content-Type"
          value = "application/json"
        }
        # Must match the key of a custom_response_body block defined on the web ACL
        custom_response_body_key = "Bot-Block-Message"
      }
    }
  }

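  statement {
    label_match_statement {
      scope = "LABEL"
      # Bot Control has no single "bad bot" label; match a category you don't want.
      key   = "awswaf:managed:aws:bot-control:bot:category:scraping_framework"
    }
  }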

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "Block-Bad-Bots"
    sampled_requests_enabled   = true
  }
}
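
The Block-Bad-Bots rule above points at a response body by key, and that body has to be defined on the web ACL itself. A minimal sketch, assuming a JSON error payload whose wording is just a placeholder:

```hcl
  # Goes directly inside resource "aws_wafv2_web_acl" "main", next to the rule blocks.
  custom_response_body {
    key          = "Bot-Block-Message" # referenced by custom_response_body_key
    content_type = "APPLICATION_JSON"
    # Placeholder message; jsonencode keeps the quoting honest.
    content      = jsonencode({ error = "Automated traffic is not allowed." })
  }
```

Without this block, the rule references a key that doesn’t exist and the web ACL won’t deploy.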

The Bot Control “Gotcha”

The cost. It’s not just the subscription fee for the rule group itself; you also pay for each request that Bot Control inspects. If you have a high-traffic site, this adds up quickly. Use a scope-down statement within the Bot Control rule itself so it only runs on requests you’re suspicious of. Don’t run it on your static CSS and image files; that’s just burning money. Run it on POST endpoints, admin panels, and API routes.
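
Here is a sketch of what that scoping could look like. It shows only the statement block of the Bot Control rule, and the /admin and /api path prefixes are assumptions; swap in whatever your application actually exposes.

```hcl
  # Replacement for the statement block of the Bot Control rule.
  statement {
    managed_rule_group_statement {
      name        = "AWSManagedRulesBotControlRuleSet"
      vendor_name = "AWS"

      # Only requests matching at least one condition are sent to Bot Control.
      scope_down_statement {
        or_statement {
          # Any POST request
          statement {
            byte_match_statement {
              field_to_match {
                method {}
              }
              positional_constraint = "EXACTLY"
              search_string         = "POST"
              text_transformation {
                priority = 0
                type     = "NONE"
              }
            }
          }
          # Admin panel (assumed path prefix)
          statement {
            byte_match_statement {
              field_to_match {
                uri_path {}
              }
              positional_constraint = "STARTS_WITH"
              search_string         = "/admin"
              text_transformation {
                priority = 0
                type     = "NONE"
              }
            }
          }
          # API routes (assumed path prefix)
          statement {
            byte_match_statement {
              field_to_match {
                uri_path {}
              }
              positional_constraint = "STARTS_WITH"
              search_string         = "/api"
              text_transformation {
                priority = 0
                type     = "NONE"
              }
            }
          }
        }
      }
    }
  }
```

Requests that fall outside the scope-down statement skip the rule group entirely, so they pick up no Bot Control labels and no per-request charge.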

The combination is what makes it powerful: use a broad, cheap RBR to catch the obvious flood, and then let the smarter, more expensive Bot Control analyze the trickle that gets through. It’s a layered defense, and when tuned correctly, it’s brutally effective without making your real users want to throw their laptop out a window.