24.4 Routing Policies: Simple, Weighted, Latency, Geolocation, Geoproximity, Failover, Multivalue

Alright, let’s talk about how you tell traffic where to go. Route 53’s routing policies are the brains of the operation. They’re how you answer the fundamental question: “When someone types in myawesomeapp.com, which of my seventeen servers spread across the globe should actually get this request?” The answer is rarely “just pick one,” so AWS gives you a toolbox of policies, each with its own particular brand of cleverness. Let’s crack it open.

Simple: The One-Trick Pony

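This is the policy you use when you have a single resource that answers for a domain, like one web server or one CloudFront distribution. It’s DNS 101. You ask for www.example.com, you get the record’s value back, with no decision logic involved. (Put several IP addresses in that one record and Route 53 returns all of them, in random order.) Simple records are also where most people first meet alias records, which are Route 53’s party trick for mapping a record to another AWS resource (like an ELB or an S3 website endpoint) without you having to worry about its actual, ever-changing IP address. Aliases aren’t exclusive to Simple routing; they work with the other policies too. The beauty of an alias record is that Route 53 responds with the resource’s IPs automatically. Queries against aliases that point at AWS resources are free, and unlike a CNAME, an alias works at the zone apex (example.com itself).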

resource "aws_route53_record" "www" {
  zone_id = aws_route53_zone.primary.zone_id
  name    = "www.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.front_end.dns_name
    zone_id                = aws_lb.front_end.zone_id
    evaluate_target_health = true
  }
}

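See that evaluate_target_health? Set it to true unless you have a specific reason not to. It lets Route 53 take the load balancer’s own health into account, with no separate health check to configure. With a lone Simple record there is nothing else to answer with, so the payoff comes once you have several records (Weighted, Latency, Failover) and Route 53 can skip the sick one. This is your first, easiest line of defense.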

Weighted: The Load Balancer (Sort Of)

When “Simple” is too simple, you graduate to Weighted routing. You create multiple records with the same name and type but assign each a relative weight (an integer from 0 to 255). Route 53 picks a record for each query with probability equal to its weight divided by the sum of all the weights, so weights of 90 and 10 mean roughly 90% and 10% of responses. It’s fantastic for blue-green deployments, canary releases, or just sending 1% of traffic to a test environment.

The catch? This is DNS-based load balancing. A client or their resolver will cache the DNS response (TTL, remember?) and keep hitting the same IP until that cache expires. It’s not a true, per-connection load balancer. Don’t expect perfectly smooth, real-time distribution.

{
    "Comment": "Send 90% to prod, 10% to canary",
    "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com",
            "Type": "A",
            "SetIdentifier": "prod",
            "Weight": 90,
            "TTL": 60,
            "ResourceRecords": [{"Value": "192.0.2.1"}]
        }},
        {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "app.example.com",
            "Type": "A",
            "SetIdentifier": "canary",
            "Weight": 10,
            "TTL": 60,
            "ResourceRecords": [{"Value": "192.0.2.2"}]
        }}
    ]
}

Latency: The Need-for-Speed

This one is pure user experience magic. You deploy identical stacks in us-east-1 and ap-southeast-1, create a Latency-based record for each, and Route 53 will answer a user’s query with the IP of the region that provides the lowest network latency from the user’s requesting DNS resolver. It’s not measuring latency to the user’s browser directly, but it’s a darn good approximation.

The genius here is that it’s automatic. You don’t need a map and a spreadsheet to figure out which region is best for someone in Lisbon. The key is that your endpoints must be functionally identical. You can’t have a user in Europe getting a low-latency connection to a US endpoint that’s missing all the features you’ve deployed in Frankfurt.
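Here is a minimal Terraform sketch of that two-region setup. It assumes two load balancers, aws_lb.us_east and aws_lb.ap_southeast, defined elsewhere (those names are made up for illustration), and it reuses the aws_route53_zone.primary zone from the earlier example. The part that matters is the latency_routing_policy block: it tells Route 53 which region each record stands for.

```hcl
# Hypothetical resources: aws_lb.us_east and aws_lb.ap_southeast are
# placeholders for identical stacks deployed in each region.
resource "aws_route53_record" "app_us_east" {
  zone_id        = aws_route53_zone.primary.zone_id
  name           = "app.example.com"
  type           = "A"
  set_identifier = "us-east-1" # must be unique among records sharing this name and type

  latency_routing_policy {
    region = "us-east-1" # the region this record represents
  }

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "app_ap_southeast" {
  zone_id        = aws_route53_zone.primary.zone_id
  name           = "app.example.com"
  type           = "A"
  set_identifier = "ap-southeast-1"

  latency_routing_policy {
    region = "ap-southeast-1"
  }

  alias {
    name                   = aws_lb.ap_southeast.dns_name
    zone_id                = aws_lb.ap_southeast.zone_id
    evaluate_target_health = true
  }
}
```

With evaluate_target_health on, an unhealthy region should drop out of the running and its users get the next-best region instead.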

Geolocation: The Border Patrol

If Latency routing is about performance, Geolocation routing is about control. You want to send users in France to your French compliance stack, or ensure users in California see the legally-mandated Prop 65 warning. You create records and specify which continent, country, or US state each one serves. When several records match, the most specific location wins: state beats country, and country beats continent.

The big “gotcha”: What if you get a request from a location you didn’t define? Or from an IP that geolocation databases haven’t mapped correctly? You must create a default record (a record whose location is set to “default” rather than any particular place) to catch these, or Route 53 will simply return a “no answer” response and you’ve just broken the internet for someone in a remote village. Don’t be that person.
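A rough Terraform sketch of those three cases follows: a country rule, a US state rule, and the catch-all default. The IP addresses are documentation placeholders and the resource names are invented for illustration; the zone reference is the one from the earlier example. In the Terraform AWS provider, a country value of "*" is how you mark the default record.

```hcl
# France goes to the (hypothetical) French compliance stack.
resource "aws_route53_record" "app_france" {
  zone_id        = aws_route53_zone.primary.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "france"
  records        = ["192.0.2.10"] # placeholder address

  geolocation_routing_policy {
    country = "FR"
  }
}

# California gets its own endpoint; subdivision only applies to US states.
resource "aws_route53_record" "app_california" {
  zone_id        = aws_route53_zone.primary.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "california"
  records        = ["192.0.2.20"] # placeholder address

  geolocation_routing_policy {
    country     = "US"
    subdivision = "CA"
  }
}

# The default record: catches every location not matched above,
# including IPs that could not be mapped to any location.
resource "aws_route53_record" "app_default" {
  zone_id        = aws_route53_zone.primary.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "default"
  records        = ["192.0.2.30"] # placeholder address

  geolocation_routing_policy {
    country = "*"
  }
}
```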

Geoproximity: The Sophisticated Compass

This is Geolocation’s more complex, data-driven cousin. It routes based on the geographic location of your resources and your users, and it lets you bias traffic. Have a resource in Virginia and another in Dublin? By default each query goes to whichever is geographically closer; a positive bias (up to 99) on Dublin expands the area it serves and pulls in users who would otherwise land on Virginia, while a negative bias (down to -99) shrinks it. The Route 53 Traffic Flow visual editor draws the resulting boundaries on a map, and it feels like it was designed by someone who has actually used a map before.

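The cost: Geoproximity used to be available only through Route 53 Traffic Flow, which bills a monthly fee for every policy record you create. Route 53 now also accepts geoproximity as a routing policy on ordinary records, so Traffic Flow is only needed if you want the visual editor and versioned policies. Use geoproximity when you run multiple AWS regions (or non-AWS locations identified by latitude and longitude) and the simple “country” rules of Geolocation aren’t precise enough.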
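For the Virginia and Dublin case, a sketch using plain records might look like the following. Route 53 accepts geoproximity as a routing policy on ordinary records as well as through Traffic Flow, and this sketch assumes an AWS provider release recent enough to include the geoproximity_routing_policy block on aws_route53_record. The IP addresses and the bias value of 25 are arbitrary placeholders, so treat this as a starting point and check the block against your provider version.

```hcl
# Virginia endpoint with no bias: serves whoever is geographically closest.
resource "aws_route53_record" "app_virginia" {
  zone_id        = aws_route53_zone.primary.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "virginia"
  records        = ["192.0.2.1"] # placeholder address

  geoproximity_routing_policy {
    aws_region = "us-east-1"
    bias       = 0
  }
}

# Dublin endpoint with a positive bias: its catchment area grows,
# pulling in users who would otherwise be routed to Virginia.
resource "aws_route53_record" "app_dublin" {
  zone_id        = aws_route53_zone.primary.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "dublin"
  records        = ["192.0.2.2"] # placeholder address

  geoproximity_routing_policy {
    aws_region = "eu-west-1"
    bias       = 25 # illustrative value; valid range is -99 to 99
  }
}
```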

Failover: The Plan B

This is the one you hope you never need but are ecstatic to have. You create a primary record and a secondary record. Route 53 health checks the primary. If it passes, everyone goes there. If it fails, Route 53 fails over and starts sending everyone to the secondary. This is non-negotiable for any serious production system. The health checks are the key: you can make them as simple (is the endpoint responding on port 80?) or as complex (is the /health endpoint returning a 200 and the word “happy” in the body?) as you need.
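A minimal Terraform sketch of that primary/secondary pair, using the “happy” string match as the health check. The hostname primary.example.com, the IP addresses, and the threshold and interval values are all assumptions chosen for illustration.

```hcl
# Health check: GET http://primary.example.com/health must return a
# successful response whose body contains the string "happy".
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com" # hypothetical endpoint
  port              = 80
  type              = "HTTP_STR_MATCH"
  resource_path     = "/health"
  search_string     = "happy"
  failure_threshold = 3  # consecutive failures before marking unhealthy
  request_interval  = 30 # seconds between checks
}

# Primary record: served for as long as the health check passes.
resource "aws_route53_record" "app_primary" {
  zone_id         = aws_route53_zone.primary.zone_id
  name            = "app.example.com"
  type            = "A"
  ttl             = 60
  set_identifier  = "primary"
  records         = ["192.0.2.1"] # placeholder address
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# Secondary record: served only when the primary is unhealthy.
resource "aws_route53_record" "app_secondary" {
  zone_id        = aws_route53_zone.primary.zone_id
  name           = "app.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "secondary"
  records        = ["192.0.2.2"] # placeholder address

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

The short TTL matters here: resolvers that cached the primary’s address keep using it until the TTL runs out, so the TTL puts a floor under how quickly failover can take effect.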

Multivalue: The Simple Failover

Think of Multivalue as a slightly smarter Simple routing that plays well with health checks. You create a separate record for each resource under the same name, each with its own optional health check, and Route 53 answers every query with up to eight healthy records in a random order. If a resource fails its health check, it gets removed from the list of answers.

It’s like Weighted routing without the weights, except that each answer can carry several records instead of one. It’s not a substitute for a proper load balancer, but it’s a decent way to get basic fault tolerance and some level of load distribution at the DNS level without any other infrastructure. Just remember the DNS caching caveat. It’s a step up from Simple, but it’s not a full Failover replacement.
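One way to sketch this in Terraform is with for_each over a map of servers, so each server gets its own health check and its own multivalue record. The server names, the placeholder IP addresses, and the /health path are assumptions for illustration.

```hcl
locals {
  # Hypothetical servers; 192.0.2.x are documentation addresses,
  # so swap in real endpoints before the health checks can pass.
  servers = {
    "server-a" = "192.0.2.1"
    "server-b" = "192.0.2.2"
    "server-c" = "192.0.2.3"
  }
}

# One HTTP health check per server.
resource "aws_route53_health_check" "server" {
  for_each          = local.servers
  ip_address        = each.value
  port              = 80
  type              = "HTTP"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

# One multivalue record per server, each tied to its own health check.
resource "aws_route53_record" "app" {
  for_each                         = local.servers
  zone_id                          = aws_route53_zone.primary.zone_id
  name                             = "app.example.com"
  type                             = "A"
  ttl                              = 60
  set_identifier                   = each.key
  records                          = [each.value]
  health_check_id                  = aws_route53_health_check.server[each.key].id
  multivalue_answer_routing_policy = true
}
```

Adding a fourth server is then a one-line change to the map, which is most of the appeal of this layout.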

The real skill isn’t knowing each policy in isolation; it’s knowing how to chain them together with health checks to build a resilient, performant, and global application. Now go route some traffic.