9.4 Network Load Balancer: Ultra-Low Latency TCP/UDP at Layer 4
Right, so you’ve decided you need raw, unfiltered performance for your TCP or UDP traffic. You’re not messing around with HTTP headers or cookie-based stickiness. You need packets to fly from your users to your instances with as little fuss and overhead as possible. Enter the Network Load Balancer (NLB). This is the tool you call when every millisecond counts and you need to handle a tidal wave of traffic without breaking a sweat.
Think of the NLB as a highly sophisticated network traffic cop operating at the connection level (Layer 4). It doesn’t care what’s inside the packets; it just routes them efficiently based on IP addresses, ports, and protocol. It’s a pass-through beast. A client connects to the NLB, the NLB picks a healthy target from its target group, and it forwards that TCP connection (or UDP flow) to the target without terminating it. The magic, and the complexity, comes from the fact that the client’s source IP is preserved. This is the default for targets registered by instance ID; for targets registered by IP address over TCP it is off by default and controlled by a target group attribute. That’s right, your application instances see the real client IP, not the IP of some intermediary. This is fantastic for things like whitelisting, geolocation, or just plain old logging, but it also means your security groups have to be configured correctly, a pitfall we’ll get to in a minute.
How It Actually Routes Traffic
The NLB operates on a flow hash algorithm. When a new TCP connection or UDP flow arrives, it takes the protocol, source IP, source port, destination IP, and destination port (plus the TCP sequence number for TCP), runs them through a hashing algorithm, and uses the result to pick a target in the target group. This means that all packets for that specific connection will always go to the same target. It’s not round-robin in the traditional sense; it’s consistent for the life of the connection. This is perfect for long-lived connections, like those used by gaming or chat applications. A new connection from the same client comes from a different source port, so it can land on a different target, but a single client pushing millions of requests down one long-lived connection will hammer the same instance. That’s not a flaw, it’s by design, and you need to design your application with this in mind.
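To make the idea concrete, here is a toy flow hash you can evaluate with `terraform console` or `terraform apply` in an empty directory. It is only a sketch: AWS does not publish its real hash function, so the SHA-256 step, the target list, and every address below are assumptions made up for illustration.

```hcl
locals {
  # Hypothetical targets and flow; none of these values come from AWS.
  targets = ["10.0.1.10", "10.0.1.11", "10.0.2.10"]

  flow = {
    protocol = "tcp"
    src_ip   = "203.0.113.7"
    src_port = 51514
    dst_ip   = "198.51.100.20"
    dst_port = 80
  }

  # Build one string from the flow tuple.
  flow_key = "${local.flow.protocol}|${local.flow.src_ip}|${local.flow.src_port}|${local.flow.dst_ip}|${local.flow.dst_port}"

  # Hash the key and read the first 8 hex digits as an integer.
  flow_hash = parseint(substr(sha256(local.flow_key), 0, 8), 16)

  # The same tuple always maps to the same index, so the flow sticks to one target.
  chosen_target = local.targets[local.flow_hash % length(local.targets)]
}

output "chosen_target" {
  value = local.chosen_target
}
```

Change `src_port` and the chosen target may change too, which is why separate connections from one client can spread across targets while a single connection never does.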
The Gotcha: Security Groups and Preserved IPs
Here’s the first “oh crap” moment people run into. Because the NLB preserves the client IP, the traffic hitting your EC2 instance appears to come from anywhere on the internet. Your instance’s security group must allow traffic from those client IPs (or 0.0.0.0/0) on the port the instance listens on, not the NLB’s listener port. This is the most common misconfiguration. You open port 80 on the NLB but forget that your instance’s security group only allows traffic from the NLB’s private IPs. That rule lets the health checks through, so the targets look healthy, while real client traffic is silently dropped. As far as the instance is concerned, the traffic isn’t from the NLB; it’s from the client.
Let’s say your NLB is listening on port 80, but your application runs on port 8080. You register your instances with the target group on port 8080. The security group for your instances needs to allow TCP traffic on port 8080 from the client IPs (e.g., 0.0.0.0/0), not port 80.
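A minimal sketch of that instance security group in Terraform might look like the following. The resource name `app_instances` is hypothetical, `aws_vpc.main` is assumed to be defined elsewhere, and the wide-open `0.0.0.0/0` rule only makes sense for a public service; narrow it to known client ranges if you can.

```hcl
resource "aws_security_group" "app_instances" {
  name        = "app-instances"
  description = "Allow client traffic forwarded by the NLB"
  vpc_id      = aws_vpc.main.id

  # Open the application port (8080), not the NLB listener port (80).
  # The source is the real client IP, so the rule must cover the clients.
  ingress {
    description = "Client traffic on the application port"
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow all outbound traffic.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Because the rule covers every source address, it also admits the NLB’s health checks, which arrive from the load balancer’s private IPs inside the VPC.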
The Power of Static IPs and Zonal Isolation
This is a killer feature. Unlike its older siblings, an NLB provides one static IP address per Availability Zone. For an internet-facing NLB, you can even assign your own Elastic IPs. This is a godsend for scenarios where you need to whitelist specific IP addresses on a third-party service’s firewall. They won’t change on you. Furthermore, you can enable cross-zone load balancing, but it’s disabled by default. Why? Presumably because the AWS architects decided that if you’re using an NLB, you’re a control freak who wants to keep traffic inside a zone to minimize latency and cross-AZ data transfer charges, and who knows exactly how your targets are distributed across zones. With it disabled, each NLB node only routes traffic to targets in its own Availability Zone. This can be more efficient, but it puts the onus on you to ensure your targets are evenly distributed. I usually turn it on unless I have a very specific reason not to.
Let’s Build One: The Terraform Way
Enough theory. Let’s build one with Terraform, because clicking through the AWS console is for chumps. This example sets up an NLB for a TCP service on port 80, forwarding to instances on port 8080, with cross-zone load balancing enabled.
```hcl
resource "aws_lb" "network_lb" {
  name               = "my-hardcore-nlb"
  internal           = false
  load_balancer_type = "network"

  # Enable cross-zone load balancing because we're not maniacs
  enable_cross_zone_load_balancing = true

  # Subnets are crucial - one per AZ for redundancy
  subnet_mapping {
    subnet_id = aws_subnet.public_a.id
  }

  subnet_mapping {
    subnet_id     = aws_subnet.public_b.id
    allocation_id = aws_eip.nlb_ip.id # Assign an EIP to this AZ
  }
}

resource "aws_lb_target_group" "nlb_tg" {
  name     = "nlb-tcp-targets"
  port     = 8080 # Your application's port
  protocol = "TCP"
  vpc_id   = aws_vpc.main.id

  # A TCP health check passes when the connection opens; no HTTP status codes involved.
  # (HTTP and HTTPS health checks are also available if your service speaks HTTP.)
  health_check {
    protocol = "TCP"
    port     = "traffic-port"
    interval = 30
  }
}

resource "aws_lb_listener" "front_end" {
  load_balancer_arn = aws_lb.network_lb.arn
  port              = 80
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.nlb_tg.arn
  }
}

# Attach your instances (e.g., an ASG) to the target group
resource "aws_autoscaling_attachment" "asg_attachment" {
  autoscaling_group_name = aws_autoscaling_group.app.name
  lb_target_group_arn    = aws_lb_target_group.nlb_tg.arn
}
```
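The configuration above references `aws_eip.nlb_ip` without defining it. A minimal sketch of that resource, plus an output for the NLB’s DNS name, could look like this. The tag value and the output name are made up for illustration, and `domain = "vpc"` assumes a recent AWS provider (older provider versions used `vpc = true` instead).

```hcl
# Elastic IP that the NLB uses as its static address in one AZ.
resource "aws_eip" "nlb_ip" {
  domain = "vpc" # assumption: recent AWS provider; older versions used `vpc = true`

  tags = {
    Name = "nlb-static-ip" # hypothetical tag
  }
}

# Hand this address to anyone who needs to whitelist your load balancer.
output "nlb_static_ip" {
  value = aws_eip.nlb_ip.public_ip
}

# Clients normally connect through the DNS name rather than the raw IP.
output "nlb_dns_name" {
  value = aws_lb.network_lb.dns_name
}
```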
When to Use It (And When to Run Away)
Use an NLB for:
- Extreme performance and ultra-low latency, with far less per-connection overhead than a Layer 7 proxy.
- TCP or UDP streaming applications (video, gaming, financial data).
- Handling volatile and massive traffic spikes—NLBs are designed to scale almost instantly.
- Needing static IP addresses for your load balancer.
Avoid it for:
- HTTP/HTTPS traffic where you need path-based routing, host-based routing, or advanced request manipulation. For that, you want the Application Load Balancer. The NLB is a blunt instrument; the ALB is a scalpel.
- SSL offloading for web applications. An NLB can terminate TLS with a TLS listener, but it’s a different beast, and the Application Load Balancer is often a better fit for that use case.
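If you do decide to terminate TLS on the NLB, the listener might be sketched as below. This assumes the `aws_lb.network_lb` and `aws_lb_target_group.nlb_tg` resources from the earlier example; `aws_acm_certificate.example` is a hypothetical certificate you would define yourself, and the security policy shown is just one of the predefined ones, so pick whichever matches your requirements.

```hcl
resource "aws_lb_listener" "tls_front_end" {
  load_balancer_arn = aws_lb.network_lb.arn
  port              = 443
  protocol          = "TLS"

  # Hypothetical ACM certificate; not defined in this chapter.
  certificate_arn = aws_acm_certificate.example.arn
  ssl_policy      = "ELBSecurityPolicy-TLS13-1-2-2021-06"

  # The NLB decrypts the traffic and forwards plain TCP to the targets.
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.nlb_tg.arn
  }
}
```

Note that with a TLS listener the NLB terminates the client connection itself, so it is no longer a pure pass-through for that listener.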
The NLB is a precision tool. It doesn’t hold your hand. It assumes you know what you’re doing with your network and your application. Get it right, and it’s an absolute workhorse. Get the security groups wrong, and you’ll be staring at connection timeouts for hours (security groups drop packets silently rather than refusing them), wondering what you did to deserve this. I’ve been there. Learn from my pain.