Right, let’s talk about what happens when you decide to fire a server. It’s not as simple as just yanking the plug. If you do that, you’re a monster, and you’ll leave a trail of confused users and failed requests behind you. This is where Connection Draining (the Classic Load Balancer setting) and its slightly more nuanced sibling, Deregistration Delay (the target group setting used by Application and Network Load Balancers), come in. Think of it as the polite way to tell your instances, “Hey, you’re fired, but finish what you’re doing first.”

When you deregister an instance from a target group or from a Classic Load Balancer, the load balancer stops sending new requests to it. That’s step one. But what about the requests already in flight? The ones that are mid-calculation, waiting for a database response, or slowly streaming a large file to a user? This feature gives those requests a fighting chance to complete. It’s the difference between a graceful shutdown and a digital drive-by shooting.

How It Actually Works (The Polite Countdown)

You set a timeout value: 0 to 3600 seconds for an ALB or NLB target group, 1 to 3600 seconds for Classic Load Balancer connection draining. The default is a painfully slow 300 seconds, which is five minutes. I’ll get to why that’s often insane in a moment.

Once you trigger deregistration, the load balancer stops sending the instance new requests but keeps its existing connections open. On an ALB or NLB, the target moves to the “draining” state, which is the literal state name the API reports; a Classic Load Balancer does the same thing under its Connection Draining setting. Then the load balancer starts a timer. Existing requests keep processing. If they finish before the timer runs out, great: the connections close cleanly. If they’re still running when the timer hits zero? Poof. The load balancer closes the connection, the user gets an error, and you get a frustrated ticket. (On an NLB, that forced close only happens if connection termination is enabled on the target group; otherwise an active connection can outlive the delay.) The goal is to set the delay long enough to cover your longest-running common request, but not so long that a broken instance hangs around for an eternity.

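Here’s the thing people miss: this timer is a maximum, not a minimum. If all in-flight requests finish in 2 seconds, deregistration completes in 2 seconds; it doesn’t wait the full 300. (One wrinkle on ALB and NLB target groups: the console and API can keep showing the target as “draining” until the full delay expires, even after the real work is done.) You’re setting an upper bound for how long you’re willing to wait for a clean shutdown.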
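To make the “maximum, not a minimum” point concrete, here’s a toy model of the countdown. It is not AWS code and never talks to a load balancer. drain_outcome is a hypothetical helper that takes the timeout plus the remaining seconds of each in-flight request (whole seconds, to keep the shell arithmetic simple) and reports when draining ends and how many requests get cut off.

```bash
# Toy model of the drain countdown (hypothetical helper, not an AWS API).
# $1: timeout in seconds; remaining args: seconds left on each in-flight request.
drain_outcome() {
  local timeout="$1"
  shift
  local longest=0 cut=0 d
  for d in "$@"; do
    # Track the slowest in-flight request.
    if [ "$d" -gt "$longest" ]; then longest="$d"; fi
    # Anything still running when the timer fires gets its connection closed.
    if [ "$d" -gt "$timeout" ]; then cut=$((cut + 1)); fi
  done
  # The timer is an upper bound: draining ends when the last request finishes
  # or when the timeout fires, whichever comes first.
  local ends=$(( longest < timeout ? longest : timeout ))
  echo "draining ends after ${ends}s, ${cut} request(s) cut off"
}
```

With a 300-second timeout and requests that need 2 and 1 more seconds, drain_outcome 300 2 1 reports the drain ending after 2 seconds with nothing cut off. Shrink the timeout to 30 with a 45-second straggler (drain_outcome 30 5 45) and it ends at 30 seconds with one request dropped.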

The Default is a Trap (And How to Fix It)

AWS sets the default deregistration delay to 300 seconds. 300! That’s five minutes. This was probably chosen by a well-meaning engineer who assumed you’d have some horrendously long-lived connections. For 99% of us running web services with request times measured in milliseconds, this is absurd. It means that when, say, Auto Scaling replaces an instance that failed its health check, the instance is deregistered first and can sit there draining for up to five whole minutes before it’s finally terminated. During this time it’s still finishing some traffic (the in-flight stuff), which can be confusing as hell when you’re trying to get a faulty instance out of rotation.

Do yourself a favor and change it. For a typical JSON API, start with something like 30 seconds. For most web apps, you can probably get away with 10-20 seconds. You need to profile your application. Find the 99th or 99.9th percentile request duration and add a comfortable buffer. That’s your new value.
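If you want to turn “p99 plus a buffer” into an actual number, a few lines of shell will do. This sketch assumes you’ve already exported one request duration per line, in seconds, from your logs (ALB access logs record a per-request target processing time, for example). suggest_delay is a hypothetical helper, and its default percentile and buffer are only starting points.

```bash
# Hypothetical helper: suggest a deregistration delay from observed durations.
# stdin: one request duration in seconds per line (fractions allowed).
# $1: percentile (default 99), $2: buffer in seconds (default 5).
suggest_delay() {
  local percentile="${1:-99}" buffer="${2:-5}"
  LC_ALL=C sort -n | awk -v p="$percentile" -v b="$buffer" '
    { v[NR] = $1 + 0 }
    END {
      if (NR == 0) exit 1                  # no data, no suggestion
      r = p * NR / 100                     # nearest-rank percentile index
      rank = int(r); if (rank < r) rank++  # round the rank up
      if (rank < 1) rank = 1
      x = v[rank] + b                      # percentile duration plus buffer
      d = int(x); if (d < x) d++           # the timeout is whole seconds
      if (d > 3600) d = 3600               # 3600 s is the maximum allowed
      print d
    }'
}
```

Something like cut -d' ' -f1 durations.txt | suggest_delay 99.9 10 gives you a candidate value; feed it into the modify-target-group-attributes call below and re-run the numbers when your traffic profile changes.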

You can set it on the target group itself:

# Update the deregistration delay for a target group to a sane 30 seconds
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-api/1234567890abcdef \
  --attributes Key=deregistration_delay.timeout_seconds,Value=30

And for a Classic Load Balancer, you set it on the LB itself:

# Set connection draining timeout for a CLB
aws elb modify-load-balancer-attributes \
  --load-balancer-name my-classic-lb \
  --load-balancer-attributes "{\"ConnectionDraining\":{\"Enabled\":true,\"Timeout\":30}}"
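Setting the value is half the job. It’s worth reading it back so a pipeline can catch a target group that quietly reverted to the 300-second default (say, after someone recreated it). A minimal sketch for the target group case, assuming AWS CLI v2 with credentials configured; check_drain_timeout is a hypothetical name and the ARN is whatever you pass in.

```bash
# Hypothetical helper: fail if a target group's deregistration delay drifts.
# $1: target group ARN, $2: expected delay in seconds.
check_drain_timeout() {
  local tg_arn="$1" expected="$2" actual
  # Pull just the deregistration_delay.timeout_seconds value via JMESPath.
  actual=$(aws elbv2 describe-target-group-attributes \
    --target-group-arn "$tg_arn" \
    --query "Attributes[?Key=='deregistration_delay.timeout_seconds'].Value | [0]" \
    --output text) || return 1
  if [ "$actual" != "$expected" ]; then
    echo "deregistration delay is ${actual}s, expected ${expected}s" >&2
    return 1
  fi
  echo "deregistration delay OK (${actual}s)"
}
```

A non-zero exit makes it easy to drop into a CI step right after your infrastructure deploy.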

The Curse of the Sticky Session

Here’s the ugly edge case. Sticky sessions (session affinity) can throw a wrench into your graceful shutdown plans. Imagine a user’s session is pinned to the instance you’re trying to deregister, and their browser is aggressively polling an endpoint every second. Those polls are new requests, and the load balancer won’t send new requests to a draining target, sticky or not, so they land on a different instance. If the session state lived only in the draining instance’s memory, that user is suddenly logged out or staring at an empty cart. And if the client holds a long-lived connection open (a WebSocket, a big streaming download), that one connection can keep the instance draining right up to the timeout.

This is a fundamental conflict between stickiness and resilience. The best practice is to avoid server-side stickiness like the plague unless you have an absolutely ironclad business reason for it. If you must have it, make your session timeouts short and implement client-side retry logic that can handle a connection being terminated.
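Client-side retry doesn’t need to be fancy. Here’s the idea sketched in shell, since that’s what the rest of this post uses; in a browser or mobile app you’d write the same loop in the client’s own language. retry is a hypothetical helper that re-runs a command with exponential backoff and gives up after a fixed number of attempts.

```bash
# Hypothetical helper: retry a command with exponential backoff.
# $1: max attempts, $2: initial delay in seconds, rest: the command to run.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after ${n} attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))   # back off: 1s, 2s, 4s, ...
    n=$((n + 1))
  done
}
```

For example, retry 4 1 curl --fail --silent https://example.com/api/poll tries up to four times, sleeping 1, 2, then 4 seconds between attempts. Only retry requests that are safe to repeat; blindly replaying a payment POST is how you charge someone twice.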

Best Practices from the Trenches

  1. Automate It: Your deployment scripts or orchestration system (CodeDeploy, Elastic Beanstalk, your own custom script) should always deregister targets first, wait for the delay period, then terminate the instance. Don’t rely on humans to do this in the right order.
  2. Monitor the Draining State: In your automation, check that the instance has actually moved to the draining state and that the healthy host count has dropped before proceeding. Don’t just assume the API call worked.
  3. Coordinate with Your App: The load balancer gives you time, but your application needs to play ball. Use your environment’s shutdown hooks (like a SIGTERM handler on Linux) to stop accepting new connections, finish in-flight work, and exit cleanly. The ELB’s draining delay gives your app’s shutdown process the time it needs to do this correctly.
  4. Set a Sane Value: I said it before, but it’s worth repeating. Change the default from 300 seconds to something that reflects your application’s reality. Your sanity during deployments will thank you.
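Here’s a minimal sketch of points 1 and 2 with the AWS CLI, assuming CLI v2, configured credentials, and an instance registered in a single target group; retire_instance is a hypothetical helper name. It checks the target’s own state rather than the healthy host count, and it leans on the wait target-deregistered waiter, which polls describe-target-health for you and gives up after a fixed number of polls.

```bash
# Hypothetical helper: take one instance out of a target group safely.
# $1: target group ARN, $2: EC2 instance ID. Assumes AWS CLI v2.
retire_instance() {
  local tg_arn="$1" instance_id="$2" state

  # 1. Deregister: the load balancer stops sending the target new requests.
  aws elbv2 deregister-targets \
    --target-group-arn "$tg_arn" \
    --targets "Id=${instance_id}" || return 1

  # 2. Don't assume the call worked: confirm the target is now draining.
  #    "unused" means deregistration already finished (e.g. a 0-second delay).
  state=$(aws elbv2 describe-target-health \
    --target-group-arn "$tg_arn" \
    --targets "Id=${instance_id}" \
    --query 'TargetHealthDescriptions[0].TargetHealth.State' \
    --output text) || return 1
  if [ "$state" != "draining" ] && [ "$state" != "unused" ]; then
    echo "expected ${instance_id} to be draining, got: ${state}" >&2
    return 1
  fi

  # 3. Block until deregistration completes (up to the configured delay).
  aws elbv2 wait target-deregistered \
    --target-group-arn "$tg_arn" \
    --targets "Id=${instance_id}" || return 1

  # 4. Only now is it safe to terminate the instance.
  aws ec2 terminate-instances --instance-ids "$instance_id" > /dev/null || return 1
  echo "retired ${instance_id}"
}
```

Every step bails out on failure, so a target that never reaches the draining state never gets terminated.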
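On the application side (point 3), the pattern is: trap SIGTERM, stop picking up new work, and let the current unit finish before exiting. This is a shell sketch; run_worker is hypothetical and its “work” is a short sleep standing in for a request, where a real web server would close its listening socket and wait for in-flight requests. It relies on the fact that bash runs a trap only after the foreground command it’s waiting on finishes, which is exactly the behavior you want here.

```bash
# Hypothetical worker that shuts down gracefully on SIGTERM.
run_worker() {
  shutting_down=0
  # Traps are reset in subshells, so install it inside the function;
  # that way it also applies when the worker runs in the background.
  trap 'echo "SIGTERM: no new work accepted" >&2; shutting_down=1' TERM
  local jobs_done=0
  while [ "$shutting_down" -eq 0 ]; do
    sleep 0.2                     # stand-in for one in-flight request
    jobs_done=$((jobs_done + 1))  # that request completed before we checked the flag
  done
  echo "finished in-flight work after ${jobs_done} job(s); exiting cleanly"
}
```

Start it with run_worker & and send kill -TERM to that PID: the current sleep completes, the loop sees the flag, and the worker exits with status 0 instead of dying mid-request.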

This is one of those features that separates a robust, production-ready setup from an amateur-hour deployment. It costs nothing to enable and configure correctly, but the day you need it (and you will), it saves you from a world of pain.