Alright, let’s talk about how your Services actually decide which Pod gets the traffic. You’ve deployed your fancy, multi-Pod Deployment, exposed it with a Service, and you’re probably thinking, “Great, traffic will just spread out evenly!” And by default, you’d be right. But the real world is messy, and sometimes you need to bend that default behavior to your will. That’s where session affinity and traffic policies come in.

The Default: A Fair-Weather Friend

Out of the box, a standard ClusterIP or NodePort Service spreads new connections across all the ready Pods it selects, with no memory of which client went where. This is handled by kube-proxy on each node, via either iptables or IPVS rules. In iptables mode each new connection goes to a randomly chosen backend; in IPVS mode the default scheduler is round-robin. Either way, it’s the epitome of fairness. It’s simple, effective, and for most stateless workloads, it’s exactly what you want. But “most” isn’t “all.” The moment you need a user’s requests to consistently hit the same Pod (maybe because of an in-memory session, a cache, or some other sticky piece of state), this fairness becomes a problem.

Session Affinity: For When You Need to Be Sticky

Session affinity (often called “sticky sessions”) is the feature that tells the Service, “Hey, if a request comes from a particular client, try to send all its subsequent requests to the same Pod.” In Kubernetes, you configure this on the Service spec with sessionAffinity and sessionAffinityConfig.

The only type of session affinity Kubernetes offers out-of-the-box is based on the client’s IP address (ClientIP). It’s not based on cookies or some complex header; it’s a simple, network-level stickiness. kube-proxy remembers which backend Pod each client IP was last sent to and keeps routing that IP’s new connections to the same Pod until the mapping times out.

Here’s how you’d set it up:

apiVersion: v1
kind: Service
metadata:
  name: my-sticky-service
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800 # 3 hours, the default (the maximum is 86400, i.e. 1 day)

Now, the big “gotcha”: This is not a guarantee. It’s a best-effort affinity. Why? Because Pods die. If the Pod a client is affinitized to gets terminated, the connection breaks, and the next request from that client will be routed to a different Pod, which may not have its session state. This is why session affinity is a band-aid, not a solution. The real solution for stateful applications is to offload that state to a shared database or cache (like Redis). Use session affinity when you can’t do that and you understand the trade-off.

Also, note the timeoutSeconds. This is how long the affinity mapping lasts after the last request from that IP. The default is a whopping 3 hours. You might want to tune this down unless you enjoy holding onto mappings for clients that disconnected hours ago.
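
As a rough sketch of tuning that down, here’s the same Service with a 30-minute window. The 1800 value is purely an illustrative assumption, not a recommendation; pick something that matches how long your own sessions actually live.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-sticky-service
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      # Illustrative value only: forget a client's Pod mapping after
      # 30 minutes instead of the 3-hour default.
      timeoutSeconds: 1800
```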

The externalTrafficPolicy Landmine

This one is a classic “it works perfectly until it doesn’t” scenario, and it’s specific to Services of type NodePort or LoadBalancer. It controls how traffic from outside the cluster is handled once it hits a node.

  • externalTrafficPolicy: Cluster (the default): This is the “everything is fine, just distribute the traffic!” option. An external request can land on any node in your cluster, and that node may hand it to any ready Pod for the Service, whether that Pod runs on the same node or on a different one. To make sure replies come back the same way, kube-proxy source-NATs the traffic, so the Pod sees a node’s IP instead of the original client IP. In other words, this does not preserve the original source IP of the request, which can be a problem for auditing or security tools. The upside is good overall load distribution; the downside is a possible extra hop, which adds latency, and the loss of the true client IP.

  • externalTrafficPolicy: Local: This is the “keep it on the ranch” option. A node will only proxy external traffic to Pods that are running on itself. If no Pods for that Service exist on the node, the traffic is simply dropped. This sounds insane, but it has two massive advantages: it avoids the extra network hop (reducing latency) and, crucially, it preserves the original client source IP address. The downside is potentially terrible load distribution. If you have three nodes but only one has your Pod, that one node gets hammered with all the external traffic while the other two nodes drop anything sent to that Service.

apiVersion: v1
kind: Service
metadata:
  name: my-loadbalancer
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  externalTrafficPolicy: Local # Use this if you need the real client IP!

You must choose the right policy. Need the real client IP for your application’s logic, logging, or security? You have to use Local. But if you do, you’d better make sure your Pods are well-distributed across all your nodes (use a DaemonSet or an anti-affinity rule) or you’ll have a very bad, unbalanced time. Cloud load balancers typically soften the blow by health-checking each node (through the Service’s healthCheckNodePort) and sending traffic only to nodes that report a local Pod, but clients that hit a NodePort directly get no such protection. It’s a sharp knife; be careful.
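
One possible shape for that anti-affinity rule is sketched below. It assumes a Deployment whose Pods carry the app: my-app label used by the Services above; the Deployment name, replica count, and image are placeholders invented for illustration. The “preferred” form asks the scheduler to put replicas on different nodes where it can, without blocking scheduling when it can’t.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical name
spec:
  replicas: 3                 # illustrative; ideally at least one per node
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer nodes that don't already run an app=my-app Pod.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-app
                topologyKey: kubernetes.io/hostname
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0   # placeholder image
          ports:
            - containerPort: 8080
```

Swapping in requiredDuringSchedulingIgnoredDuringExecution makes the rule strict, at the cost of leaving Pods Pending once you have more replicas than eligible nodes.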