38.5 Cloud Load Balancing Integration

Right, let’s talk about getting traffic into your cluster. You’ve built this brilliant, distributed application, and now you need to show it to the world. This is where GKE’s integration with Google Cloud’s Load Balancers goes from “nice-to-have” to “why would you ever do it any other way?”

The magic here is that GKE doesn’t just work with Cloud Load Balancing; it automates it. You don’t manually create load balancers, health checks, or backend services in the Google Cloud console. You declare what you want—a public HTTP(S) service, an internal service, SSL offloading—and GKE talks to Google’s control plane to build the real, global infrastructure for you. It’s like having a brilliant, hyper-competent intern who you just tell “make this app available” and they handle the 47-step checklist without bothering you.

The Two Flavors of Load Balancers

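First, you need to know you have two primary choices, and picking the wrong one will cause you pain. The choice is about what the load balancer’s backends point at: your pods or your nodes. One note on mechanics before we start: on GKE, a Service of type LoadBalancer gets you a Layer 4 passthrough load balancer, while the HTTP(S) load balancer is requested through an Ingress that references your Service. In the Ingress case, an annotation on the Service tells GKE which kind of backend to build.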

The Network Endpoint Group (NEG)-based load balancer (often called the “container-native” one) is the modern, correct choice. It points the load balancer’s backend directly at your individual pods, not at the nodes. This is a big deal. It means better performance (fewer hops), more precise traffic distribution, and, crucially, it plays nicely with Pod autoscaling. When a pod comes online and passes its checks, its IP is added to the load balancer’s backend directly, with no waiting for kube-proxy to update iptables rules on the nodes. The one prerequisite is a VPC-native cluster, because the load balancer has to be able to route to pod IPs.

Then there’s the NodePort-based load balancer (the classic way). This one points the load balancer at the nodes in your cluster, on a specific port assigned by Kubernetes. Traffic comes in, hits a node, and then kube-proxy routes it to a pod, which may well live on a different node. That extra hop makes the path more indirect and less efficient, and the load balancer can only spread traffic across nodes, not across pods. You generally want to avoid this unless you have a very specific reason.

Here’s how you tell GKE you’re not a caveman and you want the good stuff (the NEG-based one):

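apiVersion: v1
kind: Service
metadata:
  name: my-awesome-service
  annotations:
    # Ask for container-native (NEG) backends when an Ingress references this Service.
    cloud.google.com/neg: '{"ingress": true}'
    # Attach a BackendConfig (health checks, logging) to the resulting backend service.
    cloud.google.com/backend-config: '{"default": "my-backend-config"}'
  labels:
    app: my-awesome-app
spec:
  selector:
    app: my-awesome-app
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP # The load balancer reaches pods via the NEG, so no NodePort is needed.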
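
One piece that is easy to miss: for the HTTP(S) load balancer, the Service is normally paired with an Ingress, and creating the Ingress is what prompts GKE to build the load balancer. The following is a minimal sketch, not a complete production setup. It assumes the Service above, and the Ingress name my-awesome-ingress is made up for illustration.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-awesome-ingress # hypothetical name
  annotations:
    # "gce" requests the external HTTP(S) load balancer;
    # "gce-internal" requests the internal one instead.
    kubernetes.io/ingress.class: "gce"
spec:
  # Send all traffic to the Service defined above.
  defaultBackend:
    service:
      name: my-awesome-service
      port:
        number: 80
```

A single default backend keeps the example short; real setups usually add host and path rules under spec.rules.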

Health Checks: Don’t Let Google Murder Your Pods

This is the biggest “gotcha” and where most people get bitten. The Cloud Load Balancer is not part of your cluster. It’s a separate Google Cloud product. It determines if your pods are healthy by sending health check requests from outside the cluster. This has major implications:

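Your health check endpoint (/ by default) must be reachable by Google’s health check probers, which sit outside your cluster. For HTTP(S) load balancers the probes come from the 35.191.0.0/16 and 130.211.0.0/22 ranges, not from arbitrary internet addresses, so the endpoint does not have to be open to the world, but those ranges do have to be let in. If you have firewall rules, NetworkPolicies, or Istio sidecar configurations that block that traffic, your health checks will fail. Google will deem your pod unhealthy and stop sending it traffic. It’s a common face-palm moment.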
The default health check settings are… aggressively impatient. You can and should configure this using a BackendConfig CRD. This is GKE’s way of letting you tweak the knobs on the underlying Google Cloud backend service.

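# backend-config.yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backend-config
spec:
  healthCheck:
    type: HTTP              # Probe protocol, stated explicitly.
    checkIntervalSec: 15    # Seconds between probes.
    timeoutSec: 5           # Must not exceed checkIntervalSec.
    healthyThreshold: 1     # Consecutive successes before a pod is marked healthy.
    unhealthyThreshold: 2   # Consecutive failures before a pod is marked unhealthy.
    port: 8080              # With NEG backends this is the container port.
    requestPath: /health
  logging:
    enable: true # This sends your LB logs to Cloud Logging. Do this. Always.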

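And then you reference it in your Service annotation like you saw above. With the values shown, a pod that starts failing is pulled out of rotation after two missed checks, so in roughly 30 seconds, and a recovered pod is back in after a single good one. Setting a custom, lightweight /health endpoint and making these timeouts sensible is not a best practice; it’s a requirement for anything resembling a stable service.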
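
On the pod side, it helps to have the kubelet probe the same endpoint, so that Kubernetes and the load balancer agree on what “healthy” means. Here is a rough sketch of a matching Deployment. The image name, replica count, and probe timings are placeholders chosen for illustration, and it assumes your app really does answer GET /health on port 8080.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-awesome-app
spec:
  replicas: 2 # arbitrary value for the example
  selector:
    matchLabels:
      app: my-awesome-app
  template:
    metadata:
      labels:
        app: my-awesome-app # must match the Service selector
    spec:
      containers:
      - name: app
        image: example.com/my-awesome-app:1.0 # hypothetical image
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /health # same path the BackendConfig probes
            port: 8080
          periodSeconds: 10
          timeoutSeconds: 5
```

Keep the /health handler cheap: it gets hit by both the kubelet and Google’s probers, so it should not touch a database on every call.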

The Internal Load Balancer For When You’re Fancy

Sometimes you don’t want to talk to the internet. Maybe this is an API for your frontend services within your VPC. The internal load balancer is perfect for this. The annotation cloud.google.com/load-balancer-type: "Internal" on a Service of type LoadBalancer is all it takes (GKE 1.17 and later prefer the equivalent networking.gke.io/load-balancer-type: "Internal"). It creates a load balancer that’s only accessible from within your VPC network and, unless you turn on global access, only from the same region. It’s clean, secure, and elegantly simple. Just remember the health check rules still apply: the probes have to be able to reach your endpoint through your firewall rules.
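
As a concrete illustration, an internal load balancer Service might look like the sketch below. The name my-internal-service is hypothetical, and the example uses the networking.gke.io/ form of the annotation that GKE documents for version 1.17 and later; on older clusters the cloud.google.com/ form applies instead.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-internal-service # hypothetical name
  annotations:
    # Request an internal (VPC-only) load balancer instead of a public one.
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: my-awesome-app
  ports:
  - port: 80
    targetPort: 8080
```

Drop the annotation and the same manifest produces an external load balancer, which is exactly why it is worth double-checking before you apply it.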

The Global Anycast IP and Why It’s Cool

When you create a public HTTP(S) load balancer, you get a single anycast IP address. By default that address is ephemeral, so reserve a static one if DNS is going to point at it. This IP is advertised from multiple points around Google’s globe-spanning network. This means a user in Tokyo and a user in Iowa will both hit that same IP, but their traffic is routed to the nearest POP and then to your cluster. You get low latency and high availability without thinking about it. You never have to worry about DNS failover or multi-region IP management. It’s just done. This is one of those things that feels like cheating, and you should appreciate it.
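
If you want that address to survive the Ingress being deleted and recreated, you can pin it to a reserved global static IP. The sketch below assumes you have already reserved a global address named my-static-ip in your project; both that name and the Ingress name are made up for the example.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-awesome-ingress # hypothetical name
  annotations:
    kubernetes.io/ingress.class: "gce"
    # Name of a pre-reserved *global* static address (assumed to exist).
    kubernetes.io/ingress.global-static-ip-name: "my-static-ip"
spec:
  defaultBackend:
    service:
      name: my-awesome-service
      port:
        number: 80
```

The annotation takes the name of the reserved address, not the IP itself.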

The rough edge? The provisioning isn’t instantaneous. Spinning up the global load balancer can take a few minutes. Don’t panic if your Service shows <pending> for its external IP, or your Ingress has no address yet, for a bit after you kubectl apply. It’s not broken; it’s just building you a planet-scale traffic distribution system. Try to be patient.