38.4 GKE Networking: VPC-Native Clusters and Alias IPs

Right, let’s talk networking. This is where most people’s eyes glaze over, but stick with me—it’s also where you’ll solve your most baffling problems and prevent your future self from sending angry emails to past you. GKE’s networking model, specifically the “VPC-native” bit, is one of those things Google got genuinely right. It saves you from a world of self-inflicted pain.

The old way, which GKE tellingly calls “routes-based,” was a bit of a kludge. It worked by programmatically creating a custom static route in your VPC for every node in the cluster, each one pointing that node’s Pod range at the VM. You’d spin up a 500-node cluster, and suddenly your project had 500 extra routes eating into the routes quota. It was a management nightmare, slow to propagate, and, crucially, the quota put a hard ceiling on cluster size. Thankfully, it’s now the legacy mode, VPC-native is the default for new clusters, and you should not pick routes-based for anything new.

The modern, sane, and default approach is the VPC-native cluster. Instead of fighting the infrastructure, it embraces it. The core magic trick here is something called alias IPs.

What in the World is an Alias IP?

Think of your GKE node (the VM) as an apartment building. The VM itself has a primary IP address (the building’s address). In a VPC-native cluster, Google Cloud takes a slice of the subnet’s secondary range for Pods (by default a /24 per node on Standard clusters) and assigns it to the VM’s network interface as an alias IP range: a block of apartment numbers reserved for this building. Each Pod then gets its own IP from that slice, its own apartment number. These Pod IPs are “aliases” on the VM’s main network interface.

Why is this brilliant? Because the VPC network itself natively understands these alias IPs. You don’t need a separate route for Pod 172.16.0.12; the network already knows it lives on that specific VM. It’s direct, it’s fast, and it scales like a dream because you’re limited by the much more generous alias IP ranges quota, not the paltry routes quota.
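To make the “network already knows where the Pod lives” idea concrete, here’s a minimal Python sketch of the bookkeeping. It is not how GKE implements anything; it just models a Pod range being sliced into per-node blocks and a Pod IP being traced back to its block. The 10.0.0.0/16 range, the /24 per-node size (the Standard default) and the function names are all assumptions for illustration.

```python
import ipaddress

# Hypothetical values for illustration; substitute your own ranges.
POD_RANGE = ipaddress.ip_network("10.0.0.0/16")  # cluster-wide Pod range
NODE_PREFIX = 24                                  # assumed per-node slice size


def node_blocks(pod_range, node_prefix):
    """Split the Pod range into per-node alias blocks, in address order."""
    return list(pod_range.subnets(new_prefix=node_prefix))


def block_for_pod(pod_ip, blocks):
    """Return (index, block) for the block containing pod_ip, or None."""
    addr = ipaddress.ip_address(pod_ip)
    for index, block in enumerate(blocks):
        if addr in block:
            return index, block
    return None


blocks = node_blocks(POD_RANGE, NODE_PREFIX)
print(len(blocks))                          # 256
print(block_for_pod("10.0.3.17", blocks))   # (3, IPv4Network('10.0.3.0/24'))
```

One lookup against a block, no per-Pod route anywhere. That is the whole trick.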

Here’s the kicker: because these Pod IPs are first-class citizens in your VPC, a Pod can talk directly to a Google Cloud service like Cloud SQL or Memorystore without any hairpin NAT nonsense. The firewall rules you know and love (or love to hate) apply directly to Pod IPs. This is a huge deal for security and simplicity.

How to Define Your IP Ranges (Don’t Screw This Up)

When you create a cluster, you define two crucial IP ranges: one for Pods and one for Services. Getting these wrong is the most common pitfall. You specify them using the --cluster-ipv4-cidr and --services-ipv4-cidr flags alongside --enable-ip-alias (or their equivalents in the console/Terraform).

# --cluster-ipv4-cidr is for your PODS
# --services-ipv4-cidr is for cluster-internal SERVICES (like kube-dns)
gcloud container clusters create my-awesome-cluster \
    --zone us-central1-a \
    --enable-ip-alias \
    --cluster-ipv4-cidr "10.0.0.0/16" \
    --services-ipv4-cidr "10.1.0.0/16"

Best Practice #1: Plan this ahead. These ranges must not overlap with any other subnet in your VPC or any peered VPCs. I’ve seen teams bring down entire environments by carelessly overlapping ranges in a peering setup. Use a proper IP Address Management (IPAM) plan, even if it’s just a spreadsheet. For larger orgs, tools like Terraform can help manage this.
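If your IPAM plan really is a spreadsheet, a few lines of Python can at least catch overlaps before you hit create. This is a rough sketch using the standard ipaddress module; the range names and CIDRs in the plan are made up for the example, and find_overlaps is a hypothetical helper, not part of any Google tooling.

```python
import ipaddress
from itertools import combinations


def find_overlaps(named_ranges):
    """Return sorted (name, name) pairs whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in named_ranges.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )


# Hypothetical plan: the peered subnet sits inside the Pod range.
plan = {
    "pods": "10.0.0.0/16",
    "services": "10.1.0.0/16",
    "nodes": "10.2.0.0/24",
    "peered-vpc-subnet": "10.0.128.0/20",
}
print(find_overlaps(plan))  # [('peered-vpc-subnet', 'pods')]
```

An empty list means the plan is clean. Feed it every subnet from every peered VPC, not just the ones you remember.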

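Best Practice #2: Size them appropriately. The Pod range (--cluster-ipv4-cidr) needs to be big enough for all the Pods you’ll ever run. A /16 (65,536 IPs) is a safe starting bet, but remember that GKE hands the range out in per-node chunks: with the Standard default of a /24 per node, a /16 tops out at 256 nodes no matter how few Pods each one runs. The Services range (--services-ipv4-cidr) can be smaller. A /20 (4,096 IPs) is usually plenty, as you typically only have a few dozen services. Every Service gets a cluster IP from this range, LoadBalancer Services included; the load balancer’s own external or internal address comes from elsewhere.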
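The sizing arithmetic is easy to check for yourself. The sketch below assumes each node reserves a fixed block out of the Pod range (a /24 is the default on Standard clusters; yours may differ if you change max Pods per node), and max_nodes is a hypothetical helper name.

```python
import ipaddress


def max_nodes(pod_cidr, per_node_prefix=24):
    """How many per-node blocks of the given prefix fit in the Pod range."""
    net = ipaddress.ip_network(pod_cidr)
    if per_node_prefix < net.prefixlen:
        return 0  # a single node block would not even fit
    return 2 ** (per_node_prefix - net.prefixlen)


print(ipaddress.ip_network("10.0.0.0/16").num_addresses)  # 65536
print(ipaddress.ip_network("10.1.0.0/20").num_addresses)  # 4096
print(max_nodes("10.0.0.0/16"))  # 256
print(max_nodes("10.0.0.0/20"))  # 16
```

The node ceiling is usually what bites first: a /20 Pod range sounds roomy at 4,096 addresses, yet under this assumption it supports only 16 nodes.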

The Reality of Working with Alias IP Ranges

So you’ve created the cluster. What now? The Pod range isn’t a separate subnet; it lives as a secondary IP range on the cluster’s subnet in your VPC (GKE creates it for you unless you point it at an existing one). This is fantastic because it means you can write standard VPC firewall rules against the Pod CIDR.

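Want to let your backend Pods talk to a Cloud SQL instance but not your frontend Pods? A Pod label like app: backend means nothing to the VPC firewall, which only understands IP ranges, network tags, and service accounts. So run the backend Pods on a dedicated node pool whose nodes carry a network tag such as backend-pod, and write an egress rule that targets that tag and allows traffic to the SQL instance’s IP. (Egress is allowed by default, so this rule only matters if you also have a lower-priority rule denying egress.)

# Example gcloud command: allow egress from nodes tagged backend-pod to the SQL instance
gcloud compute firewall-rules create allow-backend-to-sql \
    --network my-vpc \
    --direction EGRESS \
    --action ALLOW \
    --rules tcp:5432 \
    --target-tags backend-pod \
    --destination-ranges 172.16.0.0/16  # The CIDR containing your SQL instance

Note: In practice, you’d use a more specific destination range for your SQL instance, not the whole /16. This is just an example.

The gotcha? Network tags are applied to the Node VM, not the Pod, and Pod labels never turn into network tags. GKE does add its own tag to every node (something like gke-my-awesome-cluster-<hash>-node), and you can attach your own with the --tags flag when you create the cluster or node pool. Either way, a tag-based rule covers every Pod on a tagged node, so per-workload rules mean dedicated node pools, with node selectors or taints to pin the right Pods there. For Pod-level control inside the cluster, Kubernetes NetworkPolicy is the tool. It’s a layer of indirection that makes perfect sense from Google’s perspective but can confuse folks coming from pure Kubernetes networking.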

The Verdict

The VPC-native model is unequivocally the right way to go. It’s cleaner, more scalable, and deeply integrated. The designers made a questionable choice years ago with the routes-based model, but they’ve more than atoned for it with this. The only rough edge is the slight mental gymnastics required for firewall rules, but once you grasp that the node is the enforcement point for the Pods living on it, it clicks. Embrace the alias IP. Your future self will thank you for it.