38.2 Node Pools: Spot VMs, GPU Nodes, and Preemptible Instances
Alright, let’s talk about the real workhorses of your GKE cluster: the node pools. Think of your cluster as a nightclub; the control plane is the bouncer and manager, but the node pools are the actual dance floors where your pods (the patrons) get down to business. You don’t want just one type of dance floor. You need a VIP section, a cheap area for the rowdy crowd, and maybe a special room with fancy equipment. That’s what node pools are for.
The default node pool GKE creates for you is fine for a quick test drive, but running everything on it is like using one tool for every job. Unless you say otherwise, it’s a few general-purpose e2-medium nodes, which is rarely the right shape or price for every workload. We can do better. We’re going to specialize.
The Cheap Seats: Preemptible and Spot VMs
Let’s cut to the chase: most of your batch jobs, development environments, and stateless web servers don’t need a guaranteed forever-home. They can tolerate being evicted on short notice (Compute Engine gives roughly 30 seconds of warning, not minutes). This is where preemptible VMs (Google’s classic product) and Spot VMs (their successor) come in. The key difference? Preemptible VMs are always stopped after at most 24 hours. Spot VMs have no maximum lifetime and run until Compute Engine needs the capacity back. Both use the same discounted pricing model, and those prices can change over time, so check current rates before you budget around them.
The reason they’re cheap (roughly 60-90% cheaper than on-demand) is that Google can reclaim them at any time to make room for someone paying full price. GKE handles this as gracefully as it can: when the preemption notice arrives, the node begins shutting down and your pods receive a termination signal so they can exit cleanly. That window is measured in seconds, so keep your shutdown logic fast.
You must design your workloads for this. If your app can’t handle a pod disappearing, don’t put it here. The classic “don’t do this” example is a single-replica stateful database. The classic “perfect for this” example is a web frontend with 10 replicas behind a load balancer.
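To make “design for eviction” a little more concrete, here is a minimal sketch of a container entrypoint that stops picking up new work and exits cleanly when Kubernetes sends SIGTERM. The worker function and the one-second sleep standing in for real work are hypothetical placeholders, not part of any real application. In practice you would do the equivalent with your own language’s signal handling.

```shell
#!/bin/sh
# Sketch only: `worker` and its one-second "work unit" are placeholders.
worker() {
  stop=0
  # Kubernetes sends SIGTERM to the container when its pod is being shut down.
  trap 'stop=1' TERM
  while [ "$stop" -eq 0 ]; do
    sleep 1 &   # stand-in for one unit of real work
    wait $!     # `wait` returns as soon as a trapped signal arrives
  done
  echo "drained, exiting cleanly"
}
```

Calling worker as the last line of an entrypoint script would keep the loop running until the pod is terminated. The point is simply that the process reacts to SIGTERM instead of waiting to be killed.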
Here’s how you add a node pool dedicated to this cost-saving chaos to an existing cluster (node-pools create works on a cluster that’s already running). Notice we’re using the --spot variant. It’s generally the better choice these days.
gcloud container node-pools create "spot-pool" \
--cluster=my-cluster \
--spot \
--machine-type=e2-standard-4 \
--num-nodes=3 \
--node-labels=cost-tier=spot \
--node-taints=spot=true:PreferNoSchedule
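Why the taint (--node-taints)? Because it’s irresponsible to let just any pod land on these unreliable nodes. With the PreferNoSchedule effect, the taint acts like a “Members Only” sign: the scheduler tries to keep pods that don’t tolerate it off these nodes, but will still use them if nothing else fits. If you want a hard “Keep Out” sign instead, use the NoSchedule effect, and then only pods that explicitly tolerate the taint can run there. Either way, adding the toleration is a conscious statement that a workload is resilient enough. The label (--node-labels) is for your own convenience: it makes these nodes easy to identify and to target with a nodeSelector.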
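Once the pool is up, it’s worth confirming that the label and taint actually landed on the nodes. The helper below is a small sketch, assuming kubectl is already pointed at the cluster. The function name show_spot_nodes is made up for illustration, and the label value comes from the --node-labels flag used above.

```shell
# Sketch: list the nodes carrying our cost-tier label, with their taint keys.
# Assumes kubectl is configured for the cluster that owns the spot pool.
show_spot_nodes() {
  kubectl get nodes -l cost-tier=spot \
    -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

If a node from the pool is missing from the output, or its taint column is empty, the flags on the create command are the first place to look.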
To let a pod run on this pool, your Deployment would need a toleration and ideally a nodeSelector:
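apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-resilient-app
spec:
  replicas: 3                  # illustrative; run enough replicas to survive evictions
  selector:
    matchLabels:
      app: my-resilient-app    # must match the pod template labels below
  template:
    metadata:
      labels:
        app: my-resilient-app
    spec:
      containers:
        - name: app
          image: my-app:latest
      # Tolerate the taint we put on the spot pool.
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "PreferNoSchedule"
      # Pin the pods to nodes carrying the spot pool's label.
      nodeSelector:
        cost-tier: "spot"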
The Muscle: GPU Nodes
Need to do some machine learning, video encoding, or fancy 3D rendering? You need GPUs. This isn’t a suggestion; it’s a requirement. CPU-based attempts at these tasks are an exercise in frustration and wasted money.
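The first rule of GPUs in GKE: give them a dedicated node pool. Accelerators are configured per pool, so every node in the pool carries the same GPUs, and keeping GPU nodes separate stops ordinary pods from squatting on your most expensive hardware. The drivers and tooling are complex enough without adding variability. You also need NVIDIA drivers on the nodes. GKE can install them for you (recent versions accept a gpu-driver-version option on the --accelerator flag), or you can apply the driver-installer DaemonSet that Google publishes.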
Here’s the incantation for creating a GPU pool. It’s more expensive, so we’re starting with just one node. Pro-tip: use a preemptible or spot GPU node for development and testing. The savings are astronomical.
# Notes on the flags below (comments can't follow a trailing backslash):
#   --machine-type  Don't skimp on CPU/RAM for your GPU node!
#   --node-taints   Critical! We don't want non-GPU pods here.
#   --image-type    Use Google's Container-Optimized OS with containerd.
#   --preemptible   Because I'm cheap and this is for testing.
gcloud container node-pools create "gpu-pool" \
--cluster=my-cluster \
--machine-type=n1-standard-8 \
--accelerator type=nvidia-tesla-t4,count=1 \
--num-nodes=1 \
--node-taints=gpu=true:NoSchedule \
--node-labels=accelerator=nvidia-tesla-t4 \
--image-type=cos_containerd \
--preemptible
The NoSchedule taint is stricter than the one we used before. Pods without a matching toleration will never be scheduled onto these nodes. Now, for the pod itself, you need to request the GPU resource and tolerate the taint. Kubernetes exposes GPUs as an extended resource named nvidia.com/gpu.
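apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ai-app
spec:
  replicas: 1                  # matches the single-node GPU pool
  selector:
    matchLabels:
      app: my-ai-app           # must match the pod template labels below
  template:
    metadata:
      labels:
        app: my-ai-app
    spec:
      containers:
        - name: model
          image: tensorflow/tensorflow:latest-gpu
          resources:
            limits:
              nvidia.com/gpu: 1  # This is the magic. Whole GPUs only, set under limits.
          # ... other container config
      # Tolerate the taint we put on the GPU pool.
      tolerations:
        - key: "gpu"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"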
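Notice that? The nvidia.com/gpu resource is set under limits. You can leave out the request (Kubernetes defaults it to the limit), but if you do set one, it must equal the limit. GPUs are requested in whole units and can’t be overcommitted, so it’s all or nothing. And for heaven’s sake, use a base image that already ships the CUDA libraries, like the official TensorFlow GPU images. The NVIDIA driver itself lives on the node, not in the container, so the image only needs a CUDA version that the node’s driver supports. Trying to install CUDA yourself in the container is a recipe for migraines.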
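Before blaming your model code, it helps to check that the GPU is visible at both levels: on the node and inside the container. The two helpers below are a sketch under a few assumptions. kubectl is configured for the cluster, the node label and Deployment name are the ones used in this section, and the function names are invented for illustration. Depending on the image, nvidia-smi may not be on the container’s PATH, in which case you would need to call it by its full path.

```shell
# Sketch: how many GPUs does each labelled node advertise as allocatable?
check_gpu_nodes() {
  kubectl get nodes -l accelerator=nvidia-tesla-t4 \
    -o 'custom-columns=NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}

# Sketch: can the running container actually see the device?
check_gpu_in_pod() {
  kubectl exec deploy/my-ai-app -c model -- nvidia-smi
}
```

An empty GPUS column usually points at the node (drivers not installed yet), while a failure in the second check with a healthy node points at the image or the pod spec.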
Best Practices and Pitfalls
- Don’t Pollute the Default Pool: The moment you need a special node (GPU, Spot, high-memory), create a new pool. Keep your default pool pristine for system workloads and pods that don’t have special requirements.
- Taints and Tolerations are Your Friends: They are not just for GPUs and Spots. Use them to create logical sections within your cluster (e.g., network-critical=true:NoSchedule). They enforce discipline in scheduling.
- Node Auto-Repair is (Usually) Good: Leave it on. It automatically drains and recreates nodes that keep failing health checks. It’s one less thing for you to manage.
- The Cold Hard Truth about Preemptibles/Spots: Their availability varies by zone and machine type. If you’re having trouble scaling your spot pool, try a different zone or a more common machine type like e2-standard-2. When GKE can’t find capacity, the scale-up fails or stalls, so watch the node pool’s status and the cluster’s events.
- Clean Up After Yourself: Deleting a cluster deletes every node pool’s underlying Compute Engine instances. Deleting just a node pool does exactly what it says: the nodes and their boot disks are gone, along with anything stored on them. Make sure your persistent data is on a PersistentVolume that isn’t tied to the node’s local disk!
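If you do retire a pool, one cautious approach is to cordon and drain its nodes first so pods get rescheduled elsewhere before the machines vanish. The helper below is a sketch, not a prescribed procedure. The function name drain_pool is made up, and it relies on the cloud.google.com/gke-nodepool label that GKE puts on every node. Note that the drain flag shown deletes emptyDir data, which is exactly the kind of node-local state the last bullet warns about.

```shell
# Sketch: cordon, then drain, every node in the named pool.
# Assumes kubectl is configured for the cluster that owns the pool.
drain_pool() {
  pool="$1"
  nodes=$(kubectl get nodes -l "cloud.google.com/gke-nodepool=${pool}" -o name)
  # Cordon everything first so evicted pods can't land on a sibling node.
  for node in $nodes; do
    kubectl cordon "$node"
  done
  for node in $nodes; do
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done
}
```

After drain_pool spot-pool finishes and the workloads are healthy elsewhere, the pool itself can be removed with gcloud container node-pools delete.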