Right, let’s talk about storage. It’s the part of Kubernetes everyone secretly dreads. You can’t just kubectl scale your database (well, you can, but you really, really shouldn’t). Your applications, however, have needs. They need to remember things. For that, we turn to the unsung heroes of the GKE world: the CSI drivers. Specifically, the ones for Persistent Disk (your go-to block storage) and Filestore (a managed NFS service for when you need a shared filesystem). Google was kind enough to build these for us and ship them as managed, integrated parts of GKE: the Persistent Disk driver is on by default in current clusters, while the Filestore driver is on by default in Autopilot and is a one-flag add-on in Standard clusters. This is a massive win, because if you’ve ever had to manage a CSI driver yourself, you know it’s about as fun as debugging a YAML indentation error at 2 AM.

The magic of these drivers is that they translate the abstract, platform-agnostic language of a Kubernetes PersistentVolumeClaim into a very specific, concrete Google Cloud API call. You say “I want 100Gi of fast storage,” and GKE says “Righto, one pd-ssd coming up.” It handles the provisioning, attaching, and mounting for you, and (crucially) the cleanup: with the default reclaimPolicy of Delete on a dynamically provisioned volume, deleting the PVC deletes the underlying disk too. No more orphaned disks eating your budget because someone removed the claim and forgot about the disk behind it.

The Persistent Disk CSI Driver: Your Workhorse

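This is your default. The large majority of your stateful workloads will be happy here. It provisions a Persistent Disk (block storage), puts a filesystem on it by default, and mounts it with the ReadWriteOnce access mode, which means read-write on a single node at a time. If a Pod on a different node tries to mount the same disk, the attach fails and that Pod sits in ContainerCreating with a Multi-Attach error. This is by design, for data safety. Note that RWO is enforced per node, not per Pod: two Pods on the same node can share the volume, and recent Kubernetes versions offer the ReadWriteOncePod access mode if you need a strict one-Pod guarantee. It’s perfect for databases (PostgreSQL, MySQL), message queues (RabbitMQ), and anything else that likes to have exclusive access to its data.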

Let’s get something running. You’ll start by defining a StorageClass. GKE provides a few out of the box, but defining your own is a best practice—it makes your intent crystal clear.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-ssd-sc
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-ssd
  replication-type: regional-pd

Notice the volumeBindingMode: WaitForFirstConsumer. This is brilliantly pragmatic. It tells Kubernetes to wait until a Pod is actually scheduled before provisioning the disk. Why? Because a zonal disk must be in the same zone as the node running the Pod (and a regional disk must have that zone as one of its two replicas). If you use the Immediate binding mode instead, the disk is created as soon as the PVC exists, and if your Pod can’t run in the zone where the disk landed, it will sit in Pending until someone intervenes. WaitForFirstConsumer avoids this entire class of silly failure.

Now, let’s claim some storage and use it.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: my-ssd-sc
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
spec:
  containers:
    - name: app
      image: my-app:latest
      volumeMounts:
        - mountPath: /data
          name: data-volume
  volumes:
    - name: data-volume
      persistentVolumeClaim:
        claimName: my-app-pvc

Boom. Your pod has a fast, durable SSD. The replication-type: regional-pd parameter is a killer feature—it gives you a highly available disk replicated across two zones in your region. If a whole zone goes down, your data is still safe. Just remember, to use it, your Pod needs to be scheduled in one of those two zones. It’s not magic, it’s just good engineering.
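
If you would rather choose the two zones yourself than let the first Pod placement decide, a StorageClass can pin them with allowedTopologies. Here is a minimal sketch, assuming a cluster with nodes in europe-west1-b and europe-west1-c; the class name and both zone names are placeholders I made up, so swap in your own:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-regional-ssd-sc        # hypothetical name
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-ssd
  replication-type: regional-pd
# Restrict provisioning (and therefore Pod placement) to exactly two zones.
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.gke.io/zone
        values:
          - europe-west1-b        # assumed zone
          - europe-west1-c        # assumed zone
```

The trade-off is that Pods using this class can only land on nodes in those zones, so make sure your node pools actually cover both.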

The Filestore CSI Driver: For When You Need to Share

Now, let’s say you have a legacy application that insists on writing files to a shared directory. Or you’re running a CI/CD workload where twenty pods need to access the same set of build artifacts simultaneously. This is the moment you reach for Filestore via its CSI driver. It’s literally a managed NFS server that GKE can hook into.

The setup is similar, but the StorageClass is different. You’re not defining disk types anymore; you’re defining Filestore instance tiers.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-filestore-sc
provisioner: filestore.csi.storage.gke.io
parameters:
  tier: standard
  network: default

And your PersistentVolumeClaim would look like this. Crucially, the access mode is now ReadWriteMany (RWX).

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: my-filestore-sc
  resources:
    requests:
      storage: 1Ti
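
To see what ReadWriteMany buys you, here is a sketch of a Deployment whose replicas all mount that same claim at once. The Deployment name, labels, and mount path are hypothetical, and the image is the same placeholder used for the Pod earlier:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shared-reader              # hypothetical name
spec:
  replicas: 3                      # all three Pods mount the same volume
  selector:
    matchLabels:
      app: shared-reader
  template:
    metadata:
      labels:
        app: shared-reader
    spec:
      containers:
        - name: app
          image: my-app:latest     # placeholder image
          volumeMounts:
            - mountPath: /shared   # assumed path
              name: shared-data
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: shared-data-pvc
```

With a ReadWriteOnce Persistent Disk claim, the same Deployment would only work if every replica happened to land on one node.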

Here’s the rough edge, and it’s a big one: capacity. Filestore has minimum size requirements that are, frankly, comical for a lot of use cases. The standard (basic HDD) tier starts at 1 TiB (that’s 1,024 GiB). The premium (basic SSD) tier starts at 2.5 TiB, and the other tiers have their own minimums, so check the current Filestore documentation before you pick one. You’re paying for the provisioned capacity of a whole NFS server, not just the gigs you use. So unless you genuinely need that much space or the shared access, Persistent Disk is almost always the more economical and sensible choice. It’s a classic case of the right tool for the right job; just make sure your project’s budget agrees with your job definition.

The Devil in the Details: Pitfalls & Practices

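First, always set allowVolumeExpansion: true in your StorageClass. You will always need more space later. Expanding a PVC is an online operation for PDs: you raise the storage request on the claim, and the driver grows the disk and the filesystem while your Pod keeps running. Volumes can only grow, never shrink, so size up in sensible steps. Not planning for it is just stubborn.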
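
Expanding is just an edit to the claim. As a sketch, assuming the my-app-pvc claim from earlier and an arbitrarily chosen new size of 200Gi, you re-apply the manifest with only the request changed:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: my-ssd-sc      # its StorageClass must set allowVolumeExpansion: true
  resources:
    requests:
      storage: 200Gi               # was 100Gi; requests can grow but not shrink
```

Once the resize completes, kubectl get pvc my-app-pvc should report the new capacity.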

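Second, snapshots. The CSI drivers enable these too. You create a VolumeSnapshotClass that names the driver (pd.csi.storage.gke.io for Persistent Disk), then a VolumeSnapshot that points at a PVC, and suddenly you have a point-in-time copy of your disk for backups or testing. Under the hood these are ordinary Compute Engine disk snapshots, not files in a Cloud Storage bucket that you manage. It’s far easier than running gcloud compute disks snapshot from outside the cluster and then figuring out how to wire the result back into a PVC.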
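
A minimal sketch of the snapshot round trip for the Persistent Disk driver, assuming the cluster serves the snapshot.storage.k8s.io/v1 API and reusing my-app-pvc and my-ssd-sc from earlier. The snapshot class, snapshot, and restored claim names are all hypothetical:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: my-pd-snapshot-class       # hypothetical name
driver: pd.csi.storage.gke.io
deletionPolicy: Delete             # remove the disk snapshot when the VolumeSnapshot is deleted
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-app-snapshot            # hypothetical name
spec:
  volumeSnapshotClassName: my-pd-snapshot-class
  source:
    persistentVolumeClaimName: my-app-pvc
---
# Restore: a new claim pre-populated from the snapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-pvc-restored        # hypothetical name
spec:
  dataSource:
    name: my-app-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  storageClassName: my-ssd-sc
  resources:
    requests:
      storage: 100Gi               # at least as large as the snapshotted volume
```

Use deletionPolicy: Retain instead if the snapshot should outlive the Kubernetes object that created it.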

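Finally, watch out for the permissions gotcha. By default, your GKE nodes run as the Compute Engine default service account, which has historically carried the very broad Editor role on the project. This is fine for getting started, but in a real production environment, you should create a dedicated service account with a minimal permission set and assign it to your node pool. Note that the nodes don’t need disk-admin rights for any of the above: with the GKE-managed drivers, provisioning happens under a Google-managed identity, so a node service account with just the baseline node role (roles/container.defaultNodeServiceAccount at the time of writing) is enough to keep storage working. This is the “principle of least privilege” in action, and it’s what separates the pros from the amateurs.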