17.3 Persistent Volumes (PV): Cluster-Level Storage Resources

Right, so you’ve got a cluster. It has nodes, they have disks. But pods are these beautiful, ephemeral little monsters that get scheduled all over the place. You can’t tell a pod, “Hey, just store your precious database files on the local disk of node k8s-node-07,” because next week that pod might be running on k8s-node-12 and it would be very, very sad and dataless. This is the problem Persistent Volumes (PVs) solve. Think of a PV as a piece of storage in the cluster that has been provisioned either by an administrator or dynamically through a StorageClass. It’s a cluster resource, just like a node is a cluster resource, and it exists independently of the lifecycle of any pod that uses it.

The key abstraction here is that a PV separates the what (the actual storage: its type, size, location, access modes) from the who (the pod that needs to use it) and the how (the specific path to mount it). This is a brilliant bit of platform engineering because it lets devs ask for storage without needing to know the nitty-gritty details of your NFS servers or your cloud provider’s block storage API.

The Anatomy of a Persistent Volume

A PV is defined by a few critical attributes that you, the cluster admin, are responsible for setting. Let’s break them down with a realistic example. Say we’ve got an NFS share we want to make available.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-manual-nfs-pv  # This is the name you'll reference later.
spec:
  capacity:
    storage: 10Gi  # The PV's total capacity. The storage system might enforce this, or it might not. Be cynical and test this.
  volumeMode: Filesystem  # Can also be 'Block' for raw block devices. Filesystem is what you want 99% of the time.
  accessModes:
    - ReadWriteMany  # This is the big one. Can it be mounted by many nodes? Or just one?
  persistentVolumeReclaimPolicy: Retain  # The most important setting. What happens when the claim is deleted? 'Retain', 'Delete', or 'Recycle' (deprecated).
  storageClassName: manual-nfs  # This groups PVs together. A PV *can* have no class (""), but it's better to be explicit.
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /exports/data
    server: 10.0.5.21

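Let’s talk about accessModes because this is where people faceplant. The mode must be supported by your underlying storage system, and it governs how many nodes can mount the volume at the same time and whether they can write to it. Kubernetes uses the modes to match claims to volumes; listing a mode the backend can’t honor doesn’t make it work: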

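ReadWriteOnce (RWO): The volume can be mounted as read-write by a single node. Note that this means a single node, not a single pod. If you have ten pods on the same node, they could all mount it. This is your standard for things like cloud block storage (e.g., AWS EBS, GCP PD). If you need to guarantee that exactly one pod has the volume, that is what ReadWriteOncePod (RWOP) is for; it is stable as of Kubernetes 1.29 and only works with CSI volumes.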
ReadOnlyMany (ROX): Can be mounted read-only by many nodes.
ReadWriteMany (RWX): The holy grail and a common source of pain. Can be mounted as read-write by many nodes. This is typically the domain of file-based systems like NFS, CephFS, or Azure Files. If your database (e.g., Postgres) says it needs RWX to run in HA mode, be very skeptical and read the fine print; it’s often a terrible idea.
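
To make the what/who/how split concrete, here is a minimal sketch of a claim and a pod that would use the NFS PV defined above. The names shared-data-claim and shared-data-reader, the busybox image, and the /data mount path are illustrative assumptions, not part of the original example. The claim's accessModes and storageClassName have to be compatible with the PV for binding to happen.

```yaml
# Hypothetical PVC: the "who asks for storage" side of the abstraction.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-claim          # illustrative name
spec:
  accessModes:
    - ReadWriteMany                # must be a mode the PV offers
  storageClassName: manual-nfs     # must match the PV's class
  volumeName: my-manual-nfs-pv     # optional: pin the claim to one specific PV
  resources:
    requests:
      storage: 10Gi                # must not exceed the PV's capacity
---
# Hypothetical pod: the "how" is just a mount path; no NFS details leak in here.
apiVersion: v1
kind: Pod
metadata:
  name: shared-data-reader         # illustrative name
spec:
  containers:
    - name: app
      image: busybox:1.36          # assumed image, any image would do
      command: ["sh", "-c", "ls /data && sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /data         # assumed mount path
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-data-claim
```

Because the PV is RWX, a second pod on a different node could reference the same claim and mount the same data. With an RWO volume, that second pod would typically get stuck waiting for the volume to attach.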

The Reclaim Policy: Don’t Get Your Data Deleted

The persistentVolumeReclaimPolicy is your safety catch. When a user deletes their PersistentVolumeClaim (PVC), the request for storage that a pod references to get at a PV, this policy tells the cluster what to do with the PV itself.

Retain: This is the safe, sensible choice for anything you care about, and it is what a manually created PV gets if you don’t set a policy. The PVC is deleted, but the PV remains. Its status flips to Released. The data is still on the storage backend, but the volume is essentially locked and cannot be claimed again without manual admin intervention (typically you delete and recreate the PV object, or clear its spec.claimRef once you’ve dealt with the old data). Use this for anything production-related.
Delete: The PV object and the underlying storage asset (e.g., the EBS volume, the Azure Disk) are deleted. Poof. Data gone forever. This is great for non-production environments where you want to avoid paying for unused storage, but it’s terrifyingly dangerous otherwise. Be aware that dynamically provisioned PVs inherit their policy from the StorageClass, and that defaults to Delete.
Recycle: This is deprecated and effectively useless for any real storage system. It was a hack that involved running rm -rf /the/volume/* on the volume. Just pretend it doesn’t exist.
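
As a sketch of the manual intervention that Retain implies, this is roughly what a recreated PV over the same NFS export might look like once the old Released object has been deleted. The claimRef block reserves the volume for one specific claim so nothing else can grab it first; the claim name restored-data-claim and the default namespace are assumptions for illustration.

```yaml
# Recreated PV pointing at the same retained data on the NFS backend.
# With Retain, deleting the old PV object does not touch the export itself.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-manual-nfs-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual-nfs
  claimRef:                        # pre-bind: only this claim may bind to the PV
    namespace: default             # assumed namespace
    name: restored-data-claim      # hypothetical PVC name
  nfs:
    path: /exports/data
    server: 10.0.5.21
```

The new PV stays Available until a PVC with that exact name and namespace shows up. Decide whether the old data should be kept or wiped before you hand the volume to a new consumer.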

Static vs. Dynamic Provisioning

What I just showed you is static provisioning. An admin manually creates the storage (an EBS volume, an NFS export) and then manually creates a PV object that points to it. This works, but it’s a pain in the neck. You, the admin, are now in the business of managing a pool of pre-created storage volumes.

The modern, far superior way is dynamic provisioning. Here, you never manually create a PV. Instead, you define StorageClasses, which are blueprints for how to dynamically create storage on demand. When a user creates a PVC that requests a StorageClass, the cluster automatically provisions the storage and creates the PV object for you. It’s magic. We’ll get into the glorious details of this when we talk about StorageClasses, but just know that manually creating PVs is quickly becoming a legacy operation. You should still understand it, because you’ll inevitably have to debug it when something goes wrong.
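
For a first taste of what that looks like, here is a minimal sketch of a StorageClass and a claim that uses it. It assumes a cluster with the AWS EBS CSI driver installed (provisioner ebs.csi.aws.com); the names fast-block and app-data, the gp3 volume type, and the 20Gi size are illustrative choices, and other drivers take different parameters.

```yaml
# Hypothetical StorageClass: a blueprint, not a volume.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-block                         # illustrative name
provisioner: ebs.csi.aws.com               # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3                                # driver-specific parameter
reclaimPolicy: Delete                      # the default for dynamic PVs; use Retain for data you care about
volumeBindingMode: WaitForFirstConsumer    # provision only once a pod is scheduled
allowVolumeExpansion: true
---
# The user only writes this; the PV object is created for them.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                           # illustrative name
spec:
  accessModes:
    - ReadWriteOnce                        # block storage is typically RWO
  storageClassName: fast-block
  resources:
    requests:
      storage: 20Gi
```

Notice there is no PersistentVolume manifest anywhere. The PV shows up on its own once the claim is in use, and it inherits its reclaim policy from the class.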