Storage | mikePietsch.com

17.9 CSI: Container Storage Interface and Third-Party Drivers

Right, so you’ve got your PersistentVolume and PersistentVolumeClaim objects all figured out. You can dynamically provision a volume with a StorageClass and feel pretty good about yourself. But let’s be honest: the built-in drivers for your cloud provider’s block storage are… fine. They get the job done. But what if your job is weirder? What if you need to talk to a storage system that isn’t AWS EBS, GCP PD, or Azure Disk? You know, something like Ceph, GlusterFS, MinIO, or that legacy NAS box in the corner that the storage team swears is “perfectly reliable”?

17.8 Volume Expansion: Growing a PVC Online

Right, so you’ve got your PVC, it’s happily bound to a PV, and your application is humming along. Then you get the dreaded No space left on device error. Classic. The old you would have to bring the whole operation to a grinding halt: scale down the app, fiddle with the underlying storage, cross your fingers, and hopefully bring it all back online without data loss. A total party. Thankfully, the Kubernetes gods have bestowed upon us the gift of online volume expansion. This is the magic trick of letting your PVC grow while it’s still mounted and actively written to by a pod. No downtime. It’s genuinely cool, and I’m not often impressed.

17.7 StorageClasses and Dynamic Provisioning

Right, so you’ve manually created a PersistentVolume and bound it to a PersistentVolumeClaim. It works. It’s also a colossal pain in the neck. You had to get your ops team to pre-provision that 100GB of storage on some NFS server or in your cloud account, write a YAML manifest pointing to it, and hope the PVC you eventually create matches its specs. This is the infrastructure equivalent of hand-knitting your own ethernet cables. It’s static provisioning, and it’s fine for a pet project, but it falls apart completely when you need to scale.

17.6 PV Reclaim Policies: Retain, Recycle, Delete

Alright, let’s talk about what happens when you’re done with your PersistentVolumeClaim. You delete the PVC, and then what? Does the underlying storage just vanish into the ether? Does it hang around like a ghost at a party, cluttering up your cluster and your cloud bill? This is where the persistentVolumeReclaimPolicy comes in. It’s the PV’s instruction manual for what to do after its one true love, the PVC, has been deleted. It’s the “break glass in case of emergency” plan for your data, and you absolutely need to know how it works.

17.5 Access Modes: ReadWriteOnce, ReadOnlyMany, ReadWriteMany

Right, let’s talk about access modes. This is where the rubber meets the road for your data. You’ve told Kubernetes you want storage, and it’s given you a shiny PersistentVolumeClaim (PVC). But can one pod use it? Can a hundred? Can they all write to it? The accessModes field in your PVC and PV is your way of laying down the law about this. Think of it as a traffic cop for your data. You’re not just saying “I need storage”; you’re saying “I need storage and here’s how it’s going to be used.” This is crucial because the underlying storage technology (your cloud provider’s disk, an NFS server, a Ceph cluster) might not support every possible use case. Kubernetes uses your requested access mode to find a PersistentVolume that can actually deliver that behavior.
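
If the matching step feels abstract, here is a rough Python sketch of the idea: a claim only fits a volume that offers every requested access mode and at least the requested capacity. The `Volume` class, `find_match` function, and sample volumes are all made up for illustration, and the real binder also weighs things this sketch ignores, such as StorageClass, selectors, and volume mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical stand-in for a PersistentVolume."""
    name: str
    capacity_gib: int
    access_modes: frozenset


def find_match(volumes, requested_modes, requested_gib):
    """Return the smallest volume that satisfies the request, or None."""
    candidates = [
        v for v in volumes
        if set(requested_modes) <= v.access_modes  # every requested mode is offered
        and v.capacity_gib >= requested_gib        # and the volume is big enough
    ]
    # Preferring the smallest fit is a choice made in this sketch.
    return min(candidates, key=lambda v: v.capacity_gib, default=None)


if __name__ == "__main__":
    volumes = [
        Volume("nfs-share", 100, frozenset({"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"})),
        Volume("block-disk", 50, frozenset({"ReadWriteOnce"})),
    ]
    print(find_match(volumes, ["ReadWriteMany"], 20))
```

A request for ReadWriteMany can only land on the NFS-style volume here, because the block disk never advertised that mode.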

17.4 Persistent Volume Claims (PVC): Requesting Storage

Right, so you’ve got your Persistent Volumes (PVs) sitting there, ready for action. They’re the disks. But you and I don’t just go around grabbing random disks off a shelf and plugging them into our servers. That would be chaos. This is Kubernetes, not a yard sale. We need a system. Enter the Persistent Volume Claim (PVC). Think of a PVC as your very polite, very specific request to the cluster: “Excuse me, I would like approximately this much storage, with these performance characteristics, please and thank you.”

17.3 Persistent Volumes (PV): Cluster-Level Storage Resources

Right, so you’ve got a cluster. It has nodes, they have disks. But pods are these beautiful, ephemeral little monsters that get scheduled all over the place. You can’t tell a pod, “Hey, just store your precious database files on the local disk of node k8s-node-07,” because next week that pod might be running on k8s-node-12 and it would be very, very sad and dataless. This is the problem Persistent Volumes (PVs) solve. Think of a PV as a piece of storage in the cluster that has been provisioned by an administrator, or dynamically through a StorageClass. It’s a cluster resource, just like a node is a cluster resource. It exists independently of any pod’s life cycle.

17.2 HostPath: Mounting Node Filesystem Paths

Right, so you want to use a hostPath volume. Let’s be honest: you’re probably doing this for one of two reasons. Either you’re just testing something and need a quick and dirty way to get a file into a Pod, or you’re about to do something deeply inadvisable in production. I’m not here to judge, but I am here to make sure you know exactly what you’re getting into. This is the Kubernetes equivalent of using duct tape—it gets the job done immediately but you’ll regret it later when everything falls apart.

17.1 Ephemeral Volumes: emptyDir, configMap, secret, projected

Right, let’s talk about the stuff that doesn’t stick around. Ephemeral volumes are the sprinters of the Kubernetes storage world: blindingly fast, incredibly useful for a specific leg of the race, and then they vanish without a trace. They’re perfect for all the temporary, scratch-space, in-flight nonsense your application needs to do its job right now. Unlike their persistent cousins, these guys are tied to the lifecycle of a Pod. The Pod gets scheduled, the volume is created. The Pod dies, the volume gets deleted. Poof. It’s the ultimate “this meeting could have been an email” of storage—no permanent record.

17. Volumes, Persistent Volumes, and Persistent Volume Claims

8.7 Typical Use Cases: Databases, Kafka, Zookeeper

Right, so you’ve got your stateless web apps happily humming along on Deployments, scaling up and down without a care in the world. But now you need to run the important stuff—the things that remember who they are and where they left off. You need to run a database, a Kafka cluster, or Zookeeper. For these, a Deployment is a disaster waiting to happen. You don’t just need a Pod; you need a specific Pod with a specific identity and access to its specific data. Enter the StatefulSet, the Kubernetes controller that treats your pets like actual pets, not cattle.

8.6 StatefulSet Update Strategies: RollingUpdate and OnDelete

Right, so you’ve got your StatefulSet humming along, managing your pods with their precious stable identities and persistent storage. It’s a beautiful, orderly parade. But nothing lasts forever, my friend. Eventually, you’ll need to update the container image, maybe for a new feature or a critical security patch. This is where the designers of StatefulSets, in their infinite wisdom, gave us two primary strategies: RollingUpdate and OnDelete. And let me tell you, the choice between them is less about which is “better” and more about which flavor of control you want over the inevitable chaos.

8.5 Headless Services and DNS for StatefulSets

Right, so you’ve got your StatefulSet up and running. It’s got its stable network identity, its persistent storage, all that good stuff. But how do you actually talk to it? You can’t just use a regular old Service with a load-balancer IP. That would blast requests to any random Pod, and for a stateful application like a database, that’s a great way to corrupt your data and ruin your weekend. This is where the headless Service comes in, and it’s one of those Kubernetes concepts that seems bizarre until it clicks, and then it’s pure genius.
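
To see what "talking to a specific Pod" looks like, here is a small Python sketch of the per-pod DNS names a headless Service gives a StatefulSet. The function name and the sample names (`db`, `db-headless`, `prod`) are hypothetical, and `cluster.local` is assumed as the cluster domain because it is the usual default; your cluster may differ.

```python
def pod_dns_names(statefulset, service, namespace="default",
                  replicas=3, cluster_domain="cluster.local"):
    """Build <pod>.<service>.<namespace>.svc.<cluster-domain> for each ordinal."""
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for name in pod_dns_names("db", "db-headless", namespace="prod", replicas=2):
        print(name)
```

Each name resolves to exactly one Pod, which is what lets a client pick the primary instead of rolling the dice.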

8.4 Ordered Pod Management: Startup, Scaling, and Deletion

Alright, let’s talk about the part of StatefulSets that feels like it was designed by someone with a deep, abiding love for ritual and order—probably while listening to a Gregorian chant. This is where we move past the “stable network ID” party trick and into the real orchestration: how these Pods are brought into this world, scaled up, and shown the door. It’s called Ordered Pod Management, and it means exactly what it says on the tin. Unlike a Deployment, which gleefully fires up all its Pods in parallel like kids released onto a playground, a StatefulSet is methodical. It’s the conga line of the Kubernetes world: one Pod at a time, in a strict, unwavering order.
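
As a minimal sketch of that conga line, assuming the default OrderedReady pod management policy: Pods come up from ordinal 0 upward and are removed from the highest ordinal downward. The function names and the `web` StatefulSet are invented for the example.

```python
def startup_order(name, replicas):
    """Pods are created one at a time, lowest ordinal first."""
    return [f"{name}-{i}" for i in range(replicas)]


def shutdown_order(name, replicas):
    """Pods are terminated in reverse, highest ordinal first."""
    return list(reversed(startup_order(name, replicas)))


if __name__ == "__main__":
    print("up:  ", startup_order("web", 3))
    print("down:", shutdown_order("web", 3))
```

The sketch leaves out the waiting: in the ordered mode each Pod has to be Running and Ready before the next one is touched.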

8.3 VolumeClaimTemplates: Per-Pod Persistent Volumes

Right, so you’ve got your StatefulSet humming along, giving you those lovely stable network identities and ordered pod management. But let’s be honest, the real reason you’re here, the thing that makes StatefulSets truly sing, is volumeClaimTemplates. This is where we move from ephemeral, flaky pods to having state that actually sticks around. Without this, you might as well just use a Deployment and call it a day. Think of each entry in volumeClaimTemplates as a cookie cutter. You define it once in your StatefulSet spec, and then for every Pod the StatefulSet creates (web-0, web-1, web-2, etc.), it uses that cookie cutter to stamp out a brand new PersistentVolumeClaim (PVC) specifically for that pod. This is the magic that gives each pod in your stateful application its own unique, persistent storage. No more musical chairs where a newly scheduled pod hopes it lands on the right node with the right data.
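
To make the cookie-cutter idea concrete, here is a tiny Python sketch of the naming convention: each stamped-out PVC is named after the template, the StatefulSet, and the pod ordinal. The function name `pvc_names` and the template name `data` are made up for illustration.

```python
def pvc_names(template: str, statefulset: str, replicas: int) -> list[str]:
    """Return the PVC name each pod gets: <template>-<statefulset>-<ordinal>."""
    return [f"{template}-{statefulset}-{i}" for i in range(replicas)]


if __name__ == "__main__":
    for name in pvc_names("data", "web", 3):
        print(name)
```

Because the name is derived from the pod's ordinal, a rescheduled web-1 asks for the same claim it had before and gets its own data back.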

8.2 Pod Identity: Stable Network Names and Persistent Storage

Right, so you’ve got a deployment that needs to run a set of pods, but here’s the kicker: the pods aren’t fungible. They aren’t just interchangeable cogs in a stateless machine. Each pod needs its own unique, stable identity. Maybe you’re running a distributed data store like Kafka or Redis with sentinels, or a replicated database like PostgreSQL. If Pod A’s data is on PersistentVolume X, and Pod B’s data is on PersistentVolume Y, you can’t just go swapping them around willy-nilly when a node fails. Kubernetes’ regular Deployment object, brilliant as it is for stateless apps, throws its hands up at this problem. It’s designed for cattle, not pets.

8.1 Why StatefulSets Exist: Stable Identities and Ordered Deployment

Look, you’ve run a Deployment before. It’s the workhorse. You tell it you want three replicas of your web server, and Kubernetes gives you three nearly identical Pods. They get random names (frontend-abc123, frontend-xyz789), they come up in any order, and if one dies, its replacement is a brand new Pod with a brand new identity. This is fantastic for stateless workloads. Your web server doesn’t care if it’s frontend-abc123 or frontend-xyz789; the load balancer sends traffic to whoever’s healthy.

8. StatefulSets: Ordered and Stable Workloads

23.6 When to Use Hardware RAID vs Software RAID

Alright, let’s settle this ancient, holy war. You’re standing there, looking at your server or your pile of drives, and you have to decide: do you let a dedicated piece of hardware manage your data redundancy, or do you let the Linux kernel itself do the heavy lifting? The answer, like most good things in system administration, is “it depends,” but I can tell you that for 99% of you reading this, the answer is going to be software RAID with mdadm. Let’s break down why, and more importantly, when you might actually want to reach for that expensive hardware card.

23.5 RAID and LVM Together: Common Layered Setups

Right, so you’ve got your RAID array built. It’s beautiful. It’s redundant. It’s… a single, big, dumb block device. That’s like buying a giant, empty warehouse and just throwing all your stuff in one big pile. It works, but good luck organizing it. This is where LVM waltzes in, puts up some walls, installs some shelves, and turns that raw space into something you can actually use. Think of RAID (mdadm) as your foundation for performance and resilience. It’s all about the disks. LVM (Logical Volume Manager), on the other hand, is about flexibly managing the storage space on top of that foundation. It’s the interior designer for your data warehouse. Layering LVM on top of RAID gives you the best of both worlds: the reliability of RAID with the sheer flexibility of LVM. You can resize filesystems on the fly, take snapshots, and create volumes of different sizes without having to pre-plan every partition until your brain melts.

23.4 Adding a Spare and Rebuilding After Disk Failure

Alright, let’s get our hands dirty. Your array is degraded. A drive has thrown in the towel, and the little [U_] in /proc/mdstat is mocking you. Don’t panic. This is exactly what you built this system for. It’s not a disaster; it’s a feature. Think of it like your car’s “check engine” light: annoying, but a sign the system is smart enough to know something’s wrong. Your job now is to be the mechanic.

23.3 /proc/mdstat and mdadm --detail: Monitoring Array Health

Alright, let’s talk about checking the pulse of your RAID array. You didn’t go through all the trouble of building this digital Voltron just to cross your fingers and hope for the best. You need to know its status, and for that, we have two primary tools: the kernel’s status file, /proc/mdstat, and the Swiss Army knife itself, mdadm --detail. One gives you the quick, at-a-glance view; the other gives you the full medical chart. You’ll use both.
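
For the quick, at-a-glance side of things, here is a rough Python sketch that scans /proc/mdstat-style text for the status string (`[UU]` healthy, `[U_]` missing a member) and reports which arrays are degraded. The sample text is hand-written for illustration, not captured from a real machine, and the function name is hypothetical.

```python
import re

# Illustrative sample only: a two-disk RAID1 that has lost its second member.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      1048512 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return names of arrays whose status string contains a missing slot."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)  # remember which array the next lines describe
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real box you would feed it the contents of /proc/mdstat instead of the sample, then reach for mdadm --detail on anything it flags.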

23.2 mdadm: Creating, Assembling, and Managing Software RAID

Right, let’s get our hands dirty with mdadm. This isn’t some glossy GUI wizard that hides the messy bits from you; this is the command-line power tool that actually builds the arrays. Think of it as the difference between ordering a pre-made sandwich and being handed a knife, a fresh loaf, and the finest ingredients. More work? Absolutely. But you get exactly what you want, and you know it’s made right.

23.1 RAID Levels: 0 (Striping), 1 (Mirroring), 5, 6, and 10

Alright, let’s talk RAID levels. Forget the marketing fluff from hardware vendors; we’re going to look at this from the perspective of someone who has to actually use and, more importantly, recover these things. RAID isn’t a backup. Let me say that again so it sinks in: RAID is not a backup. It’s a tool for uptime and performance. You back up your data to a separate system, preferably off-site. Got it? Good. Now, let’s get our hands dirty with the main levels you’ll configure with mdadm.
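
Before picking a level, it helps to run the capacity arithmetic. The Python sketch below assumes equal-sized disks, two copies for RAID 10, and the conventional minimum member counts; the function name and the `MIN_DISKS` table are made up for this example.

```python
# Conventional minimum member counts per level (an assumption of this sketch).
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def usable_capacity_tb(level: int, disks: int, disk_size_tb: float) -> float:
    """Usable space for an array of equal-sized disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return disks * disk_size_tb        # striping: all space, no redundancy
    if level == 1:
        return disk_size_tb                # mirroring: one disk's worth
    if level == 5:
        return (disks - 1) * disk_size_tb  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size_tb  # two disks' worth of parity
    return disks * disk_size_tb / 2        # RAID 10 with two copies of everything


if __name__ == "__main__":
    for level in (0, 1, 5, 6, 10):
        print(f"RAID {level}: {usable_capacity_tb(level, 4, 4.0)} TB usable from 4 x 4 TB")
```

The numbers say nothing about rebuild times or failure tolerance, which is where the levels really differ.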

23. RAID with mdadm

1.6 Background Workers: Autovacuum, Checkpointer, WAL Writer, and More

Right, let’s talk about the unsung heroes of your PostgreSQL instance: the background workers. You’re not just running a database; you’re the mayor of a small, bustling city. The main postgres process is you, the mayor, holding court and delegating tasks. But a city can’t run on charisma alone. You need a sanitation department, road crews, and emergency services. That’s what these background workers are. They handle the essential, often messy jobs that keep the city from collapsing into chaos, all while you, the user, are blissfully unaware, just inserting and selecting data.

1.5 The Write-Ahead Log (WAL): Durability Without Flush-Per-Write

Right, let’s talk about the single most important reason you don’t lose data when your database server suddenly loses power, gets kicked by the datacenter janitor, or just decides to have a bad day. It’s not magic, it’s the Write-Ahead Log, or WAL. This is the unsung hero of your database’s durability, and understanding it is non-negotiable if you want to call yourself a Postgres professional. The core problem is simple: writing data to your main table files (the “heap”) and index files is slow. These files are large, scattered across the disk, and updating them involves a lot of random I/O. If we had to wait for a full fsync on these files to confirm every single INSERT or UPDATE, your database’s throughput would be measured in transactions per minute, not per second. It would be a disaster.
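
The general trick can be shown with a toy Python sketch: append each change to a sequential log, fsync only that log, and rebuild state by replaying it after a crash. To be clear, this is not how PostgreSQL implements WAL (there are no pages, LSNs, or checkpoints here); the `TinyStore` class and its JSON-lines log format are invented purely to illustrate the write-ahead idea.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store: every change hits an append-only log first."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.table: dict[str, str] = {}
        self._replay()

    def _replay(self) -> None:
        """Rebuild the in-memory table from the log, oldest record first."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.table[record["key"]] = record["value"]

    def put(self, key: str, value: str) -> None:
        # 1. Append the change to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then touch the "real" data, which here lives in memory.
        self.table[key] = value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyStore(path)
        store.put("a", "1")
        store.put("a", "2")
        # Simulate a restart: a fresh instance recovers state from the log alone.
        print(TinyStore(path).table)
```

The only synchronous disk write per change is a small sequential append, which is the whole point: the expensive random I/O can happen later.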

1.4 Storage Layout: Data Directory, Tablespaces, and Relation Files

Right, let’s pull back the curtain on where PostgreSQL actually lives. Forget the abstractions for a moment; we’re going to talk about files on a disk. This isn’t some proprietary black box—it’s a meticulously organized, if occasionally quirky, file system structure. Knowing your way around this is what separates someone who just uses PostgreSQL from someone who truly operates it. When things go sideways (and they will), this knowledge is your first and best tool.

1.3 Shared Memory: Shared Buffers, WAL Buffers, and Lock Tables

Right, let’s talk about the one thing every process in a PostgreSQL cluster agrees on: shared memory. Think of it as the communal kitchen in a shared house. It’s where all the roommates (your backend processes) leave notes, stash commonly used food (data), and argue over who used the last of the milk (row locks). If this kitchen is too small, chaos ensues. If it’s too big, you’re wasting rent money. Let’s break down the main appliances in this kitchen.

1.2 The Postmaster and Backend Processes: How Connections Are Served

Right, let’s pull back the curtain on how PostgreSQL actually handles you knocking on its door. This isn’t some monolithic application that does everything itself. Oh no, that would be too simple, and frankly, a single point of failure. Instead, it uses a brilliant, time-tested model of delegation: a benevolent manager (the Postmaster) and a legion of specialized workers (backend processes). Understanding this isn’t academic; it’s the key to diagnosing performance issues, connection problems, and understanding what the hell pg_stat_activity is actually showing you.

1.1 From INGRES to Postgres to PostgreSQL: A Brief History

Right, let’s get this out of the way first: you’re not using software designed last week. You’re using a system with the architectural equivalent of a fascinating family tree, complete with brilliant ancestors, a rebellious youth, and a very sensible, stable adulthood. Understanding this history isn’t just academic; it explains the quirks, the power, and the occasional “what were they thinking?!” moments you’ll encounter. So, let’s start at the beginning, before it was even called PostgreSQL.