22.6 LVM Snapshots: Point-in-Time Copies for Backups
Right, so you’ve got your LVM setup humming along. Volumes are carved out, data is flowing. But now you need to do something terrifying: you need to back up the live, running database on it without stopping the world. Enter the LVM snapshot, one of the most useful and yet most frequently botched features in the storage admin’s toolkit.
Think of a snapshot not as a full copy of your data, but as a point-in-time photograph of your logical volume that starts out almost empty. The original volume (let’s call it the ‘origin’) continues on with its life, changing blocks of data willy-nilly. Whenever the origin wants to change a block for the first time since the snapshot was taken, LVM politely intercepts that write, copies the old, unchanged block over to the snapshot volume, then allows the new data to be written to the origin. This is called Copy-On-Write (COW). The snapshot volume holds only the old contents of blocks that have changed since it was taken; everything else is still read straight from the origin.
This is brilliantly efficient. You’re not duplicating 100 GB of data to take a snapshot; you’re just creating a small volume to track the changes. At least, that’s the theory. The practice is a bit more… exciting.
How to Create a Snapshot (Without Shooting Yourself in the Foot)
The command itself is deceptively simple. You use lvcreate with the -s (snapshot) flag and you have to tell it which origin volume to take a picture of.
# Let's say our precious data is on a logical volume named 'data_lv' in volume group 'vg_data'
lvcreate -s -n backup_snapshot -L 10G /dev/vg_data/data_lv
Key detail here: the -L 10G. This isn’t the size of the backup; it’s the size of the snapshot volume itself, that is, the space allocated to hold the old copies of those changed blocks we talked about. And this is where most snapshot disasters are born.
If your origin volume is highly active and changes a lot of data, the snapshot volume will fill up with these old blocks. Once it’s 100% full, the kernel can no longer preserve old data, so it invalidates the snapshot. Poof. The kernel isn’t sentimental: writes to the origin carry on as normal, but the snapshot is dropped and any further read from it fails with an I/O error. Your point-in-time copy is now a point-of-failure.
So how big should you make it? There is no good answer. It depends entirely on how much data change you expect during the lifetime of the snapshot. For a quick backup of a moderately busy volume, 10-20% of the origin size might be fine. For a volume undergoing massive changes (like a database vacuum operation), you might need something approaching 100%. Monitor it like a hawk with lvs (look at the Data% column for the snapshot LV).
lvs -o lv_name,vg_name,lv_attr,lv_size,origin,data_percent vg_data
  LV              VG      Attr       LSize   Origin  Data%
  data_lv         vg_data owi-aos--- 100.00g
  backup_snapshot vg_data swi-a-s---  10.00g data_lv 42.1
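Sizing and monitoring both come down to simple arithmetic, so here is a minimal sketch of each. The function names, the change rate, and the 80% threshold are all hypothetical; the only real LVM pieces are the lvs and lvextend calls, which are left commented out because they need root and a live snapshot.

```shell
#!/bin/sh
# Hypothetical helpers; names and numbers are illustrative, not part of LVM.

# Estimate a snapshot size in GiB, rounded up:
#   change rate (GiB/hour) x snapshot lifetime (hours) x safety factor
# The safety factor defaults to 2 when the third argument is omitted.
estimate_snapshot_gib() {
    awk -v r="$1" -v h="$2" -v s="${3:-2}" \
        'BEGIN { v = r * h * s; c = int(v); if (c < v) c++; print c }'
}

# Succeed (exit 0) when the Data% value is at or above the threshold.
snapshot_over_threshold() {
    awk -v d="$1" -v t="$2" 'BEGIN { exit !(d >= t) }'
}

# Example wiring (assumed names from the text; needs root, so commented out):
# pct=$(LC_ALL=C lvs --noheadings -o data_percent vg_data/backup_snapshot | tr -d ' ')
# if snapshot_over_threshold "$pct" 80; then
#     lvextend -L +5G vg_data/backup_snapshot
# fi

# 3 GiB/hour of churn, a 2-hour backup window, default safety factor of 2
echo "suggested snapshot size: $(estimate_snapshot_gib 3 2)G"
```

If you would rather not babysit it yourself, LVM can also grow a snapshot automatically via the snapshot_autoextend_threshold and snapshot_autoextend_percent settings in the activation section of lvm.conf, provided monitoring is enabled.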
The Nitty-Gritty: How Snapshot Metadata Actually Works
Don’t just think of the snapshot volume as a spare closet where we throw old blocks. It’s more sophisticated than that. When you create the snapshot, LVM sets up an exception table inside it, which records only the chunks that have been copied so far. A read of any chunk through the snapshot therefore resolves to one of two places:
- If the chunk hasn’t been changed, it points back to the original chunk on the origin volume.
- If the chunk has been changed, it points to the location inside the snapshot volume where the old data was preserved.
This is why reading from a snapshot is perfectly safe and gives you the original data, even if the origin has moved on. The snapshot driver consults this map for every read operation. It’s a beautiful piece of engineering, really.
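The two cases above are easier to see in a toy model. This sketch uses plain files in a temporary directory to stand in for chunks; the directory layout and function names are made up for illustration, and the real mechanism lives in the kernel's device-mapper snapshot target, not in anything resembling this script.

```shell
#!/bin/sh
# Toy model of copy-on-write using plain files. All names are hypothetical.

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/origin" "$work/cow"      # cow/ holds preserved old chunks only

# Three "chunks" on the origin at the moment the snapshot is taken.
printf 'alpha\n'   > "$work/origin/0"
printf 'bravo\n'   > "$work/origin/1"
printf 'charlie\n' > "$work/origin/2"

# Write to the origin, preserving the old chunk on the first change only.
origin_write() {
    [ -e "$work/cow/$1" ] || cp "$work/origin/$1" "$work/cow/$1"
    printf '%s\n' "$2" > "$work/origin/$1"
}

# Read through the snapshot: preserved copy if present, else the origin.
snapshot_read() {
    if [ -e "$work/cow/$1" ]; then
        cat "$work/cow/$1"
    else
        cat "$work/origin/$1"
    fi
}

origin_write 1 'bravo-v2'
origin_write 1 'bravo-v3'   # second write to the same chunk: nothing more is copied

echo "origin chunk 1:   $(cat "$work/origin/1")"
echo "snapshot chunk 1: $(snapshot_read 1)"
echo "snapshot chunk 2: $(snapshot_read 2)"
echo "chunks preserved: $(ls "$work/cow" | wc -l | tr -d ' ')"
```

The origin ends up with bravo-v3 while the snapshot still reads bravo, and only one chunk was ever copied even though it was written twice. That is also why the snapshot's space usage tracks how many distinct chunks change, not how many writes happen.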
Using the Snapshot for a Safe Backup
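This is the whole point. You create the snapshot, and then you back up from the snapshot volume, not the live origin. This gives you a frozen view of the filesystem as it was the instant you created the snapshot. The origin can keep churning away; your backup process is isolated. One caveat: the image is crash-consistent, exactly what you’d find on disk after pulling the power cord. For a database, either quiesce it (flush and lock) for the moment the snapshot is taken, or rely on its crash recovery when you restore.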
# 1. Create the snapshot (fix the name once so every later step uses the same one)
SNAP=db_backup_$(date +%Y%m%d)
lvcreate -s -n "$SNAP" -L 15G /dev/vg_main/mysql_lv
# 2. Mount it read-only for safety (highly recommended)
#    XFS needs "-o ro,nouuid" because the snapshot shares the origin's UUID
mkdir -p /mnt/snapshot_backup
mount -o ro "/dev/vg_main/$SNAP" /mnt/snapshot_backup
# 3. Back it up using your tool of choice (e.g., tar, rsync)
#    -C keeps the mount point path out of the archive
tar -czf "/backup/mysql-full-${SNAP#db_backup_}.tar.gz" -C /mnt/snapshot_backup .
# 4. Unmount and remove the snapshot *immediately* after the backup
umount /mnt/snapshot_backup
lvremove "/dev/vg_main/$SNAP"
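Typed by hand, those four steps have one weak spot: if the backup fails halfway, the snapshot stays behind. One way to guard against that is to wrap the steps in a function that always removes the snapshot. This is a sketch under assumptions, not a hardened tool: the function names, the _snap suffix, the mount point, and the archive path are all hypothetical, and it defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch only: volume, mount point and archive names are hypothetical.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: snapshot_backup VG ORIGIN_LV SNAP_SIZE ARCHIVE_PATH
snapshot_backup() {
    vg=$1; origin=$2; snap_size=$3; dest=$4
    snap="${origin}_snap"
    mnt="/mnt/${snap}"

    run lvcreate -s -n "$snap" -L "$snap_size" "/dev/$vg/$origin" || return 1
    run mkdir -p "$mnt"
    status=0
    if run mount -o ro "/dev/$vg/$snap" "$mnt"; then
        run tar -czf "$dest" -C "$mnt" . || status=1
        run umount "$mnt"
    else
        status=1
    fi
    # Always drop the snapshot, even when the mount or backup step failed.
    run lvremove -y "/dev/$vg/$snap"
    return "$status"
}

snapshot_backup vg_main mysql_lv 15G /backup/mysql-full.tar.gz
```

Run it as-is to review the six commands it would issue, then set DRY_RUN=0 (as root) once the names match your system. The -y on lvremove skips the confirmation prompt, which is what you want in an unattended job but worth knowing about.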
The moment you remove the snapshot, all that COW overhead—the extra writes to preserve old data—vanishes. Your system performance returns to normal. This is why you should never, ever leave snapshots lying around. They are a temporary tool for a specific job, not a long-term archival solution. The performance penalty for keeping one active for weeks on a busy system is brutal and completely unnecessary.
The Gotchas and Questionable Design Choices
LVM’s snapshot implementation is powerful, but it has some rough edges that feel like they were designed by someone who never had to operate this at 3 AM.
- The Size Problem: We’ve covered this. The fact that you can so easily create a snapshot that quietly invalidates itself by filling up is, frankly, a design flaw. Other systems (ZFS, I’m looking at you) handle this much more elegantly.
- Performance: Copy-On-Write means the first write to each chunk of the origin after the snapshot is taken requires a read (the old data) and two writes (the old data to the snapshot, the new data to the origin). Later writes to the same chunk go straight through, but on a busy volume this can still significantly impact write performance while the snapshot is active.
- Thin Provisioning is the Future: The older “traditional” snapshots we’re discussing here are being superseded by “thin provisioning” snapshots, which are managed in a pool and are generally more flexible. But the traditional ones are everywhere, so you have to know them. The concepts are similar, but the management commands are different (lvcreate --thin).
The golden rule? Snapshots are a fantastic, filesystem-agnostic way to get a consistent backup image. But they are a means to an end, not the end itself. Create it, back it up, and destroy it. Quickly. Your I/O latency will thank you.