17.5 Automated Backups, Snapshots, and Point-in-Time Restore

Right, let’s talk about not losing your data. This isn’t a gentle suggestion; it’s the digital equivalent of having a fire extinguisher. You will need it. RDS gives you two primary, brilliant, and slightly different tools for this: Automated Backups and DB Snapshots. They serve different masters, and confusing them is a classic rookie mistake I’m here to help you avoid.

Automated Backups: Your First and Best Line of Defense

Think of Automated Backups as your continuous, rolling safety net. When you enable this (and you absolutely should), RDS takes a daily storage snapshot of your entire DB instance during the backup window. The first one is a full copy; later ones are incremental and store only the blocks that changed. But the real magic is in the transaction logs: RDS also uploads them to S3 every five minutes. This combo is what enables the killer feature: point-in-time recovery.

The retention period is configurable from 1 to 35 days (setting it to 0 turns automated backups off entirely). My strong recommendation? Set it to the maximum, 35 days. Storage is cheap. Regret is expensive. You’re paying for a managed service, so use the management features. This isn’t the place to be stingy.

Here’s the catch, and it’s a big one: Automated Backups only run while your instance is available. If you stop (not reboot, but fully stop) an instance, no new snapshots or transaction logs are captured until you start it again. That leaves a gap in your restore timeline for the period the instance was stopped. Don’t panic, your existing backups are still there, and RDS doesn’t count time spent in the stopped state when it calculates the retention window, but it’s a crucial detail to remember.

You can set this at creation time, or change it later on an existing instance with modify-db-instance. There’s no good reason to disable it.

# --backup-retention-period is the important bit.
# The password is a placeholder; use a real secret.
aws rds create-db-instance \
    --db-instance-identifier my-database \
    --db-instance-class db.t3.micro \
    --engine postgres \
    --master-username postgres \
    --master-user-password secret99 \
    --allocated-storage 20 \
    --backup-retention-period 35 \
    --no-publicly-accessible
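If the instance already exists, the same setting can be checked and changed after the fact. The following is a minimal sketch, not a definitive script: the function names are my own, and it assumes the AWS CLI is installed and configured with permission to describe and modify the instance.

```shell
# Sketch: inspect and change backup settings on an existing instance.
# Function names are illustrative; "my-database" is the example
# identifier used throughout this section.

# Print the retention period (days) and the daily backup window (UTC).
show_backup_config() {
    aws rds describe-db-instances \
        --db-instance-identifier "$1" \
        --query 'DBInstances[0].[BackupRetentionPeriod,PreferredBackupWindow]' \
        --output text
}

# Set the retention period in days; 0 disables automated backups.
set_backup_retention() {
    aws rds modify-db-instance \
        --db-instance-identifier "$1" \
        --backup-retention-period "$2" \
        --apply-immediately
}

# Usage:
#   show_backup_config my-database
#   set_backup_retention my-database 35
```

Running show_backup_config across your instances is a quick way to catch one that was created with a short retention period before you need it.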

DB Snapshots: The Manual “Save Game”

While Automated Backups are for reacting to oopsies, DB Snapshots are for planned, major events. You create these manually. Think of it like this: you’re about to run a risky schema migration, deploy a new version of your application that might do something stupid, or just want a permanent, named artifact before you delete a development instance.

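The key differences? They are manual and they exist until you delete them, even after the instance itself is gone. There’s no retention period. They’ll sit in your account costing you money forever if you forget about them. One thing they do share with Automated Backups: the instance has to be available when you take one, so you can’t snapshot an instance that is already stopped. If you’re about to park an instance, take the snapshot first, or pass --db-snapshot-identifier to stop-db-instance so RDS takes one just before stopping. Restoring that snapshot later is a neat trick for cloning things without keeping the original running.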

Creating one via the CLI is trivial:

aws rds create-db-snapshot \
    --db-snapshot-identifier my-database-pre-migration \
    --db-instance-identifier my-database
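Before a risky migration you usually want to be sure the snapshot has actually finished, not just started. Here is a rough sketch of how that might look, along with a helper for spotting forgotten snapshots. The function names and the timestamped naming scheme are assumptions of mine, not anything RDS requires.

```shell
# Sketch: take a timestamped manual snapshot and block until it is usable.
# Arguments: instance identifier, short label (e.g. "pre-migration").
snapshot_and_wait() {
    instance_id="$1"
    label="$2"
    snapshot_id="${instance_id}-${label}-$(date -u +%Y%m%d-%H%M%S)"

    aws rds create-db-snapshot \
        --db-instance-identifier "$instance_id" \
        --db-snapshot-identifier "$snapshot_id" >/dev/null || return 1

    # Returns once the snapshot is available, or fails if the waiter gives up.
    aws rds wait db-snapshot-available \
        --db-snapshot-identifier "$snapshot_id" || return 1

    echo "$snapshot_id"
}

# List manual snapshots with their creation time, so old ones are easy to spot.
list_manual_snapshots() {
    aws rds describe-db-snapshots \
        --snapshot-type manual \
        --query 'DBSnapshots[].[DBSnapshotIdentifier,SnapshotCreateTime]' \
        --output text
}

# Usage:
#   snapshot_and_wait my-database pre-migration && ./run-migration.sh
#   list_manual_snapshots
```

The run-migration.sh script in the usage line is hypothetical; the point is that the migration only starts if the snapshot completed.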

Point-in-Time Restore: Your “Undo” Button

This is the crown jewel. Because RDS has those transaction logs, you can restore your database to any specific second within your backup retention window, up to the latest restorable time, which typically trails the present by about five minutes. Did a junior developer just run DELETE FROM users; without a WHERE clause at 2:03 PM? You can restore to 2:02 PM.
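Before picking a restore time, it helps to know what window you actually have. A small sketch, assuming the AWS CLI is configured; the function name is mine:

```shell
# Sketch: print the earliest and latest times (UTC) that an instance
# can currently be restored to, separated by a tab.
restore_window() {
    aws rds describe-db-instance-automated-backups \
        --db-instance-identifier "$1" \
        --query 'DBInstanceAutomatedBackups[0].RestoreWindow.[EarliestTime,LatestTime]' \
        --output text
}

# Usage:
#   restore_window my-database
```

If the moment you want falls outside that range, point-in-time restore can’t help and you’re looking at a manual snapshot instead.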

The restore process doesn’t overwrite your original instance. Instead, it creates a brand new RDS instance from your backup data. This is fantastic because you can then connect to the new, recovered instance, dump the missing data, and import it back into your production database with minimal downtime. Or, if things are truly borked, you can just point your application to the new instance.

You can do this from the AWS Console by picking a time on a pretty timeline, or via the CLI by specifying the restore time (in UTC, always UTC!).

aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier my-database \
    --target-db-instance-identifier my-database-recovered \
    --restore-time "2023-10-27T14:02:00Z" # ISO 8601 format, UTC
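Once the restore has been kicked off, the recover-and-copy-back approach described above might look roughly like this for the PostgreSQL example. Treat it as a sketch under several assumptions: the function names, the database name, and the table name are illustrative; credentials are assumed to come from PGPASSWORD or ~/.pgpass; and a data-only load like this only works cleanly if the production table is empty (as after the runaway DELETE) or the rows don’t collide with existing keys.

```shell
# Sketch: wait for the restored instance, then print its endpoint hostname.
recovered_endpoint() {
    aws rds wait db-instance-available \
        --db-instance-identifier "$1" || return 1
    aws rds describe-db-instances \
        --db-instance-identifier "$1" \
        --query 'DBInstances[0].Endpoint.Address' \
        --output text
}

# Copy one table's rows from the recovered instance back into production.
# Arguments: recovered host, production host, database name, table name.
copy_table_back() {
    pg_dump --host "$1" --username postgres --dbname "$3" \
            --table "$4" --data-only \
        | psql --host "$2" --username postgres --dbname "$3" \
               --set ON_ERROR_STOP=1
}

# Usage (hostnames, database and table are placeholders):
#   rec_host=$(recovered_endpoint my-database-recovered)
#   copy_table_back "$rec_host" prod-host.example.internal appdb users
```

The restored instance gets its own endpoint, which is why the hostname is looked up rather than reused from production.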

The Gotchas and Best Practices

Performance Impact: Backups aren’t free. During the daily backup window, a Single-AZ instance can see storage I/O pause briefly (typically a few seconds) while the snapshot starts. It’s usually negligible, but don’t schedule your biggest batch jobs for the same window. You can set the window yourself; if you don’t, RDS picks a 30-minute slot at random from an 8-hour block that depends on your Region, which may or may not be a quiet period for you.
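The Multi-AZ Advantage: If your DB instance is a Multi-AZ deployment, the backup is taken from the standby. For MariaDB, MySQL, Oracle, and PostgreSQL, this means I/O on your primary isn’t suspended during the backup, though you may still see slightly elevated latency for a few minutes. (SQL Server is the exception: I/O is briefly suspended even with Multi-AZ.) It’s one of the many reasons Multi-AZ is worth the money for production workloads.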
Restore Time: Restoring a large database can take hours. It’s not magic. Have a runbook that accounts for this. The size of your database, the IOPS capacity, and the amount of transaction log RDS has to replay to reach your chosen time all affect how long it takes.
Check Your Work: The most important best practice? Regularly test your restore process. I’m dead serious. You don’t want to discover that your backup retention was set to 1 day after you need it. Once a quarter, pick a point-in-time, restore a clone to a new instance, and validate that the data is there and consistent. This is your fire drill. Do it.
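The fire drill is easier to keep up if it’s scripted. Below is one possible shape for it, offered as a sketch rather than a finished tool. The function names and the drill naming scheme are my own, and validate_restore is a hypothetical hook that you would replace with real checks (row counts, latest timestamps, whatever proves the data is there).

```shell
# Placeholder check (hypothetical): replace with real queries against the
# clone. It fails on purpose so an unvalidated drill never reports success.
validate_restore() {
    echo "no validation implemented for $1" >&2
    return 1
}

# Sketch: restore a clone to the latest restorable time, validate it,
# then delete it. Argument: source instance identifier.
restore_drill() {
    source_id="$1"
    drill_id="${source_id}-drill-$(date -u +%Y%m%d)"

    aws rds restore-db-instance-to-point-in-time \
        --source-db-instance-identifier "$source_id" \
        --target-db-instance-identifier "$drill_id" \
        --use-latest-restorable-time \
        --no-publicly-accessible >/dev/null || return 1

    aws rds wait db-instance-available \
        --db-instance-identifier "$drill_id" || return 1

    validate_restore "$drill_id"
    result=$?

    # Remove the clone either way so it does not keep costing money.
    aws rds delete-db-instance \
        --db-instance-identifier "$drill_id" \
        --skip-final-snapshot >/dev/null

    return "$result"
}

# Usage:
#   restore_drill my-database && echo "restore drill passed"
```

Timing the whole run is worth doing too, since that number is what your runbook should quote for restore time.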