16.8 S3 Glacier: Deep Archive Retrieval Options and Vault Lock
Right, let’s talk about Glacier. You’ve shoved your data into the S3 Glacier Deep Archive, the coldest of cold storage, because it costs about as much as a forgotten can of beans at the back of your pantry. Excellent. But now you need it back. This is where the fun begins, and by “fun” I mean a process designed to make you really question if you need that data after all. Retrieval isn’t like pulling a file from S3 Standard; it’s more like sending a request to a warehouse staffed by a single, very meticulous, and somewhat slow robot.
The first thing to wrap your head around is that retrieval is an asynchronous process. You don’t GET an object; you restore it. You initiate a job, AWS goes and finds your data (the rumor mill says tape robots, though AWS has never confirmed what the hardware actually is), stages a temporary copy, and then, and only then, can you download it. For Deep Archive this takes hours, and with the Bulk tier it can take up to two days. Anyone who tells you otherwise is selling something.
Retrieval Options: The Waiting Game
Your choice here is fundamentally a trade-off between your wallet and your sanity. S3 defines three retrieval tiers, each with its own price and pace, but Deep Archive only lets you use two of them.
Expedited: Not available for Deep Archive. I mention it only because you will see it in the docs and get your hopes up. Expedited belongs to S3 Glacier Flexible Retrieval, where it promises data in 1-5 minutes… if there’s spare capacity. Flexible Retrieval also offers “Provisioned Capacity,” where you pay a monthly fee so that Expedited capacity is actually there when you need it. None of that helps you here: ask for the Expedited tier on a Deep Archive object and the request will fail. If you have a truly catastrophic, “the-business-is-on-fire-unless-we-get-this-data-in-an-hour” scenario, that data should not have been in Deep Archive in the first place.
Standard: This is the sane default for most “I need this by tomorrow” situations. For Deep Archive it typically completes within 12 hours (the 3-5 hour figure you may have seen applies to Flexible Retrieval). You submit the job, you go get a coffee, you do a full day of actual work, and by the next morning at the latest your data is ready. This is your workhorse.
Bulk: The cheapest and slowest option. It typically completes within 48 hours for Deep Archive, and the price per GB is so low it’s almost a rounding error. This is for when you need to restore an entire prefix or a massive dataset and you’re planning the operation for a weekend. You’re not in a hurry; you’re optimizing for cost.
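To make the trade-off concrete, here is a minimal Python sketch that picks the cheapest Deep Archive tier that still meets a deadline. The function name `pick_deep_archive_tier` is made up for illustration, and the hour figures are assumptions: they are the "typically within" ceilings AWS publishes for Deep Archive as I understand them (12 hours for Standard, 48 for Bulk), not guarantees, so check the current documentation before you build a runbook around them.

```python
# Typical upper bounds (hours) for Deep Archive restores. Assumed, not guaranteed.
DEEP_ARCHIVE_TIER_HOURS = {"Bulk": 48, "Standard": 12}


def pick_deep_archive_tier(deadline_hours: float) -> str:
    """Return the cheapest tier whose typical ceiling fits within the deadline."""
    # Bulk is the cheaper of the two, so it is tried first.
    for tier in ("Bulk", "Standard"):
        if DEEP_ARCHIVE_TIER_HOURS[tier] <= deadline_hours:
            return tier
    raise ValueError(
        f"No Deep Archive tier typically completes within {deadline_hours} hours"
    )


if __name__ == "__main__":
    print(pick_deep_archive_tier(72))  # weekend job, no rush
    print(pick_deep_archive_tier(24))  # needed by tomorrow
```

A deadline shorter than the Standard ceiling raises an error on purpose: at that point the honest answer is that Deep Archive cannot help you.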
Here’s how you’d initiate a Standard retrieval job for an object in Deep Archive using the AWS CLI. Specify the tier explicitly; if you leave it out, S3 falls back to Standard, but being explicit saves the next person from guessing.
# First, initiate the restore job. This tells S3 to go get your stuff.
aws s3api restore-object \
    --bucket my-super-important-bucket \
    --key path/to/my/deep-frozen-data.zip \
    --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'
# This command will return nothing on success. Silence is golden.
# Now, you wait. For hours.
# Later, after you've received your s3:ObjectRestore:Completed event notification
# (or you've gotten impatient), you can check the status. Look at the "Restore" field
# in the output: 'ongoing-request="false"' is your signal that it's done.
aws s3api head-object \
    --bucket my-super-important-bucket \
    --key path/to/my/deep-frozen-data.zip
The "Days":7 is critical. It tells AWS how long to keep the restored copy available for download before they automatically delete it from the staging area. Set this too low, and your data will vanish back into the ice before you finish downloading it. I recommend at least 7 days for any sizable restore.
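The `head-object` check is easier to script than to eyeball. As a rough sketch, the Python below parses the `Restore` value that `head-object` returns once a restore has been requested, and pulls out both the completion flag and the date the temporary copy expires. The helper name `parse_restore_header` is hypothetical, and the sample strings follow the `ongoing-request="...", expiry-date="..."` shape S3 documents for this field; treat the exact formatting as an assumption and verify it against real responses.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple

# Matches key="value" pairs such as ongoing-request="false"
_PAIR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_restore_header(value: str) -> Tuple[bool, Optional[datetime]]:
    """Return (restore_finished, expiry) from a Restore header string."""
    fields = dict(_PAIR.findall(value))
    finished = fields.get("ongoing-request") == "false"
    # expiry-date is only present once the temporary copy exists.
    raw_expiry = fields.get("expiry-date")
    expiry = parsedate_to_datetime(raw_expiry) if raw_expiry else None
    return finished, expiry


if __name__ == "__main__":
    in_progress = 'ongoing-request="true"'
    done = 'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'
    print(parse_restore_header(in_progress))
    print(parse_restore_header(done))
```

Comparing the parsed expiry against how long your download will realistically take is a cheap way to find out that 7 days was not enough before the copy disappears.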
Vault Lock: The Unbreakable Vow
Now for the serious stuff. Vault Lock is a terrifyingly powerful feature. One caveat first: it belongs to the original vault-based S3 Glacier API, not to S3 buckets, so it protects archives in a Glacier vault rather than objects you’ve tiered into the Deep Archive storage class (for those, the equivalent is S3 Object Lock). Vault Lock lets you enforce a write-once-read-many (WORM) policy on a Glacier vault. You define a policy with conditions (like “no deletions allowed for 10 years”) and lock it. Once locked, it cannot be changed or deleted by anyone, including the root user of your AWS account. Yes, you read that right. Not you, not AWS Support, not a magic incantation. It is immutable.
Why would you do this? Compliance. Legal hold. Regulatory requirements like SEC Rule 17a-4. It’s the digital equivalent of welding a box shut and throwing the key into a volcano.
The process is deliberate for a reason:
- You write a policy in JSON.
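- You initiate the lock, which attaches the policy to the vault and puts the lock into an in-progress state.
- AWS gives you a lock ID. You must save this. You cannot complete the lock without it.
- You have 24 hours to test the policy before the lock ID expires. This is your “oh crap” window: if the policy is wrong, abort the lock and start over.
- When you complete the lock with the lock ID, the vault is sealed. Forever. If you let the 24 hours run out without completing it, the lock ID expires and the in-progress policy is removed from the vault; nothing becomes permanent by accident.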
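The timing rule is worth pinning down, because it is the part people misremember. Here is a small Python sketch of the 24-hour window, assuming the behavior AWS documents for Vault Lock: the lock ID is valid for 24 hours after initiation, and if the lock is not completed in that time, the in-progress lock and its policy are discarded rather than made permanent. The function and state names are mine, for illustration only; they are not part of any AWS API.

```python
from datetime import datetime, timedelta, timezone

LOCK_WINDOW = timedelta(hours=24)  # assumed validity period of a lock ID


def lock_state(initiated_at: datetime, now: datetime, completed: bool) -> str:
    """Classify a vault lock attempt (illustrative state names, not AWS's)."""
    if completed:
        return "locked"       # permanent: the policy can no longer change
    if now - initiated_at < LOCK_WINDOW:
        return "in-progress"  # still testable; can be aborted or completed
    return "expired"          # lock ID is dead and the policy is gone


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(lock_state(t0, t0 + timedelta(hours=3), completed=False))
    print(lock_state(t0, t0 + timedelta(hours=30), completed=False))
    print(lock_state(t0, t0 + timedelta(hours=3), completed=True))
```

The only path to a permanent lock in this model is an explicit completion, which matches the intent of the process: you have to mean it.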
Here’s an example policy that enforces a 5-year minimum retention period on all archives:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDeletionBefore5Years",
      "Effect": "Deny",
      "Principal": {
        "AWS": "*"
      },
      "Action": "glacier:DeleteArchive",
      "Resource": [
        "arn:aws:glacier:us-east-1:123456789012:vaults/my-vault"
      ],
      "Condition": {
        "NumericLessThan": {
          "glacier:ArchiveAgeInDays": "1825"
        }
      }
    }
  ]
}
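One wrinkle when you hand this policy to the CLI or an SDK: the `initiate-vault-lock` operation takes the policy as a JSON string nested under a `Policy` key, not as a bare policy document. A minimal Python sketch of that wrapping follows. The function name `build_lock_payload` is a placeholder of mine, the vault ARN is the example one from the policy, and the 365-days-per-year arithmetic deliberately ignores leap days to match the 1825 figure.

```python
import json


def build_lock_payload(vault_arn: str, retention_years: int) -> str:
    """Return JSON text shaped like {"Policy": "<policy as a string>"}."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": f"DenyDeletionBefore{retention_years}Years",
            "Effect": "Deny",
            "Principal": {"AWS": "*"},
            "Action": "glacier:DeleteArchive",
            "Resource": [vault_arn],
            "Condition": {
                "NumericLessThan": {
                    # Condition values are strings; 5 years becomes "1825".
                    "glacier:ArchiveAgeInDays": str(retention_years * 365)
                }
            },
        }],
    }
    # Serialize the policy first, then embed that text as a string value.
    return json.dumps({"Policy": json.dumps(policy)}, indent=2)


if __name__ == "__main__":
    print(build_lock_payload(
        "arn:aws:glacier:us-east-1:123456789012:vaults/my-vault", 5))
```

Saving that output to a file and pointing `--policy file://...` at it should give the CLI the shape it expects; hand-escaping the inner quotes works too, but it is an easy place to make a typo on a policy you can never fix.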
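To lock it (initiating the lock is what attaches the policy, so there is no separate attach step):
# The CLI wants --policy as {"Policy": "<the policy above as an escaped JSON string>"},
# so policy.json must wrap the policy document, not contain it bare.
# Initiate the lock and get your LockId. WRITE THIS DOWN.
aws glacier initiate-vault-lock \
    --vault-name my-vault \
    --account-id - \
    --policy file://policy.json
# Output will give you a "lockId": "xxxxxxxxxxxxx..."
# Cold feet during the window? Run: aws glacier abort-vault-lock --vault-name my-vault --account-id -
# Within 24 hours, you must complete the lock using that ID.
aws glacier complete-vault-lock \
    --vault-name my-vault \
    --account-id - \
    --lock-id xxxxxxxxxxxxx...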
The Pitfall: The obvious one. Do not test this on a vault you care about. Do not lock a vault without being 110% certain. There is no undo. There is no “oops.” This is a feature designed for the point of no return. Use it with the gravitas it deserves.