14.8 S3 Batch Operations: Processing Millions of Objects at Scale

Right, so you’ve got a few million objects sitting in a bucket. Maybe you need to change their storage class, add tags, or copy them to another bucket. You’re not going to do that by hand, are you? Of course not. You’re going to fire up S3 Batch Operations, which is essentially your personal robot army for S3 object management. It’s the tool you use when a simple aws s3 sync just won’t cut the mustard and you’d rather not write a bespoke Lambda function to handle the sheer scale.

14.7 S3 Object Lambda: Transforming Data On the Fly During GET

Right, so you’ve got your data sitting in S3. It’s pristine, it’s perfect. But then the requests start rolling in. “Can we get this CSV file as JSON?” “I need this image as a WebP, not a PNG.” “Can we redact the personally identifiable information (PII) from this document before my user sees it?” The old, tedious way would be to create a whole ETL pipeline: trigger a Lambda on upload to transform the object into every possible format, store them all, and then hope you guessed right what the user would need. It’s wasteful, it’s expensive, and it’s frankly a bit daft. It’s like cooking every item on the menu the second a customer walks in, just in case they order it.
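To make the "CSV as JSON" request concrete, here is a minimal sketch of the transformation step only. The function name `csv_to_json` is made up for illustration. In a real Object Lambda function you would first fetch the original object (the event hands you a presigned URL for it) and then return the result with `WriteGetObjectResponse`; that plumbing is left out so the example runs anywhere.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (first row is the header) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every CSV value stays a string; type conversion is up to the caller.
    return json.dumps([dict(row) for row in reader])


if __name__ == "__main__":
    print(csv_to_json("id,name\n1,Ada\n2,Linus\n"))
```

The point is that the stored object stays a CSV. The JSON only exists for the duration of the GET that asked for it.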

14.6 Presigned URLs: Granting Temporary Access Without AWS Credentials

Right, let’s talk about one of the most useful Swiss Army knives in the S3 toolkit: the presigned URL. Here’s the core problem it solves: you have an object in a private bucket. You want to let someone (a user on your website, a colleague, a third party) download it or upload it without giving them your precious, all-powerful AWS credentials. You also don’t want to make the bucket public and unleash chaos upon the world.
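In practice you would call your SDK (boto3's `generate_presigned_url`, for example) and move on. To show what actually ends up in the URL, here is a standard-library sketch of a Signature Version 4 presigned GET. Treat it as an illustration under assumptions: the function name `presign_get` is hypothetical, it assumes a virtual-hosted-style endpoint and long-term credentials (no session token), and the keys in the usage line are the placeholder credentials from the AWS documentation.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote

MAX_EXPIRES = 7 * 24 * 3600  # SigV4 presigned URLs are valid for at most 7 days


def _sign(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def presign_get(bucket, key, region, access_key, secret_key, expires=3600, now=None):
    """Build a SigV4 presigned GET URL. `now` must be a UTC datetime if given."""
    if not 1 <= expires <= MAX_EXPIRES:
        raise ValueError("expires must be between 1 and 604800 seconds")
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    path = "/" + quote(key, safe="/")

    # The signing parameters travel in the query string instead of headers.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )

    # Canonical request: method, path, query, headers, signed headers, payload hash.
    canonical_request = "\n".join(
        ["GET", path, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )

    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (datestamp, region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, part)
    signature = hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"


if __name__ == "__main__":
    print(presign_get("example-bucket", "reports/q1.pdf", "us-east-1",
                      "AKIAIOSFODNN7EXAMPLE",
                      "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", expires=900))
```

Notice what the recipient gets: a URL carrying an access key ID, an expiry, and a signature, but never the secret key. The signature is bound to this one method, path, and time window.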

14.5 S3 Event Notifications: Triggering Lambda, SQS, SNS on Object Events

Right, so you’ve got your data sitting in S3. Great. But static data is, well, static. The real magic happens when your buckets can tell you things, when they can raise their digital hand and say, “Hey, a new file just landed,” or “Psst, someone deleted that important report.” That’s S3 Event Notifications. It’s how you turn a dumb storage bin into the central nervous system of your data pipeline.
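When the bucket does raise its hand, the consumer receives a JSON document with a `Records` list. The sketch below pulls out the event name, bucket, and key from each record. The helper name `extract_objects` is an assumption for illustration; the field names follow the S3 notification message structure.

```python
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> list:
    """Return (event_name, bucket, key) for each record in an S3 notification."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them.
        key = unquote_plus(s3["object"]["key"])
        results.append((record["eventName"], s3["bucket"]["name"], key))
    return results


if __name__ == "__main__":
    sample = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "example-bucket"},
                    "object": {"key": "reports/q1+final.csv"},
                },
            }
        ]
    }
    print(extract_objects(sample))
```

Forgetting to decode the key is a classic way to get "object not found" errors on files whose names contain spaces.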

14.4 S3 Replication: CRR and SRR, Replication Rules, and IAM Role Requirements

Right, let’s talk about S3 Replication. This is the feature that stops you from having a single, catastrophic “oops” moment with your data. The core idea is simple: when you drop a file into one bucket, S3 can automatically and asynchronously copy it to another bucket for you. But as with most things in AWS, the devil is in the details, and oh boy, are there details.

The first fork in the road is choosing your replication type. You’ve got Cross-Region Replication (CRR) and Same-Region Replication (SRR). The names are admirably self-explanatory. CRR is for disaster recovery, keeping your data a safe distance away from a regional meteor strike or, more likely, a configuration apocalypse. SRR is your go-to for operational reasons: maybe you need to aggregate logs from different accounts into a single bucket, or you’re creating a strict production/staging separation where your staging environment needs a near-real-time copy of production data without the risk of it mucking about in the actual production bucket.
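Whichever type you pick, the moving parts look the same: a role that S3 can assume, permissions for that role on both buckets, and a rule on the source bucket. The sketch below builds all three as plain dictionaries in the shape the S3 and IAM APIs expect. The function names, the rule ID, and the bucket ARNs are illustrative assumptions, and the permission list is a common minimal set rather than a definitive one. Both buckets must have versioning enabled before replication can be configured.

```python
def replication_trust_policy() -> dict:
    """Trust policy letting the S3 service assume the replication role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def replication_permissions(source_arn: str, dest_arn: str) -> dict:
    """Permissions the role needs: read from the source, write to the destination."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetReplicationConfiguration", "s3:ListBucket"],
                "Resource": source_arn,
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": f"{source_arn}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"],
                "Resource": f"{dest_arn}/*",
            },
        ],
    }


def replication_config(role_arn: str, dest_arn: str, prefix: str = "") -> dict:
    """One rule replicating everything under `prefix` (empty prefix = whole bucket)."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-all",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": dest_arn},
            }
        ],
    }
```

The same rule shape serves CRR and SRR. The only difference is whether the destination bucket lives in another Region.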

14.3 Lifecycle Rules: Transitioning and Expiring Objects by Age or Prefix

Right, so you’ve got your data in S3. Great. But unless you’re made of money and enjoy watching your CFO have an aneurysm, you can’t just leave every single file on the expensive, high-performance storage tier forever. This is where lifecycle rules come in. Think of them as your automated, hyper-efficient storage janitor. They quietly go about their business, moving things to cheaper storage or taking out the trash, all so you don’t have to.
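Here is roughly what the janitor's instructions look like. This sketch builds a lifecycle configuration in the dictionary shape the S3 API accepts. The function name, rule ID, and default day counts are assumptions chosen for illustration; the 30-day floor reflects S3's minimum age for transitions to STANDARD_IA.

```python
def lifecycle_config(prefix: str, ia_after: int = 30, glacier_after: int = 90,
                     expire_after: int = 365) -> dict:
    """Tier objects under `prefix` down to cheaper storage, then expire them."""
    if ia_after < 30:
        raise ValueError("objects must be at least 30 days old to move to STANDARD_IA")
    if not ia_after < glacier_after < expire_after:
        raise ValueError("day counts must increase: IA < Glacier < expiration")
    return {
        "Rules": [
            {
                "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # the rule only applies under this prefix
                "Transitions": [
                    {"Days": ia_after, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after},  # "taking out the trash"
            }
        ]
    }


if __name__ == "__main__":
    import json
    print(json.dumps(lifecycle_config("logs/"), indent=2))
```

Day counts are measured from object creation, so "30" means thirty days after upload, not thirty days after the rule was added.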

14.2 MFA Delete: Extra Protection for Version Deletion

Alright, let’s talk about MFA Delete. You know Multi-Factor Authentication from logging into your corporate VPN or your email, right? It’s that “something you have and something you know” principle. Well, AWS, in a rare moment of genuine security foresight, decided to apply that same concept to one of the most destructive operations in S3: permanently deleting object versions.

Here’s the deal: S3 Versioning is fantastic. It’s your “undo button” for the cloud. But that “undo button” has a big, scary, permanent counterpart: DeleteObject called with a version ID. Anyone with the s3:DeleteObjectVersion permission can wipe out a version, and if they nuke all the versions of an object, it’s gone for good. MFA Delete adds a crucial second factor. Even if a bad actor gets hold of your access keys, or you accidentally grant too much permission to an IAM role (it happens to the best of us), they can’t just waltz in and permanently delete versions without also supplying a current code from the bucket owner’s MFA device.

14.1 Versioning: Enabling, Suspending, and Permanent Delete with Version ID

Right, let’s talk about S3 Versioning. This is one of those features that sounds simple on the surface—“it keeps multiple versions of an object”—but the devil, as always, is in the details. And the AWS console does its best to hide those details from you, which is why we’re having this chat. Think of versioning as the ultimate “undo” button for your bucket, but an undo button that, by default, just keeps every single change you’ve ever made, forever. This is fantastic for recovery, less fantastic for your storage bill.
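The details are easier to see in a toy model than in the console. The class below is not an S3 client; it is an in-memory sketch of how a single key behaves in a versioning-enabled bucket. The class name and the `v1`, `v2` style IDs are made up (real version IDs are opaque strings).

```python
import itertools


class VersionedKey:
    """Toy model of one object key in a versioning-enabled bucket."""

    def __init__(self):
        self._versions = []  # oldest first; body None means "delete marker"
        self._ids = itertools.count(1)

    def put(self, body: str) -> str:
        """Every write adds a new version; nothing is overwritten."""
        version_id = f"v{next(self._ids)}"
        self._versions.append((version_id, body))
        return version_id

    def delete(self, version_id=None) -> str:
        if version_id is None:
            # Plain delete: removes nothing, just stacks a delete marker on top.
            marker_id = f"v{next(self._ids)}"
            self._versions.append((marker_id, None))
            return marker_id
        # Delete with a version ID: that version is permanently gone.
        self._versions = [v for v in self._versions if v[0] != version_id]
        return version_id

    def get(self, version_id=None) -> str:
        if version_id is None:
            if not self._versions or self._versions[-1][1] is None:
                raise KeyError("no current version (S3 would return 404)")
            return self._versions[-1][1]
        for vid, body in self._versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError(f"no readable version {version_id}")


if __name__ == "__main__":
    key = VersionedKey()
    v1 = key.put("draft")
    key.put("final")
    marker = key.delete()    # object looks deleted, but both versions remain
    print(key.get(v1))       # old versions are still readable by ID
    key.delete(marker)       # removing the marker "undeletes" the object
    print(key.get())
```

This is also why the storage bill grows: a plain delete adds a marker instead of freeing anything.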

6.7 EC2 Instance Metadata Service (IMDSv2): Fetching Role Credentials

Right, let’s talk about the magic box inside your EC2 instance that holds all its secrets: the Instance Metadata Service (IMDS). Think of it as a highly specific, internal-only concierge service that only your instance can call. It answers questions like, “Who am I?”, “What’s my purpose?”, and most critically, “What are my temporary AWS credentials so I can actually do things?”

The original version, now called IMDSv1, was a bit too simple. You could just curl a URL and get what you wanted, no questions asked. This became a problem. If some malicious code somehow got onto your instance, or if your web application was tricked into making requests on an attacker’s behalf (a Server-Side Request Forgery, or SSRF, attack), the attacker could just as easily fetch those powerful credentials. Not great.
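IMDSv2 closes that hole by requiring a session token: you PUT to get a token, then send it as a header on every metadata request. The sketch below shows the two requests using only the standard library. The function names are assumptions for illustration, and `fetch_role_credentials` only works when run on an EC2 instance with a role attached.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254"  # link-local address, reachable only from the instance


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT that asks IMDS for a session token (TTL is 1 to 21600 seconds)."""
    if not 1 <= ttl_seconds <= 21600:
        raise ValueError("ttl_seconds must be between 1 and 21600")
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def credentials_request(token: str, role_name: str) -> urllib.request.Request:
    """Step 2: a GET for the role's temporary credentials, carrying the token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/iam/security-credentials/{role_name}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(role_name: str, timeout: float = 2.0) -> dict:
    """Run both steps. Returns a dict with AccessKeyId, SecretAccessKey, Token, Expiration."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode("utf-8")
    with urllib.request.urlopen(credentials_request(token, role_name), timeout=timeout) as resp:
        return json.loads(resp.read())
```

The PUT is the important part. A typical SSRF bug lets an attacker choose the URL of a GET, but not the HTTP method or custom headers, so the token step is out of reach.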

6.6 User Data Scripts: Running Commands at First Boot

Alright, let’s talk about giving your new EC2 instance a to-do list for its first day on the job. Because nobody—not even a virtual machine—wants to show up to a new job with no instructions. That’s what User Data scripts are for. They’re your way of leaning into the server’s console as it boots for the very first time and saying, “Hey, before you do anything else, here’s what I need you to do.”

6.5 Hibernate: Resuming an Instance from Memory

Alright, let’s talk about hibernation. No, not for you after a long day of debugging. For your EC2 instances. This is the feature that lets you pull off the closest thing to magic in the cloud: you stop an instance, and when you start it back up, its processes pick up where they left off. Your in-memory state (all those unsaved documents, that massive dataset you just loaded into RAM, the 47 terminal sessions you were running to prove a point) is written to the root volume and restored on start. Network connections from the outside world will have dropped in the meantime, but everything that lived in RAM comes back. It’s not a reboot; it’s a pause button.

6.4 Stop vs Terminate: Preserving vs Destroying the Instance

Right, let’s talk about pulling the plug. You’ve got an EC2 instance humming along, and you need to shut it down. You’ve got two big red buttons: Stop and Terminate. One is a cryogenic freeze, the other is a thermonuclear option. Pressing the wrong one is the cloud equivalent of accidentally deleting your entire thesis the night before it’s due. We’re not going to let that happen.

The core distinction is brutally simple: Stop preserves the hard drive (the EBS volumes). Terminate destroys the root volume by default, because its DeleteOnTermination flag defaults to true; other attached EBS volumes follow their own flag. Everything else (the CPU, the memory, any instance store disks) is ephemeral and gets reclaimed by AWS in both cases. The root volume is the soul of your instance; everything else is just the temporary, disposable body.
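Before you press Terminate, it is worth checking which volumes would actually survive. The sketch below inspects block device mappings in the shape EC2's DescribeInstances returns them and lists the EBS volumes whose `DeleteOnTermination` flag is false. The function name and the sample volume IDs are hypothetical.

```python
def volumes_surviving_termination(block_device_mappings: list) -> list:
    """Return IDs of EBS volumes that will NOT be deleted when the instance terminates."""
    survivors = []
    for mapping in block_device_mappings:
        ebs = mapping.get("Ebs")
        if ebs is None:
            continue  # not an EBS volume, nothing to preserve
        if not ebs["DeleteOnTermination"]:
            survivors.append(ebs["VolumeId"])
    return survivors


if __name__ == "__main__":
    mappings = [
        # Root volume: deleted on termination by default.
        {"DeviceName": "/dev/xvda",
         "Ebs": {"VolumeId": "vol-0aaa", "DeleteOnTermination": True}},
        # Extra data volume: kept unless you flip the flag.
        {"DeviceName": "/dev/sdf",
         "Ebs": {"VolumeId": "vol-0bbb", "DeleteOnTermination": False}},
    ]
    print(volumes_surviving_termination(mappings))
```

An empty result means Terminate takes every disk with it. A surviving volume keeps costing money until you delete it yourself.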

6.3 Connecting via SSH, EC2 Instance Connect, and Session Manager

Alright, let’s get you into your machine. Because an instance just sitting there in the AWS console, looking pretty, is about as useful as a car without a steering wheel. You need to get inside and make it do things. We have three main ways to do this, each with its own flavor of “why.”

The Old Guard: SSH and Key Pairs

This is the classic, the standard, the thing that will never die. SSH is your direct, encrypted line to the shell of your Linux instance. It’s powerful, it’s ubiquitous, and it’s also the one where you’ll most likely shoot yourself in the foot first.

6.2 Instance States: Pending, Running, Stopping, Stopped, Terminated

Right, let’s talk about what your EC2 instance is actually doing when you’re not looking. It’s not just sitting there; it’s got a whole internal life, a state of being. Knowing these states is the difference between confidently running infrastructure and frantically refreshing the AWS console at 2 AM wondering where all your money went. Think of these states as the instance’s mood. It can be fired up and ready for action (running), taking a well-deserved nap (stopped), or… well, dead (terminated). You need to know these moods because they directly impact two things: your bill and your data.
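The billing half of that can be written down as a small lookup table. This is a sketch of the documented rules for instance usage charges only; the names are made up for illustration. It includes `shutting-down`, the brief state an instance passes through on its way to `terminated`. EBS storage is a separate charge that continues for as long as the volumes exist, including while the instance is stopped.

```python
# Is instance usage billed in each state? (EBS storage is charged separately.)
INSTANCE_USAGE_BILLED = {
    "pending": False,
    "running": True,
    "stopping": False,       # exception: billed while preparing to hibernate
    "stopped": False,
    "shutting-down": False,
    "terminated": False,
}


def is_instance_usage_billed(state: str, hibernating: bool = False) -> bool:
    """Return True if AWS charges for instance usage in the given state."""
    if state not in INSTANCE_USAGE_BILLED:
        raise ValueError(f"unknown instance state: {state!r}")
    if state == "stopping" and hibernating:
        return True
    return INSTANCE_USAGE_BILLED[state]


if __name__ == "__main__":
    for name in INSTANCE_USAGE_BILLED:
        print(f"{name:14} billed={is_instance_usage_billed(name)}")
```

So a stopped instance is not free. It has only stopped charging you for compute.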

6.1 Launching an Instance: AMI, Type, VPC, Security Group, Key Pair

Right, let’s get you an EC2 instance. This isn’t like ordering a pizza where you just click “pepperoni” and hope for the best. You’re about to assemble a virtual server from a list of components, each with serious consequences if you get it wrong. Don’t worry, I’m here to make sure you don’t accidentally launch a publicly-accessible, password-less financial database into the wild. I’ve seen it happen. It’s not pretty.
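Those components map almost one to one onto the parameters of EC2's RunInstances call. The sketch below only assembles the parameter dictionary, so it runs without credentials. The function name and every ID in the usage example are placeholders. Note that there is no VPC parameter: you pick a subnet, and the subnet determines the VPC.

```python
def launch_params(ami_id: str, instance_type: str, subnet_id: str,
                  security_group_ids: list, key_name: str) -> dict:
    """Assemble RunInstances parameters for a single instance."""
    if not security_group_ids:
        raise ValueError("pass at least one security group explicitly")
    return {
        "ImageId": ami_id,                  # the AMI: OS and preinstalled software
        "InstanceType": instance_type,      # CPU, memory, and network capacity
        "SubnetId": subnet_id,              # implies the VPC and Availability Zone
        "SecurityGroupIds": list(security_group_ids),  # the instance's firewall rules
        "KeyName": key_name,                # key pair used for SSH access
        "MinCount": 1,
        "MaxCount": 1,
    }


if __name__ == "__main__":
    params = launch_params(
        ami_id="ami-0123456789abcdef0",     # placeholder
        instance_type="t3.micro",
        subnet_id="subnet-0123456789abcdef0",       # placeholder
        security_group_ids=["sg-0123456789abcdef0"],  # placeholder
        key_name="my-key-pair",             # placeholder
    )
    print(params)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `ec2.run_instances(**params)`. Insisting on an explicit security group is a design choice here, so that nothing launches with firewall rules nobody looked at.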

...