23.1 Security Groups: Stateful Firewall Rules at the ENI Level

Alright, let’s talk about the first line of defense for your EC2 instances: Security Groups. Forget the dry, academic definitions. Think of a Security Group as a bouncer for a single, specific VIP party—your Elastic Network Interface (ENI). This bouncer isn’t just any bouncer; he’s got a photographic memory. He remembers who you came in with, so he’ll let you back out without checking your invite again. This “memory” is what we call statefulness, and it’s the single most important thing to understand.

You apply Security Groups directly to an ENI (which is usually just your EC2 instance, but an instance can have multiple ENIs, each with its own bouncer). The rules you write are allow rules only. There is no concept of a “deny” rule. If a request isn’t explicitly allowed by the bouncer, it gets tossed out on its ear. This is a whitelist, and it’s the only sane way to start building security.

The Anatomy of a Rule

A rule has three core components, plus an optional description, and you need to grok all of them. It specifies a protocol (like TCP, UDP, or ICMP), a port range (or ICMP type/code), and a source (for inbound rules) or destination (for outbound rules). That’s it. The source/destination can be another Security Group, a CIDR block (like 10.0.0.0/16), or a prefix list. Using another Security Group ID as the source is pure genius. It means “allow traffic from any resource that has this other Security Group attached.” This creates dynamic, intent-based rules that don’t break when IPs change. You should use this feature relentlessly.
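To make those pieces concrete, here is a minimal sketch of one rule written in the JSON shape that the CLI’s --ip-permissions option accepts. The helper name ssh_from_group and both group IDs are made up for illustration, and the script only prints the command instead of running it, so treat it as a way to see the structure rather than a finished tool.

```shell
#!/bin/sh
# Hypothetical helper: prints one inbound rule as --ip-permissions JSON.
# $1 = ID of the security group that is allowed to connect (the "source").
ssh_from_group() {
    cat <<EOF
[{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
  "UserIdGroupPairs": [{"GroupId": "$1", "Description": "SSH from management hosts"}]}]
EOF
}

# Placeholder IDs (assumptions); substitute your own.
WEB_SG="sg-0123456789abcdef0"
MGMT_SG="sg-0fedcba9876543210"

# Protocol, port range, and source all live in the JSON document.
RULE_JSON=$(ssh_from_group "$MGMT_SG")

# Printed rather than executed, so nothing changes in your account.
echo "aws ec2 authorize-security-group-ingress --group-id $WEB_SG --ip-permissions '$RULE_JSON'"
```

FromPort and ToPort are the two ends of the port range (equal here because the rule covers a single port), and UserIdGroupPairs is where a source Security Group goes; a CIDR source would go under IpRanges instead.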

Here’s a practical example. Let’s create a Security Group for a web server. It needs to allow HTTP and HTTPS from the public internet, but SSH access should only be allowed from my corporate IP and from any instance in the “management” group.

# Create the web server security group
aws ec2 create-security-group \
    --group-name web-server-sg \
    --description "Security group for web servers" \
    --vpc-id vpc-123abc456def

# Note the GroupId it returns, e.g., sg-0a1b2c3d4e5f6g7h8 (a placeholder; real IDs are "sg-" followed by hexadecimal characters)

Now, let’s add the rules. Notice how for SSH, we have two separate rules: one for a CIDR block and one for a source group.

# Allow HTTP from anywhere (0.0.0.0/0)
aws ec2 authorize-security-group-ingress \
    --group-id sg-0a1b2c3d4e5f6g7h8 \
    --protocol tcp \
    --port 80 \
    --cidr 0.0.0.0/0

# Allow HTTPS from anywhere
aws ec2 authorize-security-group-ingress \
    --group-id sg-0a1b2c3d4e5f6g7h8 \
    --protocol tcp \
    --port 443 \
    --cidr 0.0.0.0/0

# Allow SSH from my specific corporate IP
aws ec2 authorize-security-group-ingress \
    --group-id sg-0a1b2c3d4e5f6g7h8 \
    --protocol tcp \
    --port 22 \
    --cidr 203.0.113.42/32

# Allow SSH from any instance with the 'management-sg' group attached
aws ec2 authorize-security-group-ingress \
    --group-id sg-0a1b2c3d4e5f6g7h8 \
    --protocol tcp \
    --port 22 \
    --source-group sg-9i8j7k6l5m4n3o2p1 # The group ID for 'management-sg'

The Magic of Statefulness

Remember that photographic memory? This is where it pays your rent. For any connection you allow inbound, the return traffic is automatically allowed outbound, regardless of your outbound rules. If someone connects to your web server on port 80, the bouncer makes a note. When the server responds, the bouncer sees it’s part of that existing conversation and waves it through. The same holds in reverse: replies to connections your instance initiates come back in without needing any inbound rule. Don’t confuse this with the wide-open outbound rule (all traffic to 0.0.0.0/0) you’ll see on most groups. That rule really does let the instance start new connections to anywhere; statefulness only covers the return trips. It’s a beautiful, logical system.
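If the bouncer’s memory still feels abstract, a toy model can help. The sketch below is plain shell and has nothing to do with how AWS actually implements connection tracking; the function names, the flow-table format, and the addresses are all invented for illustration. It only shows the order of the checks: known conversation first, outbound rules second.

```shell
#!/bin/sh
# Toy model of a stateful firewall. Illustration only, not AWS's implementation.
INBOUND_ALLOWED_PORTS="80 443"   # inbound rules: TCP ports open to anyone
OUTBOUND_ALLOWED_PORTS=""        # outbound rules: deliberately empty
FLOWS=""                         # remembered conversations, "|remote>localport|"

# A remote client $1 (ip:port) opens a connection to local port $2.
inbound() {
    for p in $INBOUND_ALLOWED_PORTS; do
        if [ "$p" = "$2" ]; then
            FLOWS="$FLOWS|$1>$2|"        # the bouncer makes a note
            echo allow
            return 0
        fi
    done
    echo deny
}

# The instance sends a packet from local port $2 to remote $1 (ip:port).
outbound() {
    case "$FLOWS" in
        *"|$1>$2|"*) echo allow; return 0 ;;   # reply within a tracked conversation
    esac
    remote_port=${1##*:}
    for p in $OUTBOUND_ALLOWED_PORTS; do
        if [ "$p" = "$remote_port" ]; then echo allow; return 0; fi
    done
    echo deny                                  # new connection, no outbound rule
}

inbound  198.51.100.7:54321 80     # prints "allow": matches an inbound rule
outbound 198.51.100.7:54321 80     # prints "allow": reply to the tracked flow
outbound 192.0.2.10:443 40000      # prints "deny": new connection, no outbound rule
```

Even with no outbound rules at all, the reply to the web request gets through, while a brand-new outbound connection does not. A stateless filter would skip the FLOWS lookup and judge every packet on the rules alone.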

Common Pitfalls and “Oh Crap” Moments

1. The Default “Deny All” Inbound: A Security Group you create yourself starts with no inbound rules, so no inbound traffic is allowed. (The VPC’s own default security group is the exception: it allows inbound traffic from other members of the same group.) This is great! The problem is the default outbound rule: it allows all traffic to anywhere. This trips up so many people. You think you’re locked down, but any process on your instance can still call out to the internet. For a truly tight setup, you should lock this down to specific needs (e.g., only allow outbound to specific ports for updates).
2. The Ephemeral Port Nightmare: This one is a classic. Say your web app needs to call an external API. You make an outbound rule allowing TCP port 443. It works. Then someone tightens the subnet’s NACLs (which we’ll get to), and it mysteriously fails. Why? The outbound call uses a high-numbered ephemeral port (e.g., 54321) as its source port, and the API’s reply comes back to that port. Your stateful Security Group handles this on its own; you never need an ephemeral-port rule in a Security Group. NACLs are stateless idiots, though, so they treat the reply as brand-new inbound traffic and drop it unless an inbound NACL rule allows the ephemeral range. Linux typically picks ports from 32768-60999, but other clients and AWS services use more of the range, so 1024-65535 is the usual choice. It feels dirty, but in a NACL it’s necessary. One caveat on the Security Group side: a tracked connection that sits idle long enough is eventually forgotten, so long-lived, quiet connections should use keepalives.
3. Explicitly Allowing the Security Group Itself: You might see an inbound rule whose source is the very Security Group it belongs to. This means “allow instances in this group to talk to each other” on the ports that rule names. It’s a common pattern for clustered services like Cassandra or Redis. Just be aware you’re doing it; don’t let it happen by accident because you copied a rule.
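Two of these pitfalls translate directly into CLI calls: tightening the default outbound rule, and adding a deliberate self-referencing rule. The sketch below assumes a placeholder group ID, picks HTTPS as the only outbound need, and uses Redis’s default port 6379 for the cluster rule. The run wrapper is a hypothetical helper that prints each command unless you set DRY_RUN=0, so you can read the output before touching a real account.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

SG_ID="sg-0123456789abcdef0"   # placeholder ID (assumption); substitute your own

# Default outbound rule: remove "all traffic to anywhere"...
run aws ec2 revoke-security-group-egress --group-id "$SG_ID" \
    --ip-permissions 'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'

# ...and allow only what the instance needs, here HTTPS for updates and API calls.
run aws ec2 authorize-security-group-egress --group-id "$SG_ID" \
    --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,IpRanges=[{CidrIp=0.0.0.0/0}]'

# Self-referencing rule: members of the group may reach each other on 6379 (Redis).
run aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port 6379 --source-group "$SG_ID"
```

Order matters if you run this for real: between the revoke and the authorize, the instance cannot open any new outbound connections, although established ones keep flowing thanks to connection tracking.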

The golden rule? Your Security Groups should be ruthlessly minimal. Start with all inbound traffic denied and all outbound traffic open (but be mindful of point #1 above). Then, add rules like a surgeon—one precise incision at a time. It’s the difference between having a bouncer with a specific guest list and just yelling “COME ON IN!” into the crowded street.