Right, so you’ve got SNS. Think of it as the town crier of AWS, but instead of yelling about the plague, it’s yelling about a new user signup, an order being placed, or a server deciding to have a dramatic and untimely failure. Its entire job is to take a single message and fan it out to a bunch of different places that have all raised their hands and said, “Yes, please, I would like to know about that thing.”

The magic of an SNS Topic is its simplicity. You publish a message to the topic, and SNS takes care of the rest, pushing that message to every single subscriber. No polling, no waiting. It’s a fire-and-forget push model. The subscribers can be a motley crew: SQS queues, Lambda functions, HTTP endpoints, email addresses, even mobile push services. This is the core of event-driven architecture—decoupling the thing that happens from the things that need to react to it.
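
On the publisher's side, that fan-out is a single call. Here is a minimal sketch that assumes `sns` is a boto3 SNS client created elsewhere (for example `boto3.client('sns')`). The helper name is hypothetical. It takes the client as a parameter so you can exercise it without AWS credentials. Notice that nothing in it knows who, if anyone, is subscribed.

```python
import json


def publish_order_shipped(sns, topic_arn: str, order_id: str) -> str:
    """Publish one hypothetical 'order shipped' event and return the SNS MessageId.

    `sns` is assumed to be a boto3 SNS client (or anything with the same
    publish method). The publisher only needs the topic ARN."""
    response = sns.publish(
        TopicArn=topic_arn,
        # Optional: becomes the email subject line and appears in the JSON envelope.
        Subject="OrderShipped",
        # The message body is a string; JSON-encode structured payloads yourself.
        Message=json.dumps({"orderId": order_id, "status": "shipped"}),
    )
    return response["MessageId"]
```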

The Nuts and Bolts of a Message

Before you send anything, you need to know what you’re sending. When you publish to a topic, you provide a message body (a string, usually JSON) and, optionally, a subject. But here’s the kicker: what your subscribers actually receive depends on their protocol and on how each subscription is configured, not just on what you sent. By default, SNS wraps your message in a standard JSON envelope. This is crucial to understand.

If you send this:

{
  "Subject": "OrderShipped",
  "Message": "{\"orderId\": \"12345\", \"status\": \"shipped\"}"
}

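An SQS queue subscriber receives the whole SNS envelope, serialized as JSON, in its message body. A Lambda function receives the same envelope nested inside its event object, under event["Records"][0]["Sns"]. This is the default behavior. It’s consistent, but it means your consumer has to unwrap the envelope and then parse the Message string to get to the good stuff you actually sent. For SQS and HTTP/S subscriptions, you can turn on the RawMessageDelivery subscription attribute to skip the envelope and receive your message as-is.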

// What an SQS queue actually receives for the message above
{
  "Type": "Notification",
  "MessageId": "abc123...",
  "TopicArn": "arn:aws:sns:us-east-1:123456789012:MyTopic",
  "Subject": "OrderShipped",
  "Message": "{\"orderId\": \"12345\", \"status\": \"shipped\"}",
  "Timestamp": "2023-10-05T18:12:34.123Z",
  ...
}
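
Unwrapping the envelope takes two layers of parsing: first the envelope, then the Message string inside it. Here is a minimal sketch for both subscriber shapes. It assumes RawMessageDelivery is off (the default) and that your published message is JSON. The function names are hypothetical. If you enable RawMessageDelivery on an SQS subscription, the body is your message as-is and the first json.loads layer goes away.

```python
import json


def unwrap_sqs_body(body: str) -> dict:
    """Return the publisher's payload from an SQS message body written by SNS.

    Assumes RawMessageDelivery is disabled and the published Message is JSON."""
    envelope = json.loads(body)              # layer 1: the SNS envelope
    return json.loads(envelope["Message"])   # layer 2: what you actually published


def unwrap_lambda_event(event: dict) -> list:
    """Return the publisher's payloads from a Lambda event delivered directly by SNS.

    Lambda hands you the envelope already parsed, under Records[i]["Sns"],
    so only the inner Message string still needs decoding."""
    return [json.loads(record["Sns"]["Message"]) for record in event["Records"]]
```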

Why You’d Use Each Subscriber Type

Each subscriber type has a superpower, and you choose them based on what you need.

  • SQS Queues are for durability and buffered processing. If you have a service that can’t keep up with the event rate, or if it’s absolutely critical that no message is ever lost (e.g., processing orders), subscribe an SQS queue to the topic. The queue acts as a buffer, and your worker services can poll it at their own pace. This is the most robust pattern for decoupling services.
  • Lambda Functions are for immediate, serverless action. Need to update a database, invalidate a cache, or start a Step Functions workflow when an event occurs? Lambda is your go-to. It’s simple and you only pay when it runs. The downside? SNS invokes Lambda asynchronously, so if your function throws, Lambda retries it twice by default and then discards the event unless you’ve configured an on-failure destination or DLQ. It’s not a durable queue. For mission-critical stuff, subscribe an SQS queue to the topic and let that queue trigger the Lambda.
  • HTTP/S Endpoints are for integrating with external systems. Your SaaS vendor probably has a webhook URL you can plug in here. First, the endpoint has to confirm the subscription: SNS sends it a SubscriptionConfirmation message containing a SubscribeURL that must be visited. After that, whenever your event happens, SNS will POST the JSON envelope directly to the endpoint. The catch? Your endpoint must return a 200 OK response within a timeout window, or SNS will retry. It’s a great way to avoid building and managing a custom integration layer.
  • Email / SMS are basically for alerts. Got a CI/CD pipeline that finishes? Sending a “build failed” message to an email topic is a classic use case. It’s not for high-volume, programmatic stuff. It’s for humans. And yes, the “SMS” part will make your CFO wince when they see the bill if you’re not careful. AWS pricing for SMS is… ambitious.
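
The SNS → SQS → Lambda pattern from the list above has one wrinkle: every SQS record body is an SNS envelope. Below is a sketch of a handler for that setup. `process_order` is a hypothetical stand-in for your own logic. The return value is an SQS partial batch response, which lets only the failed messages go back to the queue. Lambda honors it only if ReportBatchItemFailures is enabled on the event source mapping, which is assumed here. Without it, the returned dict is ignored.

```python
import json


def process_order(order: dict) -> None:
    """Hypothetical business logic; replace with your own."""
    if "orderId" not in order:
        raise ValueError("missing orderId")


def handler(event, context):
    """Lambda handler for an SQS queue that is subscribed to an SNS topic.

    Assumes RawMessageDelivery is off, so each record body is an SNS envelope,
    and that ReportBatchItemFailures is enabled on the event source mapping."""
    failures = []
    for record in event["Records"]:
        try:
            envelope = json.loads(record["body"])     # the SNS envelope
            order = json.loads(envelope["Message"])   # your published payload
            process_order(order)
        except Exception:
            # Mark only this message as failed. SQS makes it visible again for a
            # retry, and if the queue has a redrive policy it eventually lands in the DLQ.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```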

The Critical Detail: Access Policies

This is where everyone gets tripped up. I don’t mean to be dramatic, but IAM and resource-based policies are the blood-soaked battlefield where most SNS integrations go to die. Permissions must be correct on both ends.

For an SQS queue to subscribe to an SNS topic, the topic needs permission to send messages to the queue. That permission lives on the queue itself, as its access policy. It is not an attribute of the subscription. The SQS console’s subscribe-to-topic action adds it for you, but if you’re doing it in code, you set it yourself. It looks like this. Notice the Policy attribute on the queue: this is the magic that grants SNS the rights it needs.

import json

import boto3

sns = boto3.client('sns')
sqs = boto3.client('sqs')

TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:MyTopic'

# Create a queue and look up its ARN
queue_url = sqs.create_queue(QueueName='MySubscriberQueue')['QueueUrl']
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url,
    AttributeNames=['QueueArn']
)['Attributes']['QueueArn']

# Grant the SNS topic permission to send to the queue.
# This sets the queue's resource policy (and replaces any existing one).
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={
        'Policy': json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only this topic may send, not every SNS topic in existence
                "Condition": {
                    "ArnEquals": {"aws:SourceArn": TOPIC_ARN}
                }
            }]
        })
    }
)

# Now subscribe the queue to the topic
subscription = sns.subscribe(
    TopicArn=TOPIC_ARN,
    Protocol='sqs',
    Endpoint=queue_arn
)

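If you forget this policy, SNS will happily accept the subscription, and it will even show as confirmed, because a queue in the same account is subscribed without a confirmation step. It just fails to deliver any messages, silently. The queue stays empty, nothing errors on the publishing side, and you’ll waste an hour of your life you’ll never get back. The tell is the topic’s NumberOfNotificationsFailed metric in CloudWatch climbing while the queue sits idle. (A subscription stuck in PendingConfirmation is a different problem: it usually means an HTTP/S or email endpoint never confirmed.) Learn from my pain.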
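
When a queue stays empty, a quick way to check whether SNS is trying and failing to deliver is to sum the NumberOfNotificationsFailed metric in the AWS/SNS namespace. Here is a sketch that assumes `cloudwatch` is a boto3 CloudWatch client (e.g. `boto3.client('cloudwatch')`). The helper name and the 5-minute period are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def failed_deliveries(cloudwatch, topic_name: str, hours: int = 1) -> float:
    """Sum AWS/SNS NumberOfNotificationsFailed for one topic over the last `hours` hours.

    `cloudwatch` is assumed to be a boto3 CloudWatch client (or anything with the
    same get_metric_statistics method). Returns 0 if there are no datapoints."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/SNS",
        MetricName="NumberOfNotificationsFailed",
        Dimensions=[{"Name": "TopicName", "Value": topic_name}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,            # 5-minute buckets
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response["Datapoints"])
```

A non-zero result alongside an empty queue points straight at the queue policy.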

Best Practices and Pitfalls

  1. Use Message Filtering: Don’t make every subscriber wade through messages it doesn’t care about. If you have a topic for user_events, attach a filter policy to each subscription so that user_created events go to one SQS queue and user_deleted events go to a Lambda function. By default, a filter policy matches on message attributes, so your publishers need to set them. It’s more efficient and cheaper.
  2. Standardize Your Message Format: Use JSON for your inner message and be consistent. Your future self will thank you when you’re writing code to parse these things at 2 AM.
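  3. Dead-Letter Queues (DLQs) Are Non-Negotiable: Things fail at two levels, and you want a DLQ at both. First, SNS can fail to deliver at all: the queue policy is wrong, or the external HTTP endpoint is offline. Give each subscription a redrive policy pointing at an SQS DLQ so undeliverable messages land there instead of being dropped. Second, the subscriber can receive a message and fail to process it. Give SQS queues their own redrive policy, and give Lambda functions an on-failure destination (or DLQ) for asynchronous invocations. Things will fail. Protocols will change. A DLQ captures these failures so you can debug them instead of just… losing them.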
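  4. Beware the HTTP Retry Storm: If you subscribe an HTTP endpoint that starts returning 500 errors, SNS will keep retrying according to the subscription’s delivery policy, where the retry count, delays, and backoff function are all configurable. If that endpoint is your own overburdened service, those retries pile onto its existing load and can create a retry storm that takes the whole thing down. Cap the delivery rate with the delivery policy’s throttlePolicy, pick a backoff that gives the service room to recover, and protect the endpoint itself with load shedding or a circuit breaker in front of the expensive work.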
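
Filtering, dead-lettering, and retry tuning all map onto subscription attributes. Here is a sketch that assumes `sns` is a boto3 SNS client. The helper names, the `event_type` message attribute, and every retry number are hypothetical and illustrative. SNS validates delivery-policy values, so check its documented limits before copying them. The RedrivePolicy here is the SNS-level DLQ for messages SNS cannot deliver. That DLQ needs its own queue policy letting SNS send to it, just like a subscribed queue.

```python
import json

# Illustrative retry/throttle settings for an HTTP/S subscription (values are assumptions).
HTTP_DELIVERY_POLICY = {
    "healthyRetryPolicy": {
        "minDelayTarget": 5,        # seconds
        "maxDelayTarget": 60,       # seconds
        "numRetries": 10,
        "numNoDelayRetries": 0,
        "numMinDelayRetries": 1,
        "numMaxDelayRetries": 2,
        "backoffFunction": "exponential",
    },
    # Cap how many deliveries per second SNS sends to the endpoint.
    "throttlePolicy": {"maxReceivesPerSecond": 10},
}


def subscribe_filtered(sns, topic_arn, protocol, endpoint, event_types, dlq_arn):
    """Subscribe an endpoint that only receives the given event types,
    with an SQS dead-letter queue for messages SNS cannot deliver."""
    attributes = {
        # Matches on the hypothetical "event_type" message attribute.
        "FilterPolicy": json.dumps({"event_type": list(event_types)}),
        "RedrivePolicy": json.dumps({"deadLetterTargetArn": dlq_arn}),
    }
    if protocol in ("http", "https"):
        # Delivery policies only apply to HTTP/S subscriptions.
        attributes["DeliveryPolicy"] = json.dumps(HTTP_DELIVERY_POLICY)
    response = sns.subscribe(
        TopicArn=topic_arn,
        Protocol=protocol,
        Endpoint=endpoint,
        Attributes=attributes,
        # Return the real ARN even while an HTTP/S subscription awaits confirmation.
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]


def publish_user_event(sns, topic_arn, event_type, payload):
    """Publish with the message attribute that the filter policies match on."""
    response = sns.publish(
        TopicArn=topic_arn,
        Message=json.dumps(payload),
        MessageAttributes={
            "event_type": {"DataType": "String", "StringValue": event_type},
        },
    )
    return response["MessageId"]
```

With these helpers, the user_created queue and the user_deleted Lambda each get their own subscribe_filtered call, and publishers never need to know which subscriber cares about what.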