31.7 Scaling Kinesis: Shard Splitting, Merging, and On-Demand Mode

Alright, let’s talk about making your Kinesis stream actually keep up with the real world. You built this thing to handle a firehose of data, but what happens when the firehose suddenly becomes a fire-nado? Or, more embarrassingly, when it turns into a gentle trickle and you’re paying for a firehose? That’s where scaling comes in, and Kinesis gives you two main levers to pull: the manual, surgical control of shard operations (splitting and merging) and the glorious, set-it-and-forget-it (but not really) chaos of On-Demand mode. Let’s get into it.

The Anatomy of a Shard (A Quick Refresher)

Before you start chopping and merging shards, you need to remember what you’re actually messing with. A shard isn’t just an abstract unit of scale; it’s a specific, locked-in throughput contract with AWS.

Each shard gives you:

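1 MB/second of data input, up to 1,000 records/second.
2 MB/second of data output, shared across all standard consumers (consumers using enhanced fan-out each get their own dedicated 2 MB/second per shard).
Up to 5 read (GetRecords) calls/second, also shared across standard consumers.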
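Those limits translate directly into a shard count. Below is a minimal sketch that applies all three limits at once. It assumes partition keys spread traffic evenly across shards and that every consumer is a standard (shared-throughput) consumer. The helper name `shards_needed` is hypothetical, and "MB" is treated as MiB (1,048,576 bytes).

```python
import math

# Provisioned-mode per-shard limits from the list above.
# Assumption: "MB" is treated as MiB (1,048,576 bytes) here.
WRITE_BYTES_PER_SHARD = 1024 * 1024
WRITE_RECORDS_PER_SHARD = 1000
READ_BYTES_PER_SHARD = 2 * 1024 * 1024

def shards_needed(write_bytes_per_sec, write_records_per_sec, shared_consumers=1):
    """Smallest shard count whose combined limits cover the workload.

    Assumes partition keys spread traffic evenly across shards and that
    every consumer is a standard (shared-throughput) consumer reading
    every record once. The 5 reads/second call limit is not modeled.
    """
    by_write_bytes = math.ceil(write_bytes_per_sec / WRITE_BYTES_PER_SHARD)
    by_write_records = math.ceil(write_records_per_sec / WRITE_RECORDS_PER_SHARD)
    # Shared consumers split the 2 MB/s read budget, so read demand
    # grows with the number of consumers.
    by_read_bytes = math.ceil(write_bytes_per_sec * shared_consumers / READ_BYTES_PER_SHARD)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)

# 5,000 small records/s (200 bytes each): the record limit, not bytes, drives the count.
print(shards_needed(5000 * 200, 5000))  # → 5
```

Note that uneven partition keys ("hot" keys) can throttle a single shard even when this total looks comfortable.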

When you split a shard, you divide its slice of the hash key space between two new child shards. Each child gets the full per-shard limits above, so the capacity for that slice of keys doubles. When you merge, two adjacent shards become one shard with a single set of limits. It’s not magic; it’s just renegotiating the terms of your data deal.

Splitting a Shard: When You Need More Power

You split a shard when your write traffic is threatening to breach that 1 MB/s (or 1,000 records/s) limit, causing ProvisionedThroughputExceededException errors that will haunt your dreams. You can’t just make a shard “bigger.” Instead, you break one parent shard into two new child shards.

The critical thing to understand is where the cut happens. Kinesis routes each record by running its partition key through MD5 to get a 128-bit hash key. Each shard owns a contiguous range of those keys, and together the open shards cover the whole space from 0 to 2^128-1. That is why you must provide a NewStartingHashKey when you split. The parent’s range is cut at that value: lower keys go to one child, and that key and everything above it go to the other. To split evenly, use the midpoint of the parent’s range.
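To make the keyspace arithmetic concrete, here is a small sketch that computes the two child ranges for a given NewStartingHashKey and shows which child a partition key would land in. `hash_key_for` assumes the documented MD5 mapping, with the 16-byte digest read as a big-endian unsigned integer. The helper names are hypothetical, and the check that both children are non-empty is this sketch's own sanity rule.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # the stream's full keyspace is 0 .. 2^128 - 1

def hash_key_for(partition_key: str) -> int:
    # Assumption: MD5 of the UTF-8 partition key, read as a big-endian
    # unsigned 128-bit integer.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def child_ranges(start: int, end: int, new_starting_hash_key: int):
    """Inclusive hash key ranges of the two children produced by a split."""
    # This sketch insists both children are non-empty.
    if not start < new_starting_hash_key <= end:
        raise ValueError("NewStartingHashKey must be > StartingHashKey and <= EndingHashKey")
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)

# Even split of a shard that covers the whole keyspace.
parent_start, parent_end = 0, MAX_HASH_KEY
split_point = (parent_start + parent_end) // 2
lower, upper = child_ranges(parent_start, parent_end, split_point)

key = hash_key_for("customer-1234")
side = "lower" if key <= lower[1] else "upper"
print(f"customer-1234 -> {side} child")
```

Every key in the parent lands in exactly one child, because the lower range ends one below where the upper range begins.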

Here’s how you’d do it in Python (boto3):

import boto3

client = boto3.client('kinesis')

def list_all_shards(stream_name):
    # ListShards is paginated; when passing NextToken you must omit StreamName
    shards, kwargs = [], {'StreamName': stream_name}
    while True:
        page = client.list_shards(**kwargs)
        shards.extend(page['Shards'])
        if 'NextToken' not in page:
            return shards
        kwargs = {'NextToken': page['NextToken']}

def split_shard_evenly(stream_name, shard_to_split_id):
    # First, find the shard to get its current hash key range
    shard = next(s for s in list_all_shards(stream_name) if s['ShardId'] == shard_to_split_id)

    starting_hash = int(shard['HashKeyRange']['StartingHashKey'])
    ending_hash = int(shard['HashKeyRange']['EndingHashKey'])

    # Calculate the midpoint for the split
    split_point = (starting_hash + ending_hash) // 2

    # Perform the split. The new child shards will cover (inclusive):
    # Child 1: [StartingHashKey, SplitPoint - 1]
    # Child 2: [SplitPoint, EndingHashKey]
    client.split_shard(
        StreamName=stream_name,
        ShardToSplit=shard_to_split_id,
        NewStartingHashKey=str(split_point)
    )
    # The split is asynchronous: wait until the stream is ACTIVE again
    client.get_waiter('stream_exists').wait(StreamName=stream_name)
    print(f"Split shard {shard_to_split_id} at hash key {split_point}")

# Example call
split_shard_evenly('my-data-stream', 'shardId-000000000000')

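The Gotcha: Splitting is not instantaneous. The stream sits in UPDATING status while the split happens, and you can’t start another split or merge until it is ACTIVE again. The parent shard then stops accepting writes and becomes CLOSED. Its existing records stay readable until the retention period expires, after which the shard ages out. Your consumers need to read the parent to the end, then detect the new child shards and start reading from them. Finishing the parent first is what preserves per-partition-key ordering. If your consumer uses the Kinesis Client Library (KCL), it handles this automagically. If you’re rolling your own, godspeed: you’re now responsible for the shard iteration logic.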
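For the roll-your-own case, the core rule is “parents before children.” ListShards reports each shard’s ParentShardId and, for merged shards, AdjacentParentShardId. The sketch below uses those fields to decide which shards a hand-written consumer may start on. It assumes you track a set of “finished” shards, meaning closed shards where GetRecords stopped returning a NextShardIterator. The function name is hypothetical, and this is not the KCL’s actual implementation.

```python
def ready_to_read(shards, finished_ids):
    """Return IDs of shards a hand-rolled consumer may start reading now.

    `shards` are dicts shaped like ListShards results; `finished_ids` holds
    shards already read to the end. A shard is ready once each of its
    parents is finished, or has dropped out of the listing entirely
    because its retention period expired.
    """
    listed = {s["ShardId"] for s in shards}
    ready = []
    for shard in shards:
        if shard["ShardId"] in finished_ids:
            continue
        parents = [shard.get("ParentShardId"), shard.get("AdjacentParentShardId")]
        if all(p is None or p in finished_ids or p not in listed for p in parents):
            ready.append(shard["ShardId"])
    return ready
```

After a split, only the parent is ready at first. Once it is drained, both children become ready together.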

Merging Shards: For When You Overprovisioned

Merging is the opposite. You take two adjacent shards and combine them into one. The most common reason? Cost optimization. You provisioned 10 shards for Black Friday, but now it’s January and you’re mostly processing tumbleweeds. Merging lets you reduce capacity and stop burning money.

The shards must be adjacent, meaning their hash key ranges touch with no gap: the EndingHashKey of one, plus 1, equals the StartingHashKey of the other. Both must also be open shards. Once you have a valid pair, you just specify the two shard IDs.

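def merge_shards(stream_name, shard_to_merge_id, adjacent_shard_id):
    # Both shards must be open and have contiguous hash key ranges,
    # otherwise Kinesis rejects the request
    client.merge_shards(
        StreamName=stream_name,
        ShardToMerge=shard_to_merge_id,
        AdjacentShardToMerge=adjacent_shard_id
    )
    # Like a split, a merge is asynchronous: wait until the stream is ACTIVE again
    client.get_waiter('stream_exists').wait(StreamName=stream_name)
    print(f"Merged {shard_to_merge_id} and {adjacent_shard_id}")

# Example call - you need to know which shards are adjacent (IDs alone don't tell you)
merge_shards('my-data-stream', 'shardId-000000000001', 'shardId-000000000002')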
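The example call assumes you already know which shards are adjacent. One way to find candidates is to sort the open shards by StartingHashKey and pair neighbors whose ranges touch. The sketch below does that on dicts shaped like ListShards results. It treats a shard as open when its SequenceNumberRange has no EndingSequenceNumber. The function name is hypothetical.

```python
def adjacent_open_pairs(shards):
    """Pairs of open shard IDs whose hash key ranges are contiguous."""
    # Closed shards carry an EndingSequenceNumber; only open shards can merge.
    open_shards = [s for s in shards if "EndingSequenceNumber" not in s["SequenceNumberRange"]]
    open_shards.sort(key=lambda s: int(s["HashKeyRange"]["StartingHashKey"]))

    pairs = []
    for left, right in zip(open_shards, open_shards[1:]):
        left_end = int(left["HashKeyRange"]["EndingHashKey"])
        right_start = int(right["HashKeyRange"]["StartingHashKey"])
        if left_end + 1 == right_start:
            pairs.append((left["ShardId"], right["ShardId"]))
    return pairs
```

Adjacency is only half the decision. Also confirm the pair’s combined traffic fits within a single shard’s limits, or the merge will just move your throttling around.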

The Gotcha: The same consumer logic applies. The two parent shards are closed, a new merged shard is created, and your consumer needs to finish reading both parents before it picks up the child. Also, you can’t split or merge shards in an On-Demand stream. Which brings us to…

On-Demand Mode: The “I Refuse to Think About This” Option

Introduced because someone at AWS finally admitted that predicting data traffic is a dark art, On-Demand mode is fantastic and infuriating. You tell Kinesis, “Here’s a stream name, you figure out the shards.” It manages the shards for you and automatically accommodates up to double the peak write throughput the stream has seen over the previous 30 days.

The biggest benefit is obvious: no more frantic manual scaling during traffic spikes or paying for idle shards during lulls. The biggest downside is less obvious: you lose all fine-grained control. You can’t split or merge. You just have to trust the system.
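Because On-Demand streams refuse split and merge calls, it can help to check a stream’s mode before running scaling scripts like the ones above. The sketch below reads StreamModeDetails from DescribeStreamSummary. It takes the client as a parameter so you can pass `boto3.client('kinesis')`. The function name is hypothetical, and treating a missing mode field as PROVISIONED is an assumption.

```python
def require_provisioned(client, stream_name):
    """Return the stream's capacity mode, raising if it is ON_DEMAND,
    where SplitShard and MergeShards are not available."""
    summary = client.describe_stream_summary(StreamName=stream_name)["StreamDescriptionSummary"]
    # Assumption: a missing StreamModeDetails field means PROVISIONED.
    mode = summary.get("StreamModeDetails", {}).get("StreamMode", "PROVISIONED")
    if mode == "ON_DEMAND":
        raise RuntimeError(f"{stream_name} is ON_DEMAND; Kinesis manages its shards")
    return mode
```

Usage: call `require_provisioned(client, 'my-data-stream')` before `split_shard_evenly(...)` or `merge_shards(...)`.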

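Enabling it is stupidly simple. You can pick the mode when you create a stream, or switch an existing stream between modes later. AWS allows only two mode switches per stream in a 24-hour period, though, so it’s not something to toggle on a whim.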

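# Creating a new On-Demand stream (no ShardCount: Kinesis manages the shards)
client.create_stream(
    StreamName='my-on-demand-stream',
    StreamModeDetails={
        'StreamMode': 'ON_DEMAND'
    }
)

# Converting an existing PROVISIONED stream to ON_DEMAND.
# UpdateStreamMode identifies the stream by ARN, not by name, so look it up first.
summary = client.describe_stream_summary(StreamName='my-old-provisioned-stream')
stream_arn = summary['StreamDescriptionSummary']['StreamARN']
client.update_stream_mode(
    StreamARN=stream_arn,
    StreamModeDetails={
        'StreamMode': 'ON_DEMAND'
    }
)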

The Brutal Honesty: On-Demand isn’t a free lunch. For steady, predictable workloads it is generally more expensive than a well-sized provisioned stream. Scaling isn’t instantaneous either. Capacity is based on your recent peak, and AWS notes that if write traffic grows to more than double the previous peak within about 15 minutes, you may be throttled. So if your traffic goes from zero to a million in a nanosecond, you’ll still see throttling errors. It’s a trade-off: operational simplicity in exchange for potentially higher cost and a little less predictability.

So, Which One Should You Use?

Here’s the heuristic, straight from the trenches:

Use PROVISIONED mode if your traffic is predictable (steady, or following known daily/weekly patterns) and you want the lowest cost. Be prepared to automate your scaling based on CloudWatch metrics (WriteProvisionedThroughputExceeded is your canary in the coal mine).
Use ON_DEMAND mode if your traffic is spiky, unpredictable, or you’re just building a prototype and don’t want another operational puzzle to solve. The premium you pay is often worth the saved engineering time and mental overhead.
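For the provisioned route, automation starts with reading that canary metric. The sketch below sums the stream-level WriteProvisionedThroughputExceeded metric over a recent window using CloudWatch’s `get_metric_statistics`. It pairs that with a deliberately naive scaling policy. The function names, the 15-minute window, the doubling rule, and the `max_shards` cap are all hypothetical choices for illustration, not AWS recommendations.

```python
from datetime import datetime, timedelta, timezone

def throttled_write_count(cloudwatch, stream_name, minutes=15):
    """Records rejected for exceeding provisioned write throughput over
    the last `minutes` minutes, from the stream-level CloudWatch metric."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName="WriteProvisionedThroughputExceeded",
        Dimensions=[{"Name": "StreamName", "Value": stream_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Sum"],
    )
    # Minutes with no data simply have no datapoint.
    return sum(point["Sum"] for point in resp["Datapoints"])

def next_shard_count(current_shards, throttled, max_shards=64):
    """Hypothetical policy: double capacity (split every open shard once)
    on any throttling, capped at max_shards; otherwise change nothing."""
    if throttled > 0:
        return min(current_shards * 2, max_shards)
    return current_shards
```

Usage: `next_shard_count(4, throttled_write_count(boto3.client('cloudwatch'), 'my-data-stream'))`. A real policy would also scale down during quiet periods and add some hysteresis so it doesn’t flap.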

The key takeaway is that neither is “better.” They’re tools for different jobs. Choose the one that best matches your willingness to manage infrastructure versus your willingness to pay for automation. Now go forth and stream. Responsibly.