23.5 Transaction Overhead and Batch Commit Patterns

Alright, let’s talk about the cost of doing business. Transactions are fantastic, but they’re not free. Every BEGIN you utter is a handshake with the database that says, “Hey, we’re about to get serious.” That handshake, and the subsequent commitment ceremony, comes with a price tag. It’s called overhead, and if you ignore it, you’ll be left wondering why your snappy application suddenly molasseses when it has to process ten thousand of anything.

The overhead isn’t in the CPU cycles to write your data—that’s the main event. The overhead is in all the other work the database has to do to make that transaction safe and durable. We’re talking about writing to the Write-Ahead Log (WAL), managing locks, and ensuring isolation. Every COMMIT is a moment where the database must forcefully flush that WAL to disk. This is a synchronous, “stop everything and make sure this is physically written” operation. It’s the bottleneck. Doing this after every single INSERT is like delivering one potato chip at a time from the factory to the customer. The truck driver will spend 99% of their time on the road, not moving potatoes.

The Batch Commit Pattern: Your Performance Savior

The solution is batching. Instead of committing after every operation, you queue up a whole bunch of operations inside a single transaction and then commit them all at once. This amortizes the fixed cost of the transaction overhead across many operations. The driver makes one trip with a truck full of chips. Suddenly, the overhead per chip is tiny.

Here’s the wrong way, the “one-chip delivery” method:

# This is how you bring a database to its knees. Don't do this.
for item in item_list:
    cursor.execute("INSERT INTO table (col) VALUES (%s)", (item,))
    connection.commit()  # Flush after every single INSERT. Madness.

And here’s the right way, the “rent a semi-truck” method:

# This is the way.
try:
    connection.begin()  # Or set autocommit=False, same thing.
    for item in item_list:
        cursor.execute("INSERT INTO table (col) VALUES (%s)", (item,))
    connection.commit()  # One big, beautiful flush at the end.
except Exception:
    connection.rollback()
    raise

The performance difference isn’t linear; it’s monumental. We’re talking orders of magnitude faster for large batches.

Picking the Right Batch Size

So, how many operations should you batch? The answer is the classic consultant’s favorite: “it depends.” Too small, and you’re still paying too much overhead. Too large, and you run other risks:

Long-held locks: You’re holding exclusive locks on modified rows for the entire transaction duration, blocking other queries.
WAL growth: The transaction’s WAL data must be kept until commit. A huge transaction can consume significant memory and disk space.
Atomicity gone wrong: If something fails in the middle of a 100,000-row batch, the entire ROLLBACK will be a long, painful operation.

There’s no magic number, but a good starting point is somewhere between 100 and 10,000 rows. You need to test with your specific hardware, schema, and data. The goal is to find a size where the commit overhead is a small fraction of the total batch time, without causing unacceptable lock contention or resource usage.

# A practical example of batching in chunks
batch_size = 1000
for i in range(0, len(huge_list), batch_size):
    chunk = huge_list[i:i+batch_size]
    try:
        connection.begin()
        for item in chunk:
            cursor.execute("INSERT INTO table (col) VALUES (%s)", (item,))
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    # A tiny pause here can sometimes help alleviate lock contention
    # on a very busy system, letting other queries sneak in.
    # time.sleep(0.01)

The Pitfalls: What Can Go Wrong?

This pattern seems simple, but the devil is in the details.

Interleaved Reads: Remember, within your transaction, you’re operating from a consistent snapshot. If you INSERT 1000 rows, then SELECT COUNT(*) from the same table inside the same transaction, you’ll see the new rows. But any other database connection will not see them until you commit. This is usually correct, but it can trick you if you’re not thinking about isolation levels.
Idle-in-Transaction: This is a killer. If your application code acquires a connection, begins a transaction, and then… waits. Maybe it’s waiting for user input, or got stuck in some other logic. That connection is now holding locks and preventing VACUUM from cleaning up dead rows. Always, always use timeouts (statement_timeout, idle_in_transaction_session_timeout) and ensure your code has robust error handling that calls ROLLBACK if anything goes wrong.
Not All Operations Are Equal: Batching INSERTs is a no-brainer. But what about SELECTs? Wrapping a bunch of read operations in a transaction is excellent for consistency, but there’s no commit flush overhead to avoid. The main benefit there is maintaining a stable view of the data throughout your read operation.