24.5 Anomalies: Dirty Read, Non-Repeatable Read, Phantom Read, Write Skew
Alright, let’s get into the weeds. You’ve set up your transaction, you’re feeling good, but the database is a shared, chaotic space. Without proper rules, it’s like trying to have a serious conversation in the middle of a mosh pit. Isolation levels are those rules. They’re the database’s way of letting you choose how much of that mosh pit chaos you’re willing to tolerate in exchange for performance. Get it wrong, and you’ll encounter some truly bizarre behavior, formally known as concurrency anomalies. Let’s meet the usual suspects.
The Dirty Read
This one is the most straightforward, and frankly, a bit of a clown show. A Dirty Read occurs when a transaction reads data written by another transaction that hasn’t committed yet.
Why is this a problem? Because that other transaction might decide to roll back. You just read data that not only isn’t official yet, but that may never become official. You’ve made a decision based on a lie.
Think of it like seeing a “50% OFF” sign being put up in a store window, rushing in to buy the thing, and then the manager comes out and says, “Oh, sorry, Frank wasn’t supposed to put that sign up yet,” and charges you full price. You acted on information that was never truly valid.
This is why every sane database’s READ UNCOMMITTED level is basically a “here be dragons” flag. You might use it for an approximate analytical count where absolute accuracy isn’t critical, but you’d never use it for anything involving business logic.
-- Transaction 1 (The Uncommitter)
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance + 100.00 WHERE user_id = 1;
-- Balance for user 1 is now 150.00... but wait, we haven't committed!
-- Transaction 2 (The Reader, using READ UNCOMMITTED)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE user_id = 1; -- Reads 150.00
-- Transaction 2 sees the uncommitted data! It might now do something based on this.
-- Transaction 1 (The Uncommitter has a change of heart)
ROLLBACK; -- The update never happened. The true balance is still 50.00.
-- Transaction 2 is now operating on a complete fiction.
The Non-Repeatable Read
A step up from the chaos of dirty reads. A Non-Repeatable Read happens when you read the same row twice within a transaction and get different values because another transaction committed an update in between.
Notice the key difference from a dirty read: this data has been committed. It’s real, official data. The anomaly is that your transaction’s view of the world is not consistent. You can’t “repeat” your read and get the same result.
This breaks the illusion of isolation. If your business logic depends on a value not changing for the duration of your transaction (e.g., checking a balance before performing a withdrawal), this can cause serious problems.
-- Transaction 1
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- This allows Non-Repeatable Reads!
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE user_id = 1; -- Returns 50.00
-- Transaction 2
BEGIN TRANSACTION;
UPDATE accounts SET balance = 0.00 WHERE user_id = 1;
COMMIT; -- The balance is now officially 0.00
-- Back to Transaction 1
SELECT balance FROM accounts WHERE user_id = 1; -- Now returns 0.00!
-- The same query, same transaction, different result. The world changed underneath us.
COMMIT;
The Phantom Read
Phantom reads are the evil cousin of non-repeatable reads. Instead of the values in existing rows changing, Phantom Reads occur when new rows are inserted or deleted by another transaction, causing a query to return a different set of rows.
You run a WHERE clause, get a set of rows, then later you run the exact same WHERE clause and suddenly new rows (phantoms) appear or existing ones vanish. This is particularly devastating for operations that rely on a consistent set of rows, like reporting, counting, or applying a business rule to all matching records.
-- Transaction 1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; -- This level prevents Non-Repeatable Reads but may allow Phantoms in some DBs!
BEGIN TRANSACTION;
SELECT * FROM accounts WHERE balance > 100.00; -- Returns 0 rows.
-- Transaction 2
BEGIN TRANSACTION;
INSERT INTO accounts (user_id, balance) VALUES (3, 500.00);
COMMIT; -- The new rich user is now committed.
-- Back to Transaction 1
SELECT * FROM accounts WHERE balance > 100.00; -- Now returns 1 row (user_id 3)!
-- The phantom has appeared! Our result set is no longer consistent.
COMMIT;
The Sneaky One: Write Skew
This is the most subtle and insidious anomaly, and it explains why the strongest isolation level (SERIALIZABLE) exists. Write Skew occurs when two transactions concurrently read the same set of data, make independent decisions based on what they read, and then update different parts of that data, leading to a state that would never have been allowed if the transactions had run one after the other.
It’s a violation of a business rule constraint that isn’t enforced by a simple single-row check. The classic example is a system with two doctors who can’t both be off-call at the same time.
-- The rule: At least one doctor must be on_call at all times.
-- Assume currently both doctors 1 and 2 are on_call.
-- Transaction 1 (for Doctor 1)
BEGIN TRANSACTION;
SELECT on_call FROM doctors WHERE id = 2; -- Sees that Doctor 2 is on_call. Thinks, "Great, I can go off-call."
-- ...sometime later...
UPDATE doctors SET on_call = FALSE WHERE id = 1; -- This seems safe!
COMMIT;
-- Transaction 2 (for Doctor 2, running concurrently)
BEGIN TRANSACTION;
SELECT on_call FROM doctors WHERE id = 1; -- Sees that Doctor 1 is on_call. Thinks, "Great, I can go off-call."
-- ...sometime later...
UPDATE doctors SET on_call = FALSE WHERE id = 2; -- This also seems safe!
COMMIT;
-- Final state: Both doctors are off-call. The business rule is violated.
Neither transaction wrote to a row the other was writing to, so REPEATABLE READ wouldn’t block it. Each transaction’s view of the world was consistent at the time of its read, but the final outcome is inconsistent. This is the kind of bug that only shows up in production at 2 AM on a Tuesday. To prevent it, you need true serializability, which might use aggressive locking or fancy algorithms like Serializable Snapshot Isolation (SSI) to detect this potential conflict and abort one of the transactions.
The takeaway? Choose your isolation level like you’re choosing armor. READ COMMITTED is the default for a reason—it’s a good balance. But if your business logic has constraints spanning multiple rows, you need to understand that REPEATABLE READ might not be enough. You need to reach for SERIALIZABLE and pay the performance price, because the alternative is data corruption that’s incredibly hard to trace.