18.3 CompareAndSwap: The Foundation of Lock-Free Algorithms
Alright, let’s get our hands dirty. If you’re going to understand lock-free programming, you need to wrap your head around its most fundamental primitive: CompareAndSwap (CAS). Forget mutexes for a minute. We’re not asking for permission anymore; we’re going to just try to update things and see if it worked. It’s the difference between politely raising your hand and just shouting out the answer. One is polite; the other is faster, but you might get it wrong and have to try again.
CAS is the atomic equivalent of “I’ll only take this candy bar if no one else has touched it since I last looked.” It’s a single, uninterruptible operation that does three things:
- It looks at a memory address (
&myCandyBar). - It compares the current value there to an expected value (
expected == looksUntouched). - Only if the comparison is true, it swaps in a new value (
new = takenByMe). And it tells you whether it succeeded.
The entire operation is a single atomic instruction on the CPU. This is the magic. It’s impossible for another thread to change the value after you read it but before you write it, which is the classic race condition nightmare. The CPU hardware itself prevents it.
In Go, this is exposed through methods like atomic.CompareAndSwapInt64. Let’s look at its signature:
func CompareAndSwapInt64(addr *int64, old, new int64) (swapped bool)
It’s beautifully simple. You give it a pointer to the value, what you expect it to be (old), and what you want it to be (new). It returns true if the swap happened, false if not. If it returns false, you know someone else slipped in and changed the value right under your nose. Your job, in a lock-free algorithm, is to then figure out what to do next (usually, you just try again).
The Classic Example: A Simple Spinlock
Let’s not get too fancy. The simplest thing you can build with CAS is a spinlock. It’s not always the best idea (spinning burns CPU cycles), but it’s a perfect pedagogical tool.
package main
import (
"fmt"
"sync/atomic"
)
// SimpleSpinLock is a toy. Don't use this in production without serious thought.
type SimpleSpinLock struct {
state int64 // 0 is unlocked, 1 is locked
}
func (s *SimpleSpinLock) Lock() {
// We spin until we successfully change the state from 0 (unlocked) to 1 (locked)
for !atomic.CompareAndSwapInt64(&s.state, 0, 1) {
// In a real lock, you might call runtime.Gosched() here to be polite.
// But for this example, we'll just burn cycles like it's 1999.
}
}
func (s *SimpleSpinLock) Unlock() {
// We just set it back to 0. We don't need CAS here because we assume the holder is the only one unlocking.
atomic.StoreInt64(&s.state, 0)
}
func main() {
var lock SimpleSpinLock
var counter int64
// Launch 10 goroutines that hammer the counter
for i := 0; i < 10; i++ {
go func() {
for i := 0; i < 1000; i++ {
lock.Lock()
counter++ // This is our critical section
lock.Unlock()
}
}()
}
// Wait for goroutines to finish... (use a WaitGroup in real code)
fmt.Println("Counter:", counter) // Will reliably print 10000
}
See what happened there? Lock() keeps trying to CAS the state from 0 to 1. If it’s already 1, CAS fails and the loop continues. When the lock is freed (state set to 0), one goroutine’s CAS will finally succeed, breaking it out of the loop and into the critical section. This is the essence of most lock-free operations: a loop that keeps trying until it succeeds.
The ABA Problem: A Ghost in the Machine
Here’s where things get weird, and where CPU architects rightfully earned their pay. Meet the ABA problem. Imagine this:
- Thread 1 reads value
Afrom addressX. - Thread 1 gets preempted. Thread 2 runs.
- Thread 2 changes
XfromAtoB. - Then, another thread changes
Xback fromBtoA. - Thread 1 resumes and calls
CAS(&X, expected=A, new=C).
What happens? The CAS succeeds! The value is A, just like it expected. But the state of the world is completely different. The memory at X has been to B and back. For a simple counter or a lock, this might be fine. But if A was a pointer… oh boy.
Imagine a lock-free stack. A points to the top node. You read A, and it points to node 1. You get ready to pop it and push a new node. Meanwhile, another thread pops node 1, causing the top to point to node 2 (B). Then it pops node 2, and pushes node 1 back on, so the top is A again. Your CAS will succeed, but now you’ll be setting the next pointer of your new node to the original node 1, which has already been popped and might be sitting in some other thread’s memory waiting to be reused. You’ve just corrupted the entire stack.
Go’s sync/atomic package provides CompareAndSwapPointer which is susceptible to this on most architectures. However, it’s less of a concern in managed memory languages like Go than in C/C++ because the garbage collector generally prevents a pointer from being reused while another goroutine might still hold a reference to it. But it’s a classic pitfall you must be aware of. The common solution is to use a “double-wide” CAS that also checks a version number or a counter alongside the pointer. Go’s sync/atomic doesn’t directly offer this, which is a conscious (and sometimes frustrating) design choice to keep the API simple.
The Loop of Retry: The Pattern of Life
The most important pattern to internalize is the CAS loop. You will almost never call CAS once. You’ll be in a loop until you win. The structure always looks something like this:
for {
current := atomic.LoadInt64(&sharedValue)
new := current + 1 // Or some more complex operation
if atomic.CompareAndSwapInt64(&sharedValue, current, new) {
break // We did it! We escaped the loop!
}
// CAS failed. The value of 'current' is no longer true.
// Loop around, reload 'current', and try again.
}
This is the heartbeat of lock-free code. You observe, you calculate, you attempt to commit. If your commit fails because the world changed, you simply observe again and recalculate based on the new state. It’s a optimistic retry loop. It works brilliantly for low-contention scenarios. Under high contention, it can degenerate into a crowd of people all trying to get through a single door at once, everyone constantly bumping into each other and having to step back. In those cases, a good old-fashioned mutex might actually be faster because it puts threads to sleep instead of having them burn CPU. Profile, always profile.
So there you have it. CAS is your tiny, powerful, atomic brick. It’s deceptively simple, but building robust structures with it requires careful thought about memory ordering, contention, and sneaky problems like ABA. It’s the tool that lets you write code that doesn’t just stand in line, but politely and efficiently jostles for position.