29.5 Benchmarks: func BenchmarkXxx(b *testing.B)

Right, so you’ve written some code and it doesn’t explode. Congratulations. But is it fast? Or, more importantly, is it fast enough? And how do you know if your latest “optimization” actually made things better or just made the code look like a Rube Goldberg machine? You guess. I benchmark.

In Go, benchmarking isn’t a dark art; it’s a first-class citizen built right into the testing package. A benchmark function looks almost identical to a test function, but it uses a different parameter: *testing.B instead of *testing.T.

Here’s the simplest possible benchmark. Let’s say you have a function you’re deeply suspicious of, like this overly clever string reversal:

package main

func Reverse(s string) string {
    r := []rune(s)
    for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
        r[i], r[j] = r[j], r[i]
    }
    return string(r)
}

To benchmark it, you’d create a file reverse_test.go:

package main

import "testing"

func BenchmarkReverse(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Reverse("Hello, 世界")
    }
}

The magic sauce is b.N. The benchmark framework calls your function repeatedly, increasing b.N each time, until a run lasts long enough to be timed reliably (one second by default, adjustable with -benchtime). You don’t decide how many times it runs; the framework does. Your job is to make sure the loop runs b.N times, putting the code you want to measure inside it.
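If you want to watch the framework pick b.N, here is a minimal sketch that drives a benchmark from an ordinary program using testing.Benchmark instead of go test. The names benchReverse, tried, and result are made up for this illustration; the exact values of b.N you see will depend on your machine.

```go
package main

import (
	"fmt"
	"testing"
)

// Reverse is the function from the text, repeated so this file stands alone.
func Reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// tried records every b.N the framework hands us (illustration only).
var tried []int

// result keeps the return value alive so the call is not discarded.
var result string

// benchReverse is a hypothetical benchmark body. The framework calls it
// several times, each time with a different b.N.
func benchReverse(b *testing.B) {
	tried = append(tried, b.N)
	for i := 0; i < b.N; i++ {
		result = Reverse("Hello, 世界")
	}
}

func main() {
	res := testing.Benchmark(benchReverse)
	fmt.Println("values of b.N tried:", tried)
	fmt.Printf("final N = %d, about %d ns/op\n", res.N, res.NsPerOp())
}
```

The point to take away is that your function body runs more than once, so anything outside the loop runs once per attempt, not once per benchmark.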

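Run it with go test -bench=. (the trailing . is a regular expression that matches every benchmark). Because the argument is a regex, -bench=BenchmarkReverse also matches any benchmark whose name contains that text; anchor it as -bench='^BenchmarkReverse$' to run exactly one. You’ll get output that looks like this: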

goos: darwin
goarch: arm64
pkg: github.com/you/yourproject
BenchmarkReverse-8   	12345679	        92.4 ns/op
PASS

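That tells us that on my Apple Silicon Mac, the loop body ran 12,345,679 times in the final measured run, and each call to Reverse took about 92.4 nanoseconds. The -8 suffix is the value of GOMAXPROCS during the run, not the number of cores the benchmark actually used; this loop runs on a single goroutine.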

The `b.N` Loop is Your Hammer, Not Your Anvil

This is the most common rookie mistake. Do not do expensive setup inside the b.N loop. You will benchmark your setup code and completely skew the results. If you need to set up a database connection, pre-compute a large dataset, or allocate a massive buffer, do it outside the loop and then reset the timer.

func BenchmarkExpensiveOperation(b *testing.B) {
    // This runs ONCE before the benchmark loop even starts.
    hugeThing := createAHugeThingThatTakesForever()
    
    // Now, reset the timer to exclude the setup time from the benchmark.
    b.ResetTimer()
    
    for i := 0; i < b.N; i++ {
        // Now we're only measuring the operation itself.
        hugeThing.Operate()
    }
}

If your setup is per-operation and can’t be reused, but is still expensive, use b.StopTimer() and b.StartTimer() to pause the clock. It’s clunky, but it works.

func BenchmarkOperationWithPerLoopSetup(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer() // Pause the benchmark timer
        data := loadGiganticFile() // This setup is slow
        b.StartTimer() // Resume the timer

        actualOperationWeCareAbout(data) // Measure only this
    }
}

Benchmarking with Inputs: Don’t Cheat with a Constant

The first benchmark used a constant input ("Hello, 世界"). This is often unrealistic. The compiler’s optimizer is fiendishly clever: given a constant argument, it can inline the call and fold away or discard work, making your function look impossibly fast. A single input also tells you nothing about how the function behaves on other sizes. To avoid both problems, provide varying inputs. A common pattern is to use a slice of possible inputs and cycle through them inside the loop.

func BenchmarkReverseWithInputs(b *testing.B) {
    testCases := []string{
        "Hello",
        "Hello, 世界",
        "This is a much longer string to see if that affects performance",
    }
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // Use the current benchmark index to choose an input,
        // ensuring we cycle through them and avoid constant propagation.
        input := testCases[i%len(testCases)]
        Reverse(input)
    }
}
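Cycling through inputs gives you one blended number. If you would rather see a separate figure per input, sub-benchmarks via b.Run are one way to get it. The sketch below is an illustration under assumptions: the names cases, benchFor, and BenchmarkReverseByInput are invented here, and the main function exists only so the file runs standalone; in a real project the benchmark function would live in reverse_test.go and be run with go test -bench.

```go
package main

import (
	"fmt"
	"testing"
)

// Reverse is the function from the text, repeated so this file stands alone.
func Reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// sink keeps results alive so the calls are not optimized away.
var sink string

// cases names each input so the results can be told apart.
var cases = []struct{ name, input string }{
	{"short", "Hello"},
	{"unicode", "Hello, 世界"},
	{"long", "This is a much longer string to see if that affects performance"},
}

// benchFor returns a benchmark body bound to one input.
func benchFor(input string) func(*testing.B) {
	return func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = Reverse(input)
		}
	}
}

// BenchmarkReverseByInput is the form you would put in a _test.go file.
// go test reports each b.Run call as its own line, named after the case.
func BenchmarkReverseByInput(b *testing.B) {
	for _, c := range cases {
		b.Run(c.name, benchFor(c.input))
	}
}

// main runs each case directly, for a standalone demonstration only.
func main() {
	for _, c := range cases {
		res := testing.Benchmark(benchFor(c.input))
		fmt.Printf("%-8s %s\n", c.name, res)
	}
}
```

The trade-off: per-input numbers are easier to interpret, while the cycling version is closer to a mixed workload. Neither is more correct; pick the one that matches the question you are asking.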

Benchmarks are Code: Keep Them Honest

Just like your regular code, benchmarks can have bugs. The most insidious one is a benchmark that doesn’t actually run the code you think it does because the result is unused. The compiler might simply optimize the entire call away. Always sink the result to a package-level variable to prevent this.

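var sink string // A package-level variable to store results

// This replaces the earlier BenchmarkReverse; a package cannot declare
// the same function name twice.
func BenchmarkReverse(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = Reverse("Hello, 世界") // Storing into 'sink' keeps the call from being optimized away
    }
}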

It feels silly, but it’s cheap insurance. Because sink is a package-level variable, the compiler can’t prove that the stored value is never read elsewhere in your program, so it can’t eliminate the call that produced it.

Comparing Apples to Apples: `benchcmp` is Your Best Friend

The raw output of a single benchmark run is mildly interesting. The real power comes from comparison. You change your Reverse function, run the benchmark again, and… now what? Manually comparing two numbers is a pain.

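Enter benchcmp, installed with go install golang.org/x/tools/cmd/benchcmp@latest (go get no longer installs binaries in current Go releases). This tool takes the output of two benchmark runs and shows you the difference. Be aware that benchcmp is now deprecated in favor of benchstat (golang.org/x/perf/cmd/benchstat), which compares multiple runs statistically; the save-two-files-and-compare workflow is the same.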

$ go test -bench=BenchmarkReverse > old.txt
# ...make your changes...
$ go test -bench=BenchmarkReverse > new.txt
$ benchcmp old.txt new.txt
benchmark            old ns/op     new ns/op     delta
BenchmarkReverse-8   92.4          45.1          -51.19%

A 51% improvement? Now you’ve got data, not a feeling. This is how you know your clever bit-shifting trick was actually worth the readability hit. Or, more often, how you confirm that your “optimization” made things 200% slower and you should quietly revert the change and never speak of it again. We’ve all been there. The benchmark doesn’t lie.
