37.1 pprof: CPU and Memory Profiling

Right, let’s talk about pprof. This isn’t some abstract academic concept; it’s the scalpel you use when your application starts coughing up blood. You don’t just “think” your code is slow; you know it, with data, and pprof is how you get that data. It’s the single most powerful tool in the Go programmer’s performance arsenal, and it ships with the standard library and toolchain. The designers at Google, for all their quirks, absolutely nailed this one.

The basic idea is gloriously simple: your running application exposes an HTTP endpoint. When you request a CPU profile from that endpoint, the runtime quietly samples what your program is doing, about 100 times a second, for as long as you asked. You then use the go tool pprof command to pull that data and explore it. It’s like attaching a heart monitor to your code.

First, you need to wire it up. The common way is to import net/http/pprof. It automatically registers its handlers with the default HTTP mux. This is the “just get it working” approach.

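package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // Side-effect import: registers handlers on http.DefaultServeMux
)

func main() {
    // Your amazing, and currently slow, application code here
    go someIntensiveFunction()

    // Serve the pprof endpoints on port 6060 (nil means http.DefaultServeMux).
    // ListenAndServe only returns on failure, so log the error and exit.
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}

// someIntensiveFunction is a placeholder for the workload you want to profile.
func someIntensiveFunction() {
    sum := 0
    for i := 0; ; i++ {
        sum += i % 7 // busy work so the CPU profile has something to show
    }
}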

Yes, the underscore import looks a bit weird. It’s a side-effect import—its sole job is to run its init() function, which slaps a bunch of profiling routes onto http.DefaultServeMux. It feels a bit magical, and I usually hate magic, but in this case, the convenience is worth it.

Now, with your app running, the magic endpoints are live. The big ones are:

http://localhost:6060/debug/pprof/heap for memory profiles
http://localhost:6060/debug/pprof/profile for a CPU profile (this one will profile for 30 seconds by default; append ?seconds=N to change the duration)

Collecting a CPU Profile

This is your first stop. A CPU profile doesn’t measure wall-clock time; it measures time spent actually executing on a CPU. Time spent blocked on I/O, locks, or sleeping doesn’t show up. It works by having the OS deliver a SIGPROF signal to your application at a fixed rate (100 times a second by default). Each time the signal hits, the Go runtime records a stack trace. The genius is in the aggregation: if a function appears in 30% of the collected stack traces, it’s estimated to account for 30% of the CPU time, including whatever it calls. It’s statistically brilliant and, frankly, a bit obvious in hindsight.

To grab a 30-second profile, you run:

go tool pprof http://localhost:6060/debug/pprof/profile

This will block while the profile is collected and then drop you into an interactive shell. You can also skip the shell and render the call graph straight to a PDF (this needs Graphviz installed):

go tool pprof -pdf http://localhost:6060/debug/pprof/profile > cpu.pdf

This will give you a beautiful, and often horrifying, call graph showing you exactly which functions are burning your CPU.

Sniffing Out Memory Allocations

Memory profiling is a bit trickier. The heap profile isn’t a raw dump of your heap; it’s a statistical sample of allocations (by default roughly one sample per 512 KiB allocated, controlled by runtime.MemProfileRate). For each sampled allocation site, the runtime tracks both what is still live and the running total of everything ever allocated there, including objects that have since been garbage collected. This is crucial to understand. When you look at a heap profile, you can see where memory is currently living, but also where it was born before it got collected.

The most common way to collect it is the same:

go tool pprof http://localhost:6060/debug/pprof/heap

But here’s the first “gotcha”: by default you get the inuse_space view, a snapshot of memory that is still live. That is the view you want for a leak: take two snapshots a few minutes apart and see what keeps growing. If your problem is allocation churn and garbage collector pressure rather than a leak, switch to the alloc_space view instead:

go tool pprof -sample_index=alloc_space http://localhost:6060/debug/pprof/heap

Now the profile shows you the total volume of memory allocated over the program’s life, whether or not it has since been collected, which points a giant arrow at the functions creating all those short-lived objects that keep the garbage collector busy.
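The difference between "still live" and "ever allocated" is easy to demonstrate outside of pprof. The sketch below is an analogy rather than what pprof does internally: it reads the runtime's own counters via runtime.ReadMemStats, where TotalAlloc plays the role of alloc_space and HeapAlloc after a forced GC approximates inuse_space. The churn and measure helpers, and the sizes used, are invented for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

var (
	retained [][]byte // stays reachable, so it counts as in-use memory
	sink     []byte   // overwritten every iteration, so old buffers become garbage
)

// churn allocates n buffers of size bytes and keeps only every keepEvery-th one.
func churn(n, size, keepEvery int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
		if i%keepEvery == 0 {
			retained = append(retained, sink)
		}
	}
}

// measure runs fn and reports how many bytes were allocated in total and
// how many additional bytes are still live once the garbage collector has run.
func measure(fn func()) (allocated, live uint64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	fn()
	runtime.GC()
	runtime.ReadMemStats(&after)

	allocated = after.TotalAlloc - before.TotalAlloc
	if after.HeapAlloc > before.HeapAlloc {
		live = after.HeapAlloc - before.HeapAlloc
	}
	return allocated, live
}

func main() {
	// 1000 buffers of 64 KiB each, keeping one in every hundred.
	allocated, live := measure(func() { churn(1000, 64<<10, 100) })
	fmt.Printf("allocated in total: %d KiB\n", allocated>>10)
	fmt.Printf("still live:         %d KiB\n", live>>10)
}
```

A function like churn would dominate an alloc_space view while barely registering in inuse_space, which is exactly why picking the right view matters.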

The Interactive Shell and Your Best Friends: top and list

Once you’re in the go tool pprof interactive shell, two commands are your workhorses. top10 shows you the top 10 functions consuming the resource. Each line shows flat and cumulative (cum) cost: CPU time for a CPU profile, bytes for a heap profile. “Flat” means cost incurred in that function alone. “Cumulative” means cost in that function plus all the functions it calls. If a function has a high flat value, it’s doing heavy lifting itself. If it has low flat but high cumulative, it’s a coordinator calling other expensive functions.
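To make flat versus cumulative concrete, here is a toy aggregation over a handful of made-up stack samples. This is a simplified sketch of the idea, not pprof's actual implementation; the stack and cost types, the aggregate function, and the function names in the samples are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// stack is one sampled call stack, leaf (the function executing) first.
type stack []string

// cost holds sample counts for one function.
type cost struct{ flat, cum int }

// aggregate counts, per function, how many samples had it as the leaf (flat)
// and how many had it anywhere on the stack (cum).
func aggregate(samples []stack) map[string]cost {
	out := make(map[string]cost)
	for _, s := range samples {
		seen := make(map[string]bool) // count a recursive function once per sample
		for i, fn := range s {
			c := out[fn]
			if i == 0 {
				c.flat++
			}
			if !seen[fn] {
				c.cum++
				seen[fn] = true
			}
			out[fn] = c
		}
	}
	return out
}

func main() {
	samples := []stack{
		{"encodeJSON", "handleRequest", "main"},
		{"encodeJSON", "handleRequest", "main"},
		{"queryDB", "handleRequest", "main"},
		{"handleRequest", "main"},
	}
	costs := aggregate(samples)

	names := make([]string, 0, len(costs))
	for name := range costs {
		names = append(names, name)
	}
	// Sort by flat count, highest first, breaking ties by name.
	sort.Slice(names, func(i, j int) bool {
		a, b := costs[names[i]], costs[names[j]]
		if a.flat != b.flat {
			return a.flat > b.flat
		}
		return names[i] < names[j]
	})
	for _, name := range names {
		fmt.Printf("%-14s flat=%d cum=%d\n", name, costs[name].flat, costs[name].cum)
	}
}
```

In this toy data, handleRequest is the coordinator: it is the leaf in only one sample but sits on the stack in all four, so its flat count is low and its cumulative count is high.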

But the real magic is list. You run list <functionName> (the argument is a regular expression matched against function names), and it shows you the actual source code of that function, annotated line-by-line with the cost attributed to each line. It needs access to the source files, so run it somewhere the code is available. It’s the moment the abstraction breaks and you see the exact for loop or json.Marshal call that’s murdering your performance. It’s so good it feels like cheating.

The Cardinal Sin of Profiling

Here’s the biggest mistake, the one we’ve all made: profiling your code on your MacBook while it’s running in “happy local dev mode” with no real load. The profiles will be useless. You must profile under production-like load. If that’s not possible, you need to write a benchmark or a load test that realistically simulates the worst-case scenario. Profiling an idle application tells you precisely nothing. It’s like hunting for an engine misfire with the engine off.

pprof is your truth-teller. It cuts through guesswork, superstition, and “well it feels slow” conversations with hard, undeniable data. Get comfortable with it. Your future self, the one who isn’t getting paged at 3 AM, will thank you.