Right, let’s get our hands dirty. You’ve just run your Go service under pprof, you’ve captured a profile, and now you’re staring at a terminal prompt or a scary-looking SVG. It feels like you’ve been handed the blueprints to a skyscraper written in a foreign language. Don’t panic. We’re going to learn that language together.

The first thing to internalize is that pprof is not a single tool; it’s a Swiss Army knife with a dozen blades. The most common profiles you’ll grab are the CPU profile and the heap (memory) profile. They answer two fundamentally different questions: “What is burning my CPU time?” and “Where is my memory being allocated, and what’s holding on to it?”

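Let’s start by firing it up. You’ve probably already added a blank import of net/http/pprof (_ "net/http/pprof") to your server, which registers the profiling handlers on http.DefaultServeMux. You’re also hitting http://localhost:6060/debug/pprof/profile?seconds=30 to grab a 30-second CPU profile. Save the response (curl -o profile.pb.gz works fine) and you’ve got a binary blob of gzipped protobuf. Now what?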

The Two Faces of pprof: Text and Web

You can go two ways from here: the quick-and-dirty terminal interface or the visual glory of the web UI. I almost always start in the terminal because it’s fast.

# This drops you into an interactive prompt
go tool pprof profile.pb.gz

# From the (pprof) prompt, top5 is your best friend
(pprof) top5
Showing nodes accounting for 320ms, 55.17% of 580ms total
Showing top 5 nodes out of 47
      flat  flat%   sum%        cum   cum%
     130ms 22.41% 22.41%      130ms 22.41%  runtime.futex
      70ms 12.07% 34.48%       70ms 12.07%  crypto/sha256.block
      50ms  8.62% 43.10%       50ms  8.62%  runtime.epollwait
      40ms  6.90% 50.00%      190ms 32.76%  myapp.com/parser.(*State).parseElement
      30ms  5.17% 55.17%       30ms  5.17%  runtime.memmove

See those columns? flat is the time spent executing the function’s own code. cum (cumulative) is the time spent in that function plus everything it calls. sum% is just a running total of flat% down the list. If a function’s flat is close to its cum, it’s doing the heavy lifting itself (like crypto/sha256.block, with 70ms of both). If it has a low flat but a high cum, it’s a manager function. It’s not working hard itself; it’s just making a lot of calls to other functions that are working hard (like our parseElement, with 40ms flat but 190ms cum).
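The flat/cum arithmetic is simple enough to write down. This sketch uses a handful of made-up stacks (root first, leaf last) rather than a real profile, and it counts samples instead of milliseconds. At Go’s default CPU sampling rate of 100 Hz, each sample stands for about 10ms:

```go
package main

import "fmt"

// flatCum tallies pprof-style flat and cum sample counts.
// Each stack is ordered root first, leaf (the function that was running) last.
//   - flat[f]: samples in which f was the leaf, i.e. executing its own code
//   - cum[f]:  samples in which f appeared anywhere on the stack, counted at
//     most once per sample so recursive calls are not double-counted
func flatCum(stacks [][]string) (flat, cum map[string]int) {
	flat, cum = map[string]int{}, map[string]int{}
	for _, stack := range stacks {
		if len(stack) == 0 {
			continue
		}
		flat[stack[len(stack)-1]]++
		seen := map[string]bool{}
		for _, fn := range stack {
			if !seen[fn] {
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

func main() {
	// Hypothetical samples, not taken from a real profile.
	stacks := [][]string{
		{"main", "parseElement", "sha256.block"},
		{"main", "parseElement", "sha256.block"},
		{"main", "parseElement"},
		{"main", "runtime.futex"},
	}
	flat, cum := flatCum(stacks)
	for _, fn := range []string{"main", "parseElement", "sha256.block", "runtime.futex"} {
		fmt.Printf("%-14s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

With these stacks, main gets flat=0 and cum=4 because it’s on every stack but never the leaf. parseElement gets flat=1 and cum=3, and sha256.block gets flat=2 and cum=2. That is the same manager-versus-worker shape as the top5 output.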

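But the text view has limits. This is where the visual views come in. From the same pprof prompt, type web. pprof renders a call graph (boxes for functions, arrows for calls) as an SVG and opens it in your browser. That needs Graphviz; if your machine doesn’t have it, pprof will politely tell you to go install it. Do it. It’s worth it. For a flame graph, launch the interactive web UI instead with go tool pprof -http=:8080 profile.pb.gz and choose Flame Graph from the View menu.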

Deciphering the Flame Graph

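A flame graph looks intimidating until you know the secret: the x-axis is not time. Left-to-right order tells you nothing about when things happened; what matters is the proportion of samples. Every sample is a stack trace. pprof merges the samples that share a call path, and each function along that path becomes a horizontal bar. The width of a bar is how often that call path was on the stack across the samples. The y-axis is the stack depth. In pprof’s flame graph the root sits at the top: the function on top called the one below it, which called the one below it.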
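The merging is what makes widths meaningful, so here it is as a sketch. Each stack is folded into a tree keyed by call path, and the sketch counts how many samples pass through each node. The stacks are hypothetical, and the indented text output stands in for drawing actual rectangles:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// frame is one bar of a flame graph: a function reached through one specific call path.
type frame struct {
	name     string
	samples  int // bar width: how many samples passed through this call path
	children map[string]*frame
}

func newFrame(name string) *frame {
	return &frame{name: name, children: map[string]*frame{}}
}

// merge folds sampled stacks (root first, leaf last) into a tree. Samples that
// share a call-path prefix share bars, which is why bars are wider near the root.
func merge(stacks [][]string) *frame {
	root := newFrame("root")
	for _, stack := range stacks {
		root.samples++
		node := root
		for _, fn := range stack {
			child, ok := node.children[fn]
			if !ok {
				child = newFrame(fn)
				node.children[fn] = child
			}
			child.samples++
			node = child
		}
	}
	return root
}

// dump prints the tree root first, callers above callees, the way pprof’s flame
// graph lays it out; the percentage stands in for bar width.
func dump(f *frame, total, depth int) {
	fmt.Printf("%s%s %.0f%%\n", strings.Repeat("  ", depth), f.name, 100*float64(f.samples)/float64(total))
	names := make([]string, 0, len(f.children))
	for name := range f.children {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for stable output
	for _, name := range names {
		dump(f.children[name], total, depth+1)
	}
}

func main() {
	// Hypothetical samples, root first.
	root := merge([][]string{
		{"main", "serve", "handle", "json.Unmarshal"},
		{"main", "serve", "handle", "json.Unmarshal"},
		{"main", "serve", "handle", "sha256.block"},
		{"main", "serve", "log.Printf"},
	})
	dump(root, root.samples, 0)
}
```

Here main and serve span 100% of the width, handle 75%, and json.Unmarshal 50%. A function reached through two different call paths would get two separate bars.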

So, to find your problem, look for wide bars that belong to your own code. A wide bar at the bottom edge of a stack (a leaf like runtime.futex) is often a red herring. It’s the runtime’s wrapper around the Linux futex system call, used whenever threads sleep and wake, so it shows up under almost everything. The real culprit is usually the function a few layers up that’s causing all those futex calls. Click around. Hover over bars. The UI is fantastic for tracing the chain of responsibility upwards from a hot leaf node to the often-surprisingly innocent-looking function in your codebase that started it all.

The Critical Distinction: CPU vs. Heap Profiles

This is the part that trips everyone up. A heap profile records several kinds of samples at once, and you need to know which one pprof is showing you. Let’s grab a heap profile.

curl -s http://localhost:6060/debug/pprof/heap > heap.pb.gz
go tool pprof heap.pb.gz

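If you just run top here, you’re looking at in-use memory. The default view of a heap profile is inuse_space: the bytes that were allocated and not yet freed as of a recent garbage collection. That’s often, but not always, the same place the memory is being allocated. A heap profile actually carries four views, and you pick one with a flag (or with sample_index=<name> at the (pprof) prompt). The /debug/pprof/allocs endpoint serves the same data with alloc_space as its default.

go tool pprof -alloc_space heap.pb.gz
(pprof) top5
  • -inuse_space: Shows the amount of memory currently in use, i.e. live as of the last GC. This is the default. (What’s live right now?)
  • -inuse_objects: Shows the number of objects currently in use. (How many live objects are there?)
  • -alloc_space: Shows the total amount of memory allocated since the program started, even if it was subsequently freed. (How much total memory did we churn through?)
  • -alloc_objects: Shows the total number of allocations since the program started. (How many times did we allocate?)

All four are estimates. The runtime samples allocations (by default about one sample per 512 KiB allocated) and scales the numbers up.

Use -inuse_space to find memory leaks (why is my RSS so high?). Use -alloc_space and -alloc_objects to find optimization opportunities (why is my garbage collector going nuts?).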
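If the difference between “allocated” and “in use” still feels abstract, runtime.ReadMemStats shows it without a profiler. TotalAlloc and Mallocs only ever grow, like alloc_space and alloc_objects. HeapAlloc rises and falls with live data, like inuse_space. Unlike the sampled heap profile, these counters are exact and cover the whole process. Here is a sketch with a hypothetical workload that keeps one buffer in ten:

```go
package main

import (
	"fmt"
	"runtime"
)

var retained [][]byte // package-level, so whatever is stored here stays live

// measure allocates n buffers of size bytes, keeps every tenth one alive, and
// reports runtime counters that mirror the heap profile's views:
//   - allocBytes   ~ alloc_space   (everything allocated during the call, freed or not)
//   - allocObjects ~ alloc_objects (number of allocations during the call)
//   - liveBytes    ~ inuse_space   (whole heap still in use after a GC)
func measure(n, size int) (allocBytes, allocObjects, liveBytes uint64) {
	retained = nil
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		buf := make([]byte, size)
		if i%10 == 0 {
			retained = append(retained, buf)
		}
	}
	runtime.GC() // free the 90% we dropped so HeapAlloc reflects live memory
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.Mallocs - before.Mallocs, after.HeapAlloc
}

func main() {
	a, o, l := measure(1000, 64<<10)
	fmt.Printf("alloc_space-ish:   %d KiB\n", a>>10)
	fmt.Printf("alloc_objects-ish: %d\n", o)
	fmt.Printf("inuse_space-ish:   %d KiB\n", l>>10)
}
```

Expect roughly 64,000 KiB allocated over at least a thousand allocations. Only about a tenth of that, plus whatever baseline heap the process has, is still live after the GC.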

A Real-World Example: The Hidden Allocator

Let’s say top under -alloc_space shows a ridiculous amount of allocation in io.ReadAll. You look at your code:

func parseConfig(r io.Reader) (Config, error) {
    data, err := io.ReadAll(r) // Allocation city!
    if err != nil {
        return Config{}, err
    }
    var config Config
    err = json.Unmarshal(data, &config)
    return config, err
}

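io.ReadAll slams the entire stream into a []byte. It can’t know the size up front, so it starts with a small buffer and keeps growing it. A big input therefore costs a series of ever-larger allocations plus the copying between them. The flame graph would show a wide io.ReadAll bar under alloc_space. The fix? If you’re parsing JSON, skip the intermediate buffer and decode directly from the io.Reader: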

func parseConfig(r io.Reader) (Config, error) {
    var config Config
    decoder := json.NewDecoder(r)
    err := decoder.Decode(&config)
    return config, err
}

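Boom. The io.ReadAll bar vanishes from the flame graph. Don’t expect the bytes to disappear entirely, though. json.Decoder still reads each JSON value into its own internal buffer before decoding it, so profile again to see how much you actually saved. The decoder really shines on a stream of many values, because it reuses that buffer from one Decode call to the next. This is the real payoff: you used pprof to find a wide bar, understood what it meant (alloc_space), traced it to a line of your code, and applied a deeper understanding of the standard library to fix it. That’s the cycle. Now go do it again.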