Alright, let’s get our hands dirty with go tool trace. You’ve probably been staring at CPU and memory profiles until your eyes cross, wondering why your beautifully concurrent Go application isn’t going as fast as it should. Sometimes, the problem isn’t what your code is doing, but how and when the goroutines are being scheduled to do it. That’s where the execution tracer comes in. It’s like getting a top-down view of a busy highway system; a CPU profile just tells you which cars are revving their engines the hardest.

Think of the tracer as the ultimate recording device for your Go runtime. It captures a wealth of low-level runtime events over a short period: when goroutines start, block, unblock, and get scheduled on OS threads, when the garbage collector kicks in, and even network and syscall latency. It’s the closest thing we have to a time machine for understanding the orchestration, and occasional chaos, of your running program.

First, let’s generate a trace. It’s stupidly simple. You import runtime/trace, and start and stop recording. You can do this around a specific operation or, for a web server, you can hook it up to a /debug/trace endpoint.

package main

import (
    "os"
    "runtime/trace"
)

func main() {
    // Create a file to hold the trace
    f, err := os.Create("my_trace.out")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // Start the trace
    err = trace.Start(f)
    if err != nil {
        panic(err)
    }
    defer trace.Stop()

    // The code you want to trace goes here
    myFunctionThatIsTooSlow()
}

Run that code, and you’ll get a my_trace.out file. Now for the fun part. Fire up the trace tool:

go tool trace my_trace.out

This will launch a web server (usually on http://127.0.0.1:XXXX) and open your browser to a dashboard. This dashboard is your control center. Now, let’s break down what you’re actually looking at.

The Trace Dashboard: Your New Control Center

The main page presents a series of links, each a different lens on the same data. The most immediately useful views for scheduler issues are “View trace” and “Goroutine analysis”. “View trace” is the main event—a zoomable, panning timeline of your entire program’s execution. It can be overwhelming at first, like looking at a city’s metro map for the first time, but you’ll learn to read it.

The “View Trace”: Reading the Metro Map

This is the interactive timeline. The UI is… functional. Let’s be honest, it feels like it was designed by a brilliant backend engineer who had a UI deadline of “yesterday.” You’ll need to click and drag to pan, and use your mouse wheel or the W/S keys to zoom. It’s worth the friction.

Along the top, you’ll see rows for each logical CPU (or P in Go’s scheduler parlance), showing you which goroutine was running on which CPU at any nanosecond. Below that are rows for each goroutine, showing its state changes (runnable, running, blocked). Further down are rows for heap size, garbage collection cycles, and more.

The magic happens when you click on anything. See a long blank space on a CPU row? That’s idle time. Click it. The bottom details pane will tell you why the CPU was idle—often waiting for a goroutine to become runnable. See a goroutine that’s runnable for a long time before it gets scheduled? That’s a sign of scheduler saturation or a bottleneck. Click on it to see its stack trace, which tells you what was desperately trying to run.

Goroutine Analysis: The Group Therapy Session

This view aggregates all goroutines by their state and root call stack. It’s fantastic for answering questions like “What was the most common reason my goroutines were blocked?” You’ll get a list of categories like IO wait, channel send, channel receive, semaphore acquisition, and sync.Mutex.

If you see that 40% of your goroutine latency was spent blocked on a single mutex, congratulations, you’ve just found a major contention point. Clicking into that category will show you the exact stack trace that was holding the lock, leading you directly to the line of code you need to fix.

A Realistic Example: The Hidden Mutex Contention

Let’s say you have a simple function that uses a mutex to guard a shared counter. It seems fine, but under load, it performs terribly.

package main

import (
    "sync"
    "runtime/trace"
    "os"
)

var (
    counter int
    mu      sync.Mutex
)

func increment() {
    mu.Lock()
    defer mu.Unlock()
    counter++ // This is our critical section.
}

func main() {
    f, _ := os.Create("contention.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    var wg sync.WaitGroup
    for i := 0; i < 10000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            increment()
        }()
    }
    wg.Wait()
}

When you trace this, you won’t see much CPU activity because the program is mostly blocked. In the “Goroutine analysis” view, you’ll see a huge number of goroutines blocked in semaphore acquisition (which is how the runtime represents mutexes). The view trace will show a comical scene: one goroutine briefly running on a CPU to increment the counter, while dozens of others are parked in the runnable state, waiting for their tiny turn with the lock. The fix, of course, might be to use a sync/atomic operation or a different structuring, but the trace pinpointed the exact nature of the problem instantly.

Best Practices and Pitfalls

  1. Trace the Right Thing: Tracing adds overhead. Don’t trace your entire 8-hour production run. Isolate the specific operation that’s slow (a single API call, a batch job) and trace just that.
  2. The Time Problem: The trace view’s default time units are nanoseconds. Your brain doesn’t think in nanoseconds. Use the timeline to identify relative problems—“this blocking event is 1000x longer than that one”—not absolute ones.
  3. Correlation is Key: The tracer is phenomenal at showing the effect (blocked goroutines) and the cause (e.g., a GC stall, a network wait). Your job is to connect the two by clicking around. The stack traces in the detail view are your best friend.
  4. It’s Not a Crystal Ball: The tracer shows you what the runtime did, not what you expected it to do. Be prepared to have your assumptions shattered. I’ve seen traces that revealed latency was due to a time.Sleep() call in a vendor dependency I’d never even read. The tracer doesn’t lie.

The go tool trace is the most powerful tool for understanding the concurrent behavior of your application. It has a learning curve steeper than a simple pprof flame graph, but the insight it provides into the scheduler’s mind is absolutely unparalleled. Master it, and you’ll stop guessing about performance and start knowing.