36.6 Error Group: golang.org/x/sync/errgroup
Right, let’s talk about errgroup. You’ve been there: you have a handful of goroutines doing work, and you need to wait for them all to finish, but if any of them fails, you want to cancel the whole operation immediately. You could roll this yourself with a sync.WaitGroup, some channels, and a context.Context for cancellation, but you’d be writing the same boilerplate for the tenth time this week. Stop that. The fine folks at the Go team felt your pain and gave us golang.org/x/sync/errgroup. It’s essentially a WaitGroup that understands errors and context.
Think of an errgroup.Group as a sophisticated bouncer for your goroutines. You hand it tasks, and it makes sure they all get in. But the moment one of them causes a scene (returns an error), the bouncer kicks everyone out and tells you who started it.
First, you need to get it: go get golang.org/x/sync/errgroup.
The Basic Setup: Go and Wait
The API is beautifully simple. You create a new group, fire off your concurrent tasks with the Go method, and then Wait for them all to complete.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	g, ctx := errgroup.WithContext(context.Background())

	// Goroutine 1: A task that succeeds after a bit.
	// It never checks ctx, so it runs to completion even after Task 2 fails.
	g.Go(func() error {
		time.Sleep(100 * time.Millisecond)
		fmt.Println("Task 1: Succeeded!")
		return nil // The golden child.
	})

	// Goroutine 2: A task that fails spectacularly.
	g.Go(func() error {
		time.Sleep(50 * time.Millisecond)
		return errors.New("Task 2: exploded spectacularly") // The problem child.
	})

	// Goroutine 3: A task that takes a while but gets canceled.
	g.Go(func() error {
		select {
		case <-time.After(2 * time.Second):
			fmt.Println("Task 3: Would have succeeded, but no one will ever know.")
			return nil
		case <-ctx.Done():
			fmt.Println("Task 3: Canceled because someone else failed:", ctx.Err())
			return ctx.Err() // It's good practice to return the cancellation reason.
		}
	})

	// Wait for all goroutines to finish, and get the first error returned.
	if err := g.Wait(); err != nil {
		fmt.Println("Overall error:", err)
	}

	// Output will be something like:
	// Task 3: Canceled because someone else failed: context canceled
	// Task 1: Succeeded!
	// Overall error: Task 2: exploded spectacularly
}
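The key thing to notice here is that the moment Task 2 fails, the ctx returned by WithContext is canceled. This is the mechanism by which the other goroutines are notified to pack it up. Task 3 is smart enough to listen for ctx.Done(), so it stops its long-running work immediately. Task 1 never looks at ctx, so it simply finishes its 100 milliseconds of work as if nothing had happened. Cancellation is a request, not a kill switch: Wait() returns only after every goroutine has returned, so if Task 3 weren’t listening, it would keep running for the full two seconds and Wait() would sit there blocked the whole time. In a real program, that’s a great way to introduce slow shutdowns and wasted work.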
The Magic of WithContext
You’ll usually start an errgroup with errgroup.WithContext(parentCtx). (A zero-value errgroup.Group also works, but it gives you no cancellation at all.) WithContext does two crucial things:
- It gives you a new errgroup.Group.
- It gives you a derived context.Context that is canceled the first time any of the goroutines returns a non-nil error, or when Wait() returns, whichever happens first.
This context is your lifeline for making your concurrent tasks responsive to failure. You must pass this context down to any function inside your goroutine that accepts a context. This is how you get cancellation for free: HTTP requests, database calls, you name it—if it takes a context, it will honor the cancellation.
g, ctx := errgroup.WithContext(context.Background())
g.Go(func() error {
	// Use the group's context, not the original background one!
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://example.com", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err // Includes the case where ctx was canceled mid-request.
	}
	defer resp.Body.Close()
	// ... process response ...
	return nil
})
If another goroutine in this group fails, the ctx will be canceled, and this HTTP request will be cleanly aborted. This is the pattern you’ll use for almost everything.
The One Major Pitfall: Error Shadowing
Here’s the gotcha. Wait() only returns the first non-nil error. This is usually what you want—the root cause. But sometimes, you might care about all the errors that occurred. The standard errgroup does not provide this. If you need that, you’re looking at a different beast, like a slice of errors protected by a mutex, or a more advanced library like github.com/hashicorp/go-multierror.
This design choice keeps the API simple and focused on the most common use case: “succeed fast or fail fast.” It assumes that if one part of your coordinated operation fails, the other failures are probably just cascading effects of the initial problem and are less important to log individually.
Best Practices and The Obvious Thing Everyone Forgets
- Check for Cancellation: Inside your goroutines, especially long-running ones, you must periodically check ctx.Done(). If you don’t, your goroutine will become un-cancelable, defeating the entire purpose of using an errgroup. Use select statements with ctx.Done() or use context-aware functions (like the http.DefaultClient.Do call shown above).
- Return the Context’s Error: When you are canceled, it’s idiomatic to return ctx.Err() instead of just nil or some custom “I was canceled” error. ctx.Err() will neatly be either context.Canceled or context.DeadlineExceeded.
- It’s for Coordination, Not Management: The errgroup manages the execution and error handling of your goroutines, but by default it doesn’t limit the number of them. If you fire off ten million goroutines with Go(), you’ll have ten million goroutines. Recent versions of the package let you cap this with g.SetLimit(n), after which Go() blocks until a slot frees up. The older approach, which you’ll still see in plenty of codebases, is to pair the group with a semaphore (a buffered channel):
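g, ctx := errgroup.WithContext(context.Background())
semaphore := make(chan struct{}, 3) // Allow only 3 goroutines to run concurrently
for i := 0; i < 10; i++ {
	// Acquire a semaphore slot (blocking if full)
	semaphore <- struct{}{}
	i := i // Capture the loop variable (needed before Go 1.22)
	g.Go(func() error {
		defer func() { <-semaphore }() // Release the slot when done
		// Skip the work if another task has already failed.
		if err := ctx.Err(); err != nil {
			return err
		}
		// Do work with ctx here...
		fmt.Printf("Working on task %d\n", i)
		return nil
	})
}
// Wait for the tasks still in flight and pick up the first error, if any.
if err := g.Wait(); err != nil {
	fmt.Println("Error:", err)
}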
In essence, errgroup is one of those packages that feels like it should be in the standard library. It solves a very specific but incredibly common problem with an elegant, context-aware solution. Use it whenever your goroutines are working towards a common goal and a single point of failure should scuttle the whole mission. Just remember to actually listen when it tells you to go home.