21.4 Using Recover to Prevent Library Panics from Crashing Callers
Right, so you’ve decided you don’t want your library to be the reason someone else’s production service goes down in a ball of flames. Good call. A panic bubbling up from your code into a caller you don’t control is the professional equivalent of setting off a fire alarm and then leaving the building. It’s rude, unprofessional, and leaves everyone else to deal with the mess.
The escape hatch for this specific problem is recover, Go’s built-in for regaining control of a panicking goroutine. It only does anything when called directly from a deferred function, and it only catches a panic that’s happening on the same goroutine. Think of it as a net that you stretch out below you just as you’re about to jump. If you don’t jump, the net just hangs there, useless. If you do jump, it catches you before you hit the ground and splatter all over the innocent pedestrians below (your callers).
Here’s the absolute simplest, almost useless version of it:
func MightPanic() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("Recovered from panic:", r)
		}
	}()
	// This will panic, but we'll catch it!
	panic("oh no")
}
Run that, and it will print “Recovered from panic: oh no” and then return peacefully to its caller instead of crashing the whole program. Neat, right? But just catching the panic and logging it is what I call the “print and die locally” pattern. For a library, it’s barely better than letting the panic through. You’ve just turned a crashing panic into a silent failure. Your caller has no idea it happened! They’ll just wonder why their function didn’t do anything.
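To make the silent failure concrete, here is a minimal sketch. The average function is hypothetical, invented only for this illustration; it recovers and prints, exactly like MightPanic, and the caller ends up with a plausible-looking zero value and no error to check.

```go
package main

import "fmt"

// average is a hypothetical library function. With an empty slice it hits
// an integer divide by zero, which the runtime reports as a panic.
func average(nums []int) int {
	defer func() {
		if r := recover(); r != nil {
			// "Print and die locally": the caller never hears about this.
			fmt.Println("Recovered from panic:", r)
		}
	}()
	sum := 0
	for _, n := range nums {
		sum += n
	}
	return sum / len(nums)
}

func main() {
	// The recovered call returns the zero value of its result type, so the
	// caller sees an ordinary 0 and has no way to tell it is bogus.
	fmt.Println("average:", average(nil))
}
```

From the caller's side, an average of 0 and a swallowed panic look identical, which is the whole problem.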
Turning a Panic into an Error
This is the part the designers got right. Since recover() returns whatever was passed to panic(), you can use it to convert a catastrophic, program-ending event into a simple, manageable error. This is the gold standard for library code.
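// The results are named so the deferred function can set err
// after the panic has already cut the function body short.
func SafeParser(data string) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("parser panicked: %v", r)
		}
	}()
	result = ParseThatRiskyData(data) // Let's say this function might panic
	return result, nil
}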
Now, the caller of SafeParser can handle this just like any other error with standard if err != nil { logic. They don’t need to know you had a civil war inside your function; they just get a report that things didn’t go well. This is clean, idiomatic, and respectful.
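Here is what that looks like from the caller's chair, as a self-contained sketch. The parseThatRiskyData body below is a made-up stand-in (the text never says what the real one does), and the names are lowercased only so the whole thing fits in one file.

```go
package main

import "fmt"

// parseThatRiskyData is a hypothetical stand-in for the risky function in
// the text. It panics on anything that is not a single decimal digit.
func parseThatRiskyData(data string) int {
	if len(data) != 1 || data[0] < '0' || data[0] > '9' {
		panic("not a single digit: " + data)
	}
	return int(data[0] - '0')
}

// safeParser follows the same pattern as SafeParser above.
func safeParser(data string) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("parser panicked: %v", r)
		}
	}()
	return parseThatRiskyData(data), nil
}

func main() {
	for _, in := range []string{"7", "seven"} {
		n, err := safeParser(in)
		if err != nil {
			// The panic arrives here as an ordinary error value.
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("parsed:", n)
	}
}
```

The loop keeps going after the bad input, which is the point: the caller decides what a failure means, not your library.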
What You Actually Recover
This is a crucial detail: recover() returns an interface{} (spelled any since Go 1.18), so you have no static guarantee about what that value is. A lot of code panics with strings (panic("divide by zero")), even though the built-in error type exists for a reason, people! You’ll also encounter panics raised by the runtime itself (like a slice index out of range), which arrive as runtime.Error values and which you have even less control over.
This means your recovery code must be defensive. An unchecked type assertion to string or error will itself panic the moment the value turns out to be something else. You have to handle anything. The most robust way is to turn it into an error, as shown above, using fmt.Errorf("%v", r). This handles any type gracefully.
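If you want to be a little kinder than a flat %v, one option is a type switch that wraps values that already are errors, so callers can still match them with errors.Is. This is a sketch of one reasonable approach, not the only one; toError, safely, and errBoom are hypothetical names made up for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// toError is a hypothetical helper that turns any recovered value into an error.
func toError(r interface{}) error {
	switch v := r.(type) {
	case error:
		// Wrap with %w so errors.Is and errors.As still see the original.
		return fmt.Errorf("panic: %w", v)
	default:
		// Strings, ints, structs, whatever: %v can format all of them.
		return fmt.Errorf("panic: %v", v)
	}
}

// safely runs fn and reports a panic as an error instead of crashing.
func safely(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = toError(r)
		}
	}()
	fn()
	return nil
}

var errBoom = errors.New("boom")

func main() {
	var s []int
	fmt.Println(safely(func() { panic("a string") }))
	fmt.Println(safely(func() { panic(42) }))
	fmt.Println(safely(func() { _ = s[3] })) // runtime panic, also an error value
	fmt.Println(errors.Is(safely(func() { panic(errBoom) }), errBoom))
}
```

Runtime panics take the error branch too, because the values the runtime panics with implement the error interface.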
The Goroutine Gotcha
This is the big one, the footgun that gets everyone. recover only works for panics happening in the same goroutine. If you spin up a new goroutine and that guy panics, your defer in the parent goroutine is useless. The panic will crash the whole program.
func ThisWillStillCrash() {
	defer func() {
		recover() // This does nothing! The panic is in a different goroutine.
	}()
	go func() {
		panic("in a goroutine")
	}()
	time.Sleep(time.Second) // give the goroutine a moment to explode
}
Yep, that code will still crash. If you’re firing off goroutines, they need their own deferred recoveries. It’s like giving each acrobat their own net instead of hoping the one net on the ground will catch everyone.
func SafeGoroutine() {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("recovered in goroutine: %v", r)
			}
		}()
		panic("all good, this is contained")
	}()
	time.Sleep(time.Second)
}
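Logging inside the goroutine has the same silent-failure smell as before, so a library will usually want to hand the failure back to whoever started the work. One way to sketch that, assuming a channel is an acceptable way to report the result, is below; runAsync is a hypothetical helper, and waiting on the channel also replaces the time.Sleep guesswork.

```go
package main

import "fmt"

// runAsync is a hypothetical helper: it runs fn on a new goroutine and
// delivers either nil or the converted panic on the returned channel.
func runAsync(fn func()) <-chan error {
	done := make(chan error, 1) // buffered, so the goroutine never blocks on send
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("goroutine panicked: %v", r)
				return
			}
			done <- nil
		}()
		fn()
	}()
	return done
}

func main() {
	// The receive blocks until the goroutine has finished or panicked.
	err := <-runAsync(func() { panic("in a goroutine") })
	fmt.Println("caller sees:", err)
}
```

The recover still lives in the goroutine that panics; the channel only carries the bad news back to the parent.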
Knowing When Not to Recover
Here’s the controversial bit: sometimes, you shouldn’t recover. A panic is often a sign of a profound logic error, something so broken that continuing is dangerous. If you catch a panic, you’ve stabilized the patient, but you have no idea what internal state was corrupted in the process.
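Did you panic halfway through a multi-step update to a shared map? Now the map is stuck in a partial state, and nobody downstream knows it. Did you panic while holding a mutex that you unlock by hand instead of with defer? Congrats, the lock never gets released, and the next goroutine that asks for it will wait forever.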
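The mutex half of that problem has a cheap fix: unlock with defer, because deferred calls still run while a panic unwinds. The sketch below uses a hypothetical counter type to show a recovered panic that leaves the lock usable. It does nothing for the corrupted-state half; here the check simply happens before any mutation.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type that guards a map with a mutex.
type counter struct {
	mu     sync.Mutex
	counts map[string]int
}

// add panics on an empty key. Deferred calls run last-in, first-out, so
// Unlock runs first and the recovering function runs after it.
func (c *counter) add(key string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("add panicked: %v", r)
		}
	}()
	c.mu.Lock()
	defer c.mu.Unlock()
	if key == "" {
		panic("empty key") // raised before the map is touched
	}
	c.counts[key]++
	return nil
}

func main() {
	c := &counter{counts: map[string]int{}}
	fmt.Println(c.add(""))     // the panic comes back as an error
	fmt.Println(c.add("a"))    // no deadlock: the lock was released
	fmt.Println(c.counts["a"]) // 1
}
```

Swap the deferred Unlock for a manual one after the increment and the second call hangs forever, which is exactly the deadlock described above.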
The best practice is to only use recover to:
- Prevent a crash at a boundary (like a library’s public API, or an HTTP request handler).
- Clean up resources while the panic unwinds (close files, unlock mutexes). Plain deferred calls already do this, with or without recover.
- Log the failure with as much context as possible before converting it to an error.
You are not using it to pretend the problem didn’t happen. You’re using it to fail gracefully. The difference is everything. You’re telling your caller “I failed,” not “I failed and then kept going like an idiot, probably corrupting your data.”
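Put together, a boundary function might look something like the sketch below. Decode and decode are hypothetical names, and logging through the standard log package is an assumption for the example; a real library might prefer to put the stack trace into the returned error and let the caller decide where it goes.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// Decode is a hypothetical exported entry point: the one place where this
// imaginary library recovers.
func Decode(input []byte) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			// Log context and the stack while they are still available.
			log.Printf("decode: panic on %d-byte input: %v\n%s", len(input), r, debug.Stack())
			n = 0 // never hand back a half-computed result
			err = fmt.Errorf("decode: internal error: %v", r)
		}
	}()
	return decode(input), nil
}

// decode is the unexported worker. It reads two bytes and panics with an
// index-out-of-range runtime error when given fewer.
func decode(input []byte) int {
	return int(input[0])<<8 | int(input[1])
}

func main() {
	fmt.Println(Decode([]byte{0x01, 0x02}))
	fmt.Println(Decode([]byte{0x01}))
}
```

The internals stay free to panic on broken invariants, and the exported function is the only place that translates a panic into "I failed."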