36.4 Worker Pool Pattern: Bounding Concurrent Work

Right, the Worker Pool pattern. You’ve hit that beautiful moment in your Go journey where you realize that slapping the go keyword in front of every function call is a fantastic way to trigger a cascading failure, get rate-limited into the next decade, or simply exhaust your machine’s CPU and memory. Congratulations! Welcome to the big leagues, where we bound our concurrency instead of unleashing it like a herd of cats.

The core idea is simple but profound: you create a fixed number of “worker” goroutines (the pool) that all receive jobs from a shared channel. You then feed jobs into that channel from another goroutine (often main or a dedicated producer). The magic is that you’ve now put a hard upper limit on how many jobs are being processed at any one time: with N workers, at most N jobs are in flight. This is your concurrency governor. It prevents you from spawning a million goroutines that all try to simultaneously ask a database for a connection from a pool of only 100, thereby causing it to weep and die.

The Basic Blueprint

Let’s break it down into its constituent parts. You’ll need:

A channel to receive the jobs (jobsChan).
A channel to send the results back on (resultsChan). This one is optional: you only need it if your jobs produce output.
A function to spin up your N workers.
A function, or logic, to send jobs into the jobsChan.

Here’s the skeleton code. It’s the scaffold you’ll hang almost every worker pool on.

package main

import "fmt"

// Number of workers. This is your concurrency limit.
const numWorkers = 5

// This is the "job" we're sending to the workers. Could be anything.
type Job struct {
    ID int
    // ... other job-specific data
}

func main() {
    // Create buffered channels. The buffer sizes are a tuning parameter.
    jobsChan := make(chan Job, 100)
    resultsChan := make(chan string, 100) // Result type depends on your job

    // Start the worker pool
    for w := 1; w <= numWorkers; w++ {
        go worker(w, jobsChan, resultsChan)
    }

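    // Send jobs from a separate goroutine so main is free to collect results.
    // sendJobs closes jobsChan once everything has been sent.
    go sendJobs(jobsChan)

    // Collect results (you might do this in a separate goroutine too).
    // This count must match the number of jobs sendJobs sends.
    const totalNumberOfJobs = 20
    for r := 1; r <= totalNumberOfJobs; r++ {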
        result := <-resultsChan
        fmt.Println(result)
    }

    // All jobs done, clean up. We'll talk about a better way to do this shortly.
    close(resultsChan)
}

// worker is the function that each goroutine runs.
func worker(id int, jobs <-chan Job, results chan<- string) {
    for job := range jobs { // This loop exits when jobs is closed and drained.
        // Do the actual work here. This is where you call that slow API, run the calculation, etc.
        fmt.Printf("Worker %d started job %d\n", id, job.ID)
        result := fmt.Sprintf("Worker %d finished job %d", id, job.ID) // Simulate work
        results <- result
        fmt.Printf("Worker %d finished job %d\n", id, job.ID)
    }
    fmt.Printf("Worker %d shutting down\n", id)
}

// sendJobs populates the jobs channel.
func sendJobs(jobs chan<- Job) {
    for i := 1; i <= 20; i++ { // Let's send 20 jobs
        jobs <- Job{ID: i}
    }
    close(jobs) // CRITICAL: This tells the workers there's no more work.
}

Graceful Shutdown: The Part Everyone Gets Wrong

The example above is naive. It just counts results. In the real world, you don’t always know how many results you’ll get, and you need to know when all workers are done so you can safely close the resultsChan. This is where sync.WaitGroup earns its keep. We use it to wait for all workers to finish after we’ve closed the jobsChan.

func main() {
    jobsChan := make(chan Job, 10)
    resultsChan := make(chan string, 10)
    var wg sync.WaitGroup // needs "sync" in the import list

    // Start workers
    for w := 1; w <= numWorkers; w++ {
        wg.Add(1) // Tell the WaitGroup we're adding a goroutine
        go func(workerID int) {
            defer wg.Done() // Tell the WaitGroup this goroutine is done, even on an early return
            worker(workerID, jobsChan, resultsChan)
        }(w)
    }

    // Send jobs. sendJobs closes jobsChan itself once it has sent everything;
    // closing it a second time here would panic ("close of closed channel").
    go sendJobs(jobsChan)

    // Wait for all workers to finish, THEN close results channel
    go func() {
        wg.Wait()      // Block here until all workers call wg.Done()
        close(resultsChan) // Now it's safe to close results
    }()

    // Collect results until the results channel is closed
    for result := range resultsChan {
        fmt.Println(result)
    }
    // Program exits naturally
}

This flow is robust and idiomatic. The WaitGroup ensures we don’t close the resultsChan while a worker is still trying to send to it, which would cause a panic.
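If you want to convince yourself that the pool really does cap concurrency, one option is to count in-flight jobs and record the peak. The sketch below is a self-contained variation on the shutdown flow above, not a canonical implementation: runPool, the in-flight counters, and the one-millisecond sleep standing in for real work are all assumptions made for illustration. It uses atomic.Int64, which needs Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runPool is a hypothetical helper: it runs numJobs jobs on numWorkers workers
// and reports how many jobs completed and the highest number in flight at once.
func runPool(numWorkers, numJobs int) (processed int, peak int64) {
	jobs := make(chan int, numJobs)
	results := make(chan int, numJobs)
	var wg sync.WaitGroup
	var inFlight, maxInFlight atomic.Int64

	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				n := inFlight.Add(1)
				// Record a new peak if this job pushed us past the old one.
				for {
					old := maxInFlight.Load()
					if n <= old || maxInFlight.CompareAndSwap(old, n) {
						break
					}
				}
				time.Sleep(time.Millisecond) // stand-in for real work
				inFlight.Add(-1)
				results <- job
			}
		}()
	}

	// The jobs buffer holds every job, so this loop never blocks.
	for i := 1; i <= numJobs; i++ {
		jobs <- i
	}
	close(jobs)

	// Close results only after every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	for range results {
		processed++
	}
	return processed, maxInFlight.Load()
}

func main() {
	processed, peak := runPool(5, 50)
	fmt.Printf("processed=%d peak=%d\n", processed, peak)
}
```

The exact peak depends on scheduling, but it can never exceed the number of workers, because a job is only counted as in flight while a worker goroutine is holding it.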

Choosing Pool and Buffer Sizes (The Dark Art)

Here’s the uncomfortable truth: there’s no single right answer. The optimal number of workers depends entirely on what your task is.

I/O-bound work (HTTP requests, database calls, reading files): Your workers spend most of their time waiting. You can have a much larger pool size, often hundreds, because they’re not fighting for CPU. The limit here might be external, like a database’s max connections.
CPU-bound work (complex calculations, image processing): Your workers will actually use the CPU. Making a pool larger than your number of logical CPUs (runtime.NumCPU()) is often counterproductive. You’re just creating more goroutines for the scheduler to juggle, adding overhead for no gain. Start with runtime.NumCPU(), or with runtime.GOMAXPROCS(0) if the process has been limited to fewer cores than the machine has, and benchmark.
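As a rough illustration of the two rules of thumb above, here is a sketch of a helper that picks a starting worker count. Everything in it is an assumption for the sake of the example: the name poolSize, its parameters, and the idea of clamping to an external limit such as a database’s max connections. Treat the number it returns as a starting point for benchmarking, not an answer.

```go
package main

import (
	"fmt"
	"runtime"
)

// poolSize is a hypothetical helper that picks a starting worker count.
//   cpuBound:      true if the jobs mostly burn CPU rather than wait on I/O.
//   ioWorkers:     the pool size you want to try for I/O-bound jobs.
//   externalLimit: a cap imposed by a downstream resource (0 means no cap).
func poolSize(cpuBound bool, ioWorkers, externalLimit int) int {
	n := ioWorkers
	if cpuBound {
		// GOMAXPROCS(0) reads the current setting without changing it.
		n = runtime.GOMAXPROCS(0)
	}
	if externalLimit > 0 && n > externalLimit {
		n = externalLimit
	}
	if n < 1 {
		n = 1 // a pool with no workers would never make progress
	}
	return n
}

func main() {
	fmt.Println("CPU-bound:", poolSize(true, 0, 0))
	fmt.Println("I/O-bound, DB allows 100 connections:", poolSize(false, 200, 100))
}
```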

The channel buffer size is a performance tuning knob. A larger buffer lets the producer queue up many jobs without blocking, which can smooth out bursts. But it also uses more memory, and it hides backpressure: the producer won’t notice that the workers are falling behind until the buffer is full. A buffer size of 0 (unbuffered) means the producer and a worker hand off directly, synchronizing on every single job. This is safe but can be slower. Even a small buffer lets the producer stay a step ahead of the workers. Start with something sensible like 10 or 100, profile, and adjust.
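To see what the buffer actually buys the producer, the sketch below counts how many sends succeed on a channel that nobody is receiving from. The function name sendsWithoutReceiver is made up for this example; the non-blocking send uses a select with a default case.

```go
package main

import "fmt"

// sendsWithoutReceiver counts how many sends on a channel with the given
// buffer size succeed before the producer would have to block.
func sendsWithoutReceiver(buffer int) int {
	ch := make(chan int, buffer)
	sent := 0
	for {
		select {
		case ch <- sent:
			sent++
		default:
			// The send would block: the buffer is full and nobody is receiving.
			return sent
		}
	}
}

func main() {
	for _, size := range []int{0, 1, 10} {
		fmt.Printf("buffer=%d: %d sends before blocking\n", size, sendsWithoutReceiver(size))
	}
	// Prints:
	// buffer=0: 0 sends before blocking
	// buffer=1: 1 sends before blocking
	// buffer=10: 10 sends before blocking
}
```

In other words, the buffer size is exactly how far the producer can run ahead of the workers before it has to wait.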

Error Handling: Don’t Just Panic and Die

What if a job fails? The basic pattern above ignores this. A more robust approach is to structure your result to include an error.

type Result struct {
    JobID  int
    Output string
    Err    error
}

func worker(id int, jobs <-chan Job, results chan<- Result) {
    for job := range jobs {
        output, err := doSomethingThatCanFail(job)
        results <- Result{JobID: job.ID, Output: output, Err: err}
    }
}

Then, when collecting results, you can check result.Err and handle retries, logging, or cancellation logic as needed. Speaking of cancellation, for anything non-trivial, you should use context.Context to pass cancellation signals to your workers, so you can tell them to stop early if needed.
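Here is one way the error-carrying Result and context.Context might fit together: the collector cancels the context on the first error, and both the producer and the workers watch ctx.Done() so nobody blocks forever. This is a sketch under assumptions, not the one true design. The names process and run are hypothetical, process fails on job 3 purely to simulate an error, and “cancel everything on the first failure” is just one policy (you might prefer to log and continue, or retry).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

type Job struct{ ID int }

type Result struct {
	JobID  int
	Output string
	Err    error
}

// process is a hypothetical stand-in for the real work; it fails on job 3.
func process(job Job) (string, error) {
	if job.ID == 3 {
		return "", errors.New("simulated failure")
	}
	return fmt.Sprintf("job %d ok", job.ID), nil
}

func worker(ctx context.Context, jobs <-chan Job, results chan<- Result) {
	for job := range jobs {
		if ctx.Err() != nil {
			return // cancelled: don't start new work
		}
		out, err := process(job)
		select {
		case results <- Result{JobID: job.ID, Output: out, Err: err}:
		case <-ctx.Done():
			return // cancelled: stop instead of blocking on results
		}
	}
}

// run processes jobs 1..numJobs and cancels everything on the first error.
func run(numWorkers, numJobs int) (succeeded int, firstErr error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	jobs := make(chan Job)
	results := make(chan Result)
	var wg sync.WaitGroup

	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			worker(ctx, jobs, results)
		}()
	}

	// Producer: stops early if the context is cancelled.
	go func() {
		defer close(jobs)
		for i := 1; i <= numJobs; i++ {
			select {
			case jobs <- Job{ID: i}:
			case <-ctx.Done():
				return
			}
		}
	}()

	// Close results only after every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	for r := range results {
		if r.Err != nil {
			if firstErr == nil {
				firstErr = fmt.Errorf("job %d: %w", r.JobID, r.Err)
				cancel() // tell the producer and workers to stop
			}
			continue
		}
		succeeded++
	}
	return succeeded, firstErr
}

func main() {
	succeeded, err := run(4, 100)
	fmt.Printf("succeeded=%d err=%v\n", succeeded, err)
}
```

How many jobs succeed before the cancellation lands depends on scheduling, so don’t rely on a specific count. In real code you would usually also pass ctx into the work itself (an HTTP request or database call, say) so that a job already in progress can be interrupted too.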