36.5 Pipes (|): Connecting stdout to stdin
Right, let’s talk about the pipe. This little vertical bar (|) is arguably the single most elegant and powerful piece of punctuation in the entire shell universe. It’s the duct tape of the command line, and I mean that in the most respectful way possible. It takes the gushing firehose of text from one program’s standard output (stdout) and plugs it directly into the next program’s standard input (stdin). No temporary files, no manual copy-pasting, just a clean, direct connection.
Think of it like an assembly line. Program A does its one specific job and pushes the result onto the conveyor belt. Program B, which is specifically designed to take things from a conveyor belt, picks it up and does its job. Neither program needs to know or care about the other’s existence; they just agree on the universal interface of text streams. This philosophy of small, focused tools that do one thing well is the entire reason the Unix shell (and by extension, Linux) is so fantastically powerful.
Here’s the canonical example, the “Hello, World!” of piping:
ls -l /usr/bin | grep bash
This isn’t just a command; it’s a sentence. It reads: “List the contents of /usr/bin in long format, and then filter that list for lines containing the word ‘bash’.” The ls command blissfully outputs everything, and grep, which was built to receive input, sifts through it. You didn’t have to save the output of ls to a file and then tell grep to read that file. The pipe handles it in one fluid motion.
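To make the “no temporary files” point concrete, here is a sketch of the detour the pipe saves you. It assumes mktemp is available for the scratch file (it is on Linux and the BSDs), and the variable names are just for illustration. Both routes should hand grep exactly the same lines.

```shell
# The long way: save ls output to a scratch file, then point grep at it.
tmp=$(mktemp)
ls -l /usr/bin > "$tmp"
via_file=$(grep bash "$tmp")
rm -f "$tmp"

# The pipe way: the same bytes flow straight from ls into grep.
via_pipe=$(ls -l /usr/bin | grep bash)

# Both routes see exactly the same lines.
[ "$via_file" = "$via_pipe" ] && echo "identical output"
```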
How It Really Works (It’s Not What You Think)
Don’t imagine the first command finishing and then the second one starting. That would be painfully inefficient. The shell sets up the pipe before it forks and runs either command, and both programs then run concurrently. As soon as ls writes its first bytes of output, they are available for grep to read. On a multi-core machine the two can literally run in parallel; on a single core, the operating system’s scheduler interleaves them. The pipe itself is a fixed-size buffer inside the kernel (64 KiB by default on modern Linux). If it fills up, the writing process (ls) is put to sleep until grep reads some data and makes space. Conversely, if grep tries to read from an empty pipe, it’s put to sleep until ls writes something. Because data streams through that small buffer instead of piling up in full, this interleaved execution is the secret sauce behind processing massive datasets without ever holding the whole intermediate result in memory or on disk.
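You can watch the concurrency for yourself with yes, which prints y forever. A minimal sketch: if the shell ran the commands one after the other, this would never finish. Instead yes fills the pipe, sleeps while it is full, and is killed once head stops reading.

```shell
# yes prints "y" forever. If the shell waited for it to finish before
# starting head, this command substitution would never return.
first_three=$(yes | head -n 3)

# Both processes run at once: head reads three lines and exits, and the next
# write by yes hits a pipe with no reader, so the kernel sends it SIGPIPE.
printf '%s\n' "$first_three"
```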
Building Longer Pipelines
The real magic happens when you chain more than two commands together. You can create a sophisticated data processing workflow right on the command line.
# Find all .log files, search them for "ERROR", sort the results, and then show unique lines with a count
find /var/log -name "*.log" -exec cat {} + | grep ERROR | sort | uniq -c | sort -nr
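If you don’t have log files handy (or permission to read /var/log), here is a self-contained sketch that runs the same kind of pipeline against two small, made-up log files in a scratch directory. The file names and contents are invented for the demo, and it uses -exec cat {} +, which hands many files to a single cat instead of starting one cat per file.

```shell
# Scratch directory with two small, made-up log files.
logdir=$(mktemp -d)
printf 'INFO start\nERROR disk full\nERROR timeout\n' > "$logdir/app.log"
printf 'ERROR disk full\nINFO done\nERROR disk full\n' > "$logdir/db.log"

# Same stages as above: gather, filter, group, count, rank.
report=$(find "$logdir" -name "*.log" -exec cat {} + | grep ERROR | sort | uniq -c | sort -nr)
printf '%s\n' "$report"

# Clean up the scratch directory.
rm -rf "$logdir"
```

The top line should read 3 ERROR disk full (uniq -c pads the count with leading spaces), followed by 1 ERROR timeout.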
Let’s break down this beautiful monstrosity:
- find digs through /var/log and cats the contents of every log file it finds, spewing it all into the pipe.
- grep filters this torrent, only letting through lines containing “ERROR”.
- sort orders all the error lines so that identical ones end up next to each other. This is a crucial step because…
- uniq -c only collapses adjacent duplicate lines, so it needs sorted input to count every occurrence of a line correctly.
- Finally, sort -nr sorts the results numerically (-n) in reverse order (-r) so the most frequent error is at the top.
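The sort-before-uniq rule is easy to forget, so here is a tiny sketch that makes it visible. The three input lines are arbitrary sample data; what matters is that the two b lines are not next to each other.

```shell
# uniq only merges *adjacent* duplicates, and the two b lines are separated.
unsorted_groups=$(printf 'b\na\nb\n' | uniq -c | wc -l)       # 3 groups: b, a, b

# After sort the b lines are neighbours, so uniq -c reports a once and b twice.
sorted_groups=$(printf 'b\na\nb\n' | sort | uniq -c | wc -l)  # 2 groups

echo "groups without sort: $unsorted_groups, with sort: $sorted_groups"
```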
This pipeline is a perfect example of leveraging each tool’s specialty. Replicating it in Python means writing the directory walk, the filtering, and the counting yourself; here every stage is an existing, battle-tested program, and all the stages run concurrently.
Common Pitfalls and The Buffering Gotcha
Here’s the part where I call out the rough edges. The biggest “oh come on” moment with pipes is buffering.
Programs often buffer their output to be efficient. Instead of writing every single character immediately, they collect output in a block of memory (a “buffer”) and write it out in one go. There are three modes: unbuffered, line-buffered (flush the buffer on every newline \n), and fully-buffered (flush only when the buffer is full).
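When a program’s output is going to a terminal, it’s usually line-buffered so you see results immediately. But when the program itself (typically its C standard I/O library, or its language runtime) detects that stdout is a pipe rather than a terminal, it often switches to full buffering to maximize throughput. The shell has nothing to do with it; each program makes this decision on its own.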
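Programs make this terminal-or-not decision by asking whether their stdout is a terminal (the C library’s isatty() check). The shell’s [ -t 1 ] test asks the same question about file descriptor 1, so a one-line probe shows how piping changes the answer. The probe string is just for illustration.

```shell
# A tiny probe: [ -t 1 ] is true only when stdout (file descriptor 1) is a terminal.
probe='if [ -t 1 ]; then echo "terminal"; else echo "not a terminal"; fi'

sh -c "$probe"          # typed at an interactive prompt, this prints "terminal"
sh -c "$probe" | cat    # stdout is now a pipe, so this prints "not a terminal"
```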
This can be maddening. You’ll run a long-running command piped to grep and sit there staring at an empty screen for minutes, wondering if it’s frozen. The first command is happily running and filling its buffer, but grep hasn’t seen a single byte because the buffer hasn’t been flushed yet.
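The solution? Force the command that is writing into the pipe to flush more often. Changing the reader won’t help, because the data is stuck in the writer’s buffer. For programs that use the C library’s stdio, GNU coreutils’ stdbuf can override the buffering mode. Python scripts are different: Python 3 manages its own output buffers instead of using stdio, so stdbuf has no effect on them, and you want python3 -u (or the PYTHONUNBUFFERED environment variable) instead:
# Force unbuffered output from a Python script
python3 -u slow_generator.py | grep "pattern"
# Force line buffering on a stdio-based program (some_command is a placeholder)
stdbuf -oL some_command | grep "pattern"
The -oL option tells stdbuf to set the standard output stream of the command it launches to line buffering. As for grep, it’s fine at the end of a pipeline because its output goes to your terminal, but when it sits in the middle of one, GNU grep needs --line-buffered to pass matches along promptly.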
Best Practices and Knowing When to Stop
Pipes are brilliant, but they aren’t the solution to every problem.
- Know Your Tools: Some commands have flags that make pipes unnecessary. Need to search through files? grep -r is often better than find ... -exec cat {} \; | grep: it avoids spawning cat, and it tells you which file each match came from. With GNU grep, grep -r --include='*.log' ERROR /var/log covers the log example above.
- Exit Status: By default, the exit status of a pipeline is the exit status of the last command. This is usually what you want (grep found something = success). If you need to know whether an earlier command succeeded, check PIPESTATUS in Bash (an array of all exit codes), or run set -o pipefail (supported by Bash, Zsh, and ksh) so the pipeline fails whenever any of its commands fails.
- Binary Data: The pipe itself is just a byte stream, so binary data passes through untouched; piping a tarball from curl straight into tar is perfectly normal. The trouble comes from text-oriented tools in the middle: grep, sort, and friends think in lines, may choke on NUL bytes or unusual line endings, and grep may just report that a binary file matched instead of printing anything. For binary pipelines, stick to tools designed for raw bytes, or base64-encode the data if it must cross a text-only step.
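A short Bash sketch of the exit-status rules above. false and true stand in for real commands that fail or succeed; PIPESTATUS is Bash-specific, so run it with bash rather than sh.

```shell
#!/usr/bin/env bash
# Default: a pipeline reports only its last command's status.
false | true
default_status=$?                   # 0, so the failure of false is invisible

# PIPESTATUS holds one status per command of the most recent pipeline.
false | true
statuses="${PIPESTATUS[*]}"         # "1 0"

# pipefail: the pipeline fails if any command in it fails.
set -o pipefail
pipefail_status=0
false | true || pipefail_status=$?  # 1, the failure of false now counts

echo "default=$default_status PIPESTATUS=($statuses) pipefail=$pipefail_status"
```

In scripts, set -o pipefail is a common companion to set -e, so a failure early in a pipeline isn’t masked by a successful command at the end. Keep in mind that under pipefail, grep finding no matches (exit status 1) also counts as a failure.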
The pipe is a masterpiece of simple design. It empowers you to compose the system’s existing tools into new, more powerful tools on the fly. It’s the reason the command line remains, decades later, the most potent and efficient programming environment ever created. Now go plumb something together.