31.6 wc: Counting Lines, Words, and Bytes
Right, wc. The name stands for “word count,” which undersells it, because words are only one of the things it counts. It’s the tool you reach for when you need to ask a file the most basic questions: “How big are you? How many lines do you have? What’s the deal here?” It’s the digital equivalent of picking up a box and giving it a shake.
Let’s start with the simplest, most common invocation. You run wc and give it a file. It gives you back three numbers, followed by the filename.
$ wc story_draft.txt
143 982 5431 story_draft.txt
From left to right: lines, words, bytes, filename. So, 143 lines, 982 words, and 5431 bytes. For plain ASCII text, the byte count is also the character count. It’s the first, most basic sanity check: is this a 10-line config file or a 10,000-line novel? wc will tell you instantly.
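In a script you usually want those numbers in variables rather than on screen. Here’s a minimal sketch, assuming a POSIX shell and a hypothetical sample file named sample.txt. Feeding the file on standard input (wc < file) makes wc leave the filename off. Leaving $(...) unquoted lets the shell split the padded output into positional parameters, in the same lines, words, bytes order.

```shell
# Hypothetical sample file: 2 lines, 5 words, 26 bytes.
printf 'the quick brown\nfox jumps\n' > sample.txt

# With input on stdin, wc prints only the three counts (no filename).
# The unquoted $(...) is split on whitespace, which also drops the padding.
set -- $(wc < sample.txt)
lines=$1 words=$2 bytes=$3

echo "lines=$lines words=$words bytes=$bytes"
rm -f sample.txt
```

The same trick works with a single flag: wc -l < file gives you just the line count, with no filename to strip off.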
The Three Core Counts (and the Forgotten Fourth)
You don’t have to take all three counts at once. You can ask for just what you need using flags:
-l for lines
-w for words
-c for bytes
-m for characters… but wait.
Ah, -m. Here’s our first design quirk. Why both -c and -m? With plain ASCII text, -c (bytes) and -m (characters) report the same number. The distinction only matters for multibyte encodings like UTF-8, where one character can take several bytes. POSIX says -c counts bytes and -m counts characters, but what counts as a “character” depends on your locale (the LC_ALL, LC_CTYPE, and LANG environment variables). In a UTF-8 locale, wc -m decodes the multibyte sequences; in the plain C/POSIX locale, every byte is a character, so -m quietly gives you the same answer as -c. That makes -c the predictable one, and in practice it’s the flag you’ll almost always use. Consider -m a rarely-used vestigial organ, worth reaching for only when you really need characters and know your locale is set up for it.
$ echo "café" | wc -c   # Counts bytes. 'é' is 2 bytes in UTF-8.
6                       # c, a, f (1 byte each) + é (2 bytes) + the newline
$ echo "café" | wc -m   # Counts characters, as defined by your locale.
5                       # In a UTF-8 locale: 4 characters + the newline. In the C locale you get 6. It's a mess.
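You can watch the locale change the answer. This sketch writes “café” with explicit octal escapes (é is the two bytes 0xC3 0xA9 in UTF-8), so the result doesn’t depend on how the script itself is encoded. It assumes a UTF-8 locale named C.UTF-8 is installed, which is common on Linux but not guaranteed; on macOS you might substitute en_US.UTF-8. If that locale is missing, wc typically falls back to the C locale and the last count comes out as 6 too.

```shell
# 'café' + newline, spelled as raw bytes: c a f 0xC3 0xA9 \n (6 bytes).
bytes=$(printf 'caf\303\251\n' | wc -c)

# In the C locale, every byte is treated as one character.
c_chars=$(printf 'caf\303\251\n' | LC_ALL=C wc -m)

# Assumption: a locale called C.UTF-8 exists. There, 0xC3 0xA9 decodes
# to the single character 'é'.
utf8_chars=$(printf 'caf\303\251\n' | LC_ALL=C.UTF-8 wc -m)

echo "bytes=$bytes C-locale chars=$c_chars UTF-8 chars=$utf8_chars"
```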
Piping: Where wc Truly Shines
You’ll rarely use wc on a standalone file. Its real power is at the end of a pipeline, telling you the final tally of some operation.
“How many files are in this directory?”
ls -1 | wc -l
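(Though, pro tip: ls -1 | wc -l will be horrifically wrong if you have files with newlines in their names, and it counts directories along with files. Welcome to the trenches. Switching to find . -maxdepth 1 -type f | wc -l limits the count to regular files (and includes hidden ones, which ls skips), but find also prints names verbatim, so a newline in a name still inflates the count. For a genuinely robust count, have find print one character per file and count bytes instead: find . -maxdepth 1 -type f -exec printf . \; | wc -c. Note that -maxdepth isn’t in POSIX, though GNU and BSD find both support it.)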
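Here’s a sketch of the difference in a scratch directory, assuming mktemp -d and a find that supports -maxdepth (GNU and BSD both do). The directory holds two regular files, one with a newline in its name, plus a subdirectory. Printing one dot per match and counting bytes never looks at the names at all. The naive count is printed for comparison, but its exact value depends on how your ls writes control characters to a pipe.

```shell
dir=$(mktemp -d)
touch "$dir/plain"
touch "$dir/$(printf 'two\nlines')"   # one file whose name contains a newline
mkdir "$dir/subdir"

# Naive: one output line per name, so the embedded newline adds an extra
# "line", and the subdirectory is counted too.
naive=$(ls -1 "$dir" | wc -l)

# Robust: emit a single '.' for each regular file, then count bytes.
robust=$(find "$dir" -maxdepth 1 -type f -exec printf '.' \; | wc -c)

echo "naive=$naive robust=$robust"
rm -rf "$dir"
```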
“How many times did I use the word ‘proprietary’ in my rant about software?”
grep -i "proprietary" software_rant.txt | wc -l
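(Even better: grep -ci does this directly. Strictly speaking, though, both count matching lines, not occurrences: a line that says “proprietary” twice is counted once. To count every occurrence, use grep -oi "proprietary" software_rant.txt | wc -l, since -o prints each match on its own line.)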
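Both grep ... | wc -l and grep -c count matching lines, not matches. A quick sketch of lines versus occurrences, using a hypothetical two-line sample file. It assumes a grep with -o, which GNU and BSD grep provide even though POSIX doesn’t require it.

```shell
# Hypothetical sample: "proprietary" appears twice on line 1, once on line 2.
printf 'Proprietary formats, proprietary tools.\nMore proprietary nonsense.\n' > rant_sample.txt

lines_piped=$(grep -i 'proprietary' rant_sample.txt | wc -l)   # matching lines
lines_direct=$(grep -ci 'proprietary' rant_sample.txt)          # same count, no pipe

# -o prints each match on its own line, so wc -l now counts occurrences.
occurrences=$(grep -oi 'proprietary' rant_sample.txt | wc -l)

echo "lines=$lines_piped/$lines_direct occurrences=$occurrences"
rm -f rant_sample.txt
```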
The Newline Gotcha: It’s Always There
This is the most important thing to internalize about wc -l. It doesn’t count lines of text; it counts newline characters. This is a crucial distinction. A file whose last line doesn’t end with a newline character will be undercounted by one line. This isn’t a bug in wc; it’s by definition. POSIX defines a “line” as a sequence of zero or more non-newline characters plus a terminating newline, so any text after the final newline isn’t a line at all.
Look at this example. We’re using echo -n to create text without a trailing newline.
$ echo -n "hello" > no_newline.txt
$ cat no_newline.txt
hello% # (No newline, so your prompt appears right after the text)
$ wc -l no_newline.txt
0 no_newline.txt
Yep. Zero lines. Because there is no \n character. This will break scripts that rely on wc -l to be 100% accurate. The best practice is to always ensure your text files end with a newline. Most proper text editors do this for you. If you’re hacking together files with shell redirection, be aware of the issue.
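If you can’t guarantee the trailing newline, you can count around it or fix it. This sketch shows two common line counters that do include an unterminated last line: grep -c '' (the empty pattern matches every line) and awk’s NR in an END block. It also defines a small hypothetical helper, ensure_newline, that appends a newline only when the file’s last byte isn’t already one. It assumes a POSIX shell with tail -c.

```shell
printf 'first\nsecond' > partial.txt        # last line has no trailing newline

wc_count=$(wc -l < partial.txt)               # only one newline character
grep_count=$(grep -c '' partial.txt)          # counts the unterminated line too
awk_count=$(awk 'END { print NR }' partial.txt)  # awk counts the final record too

# Hypothetical helper: append '\n' if the file is non-empty and its last byte
# isn't a newline. $(tail -c 1 ...) is empty when that byte IS a newline,
# because command substitution strips trailing newlines.
ensure_newline() {
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

ensure_newline partial.txt
fixed_count=$(wc -l < partial.txt)

echo "wc=$wc_count grep=$grep_count awk=$awk_count after_fix=$fixed_count"
rm -f partial.txt
```

Because the helper checks before writing, running it twice on the same file is harmless.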
Knowing What You’re Counting
Remember, wc is dumb, fast, and literal. It doesn’t care about your concept of a “word”; it just counts sequences of non-whitespace characters separated by whitespace.
$ echo "hello---world" | wc -w # 'hello---world' is one 'word'
1
$ echo "hello world" | wc -w # 'hello' and 'world' are two words
2
So, if you’re counting “words” in a CSV file or a log full of hyphenated data, take that number with a grain of salt. It’s a metric, not gospel.
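To see how literal that is, here’s a sketch with a small hypothetical CSV file. The rows contain no spaces, so wc -w sees one word per row; translating commas to spaces with tr turns each field into a word. Treat this as a rough field count only. A quoted field that contains spaces or commas will still be miscounted, and empty fields (,,) simply disappear.

```shell
# Hypothetical CSV: 3 rows, 3 fields each, no spaces anywhere.
printf 'name,age,city\nalice,30,paris\nbob,41,oslo\n' > people.csv

rows_as_words=$(wc -w < people.csv)           # each row is a single "word"
fields=$(tr ',' ' ' < people.csv | wc -w)     # commas -> spaces: one word per field

# Tabs and newlines separate words too, not just spaces.
mixed=$(printf 'a\tb\nc  d\n' | wc -w)

echo "rows_as_words=$rows_as_words fields=$fields mixed=$mixed"
rm -f people.csv
```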
In the end, wc is a workhorse. It’s not glamorous, but you’ll use it constantly. It’s the quickest way to get a numerical read on your data before you dive into the messy business of actually changing it.