31.4 cut: Extracting Columns by Delimiter or Byte Position
Right, let’s talk about cut. It’s the command you reach for when you have a nicely structured line of text—a config file, a CSV, the output of another command—and you just want to pull out a specific piece of it. It’s the digital equivalent of taking a scalpel to a log file. Simple concept, right? And it is. Until it isn’t. cut is one of those tools that will work perfectly 99% of the time and then fail in the most spectacularly confusing way the other 1%. I’m here to make sure you’re ready for that 1%.
The core idea is that cut slices up each line of input based on one of two criteria: a delimiter (like a comma or a tab) or a fixed byte or character position. It then outputs only the pieces you asked for.
The Delimiter Dance (-d and -f)
This is the most common way to use cut. You specify a delimiter character with -d and then the field(s) you want with -f.
Let’s say you have a file users.txt with contents like this:
alan.shearer:1001:/home/alan.shearer:/bin/bash
thierry.henry:1002:/home/thierry.henry:/bin/zsh
eric.cantona:1003:/home/eric.cantona:/usr/bin/fish
You want just the usernames. That’s the first field, colon-delimited.
cut -d':' -f1 users.txt
alan.shearer
thierry.henry
eric.cantona
Clean. Simple. Beautiful. Now, what if you want the username and the user ID? You can specify multiple fields as a comma-separated list.
cut -d':' -f1,2 users.txt
alan.shearer:1001
thierry.henry:1002
eric.cantona:1003
Notice it preserves the delimiter between the selected fields. You can also use ranges: -f2-4 for fields 2 through 4, or -f-3 for fields 1 to 3. This is genuinely useful.
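To see the list and range forms side by side, here is a small sketch. The line variable is just an illustrative copy of the first record from users.txt, and the open-ended form -f3- (field 3 through the end of the line) is included alongside the two ranges mentioned above.

```shell
# Sample record in the same colon-delimited layout as users.txt.
line='alan.shearer:1001:/home/alan.shearer:/bin/bash'

printf '%s\n' "$line" | cut -d':' -f2-4   # fields 2 through 4
printf '%s\n' "$line" | cut -d':' -f-3    # fields 1 through 3
printf '%s\n' "$line" | cut -d':' -f3-    # field 3 through the end of the line
printf '%s\n' "$line" | cut -d':' -f1,4   # fields 1 and 4 only
```

In every case the original delimiter is kept between the fields that survive.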
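Now, the first pitfall. What if your data doesn’t have a consistent number of fields? cut does not care. If a line contains the delimiter but not the field you asked for, you get an empty line. If a line doesn’t contain the delimiter at all, cut prints the whole line untouched, which is arguably worse, because junk sails straight through into your output. The -s flag suppresses those delimiter-less lines, but that is as accommodating as it gets. It’s brutally literal. There’s no --please-be-flexible flag. This is why for messy, real-world data, awk is often a better choice, but cut is lighter and faster for well-behaved files.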
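It is worth seeing exactly what cut does with ragged input before trusting it in a pipeline. The sketch below uses three made-up records (the records variable is purely illustrative): one complete, one that stops after two fields, and one with no colon at all.

```shell
# Three hypothetical records: complete, short (two fields), and no delimiter.
records='alan.shearer:1001:/home/alan.shearer
thierry.henry:1002
no delimiter here'

# The second record has no field 3, so cut prints an empty line for it.
# The third record contains no colon, so cut passes it through unchanged.
printf '%s\n' "$records" | cut -d':' -f3

# -s drops the lines that contain no delimiter; the empty line remains.
printf '%s\n' "$records" | cut -d':' -s -f3
```

Neither behaviour produces a warning, so a quick count of lines or fields before and after is a cheap sanity check.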
Byte and Character Positioning (-b and -c)
Sometimes, your data isn’t delimited but is fixed-width. Maybe it’s the output of some ancient COBOL program or a specific log format. Here, you use -b for byte or -c for character positions.
Let’s take a classic: the output of ls -l.
ls -l | head -3
-rw-r--r-- 1 user staff 12345 Dec 5 10:30 report.pdf
drwxr-xr-x 7 user staff 224 Nov 28 14:17 Projects
-rwxr--r-- 1 user staff 101 Dec 1 09:15 script.sh
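The permissions are characters 1-10. The size and filename sit further along, at offsets that depend on how ls pads its columns on your system, so count them against your own output rather than trusting mine. Assuming the size lands in (roughly) columns 48-55 and the filename starts at column 61, this gets just the size and filename: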
ls -l | cut -c48-55,61-
12345 report.pdf
224 Projects
101 script.sh
Here’s the second, massive pitfall: the distinction between -b and -c. On pure ASCII text they are the same, because every character is exactly one byte. But -b is for bytes, and -c is for characters. If you have a multi-byte character (like an emoji, or an accented letter in UTF-8) in your text, using -b will slice right through the middle of it and output corrupt garbage. -c is supposed to handle this correctly, and the cut shipped with macOS and the BSDs does. GNU cut, however, has historically treated -c exactly like -b, so on many Linux systems it can mangle multi-byte text just the same. My advice? Write -c when you mean characters, test it against non-ASCII input on the system you actually deploy to, and switch to a tool you have verified handles your encoding if it fails.
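A quick way to see the byte-level damage is to pipe the result through od. This is a minimal sketch, assuming UTF-8 text; the word variable is illustrative, and the accented letter is written as octal escapes so the example does not depend on how this text itself is encoded.

```shell
# 'é' is two bytes in UTF-8 (0xC3 0xA9), so "café" is five bytes long.
word=$(printf 'caf\303\251')

# -b4 selects only the first byte of 'é', leaving an invalid UTF-8 sequence.
printf '%s\n' "$word" | cut -b4 | od -An -tx1

# -b4-5 selects both bytes, so the character survives intact.
printf '%s\n' "$word" | cut -b4-5 | od -An -tx1

# What -c4 yields depends on the cut implementation and the current locale,
# so run this on the system you care about and compare with the lines above.
printf '%s\n' "$word" | cut -c4 | od -An -tx1
```

If the last command prints the same bytes as the first, your cut is treating -c as -b.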
The Absurd Limitation (And How to Fix It)
Here’s the thing that drives me up a wall about cut: it only accepts a single character as a delimiter. One. You cannot use -d '::' or -d '\t+'. This is a design choice from the 1970s that we’re apparently stuck with. It’s absurd.
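Need to split on a tab? That one is free: tab is the default delimiter, so you can leave -d off entirely. If you want to spell it out, you need shell quoting magic: -d$'\t' in bash. Need to split on multiple spaces? Tough luck. cut treats every single space as its own separator, so a run of spaces gives you a pile of empty fields. This is the primary reason I often abandon cut for awk midway through a script. awk -F ' +' lets me use a full regex as a delimiter. It’s infinitely more powerful.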
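cut on its own cannot collapse repeated separators, but a small pre-processing step can. The sketch below shows one common workaround, squeezing runs of spaces with tr -s before cutting, next to the tab case and the awk equivalent. The sample strings are made up for illustration.

```shell
# Tab is cut's default delimiter, so tab-separated input needs no -d at all.
printf 'alpha\tbeta\tgamma\n' | cut -f2

# Runs of spaces: squeeze each run to a single space with tr -s, then cut.
# Leading spaces would still produce an empty first field.
printf 'alpha   beta  gamma\n' | tr -s ' ' | cut -d' ' -f2

# awk's default field splitting already treats a run of blanks as one separator.
printf 'alpha   beta  gamma\n' | awk '{print $2}'
```

All three commands print beta. The tr approach is fine for a one-off, but once you are chaining helpers just to feed cut, awk is usually the clearer choice.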
So, the best practice is simple: use cut for what it’s good at—quick, simple extractions from clean, columnar data with a single-character delimiter. The moment your parsing needs get more complex, graduate to awk. It’s like using a scalpel (cut) for a precise job versus a fully stocked surgical robot (awk). Both have their place, but you gotta know when the job has outgrown the tool.
# For a quick and dirty extraction of the 2nd field from a CSV
cut -d',' -f2 data.csv
# For anything more complex, awk is the next step up
awk -F',' '{print $2}' data.csv # Still splits inside quoted fields that contain a comma; real CSV needs a real CSV parser
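To make the CSV caveat concrete, here is a short sketch with a hypothetical row (the row variable and its contents are invented for illustration) whose second field contains a quoted comma. Both tools split on every comma and know nothing about quoting.

```shell
# Hypothetical CSV row whose second field contains a quoted comma.
row='1,"Shearer, Alan",Newcastle'

# Both commands print only the first half of the quoted field: "Shearer
printf '%s\n' "$row" | cut -d',' -f2
printf '%s\n' "$row" | awk -F',' '{print $2}'
```

If your CSV can contain quoted delimiters, neither one-liner is safe, and a parser that understands CSV quoting is the right tool.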
Remember, cut is a sharp, simple tool. Respect its limitations, and it will serve you well. Ignore them, and it will cut you.