Right, let’s talk about tr. This is one of those tools that seems almost comically simple until you realize it’s the duct tape of the text manipulation world. It doesn’t take filename arguments. It doesn’t do regular expressions. It does one job: it translates, squeezes, or deletes characters arriving on its standard input. And it does that job fast, with a single-minded efficiency that heavier tools like sed can’t always match for simple tasks.

The basic syntax is tr [options] SET1 [SET2]. It reads stdin, replaces each character found in SET1 with the character at the same position in SET2, and prints the result to stdout. SET2 is omitted when you are only deleting or squeezing. The magic, and the frustration, is in how you define those sets.

The Basic Translation Act

At its heart, you’re doing a simple character mapping. Think of it like a decoder ring.

echo "hello" | tr 'el' 'ip'

This outputs hippo. Why? Because every e in the input is replaced with an i, and every l is replaced with a p. It’s not replacing the string “el”, it’s individually translating each character in the first set. This is the most common tr gotcha. It’s not substring replacement.
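To make the difference between character mapping and substring replacement concrete, here is a small sketch that runs similar input through tr and through sed. The sample words and variable names are only illustrations.

```shell
# tr maps characters one by one: e -> i and l -> p, wherever they occur.
mapped=$(echo "hello" | tr 'el' 'ip')
echo "$mapped"       # hippo

# The order of the characters in the input does not matter to tr.
scattered=$(echo "lemon" | tr 'el' 'ip')
echo "$scattered"    # pimon

# sed, by contrast, replaces the substring "el" as a unit.
substring=$(echo "hello" | sed 's/el/ip/')
echo "$substring"    # hiplo
```

If what you want is the third result, tr is the wrong tool.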

You can also use ranges, which is where tr starts to sing.

echo "Password123" | tr 'a-z' 'A-Z'

Output: PASSWORD123. This is the classic way to convert ASCII text to uppercase. It translates each character in the range a-z to the character at the same position in the range A-Z; the digits are in neither set, so they pass through unchanged.
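Ranges can also be rearranged, not just shifted in case. A common illustration is ROT13, sketched below; the rot13 function name is just a local helper for this example, not something tr provides.

```shell
# ROT13: shift every letter 13 places, wrapping around the alphabet.
# Both sets expand to 52 characters, so the mapping is one-to-one.
rot13() {
    tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

encoded=$(echo "Hello, World" | rot13)
echo "$encoded"      # Uryyb, Jbeyq

# Applying ROT13 twice returns the original text.
decoded=$(echo "$encoded" | rot13)
echo "$decoded"      # Hello, World
```

The comma and the space are in neither set, so they pass through untouched.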

Squeezing and Deleting

This is where tr moves from handy to indispensable. The -s option (“squeeze”) collapses each run of a repeated character from the set into a single occurrence. The -d option (“delete”) removes every character in the set.

echo "I    need    more     space" | tr -s ' '

Output: I need more space. It squeezed all those multiple spaces down to singles. Incredibly useful for cleaning up messy text output.
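Two details are worth seeing in action: squeezing only affects characters in the set, and -s can be combined with a translation, in which case the squeeze is applied after the characters are translated. A minimal sketch, with made-up sample strings:

```shell
# Squeeze only: the run of spaces collapses, the repeated letters do not.
squeezed=$(echo "aaa   bbb" | tr -s ' ')
echo "$squeezed"     # aaa bbb

# Translate, then squeeze: each space becomes a newline, and runs of
# newlines collapse, which leaves one word per line.
words=$(echo "I    need    more     space" | tr -s ' ' '\n')
echo "$words"
```

The second command prints I, need, more, and space on four separate lines.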

Now, let’s delete something. Say you want to strip everything but the digits from a phone number, and you know which characters are in the way.

echo "My phone number is (555) 123-4567" | tr -d '[:alpha:]() -'

Output: 5551234567. We used a character class ([:alpha:] for all letters) and then literally listed the other characters we wanted to nuke: parentheses, space, and dash. The dash goes last so that tr reads it as a literal character rather than as part of a range.
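Another everyday use of -d is removing carriage returns from text with Windows-style line endings. The sketch below fakes such input with printf rather than assuming any particular file exists; tr understands the \r escape itself.

```shell
# Simulate two lines with CRLF endings and delete every carriage return.
cleaned=$(printf 'first line\r\nsecond line\r\n' | tr -d '\r')
printf '%s\n' "$cleaned"
```

The newlines survive because only \r is in the set.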

Character Classes: Your New Best Friend

This is the pro move. Instead of listing out every possible letter, number, or punctuation mark, you can use predefined character classes. They’re like little shortcuts for common sets.

  • [:alnum:] : Alphanumeric characters (a-z, A-Z, 0-9)
  • [:alpha:] : Alphabetic characters (a-z, A-Z)
  • [:digit:] : Digits (0-9)
  • [:lower:] : Lowercase letters
  • [:upper:] : Uppercase letters
  • [:space:] : All whitespace characters (space, tab, newline etc.)

Want to convert a file to lowercase? Easy.

tr '[:upper:]' '[:lower:]' < myfile.txt
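Here is a short sketch of classes used both for translation and for deletion. The sample string is invented for the example.

```shell
sample='Order ID: AB 1234'

# Case conversion with classes instead of a-z / A-Z ranges.
lowered=$(printf '%s\n' "$sample" | tr '[:upper:]' '[:lower:]')
echo "$lowered"      # order id: ab 1234

# Delete every whitespace character in one go.
compact=$(printf '%s\n' "$sample" | tr -d '[:space:]')
echo "$compact"      # OrderID:AB1234
```

Note that [:space:] includes the newline, so the second tr also removes the line ending that printf added.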

The Complementation Trick

This is the cleverest and most confusing part. The -c flag means “complement.” This tells tr to operate on the set of characters that are NOT in SET1. This is best explained with the classic example of stripping everything but digits.

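echo 'Price: $123.99' | tr -cd '[:digit:]'

Output: 12399. Note the single quotes around the input: inside double quotes the shell would expand $1 as a positional parameter before tr ever saw it. Let’s break it down: -c '[:digit:]' means “select every character that is not a digit”. Then the -d says “delete these selected characters”. So, we’re saying “delete all non-digit characters,” leaving only the digits behind. That includes the trailing newline, so the output does not end with a line break. Mind-bending the first time you see it, but utterly brilliant.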
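Two variations on the complement idea, sketched with invented sample text: keeping the newline alongside the digits, and pairing -c with -s to split text into words.

```shell
# Keep digits AND the newline, so the output still ends with a line break.
digits=$(printf '%s\n' 'Price: $123.99' | tr -cd '[:digit:]\n')
echo "$digits"       # 12399

# -c with -s: every run of non-alphanumeric characters becomes a single
# newline, which splits the text into one word per line.
words=$(printf '%s\n' 'Hello, world! 42 times.' | tr -cs '[:alnum:]' '\n')
echo "$words"
```

The second command prints Hello, world, 42, and times on separate lines, which makes a handy first stage for a word-count pipeline.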

The Rough Edges and Pitfalls

tr is old. It’s from a time when everyone agreed ASCII was a great idea and the world was simple. This causes problems.

  1. No Regex: You can’t use .* or \d. You list characters. That’s it. If you need that power, you need sed or awk.
  2. Portability Gremlins: The behavior of character classes and ranges depends on the locale and on the implementation. GNU tr, for one, works on bytes, so multibyte UTF-8 characters are not treated as single characters. Don’t trust it for anything international unless you’ve tested it.
  3. The Set Size Mismatch: If SET1 is longer than SET2, GNU and BSD tr repeat the final character of SET2 to match the length of SET1. POSIX leaves this case unspecified, and some System V implementations ignore the extra characters of SET1 instead. Either way the result is a silent surprise, and it’s a design choice I absolutely loathe. For example, GNU tr reads tr 'abcde' '12' as tr 'abcde' '12222'. Always double-check your set lengths, and reach for -d rather than trying to “translate” characters to nothing.
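The set size pitfall is easy to reproduce. The sketch below assumes GNU tr for the first command, since the padding behavior is not guaranteed everywhere; the [x*] repeat syntax in the second command is one way to make the padding an explicit choice.

```shell
# Implicit padding (GNU behavior assumed): c, d and e all silently become 2.
padded=$(echo "abcde" | tr 'abcde' '12')
echo "$padded"       # 12222 with GNU tr

# Explicit padding: [x*] repeats x to fill the rest of SET2, so the
# fallback character is a deliberate decision rather than an accident.
explicit=$(echo "abcde" | tr 'abcde' '12[x*]')
echo "$explicit"     # 12xxx

# If the goal was to drop c, d and e, delete them instead.
deleted=$(echo "abcde" | tr -d 'cde')
echo "$deleted"      # ab
```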

So, when do you use it? For simple, bulk character operations on known input (making text uppercase, squeezing spaces, or stripping out a known set of punctuation), it’s hard to beat. It’s the scalpel for character-level surgery, not the Swiss Army knife. Use it accordingly.