31.7 paste and join: Combining Files Side by Side and by Key

Right, so you’ve sorted your data, you’ve de-duplicated it, you’ve sliced and diced it. Now you’re left with two or more files, each a neat column of information, and you need to put them together. This is where paste and join come in. They are the dynamic duo of horizontal file combination, but they have wildly different personalities and use cases. One is a simple, no-fuss bricklayer; the other is a finicky, key-obsessed database administrator.
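To make the bricklayer/administrator split concrete, here is a minimal sketch. The file names and their contents are made up for illustration; the only assumption is a system with mktemp and the standard paste and join utilities.

```shell
# Hypothetical sample files, created in a throwaway directory.
tmp=$(mktemp -d)
printf '1 alice\n2 bob\n3 carol\n' > "$tmp/names.txt"
printf '1 admin\n3 guest\n' > "$tmp/roles.txt"

# paste glues line N of each file to line N of the next with a tab.
# It never looks at the content, so "2 bob" ends up next to "3 guest".
pasted=$(paste "$tmp/names.txt" "$tmp/roles.txt")
printf '%s\n' "$pasted"

# join matches lines on a shared key (the first field by default).
# Both inputs must already be sorted on that key. Key 2 has no
# partner in roles.txt, so bob is dropped from the output.
joined=$(join "$tmp/names.txt" "$tmp/roles.txt")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

The paste output has three lines because the longer file has three; the join output has only two, one for each key present in both files.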

31.6 wc: Counting Lines, Words, and Bytes

Right, wc. The name stands for “word count,” which is a bit of a lie because it’s so much more useful than that. It’s the tool you reach for when you need to ask a file the most basic, fundamental questions: “How big are you? How many lines do you have? What’s the deal here?” It’s the digital equivalent of picking up a box and giving it a shake. Let’s start with the simplest, most common invocation. You run wc and give it a file. It gives you back three numbers (lines, words, and bytes, in that order) followed by the file name.
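Here is a small sketch of that basic invocation. The sample file is hypothetical, and the exact column padding of the default output differs between implementations, so treat the spacing as illustrative.

```shell
# Hypothetical two-line sample file.
tmp=$(mktemp -d)
printf 'hello world\ngoodbye\n' > "$tmp/sample.txt"

# Default output: line count, word count, byte count, then the file name.
wc "$tmp/sample.txt"

# Reading from standard input drops the file name, which is handy in scripts.
lines=$(wc -l < "$tmp/sample.txt")
words=$(wc -w < "$tmp/sample.txt")
bytes=$(wc -c < "$tmp/sample.txt")

# $(( )) strips the leading padding that some wc implementations add.
echo "lines=$((lines)) words=$((words)) bytes=$((bytes))"
# prints: lines=2 words=3 bytes=20

rm -rf "$tmp"
```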

31.5 tr: Translating and Deleting Characters

Right, let’s talk about tr. This is one of those tools that seems almost comically simple until you realize it’s the duct tape of the text manipulation world. It doesn’t read files. It doesn’t do regular expressions. It just does one thing: it translates or deletes characters from its standard input. And it does that one thing blindingly fast and with a ruthless, single-minded efficiency that more powerful tools like sed can sometimes envy for simple tasks.
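A quick sketch of both modes, using made-up input strings. Since tr takes no file arguments, everything arrives through a pipe (or a redirection).

```shell
# Translate: map each lowercase letter to its uppercase counterpart.
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')
echo "$upper"            # prints: HELLO WORLD

# Delete: -d removes every character that appears in the set.
no_digits=$(printf 'a1b2c3\n' | tr -d '0-9')
echo "$no_digits"        # prints: abc
```

The character classes are used here instead of a-z and A-Z because ranges can behave differently depending on the locale.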

31.4 cut: Extracting Columns by Delimiter or Byte Position

Right, let’s talk about cut. It’s the command you reach for when you have a nicely structured line of text—a config file, a CSV, the output of another command—and you just want to pull out a specific piece of it. It’s the digital equivalent of taking a scalpel to a log file. Simple concept, right? And it is. Until it isn’t. cut is one of those tools that will work perfectly 99% of the time and then fail in the most spectacularly confusing way the other 1%. I’m here to make sure you’re ready for that 1%.
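As a minimal sketch, here is cut working on a passwd-style line (the line itself is just sample data). The last two commands show one well-known surprise, which may or may not be the 1% you hit first: a line that doesn't contain the delimiter at all is passed through untouched unless you add -s.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# -d sets the delimiter, -f picks fields (numbered from 1).
user=$(printf '%s\n' "$line" | cut -d: -f1)
login_shell=$(printf '%s\n' "$line" | cut -d: -f7)
echo "$user uses $login_shell"     # prints: root uses /bin/bash

# -b picks byte positions instead of fields.
first4=$(printf '%s\n' "$line" | cut -b1-4)
echo "$first4"                     # prints: root

# No delimiter on the line: cut prints the whole line anyway...
passthrough=$(printf 'no delimiter here\n' | cut -d: -f2)
echo "$passthrough"                # prints: no delimiter here

# ...unless -s tells it to suppress lines without the delimiter.
suppressed=$(printf 'no delimiter here\n' | cut -s -d: -f2)
```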

31.3 uniq: Removing Duplicate Lines (-c, -d, -u)

Right, uniq. The name is a bit of a lie, and that’s the first thing you need to get into your head. It doesn’t magically find all unique lines in a file. No, no. Its job is far more specific, and frankly, a little bit dumb: it only removes adjacent duplicate lines. If you don’t sort your data first, uniq is about as useful as a screen door on a submarine.
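The adjacency rule is easiest to see on a tiny made-up input where the duplicates are not next to each other. This sketch also runs the three flags from the section title once each.

```shell
fruit='apple
banana
apple'

# Unsorted: the two "apple" lines are not adjacent, so nothing is removed.
unsorted=$(printf '%s\n' "$fruit" | uniq)
printf '%s\n' "$unsorted"

# Sorted first: the duplicates become neighbours and uniq collapses them.
deduped=$(printf '%s\n' "$fruit" | sort | uniq)
printf '%s\n' "$deduped"

# -c prefixes each line with its count (padding varies by implementation).
counts=$(printf '%s\n' "$fruit" | sort | uniq -c)
printf '%s\n' "$counts"

# -d prints only lines that were repeated; -u only lines that were not.
repeated=$(printf '%s\n' "$fruit" | sort | uniq -d)
singles=$(printf '%s\n' "$fruit" | sort | uniq -u)
echo "repeated: $repeated, singles: $singles"
```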

31.2 sort -k and -t: Sorting by Column and Field Delimiter

Right, so sort by itself is fine for a quick-and-dirty alphabetical sort of a file. But let’s be honest, your data is rarely that polite. It’s usually in columns, like some kind of spreadsheet that got lost and ended up in the terminal. This is where sort graduates from a simple tool to a data-wrangling ninja, using the -k (key) and -t (field delimiter) options. The basic idea is simple: instead of sorting the entire line, you tell sort to look at a specific part of each line, a specific column or “field.” But as with all things in the shell, the devil is in the details, and those details will bite you if you’re not careful.
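Here is a minimal sketch of the idea, using invented colon-separated records of the form name:age:city. It also shows one of those biting details: a key is compared as text unless you ask for a numeric comparison.

```shell
data='carol:30:nyc
alice:25:sf
bob:100:la'

# -t: sets the field delimiter; -k2,2 restricts the key to field 2 only.
# Compared as text, "100" sorts before "25" because "1" comes before "2".
by_text=$(printf '%s\n' "$data" | sort -t: -k2,2)
printf '%s\n' "$by_text"

# Adding n to the key makes the comparison numeric: 25, 30, 100.
by_age=$(printf '%s\n' "$data" | sort -t: -k2,2n)
printf '%s\n' "$by_age"
```

Writing -k2,2 rather than -k2 matters: a key given as -k2 starts at field 2 and runs to the end of the line, so later fields get pulled into the comparison.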

31.1 sort: Alphabetical, Numeric, and Reverse Sorting

Right, let’s talk about sort. It’s one of those commands you’ll use so often it becomes a reflex, but it’s also deceptively powerful. Most people get as far as sort file.txt and call it a day, but that’s like using a sports car to drive to the mailbox and back. We’re going to open the garage door and take this thing for a proper spin. By default, sort does what you’d expect: it reads lines of text and sorts them in ascending order. But here’s the first “gotcha” that trips up everyone, including me on a bad day: it uses the locale’s collating order. This means sort file.txt on your machine and my machine might give slightly different results if our locale settings differ. It’s generally alphabetical, but it knows that in Spanish, for example, “ñ” should sort after “n”. For 99% of what you do, this is fine, but just know the ghost of internationalization is in the machine.
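A short sketch of the three modes in the section title, plus one way to pin the collating order. The inputs are invented. Setting LC_ALL=C for a single command is a common way to get plain byte-order sorting that behaves the same on every machine.

```shell
# Default: lines are compared as text, so "100" lands before "9".
as_text=$(printf '10\n9\n100\n' | sort)
printf '%s\n' "$as_text"

# -n compares by numeric value; -r reverses whatever order is in effect.
numeric=$(printf '10\n9\n100\n' | sort -n)
reversed=$(printf '10\n9\n100\n' | sort -nr)
printf '%s\n' "$numeric" "$reversed"

# With LC_ALL=C the order is raw byte values: every uppercase ASCII
# letter sorts before every lowercase one.
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

Without LC_ALL=C, the last command can interleave upper and lower case differently depending on the locale in effect, which is exactly the gotcha described above.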

...