Search | mikePietsch.com

32.7 Combining find and grep for Code Search

Right, so you’ve graduated from just finding files to actually searching inside them. This is where you stop being a mere user and start feeling like a digital archaeologist, sifting through layers of ancient code for that one, cursed variable name. The classic combo for this is find and grep. It’s the peanut butter and jelly of the command line: two simple tools that, when combined, become an unstoppable force for good (or for finding out who wrote that terrible function).
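Here is a minimal sketch of the combo, assuming a throwaway directory tree built just for the demo (the file names and the cursed_total variable are made up). find picks the candidate files, and grep -l reports which of them actually contain the string.

```shell
# Build a small throwaway tree so the example is self-contained.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src" "$demo_dir/docs"
printf 'int cursed_total = 0;\n' > "$demo_dir/src/main.c"
printf 'int fine = 1;\n' > "$demo_dir/src/util.c"
printf 'cursed_total is documented here\n' > "$demo_dir/docs/notes.txt"

# find narrows the search to regular files named *.c;
# grep -l prints only the names of the files that contain the pattern.
# "{} +" hands many filenames to a single grep invocation.
matches=$(find "$demo_dir" -type f -name '*.c' -exec grep -l 'cursed_total' {} +)
echo "$matches"

# Clean up the demo tree.
rm -rf "$demo_dir"
```

The notes.txt file also mentions the variable, but it never reaches grep because find filtered it out by name first. That division of labor is the whole point of the pairing.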

32.6 which, whereis, and type: Finding Executables

Before we dive into the big, complex tools, let’s start with the easy wins. You’re at the command line, you type a command like nmap, and bash hits you with a “command not found.” Or worse, it runs a version of the command, but it’s the wrong one. Your first instinct shouldn’t be to break out the heavy artillery like find; it should be to ask a simple question: “Where the heck is this thing?”
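A quick sketch of asking that question with type, the one of the three that is built into the shell itself. The name no-such-tool-xyz is a placeholder invented for this demo, and the exact wording of type's output varies between shells.

```shell
# type is part of the shell, so it knows about builtins and functions
# as well as executables found on PATH.
builtin_report=$(type cd)
file_report=$(type sh)
echo "$builtin_report"
echo "$file_report"

# A name the shell cannot resolve makes type exit non-zero.
# "no-such-tool-xyz" is a made-up name used only for this demo.
if type no-such-tool-xyz >/dev/null 2>&1; then
  lookup=found
else
  lookup=missing
fi
echo "no-such-tool-xyz: $lookup"
```

Checking the exit status rather than parsing the message keeps the test portable across shells.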

32.5 locate and updatedb: Fast Filename Search via Database

Right, so you’ve just learned about find, and you’re thinking, “This is powerful, but it feels like searching an entire city block for your car keys by walking every inch of it.” You’re not wrong. That’s where locate comes in. It’s the speed demon of the file-searching world. Instead of crawling through your filesystem in real-time, it consults a pre-built database. The result? It returns answers almost instantly. The trade-off, and there’s always a trade-off, is that its database isn’t live. It’s a snapshot, typically updated once a day by a cron job. This means if you created a file five minutes ago, locate probably won’t know it exists yet. It’s like having a brilliant friend with a photographic memory, but they only take one picture per day. For finding ancient config files you forgot about or that source code from last month, it’s unbeatable.

32.4 xargs -P: Parallel Execution for Bulk Operations

Alright, let’s talk about xargs -P. This is where xargs stops being a helpful librarian fetching your books one at a time and becomes a manic circus master, flinging commands at your CPU cores as fast as they can possibly juggle them. It’s the single most effective way to turn a slow, grinding, sequential process into a fire-breathing speed demon. But, as with most fire-breathing things, you need to know how to handle it or you’ll get burned.
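As a rough sketch, assuming an xargs that supports -0 and -P (GNU and BSD versions do), here is a bulk copy spread over up to four parallel jobs. The demo files and the .bak suffix are invented for illustration.

```shell
# Create four small files to operate on.
demo_dir=$(mktemp -d)
for i in 1 2 3 4; do
  printf 'file %s\n' "$i" > "$demo_dir/f$i.txt"
done

# -print0 / -0 : NUL-separated names, safe for spaces and newlines
# -n 1         : one filename per job
# -P 4         : run up to four jobs at the same time
find "$demo_dir" -type f -name '*.txt' -print0 |
  xargs -0 -n 1 -P 4 sh -c 'cp "$1" "$1.bak"' _

# Count the backups that were produced.
bak_count=$(find "$demo_dir" -type f -name '*.bak' | wc -l | tr -d ' ')
echo "backups created: $bak_count"

rm -rf "$demo_dir"
```

Jobs finish in whatever order the scheduler likes, so anything that prints to the terminal from inside a parallel job can interleave. Writing results to separate files, as above, sidesteps that.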

32.3 xargs: Building Command Lines from Standard Input

Alright, let’s talk about xargs. You’ve probably just come from the find command, and you’re rightfully excited about all the files you can now locate. But then you hit a wall: you want to do something to those files. You try something brilliantly obvious: find . -name "*.txt" | rm. And… nothing gets deleted. Instead, rm complains that it was given no files. It feels like the universe is gaslighting you. Why? Because a pipe (|) delivers text on standard input, but most commands, like rm, never read their targets from there. They expect them as command-line arguments. This is the chasm that xargs was born to bridge. Its job is to take that stream of text from standard input and use it to build and execute command lines. It’s the adapter that makes find and friends actually useful.
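A minimal sketch of the working version, assuming a find and xargs that support -print0 and -0 (GNU and BSD both do). The file names are made up, and everything happens inside a temporary directory so nothing real gets deleted.

```shell
# Throwaway directory with two .txt files (one with a space) and one .log file.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b with space.txt" "$demo_dir/keep.log"

# find emits NUL-terminated names; xargs -0 turns them into arguments for rm,
# so "b with space.txt" arrives as one argument instead of three.
find "$demo_dir" -type f -name '*.txt' -print0 | xargs -0 rm --

# Only the .log file should be left.
remaining=$(ls "$demo_dir")
echo "$remaining"

rm -rf "$demo_dir"
```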

32.2 find -exec and -execdir: Running Commands on Results

Alright, let’s get our hands dirty. You’ve used find to get a list of files. Great. But now you want to do something with them. Your first thought might be to pipe that list straight into another command. Don’t. Please, for the love of all that is holy, just don’t. Filenames can contain spaces, newlines, and other characters that will make most command-line tools have a complete meltdown when they arrive as plain text. This is where -exec and -execdir come in. They are find’s built-in, robust, “I’ve got this” mechanism for handing results to another command. They handle all the weird characters correctly, because find passes each filename to the command as a single argument, with no shell in between to split or misinterpret it.
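Here is a small sketch of the two -exec terminators, using a temporary tree with deliberately awkward names (all invented for the demo). The \; form runs the command once per file, while the + form batches as many files as fit into each invocation. -execdir takes the same two terminators but runs the command from each file's own directory; it is left out here to keep the sketch portable.

```shell
# A directory and a file with spaces in their names, plus a plain one.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub dir"
printf 'one\ntwo\n' > "$demo_dir/sub dir/my notes.txt"
printf 'three\n' > "$demo_dir/plain.txt"

# "{} \;" runs basename once for each matching file.
names=$(find "$demo_dir" -type f -name '*.txt' -exec basename {} \; | sort)
echo "$names"

# "{} +" passes the files to cat in batches, so cat starts as few times as possible.
total_lines=$(find "$demo_dir" -type f -name '*.txt' -exec cat {} + | wc -l | tr -d ' ')
echo "total lines: $total_lines"

rm -rf "$demo_dir"
```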

32.1 find: Searching by Name, Type, Size, Time, and Permissions

Right, let’s talk about find. This is the command you reach for when ls just won’t cut it. It’s the Swiss Army knife of file searching, capable of slicing through your filesystem based on almost any attribute you can think of: name, type, size, when you last cried over your code (modification time), and who’s allowed to see it (permissions). It’s powerful, it’s ubiquitous, and its syntax is a historical artifact that will make you question the life choices of early Unix developers. Don’t worry, we’ll get through it together.
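To make those attributes concrete, here is a sketch with one test per attribute, run against a temporary tree. The file names are invented, and the k size suffix assumes a GNU or BSD find.

```shell
# Demo tree: a tiny log, a 4 KiB log, and an executable script.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/logs"
printf 'small\n' > "$demo_dir/logs/app.log"
dd if=/dev/zero of="$demo_dir/logs/big.log" bs=1024 count=4 2>/dev/null
printf '#!/bin/sh\n' > "$demo_dir/run.sh"
chmod 755 "$demo_dir/run.sh"

by_name=$(find "$demo_dir" -type f -name '*.log' | sort)   # name + type
by_size=$(find "$demo_dir" -type f -size +2k)              # larger than 2 KiB
by_time=$(find "$demo_dir" -type f -mtime -1 | wc -l | tr -d ' ')  # modified within the last 24 hours
by_perm=$(find "$demo_dir" -type f -perm -u+x)             # owner has execute permission

echo "logs:       $by_name"
echo "big files:  $by_size"
echo "recent:     $by_time"
echo "executable: $by_perm"

rm -rf "$demo_dir"
```

Tests placed side by side are ANDed together, which is why -type f -name '*.log' matches only regular files with that suffix.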

32. File Searching: find, xargs, and locate

33.6 Algolia DocSearch: Hosted Search for Documentation Sites

Right, Algolia DocSearch. Let’s get this out of the way: this is the “it just works” option, provided you qualify for the free tier. It’s a hosted service Algolia generously offers for open-source and documentation sites. Think of it as the concierge of search: they handle the heavy lifting of indexing, updating, and hosting the search engine itself. You just plug in the UI. The magic, and the potential pitfall, happens on their servers. A custom crawler (which they run) visits your site according to a schedule you configure, sucks up all the content, and pushes it to an Algolia index. Your JavaScript then queries that index directly. This is fantastic because it means no build-time slowdowns, and your search index stays in step with your published content, give or take the gap since the last crawl. The trade-off? You surrender control. You can’t index unreleased content, and your search is wholly dependent on their crawler and infrastructure.

33.5 Pagefind: Automated Static Search from Build Output

Alright, let’s talk about Pagefind. If you’ve just wrestled Lunr.js or Fuse.js into submission, you’re going to look at Pagefind and wonder if it’s a prank. It feels like cheating. Why? Because it does the one thing they don’t: it happens after you build your site. While the others are client-side libraries you have to manually feed a JSON index you built during your Hugo build, Pagefind takes the opposite approach. It’s a post-processing step. You let Hugo do its thing, spit out a beautifully complete set of static HTML files, and then Pagefind comes along, reads your entire public directory, and builds a search index from the actual final output. This is its killer feature. It means it indexes exactly what your users see, CSS classes and all. No more wrestling with .Plain or .Summary in your index template to avoid dumping Markdown cruft into your searchable content. It just reads the HTML.

33.4 Fuse.js: Fuzzy Search for Hugo

Alright, let’s talk about Fuse.js. If Lunr.js is the meticulous librarian who needs everything indexed and catalogued just so, Fuse.js is the brilliant, slightly scatterbrained friend who can find your lost keys by vaguely describing the general area you might have left them in. It’s a fuzzy search library, and “fuzzy” is the operative word here. It doesn’t need a pre-built index; it just takes your content and a search term and, through a kind of textual witchcraft, finds matches even when the words are misspelled, out of order, or just kinda-sorta similar.

33.3 Lunr.js Integration: Indexing and Querying

Right, so you’ve decided you want search on your Hugo site. Good for you. It’s a fantastic feature, but let’s be clear: Hugo doesn’t give you a search button to click. You have to build the engine yourself. It’s like buying a fancy car frame and being handed a box of engine parts. Let’s get our hands greasy. The first and most “classic” way to do this is with Lunr.js. It’s a pure JavaScript, in-browser, full-text search library. The big idea is simple but powerful: during your site build, Hugo generates a massive JSON file containing the text of every page you want to search. Then, when a user visits your site, their browser downloads this JSON file, Lunr.js loads it, builds a search index right there in their browser, and then queries that index. No server required. Neat, huh?

33.2 Building a Search Index with a JSON Output Format

Right, so you’ve decided to build a search function for your Hugo site. Good for you. It’s the single best feature you can add to a static site that’s grown beyond a handful of pages. But Hugo, in its infinite wisdom, doesn’t ship with a built-in search. It gives you the ingredients—your content—and expects you to bake the cake yourself. The first and most crucial step is creating the index, a JSON file that our search libraries can actually understand. Think of it as the map to your treasure trove of content. No map, no treasure.

33.1 Client-Side Search Architecture: JSON Index + JavaScript

Alright, let’s get our hands dirty. Client-side search in Hugo is a bit of a magic trick. We’re going to pre-build a search index—a highly structured JSON file that’s basically a map of every word on your site and where it lives—and then we’ll teach a JavaScript library how to read that map to find what you’re looking for, all without ever bothering a server. It’s fast, it’s static, and it’s surprisingly powerful.