27.7 Language Configurations and Custom Dictionaries

Right, so you’ve got your tsvectors and tsquerys working. You’re feeling pretty good. You search for ‘database’ and it finds ‘database’. A modern miracle. But then you try to search for ‘running’ and it doesn’t find ‘ran’. You search for ‘colour’ and it doesn’t find ‘color’. Suddenly, this powerful tool feels a bit… stupid. This is where we stop just using the tool and start configuring it. This is the layer that separates a toy from a professional-grade search setup.

27.6 Highlighting Matches with ts_headline()

Right, so you’ve got your search set up. Your tsvector is primed, your tsquery is sharp, and your GIN index is making it all gloriously fast. You get a ranked list of results back. Fantastic. But now what? You present the user with a list of document titles? That’s like a chef describing a beautiful dish by just listing the ingredients. We need to show the user why this document matched. We need to give them a glimpse. We need ts_headline().

27.5 Ranking Results: ts_rank() and ts_rank_cd()

Right, so you’ve got your search results. They’re… correct. That’s the boring part. The magic trick, the part that makes users think your app is brilliant instead of just accurate, is putting the best results at the top. That’s where ts_rank() and its slightly more pedantic cousin ts_rank_cd() come in. They don’t just find the needles in the haystack; they tell you which needles are shiniest. These functions essentially ask: “How well does this tsvector match that tsquery?” and return a number. Higher number, better match. It’s a deceptively simple concept that hides a surprising amount of nuance.

27.4 GIN Indexes for Full-Text Search Performance

Alright, let’s talk about making this full-text search thing actually fast. You’ve got your tsvector columns, you’ve crafted your elegant tsquery strings, and your WHERE clause is a thing of beauty. But then you run it on your table with a few million rows and it feels like watching paint dry. That’s because PostgreSQL is doing a sequential scan, reading every single row, converting the text to a tsvector, and then checking if it matches. It’s thorough, I’ll give it that, but it’s also a performance nightmare.

27.3 to_tsvector() and to_tsquery(): Converting Text and Queries

Right, let’s get your words ready for a fight. You’ve got a pile of text in a TEXT column and a user typing queries into a search box. Throwing a LIKE '%query%' at that problem is like bringing a butter knife to a gunfight. It’s slow, it’s clumsy, and it will utterly fail the moment you need to find “run” but the text says “running”. PostgreSQL’s full-text search is your special ops unit. It understands language. It’s fast. But first, we have to speak its language. That means converting your messy human text and your user’s even messier queries into two structured, machine-readable types: tsvector and tsquery.

27.2 tsquery: Search Queries with AND, OR, NOT, and Phrase Operators

Right, so you’ve got your documents neatly packaged into tsvector. Now it’s time to actually ask for something back. That’s where tsquery comes in. Think of tsvector as the pre-processed, indexed filing cabinet, and tsquery as the set of instructions you hand to your very literal, slightly pedantic intern on how to search it. A tsquery is a representation of the Boolean operators & (AND), | (OR), and ! (NOT), along with phrase search. It’s what you use within the @@ operator to see if it matches a tsvector.

27.1 tsvector: Normalized Lexeme Vectors

Alright, let’s get our hands dirty with tsvector. This is the foundational data type for full-text search in PostgreSQL, and if you don’t understand it, the rest of this chapter will feel like wizardry performed by a very unhelpful wizard. Think of a tsvector not as a block of text, but as a normalized, optimized index of the words in that text. It’s the “after” picture in a makeover montage.

20.6 BRIN: Block Range Indexes for Naturally Ordered Large Tables

Right, BRIN indexes. Let’s talk about the index you use when you’ve given up on being a hero. You’ve got a table so monstrously large that the mere thought of building a standard B-tree index on it makes your wallet physically cringe and your maintenance window manager burst into tears. This is where BRIN, or Block Range INdexes, saunters in. It’s the index of last resort, and when it works, it feels like absolute black magic.

20.5 SP-GiST: Space-Partitioned Trees for Non-Overlapping Data

Now, if B-trees are your meticulously organized filing cabinet and GiST is your flexible, multi-purpose Swiss Army knife, then SP-GiST is your specialist architect for building highly efficient, custom storage structures for data that doesn’t play nicely in a uniform grid. The name is a mouthful—Space-Partitioned Generalized Search Tree—but the concept is brilliantly simple once you peel it back. It’s a framework for building balanced, non-overlapping tree structures. Think of it as PostgreSQL handing you the tools to design your own index for your weird data, saying, “You clearly have a specific problem. Here’s the engine; you define the rules.”

20.4 GIN: Generalized Inverted Index for Arrays and JSONB

Right, so you’ve met the B-tree, the reliable workhorse. It’s great for your standard WHERE id = 42 queries. But what happens when your data isn’t neat and tidy like that? What if you’re dealing with the glorious, chaotic mess of arrays, JSONB documents, or full-text search? You try to use a B-tree on an array column and it will just shrug and give up. That’s where our friend, the GIN (Generalized Inverted Index), comes in. Think of it less like a neat filing cabinet and more like the index at the back of a massive textbook. It doesn’t care about the whole row; it just knows which pages (rows) contain which words (elements).

20.3 GiST: Generalized Search Tree for Geometric and Full-Text Data

Right, so you’ve met B-trees. They’re the reliable, sensible sedan of the index world. They’re fantastic for one-dimensional data where we can ask clear-cut questions like “is this equal to,” “less than,” or “greater than?” But what happens when your data is… weirder? What if you’re dealing with shapes on a map, full-text search, or arrays that might overlap? You can’t just ask if one polygon is “less than” another polygon. That’s nonsense. This is where the Generalized Search Tree, or GiST, comes in. It’s the Swiss Army knife of PostgreSQL indexes, and it’s brilliantly clever.

20.2 Hash: Equality-Only Lookups and When to Use Them

Alright, let’s talk about the Hash index. It’s the index you use when you absolutely, positively must find a row based on a single, exact value, and you don’t care about anything else. No ranges, no sorting, no partial matches. Just “is this thing equal to that thing?” It’s the philosophical purist of the index world, and it’s brutally efficient at its one job. The B-tree index is the swiss army knife you use for everything; the Hash index is the scalpel you use for one very specific, very precise incision. It works exactly how you’d hope: it takes the value you’re indexing, chucks it into a hash function (which is a fancy way of saying “a function that jumbles it into a predictable, fixed-size jumble of bits”), and uses that resulting hash code to find the data’s location directly. Think of it like a massive library where every book has a unique Dewey Decimal code. You don’t browse the shelves; you just compute the code and go straight to the exact shelf. This makes lookups for equality (=) blazingly fast, theoretically in constant time, O(1).

20.1 B-tree: The Default and When It Excels

Right, let’s talk about the B-tree index. It’s the default for a reason. When you create an index in PostgreSQL and don’t specify a type, this is what you get. It’s the workhorse, the reliable old pickup truck of the indexing world. It’s not always the flashiest or the fastest for every single job, but it will get you and your data where you need to go, predictably, 99% of the time. It’s the index you’ll use most, so let’s get to know it intimately.

— joke —

...