27.7 Language Configurations and Custom Dictionaries

Right, so you’ve got your tsvectors and tsquerys working. You’re feeling pretty good. You search for ‘database’ and it finds ‘database’. A modern miracle. But then you try to search for ‘running’ and it doesn’t find ‘ran’. You search for ‘colour’ and it doesn’t find ‘color’. Suddenly, this powerful tool feels a bit… stupid. This is where we stop just using the tool and start configuring it. This is the layer that separates a toy from a professional-grade search setup.

27.6 Highlighting Matches with ts_headline()

Right, so you’ve got your search set up. Your tsvector is primed, your tsquery is sharp, and your GIN index is making it all gloriously fast. You get a ranked list of results back. Fantastic. But now what? You present the user with a list of document titles? That’s like a chef describing a beautiful dish by just listing the ingredients. We need to show the user why this document matched. We need to give them a glimpse. We need ts_headline().

27.5 Ranking Results: ts_rank() and ts_rank_cd()

Right, so you’ve got your search results. They’re… correct. That’s the boring part. The magic trick, the part that makes users think your app is brilliant instead of just accurate, is putting the best results at the top. That’s where ts_rank() and its slightly more pedantic cousin ts_rank_cd() come in. They don’t just find the needles in the haystack; they tell you which needles are shiniest. These functions essentially ask: “How well does this tsvector match that tsquery?” and return a number. Higher number, better match. It’s a deceptively simple concept that hides a surprising amount of nuance.

27.4 GIN Indexes for Full-Text Search Performance

Alright, let’s talk about making this full-text search thing actually fast. You’ve got your tsvector columns, you’ve crafted your elegant tsquery strings, and your WHERE clause is a thing of beauty. But then you run it on your table with a few million rows and it feels like watching paint dry. That’s because PostgreSQL is doing a sequential scan, reading every single row, converting the text to a tsvector, and then checking if it matches. It’s thorough, I’ll give it that, but it’s also a performance nightmare.

27.3 to_tsvector() and to_tsquery(): Converting Text and Queries

Right, let’s get your words ready for a fight. You’ve got a pile of text in a TEXT column and a user typing queries into a search box. Throwing a LIKE '%query%' at that problem is like bringing a butter knife to a gunfight. It’s slow, it’s clumsy, and it will utterly fail the moment you need to find “run” but the text says “running”. PostgreSQL’s full-text search is your special ops unit. It understands language. It’s fast. But first, we have to speak its language. That means converting your messy human text and your user’s even messier queries into two structured, machine-readable types: tsvector and tsquery.

27.2 tsquery: Search Queries with AND, OR, NOT, and Phrase Operators

Right, so you’ve got your documents neatly packaged into tsvector. Now it’s time to actually ask for something back. That’s where tsquery comes in. Think of tsvector as the pre-processed, indexed filing cabinet, and tsquery as the set of instructions you hand to your very literal, slightly pedantic intern on how to search it. A tsquery is a representation of the Boolean operators & (AND), | (OR), and ! (NOT), along with phrase search. It’s what you use within the @@ operator to see if it matches a tsvector.

27.1 tsvector: Normalized Lexeme Vectors

Alright, let’s get our hands dirty with tsvector. This is the foundational data type for full-text search in PostgreSQL, and if you don’t understand it, the rest of this chapter will feel like wizardry performed by a very unhelpful wizard. Think of a tsvector not as a block of text, but as a normalized, optimized index of the words in that text. It’s the “after” picture in a makeover montage.

— joke —

...