27.6 Highlighting Matches with ts_headline()

Right, so you’ve got your search set up. Your tsvector is primed, your tsquery is sharp, and your GIN index is making it all gloriously fast. You get a ranked list of results back. Fantastic. But now what? You present the user with a list of document titles? That’s like a chef describing a beautiful dish by just listing the ingredients. We need to show the user why this document matched. We need to give them a glimpse. We need ts_headline().

Think of ts_headline() as your search result’s hype man. Its job is to grab the original text, find the bits that matched your query, and wrap them in something eye-catching, like bold tags. It’s the difference between “This document contains the word ‘postgres’” and “This document contains the word <b>postgres</b>”. Context is everything.

The Bare Minimum Example

Let’s start with the simplest possible usage. We’ll take a block of text and create a headline for the word ‘postgres’.

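SELECT ts_headline(
    'english',
    'PostgreSQL is a powerful, open source object-relational database system.',
    to_tsquery('english', 'postgres:*')
);

-- Result: <b>PostgreSQL</b> is a powerful, open source object-relational database system.

First thing you’ll notice: it found ‘PostgreSQL’ and highlighted it, even though our query said ‘postgres’. That’s the search configuration (english here) doing its normalization, plus a little help from the :* prefix operator. The English stemmer reduces the query term ‘postgres’ to the lexeme ‘postgr’, while ‘PostgreSQL’ in the text becomes ‘postgresql’. Those two lexemes aren’t equal, so a plain to_tsquery('english', 'postgres') would highlight nothing in this sentence; the :* turns the query into a prefix match that ‘postgresql’ satisfies. The second thing: it wrapped the match in <b> tags. That’s the default, and we’ll get to how to change that in a second.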
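If you want to see why the :* matters, you can compare the lexemes directly. This is a quick check assuming the built-in english configuration; the stems come from its Snowball stemmer, so another configuration may normalize differently.

```sql
-- How the english configuration normalizes the document word and the query word.
SELECT to_tsvector('english', 'PostgreSQL')  AS doc_lexemes,   -- 'postgresql':1
       to_tsquery('english', 'postgres')     AS plain_query,   -- 'postgr'
       to_tsquery('english', 'postgres:*')   AS prefix_query,  -- 'postgr':*
       -- Exact lexeme comparison: 'postgr' <> 'postgresql', so no match (false).
       to_tsvector('english', 'PostgreSQL') @@ to_tsquery('english', 'postgres')   AS plain_match,
       -- Prefix comparison: 'postgresql' starts with 'postgr', so it matches (true).
       to_tsvector('english', 'PostgreSQL') @@ to_tsquery('english', 'postgres:*') AS prefix_match;
```

The same check is handy whenever a headline mysteriously highlights nothing: look at what the query actually turned into before blaming ts_headline().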

Choosing Your Highlighting Style

The <b> tag is fine, I suppose, if you’re building a website in 1995. For modern applications, you’ll probably want something else. That’s what the StartSel and StopSel options are for.

SELECT ts_headline(
    'The quick brown fox jumped over the lazy dog.',
    to_tsquery('fox'),
    'StartSel=<mark>, StopSel=</mark>'
);

-- Result: The quick brown <mark>fox</mark> jumped over the lazy dog.

Much better. Now you can style that <mark> tag with CSS to give it a nice yellow highlight. You can use any string here: ** for Markdown, or [HIGHLIGHT] and [/HIGHLIGHT] for some bizarre bracket-tag format you might have. You do you.
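One catch: the options argument is itself a comma-separated list, so a delimiter containing spaces or commas has to be wrapped in double quotes. A sketch, assuming you want a <span> with a hypothetical hl CSS class instead of <mark>:

```sql
-- StartSel contains a space, so its value is double-quoted;
-- StopSel has no spaces or commas and can stay bare.
-- The class name hl is purely illustrative.
SELECT ts_headline(
    'english',
    'The quick brown fox jumped over the lazy dog.',
    to_tsquery('english', 'fox'),
    'StartSel="<span class=hl>", StopSel=</span>'
);

-- Result: The quick brown <span class=hl>fox</span> jumped over the lazy dog.
```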

Controlling the Excerpt Length

By default, ts_headline() returns a single excerpt of roughly 15 to 35 words built around the matches (MinWords=15, MaxWords=35, ShortWord=3); very short documents, like the ones in these examples, simply come back whole. But you’re in control. The MaxWords, MinWords, and ShortWord options let you fine-tune the length of the output to prevent uselessly short headlines or absurdly long ones.

SELECT ts_headline(
    'This sentence is about cats. This next sentence is about dogs, specifically the word canine which is a synonym. This third sentence is about birds.',
    to_tsquery('canine | dog'),
    'MaxWords=10, MinWords=5, ShortWord=3'
);

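-- Result: an excerpt of roughly 5 to 10 words around the matches, along the lines of:
-- <b>dogs</b>, specifically the word <b>canine</b>

This tells PostgreSQL: “Give me an excerpt of no more than 10 words and no fewer than 5, and don’t start or end it on a word of 3 characters or fewer.” Query terms are exempt from that ShortWord trimming. Notice, too, that ‘dogs’ counts as a match, because the stemmer reduces both ‘dog’ and ‘dogs’ to the same lexeme. Since ts_headline() works on whole tokens, it never chops a word in half. What you won’t get is ... ellipses: a default headline is a single excerpt with no truncation markers. Ellipses only appear when you ask for several fragments with the MaxFragments option, which joins them with the FragmentDelimiter string (" ... " by default).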
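A minimal sketch of fragment mode, assuming we want the cats sentence and the birds sentence as two separate snippets. In this mode MaxWords caps each fragment rather than the whole headline, and FragmentDelimiter is spelled out with its default value only to show that a value containing spaces must be double-quoted.

```sql
-- MaxFragments > 0 switches ts_headline() into fragment mode.
-- MinWords is still validated and must be below MaxWords,
-- so the default of 15 would be rejected next to MaxWords=6.
SELECT ts_headline(
    'english',
    'This sentence is about cats. This next sentence is about dogs, specifically the word canine which is a synonym. This third sentence is about birds.',
    to_tsquery('english', 'cat | bird'),
    'MaxFragments=2, MaxWords=6, MinWords=2, FragmentDelimiter=" ... "'
);
```

You should get two short fragments, one around <b>cats</b> and one around <b>birds</b>, joined by the delimiter; the dogs sentence in between is too far from either match to fit in a 6-word fragment and is skipped.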

The Pitfall of Stop Words

Here’s where things get a bit absurd, and you need to pay attention. Remember stop words? Those common words like ‘the’, ‘is’, and ‘over’ that your search configuration ignores? The configuration strips them out of your query as well as your documents, and ts_headline() can only highlight terms that survive into the processed query.

Try this:

SELECT ts_headline(
    'english',
    'The quick brown fox jumped over the lazy dog.',
    to_tsquery('english', 'the')
);

Go on, run it. Under the english configuration, ‘the’ is a stop word, so to_tsquery() discards it, raises a NOTICE that the query contains only stop words, and hands back an empty query. ts_headline() then has nothing to highlight: this short sentence comes back unchanged, and a longer document would come back as its opening stretch of words, still with no highlights. This is a common point of confusion. The function highlights what the query matches, not what the user typed. If the query normalizes to nothing, there’s nothing to highlight.
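If your application builds queries from user input, it can be worth checking whether anything survived normalization before promising the user a highlighted snippet. numnode() counts the lexemes and operators in a tsquery, so 0 means every term was a stop word. A sketch of that guard, assuming the english configuration; the fallback string is just a placeholder.

```sql
-- numnode() returns 0 when normalization left nothing in the query.
SELECT q          AS normalized_query,
       numnode(q) AS node_count,
       CASE
           WHEN numnode(q) = 0 THEN 'nothing left to highlight'  -- placeholder fallback
           ELSE ts_headline('english',
                            'The quick brown fox jumped over the lazy dog.',
                            q)
       END        AS headline
FROM to_tsquery('english', 'the') AS q;
```

Swap 'the' for 'the & fox' and the stop word is dropped while ‘fox’ survives, so node_count becomes 1 and the headline path runs.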

Why You Must Headline the Original Text

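This is the most important best practice, so listen up. Your tsvector column is a pre-processed, tokenized, stemmed mess of lexemes. It is not the original text. You cannot run ts_headline() on your tsvector column. I’ve seen people try. Pass the column as-is and PostgreSQL rejects the call, because there is no ts_headline(tsvector, tsquery) function. Cast it to text to force the issue and you get the lexemes and their position numbers back, with highlights sprinkled over the stems: a garbled, meaningless string.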

You must store the original, unadulterated text in a separate column and pass that column to the ts_headline() function.

Wrong:

-- Assume my_table has a tsvector column 'search_vector'
SELECT ts_headline(
    search_vector, -- NO! This is the tsvector!
    to_tsquery('postgres')
) FROM my_table;
-- ERROR:  function ts_headline(tsvector, tsquery) does not exist

Right:

-- Assume my_table has a text column 'body_text' AND a tsvector column 'search_vector',
-- built with the same 'english' configuration used here
SELECT ts_headline(
    'english',
    body_text, -- YES! This is the original text.
    q
) FROM my_table, to_tsquery('english', 'postgres') AS q
WHERE search_vector @@ q;

The workflow is always: 1) Use the tsvector column and GIN index to find the matching rows quickly. 2) Use the ts_headline() function on the original text column of those rows to display the results. It’s a two-step process, and conflating the two is a one-way ticket to confusing, broken search results. Don’t do it.
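The two steps can also be arranged so the expensive part runs as rarely as possible. The PostgreSQL documentation warns that ts_headline() works from the original document rather than a tsvector summary and so can be slow; the pattern it suggests is to rank and LIMIT in a subquery and build headlines only for the rows that survive. A self-contained sketch, assuming PostgreSQL 12 or later (for the generated column), a temporary my_table, and a hypothetical id key and sample rows:

```sql
-- Hypothetical demo table mirroring the layout assumed above:
-- original text in body_text, a stemmed tsvector in search_vector.
CREATE TEMP TABLE my_table (
    id            serial PRIMARY KEY,  -- hypothetical key, not part of the text above
    body_text     text NOT NULL,
    search_vector tsvector
        GENERATED ALWAYS AS (to_tsvector('english', body_text)) STORED
);

CREATE INDEX ON my_table USING GIN (search_vector);

INSERT INTO my_table (body_text) VALUES
    ('PostgreSQL is a powerful, open source object-relational database system.'),
    ('Postgres supports full text search out of the box.'),
    ('This row is about something else entirely.');

-- Step 1 (inner query): match, rank and LIMIT using only the tsvector and its index.
-- Step 2 (outer query): run ts_headline() on body_text for just those rows.
SELECT id,
       ts_headline('english', body_text, q) AS headline,
       rank
FROM (
    SELECT id, body_text, q,
           ts_rank(search_vector, q) AS rank
    FROM my_table, to_tsquery('english', 'postgres:*') AS q
    WHERE search_vector @@ q
    ORDER BY rank DESC
    LIMIT 10
) AS top_hits
ORDER BY rank DESC;
```

Only the first two rows match, and only those two ever reach ts_headline(); on a real table with thousands of hits, the LIMIT keeps the headline work proportional to one page of results.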