20.5 SP-GiST: Space-Partitioned Trees for Non-Overlapping Data

Now, if B-trees are your meticulously organized filing cabinet and GiST is your flexible, multi-purpose Swiss Army knife, then SP-GiST is your specialist architect for building highly efficient, custom storage structures for data that doesn’t play nicely in a uniform grid. The name is a mouthful—Space-Partitioned Generalized Search Tree—but the concept is brilliantly simple once you peel it back. It’s a framework for building balanced, non-overlapping tree structures. Think of it as PostgreSQL handing you the tools to design your own index for your weird data, saying, “You clearly have a specific problem. Here’s the engine; you define the rules.”

The magic, and the complexity, of SP-GiST lies in its partitioning strategy. A B-tree can only split a page one way: by a key value. SP-GiST is far more creative. It allows you to define how a key space is partitioned. This makes it the go-to choice for data types that are inherently non-linear or hierarchical, like geospatial coordinates, phone routing tables, or anything that fits on a “this goes here, but that definitely goes over there” principle.

How It Partitions Your Data’s Reality

The core idea is that each inner (non-leaf) node in an SP-GiST index defines a set of rules for routing incoming values to its children. Unlike a B-tree, these rules aren’t just “less than or greater than.” They can be anything you can define in code. The built-in implementations give us the best view into this.

For example, one of the most common uses is for point data using a quadtree partitioning. The rule is: for a given node, divide its space into four equal quadrants (NW, NE, SW, SE). An incoming point is routed to the one child node responsible for its quadrant. It’s a recursive subdivision of space until each leaf page holds a manageable number of points.

Another built-in method is for text (like varchar) using a radix tree (or trie). Here, the partitioning is done by the text’s prefix. The inner node might route all keys starting with ‘A’ to one child, ‘B’ to another, and so on. It doesn’t index the entire value at once; it breaks the problem down by successive characters.

This is the “generalized” part. PostgreSQL provides the engine for managing the tree structure, balancing, and locking. You (or the data type’s author) provide the functions that answer: “For a given key, which child node should it go to?” and “For a given query predicate, which child nodes need to be searched?”

The Built-In Superstars: When to Use SP-GiST

You’re most likely to use SP-GiST through its built-in implementations. Let’s be real: you’re probably not writing your own. And that’s fine.

For geometric types (point, box, circle, etc.), SP-GiST is often a fantastic choice. It supports all the usual operators (<->, &&, @>, <@) and is frequently more efficient than a GiST index for these types, especially for “find nearest neighbor” queries (<->). It’s the king of “find all the coffee shops within this block.”

-- Let's find the nearest pub to my current, sadly pub-less location.
CREATE TABLE locations (
    id serial PRIMARY KEY,
    name text,
    coords point
);

CREATE INDEX idx_locations_coords_spgist ON locations USING SPGIST (coords);

-- This is the query that makes it sing.
SELECT name, coords <-> point '(0,0)' AS distance
FROM locations
WHERE name ILIKE '%pub%'
ORDER BY coords <-> point '(0,0)'
LIMIT 5;

The index can quickly traverse the partitioned space to zero in on the closest points without a full table scan.

For text pattern matching (text, varchar), the SP-GiST implementation provides a powerful alternative to standard B-trees for LIKE and ~ (regex) queries. Its radix tree approach is particularly good for prefix searches.

CREATE INDEX idx_users_name_spgist ON users USING SPGIST (name text_ops);

-- This will use the index efficiently for the prefix match.
SELECT * FROM users WHERE name LIKE 'Smith%';

The Rough Edges and When to Think Twice

It’s not all roses and efficient nearest-neighbor searches. SP-GiST has its quirks.

First, it does not support UNIQUE constraints. The partitioning rules are fantastic for search but don’t guarantee the global ordering required to easily enforce uniqueness. If you need UNIQUE, you’re back to a B-tree.

Second, while it’s brilliant for non-overlapping data, that’s also its limitation. The partitions cannot overlap. This is a fundamental rule of the structure. This makes it ideal for points (a point exists in exactly one quadrant) but means it can’t be the one-stop shop for all data types like GiST can. You can’t use the built-in point_ops to effectively index a range type, for instance; that’s a job for GiST.

Third, choose your ops class wisely. For text, you have a choice between text_ops (for pattern matching) and array_ops (if you’re, for some reason, indexing a text array). Picking the wrong one will render your index useless for your queries. Always check the documentation for which operators are supported by which operator class.

Best practice? For point data and nearest-neighbor searches, start with SP-GiST. It’s often the winner. For text prefix patterns, benchmark it against a standard B-tree index; if your data has long common prefixes, SP-GiST’s radix tree can be more efficient. And finally, remember it’s a specialist tool. Don’t try to force it onto a problem that screams for a B-tree (equality, ranges, uniqueness) or a GIN index (for array containment). Use the brilliant, quirky architect for the custom-built home your weird data deserves.