20.4 GIN: Generalized Inverted Index for Arrays and JSONB
Right, so you’ve met the B-tree, the reliable workhorse. It’s great for your standard WHERE id = 42 queries. But what happens when your data isn’t neat and tidy like that? What if you’re dealing with the glorious, chaotic mess of arrays, JSONB documents, or full-text search? You try to use a B-tree on an array column and it will just shrug and give up. That’s where our friend, the GIN (Generalized Inverted Index), comes in. Think of it less like a neat filing cabinet and more like the index at the back of a massive textbook. It doesn’t care about the whole row; it just knows which pages (rows) contain which words (elements).
The core idea is “inversion.” A B-tree points from a row to its values. A GIN index inverts that relationship: it points from a value (or a component of a value) back to all the rows that contain it. If you have an array column tags with a row containing ['postgres', 'database', 'awesome'], the GIN index will have an entry for 'postgres' that points to that row, an entry for 'database' that points to that row, and so on. This makes answering the question “find all rows where the array contains 'awesome'" incredibly fast. It just looks up 'awesome' in the index and immediately gets a list of candidate rows. No sequential scan required.
The Anatomy of a GIN Index
Under the hood, a GIN index is built on a “B-tree over a sorted list.” Wait, what? Let me explain. For each distinct value (or “key”) extracted from your complex data, PostgreSQL builds a sorted list of the row identifiers (TIDs) that contain that key. It then indexes these keys themselves, along with a pointer to their respective TID list, in a B-tree. So you have a B-tree that helps you quickly find the key you’re looking for, and then a super-efficient, compact list of rows for that key. This two-layer structure is why it’s so damn fast for ANY, @>, and <@ operations. It’s doing two quick lookups instead of one slow scan.
Indexing Arrays: Taming the List
This is the classic use case. Let’s say you’re building a blog and articles have tags. This is a nightmare for a B-tree but a walk in the park for GIN.
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
tags TEXT[]
);
-- This is the magic incantation
CREATE INDEX idx_articles_tags ON articles USING GIN (tags);
-- Now, find all articles tagged with 'postgresql'
-- This will use the index and be blazingly fast.
SELECT * FROM articles WHERE tags @> '{postgresql}';
-- Find articles with *both* 'postgresql' AND 'performance'
SELECT * FROM articles WHERE tags @> '{postgresql,performance}';
The @> operator means “contains.” So tags @> '{postgresql}' is asking “does the tags array contain the entire array on the right?” In this case, the array on the right has one element, so it’s effectively “contains this element.”
Conquering JSONB: Your Document Store’s Best Friend
JSONB is the reason GIN indexes are practically a default requirement for any non-trivial document store in Postgres. JSONB is great because it’s binary, parsed, and fast. But without an index, finding a value within a JSONB document still requires a full table scan. GIN to the rescue, again by inverting the structure.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
metadata JSONB NOT NULL
);
-- The default GIN operator class for jsonb is perfect for most things.
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
-- Find all products where metadata has a 'brand' key with the value 'Acme'
SELECT * FROM products WHERE metadata @> '{"brand": "Acme"}';
-- The query planner is smart enough to use the index for this too.
SELECT * FROM products WHERE metadata -> 'tags' ? 'electronics';
Here’s the crucial bit: the default GIN operator class for jsonb indexes every key and value in the document. It creates index entries for 'brand', 'Acme', 'tags', 'electronics', etc. This is why the @> (contains) operator works so well—it can check the index for the existence of both the key and its value.
The jsonb_path_ops Operator Class: When You Need a Scalpel
The default indexing is fantastic, but it can lead to a large index because it indexes everything. Sometimes, you know you’ll only ever be querying based on a specific key/value pair, like a product_id embedded deep within the JSON. For this, we have the brilliant jsonb_path_ops operator class.
-- A more focused index. Notice the different syntax.
CREATE INDEX idx_products_metadata_path_ops ON products USING GIN (metadata jsonb_path_ops);
-- This index ONLY supports the @> operator. The ? operator won't use it.
-- But it's typically smaller and faster for @> queries.
SELECT * FROM products WHERE metadata @> '{"product_id": "XYZ-123"}';
The jsonb_path_ops class doesn’t index every individual element. Instead, it hashes entire key/value paths (e.g., "product_id"->"XYZ-123") into a single index entry. This results in a much smaller index, and often faster lookups for the specific queries it supports. It’s a trade-off: flexibility for size and speed. Choose wisely.
The Gotchas: Fast Indexes, Slow Updates
Here’s the part the manual often buries. GIN indexes are not free. That beautiful inverted structure has a cost: write speed.
When you update a row, say, by adding a new tag to an array, the index must be updated for every single element in that new array. For a jsonb column, an update might require updating the index entries for dozens of keys. This can make INSERT, UPDATE, and DELETE operations significantly slower than with a B-tree. This isn’t a design flaw; it’s a fundamental trade-off. You’re exchanging write performance for blistering read performance on complex data.
The best practice? Don’t just slap a GIN index on everything. Be strategic. Add them to tables that are write-once, read-often (like our articles table) or where the read performance benefit for your common queries vastly outweighs the write penalty. And if you’re doing a bulk load, consider dropping the GIN indexes, loading the data, and then recreating them. Building them all at once is much faster than updating them incrementally for every row. You’re welcome.