26.3 GIN Indexes on JSONB: jsonb_ops vs jsonb_path_ops
Right, let’s talk about making your JSONB queries not just work, but scream. You’ve loaded up a table with a mountain of JSON documents, and you’re running WHERE data @> '{"status": "published"}'. It’s fast at first, but as your data grows, it starts to feel like wading through molasses. You’ve heard about GIN indexes, the workhorse for JSONB, but then you’re hit with a choice: jsonb_ops or jsonb_path_ops? It’s not just academic; picking the wrong one is like showing up to a Formula 1 race with a go-kart engine.
The core idea of a GIN (Generalized Inverted Index) index is brilliantly simple: it creates a massive, detailed map of every single key and value inside your JSONB documents. When you ask “show me all documents where data->'user'->>'name' is ‘Alice’”, the index doesn’t have to scan every document. It just looks up ‘Alice’ in its map, sees which documents contain her, and returns them instantly. Magic.
The Default: jsonb_ops (The Kitchen Sink)
When you create a GIN index the standard way, you get jsonb_ops. This is the “index everything” option.
CREATE INDEX idx_gin_data ON my_table USING GIN (data);
This single line of code creates separate index entries for every key, every value, and every key-value combination in your JSONB column. It’s incredibly versatile. It can turbocharge a huge range of queries:
Existence: data ? 'email' (Does the JSON have an ’email’ key?)
Containment: data @> '{"user": {"name": "Alice"}}' (The most powerful and common JSONB query)
Key-existence: data ?| ARRAY['email', 'phone'] (Does it have email OR phone?)
The downside? Size. This index is a behemoth. It’s indexing all the things, so it takes up a lot of space. For most applications, this is a perfectly acceptable trade-off. The flexibility is worth the disk space. It’s the safe, sensible default.
The Specialist: jsonb_path_ops (The Scalpel)
Now, meet its leaner, more focused cousin: jsonb_path_ops.
CREATE INDEX idx_gin_path_ops ON my_table USING GIN (data jsonb_path_ops);
See the difference? We’re explicitly telling it to use the jsonb_path_ops operator class. This index does not index every individual key and value. Instead, it only indexes the hashed value of each key-value path within the document.
Think of it like this: a jsonb_ops index might have separate entries for the key user, the key name, and the value Alice. A jsonb_path_ops index takes the entire path user.name and the value Alice, hashes them together into a single token, and indexes that.
This has two massive consequences:
- It’s much, much smaller. By hashing the paths, it stores far less information, often resulting in an index that’s 2-4x smaller than its
jsonb_opsequivalent. - It’s far less flexible. It only supports the containment operator (
@>). That’s it. Your existence (?) and key-existence (?|) queries? They get no benefit from this index and will fall back to a full table scan.
So, Which One Do You Pick?
This isn’t a hard choice; it’s a trade-off. Ask yourself this:
- Do the vast majority of your performance-critical queries use
@>? If you’re mainly filtering based on nested structures (e.g., “find all published posts by user 123 tagged ‘postgres’”), thenjsonb_path_opsis your undisputed champion. The performance is often even faster thanjsonb_opsfor these queries, and the space savings are a huge win. - Do you need to use
?,?|, or?&operators frequently? If you have queries that just check for the existence of a key, regardless of its value, you must usejsonb_ops. The path ops index is useless here.
Here’s a pro tip: you can have both. If you have one set of queries that needs the speed of @> with jsonb_path_ops and another that needs the flexibility of jsonb_ops, just create two indexes. Let the query planner pick the right tool for the job. Just remember, indexes aren’t free; they slow down writes, so don’t go crazy.
The Hidden Quirk: Numbers and Booleans
Pay very close attention here, because this trips everyone up. JSONB has the charming, mildly infuriating property of preserving numeric types. The number 42 (an integer) is a different underlying value than the number 42.0 (a float).
The jsonb_path_ops index hashes the entire path, including the exact type of the value. This means your query must match the type exactly.
-- Our data: {"count": 42} -- (a JSON integer)
-- This will use the jsonb_path_ops index:
SELECT * FROM my_table WHERE data @> '{"count": 42}'; -- integer
-- This will NOT use the jsonb_path_ops index:
SELECT * FROM my_table WHERE data @> '{"count": 42.0}'; -- float
The jsonb_ops index, with its separate entries for values, is often more forgiving in practice here. This is a classic case of the designers making a choice for correctness over convenience. It’s not questionable, but it is a sharp edge you will cut yourself on. Always ensure your application is sending the correct numeric type in its queries.