26.4 JSON Path Queries with jsonpath (PostgreSQL 12+)
Alright, let’s talk about jsonpath. You’ve probably been using the -> and ->> operators to navigate your JSONB data, and they’re great for simple, straightforward paths. But what happens when your data structure gets more complex, or you need to do something more powerful, like filtering for elements within an array based on a condition? You start writing these monstrous, nested SQL expressions that are a pain to write and a nightmare to read.
This is where jsonpath swoops in, looking dramatically slow-motion in a leather jacket. Introduced in PostgreSQL 12, it’s a dedicated language for querying JSON data. Think of it as XPath, but for JSON, and thankfully, a bit less arcane. The real magic is that PostgreSQL can use its advanced indexing capabilities with these expressions, which is a massive win for performance. We’re not just making pretty queries; we’re making fast ones.
The Core Syntax: It’s All About the $
The jsonpath expression is a string, and you use it with the @@ operator (for a boolean match) or the jsonb_path_query[_first] functions (to get actual values). Every jsonpath expression starts with a $ variable, which represents the JSONB value you’re evaluating—usually the column you’re querying.
Let’s say we have a table events with a log column (JSONB) storing, well, event logs.
CREATE TABLE events (
id serial PRIMARY KEY,
log jsonb NOT NULL
);
INSERT INTO events (log)
VALUES
('{
"app": "web-api",
"duration": 150,
"tags": ["error", "database"],
"users": [
{"name": "Alice", "action": "login", "score": 100},
{"name": "Bob", "action": "logout", "score": 85}
]
}');
Now, let’s see jsonpath in action.
-- Does the 'tags' array contain the string 'error'?
SELECT * FROM events WHERE log @@ '$.tags == "error"';
-- Is the duration greater than 100?
SELECT * FROM events WHERE log @@ '$.duration > 100';
-- Get the name of the first user (returns a JSONB string)
SELECT jsonb_path_query_first(log, '$.users[0].name') FROM events;
-- Get the name of the first user (returns a SQL text value)
SELECT jsonb_path_query_first(log, '$.users[0].name') #>> '{}' FROM events;
The last example highlights a key quirk: the jsonb_path_query* functions return JSONB types. To get a native SQL text value out, you often need to use the #>> '{}' trick, which is PostgreSQL’s slightly awkward way of saying “convert this entire JSONB scalar to text.”
Accessors and Filters: The Real Power
This is where you stop being polite and start getting real. You can use the .* and .** accessors for wildcard matching and recursive descent (digging through every level of the object), respectively. But the star of the show is the filter expression.
You can apply a filter to an array to find elements that match a condition. The syntax is ?(@.some_path some_operator "some_value"). The @ here represents the current array element being evaluated.
-- Find all events where ANY user has a score greater than 90
SELECT * FROM events WHERE log @@ '$.users[*].score > 90';
-- Find all events where ANY user has a score greater than 90 (alternative, more explicit)
SELECT * FROM events WHERE log @@ '$.users ? (@.score > 90)';
-- Get the names of all users who performed a "login" action
SELECT jsonb_path_query(log, '$.users ? (@.action == "login").name') FROM events;
See that? The filter ? (@.action == "login") iterates through each object in the users array and checks the condition. Only the objects that pass the filter have their .name path evaluated. This is incredibly powerful for querying nested data without having to unnest and join everything in sight.
Best Practices and Pitfalls
Indexing is Key: Remember I said this was fast? You need a GIN index for that. The most powerful type is the
jsonb_path_opsindex, which works brilliantly withjsonpathqueries.CREATE INDEX idx_events_log_path_ops ON events USING GIN (log jsonb_path_ops);This index won’t help with every single random access pattern, but for the existential checks (
@@) and path queries we’re doing here, it’s an absolute powerhouse. Without it, you’re doing a full table scan, and I will personally come to your data center to frown at you.Mind the Return Types: I mentioned this earlier, but it’s a common trip-up.
jsonb_path_queryreturns a set ofjsonb. If you want to use that data in aWHEREclause or treat it like a normal SQL value, you’ll need to cast it or use#>> '{}'to convert it to text.jsonb_path_query_first(...) #>> '{}'is your best friend for extracting scalar values.Double Quotes are Non-Negotiable: The
jsonpathlanguage requires double quotes around string literals. Using single quotes will cause a syntax error. It’s the opposite of standard SQL, which is a choice, but we have to live with it. Wrong:'$.users ? (@.action == 'login')'Right:'$.users ? (@.action == "login")'Use
jsonb_path_query_firstWhen You Know Better: If your path expression is designed to return at most one result (e.g., you’re accessing a specific object by a known key), usejsonb_path_query_first. It’s a clear signal of intent and stops processing after the first match, which can be a minor performance boost. Use the set-returningjsonb_path_querywhen you genuinely expect multiple results and want to work with all of them.
The jsonpath language is deep, and we’ve only scratched the surface here. You can use predicates like like_regex, starts with, and even do arithmetic. It’s the most robust tool in your arsenal for making PostgreSQL’s JSONB support truly sing. Use it to write queries that are both expressive and efficient, and leave those gnarly nested SQL expressions in the past where they belong.