26.1 json vs jsonb: Storage and Operator Differences
Right, let’s settle this. You’ve probably already been told that jsonb is the one you should use 99.9% of the time. You nod, you move on. But I know you. You’re the kind of person who needs to know why. Because if you don’t, that 0.1% case will sneak up and bite you in production at 3 AM on a Sunday. So let’s get our hands dirty.
The core difference isn’t about what they store—they both store perfectly valid JSON. It’s about how they store it. The json type stores an exact, whitespace-and-all copy of the text you put in. It’s a glorified text field with syntax validation. Need to preserve the exact textual representation for legal reasons or because some external system is ridiculously fussy? Fine, use json. For everyone else, read on.
jsonb stands for “JSON Binary” or, as I like to think of it, “JSON Better.” It parses the JSON on input, decomposing it into a binary format that strips out whitespace, avoids duplicate keys (keeping only the last one, which is what the JSON spec says to do anyway), and, most importantly, sorts the keys within each object. This upfront parsing cost is the only real downside, and it’s almost always worth it.
The Storage Punchline
Let’s see the size difference. It’s not always dramatic, but it can be.
SELECT
'{"this": "is", "a": "test", "with": "some", "extra": "whitespace"}'::json AS json_data,
pg_column_size('{"this": "is", "a": "test", "with": "some", "extra": "whitespace"}'::json) AS json_size,
'{"this": "is", "a": "test", "with": "some", "extra": "whitespace"}'::jsonb AS jsonb_data,
pg_column_size('{"this": "is", "a": "test", "with": "some", "extra": "whitespace"}'::jsonb) AS jsonb_size;
json_size | jsonb_size
-----------+------------
101 | 85
The jsonb version is smaller. But the real magic isn’t the storage; it’s what this binary representation unlocks: indexing and performant querying.
The Operator Divide
This is where the designers clearly had a few too many cups of coffee and decided to make things “interesting.” The operators are a historical mess.
For json, you primarily use -> and ->> to traverse paths. The -> operator returns a json type (so you can keep chaining), and ->> returns text (so you can actually use the value).
-- For a json column
SELECT
my_json_column -> 'user' -> 'name' AS name_json, -- Type is json: '"Alice"'
my_json_column -> 'user' ->> 'name' AS name_text -- Type is text: 'Alice'
FROM my_table;
For jsonb, you get those too, but you also get the far more powerful #> and #>> for path lookups, and the absolutely essential @> for containment checks.
-- Is the 'tags' array in this jsonb document a superset of the array ['postgres', 'tech']?
SELECT * FROM posts
WHERE metadata @> '{"tags": ["postgres", "tech"]}';
-- Get the text value at a specific path (user -> contact -> email)
SELECT metadata #>> '{user, contact, email}' AS user_email FROM posts;
The containment operator (@>) is the workhorse of jsonb querying. It’s why you can create powerful GIN indexes and get blistering speed. You simply cannot do this with the plain json type in any efficient way.
The Indexing Chasm
This is the deal-breaker. You can’t create a useful index on a json column. You can create expression indexes on specific paths, but it’s clunky and doesn’t handle deep queries or arrays well.
With jsonb, you have first-class citizenship in the indexing world. You create a GIN index on the entire column, and it can efficiently answer queries about any key and value pair within the document. It’s like having a built-in search engine for your structured-but-not-really data.
-- The magic wand
CREATE INDEX idx_posts_metadata_gin ON posts USING GIN (metadata);
-- Now this query is lightning fast, even on a million rows
SELECT * FROM posts WHERE metadata @> '{"status": "published", "priority": "high"}';
So When Would You Ever Use json?
I’ll give you two scenarios, and they’re both edge cases:
- Input/Output Fidelity: You need to guarantee the exact same textual output, including whitespace and key order, for some external system that throws a tantrum otherwise.
- Write-Optimized, Never-Read Workloads: You’re just ingesting logs as fast as humanly possible and will never, ever query them by their contents. You’re just throwing the raw text over the wall. The minimal overhead of parsing for
jsonbis too much for this insane throughput.
If you’re not doing one of those two things, you’re handicapping yourself by using json. The query flexibility and performance gains of jsonb are so profound that the choice isn’t even a choice. It’s the default. Use it.