20.2 Hash: Equality-Only Lookups and When to Use Them

Alright, let’s talk about the Hash index. It’s the index you use when you absolutely, positively must find a row based on a single, exact value, and you don’t care about anything else. No ranges, no sorting, no partial matches. Just “is this thing equal to that thing?” It’s the philosophical purist of the index world, and it’s brutally efficient at its one job.

The B-tree index is the swiss army knife you use for everything; the Hash index is the scalpel you use for one very specific, very precise incision. It works exactly how you’d hope: it takes the value you’re indexing, chucks it into a hash function (which is a fancy way of saying “a function that jumbles it into a predictable, fixed-size jumble of bits”), and uses that resulting hash code to find the data’s location directly. Think of it like a massive library where every book has a unique Dewey Decimal code. You don’t browse the shelves; you just compute the code and go straight to the exact shelf. This makes lookups for equality (=) blazingly fast, theoretically in constant time, O(1).

The Sordid History and Why You’re Not Using It (Probably)

Here’s the thing you need to know right out of the gate: for years, I wouldn’t have even recommended you consider a Hash index. The designers made a whopper of a choice back in the day: they weren’t WAL-logged. Write-Ahead Logging is the magic that makes Postgres crash-safe. Without it, a Hash index could be completely corrupted by a server crash and would need to be rebuilt. This single, frankly absurd, flaw made them utterly useless for any production system that valued its data. They were a parlor trick.

But! Pay attention! As of PostgreSQL 10, this was finally, mercifully fixed. Hash indexes are now WAL-logged and are perfectly safe. The problem is, the reputation stuck, and B-trees are so damn good and versatile that they often win even in equality-only scenarios. The B-tree’s ability to also handle ranges, sorting, and uniqueness constraints means it’s usually the default choice unless you’ve done your homework.

When to Actually Use One

So, when does the scalpel beat the swiss army knife? The sweet spot is when you have a table with a massive number of distinct values and your only query predicate is WHERE column = 'value'.

Imagine a session store table with millions of rows, queried only by a unique session UUID. Or a massive fact table with a unique transaction ID. A Hash index on that UUID column will typically be smaller than a B-tree index and can be slightly faster for the lookups, as it avoids the B-tree’s traversal overhead. The performance difference isn’t always earth-shattering, but on tables measuring in the hundreds of gigabytes, it can be meaningful.

Let’s see it in action. First, you need to enable the extension. It’s not part of the core postgres extension, which tells you something about its “special occasion” status.

CREATE EXTENSION IF NOT EXISTS hashset;
-- This isn't actually needed! I'm tricking you to make a point.
-- There is no special extension. You can just create them. The fact that you
-- thought there might be is a testament to its outsider status.

Now, let’s create a simple table and compare the two index types.

CREATE TABLE massive_audit_log (
    id BIGSERIAL PRIMARY KEY,
    event_id UUID NOT NULL, -- Billions of distinct values
    event_data JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create a B-tree index for comparison
CREATE INDEX ON massive_audit_log USING btree (event_id);
-- Create the Hash index
CREATE INDEX ON massive_audit_log USING hash (event_id);

-- Analyze to make sure the planner has fresh stats
ANALYZE massive_audit_log;

Now, let’s see which one the query planner picks for an exact lookup. It’ll usually be a toss-up, but you can nudge it.

EXPLAIN ANALYZE
SELECT * FROM massive_audit_log WHERE event_id = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11';

You might see a Bitmap Heap Scan over the B-tree index or an Index Scan using the Hash index. The cost will be nearly identical on a small dataset. The Hash index truly starts to shine as the table grows into terabyte territory.

The Gotchas and the Fine Print

It’s not all roses. You must understand its limitations, because they are severe and non-negotiable.

Equality Only: I mean it. This is its entire raison d’être. If you ever need to run a query like WHERE event_id > '...' or ORDER BY event_id, your beautiful Hash index is a silent, useless monument to a query you can’t run. The B-tree would have handled it with ease.
No Uniqueness or Foreign Keys: You cannot declare a UNIQUE constraint or a PRIMARY KEY using a Hash index. The system catalogs don’t support it. If you need uniqueness, you’re back to a B-tree.
No Prefix Searches: It can’t do LIKE 'abc%' searches. That’s a form of range query, which is a hard no.
Collation Matters: The hash function is sensitive to the data type and its collation. 'Hello' and 'hello' will hash to wildly different values. This is a good thing for performance, but you need to be aware of it.

So, here’s the rule: Benchmark. Don’t assume. If you have a vast table with a high-cardinality column used exclusively for = lookups, create both a B-tree and a Hash index on a production-like slice of data and run your typical query load. See if the Hash index’s marginal gains in performance and storage are worth the total loss of versatility. Nine times out of ten, the B-tree wins on practicality. But on that tenth time, when you’re squeezing out every last millisecond from a gigantic query, the Hash index is your brilliantly focused, slightly eccentric tool for the job.