21.3 Expression Indexes: Indexing a Function of a Column

Right, so you’ve got a table. You’re querying it. It’s slow. You slap an index on a column. It gets faster. You feel like a wizard. Life is good.

But then you hit a query like this, and the magic spell fizzles:

SELECT user_id, order_date FROM orders
WHERE DATE_TRUNC('month', order_date) = '2023-10-01';

You check the query plan (EXPLAIN ANALYZE is your best friend, by the way—never forget that), and sure enough, it’s doing a full table scan. Your beautiful index on order_date is just sitting there, useless. Why? Because you’re not searching for order_date; you’re searching for a function of order_date. The index stores the raw values 2023-10-15 14:32:01, 2023-10-02 09:15:44, etc., but it has no idea what the DATE_TRUNC('month'...) of those values is.

This is where expression indexes waltz in, adjust their tie, and say, “Allow me.”

The Basic Incantation

An expression index is exactly what it sounds like: you create an index not on a column, but on the result of an expression involving that column. The syntax is almost identical to a regular index, you just replace the column name with the expression.

For our slow query above, the fix is:

CREATE INDEX idx_orders_order_date_month ON orders (DATE_TRUNC('month', order_date));

Now, when you run that same WHERE DATE_TRUNC('month', order_date) = '2023-10-01' filter, the planner can look at your new idx_orders_order_date_month index, which has all the pre-computed month-start values neatly sorted, and do a super-efficient index lookup. It’s like we taught the index a new language.

Why This Black Magic Works (And Its Price)

The database isn’t doing anything mystical. When you create the index, it literally runs that expression against every single row in the table once, stores the result, and sorts it. It’s a upfront cost for a long-term gain. This is why you have to be thoughtful about what you index; you’re trading write/update speed for read speed.

This also means the index is only useful for queries that use the exact same expression. The optimizer isn’t a AI that understands semantics; it’s a brilliant but literal-minded robot.

This will use the index:

WHERE DATE_TRUNC('month', order_date) = '2023-10-01'

This will NOT use the index:

WHERE EXTRACT(month FROM order_date) = 10
-- Different function, different signature. The index is useless here.

This is the most common pitfall. You build this clever index and then get clever in your query writing a different—but logically equivalent—expression. The database shrugs. Be consistent.

Real-World Voodoo: Case-Insensitive Search

Let’s talk about a classic use case that doesn’t involve dates. Imagine a users table where you want a case-insensitive search on username. Without an index, WHERE LOWER(username) = 'alice' means a full scan where every single username gets lowercased on the fly. Terrible.

The solution? Index the function.

CREATE INDEX idx_users_username_lower ON users (LOWER(username));

Now, a query for WHERE LOWER(username) = 'alice' becomes lightning fast. It’s a game-changer for user authentication or search features.

The Immutable Function Rule: Don’t Summon Demons

Here’s the part the manual often buries in boring legalese, but it’s critically important: The expression in your index must be immutable.

An immutable function is one that, given the same input, always returns the same output, forever and ever, amen. LOWER(), UPPER(), DATE_TRUNC() are immutable. CURRENT_TIMESTAMP, RANDOM(), or any function that depends on the current session’s timezone are not. They are mutable.

Why does this matter? The index is built once. If you somehow managed to create an index on CURRENT_TIMESTAMP, it would calculate the timestamp for each row once when the index was built and then never change. The data would be frozen in time, wrong, and utterly useless. PostgreSQL is smart enough to stop you from doing this with most built-in volatile functions, but if you’re indexing your own custom functions, you must declare them as IMMUTABLE or you will create a silent, terrible mess.

The Hidden Superpower: UNIQUE on an Expression

This is where it gets really fun. You can also create a UNIQUE constraint on an expression. Let’s say you have a table of products, and you have a sku column, but you want to enforce that the lowercase version of the SKU is unique to avoid ABC-123 vs abc-123 duplicates.

CREATE UNIQUE INDEX idx_products_sku_unique_lower ON products (LOWER(sku));

Boom. Now the database itself will reject any insert that would create a duplicate based on that lowercased expression. You’ve moved your data integrity logic from the application layer right down into the database, where it belongs. This isn’t just a performance trick; it’s a powerful design tool.

So, the next time you see a query filtering on or grouping by a function, don’t just groan and accept the sequential scan. Ask yourself, “Should this be an expression index?” The answer, more often than you’d think, is a resounding yes.