20.1 B-tree: The Default and When It Excels

Right, let’s talk about the B-tree index. It’s the default for a reason. When you create an index in PostgreSQL and don’t specify a type, this is what you get. It’s the workhorse, the reliable old pickup truck of the indexing world. It’s not always the flashiest or the fastest for every single job, but it will get you and your data where you need to go, predictably, 99% of the time. It’s the index you’ll use most, so let’s get to know it intimately.

How a B-tree Actually Organizes Your Data

The “B” stands for “balanced,” not “binary,” which is the first thing most people get wrong. This isn’t a binary search tree from your CS 101 final. A balanced tree means every single leaf node—the nodes at the bottom that actually hold the data—is exactly the same number of steps from the root. This balance is absolutely critical because it guarantees that finding any row, whether it’s the first or the billionth, will take the exact same number of steps. This predictable performance is why it’s the default.

Imagine the index as a phone book (ask your parents, kids). You don’t start at page 1 and scan for “Smith.” You open to the middle, see if you’re in the M’s, and then flip forward or backward accordingly. A B-tree does this algorithmically, with each internal node acting like a directory page that tells the database which branch to take next.

The Sweet Spot: Equality and Range Queries

This sorted, hierarchical structure is what makes the B-tree excel at two fundamental types of queries:

Equality searches (=): “Find the user with the email jane@example.com.” The B-tree navigates directly to that exact value. Blazingly fast.
Range queries (<, <=, >, >=, BETWEEN): “Find all orders placed between January 1st and January 15th.” This is the B-tree’s killer feature. Because the values are stored in sorted order, once it finds the starting point (Jan 1st), it can simply walk the leaf pages in order until it hits the end of the range (Jan 15th). Other index types often can’t do this efficiently.

Let’s see it in action. We’ll set up a table with a classic use case: a created_at timestamp.

-- Create a table and load it with a million rows of sample data
CREATE TABLE sensor_readings (
    id SERIAL PRIMARY KEY,
    sensor_id INTEGER NOT NULL,
    reading DECIMAL NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

INSERT INTO sensor_readings (sensor_id, reading, created_at)
SELECT (random() * 100)::int,
       random() * 100,
       NOW() - (random() * INTERVAL '365 days')
FROM generate_series(1, 1000000);

-- Now, the magic. Let's index the timestamp column.
CREATE INDEX idx_sensor_readings_created_at ON sensor_readings(created_at);

-- This query can now use the sorted nature of the B-tree to quickly find the range.
EXPLAIN ANALYZE
SELECT * FROM sensor_readings
WHERE created_at BETWEEN '2023-06-01' AND '2023-06-02';

The EXPLAIN ANALYZE output should show an Index Scan using our new idx_sensor_readings_created_at index, swiftly locating the start of the June 1st period and scanning until it passes June 2nd.

Best Practices and the One Weird Trick

A B-tree index isn’t magic; it can only help if your query’s WHERE clause matches the leftmost part of the index’s definition. This is the most common “why isn’t my index working?!” pitfall.

-- Our index is on (sensor_id, created_at). It's sorted first by sensor_id, then by created_at.
CREATE INDEX idx_sensor_readings_composite ON sensor_readings(sensor_id, created_at);

-- This query is golden. It uses the first column.
SELECT * FROM sensor_readings WHERE sensor_id = 55;

-- This query is also golden. It uses the first column for equality, then the second for a range.
SELECT * FROM sensor_readings WHERE sensor_id = 55 AND created_at > '2023-01-01';

-- This query is useless for this index. It's like asking for "Chapter 5" from a book sorted by author first.
-- The planner will likely ignore the index and do a sequential scan.
SELECT * FROM sensor_readings WHERE created_at > '2023-01-01';

The rule of thumb: put the most selective, most frequently used-for-equality column first. And remember, while you can SELECT any column, an index can only be used to quickly locate rows based on the criteria that are part of its sorted structure.

The Surprising Things You Can Index (And Why You Might Not Want To)

You can create a B-tree on just about any data type, but for some, it’s a… questionable life choice.

boolean columns: Don’t. Just don’t. You have two, maybe three values. An index on a boolean field is rarely selective enough to be worth the overhead. A sequential scan will often be faster.
text columns with massive values: A B-tree index can only handle values up to about ⅓ of a page (roughly 3kB). If you try to index a text field that might exceed this, it’ll work, but it will use a less efficient “toasted” value. For pattern matching (LIKE 'abc%'), it’s your only built-in option. For full-text search, you’d later graduate to a GIN index.
Low-selectivity columns: Indexing a column with only a few distinct values (e.g., gender, status flags) is usually a waste of space and write-performance. The query planner will often ignore it.

The B-tree is your foundational tool. It’s robust, flexible, and understands order. Use it for most of your standard relational data—timestamps, IDs, numbers, and short strings. It’s the index you can always count on, even if it sometimes gives you a gentle lecture about your life choices regarding boolean columns.