21.4 Covering Indexes: INCLUDE Columns for Index-Only Scans
Right, let’s talk about covering indexes and the INCLUDE clause. This is one of those features that, once you understand it, you’ll wonder how you ever lived without it. It’s the difference between your database engine doing all its work in the tidy, fast, ordered world of the index, versus having to schlep all the way back to the main table (the “heap” or clustered index) to grab a few more pieces of data. That round trip is expensive, and we’re going to eliminate it.
Here’s the core problem: a standard nonclustered index is like the index in the back of a book. It lets you quickly find the page number (the row) where a topic (your key column) is discussed. But if you need information that isn’t in the index entry itself, you have to go to that page. The INCLUDE clause lets us cheat. It lets us stuff extra columns into the index’s “leaf pages” – the very end of the index structure where the actual row pointers live – without making those columns part of the index key. This means the database can satisfy the entire query just by reading the index, resulting in what’s called an “index-only scan.” It’s a beautiful thing.
Why INCLUDE Beats Just Adding to the Key
You might be thinking, “Why not just add these extra columns to the index key itself?” Great question, and the answer reveals the genius of INCLUDE.
The key columns dictate the order of the entire index. They are used for searching and sorting. If you add LastName as a key column, the index is sorted by LastName. If you then add FirstName, it’s sorted by LastName, then by FirstName within each LastName. This is crucial for WHERE, ORDER BY, and JOIN clauses.
But what if you just need FirstName for the final SELECT list? Making it a key column would:
- Increase the index size unnecessarily, as all intermediate levels of the B-tree must now store this larger key.
- Could break existing queries that rely on the key order. A wider key might change the effectiveness of the index for range scans.
- Be utterly pointless for filtering, which is the key’s actual job.
The INCLUDE column sidesteps all this. It’s just data lazily tossed into the leaf pages. It doesn’t affect ordering. It doesn’t affect the intermediate levels of the index. It’s pure, denormalized payload for the sole purpose of making your frequent queries faster.
Let’s make this concrete. Imagine a Users table and a common query to find users by country and get their full name.
-- Our table (simplified for example)
CREATE TABLE dbo.Users (
UserId int PRIMARY KEY,
CountryCode nvarchar(3) NOT NULL,
FirstName nvarchar(50) NOT NULL,
LastName nvarchar(50) NOT NULL,
-- ... 20 other columns we don't care about for this query
);
-- The common query
SELECT FirstName, LastName
FROM dbo.Users
WHERE CountryCode = 'USA';
A naive index would be:
CREATE INDEX IX_Users_CountryCode ON dbo.Users (CountryCode);
This helps the WHERE clause find rows quickly, but for each row found, the engine must do a “Key Lookup” back to the clustered index (the PRIMARY KEY) to fetch the FirstName and LastName. This is a lot of random I/O.
The “just add it to the key” approach is inefficient:
CREATE INDEX IX_Users_CountryCode ON dbo.Users (CountryCode, FirstName, LastName);
-- Now the index is ordered by all three. Overkill.
-- The smarter, covering approach using INCLUDE:
CREATE INDEX IX_Users_CountryCode_Included ON dbo.Users (CountryCode)
INCLUDE (FirstName, LastName);
This second index is the winner. The key is small and efficient for seeking to ‘USA’. Once it finds those leaf pages, bam – the FirstName and LastName are right there. No need to go back to the main table. The query is satisfied entirely from the nonclustered index.
The Glorious Index-Only Scan
When you create the right covering index, the execution plan will show the beautiful green text of an “Index Seek” or “Index Scan” followed immediately by a “SELECT” operator. The dreaded “Key Lookup” (or “RID Lookup” for heaps) is completely absent. This is the goal. You can see this in SQL Server’s actual execution plan; the OBJECT_TYPE of the seek will be your nonclustered index, and the Output List will show it’s pulling all required columns from that single object.
Best Practices and Pitfalls
Don’t go wild and INCLUDE every column in your table. That would just create a second, fatter copy of your entire table. Be surgical.
Include Only What You Need: Analyze your most expensive and frequent queries. Use the actual execution plan to see which columns are causing Key Lookups. Those are your candidates. Usually, they’re columns in the
SELECTlist or in theWHEREclause that aren’t selective enough to be key columns themselves (e.g., aBITfield likeIsActiveis a primeINCLUDEcandidate).Large Data Types are a Trap: You can technically
INCLUDEcolumns likeNVARCHAR(MAX),VARBINARY(MAX), orXML. Just because you can, doesn’t mean you should. These large data types (LOBs) are stored off-page anyway. Including them often forces the database to do a separate lookup for that LOB data, defeating the entire purpose of the covering index and potentially making performance worse. Avoid it.Write Operations Have a Cost: Remember, an index isn’t free. Every
INSERT,UPDATE, orDELETEon your table must now also update every single index that includes those columns. If youINCLUDEa frequently updated column, you’re adding write overhead. This is a classic read-vs-write tradeoff. The performance gain on yourSELECTqueries must be worth the write penalty.Filtered Indexes Love INCLUDE: This is a power combo. If you create a filtered index (e.g.,
WHERE IsActive = 1), you can useINCLUDEto make it covering for queries that target that specific subset of data. This keeps the index incredibly small and hyper-efficient.
So, use INCLUDE wisely. It’s not magic fairy dust, but it’s the closest thing we have to a free lunch for read-heavy queries. Stop making your database do the extra legwork. Give it a covering index and let it relax in the leaf pages for a while.