10.4 PRIMARY KEY: Identity, Implicit Unique Index, and NOT NULL
Right, let’s talk about the PRIMARY KEY. You’ve probably heard it’s important. That’s an understatement. It’s the single most important column (or set of columns) in your entire table. It’s the one thing that absolutely, positively must be unique for every single row. Think of it as your table’s social security number, its fingerprint, its “this one is mine” identifier. And because it’s so important, SQL gives it a whole bundle of superpowers automatically. Let’s break down what you’re really getting when you declare one.
The Holy Trinity: NOT NULL, UNIQUE, and the Implicit Index
When you slap PRIMARY KEY onto a column definition, you’re not asking for one constraint. You’re getting a package deal of three critical features, and the database will not let you have one without the others.
First, it implies NOT NULL. This is non-negotiable. A primary key must have a value. Allowing a NULL would completely defeat the purpose of a unique identifier. If you try to be clever and add NULL to a primary key column, the database will laugh at you (by throwing an error, which is its way of laughing).
Second, it implies UNIQUE. Obviously. It’s the primary key. Its entire job is to be unique across the table.
Third, and this is the part that often surprises people, it automatically creates a unique index to enforce that uniqueness. This isn’t just a constraint; it’s a performance feature. The database needs a fast way to check for duplicates and to find rows by their primary key. This index is how it does that. You don’t have to write CREATE INDEX separately. The database does it for you, and it’s almost always a B-tree index, which is perfect for equality and range lookups.
Here’s the basic syntax. It’s deceptively simple.
CREATE TABLE dbo.Patrons (
PatronID INT PRIMARY KEY, -- The simple, inline way
FullName NVARCHAR(100) NOT NULL
);
But I almost never write it that way. The method above is fine for a quick script, but for anything serious, use the constraint syntax at the end of the table definition. It’s clearer, and it’s your only option if your primary key spans more than one column (a composite key).
CREATE TABLE dbo.BookLoans (
PatronID INT NOT NULL,
BookID INT NOT NULL,
LoanDate DATE NOT NULL,
-- Other columns...
CONSTRAINT PK_BookLoans -- Naming your constraints is a best practice. Do it.
PRIMARY KEY (PatronID, BookID, LoanDate) -- Composite key on three columns
);
The Identity Crutch (And When It’s Actually Good)
In the Microsoft SQL Server world (and others), you’ll often see PRIMARY KEY paired with IDENTITY(1,1). This is the classic “auto-incrementing number” pattern.
CREATE TABLE dbo.Products (
ProductID INT IDENTITY(1,1) PRIMARY KEY, -- Starts at 1, increments by 1
ProductName NVARCHAR(100) NOT NULL
);
This is a perfectly valid and incredibly common approach. It gives you a simple, guaranteed-unique, meaningless surrogate key. The database handles the complexity of generating the next number, and you never have to worry about it. It’s a great choice for most OLTP tables where the natural data doesn’t have a clear, simple, immutable unique identifier.
But here’s my “brilliant friend” advice: don’t do this by default without thinking. It’s a crutch. Sometimes a very useful crutch, but a crutch nonetheless. The real world often does have natural keys: a combination of (ISBN, EditionNumber), (ServerName, MetricName, Timestamp), or (CountryCode, PhoneNumber). If a sensible, unique natural key exists and is stable, consider using it. You’ll save a join in many queries. The downside? If that natural key ever has to change (e.g., a phone number changes), it’s a nightmare to update foreign keys. That’s the trade-off. Surrogate keys (IDENTITY) protect you from reality. Natural keys embrace it, for better or worse.
The One Pitfall Everyone Hits: The Clustering Default
Here’s the big one. In SQL Server, when you create a PRIMARY KEY, it defaults to also being the clustered index. The clustered index dictates the physical order in which the data is stored on disk. This is a huge deal for performance.
This default makes perfect sense 80% of the time. You often want to look up rows by their primary key, so having them physically sorted that way is efficient.
But for the other 20%, it’s a trap. Imagine a Orders table where the primary key is an OrderID (an IDENTITY column), but 99% of your queries are searching for all orders for a specific CustomerID and OrderDate. Having the table physically sorted by the incrementing OrderID is useless for those queries. You’d be better off clustering on (CustomerID, OrderDate) and making the OrderID primary key a nonclustered index.
You have to be explicit to break the default:
CREATE TABLE dbo.Orders (
OrderID INT IDENTITY(1,1) NOT NULL,
CustomerID INT NOT NULL,
OrderDate DATE NOT NULL,
-- ...
CONSTRAINT PK_Orders
PRIMARY KEY NONCLUSTERED (OrderID), -- Explicitly nonclustered PK
CONSTRAINT IX_Orders_CustomerDate
UNIQUE CLUSTERED (CustomerID, OrderDate, OrderID) -- Clustered on what we actually query
);
The takeaway: always think, “How will this table be queried?” before you blindly accept the default clustering. The primary key defines your uniqueness. The clustered index defines your physical storage. They are separate concepts, and decoupling them is a key sign of a senior developer who has been burned by this before.