5.1 smallint, integer, and bigint: Ranges and Storage
Alright, let’s talk about the integer family: smallint, integer, and bigint. These are your workhorses, the bedrock of counting things. They’re not fancy, but understanding their limits is the difference between a robust application and a digital dumpster fire. The designers of PostgreSQL basically gave us three sizes: small, medium, and “you’ll probably run out of disk space first.”
The core difference between them is simple: how many bytes they hog and, consequently, how high (or low) they can count. This isn’t just academic; picking the wrong one is like using a teacup to bail out a sinking ship—it’s a fundamentally flawed strategy that will end in tears.
The Cold, Hard Numbers on Storage and Range
Let’s cut to the chase. Here’s what each type can handle:
smallint: 2 bytes of storage. Range: -32,768 to +32,767. You use this for things you’re absolutely, positively sure will stay small. Like the number of wheels on a car (unless you’re building a monster truck in your database, in which case, you have bigger problems).integer: 4 bytes. Range: -2,147,483,648 to +2,147,483,647. This is the Goldilocks choice. It’s the default for a reason. It handles most counting tasks you’ll ever throw at it, from the number of page views on a popular blog to the number of times I’ve sighed while debugging someone’sbigint-as-default schema.bigint: 8 bytes. Range: -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807. This is the “make the number as big as possible” button. It’s for when you’re counting things at an astronomical scale, like national debt or the number of atoms in a grain of sand. Or, more realistically, bank transaction IDs or high-resolution timestamps.
The integer type is almost always your best bet. It offers a massive range for a negligible 4-byte cost. Defaulting to bigint everywhere “just to be safe” is a classic rookie move. You’re doubling your storage footprint for every integer value for a scenario you’ll likely never encounter. On a massive table, that wasted space adds up fast, bloating your indexes and slowing down everything.
Why This Mess? Two’s Complement, That’s Why.
You might wonder why the ranges are these weird, uneven numbers instead of a clean +/- 32,000 or 2 billion. Blame two’s complement, the clever (and frankly, brilliant) way computers represent signed integers. In a nutshell, with 2 bytes (16 bits), you have 2^16 = 65,536 possible values. Half are used for negative numbers, half for positive, and zero gets one spot. So the range is -(2^15) to (2^15 - 1). Hence -32,768 to 32,767. The same logic applies to the 4-byte and 8-byte types. It’s not a messy choice; it’s a mathematically elegant one.
The Integer Overflow Pitfall
Here’s where the rubber meets the road. The database will not save you from yourself. If you try to shove a number that’s too large into a column, it will fail spectacularly. This is a good thing! It’s better than the alternative, which is silent, corrupt truncation.
CREATE TABLE widget_inventory (
widget_id serial PRIMARY KEY,
stock_count smallint -- Whoops, chosen poorly!
);
-- This will work fine... until it doesn't.
INSERT INTO widget_inventory (stock_count) VALUES (32767);
-- This will fail with an error: "smallint out of range"
INSERT INTO widget_inventory (stock_count) VALUES (32768);
The takeaway? Think about your domain logic. Will you ever have more than 32,767 widgets? If there’s even a shadow of a doubt, just use integer. The 2-byte savings isn’t worth the future headache.
The serial Pseudotypes: A Convenient Lie
You’ve probably seen serial and bigserial. These aren’t actual data types; they’re a convenience feature—a neat bit of syntactic sugar. Under the hood, serial creates an integer column, and bigserial creates a bigint column. They also automatically create a sequence object to provide default auto-incrementing values.
-- This...
CREATE TABLE books (
id serial PRIMARY KEY,
title text NOT NULL
);
-- ...is essentially equivalent to this more verbose version:
CREATE SEQUENCE books_id_seq;
CREATE TABLE books (
id integer NOT NULL DEFAULT nextval('books_id_seq'),
title text NOT NULL
);
ALTER SEQUENCE books_id_seq OWNED BY books.id;
The best practice? Use serial if an integer suits your identity range (it almost always will). Use bigserial only if you’re building a system for a galactic empire that plans to catalog every star. And for the love of all that is holy, never use smallserial. The 2-byte auto-incrementing key is an absurd contradiction in terms. If you’re worried about running out of 2 billion IDs, you have a wildly successful application and can handle the migration.