9.5 Practical Schema Layouts: Multi-Tenant and Feature-Based
Right, let’s talk about organizing your data. You’ve got tables, you’ve got columns, but how do you actually arrange this mess? This isn’t about CREATE TABLE syntax; you can get that from any manual. This is about the high-level strategy that will either make your application a joy to scale or an absolute nightmare to refactor. We’re going to look at two heavyweight contenders: Multi-Tenant and Feature-Based schemas. Both are valid. Both have trade-offs. And both will make you curse the name of a previous developer if you pick the wrong one for the job.
The Multi-Tenant Pattern: One Schema to Rule Them All
The core idea here is simple: you have one, single schema. Every table in it serves all of your customers (or “tenants”). The magic that keeps User A from seeing User B’s data is a tenant_id column on (almost) every single table. It’s the digital equivalent of putting everyone in the same apartment building and giving them all keys only to their own unit.
CREATE TABLE tenants (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
CREATE TABLE projects (
id SERIAL PRIMARY KEY,
tenant_id INTEGER NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
-- ... other columns
-- This is the money shot. This constraint is what keeps the lights on.
UNIQUE(tenant_id, name)
);
CREATE TABLE tasks (
id SERIAL PRIMARY KEY,
tenant_id INTEGER NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
description TEXT,
-- ... other columns
FOREIGN KEY (tenant_id, project_id) REFERENCES projects(tenant_id, id)
);
Why you’d do this: The operational simplicity is intoxicating. One backup. One set of migrations. One set of indexes to tune. When you add a new feature, you deploy one script and it’s immediately available for every tenant. It’s also incredibly efficient with resources; a small tenant uses little disk/CPU, while a large one can use more, all within the same database instance.
The catch (and it’s a big one): You can never, ever, ever forget that tenant_id in your WHERE clauses. Every query you write for the rest of your life will start with WHERE tenant_id = X. Get this wrong, and you have a data leak on your hands. It’s a constant, low-level paranoia. Tools like Row Level Security in Postgres can help, but they’re a bandage on a design choice. Also, if one tenant decides to run a report that’s a Cartesian join of doom, they can potentially slow down the database for everyone else. Good times.
The Feature-Based Pattern: A Schema for Every Job
This approach throws the “one schema” concept out the window. Instead, you structure your schemas around features or components of your application. Think auth for user accounts, billing for invoices, reporting for analytics. Each is its own little fiefdom with its own tables.
-- Instead of a monolithic 'public' schema, we create domains.
CREATE SCHEMA auth;
CREATE SCHEMA billing;
CREATE SCHEMA reporting;
-- The auth schema owns user data. Nothing else should touch this.
CREATE TABLE auth.users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
password_hash TEXT NOT NULL
);
-- The billing schema handles all things money.
CREATE TABLE billing.invoices (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL, -- This references auth.users, but cross-schema FKs are... tricky.
amount DECIMAL(10, 2) NOT NULL,
paid BOOLEAN DEFAULT FALSE
);
-- You'd likely query across them by joining schemas.
SELECT auth.users.email, billing.invoices.amount
FROM auth.users
JOIN billing.invoices ON auth.users.id = billing.invoices.user_id
WHERE billing.invoices.paid = FALSE;
Why this is brilliant: The separation of concerns is beautiful. Your authentication system is now a sealed module. You can back up the billing schema more frequently than the reporting schema. A DBA can analyze query patterns per feature. It forces a kind of API-like thinking between your application’s components, which often leads to a cleaner code architecture as well.
Where the designers clearly made a questionable choice: Cross-schema foreign keys are often not enforced by the database, or if they are, it’s a pain. You’re trading one type of complexity (remembering tenant_id) for another (orchestrating data across schemas). Also, some ORMs and application frameworks still have a mild heart attack when confronted with multiple schemas. You’ll spend more time configuring your tools.
So, Which One Do I Pick?
Stop looking for a silver bullet. There isn’t one.
- Choose Multi-Tenant if you’re building a SaaS application where all your tenants are on the same version of the product and operational simplicity is your god. This is the default for most web apps for a reason.
- Choose Feature-Based if you’re building a large, complex application (a monolithic beast, if you will) where different parts have genuinely different data access patterns, security requirements, or scaling needs. It’s the choice for when you need to be able to scale or manage components independently.
Here’s the secret they don’t put in the manuals: you can hybridize. Nothing is stopping you from having a multi-tenant setup within a feature-based schema layout. Your billing schema could have an invoices table with a tenant_id. You absolute maniac. You’ve just created a fractal of complexity. I love it.