9.4 The pg_catalog and information_schema Schemas
Right, let’s talk about the two schemas PostgreSQL created for you without even asking: pg_catalog and information_schema. Think of them as your database’s nervous system and its meticulously organized public library, respectively. One is the raw, gritty truth from the engine’s perspective. The other is a polished, standards-compliant interface that sometimes feels like it’s hiding the good stuff. You need to know both.
The information_schema: The Standards-Compliant Librarian
The information_schema is a set of views defined by the SQL standard. Its entire job is to provide a consistent, predictable way to interrogate your database’s structure, regardless of which SQL database you’re using (PostgreSQL, MySQL, etc.). It’s fantastic for writing portable tools or if you’re coming from another database system and just want to find your bearings.
Let’s say you want a list of all your tables. You’d query the information_schema.tables view.
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'public';
This is safe, standard, and a bit… verbose. The information_schema is notorious for its long, descriptive column names. It’s designed for clarity, not for your typing speed. The upside? The same query works on other databases. The downside? It often requires joins across a dozen views to get the same information a single pg_catalog table might give you. It’s the designer’s questionable choice to prioritize standards-compliance over convenience, which is admirable in theory and occasionally frustrating in practice.
The pg_catalog: The Grizzled System Engineer
This is where PostgreSQL keeps its guts. pg_catalog is a schema full of system tables, views, functions, and data types that define everything about your database. It’s PostgreSQL’s internal scratchpad, and it’s exposed for you to read. This is where you go when the polite questions to information_schema don’t give you the answer you need.
Want that list of tables again, but maybe with some extra juicy details like the table’s on-disk size or its access method? information_schema taps out. pg_catalog steps up.
SELECT relname AS table_name,
relkind AS type, -- 'r' for table, 'v' for view, etc.
pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public')
AND relkind = 'r';
See the difference? We’re dealing with real internal identifiers (oid), cryptic type codes (relkind), and powerful PostgreSQL-specific functions like pg_relation_size. This query tells you not just what exists, but something material about it. The learning curve is steeper because the pg_catalog tables are designed for the system’s convenience first, but the power is unmatched.
Why Two Systems? (The Best of Both Worlds)
This duality isn’t a mistake; it’s a brilliant design. The information_schema gives you portability and a safety net. You can grant users read access to it without worrying they’ll see truly internal, potentially confusing implementation details. The pg_catalog gives you the unvarnished truth and the deep, proprietary features that make PostgreSQL awesome.
Here’s a classic pitfall: trying to find a function. In information_schema, you’d look at routines and then have to join to parameters to see the function signature. It’s a whole thing. In pg_catalog, you just query pg_proc. But be warned: pg_proc includes every internal function, making it a crowded party. You have to filter by namespace.
-- Find a custom function in pg_catalog
SELECT proname, proargtypes
FROM pg_proc
WHERE pronamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public')
AND proname = 'my_function';
-- The equivalent(ish) in information_schema
SELECT routine_name, data_type
FROM information_schema.routines
WHERE routine_schema = 'public'
AND routine_name = 'my_function';
See how the pg_catalog version gives you the raw type OIDs for the arguments? That’s the kind of powerful, low-level detail information_schema hides from you.
Best Practices and The Golden Rule
- For Portability: Use
information_schema. If you’re building a tool that might need to work with MySQL or SQL Server one day, this is your friend. - For Power and Depth: Use
pg_catalog. When you need to know about PostgreSQL-specific features, replication status, or precise locking behavior, you have no other choice. - The Golden Rule: Never, ever manually update or delete data in the
pg_catalogtables. You will break your database. I’m not joking. Treat them as read-only, unless you are a PostgreSQL core developer and you know exactly what you’re doing. You’re not. I’m probably not. We read, we don’t write.
Ultimately, you’ll end up using both. You’ll start with the standard information_schema to learn the concepts and then gradually graduate to the raw power of pg_catalog for your heavy lifting. Knowing how to navigate both is the mark of someone who doesn’t just use the database, but understands it.