12.2 INSERT ... SELECT: Copying Data Between Tables
Right, so you’ve got data in one table and you need to get it into another. You could write a hundred individual INSERT statements, but you’re not a barbarian. You’re a programmer. You automate things. This is where INSERT ... SELECT comes in, and it’s one of the most powerful tools in your SQL toolbox. It’s the Swiss Army knife for copying or transforming data from one part of your database to another. The basic idea is simple: you take the output of a SELECT statement and you feed it directly into an INSERT.
Let’s start with the most straightforward scenario. Imagine you have an old_users table that’s being deprecated and a shiny new users table with the same structure.
-- Let's see what we're working with
SELECT * FROM old_users;
id | username | email
----+------------+---------------------
1 | alice | alice@example.com
2 | bob | bob@example.com
3 | charlie | charlie@example.com
Your new users table is empty and needs this data. Here’s how you do it in one fell swoop:
INSERT INTO users (id, username, email)
SELECT id, username, email
FROM old_users;
Boom. Done. The SELECT statement runs first, producing a result set. The INSERT statement then takes that entire result set and inserts it row-by-row into the target table. The column list in the INSERT (id, username, email) is crucial—it defines the order and destination for the values coming from the SELECT. The SELECT’s column list must match in number, order, and compatible data types.
Not Just Blind Copies
Here’s where it gets interesting. You’re not just a photocopier; you’re a data sculptor. The SELECT statement can be any valid SELECT statement. This means you can filter, transform, and calculate on the fly.
Maybe you only want to migrate active users from the old table, and you want to timestamp them with the migration time:
INSERT INTO users (username, email, date_joined)
SELECT username, email, NOW()
FROM old_users
WHERE is_active = true;
Notice I didn’t insert the id from the old table. This is a best practice if your new table uses a SERIAL or GENERATED AS IDENTITY primary key. You’re letting the new table generate its own clean set of IDs, avoiding potential future conflicts. The old id becomes just another piece of data you can store in a column like legacy_id if you need it for reference.
The Type and Order Mismatch Pitfall
This is the most common way to shoot yourself in the foot. The database isn’t psychic. It will obediently try to shove whatever the SELECT returns into the columns you specified in the INSERT, in the exact order you listed them.
-- This is a disaster waiting to happen
INSERT INTO users (username, email, date_joined)
SELECT email, username, NOW() -- Oops. Swapped email and username!
FROM old_users;
The above statement would happily insert the email string into the username column and the username string into the email column, creating a glorious mess. The query will run without a syntax error because the data types are likely both TEXT and are therefore “compatible.” Always double-check your column order. Explicitly listing the columns, as we’ve been doing, is a very good habit—it saves you from this and from future schema changes breaking your queries.
Using SELECT to Generate Data
You can even use INSERT ... SELECT without a source table at all to generate a batch of data. Need to create ten test users?
INSERT INTO users (username, email)
SELECT 'user_' || n, 'user_' || n || '@example.com'
FROM generate_series(1, 10) AS n;
This uses generate_series(), a fantastically useful function, to create a virtual table of numbers from 1 to 10. Then for each number n, it generates a username and email. It’s a clean, single-statement way to populate test data without writing a loop in your application code.
The power here is the sheer flexibility. You can JOIN tables in your SELECT, use CASE statements to conditionally transform values, or aggregate data on the fly before inserting it. It’s your primary method for moving and reshaping data within the database itself, where this work belongs. It’s faster than moving the data to your application and back, and it’s almost always simpler. Use it liberally.