17.1 Scalar Subqueries: A Single Value in an Expression

Alright, let’s talk about scalar subqueries. This is where we start getting fancy, and frankly, where a lot of you will start writing queries that make your future self (or the poor soul who has to maintain your code) weep quietly at their desk.

A scalar subquery is, in essence, a SELECT query you shove inside another query that has the decency to return exactly one value: one row, with one column. Think of it as a single, atomic piece of data you can slot right into an expression, anywhere you’d normally put a literal value like 42 or a column name like users.name. It’s the most well-behaved of the subquery family. The one that gets invited to nice functions because it won’t cause a scene.

The Golden Rule: One Row, One Column

This is not a suggestion; it’s the law. If your so-called “scalar” subquery returns more than one row, the database engine will throw a fit (an error, technically) and your query will explode. It’s like trying to fit an entire watermelon into a coin slot. If it returns zero rows, it’s usually treated as a NULL value, which brings its own bag of fun surprises we’ll get to.

Here’s the simplest possible example. Let’s say you want a list of all products, but alongside each product, you also want to show the most recent price ever paid for any product. A weird request, but I’ve seen weirder.

SELECT
  product_name,
  list_price,
  (SELECT MAX(unit_price) FROM order_items) AS highest_price_ever
FROM products;

That little subquery in the parentheses (SELECT MAX(unit_price) FROM order_items) is our scalar subquery. It runs once, returns one value (the maximum price), and that value is slapped onto every single row of the outer query. Efficient? For a small dataset, maybe. For a massive one, it’s a bit daft, but it demonstrates the point.

Where You Can Slot Them In

You’re not limited to the SELECT clause. You can use these things anywhere an expression is valid:

In a WHERE clause:

SELECT employee_id, first_name, last_name
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

“Show me all employees who earn more than the company average.” The subquery calculates the average once, and then each row in the outer query is compared against that single, precomputed value.

In a SET clause of an UPDATE:

UPDATE products
SET list_price = (SELECT AVG(unit_price) * 1.1 FROM order_items WHERE product_id = 123)
WHERE product_id = 123;

“Set the list price for product 123 to be 10% above the average price it has actually sold for.” This is more dangerous and interesting because the subquery is now correlated—it references the outer table. We’ll get to that beast next.

The NULL Pitfall

Remember I said an empty subquery returns NULL? This is where you can get logically burned. Let’s say you’re feeling clever and write this:

SELECT product_name, (SELECT unit_price FROM order_items WHERE product_id = 999) AS last_price
FROM products;

You’re hoping to get the last price for product 999. But if product 999 has never been ordered, that subquery returns zero rows, which becomes NULL. So your result for last_price is just NULL. This might be what you want! But often, it’s not. You might have intended to use a LEFT JOIN to handle the missing data more explicitly. This silent NULL-ification is a common source of “I don’t understand why this isn’t working” bugs.

Performance: The Hidden Cost

Here’s the thing the manual often glosses over: that innocent-looking scalar subquery in the SELECT clause? It might run once for every single row in the outer table if the optimizer can’t flatten the query. In the very first example, the (SELECT MAX(unit_price) FROM order_items) query is harmless; it has no dependency on the outer query, so a good optimizer will run it exactly once and just reuse the value (this is called “constant folding”).

But if you do something like this:

SELECT
  product_id,
  product_name,
  (SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.product_id) AS order_count
FROM products p;

You’ve just created a correlated subquery (more on that next). For each product row p, the database has to stop, run the COUNT(*) query against the order_items table for that specific product_id, and then continue. This is a recipe for an N+1 query problem, a classic performance killer. For a few hundred products, you’ll never notice. For ten thousand, you’ll be waiting for a while. In this specific case, you’d almost always be better off with a GROUP BY and a LEFT JOIN. Use scalar subqueries judiciously, not as a default tool for every problem.

The takeaway? Scalar subqueries are incredibly powerful and expressive. They let you write very clear, logical SQL. But you must respect the rules: one row, one column. And you must be acutely aware that you’re potentially asking the database to do a lot of extra work, hidden inside what looks like a simple expression.