6.6 Pattern Matching: LIKE, ILIKE, and SIMILAR TO
Alright, let’s talk about finding stuff. You’ve got a database full of text, and you need to pull out the rows where a column kind of matches what you’re looking for. Maybe you need everyone whose name starts with ‘Mc’, or all error codes that look like ‘404-%-something’. This is where we leave the clean, well-lit house of equality operators (=) and venture into the wild, sometimes messy, woods of pattern matching.
PostgreSQL gives you three main tools for this job: the straightforward LIKE, its case-insensitive cousin ILIKE, and the old, powerful, but frankly kind of weird SIMILAR TO. We’ll cover them in order of how much you’ll actually want to use them.
The Workhorses: LIKE and ILIKE
These two are your daily drivers. Their logic is simple: you provide a pattern string where the percent sign % matches any sequence of characters (including zero characters), and the underscore _ matches any single character.
-- Find all products with a name that begins with 'Super'
SELECT * FROM products WHERE name LIKE 'Super%';
-- Find all users with a four-letter first name starting with 'J' and ending with 'e'
SELECT * FROM users WHERE first_name LIKE 'J__e';
Why does LIKE exist when we have regular expressions? Because it’s brutally simple and, for many common indexable patterns, it’s fast. If your pattern starts with a literal string (like 'Super%'), Postgres can often use a standard B-tree index to speed things up, which is a huge win. A pattern like '%something' that starts with a wildcard, however, will always force a full table scan. That’s just the way it is.
Now, let’s address the elephant in the room: case sensitivity. LIKE follows the rules of your database’s collation. In most setups, 'A' does not equal 'a'.
-- This might return zero rows, because 'postgres' != 'Postgres'
SELECT * FROM events WHERE description LIKE 'Postgres%';
Enter ILIKE. It’s a PostgreSQL extension (so not standard SQL) that does everything LIKE does, but ignores case. It’s a lifesaver.
-- This will find 'Postgres', 'postgres', 'POSTGRES', you name it.
SELECT * FROM events WHERE description ILIKE 'postgres%';
The catch? ILIKE can sometimes be less friendly to indexes, depending on your locale settings. If you’re doing a ton of case-insensitive matching on a huge table, look into the citext extension or expression indexes.
The Regex Poser: SIMILAR TO
Buckle up. SIMILAR TO is… a choice. It was added to PostgreSQL to bridge the gap between SQL’s LIKE and proper regular expressions. The result is a Frankenstein’s monster that uses the LIKE syntax but incorporates regex-like elements using a bizarre syntax with | for OR and {1,5} for repetition.
-- Find phone numbers in the format (555) 123-4567
SELECT phone FROM customers
WHERE phone SIMILAR TO '\([0-9]{3}\) [0-9]{3}-[0-9]{4}';
Why would you use this? Honestly, in almost no practical scenario. It’s more powerful than LIKE but far less powerful and readable than true regular expressions. The pattern above is much clearer when written with the ~ operator:
SELECT phone FROM customers
WHERE phone ~ '\([0-9]{3}\) [0-9]{3}-[0-9]{4}';
The only arguable advantage SIMILAR TO has is that it must match the entire string, just like LIKE, whereas the ~ regex operator only needs to find a match inside the string (unless you use anchors ^ and $). This is a tiny benefit that is massively outweighed by its clunky, non-standard syntax. My advice: feel free to ignore SIMILAR TO. It’s a historical artifact.
Escaping: The Crucial Detail You’ll Forget
Sooner or later, you’ll need to search for a literal percent sign or underscore. This is where everyone gets bitten. If you write LIKE '100%', that % is a wildcard, so it’ll match ‘100’, ‘1000’, ‘100 bottles of beer on the wall’… everything starting with ‘100’.
You must escape these characters using the ESCAPE clause. The default escape character is a backslash, but you can define your own. This is non-negotiable.
-- This will fail to find '100%' because it's looking for '100' plus any sequence.
SELECT * FROM metrics WHERE completion_rate LIKE '100%';
-- This is correct. It searches for the literal string '100%'.
SELECT * FROM metrics WHERE completion_rate LIKE '100\%' ESCAPE '\';
-- You can use any character as an escape. Here, we're using the letter 'e'.
SELECT description FROM log_entries WHERE description LIKE 'Error 404%e found' ESCAPE 'e';
Always, always think about your escape strategy when building LIKE patterns with user input. It’s a classic source of bugs and, in extremely rare cases, can even open up edge-case security holes if you’re building patterns directly from untrusted input.