6.5 String Functions: length, substring, position, trim, upper, lower
Right, let’s talk about making your text do what you want. You’ve got strings in your database, and staring at them isn’t going to change them. You need tools. The functions we’re about to cover are your basic wrench and screwdriver set. They’re simple, they’re essential, and if you don’t know how to use them, you’re going to end up trying to hammer in a screw. I’ve seen it. It’s not pretty.
First, a crucial piece of advice that will save you countless hours of frustration: remember that SQL string functions are almost universally 1-indexed, not 0-indexed. The first character is at position 1. If you start counting at zero, you’re going to get the wrong substring, your position will be off-by-one, and you’ll curse the database gods. They don’t care. They designed it this way to be more intuitive for non-programmers. We just have to deal with it.
The Core Toolkit
Let’s run through your essential string functions. We’ll use a common example: a users table with a poorly designed full_name column that’s a varchar(255) and has some messy data.
-- A sample of our messy data
SELECT full_name FROM users;
-- full_name
-- ---------------------
-- ALICE SMITH
-- bob jones
-- "Dr. Eva Pérez"
LENGTH (or CHAR_LENGTH)
LENGTH does exactly what you think: it returns the number of characters in a string. Its sibling CHAR_LENGTH is a synonym in PostgreSQL (which is what we’re mostly using here). The key thing to know is that this counts characters, not bytes. For accented characters or multi-byte Unicode, LENGTH and OCTET_LENGTH (which counts bytes) will give you different numbers.
SELECT
full_name,
LENGTH(full_name) AS char_count,
LENGTH('Pérez') AS name_char_count, -- Returns 5
OCTET_LENGTH('Pérez') AS name_byte_count -- Returns 6 in UTF-8 (because of 'é')
FROM users;
SUBSTRING
This is your go-to for extracting a piece of a string. The syntax is SUBSTRING(string FROM start [FOR count]). Remember, start is 1-indexed! If you omit the FOR count part, it grabs everything from the start position to the end.
-- Get the first 5 characters of each name
SELECT full_name, SUBSTRING(full_name FROM 1 FOR 5) AS first_five
FROM users;
-- Get everything from the 5th character onward
SELECT full_name, SUBSTRING(full_name FROM 5) AS from_fifth
FROM users;
POSITION (or STRPOS)
Need to find where a substring appears inside another string? POSITION is your friend. It returns the integer index of the first occurrence. Crucially, it returns 0 if it finds nothing, not NULL. This is a lifesaver because it means you can use it in WHERE clauses without immediately worrying about null propagation.
-- Find the position of the space in the name
SELECT full_name, POSITION(' ' IN full_name) AS space_pos
FROM users;
-- Use it to find people with no space in their name (i.e., mononyms)
SELECT full_name FROM users WHERE POSITION(' ' IN full_name) = 0;
TRIM
Data is messy. People lead and follow values with spaces. TRIM lops off those specified characters from the beginning and end of a string. The default is to trim spaces, but you can get fancy.
-- The classic: clean up whitespace
SELECT
full_name,
TRIM(full_name) AS trimmed_name
FROM users;
-- Feeling specific? Trim leading zeros from a stringified number.
SELECT TRIM(LEADING '0' FROM '000123') AS leading_zeros_gone; -- Returns '123'
UPPER and LOWER
These are the straightforward ones. They change the case of a string. UPPER is fantastic for case-insensitive searching or comparisons without worrying about the collation settings.
-- Normalize for a case-insensitive search
SELECT *
FROM users
WHERE UPPER(full_name) = UPPER('alice smith');
-- Or just clean up your output
SELECT full_name, LOWER(full_name) as lower_name
FROM users;
Putting It All Together: A Practical Example
Let’s say you want to extract the first name from that messy full_name field. You can combine these functions to build a robust solution.
-- A naive first attempt: get everything before the first space.
SELECT
full_name,
SUBSTRING(
full_name
FROM 1
FOR POSITION(' ' IN full_name) - 1
) AS presumed_first_name
FROM users;
But wait! What about our user “Dr. Eva Pérez”? The first space is after the “Dr.”, so this query would extract “Dr.” as the first name. This is the reality of string parsing. There is no one-size-fits-all solution because human names are a fractal nightmare of edge cases. The function toolkit gives you the power to handle most cases, but you must always be aware of the absurdity of the data you’re working with. Always validate your assumptions with a SELECT before you run an UPDATE. Your future self will thank you.