Text | mikePietsch.com

6.6 Pattern Matching: LIKE, ILIKE, and SIMILAR TO

Alright, let’s talk about finding stuff. You’ve got a database full of text, and you need to pull out the rows where a column kind of matches what you’re looking for. Maybe you need everyone whose name starts with ‘Mc’, or all error codes that look like ‘404-%-something’. This is where we leave the clean, well-lit house of equality operators (=) and venture into the wild, sometimes messy, woods of pattern matching.

6.5 String Functions: length, substring, position, trim, upper, lower

Right, let’s talk about making your text do what you want. You’ve got strings in your database, and staring at them isn’t going to change them. You need tools. The functions we’re about to cover are your basic wrench and screwdriver set. They’re simple, they’re essential, and if you don’t know how to use them, you’re going to end up trying to hammer in a screw. I’ve seen it. It’s not pretty.

6.4 Database Encoding: UTF-8, SQL_ASCII, and Collations

Right, let’s talk about the alphabet soup of database text storage. You’ve probably already created a varchar column or two, but the real decisions—the ones that will either save your bacon or cause a multi-byte, accented-character-filled nightmare at 2 AM—happen at the encoding level. This isn’t academic; it’s the bedrock of storing anything beyond “Hello World.” Think of encoding as the dictionary your database uses to translate bits into letters. If you tell it to use an English dictionary (like LATIN1) and then try to store a Japanese character (こんにちは), it’s going to have a complete meltdown. The only sane choice for a modern application is UTF-8. It’s the one dictionary to rule them all, capable of storing pretty much every character from every human writing system, plus emojis. Yes, your database can store that pile of poop emoji (💩). You’re welcome.

6.3 char(n): Fixed-Length Strings and Padding Behavior

Now we arrive at the char(n) type, the most misunderstood and, frankly, most often misused character type in the SQL universe. Its defining characteristic is that it is fixed-length. When you define a column as char(10), every single value in that column is exactly 10 characters long. Always. No exceptions. The database engine enforces this with a zeal usually reserved for bouncers at an exclusive club. If you insert a string shorter than the defined length, say ‘cat’ into a char(10) column, PostgreSQL doesn’t just store those three characters. Oh no. It pads the string with spaces until it reaches the specified length. It becomes ‘cat ’ (with seven trailing spaces). When you retrieve that value, by default, the trailing spaces are stripped off, making it seem like the padding never happened. This is the source of much confusion and many a quiet bug.

6.2 varchar(n): Length-Limited Strings and When to Use Them

Alright, let’s talk about varchar(n). This is the data type you’ll use 99% of the time when you’re storing strings that aren’t the size of a small novel. Think of it as a string with a maximum length you get to define—the (n) part. It’s the “I know this field should be roughly this long, but I’m not a monster, I’ll let you use less space if you want” option. It’s the workhorse, and you need to understand its quirks, because they will bite you if you’re not careful.

6.1 text: The Unlimited Variable-Length Type

Right, let’s talk about text. This is the one you’ll use 95% of the time you need to store anything longer than a simple word or two. It’s PostgreSQL’s workhorse for storing strings of practically any length you can throw at it. Forget about pre-defining a maximum length; that’s the whole point. It’s a variable-length type, meaning it only takes up the space it needs (plus a tiny bit of overhead), which is exactly what you want most of the time.