22.7 unicode and unicode/utf8: Working with Runes
Right, let’s talk about text. You’ve probably been happily using string for everything, thinking “it’s just text.” Go’s string is a fantastic abstraction, but it’s built on a lie of omission. Under the hood, a string is an immutable sequence of bytes (think of it as a read-only []byte). Not characters. Bytes. And this is where the entire world of unicode and unicode/utf8 comes crashing into our pleasant little program.

The problem is simple: the world uses far more than the 128 characters that ASCII covers. My last name has an “é”; that’s one character, but UTF-8 encodes it as two bytes (0xC3 0xA9). If you try to process it by just indexing the string (s[0], s[1]…), you get those individual bytes, neither of which is “é” on its own, and any code that treats each byte as a character produces utter garbage. The crucial concept here is the rune: Go’s name for a Unicode code point, defined as an alias for int32.
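Here is a minimal sketch of the byte-versus-rune distinction. The string “café” and the helper counts are illustrative choices, not taken from the text above. The example uses only standard features: len, utf8.RuneCountInString, indexing, for-range over a string, and conversion to []rune. The “é” is written as the escape \u00e9 so that it is the single precomposed code point U+00E9.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper: it returns the length of s
// in bytes and in runes (Unicode code points).
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "café", with é written as U+00E9, which UTF-8 encodes as 0xC3 0xA9.
	s := "caf\u00e9"

	b, r := counts(s)
	fmt.Println(b, r) // 5 4

	// Indexing yields single bytes (type byte), not characters.
	fmt.Printf("%#x %#x\n", s[3], s[4]) // 0xc3 0xa9

	// range decodes UTF-8: i is the byte offset, ch is a rune.
	// Prints 0:c 1:a 2:f 3:é (offsets are byte positions).
	for i, ch := range s {
		fmt.Printf("%d:%c ", i, ch)
	}
	fmt.Println()

	// Converting to []rune decodes the whole string into code points.
	runes := []rune(s)
	fmt.Printf("%c %U\n", runes[3], runes[3]) // é U+00E9
}
```

Note that range hands you runes but reports byte offsets, which is why there is no index 4 in the loop output: the é occupies bytes 3 and 4.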