5.4 byte (alias for uint8) and rune (alias for int32)

Right, let’s talk about byte and rune. These two are the aliases in the room. They don’t introduce new behavior, but they give a massive hint about intent. Using them is like saying, “I’m not just storing a number; I’m storing a meaning.”

The Humble `byte` (a.k.a. `uint8`)

type byte = uint8 is its entire definition. It’s just a friendly, semantic alias for an unsigned 8-bit integer, and the compiler treats the two names as the same type. So why does it exist? Because we constantly deal with 8-bit data. Think about it: raw memory, network packets, and, most importantly, the contents of every string, which is a read-only sequence of bytes under the hood. Using byte instead of uint8 is you telling everyone (and your future self), “This isn’t just any number from 0 to 255; this is a piece of data.”
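To make the "same type" point concrete, here is a minimal sketch. The helper name toHex is made up for illustration; the point is only that a []uint8 can be handed to a function that asks for []byte without any conversion.

```go
package main

import "fmt"

// toHex is a hypothetical helper that treats its input as raw data
// and renders it as a hexadecimal string.
func toHex(data []byte) string {
	return fmt.Sprintf("%x", data)
}

func main() {
	// Declared as []uint8, passed where []byte is expected, no conversion:
	// byte and uint8 are one type with two spellings.
	raw := []uint8{0x47, 0x6f}
	fmt.Println(toHex(raw)) // 476f

	var b byte = 255
	var u uint8 = b             // plain assignment, no cast
	fmt.Printf("%T %T\n", b, u) // uint8 uint8
}
```

Notice that %T reports uint8 for both variables: the alias exists for human readers, not for the runtime.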

Here’s the classic use case: converting a string to a byte slice to get at its raw bytes.

package main

import "fmt"

func main() {
    s := "Go is cool"
    byteSlice := []byte(s)
    fmt.Printf("Bytes: %v\n", byteSlice) // Bytes: [71 111 32 105 115 32 99 111 111 108]

    // Let's see what the first byte value is
    fmt.Printf("First byte: %d, which is the character '%c'\n", byteSlice[0], byteSlice[0])
    // First byte: 71, which is the character 'G'
}

The pitfall here, which we’ll get into more with strings, is that not all characters are a single byte. But for raw, binary data, byte is your go-to. When you read a file with os.ReadFile (which replaced the now-deprecated ioutil.ReadFile in Go 1.16), the signature says []byte rather than []uint8. The compiler sees the same type either way, but the spelling tells you that you’re dealing with data, not a list of numbers.

The Slightly-More-Sophisticated `rune` (a.k.a. `int32`)

type rune = int32. This one is a bit more interesting. A rune is a Go term for what Unicode calls a code point. It’s a single value representing a Unicode character.

Now, why is it an int32? Because Unicode code points range from 0 to 0x10FFFF, which needs 21 bits. A uint8 (a byte) would be laughably insufficient, and even 16 bits falls short, so a 32-bit integer is the smallest standard size that fits. Using rune screams, “This is meant to represent a textual character!”

The most common place you’ll meet it is when you range over a string. A for range loop decodes the UTF-8 for you and hands you one rune per iteration.

package main

import "fmt"

func main() {
    s := "Hello, 世界" // 世界 is Chinese for "world"

    // Doing it the 'byte' way (the wrong way for characters)
    fmt.Println("By byte:")
    for i := 0; i < len(s); i++ {
        fmt.Printf("%c ", s[i])
    }
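    fmt.Println()
    // Output: H e l l o ,   ä ¸ and then more garbage, including
    // control characters that your terminal may not display at all.

    // Yikes. That's because the Chinese characters are multi-byte in UTF-8.
    // The loop prints each byte as if it were its own code point
    // (0xE4 becomes 'ä', for example), so the text comes out mangled.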

    // Doing it the 'rune' way (the correct way)
    fmt.Println("By rune:")
    for index, runeValue := range s {
        fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
    }
    // Output: 
    // U+0048 'H' starts at byte position 0
    // U+0065 'e' starts at byte position 1
    // ...
    // U+4E16 '世' starts at byte position 7
    // U+754C '界' starts at byte position 10
}

See the difference? The range loop iterates by rune, not by byte. It handles the UTF-8 decoding for you, giving you the correct int32 value for each character and the byte index where it starts. This is why you almost always use for range with strings.

Common Pitfalls and The Great Misunderstanding

The biggest “gotcha” is assuming len(string) gives you the number of characters. It does not. It gives you the number of bytes.

// Requires: import "unicode/utf8"
s := "世界"
fmt.Println("Bytes:", len(s)) // Bytes: 6 (each Chinese character is 3 bytes in UTF-8)
fmt.Println("Runes:", utf8.RuneCountInString(s)) // Runes: 2

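If you need the character count, use utf8.RuneCountInString(s). You’ll also see people convert the string to a []rune and take its length, but be warned: both approaches are O(n) because they must decode the entire string, and the conversion additionally allocates a new slice. Reach for []rune when you need to index or rearrange individual characters, not just count them.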

runeSlice := []rune(s)
fmt.Println("Runes in slice:", len(runeSlice)) // Runes in slice: 2
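Where the []rune conversion does earn its allocation is when you need to move characters around. A minimal sketch, with reverse as a made-up helper name:

```go
package main

import "fmt"

// reverse is a hypothetical helper that reverses a string rune by rune,
// so multi-byte characters stay intact.
func reverse(s string) string {
	r := []rune(s) // decodes the whole string and allocates a new slice
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r) // re-encodes the runes as UTF-8
}

func main() {
	fmt.Println(reverse("Go 世界")) // 界世 oG
}
```

Swapping the raw bytes instead would scramble the three-byte sequences and produce invalid UTF-8.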

Another pitfall is directly indexing a string. s[0] returns a byte (type uint8), not a rune. This is only safe for strings you know are purely ASCII, like “ABC”. For any real-world text, it’s a recipe for corrupted output.

Best Practices: A Simple Rule of Thumb

For raw data: Use byte. Think files, network streams, and crypto hashes.
For textual characters: Use rune. Think iterating over a string with range, or processing user input character-by-character.
For general math: Use the explicit integer types (int, uint8, int32). If you’re counting apples or calculating a checksum, the intent is a number, not a symbol.
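Here is one way those three rules might look in function signatures. Both helpers are invented for illustration, and the checksum is a deliberately naive additive one.

```go
package main

import "fmt"

// checksum takes raw data ([]byte) and returns a plain number (uint8).
// It is a naive additive checksum; the sum wraps around modulo 256.
func checksum(data []byte) uint8 {
	var sum uint8
	for _, b := range data {
		sum += b
	}
	return sum
}

// countRune works on text, so the thing it searches for is a rune.
func countRune(s string, target rune) int {
	n := 0
	for _, r := range s {
		if r == target {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(checksum([]byte("Go")))     // 182
	fmt.Println(countRune("世界世", '世')) // 2
}
```

A reader can tell from the signatures alone which values are data, which are characters, and which are just numbers.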

The designers nailed this one. The aliases add clarity without adding complexity. They’re a free comment baked right into your code’s type system. Use them.
