5.5 Strings: UTF-8 Byte Sequences

Right, let’s talk about strings. You’d think a simple sequence of characters would be the least dramatic part of a programming language, but no. Rust’s strings are a masterclass in forcing you to think correctly about text upfront, saving you from a world of pain later. They are also, and I say this with affection, a bit weird at first glance.

The first thing you need to get your head around is that a String in Rust is not an array of characters. It’s not. Stop thinking that. It’s a growable, mutable, owned, UTF-8 encoded byte vector. The &str (pronounced “string slice”) is an immutable, borrowed view into UTF-8 data, whether that data lives inside a String, in your program’s binary, or somewhere else. This distinction (String for owning and modifying, &str for borrowing and viewing) is central to everything, and it’s brilliant once it clicks.

The UTF-8 Mandate

Why the obsession with UTF-8? Because the world has moved on from ASCII, and pretending otherwise is how you get software that breaks the moment someone uses an emoji or writes in Japanese. UTF-8 is the dominant encoding on the web and in most modern systems. Rust’s design forces you to handle text correctly by default, which is its way of being a good friend who stops you from embarrassing yourself internationally.

The catch? Because a single “character” as a reader perceives it (what Unicode calls a “grapheme cluster”) can span several code points, and each code point takes between one and four bytes in UTF-8, you cannot index a string by a simple numeric position. Try it, and the compiler will slap your hand.

let hello = "hello";
// This will NOT compile. Thank the compiler for saving you.
// let first_char = hello[0];

The error message is wonderfully direct. On a recent compiler it reads something like “the type `str` cannot be indexed by `{integer}`”. It’s telling you that byte-indexing might give you half of a multi-byte character, resulting in garbage. This is a feature, not a bug.
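If you really do want “the first thing” in a string, you have to say which thing you mean. Here is a minimal sketch; first_char and first_byte are hypothetical helper names made up for illustration, not standard library functions.

```rust
/// Returns the first Unicode scalar value, or None for an empty string.
/// (Hypothetical helper, for illustration only.)
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

/// Returns the first raw byte, or None for an empty string.
/// (Hypothetical helper, for illustration only.)
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}

fn main() {
    let hello = "hello";
    println!("{:?}", first_char(hello)); // Some('h')
    println!("{:?}", first_byte(hello)); // Some(104)

    let world = "世界";
    println!("{:?}", first_char(world)); // Some('世')
    println!("{:?}", first_byte(world)); // Some(228), only a third of '世'
}
```

Both helpers return an Option, so the empty-string case is something the caller has to handle rather than a surprise at runtime.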

So How Do I Get at the Characters?

You have to be explicit about how you want to view the string. You can iterate over bytes or over Unicode scalar values (which are roughly, but not exactly, characters).

let salutation = "Hello, 世界!";

// Iterate over bytes - you get each individual u8.
for byte in salutation.bytes() {
    print!("{} ", byte);
}
// Prints: 72 101 108 108 111 44 32 228 184 150 231 149 140 33

println!(); // Just a newline

// Iterate over chars - you get Rust's char type (Unicode scalar values).
for c in salutation.chars() {
    print!("{} ", c);
}
// Prints: H e l l o ,   世 界 !

Notice how each of the two Chinese characters took three bytes but came out as a single char. When you care about the text rather than its raw storage, iterating with chars() is the right way to do it.
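The gap between bytes and chars shows up as soon as you ask how long a string is. The sketch below uses a hypothetical helper named lengths (my own name, not a library function) to compare the two counts, and char_indices() to show the byte offset at which each char starts.

```rust
/// Returns (length in bytes, number of Unicode scalar values).
/// (Hypothetical helper, for illustration only.)
fn lengths(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let salutation = "Hello, 世界!";
    let (bytes, chars) = lengths(salutation);
    println!("{} bytes, {} chars", bytes, chars); // 14 bytes, 10 chars

    // char_indices() pairs each char with the byte offset where it starts.
    for (offset, c) in salutation.char_indices() {
        println!("{:>2}: {}", offset, c);
    }

    // A char is a scalar value, not always a user-perceived character:
    // 'e' followed by a combining acute accent is two chars.
    let accented = "e\u{301}";
    println!("{:?}", lengths(accented)); // (3, 2)
}
```

Note that len() is always the byte count. Counting chars means walking the whole string, which is why Rust makes you ask for it explicitly.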

The Slicing Danger Zone

You can get a slice of a string using byte indices with &my_string[0..5], but here be dragons. You must ensure your indices land on valid UTF-8 character boundaries. If you slice in the middle of a multi-byte sequence, your program will panic at runtime. As with out-of-bounds array access, Rust chooses a dramatic crash over silent corruption, and honestly, I respect the choice.

let world = "世界"; // Each character is 3 bytes in UTF-8.

// This is safe because we're slicing at a 3-byte boundary.
let safe_slice = &world[0..3];
assert_eq!(safe_slice, "世");

// This will panic at runtime. Don't do this.
// let panic_slice = &world[0..2]; // ❌

If you’re calculating an index dynamically, check it with is_char_boundary before slicing. Or better yet, structure your code around the iterator methods (chars(), char_indices()), which handle this for you.
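Here is one way that boundary check might look in practice. The function truncate_to_boundary is a hypothetical helper written for this example. It backs up from a requested byte limit until it lands on a valid boundary. The sketch also shows str::get, which returns None instead of panicking when a range is invalid.

```rust
/// Returns the longest prefix of `s` that is at most `max_bytes` long
/// and ends on a char boundary. (Hypothetical helper, for illustration.)
fn truncate_to_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    let world = "世界"; // boundaries at bytes 0, 3 and 6

    // get() is the non-panicking cousin of the [] slicing syntax.
    assert!(world.get(0..2).is_none());
    assert_eq!(world.get(0..3), Some("世"));

    println!("{}", truncate_to_boundary(world, 4)); // 世
}
```

Backing up rather than rounding forward is a design choice: it guarantees the result never exceeds the byte budget, at the cost of sometimes returning less than you asked for.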

Concatenation and the &str Coercion

This is where the magic happens. You’ll constantly be writing functions that take &str because it’s more flexible. A &String will automatically coerce (via “deref coercion”) to a &str, so you can pass a &String to a function expecting &str.

fn print_text(text: &str) {
    println!("{}", text);
}

let s = String::from("I'm a String");
let slice = "I'm a str";

// Both work perfectly.
print_text(&s); // Notice the `&` to create a reference, which coerces to &str
print_text(slice);

To concatenate strings, you usually use the + operator or the format! macro. The + operator is a little quirky but makes sense when you see what’s happening under the hood: it takes ownership of the first String and appends a &str to it.

let s1 = String::from("Hello, ");
let s2 = String::from("world!");
// Note: s1 is moved and can no longer be used here.
// s2 is still valid because it was borrowed as &str.
let s3 = s1 + &s2;

// The format! macro is often clearer and doesn't take ownership.
let s4 = format!("{}{}", s3, " How are you?");

The best practice? For simple stuff, + is fine. For anything more complex, format! is your best friend. It’s more readable and doesn’t play games with ownership.
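To make the ownership difference concrete, here is a small sketch with two hypothetical helpers, greet and concat_all (names invented for this example). The first leans on format!, which only borrows its arguments. The second appends onto a single buffer with push_str, which is a reasonable option when you are building a string up piece by piece.

```rust
/// Builds a greeting without consuming either input.
/// (Hypothetical helper, for illustration only.)
fn greet(greeting: &str, name: &str) -> String {
    format!("{}, {}!", greeting, name)
}

/// Appends every part onto one growing String.
/// (Hypothetical helper, for illustration only.)
fn concat_all(parts: &[&str]) -> String {
    let mut out = String::new();
    for part in parts {
        out.push_str(part);
    }
    out
}

fn main() {
    let hello = String::from("Hello");
    let name = String::from("world");

    let s = greet(&hello, &name);
    println!("{}", s); // Hello, world!

    // Both inputs are still usable, because they were only borrowed.
    println!("{} / {}", hello, name);

    println!("{}", concat_all(&["a", "b", "c"])); // abc
}
```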

The String Literal Lifesaver

Remember, a string literal like "hello" isn’t a String; it’s a &'static str—a string slice with a 'static lifetime that’s baked directly into your program’s binary. It’s immutable and exists for the entire program’s duration. This is why they’re so painless to use. When you need to own or modify the string, you have to explicitly convert it using to_string() or String::from().

// A borrowed view into static data.
let static_view: &'static str = "I live forever";

// An owned, mutable string on the heap.
let mut owned_string = static_view.to_string();
owned_string.push_str(" (or at least until I'm dropped)");

Embrace the distinction. Use &str in your function parameters for maximum flexibility and String when you genuinely need to own and potentially mutate the string data. It’s this kind of explicit design that makes Rust code robust and, eventually, a joy to write.
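As a closing sketch of that advice, the hypothetical function exclaim below (a name made up for this example) borrows its input as &str and hands back an owned String. Callers can pass a literal, a borrowed String, or a slice of either.

```rust
/// Borrows its input and returns a new, owned String.
/// (Hypothetical helper, for illustration only.)
fn exclaim(text: &str) -> String {
    let mut owned = text.to_string();
    owned.push('!');
    owned
}

fn main() {
    let literal = "hello";
    let owned = String::from("hi");

    println!("{}", exclaim(literal));      // hello!
    println!("{}", exclaim(&owned));       // hi!
    println!("{}", exclaim(&owned[0..1])); // h!
}
```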
