23.3 bufio.Scanner: Line-by-Line Reading
Right, let’s talk about bufio.Scanner. This is where we graduate from the blunt-force trauma of raw Read calls to something that feels like it was designed for actual human programmers. If you’ve ever read a whole file with ioutil.ReadFile (RIP, deprecated since Go 1.16) or os.ReadFile and then split the bytes on \n yourself, you were doing the standard library’s job. Scanner exists so you don’t have to.
Think of a Scanner as a sensible, efficient iterator for your data stream. Its primary job is to take a Reader (like a file) and break it down into manageable tokens, the most common one being lines of text. It handles the buffering, the edge cases, and the memory management for you. It’s your brilliant intern that actually does the work correctly.
The Absolute Basics: Making and Using a Scanner
You create a Scanner by wrapping any io.Reader—an os.File is the classic example. The basic usage is so straightforward it almost feels like cheating.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	file, err := os.Open("data.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close() // You *are* doing this, right? Good.

	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text() // The current line, without its line terminator.
		fmt.Println("I got:", line)
	}
	// Scan() returning false means EOF *or* an error; Err() tells you which.
	if err := scanner.Err(); err != nil {
		log.Fatalf("Oops, something went wrong during scan: %v", err)
	}
}
Here’s the magic: the for scanner.Scan() loop. Scan() advances the scanner to the next token (in this default case, the next line). It returns true if it found one and false when it hits an error or EOF. Inside the loop, you grab the text of that token with scanner.Text(). After the loop, you must check scanner.Err() to see if it terminated due to an error or just a clean end-of-file. Forgetting this is a classic rookie mistake. Don’t be a rookie.
Why It’s More Than Just strings.Split()
You might think, “Why not just read the whole file and strings.Split(string(byteSlice), "\n")?” For a small file, fine, whatever. But for a 10GB log file? You’re about to have a very bad time as your program tries to allocate a 10GB slice and then another massive slice of strings. Scanner is streaming. It reads the file into a buffer one chunk at a time, finds the next newline in that chunk, and yields the line. The buffer starts small and grows only as far as the longest token requires, up to a cap of 64 KiB by default (bufio.MaxScanTokenSize), which you can change with the scanner’s Buffer method. Your memory footprint is bounded by the longest line, not by the size of the file. This is a big deal.
It’s Not Just For Lines: Custom Splitting Functions
The designers didn’t stop at lines. The real power of Scanner is its pluggable SplitFunc. The default is ScanLines, but the package also provides ScanWords, ScanRunes, and ScanBytes. You can also write your own. ScanWords, for instance, is how you’d scan a file of whitespace-separated values.
scanner := bufio.NewScanner(file)
scanner.Split(bufio.ScanWords) // Now our token is a word, not a line.
for scanner.Scan() {
	word := scanner.Text()
	fmt.Printf("Word: %q\n", word)
}
Writing a custom SplitFunc is a more advanced topic, but it’s what makes Scanner so versatile for parsing unique or tricky data formats.
The One Big Gotcha: Long Lines and bufio.ErrTooLong
Here’s the part where the designers made a… let’s call it a “pragmatic” choice. That internal buffer I mentioned is only allowed to grow so far: by default, a single token is capped at 64 KiB (bufio.MaxScanTokenSize). If a line is longer than that, the Scanner gives up. It doesn’t panic; Scan() simply returns false, and scanner.Err() reports a special error: bufio.ErrTooLong.
This is the scanner’s way of saying, “Look, I gave you a perfectly good buffer and you’re trying to shove a 128KB JSON line into it. Have some self-respect.” Your scan loop ends early, exactly as if the file had ended, and the only sign that anything went wrong is the error waiting in scanner.Err().
The solution? If you expect stupidly long lines (and in the real world, you always should), you must proactively increase the scanner’s buffer capacity.
file, err := os.Open("giant_json_lines.log")
if err != nil {
	log.Fatal(err)
}
defer file.Close()

// Handle those massive, arguably poorly-formatted, lines
const maxCapacity = 1024 * 1024 // 1 MiB, because why not?
buf := make([]byte, maxCapacity)
scanner := bufio.NewScanner(file)
// Tell the scanner to use our bigger buffer. Buffer must be called
// before the first Scan(); calling it afterwards panics.
scanner.Buffer(buf, maxCapacity)
for scanner.Scan() {
	line := scanner.Text()
	// Process your comically large line
	fmt.Println(len(line))
}
if err := scanner.Err(); err != nil {
	log.Fatal(err)
}
It’s a slight inconvenience, but it’s a small price to pay for the robustness it provides. It forces you to think about the actual shape of your data, which is a good thing.
Text() vs. Bytes()
You’ll use Text() 99% of the time because it returns a string, which is what you almost always want. But if you’re a performance zealot and want to avoid the string allocation (remember: a string is immutable, so converting bytes to a string allocates a new underlying array), you can use scanner.Bytes(). The catch? The slice returned by Bytes() is a view into the scanner’s internal buffer, and its contents may be overwritten by the next call to Scan(). This is a fantastic way to introduce heisenbugs into your program.
You must use it immediately or copy the data if you need to keep it around.
var kept [][]byte // Lines we want to hold on to after the loop.
for scanner.Scan() {
	byteSlice := scanner.Bytes() // This is only valid until the next Scan()
	// Do something with byteSlice RIGHT NOW.
	// If you need to store it, you MUST copy it:
	data := make([]byte, len(byteSlice))
	copy(data, byteSlice)
	kept = append(kept, data) // Now you can store 'data' safely.
}
Text() is safer and almost always the right call. Use Bytes() only when you’ve measured a performance problem and you know exactly what you’re doing.