10.6 strings: Extracting Printable Text from Binary Files
Right, so you’ve got a file. It’s gibberish. Your text editor is having a full-blown existential crisis trying to display it. But you know there’s something human-readable in there—maybe a hardcoded password, a version string, or the secret URL it’s phoning home to. This is where strings becomes your best friend. It’s the digital equivalent of panning for gold in a river of mud. It doesn’t care about file formats, structure, or encoding (well, mostly, we’ll get to that). It just sifts through the bytes and yells out anything that looks like text.
The premise is gloriously simple: strings scans a file and prints any sequence of at least four (by default) printable characters followed by an unprintable one (like a null byte or a control character). It’s the tool you use when cat makes your terminal go nuts and less gives up on life.
The Default Behavior and Why Four is the Magic Number
Let’s start with the basics. You just throw a filename at it.
strings suspicious_binary
It will spit out a waterfall of text. Now, why the default minimum of four characters? It’s a beautifully pragmatic hack to filter out noise. In a binary file, two or three printable characters in a row happen all the time by pure chance. A random jF! might be part of an opcode or a memory address. But a sequence of four or more? The probability of that being accidental plummets, making it far more likely you’re looking at actual intentional text. It’s the difference between seeing a single lit pixel and seeing a word spelled out in lights.
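To see the four-character rule in action, here’s a minimal sketch. It builds a throwaway file with printf (the contents are invented for the demo) and assumes a GNU-style strings on your PATH.

```shell
# Build a small test file: a 3-char run, a 4-char run and a longer run,
# each terminated by a NUL byte. (Demo data only.)
tmp=$(mktemp)
printf 'abc\0abcd\0hello world\0' > "$tmp"

# With the default minimum of 4, "abc" is dropped and the other two survive.
out=$(strings "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

The three-character run vanishes without a trace, which is exactly the noise filtering described above.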
Controlling the Noise: The -n Flag
Sometimes four is too many. If you’re looking for, say, a two-letter country code or a three-character file extension buried in the muck, you’ll need to lower the bar. That’s what the -n flag is for.
strings -n 2 suspicious_binary
Be warned: this will unleash a torrent of nonsense. You’re telling it to be less strict, so it will be. It’s a trade-off. Use this when you have a specific, short string in mind and are prepared to grep through the output.
Conversely, you can raise the limit to -n 10 to only get very long strings, which is great for cutting through the noise if you’re only interested in substantial blocks of text.
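Here’s a rough sketch of both directions on the same made-up file: -n 2 lets the short fragments through, -n 10 keeps only the long message. The file contents are hypothetical and the example assumes GNU strings.

```shell
# Demo file: a 2-char code, a 3-char extension, a 4-char word and one long
# message, separated by NUL or control bytes. (Invented data.)
tmp=$(mktemp)
printf 'US\0\001\002.so\0abcd\0a much longer message\0' > "$tmp"

# Lower the bar: every run of 2 or more printable characters is reported.
short=$(strings -n 2 "$tmp")

# Raise the bar: only runs of 10 or more printable characters are reported.
long=$(strings -n 10 "$tmp")

printf 'with -n 2:\n%s\n\nwith -n 10:\n%s\n' "$short" "$long"
rm -f "$tmp"
```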
Not All Text is ASCII: The -e Flag (or Lack Thereof)
Here’s the first rough edge. The classic UNIX strings command primarily deals with 7-bit ASCII text. The modern GNU version (which is what you probably have on Linux) is smarter and can handle other encodings, but the flags are… a mess.
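The -e flag lets you specify the character size and byte order. The problem? The options are terse and easy to mix up: -e s for 7-bit characters (the default), -e S for 8-bit characters, -e l for 16-bit little-endian (like UTF-16LE), -e b for 16-bit big-endian, and -e L and -e B for 32-bit little-endian and big-endian. It feels like a vestigial organ from a bygone era.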
strings -e l utf16_encoded_file
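The good news? Plain ASCII is a valid subset of UTF-8, so for most strings in most binaries you can run strings with no flags at all. The bad news: GNU strings does not auto-detect encodings. By default it looks only for 7-bit characters, so UTF-16 text (common in Windows executables, where every ASCII character is followed by a zero byte) stays invisible until you pass -e l, and UTF-8 text gets cut at the first non-ASCII character. Recent binutils releases add a -U (--unicode) option for handling UTF-8 sequences; check your man page to see whether yours has it. The lesson here: test it. If you’re not finding what you know should be there, rerun with -e l, -e b, or -e S before concluding it isn’t there.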
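A quick way to convince yourself that the encoding flag matters is to fake a UTF-16LE string by hand. This is a sketch with an invented payload, and it assumes GNU strings (the BSD and macOS versions don’t take -e).

```shell
# "secret" as UTF-16LE: each ASCII character followed by a zero byte,
# then a 16-bit zero terminator. (Demo data only.)
tmp=$(mktemp)
printf 's\0e\0c\0r\0e\0t\0\0\0' > "$tmp"

# Default scan: every printable run is one character long, so nothing is reported.
plain=$(strings "$tmp")

# 16-bit little-endian scan: the same bytes now read as one six-character string.
wide=$(strings -e l "$tmp")

printf 'default: [%s]\n-e l:    [%s]\n' "$plain" "$wide"
rm -f "$tmp"
```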
Finding Where a String Lives with -t
This is where strings goes from “neat” to “indispensable for reverse engineering”. When you find a juicy string, the immediate next question is: “Where in the file is this?” The -t flag answers that by prefixing each string with its offset within the file.
You can specify the format: -t x for hexadecimal (my personal favorite, it’s what debuggers use), -t d for decimal, or -t o for octal.
strings -t x suspicious_binary
This might output:
1020 .bashrc
1040 /bin/sh
1a80 CFBundleVersion
Now you know that the string “/bin/sh” starts 0x1040 bytes into the file. Keep in mind that this is a file offset, not the address the string will have once the binary is loaded into memory. You can then use a hex dump tool like xxd or hexdump to jump directly to that location and see what’s around it. Is it near a function that calls system()? This is how you connect the dots.
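The offset-then-dump workflow can be sketched end to end like this. The file is a stand-in built on the spot, and od is used for the dump only because it ships with coreutils; xxd -s or hexdump -s would do the same job.

```shell
# Stand-in binary: 16 zero bytes of padding, then a NUL-terminated string,
# so the string starts at offset 0x10. (Demo data only.)
tmp=$(mktemp)
head -c 16 /dev/zero > "$tmp"
printf '/bin/sh\0' >> "$tmp"

# Ask strings for hexadecimal offsets and keep the first column.
off=$(strings -t x "$tmp" | awk '{print $1}')
echo "string found at offset 0x$off"

# Dump 8 bytes starting at that offset to see the string in place.
od -A x -c -j "$((0x$off))" -N 8 "$tmp"

rm -f "$tmp"
```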
Piping and Filters: Playing Well with Others
strings is a filter in the classic UNIX sense. You can pipe data into it, which is fantastically useful for looking at things that aren’t just files on disk.
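# Check a running process's memory for strings. Reading /proc/1234/mem from offset 0 fails with an I/O error (nothing is mapped there), so dump the process with gdb's gcore first
gcore -o /tmp/core 1234
strings /tmp/core.1234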
# See if there's anything readable in a network packet capture
tcpdump -i eth0 -w - | strings
# Scan a raw block device, skipping the first 1 GiB (offsets are relative to the start of the stream, not the start of the device)
dd if=/dev/sdb bs=1M skip=1024 | strings -t x
This makes it a universal first-step tool for any blob of data.
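If you want to try the filter behavior without root, a live process, or a spare disk, here’s a small stand-in: printf plays the part of the data source. The bytes and the example.com URL are placeholders, and reading from standard input when no file is named is GNU strings behavior.

```shell
# Simulated blob on stdin: control bytes, a request line, two high bytes,
# and a short trailing word. (Invented data; example.com is a placeholder.)
found=$(printf '\001\002\003GET http://example.com/update\0\377\376tail\0' | strings -n 6)

# With -n 6 the four-character "tail" is filtered out and only the request line remains.
printf '%s\n' "$found"
```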
The Biggest Pitfall: It’s a Blunt Instrument
Let’s be brutally honest: strings is dumb. It has no concept of context. Modern GNU strings scans the whole file by default, so it will happily extract text from the code section (.text), the read-only data section (.rodata), the symbol tables, the debug sections, and, most dangerously, from compressed or encrypted blobs whose random bytes happen to look like text. (The one place it can’t look is .bss: uninitialized data takes up no bytes in the file, so there is nothing there to scan.) Just because you find the string “admin123” in a binary doesn’t mean it’s a password; it could be a leftover fragment from a variable name, a log message, or pure chance. You must always corroborate what you find. Use strings to generate leads, not conclusions. Pair it with grep to narrow things down and a disassembler to understand the context. It’s the starting pistol, not the finish line.
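As a sketch of the “leads, not conclusions” workflow, here’s one way to pair strings with grep. The file contents and the search pattern are invented for illustration; tune the pattern to whatever you’re hunting for.

```shell
# Stand-in binary with three embedded strings. (Demo data only.)
tmp=$(mktemp)
printf '\001\002admin123\0\003log: login failed for %%s\0\004https://example.com/api\0' > "$tmp"

# Keep the offsets so each hit can be examined in a hex dump or disassembler later,
# then filter for credential-looking words and URLs.
hits=$(strings -t x "$tmp" | grep -E -i 'admin|https?://')
printf '%s\n' "$hits"

rm -f "$tmp"
```

Each surviving line is a lead with an offset attached; whether “admin123” is a password or a leftover test fixture is a question for the surrounding code.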