9.7 Inodes: What They Are and Why They Matter for Links
Alright, let’s get into the weeds. You’ve probably heard the term “inode” thrown around and maybe nodded along without really getting it. That’s fine. Everyone does at first. But if you want to truly understand how your system handles files—especially when we talk about links—you need to wrap your head around this concept. It’s the secret sauce, and it’s actually pretty elegant once the scales fall from your eyes.
Think of a file on your disk not as a single, monolithic thing, but as being split into two parts.
- The Data: The actual, honest-to-goodness contents of the file—the bits and bytes that make up your brilliant Python script or that embarrassing photo of your cat. This is stored in one or more blocks on the disk.
- The Metadata: This is almost everything else about the file: its size, who owns it, who can read it, when it was last modified, and (crucially) pointers to where those data blocks live on the disk. This collection of metadata is stored in a data structure called an inode (index node). The one thing that is not in there is the file’s name. That lives in the directory, which maps names to inode numbers.
Your filesystem maintains a big table of these inodes. It’s like the world’s most detailed library card catalog, but for bytes. The filename you lovingly type into commands like ls or cp? The system barely cares about it. It uses the filename to look up the inode number, and then it’s off to the races with the inode itself.
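To see the name-to-inode lookup for yourself, here is a minimal sketch. It assumes a POSIX-ish shell with `mktemp`, `ls -i`, and `awk`; the file names `notes.txt` and `renamed.txt` are made up for illustration. Renaming a file within a directory rewrites the directory entry, and the inode number should come through unchanged.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
echo "hello" > "$workdir/notes.txt"

# ls -i prints the inode number in front of the name.
inode_before=$(ls -i "$workdir/notes.txt" | awk '{print $1}')

# A rename within the same directory only changes the directory entry.
mv "$workdir/notes.txt" "$workdir/renamed.txt"
inode_after=$(ls -i "$workdir/renamed.txt" | awk '{print $1}')

echo "before: $inode_before  after: $inode_after"
rm -r "$workdir"
```

The two numbers printed will differ from machine to machine, but they should match each other.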
The Inode Itself: A Peek Under the Hood
So what’s actually in an inode? You can see most of this info with the stat command. Let’s run it on a file.
$ stat myfile.txt
File: myfile.txt
Size: 1234 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 26351221 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ user) Gid: ( 1000/ user)
Access: 2023-10-26 14:30:00.000000000 -0400
Modify: 2023-10-26 14:29:57.000000000 -0400
Change: 2023-10-26 14:29:57.000000000 -0400
Look at that beautiful data. The inode number (26351221) is the unique ID for this file’s metadata on this filesystem. The “Links” count is 1, which is our segue into the next big topic. You also see all the classic info: permissions, size, timestamps. The “Change” time (ctime) is particularly sneaky. It is not a creation time, and it is not the same thing as “Modify.” It records the last change to the inode itself, so it moves when permissions, ownership, or the link count change, and also whenever the content is modified (because that updates the size and mtime stored in the inode). A common “gotcha” that trips up even seasoned pros.
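A quick way to convince yourself that a metadata-only change moves ctime while leaving mtime alone is sketched below. It assumes GNU coreutils `stat`, where `-c %Y` prints mtime and `-c %Z` prints ctime as epoch seconds (BSD and macOS `stat` use different flags); the file name `demo.txt` is just a placeholder.

```shell
# Assumes GNU stat: %Y = mtime, %Z = ctime, both in seconds since the epoch.
workdir=$(mktemp -d)
f="$workdir/demo.txt"
echo "data" > "$f"

mtime_before=$(stat -c %Y "$f")
ctime_before=$(stat -c %Z "$f")

sleep 1          # make the difference visible at one-second resolution
chmod u+x "$f"   # metadata-only change: the contents are untouched

mtime_after=$(stat -c %Y "$f")
ctime_after=$(stat -c %Z "$f")

echo "mtime: $mtime_before -> $mtime_after"
echo "ctime: $ctime_before -> $ctime_after"
rm -rf "$workdir"
```

The mtime pair should be identical, while the second ctime value should be at least one second later than the first.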
Hard Links: The Inode’s Party Trick
This is where inodes stop being academic and start getting useful. A hard link is not a copy of a file. It’s not a shortcut. It is a second, completely equal directory entry (a filename) that points to the exact same inode.
Think of the inode as the actual, physical dog. A hard link is just another name tag you hang on its collar. You can call it “Fido” from the front door and “Mr. Fluffypants” from the back door, but it’s the same dog. Let’s create one.
# Create a file
$ echo "Important Data" > original.txt
# Create a hard link to it
$ ln original.txt hardlink.txt
# Now let's inspect their inodes
$ ls -li original.txt hardlink.txt
26351222 -rw-r--r-- 2 user user 15 Oct 26 14:35 hardlink.txt
26351222 -rw-r--r-- 2 user user 15 Oct 26 14:35 original.txt
Boom. Same inode number (26351222), same everything. Notice the link count is now 2. The system knows two filenames are pointing to this inode. This is why “deleting a file” is actually a misnomer. When you rm original.txt, you’re just removing one of the links to the inode. The data blocks aren’t freed until that link count drops to zero and no process still has the file open. Directories are the exception to the party: you generally can’t create a hard link to one (ln dir otherdir fails), because the potential for creating cyclical nightmares in the filesystem tree is too high, so the designers wisely just said “Nope.”
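The link-count bookkeeping can be watched directly. This is a small sketch that assumes GNU coreutils `stat` (`-c %h` prints the hard link count) and runs in a scratch directory rather than on your real files.

```shell
workdir=$(mktemp -d)
echo "Important Data" > "$workdir/original.txt"
ln "$workdir/original.txt" "$workdir/hardlink.txt"

# Assumes GNU stat: %h is the number of hard links to the inode.
links_before=$(stat -c %h "$workdir/hardlink.txt")   # 2

# Remove one name; the inode and its data survive under the other.
rm "$workdir/original.txt"

links_after=$(stat -c %h "$workdir/hardlink.txt")    # 1
contents=$(cat "$workdir/hardlink.txt")

echo "links: $links_before -> $links_after, contents: $contents"
rm -r "$workdir"
```

Only the final `rm -r` takes the count to zero, and that is the point where the blocks actually become reclaimable.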
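The massive limitation? Hard links cannot span filesystems. Why? Because the inode number is only unique within its own filesystem. My inode 26351222 on /dev/sda1 means something completely different on /dev/sdb1. A directory entry holds just a name and an inode number, and that number is always interpreted within the directory’s own filesystem, so it has no way to name an inode that lives somewhere else. The kernel refuses the attempt with a cross-device link error.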
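If you want to know ahead of time whether a hard link can work, one approach is to compare device IDs. The sketch below assumes GNU coreutils `stat` (`-c %d` prints the device number), and `same_filesystem` is a hypothetical helper written for this example, not a standard command.

```shell
# Hypothetical helper: succeeds if both paths report the same device ID.
# Assumes GNU stat, where %d is the device number of the containing filesystem.
same_filesystem() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

workdir=$(mktemp -d)
touch "$workdir/a.txt"

# A file and the directory that contains it share a filesystem here.
if same_filesystem "$workdir/a.txt" "$workdir"; then
    result="same filesystem: a hard link is possible"
else
    result="different filesystems: ln would fail with a cross-device error"
fi
echo "$result"
rm -r "$workdir"
```

When the device IDs differ, GNU `ln` typically reports “Invalid cross-device link,” which is the kernel’s EXDEV error. Some filesystems with subvolumes can report different device IDs within one mount, so treat this check as a hint rather than a guarantee.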
Soft Links: The Pragmatic Illusionists
This limitation is why soft links (or symbolic links, symlinks) exist. They are the filesystem equivalent of a post-it note that says “The file you’re looking for is actually over there.” A symlink is a very special file whose data is just a path to another file. It has its own inode and everything.
# Create a symbolic link
$ ln -s original.txt symlink.txt
# Inspect it with ls
$ ls -li original.txt symlink.txt
26351222 -rw-r--r-- 2 user user 15 Oct 26 14:35 original.txt
26351223 lrwxrwxrwx 1 user user 12 Oct 26 14:40 symlink.txt -> original.txt
See the difference? symlink.txt has its own inode (26351223), its own distinct metadata, and its file type is l for link. Its permissions are usually a wide-open 777 because the kernel checks the permissions of the target file, not the link. It’s just a pointer.
The big advantage is that it can point to anything, anywhere, even across filesystems or to a non-existent file (these are called “dangling links” and they’re a fantastic way to confuse your future self). The downside? If you delete original.txt, your symlink.txt becomes a broken promise: the link itself still exists, but it points to nothing, and you get a No such file or directory error when you try to use it.
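Here is a short sketch of how a symlink goes stale and how you might detect it. It assumes a shell whose `test` supports `-L` and `-e`, plus the `readlink` utility; everything happens in a scratch directory.

```shell
workdir=$(mktemp -d)
echo "Important Data" > "$workdir/original.txt"

# The target is stored as a relative path, resolved from the link's own directory.
ln -s original.txt "$workdir/symlink.txt"

# readlink prints the stored path, not the contents of the target.
target=$(readlink "$workdir/symlink.txt")

rm "$workdir/original.txt"

# -L is true for any symlink; -e follows the link and fails if the target is gone.
if [ -L "$workdir/symlink.txt" ] && [ ! -e "$workdir/symlink.txt" ]; then
    state="dangling"
else
    state="ok"
fi
echo "symlink.txt -> $target ($state)"
rm -r "$workdir"
```

Note that the link still remembers its target path after the target is removed. If a new original.txt appears later, the link starts working again.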
Why This All Matters: Beyond the Theory
So, why should you care? Because this distinction bites people all the time.
- Backups: tar without the -h option will happily back up a symlink as a symlink. If you restore that archive, you restore the link, which might now be broken. It won’t follow the link and back up the actual data. Hard links are a different story: tar records a second name for an inode it has already archived as a link, but tools that don’t track inodes (rsync without -H, for example) treat each name as a separate file, potentially backing up the same data multiple times. You need to know which behavior you’re getting.
- Deletion: rm is final. There is no trash can for a file whose link count hits zero. The space is marked as free and will be overwritten eventually. With symlinks, you have to be careful you’re deleting the link and not the target. rm symlink.txt removes only the link. The trap is a trailing slash on a symlink to a directory: a path like linkdir/ (a hypothetical link name) is resolved through the link to the target directory, so a recursive rm on it can, depending on your rm implementation and version, act on the target’s contents instead of the link. Bash tab completion often appends that slash for you, which is how people end up here.
- Finding Stuff: Need to find all files pointing to the same data? Use find . -samefile original.txt. It will find both original.txt and hardlink.txt because they share an inode. It won’t find symlink.txt because that’s a separate file with its own inode.
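The “Finding Stuff” point is easy to check in a scratch directory. This sketch assumes a `find` that supports `-samefile` (GNU and BSD `find` both do) and reuses the file names from the examples above.

```shell
workdir=$(mktemp -d)
echo "Important Data" > "$workdir/original.txt"
ln "$workdir/original.txt" "$workdir/hardlink.txt"
ln -s original.txt "$workdir/symlink.txt"

# -samefile matches names that share original.txt's inode.
# The subshell keeps the cd from leaking; sort gives a stable order.
matches=$(cd "$workdir" && find . -samefile original.txt | sort)
echo "$matches"
rm -r "$workdir"
```

Both hard-linked names should be listed and the symlink should be absent, since `find` does not follow symlinks unless you ask it to with -L.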
The inode is the fundamental unit of the filesystem. Filenames are just friendly aliases we humans use to reference them. Once you internalize that model, everything from linking to file recovery starts to make a whole lot more sense.