44.1 Linux Namespaces: The Isolation Primitive (pid, net, mnt, uts, ipc, user)

Right, let’s talk about namespaces. If cgroups are the resource accountants, namespaces are the office architects who build the walls, install the soundproofing, and give everyone a separate phone line. They are the fundamental isolation primitive in Linux. Without them, a container is just a fancy, jailed process. With them, a process can be given the utterly unshakable illusion that it is the only process on a machine, with its own network, its own hostname, and its own file system. It’s a brilliant magic trick, and like all good magic, it relies on a healthy dose of misdirection.

The kernel’s job is to manage all the system resources: processes, network interfaces, mount points, etc. A namespace simply wraps a resource in a layer of abstraction. Instead of seeing the global list of processes, a process in a PID namespace sees only the processes that are in its namespace. The kernel is the bouncer, checking the namespace ID on your process’s metaphorical wristband before it lets you see or interact with anything. Linux currently has eight types of namespaces, and we’ll walk through the big six (we’ll skip cgroup for now as it’s a bit meta, and time, a newer addition in kernel 5.6).
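The wristband is visible from user space. A minimal sketch, assuming a Linux host with /proc mounted: each entry under /proc/<pid>/ns is a symlink whose target names the namespace type and an inode number, and two processes share a namespace exactly when those targets match. The helper names ns_id and same_ns are made up for this illustration.

```shell
# ns_id TYPE [PID]: print the namespace identifier, e.g. "pid:[4026531836]".
# TYPE is one of: cgroup ipc mnt net pid time user uts.
# PID defaults to "self", i.e. whichever process reads the link.
ns_id() {
    readlink "/proc/${2:-self}/ns/$1"
}

# same_ns TYPE PID_A PID_B: succeed if both processes share that namespace.
same_ns() {
    a=$(ns_id "$1" "$2") && b=$(ns_id "$1" "$3") && [ "$a" = "$b" ]
}

# Show the six namespaces covered in this section for the current shell.
for t in ipc mnt net pid user uts; do
    ns_id "$t" "$$"
done
```

Reading the links of another user’s process generally requires root, so comparing your shell against an arbitrary PID with same_ns may need sudo.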

PID Namespaces: The Process Illusion

This is the big one. A PID namespace isolates process ID numbers. A process can be PID 1 inside its namespace—the all-important “init” process—while being, say, PID 4422 on the host system. When the “init” process in a container dies, the kernel terminates all other processes in that namespace, which is why you can docker stop a container and know everything inside it got the axe.

Here’s the fun part: this isolation is hierarchical. A parent namespace can see into its child namespaces, but not vice versa. This is how tools like ps on the host can show you all the containerized processes. They’re in a different namespace, but the host’s namespace is their ancestor, so each of them also has a (different) PID there.

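Try this. We’ll use the unshare command to launch a shell in a new PID namespace. The --fork is crucial: creating a PID namespace does not move the calling process into it, only the children it creates afterwards, so unshare has to fork a child (our shell) to become the init process of the new namespace.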

sudo unshare --pid --fork --mount-proc /bin/bash
# Now inside the new namespace
ps aux

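You’ll see a beautifully sparse process list, probably just bash and ps. You are PID 1. You feel powerful. But resist the urge to prove it by rebooting. The reboot(2) system call, made from inside a non-initial PID namespace, only takes down that namespace’s init; the reboot command on a systemd host, however, may simply ask the host’s init to do the job over a socket that a PID namespace does nothing to hide, and we are running as root. You’re PID 1 only inside the illusion. The most common pitfall here is forgetting that tools like ps rely on the /proc filesystem, which is why we use --mount-proc: it mounts a fresh /proc for the new PID namespace (inside a new mount namespace, so the host’s /proc is untouched).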
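The hierarchy can be checked from the host side. A small sketch, assuming a kernel recent enough (4.1 or later) to expose the NSpid field in /proc/<pid>/status: that field lists the PID a process has in each nested PID namespace, outermost first. The function name pid_views and the sample PID 4422 are illustrative only.

```shell
# pid_views: read a /proc/<pid>/status file on stdin and print the PID the
# process has in each nested PID namespace, outermost (host) first.
pid_views() {
    awk '/^NSpid:/ {
        for (i = 2; i <= NF; i++)
            printf "%s%s", $i, (i < NF ? " " : "\n")
    }'
}

# Sample status excerpt for a shell started with `unshare --pid --fork`:
# PID 4422 as seen from the host, PID 1 inside its own namespace.
printf 'Name:\tbash\nPid:\t4422\nNSpid:\t4422\t1\n' | pid_views
# prints: 4422 1
```

On a real host, run pid_views < /proc/4422/status from a second terminal, substituting whatever PID the host assigned to the unshared shell.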

Network Namespaces: Your Private Internet

A network namespace gives a process its own private network stack: its own interfaces, iptables rules, routing tables, and ports. When you create one, it has a loopback interface (lo) and nothing else. It’s a computer fresh off the assembly line with no Ethernet cable plugged in.

The magic of containers comes from connecting these isolated namespaces to the outside world. This is done using a virtual Ethernet (veth) pair—think a virtual network cable. One end is plugged into the container’s namespace (usually named eth0), and the other end is plugged into a bridge (like docker0) on the host. The host then acts as a router and firewall, using NAT masquerading so your container’s traffic can escape to the real world.

# Create a new network namespace named 'mynet'
sudo ip netns add mynet
# List your namespaces. See it? Good.
sudo ip netns list
# Now, let's see what interfaces exist inside 'mynet'
sudo ip netns exec mynet ip link list

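You’ll see only the lonely lo (down, of course). No wlan0, no eth0. It’s completely cut off. This is the state a container is left in when you start it with docker run --network none. By default, Docker goes further and attaches the container to its default bridge network using the veth setup described above, which is why an ordinary docker run container can reach the outside world.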
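The veth plumbing described earlier can be sketched by hand against the mynet namespace. This is an illustration, not what Docker literally runs: the interface names veth-host and veth-ns and the 10.200.0.0/24 addresses are invented for the example, and it leaves out the bridge and NAT rules a real runtime adds. Setting DRY_RUN=1 prints the commands instead of executing them (executing them requires root).

```shell
# run: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# wire_netns NS: connect the existing network namespace NS to the host
# with a veth pair. Names and addresses are illustrative assumptions.
wire_netns() {
    ns=$1
    # Create the virtual cable: two linked interfaces, both on the host.
    run ip link add veth-host type veth peer name veth-ns
    # Move one end into the namespace.
    run ip link set veth-ns netns "$ns"
    # Address and bring up the host end.
    run ip addr add 10.200.0.1/24 dev veth-host
    run ip link set veth-host up
    # Address and bring up the namespace end, plus its loopback.
    run ip netns exec "$ns" ip addr add 10.200.0.2/24 dev veth-ns
    run ip netns exec "$ns" ip link set veth-ns up
    run ip netns exec "$ns" ip link set lo up
}

# Preview the commands without touching the system.
DRY_RUN=1 wire_netns mynet
```

Run from a root shell without DRY_RUN, this should leave mynet able to reach the host end of the cable, which you can check with ip netns exec mynet ping -c 1 10.200.0.1.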

Mount Namespaces: A Unique Filesystem View

This was the first namespace implemented (hence its generic flag name, CLONE_NEWNS), and it’s a killer feature. A mount namespace gives a process its own private view of the mount tree. You can mount and unmount filesystems to your heart’s content inside a container, and the host is none the wiser. This is how a container can have a completely different root filesystem (/): the mechanics are chroot-like, though container runtimes typically use the more robust pivot_root instead of chroot itself.

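When you create a new mount namespace, it receives a copy of the mount points from its parent namespace. Subsequent changes are independent as long as mount propagation between the two is private, which the unshare command and container runtimes arrange by default; with shared propagation (the default for / on systemd hosts), mount events still flow between the namespaces. This is why you can mount /dev/sdb1 /mnt inside a container and it doesn’t suddenly appear on the host’s /mnt.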
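The independence of the two mount tables can be observed directly. A rough sketch, assuming util-linux’s unshare, root privileges for the demo function, and nothing already mounted on /mnt; the helper names is_mounted and mount_ns_demo are hypothetical.

```shell
# is_mounted DIR: read a mount table in /proc/<pid>/mounts format on stdin
# and succeed if something is mounted exactly at DIR.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }'
}

# mount_ns_demo: needs root. Mount a tmpfs on /mnt inside a new mount
# namespace, then check /mnt from inside and from outside.
mount_ns_demo() {
    unshare --mount sh -c 'mount -t tmpfs tmpfs /mnt && cat /proc/self/mounts' |
        is_mounted /mnt && echo "inside:  tmpfs on /mnt"
    is_mounted /mnt < /proc/self/mounts || echo "outside: nothing on /mnt"
}
```

The tmpfs disappears by itself once the sh process exits, because the mount namespace it lived in has no remaining members.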

UTS, IPC, and User Namespaces: The Supporting Cast

UTS Namespace: Isolates two system identifiers: the hostname and the NIS domain name. This is why you can hostname my-awesome-container inside a container without renaming your actual laptop. It’s simple but absolutely necessary.
IPC Namespace: Isolates System V inter-process communication objects (message queues, shared memory segments, and semaphore arrays) as well as POSIX message queues. This prevents a container from messing with another container’s shared memory. It’s not something you think about daily until a weird legacy app breaks without it.
User Namespace: This is the wild west and the future of secure containers. It allows a user (and group) ID inside the namespace to be mapped to a different ID outside the namespace. The big deal? You can run a process as root inside the container while it maps to an unprivileged user ID (like 1001) on the host. This massively reduces the blast radius of a container breakout. It’s incredibly powerful but has historically been a source of subtle security bugs and quirks. Docker does not enable it by default: you opt in through the daemon’s userns-remap setting or by running Docker in rootless mode, and it is worth considering whenever containers run code you don’t fully trust.
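The user ID mapping is plain arithmetic over /proc/<pid>/uid_map, where each line reads "inside-start outside-start count". A small sketch of the translation; the function name host_uid and the sample range starting at 100000 are assumptions for illustration, not values taken from any particular Docker setup.

```shell
# host_uid INSIDE_UID: read uid_map lines ("inside-start outside-start count")
# on stdin and print the host UID that INSIDE_UID maps to.
# Exits non-zero and prints nothing if the UID is not mapped.
host_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); ok = 1; exit }
        END { exit !ok }'
}

# Sample map: 65536 IDs inside the namespace, backed by host IDs
# starting at 100000.
echo '0 100000 65536' | host_uid 0      # root inside -> prints 100000
echo '0 100000 65536' | host_uid 1000   # prints 101000
```

Against a live process, feed it the real map with host_uid 0 < /proc/<pid>/uid_map. On hosts that allow unprivileged user namespaces, unshare --user --map-root-user id -u prints 0 even though the process still runs under your ordinary UID as far as the host is concerned.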

The real power, of course, is combining all of these namespaces together. Behind a docker run command, the container runtime (runc, by default) ends up calling clone() or unshare() with a whole bunch of flags (CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS...) to create this isolated environment before exec’ing your process. It’s not a virtual machine; it’s just a process with a really, really convincing costume.
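The same combination can be approximated from the command line. A hedged sketch, assuming util-linux’s unshare and a host that permits unprivileged user namespaces (otherwise run it as root); the wrapper name sandbox is made up, and this covers only the namespaces, not the separate root filesystem or cgroup limits a real container gets.

```shell
# sandbox CMD [ARGS...]: run CMD in fresh namespaces of all six types.
# Each unshare(1) option corresponds to one clone(2)/unshare(2) flag:
#   --user  CLONE_NEWUSER   --pid  CLONE_NEWPID   --net  CLONE_NEWNET
#   --mount CLONE_NEWNS     --uts  CLONE_NEWUTS   --ipc  CLONE_NEWIPC
# With DRY_RUN=1 the command line is printed instead of executed.
sandbox() {
    set -- unshare --user --map-root-user --pid --fork --mount-proc \
        --net --mount --uts --ipc "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Preview the command that would start an isolated shell.
DRY_RUN=1 sandbox /bin/bash
```

Started for real with sandbox /bin/bash, the shell should show the isolated view from each earlier section: ps aux lists only its own processes, ip link shows a lone lo, and changing the hostname leaves the host’s name alone.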