44.3 How Docker and Podman Use Namespaces and cgroups

Alright, let’s pull back the curtain. When you type docker run or podman run, you’re not just asking for a container. You’re asking these tools to be your personal stage manager for a one-act play starring your application. Their job is to use Linux’s core features—namespaces and cgroups—to build the set, cast the actors, and enforce the rules of the performance. The magic isn’t in the tools themselves; it’s in how they wield these underlying kernel facilities. They’re just particularly good stage managers.

The Stage Crew: cgroups (Control Groups)

Think of cgroups as the stage manager with a clipboard and a walkie-talkie, responsible for resource allocation and enforcement. They answer the boring but critical questions: “How much CPU can this process use?” “What’s its max memory?” “How much I/O bandwidth does it get?” Without cgroups, a container could be a brilliantly isolated little snowflake that still greedily consumes every CPU cycle on the host, becoming a total nightmare for its neighbors.

Docker and Podman automatically create a cgroup for your container to live in. Let’s say you run:

docker run -it --cpu-quota=50000 --memory=100m alpine /bin/sh

Here’s what’s happening backstage. Docker’s runtime stack (containerd, which in turn invokes runc) talks to the kernel and creates a new cgroup for the container, with limits set through the cpu and memory controllers. The --cpu-quota=50000 flag says, “This container gets 50,000 microseconds of CPU time per scheduling period.” With the default period of 100,000 microseconds, that works out to 50% of a single CPU core over time. The --memory=100m flag is the hard limit: “You get 100 MiB of RAM. Try to hold on to more than that and, once the kernel can’t reclaim enough, you get the OOM (Out-Of-Memory) killer’s bill.”
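If the arithmetic behind those two flags feels hand-wavy, here is a minimal sketch of it in plain shell. The function names cpu_percent and mib_to_bytes are made up for illustration and are not part of Docker or Podman; the 100000 microsecond default period is the assumption doing the work.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of Docker or Podman.

# cpu_percent QUOTA [PERIOD]
# Quota and period are in microseconds. The period defaults to 100000,
# which is assumed here to match the runtime's default CFS period.
cpu_percent() {
    quota=$1
    period=${2:-100000}
    echo $(( quota * 100 / period ))
}

# mib_to_bytes MIB
# Converts mebibytes to bytes, the unit the kernel reports limits in.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

cpu_percent 50000     # prints 50
mib_to_bytes 100      # prints 104857600
```

A quota larger than the period simply means more than one core’s worth of time, so cpu_percent 200000 would report 200.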

You can see this for yourself. Find a running container’s ID and poke around in /sys/fs/cgroup/, which is the cgroup filesystem. It’s a bit like the manager’s clipboard; you can read the rules.

# Find your container ID
docker ps

# Look at its cgroup memory limit (cgroup v1 path shown; on a cgroup v2 host the file is memory.max and the directory layout differs)
cat /sys/fs/cgroup/memory/docker/<long-container-id>/memory.limit_in_bytes

You should see 104857600—which is 100 MiB in bytes. Neat, huh? The key takeaway: cgroups are about accounting and limiting resources. They make sure your brilliant microservice doesn’t accidentally become a denial-of-service attack on your own host.
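Because that path “might vary,” it can help to see how you might guess the right one. The sketch below is a rough heuristic, not an official lookup: memory_limit_path is a hypothetical helper, and both layouts it prints are assumptions about common defaults (the cgroup v1 layout used above, and the cgroup v2 layout you tend to get with Docker’s systemd cgroup driver). It relies on the fact that a cgroup v2 mount has a cgroup.controllers file at its root.

```shell
#!/bin/sh
# memory_limit_path ROOT CONTAINER_ID
# Hypothetical helper: guesses where a Docker container's memory limit lives.
# Both layouts are assumptions about common defaults, not guarantees.
memory_limit_path() {
    root=$1
    id=$2
    if [ -f "$root/cgroup.controllers" ]; then
        # cgroup v2 (unified hierarchy), assuming the systemd cgroup driver
        echo "$root/system.slice/docker-$id.scope/memory.max"
    else
        # cgroup v1, the layout used in the text above
        echo "$root/memory/docker/$id/memory.limit_in_bytes"
    fi
}

# Print the candidate path for a placeholder ID on this host
memory_limit_path /sys/fs/cgroup "<long-container-id>"
```

Swap in a real container ID and cat the resulting path. If neither guess exists on your system, the container’s cgroup is somewhere else and the runtime’s own inspection commands are the more reliable route.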

The Set Designers: Namespaces

If cgroups are the stage manager, namespaces are the set designers. Their job is isolation and illusion. They make your process feel like it’s the star of the show, alone on its own stage, with its own props, instead of one actor in a massive, chaotic play.

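When you run a container, Docker/Podman typically request about six(!) different namespaces from the kernel: mount, PID, network, IPC, and UTS, plus cgroup or user namespaces depending on the configuration. This is where the real “container feel” comes from. Let’s break down the important ones: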

PID Namespace: This gives the container its own isolated process tree. Process ID 1 inside the container is not PID 1 on the host (which is init or systemd). It’s your designated entrypoint process. This is why ps aux inside a container only shows the processes in that container. It’s not a special container-aware version of ps; it’s just a regular ps looking at its own, limited reality.
Network Namespace: This is a big one. It gives the container its own private network stack: its own loopback adapter, its own network interfaces, its own IP address, its own routing tables, and its own iptables rules. When you use -p 8080:80, you’re asking Docker to set up a port forward between the host’s network namespace and the container’s. The container still blissfully thinks it’s listening on 0.0.0.0:80.
Mount Namespace: This gives the container its own view of the filesystem. This is how the container can have a / root directory that looks completely different from the host’s. The runtime achieves this by assembling the image layers into a root filesystem, switching the container’s root to it (runc uses pivot_root() by default rather than a plain chroot()), and then managing the mount points within this new context. When you use a bind mount (-v /host/path:/container/path), you’re asking the tool to take a file or directory from the host’s mount namespace and make it visible in the container’s.
UTS Namespace: This isolates the hostname and domain name. The command docker run --hostname my-cool-container isn’t just setting an environment variable; it’s leveraging the UTS namespace to give the container its own uname() syscall response. The host sees its own name, the container sees my-cool-container.
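You don’t have to take the namespaces on faith. The kernel exposes every process’s namespace memberships as symlinks under /proc/<pid>/ns, and a small loop can print them. This is a minimal sketch; list_namespaces is a hypothetical helper name, and it assumes a Linux host with /proc mounted.

```shell
#!/bin/sh
# list_namespaces [PID]
# Hypothetical helper: prints each namespace type for a process together
# with the kernel's identifier for it. Defaults to the calling process.
list_namespaces() {
    pid=${1:-self}
    for ns in /proc/"$pid"/ns/*; do
        # Skip the unexpanded glob if /proc or the PID does not exist
        [ -L "$ns" ] || continue
        printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
    done
}

list_namespaces
```

Run it once on the host and once inside a container. Where the bracketed numbers differ (pid, net, mnt, uts, and so on), the two processes live in different namespaces of that type. Where they match, the namespace is shared.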

Podman’s Party Trick: User Namespaces

Here’s a prime example of Podman’s design philosophy differing from Docker’s. Both can use the User Namespace, but Podman often uses it by default in rootless mode, which is its killer feature.

A user namespace remaps user and group IDs. The most important practical effect: a process running as root (UID 0) inside the container can be mapped to a completely unprivileged user (e.g., UID 1000) outside the container on the host.

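Let’s say you run a rootless container with Podman and ask the process inside who it is:

podman run --rm alpine grep Uid /proc/self/status

You’ll see something like Uid: 0 0 0 0. It thinks it’s root. That process exits immediately, so to see the host’s side of the story, start a longer-lived container and look for it from the host:

podman run -d --rm alpine sleep 300
ps aux | grep '[s]leep 300'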

You’ll see it’s owned by your regular user, not the root user. This is pure security gold. If an attacker breaks out of the container, they land on the host as your non-root user, not as the host’s root, and would need a further privilege escalation (a kernel exploit, say) to climb any higher. It dramatically reduces the blast radius of a breakout. Docker can do this too, through the daemon’s --userns-remap option or its rootless mode, but Podman makes it the default behavior for rootless containers, which is frankly how it should be.
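The remapping itself is just a small lookup table that the kernel exposes in /proc/<pid>/uid_map, with three columns per line: the first UID inside the namespace, the first UID outside, and the length of the range. Here is a minimal sketch of how that table translates a container UID into a host UID. The host_uid function is hypothetical, and the sample map values (your user at UID 1000, a subordinate range starting at 100000) are assumptions chosen to resemble a typical rootless setup.

```shell
#!/bin/sh
# host_uid CONTAINER_UID [MAP_FILE]
# Hypothetical helper: translates a UID inside a user namespace to the
# host UID using the three-column uid_map format. Exits 1 if unmapped.
host_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }
    ' "${2:-/proc/self/uid_map}"
}

# Sample map; the values are illustrative assumptions, not read from a host
map=$(mktemp)
printf '0 1000 1\n1 100000 65536\n' > "$map"

host_uid 0 "$map"    # prints 1000
host_uid 5 "$map"    # prints 100004
rm -f "$map"
```

So container root is really UID 1000 as far as the host is concerned, and every other container user lands somewhere in a range of IDs that own nothing important on the host.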

The Key Difference: Daemons and Daemonlessness

So, if they both use the same kernel features, what’s the real difference? The architecture.

Docker follows a client-server model. The docker CLI talks to a long-running dockerd daemon, which by default runs as root (a rootless mode exists, but you have to opt in). This daemon is the one that actually calls containerd, which then calls runc to create the container. This is a chain of trust, and if the daemon is compromised, you have a big problem.

Podman is daemonless. The podman CLI executable itself (often running as your user) launches an OCI runtime such as runc or crun as a child process, and that runtime makes the necessary kernel calls. There’s no central, always-running, privileged process. This is a simpler architecture with a smaller privileged attack surface. It’s why you can easily run containers as a non-root user with Podman without a complex setup. It’s just… cleaner.
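One hedged way to see this difference on a live host is to walk a container process’s parent chain. The sketch below reads the Name and PPid fields from /proc/<pid>/status; ancestry is a hypothetical helper, and it assumes a Linux host with /proc mounted. With no argument it walks the current shell’s ancestry so that it runs anywhere.

```shell
#!/bin/sh
# ancestry [PID]
# Hypothetical helper: prints "PID NAME" for a process and each of its
# ancestors, stopping when the parent is 0 or no longer readable.
ancestry() {
    pid=${1:-$$}
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ { print $2 }' "/proc/$pid/status")
        printf '%s %s\n' "$pid" "$name"
        pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
    done
}

ancestry "$$"
```

Point it at the host PID of a containerized process. Under Docker you would typically expect to find a containerd shim process just above it, and under Podman a small monitor process called conmon, with no long-running daemon in the chain. Exact names vary by version and distribution.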

The takeaway? Docker and Podman are both excellent stage managers. They both expertly use cgroups and namespaces to create the container illusion. Podman just has a stronger focus on security by default with its rootless approach, and a simpler, more Unix-like architecture. You can’t go wrong with either, but understanding how they pull off the trick makes you a better developer and operator. You’re not just running commands; you’re orchestrating kernel features.