21.1 Security Context: runAsUser, runAsNonRoot, readOnlyRootFilesystem
Right, let’s talk about the three amigos of Pod security that actually do something useful at runtime: runAsUser, runAsNonRoot, and readOnlyRootFilesystem. This isn’t abstract policy stuff; this is the kernel-enforced nitty-gritty that stops an attacker who’s already inside your container from doing something truly nasty. Think of this as the final, desperate line of defense after someone has already kicked the front door in.
First, a dose of reality. runAsUser and runAsNonRoot can live in your Pod’s securityContext or in a container’s securityContext; readOnlyRootFilesystem exists only at the container level. A Pod-level value sets the default for all containers, and a container-level value overrides it. My advice? Be explicit at the container level. You’ll thank me later when you’re not debugging why one weird container is behaving differently from the other seven in the Pod.
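To make the override rule concrete, here is a minimal sketch of a Pod with two containers. The Pod name, container names, and images are placeholders invented for illustration, not anything from a real deployment.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: override-demo                  # hypothetical name
spec:
  securityContext:                     # Pod-level defaults for every container
    runAsUser: 1000
    runAsNonRoot: true
  containers:
    - name: app                        # no securityContext: inherits UID 1000
      image: my-app:latest             # placeholder image
    - name: sidecar
      image: my-sidecar:latest         # placeholder image
      securityContext:
        runAsUser: 2000                # container-level value wins here
        readOnlyRootFilesystem: true   # only settable at the container level
```

Assuming both images can run as an arbitrary UID, app should start as UID 1000 and sidecar as UID 2000, with runAsNonRoot still applying to both.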
The User ID: runAsUser and runAsNonRoot
Forget usernames for a second. Inside the Linux kernel, processes are owned by a user ID (UID), a number. runAsUser lets you set that number explicitly. The goal? To make sure your container isn’t running as root (UID 0), the superuser with keys to the entire kingdom.
But here’s the first absurdity: by default, if the Dockerfile doesn’t specify a USER, your container runs as root. It’s the equivalent of every new car starting with the key already in the ignition and the doors unlocked. Brilliant.
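So, you use runAsNonRoot: true. This is the smart, lazy person’s choice. It tells Kubernetes: “Hey, refuse to run this as root.” Kubernetes won’t pick a user for you; the kubelet only checks which UID the container is about to start with. If the image (or runAsUser) supplies a non-zero numeric UID, the container starts. If the image would run as root, the container fails to start with a glorious “container has runAsNonRoot and image will run as root” error. If the image names its user instead of numbering it (USER nginx rather than USER 101), you get “container has runAsNonRoot and image has non-numeric user” instead, because the kubelet can’t prove that a name isn’t root. This is good. It’s forcing you to deal with the problem.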
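As a rough sketch, this is what relying on runAsNonRoot alone looks like. The Pod name and image are hypothetical, and the example assumes the image’s Dockerfile ends with a numeric USER line.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-demo          # hypothetical name
spec:
  containers:
    - name: app
      # Placeholder image, assumed to declare a numeric user
      # in its Dockerfile, e.g. "USER 10001".
      image: my-app:latest
      securityContext:
        runAsNonRoot: true     # kubelet refuses to start this container as UID 0
```

If that assumption doesn’t hold and the image declares no USER at all, this same manifest should leave the container stuck with the “image will run as root” error described above.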
Sometimes, you need more control. You need a specific UID, often for permissions on a mounted volume. That’s where runAsUser comes in.
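apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  containers:
    - name: sec-ctx-demo
      # The stock nginx image expects to start as root; in practice,
      # substitute an image built to run as a non-root user.
      image: nginx:1.25
      securityContext:
        runAsUser: 1000     # Explicitly run as UID 1000
        runAsNonRoot: true  # Redundant with a non-zero runAsUser, but a harmless guard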
Pitfall Alert: Just setting runAsUser: 1000 does not magically make that user exist in the container’s /etc/passwd. This is a huge “gotcha”. Your process runs with UID 1000, but ls -l shows a bare number where the owner’s name should be, whoami gives up, and a bash prompt greets you with “I have no name!”. The kernel doesn’t care, since it only ever deals in numbers, but if your app looks up its username in /etc/passwd, it will fail. In that case, you need to build that user into your image.
Making the Root Filesystem Read-Only
This is my personal favorite. readOnlyRootFilesystem: true does exactly what it says on the tin. It mounts the container’s root filesystem (/) as read-only. Why is this arguably the most important setting here? Because it neuters a huge class of attacks. An attacker can’t download a binary to /tmp and execute it (unless you’ve mounted a writable volume there). They can’t modify your application code. They can’t write to /etc and mess with the container’s configuration. It’s a beautiful thing.
“But my app needs to write logs!” I hear you cry. Of course it does. This is the part where the designers were at least somewhat sensible.
securityContext:
  readOnlyRootFilesystem: true
This setting doesn’t make every filesystem read-only. It only makes the root filesystem (the container image itself) read-only. You can, and absolutely should, mount emptyDir volumes or persistent volumes on directories where the app needs to write, like /logs, /tmp, or whatever your funky application requires. Just remember that anything writable for your app is writable for an attacker too, so mount as few of these as you can get away with.
apiVersion: v1
kind: Pod
metadata:
  name: read-only-demo
spec:
  containers:
    - name: app
      image: my-app:latest
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: log-volume
          mountPath: /logs
        - name: tmp-volume
          mountPath: /tmp
  volumes:
    - name: log-volume
      emptyDir: {}
    - name: tmp-volume
      emptyDir: {}
The Rough Edge: Some applications are idiots and try to write to locations you wouldn’t expect. You’ll know you’ve hit this when your container crashes with a “Read-only file system” error. The fix isn’t to disable this setting. Often the error message names the offending path; when it doesn’t, use strace or similar to figure out where the app is trying to write, then mount a writable volume there. This forces better, more secure application behavior. It’s a feature, not a bug.
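Here is a sketch of what that fix might look like once you’ve tracked down the offending directory. The path /var/cache/my-app, the volume name, and the Pod name are made up for illustration; substitute whatever your own investigation turns up.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: read-only-fix-demo               # hypothetical name
spec:
  containers:
    - name: app
      image: my-app:latest
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true     # stays on; only the one path becomes writable
      volumeMounts:
        - name: cache-volume
          mountPath: /var/cache/my-app   # hypothetical path the app was failing to write to
  volumes:
    - name: cache-volume
      emptyDir: {}                       # scratch space, discarded when the Pod goes away
```

The point of the design is that the exception is narrow: one directory opens up for writing, and the rest of the root filesystem stays locked down.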
The combination of these three (running as a non-root user, making the root filesystem read-only, and mounting explicit writable volumes) is what moves your container from being a vulnerable workload to a hardened one. It doesn’t make it impenetrable, but it makes an attacker’s life a great deal harder, which is the entire point.