27.6 Log Retention, Rotation, and Storage Costs

Right, let’s talk about the part of logging everyone loves to ignore until they get a frantic 3 AM call from finance asking why the cloud bill has a line item the size of a used hatchback: storage. In Kubernetes, your logs don’t just magically disappear. If you’re not careful, they’ll pile up like junk mail in a digital hallway, cluttering your nodes and vacuuming your wallet. The default setup is, frankly, a trap for the unwary.

Your first line of defense, and also the first thing to blame, is the kubelet, the agent that runs on every node. Among its many jobs managing that node’s containers is a thankless one called log rotation. It rotates each container’s log separately, but the rules are simple, brutal, and the same for every container on the node.

The Default Rotation: A Ticking Time Bomb

By default, the kubelet rotates a container’s log file once it reaches 10 MB (strictly, 10 MiB). It keeps at most 5 log files per container, and that count includes the active file, so you get the live log plus 4 rotated ones. Anything older gets deleted. That’s it. That’s the entire plan.
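The policy is simple enough to model in a few lines. Here is a toy Python sketch of it, not the kubelet’s actual code: `ContainerLog` is a hypothetical class, it checks the file size after every write (the real kubelet checks on a timer), and it never gzips anything.

```python
from dataclasses import dataclass, field


# Hypothetical toy model of kubelet-style rotation; not the kubelet's real code.
@dataclass
class ContainerLog:
    max_size: int                                  # like containerLogMaxSize, in bytes
    max_files: int                                 # like containerLogMaxFiles, active file included
    active: int = 0                                # bytes in the active log file
    rotated: list = field(default_factory=list)    # sizes of rotated files, oldest first

    def write(self, nbytes: int) -> None:
        self.active += nbytes
        # Simplification: check after every write instead of on a timer.
        if self.active >= self.max_size:
            # The active file becomes the newest rotated file; a fresh one starts.
            self.rotated.append(self.active)
            self.active = 0
            # Delete the oldest rotated files until the total file count fits.
            while len(self.rotated) + 1 > self.max_files:
                self.rotated.pop(0)

    def file_count(self) -> int:
        return len(self.rotated) + 1

    def total_bytes(self) -> int:
        return self.active + sum(self.rotated)


# The defaults: 10 MiB per file, 5 files. Write 1 GiB in 1 MiB chunks.
log = ContainerLog(max_size=10 * 1024**2, max_files=5)
for _ in range(1024):
    log.write(1024**2)
print(log.file_count(), log.total_bytes() // 1024**2)  # 5 44
```

A full gigabyte goes in, but in this model the container never holds more than 5 files; everything older than the last four rotations is simply gone.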

Let’s do the math, and not just because I’m a nerd. 5 files × 10 MB each = 50 MB of log storage per container. If you have a pod with 3 containers, that’s 150 MB on that node. Put 50 such pods on the node and you’re looking at 7.5 GB of potential log storage, which suddenly doesn’t seem so trivial. That’s the ceiling rather than the typical case (the kubelet gzips older rotated files), but the ceiling is what you have to plan disk space for.
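The same arithmetic as a small function, handy for plugging in your own node’s numbers. It is a sketch with a hypothetical name, and it assumes every container has filled every allowed file. It counts in MiB (the kubelet’s `10Mi`), so the 7.5 GB above comes out as 7500 MiB.

```python
MIB = 1024**2


# Hypothetical helper: worst-case kubelet-managed log bytes for one node.
def worst_case_log_bytes(containers_per_pod, max_size=10 * MIB, max_files=5):
    """containers_per_pod has one entry per pod, giving its container count.

    Assumes all max_files files of every container are full at max_size.
    Older rotated files are gzipped in practice, so real usage is usually lower.
    """
    per_container = max_size * max_files
    return per_container * sum(containers_per_pod)


# The example from the text: 50 pods with 3 containers each.
print(worst_case_log_bytes([3] * 50) // MIB, "MiB")  # 7500 MiB
```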

The real kicker? These limits are kubelet settings: the flags --container-log-max-size and --container-log-max-files or, preferably, the containerLogMaxSize and containerLogMaxFiles fields in the kubelet’s configuration file (the flags are deprecated in favor of the config file). Either way, they are node-wide. You can’t configure them per pod or per namespace with the standard setup. This is a classic Kubernetes “simplicity” move: it works reliably but is incredibly inflexible. Want a chatty debug container to rotate at 1 MB but a critical production app at 100 MB? Tough luck. You have to change it for the entire node, which usually means managing it through your node image or provisioning tool (kops, GKE’s node system configuration, and so on).

You can see the current settings and the grim reality by shelling into a node and checking the kubelet’s running process:

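ps aux | grep kubelet
# Look for --container-log-max-size and --container-log-max-files. Not there? Check the
# file passed via --config (on kubeadm nodes, /var/lib/kubelet/config.yaml):
grep -i containerlogmax /var/lib/kubelet/config.yaml   # no output means the defaults apply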
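You can also read the live values without SSH: `kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz"` returns the kubelet’s running configuration as JSON under a `kubeletconfig` key (you need permission to proxy to nodes). The Python sketch below pulls the two rotation settings out of that JSON. Both function names are hypothetical, it falls back to the defaults if a field is missing, and it only understands simple integer quantities such as `10Mi`, not the full Kubernetes quantity grammar.

```python
import json
import re

# Subset of Kubernetes quantity suffixes: decimal (k, M, G) and binary (Ki, Mi, Gi).
_MULTIPLIERS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
}


# Hypothetical helper: convert a simple integer quantity like "10Mi" to bytes.
def quantity_to_bytes(quantity: str) -> int:
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity.strip())
    if match is None or match.group(2) not in _MULTIPLIERS:
        # Fractions, exponents, and other suffixes are out of scope for this sketch.
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2)]


# Hypothetical helper: (max_size_bytes, max_files) from a configz response body.
def log_rotation_settings(configz_json: str) -> tuple:
    config = json.loads(configz_json)["kubeletconfig"]
    # Fall back to the kubelet defaults if a field is absent.
    max_size = quantity_to_bytes(config.get("containerLogMaxSize", "10Mi"))
    max_files = int(config.get("containerLogMaxFiles", 5))
    return max_size, max_files


# A trimmed, hypothetical configz response.
sample = '{"kubeletconfig": {"containerLogMaxSize": "10Mi", "containerLogMaxFiles": 5}}'
print(log_rotation_settings(sample))  # (10485760, 5)
```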

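And you can find the actual logs themselves, neatly rotated with timestamps, on the node’s filesystem under /var/log/pods (one directory per pod, with a subdirectory per container). /var/log/containers holds symlinks to the active log files, named after the pod, namespace, and container.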

# On a node, list the log files for one container in a pod
ls -la /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/

# Example output might look like (the number is the container's restart count;
# older rotated files are gzipped, the newest rotated one is left uncompressed):
# 0.log
# 0.log.20250315-150000.gz
# 0.log.20250315-160000
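When a node is running low on disk, it helps to know which containers’ log directories are actually the heaviest. The Python sketch below walks the /var/log/pods layout described above and totals file sizes per container. It assumes that directory layout, needs read access to /var/log/pods on the node (typically root), and the function name is hypothetical; `du -sh /var/log/pods/*/*` gives a rougher version of the same answer.

```python
import os
from collections import defaultdict


# Hypothetical helper: on-disk log bytes per "<pod-dir>/<container>" under root.
def log_bytes_per_container(root="/var/log/pods"):
    """Assumes the layout <namespace>_<pod>_<uid>/<container>/<log files>.

    Rotated .gz files count at their compressed size. Returns {} if root is missing.
    """
    if not os.path.isdir(root):
        return {}
    totals = defaultdict(int)
    for pod_dir in os.listdir(root):
        pod_path = os.path.join(root, pod_dir)
        if not os.path.isdir(pod_path):
            continue
        for container in os.listdir(pod_path):
            container_path = os.path.join(pod_path, container)
            if not os.path.isdir(container_path):
                continue
            for name in os.listdir(container_path):
                path = os.path.join(container_path, name)
                # Count regular files only; skip anything that is a symlink.
                if os.path.isfile(path) and not os.path.islink(path):
                    totals[f"{pod_dir}/{container}"] += os.path.getsize(path)
    return dict(totals)


# Show the ten heaviest containers on this node.
usage = log_bytes_per_container()
for key, size in sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{size / 1024**2:8.1f} MiB  {key}")
```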

When the Defaults Will Bite You

This system fails spectacularly in two common scenarios:

The “Noisy Neighbor” Container: One badly behaved, overly verbose container can push a node toward disk pressure. The 50 MB cap is only enforced when the kubelet’s periodic rotation check runs, so a container writing fast enough can overshoot it between checks, and every container’s logs stack up on the same disk. Once the node crosses its eviction threshold, the kubelet starts evicting pods to save itself, and it might choose your important, quiet database pod instead of the chatty one, because eviction ranks pods by priority and by how far their usage exceeds their requests, not simply by who is writing the most. It’s like getting evicted from your apartment because your roommate won’t stop buying giant, inflatable dinosaurs.
The “Need for History” Application: For compliance or deep debugging, 5 files (at most 50 MB of history) is a cruel joke. A busy application can blow through that in minutes. You’ll go to investigate an incident that happened two hours ago only to find the logs were rotated away and deleted long ago.
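To see how quickly “minutes” arrives, divide the per-container cap by the rate at which a container writes. The Python sketch below does that. The function name and the 500 lines per second at about 200 bytes per line are hypothetical, the rate should be measured as written to the log file (including the runtime’s per-line prefix), and the result is an upper bound because it assumes every allowed file is full.

```python
# Hypothetical helper: seconds of output that fit in one container's local logs.
def history_window_seconds(bytes_per_second, max_size=10 * 1024**2, max_files=5):
    # Upper bound: assumes all max_files files are full at max_size.
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return max_size * max_files / bytes_per_second


# Hypothetical busy service: 500 lines/s at ~200 bytes per line.
minutes = history_window_seconds(500 * 200) / 60
print(f"{minutes:.1f} minutes of history")  # 8.7 minutes of history
```

Under nine minutes of local history, against an incident you start investigating two hours later.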

Taking Back Control: The DaemonSet Escape Hatch

The only sane way to handle logs in production is to treat the node’s local storage as a brief, temporary buffer. Your real strategy should be to ship logs to a central, external system as fast as possible. This is where log shippers such as Fluent Bit, Filebeat, or Vector, deployed as DaemonSets, come in.

These tools run on every node, tail the log files from /var/log/pods, and ship them off to a destination like Elasticsearch, Loki, Splunk, or a cloud storage bucket like S3/GCS. This is the real solution for retention and cost control. Your retention policy is now defined by your backend system (e.g., “delete indices in Elasticsearch after 30 days”), not by the whims of a node’s disk space.
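With Logstash_Format On (as in the Fluent Bit example later in this section), Fluent Bit writes one Elasticsearch index per day, named logstash-YYYY.MM.DD by default. A “delete after 30 days” policy then boils down to picking the indices whose date is older than the cutoff. In production you would normally let Elasticsearch’s index lifecycle management (ILM) do this; the Python sketch below only shows the decision logic, and the function name is hypothetical.

```python
from datetime import date, datetime, timedelta


# Hypothetical helper: daily index names older than keep_days, oldest first.
def expired_indices(index_names, today, keep_days=30, prefix="logstash-"):
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        # Anything that doesn't look like a daily index is left alone.
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue
        if day < cutoff:
            expired.append((day, name))
    return [name for _, name in sorted(expired)]


indices = ["logstash-2025.03.15", "logstash-2025.03.21", "logstash-2025.04.19", ".kibana"]
print(expired_indices(indices, today=date(2025, 4, 20)))  # ['logstash-2025.03.15']
```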

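Here’s a taste of a Fluent Bit setup, a ConfigMap plus the DaemonSet that mounts it, that adds a layer of intelligence the kubelet lacks. It reads each container’s log file, parses the runtime’s log format, attaches Kubernetes metadata such as pod name, namespace, and labels, and ships the result to Elasticsearch. The Elasticsearch host is a placeholder for your own.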

# fluent-bit.yaml: the ConfigMap and the DaemonSet that mounts it
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        # Built-in parsers for Docker JSON and CRI (containerd, CRI-O) log lines
        multiline.parser  docker, cri
        Tag               kube.*
        # Remember file offsets across restarts so logs are not skipped
        DB                /var/log/flb_kube.db
        Refresh_Interval  5
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On

    [OUTPUT]
        Name               es
        Match              *
        Host               elasticsearch-logging.default.svc.cluster.local
        Port               9200
        Logstash_Format    On
        Replace_Dots       On
        # Required for Elasticsearch 8.x, which no longer accepts mapping types
        Suppress_Type_Name On
        Retry_Limit        False
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      # Assumed ServiceAccount (not shown) with RBAC to get/list/watch pods and
      # namespaces; the kubernetes filter needs it to look up metadata.
      serviceAccountName: fluent-bit
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:2.2.0
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
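The kubernetes filter’s first job is working out which pod a record came from. With Tag kube.* on the tail input, each record’s tag is the file path with slashes turned into dots (kube.var.log.containers.<file name>); Kube_Tag_Prefix strips the fixed part, and the filter parses pod name, namespace, container name, and container ID out of the remaining file name before asking the API server for labels and annotations. The Python sketch below imitates only that parsing step. Its regular expression approximates Fluent Bit’s default one rather than copying it exactly, and the function name is hypothetical.

```python
import re

TAG_PREFIX = "kube.var.log.containers."

# Approximation of the default pattern: <pod>_<namespace>_<container>-<64-char id>.log
_LOG_NAME = re.compile(
    r"^(?P<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9](?:[-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+)"
    r"_(?P<container_name>.+)-(?P<container_id>[a-z0-9]{64})\.log$"
)


# Hypothetical helper: pod/namespace/container fields encoded in a tail tag, or None.
def metadata_from_tag(tag):
    if not tag.startswith(TAG_PREFIX):
        return None
    match = _LOG_NAME.match(tag[len(TAG_PREFIX):])
    return match.groupdict() if match else None


tag = TAG_PREFIX + "web-7d4b9c6f8-x2kq9_shop_nginx-" + "0" * 64 + ".log"
print(metadata_from_tag(tag))
```

The container name is matched greedily up to the last hyphen that is followed by a 64-character ID, so hyphenated names like istio-proxy survive intact.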

The Final Word on Cost

The local rotation is a safety net, not a strategy. Your storage cost is now the cost of your chosen central logging backend, and this is where you make your real financial decisions. Object storage (S3/GCS) is cheap for long-term retention but slow to query. Search engines like Elasticsearch, self-hosted or managed, cost more but provide blazing fast full-text search and analytics. Open-source options like Loki are built specifically to be cost-effective for logs: they index only a small set of labels and keep compressed log chunks in object storage. The choice is yours, but now you’re making it consciously, not by accident.
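A back-of-the-envelope way to compare those options: once retention is reached, the backend holds roughly daily ingest × retention days (times any replication, divided by any compression), and you pay for that volume every month. The Python sketch below does that arithmetic. The function names are hypothetical, and the prices and the 20 GB/day ingest rate are made-up placeholders, not quotes from any provider.

```python
# Hypothetical helper: stored volume once retention is reached.
def steady_state_gb(daily_ingest_gb, retention_days, replicas=1, compression_ratio=1.0):
    # Each day's data lives for retention_days before it is deleted.
    return daily_ingest_gb * retention_days * replicas / compression_ratio


# Hypothetical helper: monthly storage bill for that steady-state volume.
def monthly_cost(daily_ingest_gb, retention_days, price_per_gb_month, **kwargs):
    return steady_state_gb(daily_ingest_gb, retention_days, **kwargs) * price_per_gb_month


# Placeholder prices: a cheap object-storage-like tier vs. a replicated search cluster.
cheap = monthly_cost(20, 30, price_per_gb_month=0.02)
fast = monthly_cost(20, 30, price_per_gb_month=0.10, replicas=2)
print(f"${cheap:.2f} vs ${fast:.2f} per month")  # $12.00 vs $120.00 per month
```

Retention days and replication multiply straight into the bill, which is why the retention policy belongs in the backend where you can see and tune it.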