27.6 Log Retention, Rotation, and Storage Costs
Right, let’s talk about the part of logging everyone loves to ignore until they get a frantic 3 AM call from finance asking why the cloud bill has a line item the size of a used hatchback: storage. In Kubernetes, your logs don’t just magically disappear. If you’re not careful, they’ll pile up like junk mail in a digital hallway, cluttering your nodes and vacuuming your wallet. The default setup is, frankly, a trap for the unwary.
The primary culprit, and also your first line of defense, is the kubelet, the agent that runs on every node. It manages the containers on its node, and that includes a thankless job called log rotation. It rotates each container’s log separately, but the rules are simple, brutal, and the same for every container on the node.
The Default Rotation: A Ticking Time Bomb
By default, the kubelet rotates a container’s log file once it reaches 10 MiB, and it keeps at most 5 log files per container, counting the one currently being written. When a rotation would push past that, the oldest file gets deleted. That’s it. That’s the entire plan.
Let’s do the math, and not just because I’m a nerd. 5 files * 10 MB each = 50 MB of log storage per container, at most (older rotated files are gzip-compressed, so the real figure is usually lower). If you have a pod with 3 containers, that’s 150 MB on that node. Pack 50 such pods onto the node and you’re looking at up to 7.5 GB of log storage, which suddenly doesn’t seem so trivial.
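If you’d rather plug in your own node’s numbers, here’s the same back-of-the-envelope math as a tiny shell function. It’s a sketch that assumes every container is sitting at its cap and ignores the gzip compression of older rotated files, so read the result as a worst-case ceiling. The function name is just for illustration.

```bash
#!/usr/bin/env bash
# Worst-case on-node log footprint in MB, assuming every container is at its cap.
# Ignores gzip compression of older rotated files, so this is an upper bound.
# Usage: worst_case_log_mb MAX_FILES MAX_SIZE_MB CONTAINERS_PER_POD PODS
worst_case_log_mb() {
  local max_files="$1" max_size_mb="$2" containers="$3" pods="$4"
  echo $(( max_files * max_size_mb * containers * pods ))
}

# The numbers from the text: 5 files x 10 MB x 3 containers x 50 pods
worst_case_log_mb 5 10 3 50   # prints 7500 (MB, i.e. 7.5 GB)
```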
The real kicker? These limits are kubelet settings: the containerLogMaxSize and containerLogMaxFiles fields in the kubelet’s configuration file, or the older, now-deprecated --container-log-max-size and --container-log-max-files flags. That means they are node-wide. You can’t configure them per pod or per namespace with the standard setup. This is a classic Kubernetes “simplicity” move: it works reliably but is incredibly inflexible. Want a chatty debug container to rotate at 1 MB but a critical production app at 100 MB? Tough luck. You have to change it for the entire node, which often means managing it through your node image or provisioning tool (like kops, GKE’s node config, etc.).
You can see the current settings and the grim reality by shelling into a node and checking the kubelet’s running process:
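ps aux | grep kubelet
# Look for flags like --container-log-max-size=10Mi and --container-log-max-files=5.
# No such flags? The values may live in the kubelet config file passed via --config
# (fields containerLogMaxSize and containerLogMaxFiles), or you're simply on the defaults.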
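On many clusters (kubeadm-built ones, for example) the kubelet gets its settings from a KubeletConfiguration file passed via --config, so the log flags may never show up in ps at all. Here’s a rough shell sketch that pulls the two rotation fields out of such a file and falls back to the kubelet defaults when they’re absent. It assumes the fields appear as plain top-level “key: value” lines; it is not a real YAML parser, and the function name is hypothetical. kubeadm writes the file to /var/lib/kubelet/config.yaml, but check the --config flag on your own nodes.

```bash
#!/usr/bin/env bash
# Print the effective containerLogMaxSize / containerLogMaxFiles from a
# KubeletConfiguration file, falling back to the kubelet defaults (10Mi, 5).
# Naive on purpose: only matches simple top-level "key: value" lines.
kubelet_log_rotation() {
  local config="$1" size files
  size=$(awk -F': *' '$1 == "containerLogMaxSize" { print $2 }' "$config" | tr -d '"')
  files=$(awk -F': *' '$1 == "containerLogMaxFiles" { print $2 }' "$config" | tr -d '"')
  echo "containerLogMaxSize=${size:-10Mi}"
  echo "containerLogMaxFiles=${files:-5}"
}

# On a node (path is an assumption; check the kubelet's --config flag):
# kubelet_log_rotation /var/lib/kubelet/config.yaml
```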
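And you can find the actual logs themselves, neatly rotated with a timestamp, on the node’s filesystem. The real files live under /var/log/pods; /var/log/containers just holds symlinks to them, named like <pod-name>_<namespace>_<container-name>-<container-id>.log.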
# On a node, see the log files for one container in a pod
ls -la /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/
# Example output might look like (the number is the container's restart count;
# older rotated files are gzip-compressed, the most recent rotation is not):
# 0.log
# 0.log.20250315-150000.gz
# 0.log.20250315-160000
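When a node’s disk starts filling up, the quickest way to spot the loudest containers is to total up each container’s log directory. A small sketch using du and sort; the function name is made up, and it assumes the kubelet’s /var/log/pods/<namespace>_<pod-name>_<pod-uid>/<container-name>/ layout.

```bash
#!/usr/bin/env bash
# List the N largest per-container log directories (size in KiB, largest first).
# Assumes the layout <root>/<namespace>_<pod-name>_<pod-uid>/<container-name>/.
top_container_logs() {
  local root="${1:-/var/log/pods}" n="${2:-10}"
  # The trailing slash makes the glob match directories only.
  du -sk "$root"/*/*/ 2>/dev/null | sort -rn | head -n "$n"
}

# On a node (needs read access to /var/log/pods):
# top_container_logs /var/log/pods 5
```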
When the Defaults Will Bite You
This system fails spectacularly in two common scenarios:
The “Noisy Neighbor” Container: Rotation caps each container at roughly 50 MB, but the kubelet enforces that cap with a periodic check, so a container spewing logs fast enough can overshoot it between checks. Add dozens of containers sitting at their caps to everything else on the node’s disk (images, emptyDir volumes, container writable layers) and you hit disk pressure. The kubelet then starts evicting pods to save the node itself, and it might choose your important, quiet database pod instead of the chatty one, because eviction ranking weighs pod priority and ephemeral-storage requests, not just who wrote the most. It’s like getting evicted from your apartment because your roommate won’t stop buying giant, inflatable dinosaurs.
The “Need for History” Application: For compliance or deep debugging, 5 files (at most about 50 MB of history per container) is a cruel joke. A busy application can blow through that in minutes. You’ll go to investigate an incident from two hours ago only to find the logs have already been rotated away and deleted.
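To see just how short that history is, divide what the kubelet keeps by how fast your app writes. Here’s a minimal shell sketch; the write rate is something you’d measure yourself (the 5 MB per minute in the example is an illustrative figure, not a benchmark), and it ignores compression, so treat the result as a rough estimate.

```bash
#!/usr/bin/env bash
# Approximate minutes of log history kept on the node for one container:
#   (max files x max size in MB) / write rate in MB per minute, rounded down.
history_minutes() {
  local max_files="$1" max_size_mb="$2" rate_mb_per_min="$3"
  if [ "$rate_mb_per_min" -le 0 ]; then
    echo "rate must be > 0" >&2
    return 1
  fi
  echo $(( max_files * max_size_mb / rate_mb_per_min ))
}

# Defaults (5 files x 10 MB) against an illustrative 5 MB/min writer:
history_minutes 5 10 5   # prints 10
```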
Taking Back Control: The DaemonSet Escape Hatch
The only sane way to handle logs in production is to treat the node’s local storage as a brief, temporary buffer. Your real strategy should be to ship logs to a central, external system as fast as possible. This is where DaemonSets like Fluent Bit, Filebeat, or Vector come in.
These tools run on every node, tail the log files from /var/log/pods, and ship them off to a destination like Elasticsearch, Loki, Splunk, or a cloud storage bucket like S3/GCS. This is the real solution for retention and cost control. Your retention policy is now defined by your backend system (e.g., “delete indices in Elasticsearch after 30 days”), not by the whims of a node’s disk space.
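With Logstash-style daily indices (the Fluent Bit es output with Logstash_Format On names them logstash-YYYY.MM.DD by default), a “delete after 30 days” rule boils down to a date comparison on index names. The sketch below only decides which indices are past a cutoff; in a real cluster you’d let Elasticsearch’s index lifecycle management (ILM) do the deleting. The function name and passing the cutoff as YYYYMMDD are illustrative choices, not part of any tool.

```bash
#!/usr/bin/env bash
# Print the Logstash-style daily indices (logstash-YYYY.MM.DD) that are older
# than a cutoff date given as YYYYMMDD. Other index names are ignored.
expired_indices() {
  local cutoff="$1" idx day
  local re='^logstash-([0-9]{4})\.([0-9]{2})\.([0-9]{2})$'
  shift
  for idx in "$@"; do
    [[ $idx =~ $re ]] || continue
    day="${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}"
    if (( 10#$day < 10#$cutoff )); then
      echo "$idx"
    fi
  done
}

# Cutoff 2025-03-01 (e.g. "30 days before today", computed however you like):
expired_indices 20250301 logstash-2025.02.27 logstash-2025.03.05 kibana-config
# prints: logstash-2025.02.27
```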
Here’s a taste of a Fluent Bit DaemonSet configuration that adds a layer of intelligence the kubelet lacks. This example parses the log format, adds Kubernetes metadata, and ships to Elasticsearch.
# fluent-bit.yaml (ConfigMap + DaemonSet)
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging          # must match the DaemonSet's namespace
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        # Built-in multiline parsers handle both Docker JSON and CRI
        # (containerd, CRI-O) log formats, so no parsers.conf is needed.
        # (Mounting this ConfigMap over /fluent-bit/etc/ hides the image's own.)
        multiline.parser  docker, cri
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On

    [FILTER]
        Name             kubernetes
        Match            kube.*
        Kube_URL         https://kubernetes.default.svc.cluster.local
        Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix  kube.var.log.containers.
        Merge_Log        On

    [OUTPUT]
        Name             es
        Match            *
        Host             elasticsearch-logging.default.svc.cluster.local
        Port             9200
        Logstash_Format  On
        Replace_Dots     On
        # For Elasticsearch 8.x, also set: Suppress_Type_Name On
        Retry_Limit      False
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      # The kubernetes filter queries the API server for pod metadata, so this
      # assumes a ServiceAccount bound to a ClusterRole that can get/list/watch
      # pods and namespaces (not shown here).
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
The Final Word on Cost
The local rotation is a safety net, not a strategy. Once logs are shipped off the node, your storage cost is mostly the cost of your chosen central logging backend, and this is where you make your real financial decisions. Object storage (S3/GCS) is cheap for long-term retention but slow to query. Elasticsearch, whether self-hosted or as a managed service, costs more because it indexes the full content of every log line, but that’s what buys you blazing-fast search and analytics. Loki was built specifically to be cost-effective for logs: it indexes only a small set of labels and keeps the log lines themselves in object storage. The choice is yours, but now you’re making it consciously, not by accident.
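Whatever backend you pick, the lever you control is retention. At steady state you’re storing roughly daily ingest times retention days, so the storage bill scales linearly with both. Here’s a rough sketch using awk for the decimal math; the price per GB-month is a parameter you’d take from your provider’s price sheet, and the numbers in the example are made up for illustration. It ignores replication, index overhead, and compression, any of which can move the real figure a lot.

```bash
#!/usr/bin/env bash
# Steady-state monthly storage cost for a log backend:
#   stored GB ~= daily ingest (GB/day) x retention (days)
#   cost      ~= stored GB x price per GB-month
# Ignores replication, index overhead, and compression.
monthly_storage_cost() {
  local gb_per_day="$1" retention_days="$2" price_per_gb_month="$3"
  awk -v g="$gb_per_day" -v d="$retention_days" -v p="$price_per_gb_month" \
    'BEGIN { printf "%.2f\n", g * d * p }'
}

# Illustrative numbers only: 20 GB/day, 30-day retention, 0.02 per GB-month
monthly_storage_cost 20 30 0.02   # prints 12.00
```

Halving retention halves the stored volume, which is usually a far bigger win than shaving a few cents off the per-GB price.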