27.3 Fluent Bit: Lightweight Log Collector as DaemonSet
Right, so you’ve got a Kubernetes cluster, and it’s spewing logs from its various Pods like a firehose into a void. Your job is to catch that stream, make sense of it, and send it somewhere useful. That’s where Fluent Bit comes in. It’s the lean, mean, log-processing machine we all turn to because it’s written in C, uses a fraction of the memory of its bigger sibling (Fluentd), and is ruthlessly efficient. We’re going to run it as a DaemonSet, which is a fancy way of saying “one copy of this Pod on every single node in our cluster.” This is non-negotiable; you need an agent on each node to read the logs from /var/log/containers, which is where the kubelet helpfully symlinks all your container logs.
The Absolute Minimum DaemonSet
Let’s start with the bare bones. You need permissions, which in Kubernetes means RBAC. Fluent Bit has two jobs: reading logs from the node’s filesystem, which the hostPath mounts in the DaemonSet take care of, and talking to the Kubernetes API to enrich those logs with metadata (like which Pod they came from), which is the part RBAC has to allow. Here’s the basic setup. First, the service account, role, and binding:
# fluent-bit-service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
# fluent-bit-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
rules:
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
    verbs:
      - get
      - list
      - watch
---
# fluent-bit-role-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: kube-system
Apply those. Now, the DaemonSet itself. Notice we mount the node’s /var/log directory, which contains the /var/log/containers symlinks and, on containerd or CRI-O nodes, the /var/log/pods files they point to. We also mount /var/lib/docker/containers as the varlibdockercontainers volume. On Docker-based nodes this is a classic pitfall if you forget it: the symlinks ultimately resolve to the raw JSON log files in that directory, and without the mount Fluent Bit sees dangling links and reads nothing.
# fluent-bit-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    app.kubernetes.io/name: fluent-bit
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: fluent-bit
  template:
    metadata:
      labels:
        app.kubernetes.io/name: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2.0
          imagePullPolicy: Always
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/fluent-bit.conf
              subPath: fluent-bit.conf
      terminationGracePeriodSeconds: 10
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
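One caveat to the “every single node” promise: a DaemonSet only lands on nodes whose taints its Pods tolerate. On many clusters the control-plane nodes are tainted, so the manifest above would quietly skip them. A minimal sketch of the extra stanza, assuming your control-plane nodes carry the common node-role.kubernetes.io/control-plane taint (check yours with kubectl describe node before copying this):

```yaml
# Fragment to merge into the DaemonSet above, not a complete manifest.
# Assumption: control-plane nodes use the node-role.kubernetes.io/control-plane
# taint with the NoSchedule effect; other clusters may use different taints.
spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
```

If your nodes carry other taints (dedicated pools, GPU nodes and so on), each one needs its own toleration entry, or the logs from those nodes never get collected.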
The Configuration: Inputs, Filters, Outputs
The config map is where the magic (and the frustration) happens. Fluent Bit’s configuration is a pipeline: you define an Input to suck in logs, optional Filters to parse and modify them, and an Output to send them somewhere. Here’s a basic config that tails the container logs and sends them to stdout for debugging, with a commented-out output for a more useful destination like Elasticsearch.
# fluent-bit-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             1
        Log_Level         info
        Daemon            off
        Parsers_File      parsers.conf
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
    [FILTER]
        Name              kubernetes
        Match             kube.*
        Kube_URL          https://kubernetes.default.svc:443
        Kube_CA_File      /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File   /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix   kube.var.log.containers.
        Merge_Log         On
        Merge_Log_Key     data
        Merge_Log_Trim    On
    [OUTPUT]
        Name              stdout
        Match             *
    # Uncomment this to send to Elasticsearch instead
    # [OUTPUT]
    #     Name            es
    #     Match           *
    #     Host            elasticsearch-logging
    #     Port            9200
    #     Logstash_Format On
    #     Replace_Dots    On
    #     Retry_Limit     False
Why this works: The tail input plugin watches the files. The kubernetes filter is the star: it talks to the API server to attach metadata (pod name, namespace, labels) to the log entry, turning a raw JSON log line into a richly documented event. The Merge_Log option is crucial; when the log field itself contains JSON, it parses that JSON and embeds the result as structured fields (here under the data key, because of Merge_Log_Key), preventing a nasty double-encoded JSON mess.
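To make that concrete, here is a rough before-and-after of a single record, written as YAML purely for readability (this is an illustration, not output captured from Fluent Bit). The Pod name, namespace, label, and application fields are invented, timestamps are left out, and only a subset of the metadata keys the kubernetes filter adds is shown.

```yaml
# Illustrative only: one record before and after the kubernetes filter.
# All values are made up; timestamps and several metadata keys are omitted.
before:
  stream: stdout
  # The application's own JSON is still an escaped string here.
  log: '{"level":"info","msg":"started"}'

after:
  stream: stdout
  log: '{"level":"info","msg":"started"}'
  # Merge_Log On parsed the string above; Merge_Log_Key put it under "data".
  data:
    level: info
    msg: started
  # Metadata looked up from the API server for the Pod that wrote the line.
  kubernetes:
    pod_name: web-abc123
    namespace_name: default
    container_name: web
    labels:
      app: web
```

Nesting the parsed fields under a single key is a defensive choice: it keeps application fields from colliding with top-level keys such as stream or kubernetes.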
Common Pitfalls and Battle-Scarred Wisdom
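The Parser Trap: The config above uses Parser docker in the input. This assumes your container runtime writes logs in the Docker JSON format. Docker does; containerd and CRI-O write the CRI text format instead, so on those nodes you need a CRI-aware parser or every line arrives unparsed. A separate symptom is seeing "log": "{\"key\": \"value\"}" in your destination. That means the runtime’s wrapper was parsed correctly, but your application’s own JSON is still sitting inside log as an escaped string. The Merge_Log option on the kubernetes filter is what unpacks it.
Memory Buffering: See Mem_Buf_Limit 5MB? That’s a lifesaver. If your output (e.g., Elasticsearch) goes down, Fluent Bit will hold up to 5MB of logs in memory for this input on each node; once the limit is hit, the tail input is paused until the backlog drains. Because tail tracks file offsets, it resumes where it left off, although anything rotated away in the meantime is lost. That’s still better than the Pod ballooning until it gets OOMKilled. Set this. Always.
Database Location: The DB parameter in the tail input tracks the offset of each file it’s reading. We’re storing it in /var/log/, which the DaemonSet mounts from the host via hostPath (and mounts writable, which the database needs). This is vital. Without a persisted offset database, a Fluent Bit Pod restart loses its place: depending on Read_from_Head, it either skips to the end of every file and misses whatever was written while it was down, or re-reads every file from the beginning and floods your output with old data. Keep the DB path on a hostPath or other persistent volume; put it on the container’s own filesystem and it vanishes with the Pod.
The Liveness Probe Problem: Don’t bother. If the Fluent Bit process exits, the kubelet restarts the container, and the DaemonSet controller replaces any Pod that disappears. A liveness probe on a log collector that might be temporarily blocked by a slow network output is a recipe for a restart loop. Trust Kubernetes to do its job.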
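For the Parser Trap on containerd or CRI-O nodes, one common approach is to swap the Parser line for Fluent Bit’s built-in multiline parsers, which can be listed together so the same config works on either runtime. A minimal sketch of the changed [INPUT] section, assuming the rest of the ConfigMap stays as shown earlier:

```yaml
# Fragment: a variant of the [INPUT] section inside fluent-bit.conf.
# Assumption: a Fluent Bit version that ships the built-in "docker" and "cri"
# multiline parsers (the 2.2.0 image used above does).
data:
  fluent-bit.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        # Replaces "Parser docker": try Docker JSON first, then the CRI format.
        multiline.parser  docker, cri
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
```

Run it with the stdout output first and check that each record has separate stream and log fields before pointing it at a real destination.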
The goal is to get the raw, chaotic data from your containers, dress it up with the context of Kubernetes, and ship it off without breaking a sweat. Fluent Bit, configured correctly, does exactly that. Now go point that OUTPUT to a real destination. You’ve earned it.