27.4 The EFK Stack: Elasticsearch, Fluent Bit, Kibana

Right, so you’ve got a Kubernetes cluster, and it’s spewing out logs from a dozen different pods like a firehose into a bucket. You can’t just kubectl logs your way out of this mess. You need a proper logging stack. Enter the old guard: the EFK Stack (Elasticsearch, Fluent Bit, Kibana). It’s the industry-standard workhorse for a reason, but let’s be clear: it’s a bit like adopting a pet elephant. Powerful, impressive, but it needs a lot of room and you will, at some point, be cleaning up after it.

Here’s the basic play: Fluent Bit is your nimble agent, tailing log files from your pods and nodes. It collects, parses, and ships those logs to Elasticsearch, a distributed search and analytics engine that stores them in a queryable form. Finally, Kibana is the UI you use to make sense of it all, building dashboards and hunting for that one fatal error in a haystack of a trillion log lines.

Deploying with Helm: The Sane Choice

You could craft all the YAML yourself, but you’d be here all week. We use Helm. It’s the package manager for Kubernetes, and it saves us from the soul-crushing boredom of hand-writing StatefulSets, Services, and ConfigMaps for something this complex.

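First, add the Elastic Helm charts repo. These charts are published by Elastic, which is better than some random YAML you found in a Git commit from 2018. Elastic has since shifted its attention to the ECK operator, so pin the chart version you actually tested.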

helm repo add elastic https://helm.elastic.co
helm repo update

Now, the big moment. We’ll deploy Elasticsearch first because everything else depends on it. Notice the values-elasticsearch.yaml file passed with -f? This is how you avoid passing 20 --set flags. We’re overriding the default resource requests because the defaults are… optimistic for a real cluster. Elasticsearch is a memory hog; it likes to cache the world for speed.

# values-elasticsearch.yaml
volumeClaimTemplate:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi # Start with something reasonable
resources:
  requests:
    memory: "2Gi"
    cpu: "1"
  limits:
    memory: "4Gi" # Java and memory limits are a fraught topic, but we'll get to that.

helm install elasticsearch elastic/elasticsearch -f values-elasticsearch.yaml -n logging --create-namespace

Wait for its health to go green. No, really, wait. Run kubectl get pods -n logging --watch and hold off until every Elasticsearch pod reports READY 1/1. Go get a coffee. This takes a few minutes because it’s bootstrapping an entire distributed database.
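A pod in the Running state is not quite the same thing as a green cluster. As a rough sketch, assuming you have forwarded the elasticsearch-master service to localhost:9200 and the cluster answers plain HTTP without authentication, a few lines of Python can poll the _cluster/health endpoint. The function names here are made up for illustration.

```python
import json
import time
import urllib.request


def is_green(health: dict) -> bool:
    """True when the cluster health document reports status 'green'."""
    return health.get("status") == "green"


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """GET /_cluster/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)


def wait_for_green(fetch=fetch_health, attempts: int = 30, delay: float = 10.0) -> bool:
    """Poll until the cluster is green or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_green(fetch()):
                return True
        except OSError:
            pass  # Elasticsearch is not reachable yet; keep waiting
        time.sleep(delay)
    return False
```

To try it, run kubectl port-forward svc/elasticsearch-master 9200:9200 -n logging in another terminal and call wait_for_green(). A status of yellow usually means replica shards are still unassigned, which is normal while nodes are joining.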

Configuring Fluent Bit to Do the Heavy Lifting

With Elasticsearch up, we deploy Fluent Bit. Its job is to grab logs from every pod on every node, so it runs as a DaemonSet: one Fluent Bit pod per node. Each one mounts the host’s /var/log directory and watches for new log files created by the container runtime. Clever, right?
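It helps to know what those files actually contain. The sketch below is not Fluent Bit’s code; it is a small Python illustration, with a made-up function name, of the two line formats you are likely to meet: Docker’s json-file format and the CRI format written by containerd and CRI-O.

```python
import json


def parse_container_log_line(line: str) -> dict:
    """Parse one line written by the container runtime.

    Docker json-file: {"log": "...", "stream": "...", "time": "..."}
    CRI:              <time> <stream> <P|F> <message>
    """
    line = line.rstrip("\n")
    if line.startswith("{"):
        record = json.loads(line)
        return {
            "time": record["time"],
            "stream": record["stream"],
            "partial": False,
            "log": record["log"].rstrip("\n"),
        }
    parts = line.split(" ", 3)
    time, stream, flag = parts[:3]
    message = parts[3] if len(parts) > 3 else ""
    # 'P' marks a partial line that the runtime split; 'F' marks a full line.
    return {"time": time, "stream": stream, "partial": flag == "P", "log": message}


print(parse_container_log_line("2024-01-01T00:00:00.000000000Z stderr F boom: it broke"))
```

Which format you get depends on the node’s container runtime, and that in turn decides which Fluent Bit parser the input needs.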

The critical piece here is the configuration that tells Fluent Bit how to parse and ship the logs. We pass this via a values-fluentbit.yaml file. The [INPUT] section tails every container log file, the [FILTER] section attaches Kubernetes metadata (like which pod a log came from), and the [OUTPUT] section points to our Elasticsearch service.

# values-fluentbit.yaml
config:
  inputs: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        # Docker JSON log format; on containerd or CRI-O nodes use the cri parser instead
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filters: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On

  outputs: |
    [OUTPUT]
        Name            es
        Match           *
        Host            elasticsearch-master.logging.svc.cluster.local
        Port            9200
        Logstash_Format On
        Logstash_Prefix logstash
        Replace_Dots    On
        Retry_Limit     False
        # Elasticsearch 8.x no longer accepts mapping types; if you run 8.x, add: Suppress_Type_Name On

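Fluent Bit is not in the Elastic repo; its chart is published by the Fluent project, so add that repo first.

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluent-bit fluent/fluent-bit -f values-fluentbit.yaml -n logging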

The kubernetes filter is magic. It enriches your raw log stream with metadata: pod name, namespace, labels, and more. This is why you can later filter in Kibana for logs from just the frontend app in the production namespace.
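Some of that magic is plain string parsing. The files under /var/log/containers are named pod_namespace_container-containerid.log, so the pod and namespace can be read off the path before the filter ever asks the API server for labels. Here is a simplified Python sketch of that first step; the regex is a loose approximation written for this example, not the filter’s actual pattern.

```python
import re

# Simplified pattern: <pod>_<namespace>_<container>-<64 hex char id>.log
CONTAINER_LOG_NAME = re.compile(
    r"^(?P<pod_name>[^_]+)_(?P<namespace_name>[^_]+)_"
    r"(?P<container_name>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def metadata_from_path(path: str) -> dict:
    """Extract pod, namespace, container name and container id from a log path."""
    filename = path.rsplit("/", 1)[-1]
    match = CONTAINER_LOG_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a container log file name: {filename}")
    return match.groupdict()


example = "/var/log/containers/frontend-7d9c_production_nginx-" + "a" * 64 + ".log"
print(metadata_from_path(example)["namespace_name"])  # production
```

With the pod name and namespace in hand, the filter looks the pod up through the Kubernetes API (using the service account token configured above) to fetch labels and annotations.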

The Kibana Dashboard: Your Reward

Finally, deploy Kibana and tell it where its Elasticsearch brain is.

helm install kibana elastic/kibana -n logging --set elasticsearchHosts=http://elasticsearch-master.logging.svc.cluster.local:9200

Forward the port and check it out: kubectl port-forward deployment/kibana-kibana 5601:5601 -n logging. Navigate to localhost:5601 in your browser.

You’ll need to define an index pattern in Kibana (called a data view in Kibana 8; try logstash-*). After that, head to the “Discover” tab. If everything worked, you should see logs. Congratulations, you’ve just tamed the chaos.
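Why logstash-*? With Logstash_Format On and Logstash_Prefix logstash, the Fluent Bit output writes to one index per day, named with the prefix plus the date. A quick Python sketch of that naming and of how the wildcard picks the indices up (helper names are invented for this example, and the date format assumes Fluent Bit’s default):

```python
from datetime import date
from fnmatch import fnmatchcase


def daily_index(day: date, prefix: str = "logstash") -> str:
    """Index name for a given day, assuming the default %Y.%m.%d date format."""
    return f"{prefix}-{day:%Y.%m.%d}"


def matches_pattern(index_name: str, pattern: str = "logstash-*") -> bool:
    """Shell-style wildcard match, close enough to Kibana's '*' for illustration."""
    return fnmatchcase(index_name, pattern)


name = daily_index(date(2024, 3, 5))
print(name)                        # logstash-2024.03.05
print(matches_pattern(name))       # True
print(matches_pattern(".kibana"))  # False
```

If you change Logstash_Prefix, the index pattern has to change with it.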

Common Pitfalls and The Memory Drama

Let’s talk about the elephant in the room: Elasticsearch and memory. The memory limit you set applies to the whole container, not just the JVM heap. Depending on the chart and Elasticsearch version, the heap is either pinned by the chart’s default esJavaOpts or sized automatically from the container’s memory, and neither is guaranteed to suit your workload. Say you set a 4Gi limit and the heap lands at 2Gi: the container still needs memory for off-heap things (like mmapped files). If the total memory usage exceeds the 4Gi limit, Kubernetes kills the pod. Dead.

You must explicitly set the Java heap options to leave room for the container. This is the single biggest gotcha.

# In your values-elasticsearch.yaml, under the 'esJavaOpts' setting
esJavaOpts: "-Xms2g -Xmx2g" # Explicitly set min and max heap to the same value
resources:
  requests:
    memory: "4Gi"
  limits:
    memory: "4Gi"

This tells the JVM to use a fixed 2Gi heap, leaving 2Gi of the container’s 4Gi limit for everything else. It’s not a perfect science, but it keeps the OOM killer at bay.
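The arithmetic is simple enough to script if you manage several node sizes. This is only a sketch of the half-the-limit rule of thumb described above; the function name is hypothetical, and the 31 GiB cap is an assumption based on the common advice to keep the heap under the JVM’s compressed-pointer threshold.

```python
MAX_HEAP_MIB = 31 * 1024  # assumption: stay under the ~32 GiB compressed-oops threshold


def es_java_opts(limit_mib: int, heap_fraction: float = 0.5) -> str:
    """Build an esJavaOpts string from a container memory limit given in MiB."""
    if limit_mib <= 0:
        raise ValueError("limit_mib must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mib = min(int(limit_mib * heap_fraction), MAX_HEAP_MIB)
    # Same value for -Xms and -Xmx so the heap never resizes at runtime.
    return f"-Xms{heap_mib}m -Xmx{heap_mib}m"


print(es_java_opts(4096))  # -Xms2048m -Xmx2048m
```

For the 4Gi limit used here, that yields the same 2 GiB heap as the esJavaOpts line above, just expressed in megabytes.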

Other pitfalls? Storage pressure. Elasticsearch stops accepting writes once a node’s disk crosses the flood-stage watermark (95% full by default), so set up alerts for your volumes well before that.

Data explosion. Those logs add up fast. With Logstash_Format on, Fluent Bit already writes a new index every day, so use Elasticsearch’s built-in Index Lifecycle Management (ILM) policies to delete old indices after a set period (e.g., 7 days). Otherwise, you’ll be buying a lot of hard drives.
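For a sense of what such a retention policy looks like, here is a minimal Python sketch that builds the JSON body for a delete-only ILM policy. The helper name and the policy name logs-7d are invented for this example, and the policy does nothing until it is attached to your indices, typically through an index template that sets index.lifecycle.name.

```python
import json


def delete_after_policy(days: int) -> dict:
    """Body for PUT _ilm/policy/<name>: delete each index <days> days after creation."""
    if days <= 0:
        raise ValueError("days must be positive")
    return {
        "policy": {
            "phases": {
                "delete": {
                    "min_age": f"{days}d",
                    "actions": {"delete": {}},
                }
            }
        }
    }


# Print the body you would send to PUT _ilm/policy/logs-7d (hypothetical name).
print(json.dumps(delete_after_policy(7), indent=2))
```

Because the indices are already split by day, no rollover action is needed; min_age is then counted from each index’s creation date.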

It’s a stack with moving parts, but when it’s humming, there’s nothing better for figuring out what on earth your applications are actually doing.