26.8 kube-prometheus-stack: The Batteries-Included Helm Chart
Right, so you’ve decided you want metrics. Good choice. Staring at a wall of log files to figure out why your application is having a conniption is like trying to read a book by smelling it. You need numbers, graphs, and a way to ask “what changed five minutes before everything caught on fire?”
You could assemble this whole monitoring stack yourself: deploy Prometheus, then Grafana, then the various exporters, then the custom resource definitions (CRDs) for service monitors, then figure out the permissions… it’s a lot. It’s the kind of project that starts on a Friday afternoon and ruins your entire weekend. The kube-prometheus-stack Helm chart is the antidote to that self-inflicted pain. It’s the “batteries-included” approach, and frankly, it’s brilliant.
Think of it as a curated, pre-wired observability suite. With one Helm install, you get:
- Prometheus: The time-series database and the heart of the operation.
- Grafana: The visualization layer that makes the data actually look meaningful.
- A bouquet of exporters: The node exporter for machine metrics, the kube-state-metrics exporter for Kubernetes object state, and more.
- Alertmanager: The component that deduplicates and routes the alerts Prometheus fires.
- The Prometheus Operator: This is the secret sauce. It manages Prometheus and Alertmanager for you using Kubernetes custom resources.
Why the Prometheus Operator is a Game-Changer
The Operator pattern is Kubernetes-speak for “please, for the love of all that is holy, manage this complex stateful application for me.” Instead of manually editing a massive YAML config file and constantly reloading Prometheus, you define what you want to monitor using Kubernetes-native resources like ServiceMonitor and PodMonitor.
You tell the Operator: “See this app? It has a service called my-app-service with the label app: my-app. It exposes metrics on port web at the path /metrics. Go monitor it.” The Operator sees your declaration, says “cool, got it,” and automatically generates the correct Prometheus scrape configuration. It’s declarative monitoring. You state the desired end state, and the Operator figures out how to get there. That beats hand-editing a scrape config and reloading Prometheus every time a service appears, moves, or disappears.
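ServiceMonitor gets a full example later in this section, so here is a minimal sketch of its sibling, PodMonitor, for pods that have no Service in front of them. The names my-app, default, and the file name are all hypothetical, and the port name web is assumed to be a named container port on the pod.

```shell
# Write a hypothetical PodMonitor manifest to disk (nothing is applied yet).
cat > my-app-podmonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-app
  namespace: default   # assumption: the namespace your pods run in
spec:
  selector:
    matchLabels:
      app: my-app      # selects Pods directly, no Service needed
  podMetricsEndpoints:
    - port: web        # a named container port on the Pod
      path: /metrics
EOF
```

Once the stack is installed, kubectl apply -f my-app-podmonitor.yaml hands it to the Operator. Whether Prometheus actually picks it up depends on the selector settings covered under the pitfall further down.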
Installing the Stack: It Should Be This Easy
First, add the Helm repository. I know, it feels like a formality, but just do it.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
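Now, for the install. You can run helm install with no values at all, and you’ll get a default setup that works, but it’s probably useless for anything real: nothing is reachable from outside the cluster, and Prometheus keeps its data on ephemeral storage unless you give it a volume. You want a custom values file. Let’s create a minimal one to start. Save this as custom-values.yaml.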
# custom-values.yaml
grafana:
  adminPassword: "supersecretpassword" # Change this. Seriously. I mean it.
  service:
    type: LoadBalancer # So you can actually get to it from your browser
# Keep everything for Prometheus under ONE prometheus: key. A second top-level
# prometheus: block is a duplicate YAML key and can silently clobber the first.
prometheus:
  service:
    type: LoadBalancer # Same reason as above
  # This is the crucial bit that often gets forgotten: where do you want Prometheus to store all that data?
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi # Start with something sane. It will fill up.
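Since that volume will fill up, it can help to cap how much Prometheus keeps. Here is a sketch of a small overlay values file. The retention and retentionSize keys under prometheusSpec come from the chart's values as I know them, so check them against your chart version. The numbers and the file name are purely illustrative.

```shell
# Write an overlay values file; the limits are examples, not recommendations.
cat > retention-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    retention: 15d        # drop samples older than this
    retentionSize: 45GB   # or sooner, once stored data reaches this size
EOF
```

Helm merges multiple values files in order, so you can pass this as a second -f after custom-values.yaml instead of editing the main file.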
Now, install it into a dedicated namespace. This is a best practice, because this stack creates a lot of resources and you’ll want them fenced off from your applications.
helm install kube-prom-stack prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace \
  -f custom-values.yaml
Watch the magic happen with kubectl get pods -n monitoring --watch. After a few minutes, you should have a small army of pods running.
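If you’d like something more specific than a wall of pods, a quick sanity check along these lines may help. It’s a sketch, not an official procedure: check_stack is a hypothetical helper name, and it assumes the monitoring namespace used above.

```shell
# Hypothetical helper: summarize what the chart installed.
# Argument: namespace (defaults to "monitoring").
check_stack() {
  ns="${1:-monitoring}"
  # CRDs owned by the Prometheus Operator (ServiceMonitor, PodMonitor, Prometheus, ...).
  kubectl get crd -o name | grep 'monitoring.coreos.com'
  # The Prometheus and Alertmanager custom resources the Operator acts on.
  kubectl get prometheus,alertmanager -n "$ns"
}
```

Run check_stack after the pods settle. If the first command prints nothing, the CRDs are missing, and no ServiceMonitor you create later will be accepted by the API server.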
The First Hurdle: Finding Grafana’s Password
You set the admin password in the values file, right? Cool. But if you didn’t (because you skipped my advice), the chart picked a default for you, and you’ll need to read it from the Kubernetes secret. This is a classic “oh right” moment.
kubectl get secret -n monitoring kube-prom-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
Log in to your Grafana LoadBalancer URL, use admin and that password, and you’re in. You’ll immediately see the dashboards the stack pre-loaded for you. Check out the “Kubernetes / Compute Resources / Namespace (Pods)” dashboard. It’s glorious. All that data, instantly. It feels like cheating.
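No LoadBalancer in your cluster (kind, minikube, a locked-down corporate setup)? A port-forward is a reasonable fallback. This sketch assumes the release name kube-prom-stack and the monitoring namespace from above, and that the Grafana Service listens on port 80, which is the Grafana chart’s default as far as I know. grafana_forward is a hypothetical helper name.

```shell
# Forward local port 3000 to the Grafana Service; blocks until you press Ctrl-C.
# Arguments: namespace (default "monitoring"), Helm release name (default "kube-prom-stack").
grafana_forward() {
  kubectl port-forward -n "${1:-monitoring}" "svc/${2:-kube-prom-stack}-grafana" 3000:80
}
```

Run grafana_forward in one terminal, then open http://localhost:3000 in your browser and log in as before.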
The Most Common Pitfall: The Target That Never Shows Up
Here’s where you’ll likely get stuck. Your application is running, it has a /metrics endpoint, you’ve created a ServiceMonitor for it, but Prometheus isn’t scraping it. You check Targets in Prometheus and your app simply isn’t there. No 403 Forbidden, no Permission Denied, just nothing.
Why? Because by default, the chart configures Prometheus to select only the ServiceMonitors and PodMonitors that carry the Helm release’s label (release: kube-prom-stack in our case). Your hand-written ServiceMonitor doesn’t have that label, so it gets ignored. Permissions are usually not the culprit: the chart already grants the Prometheus ServiceAccount cluster-wide read access to services, endpoints, and pods through a ClusterRole. What you need to change is the serviceMonitorSelector and podMonitorSelector in the Prometheus custom resource. Luckily, the Helm chart lets you configure this easily.
You need to add this under the existing prometheus.prometheusSpec key in your custom-values.yaml and upgrade your release (helm upgrade -f custom-values.yaml ...):
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
Setting these to false is the key. It tells the chart: “Don’t restrict Prometheus to the ServiceMonitors and PodMonitors labeled with this release. Select all of them, in any namespace.” Now, when you create a ServiceMonitor in your app’s namespace, the Operator will find it and instruct Prometheus to scrape it.
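To see what Prometheus is actually selecting, or to opt in a single monitor without changing the chart values, something like the following may help. Both helper names are hypothetical. The second one assumes the chart’s default selector matches on a release label equal to your Helm release name, which is worth confirming with the first one.

```shell
# Print the monitor selectors of every Prometheus custom resource in a namespace.
# Argument: namespace (defaults to "monitoring").
show_monitor_selectors() {
  kubectl get prometheus -n "${1:-monitoring}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{": serviceMonitorSelector="}{.spec.serviceMonitorSelector}{" podMonitorSelector="}{.spec.podMonitorSelector}{"\n"}{end}'
}

# Alternative to the values change: give one ServiceMonitor the release label.
# Arguments: namespace, ServiceMonitor name, Helm release name.
label_monitor_for_release() {
  kubectl label servicemonitor -n "$1" "$2" "release=$3" --overwrite
}
```

An empty selector in the first command’s output means “match everything”. A selector containing matchLabels means only monitors with those labels are picked up.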
Creating Your First ServiceMonitor
This is how you onboard your app. Let’s say your app has a service like this:
# app-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-great-app
  labels:
    app: my-great-app
spec:
  ports:
    - name: web
      port: 8080
      targetPort: 8080
  selector:
    app: my-great-app
Your corresponding ServiceMonitor would look like this:
# app-service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-great-app
  # Note: This is in your APP'S namespace, e.g., 'default'
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-great-app # This selects the Service
  endpoints:
    - port: web # Matches the port name in the Service
      path: /metrics # The standard metrics path
      interval: 30s # How often to scrape
Apply both manifests (kubectl apply -f app-service.yaml -f app-service-monitor.yaml), wait a moment, and check the Prometheus UI under “Status -> Targets”. You should see your new target listed and, after a scrape or two, reported as UP. That’s it. You’ve just dynamically configured Prometheus without touching a config file. This is the way.
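If you’d rather confirm from a terminal than squint at the UI, you can ask the Prometheus HTTP API for the up metric. This is a sketch under a few assumptions: check_up is a hypothetical helper, Prometheus is reachable on localhost:9090 (for example via kubectl port-forward -n monitoring svc/prometheus-operated 9090 in another terminal), and the job label equals the Service name, which is what the Operator generates unless the ServiceMonitor sets jobLabel.

```shell
# Ask Prometheus whether a scrape job is up (sample value 1) or down (0).
# Arguments: job name, optional Prometheus base URL.
check_up() {
  curl -sG "${2:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=up{job=\"$1\"}"
}
```

Calling check_up my-great-app returns JSON. An empty result list means Prometheus doesn’t know the target at all, which points back at the selector settings rather than at your app.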