13.5 cert-manager: Automated Certificate Management

Look, manually managing TLS certificates is the digital equivalent of hand-washing every dish you use. It’s a noble, character-building exercise for about five minutes before you start wondering why you aren’t using the dishwasher. That dishwasher is cert-manager. It’s a Kubernetes-native utility that automates the procurement, renewal, and management of TLS certificates from a variety of Issuers, most famously Let’s Encrypt. It turns a tedious, error-prone process into a declarative “set it and mostly forget it” affair.

Here’s the core mental model: you don’t create certificates directly. You declare your desired outcome—a Certificate—and cert-manager works with an Issuer (or ClusterIssuer) to make it happen. The Issuer is your config for where to get the certs (e.g., Let’s Encrypt’s production vs. staging endpoint), and the Certificate is your request for what you need (e.g., myapp.example.com).

Installing cert-manager: Do It Right

Helm is the way to go. cert-manager does publish a static manifest you can kubectl apply, but it leans heavily on CRDs, and Helm makes installing, configuring, and upgrading those CRDs alongside the controller far less painful. Always, always use the official Jetstack chart. Don’t just copy-paste YAML from a 2019 blog post.

helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.14.4 \
  --set installCRDs=true

The --set installCRDs=true flag is crucial. It tells Helm to install the Custom Resource Definitions (CRDs) that define the Issuer, ClusterIssuer, Certificate, and related APIs. Without them, cert-manager is a car with no wheels.
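If you’d rather keep Helm settings in version control than on the command line, the same option can live in a values file. This is a minimal sketch: the filename cert-manager-values.yaml is an arbitrary choice, and the installCRDs key applies to the v1.14 chart pinned above. Later chart releases replaced it with crds.enabled, so check the values reference for whichever version you actually install.

```yaml
# cert-manager-values.yaml (hypothetical filename)
# Same effect as --set installCRDs=true on the v1.14.x chart:
# install the cert-manager CRDs as part of the Helm release.
installCRDs: true
```

You would then pass -f cert-manager-values.yaml to helm install instead of the --set flag.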

Configuring Your First ClusterIssuer

An Issuer is namespaced. A ClusterIssuer is, shockingly, cluster-scoped. You’ll usually want a ClusterIssuer, because it lets you provision certificates in any namespace from a single, central configuration. Let’s set up a Let’s Encrypt ClusterIssuer, starting with their staging environment. Why? Because Let’s Encrypt production enforces strict rate limits, and if you screw up your config while testing, you can easily lock yourself out of issuing certificates for anywhere from an hour to a week. Staging has much looser limits and exists so you can make a glorious mess without consequence.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: your-email@example.com # Seriously, use your real email here.
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - http01:
        ingress:
          class: nginx

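This config uses the HTTP-01 challenge. Let’s Encrypt hands cert-manager a token, and cert-manager spins up a temporary solver Pod, Service, and Ingress (using the nginx class above) that serve it at http://<your-domain>/.well-known/acme-challenge/<token>. Let’s Encrypt then fetches that URL to confirm you control the domain. It’s elegant and needs no DNS provider credentials, but it does require your ingress controller to be reachable from the public internet on port 80.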
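For contrast, here is a sketch of the namespaced equivalent. The ACME settings are unchanged from the ClusterIssuer; only the kind changes and a namespace appears. It reuses my-app-namespace and the placeholder email from this section. One practical difference: a namespaced Issuer keeps its ACME account key Secret in its own namespace, while a ClusterIssuer keeps it in cert-manager’s cluster resource namespace (by default, the namespace cert-manager runs in).

```yaml
# Namespaced Issuer: only Certificates in my-app-namespace can reference it.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: my-app-namespace
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      # This account-key Secret is stored in my-app-namespace, next to the Issuer.
      name: letsencrypt-staging-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
```

A Certificate using it must live in my-app-namespace and set issuerRef.kind: Issuer, which is also what cert-manager assumes when kind is omitted.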

Requesting a Certificate with a Certificate Resource

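Now for the main event. You create a Certificate resource in the same namespace as your Ingress, because cert-manager writes the resulting Secret into the Certificate’s namespace and an Ingress can only reference Secrets in its own namespace. This resource points to the ClusterIssuer we made and lists the domain(s) we want.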

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-app-tls
  namespace: my-app-namespace
spec:
  secretName: my-app-tls-secret # The name of the Secret that will be created with the cert & key.
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  dnsNames:
  - myapp.example.com
  - www.example.com

Apply this, and cert-manager gets to work. It creates a CertificateRequest (and, for ACME issuers, an Order plus one Challenge per domain), and you can watch the beautifully verbose play-by-play in the Events section of kubectl describe certificate my-app-tls -n my-app-namespace. When issuance succeeds, a Kubernetes Secret named my-app-tls-secret will pop into existence, containing the tls.crt and tls.key.
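The Certificate above leaves the renewal schedule to cert-manager’s default, which starts renewing once one third of the issued certificate’s lifetime remains: about 30 days before expiry for a 90-day Let’s Encrypt certificate. If you want a wider safety margin, spec.renewBefore accepts a duration. A sketch of the same Certificate with an illustrative value of 1080h (45 days), which is an arbitrary choice, not a recommendation:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-app-tls
  namespace: my-app-namespace
spec:
  secretName: my-app-tls-secret
  # Begin renewal 45 days (1080h) before expiry instead of the default
  # (one third of the lifetime, about 30 days for a 90-day certificate).
  renewBefore: 1080h
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  dnsNames:
  - myapp.example.com
  - www.example.com
```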

Hooking It All Together with Ingress

The final piece is telling your Ingress to use the secret that cert-manager is managing. This is the part that feels like magic.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  namespace: my-app-namespace
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.example.com
    secretName: my-app-tls-secret # Must match the secretName in the Certificate!
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-service
            port:
              number: 80

Once both resources are applied, cert-manager solves the HTTP-01 challenge through its temporary solver Ingress, obtains the certificate, and stores it in my-app-tls-secret; your Ingress controller then picks up the Secret and starts serving HTTPS. Note that cert-manager doesn’t need to watch this Ingress at all: the Certificate resource drives issuance, and the Ingress simply consumes the Secret. The renewal process is entirely automatic and happens in the background.
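If you’d rather not maintain a separate Certificate resource, cert-manager’s ingress-shim can create one for you from an annotated Ingress. A sketch, assuming the same names as above: the cert-manager.io/cluster-issuer annotation names the ClusterIssuer, and cert-manager creates a Certificate named after the tls secretName, covering the hosts listed under tls.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  namespace: my-app-namespace
  annotations:
    # ingress-shim creates and manages a Certificate for the tls block below,
    # issued by this ClusterIssuer.
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.example.com
    secretName: my-app-tls-secret
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-service
            port:
              number: 80
```

Pick one approach per Secret: running this alongside the explicit my-app-tls Certificate would leave two Certificates targeting the same my-app-tls-secret.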

Common Pitfalls and How to Avoid Them

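Wrong Ingress Class: The http01 solver in the ClusterIssuer is configured for class: nginx. If you’re using Traefik, AWS ALB, or any other ingress controller, you must change this to match. Otherwise, no controller serves the temporary solver Ingress and Let’s Encrypt can never fetch the challenge token. This is one of the most common reasons challenges fail. Check your solver config!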
Staging vs. Production: You will 100% forget to switch from the staging ClusterIssuer to the production one. Staging certificates are signed by a test CA that browsers don’t trust, so you’ll get a scary security warning and spend 20 minutes thinking you broke everything. I’ve done it. We’ve all done it. Create a production ClusterIssuer (using https://acme-v02.api.letsencrypt.org/directory) and point your Certificate’s issuerRef at it.
Rate Limiting: Let’s Encrypt production allows at most 50 certificates per registered domain per week, and only 5 per week for the exact same set of hostnames. If you’re testing and deleting/recreating Certificate resources willy-nilly, you can hit these limits quickly. Use the staging endpoint until your configuration is completely solid.
DNS Propagation: If you just changed your DNS to point to your cluster, wait. The HTTP-01 challenge requires Let’s Encrypt to resolve your domain to your ingress controller’s public IP. If the DNS record hasn’t propagated globally, the challenge will fail. Be patient.
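For the staging-to-production switch, the production ClusterIssuer is a near copy of the staging one. A sketch, assuming the hypothetical name letsencrypt-prod and a separate account-key Secret, since Let’s Encrypt staging and production use separate ACME accounts. Adjust the solver class for your ingress controller.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod   # hypothetical name
spec:
  acme:
    # Production endpoint: browser-trusted certificates, strict rate limits.
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      # Keep the production account key separate from the staging one.
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          # Must match the ingress class your controller actually serves.
          class: nginx
```

Then change issuerRef.name in your Certificate from letsencrypt-staging to letsencrypt-prod.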