Right, let’s talk about securing etcd. If you’ve gotten this far, you already know etcd is the absolute heart of your Kubernetes cluster. It’s where every single secret, every pod spec, every internal thought your cluster has ever had, is stored. Leaving it unprotected is like writing your deepest secrets on a postcard and hoping the mailman is having a good day. We’re not going to do that.

The gold standard for etcd security is TLS encryption and client certificate authentication. This means two things: first, the communication between the etcd server and its clients (like the API server) is encrypted so no one can eavesdrop. Second, the server positively identifies any client trying to connect, ensuring only approved systems can even talk to your precious data store. It’s a bouncer with a cryptographic guest list.

The Core Concepts: PKI Circus

Before we get our hands dirty, we need to agree on some terms, because most of the pain here comes from not understanding the moving parts. We’re building a small Public Key Infrastructure (PKI). Don’t let that term scare you; it’s just a fancy way of saying “a system for creating and managing certificates.”

  • CA (Certificate Authority): This is the trusted root. It’s like the authority that issues your passport: everyone trusts its stamp. Both the etcd server and all its clients need to trust the same CA.
  • Server Certificate: This is etcd’s own ID card. It lists the server’s hostnames and IPs (in the Subject Alternative Name field, which is crucially important) and is signed by the CA. When a client connects, etcd presents this certificate to prove “I am who I say I am.”
  • Client Certificate: This is the API server’s (or etcdctl’s) ID card, also signed by the same CA. When the client connects, it presents this to the etcd server to prove “I am allowed to be here.”
  • Peer Certificate: etcd members in a cluster also talk to each other. They use a separate set of peer certificates for mutual authentication within the cluster. It’s the same idea, just for a different communication channel.

The most common, and frankly most absurd, self-inflicted wound is getting the SANs wrong. If your server certificate lists only etcd.local but your API server is trying to connect to 10.0.1.10, the TLS handshake will fail spectacularly with a certificate validation error, because 10.0.1.10 is not among the names the certificate is valid for. The host in the endpoint URL must appear in the certificate’s SAN list.
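
Before blaming the network, it can help to look at what a certificate actually claims. The helper below is a rough sketch, not part of etcd or cfssl: san_covers is a made-up name, and it assumes the usual openssl x509 text layout, where the SAN entries sit on one comma-separated line right after the "Subject Alternative Name" header. It only does exact matches, so wildcard entries are not handled.

```shell
# san_covers HOST: hypothetical helper, not part of etcd or cfssl.
# Reads `openssl x509 -noout -text` output on stdin and succeeds only if
# HOST is listed verbatim as a DNS or IP entry in the SAN extension.
san_covers() {
  grep -A1 'Subject Alternative Name' \
    | tail -n 1 \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq -e "DNS:$1" -e "IP Address:$1"
}

# Example (assumes server.pem is the server certificate generated later on):
#   openssl x509 -in server.pem -noout -text | san_covers 10.0.1.10 && echo "SAN ok"
```

Run it once per name or IP your clients use in their endpoint URLs; any miss is a handshake failure waiting to happen.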

Generating the Necessary Certificates

Let’s use cfssl, Cloudflare’s open-source PKI toolkit, to generate these. It’s less error-prone than OpenSSL for this specific task. First, we define our CA.

ca-csr.json:
{
  "CN": "My Own CA",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "US",
      "L": "Austin",
      "O": "My Org",
      "OU": "CA Department",
      "ST": "Texas"
    }
  ]
}

Generate the CA certificate and key:

cfssl gencert -initca ca-csr.json | cfssljson -bare ca
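
The signing commands that follow pass -config=ca-config.json and select a profile named server or client, but that file isn't shown. Here is a minimal sketch of what it could look like. The one-year expiry (8760h) and the exact usage lists are assumptions on my part, so adjust them to your own policy.

```shell
# Write a cfssl signing config with the two profiles the gencert commands select.
# The 8760h (one year) expiry is an assumed value, not a recommendation.
cat > ca-config.json <<'EOF'
{
  "signing": {
    "default": { "expiry": "8760h" },
    "profiles": {
      "server": {
        "expiry": "8760h",
        "usages": ["signing", "key encipherment", "server auth"]
      },
      "client": {
        "expiry": "8760h",
        "usages": ["signing", "key encipherment", "client auth"]
      }
    }
  }
}
EOF
```

The usages lists are what separate the two profiles: a certificate issued under client carries "client auth" but not "server auth", and the reverse for server.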

Now, the critical part: the server certificate. Notice the hosts field. This is NOT the CN; it’s the SAN list. Put every DNS name and IP address anyone might use to connect to this etcd server. Yes, even the load balancer IP. Yes, even the pod IP if it’s running in Kubernetes. Be exhaustive.

server-csr.json:
{
  "CN": "etcd-server",
  "hosts": [
    "localhost",
    "127.0.0.1",
    "etcd-prod.internal",
    "10.0.1.10",
    "10.32.0.1"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "US",
      "L": "Austin",
      "O": "etcd",
      "OU": "Cluster Prod",
      "ST": "Texas"
    }
  ]
}

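Generate the server cert, signed by our CA. The -config file, ca-config.json, defines the signing profiles (server and client) that -profile selects: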

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server-csr.json | cfssljson -bare server

Repeat the process for a client certificate (e.g., for the API server). Two things differ: the profile (client instead of server), and the hosts field, which is often omitted or minimal for clients because we’re authenticating an identity, not a specific network location.

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client-csr.json | cfssljson -bare client
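
The command above reads a client-csr.json that isn't shown. A sketch of one, modeled on the server request: the CN here is purely illustrative (pick whatever identifies the client in your setup), and the names block is simply copied from the server example.

```shell
# Write a client CSR definition for cfssl.
# The CN is an illustrative identity, not something etcd requires;
# hosts is left empty because a client cert names an identity, not an address.
cat > client-csr.json <<'EOF'
{
  "CN": "kube-apiserver-etcd-client",
  "hosts": [],
  "key": { "algo": "rsa", "size": 2048 },
  "names": [
    { "C": "US", "L": "Austin", "O": "etcd", "OU": "Cluster Prod", "ST": "Texas" }
  ]
}
EOF
```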

Configuring etcd to Use Them

Now, you start etcd with the flags that make all this magic (and pain) happen. etcd can also read these settings from a config file (--config-file), but flags are what most examples use, so we’ll roll with them.

etcd \
  --name my-etcd \
  --data-dir /var/lib/etcd \
  --listen-client-urls https://10.0.1.10:2379 \
  --advertise-client-urls https://etcd-prod.internal:2379 \
  --cert-file=/etc/etcd/ssl/server.pem \
  --key-file=/etc/etcd/ssl/server-key.pem \
  --client-cert-auth \
  --trusted-ca-file=/etc/etcd/ssl/ca.pem \
  --auto-tls=false

Let’s break down the security-specific flags:

  • --cert-file and --key-file: etcd’s own server certificate.
  • --client-cert-auth: This is the big one. It enables client certificate authentication. Without this, etcd will encrypt traffic but not authenticate clients.
  • --trusted-ca-file: The CA file etcd uses to verify whether a presented client certificate is legitimate. Any cert signed by this CA is allowed in.
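
The flags above only cover the client-facing port. For the peer channel described earlier, etcd has a parallel set of --peer-* flags. A sketch, assuming you have generated a peer certificate pair the same way as the server one: the file names peer.pem and peer-key.pem and the peer_tls_flags helper are illustrative, not part of the setup above.

```shell
# peer_tls_flags DIR: hypothetical helper that prints etcd's peer TLS flags,
# one per line, pointing at certificate files under DIR.
# peer.pem / peer-key.pem are assumed file names for a peer cert pair.
peer_tls_flags() {
  printf '%s\n' \
    "--peer-cert-file=$1/peer.pem" \
    "--peer-key-file=$1/peer-key.pem" \
    "--peer-client-cert-auth" \
    "--peer-trusted-ca-file=$1/ca.pem"
}

# Example: append the flags to an etcd invocation (other flags omitted here).
#   etcd --name my-etcd $(peer_tls_flags /etc/etcd/ssl)
```

A multi-member cluster also needs https peer URLs and the usual cluster bootstrap flags; this only shows the TLS-specific part.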

Connecting with etcdctl (Without Pulling Your Hair Out)

This is where a lot of people get stuck. You’ve set it all up, you try to connect, and you get a cryptic error. The etcdctl command needs its own trio of files: the CA certificate to verify the server, plus the client certificate and key to authenticate itself.

# The WRONG way (no CA or client certificate, so the TLS handshake fails)
ETCDCTL_API=3 etcdctl --endpoints=https://etcd-prod.internal:2379 get / --prefix

# The RIGHT way
ETCDCTL_API=3 etcdctl \
  --endpoints=https://etcd-prod.internal:2379 \
  --cacert=/path/to/ca.pem \
  --cert=/path/to/client.pem \
  --key=/path/to/client-key.pem \
  get / --prefix

If you get a connection refused, check your firewall. If you get a certificate error, triple-check those SANs in your server cert. If you get an authentication error, ensure your client cert was signed by the CA that etcd trusts.
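
For that last check, you don't need etcd at all; openssl can tell you whether a certificate chains to a given CA. A small sketch, where signed_by_ca is a hypothetical wrapper name and the paths are the same placeholders used above. Note that openssl verify also fails on an expired certificate, which is worth knowing before you go hunting for a CA mismatch.

```shell
# signed_by_ca CA_CERT CERT: hypothetical helper wrapping `openssl verify`.
# Succeeds if openssl can build a valid chain for CERT with CA_CERT as a
# trusted root; fails otherwise (wrong CA, expired cert, missing file).
signed_by_ca() {
  openssl verify -CAfile "$1" "$2" >/dev/null 2>&1
}

# Example (placeholder paths):
#   signed_by_ca /path/to/ca.pem /path/to/client.pem || echo "client cert does not verify against this CA"
```

Point it at the file you passed to --trusted-ca-file on the etcd side, not just whichever ca.pem is lying around on your laptop.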

Best Practices and Rough Edges

  • Separate CAs for Client vs Peer Traffic: For a production cluster, you should use different CAs for client authentication and peer-to-peer communication between etcd nodes. This limits the blast radius if one key is compromised. It’s more work, but it’s the right thing to do.
  • Certificate Rotation: This is the real nightmare. Kubernetes needs to talk to etcd to function, so rotating these certs requires a carefully orchestrated dance of updating the API server flags and the etcd configuration, often in a rolling fashion. Do not wait until they’re about to expire to practice this.
  • etcdctl Wrappers: The command-line flags are verbose and annoying. Use environment variables (ETCDCTL_ENDPOINTS, ETCDCTL_CACERT, ETCDCTL_CERT, ETCDCTL_KEY) or better yet, a shell alias or script to save your sanity.
  • Honesty Time: The etcd team’s documentation on this is technically accurate but often assumes a level of PKI fluency that most of us don’t have at 3 AM during an outage. It’s okay to find this confusing. The system is complex because the problem is hard. The payoff, however, is a cluster that doesn’t leak all its secrets to the world. Worth it.
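
As a starting point for the wrapper idea in the list above, here is a sketch of the environment-variable approach. The values are the placeholder endpoint and paths from the earlier examples, so swap in your own; putting these lines in a file you source from your shell is one option, not the only one.

```shell
# Placeholder values from the examples above; substitute your real paths.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://etcd-prod.internal:2379
export ETCDCTL_CACERT=/path/to/ca.pem
export ETCDCTL_CERT=/path/to/client.pem
export ETCDCTL_KEY=/path/to/client-key.pem

# With these exported, the earlier command shrinks to:
#   etcdctl get / --prefix
```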