39.6 AKS Add-Ons: Monitoring, Policy, and Ingress

Right, let’s talk about AKS add-ons. This is where Azure tries to save you from yourself, or at least from the sheer drudgery of wiring up the same open-source projects for the thousandth time. The idea is simple: click a checkbox (or flip a --enable-whatever flag) and Azure will install, configure, and manage a core component on your cluster for you. It’s tempting to just enable everything. Don’t. Be strategic. Some are brilliant time-savers; others… well, let’s just say you might want to bring your own.

The Monitoring Add-on (Container Insights, formerly Azure Monitor for Containers)

This is the one you should almost always enable. It’s the “why on earth would I do this myself?” add-on. When you enable it, Azure automatically deploys a containerized agent as a DaemonSet on your nodes (named omsagent in older releases, ama-logs in newer ones). This agent sucks up all the juicy telemetry (pod metrics, node metrics, stdout/stderr logs, the whole shebang) and pipes it directly into a Log Analytics workspace.
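If you want proof the agent actually landed, look for its DaemonSet in kube-system. The helper below is a minimal sketch: `find_monitoring_agent` is a made-up name, it assumes kubectl is already pointed at the cluster, and it matches both `omsagent` and `ama-logs` on the assumption that the DaemonSet name depends on the agent version you got.

```shell
# Hypothetical helper: print the monitoring agent DaemonSet(s) in kube-system.
# Returns non-zero if nothing matching the known agent names is found.
find_monitoring_agent() {
  kubectl get daemonset --namespace kube-system --no-headers \
    | awk '$1 ~ /^(omsagent|ama-logs)/ { print $1; found = 1 } END { exit !found }'
}

# Usage:
#   find_monitoring_agent || echo "monitoring agent not found" >&2
```

The non-zero exit status makes it easy to drop into a post-deployment check script.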

The magic isn’t the collection; it’s the integration. You get a pre-built, actually-useful dashboard in the Azure portal under “Insights” for your cluster. You can see your node CPU/Memory pressure, your controller replica counts, and even drill down into individual pod logs, all without touching Grafana, Prometheus, or any of that if you don’t want to. For getting from zero to “I can see what’s happening” in under five minutes, it’s unbeatable.

The catch? You’re locked into Azure’s ecosystem. Want to use a custom Prometheus operator to scrape metrics from your fancy .NET app? You can, but you’ll be managing two metric pipelines. The built-in add-on is fantastic for operational monitoring of the cluster itself and standard workloads, but for deep application-level observability, you might still end up rolling your own solution alongside it.

Here’s how you enable it on a new cluster. Notice I’m specifying a Log Analytics workspace explicitly. If you leave it out, Azure creates a default workspace for you, so it’s best practice to create your own first and reuse it across multiple clusters for a unified view.

# Create a Log Analytics workspace first, because you're not a savage.
az monitor log-analytics workspace create \
  --resource-group myResourceGroup \
  --workspace-name myClusterLogs

# Look up the workspace's full resource ID instead of typing it by hand
WORKSPACE_ID=$(az monitor log-analytics workspace show \
  --resource-group myResourceGroup \
  --workspace-name myClusterLogs \
  --query id --output tsv)

# Now create the cluster with monitoring enabled and point it at the workspace
az aks create \
  --resource-group myResourceGroup \
  --name myManagedCluster \
  --enable-addons monitoring \
  --workspace-resource-id "$WORKSPACE_ID"
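Already have a cluster? The same add-on can be switched on after the fact with `az aks enable-addons`. Here is a minimal sketch wrapped in a function; `enable_monitoring` is a hypothetical name, and the three arguments (resource group, cluster name, workspace resource ID) are whatever applies in your subscription.

```shell
# Hypothetical wrapper: enable the monitoring add-on on an existing cluster.
# Arguments: resource group, cluster name, Log Analytics workspace resource ID.
enable_monitoring() {
  local rg="$1" cluster="$2" workspace_id="$3"
  az aks enable-addons \
    --resource-group "$rg" \
    --name "$cluster" \
    --addons monitoring \
    --workspace-resource-id "$workspace_id"
}

# Usage (the workspace ID is the full ARM resource ID of the workspace):
#   enable_monitoring myResourceGroup myManagedCluster "$WORKSPACE_ID"
```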

The Azure Policy Add-on

This is Microsoft handing you a giant, configurable stick to whack your cluster users with. And you should use it. Enabling this add-on installs Gatekeeper, the admission controller webhook built on Open Policy Agent (OPA), and connects it to Azure Policy. Why does this matter? It means you can define governance and compliance rules declaratively in Azure Policy, and they will be enforced at admission time, whenever something is created or updated through your Kubernetes API server.

Think: “No pods can be deployed without a resource limit,” or “Absolutely no containers running as root,” or “Only pull images from our approved Azure Container Registry.” This is powerful stuff. Instead of hoping your developers read the 50-page PDF you wrote on best practices, you just enforce them. When a policy’s effect is set to deny, the API server will straight-up reject any YAML that breaks the rules; with an audit effect, the violation is only reported as non-compliant.

The rough edge? The policies themselves are defined in Azure Policy’s JSON definition format, which is… an acquired taste. You don’t write pure Rego (OPA’s native language) directly; the JSON definition wraps a Gatekeeper constraint template, and the Rego lives inside that. It’s powerful, but the abstraction can leak when you try to do something complex.

# Enable the add-on during cluster creation
az aks create \
  --resource-group myResourceGroup \
  --name myManagedCluster \
  --enable-addons azure-policy
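To check that the add-on really installed something, you can look for its pods. This is a sketch under two assumptions: that the Azure Policy pods are named with an `azure-policy` prefix in kube-system, and that Gatekeeper runs in a `gatekeeper-system` namespace. `policy_addon_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: list the pods the policy add-on is expected to run.
policy_addon_pods() {
  # Azure Policy pods (assumed prefix "azure-policy") in kube-system
  kubectl get pods --namespace kube-system --no-headers \
    | awk '$1 ~ /^azure-policy/ { print $1 }'
  # Gatekeeper pods in their own namespace (assumed "gatekeeper-system")
  kubectl get pods --namespace gatekeeper-system --no-headers \
    | awk '{ print $1 }'
}

# Usage:
#   policy_addon_pods
```

An empty result right after enabling usually just means the pods are still being scheduled, so give it a minute before panicking.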

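After enabling, you don’t do anything in Kubernetes. You go to the Azure Portal, assign a built-in policy initiative like “Kubernetes cluster pod security restricted standards for Linux-based workloads,” and watch the magic (or the denials) happen. Give it a few minutes: assignments are synced into the cluster on a schedule, not instantly.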
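If you’d rather not click through the portal, the assignment can be scripted. The sketch below looks an initiative up by its display name and assigns it at a scope you choose. `assign_initiative` is a hypothetical helper, and the display name in the usage comment is an assumption you should check against the output of `az policy set-definition list` in your own tenant.

```shell
# Hypothetical helper: assign a built-in initiative, found by display name.
# Arguments: initiative display name, scope (resource ID), assignment name.
assign_initiative() {
  local display_name="$1" scope="$2" assignment_name="$3"
  local initiative
  # Take the first match; the query returns nothing if the name is wrong.
  initiative="$(az policy set-definition list \
    --query "[?displayName=='${display_name}'].name" \
    --output tsv | head -n 1)"
  if [ -z "$initiative" ]; then
    echo "no initiative named: ${display_name}" >&2
    return 1
  fi
  az policy assignment create \
    --name "$assignment_name" \
    --scope "$scope" \
    --policy-set-definition "$initiative"
}

# Usage (display name is an assumption; SCOPE is e.g. the cluster's resource ID):
#   assign_initiative \
#     "Kubernetes cluster pod security restricted standards for Linux-based workloads" \
#     "$SCOPE" pod-security-restricted
```

Whether violations get denied or merely reported depends on the effect parameter of the initiative, so check what it defaults to before assuming anything is being blocked.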

The Application Gateway Ingress Controller (AGIC) Add-on

Ah, the ingress add-on. This is the most “your mileage may vary” option of the bunch. When you enable this, Azure installs and manages the Application Gateway Ingress Controller (AGIC). This is a Kubernetes controller that watches for Ingress resources in your cluster and translates those rules into configuration on an Azure Application Gateway (a Layer 7 load balancer) that sits outside your cluster.

This is a fundamentally different model from the more common NGINX ingress controller, where the ingress controller runs as a pod in your cluster and the Azure Load Balancer is just a dumb TCP forwarder. With AGIC, the only thing running in your cluster is the controller that programs the gateway; the data plane (the actual traffic routing) is handled by a fully managed, Azure-native Application Gateway that sends traffic straight to your pods.

The benefit? You get all the features of App Gateway, like Web Application Firewall (WAF), TLS termination, and cookie-based affinity, directly integrated with your Kubernetes ingress definitions. The downside? It’s more expensive and tightly couples you to Azure. It also adds a bit of latency to configuration changes, as AGIC has to call the Azure API to reconfigure the gateway instead of just updating an NGINX config file.

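# With --appgw-name and --appgw-subnet-cidr, the add-on creates a new
# Application Gateway for you. To reuse an existing one, pass --appgw-id instead.
# The subnet CIDR must be a free range in the cluster's virtual network,
# so adjust 10.2.0.0/16 to your own address space.
az aks create \
  --resource-group myResourceGroup \
  --name myManagedCluster \
  --network-plugin azure \
  --enable-addons ingress-appgw \
  --appgw-name myApplicationGateway \
  --appgw-subnet-cidr 10.2.0.0/16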
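Once AGIC is running, it needs an Ingress to act on. Here is a minimal sketch that prints one for an existing Service. `agic_ingress_manifest` is a hypothetical helper, and the `azure-application-gateway` ingress class name is an assumption based on recent AGIC versions (older setups select the controller with the `kubernetes.io/ingress.class: azure/application-gateway` annotation instead).

```shell
# Hypothetical helper: print an Ingress manifest that AGIC should pick up.
# Arguments: backend Service name and port (the Service must already exist).
agic_ingress_manifest() {
  local service="$1" port="$2"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${service}
spec:
  ingressClassName: azure-application-gateway
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service}
                port:
                  number: ${port}
EOF
}

# Usage:
#   agic_ingress_manifest aspnetapp 80 | kubectl apply -f -
```

Nothing in the manifest mentions the gateway itself. The class name is the only hook, and the controller does the translation into listeners, rules, and backend pools.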

So, which ingress should you use? If you need WAF and deep Azure integration and don’t mind the cost, AGIC is a solid choice. If you want portability, a simpler model, and to save a few bucks, you’re probably better off installing the community NGINX ingress controller (ingress-nginx) yourself via Helm. It’s one of those choices where there’s no single right answer, just a series of trade-offs that Azure is all too happy to let you make.
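For the roll-your-own route, the Helm install is short. This is a sketch, not a production setup: `install_ingress_nginx` is a hypothetical wrapper, the release and namespace are both named `ingress-nginx` by choice, and it uses the chart defaults with no AKS-specific load balancer annotations.

```shell
# Hypothetical wrapper: install the community ingress-nginx chart with defaults.
install_ingress_nginx() {
  helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx &&
  helm repo update &&
  helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
    --namespace ingress-nginx \
    --create-namespace
}

# Usage:
#   install_ingress_nginx
```

Using `helm upgrade --install` keeps the command safe to re-run, which matters once this ends up in a pipeline.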