Right, let’s talk about what “healthy” and “synced” even mean in the cold, logical eyes of ArgoCD. This isn’t some feel-good wellness retreat; it’s a brutally honest assessment of your cluster’s state. My job is to make sure you understand the diagnosis.

At its core, ArgoCD performs two distinct checks on every resource you’ve asked it to manage. First, it asks, “Are you alive and functioning correctly?” That’s Health. Then it asks, “Do you look exactly like the manifest I have in my Git repository?” That’s Sync Status. A resource can be healthy but out-of-sync (you changed a replica count in Git but haven’t applied it yet), or synced but unhealthy (you applied a broken configuration that crashed the Pod on startup). You need both to be green for me to stop nagging you.

How ArgoCD Determines Health

It doesn’t just guess. For common resources like Pods, Services, Deployments, and so on, ArgoCD uses a set of built-in health assessment rules. They’re actually pretty smart. For a Pod, it’s not just “does it exist?”; it checks the phase (Pending, Running, Succeeded, Failed), the state of each container, and whether the containers report ready, which is where your readiness probes come in. This is why a Pod stuck in ContainerCreating shows as Progressing and a Pod with a crash-looping container shows as Degraded, even though both of them exist. For a Deployment, it checks whether the rollout is complete: the controller has observed the latest spec, all replicas are updated and available, and the Progressing condition hasn’t reported ProgressDeadlineExceeded (which flips it to Degraded).
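To make the Pod and Deployment rules concrete, here is a simplified Python sketch modeled on how ArgoCD’s built-in checks (which live in the gitops-engine Go library) classify these two kinds. It works on plain dicts shaped like Kubernetes API JSON; the function names pod_health and deployment_health are hypothetical, and the real checks handle more edge cases than this.

```python
def pod_health(pod: dict) -> str:
    """Simplified, ArgoCD-style health for a Pod (sketch, not the real implementation)."""
    spec = pod.get("spec", {})
    status = pod.get("status", {})
    containers = status.get("containerStatuses", [])
    restart_policy = spec.get("restartPolicy", "Always")  # Kubernetes default

    # Long-running Pods whose containers wait on an error
    # (ErrImagePull, CreateContainerError, CrashLoopBackOff, ...) are Degraded.
    if restart_policy == "Always":
        for c in containers:
            waiting = (c.get("state") or {}).get("waiting") or {}
            reason = waiting.get("reason", "")
            if reason.startswith("Err") or reason.endswith(("Error", "BackOff")):
                return "Degraded"

    phase = status.get("phase")
    if phase == "Pending":
        return "Progressing"  # e.g. still pulling images or in ContainerCreating
    if phase == "Succeeded":
        return "Healthy"
    if phase == "Failed":
        return "Degraded"
    if phase == "Running":
        if restart_policy != "Always":
            return "Progressing"  # finite Pods (often hooks) are still working
        if containers and all(c.get("ready") for c in containers):
            return "Healthy"
        if any((c.get("lastState") or {}).get("terminated") for c in containers):
            return "Degraded"  # not ready and a container has already died
        return "Progressing"
    return "Unknown"


def deployment_health(dep: dict) -> str:
    """Simplified rollout check in the spirit of `kubectl rollout status`."""
    meta = dep.get("metadata", {})
    spec = dep.get("spec", {})
    status = dep.get("status", {})
    if spec.get("paused"):
        return "Suspended"
    if meta.get("generation", 0) > status.get("observedGeneration", 0):
        return "Progressing"  # controller hasn't seen the latest spec yet
    for cond in status.get("conditions", []):
        if cond.get("type") == "Progressing" and cond.get("reason") == "ProgressDeadlineExceeded":
            return "Degraded"
    desired = spec.get("replicas", 1)
    updated = status.get("updatedReplicas", 0)
    if updated < desired:
        return "Progressing"  # new replicas still being rolled out
    if status.get("replicas", 0) > updated:
        return "Progressing"  # old replicas still terminating
    if status.get("availableReplicas", 0) < updated:
        return "Progressing"  # updated replicas not yet available
    return "Healthy"
```

Note how the ContainerCreating Pod falls through to the Pending branch (Progressing), while CrashLoopBackOff is caught by the waiting-reason check (Degraded).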

But what about that custom resource you built for your fancy new operator? Unless it’s one of the popular CRDs that ArgoCD ships a built-in check for, ArgoCD has no idea how to judge it, so as long as the resource exists, it never counts against your application’s health. That’s… optimistic. To fix this, you provide a custom health check as a Lua script, telling ArgoCD exactly how to interrogate your specific resource.
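How per-resource results roll up to the application can be sketched in a few lines of Python. This assumes the application’s health is the worst health of its resources, that a resource whose kind has no health check (represented here as None) is skipped, and that the application is Synced only when every resource is; the severity ordering in HEALTH_ORDER is an assumption about ArgoCD’s internal ordering, and the function names are hypothetical.

```python
# Best-to-worst order; assumed to mirror the ordering ArgoCD uses when rolling up health.
HEALTH_ORDER = ["Healthy", "Suspended", "Progressing", "Missing", "Degraded", "Unknown"]


def app_health(resource_healths):
    """Worst resource health wins. None means the kind has no health check."""
    worst = "Healthy"
    for health in resource_healths:
        if health is None:
            continue  # no check, so this resource can never drag the app down
        if HEALTH_ORDER.index(health) > HEALTH_ORDER.index(worst):
            worst = health
    return worst


def app_sync(resource_syncs):
    """The app is Synced only if every resource matches Git."""
    return "Synced" if all(s == "Synced" for s in resource_syncs) else "OutOfSync"
```

For example, app_health(["Healthy", None, "Progressing"]) returns "Progressing": the custom resource without a check contributes nothing, for better or worse.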

Here’s an example. Let’s say you have a CronJob resource (from the batch/v1 API) which, believe it or not, didn’t have a health check in older ArgoCD versions. You’d define this in your argocd-cm ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd  # wherever ArgoCD is installed
  labels:
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations.health.batch_CronJob: |
    local hs = {}
    -- lastScheduleTime is set once the controller has created at least one Job.
    if obj.status ~= nil and obj.status.lastScheduleTime ~= nil then
      hs.status = "Healthy"
      hs.message = "CronJob is scheduling jobs"
    else
      hs.status = "Progressing"
      hs.message = "CronJob has not yet scheduled a job"
    end
    return hs

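This script checks for the lastScheduleTime field. If it’s there, the CronJob has scheduled at least one Job (which says nothing about whether that Job succeeded). If not, it reports Progressing, which is still better than a blind, unconditional “Healthy.” One caveat: a suspended CronJob that has never run will sit in Progressing forever, so a stricter check would also look at spec.suspend.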

The Sync Status Deep Dive

Sync status is both simpler and more pedantic. Conceptually, it’s a direct comparison between the live object in your cluster and the manifest in Git. But it’s not literally byte-for-byte, because the API server fills in default values, adds a status block, and stamps metadata such as uid and resourceVersion onto every object. ArgoCD is smart enough to ignore that noise: it “normalizes” both sides and, broadly speaking, compares only the fields your manifest actually declares.
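The idea can be illustrated with a deliberately naive Python sketch; it is not ArgoCD’s actual diff engine, which is considerably more involved. It treats a resource as out of sync only if a field declared in the desired manifest has a different live value, and it accepts a list of RFC 6901 JSON pointers to strip from both sides first, which is roughly the effect of the jsonPointers entries in ignoreDifferences. The helper names (remove_pointer, differs, is_out_of_sync) are hypothetical.

```python
import copy


def _child(node, key):
    """Step one level down a JSON document, or return None if the path is absent."""
    if isinstance(node, dict):
        return node.get(key)
    if isinstance(node, list) and key.isdigit() and int(key) < len(node):
        return node[int(key)]
    return None


def remove_pointer(doc, pointer):
    """Delete the value at an RFC 6901 JSON pointer such as '/spec/replicas', if present."""
    if not pointer.startswith("/"):
        return
    # RFC 6901 unescaping: '~1' -> '/' first, then '~0' -> '~'.
    keys = [k.replace("~1", "/").replace("~0", "~") for k in pointer[1:].split("/")]
    parent = doc
    for key in keys[:-1]:
        parent = _child(parent, key)
        if parent is None:
            return
    last = keys[-1]
    if isinstance(parent, dict):
        parent.pop(last, None)
    elif isinstance(parent, list) and last.isdigit() and int(last) < len(parent):
        del parent[int(last)]


def differs(desired, live):
    """True if any field declared in `desired` has a different value in `live`.

    Fields that only exist in `live` (defaults, status, uid, ...) are ignored.
    """
    if isinstance(desired, dict):
        if not isinstance(live, dict):
            return True
        return any(differs(value, live.get(key)) for key, value in desired.items())
    if isinstance(desired, list):
        if not isinstance(live, list) or len(desired) != len(live):
            return True
        return any(differs(d, l) for d, l in zip(desired, live))
    return desired != live


def is_out_of_sync(desired, live, ignore_pointers=()):
    """Normalize copies of both objects, then compare only the declared fields."""
    desired, live = copy.deepcopy(desired), copy.deepcopy(live)
    for pointer in ignore_pointers:
        remove_pointer(desired, pointer)
        remove_pointer(live, pointer)
    return differs(desired, live)
```

One thing this naive version cannot do is notice a field you deleted from Git but that still exists live, since it only walks the desired side; that is why the real comparison also consults what was last applied.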

The most common way you’ll see an OutOfSync state is when someone gets impatient and runs a quick kubectl edit or kubectl scale. This is the cardinal sin of GitOps. You’ve just created configuration drift: the live state now differs from the declared state in Git. ArgoCD will notice almost immediately and, if you have automated sync with self-healing enabled (syncPolicy.automated.selfHeal: true), will very politely undo your “fix” and put things back the way Git says they should be. Automated sync without selfHeal only acts when the target revision in Git changes, so it will flag the drift and leave it in place. This is a feature, not a bug. It prevents the dreaded “how did this ever work?” debugging session at 2 AM.
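The decision ArgoCD makes after spotting a difference can be sketched as a small Python function. This is a simplification: next_action is a hypothetical name, and it models automated sync as attempting each Git revision only once (as ArgoCD’s documentation describes it) while ignoring retries, sync windows, and self-heal back-off.

```python
def next_action(out_of_sync: bool, revision_already_synced: bool,
                automated: bool, self_heal: bool) -> str:
    """Decide what an ArgoCD-like controller does after comparing Git and live state."""
    if not out_of_sync:
        return "nothing to do"
    if not automated:
        return "report OutOfSync; wait for a manual sync"
    if not revision_already_synced:
        return "sync the new Git revision"
    # The revision was already applied, so the difference is live drift
    # (someone ran kubectl edit or kubectl scale).
    if self_heal:
        return "self-heal: re-apply the Git state"
    return "report OutOfSync; leave the drift in place"
```

The last two branches are the whole point: without selfHeal, a kubectl scale on an already-synced revision is reported but not reverted.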

You can see the gory details of what’s out of sync. The UI shows you a diff, but you can also get it from the CLI:

argocd app diff my-app-name

This command is your best friend for figuring out why something is marked OutOfSync.

Common Pitfalls and How to Avoid Them

  1. The Ignore Diff Dilemma: Sometimes a field is legitimately managed by another controller, or is just noisy. A classic example is spec.replicas on a Deployment that a HorizontalPodAutoscaler is scaling: every time the HPA adjusts it, the live value stops matching Git. To stop ArgoCD from caring about that field, you use ignoreDifferences in your Application spec. Use this power sparingly; it’s a trap door for letting drift back into your system.

    spec:
      ignoreDifferences:
      - group: apps
        kind: Deployment
        jsonPointers:
        - /spec/replicas
    
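  2. The Helm Hook Tango: ArgoCD never runs helm install; it renders your chart with helm template and applies the output itself. Helm hook annotations are translated into ArgoCD’s own hooks: pre-install and pre-upgrade become PreSync, post-install and post-upgrade become PostSync, helm.sh/hook-weight acts as a sync wave, and helm.sh/hook-delete-policy maps onto ArgoCD’s hook deletion policy. Hook types with no ArgoCD equivalent, such as rollback and test hooks, are ignored. Because hooks aren’t part of the tracked state, a hook Job that gets deleted after it runs won’t make your app OutOfSync. The resource that really does vanish and trigger a Missing report is usually a plain, non-hook Job with ttlSecondsAfterFinished set: Kubernetes garbage-collects it after it finishes, while Git still says it should exist. The fix is to make it a proper hook (argocd.argoproj.io/hook plus argocd.argoproj.io/hook-delete-policy) rather than to paper over the gap with ignoreDifferences. It’s a fundamental tension between tools with different ideas about who owns a resource’s lifecycle.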

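  3. Sync Waves for Orderly Rollouts: Need to create a ConfigMap before the Deployment that uses it? Use sync waves. Annotate your resource with argocd.argoproj.io/sync-wave: "-5" (lower numbers go first). Every resource defaults to wave 0, so the Deployment lands there unless you say otherwise. ArgoCD applies one wave at a time and waits for its resources to be in sync and healthy before starting the next, which is how you enforce ordering without resorting to clunky hooks. Within a single wave it still orders resources by kind (Namespaces before ConfigMaps before Deployments, for example) and then by name. It’s one of ArgoCD’s best features for managing complex interdependencies.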
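The ordering rule for sync waves can be sketched in Python as a sort on (wave, kind, name) followed by grouping by wave. This is a simplified model: the resource dicts are hypothetical stand-ins for real manifests, KIND_ORDER is a hypothetical, heavily abbreviated version of ArgoCD’s kind priority (kinds not listed sort last within a wave here), and sync phases are ignored.

```python
from itertools import groupby

# Hypothetical, abbreviated kind priority; ArgoCD's real list is much longer.
KIND_ORDER = ["Namespace", "ServiceAccount", "Secret", "ConfigMap", "Service", "Deployment"]

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def wave(resource: dict) -> int:
    """Sync wave from the annotation; every resource defaults to wave 0."""
    return int(resource.get("annotations", {}).get(WAVE_ANNOTATION, "0"))


def kind_rank(kind: str) -> int:
    """Position in the (abbreviated) kind order; unknown kinds go last."""
    return KIND_ORDER.index(kind) if kind in KIND_ORDER else len(KIND_ORDER)


def sync_plan(resources):
    """Return [(wave, ["Kind/name", ...]), ...], lowest wave first.

    A real controller would apply each group and wait for it to be
    in sync and healthy before moving on to the next wave.
    """
    ordered = sorted(resources, key=lambda r: (wave(r), kind_rank(r["kind"]), r["name"]))
    return [(w, [f'{r["kind"]}/{r["name"]}' for r in group])
            for w, group in groupby(ordered, key=wave)]
```

With a ConfigMap annotated "-5", a Service and Deployment left at the default, and a Job annotated "1", the plan comes out as wave -5 (the ConfigMap), then wave 0 (Service before Deployment, by kind), then wave 1 (the Job).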

The goal is to get everything to Healthy and Synced. When that happens, you have a high degree of confidence that your cluster’s reality matches your Git repository’s declared intent. And that, my friend, is where the real magic—and more importantly, a good night’s sleep—happens.