10.7 Indexed Jobs: Work Queues with Stable Pod Indices

Right, so you’ve got a bunch of work to do. A big, fat, parallelizable data set. Maybe you’re resizing a million images, processing a terabyte of log files, or sending out a truly staggering number of “We miss you!” emails. You reach for a Job with a high completions count, and it works… fine. But it’s a bit, well, dumb. The pods get names like mypod-6xz8w, mypod-pq4d9—completely random. If one of your workers fails and you need to know which specific chunk of work (say, file number 42) died, good luck figuring it out from that pod name. You’re left grepping through logs like a medieval peasant.

This is where Indexed Jobs come in, and they are a glorious upgrade. They bring order to the chaos by giving each pod a stable, predictable index. It’s Kubernetes saying, “Fine, you want to know exactly which pod is handling which piece of the work? Here you go. Don’t mess it up.”

The core idea is simple: instead of completions: 5 creating five anonymous pods, an Indexed Job (you enable it by setting completionMode: Indexed in the Job spec) creates pods with a known identity. Each pod gets an index number, from 0 up to (completions - 1). This index is injected into the pod in two crucial ways, making it the cornerstone of your entire batch processing logic.

How the Index Gets Injected

Kubernetes doesn’t just hope you’ll figure it out. It slaps the index right in front of you, in two places.

First, and most reliably, it’s injected as an environment variable called JOB_COMPLETION_INDEX. This is your primary tool. Your container’s entrypoint script or your application code should look for this variable to know which specific work item it’s responsible for.
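As a minimal sketch of that lookup, an entrypoint script might validate the variable before touching any data. The helper name read_job_index is hypothetical, not something Kubernetes provides; the only assumption taken from the platform is that JOB_COMPLETION_INDEX holds a non-negative integer.

```shell
# Hypothetical helper: print the validated completion index, or fail loudly.
read_job_index() {
  local raw="${JOB_COMPLETION_INDEX:-}"
  case "$raw" in
    # Reject an empty value or anything containing a non-digit character.
    ''|*[!0-9]*)
      echo "JOB_COMPLETION_INDEX is missing or not a number: '${raw}'" >&2
      return 1
      ;;
  esac
  echo "$raw"
}

# Typical use at the top of an entrypoint script:
#   index="$(read_job_index)" || exit 1
#   echo "worker starting with index ${index}"
```

Failing at this point, with a clear message, is cheaper than discovering halfway through a run that every pod quietly processed item "".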

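Second, because some people like their metadata in their hostnames, the index is also part of the pod’s hostname. If your Job is named resize-images, the pods get the hostnames resize-images-0, resize-images-1, and so on. The pod names are a different matter: they start with the same prefix but end in a random suffix (something like resize-images-4-x7k2p, where the suffix here is made up), so kubectl logs resize-images-4 won’t work as typed. To find the pod for a given index, list the Job’s pods and check the batch.kubernetes.io/job-completion-index annotation. The hostname is handy for debugging, but you shouldn’t rely on it for your application logic. Always use the environment variable.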

Here’s what a simple but functional Indexed Job looks like. Notice the completionMode and how the pod spec uses the magic environment variable.

apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-demo
spec:
  completions: 5
  parallelism: 2 # Let's not get too greedy; two pods at a time
  completionMode: Indexed # This is the key that unlocks the whole feature
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: bash:5.2
        command: ["bash", "-c"]
        args:
        - |
          # Grab our assigned index from the env var
          index=$JOB_COMPLETION_INDEX
          echo "Pod index $index is processing work item number $index"
          # In real life, you'd do something like:
          # process_file /data/file_${index}.txt
          # or curl http://api.example.com/tasks/$index
          sleep 2

Apply that, and watch the magic. Run kubectl get pods -l job-name=indexed-demo and you’ll see your numbered pods: their names start with indexed-demo-0, indexed-demo-1, and so on up to indexed-demo-4, each followed by a short random suffix.

The “How Do I Actually Use This?” Part

The index is just a number. The real genius—and your responsibility—is mapping that number to a unit of work. This pattern is called a “work queue,” and you typically implement it in one of two ways:

The Static Mapping: You have a fixed set of work items. Maybe you have exactly 100 files in a bucket, named file_0.txt through file_99.txt. Your pod’s command becomes brilliantly simple: process_file /data/file_${JOB_COMPLETION_INDEX}.txt. The index is the work identifier.
The Dynamic Lookup: The work queue lives elsewhere, like in Redis, Amazon SQS, or a database. The index is now a worker ID, not the task itself. Each pod (index N) might pull tasks from a queue named queue_N, or use the index to partition a larger dataset. For example, with five workers, pod index 2 handles all users where user_id % 5 == 2.
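The two mappings above can be sketched in a few lines of shell. The function names are hypothetical, and the worker count has to be supplied by you (for example as an extra environment variable in the pod template), since this sketch assumes the pod is not told the Job’s completions count on its own.

```shell
# Static mapping: the index is the work item itself.
# The /data/file_N.txt layout follows the example in the text.
static_work_item() {
  local index="$1"
  echo "/data/file_${index}.txt"
}

# Dynamic partitioning: the index is a worker ID.
# A worker owns a user when user_id modulo the worker count equals its index.
owns_user() {
  local user_id="$1" index="$2" workers="$3"
  [ $(( user_id % workers )) -eq "$index" ]
}

# Small demo; the default of 0 exists only so the sketch runs outside a cluster.
index="${JOB_COMPLETION_INDEX:-0}"
echo "static work item for index ${index}: $(static_work_item "$index")"
```

With five workers, owns_user 7 2 5 succeeds (7 % 5 is 2) while owns_user 8 2 5 does not, so each user lands on exactly one worker.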

Pitfalls and Sharp Edges

This isn’t all rainbows and unicorns. The designers made a few choices you need to be aware of.

First, and this is the big one, the index sticks to the work, not to the pod. If the pod for index 3 fails, its replacement gets the same JOB_COMPLETION_INDEX=3 and the same indexed-demo-3 hostname; nobody hands out a fresh index. This is usually what you want: it ensures that chunk of work eventually gets done. But it means your application logic must be idempotent. Processing work item 3 twice should have the same effect as processing it once. If it doesn’t, you’re in for a world of pain. This isn’t a Kubernetes flaw; it’s a fundamental requirement of distributed batch processing.
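One common way to get idempotency for file-shaped work is to skip items whose output already exists and to publish output with a rename. The sketch below assumes the output lives on a single filesystem (so the rename is atomic) and that the processing command reads stdin and writes stdout; process_once is a hypothetical name.

```shell
# Hypothetical idempotent wrapper.
# Usage: process_once <input> <output> <command> [args...]
process_once() {
  local input="$1" output="$2"
  shift 2
  # A finished output means an earlier attempt already succeeded: do nothing.
  if [ -e "$output" ]; then
    echo "skip: $output already exists"
    return 0
  fi
  local tmp="${output}.tmp.$$"
  # Write to a temp file first so a crash never leaves a half-written output.
  if "$@" < "$input" > "$tmp"; then
    mv "$tmp" "$output"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example (paths are illustrative):
#   process_once "/data/file_${JOB_COMPLETION_INDEX}.txt" \
#                "/out/file_${JOB_COMPLETION_INDEX}.txt" tr a-z A-Z
```

A retried pod that runs this for an already finished index exits cleanly without redoing or corrupting anything. Work with external side effects (emails, API calls) needs its own deduplication, which this sketch does not cover.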

Second, you are 100% responsible for handling the index correctly. If your application ignores the JOB_COMPLETION_INDEX variable, or if you mess up the mapping logic so that index 5 tries to process a non-existent file_105.txt, the pod will fail. The Job controller will retry that index, it will fail again, and once the backoffLimit is exhausted the entire Job is marked as failed. Kubernetes gives you a powerful tool, not a free pass.
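A cheap guard against a bad mapping is to check that the target exists before processing and to say exactly what went wrong. This is a sketch with a hypothetical helper name, require_work_item; it checks a local file path, so a bucket or queue would need the equivalent check through its own client.

```shell
# Hypothetical pre-flight check: confirm the mapped work item is readable.
require_work_item() {
  local path="$1"
  if [ ! -r "$path" ]; then
    # Include the index in the message so the failure is easy to trace in logs.
    echo "index ${JOB_COMPLETION_INDEX:-unknown}: work item not found: ${path}" >&2
    return 1
  fi
}

# Example (path is illustrative):
#   require_work_item "/data/file_${JOB_COMPLETION_INDEX}.txt" || exit 1
```

The pod still fails and still counts against the retry budget, but the log line tells you on the first attempt that the mapping is wrong, rather than leaving you to infer it from a stack trace.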

The best practice? Start simple. Make your entrypoint script robust. Log the index as soon as the pod starts. Use it to clearly and correctly select the work item. And for the love of all that is holy, make your processing idempotent. Test that a pod restart doesn’t corrupt your entire dataset. Once you get this pattern right, you’ll wonder how you ever lived without it. It turns a messy, opaque batch job into a predictable, debuggable assembly line.