29.8 Batch API: Asynchronous Large-Scale Processing

Right, so you’ve built your little prototype and it’s charming. It takes a user’s query, sends it off to the API, and gets back a response. It’s a nice, polite, synchronous conversation. Now imagine you need to do that for 50,000 documents. Doing it one-by-one, waiting for each to finish before starting the next, isn’t just slow—it’s a form of masochism. This is where the Batch API comes in, and it’s the closest thing you’ll get to a superpower for large-scale language processing without setting up your own distributed system.

Think of the Batch API as handing a stack of work to a highly competent but slightly pedantic intern. You give them a file of inputs, they take it away to their desk, and they work on it asynchronously, using their own internal queue. They don’t bother you with constant updates. Hours later, they email you a neat file with all the results. You don’t pay for their idle time, only for the actual work they do, and you get a hefty discount for the privilege of not getting immediate answers.

Why Batch Exists (Beyond Saving You Money)

The obvious reason is cost. Batch requests are priced at a whopping 50% less than their on-demand equivalents. That’s not a promotional coupon. It’s a structural discount: because your jobs don’t need answers right away, OpenAI can optimize its compute scheduling, filling in gaps and running your work when there’s spare capacity. It’s the cloud-computing equivalent of flying standby.

But the real reason you’ll learn to love it is throughput and simplicity. You can put up to 50,000 requests (in an input file of up to 200 MB) into a single batch, submit it, and then go focus on literally anything else. Your application doesn’t need to manage threads, retry logic, rate limits, or hanging connections: the API handles scaling, retries, and scheduling internally, and batch jobs draw on their own rate-limit pool rather than your regular per-model limits. At the end you get an output file with the results, plus an error file for any requests that failed. It’s glorious.
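If your dataset is bigger than one batch allows, you’ll need to split it across several input files. Here is a minimal sketch of that split. chunk_requests and write_chunks are hypothetical helpers, each request is assumed to already be a dict in the batch line format, and the per-batch caps are parameters whose defaults are assumptions you should verify against the current documentation.

```python
import json

# Assumed per-batch limits; verify against the current Batch API documentation.
DEFAULT_MAX_REQUESTS = 50_000
DEFAULT_MAX_BYTES = 200_000_000  # 200 MB in decimal units


def chunk_requests(requests, max_requests=DEFAULT_MAX_REQUESTS, max_bytes=DEFAULT_MAX_BYTES):
    """Split request dicts into groups of JSONL lines, each small enough for one batch."""
    chunks, current, current_bytes = [], [], 0
    for request in requests:
        line = json.dumps(request) + "\n"
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"request {request.get('custom_id')!r} is larger than max_bytes")
        # Close the current chunk if adding this line would exceed either limit.
        if current and (len(current) >= max_requests or current_bytes + size > max_bytes):
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(line)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


def write_chunks(chunks, prefix="inputs"):
    """Write each chunk to prefix_000.jsonl, prefix_001.jsonl, ... and return the paths."""
    paths = []
    for index, lines in enumerate(chunks):
        path = f"{prefix}_{index:03d}.jsonl"
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(lines)
        paths.append(path)
    return paths
```

Each resulting file then goes through its own upload and batch creation, as in the steps that follow.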

The Anatomy of a Batch Operation

Creating a batch isn’t a single API call; it’s a small dance. You need to prepare your data, upload it, submit the batch, and then check on it. Here’s the full lifecycle, from zero to hero.

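First, you prepare your input file. It’s a JSONL file (JSON Lines), where each line is one request wrapped in a small envelope: a custom_id you choose (it must be unique within the file, and it’s how you’ll match results back to inputs), the HTTP method, the url of the endpoint you’re targeting, and a body that looks exactly like the JSON you’d send to that endpoint directly. For a chat completions batch, the body is a regular /v1/chat/completions request.

Step 1: Create the input file (inputs.jsonl)

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Summarize this text: The Batch API is..."}], "max_tokens": 200}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Summarize this text: JSONL format is..."}], "max_tokens": 200}}
{"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o", "messages": [{"role": "user", "content": "Summarize this text: Asynchronous processing..."}], "max_tokens": 200}}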

(You’d have thousands of these lines, likely generated programmatically from your dataset.)
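Generating those lines programmatically might look like the following minimal sketch. Each line wraps the chat request body in the envelope the Batch API expects (custom_id, method, url, body). build_batch_lines and write_batch_file are hypothetical helpers, and the doc-{i} ID scheme, prompt, model, and max_tokens are illustrative choices.

```python
import json


def build_batch_lines(texts, model="gpt-4o", max_tokens=200, url="/v1/chat/completions"):
    """Yield one JSONL line per text, wrapped in the batch request envelope."""
    for i, text in enumerate(texts):
        request = {
            "custom_id": f"doc-{i}",  # hypothetical ID scheme; must be unique per file
            "method": "POST",
            "url": url,  # must match the endpoint given when creating the batch
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": f"Summarize this text: {text}"}],
                "max_tokens": max_tokens,
            },
        }
        yield json.dumps(request) + "\n"


def write_batch_file(texts, path="inputs.jsonl", **kwargs):
    """Write all lines to a JSONL file and return its path."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(build_batch_lines(texts, **kwargs))
    return path
```

In a real pipeline, prefer a stable identifier from your dataset (a document ID, say) over the list index, so results stay traceable even if the input order changes.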

Step 2: Upload the file to the API

The API requires you to upload this file first, which gives you back a file ID. Note that purpose must be "batch".

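from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL file; purpose="batch" marks it as Batch API input
with open("inputs.jsonl", "rb") as f:
    file = client.files.create(file=f, purpose="batch")

print(f"File uploaded with ID: {file.id}")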

Step 3: Create the batch job

Now you tell the API to create a batch using that file ID. The endpoint is crucial: it tells the intern which API desk to walk your file over to, and it must match the url on every line of your input file.

batch = client.batches.create(
    input_file_id=file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

print(f"Batch created with ID: {batch.id}")
print(f"Status: {batch.status}") # Will likely be 'validating'

The completion_window is you telling them how long they have to finish the job. “24h” is currently the only supported value, and it’s a deadline, not a schedule: the batch may finish much sooner, but nothing guarantees it will, so this is not for anything urgent.

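Step 4: Check the status (and finally get results)

The batch moves through states: validating -> in_progress -> finalizing -> completed. It can also end up failed (the input file didn’t pass validation), expired (the 24-hour window ran out before everything finished), or cancelled (after passing through cancelling). Once it’s completed, output_file_id is populated, and so is error_file_id if any requests failed.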

# Check status
retrieved_batch = client.batches.retrieve(batch.id)
print(f"Status: {retrieved_batch.status}")

# If completed, download the results
if retrieved_batch.status == 'completed':
    output_content = client.files.content(retrieved_batch.output_file_id).text
    with open("outputs.jsonl", "w") as f:
        f.write(output_content)
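The snippet above checks the status once. In practice you’d poll until the batch reaches a terminal state and then save whichever result files exist. A minimal sketch: wait_for_batch and download_results are hypothetical helpers, client is the OpenAI client created earlier, and the 60-second poll interval is an arbitrary default. It relies only on client.batches.retrieve and client.files.content, the same calls used above.

```python
import time

# Statuses after which a batch will not change any further.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id, poll_seconds=60, sleep=time.sleep):
    """Poll client.batches.retrieve until the batch reaches a terminal status."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        sleep(poll_seconds)


def download_results(client, batch, output_path="outputs.jsonl", error_path="errors.jsonl"):
    """Save the output and error files, if present, and return the paths written."""
    written = []
    for file_id, path in ((batch.output_file_id, output_path), (batch.error_file_id, error_path)):
        if file_id:  # either ID may be None, e.g. no failures means no error file
            with open(path, "w", encoding="utf-8") as f:
                f.write(client.files.content(file_id).text)
            written.append(path)
    return written
```

Typical use: call wait_for_batch(client, batch.id), then pass the returned batch to download_results. The download step doesn’t insist on completed, because an expired batch can still carry an output file with the requests that did finish.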

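Your outputs.jsonl will contain one line per successful request, each carrying the custom_id of the request it answers along with the response. The lines aren’t guaranteed to be in the same order as your inputs, so always match on custom_id. Requests that failed have their details written to the separate error file (error_file_id) instead.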
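Because results are keyed by custom_id rather than by position, it’s worth parsing them into a lookup table. A minimal sketch, assuming a chat completions batch and result lines shaped the way OpenAI’s batch guide describes them (a custom_id, a response object with status_code and body, and an error field). parse_batch_results and write_retry_file are hypothetical helpers; the second copies the original request lines for failed IDs into a new input file you could submit as a fresh batch.

```python
import json


def parse_batch_results(*paths):
    """Read output/error JSONL files and split them into successes and failures by custom_id."""
    successes, failures = {}, {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                record = json.loads(line)
                response = record.get("response") or {}
                if record.get("error") is None and response.get("status_code") == 200:
                    # Assumes a chat completions body: keep the first choice's text.
                    body = response["body"]
                    successes[record["custom_id"]] = body["choices"][0]["message"]["content"]
                else:
                    # Keep whatever error detail is available for later inspection.
                    failures[record["custom_id"]] = record.get("error") or response.get("body")
    return successes, failures


def write_retry_file(input_path, failed_ids, retry_path="retry_inputs.jsonl"):
    """Copy the original request lines whose custom_id failed into a new input file."""
    count = 0
    with open(input_path, encoding="utf-8") as src, open(retry_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip() and json.loads(line)["custom_id"] in failed_ids:
                dst.write(line)
                count += 1
    return count
```

Before resubmitting, look at the failure details: a request rejected for an invalid parameter will fail again unless you fix it first.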

The Gotchas and Grey Areas (Pay Attention)

This is the “been in the trenches” part. The Batch API is fantastic, but it has quirks.

The 24-Hour SLA is Real: You submit a batch and it might finish in 20 minutes. Or it might take 20 hours. You have no control. Do not use this for anything where a user is waiting, even patiently. This is for backend data processing, full stop.
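Errors are… Final: Whatever the service does internally, a request that fails (for example, one rejected with a 400 because of an invalid parameter) lands in the error file and is not retried for you. A structurally broken input file, such as malformed JSON on line 4,302, is worse: it typically fails the whole batch during validation. Either way, check the error file and handle the failed requests yourself, perhaps by fixing the inputs and submitting a new batch with just those lines.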
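No Streaming, No Mixing and Matching: The body of each request must be exactly what the underlying endpoint expects, with one obvious exception: you can’t use the stream parameter, because there’s no open connection to stream to. Beyond that, every line’s url must match the endpoint you passed to batches.create, every custom_id must be unique within the file, and, per OpenAI’s batch guide, a single input file can only target one model (so no mixing gpt-4o and gpt-4o-mini requests in one file). This trips everyone up.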
Inspection is Limited: While the batch is in_progress, about all you get is the batch object’s request_counts (total, completed, and failed). There’s no estimated completion time and no clear signal that it’s stuck. You just have to wait. It requires faith.
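Since most of these gotchas only surface after you’ve uploaded and waited, a quick local check before submitting can save a round trip. A minimal sketch of such a preflight check: validate_batch_file is a hypothetical helper, and the rules it enforces (valid JSON per line, POST method, unique custom_id, url matching the batch endpoint, no stream, one model per file) reflect OpenAI’s batch guide at the time of writing, so treat them as assumptions to verify.

```python
import json


def validate_batch_file(path, endpoint="/v1/chat/completions"):
    """Return a list of problems found in a batch input file; empty means it looks OK."""
    problems, seen_ids, models = [], set(), set()
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                req = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            cid = req.get("custom_id")
            if not cid:
                problems.append(f"line {lineno}: missing custom_id")
            elif cid in seen_ids:
                problems.append(f"line {lineno}: duplicate custom_id {cid!r}")
            seen_ids.add(cid)
            if req.get("method") != "POST":
                problems.append(f"line {lineno}: method should be POST")
            if req.get("url") != endpoint:
                problems.append(f"line {lineno}: url {req.get('url')!r} does not match {endpoint!r}")
            body = req.get("body") or {}
            if body.get("stream"):
                problems.append(f"line {lineno}: stream is not supported in batches")
            models.add(body.get("model"))
    # All requests in one file are assumed to need the same model.
    if len(models) > 1:
        problems.append(f"file mixes models: {sorted(map(str, models))}")
    return problems
```

Run it right before the upload step and refuse to submit if the returned list is non-empty.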

The Batch API is the tool you use when the problem size has graduated from “a neat trick” to “a core part of the business.” It’s robust, cost-effective, and saves you from an architectural nightmare. Just pack your patience along with your JSONL file.