29.2 Chat Completions API: Messages, Roles, and Parameters

Right, let’s get you talking to the machines. Forget the fancy demos for a second; the Chat Completions API is the workhorse, the core of everything you’ll do with OpenAI’s language models. It’s how you have a structured conversation with GPT. And yes, it’s a conversation, not a one-off command. The API is designed to work from the context of what has been said before, but it only knows that context because you send it along with every request. That is both its greatest strength and the source of most beginner headaches.

The whole thing revolves around the messages array. You don’t just throw a string at the API; you pass it a list of message objects, each with a role and content. This history is what the model uses to generate its next response. It’s like giving a friend the transcript of your chat so far so they don’t ask “wait, what were we talking about?” for the tenth time.

The Holy Trinity of Roles: system, user, assistant

There are three core roles, and using them correctly is 90% of the battle.

system: This is your backstage directive. It’s where you set the stage, define the personality, and give the model its secret instructions. The model is supposed to be aware this is from you, the developer, and not part of the user’s conversation. The key here is brevity. A long, rambling system prompt is like giving an actor a 10-page monologue of direction right before they go on stage: they’ll forget most of it. Keep it sharp.
user: This is, unsurprisingly, the user. Anything your end-user says, any query, any instruction, goes here. This is the input you’re responding to.
assistant: These are the model’s previous responses. Why would you include these? Because this is how you maintain state. If you want the model to remember what it just said, you have to include it in the next request. The API is stateless; every request is a blank slate unless you provide the history yourself.
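
To make the three roles concrete, here is a minimal sketch of a messages array that carries an earlier exchange. It runs offline and makes no API call. The names make_message and VALID_ROLES are hypothetical helpers invented for this illustration; they are not part of the OpenAI library, and the role check only covers the three roles described above.

```python
# Hypothetical helpers for illustration only; not part of the OpenAI library.
VALID_ROLES = {"system", "user", "assistant"}


def make_message(role: str, content: str) -> dict:
    """Build one message object with the two keys described above."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {"role": role, "content": content}


# A transcript that already contains one full exchange. The assistant
# message is the model's earlier reply, included so the follow-up
# question ("he") has something to refer to.
messages = [
    make_message("system", "You are a terse assistant."),
    make_message("user", "Name a 17th-century admiral."),
    make_message("assistant", "Cloudesley Shovell."),
    make_message("user", "What was he known for?"),
]

roles = [m["role"] for m in messages]
print(roles)
```

The list is ordered oldest to newest, with the system message first and the newest user message last.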

Here’s the simplest, most powerful pattern you’ll use:

# Assumes openai>=1.0 and an OPENAI_API_KEY environment variable
import openai

response = openai.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant who is obsessed with 17th-century naval history. Weave facts into your answers whenever relevant."},
        {"role": "user", "content": "How do I change a tire on my car?"}
    ]
)

print(response.choices[0].message.content)

The model will probably tell you how to change a tire, but might caution you that Admiral Cloudesley Shovell would have insisted on checking the rigging on the lug nuts. You’ve been warned.

Taming the Beast: Key Parameters Explained

The model and messages are mandatory. Everything else is a dial to control the model’s behavior. Ignore them at your peril.

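max_tokens: This is the hard stop for the response length; when the limit is hit, the output is cut off, mid-sentence if need be. Not setting this is like giving a hyperactive child unlimited sugar and no bedtime. The model usually stops when it thinks it’s done, but nothing prevents it from rambling up to the model’s maximum output length, and you’ll pay for every token. Always set a sane limit.
temperature: This controls randomness. The accepted range is 0 to 2, and the default is 1. At 0.0 the model almost always picks the most likely next token, so output is close to deterministic (though not guaranteed to be identical across runs). Higher values flatten the distribution and let less likely tokens through; past 1.0 you are heading into creative chaos. For factual, repeatable outcomes, lean low (0.2-0.5). For brainstorming or creative writing, crank it up (0.7-0.9). Using a high temperature for code generation is a recipe for bizarre and non-functional suggestions. Trust me.
top_p (Nucleus sampling): An alternative to temperature. It limits sampling to the smallest set of top-ranked tokens whose combined probability reaches top_p. A value of 0.1 means only the tokens making up the top 10% of probability mass are considered, which is often just one or two tokens, not 10% of the vocabulary. It’s generally recommended to tweak either temperature or top_p, not both. I usually just stick with temperature.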
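
If temperature and top_p still feel abstract, the following sketch shows the textbook versions of both ideas on a toy distribution. It is a conceptual illustration only: the logits are made up, the function names apply_temperature and nucleus are hypothetical, and nothing here claims to mirror how OpenAI implements sampling internally.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p):
    """Indices of the smallest most-probable-first set whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens

cool = apply_temperature(logits, 0.2)  # low temperature sharpens the peak
warm = apply_temperature(logits, 1.0)  # temperature 1 leaves the softmax as-is

print("temperature 0.2:", [round(p, 3) for p in cool])
print("temperature 1.0:", [round(p, 3) for p in warm])
print("top_p 0.1 keeps token indices:", nucleus(warm, 0.1))
print("top_p 0.9 keeps token indices:", nucleus(warm, 0.9))
```

On this toy input, the low temperature pushes nearly all of the probability onto the first token, and top_p=0.1 keeps only that single token, because it alone already covers more than 10% of the mass.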

Here’s a more controlled example:

response = openai.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are a concise API documentation bot. Answer questions with a maximum of 3 sentences."},
        {"role": "user", "content": "Explain the concept of recursion in programming."}
    ],
    max_tokens=150,  # Keep it short
    temperature=0.2  # Keep it factual
)
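
Because max_tokens is a hard stop, it is worth checking whether a reply was cut off. Each choice in the response carries a finish_reason, which is "length" when the token limit ended the output and "stop" when the model finished on its own. The sketch below uses a stand-in object with the same attribute shape so it runs without an API call; was_truncated and fake_response are hypothetical names for this illustration.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """True if the first choice stopped because it ran out of tokens."""
    return response.choices[0].finish_reason == "length"


# Stand-in mimicking the attribute shape of a real response object,
# so the check can be demonstrated offline.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="Recursion is when a function"),
        )
    ]
)

if was_truncated(fake_response):
    print("Reply hit max_tokens; raise the limit or ask for a shorter answer.")
```

With a real call, you would pass the response returned by openai.chat.completions.create to the same check.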

The Stateful Illusion and Common Pitfalls

Here’s the part the documentation glosses over: the API doesn’t remember anything from your previous request. That stateful conversation you see in the playground? The playground is stitching the conversation together for you under the hood. In your code, you are responsible for managing the message history.

The biggest rookie mistake is sending the same messages array with every request, resulting in a conversation that goes: You: “Hello!” AI: “Hi there!” You: “Hello!” AI: “Hi there! …are you feeling okay?”

You have to append the assistant’s response and your new user message to the list for the next call.

# Initialize the conversation with a system prompt
conversation_history = [
    {"role": "system", "content": "You are a sarcastic tech support agent."}
]

# First user turn
user_input = "My internet is down."
conversation_history.append({"role": "user", "content": user_input})

response = openai.chat.completions.create(
    model="gpt-4-turbo",
    messages=conversation_history,
    max_tokens=300
)

# Get the assistant's reply
assistant_reply = response.choices[0].message.content
print(f"Assistant: {assistant_reply}")

# ***** CRITICAL STEP: Append the assistant's reply to the history *****
conversation_history.append({"role": "assistant", "content": assistant_reply})

# Next user turn continues the conversation
next_user_input = "No, I already tried turning it off and on again."
conversation_history.append({"role": "user", "content": next_user_input})

# Send the ENTIRE updated history for the next request
next_response = openai.chat.completions.create(
    model="gpt-4-turbo",
    messages=conversation_history,  # Now includes all turns
    max_tokens=300
)
print(f"Assistant: {next_response.choices[0].message.content}")
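
The append-then-resend routine is easy to wrap so you can’t forget a step. What follows is one possible sketch, not an official pattern: Conversation, ask, and fake_send are hypothetical names, and the send argument is an assumption made so the example runs offline. It can be any callable that takes the messages list and returns the reply text.

```python
class Conversation:
    """Keeps the message history and resends all of it on every turn."""

    def __init__(self, system_prompt, send):
        # `send` is any callable that takes the messages list and
        # returns the assistant's reply as a string.
        self.send = send
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_input):
        self.messages.append({"role": "user", "content": user_input})
        reply = self.send(self.messages)
        # Append the reply so the next turn includes it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages):
    """Offline stand-in for the API call: reports how much history it saw."""
    return f"I can see {len(messages)} messages."


chat = Conversation("You are a sarcastic tech support agent.", fake_send)
print(chat.ask("My internet is down."))
print(chat.ask("No, I already tried turning it off and on again."))
```

To talk to the real API, send would be a small function that calls openai.chat.completions.create with the given messages and returns response.choices[0].message.content.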

Forgetting to append the assistant’s response is the number one cause of “why is the model ignoring what it just said?!” posts on forums. You’ve been warned. Now go build something that doesn’t have the memory of a goldfish.