23.6 Structured Output: JSON Mode and Function Calling

Right, let’s talk about getting structured data out of this brilliant, chaotic word-predictor. You’re not just asking for prose anymore; you’re asking for data. You want something you can feed directly into your code without a bunch of gnarly string parsing that’ll break the moment the model decides to use a semicolon instead of a comma.

This is where we move from polite requests to laying down the law. We’re going to cover the two primary ways to enforce structure: JSON Mode and Function Calling. They solve the same core problem—getting predictable output—but they approach it from completely different angles.

The Sledgehammer: JSON Mode

Think of JSON Mode as the simplest way to tell the model, “I don’t want a story, I want a JSON object. Make it happen.” Its beauty and its limitation lie in its brute-force simplicity.

When you set response_format to { "type": "json_object" } in the API call, you’re telling the API to constrain generation so that the message content parses as a single JSON object. You don’t need to rely on the exact mechanism, but a useful mental model is constrained decoding: at each step, tokens that would break JSON syntax are taken off the table. The practical upshot is that the model can’t wander off into a preamble like “Sure, here’s the data you requested:”.

But here’s the critical catch, and I’ve seen more people trip on this than I can count: you absolutely must also instruct the model to output JSON in your prompt. JSON Mode constrains the syntax, not the shape. OpenAI’s API rejects a JSON Mode request outright unless the word “JSON” appears somewhere in your messages, and its documentation warns that a model left without the instruction can emit whitespace until it hits the token limit. Even with the instruction, a vague prompt gets you a vague structure: ask for “some good books as JSON” and you might get { "text": "Okay, here are some books I enjoy: To Kill a Mockingbird, Dune..." }, which is valid JSON and useless as data. Spell out the keys and nesting you want.

import json

from openai import OpenAI

client = OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable

# Enable JSON Mode AND describe the exact shape you want in the prompt.
response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={ "type": "json_object" },
    messages=[
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": "Generate a list of 3 famous scientists and their most known discovery. Return the answer as a JSON object with a key 'scientists' that points to an array of objects. Each object should have 'name' and 'discovery' keys."}
    ]
)

# Parse the JSON string into a real Python dict
data = json.loads(response.choices[0].message.content)
print(data['scientists'][0]['name'])  # Outputs: e.g., "Albert Einstein"

The pitfall? The model is still generating text. The output will parse as JSON as long as generation finishes normally, but if the reply is cut off by the token limit (finish_reason is "length") you can be left with a truncated, unparseable string. And even a complete reply only follows your intended schema (the specific keys and data types) as well as your prompt describes it. It might use a string “123” instead of a number 123, or add an extra key you didn’t ask for. For mission-critical stuff, you still need to validate the output.
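
Here is a minimal sketch of that validation step for the scientists example. The function name parse_scientists and the sample strings are made up for illustration, and the rules (non-empty strings, extra keys dropped) are one reasonable policy rather than the only one.

```python
import json


def parse_scientists(raw: str) -> list[dict]:
    """Parse a JSON Mode reply and check it against the shape the prompt asked for.

    Raises ValueError with a readable message on any mismatch.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Typically a reply that was cut off before the JSON was complete.
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    scientists = data.get("scientists")
    if not isinstance(scientists, list):
        raise ValueError("'scientists' must be an array")

    cleaned = []
    for i, item in enumerate(scientists):
        if not isinstance(item, dict):
            raise ValueError(f"scientists[{i}] must be an object")
        for key in ("name", "discovery"):
            if not isinstance(item.get(key), str) or not item[key].strip():
                raise ValueError(f"scientists[{i}].{key} must be a non-empty string")
        # Keep only the keys we asked for; drop any extras the model added.
        cleaned.append({"name": item["name"], "discovery": item["discovery"]})
    return cleaned


# Hypothetical replies: one complete (with an extra key), one truncated.
good = '{"scientists": [{"name": "Marie Curie", "discovery": "Polonium and radium", "born": 1867}]}'
print(parse_scientists(good))

truncated = '{"scientists": [{"name": "Marie Cu'
try:
    parse_scientists(truncated)
except ValueError as err:
    print("rejected:", err)
```

In practice you would pass response.choices[0].message.content to a function like this instead of calling json.loads directly, and decide per application whether a failure means retrying the request or surfacing an error.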

The Scalpel: Function Calling

Don’t let the name fool you. “Function Calling” is less about actually calling functions and more about getting the model to output a structured arguments object that you can then use to call a function. It’s a structured data extraction tool that happens to be perfectly tailored for tool use.

Here’s the workflow: you describe “tools” (formerly called functions) to the model, giving each a name, a purpose and, crucially, its expected parameters as a JSON Schema. The model then analyzes the user’s query and, if it decides a tool is needed, it doesn’t run the tool. Instead, it returns the tool’s name along with a JSON string containing the arguments. It’s your code’s job to parse that JSON and execute the actual function.

Why is this so powerful? Because JSON Schema allows you to be incredibly precise. You can specify types (string, integer, boolean), require fields, provide enum lists of valid values, and even nest objects. The model is remarkably good at fitting user queries into this rigid schema, though the arguments are still generated text, so treat the schema as strong guidance rather than a guarantee.

# Reuses `client` and `json` from the JSON Mode example above.
# Define a function schema for getting weather data
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use."
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Send the user query along with our tool definition
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather like in Paris right now? I need it in Celsius."}],
    tools=tools,
    tool_choice="auto"  # Let the model decide if a tool is needed
)

# The model responds with a tool call request, not the answer.
response_message = response.choices[0].message
tool_calls = response_message.tool_calls

if tool_calls:
    # Extract the arguments JSON from the model's response
    function_name = tool_calls[0].function.name
    arguments_json = tool_calls[0].function.arguments # This is a JSON string!
    arguments_dict = json.loads(arguments_json)

    print(function_name)   # "get_weather"
    print(arguments_dict)  # e.g. {'location': 'Paris, France', 'unit': 'celsius'}
    # Now YOUR code would call a real weather API with these arguments.
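
The “your code runs the function” half of the workflow can be sketched like this. Everything here is illustrative: get_weather returns canned data instead of calling a real weather service, TOOL_REGISTRY and run_tool_call are names invented for this sketch, and the fake tool call only imitates the attributes used in the listing above (id, function.name, function.arguments).

```python
import json
from types import SimpleNamespace


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical stand-in for a real weather lookup; returns canned data."""
    return {"location": location, "unit": unit, "temperature": 18}


# Maps tool names (as declared in the tool definitions) to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and build the "tool" message to send back."""
    name = tool_call.function.name
    if name not in TOOL_REGISTRY:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            args = json.loads(tool_call.function.arguments)
            result = TOOL_REGISTRY[name](**args)
        except (json.JSONDecodeError, TypeError) as exc:
            # Bad JSON, or arguments that don't match the function signature.
            result = {"error": str(exc)}
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }


# A fake tool call shaped like the SDK object, so the sketch runs offline.
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(
        name="get_weather",
        arguments='{"location": "Paris, France", "unit": "celsius"}',
    ),
)
tool_message = run_tool_call(fake_call)
print(tool_message)
```

In a real conversation you would append the assistant message that contained the tool calls to your messages list, append one tool message per call, and then call client.chat.completions.create again so the model can turn the result into a final answer. Returning errors as tool output, as this sketch does, is a design choice that gives the model a chance to correct itself; raising an exception is just as defensible.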

The key insight is that the model is acting as a very capable parser, translating natural language into structured data based on the schema you provide. The common pitfall is writing vague description fields and loosely specified parameters. Be as precise as you would be in code comments, because the schema and its descriptions are the model’s main source of information about your tool. If you write a bad schema, you’ll get bad results.
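
Since the arguments are generated text, it’s worth checking them against the same schema before acting on them. The sketch below hand-rolls a check for a small subset of JSON Schema (required, flat type, and enum) so it runs with only the standard library. The names WEATHER_PARAMS, matches_type, and check_arguments are invented here, and a full validator such as the third-party jsonschema package would cover far more of the spec.

```python
# Same parameter schema as the get_weather tool, repeated so this runs on its own.
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

# JSON Schema type names mapped to the Python types json.loads produces.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def matches_type(value, type_name) -> bool:
    expected = JSON_TYPES.get(type_name)
    if expected is None:
        return True  # Type not covered by this sketch; don't flag it.
    if isinstance(value, bool) and type_name != "boolean":
        return False  # bool is a subclass of int in Python; JSON keeps them distinct.
    return isinstance(value, expected)


def check_arguments(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field '{name}'")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected field '{name}'")
        elif not matches_type(value, spec.get("type")):
            problems.append(f"'{name}' must be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{name}' must be one of {spec['enum']}, got {value!r}")
    return problems


print(check_arguments({"location": "Paris, France", "unit": "celsius"}, WEATHER_PARAMS))
print(check_arguments({"unit": "kelvin"}, WEATHER_PARAMS))
```

A non-empty list is a signal to stop before calling the real function: log it, send the problems back to the model as the tool result, or fail the request, depending on how much you trust a retry.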

Which One Should You Use?

Use JSON Mode when you just need a simple, straightforward JSON blob from a generative task. “Give me a list of things in this specific format.”
Use Function Calling when you need data extraction that follows a precise schema, especially if you’re actually connecting to tools, APIs, or databases. It’s the more robust and precise option for building reliable applications, though you should still validate the arguments before acting on them.

Both methods finally let us treat LLMs less like oracles and more like components in a larger, more predictable system. And that’s where the real magic starts.