Right, let’s talk about getting these LLMs to actually do things. You see, an AI that can only talk is like a brilliant philosopher locked in a sensory deprivation tank. They can reason about the world, but they can’t interact with it. Their knowledge is frozen in time, limited to their training data. They can’t tell you the weather, can’t look up your latest database entry, and can’t book you a flight to Tahiti. This is where Tool Use, often called Function Calling, comes in. It’s the mechanism we use to give our boxed-in intellects a set of hands.

Think of it this way: we’re not asking the model to execute code. We’re asking it to reason about which of your provided tools (functions) would be useful right now, and to formulate the exact request for that tool in a structured way. The actual execution? That happens safely on your machine, in your environment, with your permissions. The model just provides the intent and the parameters. This separation of “thought” and “action” is crucial for both safety and reliability.

The Nuts and Bolts of a Function Call

At its core, tool use is a conversation in structured data. You provide the model with a list of tools it can request. Each tool has a name, a description (this is vitally important—the model uses this to decide), and a schema of parameters it accepts, defined in JSON Schema.

The model then processes your chat prompt. If it decides a tool is needed, it doesn’t just say “Hey, could you get the weather?” in plain text. That would be a nightmare to parse. Instead, it responds with a structured tool call: the name of the tool to call and the arguments to use, encoded as a JSON string. The structure is dependable, but the arguments are still model output, so parse and validate them before you trust them. You then execute that function on your end, get the result, and pass that result back to the model to let it continue its reasoning. This request, execute, observe loop is the same basic idea behind the ReAct (reason and act) pattern.

Here’s a concrete example. Let’s give our model a calculator and a way to get the weather.

import json
from openai import OpenAI

client = OpenAI()

# Define the tools we're offering to the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "evaluate_math_expression",
            "description": "Evaluate a simple math expression. Use this for calculations.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate, e.g. '2 + 2' or 'sqrt(16)'",
                    }
                },
                "required": ["expression"],
            },
        },
    }
]

# Start the conversation
messages = [{"role": "user", "content": "What's the weather like in Tokyo right now, and what's that plus 15?"}]

# Send the prompt and the tool definitions to the model
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=messages,
    tools=tools,
    tool_choice="auto",  # Let the model decide if it needs to use a tool
)

response_message = response.choices[0].message
tool_calls = response_message.tool_calls
messages.append(response_message)  # Append the model's response to the history

# Check if the model wanted to call a tool
if tool_calls:
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        # Here is where you, the developer, actually call the function safely.
        if function_name == "get_current_weather":
            # You would call your real weather API here.
            function_response = f"Sunny and 22°C in {function_args['location']}"
        elif function_name == "evaluate_math_expression":
            # You would use a restricted expression evaluator here. Note that
            # `ast.literal_eval` only parses literals and will not compute '2 + 2'.
            # NEVER use eval() on untrusted input. This is a prime way to get pwned.
            function_response = str(15 + 22)  # Mocking the response for the example
        else:
            # The model asked for a tool we never offered; say so instead of crashing.
            function_response = f"Error: unknown tool '{function_name}'"

        # Append the function's result back to the messages for the model to use
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": function_response,
            "name": function_name
        })

    # Get a new completion with the function response available
    second_response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=messages,
    )
    print(second_response.choices[0].message.content)
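
The math branch above is mocked, so here is a minimal sketch of what a restricted evaluator could look like, using only the standard library's ast module. The function name safe_eval_math and the particular whitelist (four operators, sqrt, abs) are assumptions made for illustration, not part of any library. Exponentiation is left out on purpose so that a request like 9**9**9 cannot tie up your process.

```python
import ast
import math
import operator

# Whitelisted operators and functions; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCTIONS = {"sqrt": math.sqrt, "abs": abs}


def safe_eval_math(expression: str) -> float:
    """Evaluate a small arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> float:
        # Plain numbers only; booleans and strings are rejected.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Only bare calls to whitelisted names, e.g. sqrt(16); no attributes.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCTIONS
            and not node.keywords
        ):
            return _FUNCTIONS[node.func.id](*(_eval(arg) for arg in node.args))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

Something like safe_eval_math(function_args["expression"]) could then replace the mocked value. Wrap the call in a try/except for ValueError, SyntaxError, and ZeroDivisionError, and send the error text back as the tool result so the model can correct itself.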

Why Your Tool Descriptions Matter More Than You Think

That description field in the tool schema isn’t just documentation for you. It’s the primary input the model uses to decide whether to call your function and with what data. A vague description like “Gets data” is useless. Be specific about the function’s purpose and the meaning of its parameters. The model is doing a semantic match between the user’s request and your descriptions. “What’s the forecast?” should trigger your get_weather_forecast function, not your get_stock_forecast function. This is where most of the “my model won’t call the right tool” problems come from. You have to engineer these prompts just as carefully as you engineer the chat prompt itself.
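
To make that concrete, here is a hedged side-by-side of a vague tool definition and a more specific one, plus a tiny lint helper. Everything here is hypothetical: the get_forecast and get_weather_forecast definitions, the days parameter, and the description_warnings heuristic (an eight-word minimum is an arbitrary threshold, not a rule from any provider).

```python
vague_tool = {
    "type": "function",
    "function": {
        "name": "get_forecast",
        "description": "Gets data",
        "parameters": {
            "type": "object",
            "properties": {"q": {"type": "string"}},
            "required": ["q"],
        },
    },
}

specific_tool = {
    "type": "function",
    "function": {
        "name": "get_weather_forecast",
        "description": (
            "Get the weather forecast (temperature and conditions) for a city "
            "over the next few days. Do not use this for stock or financial forecasts."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "days": {
                    "type": "integer",
                    "description": "How many days ahead to forecast, from 1 to 7",
                },
            },
            "required": ["location"],
        },
    },
}


def description_warnings(tool: dict, min_words: int = 8) -> list[str]:
    """Flag descriptions that are probably too thin for the model to act on."""
    fn = tool["function"]
    warnings = []
    if len(fn.get("description", "").split()) < min_words:
        warnings.append(f"{fn['name']}: tool description is very short")
    for name, spec in fn["parameters"]["properties"].items():
        if not spec.get("description"):
            warnings.append(f"{fn['name']}.{name}: parameter has no description")
    return warnings
```

Running description_warnings over every tool in a test suite is a cheap way to catch the "Gets data" kind of description before it reaches the model. It cannot tell you whether a description is good, only that it is suspiciously thin.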

The Model Context Protocol (MCP): A Glimpse of Sanity

Now, here’s where the designers clearly learned from the initial chaos. Every cloud provider and open-source project initially cooked up its own slightly different way of doing this function calling thing. It was a mess. The Model Context Protocol (MCP) is an emerging open standard to fix that. Think of it like a universal USB port for tools. Instead of wiring your tools directly into one specific AI provider’s API, you define them once in an MCP server. Then, any MCP client (like Claude Code, your custom app, etc.) can discover and use those same tools seamlessly. It separates the tooling layer from the reasoning layer, which is how it should have been from the start. It’s not universally adopted yet, but it’s the future. Bet on it.
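
For a feel of what "discover and use" means on the wire, here is a toy sketch of the two tool-related requests an MCP client sends, tools/list and tools/call, answered as JSON-RPC 2.0 messages. This is an illustration of the message shapes only: handle_request is a made-up function, the weather result is mocked, and the sketch skips the initialization handshake, capability negotiation, and transport that a real server needs. For real work you would reach for an official MCP SDK rather than hand-rolling this.

```python
# A tool as an MCP server advertises it. Note the schema key is "inputSchema".
TOOLS = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "inputSchema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }
]


def _error(request_id, code: int, message: str) -> dict:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def handle_request(request: dict) -> dict:
    """Answer a JSON-RPC 2.0 request for the two tool-related methods."""
    request_id = request.get("id")
    method = request.get("method")

    if method == "tools/list":
        # Discovery: the client asks which tools exist.
        result = {"tools": TOOLS}
    elif method == "tools/call":
        # Invocation: the client names a tool and passes its arguments.
        params = request["params"]
        if params["name"] != "get_current_weather":
            return _error(request_id, -32602, f"Unknown tool: {params['name']}")
        # Mocked result; a real server would call a weather API here.
        text = f"Sunny and 22°C in {params['arguments']['location']}"
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(request_id, -32601, "Method not found")

    return {"jsonrpc": "2.0", "id": request_id, "result": result}
```

The point of the sketch is the symmetry with the provider-specific example earlier: a name, a description, and a JSON Schema go out, and a structured call with arguments comes back. The difference is that the client discovers the list at runtime instead of having it hard-coded into one vendor's request format.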

Common Pitfalls and How to Avoid Them

  1. The eval() Trap: I already mentioned it, but it’s worth screaming about: never, ever use Python’s built-in eval() on model-generated arguments. You are parsing untrusted input, and a user (or a web page the model just read) can steer what the model puts in those arguments. Use a safe calculator library or a restricted parser. Your future self will thank you the day the model gets tricked into asking for os.system("rm -rf /") and nothing happens.

  2. Schema Hallucinations: The model will sometimes invent parameters that aren’t in your schema. Your code must be robust enough to handle this gracefully—usually by ignoring the extra parameters or returning an error. Validate the arguments before you call your function.

  3. Stuck in a Tool Loop: Sometimes the model will request a tool, you’ll give it the result, and it will immediately request the same tool again with the same parameters. It gets stuck. You need to build in guardrails—a maximum number of turns for a given task, or logic to detect repetitive cycles and break out of them.

  4. Assuming Perfection: The model will choose the wrong tool sometimes. It will get the arguments slightly wrong. Your system design must be resilient to this. Add logging for all tool calls so you can debug these failures. Consider having a fallback strategy or asking the user for clarification.
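
Pitfalls 2 and 3 can be sketched in a few lines of standard-library Python. Both validate_arguments and ToolLoopGuard are hypothetical helpers written for this example, and the default limits are arbitrary. The validator only checks unknown keys, required keys, and enum values; a full JSON Schema validator, such as the third-party jsonschema package, would be more thorough.

```python
import json


def validate_arguments(arguments: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks usable."""
    properties = schema.get("properties", {})
    # Parameters the model invented that are not in the schema.
    problems = [f"unexpected parameter: {key}" for key in arguments if key not in properties]
    # Parameters the schema requires but the model left out.
    problems += [
        f"missing required parameter: {key}"
        for key in schema.get("required", [])
        if key not in arguments
    ]
    # Values outside a declared enum.
    for key, value in arguments.items():
        allowed = properties.get(key, {}).get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


class ToolLoopGuard:
    """Stop a run after too many calls, or when the same call keeps repeating."""

    def __init__(self, max_calls: int = 8, max_repeats: int = 2):
        self.max_calls = max_calls
        self.max_repeats = max_repeats
        self.seen: dict[str, int] = {}
        self.total = 0

    def allow(self, name: str, arguments: dict) -> bool:
        # Sorting keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call.
        key = name + ":" + json.dumps(arguments, sort_keys=True)
        self.seen[key] = self.seen.get(key, 0) + 1
        self.total += 1
        return self.total <= self.max_calls and self.seen[key] <= self.max_repeats
```

In a tool-handling loop you would call validate_arguments right after json.loads, and if it returns problems, send them back as the tool result instead of executing anything. Check guard.allow(...) before each execution; when it returns False, stop calling tools and either ask the model for a final answer or hand the question back to the user.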

The power this unlocks is immense. Once you can reliably use tools, you graduate from a chatbot to an automated reasoning engine that can interact with your entire digital world: your databases, your APIs, your calendars, everything. It’s the difference between talking about the world and actually changing it.