26.8 Agents with LlamaIndex: ReAct and OpenAI Tools

Right, so you’ve got your data indexed and you’re ready to move beyond simple Q&A. Welcome to the main event: agents. This is where your application stops being a glorified file clerk and starts acting like a proper research assistant. Instead of just retrieving a context and shoving it blindly at an LLM, an agent plans. It thinks, “Hmm, to answer this user’s question, I might first need to look up X, then based on that, query Y, and then synthesize it all.” It’s the difference between being handed a phone book and having a personal concierge who actually uses it.

The pattern we’re implementing here is called ReAct (Reason + Act), and it’s brilliantly simple yet powerful. The LLM reasons about the situation, takes a discrete action using a tool (like querying your LlamaIndex index), observes the result, and then repeats the loop until it has enough information to provide a final answer. It’s basically giving the LLM a working memory and the ability to use external APIs. LlamaIndex provides a wonderfully straightforward way to bolt this capability onto your existing indices.
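The loop described above can be sketched in a few lines of plain Python. This is a toy illustration, not LlamaIndex's implementation: `scripted_llm`, `lookup_policy`, and the step dictionary format are hypothetical stand-ins, and the "LLM" is a hard-coded function so the example runs without an API key.

```python
from typing import Callable, Dict, List

# Hypothetical tool: in a real agent this would query an index.
def lookup_policy(query: str) -> str:
    return "placeholder: PTO policy text"

TOOLS: Dict[str, Callable[[str], str]] = {"lookup_policy": lookup_policy}

def scripted_llm(question: str, observations: List[str]) -> dict:
    """Stand-in for an LLM call: picks the next step from what has been observed so far."""
    if not observations:
        # Nothing observed yet, so reason that a lookup is needed and act.
        return {
            "thought": "I need the PTO policy.",
            "action": "lookup_policy",
            "action_input": "paid time off",
        }
    # An observation is available, so finish with an answer.
    return {
        "thought": "I have what I need.",
        "answer": f"According to the policy: {observations[-1]}",
    }

def react_loop(question: str, max_iterations: int = 5) -> str:
    observations: List[str] = []
    for _ in range(max_iterations):
        step = scripted_llm(question, observations)       # Reason
        if "answer" in step:
            return step["answer"]
        tool = TOOLS[step["action"]]                      # Act
        observations.append(tool(step["action_input"]))   # Observe
    return "Stopped: iteration limit reached."

result = react_loop("What is our PTO policy?")
print(result)
```

A real agent replaces `scripted_llm` with a model call and parses the model's text into the same kind of step, but the reason, act, observe cycle and the iteration cap have the same shape.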

The Core Concept: Tools and the Agent Runner

First, you need tools. A tool is just a function you expose to the agent, complete with a clear description that tells the LLM when and why to use it. The most important tool for us is a query engine tool powered by your index. The AgentRunner is the orchestration brain that manages the ReAct loop, calling the LLM, executing tools, and passing results back.

Let’s build one. Imagine you have one index for a company’s internal policies and another for its project documentation.

from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# Assuming you already have these query engines set up
policy_query_engine = policy_index.as_query_engine()
project_query_engine = projects_index.as_query_engine()

# Define a tool for each query engine. The docstring becomes the tool
# description the LLM reads, so it is CRUCIAL.
def query_policy(query: str) -> str:
    """Useful for querying internal company policy documents for guidelines, HR information, and rules."""
    response = policy_query_engine.query(query)
    return str(response)

def query_projects(query: str) -> str:
    """Useful for querying internal project documentation for technical details, timelines, and team information."""
    response = project_query_engine.query(query)
    return str(response)

# Wrap the functions into Tools (the tool name and description are derived
# from the function's name, signature, and docstring by default)
policy_tool = FunctionTool.from_defaults(fn=query_policy)
projects_tool = FunctionTool.from_defaults(fn=query_projects)

# Initialize a ReAct agent with our tools and a capable LLM.
# ReActAgent is used explicitly here: AgentRunner.from_llm may pick OpenAI's
# function-calling agent for OpenAI models instead of running the ReAct loop.
llm = OpenAI(model="gpt-4-turbo")
agent = ReActAgent.from_tools(
    tools=[policy_tool, projects_tool],
    llm=llm,
    verbose=True  # So you can watch the magic happen
)

# Now ask a complex, multi-step question
response = agent.query("What is our paid time off policy, and are there any projects currently behind schedule that might be affected by it?")
print(response)

When you run this with verbose=True, you’ll see the agent’s thought process unfold in your terminal. A typical run goes like this: it reasons that it needs the PTO policy and calls the query_policy tool, reads the result, reasons that it now needs to find delayed projects and calls the query_projects tool, and finally combines both observations into one coherent answer. The exact steps vary from run to run, since the LLM decides them. It’s a thing of beauty.

Why This Works: The Magic of the Function Spec

The secret sauce isn’t the code itself; it’s the docstring you write for the function—the “function spec.” This string is what the LLM uses to decide which tool to call. Be painfully clear and specific. A bad spec is "Queries the database." A good spec is "Useful for looking up customer order status by providing an order ID. Do not use this for general customer information." The quality of your agent is directly proportional to the quality of your tool descriptions.
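To make the point concrete, here is a rough sketch of how a function's name, signature, and docstring can be turned into the text an LLM sees when choosing a tool. The `describe_tool` helper and the two example functions are hypothetical; they only approximate what a wrapper like FunctionTool.from_defaults derives from your function, and the exact format LlamaIndex sends to the model may differ.

```python
import inspect

def describe_tool(fn) -> dict:
    """Build a minimal tool description from a function's name, signature, and docstring."""
    return {
        "name": fn.__name__,
        "signature": f"{fn.__name__}{inspect.signature(fn)}",
        "description": inspect.getdoc(fn) or "",
    }

# A vague spec: the LLM learns almost nothing about when to use this.
def query_db(query: str) -> str:
    """Queries the database."""
    return ""

# A specific spec: says what the tool is for and what it is not for.
def lookup_order_status(order_id: str) -> str:
    """Useful for looking up customer order status by providing an order ID.
    Do not use this for general customer information."""
    return ""

vague = describe_tool(query_db)
specific = describe_tool(lookup_order_status)
for spec in (vague, specific):
    print(spec["signature"], "->", spec["description"])
```

Printing the two descriptions side by side is a cheap sanity check: if you could not pick the right tool from that text alone, the LLM probably cannot either.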

Common Pitfalls and How to Avoid Them

The Hallucinating Tool-Caller: The agent might try to call a tool that doesn’t exist or pass the wrong parameters. The usual cause is an ambiguous tool spec. If two tools have overlapping descriptions, the LLM has no reliable way to choose between them, so be hyper-specific about what each one covers and what it does not.
The Lazy Agent: Sometimes, especially with weaker LLMs, the agent will try to answer a question without using tools, relying solely on its internal knowledge. This defeats the entire purpose. Mitigate this by using a powerful LLM like gpt-4-turbo and writing tool specs that clearly state the tool’s purpose, e.g., “Must be used for any question regarding…”
The Infinite Loop: The agent might get stuck in a loop, querying the same tool over and over with the same parameters. This is why we use models with good reasoning capabilities and why you should consider adding a max_iterations parameter to your agent setup to hard-stop after, say, 10 iterations.

# Safer agent creation with a loop limit
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    tools=[policy_tool, projects_tool],
    llm=llm,
    max_iterations=10,  # Stop the reason/act loop after 10 steps
    verbose=True
)
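
An iteration cap is a blunt instrument. If you run your own tool-dispatch layer, two of the pitfalls above can be caught earlier: a call to a tool that does not exist, and the same call repeated with identical arguments. The sketch below is a hypothetical wrapper (`GuardedToolbox` is not a LlamaIndex class) that returns an error string as the observation instead of raising, so the LLM gets a chance to correct itself.

```python
from typing import Callable, Dict, Set, Tuple

class GuardedToolbox:
    """Hypothetical dispatch layer that rejects unknown tools and repeated identical calls."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        self.tools = tools
        self.seen: Set[Tuple[str, str]] = set()

    def call(self, name: str, tool_input: str) -> str:
        if name not in self.tools:
            # Hallucinated tool name: report the valid options back to the model.
            return f"Error: unknown tool '{name}'. Available tools: {sorted(self.tools)}"
        key = (name, tool_input)
        if key in self.seen:
            # Identical call already made: nudge the model to change approach.
            return f"Error: '{name}' was already called with this input. Try a different query or answer."
        self.seen.add(key)
        return self.tools[name](tool_input)

toolbox = GuardedToolbox({"query_policy": lambda q: f"policy result for: {q}"})
first = toolbox.call("query_policy", "PTO")
repeat = toolbox.call("query_policy", "PTO")
unknown = toolbox.call("query_payroll", "PTO")
print(first, repeat, unknown, sep="\n")
```

Feeding the error back as an observation, instead of crashing, keeps the agent inside its loop, and the iteration limit still acts as the final backstop.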

Integrating OpenAI Function Calling

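The examples above drive tool use through prompting: the LLM writes out its reasoning and chosen action as text, and LlamaIndex parses that text to run the tool. But since we’re using an OpenAI model, we can also leverage OpenAI’s native function calling, where the model returns a structured tool call (a function name plus JSON arguments) instead of free text that has to be parsed. LlamaIndex makes this swap trivial.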

# Requires the separate llama-index-agent-openai package
from llama_index.agent.openai import OpenAIAgent

# Same tools as before
openai_agent = OpenAIAgent.from_tools(
    tools=[policy_tool, projects_tool],
    llm=llm,
    verbose=True
)

# The usage is identical, but the underlying mechanism is OpenAI's optimized function calling
response = openai_agent.query("How many sick days do I get, and who should I notify on the Project Alpha team if I need to use one?")
print(response)

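The choice between the ReAct-style agent and OpenAIAgent often comes down to the specific LLM you’re using and whether you want to avoid vendor lock-in. The ReAct approach works with any LLM that can follow the prompt format, while OpenAIAgent only works with OpenAI models that support function calling. For those models, the native agent is often more efficient and reliable.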

The bottom line is this: agents transform your LLM application from a passive retrieval system into an active problem-solver. Building one takes more thought, especially in designing your tools, but the payoff is an application that feels genuinely intelligent. Now go build a concierge, not a phone book.