26.7 Sub-Question Query Engine: Decomposing Complex Questions

Look, you and I both know that LLMs are brilliant, but they’re also like that brilliant friend who gets overwhelmed if you ask them for the meaning of life, the best pizza in town, and the square root of 144 all in the same breath. They try to answer everything at once, and the result is often a garbled mess of half-truths and confident hallucinations. This is the fundamental problem with tossing a complex, multi-faceted question directly at a single LLM call. The Sub-Question Query Engine in LlamaIndex is our elegant, almost-obvious-in-hindsight solution to this. It’s the project manager for your LLM, breaking down the big, scary deliverables into manageable tasks.

The core idea is beautifully simple: don’t ask the big question. Instead, generate a set of smaller, simpler questions whose answers will logically combine to answer the original, complex one. Then, go find the answers to each of those sub-questions independently, and finally, synthesize all those individual answers into a final, comprehensive response. It’s the difference between yelling “how do I fix the economy?!” into a crowd and systematically consulting an economist, a sociologist, and a political scientist separately before forming your own conclusion.

How It Actually Works: The Three-Act Play

The process isn’t magic; it’s a structured pipeline. First, our query engine takes your complex query and passes it to a question-generator LLM (the decomposer), along with the names and descriptions of the tools it is allowed to use. This LLM’s only job is to act as a logical architect. We prompt it with something like: “Given this complex query, break it down into a list of simpler sub-questions that need to be answered to fully address the original question.” Each sub-question it produces is paired with the tool that should answer it. For the query “What were the main causes of the 2008 financial crisis and what were the key provisions of the Dodd-Frank Act enacted in response?”, it might generate:

“What were the primary causes of the 2008 financial crisis?”
“What is the Dodd-Frank Act?”
“What were the main regulatory provisions established by the Dodd-Frank Act?”

Next, the engine takes each of these pristine, simple questions and runs it through the query engine behind its assigned tool (e.g., a vector-based retriever and response synthesizer). The sub-questions don’t have to run strictly one after another; LlamaIndex can dispatch them concurrently. This is the critical part. Each sub-question gets its own dedicated retrieval and synthesis step, completely isolated from the others. This prevents the context windows from getting polluted and ensures the answer for “Dodd-Frank provisions” isn’t biased by the text retrieved for “2008 causes.”

Finally, all the answers from the sub-questions are collected and presented to a synthesis LLM. We give this LLM the original complex query and the set of answers, instructing it to combine them coherently into a final, unified response. It’s the master synthesizer, weaving together the individual threads into a complete tapestry.
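To make the three acts concrete without any API keys, here is a minimal, dependency-free sketch of the same shape. Everything in it is a stand-in: decompose, answer_one, synthesize, and the toy knowledge dictionary are hypothetical names invented for illustration, and none of this is how LlamaIndex implements the engine internally. A real version would call an LLM at each step.

```python
from typing import Dict, List, Tuple


def decompose(query: str) -> List[str]:
    """Stand-in for the decomposer LLM: naively split a compound question on ' and '."""
    parts = [p.strip().rstrip("?").strip() for p in query.split(" and ")]
    return [p[0].upper() + p[1:] + "?" for p in parts if p]


def answer_one(sub_question: str, knowledge: Dict[str, str]) -> str:
    """Stand-in for a query engine: return the text of the first key found in the question."""
    lowered = sub_question.lower()
    for key, text in knowledge.items():
        if key.lower() in lowered:
            return text
    return "No answer found."


def synthesize(query: str, qa_pairs: List[Tuple[str, str]]) -> str:
    """Stand-in for the synthesis LLM: stitch the sub-answers under the original question."""
    lines = [f"Q: {q}\nA: {a}" for q, a in qa_pairs]
    return f"Original question: {query}\n" + "\n".join(lines)


def sub_question_pipeline(query: str, knowledge: Dict[str, str]) -> str:
    sub_questions = decompose(query)                                      # Act 1: decompose
    qa_pairs = [(sq, answer_one(sq, knowledge)) for sq in sub_questions]  # Act 2: answer in isolation
    return synthesize(query, qa_pairs)                                    # Act 3: synthesize


# Toy knowledge base with placeholder answers (not real facts).
toy_knowledge = {
    "2008 crisis": "[toy answer about the crisis]",
    "dodd-frank": "[toy answer about Dodd-Frank]",
}

if __name__ == "__main__":
    print(sub_question_pipeline(
        "What caused the 2008 crisis and what did Dodd-Frank change?", toy_knowledge
    ))
```

The point of the sketch is the data flow: each call to answer_one sees only its own sub-question, and only synthesize sees everything at once.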

Here’s a concrete example. Notice how we build it on top of our existing index and query tools.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.openai import OpenAI

# First, build a standard index. This is our knowledge base.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Wrap our base index in a QueryEngineTool.
# This gives the sub-question engine a "tool" it can use to answer individual questions.
base_query_engine = index.as_query_engine()
query_engine_tools = [
    QueryEngineTool(
        query_engine=base_query_engine,
        metadata=ToolMetadata(
            name="finance_data",
            description="Provides information about financial history and regulations",
        ),
    ),
]

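# Create the Sub-Question Query Engine, handing it the tool it can use.
# The llm passed here drives decomposition and final synthesis; the tool's own
# query engine (base_query_engine) keeps whatever LLM it was built with.
llm = OpenAI(model="gpt-4-turbo")  # A stronger LLM is better for decomposition & synthesis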
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    llm=llm,
)

# Now ask the complex, multi-part question.
response = query_engine.query(
    "What were the main causes of the 2008 financial crisis and what were the key provisions of the Dodd-Frank Act enacted in response?"
)
print(response)

Why This Is a Game-Changer

The beauty of this approach is how direct it is. By isolating each sub-question, you reduce the likelihood of the LLM getting confused or hallucinating due to information overload. The accuracy of the final answer becomes a function of the accuracy of each smaller step, which is much easier to control and debug. It also makes the system inherently more transparent. You can log the generated sub-questions and their individual answers, so you can see exactly where the final conclusion came from and, more importantly, pinpoint where it might have gone wrong.
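As a rough illustration of that transparency, here is a small, library-free sketch of a trace you could keep alongside a run. SubQuestionTrace, TracedRun, and run_with_trace are hypothetical names made up for this example, not LlamaIndex classes, and the answer function is assumed to be whatever callable answers a single sub-question in your setup.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubQuestionTrace:
    sub_question: str
    answer: str


@dataclass
class TracedRun:
    query: str
    steps: List[SubQuestionTrace] = field(default_factory=list)

    def weak_steps(self, is_weak: Callable[[str], bool]) -> List[SubQuestionTrace]:
        """Return the steps whose answers look suspicious under the caller's own test."""
        return [step for step in self.steps if is_weak(step.answer)]


def run_with_trace(
    query: str,
    sub_questions: List[str],
    answer_fn: Callable[[str], str],
) -> TracedRun:
    """Answer each sub-question and record the (question, answer) pair as we go."""
    run = TracedRun(query=query)
    for sub_question in sub_questions:
        run.steps.append(SubQuestionTrace(sub_question, answer_fn(sub_question)))
    return run
```

With a trace like this, debugging a bad final answer starts with a call such as run.weak_steps(lambda a: not a.strip()) to find the sub-questions that came back empty, rather than staring at the synthesized paragraph.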

The Rough Edges and Pitfalls

This isn’t a free lunch. First, it’s expensive and slow. You’re making multiple LLM calls (one for decomposition, at least one per sub-question, one for synthesis) instead of one, so a query that decomposes into N sub-questions costs at least N + 2 calls. For a complex query, this can be 5-10x more expensive and take significantly longer. A sensible mitigation is to use a cheaper model like gpt-3.5-turbo for the sub-question answering and reserve the expensive model for decomposition and the final synthesis.
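The arithmetic behind that cost is easy to sketch. The helper names below are hypothetical, and the model is deliberately crude: it assumes every call is roughly the same size, ignores embedding calls, and treats the cheap model's price as a fraction of the strong model's. Real costs depend on token counts, so read the output as a back-of-the-envelope estimate only.

```python
def llm_call_count(num_sub_questions: int) -> int:
    """One decomposition call + one answer per sub-question + one synthesis call."""
    if num_sub_questions < 1:
        raise ValueError("expected at least one sub-question")
    return 1 + num_sub_questions + 1


def cost_multiplier(num_sub_questions: int, cheap_to_strong_ratio: float = 1.0) -> float:
    """Cost relative to a single strong-model call, assuming equal-sized calls.

    Decomposition and synthesis run on the strong model (cost 1.0 each);
    each sub-question answer runs at cheap_to_strong_ratio of that cost.
    """
    if num_sub_questions < 1:
        raise ValueError("expected at least one sub-question")
    strong_calls = 2
    return strong_calls + num_sub_questions * cheap_to_strong_ratio
```

Under these assumptions, three to eight sub-questions answered by the same strong model land at 5x to 10x the cost of a single call, while routing the sub-question answers to a model priced at a tenth of the strong one brings the three-question case down to roughly 2.3x.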

Second, the quality lives and dies by the decomposition step. If the initial LLM generates a poor, illogical, or incomplete set of sub-questions, the entire process is doomed from the start. Using a powerful LLM like GPT-4 for the decomposer is non-negotiable for serious applications. You must also be wary of the synthesis step introducing its own bias or hallucinations. Even when every sub-answer is correct, it can still summarize them poorly or insert its own assumptions.

The final, subtle pitfall is the assumption of independence. This engine treats each sub-question as a separate island. But what if the answer to question 2 deeply depends on the context of the answer to question 1? The synthesis LLM has to be smart enough to handle that, as the individual retrieval steps are unaware of each other. It usually manages, but it’s a point of failure to be aware of.

The Sub-Question Query Engine isn’t the tool for every job. Use it for your big, “explain-like-I’m-a-genius” questions. For simple factual lookups, it’s overkill. But when you need to tackle complexity head-on, it’s one of the most dependable tools we have. It forces the LLM to work through the problem step by step, and as it turns out, that’s just as effective for AI as it is for us.