Right, let’s talk about indexes. This is where LlamaIndex stops being a simple query wrapper and starts to feel like a proper data framework. The core idea is laughably simple but profoundly powerful: you can’t just shove 10,000 PDF pages into an LLM’s context window and ask it nicely to summarize them. It will try, fail spectacularly, and charge you an arm and a leg for the privilege.

An index is our way of doing the sane thing: we pre-process your data into a structured, queryable format outside the LLM. We build a map so that when you ask a question, we can quickly find the relevant parts of your data, stuff only those parts into the prompt, and get a coherent, accurate answer. It’s the difference between asking a librarian to find a specific quote in a single book versus asking them to find it across the entire Library of Congress. You need the Dewey Decimal system. Indexes are our Dewey Decimal system for your private data.

The Workhorse: VectorStoreIndex

If you use only one index, make it this one. The VectorStoreIndex is the default for a reason: it’s the most versatile and directly leverages the “magic” of embeddings. Here’s the gist: it chunks your documents into manageable pieces (Nodes), generates a vector embedding for each chunk (a numerical representation of its meaning), and stores those vectors in a vector store. Out of the box that store is a simple in-memory one; you can swap in an external vector database when your data outgrows it.

When you query it, it takes your question, generates an embedding for the question itself, and then performs a lightning-fast similarity search in vector space to find the chunks most semantically related to your query. These relevant chunks are then passed to the LLM as context.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load your data. This is the 'oh wow, that was easy' part.
documents = SimpleDirectoryReader("your_data_dir").load_data()

# Build the index. This is where the embeddings are generated and stored.
# By default it calls OpenAI's text-embedding-ada-002, so OPENAI_API_KEY must be set.
index = VectorStoreIndex.from_documents(documents)

# The query engine is your interface. It handles the retrieval-augmented generation (RAG) flow.
query_engine = index.as_query_engine()
response = query_engine.query("What is the main theme of the document?")
print(response)
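
If you want to see what that retrieval step amounts to without the framework, here is a toy sketch. It is not how LlamaIndex implements it: the `embed` function is a bag-of-words stand-in for a real embedding model, and `cosine` and `top_k` are names made up for this illustration. The shape of the logic is the point: embed the chunks, embed the query, rank by similarity, keep the top few.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "the index stores one embedding per chunk",
    "bananas are rich in potassium",
    "a query embedding is compared against every chunk embedding",
]
results = top_k("how is the query embedding compared", chunks, k=1)
print(results)
```

A real embedding model captures meaning rather than word overlap, and a real vector store avoids comparing against every chunk, but the ranking idea is the same.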

The Nitty-Gritty: The chunking (node parsing) matters. The default chunk size is 1024 tokens, which is often fine, but for precise answers, you might want smaller chunks. For summarization, larger chunks can be better. You must experiment with this. Also, the embedding model is crucial. text-embedding-ada-002 is great, but if you’re working in a specific domain (e.g., law, medicine), a domain-specific embedder might dramatically improve your results.
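
To get a feel for the chunk-size trade-off, here is a small sketch of fixed-size chunking with overlap. It assumes whitespace-style tokens rather than a real tokenizer, and `chunk_tokens` is a hypothetical helper, not a LlamaIndex function. Smaller windows give you more, tighter chunks; larger windows give you fewer chunks with more context each.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap: int = 0) -> list[list[str]]:
    """Split tokens into windows of chunk_size; each window shares `overlap` tokens with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
small = chunk_tokens(tokens, chunk_size=4, overlap=1)
large = chunk_tokens(tokens, chunk_size=8, overlap=1)
print(len(small), len(large))
```

In LlamaIndex itself the equivalent knobs are the chunk size and chunk overlap on the node parser; the sketch only shows why changing them alters what the retriever has to work with.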

The Simpleton: SummaryIndex

Don’t let the name fool you; the SummaryIndex is deceptively powerful. It does the simplest thing imaginable: it chunks your documents into nodes and… stores them. That’s it. No vector embeddings. Its superpower is that it gives you total control over retrieval.

When you query a SummaryIndex, you’re responsible for telling it how to get the relevant text. You do this by picking a retriever mode. This makes it perfect for scenarios where semantic similarity isn’t the best strategy.

from llama_index.core import SummaryIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("your_data_dir").load_data()

# Build the simple index
index = SummaryIndex.from_documents(documents)

# You choose how you want to retrieve via retriever_mode.
# Option 1: LLM-powered retrieval. Slow but smart. The LLM selects the most relevant nodes.
llm_retriever = index.as_retriever(retriever_mode="llm")
relevant_nodes = llm_retriever.retrieve("What did the CEO say about Q4 goals?")

# Option 2: Get EVERY single node. Blunt, but sometimes you need it.
# On a SummaryIndex, the "default" mode returns all nodes in order.
all_nodes_retriever = index.as_retriever(retriever_mode="default")
all_nodes = all_nodes_retriever.retrieve("Give me a summary of everything.")

When to Use It: The llm retriever mode is fantastic for multi-document agents where an LLM needs to “think” about which document to reference. The default mode is your go-to for generating comprehensive summaries of an entire dataset, as it hands every node over for a summarization LLM call.
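
The two strategies are easier to compare side by side in plain Python. This is only a sketch: `retrieve_all`, `retrieve_selected`, and `keyword_chooser` are hypothetical names, and the keyword chooser is a crude stand-in for the LLM call that would do the selecting in the real thing.

```python
from typing import Callable

nodes = [
    "Q3 revenue grew 12 percent.",
    "The CEO set a Q4 goal of entering two new markets.",
    "The office party is on Friday.",
]


def retrieve_all(nodes: list[str]) -> list[str]:
    """Return every node, in order; no filtering at all."""
    return list(nodes)


def retrieve_selected(nodes: list[str], query: str,
                      choose: Callable[[str, list[str]], list[int]]) -> list[str]:
    """Ask a selector which node indices matter, then return those nodes."""
    return [nodes[i] for i in choose(query, nodes) if 0 <= i < len(nodes)]


def keyword_chooser(query: str, nodes: list[str]) -> list[int]:
    """Stand-in for the LLM: pick nodes sharing a capitalised term with the query."""
    terms = {w.strip("?.,") for w in query.split() if w[0].isupper()}
    return [i for i, n in enumerate(nodes) if terms & {w.strip("?.,") for w in n.split()}]


picked = retrieve_selected(nodes, "What did the CEO say about Q4 goals?", keyword_chooser)
print(picked)
```

Swapping `keyword_chooser` for a function that actually prompts a model is what turns this into LLM-powered retrieval: better judgment, at the cost of an extra model call per query.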

The Overachiever: KnowledgeGraphIndex

This is where things get spicy. The KnowledgeGraphIndex (KG Index) tries to extract a graph of entities and relationships from your text. It parses your documents and identifies entities (people, places, concepts) and the relationships between them (founded, located in, is a type of). It stores this in a graph structure.

Why would you do this? Because sometimes your questions are about the connections, not just the content. Querying a knowledge graph is fundamentally different from semantic search. It’s for questions like “Which employees worked on Project Phoenix and then later transferred to the Berlin office?”

from llama_index.core import KnowledgeGraphIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

documents = SimpleDirectoryReader("your_data_dir").load_data()

# Building a KG is more computationally expensive. You can customize the LLM used for extraction.
# Pass the LLM object directly; this assumes the llama-index-llms-openai package is installed.
llm = OpenAI(model="gpt-4")  # Using GPT-4 for better extraction

# Build the graph. This will make many LLM calls to extract triplets.
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=5,  # Don't let it go wild extracting everything
    llm=llm,
)

# Querying it uses a special graph retriever that can traverse relationships.
query_engine = index.as_query_engine(include_text=False) # Rely only on the graph
response = query_engine.query("Who is connected to Project Alpha?")
print(response)
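
To see why relationship questions suit a graph, here is a framework-free sketch of the idea. The triplets are made-up examples of what an extractor might emit, and `who` is a hypothetical helper; none of this is the KnowledgeGraphIndex API. Once the facts are stored as (subject, relation, object) edges, a question about connections becomes a lookup plus a set intersection.

```python
from collections import defaultdict

# Hypothetical triplets of the form (subject, relation, object).
triplets = [
    ("Alice", "worked_on", "Project Phoenix"),
    ("Bob", "worked_on", "Project Phoenix"),
    ("Alice", "transferred_to", "Berlin office"),
    ("Carol", "transferred_to", "Berlin office"),
]

# Index the edges by (relation, object) so lookups by relationship are cheap.
subjects_by_edge = defaultdict(set)
for subject, relation, obj in triplets:
    subjects_by_edge[(relation, obj)].add(subject)


def who(relation: str, obj: str) -> set[str]:
    """Return every subject linked to obj by relation."""
    return subjects_by_edge.get((relation, obj), set())


# A two-part relational question becomes a set intersection over edges.
answer = who("worked_on", "Project Phoenix") & who("transferred_to", "Berlin office")
print(sorted(answer))
```

Note what the sketch leaves out: it has no notion of time, so it cannot tell whether the transfer happened after the project. Capturing that would need dates on the edges, which is exactly the kind of detail an extraction LLM tends to drop.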

The Cold Shower of Reality: This index is the coolest one to talk about at a party and the most frustrating one to use in practice. The quality of the extracted graph is entirely dependent on the extraction LLM’s ability to parse your documents. It will make mistakes. It will miss obvious relationships and invent bizarre ones. It’s expensive and slow to build. Use it only when you have a clear, high-value use case that requires relational reasoning, and be prepared to spend time tuning the prompt and parameters for extraction. It’s not a default choice; it’s a strategic weapon.