28.6 Multi-Agent Systems: Collaboration, Competition, and Communication

Right, so you’ve got your single agent doing its ReAct thing, using tools, feeling pretty clever. But let’s be honest, most real-world problems aren’t solved by a single brilliant mind working in isolation. They’re solved by teams, committees, and groups of specialists who (ideally) collaborate, (sometimes) bicker, and (occasionally) produce something greater than the sum of their parts. Welcome to multi-agent systems, where we take that single-agent brain and copy-paste it a few times to see what beautiful—or horrifying—chaos ensues.

The core idea here is simple: instead of one monolithic LLM trying to juggle every possible task and perspective, you break the problem down. You create a cast of characters, each with a specific role, a defined set of tools, and a personality (or at least, a system prompt) tailored to their job. You then need a mechanism for them to talk to each other, share findings, debate, and ultimately converge on a solution. It’s like hiring a team of experts, if your experts were slightly unhinged, occasionally hallucinatory, and worked for the cost of API tokens.

Orchestration: The Manager Agent That Actually Works

The first thing you need is a boss. In multi-agent systems, we call this the orchestrator or manager agent. Its job isn’t to do the grunt work but to understand the overall goal, break it down into sub-tasks, and assign those tasks to the right specialist agent. Structurally this resembles map-reduce: fan the sub-tasks out to specialists, then reduce their answers into one, with LLMs doing both the mapping and the reducing.

Think of the orchestrator as a project manager who’s been mainlining espresso. It uses its own reasoning loop to decide: “Okay, to answer this complex question about the market viability of a new pet rock, I need three things: a market analysis from the economist_agent, a materials cost from the engineer_agent, and a branding hot take from the marketer_agent. I’ll ask them all in parallel.”

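Here’s a simplified Python sketch using LangChain to show the concept. The specialist “agents” are stub functions, the fan-out to all three is hard-coded, and a ChatOpenAI model handles only the final synthesis step, so running it requires LangChain and an OpenAI API key (OPENAI_API_KEY) in your environment.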

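# Import paths for recent LangChain releases (requires the langchain-openai package);
# older versions exposed these as langchain.chat_models and langchain.schema.
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage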
import asyncio

# Simulated specialist agents (in reality, these would be full LLM chains with their own tools)
def economist_agent(query):
    # This would be a call to another LLM with a specific prompt and tools
    return f"ECONOMIST: Based on my models, {query} is a terrible idea. The pet rock market peaked in 1975."

def engineer_agent(query):
    # Stub: ignores the query and returns a canned answer
    return "ENGINEER: I've calculated the cost of smooth rocks at scale. It's approximately $0.02 per unit."

def marketer_agent(query):
    # Stub: ignores the query and returns a canned answer
    return "MARKETER: We can totally rebrand this as a 'Digital Detox Companion' and sell it for $49.99."

# Our orchestrator agent
llm = ChatOpenAI(model_name="gpt-4-turbo", temperature=0)

def orchestrator_agent(user_query):
    # The orchestrator's system prompt is everything
    system_prompt = """You are a project manager coordinating three specialist agents: economist, engineer, and marketer.
    You will be given their responses to the user's request. Synthesize them into a final answer.
    Return only the final synthesized answer for the user."""

    # In a real system, the orchestrator would first reason about *how* to break down
    # the task and decide which agents to call. We're simulating that decision by
    # hard-coding a parallel call to all three.
    specialists = [economist_agent, engineer_agent, marketer_agent]

    async def ask_all():
        # The stub agents are synchronous, so run each one in a worker thread
        return await asyncio.gather(
            *(asyncio.to_thread(agent, user_query) for agent in specialists)
        )

    # asyncio.run creates and closes the event loop for us
    # (it raises if called from inside an already-running loop, e.g. a notebook)
    results = asyncio.run(ask_all())

    # Now synthesize the results
    team_responses = "\n".join(results)
    synthesis_prompt = f"""The user asked: {user_query}. Here are the responses from your team:
{team_responses}

Synthesize this into a single, coherent final recommendation. Acknowledge any disagreements."""

    final_message = llm.invoke([SystemMessage(content=system_prompt), HumanMessage(content=synthesis_prompt)])
    return final_message.content

# Run the system
user_question = "Should we launch a new premium pet rock?"
result = orchestrator_agent(user_question)
print(result)
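
The listing above hard-codes which specialists get called. One way to let the orchestrator actually choose is to have a planning call return a small JSON plan and then dispatch from a registry. What follows is a minimal sketch under stated assumptions: plan_with_llm is a hypothetical stand-in that returns a canned plan so the example runs without an API key, the stub agents are simplified copies, and the JSON shape and the AGENT_REGISTRY name are invented for illustration rather than taken from LangChain.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical specialists; simplified stand-ins for the stub agents.
def economist_agent(query: str) -> str:
    return f"ECONOMIST: demand analysis for: {query}"

def engineer_agent(query: str) -> str:
    return f"ENGINEER: unit cost estimate for: {query}"

def marketer_agent(query: str) -> str:
    return f"MARKETER: positioning ideas for: {query}"

# Hypothetical registry mapping plan names to callables.
AGENT_REGISTRY = {
    "economist": economist_agent,
    "engineer": engineer_agent,
    "marketer": marketer_agent,
}

def plan_with_llm(user_query: str) -> str:
    """Hypothetical planner. A real one would ask an LLM to emit this JSON."""
    return json.dumps([
        {"agent": "economist", "task": f"Assess market demand: {user_query}"},
        {"agent": "engineer", "task": f"Estimate unit cost: {user_query}"},
    ])

def dispatch(plan_json: str) -> dict[str, str]:
    """Validate the plan, then run each assigned task in parallel.

    Assumes each agent appears at most once in the plan.
    """
    plan = json.loads(plan_json)
    # Reject agents the planner invented; LLM output is untrusted input.
    unknown = [step["agent"] for step in plan if step["agent"] not in AGENT_REGISTRY]
    if unknown:
        raise ValueError(f"Plan references unknown agents: {unknown}")
    with ThreadPoolExecutor() as pool:
        futures = {
            step["agent"]: pool.submit(AGENT_REGISTRY[step["agent"]], step["task"])
            for step in plan
        }
        return {name: future.result() for name, future in futures.items()}

if __name__ == "__main__":
    results = dispatch(plan_with_llm("premium pet rock"))
    for name, answer in results.items():
        print(f"{name}: {answer}")
```

Validating the plan before dispatching is the important part: a planner that hallucinates a "lawyer" agent should fail loudly rather than silently drop a sub-task.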

Communication Protocols: How Agents Yell at Each Other

Agents can’t just shout into the void. They need a structured way to communicate. The two main patterns are:

Centralized Communication: All messages pass through the orchestrator. It’s the corporate hub-and-spoke model. It can become a bottleneck, but it gives the manager full visibility and control.
Decentralized Communication: Agents can message each other directly based on rules. This is more flexible and scalable but can descend into utter chaos if you’re not careful—imagine a group chat where everyone is an over-eager intern with access to the codebase.

Most frameworks you’ll use (CrewAI, AutoGen) ship their own coordination layer, such as a manager process or a “group chat,” so you rarely have to build the messaging plumbing yourself. A good default is to start centralized. Consider a decentralized approach only when the hub becomes a performance bottleneck or you need more dynamic collaboration.
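
To make the centralized pattern concrete, here is a minimal sketch of a hub that routes every message and keeps a transcript. The Message and Hub names are hypothetical and invented for this illustration; they are not the abstractions CrewAI or AutoGen use, and the handler is a stub lambda rather than an LLM-backed agent.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Message:
    sender: str
    recipient: str
    content: str

@dataclass
class Hub:
    """Centralized router: every message passes through here and is logged."""
    handlers: dict[str, Callable[[Message], str]] = field(default_factory=dict)
    transcript: list[Message] = field(default_factory=list)

    def register(self, name: str, handler: Callable[[Message], str]) -> None:
        self.handlers[name] = handler

    def send(self, message: Message) -> Message:
        if message.recipient not in self.handlers:
            raise KeyError(f"No agent registered as {message.recipient!r}")
        self.transcript.append(message)
        # The recipient's handler produces the reply text.
        reply_text = self.handlers[message.recipient](message)
        reply = Message(
            sender=message.recipient,
            recipient=message.sender,
            content=reply_text,
        )
        self.transcript.append(reply)
        return reply

hub = Hub()
hub.register("engineer", lambda m: f"Cost estimate for '{m.content}': pending data")
reply = hub.send(Message("orchestrator", "engineer", "smooth rocks at scale"))
print(reply.content)
print(len(hub.transcript))
```

In a decentralized variant, agents would hold references to one another and skip the hub, which removes the bottleneck but also removes the single transcript that makes debugging tractable.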

The Pitfalls: Where This All Goes Horribly Wrong

This isn’t all sunshine and rainbows. You’ve just multiplied the problems of a single agent by N.

Cost and Latency: You’re now making 3, 5, 10+ LLM calls instead of one. This gets expensive and slow, fast. Your orchestrator alone might need two calls: one to plan and one to synthesize. With N specialists, that is at least N + 2 calls per request, before any tool use or debate turns.
Compound Hallucinations: One agent hallucinates a fact, the next agent bases its reasoning on that hallucination, and the synthesizer beautifully combines it all into a perfectly coherent, utterly wrong answer. It’s the LLM version of a game of telephone.
The Infinite Loop of Debate: You ask two agents to decide on the best Python web framework. They’ll argue about Django vs. FastAPI until your API quota is exhausted and the sun burns out. You must build in termination conditions—a max number of debate turns, or a final authority (like the orchestrator) to make an executive decision.
Prompt Contamination: This is a big one. If you’re not extremely careful with your prompts, the personality and goal of one agent can bleed into another’s response, especially during synthesis. Your marketer agent’s “hype” language can poison the calm, analytical report you were hoping for from the orchestrator.
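
The termination conditions mentioned for the debate loop can be sketched as a turn cap plus an executive fallback. This is a minimal sketch, assuming each debater is a plain function from the transcript so far to a (position, argument) pair; run_debate, max_turns, and tie_breaker are hypothetical names, and the debaters here are stubs rather than LLM calls.

```python
from typing import Callable

# A debater reads the transcript so far and returns (position, argument).
Debater = Callable[[list[str]], tuple[str, str]]

def run_debate(
    debaters: dict[str, Debater],
    tie_breaker: Callable[[list[str]], str],
    max_turns: int = 4,
) -> tuple[str, list[str]]:
    """Alternate debaters until they agree or the turn budget runs out."""
    transcript: list[str] = []
    positions: dict[str, str] = {}
    names = list(debaters)
    for turn in range(max_turns):
        name = names[turn % len(names)]
        position, argument = debaters[name](transcript)
        positions[name] = position
        transcript.append(f"{name}: {position} ({argument})")
        # Stop early once everyone has spoken and all hold the same position.
        if len(positions) == len(names) and len(set(positions.values())) == 1:
            return position, transcript
    # Budget exhausted: the final authority decides instead of looping forever.
    return tie_breaker(transcript), transcript

# Stub debaters that never concede, to exercise the turn cap.
def django_fan(transcript: list[str]) -> tuple[str, str]:
    return "Django", "batteries included"

def fastapi_fan(transcript: list[str]) -> tuple[str, str]:
    return "FastAPI", "async and type hints"

def orchestrator_decides(transcript: list[str]) -> str:
    # Placeholder executive decision; a real orchestrator might call an LLM here.
    return "FastAPI"

decision, transcript = run_debate(
    {"agent_a": django_fan, "agent_b": fastapi_fan},
    tie_breaker=orchestrator_decides,
    max_turns=4,
)
print(decision, len(transcript))
```

Since every turn in a real system is an LLM call, max_turns doubles as a hard cap on what one argument can cost you.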

The key to taming this beast is to start small. Two agents. Clear, non-overlapping roles. A strong orchestrator with a strict termination condition. Then, and only then, do you add the third agent and see if your system collapses into an argument about semantics. It’s the most fun you can have with an API key.