
Building Reliable Multi-Agent Systems: Coordination Patterns That Actually Work

Agentic Runbook

Single-agent systems are conceptually clean: one LLM, one set of tools, one thread of reasoning. When something breaks, there’s one place to look.

Multi-agent systems are different. When you decompose a complex workflow across five specialized agents, you introduce a new class of failure: coordination failures. The agents might each be working correctly while the system as a whole produces garbage.

This post covers the coordination patterns that work reliably in production — and the anti-patterns we’ve seen cause expensive, hard-to-debug failures at mid-market companies deploying AI at scale.


Why Multi-Agent?

Before the patterns, the justification. Multi-agent architectures are not inherently better — they add complexity. Use them when:

  1. Task decomposition reduces hallucination. A single agent asked to research, analyze, write, and review in one pass is more prone to hallucination, because every step competes for the same context. Splitting the work across specialized agents, each with a focused context, tends to reduce the error rate.
  2. Parallelism is available. Subtasks that can run concurrently should run concurrently. A single agent is sequential by nature; a supervisor can dispatch workers in parallel.
  3. Domain expertise matters. A legal-review agent with legal-domain prompting outperforms a general agent with the same tools.
  4. Context windows are a bottleneck. If a single task would require more context than any model handles reliably, decomposition is often the only solution.

If none of these apply, don’t add agents. More agents = more coordination overhead = more failure modes.


Pattern 1: Supervisor-Worker Hierarchy

The most reliable multi-agent pattern for most production use cases.

Structure:
A supervisor agent receives the top-level task, decomposes it, dispatches subtasks to specialized worker agents, collects results, and synthesizes the final output. Workers do not communicate with each other — all coordination runs through the supervisor.

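from typing import Literal

from langchain_core.messages import AIMessage
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command

# SupervisorState, supervisor_llm, and research_agent are assumed to be defined elsewhere.
# supervisor_llm is assumed to return structured output with `next` and `task` fields.

def supervisor_node(
    state: SupervisorState,
) -> Command[Literal["research_agent", "write_agent", "review_agent", "__end__"]]:
    """Routes to the appropriate worker based on task type."""
    decision = supervisor_llm.invoke(state["messages"])

    if decision.next == "FINISH":
        return Command(goto=END)

    return Command(
        goto=decision.next,  # "research_agent" | "write_agent" | "review_agent"
        update={"active_worker": decision.next, "task": decision.task}
    )

# Worker nodes report back to supervisor, not to each other
def research_agent_node(state: SupervisorState) -> Command[Literal["supervisor"]]:
    result = research_agent.invoke({"task": state["task"]})
    return Command(
        goto="supervisor",
        # content must be a string; extract it first if research_agent returns a dict
        update={"messages": [AIMessage(content=result, name="research_agent")]}
    )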

When to use it: Default choice. Works well for workflows where a human-equivalent manager would decompose and delegate work.
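
The snippets above reference a `SupervisorState` type and a `decision` object with `next` and `task` fields without defining either. Here is a minimal sketch of what they might look like. The `RouteDecision` class, the `ALLOWED_TARGETS` set, and the use of `operator.add` as the message reducer are assumptions for illustration; a real LangGraph state would more likely use the library's own message reducer.

```python
import operator
from dataclasses import dataclass
from typing import Annotated, TypedDict

# Hypothetical routing targets, mirroring the worker names used above.
ALLOWED_TARGETS = {"research_agent", "write_agent", "review_agent", "FINISH"}


class SupervisorState(TypedDict, total=False):
    # The reducer annotation means each node's messages are appended, not overwritten.
    messages: Annotated[list, operator.add]
    active_worker: str
    task: str
    supervisor_turns: int


@dataclass(frozen=True)
class RouteDecision:
    """Structured output the supervisor LLM is assumed to return."""

    next: str
    task: str = ""

    def __post_init__(self) -> None:
        # Reject hallucinated node names before they reach the graph router.
        if self.next not in ALLOWED_TARGETS:
            raise ValueError(f"unknown routing target: {self.next!r}")
```

Validating the routing target at construction time turns a bad LLM decision into an immediate, traceable error instead of a confusing graph failure later.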

Failure mode to watch: Supervisor over-routing — the supervisor keeps redirecting without progress. Add a turn counter and a maximum-depth guardrail.

MAX_SUPERVISOR_TURNS = 10

def supervisor_node(state: SupervisorState) -> Command:
    turns = state.get("supervisor_turns", 0)
    if turns >= MAX_SUPERVISOR_TURNS:
        return Command(
            goto="human_escalation",
            update={"interrupt_reason": "max_supervisor_turns_exceeded"}
        )
    # ... normal routing; each returned Command must also include
    # "supervisor_turns": turns + 1 in its update, or the cap never triggers
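
For the guardrail above to fire, something has to increment `supervisor_turns` on every routing decision. The framework-free sketch below shows the counter and the cap working together. The `decide` callable is a hypothetical stand-in for the supervisor LLM call, and `run_until_done` exists only to exercise the router.

```python
MAX_SUPERVISOR_TURNS = 10


def route_with_turn_limit(state: dict, decide) -> tuple[str, dict]:
    """Return (next_node, state_update); `decide` stands in for the LLM call."""
    turns = state.get("supervisor_turns", 0)
    if turns >= MAX_SUPERVISOR_TURNS:
        return "human_escalation", {"interrupt_reason": "max_supervisor_turns_exceeded"}
    next_node = decide(state)
    # Increment on every routing decision so the cap can actually trigger.
    return next_node, {"supervisor_turns": turns + 1}


def run_until_done(state: dict, decide) -> tuple[str, dict]:
    """Drive the router until it finishes or escalates."""
    while True:
        next_node, update = route_with_turn_limit(state, decide)
        state = {**state, **update}
        if next_node in ("FINISH", "human_escalation"):
            return next_node, state
```

A supervisor that never says FINISH is stopped after ten routing decisions and handed to a human, which is the behavior the turn limit is meant to guarantee.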

Pattern 2: Parallel Fan-Out / Fan-In

Structure:
The supervisor dispatches multiple workers simultaneously, waits for all to complete (fan-in), then synthesizes results. This is the right pattern when subtasks are independent and latency matters.

from langgraph.types import Send

# Wire this as the path function of add_conditional_edges, not as a node:
# a node cannot return a bare list of Send objects.
def fanout_node(state: ResearchState) -> list[Send]:
    """Dispatch parallel research tasks."""
    return [
        Send("research_worker", {"topic": topic, "thread_id": f"{state['thread_id']}-{i}"})
        for i, topic in enumerate(state["research_topics"])
    ]

def fanin_node(state: ResearchState) -> dict:
    """Aggregate results from all parallel workers."""
    all_results = state.get("worker_results", [])
    combined = "\n\n---\n\n".join(all_results)
    # Return only the new key; re-emitting the whole state would feed
    # worker_results back through its reducer and duplicate every entry.
    return {"aggregated_research": combined}

Critical constraint: Worker agents in a fan-out must be truly stateless with respect to each other — no shared mutable state, no cross-worker tool calls. If workers need to share context, route through the supervisor instead.
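
The statelessness constraint is easier to see outside any framework. The sketch below uses plain `asyncio` rather than LangGraph. Each worker receives only its own input and returns only its own output, so nothing is shared between concurrent workers. The `research_worker` body is a placeholder, not a real search call.

```python
import asyncio


async def research_worker(topic: str) -> str:
    """Stand-in worker: receives only its own input, touches no shared state."""
    await asyncio.sleep(0)  # placeholder for real I/O such as a search call
    return f"findings for {topic}"


async def fan_out_fan_in(topics: list[str]) -> list[str]:
    """Run one worker per topic concurrently and wait for all of them."""
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(research_worker(t) for t in topics))


def run(topics: list[str]) -> list[str]:
    """Synchronous entry point for the fan-out / fan-in round trip."""
    return asyncio.run(fan_out_fan_in(topics))
```

Because workers communicate only through their return values, the fan-in step is the single place where results meet, which keeps ordering and failure handling in one spot.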

Idempotency requirement: Each worker must generate a unique idempotency key (per ADR-082) to prevent duplicate side effects on retry:

def research_worker(state: WorkerState) -> dict:
    key = generate_idempotency_key(
        agent_slug="research-agent",
        thread_id=state["thread_id"],
        node_name="research_worker",
        operation="web_search",
        content=state["topic"],
    )
    # Check if already completed
    if key in state.get("completed_operations", set()):
        return {}  # Skip: already done, nothing to add

    result = web_search_tool(state["topic"])
    # Return deltas only. This assumes the parent state declares reducers that
    # append to worker_results and union completed_operations; full-state copies
    # from parallel workers would conflict with or duplicate each other.
    return {
        "worker_results": [result],
        "completed_operations": {key},
    }
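
The worker above calls `generate_idempotency_key` without showing it, and ADR-082 is not reproduced here. The version below is therefore an assumption, not the actual specification: it hashes the identifying fields so the same logical operation always yields the same key.

```python
import hashlib
import json


def generate_idempotency_key(
    agent_slug: str,
    thread_id: str,
    node_name: str,
    operation: str,
    content: str,
) -> str:
    """Derive a stable key from the fields that identify one logical operation."""
    payload = json.dumps(
        {
            "agent": agent_slug,
            "thread": thread_id,
            "node": node_name,
            "op": operation,
            "content": content,
        },
        sort_keys=True,  # fixed key order keeps the hash input deterministic
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A retry with identical inputs produces the identical key, so the completed-operations check can recognize it, while a different topic or thread produces a different key.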

Pattern 3: Sequential Pipeline with Checkpoints

Structure:
Agents run in sequence where each agent’s output is the next agent’s input. This is appropriate when there’s strict ordering and each stage validates/transforms the previous stage’s output.

[Input] → [Ingest Agent] → [Enrich Agent] → [Review Agent] → [Output Agent] → [Done]

The key production requirement is a checkpoint at each stage boundary. If the pipeline fails at stage 3, you should be able to resume from stage 3, not restart from stage 1.

from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

# Thread ID encodes stage for deterministic resumption
def build_stage_thread_id(base_thread_id: str, stage: str) -> str:
    return f"pipeline:{base_thread_id}:{stage}"

async def run_pipeline_stage(
    agent_graph,  # must be compiled with checkpointer=checkpointer
    state: dict,
    thread_id: str,
    checkpointer: AsyncPostgresSaver,
) -> dict:
    config = {"configurable": {"thread_id": thread_id}}
    async for chunk in agent_graph.astream(state, config=config):
        pass  # Each step auto-checkpointed
    # aget_state returns a StateSnapshot; .values holds the state dict
    snapshot = await agent_graph.aget_state(config)
    return snapshot.values

When to use it: Document processing pipelines, multi-stage data enrichment, approval workflows where each stage has human or automated sign-off.

Anti-pattern: Passing the entire prior stage’s raw output as the next stage’s input without summarization. Raw output grows unboundedly; summarize to a structured handoff schema at each boundary.
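
The resume-from-stage-3 requirement can be sketched without a database. In the example below, a plain dict stands in for the Postgres checkpointer, and the stage functions are hypothetical. The point is the control flow: a stage's output is saved only after it succeeds, and saved stages are skipped on the next run.

```python
from typing import Callable

Stage = tuple[str, Callable[[dict], dict]]


def run_pipeline(stages: list[Stage], payload: dict, checkpoints: dict) -> dict:
    """Run stages in order, skipping any stage already recorded in `checkpoints`."""
    for name, fn in stages:
        if name in checkpoints:
            payload = checkpoints[name]  # reuse the saved output, do not re-run
            continue
        payload = fn(payload)
        checkpoints[name] = payload  # persist only after the stage succeeds
    return payload
```

If a stage raises, everything before it stays in `checkpoints`, so calling `run_pipeline` again with the same dict re-executes only the failed stage and the ones after it.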


Pattern 4: Human-in-the-Loop (HITL) Gates

Structure:
The agent graph pauses at defined points and waits for human review before proceeding. This is not optional for high-stakes decisions — contract review, outbound communications, financial transactions above a threshold.

from langgraph.types import interrupt

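def contract_review_node(state: LegalState) -> LegalState:
    """Review a contract and pause for human approval if risk is HIGH or BLOCKER."""
    # On resume the node re-runs from the top, so risk_classifier is called again;
    # keep everything before interrupt() deterministic and free of side effects.
    risk_assessment = risk_classifier(state["contract_text"])
    
    if risk_assessment.level in ("HIGH", "BLOCKER"):
        # Interrupt execution and surface to human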
        human_decision = interrupt({
            "action_required": "contract_risk_review",
            "risk_level": risk_assessment.level,
            "findings": risk_assessment.findings,
            "contract_id": state["contract_id"],
        })
        
        if human_decision["approved"]:
            state["review_outcome"] = "approved_with_conditions"
            state["conditions"] = human_decision.get("conditions", [])
        else:
            state["review_outcome"] = "rejected"
            state["rejection_reason"] = human_decision.get("reason")
    else:
        state["review_outcome"] = "auto_approved"
    
    return state

Resumption pattern: Resume from the exact checkpoint after the human decides. The value passed to Command(resume=...) becomes the return value of interrupt() inside the paused node:

# Resume after human approves
await agent_graph.ainvoke(
    Command(resume={"approved": True, "conditions": ["Add mutual indemnity clause"]}),
    config={"configurable": {"thread_id": thread_id}},
)

Timeout handling: HITL gates that wait indefinitely are a production liability. Set a timeout and route to escalation:

import time

HITL_TIMEOUT_SECONDS = 86_400  # 24 hours

# A paused graph runs no nodes, so this check is assumed to be triggered from
# outside the interrupted thread (for example, by a scheduled job).
async def check_hitl_timeout(state: LegalState) -> LegalState:
    if state.get("hitl_requested_at"):
        elapsed = time.time() - state["hitl_requested_at"]
        if elapsed > HITL_TIMEOUT_SECONDS:
            state["review_outcome"] = "timeout_escalated"
            # Notify founder via Slack
            await notify_escalation(state)
    return state
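
Because an interrupted thread does no work while it waits, the timeout check probably has to run on a schedule outside the graph. Here is one way to sketch that sweep, assuming a hypothetical `pending` mapping of thread IDs to the time each review was requested. The `now` parameter is there to make the logic testable.

```python
import time
from typing import Optional

HITL_TIMEOUT_SECONDS = 86_400  # 24 hours, matching the constant above


def is_hitl_expired(requested_at: float, now: Optional[float] = None) -> bool:
    """True when a pending review has waited longer than the timeout."""
    current = time.time() if now is None else now
    return current - requested_at > HITL_TIMEOUT_SECONDS


def find_expired_threads(pending: dict[str, float], now: Optional[float] = None) -> list[str]:
    """Return thread IDs whose review requests have expired.

    `pending` maps thread_id -> hitl_requested_at (epoch seconds).
    """
    return sorted(tid for tid, ts in pending.items() if is_hitl_expired(ts, now))
```

A scheduled job could call `find_expired_threads` every few minutes and route each returned thread to escalation.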

Pattern 5: Shared State via Supervisor (Not Direct Agent-to-Agent)

The rule: Agents do not call other agents directly. All inter-agent communication routes through the supervisor graph or through shared state in the LangGraph checkpoint.

Why: Direct agent-to-agent calls create invisible coordination dependencies. When agent B calls agent A and agent A fails, debugging requires tracing call chains across two graphs. When everything routes through a supervisor, the coordinator is the single source of truth for what happened and in what order.

# ❌ Anti-pattern: direct agent call
def write_agent_node(state):
    # DON'T DO THIS — creates hidden dependency
    research_result = research_agent.invoke({"query": state["topic"]})
    ...

# ✅ Correct: write agent reads from supervisor-managed shared state
def write_agent_node(state: SupervisorState):
    research_result = state["research_results"]  # Populated by supervisor
    draft = writer_llm.invoke(f"Write based on: {research_result}")
    return {"draft": draft}

The Three Failure Modes Nobody Warns You About

1. The Coordination Loop

What happens: The supervisor routes to Agent A, Agent A produces output, the supervisor routes back to Agent A (because the output was incomplete), and this repeats indefinitely.

Fix: Add a per-node invocation counter to state. If any node is invoked more than 3 times in a single thread, pause for human review.
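
A per-node counter can live in state as a simple mapping. In the sketch below, the `node_invocations` and `needs_human_review` keys are hypothetical names, and the threshold of 3 comes from the fix described above.

```python
MAX_NODE_INVOCATIONS = 3  # threshold from the text above


def record_invocation(state: dict, node_name: str) -> dict:
    """Return a state update with the node's counter incremented."""
    counts = dict(state.get("node_invocations", {}))  # copy, do not mutate input
    counts[node_name] = counts.get(node_name, 0) + 1
    update = {"node_invocations": counts}
    if counts[node_name] > MAX_NODE_INVOCATIONS:
        # Hypothetical flag a router could check to divert to human review.
        update["needs_human_review"] = True
    return update
```

Each node would call `record_invocation` on entry and merge the returned update into state. The router then checks the flag before dispatching again.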

2. The Silent Partial Completion

What happens: In a fan-out pattern, 4 of 5 workers complete successfully. The 5th times out. The fan-in aggregates results from 4 workers and presents a synthesized output — without indicating that one worker’s contribution is missing.

Fix: Fan-in nodes must explicitly track worker completion and flag partial results:

def fanin_node(state: ResearchState) -> dict:
    expected = len(state["research_topics"])
    received = len(state.get("worker_results", []))

    update = {"is_partial_result": received < expected}
    if received < expected:
        update["missing_workers"] = expected - received  # count of missing workers
        # Route to review before presenting output

    # Return only the changed keys so reducer-backed fields are not re-applied
    return update
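
A count of missing workers says how many results are absent but not which ones. If results are keyed by topic (an assumption; the snippets above store them as a flat list), the fan-in can name the gaps explicitly. The function and field names below are illustrative.

```python
def find_missing_topics(
    expected_topics: list[str], results_by_topic: dict[str, str]
) -> list[str]:
    """Return the topics that have no result, preserving dispatch order."""
    return [t for t in expected_topics if t not in results_by_topic]


def summarize_fanin(
    expected_topics: list[str], results_by_topic: dict[str, str]
) -> dict:
    """Aggregate the available results and flag anything missing."""
    missing = find_missing_topics(expected_topics, results_by_topic)
    return {
        "is_partial_result": bool(missing),
        "missing_topics": missing,
        "aggregated_research": "\n\n---\n\n".join(
            results_by_topic[t] for t in expected_topics if t in results_by_topic
        ),
    }
```

Naming the missing topics lets a reviewer, or a retry step, target exactly the worker that timed out instead of re-running the whole fan-out.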

3. The State Explosion

What happens: Each agent appends its full output to shared state. After 5 agents make 10 tool calls each, the state object is 50K+ tokens. The next agent’s context window is consumed by history before it can do any real work.

Fix: Per-agent output summarization at each supervisor handoff. The supervisor writes a compact handoff schema to state, not the raw agent output:

from dataclasses import dataclass
from typing import Literal

@dataclass
class AgentHandoff:
    agent: str
    status: Literal["completed", "partial", "failed"]
    summary: str          # ≤500 tokens
    key_outputs: dict     # Structured, schema-validated
    errors: list[str]
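
A plain dataclass does not enforce the status values or the summary budget on its own. The variant below adds both checks at construction time. The word-count cap is a rough stand-in for the 500-token limit, not a real tokenizer, and truncating instead of raising is a design assumption.

```python
from dataclasses import dataclass, field
from typing import Literal, get_args

Status = Literal["completed", "partial", "failed"]
MAX_SUMMARY_WORDS = 375  # rough proxy for ~500 tokens; an assumption, not a tokenizer


@dataclass
class AgentHandoff:
    agent: str
    status: Status
    summary: str
    key_outputs: dict = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Type hints are not checked at runtime, so validate the status here.
        if self.status not in get_args(Status):
            raise ValueError(f"invalid status: {self.status!r}")
        words = self.summary.split()
        if len(words) > MAX_SUMMARY_WORDS:
            # Truncate rather than fail so one verbose agent cannot block the pipeline.
            self.summary = " ".join(words[:MAX_SUMMARY_WORDS])
```

Enforcing the budget where the handoff is created means no downstream agent ever sees an oversized summary, whichever agent produced it.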

Production Checklist

Before deploying a multi-agent system:

  • Supervisor turn limit — hard cap prevents runaway coordination loops
  • Worker idempotency keys — required on all side-effectful tool calls (per ADR-082)
  • Fan-out completion tracking — fan-in node detects and flags partial results
  • Checkpoints at stage boundaries — all sequential pipelines are resumable mid-flight
  • HITL gates — defined for all decisions above risk threshold
  • HITL timeouts — 24h max wait before auto-escalation
  • Handoff schema validation — structured AgentHandoff at every supervisor boundary
  • Context budget enforcement — per ADR-083, context_budget_node before every LLM call
  • LangSmith tracing — every agent invocation tagged with ar_client_slug, ar_agent_slug, ar_thread_id
  • No direct agent-to-agent calls — all coordination through supervisor or shared checkpoint state
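
For the tracing item in the checklist, one option is a single helper that builds the per-invocation config so no call site can forget a tag. The sketch assumes the three identifiers travel as run metadata and tags in the config dict passed to the graph. The helper name and the tag format are hypothetical.

```python
def build_run_config(client_slug: str, agent_slug: str, thread_id: str) -> dict:
    """Build a config dict carrying the three tracing identifiers listed above."""
    return {
        "configurable": {"thread_id": thread_id},
        "metadata": {
            "ar_client_slug": client_slug,
            "ar_agent_slug": agent_slug,
            "ar_thread_id": thread_id,
        },
        # Tags duplicate the slugs in a filterable "key:value" form (an assumed convention).
        "tags": [f"ar_client_slug:{client_slug}", f"ar_agent_slug:{agent_slug}"],
    }
```

Passing the result as `config=` on every graph invocation keeps the checkpoint thread ID and the tracing identifiers in sync.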

What This Looks Like End-to-End

An inbound lead arrives via a contact form. Here’s a production multi-agent pipeline we deploy for clients:

[Contact Form] → [Webhook]
    → [Inbound Triage Agent]            # Qualification rubric, ICP scoring
        → [Supervisor]
            → [Research Worker]         # Company enrichment (parallel)
            → [Intent Classifier]       # Signal extraction (parallel)
        ← [Fan-in]
    → [HITL Gate]                       # Human reviews score ≥ 7
        ← [Founder Approves]
    → [Outreach Drafter Agent]          # Personalized first-touch draft
    → [HITL Gate]                       # Human approves outreach before send
        ← [Founder Approves]
    → [CRM Logger Agent]                # Log to pipeline, notify Slack

Every stage is checkpointed. Every side-effectful tool call is idempotent. Every HITL gate has a 24-hour timeout. The whole pipeline is observable in LangSmith with cost attribution per stage.

The result: qualified leads get a researched, personalized first-touch response in under 2 minutes, with human oversight on every decision above a risk threshold.


The Bottom Line

Multi-agent systems are powerful when designed for their failure modes. The patterns above — supervisor-worker hierarchy, fan-out with idempotency, sequential pipelines with checkpoints, HITL gates, and no direct agent-to-agent calls — hold up in production across dozens of deployments.

Get the coordination right first. Optimize for speed later.
