LangGraph + MCP: Multi-Agent Workflows [2026 Guide]
Two of the most important primitives in agentic AI — LangGraph's stateful graph runtime and Anthropic's Model Context Protocol (MCP) — now compose cleanly. LangGraph gives you a structured execution engine with checkpointing and human-in-the-loop support; web-based MCP servers give every agent in that graph a live, versioned, network-accessible toolbox. The result is a production-ready pattern for multi-agent orchestration that is both auditable and extensible without touching agent code.
This guide walks you through building a supervisor multi-agent workflow from scratch: one orchestrator routes tasks between a research specialist and a code specialist, both of which call tools served over an HTTP/SSE MCP server. You can paste every snippet directly into your project — clean it up with the TechBytes Code Formatter if you want consistent style before committing.
Key Takeaway
The supervisor pattern keeps each specialist agent simple and single-purpose. The orchestrator LLM does routing only — it never executes tools itself. Web-based MCP servers mean you can update, version, or swap tools without redeploying any agent code.
Prerequisites
Before You Begin
- Python 3.11+ — async/await patterns are used throughout
- langgraph >= 0.3 and langchain-mcp-adapters >= 0.1
- mcp Python SDK for running the example MCP server locally
- langchain-anthropic or langchain-openai plus a valid API key
- Comfort with asyncio — every MCP client call is async
What You'll Build
The finished system has three nodes inside a single StateGraph:
- supervisor — an LLM that reads the conversation and outputs the name of the next worker, or FINISH
- research_agent — a ReAct agent with web_search and fetch_page tools, served by MCP Server A
- code_agent — a ReAct agent with run_python and lint_code tools, served by MCP Server B
Both specialist agents report back to the supervisor after each turn. The supervisor decides whether to loop, hand off, or terminate. All state — including the full message history — is persisted in a MemorySaver checkpointer keyed by thread ID.
Step 1 — Install Dependencies
pip install \
"langgraph>=0.3" \
"langchain-mcp-adapters>=0.1" \
"langchain-anthropic>=0.3" \
"mcp>=1.6" \
uvicorn
Pin these in your requirements.txt or pyproject.toml. The langchain-mcp-adapters package is the official bridge — it converts MCP tool schemas into langchain_core-compatible BaseTool objects that any LangGraph node can call directly.
Step 2 — Launch a Web-Based MCP Server
A web-based MCP server uses SSE (Server-Sent Events) or the newer streamable-HTTP transport instead of stdio. Here is a minimal research-tools server using FastMCP:
# research_server.py
from mcp.server.fastmcp import FastMCP

# Host and port are FastMCP settings, not arguments to run()
mcp = FastMCP("research-tools", host="127.0.0.1", port=8001)

@mcp.tool()
def web_search(query: str) -> str:
    """Search the web and return a result summary."""
    # Replace with your real search integration
    return f"[mock] Top results for: {query}"

@mcp.tool()
def fetch_page(url: str) -> str:
    """Fetch the text content of a web page."""
    import urllib.request
    with urllib.request.urlopen(url) as r:
        # Read at most 4 KiB to keep tool output small
        return r.read(4096).decode("utf-8", errors="ignore")

if __name__ == "__main__":
    mcp.run(transport="sse")
Start it in a separate terminal:
python research_server.py
# Listening on http://127.0.0.1:8001/sse
Repeat the same pattern for a code_server.py on port 8002 with run_python and lint_code tools. Use sandboxed execution (e.g., a restricted subprocess or a container) for any tool that runs arbitrary code in production.
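As one way to implement that sandboxing, here is a hedged sketch of a run_python tool body (register it with @mcp.tool() exactly as in the server above). Running snippets in a separate interpreter with python -I isolated mode plus a hard timeout is a reasonable baseline, not a complete security boundary — use a container for untrusted input:

```python
import subprocess
import sys

def run_python(code: str, timeout: float = 5.0) -> str:
    """Execute a snippet in a separate interpreter and return its stdout."""
    # -I = isolated mode: ignores env vars, user site-packages, and the cwd
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "[error] execution timed out"
    return proc.stdout if proc.returncode == 0 else f"[error] {proc.stderr}"
```

The timeout guards against infinite loops in model-generated code; the error prefix gives the calling agent a signal it can reason about instead of a raw exception.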
Step 3 — Connect via SSE Transport
As of langchain-mcp-adapters 0.1, MultiServerMCPClient is constructed directly rather than used as an async context manager. Calling get_tools() opens a short-lived session to each configured server and converts the tool manifests into LangChain tools:
import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient

client = MultiServerMCPClient(
    {
        "research": {
            "url": "http://127.0.0.1:8001/sse",
            "transport": "sse",
        },
        "code": {
            "url": "http://127.0.0.2:8002/sse".replace("127.0.0.2", "127.0.0.1"),
            "transport": "sse",
        },
    }
)

async def get_tools():
    # get_tools() opens a session per server, fetches the manifests, then closes
    return await client.get_tools()

tools = asyncio.run(get_tools())
print([t.name for t in tools])
# ['web_search', 'fetch_page', 'run_python', 'lint_code']
Each get_tools() call — and, by default, each tool invocation — opens a fresh session. In production, create the client once at startup and reuse it for the lifetime of the process; if per-call connection overhead matters, hold a persistent session per server with client.session(...) and load tools from it via load_mcp_tools.
Step 4 — Define Shared State
LangGraph nodes communicate through a single shared TypedDict state object. The add_messages reducer appends messages rather than replacing them, which is exactly what a multi-turn conversation needs:
from typing import Annotated, Literal
from typing_extensions import TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # Reducer appends new messages to the list
    messages: Annotated[list[BaseMessage], add_messages]
    # Supervisor writes the name of the next worker here
    next: str
Step 5 — Build Worker Agents
Each specialist is a create_react_agent wrapped in a plain function node. Pass only the tools that belong to that agent — scoping tools per-agent prevents accidental cross-capability bleed and makes debugging far easier:
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
from langgraph.prebuilt import create_react_agent

llm = ChatAnthropic(model="claude-sonnet-4-6")

# Partition tools by server prefix
research_tools = [t for t in tools if t.name in ("web_search", "fetch_page")]
code_tools = [t for t in tools if t.name in ("run_python", "lint_code")]

research_runnable = create_react_agent(llm, research_tools)
code_runnable = create_react_agent(llm, code_tools)

def research_node(state: AgentState) -> dict:
    result = research_runnable.invoke({"messages": state["messages"]})
    # add_messages merges by message ID, so returning the full list is safe
    return {"messages": result["messages"]}

def code_node(state: AgentState) -> dict:
    result = code_runnable.invoke({"messages": state["messages"]})
    return {"messages": result["messages"]}
Step 6 — Build the Supervisor Router
The supervisor receives the full message history and outputs a structured routing decision. Using with_structured_output forces the LLM to emit a validated Pydantic object — no fragile string parsing:
from pydantic import BaseModel
from langchain_core.messages import SystemMessage

WORKERS = ["research_agent", "code_agent"]
SYSTEM = (
    "You are a supervisor routing tasks between workers: {workers}. "
    "Given the conversation, decide who acts next or output FINISH. "
    "Respond only with the worker name or FINISH."
).format(workers=", ".join(WORKERS))

class Route(BaseModel):
    next: Literal["research_agent", "code_agent", "FINISH"]

router_llm = llm.with_structured_output(Route)

def supervisor_node(state: AgentState) -> dict:
    messages = [SystemMessage(content=SYSTEM)] + state["messages"]
    route = router_llm.invoke(messages)
    return {"next": route.next}

def route_edge(state: AgentState) -> str:
    if state["next"] == "FINISH":
        return "__end__"
    return state["next"]
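To see why this beats string parsing, note that Pydantic rejects any route outside the Literal before it ever reaches the graph. A standalone demo of the Route model above (the invalid worker name is made up for illustration):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class Route(BaseModel):
    next: Literal["research_agent", "code_agent", "FINISH"]

# Valid routes pass through as typed, validated objects
assert Route(next="FINISH").next == "FINISH"

# A hallucinated worker name fails validation instead of silently mis-routing
try:
    Route(next="marketing_agent")
    raised = False
except ValidationError:
    raised = True
assert raised
```

With with_structured_output, the model is constrained to emit one of these three values, so route_edge never has to handle free-form text.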
Step 7 — Compile the Graph and Run
Wire up the nodes and edges, compile with a MemorySaver checkpointer, then invoke with a thread_id so the checkpointer can key the persisted state:
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

def build_graph():
    builder = StateGraph(AgentState)
    builder.add_node("supervisor", supervisor_node)
    builder.add_node("research_agent", research_node)
    builder.add_node("code_agent", code_node)
    builder.set_entry_point("supervisor")
    builder.add_conditional_edges("supervisor", route_edge)
    builder.add_edge("research_agent", "supervisor")
    builder.add_edge("code_agent", "supervisor")
    return builder.compile(checkpointer=MemorySaver())
async def main():
    # tools were already fetched in Step 3; the client stays usable
    # for the lifetime of the process
    graph = build_graph()
    config = {"configurable": {"thread_id": "session-1"}}
    result = await graph.ainvoke(
        {"messages": [HumanMessage(content="Research quantum key distribution, then write a Python demo.")]},
        config=config,
    )
    print(result["messages"][-1].content)

asyncio.run(main())
Expected Output
The console should show the supervisor routing twice — once to research_agent, once to code_agent — before emitting FINISH:
supervisor -> research_agent
[research_agent] Called web_search("quantum key distribution")
[research_agent] Called fetch_page("https://...")
supervisor -> code_agent
[code_agent] Called run_python("...")
supervisor -> FINISH
---
Here is a QKD explainer and a working BB84 simulation in Python...
Troubleshooting: Top 3 Issues
- ConnectionRefusedError on the MCP server URL — The client connects lazily, so both servers must be reachable by the time the first session opens, not at import time. Confirm both servers are running before invoking the graph, and that firewall rules allow the loopback ports. Use curl http://127.0.0.1:8001/sse to verify the SSE endpoint responds with text/event-stream.
- Tool schema validation errors (ValidationError from Pydantic) — FastMCP infers JSON schemas from Python type hints. If a tool parameter uses a complex type not expressible in JSON Schema (e.g., a raw dict with nested generics), MCP may emit an ambiguous schema that LangChain rejects. Resolve by using simple primitives (str, int, list[str]) or an explicit Field(...) annotation.
- Supervisor entering an infinite loop — This happens when the router LLM keeps emitting a worker name instead of FINISH. Add an explicit turn counter to AgentState (turn: int) and a hard-stop edge: if state["turn"] > MAX_TURNS, route_edge returns "__end__". Also audit your SYSTEM prompt — it must include an explicit stopping criterion the model can recognize.
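A minimal, framework-free sketch of that hard stop — assuming AgentState gains a turn field, and using a hypothetical supervisor_update helper to show where the counter gets bumped:

```python
MAX_TURNS = 8  # tune per workload

def supervisor_update(route_next: str, state: dict) -> dict:
    # The supervisor node returns the route *and* bumps the counter
    return {"next": route_next, "turn": state.get("turn", 0) + 1}

def route_edge(state: dict) -> str:
    # Hard stop: end the run if the model says FINISH or the budget is spent
    if state["next"] == "FINISH" or state.get("turn", 0) > MAX_TURNS:
        return "__end__"
    return state["next"]

# Once the turn budget is exhausted, routing terminates regardless of the LLM
state = {"next": "research_agent", "turn": MAX_TURNS + 1}
assert route_edge(state) == "__end__"
```

Because the counter lives in graph state, it survives checkpointing and restarts along with the message history.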
What's Next
- Persistent checkpointing — swap MemorySaver for AsyncPostgresSaver (langgraph-checkpoint-postgres) to survive process restarts and scale horizontally
- Human-in-the-loop — add interrupt_before=["code_agent"] to builder.compile() so a human can review tool calls before execution
- Streamable-HTTP transport — MCP 1.5+ recommends streamable-HTTP over SSE for lower overhead; swap "transport": "sse" for "transport": "streamable_http" and update your server's mcp.run(transport="streamable-http") call
- LangGraph Platform — deploy the compiled graph as a managed API endpoint with built-in auth, rate limiting, and a visual debugger at studio.langchain.com
- More specialists — add a data_agent backed by a database-query MCP server, or a notification_agent that pushes results to Slack via an MCP tool — the supervisor pattern scales to any number of workers without changing the routing logic
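For the transport swap, the client-side change is just the connection dict. A sketch under one assumption: the /mcp endpoint path below is the convention streamable-HTTP servers commonly mount, so verify what your server actually exposes:

```python
# Hypothetical connection config for streamable-HTTP; check the endpoint
# path your server mounts (commonly /mcp rather than /sse)
connections = {
    "research": {
        "url": "http://127.0.0.1:8001/mcp",
        "transport": "streamable_http",
    },
    "code": {
        "url": "http://127.0.0.1:8002/mcp",
        "transport": "streamable_http",
    },
}
```

Pass this dict to MultiServerMCPClient exactly as in Step 3 — no agent or graph code changes.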