LangGraph + MCP: Multi-Agent Workflows [2026 Guide]
Two of the most important primitives in agentic AI — LangGraph's stateful graph runtime and Anthropic's Model Context Protocol (MCP) — now compose cleanly. LangGraph gives you a structured execution engine with checkpointing and human-in-the-loop support; web-based MCP servers give every agent in that graph a live, versioned, network-accessible toolbox. The result is a production-ready pattern for multi-agent orchestration that is both auditable and extensible without touching agent code.
This guide walks you through building a supervisor multi-agent workflow from scratch: one orchestrator routes tasks between a research specialist and a code specialist, both of which call tools served over an HTTP/SSE MCP server. You can paste every snippet directly into your project — clean it up with the TechBytes Code Formatter if you want consistent style before committing.
Key Takeaway
The supervisor pattern keeps each specialist agent simple and single-purpose. The orchestrator LLM does routing only — it never executes tools itself. Web-based MCP servers mean you can update, version, or swap tools without redeploying any agent code.
Prerequisites
Before You Begin
- Python 3.11+ — async/await patterns are used throughout
- langgraph >= 0.3 and langchain-mcp-adapters >= 0.1
- mcp Python SDK for running the example MCP server locally
- langchain-anthropic or langchain-openai plus a valid API key
- Comfort with asyncio — every MCP client call is async
What You'll Build
The finished system has three nodes inside a single StateGraph:
- supervisor — an LLM that reads the conversation and outputs the name of the next worker, or FINISH
- research_agent — a ReAct agent with web_search and fetch_page tools, served by MCP Server A
- code_agent — a ReAct agent with run_python and lint_code tools, served by MCP Server B
Both specialist agents report back to the supervisor after each turn. The supervisor decides whether to loop, hand off, or terminate. All state — including the full message history — is persisted in a MemorySaver checkpointer keyed by thread ID.
Step 1 — Install Dependencies
pip install \
"langgraph>=0.3" \
"langchain-mcp-adapters>=0.1" \
"langchain-anthropic>=0.3" \
"mcp>=1.6" \
uvicorn
Pin these in your requirements.txt or pyproject.toml. The langchain-mcp-adapters package is the official bridge — it converts MCP tool schemas into langchain_core-compatible BaseTool objects that any LangGraph node can call directly.
Step 2 — Launch a Web-Based MCP Server
A web-based MCP server uses SSE (Server-Sent Events) or the newer streamable-HTTP transport instead of stdio. Here is a minimal research-tools server using FastMCP:
# research_server.py
from mcp.server.fastmcp import FastMCP

# Host and port are FastMCP settings, not arguments to run()
mcp = FastMCP("research-tools", host="127.0.0.1", port=8001)

@mcp.tool()
def web_search(query: str) -> str:
    """Search the web and return a result summary."""
    # Replace with your real search integration
    return f"[mock] Top results for: {query}"

@mcp.tool()
def fetch_page(url: str) -> str:
    """Fetch the text content of a web page."""
    import urllib.request
    with urllib.request.urlopen(url) as r:
        # Read at most 4 KiB to keep tool output small
        return r.read(4096).decode("utf-8", errors="ignore")

if __name__ == "__main__":
    mcp.run(transport="sse")
Start it in a separate terminal:
python research_server.py
# Listening on http://127.0.0.1:8001/sse
Repeat the same pattern for a code_server.py on port 8002 with run_python and lint_code tools. Use sandboxed execution (e.g., a restricted subprocess or a container) for any tool that runs arbitrary code in production.
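As one way to implement that sandboxing, here is a hedged sketch of a run_python tool body (register it with @mcp.tool() exactly as in the server above). Running snippets in a separate interpreter with python -I isolated mode plus a hard timeout is a reasonable baseline, not a complete security boundary — use a container for untrusted input:

```python
import subprocess
import sys

def run_python(code: str, timeout: float = 5.0) -> str:
    """Execute a snippet in a separate interpreter and return its stdout."""
    # -I = isolated mode: ignores env vars, user site-packages, and the cwd
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "[error] execution timed out"
    return proc.stdout if proc.returncode == 0 else f"[error] {proc.stderr}"
```

The timeout guards against infinite loops in model-generated code; the error prefix gives the calling agent a signal it can reason about instead of a raw exception.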
Step 3 — Connect via SSE Transport
As of langchain-mcp-adapters 0.1, MultiServerMCPClient is constructed directly rather than used as an async context manager. Calling get_tools() opens a short-lived session to each configured server and converts the tool manifests into LangChain tools:
import asyncio

from langchain_mcp_adapters.client import MultiServerMCPClient

client = MultiServerMCPClient(
    {
        "research": {
            "url": "http://127.0.0.1:8001/sse",
            "transport": "sse",
        },
        "code": {
            "url": "http://127.0.0.2:8002/sse".replace("127.0.0.2", "127.0.0.1"),
            "transport": "sse",
        },
    }
)

async def get_tools():
    # get_tools() opens a session per server, fetches the manifests, then closes
    return await client.get_tools()

tools = asyncio.run(get_tools())
print([t.name for t in tools])
# ['web_search', 'fetch_page', 'run_python', 'lint_code']
Each get_tools() call — and, by default, each tool invocation — opens a fresh session. In production, create the client once at startup and reuse it for the lifetime of the process; if per-call connection overhead matters, hold a persistent session per server with client.session(...) and load tools from it via load_mcp_tools.
Step 4 — Define Shared State
LangGraph nodes communicate through a single shared TypedDict state object. The add_messages reducer appends messages rather than replacing them, which is exactly what a multi-turn conversation needs:
from typing import Annotated, Literal
from typing_extensions import TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # Reducer appends new messages to the list
    messages: Annotated[list[BaseMessage], add_messages]
    # Supervisor writes the name of the next worker here
    next: str
Step 5 — Build Worker Agents
Each specialist is a create_react_agent wrapped in a plain function node. Pass only the tools that belong to that agent — scoping tools per-agent prevents accidental cross-capability bleed and makes debugging far easier:
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
from langgraph.prebuilt import create_react_agent

llm = ChatAnthropic(model="claude-sonnet-4-6")

# Partition tools by server prefix
research_tools = [t for t in tools if t.name in ("web_search", "fetch_page")]
code_tools = [t for t in tools if t.name in ("run_python", "lint_code")]

research_runnable = create_react_agent(llm, research_tools)
code_runnable = create_react_agent(llm, code_tools)

def research_node(state: AgentState) -> dict:
    result = research_runnable.invoke({"messages": state["messages"]})
    # add_messages merges by message ID, so returning the full list is safe
    return {"messages": result["messages"]}

def code_node(state: AgentState) -> dict:
    result = code_runnable.invoke({"messages": state["messages"]})
    return {"messages": result["messages"]}
Step 6 — Build the Supervisor Router
The supervisor receives the full message history and outputs a structured routing decision. Using with_structured_output forces the LLM to emit a validated Pydantic object — no fragile string parsing:
from pydantic import BaseModel
from langchain_core.messages import SystemMessage

WORKERS = ["research_agent", "code_agent"]
SYSTEM = (
    "You are a supervisor routing tasks between workers: {workers}. "
    "Given the conversation, decide who acts next or output FINISH. "
    "Respond only with the worker name or FINISH."
).format(workers=", ".join(WORKERS))

class Route(BaseModel):
    next: Literal["research_agent", "code_agent", "FINISH"]

router_llm = llm.with_structured_output(Route)

def supervisor_node(state: AgentState) -> dict:
    messages = [SystemMessage(content=SYSTEM)] + state["messages"]
    route = router_llm.invoke(messages)
    return {"next": route.next}

def route_edge(state: AgentState) -> str:
    if state["next"] == "FINISH":
        return "__end__"
    return state["next"]
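To see why this beats string parsing, note that Pydantic rejects any route outside the Literal before it ever reaches the graph. A standalone demo of the Route model above (the invalid worker name is made up for illustration):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class Route(BaseModel):
    next: Literal["research_agent", "code_agent", "FINISH"]

# Valid routes pass through as typed, validated objects
assert Route(next="FINISH").next == "FINISH"

# A hallucinated worker name fails validation instead of silently mis-routing
try:
    Route(next="marketing_agent")
    raised = False
except ValidationError:
    raised = True
assert raised
```

With with_structured_output, the model is constrained to emit one of these three values, so route_edge never has to handle free-form text.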
Step 7 — Compile the Graph and Run
Wire up the nodes and edges, compile with a MemorySaver checkpointer, then invoke with a thread_id so the checkpointer can key the persisted state:
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

def build_graph():
    builder = StateGraph(AgentState)
    builder.add_node("supervisor", supervisor_node)
    builder.add_node("research_agent", research_node)
    builder.add_node("code_agent", code_node)
    builder.set_entry_point("supervisor")
    builder.add_conditional_edges("supervisor", route_edge)
    builder.add_edge("research_agent", "supervisor")
    builder.add_edge("code_agent", "supervisor")
    return builder.compile(checkpointer=MemorySaver())
async def main():
    # tools were already fetched in Step 3; the client stays usable
    # for the lifetime of the process
    graph = build_graph()
    config = {"configurable": {"thread_id": "session-1"}}
    result = await graph.ainvoke(
        {"messages": [HumanMessage(content="Research quantum key distribution, then write a Python demo.")]},
        config=config,
    )
    print(result["messages"][-1].content)

asyncio.run(main())
Expected Output
The console should show the supervisor routing twice — once to research_agent, once to code_agent — before emitting FINISH:
supervisor -> research_agent
[research_agent] Called web_search("quantum key distribution")
[research_agent] Called fetch_page("https://...")
supervisor -> code_agent
[code_agent] Called run_python("...")
supervisor -> FINISH
---
Here is a QKD explainer and a working BB84 simulation in Python...
Troubleshooting: Top 3 Issues
- ConnectionRefusedError on the MCP server URL — The client connects lazily, so both servers must be reachable by the time the first session opens, not at import time. Confirm both servers are running before invoking the graph, and that firewall rules allow the loopback ports. Use curl http://127.0.0.1:8001/sse to verify the SSE endpoint responds with text/event-stream.
- Tool schema validation errors (ValidationError from Pydantic) — FastMCP infers JSON schemas from Python type hints. If a tool parameter uses a complex type not expressible in JSON Schema (e.g., a raw dict with nested generics), MCP may emit an ambiguous schema that LangChain rejects. Resolve by using simple primitives (str, int, list[str]) or an explicit Field(...) annotation.
- Supervisor entering an infinite loop — This happens when the router LLM keeps emitting a worker name instead of FINISH. Add an explicit turn counter to AgentState (turn: int) and a hard-stop edge: if state["turn"] > MAX_TURNS, route_edge returns "__end__". Also audit your SYSTEM prompt — it must include an explicit stopping criterion the model can recognize.
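A minimal, framework-free sketch of that hard stop — assuming AgentState gains a turn field, and using a hypothetical supervisor_update helper to show where the counter gets bumped:

```python
MAX_TURNS = 8  # tune per workload

def supervisor_update(route_next: str, state: dict) -> dict:
    # The supervisor node returns the route *and* bumps the counter
    return {"next": route_next, "turn": state.get("turn", 0) + 1}

def route_edge(state: dict) -> str:
    # Hard stop: end the run if the model says FINISH or the budget is spent
    if state["next"] == "FINISH" or state.get("turn", 0) > MAX_TURNS:
        return "__end__"
    return state["next"]

# Once the turn budget is exhausted, routing terminates regardless of the LLM
state = {"next": "research_agent", "turn": MAX_TURNS + 1}
assert route_edge(state) == "__end__"
```

Because the counter lives in graph state, it survives checkpointing and restarts along with the message history.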
What's Next
- Persistent checkpointing — swap MemorySaver for AsyncPostgresSaver (langgraph-checkpoint-postgres) to survive process restarts and scale horizontally
- Human-in-the-loop — add interrupt_before=["code_agent"] to builder.compile() so a human can review tool calls before execution
- Streamable-HTTP transport — MCP 1.5+ recommends streamable-HTTP over SSE for lower overhead; swap "transport": "sse" for "transport": "streamable_http" and update your server's mcp.run(transport="streamable-http") call
- LangGraph Platform — deploy the compiled graph as a managed API endpoint with built-in auth, rate limiting, and a visual debugger at studio.langchain.com
- More specialists — add a data_agent backed by a database-query MCP server, or a notification_agent that pushes results to Slack via an MCP tool — the supervisor pattern scales to any number of workers without changing the routing logic
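For the transport swap, the client-side change is just the connection dict. A sketch under one assumption: the /mcp endpoint path below is the convention streamable-HTTP servers commonly mount, so verify what your server actually exposes:

```python
# Hypothetical connection config for streamable-HTTP; check the endpoint
# path your server mounts (commonly /mcp rather than /sse)
connections = {
    "research": {
        "url": "http://127.0.0.1:8001/mcp",
        "transport": "streamable_http",
    },
    "code": {
        "url": "http://127.0.0.1:8002/mcp",
        "transport": "streamable_http",
    },
}
```

Pass this dict to MultiServerMCPClient exactly as in Step 3 — no agent or graph code changes.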