Anthropic MCP 2.0: Standardizing AI Agent Memory and State [Deep Dive]
Bottom Line
MCP 2.0 turns Anthropic's tool-use protocol into a full state-transfer specification. The new Agentic Session Token (AST) packages an agent's active context, tool-call history, and latent reasoning trace into a signed, model-agnostic blob that any compliant runtime can ingest. The practical payoff: a Claude planner can hand a half-finished task to a specialised execution agent on a different vendor's stack without losing intent — the bottleneck for cross-platform agent workflows finally has a wire format.
Key Takeaways
- MCP 2.0 introduces Agentic Session Tokens (ASTs): signed JWS blobs that move active context, tool-call history, and a normalised reasoning trace between heterogeneous agents.
- The protocol is wire-compatible with MCP 1.x via a capability handshake; a 2.0 client gracefully falls back to legacy stdio/SSE transport.
- ASTs are signed with EdDSA using keys published via JWKS, with explicit aud, nbf, and jti claims to prevent replay across agent boundaries.
- A Unified Tool Definition (UTD) JSON Schema replaces vendor-specific tool spec dialects, so the same toolkit description works across Claude, OpenAI, and self-hosted runtimes.
- Multi-Model Consensus is opt-in: a high-stakes tool call can require N-of-M signatures from independent agents before execution.
For the past year, the agentic AI landscape has been defined by a single, increasingly painful problem: state coherence. An agent reasoning inside Claude Code has no standardised way to hand a half-finished task to an autonomous DevOps agent running on a different platform without losing the thread of intent. Every shop has solved this with bespoke glue — pickled scratchpads, prompt-stuffed handoffs, vector-store memories that drift out of sync. MCP 2.0, released by Anthropic on April 23, 2026, is the first serious attempt at a wire format for that handoff.
The headline feature is the Agentic Session Token (AST): a cryptographically signed snapshot of an agent's working state that any compliant runtime can ingest. But MCP 2.0 also tightens four adjacent surfaces — tool definitions, transport, consensus, and identity — into a single coherent specification. The result is less "Anthropic's protocol" and more "the agent industry's first attempt at HTTP for agents."
What Actually Changed in MCP 2.0
MCP 1.x was, structurally, a tool-use protocol. The host advertised tools, the model called them, the host returned results. It standardised the request/response surface but said nothing about state in motion. MCP 2.0 keeps the 1.x message types and adds four pillars on top:
- Unified Tool Definition (UTD): A single JSON-Schema-based dialect for declaring tools, replacing vendor-specific shapes (Anthropic's input_schema, OpenAI's parameters, etc.). The same tool description now works across runtimes.
- Agentic Session Tokens (AST): Signed, portable state blobs that survive cross-agent and cross-process boundaries.
- Reasoning Persistence: A normalised JSONL trace format that captures the chain of thought leading up to each decision, in a vocabulary other models can understand.
- Multi-Model Consensus: An optional N-of-M signature scheme on tool calls, intended for irreversible operations (financial transfers, production deploys, data deletion).
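To make the N-of-M idea concrete, here is a minimal Python sketch of a quorum check. The Approval record, the has_quorum function, and the verify callback are all hypothetical names for illustration; the article does not describe the spec's actual message shapes, and real signature verification is left to whatever callback you pass in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class Approval:
    agent_id: str     # hypothetical identifier of the approving agent
    call_hash: str    # hash of the exact tool call being approved
    signature: bytes  # opaque signature; checking it is delegated to `verify`


def has_quorum(
    call_hash: str,
    approvals: Iterable[Approval],
    trusted_agents: Set[str],
    threshold: int,
    verify: Callable[[Approval], bool],
) -> bool:
    """Return True if at least `threshold` distinct trusted agents approved the call."""
    signers = set()
    for approval in approvals:
        if approval.call_hash != call_hash:
            continue  # approval was issued for a different call
        if approval.agent_id not in trusted_agents:
            continue  # signer is not one of the M eligible agents
        if not verify(approval):
            continue  # signature did not check out
        signers.add(approval.agent_id)  # a set counts each agent only once
    return len(signers) >= threshold
```

Counting distinct agent identities rather than raw approvals is the important design choice: one agent signing the same call twice should never satisfy a 2-of-M policy on its own.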
The transport story is also cleaner. MCP 1.x supported stdio and SSE; 2.0 adds a streaming HTTP/2 mode (application/mcp+ndjson) that supports interleaved tool calls and AST exchange in a single connection.
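For readers unfamiliar with NDJSON, the sketch below shows the general technique in Python: one JSON document per line, so tool calls and AST frames can share a single stream. The frame shapes (the type, id, and jti fields) are invented for illustration and should not be read as the actual application/mcp+ndjson schema.

```python
import json
from typing import Iterable, Iterator


def encode_frames(frames: Iterable[dict]) -> bytes:
    """Serialise each frame as one compact JSON document followed by a newline."""
    return b"".join(
        json.dumps(frame, separators=(",", ":")).encode("utf-8") + b"\n"
        for frame in frames
    )


def decode_frames(stream: bytes) -> Iterator[dict]:
    """Yield one parsed frame per non-empty line."""
    for line in stream.split(b"\n"):
        if line.strip():
            yield json.loads(line)


# Hypothetical frames: a tool call, an AST exchange, and a tool result interleaved.
example_frames = [
    {"type": "tool_call", "id": 1, "name": "rds.describe_clusters"},
    {"type": "ast", "jti": "ast_example"},
    {"type": "tool_result", "id": 1, "note": "multi\nline text is escaped"},
]
wire = encode_frames(example_frames)
decoded = list(decode_frames(wire))
```

Because json.dumps escapes newlines inside strings, a raw newline byte can only ever mean "end of frame", which is what makes line-by-line parsing safe.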
MCP 1.x vs MCP 2.0: Side-by-Side
If you're maintaining an MCP integration today, this is the table to scan first. The "Edge" column shows which side wins on each dimension.
| Dimension | MCP 1.x | MCP 2.0 | Edge |
|---|---|---|---|
| State handoff between agents | None — host must roll its own | AST (signed, portable) | 2.0 |
| Tool spec dialect | Vendor-specific JSON Schema flavours | UTD (standardised) | 2.0 |
| Reasoning persistence | Embedded in messages, ad hoc | Normalised JSONL trace | 2.0 |
| Transport | stdio, SSE | stdio, SSE, HTTP/2 NDJSON | 2.0 |
| Multi-agent consensus | Not in spec | Optional N-of-M signatures | 2.0 |
| Server complexity | Simple — implement 7 message types | Higher — AST signing + JWKS | 1.x for hello-world |
| Ecosystem maturity | ~2,400 servers indexed | ~80 servers (April 2026) | 1.x today, 2.0 in 6mo |
| Identity / replay protection | Implicit, host-defined | EdDSA + jti nonce cache | 2.0 |
| Cross-vendor portability | Limited — tool defs differ | Designed in | 2.0 |
AST Anatomy: What's Inside the Token
An Agentic Session Token is a detached JWS (RFC 7515) over a JSON payload. The header carries algorithm and key identifiers; the payload carries the agent state itself. Here's the abridged shape of a real token issued by a Claude planner agent:
{
  "iss": "https://claude.ai/mcp",
  "sub": "agent:planner-7f3a",
  "aud": "agent:executor-aws-devops",
  "iat": 1745407800,
  "exp": 1745411400,
  "nbf": 1745407800,
  "jti": "ast_01HX9ZK0V4...",
  "mcp_version": "2.0",
  "active_context": {
    "system_prompt_hash": "sha256:9f2b...",
    "messages_window": [
      {"role": "user", "content": "Migrate the payments DB to RDS Aurora."},
      {"role": "assistant", "content": "Drafting plan...", "trace_ref": "trace_01HX..."}
    ],
    "scratchpad": {
      "plan": ["snapshot RDS", "apply schema diff", "cutover read replicas"],
      "decisions": [{"key": "downtime_window", "value": "2h", "rationale_ref": "trace_01HX#step3"}]
    }
  },
  "tool_history": [
    {"name": "rds.describe_clusters", "args_hash": "sha256:...", "result_hash": "sha256:...", "ts": 1745407650}
  ],
  "reasoning_trace": {
    "format": "mcp-trace-v1",
    "uri": "s3://anthropic-asts/trace_01HX9ZK.jsonl",
    "size_bytes": 184320,
    "redacted_pii": true
  },
  "policy": {
    "max_irreversible_calls": 0,
    "consensus_required_for": ["rds.delete_cluster", "iam.delete_role"]
  }
}
Three things are worth highlighting here:
- The reasoning trace is referenced, not embedded. Large traces would blow past practical token size limits, so the token holds a pointer (typically S3, GCS, or an MCP gateway path) plus a content hash (not shown in the abridged example above). The receiving agent fetches it on demand.
- Tool history is hashed. The receiver doesn't get raw tool args or results — those may contain secrets — only hashes that prove a call happened. The full payload, if needed, is fetched separately under a scoped credential.
- Policy travels with the token. The issuing agent's guardrails (e.g. "no irreversible calls without consensus") are carried in-band, so a downstream executor can't simply ignore them by claiming it didn't know.
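The hashed tool history can be illustrated with a short Python sketch. It assumes the hash is SHA-256 over a canonical JSON encoding (sorted keys, compact separators); the article does not say which canonicalisation the spec uses, so treat canonical_hash and history_entry as hypothetical helpers rather than the reference behaviour.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_hash(value: Any) -> str:
    """Hash a JSON-serialisable value in a key-order-independent way (assumed canonical form)."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


def history_entry(name: str, args: Dict[str, Any], result: Any, ts: int) -> Dict[str, Any]:
    """Build a tool_history record that proves a call happened without exposing its payload."""
    return {
        "name": name,
        "args_hash": canonical_hash(args),
        "result_hash": canonical_hash(result),
        "ts": ts,
    }


entry = history_entry(
    "rds.describe_clusters",
    {"region": "us-east-1", "api_token": "s3cr3t"},  # illustrative args, not from the article
    {"clusters": []},
    1745407650,
)
```

A receiver that later obtains the full arguments under a scoped credential can recompute canonical_hash and compare it with args_hash to confirm it was given the same call the issuer recorded.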
Handoff Flow: A Concrete Example
Concretely, here's what a cross-platform handoff looks like — a Claude planner delegating database migration steps to an AWS-native executor agent:
┌──────────────┐       1. Plan task        ┌──────────────┐
│     User     │ ────────────────────────▶ │   Planner    │
│              │                           │   (Claude)   │
└──────────────┘                           └──────┬───────┘
                                                  │
                                 2. Mint AST      │
                                 (sign w/ EdDSA)  │
                                                  ▼
                                           ┌──────────────┐
                                           │   AST Blob   │
                                           │ (JWS, 18KB)  │
                                           └──────┬───────┘
                                                  │ 3. POST /v2/handoff
                                                  ▼
                                           ┌──────────────┐
                                           │   Executor   │
                                           │ (AWS Agent)  │
                                           └──────┬───────┘
                                                  │ 4. Verify sig via JWKS
                                                  │ 5. Replay scratchpad
                                                  │ 6. Resume tool-calls
                                                  ▼
                                           ┌──────────────┐
                                           │  Migration   │
                                           │   Executed   │
                                           └──────────────┘
The interesting part is step 5 — "replay scratchpad." The executor agent doesn't need to re-derive the plan; the planner's decisions are already captured in active_context.scratchpad. The executor model can be smaller, cheaper, and tool-specialised — it just needs to follow through, not re-think. This is where the cost story actually lands: planning happens once on a frontier model, execution can fan out to commodity ones.
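As a rough illustration of "replay scratchpad", the Python sketch below reads the plan and decisions out of a verified AST payload and works out what is left to do. The helper names and the idea that the executor tracks completed steps in a set are assumptions made for the example; the article does not say how a real executor records progress.

```python
from typing import Dict, List, Set


def remaining_steps(ast_payload: dict, completed: Set[str]) -> List[str]:
    """Return plan steps not yet done, preserving the planner's ordering."""
    plan = ast_payload["active_context"]["scratchpad"]["plan"]
    return [step for step in plan if step not in completed]


def decisions_by_key(ast_payload: dict) -> Dict[str, str]:
    """Index the planner's recorded decisions so the executor can look them up."""
    decisions = ast_payload["active_context"]["scratchpad"].get("decisions", [])
    return {d["key"]: d["value"] for d in decisions}


# Abridged payload, using the scratchpad fields from the token shown earlier.
payload = {
    "active_context": {
        "scratchpad": {
            "plan": ["snapshot RDS", "apply schema diff", "cutover read replicas"],
            "decisions": [
                {"key": "downtime_window", "value": "2h", "rationale_ref": "trace_01HX#step3"}
            ],
        }
    }
}

todo = remaining_steps(payload, completed={"snapshot RDS"})
window = decisions_by_key(payload)["downtime_window"]
```

Nothing here calls a model: the executor consumes decisions the planner already made, which is the point of the handoff.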
Signing, Verification, and Replay Protection
ASTs are signed with EdDSA (Ed25519) by default, with optional ES256 for environments stuck on FIPS 140-2 hardware. The issuing agent's host publishes a JWKS endpoint:
GET https://claude.ai/.well-known/mcp-jwks.json
{
  "keys": [
    {
      "kty": "OKP", "crv": "Ed25519",
      "kid": "anth-2026-04-a",
      "x": "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo",
      "use": "sig", "alg": "EdDSA"
    }
  ]
}
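A receiver has to pick the right key out of a JWKS document by its kid before it can verify anything. A minimal Python sketch of that lookup follows, using only the standard library; find_ed25519_key is a hypothetical helper name, and the actual signature check would still need a cryptography library, which is out of scope here.

```python
import base64

# The JWKS document shown above, as a Python dict.
JWKS = {
    "keys": [
        {
            "kty": "OKP", "crv": "Ed25519",
            "kid": "anth-2026-04-a",
            "x": "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo",
            "use": "sig", "alg": "EdDSA",
        }
    ]
}


def b64url_decode(text: str) -> bytes:
    """Decode unpadded base64url, as used for JWK parameters."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def find_ed25519_key(jwks: dict, kid: str) -> bytes:
    """Return the raw 32-byte Ed25519 public key for `kid`, or raise KeyError."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid and key.get("kty") == "OKP" and key.get("crv") == "Ed25519":
            raw = b64url_decode(key["x"])
            if len(raw) != 32:
                raise ValueError("Ed25519 public keys must be 32 bytes")
            return raw
    raise KeyError(kid)


public_key = find_ed25519_key(JWKS, "anth-2026-04-a")
```

Matching on kty and crv as well as kid guards against a key set that reuses an identifier for a key of a different type.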
The receiving agent:
1. Parses the AST header, extracts kid.
2. Fetches JWKS from the issuer (cached with TTL).
3. Verifies the EdDSA signature.
4. Checks aud matches its own agent identity.
5. Checks exp and nbf against current time (with ≤30s skew).
6. Checks jti against a replay cache (Redis, ≥ token TTL).
Step 6 is the one most teams will get wrong on first implementation. An attacker who captures a valid AST in transit can replay it against the same audience until expiry; the jti nonce cache is the defence. Anthropic ships a reference implementation (@anthropic-ai/mcp-verify) that handles all six steps; if you're rolling your own, replicate it carefully.
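The claim checks (audience, time window, and jti replay) can be sketched in plain Python. This is not the reference implementation: it skips signature verification entirely, uses an in-memory dictionary as a stand-in for Redis, and adds an issuer allow-list check as an extra assumption. ASTRejected, ReplayCache, and verify_claims are hypothetical names.

```python
import time
from typing import Dict, Optional, Set


class ASTRejected(Exception):
    """Raised when an AST fails one of the claim checks."""


class ReplayCache:
    """In-memory stand-in for a shared replay cache such as Redis."""

    def __init__(self) -> None:
        self._seen: Dict[str, float] = {}  # jti -> time until which it must be remembered

    def first_use(self, jti: str, keep_until: float, now: float) -> bool:
        # Forget entries whose tokens can no longer be accepted anyway.
        self._seen = {j: t for j, t in self._seen.items() if t >= now}
        if jti in self._seen:
            return False
        self._seen[jti] = keep_until
        return True


def verify_claims(
    claims: dict,
    my_identity: str,
    trusted_issuers: Set[str],
    cache: ReplayCache,
    now: Optional[float] = None,
    skew: float = 30.0,
) -> None:
    """Run the claim checks on an already signature-verified payload; raise on failure."""
    now = time.time() if now is None else now
    if claims.get("iss") not in trusted_issuers:
        raise ASTRejected("untrusted issuer")
    if claims.get("aud") != my_identity:
        raise ASTRejected("audience mismatch")
    exp, nbf, jti = claims.get("exp"), claims.get("nbf"), claims.get("jti")
    if not isinstance(exp, (int, float)) or not isinstance(nbf, (int, float)) or not jti:
        raise ASTRejected("missing exp, nbf, or jti")
    if now > exp + skew:
        raise ASTRejected("token expired")
    if now < nbf - skew:
        raise ASTRejected("token not yet valid")
    # Remember the jti for as long as the token could still be accepted.
    if not cache.first_use(jti, keep_until=exp + skew, now=now):
        raise ASTRejected("jti already seen (replay)")
```

In a multi-process deployment the check-then-store in first_use would need to be a single atomic operation against the shared cache; the dictionary version is only safe within one thread.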
Do not use the iss claim as the JWKS lookup URL directly without validation. A malicious issuer can host its own JWKS and sign tokens that look legitimate. Maintain an allow-list of trusted issuers (Anthropic, OpenAI, your internal MCP gateway) and reject ASTs from anyone else.
Unified Tool Definition (UTD): Tools That Travel
One of the quieter but more important wins of MCP 2.0 is the Unified Tool Definition. In MCP 1.x, every host's tool spec had subtle differences — Anthropic used input_schema, others used parameters, the JSON-Schema dialects varied, and the descriptions had implicit conventions. UTD pins all of this down:
{
  "name": "rds.snapshot",
  "description": "Create a manual snapshot of an RDS cluster.",
  "input_schema": {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["cluster_id"],
    "properties": {
      "cluster_id": {"type": "string", "pattern": "^cluster-[a-z0-9-]+$"},
      "snapshot_name": {"type": "string", "maxLength": 255},
      "tags": {"type": "object", "additionalProperties": {"type": "string"}}
    }
  },
  "output_schema": {
    "type": "object",
    "required": ["snapshot_arn"],
    "properties": {
      "snapshot_arn": {"type": "string", "pattern": "^arn:aws:rds:.+$"}
    }
  },
  "side_effects": "irreversible_create",
  "idempotency_key_param": "snapshot_name",
  "rate_limit": {"per_minute": 5}
}
Two additions matter most: side_effects (one of none, read, reversible_write, irreversible_create, irreversible_delete) and idempotency_key_param. Together they let an executor agent reason about whether retrying a tool call is safe — something MCP 1.x left entirely to host implementers.
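One way an executor might use these two fields is sketched below in Python. The decision rule is an assumption, not something the article attributes to the spec: calls with no side effects or read-only effects are treated as always retryable, and anything else is retryable only when the call actually supplies the declared idempotency key.

```python
from typing import Any, Dict

# side_effects values assumed safe to repeat without further checks.
SAFE_TO_REPEAT = {"none", "read"}


def is_retry_safe(tool_def: Dict[str, Any], args: Dict[str, Any]) -> bool:
    """Decide whether re-sending this call is safe under the assumed rule above."""
    if tool_def.get("side_effects") in SAFE_TO_REPEAT:
        return True
    key_param = tool_def.get("idempotency_key_param")
    # A write is only retryable if the caller provided the idempotency key,
    # on the assumption that the tool host deduplicates on it.
    return bool(key_param) and args.get(key_param) not in (None, "")


snapshot_tool = {
    "name": "rds.snapshot",
    "side_effects": "irreversible_create",
    "idempotency_key_param": "snapshot_name",
}

with_key = is_retry_safe(snapshot_tool, {"cluster_id": "cluster-a", "snapshot_name": "pre-migration"})
without_key = is_retry_safe(snapshot_tool, {"cluster_id": "cluster-a"})
```

Treating a missing or unrecognised side_effects value as unsafe keeps the failure mode conservative: the executor declines to retry rather than risk repeating a write.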
When to Adopt — and When to Wait
MCP 2.0 is genuinely useful, but the ecosystem is still thin. Here's how to think about timing:
Adopt MCP 2.0 now if you:
- Run multi-agent workflows that already cross runtime boundaries (e.g. Claude orchestrator + Bedrock executors, or LangGraph + a bespoke Go agent).
- Have irreversible production actions (deploys, financial moves, data deletion) where the consensus signing scheme would replace home-grown approval flows.
- Are building a new agent platform — you'd be writing the equivalent of AST anyway.
Stay on MCP 1.x for now if you:
- Are running a single-agent product with one model and one tool host (the AST overhead buys you nothing).
- Depend on community MCP servers that haven't migrated — most still target 1.x as of April 2026.
- Have hard FIPS / HSM requirements that haven't been validated against the EdDSA signing path yet.
Migration Path from MCP 1.x
The migration story is deliberately gentle. MCP 2.0 servers can advertise both 1.x and 2.0 capabilities in their initial handshake, and 2.0 clients can connect to 1.x servers in degraded mode (no AST, no UTD enforcement, but tool calls still work). Most teams will follow this sequence:
1. Upgrade the SDK first. @anthropic-ai/mcp-sdk@2.x and mcp-python@2.x both speak 1.x and 2.0; the negotiation is automatic.
2. Migrate tool definitions to UTD. A codemod ships with the SDK (npx mcp-migrate utd ./tools/); manual review for side_effects classification is required.
3. Add JWKS endpoint and signing keys. Required only if you'll issue ASTs. Pure executors don't need to mint tokens.
4. Wire the AST verifier into your handoff endpoints. This is where most of the implementation effort lives: replay cache, audience checking, error handling.
5. Opt into consensus signing for irreversible calls. Greenfield only: retrofitting consensus into existing irreversible paths is risky without a feature flag.
Anthropic's reference implementation, @anthropic-ai/mcp-toolkit, handles steps 3 and 4 for the common cases. If you're using LangGraph or the OpenAI Agents SDK, both have committed to MCP 2.0 ingestion adapters by Q3 2026.
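The capability handshake described at the start of this section boils down to picking the highest protocol version both sides support and switching features off when that turns out to be 1.x. A minimal Python sketch, assuming versions are advertised as dotted strings such as "1.0" and "2.0" (the article does not show the real handshake payload, and negotiate is a hypothetical function):

```python
from typing import Dict, Iterable, Tuple


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Turn '2.0' into (2, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def negotiate(client_versions: Iterable[str], server_versions: Iterable[str]) -> Dict[str, object]:
    """Pick the highest shared version and derive which 2.0 features are available."""
    common = set(client_versions) & set(server_versions)
    if not common:
        raise ValueError("no protocol version in common")
    chosen = max(common, key=_as_tuple)
    is_v2 = _as_tuple(chosen) >= (2, 0)
    return {
        "version": chosen,
        "ast": is_v2,           # no AST exchange in degraded mode
        "utd_enforced": is_v2,  # no UTD enforcement in degraded mode
        "tool_calls": True,     # plain tool calls work either way
    }


degraded = negotiate(["1.0", "2.0"], ["1.0"])
full = negotiate(["1.0", "2.0"], ["2.0", "1.0"])
```

Here degraded describes a 2.0 client talking to a 1.x-only server, and full describes two 2.0-capable peers.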
Open Questions and What's Next
MCP 2.0 ships strong but leaves a few dimensions unresolved:
- Reasoning trace portability across model families. The mcp-trace-v1 format is a normalised JSONL of decision points, but cross-model fidelity remains an empirical question. A trace produced by Claude Opus 4.7 may not perfectly replay through GPT-5 or Gemini 2.5: the reasoning vocabulary is shared, the internal embeddings are not.
- Cost accounting for relayed tokens. When a planner agent's AST triggers tool execution on a different vendor, who pays? The spec is silent; expect a follow-up on usage attribution metadata in MCP 2.1.
- PII redaction policy. The redacted_pii flag is advisory. There's no enforcement: a non-compliant agent could leak through reasoning traces, and the only recourse is auditing.
- Identity federation. Today, agent identities live per-host. There's no federated identity layer, so cross-organisation handoffs depend on bilateral JWKS allow-listing. SPIFFE/SPIRE integration has been hinted at for MCP 2.1.
None of these block production adoption — they're the rough edges of any v2 specification. What matters is that, with MCP 2.0, agent state finally has a wire format. The era of pickled scratchpads and prompt-stuffed handoffs is closing. If you're building anything multi-agent in 2026, the question is no longer whether to standardise on MCP 2.0 — it's how fast you can.
Frequently Asked Questions
Is MCP 2.0 backward compatible with MCP 1.x servers?
What signs an Agentic Session Token, and how is it verified?
Can I hand off state between Claude and a non-Anthropic model?
How big does a typical AST get, and where do I store it?
Does MCP 2.0 replace OpenAI's Agents SDK or LangGraph?