Security Deep-Dive

Prompt Leakage [Deep Dive]: EchoLeak Lessons [2026]

Dillip Chowdary
Tech Entrepreneur & Innovator · April 09, 2026 · 11 min read

On June 11, 2025, Microsoft published CVE-2025-32711, an information disclosure issue in Microsoft 365 Copilot. The bug became widely known through EchoLeak, research disclosed by Aim Labs. The finding mattered because it pushed AI security out of the lab and into mainstream enterprise threat modeling: a production copilot could be manipulated into exposing data from its own context window, with no traditional malware, no memory corruption, and in the reported chain, effectively no user security decision in the loop.

Strictly speaking, EchoLeak was broader than the classic “print your hidden instructions” trick. But it belongs in any serious discussion of prompt leakage because it exposed the same broken assumption many teams still ship: if the prompt is hidden, the system is safe. OWASP’s LLM07:2025 System Prompt Leakage says the quiet part out loud: system prompts are not secrets, and they are not security controls. EchoLeak showed why. Once an application lets untrusted content influence what the model sees, how it reasons, and what it is allowed to retrieve, “hidden” instructions become just another soft boundary.

Takeaway

Prompt leakage is rarely the final incident. It is usually the reconnaissance and routing layer for a bigger failure: data exfiltration, guardrail bypass, tool misuse, or cross-tenant disclosure. Treat leaked prompts as evidence that your architecture trusted the model too much.

CVE Summary Card

  • CVE: CVE-2025-32711
  • Product: Microsoft 365 Copilot
  • Published: June 11, 2025
  • Type: AI command injection leading to information disclosure
  • What researchers called it: EchoLeak
  • Core issue: Untrusted prompt content could influence Copilot’s reasoning and cause disclosure of sensitive contextual data over the network
  • Why it matters for prompt leakage: The exploit chain turned hidden model context, retrieved enterprise content, and instruction hierarchy into an exfiltration surface
  • Official references: Microsoft MSRC, NVD, Aim Labs research

The most important lesson is not the vendor name. It is the pattern. A modern AI app combines user prompts, system prompts, retrieved documents, tool outputs, and orchestration metadata into one token stream. Humans think those are separate layers. The model does not. If your production design depends on the model respecting those boundaries perfectly, then prompt leakage is not an edge case. It is your steady-state risk.

Vulnerable Code Anatomy

Most production leaks do not come from a single “dump system prompt” bug. They come from composition mistakes. The typical anti-pattern looks like this:

system_rules = load_hidden_prompt()                              # internal instructions
retrieved_docs = search_enterprise_content(user_id, user_query)  # sensitive enterprise data
incoming_content = load_email_thread_or_web_content()            # attacker-reachable input

full_context = [
    system_rules,       # trusted
    retrieved_docs,     # sensitive
    incoming_content,   # untrusted, yet adjacent to everything above
    user_query,
]

answer = llm.generate(full_context)  # no policy gate on what comes back
return answer

This design fails for three reasons.

  1. It collapses trust zones. Internal instructions, enterprise documents, and attacker-controlled content all become adjacent tokens.
  2. It delegates policy enforcement to the model. The application expects the model to decide when not to reveal hidden rules or sensitive context.
  3. It treats output as safe because it is natural language. In practice, natural language is also a control channel.
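One way to stop collapsing trust zones is to tag every context element with an explicit trust label before assembly, so deterministic code outside the model can filter, gate, and audit by zone. A minimal sketch, where `ContextBlock`, the label names, and `assemble_context` are illustrative and not any framework's API; note the labels themselves enforce nothing — they exist so policy code can act on them:

```python
from dataclasses import dataclass

# Illustrative trust labels; real systems may use finer-grained levels.
SYSTEM, INTERNAL, UNTRUSTED = "system", "internal", "untrusted"

@dataclass
class ContextBlock:
    trust: str
    text: str

def assemble_context(blocks: list[ContextBlock]) -> str:
    """Wrap each block in delimiters recording its trust zone, so
    downstream policy code can filter or audit by label before the
    combined string ever reaches the model."""
    return "\n".join(f"<{b.trust}>\n{b.text}\n</{b.trust}>" for b in blocks)

context = assemble_context([
    ContextBlock(SYSTEM, "You are a helpful assistant."),
    ContextBlock(INTERNAL, "Quarterly report excerpt..."),
    ContextBlock(UNTRUSTED, "Email body from an external sender."),
])
```

The point of the tagging is not that the model respects the delimiters; it is that deterministic code can now refuse to assemble untrusted content next to secrets, or log exactly which zone contributed which tokens.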

The more dangerous variant is the agentic one:

plan = llm.generate(system_prompt + user_input + retrieved_content)
if plan.requests_tool:
    # Tool name and arguments come straight from model output;
    # nothing deterministic authorizes the call in between.
    tool_result = run_tool(plan.tool_name, plan.tool_args)
    final = llm.generate(system_prompt + plan + tool_result)
return final

Now prompt leakage becomes an attack accelerator. A leaked system message may expose tool names, hidden workflow steps, approval logic, classifier thresholds, or red-team phrases the system treats specially. That is enough for a patient attacker to move from extraction to steering.

OWASP is clear here: keep secrets, credentials, permission rules, and hard security decisions out of prompts. If you must inject policy text, assume it will be discovered eventually. For logs, eval datasets, and support transcripts, sanitize aggressively before storage; TechBytes’ Data Masking Tool is the kind of utility worth inserting into that pipeline so prompt histories and model traces are redacted before they become tomorrow’s leak source.
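A sanitization step of the kind described above can be sketched as a regex-based redactor run before anything is persisted. The patterns below are illustrative and far from exhaustive, and this is not the Data Masking Tool's actual API — just the shape of the hook you would insert into the pipeline:

```python
import re

# Illustrative patterns; a production masking pipeline covers far more
# (names, IDs, internal URLs, connector paths, credential formats).
PATTERNS = [
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL]"),
    (re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{8,}\b"), "[SECRET]"),
]

def redact(text: str) -> str:
    """Mask sensitive substrings before a prompt or trace is stored."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("Contact alice@example.com, token sk-abc12345678"))
# -> Contact [EMAIL], token [SECRET]
```

Running this at the persistence boundary, rather than at incident-review time, is what keeps prompt histories from becoming tomorrow's leak source.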

Attack Timeline

January 2025: according to Aim Labs, researchers identified the EchoLeak issue in Microsoft 365 Copilot and reported it to Microsoft through coordinated disclosure.

May 2025: Microsoft addressed the issue server-side before public disclosure, a notable detail because many AI application fixes are not traditional client patches. The vulnerable behavior lived in orchestration and service-side controls.

May 31, 2025: Aim Labs published an initial public write-up describing EchoLeak as the first zero-click AI vulnerability enabling data exfiltration from Microsoft 365 Copilot.

June 11, 2025: Microsoft published CVE-2025-32711 via MSRC. NVD later recorded the issue as an AI command injection vulnerability in M365 Copilot that could allow unauthorized information disclosure over a network.

February 20, 2026: NVD reflected later metadata changes, including updated weakness mapping. By then, EchoLeak had already become one of the canonical examples in AI threat modeling because it demonstrated that production copilots can fail in ways that look nothing like legacy web exploits while producing equally serious confidentiality impact.

Microsoft said it found no evidence of customer impact. That matters, but it should not be misread as low severity. In this case, coordinated disclosure masked a deeper reality: defenders got lucky that researchers found the chain first.

Exploitation Walkthrough

This section is conceptual only. No working proof of concept is provided.

A simplified exploitation path for prompt leakage in production AI apps looks like this:

  1. Seed attacker-controlled content. The attacker places instructions in a source the model is likely to ingest indirectly: an email, a shared document, a knowledge-base page, a ticket comment, or a web page referenced by retrieval.
  2. Make the content retrieval-friendly. The malicious text is phrased so retrieval, summarization, or classification layers are more likely to include it in the model’s active context.
  3. Target the hidden layer. The embedded instructions do not need to say “reveal your prompt” verbatim. They can ask for policy summaries, reasoning scaffolds, tool descriptions, formatting rules, or previous hidden instructions. Leakage is often partial and incremental.
  4. Turn leakage into routing. Once the model reveals even fragments of its hidden logic, the attacker learns what is likely to bypass refusal patterns, what tools exist, what content filters are in play, and which phrasing gets promoted by the planner.
  5. Escalate to disclosure. The final objective is usually not the prompt itself. It is adjacent data: retrieved files, chat history, secrets copied into prompts, internal URLs, or sensitive enterprise content summarized back to an unauthorized channel.

That sequence is why “prompt leakage” is a misleadingly small term. The leaked prompt is often only the breadcrumb trail. The real exploit is a trust-boundary violation between instructions, data, and actions.

In EchoLeak, researchers described an LLM Scope Violation: the model’s accessible context could be coerced into leaking outward despite interface-level assumptions about isolation. That is the production-grade version of prompt leakage. The attacker is no longer trying to impress social media by printing a system message. They are using prompt behavior to move data across boundaries your architecture claimed were closed.

Hardening Guide

1. Stop treating prompts as secrets

If a system prompt contains credentials, API keys, role mappings, sensitive URLs, suppression phrases, or escalation hints, the architecture is already wrong. Prompts can encode behavior, but they cannot be your vault.

2. Separate policy enforcement from generation

Authorization, tenancy checks, DLP policy, workflow approvals, and data classification must execute in deterministic code before the model sees content and after the model proposes output. The model can assist; it cannot be the final authority.
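The pre- and post-model checks might look like the sketch below, where authorization and DLP scanning are plain functions running in code and the model only proposes text. All names here are hypothetical:

```python
# Sketch: deterministic checks before and after generation.

def can_read(user: str, doc: dict) -> bool:
    """ACL check runs in code, not in the prompt."""
    return user in doc["allowed_readers"]

def dlp_violations(text: str) -> list[str]:
    """Post-generation scan for markers that must never leave the service."""
    return [m for m in ("CONFIDENTIAL", "sk-") if m in text]

def answer(user: str, query: str, docs: list[dict], generate) -> str:
    visible = [d for d in docs if can_read(user, d)]  # pre-model authorization
    draft = generate(query, visible)                  # model proposes output
    if dlp_violations(draft):                         # deterministic output gate
        return "[withheld: policy violation]"
    return draft

docs = [{"text": "Q3 numbers", "allowed_readers": ["alice"]},
        {"text": "CONFIDENTIAL plan", "allowed_readers": ["bob"]}]
fake_llm = lambda q, ds: " ".join(d["text"] for d in ds)  # stand-in for a real model
print(answer("alice", "summarize", docs, fake_llm))  # -> Q3 numbers
```

The design choice that matters: `can_read` and `dlp_violations` execute regardless of what the model says, so a manipulated generation cannot talk its way past either gate.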

3. Minimize retrievable context

Most damaging leaks are not verbatim system-prompt dumps. They are retrieved documents the model had no business seeing in the first place. Apply least privilege to retrieval indexes, connector scopes, and session memory. Reduce the blast radius first.
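Least privilege on retrieval can be enforced in code by filtering the index to the session's explicitly granted scopes before any ranking happens. A toy sketch — the scope names and keyword-overlap ranking are illustrative stand-ins for real connector scopes and embedding search:

```python
# Sketch: retrieval restricted to the caller's granted scopes.

def retrieve(index: list[dict], query: str, granted_scopes: set[str], limit: int = 3):
    """Search only documents inside scopes the session was explicitly granted."""
    candidates = [d for d in index if d["scope"] in granted_scopes]
    # Toy ranking by keyword overlap; a real system would use embeddings.
    scored = sorted(
        candidates,
        key=lambda d: -sum(w in d["text"].lower() for w in query.lower().split()),
    )
    return scored[:limit]

index = [
    {"scope": "hr-payroll", "text": "salary bands by level"},
    {"scope": "public-wiki", "text": "vacation policy overview"},
]
hits = retrieve(index, "vacation policy", granted_scopes={"public-wiki"})
print([d["scope"] for d in hits])  # -> ['public-wiki']
```

Because the scope filter runs before ranking, a poisoned document in an out-of-scope index can never reach the model's context, no matter how retrieval-friendly its phrasing is.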

4. Add output inspection for hidden-state indicators

Look for responses that contain phrases such as internal policy summaries, system-role markers, tool manifests, chain-of-thought-like scaffolds, or unexplained formatting templates. These are strong indicators that hidden instructions are bleeding into user-visible output.
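A first-pass output inspector can be a small set of indicator patterns scored against each response before it reaches the user. The patterns below are examples only and should be tuned to your own prompt templates:

```python
import re

# Illustrative hidden-state indicators; tune to your own templates.
INDICATORS = [
    re.compile(r"(?i)\bsystem prompt\b"),
    re.compile(r"(?i)\byou are an? (assistant|ai)\b"),
    re.compile(r"</?(system|tool_manifest)>"),
]

def leakage_score(response: str) -> int:
    """Count hidden-state indicators present in a model response."""
    return sum(1 for p in INDICATORS if p.search(response))

print(leakage_score("Per my system prompt, I must refuse."))  # -> 1
```

A nonzero score does not prove a leak, but it is cheap to compute on every response and makes a good trigger for quarantining the output and alerting, rather than silently delivering it.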

5. Instrument the full prompt assembly pipeline

Logging only the final user prompt is not enough. Capture structured telemetry for retrieval sources, tool calls, policy decisions, classifier verdicts, and prompt-template versions. Without that, prompt leakage detection becomes guesswork.
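Structured assembly telemetry can be as simple as one JSON record per generation capturing the fields above. A sketch with illustrative field names:

```python
import json
import time

def log_assembly(event: dict) -> str:
    """Emit one structured record per prompt assembly, capturing far
    more than the final user prompt."""
    record = {
        "ts": time.time(),
        "template_version": event.get("template_version"),
        "retrieval_sources": event.get("retrieval_sources", []),
        "tool_calls": event.get("tool_calls", []),
        "policy_decisions": event.get("policy_decisions", []),
    }
    # In production this line goes to your telemetry pipeline, not stdout.
    return json.dumps(record, sort_keys=True)

print(log_assembly({"template_version": "v12",
                    "retrieval_sources": ["sharepoint:/finance"],
                    "policy_decisions": ["dlp:pass"]}))
```

With records like these, "which retrieval source injected the instructions?" becomes a query instead of a forensic reconstruction.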

6. Create leakage canaries

Insert benign, unique markers into non-production hidden prompts and monitor whether they ever appear in model output, traces, analytics, or support tickets. Canary exposure is one of the fastest ways to prove a prompt isolation failure.
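A canary scheme needs only two pieces: a unique benign marker embedded in the hidden prompt, and a check for that marker across outputs, traces, and tickets. A minimal sketch:

```python
import secrets

def make_canary() -> str:
    """Unique benign marker to embed in non-production hidden prompts."""
    return f"CANARY-{secrets.token_hex(8)}"

def canary_exposed(canary: str, outputs: list[str]) -> bool:
    """True if the marker ever surfaces in user-visible text or traces."""
    return any(canary in text for text in outputs)

canary = make_canary()
hidden_prompt = f"Internal rules follow. Marker: {canary}"
assert not canary_exposed(canary, ["Normal answer about invoices."])
```

In practice the `canary_exposed` check would run as a scheduled scan over logs, analytics exports, and support-ticket text; a single hit is near-conclusive evidence of a prompt isolation failure.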

7. Red-team indirect injection, not just direct jailbreaks

Most teams test with adversarial chat inputs. Fewer test poisoned documents, malicious emails, hostile Markdown, injected OCR text, or retrieval-ranked snippets. In production, those indirect paths are often the real entry point.

8. Build kill switches for retrieval and tools

When suspicious leakage patterns spike, you should be able to disable high-risk connectors, session memory recall, or outbound action tools without taking the entire assistant offline.
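A kill switch can be a per-capability feature flag consulted before every connector, memory, or tool call. A sketch with hypothetical flag names:

```python
# Sketch: per-capability kill switches (flag names are illustrative).

FLAGS = {
    "connector:sharepoint": True,
    "memory:recall": True,
    "tool:outbound_http": True,
}

def enabled(capability: str) -> bool:
    """Checked in code before each high-risk call; unknown flags stay off."""
    return FLAGS.get(capability, False)

def disable(capability: str) -> None:
    """Flip one high-risk capability off without taking the assistant down."""
    FLAGS[capability] = False

# Incident response: suspected exfiltration via outbound tools.
disable("tool:outbound_http")
print(enabled("tool:outbound_http"), enabled("memory:recall"))  # -> False True
```

The defaults matter: unknown capabilities resolve to disabled, so a newly added tool is dark until someone deliberately turns it on.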

9. Sanitize stored prompts and traces

Support platforms, observability tools, and fine-tuning datasets routinely become secondary leak surfaces. Redact sensitive payloads before persistence, not after an incident review.

Architectural Lessons

The largest lesson from EchoLeak is that AI security failures are usually composition failures. The base model may behave exactly as designed: it predicts the next useful token from the context it receives. The application creates the breach when it assembles that context carelessly and lets model output drive sensitive actions or disclosures.

Three architectural rules follow.

  • Keep trust boundaries explicit. User input, retrieved content, hidden instructions, memory, and tool output should be tagged, isolated, and policy-checked as separate classes of data.
  • Assume partial leakage. Even if your exact system prompt never appears verbatim, attackers can infer policy shape from behavior. Design as if hidden instructions are observable.
  • Optimize for blast-radius reduction, not perfect refusal. A model that occasionally leaks a prompt but has no access to sensitive documents is far safer than a perfectly aligned model sitting on an overbroad enterprise index.

That is why prompt leakage prevention is not mainly a prompting problem. It is an application security problem with LLM-specific failure modes. The right control stack looks familiar to experienced security engineers: least privilege, isolation, deterministic authorization, output validation, telemetry, and incident response. The novelty is where those controls need to be inserted: around the model, not inside it.

For teams building copilots, agents, or RAG products in 2026, the practical standard is simple. If leaking your system prompt would materially weaken your app’s security, the app is under-architected. If leaking retrieved context would expose regulated or business-critical data, the app is over-permissioned. EchoLeak did not invent those truths. It forced the industry to stop pretending hidden prompts were a defense layer.

Sources: Microsoft MSRC, NVD, Aim Labs, OWASP LLM07:2025, OWASP LLM01:2025.
