Prompt Injection at Scale: EchoLeak Deep Dive [2026]
Prompt injection stopped being a lab curiosity the moment LLMs gained real connectors, tenant-wide search, and permission to act across enterprise systems. The most useful case study so far is EchoLeak, the Microsoft 365 Copilot issue tracked as CVE-2025-32711. Publicly disclosed on June 11, 2025 after a server-side fix in May 2025, EchoLeak showed that an attacker could weaponize normal-looking content so the AI assistant would later reinterpret it as instructions and leak sensitive data without a victim click.
That is the production lesson: prompt injection is not only a model-alignment problem. It is a distributed systems problem spanning retrieval, trust labeling, prompt construction, output rendering, and network egress. If your API wraps an LLM with documents, mail, tickets, chat logs, or tools, you already have the ingredients.
Takeaway
EchoLeak mattered because it turned passive data into active instructions. In production, the winning defense is not a better refusal prompt alone. It is a pipeline that preserves trust boundaries from ingestion to retrieval to model output to side-effect execution.
CVE Summary Card
- CVE: CVE-2025-32711
- Product: Microsoft 365 Copilot
- Disclosure: Aim Labs published details on June 11, 2025; Microsoft assigned the CVE and reported a server-side mitigation in May 2025.
- Attack type: Indirect prompt injection leading to data exfiltration
- Interaction: Reported as a zero-click chain, meaning the victim did not need to click a link or run code manually
- What made it novel: The exploit abused retrieval and orchestration behavior rather than memory corruption or classic command execution
- Business impact: Potential exposure of tenant data surfaced to the assistant through Outlook, documents, chats, and other connected sources
- Status: Microsoft said there was no evidence of real-world customer compromise before mitigation
Two references shaped the public understanding of the incident: Aim Labs' technical write-up and Microsoft's own guidance on defending against indirect prompt injection. Together they made one point clear. Once an assistant can read untrusted content and also access sensitive context, the line between “data” and “instruction” becomes an attack surface.
For engineering teams building internal copilots or customer-facing LLM APIs, EchoLeak is useful precisely because it was not about one model family. The pattern applies whether your stack uses a hosted GPT-class model, an on-prem open-weight model, or a hybrid routing layer. The exploit lives in the surrounding architecture.
Vulnerable Code Anatomy
The typical vulnerable flow looks harmless in code review because each step is individually reasonable. A retriever fetches relevant context. A prompt builder concatenates system guidance and user input. The model produces a helpful answer. A renderer turns the answer into rich text, links, or actions. The weakness appears in composition.
const retrieved = await retrieveContext(userQuery, tenantSources);
const prompt = buildPrompt({
systemRules,
userQuery,
context: retrieved
});
const answer = await callModel(prompt);
const rendered = renderRichOutput(answer);
await maybeExecuteSideEffects(rendered);What is missing here is a durable notion of trust. retrieveContext brings in email bodies, documents, tickets, or web content. Some of that material is attacker-controlled. But buildPrompt often flattens everything into one token stream, so the model sees untrusted text right next to privileged instructions. Then renderRichOutput and maybeExecuteSideEffects may treat the model's output as safe because it came from the “assistant,” even though the assistant was influenced by hostile context.
This is the core anatomy of indirect prompt injection:
- Untrusted content enters a trusted context window.
- The orchestrator fails to preserve the source's trust label.
- The model is asked to follow instructions and summarize data in the same pass.
- Downstream systems execute, fetch, render, or send model output with insufficient policy checks.
In conventional application security terms, the pipeline confuses content with control flow. In LLM systems, that confusion is especially dangerous because natural language is both data format and instruction language.
Why basic guardrails fail
Many teams start with stronger system prompts, a denylist of phrases, or a classifier that tries to detect hostile wording. Those are useful, but EchoLeak shows why they are not enough. Attackers do not need the model to say “ignore previous instructions” in an obvious way. They only need the broader system to let untrusted context influence a high-privilege reasoning step. Once the assistant can access sensitive tenant data and produce externally consumable output, small prompt-control failures become architectural failures.
Attack Timeline
- January 2025: Aim Labs says it discovered the attack chain affecting Microsoft 365 Copilot.
- Early 2025: Findings were reported to Microsoft through coordinated disclosure.
- May 2025: Microsoft applied a server-side mitigation, according to public reporting and vendor statements. No customer action was required.
- June 11, 2025: Aim Labs publicly disclosed EchoLeak and Microsoft assigned CVE-2025-32711.
- July 29, 2025: Microsoft published broader guidance on defending against indirect prompt injection attacks.
- By April 6, 2026: EchoLeak remains the canonical production example used in AI security discussions because it demonstrated zero-click prompt injection against an enterprise copilot at realistic scale.
That sequence matters because it maps cleanly to how AI incidents will mature. First comes research showing a chain across components. Then the vendor applies a quiet service-side fix. Only later does the industry absorb the design lesson and turn it into defensive architecture. Teams that wait for CVE headlines before hardening their own LLM APIs are already late.
Exploitation Walkthrough
This walkthrough stays conceptual on purpose. The goal is to understand the failure mode, not reproduce it.
- An attacker places malicious instructions inside content likely to be indexed or retrieved later. In an enterprise copilot that could be an email, shared document, ticket, or another data source the assistant can read.
- The content is designed to look benign to humans while still influencing the model when retrieved. The important point is not the exact wording. The point is that the text is attacker-controlled but reaches the model with insufficient separation.
- Later, the victim asks a normal question such as a project summary, risk update, or inbox digest. Nothing in the prompt looks dangerous from the user's perspective.
- The retriever ranks the attacker content as relevant and includes it alongside legitimate internal context.
- The orchestrator merges user intent, system policy, and retrieved content into one prompt. This is where the trust boundary collapses.
- The model follows the malicious embedded instructions as part of its reasoning process, which may include gathering sensitive nearby context that was never meant to leave the tenant boundary.
- A downstream component turns the response into an externalized artifact such as a remote fetch, rich link, image load, or another output channel capable of carrying data out.
Notice what did not happen here: there was no macro, no shell, no browser exploit, and potentially no user click. The system harmed itself because the assistant was allowed to treat untrusted retrieved text as part of its control plane.
This is why some researchers describe the class as a scope-violation problem. The model should have used attacker content only as quoted evidence to reason about. Instead, the system let that content expand its authority and steer privileged behavior.
Hardening Guide
Defending LLM-powered APIs in production requires layered controls. The right model helps, but your strongest defenses live in the orchestrator, policy engine, and connector layer.
1. Separate data from instructions
Never concatenate untrusted retrieval results into the same undifferentiated block as system policy. Treat retrieved content as data, wrap it with explicit provenance metadata, and tell the model it is not instruction-bearing. More importantly, make downstream logic enforce that distinction instead of trusting the prompt alone.
const docs = await retrieveContext(query, sources);
const labeledDocs = docs.map(classifyUntrustedInput);
const response = await callModel({
system: systemRules,
user: query,
evidence: labeledDocs,
tools: allowedToolsForRequest
});
const safe = await enforceEgressPolicy(response, user, labeledDocs);
return renderSafeOutput(safe);classifyUntrustedInput should assign provenance, sensitivity, source type, and trust level. enforceEgressPolicy should block or redact outputs that try to leave approved channels, especially if they contain secrets, tenant identifiers, or high-sensitivity text.
2. Constrain retrieval and connector scope
The fastest way to reduce blast radius is to reduce what the model can see. Use least-privilege connectors, query-time authorization checks, source allowlists, and sensitivity-aware retrieval. If a user asks for a meeting summary, the assistant probably does not need raw inbox access, historical chats, and broad SharePoint search in the same turn.
Production teams often over-focus on the model and under-focus on retrieval breadth. EchoLeak is the opposite lesson. RAG is where your exposure compounds.
3. Put side effects behind a policy gate
Any operation that fetches a URL, sends email, posts to chat, opens a ticket, or writes to a document must require explicit policy approval outside the model. Do not let the model mint a side effect simply because the answer format happens to contain a link or a tool call. Gate all such actions through deterministic checks on domain allowlists, data classification, and user intent confirmation.
4. Treat model output as untrusted until verified
LLM output is not trusted just because it came from your assistant. Parse it, validate it, and apply content-security rules before rendering. Strip remote references when possible. Disable automatic expansion of unapproved external resources. If your client renders markdown, HTML, images, or embedded previews, those renderers are part of the attack surface.
5. Build AI-specific detection and evals
Traditional security telemetry will miss many prompt-injection paths because the payload is ordinary language. Add tests for retrieval poisoning, instruction smuggling, data exfil intent, and tool misuse. Maintain adversarial eval corpora and rerun them on every prompt, model, or orchestration change.
When those corpora include production-like samples, scrub them first. Our Data Masking Tool is useful here because AI security testing quickly drifts into realistic documents, tickets, and support transcripts that may contain sensitive fields.
6. Log provenance, not just prompts
For incident response, raw prompt logs are not enough. You need to know which sources were retrieved, how they were ranked, what trust labels they carried, which policies fired, and whether any egress attempt was blocked. Without provenance telemetry, post-incident analysis becomes guesswork.
7. Design a human confirmation model that matches risk
User confirmation still matters, but only for meaningful actions. Asking users to click “Allow” on every tool call trains them to approve blindly. Reserve confirmation for operations that cross trust zones: external sends, broad data exports, connector expansion, or privilege escalation. Zero-click incidents happen when teams confuse visibility with control.
Architectural Lessons
EchoLeak delivered several durable lessons for anyone shipping LLM-powered APIs.
- The model is not the security boundary. Your actual boundary is the orchestration layer that decides what enters context and what may happen afterward.
- Trust labels must survive every hop. If provenance disappears when you build the prompt, you have already lost.
- Retrieval is a privileged operation. Query rewriting, ranking, chunking, and source selection are all security-relevant code paths.
- Output rendering is part of the exploit chain. Safe text generation can become unsafe once a client auto-fetches, expands, or sends it.
- Least privilege beats clever prompting. A small, tightly scoped assistant is materially safer than a general assistant with broad tenant access.
- Server-side controls matter most. Microsoft's May 2025 mitigation mattered because centralized changes can neutralize entire exploit classes faster than client updates or user guidance.
The strategic takeaway for 2026 is straightforward. As soon as an LLM can read from one system and write, fetch, or trigger behavior in another, prompt injection becomes an inter-service security problem. Engineering teams should evaluate it with the same seriousness they apply to SSRF, XSS, and injection flaws in classic web stacks.
EchoLeak did not just reveal a bug in one product. It exposed a category error the industry made early in the copilot wave: assuming that natural-language reasoning could safely absorb untrusted content without changing the system's authority. Production systems cannot rely on that assumption. They need explicit trust boundaries, deterministic gates, and a design where the model can assist with decisions but never silently redefine the rules of execution.
Further reading: Aim Labs' June 11, 2025 EchoLeak disclosure at aim.security and Microsoft's July 29, 2025 guidance on indirect prompt injection at msrc.microsoft.com.
Get Engineering Deep-Dives in Your Inbox
Weekly breakdowns of architecture, security, and developer tooling — no fluff.