Microsoft MDASH Moves Vulnerability Discovery Into Agent Pipelines
Published June 03, 2026 by Dillip Chowdary
Microsoft MDASH is the clearest Build 2026 signal that vulnerability scanners are becoming coordinated agent systems. Microsoft describes the preview as a multi-model scanning harness that uses more than 100 specialized AI agents to discover vulnerabilities in codebases, validate them, and prove that they are exploitable.
The system sits inside a broader security stack. Microsoft says MDASH uses more than 100 trillion signals per day, integrates with Microsoft Defender, and connects into GitHub Code Security so findings can be prioritized with production exposure and remediated with Copilot-assisted workflows.
Why This Is Different From SAST
Classic static analysis produces candidate findings. MDASH-style systems try to reason across code, runtime context, and exploitability evidence before presenting a result. The value is not just recall; it is filtering out findings that are only theoretically exploitable, so developers spend time on issues that matter.
Microsoft says the harness recently reached a 96.55% CyberGym benchmark score after a roughly 10% improvement in less than three weeks. The number is useful, but the enterprise question is operational: can the system preserve evidence, respect role-based access, and fit disclosure processes when it finds sensitive flaws?
What AppSec Teams Should Do
Teams evaluating MDASH or similar systems should define reviewer ownership before enabling broad scans. Agent-discovered vulnerabilities need access controls, finding provenance, model and prompt versioning, exploit proof containment, and a clear policy for external disclosure.
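One way to make finding provenance and model and prompt versioning concrete is to attach a small record to every finding. The sketch below is illustrative only: the `FindingProvenance` class and all of its field names are hypothetical and are not part of any MDASH, Defender, or GitHub schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)
class FindingProvenance:
    """Hypothetical provenance record for one agent-discovered finding."""
    finding_id: str
    repository: str
    commit_sha: str
    model_version: str    # which model produced the finding
    prompt_version: str   # which prompt or template revision was used
    reviewer_owner: str   # team accountable for triage
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        """Stable hash over the fields that describe how the finding was produced."""
        payload = asdict(self)
        payload.pop("created_at")  # the timestamp must not change the fingerprint
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()
```

Two records that share a fingerprint were produced by the same model and prompt revision against the same commit, which gives reviewers a simple way to tell a re-run from a behavior change.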
The practical path is staged adoption: begin with non-production repositories, compare findings against existing SAST and DAST queues, measure confirmed exploitability, and connect production runtime signals only after the evidence workflow is trustworthy.
How the Pipeline Should Be Measured
The right benchmark is not how many findings an agent can produce in a demo. AppSec teams need to measure confirmed exploitable issues per reviewer hour, time from report to validated patch, duplicate rate, and how often the agent identifies the correct owning service. Those metrics decide whether the system expands security capacity or simply moves alert fatigue into a new interface.
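The four metrics above can be computed from a plain triage export. This is a minimal sketch under stated assumptions: the `TriagedFinding` fields and the `pipeline_metrics` function are hypothetical names, and the sample data is invented purely to show the arithmetic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class TriagedFinding:
    """Hypothetical triage record; field names are illustrative."""
    finding_id: str
    confirmed_exploitable: bool
    is_duplicate: bool
    owner_correct: bool
    reported_at: datetime
    patch_validated_at: Optional[datetime] = None


def pipeline_metrics(findings: List[TriagedFinding], reviewer_hours: float) -> dict:
    """Summarize reviewer yield, duplicates, ownership accuracy, and patch latency."""
    if not findings or reviewer_hours <= 0:
        raise ValueError("need at least one finding and positive reviewer hours")
    total = len(findings)
    confirmed = sum(f.confirmed_exploitable for f in findings)
    # Only findings with a validated patch contribute to the latency figure.
    patch_hours = [
        (f.patch_validated_at - f.reported_at).total_seconds() / 3600
        for f in findings
        if f.patch_validated_at is not None
    ]
    return {
        "confirmed_per_reviewer_hour": confirmed / reviewer_hours,
        "duplicate_rate": sum(f.is_duplicate for f in findings) / total,
        "owner_accuracy": sum(f.owner_correct for f in findings) / total,
        "median_hours_to_validated_patch": median(patch_hours) if patch_hours else None,
    }


# Invented sample data, for illustration only.
t0 = datetime(2026, 6, 1, 9, 0)
sample = [
    TriagedFinding("F-1", True, False, True, t0, t0 + timedelta(hours=24)),
    TriagedFinding("F-2", True, False, True, t0, t0 + timedelta(hours=48)),
    TriagedFinding("F-3", False, True, True, t0),
    TriagedFinding("F-4", False, False, False, t0),
]
metrics = pipeline_metrics(sample, reviewer_hours=8.0)
```

Tracking these numbers per pilot stage makes it easier to see whether reviewer yield improves or whether the duplicate rate is absorbing the gains.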
The Defender and GitHub Code Security integration is important because it can enrich a code finding with runtime exposure and data sensitivity. A vulnerable function in an internal test utility does not carry the same priority as the same pattern on an internet-facing service that touches regulated records. Agentic scanners become more useful when they can reason over that context before assigning severity.
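The effect of context on priority can be sketched as a simple weighting of a base severity score. The enum names, the weights, and the `contextual_priority` function below are assumptions chosen for illustration; they do not describe how MDASH, Defender, or GitHub Code Security actually score findings.

```python
from enum import IntEnum


class Exposure(IntEnum):
    INTERNAL_TEST = 0
    INTERNAL_SERVICE = 1
    INTERNET_FACING = 2


class DataSensitivity(IntEnum):
    NONE = 0
    INTERNAL = 1
    REGULATED = 2


# Illustrative weights only; a real program would calibrate these.
EXPOSURE_WEIGHT = {
    Exposure.INTERNAL_TEST: 0.4,
    Exposure.INTERNAL_SERVICE: 0.7,
    Exposure.INTERNET_FACING: 1.0,
}
SENSITIVITY_WEIGHT = {
    DataSensitivity.NONE: 0.6,
    DataSensitivity.INTERNAL: 0.8,
    DataSensitivity.REGULATED: 1.0,
}


def contextual_priority(
    base_severity: float, exposure: Exposure, sensitivity: DataSensitivity
) -> float:
    """Scale a 0 to 10 base severity by runtime exposure and data sensitivity."""
    if not 0.0 <= base_severity <= 10.0:
        raise ValueError("base_severity must be between 0 and 10")
    score = base_severity * EXPOSURE_WEIGHT[exposure] * SENSITIVITY_WEIGHT[sensitivity]
    return round(score, 2)


test_utility = contextual_priority(8.0, Exposure.INTERNAL_TEST, DataSensitivity.NONE)
public_service = contextual_priority(8.0, Exposure.INTERNET_FACING, DataSensitivity.REGULATED)
```

With these assumed weights, the same code pattern ranks far lower in the test utility than on the internet-facing service, which is the ordering the paragraph above argues for.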
Copilot-assisted remediation also needs discipline. A generated fix should include the vulnerable path, the proposed code change, tests that prove the issue is closed, and a reviewer note that explains residual risk. Without that package, teams may accept patches that silence the scanner while leaving the exploit path reachable through a slightly different input.
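A review gate can enforce that package before a generated fix is accepted. The sketch below assumes a hypothetical `RemediationPackage` structure with one field per required element; it is not a Copilot or GitHub API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RemediationPackage:
    """Hypothetical bundle that accompanies a generated fix."""
    vulnerable_path: str
    proposed_diff: str
    regression_tests: List[str] = field(default_factory=list)
    residual_risk_note: str = ""


def missing_parts(pkg: RemediationPackage) -> List[str]:
    """Return the names of required elements that are empty or absent."""
    missing = []
    if not pkg.vulnerable_path.strip():
        missing.append("vulnerable_path")
    if not pkg.proposed_diff.strip():
        missing.append("proposed_diff")
    if not pkg.regression_tests:
        missing.append("regression_tests")
    if not pkg.residual_risk_note.strip():
        missing.append("residual_risk_note")
    return missing
```

A merge check could reject any package for which `missing_parts` returns a non-empty list, so a patch without tests or a residual-risk note never reaches a reviewer as complete.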
Governance for Exploit Proofs
MDASH-style systems blur the line between discovery and exploitation because proof generation is part of the value. That means exploit artifacts need their own retention and access policy. Payloads, reproduction scripts, and generated traces should be stored like sensitive security evidence, not like ordinary logs.
Role-based access is the minimum. Security engineers, service owners, and compliance teams may need different views of the same finding. Developers need enough detail to fix the issue, while broader audiences may only need severity, affected asset, and remediation status.
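Different views of the same finding can be modeled as a per-role allow-list of fields. The roles, the field names, and the mapping below are assumptions for illustration; only the summary fields (severity, affected asset, remediation status) come from the text above.

```python
from enum import Enum


class Role(Enum):
    SECURITY_ENGINEER = "security_engineer"
    SERVICE_OWNER = "service_owner"
    COMPLIANCE = "compliance"
    BROAD_AUDIENCE = "broad_audience"


SUMMARY_FIELDS = {"severity", "affected_asset", "remediation_status"}

# Hypothetical mapping: which finding fields each role may see.
VISIBLE_FIELDS = {
    Role.SECURITY_ENGINEER: SUMMARY_FIELDS | {"vulnerable_path", "fix_guidance", "exploit_proof_ref"},
    Role.SERVICE_OWNER: SUMMARY_FIELDS | {"vulnerable_path", "fix_guidance"},
    Role.COMPLIANCE: SUMMARY_FIELDS | {"data_classification"},
    Role.BROAD_AUDIENCE: SUMMARY_FIELDS,
}


def view_for(role: Role, finding: dict) -> dict:
    """Return only the fields of a finding that the given role is allowed to see."""
    allowed = VISIBLE_FIELDS[role]
    return {key: value for key, value in finding.items() if key in allowed}
```

Filtering by allow-list rather than by deny-list means a newly added field, such as a payload reference, stays hidden from every role until someone explicitly grants access to it.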
Teams should also define a kill switch. If an agent starts producing destructive payloads, scanning out of scope, or touching regulated repositories without approval, administrators need a way to stop execution and preserve logs for review.
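A kill switch can be sketched as a guard that every scan request must pass through. The `ScanGuard` class below is hypothetical and covers only the out-of-scope case; it assumes the agent runtime calls `authorize` before touching a repository, which a real deployment would have to enforce.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ScanGuard:
    """Hypothetical guard: halts all scanning once tripped and keeps an audit trail."""
    allowed_repos: Set[str]
    audit_log: List[str] = field(default_factory=list)
    _stopped: threading.Event = field(default_factory=threading.Event)

    def trip(self, reason: str) -> None:
        """Stop all further scans and record why."""
        self.audit_log.append(f"KILL: {reason}")
        self._stopped.set()

    def authorize(self, repo: str) -> bool:
        """Allow a scan only if the switch is not tripped and the repo is in scope."""
        if self._stopped.is_set():
            self.audit_log.append(f"DENY {repo}: kill switch active")
            return False
        if repo not in self.allowed_repos:
            # An out-of-scope request trips the switch instead of being skipped quietly.
            self.trip(f"out-of-scope repository {repo}")
            return False
        self.audit_log.append(f"ALLOW {repo}")
        return True
```

Administrators can also call `trip` directly. The audit log is append-only in this sketch, so the sequence of allowed and denied requests is preserved for the post-incident review.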
The Adoption Playbook
A practical pilot can start with one language ecosystem and a known vulnerable benchmark set. Run the agent against repositories where the team already understands the bug classes, then compare its findings with CodeQL, dependency scanning, and manual review. Only after that should teams use production exposure signals for prioritization.
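The comparison step in the pilot reduces to set arithmetic once findings from each tool are normalized to a common key. The sketch below assumes a hypothetical key of file path plus CWE identifier; real exports from CodeQL or other scanners would need their own normalization first.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# Assumed normalization: (file path, CWE identifier).
FindingKey = Tuple[str, str]


class Comparison(NamedTuple):
    both: Set[FindingKey]
    agent_only: Set[FindingKey]
    baseline_only: Set[FindingKey]


def compare_findings(
    agent: Iterable[FindingKey], baseline: Iterable[FindingKey]
) -> Comparison:
    """Split findings into overlap, agent-only, and baseline-only sets."""
    agent_set, baseline_set = set(agent), set(baseline)
    return Comparison(
        both=agent_set & baseline_set,
        agent_only=agent_set - baseline_set,
        baseline_only=baseline_set - agent_set,
    )
```

The agent-only set is where manual review effort should go first, since it holds either new true positives or new noise. The baseline-only set shows known bug classes that the agent missed.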
The long-term shift is structural. Vulnerability management is moving from static queues to agent pipelines that discover, validate, prioritize, and propose patches. Security leaders should design the workflow around evidence and accountability before the tooling spreads across engineering teams.