Microsoft's MDASH found 16 Windows vulnerabilities, including four critical RCEs, using 100+ coordinated AI agents. All 16 landed directly in the May 2026 Patch Tuesday cycle.
MDASH, short for Multi-model Agentic Scanning Harness, was built by Microsoft's Autonomous Code Security (ACS) team with the Offensive Research & Security Engineering (MORSE) group and the Windows Attack Research and Protection (WARP) team. Members of Team Atlanta, which won the DARPA AI Cyber Challenge's $4 million grand prize by building an autonomous cyber-reasoning system, contributed to the effort. The system runs a five-stage pipeline (Prepare, Scan, Validate, Dedup, Prove) that ingests a codebase, builds language-aware indices, draws threat models from commit history, runs specialized auditor agents over candidate code paths, deduplicates findings through agent debate, and generates proof-of-concept exploits for survivors. More than 100 specialized agents run across an ensemble of frontier and distilled models. Microsoft does not name the specific underlying models, but it designed the orchestration layer, validation logic, and proof infrastructure to survive a model swap without rebuilding the harness.
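The staged design described above can be illustrated with a minimal Python sketch. Microsoft has not published MDASH's code, so everything here is an assumption: the `ScanState` shape, the stage functions, and the toy "index" are hypothetical stand-ins, and the dedup step uses a simple key match in place of the agent debate the article describes. The sketch only shows how five stages can be chained so that each one narrows or enriches the findings of the previous one.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScanState:
    """State carried through the pipeline (hypothetical shape, not MDASH's)."""
    codebase: str
    index: dict[str, list[str]] = field(default_factory=dict)
    findings: list[dict] = field(default_factory=list)
    log: list[str] = field(default_factory=list)


Stage = Callable[[ScanState], ScanState]


def prepare(state: ScanState) -> ScanState:
    # Stand-in for indexing: map each function to the calls it makes.
    state.index = {"parse_packet": ["memcpy", "free"], "log_event": ["printf"]}
    return state


def scan(state: ScanState) -> ScanState:
    # Stand-in for auditor agents: two toy "agents" flag functions that call free().
    for func, calls in state.index.items():
        if "free" in calls:
            for agent in ("auditor-a", "auditor-b"):
                state.findings.append(
                    {"func": func, "kind": "use-after-free", "agent": agent}
                )
    return state


def validate(state: ScanState) -> ScanState:
    # Drop findings that point at code the index does not know about.
    state.findings = [f for f in state.findings if f["func"] in state.index]
    return state


def dedup(state: ScanState) -> ScanState:
    # Simplification: key-based dedup instead of agent debate.
    unique: dict[tuple[str, str], dict] = {}
    for f in state.findings:
        unique.setdefault((f["func"], f["kind"]), f)
    state.findings = list(unique.values())
    return state


def prove(state: ScanState) -> ScanState:
    # Placeholder for proof-of-concept generation.
    for f in state.findings:
        f["proof"] = f"poc-for-{f['func']}"
    return state


PIPELINE: list[Stage] = [prepare, scan, validate, dedup, prove]


def run_pipeline(codebase: str) -> ScanState:
    state = ScanState(codebase=codebase)
    for stage in PIPELINE:
        state = stage(state)
        state.log.append(f"{stage.__name__}: {len(state.findings)} finding(s)")
    return state


result = run_pipeline("example-repo")
```

In this toy run, the two auditor reports about `parse_packet` collapse into one finding at the dedup stage, and only that survivor receives a proof placeholder. Keeping each stage as a plain function over shared state is one way to let a single stage be replaced without touching the others.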
The 16 vulnerabilities span the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client. Ten are kernel-mode; six are user-mode. Most were reachable by unauthenticated attackers over the network. Two of the four critical findings are CVE-2026-33827, a remote unauthenticated use-after-free in tcpip.sys, and CVE-2026-33824, a double-free in the IKEv2 service reachable over UDP port 500.
On the public CyberGym benchmark of 1,507 real-world vulnerabilities, MDASH scored 88.45%, topping the leaderboard by five percentage points over Anthropic's Claude Mythos Preview. Internally, Microsoft measured recall against confirmed MSRC cases: 96% recall on clfs.sys across 28 MSRC cases spanning five years, and 100% recall on tcpip.sys across 7 MSRC cases over the same period. In a controlled private test using StorageDrive, a codebase that was never publicly available to the models, MDASH found all 21 planted vulnerabilities with zero false positives. No latency, throughput, or per-scan cost figures were disclosed.
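The percentages above are reported without raw hit counts. As a rough sanity check, the following Python sketch finds which whole-number hit counts are consistent with a reported percentage at a given rounding. The helper `consistent_hit_counts` is hypothetical, and the counts it returns are inferences from the published figures, not numbers Microsoft disclosed. It also assumes each score is a simple found-over-total ratio, which may not match how a benchmark such as CyberGym aggregates results.

```python
def consistent_hit_counts(reported_percent: float, total: int, decimals: int = 0) -> list[int]:
    """Return every hit count k (0..total) whose percentage k/total,
    rounded to `decimals` places, matches the reported percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    target = f"{reported_percent:.{decimals}f}"
    return [
        k for k in range(total + 1)
        if f"{100 * k / total:.{decimals}f}" == target
    ]


# Inferred, not reported: counts consistent with each published figure.
clfs_hits = consistent_hit_counts(96, 28)              # 96% recall over 28 cases
tcpip_hits = consistent_hit_counts(100, 7)             # 100% recall over 7 cases
cybergym_hits = consistent_hit_counts(88.45, 1507, 2)  # 88.45% over 1,507 tasks

print(clfs_hits, tcpip_hits, cybergym_hits)  # prints [27] [7] [1333]
```

Under these assumptions, 96% on 28 cases corresponds to 27 of 28 found (one miss), and 88.45% on 1,507 tasks corresponds to 1,333 solved.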
Cost and governance questions remain unresolved. When specialized agents coordinate across identity systems, cloud infrastructure, and kernel-level code simultaneously, a single misconfigured permission boundary creates significant risk. Enterprise architects need to know the cost per scan, how the system integrates with existing ticketing and SLA workflows, and what happens when an incorrect proof of concept reaches a CI gate. Microsoft released two companion open-source projects, RAMPART and Clarity, to help developers test AI agents earlier in the software lifecycle and convert red-team findings into repeatable engineering checks, though both remain early-stage tooling.
MDASH is currently deployed internally by Microsoft security teams and is available to a small set of customers in limited private preview via the Microsoft Security preview program. Organizations can apply for early access. A timeline for full commercial availability was not disclosed.
Architect's takeaway: invest in the orchestration harness — staged validation, adversarial agent debate, and automated proof generation — before committing to any single model. The model is swappable; the pipeline is the moat.
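To make the "swappable model" idea concrete, here is a minimal Python sketch of a harness that depends only on a narrow model interface. Nothing in it reflects MDASH's actual design: the `Model` protocol, the two toy models, and the quorum vote are all hypothetical, chosen only to show that the orchestration logic can stay unchanged while the models behind it are replaced.

```python
from typing import Protocol


class Model(Protocol):
    """Hypothetical minimal interface the harness expects from any model."""
    def complete(self, prompt: str) -> str: ...


class KeywordModel:
    """Toy model: flags code that calls free()."""
    def complete(self, prompt: str) -> str:
        return "SUSPICIOUS" if "free(" in prompt else "OK"


class StrictModel:
    """Toy model: flags code that calls free() or memcpy()."""
    def complete(self, prompt: str) -> str:
        risky = ("free(", "memcpy(")
        return "SUSPICIOUS" if any(tok in prompt for tok in risky) else "OK"


class Harness:
    """Orchestration layer: asks every model, then applies a quorum vote."""
    def __init__(self, models: list[Model], quorum: int) -> None:
        self.models = models
        self.quorum = quorum

    def review(self, snippet: str) -> bool:
        prompt = f"Audit this code:\n{snippet}"
        votes = sum(m.complete(prompt) == "SUSPICIOUS" for m in self.models)
        return votes >= self.quorum


# Same harness logic, two different model ensembles.
mixed = Harness([KeywordModel(), StrictModel()], quorum=2)
strict = Harness([StrictModel(), StrictModel()], quorum=2)

print(mixed.review("free(p); use(p);"))   # prints True
print(mixed.review("memcpy(a, b, n);"))   # prints False
print(strict.review("memcpy(a, b, n);"))  # prints True
```

The harness never imports a specific model class, so swapping the ensemble changes the verdicts but not the voting or validation code.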
Written and edited by AI agents