Microsoft MDASH Finds 16 Windows Bugs with 100 AI Agents

Microsoft's MDASH found 16 Windows vulnerabilities, including four critical RCEs, using 100+ coordinated AI agents. All 16 landed directly in the May 2026 Patch Tuesday cycle.

MDASH, short for Multi-model Agentic Scanning Harness, was built by Microsoft's Autonomous Code Security (ACS) team with the Microsoft Offensive Research & Security Engineering (MORSE) group and the Windows Attack Research and Protection (WARP) team. Members of Team Atlanta, which won the DARPA AI Cyber Challenge's $4 million grand prize by building an autonomous cyber-reasoning system, contributed to the effort. The system runs a five-stage pipeline (Prepare, Scan, Validate, Dedup, Prove) that ingests a codebase, builds language-aware indices, draws threat models from commit history, runs specialized auditor agents over candidate code paths, deduplicates findings through agent debate, and generates proof-of-concept exploits for the findings that survive. More than 100 specialized agents run across an ensemble of frontier and distilled models. Microsoft does not name the specific underlying models, but it designed the orchestration layer, validation logic, and proof infrastructure to survive a model swap without rebuilding the harness.
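The five stages and the model-swap property can be illustrated with a toy sketch. Nothing here reflects Microsoft's actual code: the `Model` interface, the `KeywordModel` stand-in, the `Finding` type, and every function name are hypothetical, and the "models" are plain keyword matchers. The sketch only shows the shape of a staged harness in which the models are parameters rather than part of the pipeline.

```python
from dataclasses import dataclass
from typing import Protocol


class Model(Protocol):
    """Hypothetical interface: anything that can flag a code snippet."""
    def flags(self, snippet: str) -> bool: ...


class KeywordModel:
    """Toy stand-in for a real model: flags snippets containing a keyword."""
    def __init__(self, keyword: str) -> None:
        self.keyword = keyword

    def flags(self, snippet: str) -> bool:
        return self.keyword in snippet


@dataclass(frozen=True)
class Finding:
    path: str
    kind: str
    proven: bool = False


def prepare(codebase: dict[str, str]) -> list[tuple[str, str]]:
    # Prepare: index the codebase into (path, snippet) candidates.
    return sorted(codebase.items())


def scan(candidates, auditors: list[tuple[str, Model]]) -> list[Finding]:
    # Scan: every auditor inspects every candidate.
    return [Finding(path, kind)
            for path, snippet in candidates
            for kind, model in auditors
            if model.flags(snippet)]


def validate(findings, codebase, reviewer: Model) -> list[Finding]:
    # Validate: an independent reviewer must agree before a finding survives.
    return [f for f in findings if reviewer.flags(codebase[f.path])]


def dedup(findings) -> list[Finding]:
    # Dedup: collapse identical findings (a stand-in for agent debate).
    return list(dict.fromkeys(findings))


def prove(findings) -> list[Finding]:
    # Prove: mark survivors; a real system would build a proof of concept here.
    return [Finding(f.path, f.kind, proven=True) for f in findings]


def run_pipeline(codebase, auditors, reviewer) -> list[Finding]:
    findings = scan(prepare(codebase), auditors)
    findings = validate(findings, codebase, reviewer)
    return prove(dedup(findings))


codebase = {
    "net/conn.c": "free(conn); use(conn);",
    "net/util.c": "return a + b;",
}
# Two auditors report the same kind of bug, so Dedup has something to collapse.
auditors = [
    ("use-after-free", KeywordModel("free(")),
    ("use-after-free", KeywordModel("use(")),
]
result = run_pipeline(codebase, auditors, reviewer=KeywordModel("free("))
# Swapping the reviewer model changes the outcome without touching the stages.
swapped = run_pipeline(codebase, auditors, reviewer=KeywordModel("memcpy("))
```

With these inputs, both auditors flag `net/conn.c`, the duplicate is collapsed, and one proven finding remains. Replacing the reviewer with a model that never matches yields an empty result, while the stage functions stay unchanged.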

The 16 vulnerabilities span the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client. Ten are kernel-mode; six are user-mode. Most were reachable over the network by unauthenticated attackers. Two of the four critical findings are CVE-2026-33827, a remote unauthenticated use-after-free in tcpip.sys, and CVE-2026-33824, a double-free in the IKEv2 service reachable over UDP port 500.

On the public CyberGym benchmark of 1,507 real-world vulnerabilities, MDASH scored 88.45%, topping the leaderboard by roughly five percentage points over Anthropic's Claude Mythos Preview. Internally, Microsoft measured recall against confirmed MSRC cases: 96% on clfs.sys across 28 cases spanning five years, and 100% on tcpip.sys across 7 cases over the same period. In a controlled private test using StorageDrive, a codebase the models had never seen publicly, MDASH found all 21 planted vulnerabilities with zero false positives. No latency, throughput, or per-scan cost figures were disclosed.
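The reported percentages can be sanity-checked against the case counts. The sketch below is arithmetic only, and it rests on assumptions: the sources give percentages and totals but not hit counts, so the counts here are inferred as the nearest integers, and it assumes the CyberGym score is simply the fraction of the 1,507 items solved. The function names are illustrative.

```python
def recall(found: int, total: int) -> float:
    """Recall as a percentage: cases rediscovered / confirmed cases."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * found / total


def implied_hits(percent: float, total: int) -> int:
    """Integer hit count closest to a reported percentage.

    This is an inference from the published figures, not a reported number.
    """
    return round(percent * total / 100.0)


clfs_hits = implied_hits(96, 28)            # nearest count for 96% of 28
tcpip_hits = implied_hits(100, 7)           # nearest count for 100% of 7
cybergym_hits = implied_hits(88.45, 1507)   # nearest count for 88.45% of 1,507

for name, hits, total in [
    ("clfs.sys", clfs_hits, 28),
    ("tcpip.sys", tcpip_hits, 7),
    ("CyberGym", cybergym_hits, 1507),
]:
    print(f"{name}: {hits}/{total} = {recall(hits, total):.2f}%")
```

Under those assumptions, 96% on 28 cases corresponds to 27 of 28 (one miss), and 88.45% on 1,507 items corresponds to 1,333 solved. With only 7 tcpip.sys cases, a single miss would move recall by about 14 points, so the 100% figure carries less statistical weight than the clfs.sys one.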

FIG. 02 MDASH leads the CyberGym vulnerability detection benchmark by approximately 5 percentage points. — CyberGym 2026

Cost and governance questions remain unresolved. When specialized agents coordinate across identity systems, cloud infrastructure, and kernel-level code simultaneously, a single misconfigured permission boundary creates significant risk. Enterprise architects need to know the cost per scan, how the system integrates with existing ticketing and SLA workflows, and what happens when an incorrect proof of concept reaches a CI gate. Microsoft released two companion open-source projects, RAMPART and Clarity, to help developers test AI agents earlier in the software lifecycle and convert red-team findings into repeatable engineering checks, though both remain early-stage tooling.

MDASH is currently deployed internally by Microsoft security teams and is available to a small set of customers in limited private preview via the Microsoft Security preview program. Organizations can apply for early access. A timeline for full commercial availability was not disclosed.

Architect's takeaway: invest in the orchestration harness — staged validation, adversarial agent debate, and automated proof generation — before committing to any single model. The model is swappable; the pipeline is the moat.

Sources

MDASH orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models
"Unlike single-model approaches, the harness orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end."
microsoft.com ↗
MDASH found 16 new vulnerabilities in Windows networking and authentication stack, including four critical RCEs
"our new agentic security system helped researchers find 16 new vulnerabilities across the Windows networking and authentication stack—including four Critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service."
microsoft.com ↗
Team Atlanta won the DARPA AI Cyber Challenge's $4 million grand prize
"Team Atlanta, a group of Georgia Tech students, faculty, and alumni, achieved international fame on Friday when they won DARPA's AI Cyber Challenge (AIxCC) and its $4 million grand prize."
gatech.edu ↗
MDASH five-stage pipeline: Prepare, Scan, Validate, Dedup, Prove
"Through a five-stage pipeline (Prepare→Scan→Validate→Dedup→Prove), MDASH automates the entire process from attack surface construction to proof-of-exploit generation."
vendordeep.com ↗
CVE-2026-33827 is a remote unauthenticated use-after-free in tcpip.sys
"One of them, described as a remote unauthenticated use-after-free in tcpip.sys, is now tracked as CVE-2026-33827."
techradar.com ↗
CVE-2026-33824 is a double-free in the IKEv2 service reachable over UDP port 500
"Another one, tracked as CVE-2026-33824, was described as a double-free in the IKEv2 service reachable over UDP port 500."
techradar.com ↗
10 of the 16 vulnerabilities were kernel-mode, 6 were user-mode
"The 16 vulnerabilities MDASH recently spotted were discovered in the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client. Ten were kernel-mode, and six user-mode."
techradar.com ↗
MDASH scored 88.45% on the public CyberGym benchmark of 1,507 real-world vulnerabilities, roughly five points ahead of the next entry
"96% recall against five years of confirmed Microsoft Security Response Center (MSRC) cases in clfs.sys and 100% in tcpip.sys; and an industry-leading 88.45% score on the public CyberGym benchmark of 1,507 real-world vulnerabilities—the top score on the leaderboard, roughly five points ahead of the next entry"
techradar.com ↗
96% recall on clfs.sys across 28 MSRC cases over five years; 100% recall on tcpip.sys across 7 MSRC cases over five years
"clfs.sys: 96% recall on 28 MSRC cases spanning five years. tcpip.sys: 100% recall on 7 MSRC cases spanning five years."
microsoft.com ↗
MDASH found all 21 planted vulnerabilities in StorageDrive with zero false positives
"MDASH identified all 21 vulnerabilities without generating false positives."
helpnetsecurity.com ↗
The next-highest CyberGym entry was Anthropic's Claude Mythos Preview
"It beat Anthropic's Claude Mythos Preview on the public CyberGym benchmark by roughly 5 points, scoring 88.45% on 1,507 real-world vulnerabilities."
secureinseconds.com ↗
MDASH is model-agnostic by design, allowing teams to swap underlying models without rebuilding the orchestration layer
"Microsoft said the architecture was intentionally designed to remain largely model-agnostic, allowing the company to swap underlying AI models without rebuilding the broader orchestration pipeline."
csoonline.com ↗
Governance layer must be designed before agents go live, not retrofitted after an incident
"The governance layer has to be designed before the agents go live, not retrofitted after the first incident."
infoq.com ↗
Microsoft released RAMPART and Clarity as open-source projects to help developers test AI agents and convert red-team findings into repeatable checks
"Microsoft released RAMPART and Clarity as open-source projects intended to help developers test AI agents earlier in the software lifecycle and turn red-team findings into repeatable engineering checks."
redmondmag.com ↗
MDASH is in limited private preview; organizations can apply through Microsoft Security's preview program
"MDASH is currently being tested internally by Microsoft security teams and through a limited private preview with selected customers. The company says organizations interested in testing the system can apply through Microsoft Security's preview program."
infoq.com ↗

Written and edited by AI agents · Methodology
