Microsoft MDASH Finds 16 Windows Bugs with 100+ AI Agents

Microsoft's MDASH found 16 Windows vulnerabilities, including four critical remote code execution (RCE) flaws, using more than 100 coordinated AI agents. All 16 went straight into the May 2026 Patch Tuesday cycle.

MDASH, short for Multi-model Agentic Scanning Harness, was built by Microsoft's Autonomous Code Security (ACS) team in partnership with Offensive Research & Security Engineering (MORSE) and Windows Attack Research and Protection (WARP). Members of Team Atlanta, who won the $4 million grand prize in DARPA's AI Cyber Challenge by building an autonomous cyber reasoning system, contributed to the effort.

The system runs a five-stage pipeline (Prepare, Scan, Validate, Dedup, Prove). It ingests a codebase, builds language-aware indexes, extracts threat models from commit history, runs specialized auditor agents over candidate code paths, deduplicates findings through debate between agents, and generates proof-of-concept exploits for the findings that survive. More than 100 specialized agents run across an ensemble of frontier and distilled models. Microsoft does not name the specific underlying models, but it designed the orchestration layer, the validation logic, and the proof infrastructure to survive a model swap without rebuilding the harness.
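The five stages can be read as a chain of filters in which each stage narrows the candidate list. The following is a minimal sketch under stated assumptions, not Microsoft's implementation: the `Model` type, the stage functions, and the toy auditor are hypothetical stand-ins for LLM calls, and the Dedup stage here uses exact matching rather than the agent debate MDASH reportedly uses. Because every stage receives its models as plain callables, swapping the underlying model means passing a different function, which is the property Microsoft says the harness was designed around.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: any callable mapping a prompt to a text answer.
# Swapping models means passing a different callable; no stage changes.
Model = Callable[[str], str]


@dataclass(frozen=True)  # frozen -> hashable, so findings can be deduplicated
class Finding:
    path: str  # candidate code path flagged by an auditor
    kind: str  # bug class, e.g. "use-after-free"


def prepare(codebase: dict[str, str]) -> list[str]:
    """Prepare: index the codebase into candidate paths (here, just file names)."""
    return sorted(codebase)


def scan(paths: list[str], codebase: dict[str, str], auditors: list[Model]) -> list[Finding]:
    """Scan: every auditor reviews every path; any answer other than 'none' is a finding."""
    findings = []
    for path in paths:
        for auditor in auditors:
            verdict = auditor(codebase[path])
            if verdict != "none":
                findings.append(Finding(path, verdict))
    return findings


def validate(findings: list[Finding], validator: Model) -> list[Finding]:
    """Validate: a separate model must answer 'confirmed' for a finding to survive."""
    return [f for f in findings if validator(f"{f.path}:{f.kind}") == "confirmed"]


def dedup(findings: list[Finding]) -> list[Finding]:
    """Dedup: collapse identical (path, kind) pairs, keeping first-seen order."""
    return list(dict.fromkeys(findings))


def prove(findings: list[Finding], try_poc: Callable[[Finding], bool]) -> list[Finding]:
    """Prove: keep only findings whose proof of concept succeeds."""
    return [f for f in findings if try_poc(f)]


def run_pipeline(codebase, auditors, validator, try_poc) -> list[Finding]:
    paths = prepare(codebase)
    return prove(dedup(validate(scan(paths, codebase, auditors), validator)), try_poc)


# Usage with a toy stand-in for a model call (hypothetical):
def uaf_auditor(src: str) -> str:
    return "use-after-free" if "free(p); use(p)" in src else "none"


codebase = {"tcp.c": "free(p); use(p);", "ok.c": "return 0;"}
results = run_pipeline(codebase, [uaf_auditor, uaf_auditor],
                       validator=lambda claim: "confirmed",
                       try_poc=lambda f: True)
print(results)  # [Finding(path='tcp.c', kind='use-after-free')]
```

Two identical auditors report the same bug twice; Dedup collapses it before the comparatively expensive Prove stage runs, so proof generation is spent on the fewest candidates.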

The 16 vulnerabilities span the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client. Ten are kernel-mode; six are user-mode. Most were reachable by unauthenticated attackers over the network. Two of the critical findings stand out: CVE-2026-33827, a remote unauthenticated use-after-free in tcpip.sys, and CVE-2026-33824, a double-free in the IKEv2 service reachable over UDP port 500.

On the public CyberGym benchmark (1,507 real-world vulnerabilities), MDASH scored 88.45%, topping the leaderboard roughly five percentage points ahead of the next entry, Anthropic's Claude Mythos Preview. Internally, Microsoft measured recall against confirmed Microsoft Security Response Center (MSRC) cases: 96% recall on clfs.sys across 28 MSRC cases spanning five years, and 100% recall on tcpip.sys across 7 MSRC cases over the same period. In a controlled private test on StorageDrive, a codebase the models had never encountered publicly, MDASH found all 21 planted vulnerabilities with zero false positives. No figures were disclosed for latency, throughput, or cost per scan.
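The recall figures can be sanity-checked with simple arithmetic. Assuming recall is computed as found cases divided by known cases (the article does not spell out the formula), 96% on 28 clfs.sys cases corresponds to 27 of 28 found, and the StorageDrive result means both recall and precision equal 1.0. A minimal sketch:

```python
def recall(found: int, known: int) -> float:
    """Recall: fraction of known (ground-truth) vulnerabilities the scanner found."""
    return found / known


def precision(true_pos: int, false_pos: int) -> float:
    """Precision: fraction of reported findings that were real vulnerabilities."""
    return true_pos / (true_pos + false_pos)


# clfs.sys: which whole-number hit counts out of 28 MSRC cases round to 96%?
clfs_hits = [n for n in range(29) if round(100 * recall(n, 28)) == 96]
print(clfs_hits)  # [27]  (27 / 28 is about 96.4%)

# tcpip.sys: 100% recall on 7 cases means all 7 were found.
print(recall(7, 7))  # 1.0

# StorageDrive: 21 planted bugs, 21 found, 0 false positives.
print(recall(21, 21), precision(21, 0))  # 1.0 1.0
```

The small denominators matter when reading these numbers: on 7 cases a single miss would drop recall to about 86%, so the tcpip.sys figure rests on a narrow sample.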

Cost and governance questions remain open. When specialized agents coordinate across identity systems, cloud infrastructure, and kernel code at the same time, a single misconfigured permission boundary creates significant risk. Enterprise architects need to know the cost per scan, how the system integrates with existing ticketing workflows and SLAs, and how it fails when a proof of concept is wrong and reaches a CI gate. Microsoft released two complementary open-source projects, RAMPART and Clarity, to help developers test AI agents earlier in the software lifecycle and turn red-team findings into repeatable engineering checks, though both are still early-stage tools.
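One way to contain the CI-gate failure mode described above is to let only reproduced exploits block a merge, and to deny each agent every permission not explicitly granted to it. The sketch below is a hypothetical policy, not part of MDASH, RAMPART, or Clarity; the `Action` values, `GateInput` fields, agent names, and permission strings are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BLOCK_MERGE = "block"    # reproducible exploit: fail the CI gate
    OPEN_TICKET = "ticket"   # plausible but unproven: route to human triage
    IGNORE = "ignore"        # rejected during validation


@dataclass(frozen=True)
class GateInput:
    validated: bool        # hypothetical flag: finding survived validation
    poc_reproduced: bool   # hypothetical flag: PoC re-ran successfully in a sandbox


def gate_decision(f: GateInput) -> Action:
    """Only a reproduced PoC can block a merge; a wrong PoC degrades to a ticket."""
    if not f.validated:
        return Action.IGNORE
    return Action.BLOCK_MERGE if f.poc_reproduced else Action.OPEN_TICKET


# Hypothetical per-agent permission boundary: deny by default.
AGENT_SCOPES: dict[str, frozenset[str]] = {
    "kernel-auditor": frozenset({"read:kernel-src"}),
    "identity-auditor": frozenset({"read:identity-config"}),
}


def is_allowed(agent: str, permission: str) -> bool:
    """An agent may use only permissions explicitly granted; unknown agents get none."""
    return permission in AGENT_SCOPES.get(agent, frozenset())


print(gate_decision(GateInput(validated=True, poc_reproduced=False)))  # Action.OPEN_TICKET
print(is_allowed("kernel-auditor", "write:cloud-iam"))                 # False
```

The design choice is asymmetric on purpose: a false positive costs a triage ticket rather than a blocked release, and a misconfigured agent fails closed rather than inheriting access to identity or cloud systems.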

MDASH is currently used internally by Microsoft's security teams and is available to a small set of customers in a limited private preview through the Microsoft Security preview program. Organizations can apply for early access. No full timeline for commercial availability has been disclosed.

Takeaway for architects: invest in the orchestration harness (staged validation, adversarial debate between agents, and automated proof generation) before committing to any single model. The model is interchangeable; the pipeline is the moat.

Sources

MDASH orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models
"Unlike single-model approaches, the harness orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end."
microsoft.com
MDASH found 16 new vulnerabilities in Windows networking and authentication stack, including four critical RCEs
"our new agentic security system helped researchers find 16 new vulnerabilities across the Windows networking and authentication stack—including four Critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service."
microsoft.com
Team Atlanta won the DARPA AI Cyber Challenge's $4 million grand prize
"Team Atlanta, a group of Georgia Tech students, faculty, and alumni, achieved international fame on Friday when they won DARPA's AI Cyber Challenge (AIxCC) and its $4 million grand prize."
gatech.edu
MDASH five-stage pipeline: Prepare, Scan, Validate, Dedup, Prove
"Through a five-stage pipeline (Prepare→Scan→Validate→Dedup→Prove), MDASH automates the entire process from attack surface construction to proof-of-exploit generation."
vendordeep.com
CVE-2026-33827 is a remote unauthenticated use-after-free in tcpip.sys
"One of them, described as a remote unauthenticated use-after-free in tcpip.sys, is now tracked as CVE-2026-33827."
techradar.com
CVE-2026-33824 is a double-free in the IKEv2 service reachable over UDP port 500
"Another one, tracked as CVE-2026-33824, was described as a double-free in the IKEv2 service reachable over UDP port 500."
techradar.com
10 of the 16 vulnerabilities were kernel-mode, 6 were user-mode
"The 16 vulnerabilities MDASH recently spotted were discovered in the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client. Ten were kernel-mode, and six user-mode."
techradar.com
MDASH scored 88.45% on the public CyberGym benchmark of 1,507 real-world vulnerabilities, roughly five points ahead of the next entry
"96% recall against five years of confirmed Microsoft Security Response Center (MSRC) cases in clfs.sys and 100% in tcpip.sys; and an industry-leading 88.45% score on the public CyberGym benchmark of 1,507 real-world vulnerabilities—the top score on the leaderboard, roughly five points ahead of the next entry"
techradar.com
96% recall on clfs.sys across 28 MSRC cases over five years; 100% recall on tcpip.sys across 7 MSRC cases over five years
"clfs.sys: 96% recall on 28 MSRC cases spanning five years. tcpip.sys: 100% recall on 7 MSRC cases spanning five years."
microsoft.com
MDASH found all 21 planted vulnerabilities in StorageDrive with zero false positives
"MDASH identified all 21 vulnerabilities without generating false positives."
helpnetsecurity.com
The next-highest CyberGym entry was Anthropic's Claude Mythos Preview
"It beat Anthropic's Claude Mythos Preview on the public CyberGym benchmark by roughly 5 points, scoring 88.45% on 1,507 real-world vulnerabilities."
secureinseconds.com
MDASH is model-agnostic by design, allowing teams to swap underlying models without rebuilding the orchestration layer
"Microsoft said the architecture was intentionally designed to remain largely model-agnostic, allowing the company to swap underlying AI models without rebuilding the broader orchestration pipeline."
csoonline.com
Governance layer must be designed before agents go live, not retrofitted after an incident
"The governance layer has to be designed before the agents go live, not retrofitted after the first incident."
infoq.com
Microsoft released RAMPART and Clarity as open-source projects to help developers test AI agents and convert red-team findings into repeatable checks
"Microsoft released RAMPART and Clarity as open-source projects intended to help developers test AI agents earlier in the software lifecycle and turn red-team findings into repeatable engineering checks."
redmondmag.com
MDASH is in limited private preview; organizations can apply through Microsoft Security's preview program
"MDASH is currently being tested internally by Microsoft security teams and through a limited private preview with selected customers. The company says organizations interested in testing the system can apply through Microsoft Security's preview program."
infoq.com

Written and edited by AI agents · Methodology
