Microsoft MDASH Finds 16 Windows Bugs with 100 AI Agents

Microsoft's MDASH found 16 Windows vulnerabilities, including four critical RCEs, using more than 100 coordinated AI agents. All 16 landed directly in the May 2026 Patch Tuesday cycle.

MDASH, short for Multi-model Agentic Scanning Harness, was built by Microsoft's Autonomous Code Security (ACS) team in partnership with Offensive Research & Security Engineering (MORSE) and Windows Attack Research and Protection (WARP). Members of Team Atlanta, who won the $4 million grand prize in the DARPA AI Cyber Challenge by building an autonomous cyber reasoning system, contributed to the effort. The system runs a five-stage pipeline (Prepare, Scan, Validate, Dedup, Prove) that ingests a codebase, builds language-aware indexes, derives threat models from commit history, runs specialized auditor agents over candidate code paths, deduplicates findings through debate between agents, and generates proof-of-concept exploits for the survivors. More than 100 specialized agents run on an ensemble of frontier and distilled models. Microsoft does not name the specific underlying models, but it designed the orchestration layer, validation logic, and proof infrastructure to survive a model swap without rebuilding the harness.
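The staged flow described above can be illustrated with a minimal Python sketch. Nothing here reflects Microsoft's actual code: the `Model` interface, the `Finding` record, the `KeywordModel` stand-in, and every function name are hypothetical, and the Dedup step uses a plain key collapse where the real system is described as using debate between agents. The point is only that the stages depend on an interface, so a model can be swapped without touching the pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional, Protocol


class Model(Protocol):
    """Hypothetical model interface: any backend with this method can be swapped in."""

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class Finding:
    path: str                    # candidate code path that was audited
    kind: str                    # bug class reported by an auditor
    validated: bool = False
    proof: Optional[str] = None  # placeholder for a proof-of-concept artifact


class KeywordModel:
    """Toy stand-in for a real model: flags any source text that mentions 'free'."""

    def complete(self, prompt: str) -> str:
        return "use-after-free" if "free" in prompt else "none"


def prepare(codebase: dict[str, str]) -> list[tuple[str, str]]:
    # Prepare: index the codebase into (path, source) candidates.
    return sorted(codebase.items())


def scan(candidates: list[tuple[str, str]], auditors: list[Model]) -> list[Finding]:
    # Scan: every auditor examines every candidate path.
    findings = []
    for path, source in candidates:
        for auditor in auditors:
            verdict = auditor.complete(source)
            if verdict != "none":
                findings.append(Finding(path, verdict))
    return findings


def validate(findings: list[Finding], validator: Model, codebase: dict[str, str]) -> list[Finding]:
    # Validate: a separate model re-checks each finding; disagreement drops it.
    return [
        replace(f, validated=True)
        for f in findings
        if validator.complete(codebase[f.path]) == f.kind
    ]


def dedup(findings: list[Finding]) -> list[Finding]:
    # Dedup: collapse reports of the same bug (simplified stand-in for agent debate).
    unique: dict[tuple[str, str], Finding] = {}
    for f in findings:
        unique.setdefault((f.path, f.kind), f)
    return list(unique.values())


def prove(findings: list[Finding]) -> list[Finding]:
    # Prove: attach a placeholder where a real system would generate an exploit.
    return [replace(f, proof=f"poc-for:{f.path}") for f in findings]


def run_pipeline(codebase: dict[str, str], auditors: list[Model], validator: Model) -> list[Finding]:
    findings = scan(prepare(codebase), auditors)
    return prove(dedup(validate(findings, validator, codebase)))


if __name__ == "__main__":
    demo = {"a.c": "free(p); use(p);", "b.c": "return 0;"}
    for finding in run_pipeline(demo, [KeywordModel(), KeywordModel()], KeywordModel()):
        print(finding)
```

Because `run_pipeline` only relies on the `complete` method, replacing `KeywordModel` with another backend requires no change to the stage functions, which is the property the article attributes to the harness.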

The 16 vulnerabilities span the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client. Ten are kernel-mode; six are user-mode. Most were reachable by unauthenticated attackers over the network. Two of the critical findings: CVE-2026-33827, a remote unauthenticated use-after-free in tcpip.sys, and CVE-2026-33824, a double-free in the IKEv2 service reachable over UDP port 500.

On the public CyberGym benchmark (1,507 real-world vulnerabilities), MDASH scored 88.45%, topping the leaderboard roughly five points ahead of Anthropic's Claude Mythos Preview. Internally, Microsoft measured recall against confirmed MSRC cases: 96% recall on clfs.sys across 28 MSRC cases spanning five years, and 100% recall on tcpip.sys across 7 MSRC cases over the same period. In a controlled private test using StorageDrive, a codebase the models had never seen publicly, MDASH found all 21 planted vulnerabilities with zero false positives. No latency, throughput, or cost-per-scan figures were disclosed.
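To make the recall figures concrete, the short sketch below works backward from a reported percentage to the case counts that could produce it. It assumes the published percentages are rounded to the nearest whole number, which the sources do not state; the function name `counts_matching` is illustrative.

```python
def counts_matching(total: int, reported_pct: float, decimals: int = 0) -> list[int]:
    """Return every hit count k for which k/total rounds to reported_pct.

    Assumes the reported figure was rounded to `decimals` decimal places.
    """
    tolerance = 0.5 * 10 ** (-decimals)
    return [
        k
        for k in range(total + 1)
        if abs(100 * k / total - reported_pct) < tolerance
    ]


if __name__ == "__main__":
    # clfs.sys: 96% recall over 28 MSRC cases
    print(counts_matching(28, 96))   # [27]
    # tcpip.sys: 100% recall over 7 MSRC cases
    print(counts_matching(7, 100))   # [7]
```

Under that rounding assumption, 96% of 28 cases corresponds to 27 detected and 1 missed, since 27/28 is about 96.4% while 26/28 is about 92.9%.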

Cost and governance questions remain unresolved. When specialized agents coordinate across identity systems, cloud infrastructure, and kernel code simultaneously, a single misconfigured permission boundary creates significant risk. Enterprise architects need to know the cost per scan, how the system integrates with existing ticketing and SLA workflows, and the failure modes when an incorrect proof of concept reaches a CI gate. Microsoft released two complementary open-source projects, RAMPART and Clarity, to help developers test AI agents earlier in the software lifecycle and convert red-team findings into repeatable engineering checks, though both remain early-stage tools.

MDASH is currently deployed internally by Microsoft's security teams and available to a small set of customers in a limited private preview through the Microsoft Security preview program. Organizations can apply for early access. No timeline for full commercial availability has been disclosed.

Takeaway for architects: invest in the orchestration harness (staged validation, adversarial debate between agents, and automated proof generation) before committing to any single model. The model is swappable; the pipeline is the moat.

Sources

MDASH orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models
"Unlike single-model approaches, the harness orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models to discover, debate, and prove exploitable bugs end-to-end."
microsoft.com ↗
MDASH found 16 new vulnerabilities in Windows networking and authentication stack, including four critical RCEs
"our new agentic security system helped researchers find 16 new vulnerabilities across the Windows networking and authentication stack—including four Critical remote code execution flaws in components such as the Windows kernel TCP/IP stack and the IKEv2 service."
microsoft.com ↗
Team Atlanta won the DARPA AI Cyber Challenge's $4 million grand prize
"Team Atlanta, a group of Georgia Tech students, faculty, and alumni, achieved international fame on Friday when they won DARPA's AI Cyber Challenge (AIxCC) and its $4 million grand prize."
gatech.edu ↗
MDASH five-stage pipeline: Prepare, Scan, Validate, Dedup, Prove
"Through a five-stage pipeline (Prepare→Scan→Validate→Dedup→Prove), MDASH automates the entire process from attack surface construction to proof-of-exploit generation."
vendordeep.com ↗
CVE-2026-33827 is a remote unauthenticated use-after-free in tcpip.sys
"One of them, described as a remote unauthenticated use-after-free in tcpip.sys, is now tracked as CVE-2026-33827."
techradar.com ↗
CVE-2026-33824 is a double-free in the IKEv2 service reachable over UDP port 500
"Another one, tracked as CVE-2026-33824, was described as a double-free in the IKEv2 service reachable over UDP port 500."
techradar.com ↗
10 of the 16 vulnerabilities were kernel-mode, 6 were user-mode
"The 16 vulnerabilities MDASH recently spotted were discovered in the Windows TCP/IP stack, the IKEEXT IPsec service, HTTP.sys, Netlogon, DNS resolution, and the Telnet client. Ten were kernel-mode, and six user-mode."
techradar.com ↗
MDASH scored 88.45% on the public CyberGym benchmark of 1,507 real-world vulnerabilities, roughly five points ahead of the next entry
"96% recall against five years of confirmed Microsoft Security Response Center (MSRC) cases in clfs.sys and 100% in tcpip.sys; and an industry-leading 88.45% score on the public CyberGym benchmark of 1,507 real-world vulnerabilities—the top score on the leaderboard, roughly five points ahead of the next entry"
techradar.com ↗
96% recall on clfs.sys across 28 MSRC cases over five years; 100% recall on tcpip.sys across 7 MSRC cases over five years
"clfs.sys: 96% recall on 28 MSRC cases spanning five years. tcpip.sys: 100% recall on 7 MSRC cases spanning five years."
microsoft.com ↗
MDASH found all 21 planted vulnerabilities in StorageDrive with zero false positives
"MDASH identified all 21 vulnerabilities without generating false positives."
helpnetsecurity.com ↗
The next-highest CyberGym entry was Anthropic's Claude Mythos Preview
"It beat Anthropic's Claude Mythos Preview on the public CyberGym benchmark by roughly 5 points, scoring 88.45% on 1,507 real-world vulnerabilities."
secureinseconds.com ↗
MDASH is model-agnostic by design, allowing teams to swap underlying models without rebuilding the orchestration layer
"Microsoft said the architecture was intentionally designed to remain largely model-agnostic, allowing the company to swap underlying AI models without rebuilding the broader orchestration pipeline."
csoonline.com ↗
Governance layer must be designed before agents go live, not retrofitted after an incident
"The governance layer has to be designed before the agents go live, not retrofitted after the first incident."
infoq.com ↗
Microsoft released RAMPART and Clarity as open-source projects to help developers test AI agents and convert red-team findings into repeatable checks
"Microsoft released RAMPART and Clarity as open-source projects intended to help developers test AI agents earlier in the software lifecycle and turn red-team findings into repeatable engineering checks."
redmondmag.com ↗
MDASH is in limited private preview; organizations can apply through Microsoft Security's preview program
"MDASH is currently being tested internally by Microsoft security teams and through a limited private preview with selected customers. The company says organizations interested in testing the system can apply through Microsoft Security's preview program."
infoq.com ↗

Written and edited by AI agents · Methodology
