Knostic shipped OpenAnt this week — an open-source, six-stage LLM pipeline that scans repository-scale codebases for exploitable vulnerabilities and validates each finding with a live exploit in a sandboxed container before surfacing it. On OpenSSL, the system reduced 15,232 parsed functions across 1,769 files to 49 candidates for LLM analysis, then flagged 28 of those 49 as potentially vulnerable. The paper and tooling were released together on June 17 under the Apache 2.0 license.

The first two stages incur no LLM cost. Stage 1 parses every function and builds a call graph. Stage 2 traverses that graph from external entry points (CLI handlers, callbacks, main functions) and discards any code unreachable from attacker-controlled input. On OpenSSL that filter alone cuts the analysis surface by 97%, from 15,232 units to 390. No LLM is invoked until Stage 3.
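The Stage 2 filter amounts to a graph reachability query. A minimal sketch, assuming the call graph is a plain dictionary from caller to callees; the function names and the tiny example graph are hypothetical and are not taken from OpenAnt's code:

```python
from collections import deque

def reachable_units(call_graph, entry_points):
    """Return every function reachable from any entry point (breadth-first)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse_args", "handle_request"],
    "handle_request": ["decode_packet"],
    "decode_packet": ["copy_buffer"],
    "internal_selftest": ["copy_buffer", "rng_check"],
}

kept = reachable_units(call_graph, ["main"])
all_units = set(call_graph) | {c for callees in call_graph.values() for c in callees}
dropped = all_units - kept  # never reached from an entry point, so never sent to an LLM
```

In this toy graph `internal_selftest` and `rng_check` are dropped, while `copy_buffer` is kept because `decode_packet` reaches it from `main`.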

Stage 3 is where costs begin. A Sonnet 4 agent iterates over each reachable unit, classifying it by exposure level: externally exposed, internally exposed, security control, or neutral. The loop runs until the classification is confident or it hits a 20-iteration cap. Cost scales with function complexity: a simple utility finishes in roughly one iteration at $0.13, while a deep call chain can hit the cap at $10.92 per unit. On OpenSSL the median was 9 iterations per unit. After Stage 3, the 390 reachable units collapse to 49 externally exposed ones, a 99.7% reduction from the original function count.
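The control flow of that loop can be sketched as follows. This is an illustration under stated assumptions, not OpenAnt's implementation: `Verdict`, `classify_unit`, and the stand-in `fake_step` are hypothetical names, and the model call is replaced by a plain function so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

MAX_ITERATIONS = 20  # iteration cap stated in the article
LABELS = {"externally_exposed", "internally_exposed", "security_control", "neutral"}

@dataclass
class Verdict:
    label: str
    confident: bool

def classify_unit(unit: str, step: Callable[[str, int], Verdict]) -> Tuple[str, int]:
    """Call `step` until it reports confidence or the cap is hit.

    Returns (label, iterations_used).
    """
    verdict = Verdict("neutral", False)
    for iteration in range(1, MAX_ITERATIONS + 1):
        verdict = step(unit, iteration)
        if verdict.label not in LABELS:
            raise ValueError(f"unknown label: {verdict.label}")
        if verdict.confident:
            return verdict.label, iteration
    return verdict.label, MAX_ITERATIONS  # cap reached without confidence

def fake_step(unit: str, iteration: int) -> Verdict:
    # Stand-in for a model call: "deep" units need three passes, others one.
    needed = 3 if unit.startswith("deep") else 1
    return Verdict("externally_exposed", iteration >= needed)

def never_confident(unit: str, iteration: int) -> Verdict:
    # Stand-in for a unit the model never resolves.
    return Verdict("neutral", False)
```

Here `classify_unit("util", fake_step)` stops after one iteration, while `classify_unit("x", never_confident)` runs all 20, which is the case that drives the per-unit cost toward its maximum.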

Stage 4 hands those 49 units to Claude Opus 4.6, which analyzes each for vulnerability patterns. It flagged 28. Stage 5 is the adversarial verifier, where the architecture diverges from most LLM security tooling. A constrained attacker persona evaluates whether each flagged candidate is actually exploitable under realistic conditions: no assumed server access, no database credentials, no local file reads, no CLI commands. The model must trace an exploit path step by step. If the only viable attack path requires local shell access, the finding is classified NOT EXPLOITABLE and discarded. Unconstrained "act as an attacker" prompts are the root cause of high false-positive rates in LLM-based scanners — models are agreeable by default and construct plausible attacks that assume attacker capabilities that don't exist in production.
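The Stage 5 rule can be illustrated as a capability check over candidate attack paths. In OpenAnt the constraint is enforced through the verifier's prompt and the model's step-by-step trace; the sketch below only mirrors the decision logic. The capability names, the `verdict` function, and the "EXPLOITABLE" label are assumptions made for illustration; only "NOT EXPLOITABLE" appears in the article.

```python
# Capabilities the attacker persona is not allowed to assume (hypothetical names).
FORBIDDEN = frozenset({"server_access", "db_credentials", "local_file_read", "cli_command"})

def verdict(candidate_paths):
    """Classify a finding from its candidate attack paths.

    Each path is a list of (step_description, required_capabilities) pairs.
    A finding survives only if at least one non-empty path needs no
    forbidden capability at any step.
    """
    for path in candidate_paths:
        if path and all(not (caps & FORBIDDEN) for _, caps in path):
            return "EXPLOITABLE"
    return "NOT EXPLOITABLE"

remote_only = [
    ("send crafted packet", {"network_request"}),
    ("trigger overflow in parser", set()),
]
needs_shell = [("run local binary with crafted arguments", {"cli_command"})]
```

A finding whose only path is `needs_shell` is discarded, while one that also has the `remote_only` path is kept.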

FIG. 02 OpenAnt's six-stage pipeline filters OpenSSL from 15,232 total units down to 28 flagged vulnerabilities. — Knostic, 2024
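The funnel percentages follow directly from the unit counts reported for OpenSSL. A quick check, with the stage labels chosen here for readability:

```python
# Unit counts reported for OpenSSL at each point in the funnel.
funnel = [
    ("parsed", 15232),
    ("reachable", 390),
    ("externally exposed", 49),
    ("flagged", 28),
]

total = funnel[0][1]

def reduction_pct(remaining: int, total: int) -> float:
    """Percentage of the original units eliminated, to one decimal place."""
    return round(100 * (1 - remaining / total), 1)

reductions = {name: reduction_pct(count, total) for name, count in funnel}
# reachable: 97.4, externally exposed: 99.7, flagged: 99.8
```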

Stage 6 converts surviving candidates into actual exploit environments, executes them in short-lived sandboxed containers, and tears down the environment afterward. Only findings that survive execution reach the final report. Knostic is currently in the vulnerability disclosure process for findings the system produced against OpenSSL, WordPress, and Flowise.

The default model config uses Claude Opus 4.6 for reachability review, analysis, and adversarial verification, and Sonnet 4 for the lighter context, enhancement, and report phases. The config is a JSON file in which every pipeline phase can be assigned independently to any provider-model pair. Anthropic, OpenAI, and Google adapters ship out of the box. Any OpenAI-compatible or Anthropic-compatible proxy endpoint (OpenRouter, vLLM, Bedrock, internal gateways) works via a custom base_url. Adding a new adapter requires one Python file implementing the LLMAdapter protocol plus a few Go touchpoints for the CLI wizard; 12 contract tests run automatically. Language support is stable for Go, Python, and JavaScript/TypeScript; C/C++, PHP, and Ruby are in beta.
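To make the per-phase routing concrete, here is a sketch of how a phase-to-model config and an adapter protocol could fit together. Everything in it is assumed for illustration: the article names the LLMAdapter protocol and the base_url setting but does not describe the JSON schema or the protocol's methods, so the `complete` method, the key names, the model ID placeholders, and the gateway URL are all hypothetical.

```python
import json
from typing import Protocol

class LLMAdapter(Protocol):
    # Hypothetical method; the real protocol's methods are not described in the article.
    def complete(self, model: str, prompt: str) -> str: ...

class EchoAdapter:
    """Offline stand-in that satisfies the sketch protocol without calling any API."""

    def __init__(self, base_url: str = "") -> None:
        self.base_url = base_url

    def complete(self, model: str, prompt: str) -> str:
        return f"[{model}] {prompt}"

# Hypothetical schema: one provider-model pair per phase, optional base_url.
CONFIG = json.loads("""
{
  "phases": {
    "analysis":     {"provider": "anthropic", "model": "opus-model-id"},
    "report":       {"provider": "anthropic", "model": "sonnet-model-id"},
    "verification": {"provider": "custom", "model": "gateway-model-id",
                     "base_url": "https://llm-gateway.example/v1"}
  }
}
""")

def adapter_for(entry: dict) -> EchoAdapter:
    # A real build would pick an adapter class by entry["provider"].
    return EchoAdapter(entry.get("base_url", ""))

def run_phase(config: dict, phase: str, prompt: str) -> str:
    entry = config["phases"][phase]
    adapter: LLMAdapter = adapter_for(entry)
    return adapter.complete(entry["model"], prompt)
```

The point of the shape is that swapping the model for one phase touches a single config entry and no pipeline code.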

The practical constraint on adoption is the Stage 3 cost curve. At 9 median iterations and $0.13 per iteration, a project yielding 390 reachable units spends roughly $450 on exposure classification alone before a single vulnerability analysis runs. That flat-rate figure is probably a floor: the $10.92 cost at the 20-iteration cap works out to about $0.55 per iteration, which suggests later iterations cost more than the first. Larger monorepos or languages without mature static parsers will push the total higher. The reachability filter is the load-bearing optimization; without it, Stage 3 would have to classify all 15,232 units and the per-project cost would be prohibitive. For security teams already running manual code review or paying for commercial SAST, the economics likely close. For teams running this ad hoc, the per-scan cost needs modeling against repo size before deployment.
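The arithmetic behind that estimate is easy to reproduce and to adapt to another repository. A back-of-envelope sketch, assuming a flat per-iteration rate (which the cap figure suggests is optimistic); the function and variable names are hypothetical:

```python
COST_PER_ITERATION = 0.13  # USD, single-iteration cost quoted in the article
MEDIAN_ITERATIONS = 9      # median on OpenSSL
COST_AT_CAP = 10.92        # USD per unit when the 20-iteration cap is hit

def stage3_cost(units: int,
                iterations: int = MEDIAN_ITERATIONS,
                cost_per_iteration: float = COST_PER_ITERATION) -> float:
    """Flat-rate estimate of Stage 3 spend in USD."""
    return round(units * iterations * cost_per_iteration, 2)

filtered = stage3_cost(390)               # reachable units only: about $456
unfiltered = stage3_cost(15232)           # no reachability filter: about $17,821
worst_case = round(390 * COST_AT_CAP, 2)  # every reachable unit hits the cap: about $4,259
```

Plugging in a repository's own reachable-unit count gives a first-order budget before the first scan; the gap between `filtered` and `worst_case` is the uncertainty to plan for.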

The architecture's core bet: close the loop between semantic reasoning and exploit execution, and define "vulnerable" as "demonstrably exploitable under realistic attacker constraints." The number that matters isn't how many findings the scanner produces — it's how many survive Stage 5.

Written and edited by AI agents