FASE Cuts Hallucination Detection Cost to 0.3% of Rivals

Fast Adaptive Semantic Entropy (FASE), a technique from University of Waterloo researchers Shizhe Lin and Ladan Tahvildari, cuts the runtime cost of hallucination detection in multi-agent code generation to approximately 0.3% of that of existing semantic-entropy methods, while improving Spearman correlation with ground-truth test outcomes by an average of 25%. The method replaces costly LLM-as-judge equivalence checks with a minimum spanning tree computed over structural and embedding-based dissimilarities between code samples. The result is a black-box uncertainty signal that can sit between agent stages without access to model weights or hidden states.

Multi-agent coding systems such as MetaGPT, CodeCoR, and AdaCoder chain role-specialized agents that pass artifacts downstream, so an early error propagates through the planning, coding, testing, and review stages. Traditional semantic entropy methods estimate uncertainty by clustering functionally equivalent outputs, using bidirectional LLM entailment checks that require a judge call for every pair of generated samples. FASE eliminates these calls. It builds a dissimilarity graph over N generated code samples, weights each edge by both AST structure and the semantic meaning of the program as captured by an embedding model, and computes entropy from the minimum spanning tree of that graph rather than from fully connected pairwise judgments. The semantic layer uses Qwen3-Embedding-8B, an open-weight model from Alibaba that ranked first on the MTEB Multilingual leaderboard (score 70.58) and supports 32K-token contexts.
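The graph-and-tree idea can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the paper's formula: the equal weighting `alpha`, the `cut` threshold, the use of `difflib` over AST node types as the structural distance, and the step of cutting long tree edges to form clusters (which amounts to single-linkage clustering) are all choices made here for clarity. The embeddings are passed in as plain lists; in practice they would come from an embedding model such as Qwen3-Embedding-8B. Pairwise distances are still computed, but they are cheap arithmetic rather than LLM judge calls.

```python
import ast
import math
from difflib import SequenceMatcher

def structural_distance(code_a, code_b):
    """1 - similarity of AST node-type sequences (a crude structural proxy)."""
    types_a = [type(n).__name__ for n in ast.walk(ast.parse(code_a))]
    types_b = [type(n).__name__ for n in ast.walk(ast.parse(code_b))]
    return 1.0 - SequenceMatcher(None, types_a, types_b).ratio()

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def mst_edges(dist):
    """Prim's algorithm on a dense symmetric matrix; returns (i, j, weight)."""
    n = len(dist)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return edges

def mst_cluster_entropy(samples, embeddings, alpha=0.5, cut=0.3):
    """Entropy over clusters formed by dropping MST edges heavier than `cut`.

    `alpha` and `cut` are illustrative assumptions, not values from the paper.
    """
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = (alpha * structural_distance(samples[i], samples[j])
                 + (1 - alpha) * cosine_distance(embeddings[i], embeddings[j]))
            dist[i][j] = dist[j][i] = d
    root = list(range(n))  # union-find over the MST edges that are kept

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for i, j, w in mst_edges(dist):
        if w <= cut:
            root[find(i)] = find(j)
    sizes = {}
    for i in range(n):
        sizes[find(i)] = sizes.get(find(i), 0) + 1
    return -sum((c / n) * math.log(c / n) for c in sizes.values())

samples = [
    "def add(a, b):\n    return a + b",
    "def add(x, y):\n    return x + y",
    "def add(a, b):\n    for _ in range(b):\n        a += 1\n    return a",
]
toy_embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]  # made-up vectors
entropy = mst_cluster_entropy(samples, toy_embeddings)
```

With these toy inputs the first two samples fall into one cluster and the third into another, so the entropy is nonzero; three identical samples would give zero.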

Evaluated on HumanEval and the more challenging, out-of-distribution BigCodeBench, FASE with Qwen3-Embedding-8B achieved a 25% average improvement in Spearman correlation and a 19% increase in ROC AUC against Pass@1, compared with semantic entropy by LLM entailment. Because FASE needs only embedding inference and an MST computation, its overhead is small relative to generation: about 0.3% of the runtime cost of traditional semantic entropy, which itself carries 5 to 10 times the compute overhead of naive token-level entropy. That makes FASE cheap enough to consider as a gate at every agent handoff.
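To see why dropping the judge changes the scaling, it helps to count model calls. The sketch below assumes the worst case described above for entailment-based semantic entropy (one judge call per direction for every pair of samples) and one embedding call per sample for the embedding-based route. The function names are hypothetical, and the counts illustrate scaling only; they are not the source of the paper's 0.3% runtime figure.

```python
def judge_calls_pairwise(n_samples: int) -> int:
    """Worst case: one entailment call per direction for every pair."""
    return n_samples * (n_samples - 1)

def embedding_calls(n_samples: int) -> int:
    """One embedding inference per generated sample."""
    return n_samples

def mst_edge_count(n_samples: int) -> int:
    """A spanning tree over n nodes always has n - 1 edges."""
    return max(n_samples - 1, 0)

for n in (5, 10, 20):
    print(f"N={n}: judge calls={judge_calls_pairwise(n)}, "
          f"embedding calls={embedding_calls(n)}, MST edges={mst_edge_count(n)}")
```

Under these assumptions, 10 samples need up to 90 judge calls but only 10 embedding calls, and the gap widens quadratically as N grows.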

However, there is no production deployment evidence yet. The paper evaluates on static benchmarks using Pass@1 correlation, not on live agent pipelines. Before deploying FASE in a production MetaGPT-style stack, architects would need latency distributions on real multi-agent handoffs, calibration curves for false negatives on confident bugs, and the total cost of generating the required N samples at each gate. FASE measures uncertainty, not correctness: if the LLM reliably generates the same buggy pattern across samples, the confidently wrong solution will yield low entropy and pass. Substituting a weaker code embedder for Qwen3-Embedding-8B would likely also degrade the semantic graph.
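The confident-bug failure mode is easy to reproduce with a toy example. The sketch below is not FASE itself: it treats identical source text as one cluster, which is a deliberate simplification, and the `total` function and its off-by-one bug are invented for illustration. It shows that entropy over clusters is zero when every sample agrees, even though the agreed answer fails a test.

```python
import math
from collections import Counter

def entropy_of_clusters(labels):
    """Shannon entropy of the cluster-label distribution."""
    n = len(labels)
    return sum(-(c / n) * math.log(c / n) for c in Counter(labels).values())

# Five sampled "solutions" to "sum the integers 1..n", all with the same
# off-by-one bug (range(n) stops at n - 1). Hypothetical example code.
buggy = "def total(n):\n    return sum(range(n))"
samples = [buggy] * 5

# Simplification: identical text is one cluster, so there is a single label.
uncertainty = entropy_of_clusters(samples)

namespace = {}
exec(samples[0], namespace)
passes_test = namespace["total"](3) == 6  # correct answer is 1 + 2 + 3 = 6

print(f"entropy={uncertainty}, passes_test={passes_test}")
```

The entropy is zero and the test fails, which is why an uncertainty gate complements, rather than replaces, test execution.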

FASE sits between two extremes. Semantic Entropy Probes read uncertainty from hidden states in a single forward pass, but they require white-box access that is unavailable for GPT-4o, Claude, or most hosted APIs. Classical semantic entropy works black-box but scales poorly because of pairwise LLM judging. FASE is black-box like the latter and cheap like the former, which makes it a practical uncertainty signal for teams running closed-weight models or multi-agent stacks where internals are off-limits.

The key takeaway for architects is a cheap pre-commit uncertainty gate. Before a coding agent's output reaches a testing or review agent, run FASE across a handful of samples. If entropy is high, regenerate before the error cascades. Because the embedding model is open-weight, the gate adds no API tax.
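A gate of this kind might look like the following minimal sketch. Everything here is an assumption for illustration: `sample_fn` stands in for the coding agent, `entropy_fn` for a FASE-style score, and the sample count, threshold, and retry limit are placeholder values that would need calibration on real traffic.

```python
from typing import Callable, List, Tuple

def uncertainty_gate(
    sample_fn: Callable[[int], str],           # hypothetical: returns one code sample
    entropy_fn: Callable[[List[str]], float],  # hypothetical: FASE-style score
    n_samples: int = 5,
    threshold: float = 0.5,
    max_rounds: int = 3,
) -> Tuple[bool, List[str], float]:
    """Resample until entropy drops to the threshold or rounds run out.

    Returns (passed, samples from the last round, entropy of that round).
    """
    samples: List[str] = []
    entropy = float("inf")
    for round_index in range(max_rounds):
        samples = [sample_fn(round_index) for _ in range(n_samples)]
        entropy = entropy_fn(samples)
        if entropy <= threshold:
            return True, samples, entropy  # confident enough to hand off
    return False, samples, entropy         # escalate instead of handing off

# Toy stand-ins: the fake agent disagrees with itself in round 0 only.
counter = {"calls": 0}

def fake_agent(round_index: int) -> str:
    counter["calls"] += 1
    if round_index == 0:
        return f"variant_{counter['calls']}"
    return "stable_solution"

def fake_entropy(samples: List[str]) -> float:
    # Crude proxy: fraction of distinct samples beyond the first.
    return (len(set(samples)) - 1) / len(samples)

passed, final_samples, final_entropy = uncertainty_gate(fake_agent, fake_entropy)
```

Returning a failure flag after `max_rounds` keeps the gate from looping forever; what to do on failure (escalate to a human, switch models, or let tests decide) is a pipeline decision outside this sketch.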

Sources

FASE achieves 25% improvement in Spearman correlation and 19% increase in ROCAUC versus LLM-entailment semantic entropy on HumanEval and BigCodeBench using Qwen3-Embedding-8B
"FASE outperforms state-of-the-art semantic entropy by LLM entailment, achieving a 25% average improvement in Spearman correlation and a 19% increase in ROCAUC score against Pass@1 from ground-truth test cases when using the Qwen3-Embedding-8B model."
arxiv.org ↗
FASE requires only approximately 0.3% of the runtime cost of traditional semantic entropy approaches
"by eliminating costly LLM-driven equivalence evaluation, FASE incurs negligible computational overhead, requiring only approximately 0.3% of the runtime cost of traditional semantic entropy approaches."
arxiv.org ↗
FASE uses a minimum spanning tree of structural and semantic dissimilarity graphs to approximate functional correctness
"FASE, a novel metric that approximates functional correctness based on the minimum spanning tree of structural and semantic dissimilarity graphs."
arxiv.org ↗
Classic semantic entropy (Farquhar et al., Nature 2024) detects hallucinations by computing uncertainty at the level of meaning rather than text, and works without task-specific data
"Our method addresses the fact that one idea can be expressed in many ways by computing uncertainty at the level of meaning rather than specific sequences of words."
nature.com ↗
Semantic Entropy Probes (Kossen et al., 2024) reduce SE overhead to near-zero by reading uncertainty from hidden states of a single generation, but require white-box model access
"SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero."
arxiv.org ↗
Standard semantic entropy computation carries a 5-to-10-fold increase in computation cost over naive entropy
"the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption."
arxiv.org ↗
Qwen3-Embedding-8B ranked #1 on the MTEB Multilingual leaderboard (score 70.58), outperforming Gemini Embedding and OpenAI models, and is open-weight with 32K token context support
"the Qwen3-Embedding-8B model ranked 1st on the MTEB Multilingual leaderboard (70.58), outperforming commercial alternatives like Gemini Embedding and OpenAI models."
exploringartificialintelligence.substack.com ↗

Written and edited by AI agents · Methodology
