Researchers at Meta Superintelligence Labs and Rice University have published SIRA, a training-free retrieval architecture that collapses multi-round exploratory search into a single, corpus-aware BM25 query. Across ten BEIR benchmarks and downstream question-answering evaluations, the authors report that SIRA outperforms both dense vector retrievers and state-of-the-art multi-round agentic baselines.

Most production RAG pipelines treat retrieval as a black box: an agent fires a query, inspects snippets, reformulates, and repeats until usable evidence surfaces. SIRA instead models the behavior of an expert, someone who arrives with strong priors about domain terminology and about where discriminative evidence lives, and encodes that cognition into a single retrieval action.
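For contrast, the loop SIRA replaces looks roughly like the sketch below. This is a generic agentic-RAG pattern with hypothetical search and llm callables, not any specific framework and not the paper's baselines:

```python
def agentic_retrieve(query: str, search, llm, max_rounds: int = 5):
    """Generic multi-round agentic loop: fire, inspect, reformulate.
    Each round costs one LLM judgment plus one retrieval call."""
    snippets = []
    for _ in range(max_rounds):
        snippets = search(query)
        verdict = llm(f"Do these snippets answer {query!r}? Answer yes or no.\n{snippets}")
        if verdict.strip().lower().startswith("yes"):
            break
        query = llm(f"Rewrite this query to surface better evidence:\n{query}")
    return snippets
```

SIRA's bet is that the same cognition can be front-loaded: N sequential LLM-plus-retrieval rounds become one LLM expansion call and one BM25 call.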

The architecture operates on two parallel tracks. On the corpus side, an LLM runs offline over each document and injects missing search vocabulary: technical synonyms, entity variants, and domain-specific jargon that users might query but authors never wrote. On the query side, at inference time, the LLM predicts evidence vocabulary the user's query omits — terms likely to appear in the target document but absent from the question. A lightweight filter then eliminates proposed terms that are absent from the corpus, too common to carry discriminative weight, or unlikely to create retrieval margin. The surviving terms are combined with the original query in a single weighted BM25 call.
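The sketch below illustrates that query-side flow. It is not the released implementation: expand_query, filter_terms, sira_retrieve, and the llm callable are hypothetical names, the prompt is invented, and the open-source rank-bm25 package stands in for the scorer.

```python
from collections import Counter

from rank_bm25 import BM25Okapi  # pip install rank-bm25

def expand_query(query: str, llm) -> list[str]:
    """Predict evidence vocabulary: terms likely to appear in the target
    document but absent from the question. Hypothetical prompt."""
    prompt = ("List, comma-separated, the terms a document answering this "
              f"query would likely contain but the query omits:\n{query}")
    return [t.strip().lower() for t in llm(prompt).split(",")]

def filter_terms(terms: list[str], df: Counter, n_docs: int,
                 max_df_ratio: float = 0.05) -> list[str]:
    """Drop terms absent from the corpus or too common to be
    discriminative. (The paper's third filter, the retrieval-margin
    check, is omitted from this sketch.)"""
    return [t for t in terms if 0 < df[t] <= max_df_ratio * n_docs]

def sira_retrieve(query: str, corpus_tokens: list[list[str]], llm,
                  expansion_weight: int = 2):
    bm25 = BM25Okapi(corpus_tokens)
    df = Counter(t for doc in corpus_tokens for t in set(doc))
    expanded = filter_terms(expand_query(query, llm), df, len(corpus_tokens))
    # rank-bm25 exposes no per-term weights; repeating an expansion term
    # is a crude stand-in for up-weighting it in the single query.
    weighted_query = query.lower().split() + expanded * expansion_weight
    return bm25.get_scores(weighted_query)  # the one lexical call
```

Term repetition saturates under BM25's k1 parameter, so a production backend would expose true per-term boosts instead; the structure, expand then filter then one call, is the point.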

That final step, one lexical retrieval call, is the core architectural bet. BM25's IDF weighting rewards rare, discriminative terms, so the domain jargon that dense embeddings dilute becomes a high-signal feature. The index is auditable: engineers can trace exactly which expanded keywords matched and why, a transparency dense retrievers cannot offer. SIRA adds no learned parameters on top of BM25 and requires no fine-tuning on domain-specific relevance labels. This is deliberate; click-through supervision for neural rankers is collapsing as AI-generated summaries suppress downstream link clicks.
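The IDF arithmetic makes the reward for rare terms concrete. Using the standard Lucene-style BM25 IDF (a textbook formulation, not taken from the paper):

```python
import math

def bm25_idf(df: int, n_docs: int) -> float:
    # Lucene-style BM25 IDF; the +1 keeps the value nonnegative.
    return math.log((n_docs - df + 0.5) / (df + 0.5) + 1)

N = 1_000_000
print(round(bm25_idf(200_000, N), 2))  # common word: 1.61
print(round(bm25_idf(15, N), 2))       # rare jargon term: 11.07
```

A single injected jargon term can dominate the score for the handful of documents that contain it, which is exactly the retrieval margin the query-side expansion is trying to create.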

The enterprise implications are concrete. First, deployment cost: SIRA eliminates the latency and infrastructure overhead of dense vector indices — no GPU-accelerated approximate nearest-neighbor search, no embedding refresh pipelines. A BM25 index over large internal knowledge bases is cheap to maintain and update incrementally. Second, controllability: because the final retrieval call is lexical, application teams can audit, override, and explain results — a requirement in regulated industries where black-box retrieval is a compliance liability. Third, compositional queries: as enterprise users issue multi-constraint, multi-step requests, pure similarity search degrades; BM25 with explicit term weighting handles must-include and must-exclude constraints with predictable semantics.
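SIRA itself issues a single weighted BM25 call, but the third point is easiest to see in a lexical backend's query DSL. The Elasticsearch-style bool query below illustrates must-include, must-exclude, and explicit boosting; the choice of Elasticsearch is an assumption for illustration, not the paper's stack.

```python
# Illustrative Elasticsearch/OpenSearch DSL for constrained lexical search.
query = {
    "query": {
        "bool": {
            "must": [  # must-include: hard lexical constraint
                {"match": {"body": "indemnification clause"}}
            ],
            "must_not": [  # must-exclude: hard negative constraint
                {"match": {"body": "template"}}
            ],
            "should": [  # soft expansion term with an explicit boost
                {"match": {"body": {"query": "hold harmless", "boost": 2.0}}}
            ],
        }
    }
}
```

Every clause is inspectable after the fact, which is the auditability property dense nearest-neighbor search lacks.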

The training-free design favors knowledge-operations teams managing rapidly evolving corpora. New documents are enriched offline by the LLM and indexed immediately; there is no retraining cycle. That is an operational advantage over dense retrieval systems, which need periodic embedding-model updates to keep pace with domain drift.
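A sketch of what that ingest path might look like; the enrichment prompt and the appended search-vocabulary field are assumptions for illustration, not details from the paper:

```python
def enrich_document(text: str, llm) -> str:
    """Offline corpus-side enrichment: append searcher vocabulary the
    author never wrote, so BM25 can match it. Hypothetical prompt."""
    prompt = (
        "List technical synonyms, entity variants, and domain jargon a "
        "searcher might use for this document but its author never wrote:\n"
        + text[:4000]  # truncate long documents for the LLM call
    )
    extra = llm(prompt)
    # The enriched text is indexed immediately: no retraining,
    # no re-embedding, no index-wide refresh.
    return text + "\n[search-vocabulary] " + extra
```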

The BEIR benchmarks SIRA targets are predominantly English, open-domain corpora; performance on highly specialized technical corpora — proprietary codebases, legal document stores, clinical notes — remains unmeasured. The offline document enrichment step also introduces an LLM inference cost that scales with corpus size; the paper does not quantify this at production scale. Code will be released at github.com/facebookresearch/sira, which will allow enterprise teams to benchmark against their own retrieval stacks.

The retrieval layer of enterprise AI stacks has trended toward dense embeddings for five years. SIRA's results suggest that pairing LLM cognition with classical lexical retrieval — rather than replacing lexical retrieval entirely — may be the more defensible architecture for auditable, cost-efficient, production-grade knowledge systems.
