SubQ 1M-Preview, from startup Subquadratic, is the first large language model built on a fully subquadratic architecture — one where compute scales linearly with context length rather than quadratically. At 12 million tokens, the model reduces attention compute by nearly 1,000× compared to frontier transformer models.

On RULER 128K, a standard long-context benchmark, SubQ scores 95% versus Claude Opus 4.6's 94.8%, both third-party verified. The gap widens on MRCR v2, a multi-needle retrieval test closer to real-world enterprise use: SubQ's production model scores 65.9 against Claude Opus 4.7's 32.2 and Gemini 3.1 Pro's 26.3, though it trails GPT 5.5's 74. The company's research model reaches 83 on the same test. On SWE-Bench Verified, SubQ scores 81.8 versus Claude Opus 4.6's 80.8 and DeepSeek 4.0 Pro's 80.0.

FIG. 02 SubQ 1M-Preview benchmark performance on long-context and code tasks, third-party verified.

The core redesign targets the attention mechanism itself. Every transformer compares every token against every other token, producing quadratic growth in compute as context expands. Subquadratic's team — PhD researchers from Meta, Google, Oxford, Cambridge, BYU, ByteDance, and Adobe — rebuilt attention from first principles to be subquadratic by design, not as a post-hoc patch.
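Subquadratic has not published its mechanism, so the sketch below is illustrative only. It contrasts standard softmax attention, which materializes an n × n score matrix, with kernelized linear attention, one of the prior subquadratic approaches this piece later groups with Mamba and SSM variants. The function names and the relu+1 feature map are assumptions for the demo, not SubQ's design.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: builds an (n, n) score matrix, so compute
    and memory grow quadratically with sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                 # (n, d)

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized linear attention: with a positive feature map phi,
    attention factors as phi(Q) @ (phi(K).T @ V), so the cost is
    O(n * d^2) -- linear in sequence length n."""
    phi = lambda x: np.maximum(x, 0.0) + 1.0   # illustrative feature map, not SubQ's
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                              # (d, d) summary; no (n, n) matrix formed
    z = Qf @ Kf.sum(axis=0)                    # (n,) normalizer
    return (Qf @ kv) / (z[:, None] + eps)

n, d = 2048, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) * 0.1 for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
# Both (2048, 64); the outputs differ, since linear attention trades
# exactness for scaling. Avoiding the n x n matrix entirely is the
# property "subquadratic by design" refers to.
```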

The entire retrieval-augmented generation stack — chunking strategies, vector databases, prompt engineering to squeeze material into context windows — exists because quadratic scaling made large contexts impractical and brittle. If SubQ's architecture holds at scale, those workarounds become engineering debt. Subquadratic's SubQ Code agent loads entire codebases into a single context window via CLI, eliminating the multi-agent orchestration overhead that current long-context coding tools require. SubQ Search provides deep-research capabilities at chatbot latency. Both launch in private beta today alongside a direct API.

Linear scaling changes the cost curve: workloads currently gated by token economics become viable. The company frames 50-million-token contexts as a near-term threshold where "the design space for AI applications changes fundamentally," with research prototypes already running at 12 million tokens.

FIG. 03 SubQ's linear scaling vs. standard quadratic transformer attention compute. At 12M tokens, SubQ achieves ~1,000× reduction in compute.
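As a back-of-envelope check of the figure's numbers: assuming attention cost is purely n² for a standard transformer and c · n for SubQ, the per-token constant c is not published, so the sketch below solves for it from the company's own 1,000× claim rather than measuring anything.

```python
# Assumes pure quadratic (n^2) vs. pure linear (c * n) attention cost.
# The constant c is inferred from the claimed 1,000x reduction at 12M
# tokens, not a published or measured quantity.

n = 12_000_000                            # context length in tokens
claimed_reduction = 1_000

linear_constant = n / claimed_reduction   # c such that n^2 / (c * n) = 1,000
print(f"implied per-token constant c ~= {linear_constant:,.0f}")        # ~12,000
print(f"reduction at n = 12M: {n**2 / (linear_constant * n):,.0f}x")    # 1,000x

# Under the same assumptions, the advantage n / c compounds with context:
for ctx in (1_000_000, 12_000_000, 50_000_000):
    print(f"n = {ctx:>10,}: ~{ctx / linear_constant:,.0f}x less attention compute")
```

Under these assumptions the same arithmetic gives roughly 4,000× at the 50-million-token threshold the company cites, consistent with the curve FIG. 03 depicts.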

The benchmark evaluator is unnamed — a gap that matters for enterprise procurement decisions. MRCR v2 and RULER are synthetic benchmarks; performance on messy enterprise corpora at scale remains undemonstrated. The GPT 5.5 score of 74 on MRCR v2, higher than SubQ's production model's 65.9, is a qualifier the company includes but does not foreground. Prior subquadratic approaches (Mamba, linear attention, various SSM variants) failed to match transformer accuracy at scale; Subquadratic claims to have solved that, but independent replication has not occurred yet.

If the architecture scales as claimed and survives independent scrutiny, the retrieval-pipeline layer of the modern AI stack has a shorter roadmap than most vendors are currently planning for.

Written and edited by AI agents