Doc-to-LoRA and similar hypernetwork-based methods collapse to 46.4% accuracy when an input document contradicts knowledge baked into the model's pretraining — and fall to 16% on the hardest conflicts. A new paper from Shuaizhi Cheng, Xiang Shi, and Mingwei Li finds that these instant adaptation methods, which encode an entire document into model weights via a single forward pass, fail precisely where correction matters most.

The mechanism is a magnitude mismatch the authors call the "override gap." A hypernetwork generates per-layer LoRA adapters from the input document, and the weight deltas land in the right layers. The problem is amplitude. The pretrained model's internal margin on a deeply ingrained fact (how confidently it leans toward the memorized answer) scales with how frequently that fact appeared during training. The adapter margin the hypernetwork produces is approximately constant across all documents. Deeply entrenched facts simply overpower the incoming signal.
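To see the mechanism in one place, here is a toy numerical sketch in Python. The log-growth functional form and both constants are illustrative assumptions, not quantities from the paper; the point is only that a margin which grows with pretraining frequency eventually exceeds any fixed adapter margin.

```python
import math

# Toy illustration of the override gap. The growth rate and the constant
# adapter margin are hypothetical, chosen only to show the mechanism.

def base_margin(pretrain_frequency: int) -> float:
    # Assumption: the model's confidence margin on a memorized fact grows
    # roughly with the log of how often it appeared in pretraining.
    return 2.0 * math.log1p(pretrain_frequency)

ADAPTER_MARGIN = 4.0  # approximately constant across documents

for freq in (1, 10, 100, 10_000):
    prior = base_margin(freq)
    print(f"freq={freq:>6}  prior={prior:5.2f}  adapter={ADAPTER_MARGIN:.2f}  "
          f"override {'succeeds' if ADAPTER_MARGIN > prior else 'FAILS'}")
```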

The authors operationalize this with a prior-strength gradient across 194 conflict questions. When questions are sorted by the base model's log-probability on the contradicted fact, baseline Doc-to-LoRA accuracy runs from 68% on weak-prior questions down to 16% on strong-prior ones, a 52 percentage-point spread. For production systems the pattern is stark: instant internalization performs worst exactly where the model most needs to be corrected.
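The same diagnostic is straightforward to run against any adapted model. A minimal sketch, assuming each conflict question carries a precomputed base-model log-probability for the contradicted answer; the field names and tier cutoffs below are hypothetical:

```python
from statistics import mean

# Hypothetical records: prior_logprob is the base model's log-probability
# on the contradicted (pretraining) answer; correct is whether the adapted
# model answered from the document.
questions = [
    {"id": "q1", "prior_logprob": -0.2, "correct": False},  # strong prior
    {"id": "q2", "prior_logprob": -3.1, "correct": True},   # weak prior
    # ... the paper's setup uses 194 conflict questions
]

def tier(logprob: float) -> str:
    # Illustrative cutoffs; the paper sorts by prior log-probability
    # and the exact thresholds are not given here.
    if logprob > -0.5:
        return "strong"
    if logprob > -2.0:
        return "medium"
    return "weak"

by_tier: dict[str, list[bool]] = {}
for q in questions:
    by_tier.setdefault(tier(q["prior_logprob"]), []).append(q["correct"])

for name, outcomes in sorted(by_tier.items()):
    print(f"{name}-prior accuracy: {mean(outcomes):.0%} "
          f"over {len(outcomes)} question(s)")
```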

FIG. 02 Doc-to-LoRA accuracy drops 52 pp from weak- to strong-prior conflicts — the "override gap." — arxiv 2604.23750

Two training-free mitigations close most of that gap. Selective Layer Boosting (SLB) identifies the adapter's highest-norm layers and scales their output, applying force where the signal is already strongest. Conflict-Aware Internalization (CA) adds a gating step: it detects when the base model is highly confident in a contradicted fact and triggers boosted scaling only in those cases, leaving uncontested novel knowledge untouched. Neither technique requires retraining the hypernetwork. Applied together on Gemma-2B, the two techniques lift deep-conflict accuracy from 46.4% to 71.0%; on Mistral-7B, the same combination raises it from 53.6% to 72.5%.
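A minimal sketch of how the two fixes compose at inference time, assuming the hypernetwork emits per-layer LoRA factors as a dict of (A, B) pairs. The layer names, top_k, boost factor, and confidence threshold are illustrative choices, not values reported in the paper:

```python
import torch

def selective_layer_boost(adapters, top_k=4, boost=2.0):
    """SLB sketch: amplify the adapter delta (B @ A) only in the
    highest-norm layers, where the document signal already concentrates."""
    norms = {name: (B @ A).norm().item() for name, (A, B) in adapters.items()}
    top = set(sorted(norms, key=norms.get, reverse=True)[:top_k])
    return {name: (A, boost * B) if name in top else (A, B)
            for name, (A, B) in adapters.items()}

def conflict_aware(adapters, prior_logprob, threshold=-0.7, **slb_kwargs):
    """CA sketch: boost only when the base model is confident in the
    contradicted fact; otherwise return the adapter as generated."""
    if prior_logprob > threshold:   # entrenched prior detected
        return selective_layer_boost(adapters, **slb_kwargs)
    return adapters                 # uncontested novel knowledge: untouched

# Toy usage: 12 layers of rank-8 LoRA factors for a hidden size of 64.
adapters = {f"layer_{i}": (torch.randn(8, 64), 0.01 * torch.randn(64, 8))
            for i in range(12)}
patched = conflict_aware(adapters, prior_logprob=-0.3)  # strong prior: boosted
```

The gate is what makes the combination safe to apply by default: on documents that introduce uncontested novel facts, the prior log-probability stays low and the adapter passes through unchanged.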

FIG. 03 SLB + CA lifts deep-conflict accuracy by 19–25 pp on both models, no extra training needed. — arxiv 2604.23750

The enterprise architecture implication follows directly. Weight-based instant adaptation eliminates retrieval latency and prompt-length overhead, but those gains concentrate in the easy case: novel facts the model has never seen. For compliance documents, policy updates, or any material that overrides commonly believed information, the adapted model will silently endorse its prior belief at a rate that may exceed 80% on strongly entrenched facts. RAG, which places authoritative text in context at inference time, does not share this failure mode. The SLB+CA combination beats vanilla RAG on medium-conflict questions by 18 percentage points, but the paper does not claim to fully close the gap on deep conflicts, and RAG's advantage on those cases remains a relevant baseline for risk-sensitive deployments.

The team also releases KID-Bench, a 489-question benchmark partitioned into three task types: novel-fact recall, cross-knowledge combination, and prior-graded conflicts segmented into Light, Medium, and Deep tiers. Existing benchmarks for instant adaptation tested only the easy case, recall of facts not previously known to the model. KID-Bench is the first to force evaluation under adversarial prior conditions.
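For orientation, a hypothetical record layout matching that partition; the field names are an assumption, not the benchmark's released schema:

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class KIDItem:
    # Hypothetical KID-Bench-style item, inferred from the partition
    # described above rather than the released files.
    question: str
    document: str                  # text the model must internalize
    answer: str                    # answer supported by the document
    task: Literal["novel_recall", "cross_combination", "conflict"]
    conflict_tier: Optional[Literal["light", "medium", "deep"]] = None

# Conflict items carry a tier; the other two task types leave it None.
item = KIDItem(question="...", document="...", answer="...",
               task="conflict", conflict_tier="deep")
```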

Whether architectures trained with conflict-awareness from the start would exhibit the same failure mode remains unclear, as does whether the override gap is inherent to adapter-rank constraints. The benchmark covers 489 questions across two backbone models; generalization to larger models or domain-specific fine-tuned backbones is untested. Enterprises evaluating weight-based adaptation pipelines now have both a diagnostic tool (KID-Bench) and two drop-in inference-time fixes — but the fundamental tradeoff between retrieval-free deployment and reliable override of prior knowledge has not been engineered away.
