Doc-to-LoRA and similar hypernetwork-based methods collapse to 46.4% accuracy when an input document contradicts knowledge baked into the model's pretraining — and fall to 16% on the hardest conflicts. A new paper from Shuaizhi Cheng, Xiang Shi, and Mingwei Li finds that these instant adaptation methods, which encode an entire document into model weights via a single forward pass, fail precisely where correction matters most.

The mechanism is a magnitude mismatch the authors call the "override gap." A hypernetwork generates per-layer LoRA adapters from the input document, and the weight deltas land in the right layers. The problem is amplitude. The pretrained model's internal margin on a deeply ingrained fact (how confidently it leans toward the memorized answer) scales with how frequently that fact appeared during training. The adapter margin the hypernetwork produces is approximately constant across all documents. Deeply entrenched facts simply overpower the incoming signal.
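To see the mechanism in one place, here is a toy numerical sketch in Python. The log-growth functional form and both constants are illustrative assumptions, not quantities from the paper; the point is only that a margin which grows with pretraining frequency eventually exceeds any fixed adapter margin.

```python
import math

# Toy illustration of the override gap. The growth rate and the constant
# adapter margin are hypothetical, chosen only to show the mechanism.

def base_margin(pretrain_frequency: int) -> float:
    # Assumption: the model's confidence margin on a memorized fact grows
    # roughly with the log of how often it appeared in pretraining.
    return 2.0 * math.log1p(pretrain_frequency)

ADAPTER_MARGIN = 4.0  # approximately constant across documents

for freq in (1, 10, 100, 10_000):
    prior = base_margin(freq)
    print(f"freq={freq:>6}  prior={prior:5.2f}  adapter={ADAPTER_MARGIN:.2f}  "
          f"override {'succeeds' if ADAPTER_MARGIN > prior else 'FAILS'}")
```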

The authors operationalize this with a prior-strength gradient across 194 conflict questions. When questions are sorted by the base model's log-probability on the contradicted fact, baseline Doc-to-LoRA accuracy runs from 68% on weak-prior questions down to 16% on strong-prior ones, a 52 percentage-point spread. For production systems the pattern is stark: instant internalization performs worst exactly where the model most needs to be corrected.
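The same diagnostic is straightforward to run against any adapted model. A minimal sketch, assuming each conflict question carries a precomputed base-model log-probability for the contradicted answer; the field names and tier cutoffs below are hypothetical:

```python
from statistics import mean

# Hypothetical records: prior_logprob is the base model's log-probability
# on the contradicted (pretraining) answer; correct is whether the adapted
# model answered from the document.
questions = [
    {"id": "q1", "prior_logprob": -0.2, "correct": False},  # strong prior
    {"id": "q2", "prior_logprob": -3.1, "correct": True},   # weak prior
    # ... the paper's setup uses 194 conflict questions
]

def tier(logprob: float) -> str:
    # Illustrative cutoffs; the paper sorts by prior log-probability
    # and the exact thresholds are not given here.
    if logprob > -0.5:
        return "strong"
    if logprob > -2.0:
        return "medium"
    return "weak"

by_tier: dict[str, list[bool]] = {}
for q in questions:
    by_tier.setdefault(tier(q["prior_logprob"]), []).append(q["correct"])

for name, outcomes in sorted(by_tier.items()):
    print(f"{name}-prior accuracy: {mean(outcomes):.0%} "
          f"over {len(outcomes)} question(s)")
```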

FIG. 02 Doc-to-LoRA accuracy drops 52 pp from weak- to strong-prior conflicts — the "override gap." — arxiv 2604.23750

Two training-free mitigations close most of that gap. Selective Layer Boosting (SLB) identifies the adapter's highest-norm layers and scales their output, applying force where the signal is already strongest. Conflict-Aware Internalization (CA) adds a gating step: it detects when the base model is highly confident in a contradicted fact and triggers boosted scaling only in those cases, leaving uncontested novel knowledge untouched. Neither technique requires retraining the hypernetwork. Applied together on Gemma-2B, the two techniques lift deep-conflict accuracy from 46.4% to 71.0%; on Mistral-7B, the same combination raises it from 53.6% to 72.5%.
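A minimal sketch of how the two fixes compose at inference time, assuming the hypernetwork emits per-layer LoRA factors as a dict of (A, B) pairs. The layer names, top_k, boost factor, and confidence threshold are illustrative choices, not values reported in the paper:

```python
import torch

def selective_layer_boost(adapters, top_k=4, boost=2.0):
    """SLB sketch: amplify the adapter delta (B @ A) only in the
    highest-norm layers, where the document signal already concentrates."""
    norms = {name: (B @ A).norm().item() for name, (A, B) in adapters.items()}
    top = set(sorted(norms, key=norms.get, reverse=True)[:top_k])
    return {name: (A, boost * B) if name in top else (A, B)
            for name, (A, B) in adapters.items()}

def conflict_aware(adapters, prior_logprob, threshold=-0.7, **slb_kwargs):
    """CA sketch: boost only when the base model is confident in the
    contradicted fact; otherwise return the adapter as generated."""
    if prior_logprob > threshold:   # entrenched prior detected
        return selective_layer_boost(adapters, **slb_kwargs)
    return adapters                 # uncontested novel knowledge: untouched

# Toy usage: 12 layers of rank-8 LoRA factors for a hidden size of 64.
adapters = {f"layer_{i}": (torch.randn(8, 64), 0.01 * torch.randn(64, 8))
            for i in range(12)}
patched = conflict_aware(adapters, prior_logprob=-0.3)  # strong prior: boosted
```

The gate is what makes the combination safe to apply by default: on documents that introduce uncontested novel facts, the prior log-probability stays low and the adapter passes through unchanged.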

FIG. 03 SLB + CA lifts deep-conflict accuracy by 19–25 pp on both models, no extra training needed. — arxiv 2604.23750

The enterprise architecture implication follows directly. Weight-based instant adaptation eliminates retrieval latency and prompt-length overhead, but those gains concentrate in the easy case: novel facts the model has never seen. For compliance documents, policy updates, or any material that overrides commonly believed information, the adapted model will silently endorse its prior belief at a rate that may exceed 80% on strongly entrenched facts. RAG, which places authoritative text in context at inference time, does not share this failure mode. The SLB+CA combination beats vanilla RAG on medium-conflict questions by 18 percentage points, but the paper does not claim to fully close the gap on deep conflicts, and RAG's advantage on those cases remains a relevant baseline for risk-sensitive deployments.

The team also releases KID-Bench, a 489-question benchmark partitioned into three task types: novel-fact recall, cross-knowledge combination, and prior-graded conflicts segmented into Light, Medium, and Deep tiers. Existing benchmarks for instant adaptation tested only the easy case, recall of facts not previously known to the model. KID-Bench is the first to force evaluation under adversarial prior conditions.
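For orientation, a hypothetical record layout matching that partition; the field names are an assumption, not the benchmark's released schema:

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class KIDItem:
    # Hypothetical KID-Bench-style item, inferred from the partition
    # described above rather than the released files.
    question: str
    document: str                  # text the model must internalize
    answer: str                    # answer supported by the document
    task: Literal["novel_recall", "cross_combination", "conflict"]
    conflict_tier: Optional[Literal["light", "medium", "deep"]] = None

# Conflict items carry a tier; the other two task types leave it None.
item = KIDItem(question="...", document="...", answer="...",
               task="conflict", conflict_tier="deep")
```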

Whether architectures trained with conflict-awareness from the start would exhibit the same failure mode remains unclear, as does whether the override gap is inherent to adapter-rank constraints. The benchmark covers 489 questions across two backbone models; generalization to larger models or domain-specific fine-tuned backbones is untested. Enterprises evaluating weight-based adaptation pipelines now have both a diagnostic tool (KID-Bench) and two drop-in inference-time fixes — but the fundamental tradeoff between retrieval-free deployment and reliable override of prior knowledge has not been engineered away.
