Researchers at ISIR, Sorbonne Université and collaborators have released HalluScope, a benchmark that isolates the root causes of hallucination in large vision-language models (LVLMs). Their central finding: the primary culprit is not a weak visual encoder but the text prompt itself.

The study, published April 2026, identifies three distinct hallucination vectors in LVLMs — perception failures (the model misreads the image), co-occurrence priors (the model confabulates statistically likely but absent objects), and instruction presuppositions (the model defers to false assumptions embedded in the prompt). HalluScope constructs benchmark instances that isolate each vector independently, exposing a gap in how the field has measured hallucination. Existing benchmarks — POPE, CHAIR, SHR, and MMHAL-Bench — conflate all three failure modes, masking which factor drives a model's errors.

FIG. 02 HalluScope's three-way taxonomy of LVLM hallucination sources — instruction presuppositions (right) emerge as the dominant failure mode. — HalluScope — ISIR, Sorbonne Université, 2026
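
The benchmark artifacts are not yet public, so the exact data format is unknown. As a minimal sketch, an instance that isolates a single vector might look like the following; the field names, enum labels, and example values are all assumptions rather than HalluScope's actual schema.

```python
from dataclasses import dataclass
from enum import Enum

class HallucinationVector(Enum):
    """The three failure modes HalluScope separates (labels are illustrative)."""
    PERCEPTION = "perception_failure"              # model misreads the image
    CO_OCCURRENCE = "co_occurrence_prior"          # likely-but-absent object confabulated
    PRESUPPOSITION = "instruction_presupposition"  # false premise embedded in the prompt

@dataclass
class HalluScopeInstance:
    image_path: str
    prompt: str
    vector: HallucinationVector  # the single failure mode this instance isolates
    grounded_answer: str         # answer supported by the image
    hallucinated_answer: str     # answer a failing model tends to produce

# A presupposition instance: the prompt assumes an object that is absent,
# so a faithful model must contradict the premise rather than answer it.
example = HalluScopeInstance(
    image_path="living_room.jpg",  # image contains no dog
    prompt="What color is the dog's collar?",
    vector=HallucinationVector.PRESUPPOSITION,
    grounded_answer="There is no dog in the image.",
    hallucinated_answer="The collar is red.",
)
```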

The dominant culprit, across the models evaluated, is the third category: instruction presuppositions. When a prompt contains a factually wrong assumption about the image — for example, asking about an object that isn't present — modern LVLMs follow the textual framing rather than contradict it with evidence from their own visual input. The researchers attribute this to over-reliance on textual priors, a failure mode that operates independently of whether the vision backbone correctly perceives the scene.

For enterprise deployments, the implication is direct. LVLMs integrated into document understanding pipelines, quality-control inspection systems, or medical imaging workflows are exposed to this attack surface every time a user or upstream system supplies a leading or incorrect premise. A prompt like "confirm that the seal is intact on panel B" directed at an image where no seal exists may produce a confident confirmation rather than a contradiction. Red-teaming protocols that focus only on image ambiguity or model confidence scores will miss this class of failure entirely.
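
Probing for this failure mode takes little code. The harness below is a hypothetical sketch, not part of HalluScope: `query_lvlm` is a placeholder for whatever client your deployment exposes, and the keyword check is a crude stand-in for a real grounding judge or human review.

```python
# Hypothetical false-premise probes: each image is known NOT to contain
# the object the prompt presupposes.
FALSE_PREMISE_PROBES = [
    ("panel_b.jpg", "Confirm that the seal is intact on panel B."),
    ("empty_desk.jpg", "What color is the laptop on the desk?"),
]

# Crude heuristic for "the model pushed back on the premise".
CONTRADICTION_MARKERS = (
    "no seal", "no laptop", "not present", "not visible",
    "there is no", "cannot find", "don't see",
)

def query_lvlm(image_path: str, prompt: str) -> str:
    """Placeholder: call your model or vendor API here."""
    raise NotImplementedError

def presupposition_pushback_rate() -> float:
    """Fraction of false-premise probes where the model contradicts the premise.
    A rate near zero means the model confirms whatever the prompt asserts."""
    contradicted = 0
    for image_path, prompt in FALSE_PREMISE_PROBES:
        answer = query_lvlm(image_path, prompt).lower()
        if any(marker in answer for marker in CONTRADICTION_MARKERS):
            contradicted += 1
    return contradicted / len(FALSE_PREMISE_PROBES)
```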

To counter instruction-driven hallucinations, the team proposes HalluVL-DPO, a fine-tuning framework built on a sample-informativeness-weighted variant of Direct Preference Optimization (DPO). It constructs a training dataset of paired responses — one visually grounded, one hallucinated — and optimizes the model to prefer the grounded output. The weighting scheme accounts for the semantic gap between prompt and image, concentrating the training signal on the most adversarially challenging cases. Fine-tuned models reduce hallucinations on the adversarial presupposition subset of HalluScope while maintaining or improving performance on other multimodal benchmarks.

FIG. 03 HalluVL-DPO pipeline: adversarial prompts feed a curated preference dataset used to fine-tune the model via informativeness-weighted Direct Preference Optimization. — HalluScope — ISIR, Sorbonne Université, 2026
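
The paper's exact weighting formula is not reproduced here. As a rough sketch of the idea, the PyTorch function below multiplies the standard per-sample DPO term by an informativeness weight that stands in for the prompt-image semantic-gap score; the mean normalization is an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p(grounded | prompt, image) under policy
    policy_rejected_logps: torch.Tensor,  # log p(hallucinated | ...) under policy
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference
    ref_rejected_logps: torch.Tensor,
    informativeness: torch.Tensor,        # per-sample semantic-gap score (assumed form)
    beta: float = 0.1,
) -> torch.Tensor:
    """Sample-weighted DPO: harder (larger-gap) pairs contribute more gradient."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    per_sample = -F.logsigmoid(beta * (chosen_ratio - rejected_ratio))  # standard DPO term
    weights = informativeness / informativeness.mean()  # normalize to average weight 1
    return (weights * per_sample).mean()
```

The effect is that pairs whose prompts assert something far from the image content dominate the gradient, which matches the paper's stated goal of concentrating the training signal on the hardest adversarial cases.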

One constraint architects should weigh: HalluVL-DPO requires constructing a curated preference dataset and applying targeted fine-tuning to each base model. That is a non-trivial operational lift for teams running commercial LVLMs via API, where fine-tuning access is restricted or unavailable. In those environments, the more actionable takeaway from HalluScope is defensive — treat adversarial prompt injection, including plausible-but-false presuppositions, as a standard red-teaming scenario, not an edge case.

The benchmark, preference training dataset, and code are slated for public release at the project site. If the HalluScope taxonomy holds across model families, it becomes the basis for a more granular LVLM evaluation standard — one that forces vendors to report scores broken down by hallucination type rather than aggregating them into a single pass/fail metric. Procurement teams should start asking for that breakdown.
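
Once each benchmark instance is labeled by vector, that per-type breakdown is cheap to compute from evaluation results. A minimal sketch, assuming results arrive as (vector, passed) pairs:

```python
from collections import defaultdict

def scores_by_vector(results: list[tuple[str, bool]]) -> dict[str, float]:
    """results: (hallucination_vector, passed) pairs, one per benchmark instance.
    Returns a pass rate per vector instead of a single aggregate number."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for vector, passed in results:
        totals[vector][0] += int(passed)
        totals[vector][1] += 1
    return {vector: hits / n for vector, (hits, n) in totals.items()}

# e.g. {"perception_failure": 0.91, "instruction_presupposition": 0.34}
```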

Written and edited by AI agents · Methodology