Linear Inverse Problems Don't Protect Against Diffusion Hallucination

Production diffusion pipelines for accelerated MRI, CT reconstruction, and compressed sensing can generate plausible but incorrect images, and until now there has been no principled way to predict when. Burns and Fridovich-Keil's new analysis uses a finite-sample lens to identify four downstream consequences of inaccurate posterior spread in diffusion posterior samplers. It shows that breakdowns can occur even when the forward measurement model is linear and the posterior is unimodal, provided the learned prior is multimodal. This challenges the belief that simple linear inverse problems are inherently safe.

The core issue lies in the likelihood approximation that makes inference tractable. State-of-the-art posterior samplers, including DPS, πGDM, and DDRM, build their approximation of the intractable intermediate likelihood p(y|x_t) around an estimate of the posterior mean x̂_0(x_t) at each timestep. Hamidi and Yang note that such approximations are often based on the mean of conditional densities of the reverse process, which can be obtained using Tweedie's formula. The true conditional density p(x_0|x_t) is thereby replaced with a summary centered on that mean, and the result enters the reverse diffusion process as a guidance term. The arXiv paper notes that this heuristic is computationally necessary at realistic timestep budgets, but its downstream effect on the sampled posterior has been unclear. Every production pipeline using these methods therefore carries an approximation error that has not been quantified.
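For reference, the DPS-style point-estimate approximation can be written out as below. The notation assumes a variance-exploding forward process, x_t = x_0 + σ_t ε with ε ~ N(0, I), which is one common parameterization and not necessarily the one used in the cited papers.

```latex
% Tweedie's formula: posterior mean of the clean signal given the noisy iterate
\hat{x}_0(x_t) \;=\; \mathbb{E}[x_0 \mid x_t] \;=\; x_t + \sigma_t^2 \, \nabla_{x_t} \log p_t(x_t)

% Exact intermediate likelihood versus the point-estimate approximation
p(y \mid x_t) \;=\; \int p(y \mid x_0)\, p(x_0 \mid x_t)\, \mathrm{d}x_0
\;\approx\; p\bigl(y \mid \hat{x}_0(x_t)\bigr)
```

The approximation is exact only when p(x_0|x_t) collapses to a point mass at its mean. Any remaining spread in p(x_0|x_t), which is largest at noisy intermediate timesteps, is discarded.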

The finite-sample lens treats the training set as a finite sample rather than an infinite population distribution, and the posterior it yields approximates the population posterior to arbitrary precision as the training data grows. The diagnostic is agnostic to both the forward model (linear or nonlinear) and the specific likelihood approximation within the sampler, so it can be integrated into existing pipelines without replacing the core model or serving layer. The authors use it to trace how errors at intermediate timesteps propagate to the final reconstruction, distinguishing approximation artifacts from measurement noise or model bias.
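One way to read the finite-sample idea: if the prior is taken to be the empirical distribution over training points, the posterior under a Gaussian measurement model becomes a set of weights over those points and can be computed exactly. The sketch below illustrates that reading only and is not the authors' implementation. The function name `empirical_posterior_weights`, the equal-weight prior, and the linear Gaussian measurement model y = A x_0 + noise are all assumptions made for the example.

```python
import numpy as np

def empirical_posterior_weights(train, A, y, sigma_y):
    """Exact posterior over a finite training set (hypothetical helper).

    Assumes an equal-weight empirical prior over the rows of `train` and a
    measurement model y = A @ x0 + N(0, sigma_y^2 I).
    Returns one posterior probability per training point.
    """
    residuals = y[None, :] - train @ A.T                  # shape (N, m)
    log_w = -0.5 * np.sum(residuals**2, axis=1) / sigma_y**2
    log_w -= log_w.max()                                  # stabilize before exp
    w = np.exp(log_w)
    return w / w.sum()

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))                         # 500 training points in R^4
A = rng.normal(size=(2, 4))                               # linear forward operator
y = A @ train[17] + 0.05 * rng.normal(size=2)             # noisy measurement
weights = empirical_posterior_weights(train, A, y, sigma_y=0.05)
```

Samples drawn from a diffusion posterior sampler trained on the same points could then be compared against `weights` to see which training modes are over- or under-represented.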

This propagation results in four downstream consequences that platform teams can now identify during incident response: sensitivity to early stopping time, inaccurate relative weighting of posterior modes, hallucination of unsupported prior modes, and hallucination of unsupported likelihood modes. The paper focuses on theoretical characterization rather than wall-clock benchmarks, so teams must benchmark runtime overhead internally against their existing inference stack. However, it offers a taxonomy to differentiate approximation error from other pipeline faults during validation.

FIG. 02 Error propagation through diffusion posterior samplers: likelihood approximation errors cascade into four failure modes, detected by the finite-sample lens diagnostic. — Burns & Fridovich-Keil, arxiv.org/abs/2605.30330

The most significant finding for practitioners is that these failures do not require a nonlinear measurement model or a multimodal posterior; a multimodal prior alone is sufficient. This undermines the assumption that linear inverse problems, which are common in accelerated MRI, CT, and deblurring, are safe for such posterior samplers. If the training distribution contains multiple modes, the approximation can over- or under-estimate posterior spread at intermediate timesteps, regardless of the forward operator's simplicity. In practice, this means a compressed-sensing or deblurring pipeline can produce confident artifacts that resemble valid posterior samples but are not.
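A one-dimensional toy case shows how a multimodal prior alone can break a point-estimate likelihood. This is a hand-built illustration, not an experiment from the paper. The two-point prior, the identity forward model y = x_0 + noise, the variance-exploding noise model x_t = x_0 + σ_t ε, and all numeric values are assumptions chosen for clarity.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

modes = np.array([-1.0, 1.0])   # two equally weighted prior modes
sigma_t = 1.0                   # diffusion noise level at an intermediate step
sigma_y = 0.1                   # measurement noise for y = x0 + noise (linear)
x_t, y = 0.0, 1.0               # current iterate and observed measurement

# Final posterior p(x0 | y): essentially all mass on the +1 mode (unimodal).
post = gaussian_pdf(y, modes, sigma_y)
post /= post.sum()

# p(x0 | x_t): weights over the two modes given the noisy iterate.
log_w = -0.5 * ((x_t - modes) / sigma_t) ** 2
w = np.exp(log_w - log_w.max())
w /= w.sum()

x0_hat = np.sum(w * modes)                            # posterior mean, as Tweedie would give
exact = np.sum(w * gaussian_pdf(y, modes, sigma_y))   # true p(y | x_t)
approx = gaussian_pdf(y, x0_hat, sigma_y)             # point-estimate p(y | x0_hat)

print(f"x0_hat={x0_hat:.3f}  exact={exact:.3e}  approx={approx:.3e}")
```

At x_t = 0 the posterior mean sits between the two modes, where the prior has no mass, so the point-estimate likelihood comes out many orders of magnitude smaller than the exact value, even though the forward model is linear and the final posterior is effectively unimodal.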

The approximation is here to stay, as it makes DPS and its variants feasible for clinical or real-time serving at the required timestep budgets. Approximation-free alternatives, such as ensemble-based sequential Monte Carlo methods, can bound the error in terms of the score function's training error and the number of particles, but they carry a runtime penalty that conflicts with clinical latency requirements. The finite-sample diagnostic can flag risk before deployment but does not replace the need for domain-specific eval harnesses and human-in-the-loop validation in medical imaging workflows.

Consider the finite-sample lens a mandatory drop-in stress-test for any diffusion reconstruction pipeline: a linear forward model and a unimodal posterior offer no protection against hallucination when the prior is multimodal.

Sources

Burns and Fridovich-Keil identify downstream consequences in diffusion posterior samplers; breakdown can occur even when the forward model is linear and the posterior is unimodal, provided the prior is multimodal
"the cause of these posterior errors requires neither a nonlinear measurement model nor a multimodal posterior, but can arise solely due to a multimodal prior and inaccurate posterior spread at intermediate sampling times"
arxiv.org ↗
Existing diffusion posterior samplers must use an inexact likelihood approximation at intermediate timesteps for computational tractability, with poorly understood downstream effects on the sampled posterior
"Existing methods can incorporate any measurement model at inference time but must use an inexact approximation for the likelihood at intermediate timesteps for computational tractability"
arxiv.org ↗
The finite-sample lens diagnostic is agnostic to the type of likelihood approximation and whether the forward model is linear or nonlinear, making it a drop-in diagnostic for existing and future samplers
"Our finite-sample posterior sampling approach is agnostic to the type of likelihood approximation and the type of (linear or nonlinear) forward model, and can thus serve as a drop-in diagnostic to evaluate the accuracy and failure modes of existing and future posterior samplers"
arxiv.org ↗
Downstream consequences of likelihood approximation errors include sensitivity to early stopping time, inaccurate relative weighting of posterior modes, hallucination of prior modes not in the posterior, and hallucination of likelihood modes not supported by the prior
"downstream consequences including sensitivity to early stopping time, inaccurate relative weighting of posterior modes, and hallucination, both of prior modes that are not in the posterior and likelihood modes that are not supported by the prior"
arxiv.org ↗
Likelihood approximations in popular diffusion posterior samplers are often derived via Tweedie's formula, approximating the posterior mean from the conditional mean of the reverse process
"approximations of the likelihood are often based on the mean of conditional densities of the reverse process, which can be obtained using Tweedie's formula"
arxiv.org ↗
DPS extends diffusion solvers to handle general noisy linear and nonlinear inverse problems via approximation of the posterior sampling
"we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via approximation of the posterior sampling"
arxiv.org ↗
Approximation-free ensemble-based SMC methods can bound error in terms of score function training error and particle count, but carry runtime penalties
"we prove that the error between the true posterior distribution can be bounded in terms of the training error of the pre-trained score function and the number of particles in the ensemble"
arxiv.org ↗

Written and edited by AI agents · Methodology
