A Stanford-affiliated research team has solved a key interpretability problem in medical foundation models using a geometry-guided sparse autoencoder framework called GeoSAE. The framework decodes what clinical information brain MRI models actually encode, with enough cross-cohort stability to support regulated deployment.
Standard sparse autoencoders break down in deep transformer layers, producing redundant or dead features that obscure the model's internal representations. For brain MRI foundation models trained on thousands of scans, this has made model auditing effectively impossible. Aging confounds nearly every clinical variable, so a naive autoencoder would track patient age rather than disease signals.
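For orientation, the sketch below shows what a standard sparse autoencoder over frozen encoder embeddings looks like: a learned dictionary trained with a reconstruction objective plus a sparsity penalty. This is a generic illustration of the baseline technique, not the paper's implementation; the dimensions, penalty weight, and names such as SparseAutoencoder are illustrative assumptions.

```python
# Minimal sketch of a standard sparse autoencoder (SAE) trained on frozen
# foundation-model embeddings. Generic illustration only; hyperparameters
# and names are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, embedding_dim: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(embedding_dim, n_features)
        self.decoder = nn.Linear(n_features, embedding_dim)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative, so the L1 penalty
        # below can drive most of them to zero.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features


def sae_loss(x, reconstruction, features, l1_weight: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages each input
    # to activate only a handful of dictionary features.
    mse = torch.mean((x - reconstruction) ** 2)
    sparsity = torch.mean(torch.abs(features))
    return mse + l1_weight * sparsity


# Usage on a batch of frozen MRI-encoder embeddings (shape: [batch, dim]).
embeddings = torch.randn(64, 768)  # placeholder for real embeddings
sae = SparseAutoencoder(embedding_dim=768, n_features=4096)
recon, feats = sae(embeddings)
loss = sae_loss(embeddings, recon, feats)
loss.backward()
```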
GeoSAE addresses both problems. It uses the foundation model's learned manifold structure as a geometric prior that constrains which features survive in deep layers and prevents feature collapse. Feature annotation runs through age-deconfounded partial correlations, removing the confound before any clinical labels are assigned. The result is a sparse, interpretable feature set derived from a frozen model, with no retraining required.
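One standard way to realize an age-deconfounded partial correlation is to regress age out of both the feature activation and the clinical variable, then correlate the residuals. The sketch below shows that generic recipe; the helper names (residualize, age_partial_corr), the linear age model, and the synthetic data are assumptions, not details taken from the paper.

```python
# Sketch of an age-deconfounded partial correlation between an SAE feature
# and a clinical variable: regress age out of both, then correlate residuals.
# Generic recipe; the paper's exact procedure may differ.
import numpy as np
from scipy import stats


def residualize(values: np.ndarray, age: np.ndarray) -> np.ndarray:
    """Return residuals of `values` after a least-squares fit on age."""
    design = np.column_stack([np.ones_like(age), age])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def age_partial_corr(feature: np.ndarray, clinical: np.ndarray, age: np.ndarray):
    """Pearson correlation between feature and clinical variable with age removed."""
    return stats.pearsonr(residualize(feature, age), residualize(clinical, age))


# Example with synthetic data: both variables are partly driven by age.
rng = np.random.default_rng(0)
age = rng.uniform(55, 90, size=500)
feature = 0.05 * age + rng.normal(size=500)   # age-confounded SAE feature
clinical = 0.04 * age + rng.normal(size=500)  # age-confounded clinical score

r_raw, _ = stats.pearsonr(feature, clinical)
r_partial, _ = age_partial_corr(feature, clinical, age)
print(f"raw r = {r_raw:.2f}, age-deconfounded r = {r_partial:.2f}")
```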
Validation used approximately 14,000 T1-weighted MRI scans from two cohorts: the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Australian Imaging Biomarkers and Lifestyle (AIBL) study. GeoSAE identified features that predict conversion from mild cognitive impairment to Alzheimer's disease with an AUC of 0.746 using just 2% of the model's embedding dimensions. Features annotated with comorbidities performed at chance level on the conversion task, a negative-control result indicating that the deconfounded annotations capture disease-specific signal rather than noise. Cross-cohort feature replication reached r=0.97 without retraining, and the identified features localized to neuroanatomically distinct regions consistent with Braak staging, the established neuropathological progression framework for Alzheimer's disease.
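To make the 2%-of-dimensions claim concrete, the sketch below shows one plausible way such an evaluation could be run: select a small fraction of feature activations, fit a simple classifier for the conversion label, and score it by AUC. The data is synthetic and the selection rule is an illustrative assumption, not the paper's protocol.

```python
# Sketch: test whether a small subset of interpretable features (~2% of
# dimensions) recovers predictive signal for a binary conversion label.
# Synthetic data and a simple selection rule stand in for the paper's protocol.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_subjects, n_features = 1000, 4096
X = rng.normal(size=(n_subjects, n_features))  # SAE feature activations
informative = rng.choice(n_features, size=80, replace=False)
logits = X[:, informative] @ rng.normal(size=80) * 0.3
y = (rng.uniform(size=n_subjects) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Keep the top ~2% of features by absolute correlation with the label.
corrs = np.abs([np.corrcoef(X_tr[:, j], y_tr)[0, 1] for j in range(n_features)])
top = np.argsort(corrs)[-int(0.02 * n_features):]

clf = LogisticRegression(max_iter=1000).fit(X_tr[:, top], y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te[:, top])[:, 1])
print(f"AUC with {len(top)} of {n_features} features: {auc:.3f}")
```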
For healthcare AI teams in production, the implication is direct. Most deployments treat the encoder as a black box and attach a task-specific head. GeoSAE provides a post-hoc interpretability layer that operates on any frozen encoder, compatible with existing inference pipelines without model retraining. Explainability is increasingly expected, and in some indications required, at the feature level for FDA-regulated software-as-a-medical-device workflows.
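In pipeline terms, a post-hoc interpretability layer of this kind could sit beside the existing encoder and task head without touching either, as in the hypothetical sketch below. The function and module names (predict_with_audit, task_head, feature_names) are placeholders, and the SAE is assumed to return reconstructions and feature activations as in the earlier sketch.

```python
# Sketch of a post-hoc interpretability layer sitting beside a frozen encoder.
# The encoder and task head are untouched; the SAE only reads embeddings.
# All module and function names are hypothetical placeholders.
import torch
import torch.nn as nn


@torch.no_grad()
def predict_with_audit(scan: torch.Tensor,
                       encoder: nn.Module,      # frozen foundation model
                       task_head: nn.Module,    # existing clinical head
                       sae: nn.Module,          # pretrained interpretability layer
                       feature_names: list[str],
                       top_k: int = 5):
    embedding = encoder(scan)            # same call the pipeline already makes
    prediction = task_head(embedding)    # unchanged clinical output
    _, features = sae(embedding)         # interpretable feature activations
    top = torch.topk(features.squeeze(0), k=top_k)
    audit = {feature_names[i]: float(v)
             for i, v in zip(top.indices.tolist(), top.values)}
    return prediction, audit
```

Because the interpretability layer only reads embeddings, it can be versioned and updated independently of the regulated prediction path.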
The 2% dimensionality finding matters for monitoring system design. If a compact, human-interpretable feature subset recovers most of the clinically relevant signal, there is a credible path to lightweight audit dashboards that flag model drift or distribution shift in terms clinicians can review. Cross-cohort stability at r=0.97 suggests those dashboards would not need rebuilding when the underlying data comes from a different institution.
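As one way to picture such a dashboard, the sketch below compares recent activations of a compact, named feature set against a reference cohort and flags features whose distribution has shifted, using a per-feature two-sample Kolmogorov-Smirnov test. The test choice, threshold, and feature names are illustrative assumptions, not anything specified in the paper.

```python
# Sketch: flag distribution shift on a compact, clinician-reviewable feature set
# by comparing recent activations to a reference cohort, feature by feature.
# The KS test and threshold are illustrative choices, not the paper's method.
import numpy as np
from scipy import stats


def drift_report(reference: np.ndarray,   # [n_ref, n_features] reference activations
                 recent: np.ndarray,      # [n_recent, n_features] recent activations
                 feature_names: list[str],
                 alpha: float = 0.01) -> list[dict]:
    flagged = []
    for j, name in enumerate(feature_names):
        stat, p_value = stats.ks_2samp(reference[:, j], recent[:, j])
        if p_value < alpha:
            flagged.append({"feature": name,
                            "ks_stat": round(float(stat), 3),
                            "p_value": float(p_value)})
    return flagged


# Example: one feature ("hippocampal atrophy") drifts upward in recent scans.
rng = np.random.default_rng(1)
names = ["hippocampal atrophy", "ventricular enlargement", "cortical thinning"]
ref = rng.normal(size=(500, 3))
new = rng.normal(size=(100, 3))
new[:, 0] += 0.8  # simulated distribution shift
print(drift_report(ref, new, names))
```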
Open questions remain. The paper's scope is T1-weighted structural MRI for a single disease progression task; generalization to multimodal or dynamic imaging is untested. The AUC of 0.746 for conversion prediction is clinically meaningful but not deployment-grade on its own; it would need to be integrated into a broader diagnostic pipeline. Age-deconfounding was designed for a specific confound; applying the approach to disease areas with less well-characterized dominant confounders will require additional design work.
For healthcare AI infrastructure investment, interpretability tooling has moved beyond toy language model circuits. GeoSAE demonstrates that mechanistic analysis techniques can be productized for frozen clinical encoders at real dataset scale and produce outputs that map onto established medical ontologies without clinical supervision during SAE training.