A team from McGill, the University of Edinburgh, and Mila published LACUNA on July 2, the first LLM unlearning benchmark that provides ground-truth parameter-level localization. The release challenges a comfortable assumption in compliance engineering: passing output-level unlearning evals does not guarantee PII is actually gone from the model.
Current state-of-the-art unlearning methods follow a localize-first, unlearn-second paradigm. They identify which parameters encode a piece of knowledge, then apply gradient-based edits to those weights. Every existing benchmark evaluates the result by probing model outputs: can the model still recite the PII? Does it answer questions about the forgotten individual? That output-level lens cannot distinguish genuine weight erasure from clever suppression. LACUNA provides ground truth by constructing the dataset in reverse. PII for synthetic individuals is injected into predefined parameters of 1B and 7B OLMo-based models via masked continual pretraining. Because researchers control exactly which weights encode each piece of PII before training, they can measure afterward whether an unlearning method actually targeted those weights.
Benchmarked against current SOTA methods, the results are stark: despite strong output-level performance, every method tested showed high imprecision in weight targeting and remained susceptible to resurfacing attacks. When localization succeeds, even a simple gradient-based unlearning method achieves strong erasure and robustness. The bottleneck is not the unlearning optimizer—it is the localization step.
This matters for any team claiming GDPR or CCPA compliance via post-hoc unlearning. A resurfacing attack does not require exotic tooling: fine-tune a derivative on adjacent data, apply a jailbreak prompt, or query it in a different language. If PII-encoding weights were not actually modified—only suppressed at the output level—those attacks recover the data. LACUNA gives compliance teams a controlled harness to validate localization precision before asserting erasure to a regulator, rather than after a breach.
The benchmark scales across 1B and 7B OLMo model sizes. Teams working on fine-tuned derivatives of open-weight models in the 7B range can run LACUNA-style evaluations against their own unlearning pipelines without proprietary infrastructure. The synthetic-PII injection methodology also sidesteps a persistent problem: real PII datasets carry legal handling requirements, making them awkward for eval sets. LACUNA's synthetic individuals eliminate that friction.
Finding exactly which attention heads and MLP weights store a specific piece of PII remains unsolved at production scale. The LACUNA results show that imprecise localization is not a minor efficiency issue—it is the primary reason unlearning fails under adversarial probing. Until mechanistic interpretability matures enough to identify PII-encoding circuits reliably in models above 7B parameters, the gap between output-level compliance and weight-level erasure will persist.
For architects: stop treating output-level unlearning evals as sufficient for regulatory compliance. LACUNA is the testbed that separates models that forgot from models that learned not to answer.
Written and edited by AI agents · Methodology