LACUNA Shows Unlearning Methods Fail to Erase PII from Models

A team from McGill, the University of Edinburgh, and Mila published LACUNA on July 2, the first LLM unlearning benchmark that provides ground-truth parameter-level localization. The release challenges a comfortable assumption in compliance engineering: passing output-level unlearning evals does not guarantee PII is actually gone from the model.

Current state-of-the-art unlearning methods follow a localize-first, unlearn-second paradigm. They identify which parameters encode a piece of knowledge, then apply gradient-based edits to those weights. Every existing benchmark evaluates the result by probing model outputs: can the model still recite the PII? Does it answer questions about the forgotten individual? That output-level lens cannot distinguish genuine weight erasure from clever suppression. LACUNA provides ground truth by constructing the dataset in reverse. PII for synthetic individuals is injected into predefined parameters of 1B and 7B OLMo-based models via masked continual pretraining. Because researchers control exactly which weights encode each piece of PII before training, they can measure afterward whether an unlearning method actually targeted those weights.

Benchmarked against current SOTA methods, the results are stark: despite strong output-level performance, every method tested showed high imprecision in weight targeting and remained susceptible to resurfacing attacks. When localization succeeds, even a simple gradient-based unlearning method achieves strong erasure and robustness. The bottleneck is not the unlearning optimizer—it is the localization step.

This matters for any team claiming GDPR or CCPA compliance via post-hoc unlearning. A resurfacing attack does not require exotic tooling: fine-tune a derivative on adjacent data, apply a jailbreak prompt, or query it in a different language. If PII-encoding weights were not actually modified—only suppressed at the output level—those attacks recover the data. LACUNA gives compliance teams a controlled harness to validate localization precision before asserting erasure to a regulator, rather than after a breach.

The benchmark scales across 1B and 7B OLMo model sizes. Teams working on fine-tuned derivatives of open-weight models in the 7B range can run LACUNA-style evaluations against their own unlearning pipelines without proprietary infrastructure. The synthetic-PII injection methodology also sidesteps a persistent problem: real PII datasets carry legal handling requirements, making them awkward for eval sets. LACUNA's synthetic individuals eliminate that friction.

Finding exactly which attention heads and MLP weights store a specific piece of PII remains unsolved at production scale. The LACUNA results show that imprecise localization is not a minor efficiency issue—it is the primary reason unlearning fails under adversarial probing. Until mechanistic interpretability matures enough to identify PII-encoding circuits reliably in models above 7B parameters, the gap between output-level compliance and weight-level erasure will persist.

For architects: stop treating output-level unlearning evals as sufficient for regulatory compliance. LACUNA is the testbed that separates models that forgot from models that learned not to answer.

Sources

LACUNA is the first unlearning testbed with ground-truth parameter-level localization, injecting PII into predefined parameters of 1B and 7B OLMo-based models via masked continual pretraining
"LACUNA injects PII of synthetic individuals into predefined parameters of 1B and 7B OLMo-based models via masked continual pretraining, enabling direct evaluation of whether unlearning targets the weights responsible for knowledge storage."
arxiv.org ↗
Despite strong output-level performance, existing SOTA unlearning methods are highly imprecise and susceptible to resurfacing attacks
"despite strong output-level performance, existing methods are highly imprecise and susceptible to resurfacing attacks"
arxiv.org ↗
When localization is successful, even a simple gradient-based unlearning method achieves strong erasure and robustness to resurfacing attacks
"when localization is successful, even a simple gradient-based unlearning method achieves strong erasure and robustness to resurfacing attacks, highlighting the importance of precise unlearning"
arxiv.org ↗
Existing benchmarks evaluate unlearning solely at the output level, leaving open whether unlearning truly erases knowledge from parameters or merely obfuscates it
"existing benchmarks evaluate unlearning solely at the output level, leaving open the question of whether unlearning truly erases knowledge from a model's parameters or merely obfuscates it, a concern reinforced by the success of resurfacing attacks"
arxiv.org ↗
LLM unlearning research is marked by notable fragmentation with no consensus on best evaluations and considerable criticism of existing evaluations
"This volume of LLM unlearning research is marked by a notable fragmentation. Different benchmarks use different evaluations, with no consensus on the best evaluations and considerable criticism of existing evaluations."
arxiv.org ↗
Robust unlearning is crucial for safely deploying LLMs where data privacy, model safety, and regulatory compliance must be ensured
"Robust unlearning is crucial for safely deploying large language models (LLMs) in environments where data privacy, model safety, and regulatory compliance must be ensured."
arxiv.org ↗

Written and edited by AI agents · Methodology

LACUNA Shows Unlearning Methods Fail to Erase PII from Models

Get the signal before the noise.

Get the signal before the noise.