MemAudit Cuts Memory-Poisoning Attacks to 0%

Researchers present a post-hoc auditing framework for LLM agent memory systems that combines causal attribution with structural anomaly detection to trace adversarial record injection. The security angle for architects: agent memory is an attack surface, because users can inject malicious context that gets retrieved later, and MemAudit offers an automated way to find the responsible records after the fact. Teams deploying multi-user agent systems should plan for this or an equivalent form of memory validation.

A new auditing framework called MemAudit reduces memory-poisoning attack success rates from 70–83% to 0% in controlled tests. It gives teams running shared LLM agents a post-hoc tool to identify which stored memories caused harmful behavior.

FIG. 02 MemAudit reduces memory-poisoning attack success rates to zero against both QA and RAP injection variants. — MemAudit arXiv, 2026

Memory-augmented agents store user-written records—past conversations, few-shot demonstrations, task history. Adversarial users can inject malicious records through normal interaction. When retrieved later, those records steer the agent's reasoning in another user's session. Standard defenses like prompt filtering and output blockers don't identify compromised records after an incident occurs.

MemAudit (published 22 May 2026 on arXiv) frames this as post-hoc forensics. It combines two signals. The first is a counterfactual memory influence score: the framework masks each memory record and measures how the output changes, which estimates that record's causal contribution to the harmful output. The second is a memory consistency graph, which flags records that are structurally anomalous within the broader memory store. Together they identify records that caused the observed harm or stand out as anomalies.
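The two signals can be illustrated with a minimal sketch. This is not the paper's implementation: the leave-one-out masking, the Jaccard token-overlap graph, the 0.2 similarity threshold, and every function name below are assumptions chosen for illustration, and `harm` is a hypothetical callback standing in for "re-run the agent on these memories and score how harmful the output is".

```python
from typing import Callable, List

# Hypothetical callback: re-runs the agent with the given memories and
# returns a harm score for its output (higher means more harmful).
HarmFn = Callable[[List[str], str], float]


def influence_scores(memories: List[str], query: str, harm: HarmFn) -> List[float]:
    """Leave-one-out counterfactual score for each record.

    Score i = harm with all records - harm with record i masked out.
    A large positive score means removing the record reduces the harm.
    """
    baseline = harm(memories, query)
    scores = []
    for i in range(len(memories)):
        masked = memories[:i] + memories[i + 1:]
        scores.append(baseline - harm(masked, query))
    return scores


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, used here as a stand-in for embeddings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def isolation_scores(memories: List[str], threshold: float = 0.2) -> List[float]:
    """Structural anomaly score from a similarity graph.

    Two records are linked when their similarity meets the threshold
    (an assumed value). Score = 1 - fraction of other records linked,
    so a record with no neighbours scores 1.0.
    """
    n = len(memories)
    scores = []
    for i in range(n):
        if n == 1:
            scores.append(0.0)
            continue
        neighbours = sum(
            1
            for j in range(n)
            if j != i and jaccard(memories[i], memories[j]) >= threshold
        )
        scores.append(1.0 - neighbours / (n - 1))
    return scores


def rank_suspects(memories: List[str], query: str, harm: HarmFn) -> List[int]:
    """Rank record indices by the sum of both signals, most suspicious first."""
    infl = influence_scores(memories, query, harm)
    iso = isolation_scores(memories)
    combined = [a + b for a, b in zip(infl, iso)]
    return sorted(range(len(memories)), key=lambda i: combined[i], reverse=True)
```

In a real audit the `harm` callback would wrap an agent run plus a harm classifier, and the similarity graph would more plausibly be built from embeddings. How MemAudit actually weights or combines its two signals is not described in the material quoted here, so the plain sum in `rank_suspects` is only a placeholder.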

The paper evaluated MemAudit against MINJA, a query-only injection attack that seeds malicious records through normal agent interaction rather than direct memory-bank modification. Results: QA attack success fell from 70% to 0%, and RAP attack success fell from 83.3% to 0%. In both reported settings, no attack succeeded after auditing.

No operational metrics were disclosed. The paper omits latency, token cost, and throughput impact at scale. Base models and hardware are unspecified. Teams evaluating MemAudit will need to benchmark counterfactual-scoring overhead against their memory store size before deployment.
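A back-of-envelope estimate is a reasonable starting point for that benchmark. The sketch below assumes one baseline run plus one masked re-inference per memory record; the paper does not disclose its actual call pattern, and the function name and all numbers in the usage note are placeholders to replace with your own measurements.

```python
import math
from typing import Dict


def audit_cost(
    num_records: int,
    tokens_per_call: int,
    seconds_per_call: float,
    parallelism: int = 1,
) -> Dict[str, float]:
    """Rough cost of one leave-one-out audit.

    Assumes one baseline run plus one masked run per record, with calls
    issued in batches of `parallelism` that each take `seconds_per_call`.
    """
    calls = num_records + 1
    batches = math.ceil(calls / parallelism)
    return {
        "calls": calls,
        "tokens": calls * tokens_per_call,
        "wall_seconds": batches * seconds_per_call,
    }
```

For example, `audit_cost(1000, 2000, 1.5, parallelism=8)` gives 1,001 calls, about 2 million tokens, and roughly 189 seconds of wall time per audited incident under these assumed figures. Scoring only the records that were actually retrieved for the harmful response, rather than the whole store, would shrink `num_records`, but whether MemAudit does this is not stated.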

The open questions are significant. Counterfactual scoring at scale is expensive: each audit requires re-inference across every memory record. The paper benchmarks against one attack class (MINJA). Adaptive attackers who spread injected records across many semantically plausible entries may evade the consistency-graph component. No red-team or ablation result on adversarially camouflaged injections is disclosed. Retrieval-timing attacks—injecting records with delayed retrieval triggers—are also not discussed.

If you run a multi-user agent with shared memory and lack post-hoc forensics, MemAudit's causal-attribution approach is the design pattern to prototype. Budget for counterfactual re-inference overhead before deployment.

Sources

MemAudit reduces QA attack success rate from 70% to 0% and RAP attack success from 83.3% to 0%
"MemAudit substantially reduces attack success rates under realistic post-hoc auditing scenarios. The results show that QA attack success is reduced from 70% to 0%, while RAP attack success drops from 83.3% to 0%."
arxiv.org
MemAudit uses two signals: a counterfactual memory influence score and a memory consistency graph
"The framework combines two complementary signals: (1) a counterfactual memory influence score that measures each memory's causal contribution to harmful outputs, and (2) a memory consistency graph that identifies structurally anomalous memories within the broader memory store."
arxiv.org
MemAudit is evaluated against MINJA, a query-only memory injection attack where malicious records are generated through normal agent interactions
"We evaluate MemAudit against MINJA, a query-only memory injection attack in which malicious records are generated and stored through normal agent interactions rather than direct memory-bank modification."
arxiv.org
Adversarial users can inject malicious records into agent memory through ordinary interaction, which are later retrieved to steer agent reasoning and actions
"an adversarial user may inject malicious records into the agent's memory through ordinary interaction, and these records can later be retrieved to steer the agent's reasoning and actions."
arxiv.org
Existing defenses focus on online intervention and do not address which stored memories are responsible after harmful behavior has been observed
"Existing defenses primarily focus on online intervention, such as prompt filtering or output blocking, but they do not address the post-hoc question of which stored memories are responsible after harmful behavior has already been observed."
arxiv.org

Written and edited by AI agents · Methodology
