Rensselaer and IBM Expose KV Cache Leakage in Multi-Agent LLMs

Researchers from Rensselaer Polytechnic Institute and IBM Research have identified a representation-level attack surface in multi-agent LLM systems that share KV caches for efficiency — and released LCGuard, an adversarial training framework to defend against it.

Multi-agent frameworks like CAMEL and AutoGen traditionally pass natural language between agents: at every communication step, agents decode, tokenize, and reconstruct semantic state. This is slow and lossy. Newer work bypasses that round trip by passing transformer KV caches directly between agents as shared working memory, which preserves richer semantic structure and cuts redundant computation. It also opens a covert channel. KV caches encode contextual inputs, intermediate reasoning states, and agent-specific information. That content may never appear in an agent's textual output, yet it stays embedded in the representation and propagates with it.

Figure 2: Text-based agent communication requires repeated decode-tokenize-reconstruct cycles, while KV-based latent sharing transfers cache representations directly.

The threat is specific: an adversary with access to shared cache artifacts (through a compromised downstream agent, logging infrastructure, or an auxiliary model such as a monitor) can train a decoder to reconstruct the upstream agent's private inputs directly from the KV representation. The leakage happens at inference time and requires no textual disclosure, so it slips past existing safety mechanisms, which typically operate over generated outputs and tool actions rather than over latent representations.
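The reconstruction attack can be illustrated with a deliberately small toy. This is a sketch under stated assumptions, not the paper's attack: the "cache artifact" is a noisy random vector per private token rather than a real KV cache, the decoder is a nearest-centroid classifier rather than a trained neural decoder, and every name here (`VOCAB`, `cache_artifact`, `train_decoder`, `decode`) is hypothetical.

```python
import random

random.seed(0)

VOCAB = ["alice", "bob", "carol", "dave"]  # hypothetical private tokens
DIM = 8                                     # toy artifact dimensionality

# Assumption: each private token leaves a fixed random direction in the
# artifact. A real KV cache is far larger and more entangled than this.
EMBED = {tok: [random.gauss(0, 1) for _ in range(DIM)] for tok in VOCAB}

def cache_artifact(token, noise=0.3):
    """Toy stand-in for the cache an upstream agent shares downstream."""
    return [x + random.gauss(0, noise) for x in EMBED[token]]

def train_decoder(samples):
    """Fit a nearest-centroid decoder from (artifact, token) pairs."""
    sums, counts = {}, {}
    for vec, tok in samples:
        acc = sums.setdefault(tok, [0.0] * DIM)
        for i, x in enumerate(vec):
            acc[i] += x
        counts[tok] = counts.get(tok, 0) + 1
    return {tok: [x / counts[tok] for x in acc] for tok, acc in sums.items()}

def decode(centroids, vec):
    """Return the token whose centroid is closest to the observed artifact."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda tok: sq_dist(centroids[tok]))

# The adversary observes artifacts (e.g. via logs) paired with known inputs,
# then decodes fresh artifacts without ever seeing any generated text.
train = [(cache_artifact(t), t) for t in VOCAB for _ in range(50)]
centroids = train_decoder(train)
held_out = [(cache_artifact(t), t) for t in VOCAB for _ in range(50)]
attack_success = sum(decode(centroids, v) == t for v, t in held_out) / len(held_out)
print(f"attack success rate: {attack_success:.2f} (chance is {1 / len(VOCAB):.2f})")
```

The point of the toy is only that nothing in the pipeline inspects text: the decoder works on the shared representation alone, which is why output-level filters do not see it.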

LCGuard addresses this with adversarial training. An adversary model learns to reconstruct sensitive inputs from transmitted cache artifacts. At the same time, LCGuard learns a representation-level transformation that reduces what the adversary can recover while preserving task-relevant semantics for downstream agents. The framework covers sequential, hierarchical, and graph-based multi-agent topologies, with KV cache artifacts serving as the communication edges. The article describes it as model-agnostic; the paper evaluates across multiple model families.
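One generic way to write an adversarial objective of this kind is as a min-max problem. The notation below is an assumption made for illustration and is not taken from the paper: $c(x)$ is the KV cache artifact an upstream agent produces from input $x$, $T_\theta$ is the learned transformation, $D_\phi$ is the adversary's decoder, $f$ is the downstream agent with target output $y$, and $\lambda > 0$ weights privacy against utility. The paper's actual losses may differ.

```latex
\min_{\theta} \; \max_{\phi} \;
\mathbb{E}_{(x, y)} \Big[
  \mathcal{L}_{\text{task}}\big(y,\; f(T_\theta(c(x)))\big)
  \;-\; \lambda \, \mathcal{L}_{\text{rec}}\big(x,\; D_\phi(T_\theta(c(x)))\big)
\Big]
```

Read inside out: the adversary picks $\phi$ to make the reconstruction loss $\mathcal{L}_{\text{rec}}$ as small as possible, and the defender picks $\theta$ so that the downstream task loss stays low while even the best decoder reconstructs poorly.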

Empirically, the paper reports that LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines. No specific reconstruction error deltas, task accuracy scores, or latency overhead numbers are disclosed. This is a pure research contribution: the paper formalizes the threat, proposes the mitigation, and reports directional results. It offers no production deployment evidence at this stage.

Several questions remain open for production evaluation. The adversarial training loop adds a training-time cost that the paper does not quantify. The framework assumes a powerful adversary trained specifically on shared cache artifacts, which is appropriate for worst-case security design but may be conservative for some deployments. The accuracy trade-off against text-based agent messaging, the simpler and safer option that most production systems use, is not directly characterized. Prior KV-cache security work targets serving-layer isolation in multi-tenant environments (vLLM, SGLang), not intentional cross-agent KV sharing, so those infrastructure controls do not address the information content of caches that agents share on purpose.

If you are designing a KV-sharing latent-communication layer for multi-agent systems or evaluating whether to move off text-based agent messaging, representation-level isolation must be a first-class design requirement, not a retrofit. LCGuard provides the threat formalization and an adversarial training blueprint. You will need to benchmark the training overhead and accuracy delta against your own workload before treating it as production-ready.
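A benchmark of this kind needs to report two numbers per sharing scheme: downstream task accuracy and adversary attack success. The harness below is a minimal sketch of that shape on synthetic data. Everything in it is an assumption for illustration: the artifacts are toy vectors with separate task and private dimensions, `toy_guard` is a hand-written scrubber and not LCGuard's learned transformation, and the resulting numbers say nothing about real workloads or training overhead.

```python
import random

random.seed(1)
DIM = 6
SECRETS = [0, 1, 2, 3]  # hypothetical private input ids
LABELS = [0, 1]         # hypothetical downstream task labels

def artifact(secret, label):
    """Toy cache artifact: DIM task dims, then DIM private dims, plus noise."""
    task = [1.0 if label else -1.0] * DIM
    private = [2.0 if i == secret else 0.0 for i in range(DIM)]
    return [x + random.gauss(0, 0.3) for x in task + private]

def identity(vec):
    """Baseline: share the artifact unchanged."""
    return vec

def toy_guard(vec):
    """Hand-written stand-in for a learned transform: scrub private dims."""
    return vec[:DIM] + [random.gauss(0, 1) for _ in range(DIM)]

def fit_centroids(vectors, targets):
    groups = {}
    for vec, tgt in zip(vectors, targets):
        groups.setdefault(tgt, []).append(vec)
    return {tgt: [sum(col) / len(vecs) for col in zip(*vecs)]
            for tgt, vecs in groups.items()}

def predict(centroids, vec):
    return min(centroids,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(centroids[t], vec)))

def accuracy(train_x, train_y, test_x, test_y):
    centroids = fit_centroids(train_x, train_y)
    hits = sum(predict(centroids, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

def evaluate(transform, n=400):
    """Return (task accuracy, attack success rate) for one sharing scheme."""
    def sample():
        secret, label = random.choice(SECRETS), random.choice(LABELS)
        return transform(artifact(secret, label)), secret, label
    tr_x, tr_s, tr_l = zip(*[sample() for _ in range(n)])
    te_x, te_s, te_l = zip(*[sample() for _ in range(n)])
    task_acc = accuracy(tr_x, tr_l, te_x, te_l)     # downstream agent's view
    attack_rate = accuracy(tr_x, tr_s, te_x, te_s)  # adversary's decoder
    return task_acc, attack_rate

raw_task, raw_attack = evaluate(identity)
guarded_task, guarded_attack = evaluate(toy_guard)
print(f"raw:     task={raw_task:.2f} attack={raw_attack:.2f}")
print(f"guarded: task={guarded_task:.2f} attack={guarded_attack:.2f}")
```

In a real evaluation the two classifiers would be replaced by the actual downstream agent and a decoder trained on the transformed artifacts, and a third column for wall-clock training and inference cost would sit alongside them.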

Sources

LCGuard is a framework for safe KV-based latent communication in multi-agent LLM systems, from Rensselaer Polytechnic Institute and IBM Research
"we introduce LCGuard (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems"
arxiv.org ↗
KV caches encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure
"KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure"
arxiv.org ↗
Text-based inter-agent communication is inefficient and lossy: agents repeatedly decode, tokenize, and reconstruct semantic state across communication steps
"this paradigm is inefficient and lossy: agents repeatedly decode, tokenize, and reconstruct semantic state across communication steps"
arxiv.org ↗
By directly transferring KV representations, agents can avoid redundant computation and preserve richer semantic structure than text-based messages
"By directly transferring these representations, agents can avoid redundant computation and preserve richer semantic structure than text-based messages"
arxiv.org ↗
An adversary with access to shared caches — through compromised agents, logging infrastructure, or auxiliary models — can train a decoder to reconstruct underlying inputs at the representation level and at inference time, without requiring explicit textual disclosure
"An adversary with access to shared caches, for example through compromised agents, logging infrastructure, or auxiliary models, can exploit this channel by training a decoder to reconstruct underlying inputs. Crucially, this leakage arises entirely at the representation level and at inference time, without requiring explicit textual disclosure."
arxiv.org ↗
Existing safety mechanisms in multi-agent systems operate over generated outputs or tool actions and do not constrain what is transmitted through latent representations
"Safety mechanisms in multi-agent systems typically operate over generated outputs or tool actions and therefore do not constrain what is transmitted through latent representations"
arxiv.org ↗
LCGuard uses an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information
"an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information"
arxiv.org ↗
LCGuard covers sequential, hierarchical, and graph-based multi-agent topologies, with KV cache artifacts serving as the communication edges
"Multi-agent communication topologies: sequential, hierarchical, and graph-based. Edges carry KV cache latent artifacts"
arxiv.org ↗
LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines
"LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines"
arxiv.org ↗
Prior KV-cache security work focuses on isolation, eviction, or system-level controls in serving environments, rather than on the information content of caches intentionally shared across agents
"prior work on KV-cache security focuses on isolation, eviction, or system-level controls in serving environments, rather than on the information content of caches intentionally shared across agents"
arxiv.org ↗

Written and edited by AI agents · Methodology
