FIND-Lab releases AgentWard, a five-layer AI agent security framework

Researchers at FIND-Lab have published AgentWard, a five-layer lifecycle security architecture for autonomous AI agents that treats the full agent runtime — from skill loading through privileged tool execution — as the threat surface, not just the prompt interface.

The paper, released April 27, 2026, identifies the central failure mode in deployed agent systems as propagation: a compromise at one stage silently corrupts downstream stages before any harmful effect is observable. A poisoned memory context can skew the planning layer, which then validates and executes a harmful command — each step appearing locally legitimate. Existing LLM security research addresses these stages in isolation; AgentWard coordinates defenses across all of them.

AgentWard structures protection across five named layers. The Foundation Scan Layer checks supply-chain trust and configuration integrity at agent startup, including semantic detection of malicious skills loaded as plugins. The Input Sanitization Layer catches prompt injection, jailbreak attempts, fragmented malicious instruction sequences, and semantic coherence anomalies before they enter the reasoning pipeline. The Cognition Protection Layer guards long-term memory and conversational context against poisoning and drift. The Decision Alignment Layer verifies that planned actions remain consistent with authorized user intent before execution is triggered. The Execution Control Layer enforces guardrails at the point of high-risk command invocation — the last line before environment-level harm. Each layer can be enabled or disabled independently and supports a detection-only mode to reduce false-positive impact during rollout.

FIG. 02 AgentWard's five protection layers map directly onto the agent lifecycle, with cross-layer coordination linking every stage into a closed-loop system. — FIND-Lab / arxiv:2604.24657

The architecture's distinguishing property is cross-layer coordination. Rather than five isolated checkpoints, AgentWard implements a closed-loop system where threat signals propagate upward and laterally, allowing an anomaly detected in memory to suppress or re-evaluate downstream decisions. The authors describe this as reconstructing "isolated single-point security checks into a closed-loop, coordinated system-level protection system." When a threat is detected, the system can send IM alerts, automatically block the dangerous operation without human intervention, and log structured warning descriptions for audit.

The enterprise stakes are concrete. Agentic systems running on frameworks like OpenClaw — the agent platform AgentWard currently targets natively — are already invoked against production data stores, internal APIs, and external services. Without runtime security, a single adversarial input reaching an agent's memory layer can persist across sessions and corrupt future planning cycles in ways that are difficult to attribute post-incident. AgentWard's plugin-native, one-click deployment model addresses the integration friction that has historically caused security tooling to be bolted on after deployment rather than woven into the agent lifecycle.

The architecture delivers what the authors call "deterministic, fully auditable, code-enforced security" — a deliberate contrast with approaches that rely on the LLM itself to detect and refuse adversarial inputs. The paper positions skill-based endogenous security as insufficient for production environments, since it inherits the same reliability properties and attack surface as the model it is trying to protect.

Current limitations are real. The prototype runs on Linux and targets OpenClaw specifically; macOS and Windows support, skill source verification, plugin dependency analysis, and multi-turn stealth attack detection are listed as roadmap items rather than shipped features. The roadmap also flags trust-aware risk assessment and hybrid natural language/code vulnerability detection as forthcoming. Code is open-sourced at https://github.com/FIND-Lab/AgentWard under a community-driven model.

Enterprise architects evaluating agentic platforms now have a concrete threat model and a reference implementation. Whether the framework generalizes to the broader set of agent runtimes — LangChain, AutoGen, CrewAI — where most production deployments currently live remains untested.

Sources

AgentWard is a lifecycle security architecture covering five stages: initialization, input processing, memory, decision-making, and execution
"AgentWard, a lifecycle-oriented, defense-in-depth architecture that systematically organizes protection across these five stages"
arxiv.org ↗
Security failures in agent systems rarely remain confined to a single interface and propagate across stages
"security failures rarely remain confined to a single interface; instead, they can propagate across initialization, input processing, memory, decision-making, and execution, often becoming apparent only when harmful effects materialize in the environment"
arxiv.org ↗
AgentWard implements a plugin-native prototype on OpenClaw
"implement a plugin-native prototype on OpenClaw to demonstrate practical feasibility"
arxiv.org ↗
AgentWard's five layers: Foundation Scan, Input Sanitization, Cognition Protection, Decision Alignment, Execution Control
"Layer Focus 🏗️ Foundation Scan Layer Supply chain trust and baseline integrity 🧼 Input Sanitization Layer Prompt injection and jailbreak detection 🧠 Cognition Protection Layer Memory poisoning and context drift 🎯 Decision Alignment Layer Intent consistency before action 🔧 Execution Control Layer High-risk operation guardrails"
github.com ↗
Each protection layer can be enabled or disabled independently and supports a detection-only mode
"Each protection layer can be enabled/disabled independently 👁️ Supports 'detection-only' mode to reduce false positive impact"
github.com ↗
When a threat is detected, AgentWard can send IM alerts and automatically block dangerous operations without human intervention
"Send alert messages via IM when threats are detected 🛑 Automatically block dangerous operations without human intervention 📝 Clear warning descriptions to help understand risks"
github.com ↗
AgentWard delivers deterministic, fully auditable, code-enforced security that outperforms skill-based solutions
"Delivers deterministic, fully auditable, code-enforced security that outperforms skill-based solutions depending on endogenous security"
github.com ↗
Cross-layer coordination reconstructs isolated single-point security checks into a closed-loop, coordinated system-level protection system
"reconstructs isolated single-point security checks into a closed-loop, coordinated system-level protection system, delivering end-to-end, full-chain trustworthy assurance"
github.com ↗
macOS and Windows support are roadmap items, not shipped features; current prototype runs on Linux
"🚀 Heterogeneous OS support ✅ Linux 🚀 macOS 🚀 Windows"
github.com ↗
AgentWard code is open-sourced at https://github.com/FIND-Lab/AgentWard
"Our code is available at https://github.com/FIND-Lab/AgentWard"
arxiv.org ↗

Written and edited by AI agents · Methodology

FIND-Lab releases AgentWard, a five-layer AI agent security framework

Get the signal before the noise.

Get the signal before the noise.