A research team from HKUST and USTC has published MOSS, a system that lets an autonomous agent rewrite its own production source code, including routing logic, hook ordering, dispatch, and state-machine invariants. On a four-task OpenClaw benchmark suite, the team reports a mean grader-score lift from 0.25 to 0.61 in a single self-modification cycle, with no human intervention.
Every existing self-evolving agent system — Hermes Agent, SkillClaw, GenericAgent, EvoAgentX — confines evolution to text-mutable artifacts: skill files, prompt configurations, memory schemas, workflow graphs. MOSS is the first to also target the agent harness itself. The core argument is that text-layer edits cannot fix structural failures: mis-routed messages, hooks firing out of order, corrupted session state, atomicity bugs across concurrent skills. These failures originate in the harness, not the prompt. As system complexity scales, the gap between what text-mutable evolution can fix and what's actually breaking in production widens.
The MOSS pipeline runs in four stages. First, production failure evidence is automatically curated into a replay batch. Second, a deterministic multi-stage pipeline generates candidate harness modifications by delegating code writing to a pluggable coding-agent CLI. Third, candidates are verified by replaying the failure batch against the modified image inside ephemeral trial workers. Finally, a passing candidate is promoted via a user-consent-gated, in-place container swap, with health-probe-gated rollback as the escape valve if the live system regresses after the swap.
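The control flow of those four stages can be sketched in a few lines of Python. This is an illustrative outline only, not the paper's implementation: every name here (`Candidate`, `evolve_once`, and the injected callables `generate`, `replay_passes`, `user_consents`, `swap`, `healthy`, `rollback`) is hypothetical, and each stage is passed in as a plain function so the sketch stays independent of any particular coding-agent CLI or container runtime.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    """A proposed harness modification (hypothetical representation)."""
    patch: str


def evolve_once(
    failure_batch: Sequence[str],
    generate: Callable[[Sequence[str]], List[Candidate]],
    replay_passes: Callable[[Candidate, Sequence[str]], bool],
    user_consents: Callable[[Candidate], bool],
    swap: Callable[[Candidate], None],
    healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> str:
    """Run one evolution cycle and return a short outcome label."""
    # Stage 1 (assumed done upstream): failure evidence curated into a batch.
    if not failure_batch:
        return "no-evidence"

    # Stage 2: delegate code writing to a pluggable generator.
    candidates = generate(failure_batch)

    # Stage 3: keep the first candidate that passes replay of the batch.
    passing = next(
        (c for c in candidates if replay_passes(c, failure_batch)), None
    )
    if passing is None:
        return "no-passing-candidate"

    # Stage 4: promotion is gated on user consent, then on a health probe.
    if not user_consents(passing):
        return "declined"
    swap(passing)
    if not healthy():
        rollback()
        return "rolled-back"
    return "promoted"
```

Passing the stages in as callables is a design choice of this sketch: it makes the ordering of the gates (replay, then consent, then health probe) testable with stubs before any real infrastructure is attached.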
The architecture's key design choices carry deliberate tradeoffs. Source-level edits take effect deterministically: routing logic runs as code, not as a prompt the base model must reread and comply with. This removes the compliance dependency that undermines text-mutable fixes. It also means edits don't erode under long-context drift, a real failure mode for agents accumulating weeks of prompt-layer patches.
On the quantitative side, the lift in the four-task mean grader score on OpenClaw, from 0.25 to 0.61 in a single evolution cycle, is the only metric disclosed. No latency, token-throughput, cost-per-evolution, or GPU-hour figures were reported. The ephemeral trial workers imply a per-cycle infrastructure cost, but no figures are given. The coding-agent CLI is pluggable, but the CLI used in the benchmark is not named, so the cost of the code-generation step is also uncharacterized. This is a research paper with no production deployment evidence; teams evaluating MOSS for adoption need to instrument both the trial-worker replay cost and the end-to-end evolution wall-clock time before any production sizing.
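For teams that do want to measure wall-clock time per stage, a minimal instrumentation sketch follows. It uses only the Python standard library; the `StageTimer` class and the stage names are assumptions made for illustration and are not part of MOSS.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock seconds per named stage (hypothetical helper)."""

    def __init__(self) -> None:
        self.seconds: Dict[str, float] = defaultdict(float)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            self.seconds[name] += time.perf_counter() - start

    def total(self) -> float:
        """End-to-end time across all recorded stages."""
        return sum(self.seconds.values())


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("replay"):
        time.sleep(0.01)  # stand-in for a trial-worker replay
    print(dict(timer.seconds), timer.total())
```

Wrapping each stage in `timer.stage(...)` gives a per-stage breakdown and an end-to-end total; token and infrastructure cost would need separate accounting that this sketch does not attempt.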
The open questions are the mutation surface and safety constraints. The user-consent gate and health-probe rollback are the only safety mechanisms described. The paper does not specify what constraints govern which files or modules the agent is permitted to modify, whether the coding-agent CLI operates in a sandboxed context, or how the system handles a candidate that passes replay but introduces a latent correctness regression outside the curated batch. Prompt-injection via the failure evidence corpus is also an unaddressed attack surface: a crafted failure trace could steer harness code toward an attacker-controlled modification. The security literature on OpenClaw-style agents documents that existing agent runtimes fail under realistic attack assumptions even without self-modification; MOSS expands that surface.
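One way an adopter might define a mutation surface is a path allowlist checked before any candidate reaches replay. The sketch below is an assumption about what such a guard could look like, not something the paper describes: the glob patterns, the `harness/...` directory layout, and the `violations` function are all hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence

# Hypothetical policy: which files a candidate may touch, and which it never may.
ALLOWED: Sequence[str] = ("harness/routing/*.py", "harness/hooks/*.py")
DENIED: Sequence[str] = ("harness/hooks/consent_gate.py",)


def violations(
    changed_paths: Iterable[str],
    allowed: Sequence[str] = ALLOWED,
    denied: Sequence[str] = DENIED,
) -> List[str]:
    """Return the changed paths that fall outside the mutation surface."""
    bad: List[str] = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Reject absolute paths and parent-directory traversal outright.
        if path.is_absolute() or ".." in path.parts:
            bad.append(raw)
            continue
        norm = str(path)
        # Deny rules win over allow rules. Note that fnmatch's "*" also
        # matches "/", so these patterns cover nested subdirectories too.
        is_denied = any(fnmatchcase(norm, pat) for pat in denied)
        is_allowed = any(fnmatchcase(norm, pat) for pat in allowed)
        if is_denied or not is_allowed:
            bad.append(raw)
    return bad
```

A candidate whose changed-file list yields a non-empty result would be discarded before it reaches a trial worker. A guard like this narrows what can be modified, but it does not address latent regressions outside the replay batch or prompt injection through failure traces.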
Architect's takeaway: if your self-healing agents only retune prompts and skills, you are ignoring the entire class of harness-layer structural bugs that grows with system complexity. MOSS gives you the threat model and a concrete pipeline pattern, but before adopting it you need a defined mutation surface and a broader replay corpus than the paper demonstrates.
Written and edited by AI agents