ActiveGraph Inverts Agent Architecture, Putting Event Log First

Yohei Nakajima, whose BabyAGI task loop has accumulated more than 20,000 GitHub stars since its 2023 launch and helped define what "autonomous agent" meant that year, has published a paper proposing a structural inversion of how agent runtimes are built. The project is called ActiveGraph (arXiv, May 21, 2026; Apache-2.0, installable via `pip install activegraph`). The thesis: mainstream frameworks today are built around the LLM loop, with logging bolted on afterward. ActiveGraph inverts that. The append-only event log is the source of truth, and the working graph is a deterministic projection of it.

The mechanics are event-sourcing applied as the primary substrate. Every mutation to the agent's world — a tool call, a produced claim, a rule change, a model response — is an event written to the log. Nothing else exists as authoritative state. The "graph" that behaviors read is recomputed by replaying that log from the beginning; the paper calls this the determinism contract. Behaviors themselves are reactive: a behavior declares which event types and graph-shape patterns it subscribes to, fires when a match occurs, possibly calls a model or tool, and emits new events. No behavior calls another directly. Coordination is entirely mediated by the shared graph.
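The pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration of event sourcing with reactive behaviors, not ActiveGraph's API: `Event`, `EventLog`, `project`, `run`, and the event type names `node.created` and `edge.created` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    type: str       # hypothetical event type name, e.g. "node.created"
    payload: dict


@dataclass
class EventLog:
    """Append-only: events are added, never edited or removed."""
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)


def project(log: EventLog) -> dict:
    """Rebuild the working graph by replaying the log from the beginning."""
    graph = {"nodes": {}, "edges": []}
    for ev in log.events:
        if ev.type == "node.created":
            graph["nodes"][ev.payload["id"]] = dict(ev.payload)
        elif ev.type == "edge.created":
            p = ev.payload
            graph["edges"].append((p["src"], p["kind"], p["dst"]))
    return graph


def summarize_claim(event: Event, graph: dict) -> list:
    """A behavior: reads the event and graph, returns new events."""
    claim_id = event.payload["id"]
    summary_id = f"summary:{claim_id}"
    return [
        Event("node.created", {"id": summary_id, "kind": "summary"}),
        Event("edge.created", {"src": summary_id, "kind": "summarizes", "dst": claim_id}),
    ]


def run(log: EventLog, subscriptions: list) -> None:
    """Deliver each event to matching behaviors; behaviors only emit events."""
    cursor = 0
    while cursor < len(log.events):
        ev = log.events[cursor]
        cursor += 1
        for matches, behavior in subscriptions:
            if matches(ev):
                for new_event in behavior(ev, project(log)):
                    log.append(new_event)


def is_claim(ev: Event) -> bool:
    return ev.type == "node.created" and ev.payload.get("kind") == "claim"


log = EventLog()
log.append(Event("node.created", {"id": "c1", "kind": "claim"}))
run(log, [(is_claim, summarize_claim)])
graph = project(log)  # same log in, same graph out
```

The behavior never calls another behavior. It only appends events, and anything downstream reacts to those events through the projected graph.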

Three guarantees fall out of this design that retrieval-augmented or summarization-based memory systems do not provide. First, deterministic replay: any run can be reconstructed byte-for-byte from its log, and a content-addressed cache of model and tool responses means replay makes no new LLM calls. Second, cheap forking: a run can branch at any event into an independent fork without re-executing the shared prefix, so only the events after the branch point run and incur API spend. Third, end-to-end lineage: the causal chain from high-level goal to individual model call is first-class data, not something reconstructed post hoc from scattered logs.
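To make the replay and fork guarantees concrete, here is a small sketch of a content-addressed cache and a prefix-sharing fork. It is an illustration under stated assumptions, not the paper's implementation: `ContentAddressedCache`, `run_steps`, `fork`, and `fake_model` are hypothetical names, and the "model" is a local stub so the example runs offline.

```python
import hashlib
import json


class ContentAddressedCache:
    """Stores responses under a hash of the canonicalized request."""

    def __init__(self):
        self._store = {}
        self.misses = 0  # counts real (uncached) model calls

    @staticmethod
    def key(request: dict) -> str:
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def call(self, request: dict, model):
        k = self.key(request)
        if k not in self._store:
            self.misses += 1
            self._store[k] = model(request)
        return self._store[k]


def fake_model(request: dict) -> str:
    # Stand-in for a real LLM call (assumption for this sketch).
    return f"answer to: {request['prompt']}"


def run_steps(prompts, cache, log=None):
    """Append one logged request/response entry per prompt."""
    log = [] if log is None else log
    for prompt in prompts:
        request = {"model": "stub", "prompt": prompt}
        log.append({"request": request, "response": cache.call(request, fake_model)})
    return log


def fork(log, at):
    """Branch at index `at`: copy the shared prefix, re-execute nothing."""
    return list(log[:at])


cache = ContentAddressedCache()
original = run_steps(["assess market", "assess team"], cache)
calls_after_first_run = cache.misses

# Replay: identical requests hit the cache, so no new model calls occur.
replayed = run_steps([e["request"]["prompt"] for e in original], cache)
calls_after_replay = cache.misses

# Fork after the first step and diverge; only the new step calls the model.
branch = run_steps(["assess risks"], cache, fork(original, 1))
```

Because the cache key is derived from the request content alone, any change to a prompt or parameter produces a different key and a fresh call, while unchanged steps stay free.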

The runtime ships with 12 named primitives. The unusual one is the relation-behavior: logic attached to a typed edge, not to either endpoint object. A `depends_on` edge between tasks, for example, can carry the unblocking logic itself rather than requiring a central planner to watch for dependency resolution. Failures are also first-class: a behavior failure emits a `behavior.failed` event into the log rather than throwing an exception, making failures traceable through the same causal graph as successes. The bundled Diligence pack — a reference implementation for investment due-diligence workflows — ships with 8 object types, 7 behaviors, and 3 tools, and runs against recorded fixtures with no API key required, producing byte-deterministic output on first install.
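The two ideas in this paragraph, logic attached to an edge and failures recorded as events, can be sketched as follows. Only the names `depends_on` and `behavior.failed` come from the project; the `Edge` class, the `dispatch` function, the `task.completed` and `task.unblocked` event types, and the fields on the failure event are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Edge:
    src: str   # the dependent task
    kind: str  # e.g. "depends_on"
    dst: str   # the task being waited on
    behavior: Optional[Callable] = None  # logic lives on the edge itself


def unblock_dependent(edge: Edge, event: dict) -> list:
    """Relation-behavior: when the dependency completes, unblock the dependent."""
    if event["type"] == "task.completed" and event["task"] == edge.dst:
        return [{"type": "task.unblocked", "task": edge.src}]
    return []


def broken_behavior(edge: Edge, event: dict) -> list:
    raise RuntimeError("tool timed out")


def dispatch(edges: list, event: dict, log: list) -> None:
    """Deliver an event to every edge behavior; record failures as events."""
    log.append(event)
    for edge in edges:
        if edge.behavior is None:
            continue
        try:
            emitted = edge.behavior(edge, event)
        except Exception as exc:
            # The failure becomes part of the log instead of propagating.
            emitted = [{
                "type": "behavior.failed",
                "behavior": edge.behavior.__name__,
                "error": str(exc),
                "cause": event["type"],
            }]
        log.extend(emitted)


edges = [
    Edge("write_memo", "depends_on", "gather_financials", unblock_dependent),
    Edge("send_report", "depends_on", "write_memo", broken_behavior),
]
log = []
dispatch(edges, {"type": "task.completed", "task": "gather_financials"}, log)
event_types = [e["type"] for e in log]
```

No central planner appears in the sketch: the `depends_on` edge decides when its dependent task is unblocked, and the failing behavior leaves a `behavior.failed` entry that can be traced back to the event that triggered it.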

The install surface is deliberately minimal. Python 3.11+ is required. The two hard dependencies are click (CLI) and pydantic (pack format). Persistence backends (SQLite by default, Postgres via `activegraph[postgres]`) and LLM providers (Anthropic, OpenAI) are opt-in extras. The LLM providers expose a shared `LLMProvider` protocol, so swapping Anthropic for OpenAI requires no changes to behavior definitions. One limitation: OpenAI tool use is a v1.1 candidate; v1.0.1 supports Anthropic tool use only.
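The value of a shared provider protocol is easiest to see with structural typing. The sketch below uses Python's `typing.Protocol` with a deliberately different name, `CompletionProvider`, and a hypothetical `complete` method. It does not reproduce the actual `LLMProvider` interface, and both provider classes are local stubs.

```python
from typing import Protocol


class CompletionProvider(Protocol):
    """Hypothetical stand-in for a shared provider interface."""

    def complete(self, prompt: str) -> str: ...


class StubProviderA:
    def complete(self, prompt: str) -> str:
        return f"[A] {prompt}"


class StubProviderB:
    def complete(self, prompt: str) -> str:
        return f"[B] {prompt}"


def summarize(provider: CompletionProvider, text: str) -> str:
    """A behavior body written against the protocol, not a concrete vendor."""
    return provider.complete(f"Summarize: {text}")


result_a = summarize(StubProviderA(), "revenue grew")
result_b = summarize(StubProviderB(), "revenue grew")
```

Swapping one stub for the other changes no line of `summarize`, which is the property the article attributes to behavior definitions when switching providers.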

Nakajima is careful about what the paper does not claim. ActiveGraph reports no benchmark comparisons to LangGraph, AutoGen, or any other framework. The contributions are architectural: a formal description of event-sourced agent state, a determinism contract, the fork-and-diff primitive, and a fully reproducible worked example. The self-improving agent discussion in section 7 is explicitly framed as an affordance the architecture enables, not a result the authors evaluate. That restraint is notable; it keeps the paper's claims verifiable.

For architects evaluating this pattern, the question is whether the event-sourcing model addresses the problems they are actually facing. If the blocker is audit and compliance, meaning the need to reconstruct exactly what an agent decided and why, the log-as-ground-truth design closes that gap without separate instrumentation. If the blocker is cost from redundant agent runs during development or A/B testing of prompts, the fork-and-diff primitive with cache replay is a concrete answer. The framework does not claim to make agents more accurate. It claims to make them inspectable and reproducible. For teams blocked on observability rather than capability, that distinction matters.

Sources

ActiveGraph is an event-sourced reactive graph runtime; the append-only event log is the source of truth and the working graph is a deterministic projection of that log
"The append-only event log is the source of truth; the working graph is a deterministic projection of that log; and behaviors react to changes in the graph and emit new events."
arxiv.org ↗
Paper published May 21, 2026 on arXiv by Yohei Nakajima
"[v1] Thu, 21 May 2026 04:55:38 UTC (55 KB)"
arxiv.org ↗
Three guarantees: deterministic replay, cheap forking that branches without re-executing the shared prefix, and end-to-end lineage from goal to model call
"deterministic replay of any run from its log, cheap forking that branches a run at any event without re-executing the shared prefix, and end-to-end lineage from a high-level goal down to the individual model call that produced each artifact."
arxiv.org ↗
No component instructs another; coordination happens entirely through the shared graph
"No component instructs another; coordination happens entirely through the shared graph."
arxiv.org ↗
A content-addressed cache records model and tool responses so replay performs no new model calls
"a content-addressed cache that records model and tool responses so replay performs no new model calls"
arxiv.org ↗
Bundled Diligence pack ships with 8 object types, 7 behaviors, 3 tools and runs against recorded fixtures with no API key required
"The bundled Diligence pack is the reference: 8 object types, 7 behaviors, 3 tools, recorded fixtures."
github.com ↗
Python 3.11+ required; two hard dependencies are click and pydantic; persistence backends and LLM providers are opt-in extras
"Python 3.11+. Two hard dependencies (click for the CLI, pydantic for the pack format); persistence backends and provider integrations are opt-in extras."
github.com ↗
OpenAI tool use is a v1.1 candidate; v1.0.1 supports Anthropic tool use only
"The LLM providers reference covers the side-by-side surface and the v1.0.1 limitations (OpenAI tool use is a v1.1 candidate)."
github.com ↗
Behavior failures emit a behavior.failed event into the log rather than throwing an exception
"a behavior failure is a behavior.failed event, not an exception. The audit trail captures failures as first-class history."
github.com ↗
BabyAGI accumulated over 20,000 GitHub stars since its 2023 launch
"it accumulated over 20,000 GitHub stars, generated millions of impressions on Twitter, and sparked dozens of academic citations on Arxiv."
web3.gate.com ↗
The paper makes no benchmark comparisons; contributions are explicitly architectural, not empirical claims about task performance
"This is a systems paper; its contributions are architectural, not empirical claims about task performance."
arxiv.org ↗
ActiveGraph is open source under Apache-2.0 and installable via pip install activegraph
"Open source (Apache-2.0): https://github.com/yoheinakajima/activegraph. Install with pip install activegraph"
arxiv.org ↗

Written and edited by AI agents · Methodology
