Simon Willison shipped a browser-native image inpainting demo on June 22, porting Moebius, a 0.2B-parameter PyTorch/CUDA model, to ONNX Runtime Web on WebGPU with no server required. The live demo runs at simonw.github.io/moebius-web/; the 1.24 GB ONNX weights are hosted at huggingface.co/simonw/Moebius-ONNX. Claude Code, running as a background agent in a terminal, completed the port while Willison's primary Codex Desktop session handled a separate Datasette feature.
The conversion followed PyTorch → ONNX export → ONNX Runtime Web with WebGPU execution provider. Willison skipped Transformers.js after a feasibility chat with Claude Opus 4.8 recommended going lower in the stack—to ONNX Runtime Web itself—for finer control. He cloned four repositories: hustvl/Moebius, the Hugging Face weight repo (with GIT_LFS_SKIP_SMUDGE=0), microsoft/onnxruntime, and huggingface/transformers.js as reference. Research notes went to research.md before Claude Code started.
Claude Code ran autonomously from /tmp/Moebius: exporting weights, building the WebGPU inference pipeline in JavaScript, drafting plan.md at the start and maintaining notes.md throughout. Willison structured the prompt for early and frequent commits and explicit logging—useful for continuity if another agent resumes the project later. When the agent signaled a testable build, Willison opened it in Chrome, collected errors and screenshots, and fed them back. A few debug cycles later it ran.
Publishing was delegated too. Willison created a Hugging Face model repo, minted a write-scoped token, and saved it to /tmp/token.txt. The agent ran the hf CLI and pushed the 1.24 GB of ONNX weights unattended. The HTML/JS frontend deployed to GitHub Pages, with the target URL supplied up front so relative paths resolved correctly. Non-square images are letterboxed to match the model's square requirement.
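Letterboxing is simple to state and easy to get off by a pixel, so a small sketch of the geometry may help. This is an illustration under stated assumptions, not the demo's actual code: the 512-pixel side used in the example, the centered placement, and the names `Letterbox`, `letterbox_geometry` and `crop_box` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letterbox:
    scale: float   # factor applied to the source image
    new_w: int     # width of the resized image inside the square
    new_h: int     # height of the resized image inside the square
    pad_left: int  # padding columns to the left of the image
    pad_top: int   # padding rows above the image


def letterbox_geometry(width: int, height: int, side: int) -> Letterbox:
    """Fit a width x height image inside a side x side square, keeping aspect ratio."""
    if width <= 0 or height <= 0 or side <= 0:
        raise ValueError("dimensions must be positive")
    # The longer edge is scaled to exactly fill the square.
    scale = side / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Assumption: the image is centered; leftover space becomes padding.
    pad_left = (side - new_w) // 2
    pad_top = (side - new_h) // 2
    return Letterbox(scale, new_w, new_h, pad_left, pad_top)


def crop_box(box: Letterbox) -> tuple[int, int, int, int]:
    """Rectangle (left, top, right, bottom) that strips the padding from the output."""
    return (
        box.pad_left,
        box.pad_top,
        box.pad_left + box.new_w,
        box.pad_top + box.new_h,
    )


if __name__ == "__main__":
    # A 1024 x 512 landscape image in a hypothetical 512 x 512 model input.
    geometry = letterbox_geometry(1024, 512, 512)
    print(geometry)
    print(crop_box(geometry))
```

The same offsets serve both directions: pad the input before inference, then crop the model's square output with `crop_box` and resize it back to the original dimensions.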
The core insight is the scheduling pattern. Willison's main task, building a Datasette UI for table creation in Codex Desktop, produced idle windows of 5–10 minutes while Codex refactored. Instead of waiting, he opened Claude Code in a second terminal and ran the Moebius port as a background task. He noted the straightforward trade: harder primary work means longer waits, and longer waits give parallel agents more runway. The two tasks shared no state; coordination cost was zero.
For architects, the pattern has a clean structure: one heavyweight stateful agent on the primary task, one stateless throwaway agent on a self-contained side task, human as scheduler between idle cycles. Neither agent needed awareness of the other. Willison's only shared resource was his attention, rationed by checkpoint rather than supervision. The Moebius port took one morning; most of that was the primary task running unattended.
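The shape of that arrangement can be sketched with two independent coroutines and a single inbox standing in for the human's attention. This is a toy model, not a description of any real tooling: `run_agent`, the `inbox` list, and the sleep durations (seconds here, minutes in practice) are all hypothetical.

```python
import asyncio


async def run_agent(name: str, work_stretches: list[float], inbox: list[str]) -> str:
    """Stand-in for an agent session: work unattended, then surface a checkpoint."""
    for i, seconds in enumerate(work_stretches, start=1):
        await asyncio.sleep(seconds)             # agent works without supervision
        inbox.append(f"{name}: checkpoint {i}")  # human attention is needed only here
    return f"{name}: done"


async def main() -> tuple[list[str], list[str]]:
    # The inbox is the only shared resource; the agents never see each other.
    inbox: list[str] = []
    results = await asyncio.gather(
        run_agent("primary", [0.05, 0.05], inbox),    # long stretches, few checkpoints
        run_agent("side", [0.01, 0.01, 0.01], inbox), # short stretches in between
    )
    return list(results), inbox


if __name__ == "__main__":
    finished, checkpoints = asyncio.run(main())
    print(finished)
    print(checkpoints)
```

Because the two sessions exchange nothing, total wall-clock time is set by the slower one, and the side task's checkpoints land inside the primary task's waiting periods.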
For ML engineers: porting a 0.2B-parameter PyTorch + CUDA model to browser-native WebGPU inference now fits comfortably within a single Claude Code session. Idle agent time is a schedulable resource.
Written and edited by AI agents