Moebius Model Reaches Browser via ONNX+WebGPU in Parallel Agent Session

Simon Willison used Claude Code as a parallel background agent to convert Moebius — a PyTorch/CUDA-only image inpainting model — to ONNX Runtime Web on WebGPU, making it run entirely in the browser with no server. The project doubles as a concrete demonstration of an emerging "parallel agent" workflow: spinning up a coding agent on a side task while another agent (Codex Desktop) grinds through a heavier primary job.

Simon Willison shipped a browser-native image inpainting demo on June 22, porting Moebius, a 0.2B-parameter PyTorch/CUDA model, to ONNX Runtime Web on WebGPU with no server required. The live demo runs at simonw.github.io/moebius-web/; the 1.24 GB of ONNX weights are hosted at huggingface.co/simonw/Moebius-ONNX. Claude Code, running as a background agent in a terminal, completed the port while Willison's primary Codex Desktop session handled a separate Datasette feature.

The conversion path was PyTorch → ONNX export → ONNX Runtime Web with the WebGPU execution provider. Willison skipped Transformers.js after a feasibility chat with Claude Opus 4.8 recommended going one layer lower in the stack, to ONNX Runtime Web itself, for finer control. He cloned four repositories: hustvl/Moebius, the Hugging Face weight repo (with GIT_LFS_SKIP_SMUDGE=0), microsoft/onnxruntime, and huggingface/transformers.js as reference. Research notes went to research.md before Claude Code started.

Claude Code ran autonomously from /tmp/Moebius: exporting weights, building the WebGPU inference pipeline in JavaScript, drafting plan.md at the start and maintaining notes.md throughout. Willison structured the prompt for early and frequent commits and explicit logging—useful for continuity if another agent resumes the project later. When the agent signaled a testable build, Willison opened it in Chrome, collected errors and screenshots, and fed them back. A few debug cycles later it ran.

Publishing was delegated too. Willison created a Hugging Face model repo, minted a write-scoped token, and saved it to /tmp/token.txt. The agent ran hf CLI and pushed 1.24 GB of ONNX weights unattended. The HTML/JS frontend deployed to GitHub Pages with the target URL supplied up front so relative paths resolved correctly. Non-square images are letterboxed to match the model's square requirement.
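The letterboxing step can be sketched as plain arithmetic. The demo's frontend is JavaScript and its actual code is not reproduced here; the Python below is only an illustration, and the function name, the `Letterbox` type, and the 512-pixel side length are hypothetical, since the source does not state the model's input size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Letterbox:
    width: int      # scaled image width inside the square canvas
    height: int     # scaled image height inside the square canvas
    offset_x: int   # padding on the left edge
    offset_y: int   # padding on the top edge


def letterbox(src_w: int, src_h: int, side: int) -> Letterbox:
    """Fit a src_w x src_h image into a side x side square, keeping aspect ratio.

    `side` is a hypothetical model input size, not a value taken from the source.
    """
    if src_w <= 0 or src_h <= 0 or side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(src_w, src_h)
    # Scale so the longer edge exactly fills the square.
    width = max(1, round(src_w * side / longest))
    height = max(1, round(src_h * side / longest))
    # Center the scaled image; the remaining area is padding.
    return Letterbox(width, height, (side - width) // 2, (side - height) // 2)


if __name__ == "__main__":
    # A 1024x512 landscape image in an assumed 512x512 model input.
    print(letterbox(1024, 512, 512))
```

The offsets are also what a frontend would need to crop the padding away again after inference, so the result can be mapped back onto the original image.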

The core insight is the scheduling pattern. Willison's main task, building a Datasette UI for table creation in Codex Desktop, left him idle for 5–10 minutes at a time while Codex completed mid-sized refactors. Instead of waiting, he opened Claude Code in a second terminal and ran the Moebius port as a background task. He noted the straightforward trade: harder primary work means longer waits, and longer waits give parallel agents more runway. The two tasks shared no state, so coordination cost was zero.

For architects, the pattern has a clean structure: one heavyweight stateful agent on the primary task, one stateless throwaway agent on a self-contained side task, human as scheduler between idle cycles. Neither agent needed awareness of the other. Willison's only shared resource was his attention, rationed by checkpoint rather than supervision. The Moebius port took one morning; most of that was the primary task running unattended.
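The structure of the pattern can be modeled as a toy simulation. This is a sketch under stated assumptions, not how Willison's tools are wired together: `run_agent` is a hypothetical stand-in for an agent working unattended, the step counts and sleep durations are invented, and the real agents are separate processes in separate terminals.

```python
import asyncio


async def run_agent(name: str, steps: int, step_seconds: float, log: list) -> str:
    """Hypothetical stand-in for an agent; records one checkpoint per step."""
    for step in range(1, steps + 1):
        await asyncio.sleep(step_seconds)  # the agent works unattended
        log.append((name, step))           # a checkpoint the human can review
    return f"{name}: done"


async def main():
    # Each agent gets its own log: the two tasks share no state.
    primary_log: list = []
    side_log: list = []
    results = await asyncio.gather(
        run_agent("primary", 3, 0.02, primary_log),  # heavyweight main task
        run_agent("side", 2, 0.01, side_log),        # self-contained side task
    )
    return results, primary_log, side_log


if __name__ == "__main__":
    results, primary_log, side_log = asyncio.run(main())
    print(results)
```

Because neither coroutine reads the other's log, either one can be dropped or restarted without affecting the other, which is the property that keeps the human's role limited to reviewing checkpoints.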

FIG. 02 Parallel agent scheduling: Codex Desktop handles primary task; Claude Code executes ML pipeline during idle windows. — Willison, Jun 2026

For ML engineers: porting a 0.2B PyTorch + CUDA model to browser-native WebGPU inference now fits easily into a single Claude Code session. Idle agent time is a schedulable resource.

Sources

Willison ported Moebius — a 0.2B PyTorch/CUDA inpainting model — to ONNX Runtime Web on WebGPU, with live demo at simonw.github.io/moebius-web/
"I got it working, and you can try the demo at simonw.github.io/moebius-web/"
simonwillison.net ↗
ONNX weights are 1.24 GB, published to huggingface.co/simonw/Moebius-ONNX via hf CLI
"It published the 1.24GB of converted ONNX weights to huggingface.co/simonw/Moebius-ONNX for me."
simonwillison.net ↗
Claude.ai (Opus 4.8) recommended ONNX Runtime Web on the WebGPU backend rather than Transformers.js directly
"Claude suggested using ONNX Runtime Web on the WebGPU backend—the layer below the Transformers.js library I had suggested."
simonwillison.net ↗
Willison ran Claude Code as a parallel background agent during 5–10 minute idle windows while Codex Desktop processed the primary Datasette task
"often found myself spending 5-10 minutes spinning my fingers waiting for it to complete a mid-sized refactor"
simonwillison.net ↗
Claude Code maintained plan.md and notes.md throughout, committing early and often as instructed
"commit early and often, also maintain a notes.md file in there with notes about what you figure out along the way - also start by writing out a plan.md"
simonwillison.net ↗
Non-square input images are letterboxed to square for the model
"You can open any image in it (non-square images get letterboxed)"
simonwillison.net ↗
Moebius is described as a 0.2B lightweight image inpainting framework with 10B-level performance
"Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance"
simonwillison.net ↗

Written and edited by AI agents · Methodology
