Shopify staff engineer Paulo Arruda cut a 22-hour manual theme-review process to 7–20 minutes by replacing a monolithic LLM prompt with a swarm of specialized Claude Code agents. The work is documented in his QCon AI presentation published on InfoQ.

Shopify held contracts with all major AI providers by 2024 but used fragmented tooling: LibreChat, VSCode Copilot, and Cursor. The failure mode was clear. Teams paired single LLMs with massive, multi-concern system prompts. Models produced erratic output as they struggled to hold too many unrelated instructions in context simultaneously. Arruda's fix was decomposition: map each distinct task to a lean, single-responsibility agent, then orchestrate them.
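The decomposition pattern can be sketched in plain Ruby. This is a hypothetical illustration, not Claude Swarm's or SwarmSDK's actual API: `Agent` and `Orchestrator` are invented names, and the `run` method stands in for a real model call. The point is structural: each agent carries one lean, single-concern prompt, and an orchestrator fans the task out rather than stuffing every instruction into one system prompt.

```ruby
# Hypothetical sketch of single-responsibility agents (not SwarmSDK's API).
# Each agent owns exactly one concern and a short prompt.
Agent = Struct.new(:name, :prompt) do
  # Stand-in for an LLM call: a real agent would send only its own
  # lean prompt plus the task to the model.
  def run(task)
    { agent: name, input: task, instructions: prompt }
  end
end

class Orchestrator
  def initialize(agents)
    @agents = agents
  end

  # Fan the same task out to every specialized agent and collect results,
  # instead of asking one model to juggle all concerns at once.
  def review(task)
    @agents.map { |a| a.run(task) }
  end
end

agents = [
  Agent.new("accessibility", "Check the theme against contrast rules only."),
  Agent.new("performance",   "Flag render-blocking assets only."),
]
results = Orchestrator.new(agents).review("theme-123")
```

Each agent's context stays small and unrelated instructions never share a prompt, which is the property the monolithic approach lacked.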

Arruda built Claude Swarm to automate agent handoffs after manually shuttling code between two Claude Code windows during a hack day. The project now has 1,400-plus GitHub stars. A successor framework, SwarmSDK, is written in Ruby.

FIG. 02 Evolution from manual Claude Code shuttling to orchestrated multi-agent swarm architecture.

The first large-scale deployment targeted Shopify's theme review pipeline. Previously, human reviewers worked through a checklist of criteria; a prior LLM assist took them halfway there but left 22 hours of work. Breaking each review criterion into a dedicated agent cut that to 7–20 minutes. A second case—internal candidate role assessments—collapsed from hours to under an hour. A third deployed 15 parallel Claude Code instances to mine internal documentation and reconstruct what the company shipped in a given quarter.

Arruda cites automation speedups of 65x to 190x across these deployments. The variance matters. Gains are largest when the baseline was human-paced sequential review and smallest when the original task was already semi-automated. Engineering teams should expect the high end only when the bottleneck is human throughput, not LLM latency.
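The quoted range follows directly from the theme-review numbers: 22 hours of manual work divided by the 7–20 minute swarm runtime. A quick check:

```ruby
# 22 hours of manual review vs. a 7-20 minute swarm run.
baseline_minutes = 22 * 60            # 1320 minutes of human work
low  = baseline_minutes / 20.0        # slowest swarm run: 66x  (~65x)
high = baseline_minutes / 7.0         # fastest swarm run: ~189x (~190x)
```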

FIG. 03 Speedup factors across Shopify's three multi-agent deployments, showing variance driven by task complexity and context size. — Paulo Arruda, InfoQ QCon Presentation 2024

For enterprise architects, the Shopify pattern has three concrete implications. First, context window size is not a substitute for prompt architecture. Even with 200K-token models, cramming multi-domain logic into a single prompt produces worse results than task decomposition. Second, the microservices analogy holds: each agent should have a clear interface, observable inputs and outputs, and failure modes that don't cascade. Arruda's theme-review swarm isolated each review criterion to an independent agent partly to contain failure blast radius. Third, open-source orchestration tooling—Claude Swarm, SwarmSDK, Shopify's separately released Roast framework—has matured enough for internal adoption without building orchestration infrastructure from scratch.
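The second implication, the microservices analogy, can be made concrete with a minimal sketch. Everything here is invented for illustration (`CriterionAgent` is not a Shopify or SwarmSDK class): the idea is that each agent exposes one clear call interface and returns failures as data, so one broken criterion check cannot cascade and sink the whole review.

```ruby
# Hypothetical sketch of failure containment per review criterion.
class CriterionAgent
  def initialize(name, &check)
    @name  = name
    @check = check
  end

  # Observable input and output: a theme goes in, a structured report
  # comes out, whether the check succeeded or blew up.
  def call(theme)
    { agent: @name, status: :ok, finding: @check.call(theme) }
  rescue => e
    # Contain the blast radius: report the error instead of raising it
    # into the orchestrator and taking sibling agents down with it.
    { agent: @name, status: :error, finding: e.message }
  end
end

agents = [
  CriterionAgent.new("liquid-syntax") { |t| "no issues in #{t}" },
  CriterionAgent.new("broken-check")  { |_| raise "upstream timeout" },
]
report = agents.map { |a| a.call("theme-123") }
```

The healthy agent still reports while the failed one degrades into an error record, mirroring how the theme-review swarm isolated criteria to independent agents.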

The unresolved challenge is context bloat. As swarm deployments grow—15 agents is not the ceiling—managing what each agent knows and preventing cross-agent context pollution becomes the binding constraint. Arruda's working hypothesis is filesystem-based adapters that give each agent a scoped, persistent memory store rather than relying on in-context state. Whether this approach scales to production will determine whether swarm architectures remain an advanced technique or become routine deployment.
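Arruda's filesystem hypothesis, as described, amounts to giving each agent a disjoint on-disk namespace. The sketch below is a speculative reading of that idea, not his implementation; `ScopedMemory` and its methods are invented names. The property it demonstrates is the one the hypothesis targets: one agent's notes can never pollute another agent's context, because the scopes are disjoint by construction.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical filesystem-backed memory adapter: each agent gets its own
# subdirectory and can only read and write inside it.
class ScopedMemory
  def initialize(root, agent_name)
    @dir = File.join(root, agent_name)  # scope = one directory per agent
    FileUtils.mkdir_p(@dir)
  end

  def write(key, value)
    File.write(File.join(@dir, key), value)
  end

  def read(key)
    path = File.join(@dir, key)
    File.exist?(path) ? File.read(path) : nil
  end
end

root = Dir.mktmpdir
a = ScopedMemory.new(root, "accessibility")
b = ScopedMemory.new(root, "performance")
a.write("notes", "contrast fails on header")
a.read("notes")  # the writing agent reads its note back
b.read("notes")  # => nil: the other agent's scope is empty
```

Persistent state lives on disk rather than in the context window, which is what would let swarms grow past 15 agents without each agent's prompt bloating with shared history.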
