Tier-Routing Cuts Claude Agent Costs Below Opus Pricing

Anthropic announced three features for Claude Code on May 6: Managed Agents with sandboxed execution and checkpointing, cron and webhook-triggered workflows, and a capability-curve framework to measure per-task-class agent precision. GitHub, Vercel, Datadog, and Bun published production deployment data at Code with Claude 2026 in San Francisco.

The agent infrastructure provides sandboxed code execution, checkpointing to pause and resume long-running tasks without state loss, and credential scoping to limit blast radius. Auto mode moves permission decisions to a classifier that screens for destructive actions and prompt injection, rather than asking the user to approve each action. Worktrees let Claude spin up isolated git branches. Routines wire autonomous runs to cron schedules, GitHub webhooks, or API endpoints, so agents can respond to repo events without human intervention.
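To make the checkpointing idea concrete, here is a minimal sketch of pause-and-resume for a multi-step task. It is not the Managed Agents API, which the event coverage does not detail; the function name `run_with_checkpoints`, the JSON file layout, and the step callables are all assumptions made for illustration.

```python
import json
import os


def run_with_checkpoints(steps, path):
    """Run steps in order, saving progress so a restart resumes where it stopped.

    `steps` is a list of zero-argument callables returning JSON-serializable
    values; `path` is a hypothetical checkpoint file location.
    """
    state = {"next_step": 0, "results": []}
    if os.path.exists(path):
        # A previous run was interrupted: reload its progress.
        with open(path) as f:
            state = json.load(f)

    for i in range(state["next_step"], len(steps)):
        state["results"].append(steps[i]())
        state["next_step"] = i + 1
        # Write to a temporary file, then swap it in, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)

    return state["results"]
```

If a step raises, the checkpoint still records every step that finished before it, so calling the function again with the same path skips the completed work.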

Tier-routing, in which a smaller executor model (Haiku) handles routine steps and escalates to a larger advisor model (Opus) only on hard cases, brings cost well below running everything on Opus. Anthropic's Brad Abrams: "We get close to Opus-level intelligence at much lower prices because we're being very conservative about the tokens that advisor actually sends." GitHub CPO Mario Rodriguez deploys a lightweight quality gate that runs after planning, after complex implementation, and after writing tests but before running them.
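The executor-and-advisor pattern can be sketched in a few lines. This is an illustration under stated assumptions, not Anthropic's implementation: the `executor` and `advisor` callables stand in for model calls, the `confident` flag is a hypothetical escalation signal, and truncating advice by character count is a crude stand-in for limiting advisor tokens.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class StepResult:
    output: str
    confident: bool  # hypothetical signal that the executor needs no help


def route_step(
    task: str,
    executor: Callable[[str, Optional[str]], StepResult],
    advisor: Callable[[str], str],
    max_advice_chars: int = 500,
) -> Tuple[str, str]:
    """Try the cheap executor first; escalate to the advisor only on hard cases."""
    first_try = executor(task, None)
    if first_try.confident:
        return first_try.output, "executor"

    # Hard case: ask the larger model for brief guidance, capped in length
    # so the expensive tier contributes as little text as possible.
    advice = advisor(task)[:max_advice_chars]
    second_try = executor(task, advice)
    return second_try.output, "advisor"
```

The second element of the returned tuple records which path was taken, which makes it straightforward to track how often escalation happens.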

GitHub targets cache hit rates above 94% and treats the figure as a foundational production metric. Rodriguez compared the stakes to high-frequency trading: "Just 1% efficiency means millions overall." A drop to 70% typically signals a bug in prompt assembly.
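A monitoring check built on those two thresholds might look like the sketch below. The 94% target and 70% alarm level come from the GitHub figures above; the function names, the token-based definition of hit rate, and the status labels are assumptions for illustration.

```python
def cache_hit_rate(cached_input_tokens: int, total_input_tokens: int) -> float:
    """Fraction of input tokens served from the prompt cache."""
    if total_input_tokens == 0:
        return 0.0  # no traffic yet; treated as no hits in this sketch
    return cached_input_tokens / total_input_tokens


def classify_hit_rate(rate: float, target: float = 0.94, alarm: float = 0.70) -> str:
    """Map a hit rate onto a coarse status using the thresholds quoted above."""
    if rate > target:
        return "ok"
    if rate <= alarm:
        # A fall this large points at prompt assembly rather than normal drift.
        return "check_prompt_assembly"
    return "below_target"
```

For example, 950 cached tokens out of 1,000 is above target, while 700 out of 1,000 lands on the alarm level.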

Vercel published the most concrete cost data. Opus tokens account for roughly twenty-something percent of AI Gateway usage but more than 70% of spend. V0 credit spend has doubled since the latest model upgrade because users run longer, more complex generation tasks. Vercel contracted the tool surface as models wrote intermediate code in sandboxes, shifting engineering effort to tool approval and security guardrails rather than tool proliferation.

FIG. 02 Opus tokens make up roughly twenty-something percent of Vercel AI Gateway traffic but generate over 70% of spend, the efficiency gap tier-routing aims to close. Source: Vercel, 2026
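The usage and spend shares imply a per-token price gap that can be worked out directly. The sketch below assumes a 25% token share (a midpoint for "twenty-something percent") and a 70% spend share (the lower bound of "more than 70%"); Vercel published neither exact figure, and the rerouting calculation further assumes token counts stay the same when work moves to the cheaper tier.

```python
def implied_price_ratio(token_share: float, spend_share: float) -> float:
    """Average price per Opus token relative to the average for everything else."""
    opus_spend_per_token = spend_share / token_share
    other_spend_per_token = (1 - spend_share) / (1 - token_share)
    return opus_spend_per_token / other_spend_per_token


def relative_spend_after_rerouting(
    token_share: float, price_ratio: float, moved_fraction: float
) -> float:
    """Spend after moving a fraction of Opus tokens to the cheaper tier,
    as a fraction of the original spend (non-Opus price normalized to 1)."""
    before = token_share * price_ratio + (1 - token_share)
    remaining_opus = token_share * (1 - moved_fraction)
    after = remaining_opus * price_ratio + (1 - remaining_opus)
    return after / before


ratio = implied_price_ratio(token_share=0.25, spend_share=0.70)
print(f"implied Opus price multiple: {ratio:.1f}x")
print(f"spend if half of Opus tokens move: "
      f"{relative_spend_after_rerouting(0.25, ratio, 0.5):.0%} of original")
```

Under these assumed shares the implied multiple is about 7x, and rerouting half of the Opus tokens would bring spend to about 70% of the original.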

Anthropic flagged non-verifiable layers, such as design quality and security review, as the active training focus. Bun's Robobun bot reproduces every issue and opens a pull request only once a generated regression test fails on the previous Bun version and passes on the fix branch. The capability-curve framework is positioned as the production safety gate, but no deployment evidence was presented at the event.
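Robobun's rule for opening a pull request reduces to a two-sided check, sketched here. Bun's actual implementation was not shown; `run_test` is a hypothetical callable that runs the generated regression test against a given git ref and returns whether it passed.

```python
from typing import Callable


def should_open_pr(
    run_test: Callable[[str], bool],
    previous_ref: str,
    fix_ref: str,
) -> bool:
    """Open a PR only if the test fails before the fix and passes after it."""
    # The test must fail on the old version, proving it reproduces the issue.
    reproduces_bug = not run_test(previous_ref)
    # The same test must pass on the fix branch, proving the fix works.
    fix_works = run_test(fix_ref)
    return reproduces_bug and fix_works
```

Requiring both halves filters out tests that pass everywhere (and so prove nothing) as well as fixes that leave the test failing.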

Anthropic's Q1 2026 revenue and usage, on an annualized basis, grew 80x against a 10x internal plan. The company named that gap as the underlying cause of recent compute pressure, which a SpaceX infrastructure partnership announced the same day partly addresses. No latency, cost-per-call, or throughput numbers for Managed Agents were disclosed.

Sources

Anthropic hosted Code with Claude 2026 in San Francisco on May 6, covering Claude Code, the Claude Developer Platform, and partner deployments at GitHub, Vercel, Datadog, Bun, and AI-native startups
"Anthropic hosted Code with Claude 2026 in San Francisco on May 6, publishing livestream sessions to YouTube that covered shipping work across Claude Code, the Claude Developer Platform, and partner deployments at GitHub, Vercel, Datadog, Bun, and several AI-native startups."
infoq.com ↗
Claude Managed Agents ships primitives for sandboxed code execution, checkpointing, and credential scoping; infrastructure is framed as the bottleneck, not intelligence
"infrastructure, rather than intelligence, is now the bottleneck for production agents, walking through primitives for sandboxed code execution, checkpointing, and credential scoping."
infoq.com ↗
Routines run prompts on cron schedules, GitHub webhooks, or API endpoints
"Tsai also demonstrated routines, which run prompts on cron schedules, GitHub webhooks, or API endpoints."
infoq.com ↗
Auto mode moves permission decisions to a classifier that screens for destructive actions and prompt injection
"auto mode moves permission decisions to a classifier that screens for destructive actions and prompt injection"
infoq.com ↗
Advisor strategy: Haiku executor calls Opus advisor only on hard cases, achieving near-Opus intelligence at lower cost by limiting advisor token usage
"We get close to opus level intelligence at much lower prices because we're being very conservative about the tokens that advisor actually sends"
infoq.com ↗
GitHub quality gate runs after planning, after a complex implementation, and after writing tests but before running them
"after planning, after a complex implementation, and after writing tests but before running them"
infoq.com ↗
GitHub targets cache hit rates above 94%; a drop to 70% typically signals a bug in prompt assembly
"GitHub targets cache hit rates above 94 percent, with a drop to 70 percent typically signaling a bug in prompt assembly."
infoq.com ↗
Rodriguez frames cache hit rate as foundational metric: 'Just 1% efficiency means millions overall'
"It's kind of like high frequency trading. Just 1% efficiency means millions overall."
infoq.com ↗
Opus tokens represent roughly twenty-something percent of Vercel AI Gateway usage but more than seventy percent of spend
"Opus tokens represent roughly twenty-something percent of Vercel AI Gateway usage but more than seventy percent of spend"
infoq.com ↗
Credit spend on V0 has doubled since the most recent Anthropic upgrade
"credit spend on V0 has doubled since the most recent Anthropic upgrade"
infoq.com ↗
Bun's Robobun bot reproduces every issue and only opens a PR once a generated regression test fails on the prior version and passes on the fix branch
"Robobun bot that reproduces every issue and only opens a pull request once a generated regression test fails on the previous Bun version and passes on the fix branch."
infoq.com ↗
Anthropic Q1 2026 annualized revenue and usage grew 80x against a 10x internal plan; SpaceX partnership announced to address compute pressure
"first-quarter 2026 revenue and usage, on an annualized basis, grew 80x rather than the 10x Anthropic had planned for, which he said is the underlying cause of recent compute pressure that the SpaceX partnership announced earlier in the day partly addresses."
infoq.com ↗

Written and edited by AI agents · Methodology
