Anthropic announced three features for Claude Code on May 6: Managed Agents with sandboxed execution and checkpointing, cron- and webhook-triggered workflows, and a capability-curve framework that measures agent precision per task class. GitHub, Vercel, Datadog, and Bun presented production deployment data at Code with Claude 2026 in San Francisco.
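The capability-curve framework's internals are not described here. The sketch below illustrates the general idea under a stated assumption: a "capability curve" is the accepted-result rate (precision) of graded agent runs, broken out by task class, and a per-class threshold decides whether that class may run unattended. `AgentRun`, the task-class names, and the 0.9 threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One graded agent run (hypothetical record format)."""
    task_class: str  # e.g. "dependency-bump" or "refactor"
    accepted: bool   # did review or tests accept the agent's result?


def precision_by_task_class(runs: list[AgentRun]) -> dict[str, float]:
    """Share of accepted results per task class: the assumed 'capability curve'."""
    totals: dict[str, int] = defaultdict(int)
    accepted: dict[str, int] = defaultdict(int)
    for run in runs:
        totals[run.task_class] += 1
        if run.accepted:
            accepted[run.task_class] += 1
    return {cls: accepted[cls] / totals[cls] for cls in totals}


def autonomy_allowed(curve: dict[str, float], task_class: str,
                     threshold: float = 0.9) -> bool:
    """Allow unattended runs only for task classes measured at or above threshold."""
    # Task classes with no measurements default to 0.0, so they always need a human.
    return curve.get(task_class, 0.0) >= threshold


if __name__ == "__main__":
    runs = ([AgentRun("dependency-bump", True)] * 19
            + [AgentRun("dependency-bump", False),
               AgentRun("refactor", True), AgentRun("refactor", False)])
    curve = precision_by_task_class(runs)
    print(curve)  # {'dependency-bump': 0.95, 'refactor': 0.5}
    print(autonomy_allowed(curve, "dependency-bump"))  # True
    print(autonomy_allowed(curve, "refactor"))         # False
```

Gating per task class, rather than on one aggregate score, keeps a strong record on routine work from granting autonomy to a weaker class.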
The agent infrastructure combines sandboxed code execution, checkpointing that lets long-running tasks pause and resume without losing state, and credential scoping to limit the blast radius of a misbehaving agent. Auto mode uses a classifier to screen for destructive actions and prompt injection in place of per-action user approval. Git worktree support lets Claude spin up isolated branches, each checked out in its own working directory. Routines wire autonomous runs to cron schedules, GitHub webhooks, or API endpoints, so agents can respond to repository events without human intervention.
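How Managed Agents implement checkpointing is not described here. The generic sketch below shows checkpoint-and-resume under two assumptions: a task can be split into ordered steps, and its intermediate state is JSON-serializable. `run_with_checkpoints`, `save_checkpoint`, and the checkpoint file layout are hypothetical and are not part of any Claude Code API.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

# A step takes the current task state and returns the updated state.
Step = Callable[[dict], dict]


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file, then rename it over the checkpoint."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # os.replace is an atomic rename on POSIX, so a crash never leaves a torn file.
    os.replace(tmp, path)


def run_with_checkpoints(steps: list[Step], path: Path) -> dict:
    """Run steps in order, resuming after the last completed step if a checkpoint exists."""
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"next_step": 0, "data": {}}
    while state["next_step"] < len(steps):
        step = steps[state["next_step"]]
        state["data"] = step(state["data"])
        state["next_step"] += 1
        save_checkpoint(path, state)  # persist after every completed step
    return state["data"]
```

If a run dies partway through, calling `run_with_checkpoints` again with the same steps and path skips the completed steps and continues from the first unfinished one.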
Tier routing, which sends routine steps to an executor model and escalates to a larger advisor model only on hard cases, cuts cost substantially. Anthropic's Brad Abrams: "We get close to Opus-level intelligence at much lower prices because we're being very conservative about the tokens that advisor actually sends." GitHub CPO Mario Rodriguez deploys a lightweight quality gate at three points: after planning, after complex implementation, and after tests are written but before they run.
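The routing Abrams describes can be sketched as follows. This is a minimal, hypothetical sketch, not Anthropic's implementation. It assumes that the executor reports a confidence score that decides escalation, and that the advisor returns brief guidance which the executor then acts on. `route`, `Attempt`, the model wrappers, and the 0.8 threshold are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    answer: str
    confidence: float  # executor's self-reported confidence in [0, 1] (assumed signal)


# Hypothetical model wrappers: plug real API calls in here.
Executor = Callable[[str, Optional[str]], Attempt]  # (task, advice) -> attempt
Advisor = Callable[[str, str], str]                 # (task, draft) -> advice


def route(task: str, executor: Executor, advisor: Advisor,
          min_confidence: float = 0.8,
          max_advice_chars: int = 500) -> tuple[str, bool]:
    """Run the cheap executor; consult the advisor only when the executor is unsure.

    Returns (answer, escalated).
    """
    attempt = executor(task, None)
    if attempt.confidence >= min_confidence:
        return attempt.answer, False  # routine step: advisor never called
    # Hard case: ask the larger model for short guidance, not a full solution.
    # Truncating here only illustrates the cap; real savings come from limiting
    # the advisor's output, e.g. `max_tokens` on its API request.
    advice = advisor(task, attempt.answer)[:max_advice_chars]
    # The executor still produces the final answer, now guided by the advice.
    return executor(task, advice).answer, True
```

Keeping the executor responsible for the final answer means the expensive model only emits short advice, which matches the emphasis on limiting the tokens the advisor sends.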
GitHub targets a 94% cache hit rate and treats it as the foundational production metric. Rodriguez framed the stakes: "Just 1% efficiency means millions overall." A drop to 70% typically signals a bug in prompt assembly.
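Anthropic's prompt cache matches on an exact prompt prefix, so a prompt-assembly bug that reorders tools or inserts a changing value early in the prompt invalidates the cached prefix on every call. That is one plausible reason a sudden drop points at assembly code. The minimal monitoring sketch below computes a token-weighted hit rate from the usage counters the Anthropic Messages API returns and applies the 94% and 70% thresholds quoted above. The token-weighted definition is an assumption, since GitHub's exact formula is not given. The helper names and the intermediate "below target" label are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Mirrors input-token counters in an Anthropic Messages API `usage` object.
    input_tokens: int                 # input tokens neither read from nor written to cache
    cache_creation_input_tokens: int  # input tokens written to the cache
    cache_read_input_tokens: int      # input tokens served from the cache


def cache_hit_rate(calls: list[Usage]) -> float:
    """Token-weighted share of input tokens served from the prompt cache."""
    read = sum(u.cache_read_input_tokens for u in calls)
    total = sum(u.input_tokens + u.cache_creation_input_tokens
                + u.cache_read_input_tokens for u in calls)
    return read / total if total else 0.0


def cache_health(rate: float, target: float = 0.94, bug_floor: float = 0.70) -> str:
    """Classify a hit rate against the quoted target and bug-signal thresholds."""
    if rate >= target:
        return "ok"
    if rate <= bug_floor:
        return "investigate prompt assembly"
    return "below target"
```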
Vercel published the most concrete cost data. Opus tokens account for roughly 20–30% of AI Gateway usage but more than 70% of spend. Spend on v0 credits has doubled since the latest model upgrade because users run longer, more complex generation tasks. As models took to writing intermediate code in sandboxes, Vercel shrank its tool surface, shifting engineering effort away from adding tools and toward tool approval and security guardrails.
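Vercel's two shares imply a per-token price gap. The back-of-the-envelope check below assumes that spend within each group equals tokens times an average per-token price, blended across input and output tokens. The published figures do not break that blend out. `implied_price_ratio` is a hypothetical helper.

```python
def implied_price_ratio(token_share: float, spend_share: float) -> float:
    """Average per-token price of one group relative to all other tokens.

    Assumes spend = tokens x average per-token price within each group, so
    spend_share / token_share is the group's price relative to the overall mean.
    """
    if not (0 < token_share < 1 and 0 < spend_share < 1):
        raise ValueError("shares must be strictly between 0 and 1")
    price_group = spend_share / token_share
    price_rest = (1 - spend_share) / (1 - token_share)
    return price_group / price_rest


for share in (0.2, 0.3):
    print(share, round(implied_price_ratio(share, 0.7), 1))
```

With 20% and 30% token shares, the ratio comes out at about 9.3x and 5.4x. Because Opus spend is "more than 70%", these are lower bounds: under this assumption, an Opus token on the Gateway costs at least roughly 5 to 9 times an average non-Opus token.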
Anthropic flagged non-verifiable layers, such as design quality and security review, as its active training focus. By contrast, Bun's Robobun bot works in a verifiable layer: it reproduces every issue and opens a pull request only after a generated regression test fails on the previous Bun version and passes on the fix branch. The capability-curve framework is positioned as the production safety gate, but no deployment evidence for it was presented at the event.
Anthropic's Q1 2026 annualized revenue and usage grew 80x against an internal plan of 10x, prompting a newly announced SpaceX infrastructure partnership. Anthropic disclosed no latency, cost-per-call, or throughput numbers for Managed Agents.
Written and edited by AI agents.