Claude Fable 5 reaches 95% on SWE-bench Verified, trails only GPT-5.5 on select benchmarks
Anthropic released Claude Fable 5 on June 9, 2026. It is the first publicly available model from Anthropic's Mythos-class tier, which was previously restricted to cyber-defense and biology partners. Fable 5 scores 95.0% on SWE-bench Verified, a human-validated subset of SWE-bench built from real GitHub issues, and leads or ties on 18 of 19 published benchmarks. GPT-5.5 is the only model that outscores it, and only on certain reasoning and domain-specific tasks.
Fable 5 is the same underlying model as the restricted Claude Mythos 5, with safeguards added. Requests touching cybersecurity, biology, chemistry, or model distillation are silently deferred to Claude Opus 4.8, because in those domains Mythos 5's unrestricted capabilities (78% on cybersecurity evals vs. 40% for Opus 4.8) pose misuse risk. This safety-by-fallback design lets Anthropic deploy the model at scale while keeping its riskiest capabilities behind safety boundaries. Fable 5 is also more token-efficient: it solves the same problems with fewer tokens than prior Claude generations, which lowers the cost per solved task, an advantage that compounds at scale.
SWE-bench Verified's climb from 33.4% (Claude 3.5 Sonnet, June 2024) to 95.0% (Fable 5, June 2026) in two years reflects real capability gains but also benchmark saturation. The public Verified set also has a known history of training-data contamination. Scale AI's SWE-bench Pro (1,865 tasks drawn from public, held-out, and proprietary commercial repositories) is harder and more resistant to contamination, which makes it the more defensible benchmark. Fable 5 leads on its public set with 80.3%, 11 points ahead of the nearest competitor, GPT-5.5.
For architects: Fable 5 costs 2x the Opus tier on Claude.ai, and its per-token pricing (about $20 per million input tokens and $60 per million output tokens) is frontier-grade. Its real value is in long-horizon autonomous coding. Stripe reported that, in its testing, Fable 5 completed a 50-million-line codebase migration in a day. Because Verified saturation is real, teams building agentic coding pipelines should test Fable 5 on their own codebases and treat SWE-bench Pro as the more credible differentiator. The cybersecurity safeguards also mean that production security-fix agents will silently fall back to Opus 4.8 on certain tasks.
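To make the pricing concrete, here is a minimal cost-estimation sketch. It assumes the approximate list prices quoted above ($20 per million input tokens, $60 per million output tokens) and uses hypothetical token counts and task volumes. The function names and the example workload are illustrative only, not published figures. The sketch also ignores any caching or batch discounts.

```python
# Approximate Fable 5 list prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_M = 20.0
OUTPUT_PRICE_PER_M = 60.0


def task_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float = INPUT_PRICE_PER_M,
              output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Hypothetical helper: USD cost of one request, computed as tokens / 1e6 * price per million."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


def monthly_cost(tasks_per_month: int, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: scale a per-task estimate to a monthly workload."""
    return tasks_per_month * task_cost(input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical agentic coding task: 200k input tokens of repo context, 20k output tokens.
    per_task = task_cost(200_000, 20_000)
    print(f"per task: ${per_task:.2f}")  # $5.20

    # Same task solved with 25% fewer output tokens (a hypothetical efficiency gain).
    leaner = task_cost(200_000, 15_000)
    print(f"leaner:   ${leaner:.2f}")  # $4.90

    # Hypothetical volume of 10,000 such tasks per month.
    print(f"monthly:  ${monthly_cost(10_000, 200_000, 20_000):,.2f}")  # $52,000.00
```

In this example, the input side dominates for context-heavy agentic tasks. Token-efficiency gains on output therefore trim cost, but how much they save depends on each workload's input/output mix.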