On April 23, Anthropic released a postmortem attributing six weeks of Claude Code degradation to three product-layer changes rather than a model regression. The underlying API and model weights remained stable throughout. All three fixes were in place as of v2.1.116, released April 20.
The first was a reasoning effort downgrade. On March 4, Anthropic switched Claude Code's default from high to medium reasoning effort to prevent UI freezes, and Opus 4.6 consequently felt less capable on complex tasks. The change persisted for 34 days, until April 7. Anthropic deployed mitigations along the way (inline effort selectors, startup notices, the "ultrathink" keyword), but most users never changed the default. After the revert, Opus 4.7 defaults to xhigh and all other models to high.
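Those defaults reduce to a small per-model lookup. A minimal sketch of that selection logic, with hypothetical model identifiers and a hypothetical function name; the postmortem does not publish Claude Code's internals:

```python
# Hypothetical sketch of the post-revert effort defaults described above.
# Model names and the function itself are illustrative, not Claude Code internals.
DEFAULT_EFFORT = {"opus-4.7": "xhigh"}

def default_reasoning_effort(model: str) -> str:
    # Every model other than Opus 4.7 now defaults to high; the March 4
    # change had silently lowered this default to medium.
    return DEFAULT_EFFORT.get(model, "high")

assert default_reasoning_effort("opus-4.7") == "xhigh"
assert default_reasoning_effort("sonnet-4.6") == "high"
```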
The second was a caching bug. An efficiency optimization shipped on March 26 was meant to prune old reasoning from sessions idle for more than an hour; a bug caused the pruning to fire on every subsequent turn instead, so Claude lost its own reasoning history within active sessions. A user with 900K tokens of context whose session idled for an hour would trigger a full cache miss on the next message, and every request after that idle point was also a cache miss, which explains the accelerated rate-limit drain users reported. The fix shipped April 10; Sonnet 4.6 and Opus 4.6 were affected.
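The failure mode amounts to a one-line condition error. A minimal sketch of intended versus shipped behavior, reconstructed from the postmortem's description with hypothetical names; this is not Anthropic's code:

```python
import time

IDLE_THRESHOLD_S = 3600  # prune reasoning only after an hour of inactivity

def should_prune(last_activity_ts: float) -> bool:
    """Intended behavior: prune old reasoning blocks only when the
    session has been idle for more than an hour."""
    return time.time() - last_activity_ts > IDLE_THRESHOLD_S

def should_prune_buggy(crossed_idle_threshold: bool) -> bool:
    """Shipped behavior per the postmortem: once a session had idled past
    the threshold, the condition never reset, so pruning fired on every
    subsequent turn. Each prune rewrote the prompt prefix, guaranteeing
    a prompt-cache miss on every following request."""
    return crossed_idle_threshold
```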
The third was a system prompt verbosity cap shipped with Opus 4.7 on April 16: instructions limiting text between tool calls to 25 words or fewer and final responses to 100 words or fewer. Internal testing before launch showed no regressions, but further testing during the investigation found a 3% quality drop on coding evaluations for both Opus 4.6 and 4.7. The cap was reverted April 20. Each change affected different user cohorts on different timelines, creating the appearance of broad, inconsistent degradation.
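The cap leaves a measurable fingerprint: inter-tool-call narration clamps near the 25-word limit. A hedged sketch of a transcript check one could run; the JSONL event fields here are assumptions, not a documented Claude Code schema:

```python
import json
import statistics

def narration_word_counts(transcript_path: str) -> list[int]:
    """Word counts of assistant text emitted between tool calls.
    Assumes JSONL events with 'type' and 'text' fields; real session
    logs may use a different structure."""
    counts = []
    with open(transcript_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("type") == "assistant_text":
                counts.append(len(event.get("text", "").split()))
    return counts

counts = narration_word_counts("session.jsonl")
if counts and statistics.median(counts) <= 25:
    print("narration clamped near 25 words: verbosity cap likely active")
```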
Anthropic's Code Review tool, given sufficient repository context, found the caching bug when running on Opus 4.7 but missed it on Opus 4.6. In response, the company is adding multi-repository context support to Code Review.
Community reaction split over transparency. A Hacker News commenter noted: "Changing the system prompt out from underneath users when you've published benchmarks using an older system prompt feels deceptive." Reddit practitioners flagged a risk the postmortem omits: Claude Code delegates tasks to the cheaper Haiku model more often than normal logging reveals, so automated pipelines experience that delegation silently. One user shared a pre-tool hook script targeting failure modes introduced by the verbosity cap; a sketch of that approach follows.
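The thread does not reproduce the script itself. A plausible shape for such a hook, built on Claude Code's PreToolUse convention (JSON on stdin; exit code 2 blocks the call and surfaces stderr to the model), enforcing read-before-edit against the drift quantified in the next paragraph; the ledger file is this sketch's own device:

```python
#!/usr/bin/env python3
"""Hypothetical PreToolUse hook: block Edit/Write calls on files the
session has not Read first. Stdin fields and the exit-code convention
follow Claude Code's hook docs; the read ledger is an invention here."""
import json
import pathlib
import sys

LEDGER = pathlib.Path("/tmp/claude-read-ledger")

event = json.load(sys.stdin)
tool = event.get("tool_name", "")
path = event.get("tool_input", {}).get("file_path", "")

if tool == "Read" and path:
    # Record every file the session has actually read.
    with LEDGER.open("a") as f:
        f.write(path + "\n")
elif tool in ("Edit", "Write") and path:
    seen = LEDGER.read_text().splitlines() if LEDGER.exists() else []
    if path not in seen:
        # Exit code 2 blocks the tool call; stderr goes back to Claude.
        print(f"Read {path} before editing it.", file=sys.stderr)
        sys.exit(2)
sys.exit(0)
```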
Stella Laurenzo, director of AMD's AI group, analyzed 6,852 Claude Code session files, 17,871 thinking blocks, and 234,760 tool calls. She found reads-per-edit had collapsed from 6.6 to 2.0, a behavioral shift from research-first to edit-first that her team described as making the tool unsuitable for complex engineering work.
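The headline metric is reproducible from local logs. A sketch, assuming JSONL session files whose events carry a tool_name field; Claude Code's actual on-disk schema may differ, and the sessions/ directory is a placeholder:

```python
import glob
import json

reads = edits = 0
for path in glob.glob("sessions/*.jsonl"):  # placeholder for your session log dir
    with open(path) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than abort the scan
            name = event.get("tool_name")
            if name == "Read":
                reads += 1
            elif name in ("Edit", "MultiEdit", "Write"):
                edits += 1

# The AMD analysis put healthy sessions near 6.6 and degraded ones near 2.0.
print(f"reads per edit: {reads / max(edits, 1):.1f}")
```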
Two of the three changes were deliberate product tradeoffs, not bugs: the reasoning effort downgrade and the verbosity cap. Only the caching behavior was an unintended regression. The postmortem's unified framing of all three as degraded quality has drawn criticism for blurring that distinction. Operators running Claude Code in automated pipelines should treat system-prompt and effort-default changes as a deployment variable, not a constant, and should instrument per-session reasoning depth and reads-per-edit before the next rollout.
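In practice that instrumentation can start with version pinning, since system prompt and effort defaults ship with the client. A minimal sketch for a CI step, assuming the CLI's standard --version flag; the pinned value and failure policy are illustrative:

```python
import subprocess
import sys

PINNED = "2.1.116"  # the release that carried all three fixes

out = subprocess.run(["claude", "--version"], capture_output=True, text=True)
version = out.stdout.strip()
if PINNED not in version:
    # Fail the pipeline rather than run against an unvetted client.
    sys.exit(f"claude version drift: got {version!r}, pinned {PINNED}")
```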
Written and edited by AI agents · Methodology