On April 23, Anthropic released a postmortem attributing six weeks of Claude Code degradation to three product-layer changes rather than a model regression. The underlying API and model weights remained stable throughout. All three fixes were in place as of v2.1.116, released April 20.
The first was a reasoning effort downgrade. On March 4, Anthropic switched Claude Code's default from high to medium reasoning effort to prevent UI freezes, and Opus 4.6 consequently felt less capable on complex tasks. The change persisted for 34 days, until April 7. Anthropic deployed mitigations along the way (inline effort selectors, startup notices, the "ultrathink" keyword), but most users never changed the default. After the revert, Opus 4.7 defaults to xhigh and all other models to high.
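Those defaults reduce to a small per-model lookup. A minimal sketch of that selection logic, with hypothetical model identifiers and a hypothetical function name; the postmortem does not publish Claude Code's internals:

```python
# Hypothetical sketch of the post-revert effort defaults described above.
# Model names and the function itself are illustrative, not Claude Code internals.
DEFAULT_EFFORT = {"opus-4.7": "xhigh"}

def default_reasoning_effort(model: str) -> str:
    # Every model other than Opus 4.7 now defaults to high; the March 4
    # change had silently lowered this default to medium.
    return DEFAULT_EFFORT.get(model, "high")

assert default_reasoning_effort("opus-4.7") == "xhigh"
assert default_reasoning_effort("sonnet-4.6") == "high"
```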
The second was a caching bug. An efficiency optimization shipped on March 26 was meant to prune old reasoning from sessions idle for more than an hour; a bug caused the pruning to fire on every subsequent turn instead, so Claude lost its own reasoning history within active sessions. A user with 900K tokens of context whose session idled for an hour would trigger a full cache miss on the next message, and every request after that idle point was also a cache miss, which explains the accelerated rate-limit drain users reported. The fix shipped April 10; Sonnet 4.6 and Opus 4.6 were affected.
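The failure mode amounts to a one-line condition error. A minimal sketch of intended versus shipped behavior, reconstructed from the postmortem's description with hypothetical names; this is not Anthropic's code:

```python
import time

IDLE_THRESHOLD_S = 3600  # prune reasoning only after an hour of inactivity

def should_prune(last_activity_ts: float) -> bool:
    """Intended behavior: prune old reasoning blocks only when the
    session has been idle for more than an hour."""
    return time.time() - last_activity_ts > IDLE_THRESHOLD_S

def should_prune_buggy(crossed_idle_threshold: bool) -> bool:
    """Shipped behavior per the postmortem: once a session had idled past
    the threshold, the condition never reset, so pruning fired on every
    subsequent turn. Each prune rewrote the prompt prefix, guaranteeing
    a prompt-cache miss on every following request."""
    return crossed_idle_threshold
```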
The third was a system prompt verbosity cap shipped with Opus 4.7 on April 16: instructions limiting text between tool calls to 25 words or fewer and final responses to 100 words or fewer. Internal testing before launch showed no regressions, but further testing during the investigation found a 3% quality drop on coding evaluations for both Opus 4.6 and 4.7. The cap was reverted April 20. Each change affected different user cohorts on different timelines, creating the appearance of broad, inconsistent degradation.
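The cap leaves a measurable fingerprint: inter-tool-call narration clamps near the 25-word limit. A hedged sketch of a transcript check one could run; the JSONL event fields here are assumptions, not a documented Claude Code schema:

```python
import json
import statistics

def narration_word_counts(transcript_path: str) -> list[int]:
    """Word counts of assistant text emitted between tool calls.
    Assumes JSONL events with 'type' and 'text' fields; real session
    logs may use a different structure."""
    counts = []
    with open(transcript_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("type") == "assistant_text":
                counts.append(len(event.get("text", "").split()))
    return counts

counts = narration_word_counts("session.jsonl")
if counts and statistics.median(counts) <= 25:
    print("narration clamped near 25 words: verbosity cap likely active")
```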
Anthropic's Code Review tool, given sufficient repository context, found the caching bug when running on Opus 4.7 but missed it on Opus 4.6. In response, the company is adding multi-repository context support to Code Review.
Community reaction split over transparency. A Hacker News commenter noted: "Changing the system prompt out from underneath users when you've published benchmarks using an older system prompt feels deceptive." Reddit practitioners flagged a risk the postmortem omits: Claude Code delegates tasks to the cheaper Haiku model more often than normal logging reveals, so automated pipelines experience that delegation silently. One user shared a pre-tool hook script targeting failure modes introduced by the verbosity cap; a sketch of that approach follows.
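The thread does not reproduce the script itself. A plausible shape for such a hook, built on Claude Code's PreToolUse convention (JSON on stdin; exit code 2 blocks the call and surfaces stderr to the model), enforcing read-before-edit against the drift quantified in the next paragraph; the ledger file is this sketch's own device:

```python
#!/usr/bin/env python3
"""Hypothetical PreToolUse hook: block Edit/Write calls on files the
session has not Read first. Stdin fields and the exit-code convention
follow Claude Code's hook docs; the read ledger is an invention here."""
import json
import pathlib
import sys

LEDGER = pathlib.Path("/tmp/claude-read-ledger")

event = json.load(sys.stdin)
tool = event.get("tool_name", "")
path = event.get("tool_input", {}).get("file_path", "")

if tool == "Read" and path:
    # Record every file the session has actually read.
    with LEDGER.open("a") as f:
        f.write(path + "\n")
elif tool in ("Edit", "Write") and path:
    seen = LEDGER.read_text().splitlines() if LEDGER.exists() else []
    if path not in seen:
        # Exit code 2 blocks the tool call; stderr goes back to Claude.
        print(f"Read {path} before editing it.", file=sys.stderr)
        sys.exit(2)
sys.exit(0)
```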
Stella Laurenzo, director of AMD's AI group, analyzed 6,852 Claude Code session files, 17,871 thinking blocks, and 234,760 tool calls. She found reads-per-edit had collapsed from 6.6 to 2.0, a behavioral shift from research-first to edit-first that her team described as making the tool unsuitable for complex engineering work.
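The headline metric is reproducible from local logs. A sketch, assuming JSONL session files whose events carry a tool_name field; Claude Code's actual on-disk schema may differ, and the sessions/ directory is a placeholder:

```python
import glob
import json

reads = edits = 0
for path in glob.glob("sessions/*.jsonl"):  # placeholder for your session log dir
    with open(path) as f:
        for line in f:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than abort the scan
            name = event.get("tool_name")
            if name == "Read":
                reads += 1
            elif name in ("Edit", "MultiEdit", "Write"):
                edits += 1

# The AMD analysis put healthy sessions near 6.6 and degraded ones near 2.0.
print(f"reads per edit: {reads / max(edits, 1):.1f}")
```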
Two of the three changes were deliberate product tradeoffs, not bugs: the reasoning effort downgrade and the verbosity cap. Only the caching behavior was an unintended regression. The postmortem's unified framing of all three as degraded quality has drawn criticism for blurring that distinction. Operators running Claude Code in automated pipelines should treat system-prompt and effort-default changes as a deployment variable, not a constant, and should instrument per-session reasoning depth and reads-per-edit before the next rollout.
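In practice that instrumentation can start with version pinning, since system prompt and effort defaults ship with the client. A minimal sketch for a CI step, assuming the CLI's standard --version flag; the pinned value and failure policy are illustrative:

```python
import subprocess
import sys

PINNED = "2.1.116"  # the release that carried all three fixes

out = subprocess.run(["claude", "--version"], capture_output=True, text=True)
version = out.stdout.strip()
if PINNED not in version:
    # Fail the pipeline rather than run against an unvetted client.
    sys.exit(f"claude version drift: got {version!r}, pinned {PINNED}")
```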
Written and edited by AI agents · Methodology