Anthropic says the Claude Code reasoning effort default was lowered from high to medium in early March to reduce latency, cut token usage, and avoid sessions that appeared frozen. In its April 23 postmortem, Anthropic also said a cache-related regression and a brevity instruction contributed to the drop in perceived quality, and the Claude Code changelog shows the default effort was reversed back to high on April 7 for several user groups.
Those are the confirmed facts. The more interesting conclusion comes after them: Anthropic says Claude Code felt worse without a model-weight change. The company’s explanation is that the regression came from wrapper-level configuration around the model, default effort, prompt caching behavior, and output-shaping instructions, not from a secret downgrade to the underlying API model. We covered the earlier user reports in our pieces on the Claude Code regression and the Claude Code cache bug.
What Anthropic changed in Claude Code
Anthropic’s postmortem identifies three changes behind the complaints.
First, Anthropic said that when Opus 4.6 launched in Claude Code in February, the default reasoning effort was high. It later changed that default to medium. Anthropic wrote that internal evaluations showed medium had “slightly lower intelligence with significantly less latency.”
Second, Anthropic confirmed a cache-related regression. The company said this contributed to the quality reports, but did not publish the exact TTL timeline in the postmortem.
Third, Anthropic said it added a brevity instruction intended to make responses more concise. According to the company, that instruction degraded longer coding responses.
The Claude Code changelog adds the timeline for the reversal. It shows that on April 7, Anthropic changed the default effort from medium back to high for API-key, Bedrock/Vertex/Foundry, Team, and Enterprise users. Later, Anthropic’s Opus 4.7 launch post said Claude Code would default to xhigh.
That distinction matters. Anthropic did not say Claude’s model weights were changed. The company’s published explanation is narrower: product-layer settings changed the experience enough to look like a model regression.
| Change | What Anthropic said it was for | User-visible effect |
|---|---|---|
| Default effort high → medium | Lower latency, fewer long waits, lower token use | Faster replies, but lower perceived intelligence on harder tasks |
| Cache-related regression | Not described as intentional; confirmed as part of the issue | Worse context retention during longer sessions |
| Brevity instruction | Make responses more concise | Shorter coding answers that Anthropic says hurt longer responses |
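The model-weights-versus-wrapper distinction can be made concrete with a small sketch. This is illustrative only: the field names below (`effort`, `cache_ttl_s`, `system_suffix`) are hypothetical stand-ins, not Anthropic's actual API. The point is that every knob in the table above can change at the wrapper layer while the model identifier stays fixed.

```python
# Hypothetical wrapper-config sketch. Field names are invented for
# illustration; only the pattern matters: product-layer settings
# change while the "model" field does not.

BASE = {"model": "claude-opus-4.6", "max_tokens": 8192}

def wrapper_config(effort, cache_ttl_s, system_suffix=""):
    cfg = dict(BASE)
    cfg.update({
        "effort": effort,                # deliberation budget
        "cache_ttl_s": cache_ttl_s,      # prompt-cache lifetime
        "system_suffix": system_suffix,  # e.g. a brevity instruction
    })
    return cfg

february = wrapper_config("high", 3600)
march = wrapper_config("medium", 300, "Be concise.")

# Every difference between the two requests is wrapper-level;
# the model identifier is identical in both.
assert february["model"] == march["model"]
```

Three such changes landing at roughly the same time is exactly the scenario Anthropic describes: each request still names the same model, yet the experience shifts.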
Why Claude Code reasoning effort was lowered to reduce latency
Anthropic’s stated motivation was direct: high effort could “occasionally think for too long,” which made Claude Code seem frozen. In the postmortem, the company tied the lower default to better latency and lower token usage, and said internal testing showed medium brought “significantly less latency” with only “slightly lower intelligence.”
That is the documented tradeoff, and it is easy to see why users experienced it as a quality drop. Developers do not observe an internal effort budget. They observe whether the tool catches edge cases, stays with a hard problem, and returns a complete answer. If the assistant spends less effort before responding, Claude Code latency improves, but the result can still feel less capable.
The changelog shows Anthropic eventually decided that tradeoff had gone too far. On April 7 it restored high as the default for several groups, and later moved to xhigh with Opus 4.7. That sequence says more than any abstract debate about tuning knobs: Anthropic tried a lower default, saw the reaction, and reversed it.
What users noticed when Claude Code felt worse
The broad complaint, reflected in Anthropic’s postmortem, was that Claude Code felt less intelligent. Some of the more specific symptoms require more careful labeling.
The strongest confirmed claim is the cache problem. Anthropic confirmed a cache-related regression was part of the issue. Separately, a GitHub issue author analyzed 119,866 API calls across two machines from January 11 to April 11 and argued that prompt cache TTL appeared to fall from 1 hour to 5 minutes around March 6-8. That author also estimated a 20% to 32% increase in cache creation costs and higher quota consumption. Those TTL and cost figures come from practitioner analysis, not Anthropic’s postmortem.
Prompt caching stores reusable context so the system does not need to rebuild the full prompt every turn. If cache life gets shorter, longer coding sessions lose continuity more easily. That gives a practical explanation for why developers reported weaker follow-up turns even when the model itself had not changed.
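The TTL mechanics above can be sketched in a few lines. The 1-hour and 5-minute TTL values come from the practitioner analysis cited earlier, not from Anthropic's postmortem, and the turn gaps below are invented for illustration: a developer who pauses a few minutes between turns to read code rarely crosses a 1-hour TTL but crosses a 5-minute TTL repeatedly, and each miss means rewriting the cached context at cache-creation cost.

```python
# Hypothetical sketch of how a shorter prompt-cache TTL increases
# cache (re)writes over a session. TTL values are from the cited
# practitioner analysis; the idle gaps are invented.

def cache_writes(turn_gaps_s, ttl_s):
    """Count full cache writes for a session.

    turn_gaps_s: seconds of idle time preceding each turn. A turn
    reuses the cache only if its gap is within the TTL; otherwise
    the context must be rewritten.
    """
    writes = 1  # the first turn always creates the cache
    for gap in turn_gaps_s[1:]:
        if gap > ttl_s:
            writes += 1  # cache expired; context rebuilt
    return writes

# Pauses of 1.5 to 10 minutes between turns (reading code, testing).
gaps = [0, 120, 420, 180, 600, 90, 480]

print(cache_writes(gaps, ttl_s=3600))  # 1-hour TTL → 1 write
print(cache_writes(gaps, ttl_s=300))   # 5-minute TTL → 4 writes
```

Under these made-up gaps, shortening the TTL quadruples the cache writes without a single prompt changing, which is consistent with the kind of cost and continuity effects the GitHub analysis described.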
The brevity instruction likely amplified that impression. Here the sourcing is Anthropic’s own: the company said the instruction was meant to make responses shorter, and that it degraded longer coding responses. That does not prove every short answer was worse, but it does confirm Anthropic saw the change as part of the regression.
There were also operational details surfaced outside Anthropic’s postmortem. A Hacker News thread discussing the cache issue highlighted that main agents and sub-agents appeared to behave differently around caching. That is useful context, but it should be treated as anecdotal rather than canonical documentation.
So the sourcing breaks down cleanly:
- Anthropic said users saw a quality drop tied to lower default effort, a cache-related regression, and a brevity instruction.
- The Claude Code changelog shows Anthropic reversed the default effort on April 7.
- A GitHub issue author argued the cache TTL appeared to shrink from 1 hour to 5 minutes and raised costs.
- Hacker News users suggested cache behavior differed across modes, which may explain why reports varied.
Why this mattered: configuration can move product quality without changing model weights
This episode is most useful as a product lesson, not a rumor story. Anthropic’s own explanation says it changed response time, context retention, and verbosity at roughly the same time, while leaving the underlying model weights alone. That was enough to make Claude Code feel broadly worse.
For developers, that is the day-to-day consequence. A coding assistant is judged less by benchmark abstractions than by whether it stays coherent across a long session, answers with enough detail, and does not quietly lose capability when defaults change. Online speculation filled the gap with a more dramatic theory, but Anthropic’s postmortem points to something more mundane and more important: wrapper configuration alone can create the experience of a model downgrade.
That is also why the reversal matters. Anthropic did not just explain the problem; it restored higher default effort on April 7 and later pushed Claude Code to xhigh by default with Opus 4.7. The practical message is clear: for agentic coding, the default budget for deliberation is part of the product, not a minor implementation detail.
Key Takeaways
- Anthropic says the Claude Code reasoning effort default was lowered from high to medium to reduce latency, token use, and sessions that looked frozen.
- Anthropic’s postmortem identified three causes of the quality drop: lower default effort, a cache-related regression, and a brevity instruction that hurt longer coding responses.
- The Claude Code changelog shows Anthropic reversed the default from medium back to high on April 7 for several user groups.
- Anthropic says the underlying model weights and API were not changed; the quality regression came from wrapper-level configuration.
- A GitHub issue author argued the cache TTL appeared to drop from 1 hour to 5 minutes around March 6-8, increasing cache creation costs and quota usage, but those exact TTL claims were not published by Anthropic.
Further Reading
- An update on recent Claude Code quality reports, Anthropic’s postmortem on the default effort change, cache regression, brevity instruction, and reversal.
- Claude Code changelog, Official changelog showing the April 7 default-effort reversal and later effort-setting updates.
- Introducing Claude Opus 4.7, Anthropic’s launch post confirming xhigh as the later Claude Code default for Opus 4.7.
- Cache TTL silently regressed from 1h to 5m around early March 2026, Practitioner analysis of session logs suggesting the cache TTL regression and its cost impact.
