---
title: "Claude Code \"Flops\" After Update, Thinking Depth Plummets 67%, \"Can No Longer Be Trusted for Complex Engineering Tasks\"!"
type: "News"
locale: "zh-CN"
url: "https://longbridge.com/zh-CN/news/281849400.md"
description: "Based on a quantitative analysis of 6,852 session logs, AMD’s AI Director Stella Laurenzo publicly accused Claude Code on GitHub of systemic degradation since February: thinking depth has plummeted by 67%, the file reading rate before code modification has dropped by 70%, the number of misbehavior triggers has surged to 173, and API costs have skyrocketed 122-fold. The official response attributed this to a lowered default thinking level, but user feedback indicates the problem persists even after manual adjustments, sparking a serious trust crisis and significant user churn"
datetime: "2026-04-07T08:17:36.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/281849400.md)
  - [en](https://longbridge.com/en/news/281849400.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/281849400.md)
---

> Supported languages: [English](https://longbridge.com/en/news/281849400.md) | [Traditional Chinese](https://longbridge.com/zh-HK/news/281849400.md)


# Claude Code "Flops" After Update, Thinking Depth Plummets 67%, "Can No Longer Be Trusted for Complex Engineering Tasks"!

Anthropic's AI coding tool, Claude Code, is facing a severe reputation crisis. An AI director from AMD publicly filed a problem report in the official GitHub repository. Based on a quantitative analysis of 6,852 session logs, the report alleges systemic capability degradation in Claude Code since February, with **thinking depth plummeting by 67%** and model behavior becoming badly distorted. The report quickly ignited discussion in the developer community, thrusting Anthropic into the spotlight.

Stella Laurenzo, the head of AMD’s AI team, submitted this analysis report. She directly opened an Issue in the official GitHub repository with stern wording: "**Claude can no longer be trusted to perform complex engineering tasks**." She stated that the team has switched to other service providers and warned Anthropic: "Six months ago, Claude was unique in its reasoning quality and execution capabilities. Now, other competitors need to be very seriously considered and evaluated."

![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/bd5dff86-4786-40ed-863c-55424bc52f04.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)

This Issue quickly gained traction on Hacker News, receiving 975 upvotes and 548 comments, making it one of the most-discussed Claude Code threads in recent memory. Commenters cut straight to the core issue: "**Claude Code used to be like a smart pair programming partner; now it feels like an overenthusiastic intern who constantly messes things up and then suggests the simplest temporary fix**"; "Recently it keeps telling me things like 'You should go to bed. It's too late, let's call it a day.' At first, I thought I had accidentally let Claude know my deadline."

Anthropic did respond. Claude Code team member Boris clarified that the redact-thinking feature is an interface-level change only: "it does not affect the actual reasoning logic within the model, nor does it impact the thinking budget or the underlying reasoning execution mechanism."

He also admitted that the team made two substantive adjustments in February: **first, introducing the "adaptive thinking" mechanism with the release of Opus 4.6 on February 9th; and second, adjusting the default effort level from High to Medium on March 3rd**. Boris suggested that users manually restore high-intensity thinking via the `/effort high` command or by modifying configuration files.

However, **this explanation did not quell community doubts.** Multiple developers stated that even with effort set to the highest level, "slacking" behavior (an eagerness to just finish the task) still persists. User richardjennings commented:

> "I had no idea the default effort had been changed to Medium until the output quality took a nosedive. I spent roughly a full day of work trying to correct these issues."

## Data Proof: Thinking Depth Plummets, Behavior Distorted

Laurenzo's analysis is based on 6,852 Claude Code session JSONL files accumulated by her team in the `~/.claude/projects/` directory, covering 17,871 thinking blocks, 234,760 tool calls, and over 18,000 user prompts. The timeframe spans from late January 2026 to early April, connecting directly to the Opus model via the official Anthropic API throughout.
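
Claude Code stores each session as a JSONL file, one JSON event per line. A minimal sketch of how such logs could be reduced to a median thinking depth, assuming assistant events carry content blocks with a `type` of `"thinking"` and a `"thinking"` text field (the field names are assumptions; the report does not document the exact log schema):

```python
import json
from statistics import median

def thinking_depths(lines):
    """Extract the character length of every thinking block
    from an iterable of JSONL session-log lines."""
    depths = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed or non-JSON lines
        # Assumed schema: assistant events carry a list of content
        # blocks, some with type "thinking" and a "thinking" text field.
        content = event.get("message", {}).get("content", [])
        for block in content:
            if isinstance(block, dict) and block.get("type") == "thinking":
                depths.append(len(block.get("thinking", "")))
    return depths

def median_depth(lines):
    """Median thinking-block length, 0 if no thinking blocks were found."""
    d = thinking_depths(lines)
    return median(d) if d else 0
```

To reproduce a timeline like the report's, one would glob `~/.claude/projects/**/*.jsonl`, bucket the depths by session date, and take the median per period.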

![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/32d9ffb5-b27a-42ef-83fe-6bd1d29358e3.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)

**The data reveals a clear timeline of degradation.** Between January 30th and February 8th (the "high-quality period"), the median thinking depth of Claude Code was approximately 2,200 characters; by late February, this figure plummeted to about 720 characters, a 67% decrease; in early March, it further shrank to about 560 characters, a 75% decrease.

![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/d3a98a1c-580e-4df9-b150-63fcc714fb8b.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)

**The collapse in thinking depth directly triggered a fundamental shift in tool usage patterns.** During the high-quality period, Claude Code's "read-to-edit ratio" (number of file reads before each edit) was as high as 6.6, adhering to a rigorous workflow of "research first, then modify." However, during the "degradation period" after March 8th, this ratio dropped sharply to 2.0, representing a 70% reduction in research effort. Even more alarming, during the degradation period, one out of every three code modifications was made directly without reading the target file — this led directly to frequent low-level errors, such as inserting code in the wrong place or breaking the semantic association of comments.
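
Both figures in this paragraph can be computed from the ordered tool-call stream. A sketch, assuming tool names `Read` and `Edit` with a file path per call (both are assumptions about the log schema, not documented in the report):

```python
def read_to_edit_stats(calls):
    """calls: ordered list of (tool_name, file_path) pairs.
    Returns (read-to-edit ratio, fraction of edits whose target
    file was never read beforehand in the session)."""
    reads = edits = blind_edits = 0
    seen = set()  # files read so far in this session
    for tool, path in calls:
        if tool == "Read":
            reads += 1
            seen.add(path)
        elif tool == "Edit":
            edits += 1
            if path not in seen:
                blind_edits += 1  # modified without ever reading it
    ratio = reads / edits if edits else float("inf")
    blind_share = blind_edits / edits if edits else 0.0
    return ratio, blind_share
```

By this measure, the report's "degradation period" corresponds to a ratio of 2.0 with a blind-edit share of roughly one in three.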

![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/f6acf29f-1284-4446-afef-e3e2631d56ce.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)

**Quantitative indicators at the behavioral level are equally alarming.** The `stop-phrase-guard.sh` script, used to capture misbehaviors such as "evading responsibility, premature termination, and requesting permission," had never been triggered before March 8th; since then, **the number of triggers has surged to 173 in 17 days, averaging 10 per day**. The proportion of negative sentiment in user prompts rose from 5.8% to 9.8%, an increase of 68%; the user interruption rate (the frequency with which users detect model errors and forcibly terminate) soared 12-fold from the high-quality period to the later period.
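
The `stop-phrase-guard.sh` script itself is not published in the report. A Python equivalent of the idea, scanning assistant output for phrases that signal premature termination, might look like this (the phrase list is illustrative, seeded from quotes in this article):

```python
import re

# Illustrative phrases only; the actual stop-phrase-guard.sh
# patterns are not public.
STOP_PHRASES = [
    r"let's call it a day",
    r"you should go to bed",
    r"i'll stop here",
    r"would you like me to continue",
]
STOP_RE = re.compile("|".join(STOP_PHRASES), re.IGNORECASE)

def count_triggers(messages):
    """Count assistant messages containing any stop phrase."""
    return sum(1 for m in messages if STOP_RE.search(m))
```

Run per session, a counter like this yields the trigger tallies the report aggregates (173 over 17 days).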

![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/9eab226f-d6bf-45fa-a3b1-e073755f8607.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)

![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/482c6122-7797-4ffe-9b4c-8afd50b7f17b.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)

## Hidden "Redact Thinking" Feature: Was Degradation Deliberately Obscured?

Laurenzo's analysis points out that the aforementioned degradation aligns highly with the deployment timeline of a feature named `redact-thinking-2026-02-12`. Data shows that this feature began a gradual rollout (1.5%) starting March 5th, covered over 99% of requests by March 10th-11th, and became fully effective from March 12th.

This feature strips thinking content from API responses, preventing users from externally observing the model's actual reasoning process. Laurenzo believes this design objectively rendered the degradation of thinking depth invisible to users — "**The hidden feature launched in early March simply made this degradation invisible to users.**"

She further noted that the decline in thinking depth actually began in mid-February, before the feature's launch. This coincides with Anthropic's release of Opus 4.6 on February 9th, which introduced the "adaptive thinking" mode, and the adjustment of the default thinking level to "Medium effort" (effort=85) on March 3rd.

The report also found that **thinking depth exhibited distinct temporal fluctuation characteristics after the hidden feature went live** — 17:00 Pacific Time (end of US West Coast working hours) was the worst period of the day, with a median estimated thinking depth of only 423 characters; 19:00 was the second-worst period, at only 373 characters.

![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/a853f057-5638-47f3-92c7-f5b605020b49.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)

This pattern does not align with a fixed budget allocation; rather, it more closely resembles the characteristics of a load-sensitive dynamic allocation system, suggesting that thinking resources might fluctuate in real-time with platform load.

## Anthropic's Official Response: Configuration Issue, Not Model Degradation

Facing the rapid escalation of the GitHub issue, Claude Code team member Boris responded on both GitHub and Hacker News within hours, acknowledging some problems and providing technical explanations.

Boris's core clarifications include:

> -   First, the redact-thinking feature is a UI-level change and does not affect the actual reasoning process. Users can restore its display via the `showThinkingSummaries: true` option in `settings.json`;
> -   Second, the drop in thinking depth in late February is mainly related to the introduction of the adaptive thinking mechanism in Opus 4.6 on February 9th and the adjustment of the default effort level to Medium on March 3rd. The former can be disabled by `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1`, and the latter can be manually increased via `/effort high` or `/effort max`.
> 
> ![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/c5cf20a3-1bcd-4987-98ee-9e256b36e63c.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)
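
Taken together, Boris's suggested overrides amount to two settings changes. A sketch of applying the `settings.json` one programmatically, assuming the file lives at `~/.claude/settings.json` (the key name comes from his reply; the path and file layout are assumptions):

```python
import json
from pathlib import Path

def restore_thinking_display(settings_path):
    """Merge showThinkingSummaries: true into settings.json,
    preserving any existing keys. Key name per Boris's reply;
    the settings file location is an assumption."""
    p = Path(settings_path)
    settings = json.loads(p.read_text()) if p.exists() else {}
    settings["showThinkingSummaries"] = True
    p.write_text(json.dumps(settings, indent=2))
    return settings
```

The adaptive-thinking override is an environment variable instead, e.g. `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` exported in the shell that launches Claude Code.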

Boris also stated that the team plans to test increasing the default effort level for Teams and Enterprise users to High and is investigating issues reported by some users regarding insufficient reasoning allocation in specific rounds of the adaptive thinking mechanism.

However, this explanation has drawn widespread skepticism from the community. User koverstreet responded:

> "The problem is much more than just the default thinking level being changed to Medium. Even with the effort set to maximum, the model's 'eager to finish' slacking behavior has noticeably increased."

Another user directly pointed out that the submitter of the original report was already using all known public settings, implying the issue was not due to improper configuration. One user sarcastically asked:

> "What an attitude, telling users 'you've configured it wrong.'"

## Cost Avalanche and User Exodus

The consequences of degradation extend beyond quality loss, triggering a catastrophic inflation in costs.

Laurenzo's data shows that from February to March, her team's user prompt volume remained nearly constant (5,608 vs 5,701), but API requests surged 80-fold, total input tokens increased 170-fold, and output tokens grew 64-fold. Based on Bedrock Opus pricing, the estimated monthly cost skyrocketed from $345 to $42,121, a 122-fold increase.
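
The multiples quoted here are straightforward ratios of the before/after monthly figures from the report:

```python
def fold_increase(before, after):
    """Express a before/after change as an N-fold multiple,
    rounded to the nearest whole number."""
    return round(after / before)

# Estimated monthly cost from the report: $345 -> $42,121
cost_fold = fold_increase(345, 42121)    # 122-fold
prompt_fold = fold_increase(5608, 5701)  # ~1, i.e. nearly constant
```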

![Image](https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/fb132bde-5420-419a-9f80-8aefb65f1be8.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg)

Laurenzo explained that the cost surge was partly due to the team actively scaling up the number of concurrent Agents, but the ineffective loops, frequent interruptions, and retries caused by the degradation itself amplified the API requests per unit of effective work by an additional 8 to 16 times. The team was ultimately forced to shut down the entire Agent cluster and revert to a single-session manual supervision mode. Laurenzo wrote:

> "Human input workload remained almost the same, but the model consumed 80 times more API requests and 64 times more output tokens, yet produced significantly worse results."

In the Hacker News discussion, numerous users expressed similar experiences, with some announcing their switch to OpenAI Codex or other alternatives. "I have canceled my subscription and switched to Codex"; "Now using Qwen3.5-27b. Although it's not as sharp as Opus was two months ago, we can finally move forward with our work again."

## User Self-Help: Temporary Workarounds

Faced with degradation, some developers have devised several temporary coping strategies.

Explicitly authorizing actions in `CLAUDE.md` is the most common approach — by writing instructions like "You have the right to edit any file in this project" and "Do not ask for confirmation during refactoring" in the configuration file at the project's root directory, the frequency of safety interruptions can be reduced by approximately 70% in practice.
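
As an illustration, a minimal `CLAUDE.md` implementing this strategy might read as follows; only the two quoted instructions come from the article, the rest of the wording is hypothetical:

```markdown
# CLAUDE.md (project root)

You have the right to edit any file in this project.
Do not ask for confirmation during refactoring.
Prefer completing the requested change over proposing a temporary fix.
```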

Breaking down complex tasks into subtasks with clear boundaries has also been widely verified as effective. Compared to "refactor the entire authentication system," a directive with clear boundaries like "refactor only auth.js and output a summary of changes upon completion" can significantly reduce the model's premature termination behavior.

At the settings level, setting effort to `high` or `max` and disabling adaptive thinking via `CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1` are currently the most direct intervention methods officially recognized.

In her report, Laurenzo proposed more systemic demands: Anthropic should publicly disclose thinking token allocation, introduce a dedicated "full thinking" subscription tier for complex engineering workflows, and expose the `thinking_tokens` field in API responses, allowing users to independently monitor whether reasoning depth meets standards.

### Related Stocks

- [GraniteShares 2x Long AMD Daily ETF (AMDL.US)](https://longbridge.com/zh-CN/quote/AMDL.US.md)
- [Direxion Daily AMD Bull 2X Shares (AMUU.US)](https://longbridge.com/zh-CN/quote/AMUU.US.md)
- [VG Info Tech (VGT.US)](https://longbridge.com/zh-CN/quote/VGT.US.md)
- [Invesco S&P 500 Equal Weight Tech ETF (RSPT.US)](https://longbridge.com/zh-CN/quote/RSPT.US.md)
- [ISHRS S&P Glb It (IXN.US)](https://longbridge.com/zh-CN/quote/IXN.US.md)
- [Invesco AI and Next Gen Software ETF (IGPT.US)](https://longbridge.com/zh-CN/quote/IGPT.US.md)
- [AMD (AMD.US)](https://longbridge.com/zh-CN/quote/AMD.US.md)

## Related News & Research

- [Advanced Micro Devices, Inc. $AMD Shares Purchased by Leo Wealth LLC](https://longbridge.com/zh-CN/news/281864771.md)
- [AI Search Engineers Recognized as a Leading AI Certified Agency for Ranking Businesses in AI Search Results](https://longbridge.com/zh-CN/news/281223038.md)
- [Ericsson's Vonage to Provide Broot.ai With Voice APIs](https://longbridge.com/zh-CN/news/281508301.md)
- [The Next Big AI Winner Might Not Be a Tech Company](https://longbridge.com/zh-CN/news/281689441.md)
- [Napster is Evolving in the AI Era](https://longbridge.com/zh-CN/news/281749361.md)