Anthropic launches Claude Sonnet 4.5, calling it "the world's best coding model"

Anthropic launched Claude Sonnet 4.5 on September 29, calling it "the world's best coding model." The model performed strongly on benchmarks such as SWE-bench Verified and can generate high-quality code, identify areas for improvement, and reliably follow instructions. Compared with older models, Sonnet 4.5 shows significant improvements in domain-specific knowledge and reasoning. The new model will be the default option, priced the same as the previous generation, while paid users can still choose the older Opus model. Anthropic also hinted that more powerful models are on the way.

Anthropic claims to have launched "the world's best coding model."

On September 29, Anthropic released its latest AI model, Claude Sonnet 4.5. The company says that, based on industry benchmarks such as SWE-bench Verified (a benchmark that measures AI systems' software coding capabilities), Claude Sonnet 4.5 can be considered "the world's best coding model."

The model generates higher-quality code, is better at spotting where code can be improved, and follows instructions more reliably. It has posted top results on coding benchmarks and can build "production-ready" applications rather than stopping at the prototype stage.

At the same time, experts in fields such as finance, law, and medicine have found that compared to older models, including Opus 4.1, Sonnet 4.5 shows significant improvements in domain-specific knowledge and reasoning abilities.

Anthropic stated that the new model will be the default option for users, with pricing consistent with the previous generation Sonnet 4. However, paid subscription users can still choose to use the older Opus model.

Looking ahead, Anthropic has hinted that more models will be released soon. Anthropic co-founder and Chief Scientist Jared Kaplan revealed that more powerful models are in development, very likely including a new version of "Opus." He stated:

While there are no promises, I think we might have one or two more releases before the end of this year.

Comprehensive Upgrades in Performance and Autonomy

Beyond its optimized model size, Claude Sonnet 4.5 also delivers broad gains in core capabilities.

Anthropic stated that on SWE-bench Verified, which measures the real-world software coding capabilities of AI systems, the model ranks at the top of the industry.

On the OSWorld benchmark, which tests real-world computer operation tasks, Sonnet 4.5 scored 61.4%, up from the 42.2% that Sonnet 4 posted four months earlier, placing it in a leading position.

Jared Kaplan stated:

Users will notice that this model is smarter and more like a colleague, making it fun to collaborate with it when encountering and solving problems.

Anthropic Chief Product Officer Mike Krieger stated that although the Sonnet 4.5 model is smaller than the previous Opus 4.1, it is smarter in almost every aspect and can provide effective support for "real, practical work."

The model can run autonomously for up to 30 hours, far exceeding the 7 hours of previous models, and can maintain focus on complex multi-step tasks. After preliminary testing, some users reported that its output is better than that of previous models, but that it sometimes misses key points the user emphasized:

Initial thoughts on Claude Sonnet 4.5: A faster model that thinks and outputs better than previous models; it seems to lack many fixes and key points I pointed out, and does not follow instructions correctly; when it does fix or create what I need, it meets high standards.

Significant Leap in Safety and Alignment

In addition to performance improvements, Anthropic emphasizes that Claude Sonnet 4.5 is its "most aligned model" to date.

The company says extensive safety training has significantly improved the model's behavior, reducing "concerning behaviors" such as deception, power-seeking, and "sycophancy" (i.e., the model catering to what users want to hear).

Additionally, the new model has stronger resistance to "prompt injection attacks," which can induce the model to perform malicious actions, such as leaking sensitive data. Kaplan stated:

This may be the biggest leap we've seen in safety in the past year and a half.

The model is released under AI Safety Level 3 (ASL-3) protections and is equipped with classifiers designed to detect hazardous content related to chemical, biological, radiological, and nuclear (CBRN) weapons. The company says it has also significantly reduced the classifiers' false positive rate.

Empowering Developers with Agent SDK

Along with the release of the new model, Anthropic also launched a series of product upgrades, the most notable being the Claude Agent SDK.

This is a software development kit for developers, built on the same underlying infrastructure that powers Anthropic's own product, Claude Code.

The company stated that the SDK addresses tricky issues encountered when building AI agents, such as managing memory across long-running tasks, designing permission systems that balance autonomy with user control, and coordinating sub-agents.

By opening this toolkit, Anthropic aims to enable developers to build powerful customized AI agents for a wider range of tasks.

Other product updates include a "checkpoint" feature for Claude Code, a new native extension for VS Code, and direct integration of code execution and file creation (spreadsheets, slides, documents) in the apps for paid users.
