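Alibaba's Tongyi team swept the GitHub open-source leaderboards with four major releases in a row.
This week, from July 22 to 25, Alibaba released in quick succession the non-thinking version of Qwen3-235B, the Qwen3-Coder coding model, the Qwen3-235B-A22B-Thinking-2507 reasoning model, and the WebSailor AI agent framework. The four releases topped open-source leaderboards across foundation models, coding models, reasoning models, and agents.
The respected benchmarking organization Artificial Analysis went as far as to say: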
<blockquote>
Qwen3 is the most intelligent non-thinking base model in the world.
</blockquote>
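<h2>Non-Thinking Models Can Post Off-the-Charts Performance Too</h2>
According to Hard AI (硬 AI), early on Tuesday Alibaba's Tongyi Qianwen (Qwen) team released its latest non-thinking model, named Qwen3-235B-A22B-Instruct-2507-FP8.
The model performed strongly across several key benchmarks. It not only outscored leading open-source models such as Kimi-K2 across the board, but also edged ahead of leading closed-source models such as Claude-Opus4-Non-thinking.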
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/79b7eb42-d8d7-4350-890a-076f6c914d67.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="865" height="522" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/79b7eb42-d8d7-4350-890a-076f6c914d67.png"/>
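Notably, the updated Qwen3 model stands out for its agent capabilities, posting excellent results on the BFCL (agent capability) evaluation. This points to a new high in the model's ability to understand complex instructions, plan autonomously, and call tools to complete tasks. An agent-first focus looks set to become a core competitive edge for future AI applications.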
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/c5e06ee8-2808-4385-879a-52bf2f6c0d71.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="681" height="1454" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/c5e06ee8-2808-4385-879a-52bf2f6c0d71.png"/>
<div>
<h2 id="354364a0">Coding Model Sets the Community Abuzz</h2>
</div>
Qwen3-Coder, released on July 23, caused an even bigger stir in the global developer community.
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/3a5b3bd9-bd87-4e0a-a77d-9f77eb02b9ad.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="1024" height="595" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/3a5b3bd9-bd87-4e0a-a77d-9f77eb02b9ad.png"/>
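As Wallstreetcn reported earlier, this coding model is built on a Mixture-of-Experts (MoE) architecture with 480B total parameters and 35B active parameters. It natively supports a 256K context window, extendable to 1M.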
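To make these figures concrete, here is a minimal arithmetic sketch of how small a share of an MoE model's weights is active for each token. The function name `active_fraction` is hypothetical, and the token count assumes that "256K" means 256 × 1024, which the article does not state.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of an MoE model's parameters that are active per token."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Figures quoted in the article for Qwen3-Coder: 480B total, 35B active.
coder_ratio = active_fraction(480, 35)

# Assumption: "256K" is read as 256 * 1024 tokens.
native_context_tokens = 256 * 1024

print(f"active share: {coder_ratio:.1%}")
print(f"native context: {native_context_tokens:,} tokens")
```

Under these assumptions, roughly 7% of the weights take part in any single forward pass, which is why a model with 480B parameters can run at a per-token compute cost closer to that of a much smaller dense model.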
On SWE-bench Verified, the benchmark developers watch most closely, Qwen3-Coder achieved the best result among open-source models.
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/8bb1ce09-ebde-4a85-a9d3-09cd67a9a1aa.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="1024" height="530" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/8bb1ce09-ebde-4a85-a9d3-09cd67a9a1aa.png"/>
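The model was trained on 7.5 trillion tokens, 70% of which is code data. Through long-horizon reinforcement learning and large-scale hands-on training across 20,000 virtual environments, it shows strong ability on real-world multi-turn interactive tasks.
Alibaba also released a companion command-line tool, Qwen Code, giving developers a complete coding solution.
Tech leaders lined up to praise Qwen3-Coder. Perplexity CEO Aravind Srinivas, for example, lauded its capabilities: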
<blockquote>
The results are astonishing. Open source is winning.
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/77040370-a13f-4740-bc18-270cd53475d8.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="906" height="638" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/77040370-a13f-4740-bc18-270cd53475d8.png"/>
</blockquote>
Twitter co-founder Jack Dorsey went further, stressing that Qwen3 pairs very well with Goose, the AI agent framework developed by his company Block:
<blockquote>
goose + Qwen3-Coder = wow
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/e9e0a12c-c293-4bbc-b77c-4971c48fa476.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="749" height="248" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/e9e0a12c-c293-4bbc-b77c-4971c48fa476.png"/>
</blockquote>
<div>
<h2 id="ai-agent">AI Agent Framework Challenges the Closed-Source Monopoly</h2>
</div>
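WebSailor, the AI agent framework that Alibaba's Tongyi Lab open-sourced in the same period, directly targets OpenAI's Deep Research product.
On the BrowseComp-en/zh benchmarks, the framework significantly outperforms all open-source agents and is comparable to proprietary closed-source models.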
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/dd4514c9-8e68-41b9-b87a-254da313f283.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="554" height="199" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/dd4514c9-8e68-41b9-b87a-254da313f283.png"/>
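WebSailor uses a two-part technical architecture: complex task generation and a reinforcement learning module. By building complex knowledge graphs and applying a dynamic sampling strategy, the system can retrieve and reason efficiently over massive amounts of information.
Beyond its strong results on complex tasks, WebSailor also does well on simple ones. On the SimpleQA benchmark, for example, WebSailor outperformed all other model products.
The project has earned more than 5,000 stars on GitHub and at one point ranked first on the daily trending list.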
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/3228428b-f08c-4fee-b4b8-89c146c187b3.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="1024" height="645" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/3228428b-f08c-4fee-b4b8-89c146c187b3.png"/>
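WebSailor's core technology centers on the complex task generation and reinforcement learning modules, which work together to lift the performance of open-source agents on complex information-retrieval tasks.
Open-sourcing the framework is significant: it breaks closed-source systems' monopoly on information retrieval and gives developers worldwide an open-source solution comparable to Deep Research.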
<div>
<h2 id="9f391a53">Reasoning Model Tops Global Open-Source Rankings</h2>
</div>
Qwen3-235B-A22B-Thinking-2507, released on July 25, was the week's biggest release.
<blockquote>
<ul>
<li>AIME25 (math): 92.3 points.</li>
<li>LiveCodeBench v6 (coding): 74.1 points.</li>
<li>WritingBench (writing): 88.3 points.</li>
<li>PolyMATH (multilingual math): 60.1 points.</li>
</ul>
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/352d1411-64c4-4622-86ac-7ba5634177d8.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="1024" height="659" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/352d1411-64c4-4622-86ac-7ba5634177d8.png"/>
</blockquote>
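A more detailed look at the leaderboards shows that the Qwen3 reasoning model holds its own against the other models compared (all of which, apart from R1, are top closed-source models).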
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/c16b8d61-9a9a-46e1-ab6d-abce75e90b02.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="681" height="1454" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/c16b8d61-9a9a-46e1-ab6d-abce75e90b02.png"/>
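The model uses an MoE architecture with 235B total parameters and 22B active parameters, 94 layers, and 128 experts, and it natively supports a context length of 262,144 tokens. It is built exclusively for thinking mode: the default chat template automatically includes the thinking tag, which supports deep reasoning.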
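Because the thinking tag is injected by the chat template, downstream code usually has to separate the reasoning from the final answer. The sketch below shows one way to do that in plain Python. It assumes the closing tag is `</think>` and that the generated text may lack an opening `<think>`, as Qwen's published model cards describe; the article itself does not name the tag, and `split_thinking` is a hypothetical helper, not part of any Qwen library.

```python
def split_thinking(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumption: the chat template already emitted the opening tag, so the
    generated text may contain only the closing tag.
    """
    # rpartition splits on the LAST occurrence of the closing tag.
    reasoning, sep, answer = text.rpartition(close_tag)
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", text.strip()
    # Drop an explicit opening tag if the output happens to include one.
    reasoning = reasoning.strip().removeprefix("<think>").strip()
    return reasoning, answer.strip()


raw = "2 + 2 is 4.\n</think>\n\nThe answer is 4."
reasoning, answer = split_thinking(raw)
print(reasoning)
print(answer)
```

Splitting on the last closing tag keeps the final answer intact even if the reasoning text happens to mention the tag itself.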
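OpenRouter data shows that API usage of Alibaba's Qwen models surged over the past few days to more than 100 billion tokens, with Qwen models taking the top three spots among the most-called models. The figure directly reflects market acceptance of Alibaba's open-source models.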
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/9bbe46f7-9ba7-43a3-8e70-bb396ffca927.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="553" height="291" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/9bbe46f7-9ba7-43a3-8e70-bb396ffca927.png"/>
Users around the world were also stunned by Tongyi's strongest reasoning model. One put it directly:
<blockquote>
China's open-source o4-mini.
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/abd22ff1-0f19-4b26-9db7-e2dad00300c5.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="736" height="731" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/abd22ff1-0f19-4b26-9db7-e2dad00300c5.png"/>
</blockquote>
AI Thinkers went further, commenting:
<blockquote>
China just released a monster of an AI model.
<img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/303f829f-b58a-4e46-8ce8-0dfce5998aa2.png?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" width="736" height="673" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/303f829f-b58a-4e46-8ce8-0dfce5998aa2.png"/>
</blockquote>


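- Alibaba's Tongyi team released four new models and swept the open-source leaderboards.  
- The Qwen3-235B-A22B-Thinking-2507 reasoning model performed strongly on multiple benchmarks.  
- The WebSailor AI agent framework challenges the closed-source monopoly and has drawn wide attention.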

Alibaba's Four AI Releases in a Row Sweep to No. 1 on Global Open-Source Leaderboards