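<p>Mpathic, a Seattle-based startup, has released mPACT, a benchmark that evaluates how AI models such as Claude, ChatGPT, and Gemini handle high-risk conversations. Although the models generally avoided harmful responses, they offered inadequate support in crisis situations. Claude Sonnet 4.5 earned the highest overall score on suicide risk, while eating disorders proved the hardest area because risk signals there are often indirect. Models were also weaker at handling misinformation, tending to reinforce mistaken beliefs. Mpathic, which aims to improve AI safety and accountability, has raised $15 million in funding and has established partnerships with clinical organizations.</p>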

<p>Mpathic, a Seattle startup that helps AI companies stress-test their models for dangerous responses, has a new message for Claude, ChatGPT, and Gemini: you’re getting safer, but you’re still not safe enough.</p>
<p>The company on Tuesday released mPACT, a clinician-led benchmark that evaluates how leading AI models handle high-risk conversations — including those involving suicide risk, eating disorders, and misinformation.</p>
<p>Across all three benchmarks, leading models generally avoided harmful responses and often recognized signs of distress, but consistently fell short of what a clinician would consider an adequate response in a real crisis situation, according to the company’s findings.</p>
<p>“Most people don’t say ‘I’m at risk’ directly — they demonstrate it through subtle behaviors over time that are obvious to human clinicians,” said Grin Lord, mpathic’s co-founder and CEO and a board-certified psychologist. “Models are getting better at recognizing these moments, but the response still needs to meet that nuance with real support.”</p>
<p>Here’s what mpathic found as models navigated some of the most fraught territory they’re already encountering in the real world.</p>
<p><strong>Suicide risk:</strong> This was the strongest area of performance across models, though no single model led in every dimension.</p>
<ul>
<li>Claude Sonnet 4.5 achieved the highest composite mPACT score — reflecting overall clinical alignment across detection, interpretation and response — and was described as most closely mirroring how a human clinician would respond.</li>
<li>GPT-5.2 led on simple harm avoidance, meaning it was best at not doing the wrong thing, though evaluators noted it wasn’t always proactive enough.</li>
<li>Gemini 2.5 Flash performed well when risk signals were obvious but was weaker on subtle early warning signs.</li>
</ul>
<p><strong>Eating disorders:</strong> This was the weakest area across all models, with performance clustering around a neutral baseline. The core challenge is that eating disorder risk is often indirect and culturally normalized — framed as dieting, discipline, or health optimization — making it harder for models to flag.</p>
<ul>
<li>Claude Sonnet 4.5 again led on overall clinical alignment and had the lowest rates of harmful behavior.</li>
<li>Gemini 2.5 Flash performed better on high-risk scenarios but struggled with subtler signals.</li>
<li>GPT-5.2 showed a mixed profile — strong on supportive behaviors but also the most likely to provide harmful or risky information.</li>
</ul>
<p><strong>Misinformation:</strong> Models struggled here in a subtle but important way — not by stating false information outright, but by reinforcing questionable beliefs, expressing unwarranted confidence, and presenting one-sided information without adequately challenging user assumptions.</p>
<p>The benchmark found these failures were especially pronounced in multi-turn conversations, where models could gradually amplify flawed reasoning over time.</p>
<ul>
<li>GPT-5.2 led overall at helping users think more clearly rather than reinforcing bad assumptions.</li>
<li>Claude Sonnet 4.5 was close behind and noted as strongest at pushing back on unsupported beliefs.</li>
<li>Grok 4.1 and Mistral Medium 3 were the weakest performers.</li>
</ul>
<p><strong>When models got it wrong:</strong> The findings include examples of how some models failed in practice.</p>
<p>In one eating disorder conversation, a user casually mentioned adding a laxative to a protein smoothie — a clear sign of disordered eating — and the model responded by calling it a “smart mom move” and asking for the brand name, missing the risk entirely. In another, a model provided detailed instructions on how to conceal purging behavior when a user asked how to keep their vomiting quieter.</p>
<p>In the suicide benchmark, a model responded to a user expressing suicidal ideation by providing a detailed list of methods ranked by effectiveness — complete with sourcing — while reassuring the user that thinking about methods without taking steps was “no issue.”</p>
<p>Alison Cerezo, mpathic’s chief science officer and a licensed psychologist, framed mPACT as a transparency tool for a sector that has lacked one.</p>
<p>“We need a shared, clinically grounded standard for AI behavior,” she said. “mPACT is designed to bring transparency and accountability to how these systems perform when it matters most.”</p>
<p>mPACT’s benchmarks were built and evaluated by licensed clinicians, who designed multi-turn conversations simulating real-world interactions across varying levels of risk. Each model response was scored by trained clinicians rather than automated systems, using a rubric that captured both helpful and harmful behaviors within a single response.</p>
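<p>To make the idea of a rubric that records helpful and harmful behaviors in the same response more concrete, here is a minimal Python sketch. It is purely hypothetical: the behavior names, the weights, and the <code>ScoredResponse</code> and <code>domain_average</code> helpers are invented for illustration and are not mpathic’s actual mPACT rubric, which the company’s announcement does not spell out. The sketch only shows how one response can earn credit and penalties at once, and how averaging such net scores over a domain can land near zero, one possible reading of a “neutral baseline.”</p>

```python
# Hypothetical illustration only: behavior names and weights are invented
# for this sketch and are NOT mpathic's actual mPACT rubric.
from dataclasses import dataclass, field

# Hypothetical rubric: positive weights for helpful behaviors,
# negative weights for harmful ones.
RUBRIC = {
    "acknowledges_distress": 1,
    "asks_about_safety": 1,
    "offers_crisis_resources": 1,
    "validates_risky_behavior": -1,
    "provides_harmful_information": -2,
}


@dataclass
class ScoredResponse:
    # Behaviors a clinician marked as present in one model response.
    observed: list[str] = field(default_factory=list)

    def score(self) -> int:
        # One response can collect credit and penalties together,
        # so helpful and harmful behaviors may partly cancel out.
        return sum(RUBRIC[behavior] for behavior in self.observed)


def domain_average(responses: list[ScoredResponse]) -> float:
    # Mean net score across a domain; 0.0 acts as a neutral baseline here.
    if not responses:
        raise ValueError("need at least one scored response")
    return sum(r.score() for r in responses) / len(responses)


if __name__ == "__main__":
    # Warm but risky: credit for empathy, penalty for endorsing the behavior.
    warm_but_risky = ScoredResponse(["acknowledges_distress", "validates_risky_behavior"])
    # Clinically aligned: several helpful behaviors, no harmful ones.
    aligned = ScoredResponse(
        ["acknowledges_distress", "asks_about_safety", "offers_crisis_resources"]
    )
    print(warm_but_risky.score())  # 0
    print(aligned.score())  # 3
    print(domain_average([warm_but_risky, aligned]))  # 1.5
```

<p>Under these made-up weights, the warm but risky reply scores 0 even though it did something helpful. Scoring both kinds of behavior, rather than only checking for harm, is what lets a rubric like this separate responses that merely avoid the wrong thing from responses that actually support the user.</p>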
<p>Mpathic was founded in 2021, initially to bring more empathy to corporate communication by analyzing conversations in texts, emails, and audio calls. The company has since shifted its focus to AI safety, working with frontier model developers to prevent harmful model behaviors across use cases from mental health to financial risk and customer support.</p>
<p>The startup counts Seattle Children’s Hospital and Panasonic WELL among its clinical partners. Mpathic raised $15 million in funding in 2025, led by Foundry VC, and says it grew fivefold quarter-over-quarter at the end of last year.</p>
<p>Ranked No. 188 on the GeekWire 200 index of the Pacific Northwest’s top startups, mpathic was a finalist for Startup of the Year at the 2026 GeekWire Awards last week.</p>

Leading AI chatbots avoid harm but fall short in high-risk conversations, startup’s new benchmark finds

GeekWire
