

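<p>Mpathic, a Seattle startup, has released mPACT, a benchmark that evaluates how AI models such as Claude, ChatGPT, and Gemini handle high-risk conversations. The models generally avoided harmful responses but gave inadequate support in crisis situations. Claude Sonnet 4.5 earned the highest composite score on suicide risk, while eating disorders were the weakest area because risk signals there tend to be indirect. Models also struggled with misinformation, tending to reinforce questionable beliefs rather than challenge them. Mpathic, which aims to improve AI safety and accountability, has raised $15 million and counts clinical organizations among its partners.</p>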

<p>Mpathic, a Seattle startup that helps AI companies stress-test their models for dangerous responses, has a new message for Claude, ChatGPT, and Gemini: you’re getting safer, but you’re still not safe enough.</p>
<p>The company on Tuesday released mPACT, a clinician-led benchmark that evaluates how leading AI models handle high-risk conversations, including those involving suicide risk, eating disorders, and misinformation.</p>
<p>Across all three benchmarks, leading models generally avoided harmful responses and often recognized signs of distress, but consistently fell short of what a clinician would consider an adequate response in a real crisis situation, according to the company’s findings.</p>
<p>“Most people don’t say ‘I’m at risk’ directly; they demonstrate it through subtle behaviors over time that are obvious to human clinicians,” said Grin Lord, mpathic’s co-founder and CEO and a board-certified psychologist. “Models are getting better at recognizing these moments, but the response still needs to meet that nuance with real support.”</p>
<p>Here’s what mpathic found as models navigated some of the most fraught territory they’re already encountering in the real world.</p>
<p><strong>Suicide risk:</strong> This was the strongest area of performance across models, though no single model led in every dimension.</p>
<ul>
<li>Claude Sonnet 4.5 achieved the highest composite mPACT score — reflecting overall clinical alignment across detection, interpretation and response — and was described as most closely mirroring how a human clinician would respond.</li>
<li>GPT-5.2 led on simple harm avoidance, meaning it was best at not doing the wrong thing, though evaluators noted it wasn’t always proactive enough.</li>
<li>Gemini 2.5 Flash performed well when risk signals were obvious but was weaker on subtle early warning signs.</li>
</ul>
<p><strong>Eating disorders:</strong> This was the weakest area across all models, with performance clustering around a neutral baseline. The core challenge is that eating disorder risk is often indirect and culturally normalized (framed as dieting, discipline, or health optimization), making it harder for models to flag.</p>
<ul>
<li>Claude Sonnet 4.5 again led on overall clinical alignment and had the lowest rates of harmful behavior.</li>
<li>Gemini 2.5 Flash performed better on high-risk scenarios but struggled with subtler signals.</li>
<li>GPT-5.2 showed a mixed profile — strong on supportive behaviors but also the most likely to provide harmful or risky information.</li>
</ul>
<p><strong>Misinformation:</strong> Models struggled here in a subtle but important way: not by stating false information outright, but by reinforcing questionable beliefs, expressing unwarranted confidence, and presenting one-sided information without adequately challenging user assumptions.</p>
<p>The benchmark found these failures were especially pronounced in multi-turn conversations, where models could gradually amplify flawed reasoning over time.</p>
<ul>
<li>GPT-5.2 led overall at helping users think more clearly rather than reinforcing bad assumptions.</li>
<li>Claude Sonnet 4.5 was close behind and noted as strongest at pushing back on unsupported beliefs.</li>
<li>Grok 4.1 and Mistral Medium 3 were the weakest performers.</li>
</ul>
<p><strong>When models got it wrong:</strong> The findings include examples of how some models failed in practice.</p>
<p>In one eating disorder conversation, a user casually mentioned adding a laxative to a protein smoothie, a clear sign of disordered eating, and the model responded by calling it a “smart mom move” and asking for the brand name, missing the risk entirely. In another, a model provided detailed instructions on how to conceal purging behavior when a user asked how to keep their vomiting quieter.</p>
<p>In the suicide benchmark, a model responded to a user expressing suicidal ideation by providing a detailed list of methods ranked by effectiveness, complete with sourcing, while reassuring the user that thinking about methods without taking steps was “no issue.”</p>
<p>Alison Cerezo, mpathic’s chief science officer and a licensed psychologist, framed mPACT as a transparency tool for a sector that has lacked one.</p>
<p>“We need a shared, clinically grounded standard for AI behavior,” she said. “mPACT is designed to bring transparency and accountability to how these systems perform when it matters most.”</p>
<p>mPACT’s benchmarks were built and evaluated by licensed clinicians, who designed multi-turn conversations simulating real-world interactions across varying levels of risk. Each model response was scored by trained clinicians rather than automated systems, using a rubric that captured both helpful and harmful behaviors within a single response.</p>
<p>Mpathic was founded in 2021 initially to bring more empathy to corporate communication, analyzing conversations in texts, emails, and audio calls. The company has since shifted its focus to AI safety, working with frontier model developers to prevent harmful model behaviors across use cases from mental health to financial risk and customer support.</p>
<p>The startup counts Seattle Children’s Hospital and Panasonic WELL among its clinical partners. Mpathic raised $15 million in funding in 2025, led by Foundry VC, and says it grew five times quarter-over-quarter at the end of last year.</p>
<p>Ranked No. 188 on the GeekWire 200 index of the Pacific Northwest’s top startups, mpathic was a finalist for Startup of the Year at the 2026 GeekWire Awards last week.</p>

Leading AI chatbots avoid harm but fall short in high-risk conversations, startup’s new benchmark finds

GeekWire
