---
title: "Alibaba AI voice model beats OpenAI, xAI to bridge Chinese dialect gap"
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/288128873.md"
description: "Alibaba's Tongyi Lab developed the Fun-Realtime-TTS-Preview AI voice model, which ranked fifth globally on the Artificial Analysis Speech Arena leaderboard, outperforming Western rivals like OpenAI and xAI. It is the only Chinese-engineered system in the top five, excelling in capturing complex Chinese dialects and accents. Additionally, Alibaba's ASR model ranked first with a 1.8% word error rate. This breakthrough addresses accuracy bottlenecks for regional Chinese speech, supporting over 30 languages and multiple dialects, while offering enterprise customization for sectors like finance and healthcare."
datetime: "2026-05-30T02:04:33.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/288128873.md)
  - [en](https://longbridge.com/en/news/288128873.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/288128873.md)
---

# Alibaba AI voice model beats OpenAI, xAI to bridge Chinese dialect gap

A new artificial intelligence voice model from Alibaba Group Holding has beaten out Western rivals OpenAI and xAI on a major global benchmark, underscoring its technical edge in capturing complex Chinese dialects and accents.

Fun-Realtime-TTS-Preview, developed by Alibaba’s Tongyi Lab, has secured the fifth spot on the Artificial Analysis Speech Arena leaderboard with a score of 1,190. It was the only Chinese-engineered voice system in the global top five.

Alibaba owns the South China Morning Post.

The Speech Arena benchmark is operated by Artificial Analysis, a San Francisco-based AI evaluation organisation backed by investors including former GitHub chief executive Nat Friedman and Google Brain founder Andrew Ng. The platform ranks models through blind user evaluations of generated speech clips using an Elo-based system.

Speech Arena users test how well models can perform across three core capabilities: converting speech into text, enabling end-to-end voice understanding and conversational interaction, and transforming text into natural-sounding speech.

In a separate Artificial Analysis Word Error Rate index, Alibaba’s Fun-Realtime-ASR model ranked first with a word error rate of 1.8 per cent, meaning fewer than two words out of every 100 were transcribed incorrectly.

The breakthrough addresses a long-standing bottleneck for voice tech in Asia. According to a May report by the Baidu Developer Centre, traditional speech systems trained on standard Mandarin see their accuracy fall below 60 per cent for accented speakers, and drop to under 30 per cent for regional Chinese dialects.

Alibaba has been trying to bridge this gap. According to the firm’s cloud unit, the new model supports more than 30 languages, seven major Chinese dialects and over 20 regional accents.

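
For readers unfamiliar with the word error rate metric cited above, it is conventionally defined as the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition in Python. The article does not describe how Artificial Analysis tokenises or normalises text, so the whitespace tokenisation and the function name `word_error_rate` are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    a real benchmark would likely normalise case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sit") and one deletion ("the") against six reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")

# A made-up 1,000-word reference with 18 wrong words gives 1.8 per cent.
reference_words = [f"w{i}" for i in range(1000)]
transcript_words = ["x"] * 18 + reference_words[18:]
scaled = word_error_rate(" ".join(reference_words), " ".join(transcript_words))
```

Under these assumptions, `example` works out to 2/6 and `scaled` to 18/1,000, which matches the article's reading of 1.8 per cent as fewer than two errors per 100 words.
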

Chinese AI developers are increasingly pivoting from general-purpose chatbots towards embedding voice AI assistants into daily applications, in search of broader commercial uses for generative AI technologies. The growing industry focus on speech models reflects expectations that voice interfaces could become a key gateway for deploying AI across industries.

As one of the most intuitive forms of human-computer interaction, voice-based AI systems are generally seen as easier for mainstream users to adopt than text-based interfaces, because they require little user training and can operate naturally across devices such as smartphones, smart speakers and in-car assistants.

Fun-Realtime-TTS-Preview also provides enterprise-level customisation interfaces tailored to finance and healthcare industry use cases. In medical settings, for example, the system can convert doctors’ spoken notes into structured clinical records in real time.

The expansion into speech AI comes as Chinese tech companies focus increasingly on AI systems designed for specialised real-world applications. However, US companies including Google and ElevenLabs continue to dominate many global commercial voice applications and developer ecosystems.

## Related Stocks

- [BABA.US](https://longbridge.com/en/quote/BABA.US.md)
- [OpenAI.NA](https://longbridge.com/en/quote/OpenAI.NA.md)
- [159998.CN](https://longbridge.com/en/quote/159998.CN.md)
- [516190.CN](https://longbridge.com/en/quote/516190.CN.md)
- [159855.CN](https://longbridge.com/en/quote/159855.CN.md)
- [517770.CN](https://longbridge.com/en/quote/517770.CN.md)
- [513770.CN](https://longbridge.com/en/quote/513770.CN.md)
- [KWEB.US](https://longbridge.com/en/quote/KWEB.US.md)
- [513040.CN](https://longbridge.com/en/quote/513040.CN.md)
- [BABX.US](https://longbridge.com/en/quote/BABX.US.md)
- [516620.CN](https://longbridge.com/en/quote/516620.CN.md)
- [KBAB.US](https://longbridge.com/en/quote/KBAB.US.md)
- [159805.CN](https://longbridge.com/en/quote/159805.CN.md)
- [09988.HK](https://longbridge.com/en/quote/09988.HK.md)
- [GOOGL.US](https://longbridge.com/en/quote/GOOGL.US.md)
- [GOOG.US](https://longbridge.com/en/quote/GOOG.US.md)
- [BIDU.US](https://longbridge.com/en/quote/BIDU.US.md)
- [09888.HK](https://longbridge.com/en/quote/09888.HK.md)
- [89988.HK](https://longbridge.com/en/quote/89988.HK.md)
- [HBBD.SG](https://longbridge.com/en/quote/HBBD.SG.md)
- [89888.HK](https://longbridge.com/en/quote/89888.HK.md)
