
Alibaba's Tongyi Bailing upgraded again: 3 seconds of recorded audio enables seamless switching between languages, dialects, and emotions
Tongyi, a subsidiary of Alibaba (09988.HK), announced an upgrade to its Tongyi Bailing model. With just 3 seconds of recorded audio, a voice can be switched seamlessly across languages, dialects, and emotions, for example Mandarin, Cantonese, Japanese, or English, and happy or angry tones; 9 common languages and 18 dialects are covered. Even in a noisy meeting environment, the AI can output text within milliseconds, coping with tongue twisters, rap, and background-music interference.
Specifically, the Fun-CosyVoice3 model has been upgraded: first-packet latency is reduced by 50%, accuracy on mixed Chinese-English text is doubled, and 9 languages and 18 dialect accents are supported. Fun-CosyVoice3 (0.5B) is officially open-sourced, offering zero-shot voice cloning and supporting local deployment and secondary development. The Fun-ASR model has also been enhanced, reaching 93% accuracy in noisy scenarios and supporting lyrics and rap recognition, free mixing of 31 languages, and dialect and accent coverage, with the first-word latency of the streaming recognition model reduced to 160 ms. Fun-ASR-Nano (0.8B), a lightweight version of Fun-ASR with lower inference costs, is open-sourced as well and supports local deployment and custom fine-tuning.

