Zhitong

2025.09.12 02:22

On September 12, according to Xiaomi Technology, the next-generation Kaldi team at Xiaomi Group's AI Lab released ZipVoice, a series of text-to-speech (TTS) models based on the Flow Matching architecture. The series comprises two models: ZipVoice, a zero-shot single-speaker speech synthesis model, and ZipVoice-Dialog, a zero-shot dialogue speech synthesis model. ZipVoice addresses pain points of existing zero-shot speech synthesis models, such as large parameter counts and slow synthesis speeds, while ZipVoice-Dialog addresses the stability and inference-speed bottlenecks of existing dialogue speech synthesis models.