--- title: "According to reports, Alibaba's AI video large model HappyHorse is expected to be released in a week" type: "News" locale: "en" url: "https://longbridge.com/en/news/282285619.md" description: "Alibaba-W's AI video large model HappyHorse is expected to be officially released in a week. This model ranks first in the ArtificialAnalysis video arena leaderboard with an Elo score of 1333, achieving win rates of 80% and 60.9%. HappyHorse is developed by a team led by former Kuaishou-W Vice President Zhang Di, has been announced as open-source, supports audio and video generation in multiple languages, with a parameter count reaching 15 billion, and can generate a 5-second 1080p video in about 38 seconds" datetime: "2026-04-10T03:16:19.000Z" locales: - [zh-CN](https://longbridge.com/zh-CN/news/282285619.md) - [en](https://longbridge.com/en/news/282285619.md) - [zh-HK](https://longbridge.com/zh-HK/news/282285619.md) --- # According to reports, Alibaba's AI video large model HappyHorse is expected to be released in a week Alibaba-W (09988.HK) AI video large model HappyHorse has been internally launched on Alibaba Baichain, and it is expected to be officially released to the public in a week, according to domestic media reports. The model ranks first in the ArtificialAnalysis video competition arena leaderboard with a score of 1333 Elo, achieving an 80% win rate against OVI1.1 and a 60.9% win rate against LTX2.3, making it the highest-ranked open-source video generation model globally. It is reported that the model was developed by a team led by Zhang Di, former Vice President of Kuaishou-W (01024.HK), and has officially announced its open-source status. In the fields of text-to-video (without audio) and image-to-video (without audio), it surpasses Seedance2.0 and Kegling 3.0, slightly leading in text-to-video (with audio), and is on par with Seedance2.0 in image-to-video (with audio). According to the information, HappyHorse1.0 is currently the world's first open-source video large model that natively supports the joint generation of audio and video, with 15 billion parameters, utilizing a 40-layer unified self-attention Transformer architecture. Generating a 5-second 1080p video on a single H100 takes approximately 38 seconds, and it natively supports lip-syncing in seven languages: English, Mandarin, Cantonese, Japanese, Korean, German, and French, with the lowest word error rate among similar open-source models ### Related Stocks - [BABA.US](https://longbridge.com/en/quote/BABA.US.md) - [09988.HK](https://longbridge.com/en/quote/09988.HK.md) - [BABX.US](https://longbridge.com/en/quote/BABX.US.md) - [KBAB.US](https://longbridge.com/en/quote/KBAB.US.md) - [BABO.US](https://longbridge.com/en/quote/BABO.US.md) ## Related News & Research - [Zhipu Raises AI Model Prices 8%--17% As GLM-5.1 Launches](https://longbridge.com/en/news/282048810.md) - [SGTech launches AI-Ready initiative for SMEs](https://longbridge.com/en/news/281485464.md) - [Napster is Evolving in the AI Era](https://longbridge.com/en/news/281749361.md) - [Fireworks AI CEO explains why AI's infrastructure can't keep up with rampant demand](https://longbridge.com/en/news/281798253.md) - [Redwood AI Launches Secure Cloud Version of Reactosphere for High-Security Markets](https://longbridge.com/en/news/282258719.md)