---
title: "StepFun (Jieyue Xingchen) releases StepAudio 2.5 TTS, aiming to redefine the boundaries of expressive voice generation"
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/283077310.md"
description: "StepFun has released its next-generation voice generation model StepAudio 2.5 TTS, aiming to break through the limitations of traditional speech synthesis technology and achieve a leap from \"reproducing sound\" to \"creating expression.\" This model possesses three core capabilities: global context control, in-text context control, and zero-shot replication with full timbre control, providing high-quality voice solutions for scenarios such as audiobook production, film dubbing, and intelligent interaction"
datetime: "2026-04-17T02:01:12.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/283077310.md)
  - [en](https://longbridge.com/en/news/283077310.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/283077310.md)
---

# StepFun (Jieyue Xingchen) releases StepAudio 2.5 TTS, aiming to redefine the boundaries of expressive voice generation

PingWest reported on April 17th that StepFun has officially launched its next-generation voice generation model, StepAudio 2.5 TTS. According to the company, the model is built for the agent era and is meant to move past the limitations of traditional speech synthesis, shifting from merely "reproducing sound" to "creating expression" and giving the speech synthesis model the ability to understand human intent.

StepAudio 2.5 TTS has three core capabilities. The first is global context control: the emotional tone, character state, and scene atmosphere of an entire speech segment can be defined in natural language, keeping the delivery coherent and consistent. The second is in-text context control: tone, rhythm, pauses, and breathing can be adjusted precisely within the text to convey a character's inner thoughts and subtext. The third is zero-shot replication with full timbre control: the model retains the characteristics of a target timbre without retraining while still allowing emotion and style to be adjusted flexibly.

The model is now available on the StepFun open platform and Step Plan, with interfaces for both non-streaming and streaming speech synthesis. StepFun says it aims to provide voice solutions with human-level expressiveness for scenarios such as audiobook production, film dubbing, and intelligent interaction.

### Related Stocks

- [002439.CN](https://longbridge.com/en/quote/002439.CN.md)
