
Alibaba open-sources next-generation Qwen3.5-Plus large model, with API pricing starting at 0.8 RMB per million tokens
Alibaba Cloud, a subsidiary of Alibaba (09988.HK), announced the open-source release of its new-generation large model Qwen3.5-Plus. The model has 397 billion total parameters, of which only 17 billion are activated per token. Its performance is expected to exceed that of the trillion-parameter Qwen3-Max model, while cutting deployment memory usage by 60% and raising peak inference throughput by up to 19 times. The Qwen app and its PC version have already integrated Qwen3.5-Plus, with API pricing as low as 0.8 RMB per million tokens.
Alibaba Cloud said the gating technology developed by the Qwen team has been built into Qwen3.5's new hybrid architecture: the team combined a linear attention mechanism with a sparse mixture-of-experts (MoE) architecture to achieve extreme model efficiency, with 397 billion total parameters and only 17 billion activated.
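The core idea behind sparse MoE gating is that a small learned "router" scores all experts for each token and dispatches the token to only a few of them, so most parameters sit idle on any given forward pass. The sketch below is a minimal, generic top-k gating illustration in pure Python; the expert count, top-k value, and gate design are hypothetical examples, not the actual (unpublished) Qwen3.5 gating mechanism.

```python
import math
import random

def topk_gate(x, w_gate, top_k=2):
    """Score every expert with a linear gate, keep the top_k highest-scoring
    experts, and renormalize their softmax weights to sum to 1."""
    # One gate logit per expert: dot product of the token vector with each gate column.
    logits = [sum(xi * wi for xi, wi in zip(x, col)) for col in w_gate]
    # Indices of the top_k experts by gate score.
    top = sorted(range(len(logits)), key=lambda i: logits[i])[-top_k:]
    # Softmax over only the selected experts (numerically stabilized).
    m = max(logits[i] for i in top)
    exp_scores = [math.exp(logits[i] - m) for i in top]
    total = sum(exp_scores)
    return top, [e / total for e in exp_scores]

random.seed(0)
d_model, num_experts = 8, 16      # hypothetical sizes for illustration
x = [random.gauss(0, 1) for _ in range(d_model)]                      # one token vector
w_gate = [[random.gauss(0, 1) for _ in range(d_model)]                # gate weights,
          for _ in range(num_experts)]                                # one row per expert

experts, weights = topk_gate(x, w_gate)
# Only top_k of num_experts experts run for this token; their weights sum to 1.
print(experts, sum(weights))
```

Because only the selected experts execute, per-token compute scales with the activated parameters rather than the total, which is how a 397-billion-parameter model can run with 17-billion-parameter cost.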
At the same time, Qwen3.5 achieves performance parity with the Qwen3-Max model through techniques such as training-stability optimization and multi-token prediction, further improving inference efficiency. In the commonly used 32K-token context setting, Qwen3.5's inference throughput improves by 8.6 times; in 256K ultra-long-context settings, it improves by up to 19 times.
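A quick back-of-envelope calculation shows the scale of the sparsity behind these numbers: the fraction of parameters active per token follows directly from the figures in the announcement. This is illustrative arithmetic only; it reproduces the stated parameter counts and memory reduction, not the throughput measurements themselves.

```python
# Figures from the announcement.
total_params = 397e9    # total parameters
active_params = 17e9    # parameters activated per token
memory_reduction = 0.60 # stated 60% cut in deployment memory

# Fraction of the model that participates in each forward pass.
activation_ratio = active_params / total_params
print(f"{activation_ratio:.1%} of parameters active per token")

# Remaining deployment memory relative to a dense baseline.
memory_factor = 1.0 - memory_reduction
print(f"deployment memory is {memory_factor:.0%} of the baseline")
```

Roughly 4% of the parameters run per token, which is why per-token compute, and with it throughput, can improve so sharply even as total capacity stays near 400 billion parameters.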

