06.09 MiMo × TileRT: To Get Rich, Build Roads First

Today, MiMo × TileRT jointly released the UltraSpeed mode for Xiaomi MiMo-V2.5-Pro. Through extreme co-design of the model and the system, the generation speed of a trillion-parameter model on general-purpose GPUs has broken through 1000 tokens/s for the first time. The news jolted me up from my sickbed. The stock price remains low, but I am still quite heartened, and it feels like I have the motivation to write again.

Benefiting the Edge: Phones, AIoT First to Feel the Change

The most likely to benefit quickly are small devices like phones and AIoT, which are constrained by hardware and cannot run large models locally.

Take smart security cameras as an example. Today, a lightweight edge-side model takes about 200ms to complete one face comparison, with limited accuracy. Calling a cloud-based large model for auxiliary analysis is more accurate, but the response usually takes 2-3 seconds, a lag that users notice. Under UltraSpeed, a typical cloud request that generates 200 tokens needs only about 0.2 seconds of generation time (200 tokens at 1000 tokens/s), so accuracy goes up a notch while latency drops by an order of magnitude.
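The 0.2-second figure is simple arithmetic, sketched below in Python. The function name and the optional network overhead parameter are illustrative assumptions, not part of any MiMo or TileRT API, and the estimate ignores prompt processing and queueing time.

```python
def generation_time_s(num_tokens: int, tokens_per_s: float,
                      network_overhead_s: float = 0.0) -> float:
    """Estimate response time as decode time plus an assumed fixed overhead."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return num_tokens / tokens_per_s + network_overhead_s


# 200 generated tokens at the 1000 tokens/s quoted in the announcement.
ultraspeed_s = generation_time_s(200, 1000.0)
print(f"UltraSpeed estimate: {ultraspeed_s:.2f} s")

# Compare with the 2-3 second cloud response mentioned above.
for baseline_s in (2.0, 3.0):
    print(f"{baseline_s:.0f} s baseline is about {baseline_s / ultraspeed_s:.0f}x slower")
```

Passing a non-zero network_overhead_s shows how quickly real-world round trips eat into the headline number.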

There are many similar scenarios: face verification for smart door locks, multi-turn conversations for smart speakers, long-sentence processing for translators. When the response speed of cloud-based large models approaches that of edge-side inference, a key economic tipping point is crossed: the layered architecture of "edge-side small models handle high-frequency simple tasks + cloud-based large models handle complex tasks in real time" changes from an optional solution to the optimal one.
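A minimal sketch of that layered architecture, in Python, might look like the following. Everything here is hypothetical: the Request fields, the route function, and the 0.5-second latency budget are assumptions chosen for illustration, not a description of how Xiaomi devices actually dispatch requests.

```python
from dataclasses import dataclass


@dataclass
class Request:
    description: str
    is_complex: bool         # hypothetical flag, e.g. set by an on-device classifier
    latency_budget_s: float  # assumed maximum wait the scenario tolerates


def route(req: Request, cloud_latency_s: float, online: bool = True) -> str:
    """Send a request to the cloud only if it is complex, the device is
    online, and the expected cloud latency fits the budget; otherwise
    keep it on the edge-side small model."""
    if req.is_complex and online and cloud_latency_s <= req.latency_budget_s:
        return "cloud"
    return "edge"


door_lock = Request("ambiguous face match", is_complex=True, latency_budget_s=0.5)
print(route(door_lock, cloud_latency_s=2.5))  # cloud too slow for the budget
print(route(door_lock, cloud_latency_s=0.2))  # cloud now fits the budget
```

With a 2-3 second cloud response the complex request still has to fall back to the edge model; at roughly 0.2 seconds the same rule starts sending it to the cloud, which is the shift the paragraph describes.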

This is especially significant for Xiaomi's "People-Car-Home" ecosystem: these scenarios currently largely remain at the stage of edge-side rule orchestration, with a low level of intelligence. When every terminal device can call cloud intelligence at a speed close to local, the overall intelligence level of the entire ecosystem will be elevated a notch.

Benefiting Autonomous Driving: Mainly Affects Production Side for Now

For real-time driving on the vehicle side, it cannot be used directly yet. Autonomous driving requires latency at the 80-100ms level and must work stably without network connectivity, so UltraSpeed's 0.2-second response is still not enough and on-vehicle inference remains irreplaceable. Tesla runs large models directly on the vehicle with its self-developed AI5 chip, a hardware advantage that currently has no shortcut around it.
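To see how tight that budget is, the sketch below inverts the earlier arithmetic and asks how many tokens fit into a given latency window. The function name and the 30ms network round trip are assumptions for illustration only; integer milliseconds are used to avoid floating-point rounding.

```python
def max_tokens_within(budget_ms: int, tokens_per_s: int, network_rtt_ms: int = 0) -> int:
    """Return how many tokens can be generated inside the latency budget
    after subtracting an assumed network round trip."""
    remaining_ms = budget_ms - network_rtt_ms
    if remaining_ms <= 0:
        return 0
    return remaining_ms * tokens_per_s // 1000


for budget_ms in (80, 100):
    ideal = max_tokens_within(budget_ms, 1000)
    with_rtt = max_tokens_within(budget_ms, 1000, network_rtt_ms=30)  # assumed RTT
    print(f"{budget_ms} ms budget: {ideal} tokens with no network, {with_rtt} with a 30 ms round trip")
```

Even at 1000 tokens/s and with a perfect network, an 80-100ms window leaves room for well under the 200 tokens of the typical request discussed earlier, and it does nothing for the requirement to work offline.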

For the production side, however, the significance is considerable. Xiaomi differs from Tesla in that it uses the Thor chip and relies more on software optimization and on distillation from cloud-based large models. UltraSpeed has two relevant characteristics: it increases distillation efficiency by about 10 times, and it runs on general-purpose GPUs, which makes fast cloud iteration highly cost-effective. At a stage when hardware procurement and chip competition are the constraints, it is also possible to advance autonomous driving levels quickly by leading with software efficiency.

Future Trend: To Get Rich, Build Roads First

MiMo's development speed this year is indeed astonishing: the API launched at the beginning of the year, version 2.5 arrived in April, a significant price cut followed in May, and today it announced the trial of the UltraSpeed 1000 tokens/s mode, which precisely meets the hardware ecosystem's essential demand for AI.

Looking at the trend, having both self-developed large models and self-developed chips is the entry ticket for future AI products. MiMo's optimization is already quite outstanding. If chips can be customized more closely in collaboration with the model, supplemented by OS support for bandwidth and scheduling, the room for future improvement is huge.

AI × Chip × OS is the underlying infrastructure that carries all upper-layer applications and experiences. Many things seem impossible by today's common sense, but once inference is cheap enough and fast enough to cross the tipping point, common sense will be rewritten, and product strength and competitiveness will change accordingly.

Xiaomi continues to invest heavily in three directions: self-developed large models, self-developed chips, and OS optimization. Combined with the recent increase in holdings of Kingsoft Cloud to further strengthen cloud infrastructure, it reminds me of the old saying: To get rich, build roads first.

$XIAOMI-W(01810.HK)
