
How did SenseTime achieve "escape velocity" in large model deployment?


When SenseTime's Chairman and CEO Xu Li unveiled the newly upgraded "SenseNova 5.0" large model system against a backdrop of distinctly Chinese aesthetics, it signaled that SenseTime had become the first company to achieve a full-stack layout spanning cloud, edge, and device. On that same backdrop, the words "AI Large Model Era II" were prominently displayed.
This naturally raises the question: How did SenseTime achieve the "escape velocity" for large model deployment, surpassing the performance of GPT-4 Turbo?
If we take a comprehensive look at SenseNova 5.0 and the powerful computing infrastructure behind SenseTime, this achievement should come as no surprise.
As Xu Li stated, "Guided by scaling laws, SenseTime will continue to explore the KRE three-layer architecture (Knowledge-Reasoning-Execution) of large model capabilities, constantly pushing the boundaries of what large models can achieve." This principle may hold the key to understanding the internal logic behind SenseTime's "escape velocity" in large model deployment.
SenseNova 5.0 Outperforms GPT-4 Turbo Across the Board
Since its official launch in April last year, the SenseNova large model system has undergone five major version upgrades. The latest iteration is trained on over 10TB of tokens, including a large proportion of synthetic data, and employs a mixture-of-experts architecture. During inference, its context window can effectively reach around 200K tokens, with significant enhancements in knowledge, mathematics, reasoning, and coding capabilities. It benchmarks fully against GPT-4 Turbo, matching or surpassing it in mainstream objective evaluations.
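For readers unfamiliar with the term, "mixture-of-experts" means that for each input token a router activates only a few of many expert sub-networks, so most parameters stay idle per token. The toy sketch below illustrates only the general idea (top-K routing over tiny diagonal "experts"); it is not SenseTime's actual architecture, and all names and sizes here are made up for illustration.

```python
import math
import random

random.seed(0)

D, E, K = 4, 8, 2  # hidden size, number of experts, experts activated per token

# Toy "experts": each is just a D-vector applied elementwise (a stand-in for a
# real feed-forward block). The router also has one weight vector per expert.
experts = [[random.gauss(0, 1) for _ in range(D)] for _ in range(E)]
router = [[random.gauss(0, 1) for _ in range(D)] for _ in range(E)]

def moe_layer(x):
    # Router produces one logit per expert for this input.
    logits = [sum(w * xi for w, xi in zip(router[e], x)) for e in range(E)]
    # Sparse activation: keep only the top-K scoring experts.
    topk = sorted(range(E), key=lambda e: logits[e], reverse=True)[:K]
    # Softmax over the selected logits to weight the chosen experts.
    m = max(logits[e] for e in topk)
    w = {e: math.exp(logits[e] - m) for e in topk}
    z = sum(w.values())
    # Output: weighted sum of the K activated experts' outputs; the other
    # E - K experts contribute nothing, which is the efficiency win of MoE.
    return [sum(w[e] / z * experts[e][i] * x[i] for e in topk) for i in range(D)]

y = moe_layer([0.5, -1.0, 2.0, 0.1])
print(len(y))  # D-dimensional output computed by only K of the E experts
```

The appeal for very large models is that total parameter count can grow with the number of experts while per-token compute stays roughly fixed at K experts' worth.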
Thanks to these updates, SenseNova 5.0 has achieved qualitative improvements in its "liberal arts capabilities," "STEM capabilities," and multimodal abilities.
For example, when answering a fun reasoning question—"Mom made Yuan Yuan a cup of coffee. Yuan Yuan drank half, then filled it with water. She drank half again, refilled it with water, and finally finished it all. Did Yuan Yuan drink more coffee or water?"—SenseNova 5.0 answered correctly, while GPT-4 got it wrong.
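For the record, the riddle's answer can be verified with a few lines of arithmetic. Tracking how much coffee and water Yuan Yuan actually drinks shows the two amounts come out exactly equal, one cup of each:

```python
# Track the cup's contents and the running totals actually drunk.
coffee_in_cup, water_in_cup = 1.0, 0.0  # start: a full cup of coffee
coffee_drunk = water_drunk = 0.0

for _ in range(2):  # twice: drink half the cup, then top it up with water
    coffee_drunk += coffee_in_cup / 2
    water_drunk += water_in_cup / 2
    coffee_in_cup /= 2
    water_in_cup = water_in_cup / 2 + 0.5  # refill to a full cup with water

# Finally she finishes whatever is left.
coffee_drunk += coffee_in_cup
water_drunk += water_in_cup

print(coffee_drunk, water_drunk)  # prints 1.0 1.0: equal amounts
```

The shortcut: she started with one cup of coffee and added two half-cups of water, and since she eventually drank everything, she drank exactly one cup of each.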
These enhanced capabilities allow SenseNova 5.0 to better summarize and answer questions in Chinese contexts, supporting applications in education, content creation, and other industries.
Meanwhile, the significant improvements in SenseNova 5.0's mathematical, coding, and reasoning abilities provide robust support for applications in finance, data analysis, and similar fields.
Beyond "liberal arts" and "STEM" capabilities, SenseNova 5.0 also excels in multimodal performance. It supports high-definition long-image parsing and understanding, interactive text-to-image generation, complex cross-document knowledge extraction, summarization, Q&A presentation, and rich multimodal interaction capabilities.
SenseTime's multimodal large model leads globally in image-text perception, ranking first in the comprehensive benchmark MMBench and achieving top scores in renowned multimodal evaluations like MathVista, AI2D, ChartQA, TextVQA, DocVQA, and MMMU.
Clearly, SenseNova 5.0's outstanding performance in "liberal arts," "STEM," and multimodal capabilities lays a solid foundation for advancing large model applications. Not only does it match or surpass GPT-4 Turbo in subjective evaluations, but it also empowers more local enterprises in China to embrace the opportunities of the large model era.
Thus, if we seek the internal logic behind SenseNova 5.0's "escape velocity," its well-rounded development and exceptional multimodal performance are undoubtedly key factors worth noting.
Full-Stack Cloud-Edge-Device Layout: SenseTime Builds a Large Model Product Matrix
As the AI era unfolds, centralized computing demand is spilling over to edge devices and enterprise-level edge AI needs keep growing; only efficient cloud-edge-device collaboration can truly bring large models into deployment.
Recognizing this, SenseTime has pioneered a full-stack large model product matrix covering "cloud, edge, and device," including the "SenseTime On-Device Large Model" for terminal devices and the "SenseTime Enterprise Large Model Appliance" for sectors like finance, coding, healthcare, and government.
Reportedly, SenseNova's on-device large language model achieves the industry's fastest inference speed, averaging 18.3 tokens per second on mid-range platforms and reaching 78.3 tokens per second on flagship platforms.
Its diffusion model also delivers the fastest on-device inference speed. The on-device LDM-AI image expansion technology completes inference in under 1.5 seconds on mainstream platforms—10x faster than competitors' cloud apps—while supporting 12-megapixel+ high-definition output and rapid image editing features like proportional expansion, free expansion, and rotation expansion.
Notably, to meet the growing edge AI demands in finance, coding, healthcare, and government, SenseTime has launched an enterprise large model appliance supporting accelerated inference for enterprise-scale trillion-parameter models and hardware-accelerated knowledge retrieval. It enables plug-and-play localized deployment, lowering the barrier to enterprise adoption. Compared to industry peers, it reduces inference costs by 80%, significantly speeds up retrieval, and cuts CPU workload by 50%.
Thanks to SenseTime's full-stack cloud-edge-device layout, AI large models can now reach more enterprises, maximizing their ability to meet diverse needs.
For instance:
In office productivity, SenseTime's "SenseNova" powers WPS 365 with superior code generation and tool invocation, creating a next-gen productivity platform that unlocks scenario-specific capabilities and builds customized "enterprise brains."
In finance, Haitong Securities and SenseTime jointly released a multimodal, full-stack financial industry large model, driving applications in smart customer service, compliance, risk control, coding assistance, and business office assistants, while co-developing cutting-edge scenarios like robo-advisors and sentiment monitoring—fully enabling securities industry large model deployment.
In mobility, Xiaomi's Xiao Ai leverages SenseTime's cloud-device large model solution to deliver intelligent interaction for car owners.
As SenseNova 5.0's full-stack cloud-edge-device deployment deepens, we can expect more enterprises to rapidly adopt AI applications with SenseTime's support, embracing the dividends of the AI era.
Computing Power Backing: SenseTime Finds Its Path Under "Scaling Laws"
Whether it's SenseNova 5.0's comprehensive upgrade or SenseTime's full-stack cloud-edge-device layout, none would be possible without the support of SenseTime's computing infrastructure.
As Xu Li noted, SenseTime continuously seeks optimal data ratios and establishes data quality evaluation systems, advancing its own large model R&D while providing partners with training, fine-tuning, deployment, and generative AI services.
At the end of the tech exchange event, Xu Li showcased three fully AI-generated videos, emphasizing the controllability of characters, actions, and scenes in text-to-video platforms.
SenseTime has made breakthroughs in text-to-video technology. Soon, inputting text or a detailed description will generate videos where clothing, hairstyles, and scenes remain consistent per preset parameters, ensuring coherence.
Clearly, SenseTime's text-to-video is already on the horizon.
It's fair to say SenseTime has found its path under "scaling laws."
This new trajectory enables continuous upgrades to SenseNova 5.0, a full-stack cloud-edge-device ecosystem, and the ability to meet enterprises' evolving AI demands.
Thus, if we seek the internal drivers behind SenseTime's "escape velocity," the formidable backing of its AI computing centers is another critical factor.
Conclusion
From SenseNova 5.0's knowledge, math, reasoning, and coding capabilities—matching or surpassing GPT-4 Turbo in mainstream benchmarks—to pioneering full-stack cloud-edge-device deployment, deep partner empowerment, and embracing AGI, SenseTime has undeniably achieved "escape velocity" in large model deployment.
With SenseNova 5.0 outperforming GPT-4 Turbo while understanding Chinese consumers and enterprises more deeply, SenseTime is poised to overtake competitors as "scaling laws" become clearer, driving AI into more scenarios and achieving full synergy across algorithms, computing, data, applications, and use cases. $SENSETIME-W(00020.HK)
—End—
Author: Meng Yonghui, Senior Writer, Columnist, Industry Observer, and Influencer.
The copyright of this article belongs to the original author/organization.
The views expressed herein are solely those of the author and do not reflect the stance of the platform. The content is intended for investment reference purposes only and shall not be considered as investment advice. Please contact us if you have any questions or suggestions regarding the content services provided by the platform.

