SenseTime releases NEO, expected to be the industry's first native multimodal architecture to achieve deep integration

AASTOCKS
2025.12.03 03:42

SenseTime Technology (00020.HK) announced the official release and open-sourcing of "NEO," a new multimodal model architecture developed in collaboration with S-Lab at Nanyang Technological University. NEO is expected to be the industry's first available native multimodal architecture (Native VLM) to achieve deep integration. Designed for multimodality from first principles, NEO achieves deep integration at the core-architecture level, delivering breakthroughs in performance, efficiency, and versatility. This lays a new architectural foundation for the SenseNova multimodal models and marks the entry of multimodal AI technology into a new era of "native architecture."

SenseTime stated that the NEO architecture is centered on extreme efficiency and deep integration. Through fundamental innovations along three key dimensions (attention mechanisms, positional encoding, and semantic mapping), the model natively unifies the processing of visual and language data. In addition, with its innovative Pre-Buffer & Post-LLM dual-stage integration training strategy, NEO can absorb the full language-reasoning capabilities of the original LLM while building strong visual perception from scratch, addressing the impairment of language ability seen in traditional cross-modal training.
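The announcement does not disclose implementation details of the dual-stage strategy, but the general idea it describes (preserve a pretrained LLM's weights while a visual component is trained from scratch, then integrate both) can be illustrated with a minimal, purely hypothetical sketch. All names here ("pre-buffer", parameter groups, learning rate) are illustrative assumptions, not SenseTime's actual API:

```python
# Hypothetical toy sketch of a dual-stage integration strategy.
# Stage 1 freezes the pretrained LLM so its language weights are preserved
# while the vision "pre-buffer" learns from scratch; stage 2 trains jointly.

class ToyNativeVLM:
    def __init__(self):
        self.vision_params = {"patch_proj": 0.0}   # trained from scratch
        self.llm_params = {"attn_w": 1.0}          # pretrained, to preserve
        self.frozen = set()                        # frozen parameter groups

    def freeze(self, group_name):
        self.frozen.add(group_name)

    def unfreeze(self, group_name):
        self.frozen.discard(group_name)

    def step(self, grads, lr=0.1):
        # Apply a gradient step only to unfrozen parameter groups.
        for name, group in (("vision", self.vision_params),
                            ("llm", self.llm_params)):
            if name in self.frozen:
                continue
            for k in group:
                group[k] -= lr * grads[name][k]


model = ToyNativeVLM()
grads = {"vision": {"patch_proj": -1.0}, "llm": {"attn_w": 0.5}}

# Stage 1 ("Pre-Buffer"): build visual perception with the LLM frozen.
model.freeze("llm")
for _ in range(3):
    model.step(grads)
assert model.llm_params["attn_w"] == 1.0  # language weights untouched

# Stage 2 ("Post-LLM"): unfreeze and integrate both modalities jointly.
model.unfreeze("llm")
model.step(grads)
```

After stage 1, the LLM's weights are unchanged (its reasoning ability is "absorbed" rather than overwritten), which is the property the announcement attributes to this training scheme; the real system would of course operate on neural network tensors rather than scalar dictionaries.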

SenseTime aims to develop NEO into scalable, reusable next-generation AI infrastructure through open-source collaboration and real-world deployment, moving native multimodal technology from the laboratory into broad industrial application.