---
title: "Soochow Securities Co., Ltd.: The embodied intelligence industry faces challenges, and it is recommended to focus on companies that are building embodied intelligence datasets"
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/243224560.md"
description: "Soochow Securities Co., Ltd. released a research report pointing out that the embodied intelligence industry faces challenges such as high costs of real data collection and insufficient standardization of simulated data. It is recommended to pay attention to companies that are laying out embodied intelligence datasets, such as NJEC, Haitianruisheng, Suochen, and Huaru. High-quality datasets are crucial for the environmental perception and task execution capabilities of intelligent agents, and in the future, there will be a large mix of real data and high-quality synthetic data"
datetime: "2025-06-05T08:01:04.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/243224560.md)
  - [en](https://longbridge.com/en/news/243224560.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/243224560.md)
---

# Soochow Securities Co., Ltd.: The embodied intelligence industry faces challenges, and it is recommended to focus on companies that are building embodied intelligence datasets

According to the Zhitong Finance APP, Soochow Securities has released a research report stating that data is the key to driving rapid breakthroughs and practical applications of embodied intelligence technology. It recommends focusing on companies that are building embodied intelligence datasets, including NJEC (600699.SH), Haitianruisheng (688787.SH), Suochen (688507.SH), and Huaru (301302.SZ). The report points out that high-quality datasets can accelerate the training of intelligent agents' environmental perception and task execution capabilities, but that the industry currently faces challenges such as the high cost of real data collection and insufficient standardization of simulated data.

## The main points of Soochow Securities are as follows:

**Data is the key to driving rapid breakthroughs and practical applications of embodied intelligence technology**

As the development path of autonomous vehicles shows, data is equally crucial for embodied intelligence. High-quality datasets help intelligent agents perceive and understand their environment, accelerate the training and deployment of embodied intelligence models, and help robots complete complex tasks. Large language models can train on vast amounts of information from the internet, but the embodied intelligence models used by robots have no such readily available data. Collecting multi-source heterogeneous data such as vision, touch, force, motion trajectories, and the robot's own state takes significant time and resources, whether through real robot operation or simulation. Validated datasets that meet general standards have therefore become an essential need in the embodied intelligence industry. Embodied intelligence currently takes many forms and serves diverse application scenarios, so the demand for training data is correspondingly varied. Some industry datasets still focus on specific robots, specific scenarios, and specific skills, and their overall generality needs to improve. Constructing high-quality, diverse perception datasets is therefore an indispensable foundational task. These datasets not only provide rich material for algorithm training but also serve as benchmarks for evaluating embodied performance.

**Embodied intelligence data is mainly divided into two categories based on collection methods: real data and simulated data**

(1) Real data: Data collected in real time by intelligent agents through the sensors on their physical bodies (such as cameras, microphones, and tactile sensors) while they interact with the real physical environment. The main sources of real data are robot teleoperation (obtaining operational data in real scenarios through manual remote control) and motion capture (recording human behavior patterns in specific environments).

(2) Simulated data: Data generated in virtual environments using computer simulation technology for training embodied intelligence. Virtual scenes, objects, and intelligent agents are constructed, and the interaction between the agents and the virtual environment is simulated to produce training data.

Real data and simulated data are complementary, and in the future, training will heavily mix real data with high-quality synthetic data.
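As a rough illustration of what mixing the two categories can look like in a training pipeline, the Python sketch below draws each batch with a fixed share of real episodes. The report does not describe any specific mixing scheme or ratio; the `Episode` fields, the `mix_batch` helper, and the 25% real share are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class Episode:
    """One recorded interaction. Field names are illustrative assumptions."""
    episode_id: str
    source: str  # "real" (teleoperation, motion capture) or "sim" (virtual environment)


def mix_batch(
    real: list[Episode],
    sim: list[Episode],
    batch_size: int,
    real_fraction: float,
    rng: random.Random,
) -> list[Episode]:
    """Draw a batch in which a fixed share of episodes comes from real data.

    Sampling is with replacement, so a small real pool can still fill its share.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be between 0 and 1")
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real
    batch = rng.choices(real, k=n_real) + rng.choices(sim, k=n_sim)
    rng.shuffle(batch)  # avoid ordering the batch by source
    return batch


# Hypothetical pools: real data is scarce, simulated data is plentiful.
real_pool = [Episode(f"real-{i}", "real") for i in range(5)]
sim_pool = [Episode(f"sim-{i}", "sim") for i in range(500)]

batch = mix_batch(
    real_pool, sim_pool, batch_size=32, real_fraction=0.25, rng=random.Random(0)
)
```

Fixing the share per batch, rather than pooling everything and sampling uniformly, keeps the scarce real episodes from being drowned out by the much larger simulated pool.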

**Currently, most embodied intelligence data is self-collected by manufacturers, and there are abundant open-source datasets**

Currently, high-quality data for humanoid robots is usually collected in the real world, mainly as direct contact data (real machine data) or indirect contact data (manually controlled data). The ideal collection method is to let humanoid robots interact directly with the physical world, enabling them to accurately understand the real environment. However, collecting real machine data at scale is costly, requiring significant investment of human, material, and time resources, and both data labeling and collection equipment present barriers to entry. There are already abundant high-quality open-source embodied intelligence datasets, such as those released by Zhiyuan, Google, and the National and Local Collaborative Center, which cover a wide range of demonstration quantities, scene tasks, and action skills.

**Robot simulation data mainly relies on virtual scenes, and the scene synthesis scheme can be broken down into two key parts: Scene Generation (Gen) and Simulation (Sim)**

The scene generation engine (Gen) mainly follows two technical paths:

(1) Synthetic video + 3D reconstruction: Driven by pixel flow, this path first generates video or images, then reconstructs them into unstructured 3D data such as point clouds or meshes, and finally converts them into structured semantic models. Examples include Hillbot, Qunkex Technology, and World Labs (Fei-Fei Li).

(2) AIGC directly synthesizing 3D data: This path uses methods such as Graph Neural Networks (GNN), diffusion models, and attention mechanisms to directly synthesize structured spatial data. Representative models include ATISS, LEGO-Net, DiffuScene, and RoomFormer, with some schemes combining procedural generation techniques, such as Infinigen (CVPR 2024).

**Risk Warning:** Relevant policies may not meet expectations, IT budgets for various types of enterprises may fall short of expectations, and market competition may intensify.

### Related Stocks

- [600699.CN](https://longbridge.com/en/quote/600699.CN.md)
- [301302.CN](https://longbridge.com/en/quote/301302.CN.md)
- [688787.CN](https://longbridge.com/en/quote/688787.CN.md)
- [688507.CN](https://longbridge.com/en/quote/688507.CN.md)
- [601555.CN](https://longbridge.com/en/quote/601555.CN.md)
- [GOOGL.US](https://longbridge.com/en/quote/GOOGL.US.md)
- [GOOG.US](https://longbridge.com/en/quote/GOOG.US.md)
