---
title: "Dialogue with Lingji Tang Wenbin: The pure \"world model\" approach is not feasible"
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/279881037.md"
description: "A data cold war regarding embodied intelligence is underway. The Hubei Humanoid Robot Innovation Center and Zhiyuan Robotics have completed the first customized humanoid robot data transaction in China. JD.com plans to establish the world's largest embodied intelligence data collection center, mobilizing over 100,000 employees. South Korea's Robotis has set up a subsidiary in Uzbekistan to build a data factory. Tang Wenbin, founder of Yuanli Lingji, emphasized the need for diversity in data collection, believing that a single world model approach is unlikely to succeed, and advocates for a combination of visual-language-action models"
datetime: "2026-03-20T03:31:21.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/279881037.md)
  - [en](https://longbridge.com/en/news/279881037.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/279881037.md)
---

# Dialogue with Lingji Tang Wenbin: The pure "world model" approach is not feasible

A "data cold war" regarding embodied intelligence is quietly unfolding.

In January of this year, the Hubei Humanoid Robot Innovation Center delivered thousands of hours of training data to Zhiyuan Robotics, completing the first customized humanoid robot data transaction in China.

On the side of industry giants, JD.com recently proclaimed its goal to build the world's largest and most comprehensive data collection center for embodied intelligence, planning to mobilize over 100,000 internal employees and up to 500,000 external personnel to launch an unprecedented "human wave tactic."

Turning our attention overseas, South Korean robotics company Robotis established a subsidiary in Uzbekistan this January, planning to build a massive "data factory" on a 110,000 square meter plot of land to collect robot behavior data.

**Hourly billing for customized transactions, mobilization of hundreds of thousands of people, and setting up factories in Central Asia—this series of initiatives reflects the heavy "data anxiety" within the entire embodied intelligence industry.**

Unlike large language models that grew up in internet corpora, embodied intelligence needs to understand the world and interact with the real world, which raises higher requirements for the authenticity and modality of data.

This is also one of the challenges currently being tackled by Tang Wenbin, founder and CEO of Yuanli Lingji.

**Looking back at his resume, Tang Wenbin is better known as the co-founder and CTO of Megvii Technology, a star unicorn from the last wave of AI.**

In just one year since its establishment, Yuanli Lingji has quietly raised over 1 billion yuan, securing investments from top institutions such as Alibaba, Nio, Junlian, and Qiming.

Currently, Yuanli Lingji has released its first embodied native large model DM0 and has reached a strategic cooperation with Huaqin Technology to achieve mass production and delivery of the data collection robot DOS-W1.

**After experiencing the baptism of the last wave of AI implementation, Tang Wenbin has developed a greater sense of reverence for the industry.**

In a recent dialogue with Wallstreetcn and All Weather Technology, Tang Wenbin shared Yuanli Lingji's data collection approach: not relying on a single source, but implementing distributed collection through a combination of "quality ✖ quantity ✖ diversity" to fill the capability space of robots.

Regarding generating data through world models to enable robots to learn by imitation, Tang Wenbin believes this path is difficult to pursue. He pointed out that a more feasible paradigm is to unify the world model with the VLA (Vision-Language-Action) model, which can not only predict the future world but also deduce the precise actions needed based on that.

As industry players frantically "stockpile" data resources in their own ways, which route will have the last laugh remains for the market to decide.

The following is a transcript of the dialogue.

## Detailed Explanation of Data Collection

**All Weather Technology: Can you share your data collection approach?**

**Tang Wenbin:** Currently, we are still using an imitation reinforcement learning approach.

Imitation means modeling the data distribution. Our goal is to fill the robot's capability space with as much data as possible, so that the robot has seen enough. The core is the ability to handle unseen scenarios, and that is where the value of data lies, so our data collection focuses on open environments and real scenarios. But we hope to fill this space as much as possible while maintaining high data quality, so I think data is a combined problem of "quality ✖ quantity ✖ diversity."

**All Weather Technology: How is data collected?**

**Tang Wenbin:** In fact, we do not rely on a single data source, and there is no need to do so; it is basically a combination model. For real machine data, we mainly collect it through various calibrated sensors, including things like exoskeletons, but the collection cost is indeed relatively high.

At the same time, we also collect data through non-embodied and first-person perspectives to form a larger-scale dataset, which is actually a middle ground between real machine and synthetic data.

In addition, there is also internet data that has a lower collection cost.

**All Weather Technology: Can you explain non-embodied collection specifically?**

**Tang Wenbin:** Non-embodied means that it may be a glove or a handheld claw, without a mechanical arm or robotic body, so it is equivalent to just using an end effector. I record the approximate position and state of this end effector, and this data collection method is currently also known as UMI.

Today, we also discuss a lot of first-person perspective data, such as capturing the operation process through glasses, which is also a form of non-embodied collection.

**All Weather Technology: Data from a person's AI glasses is private, and no one would want to hand it over for collection. How do you solve this problem?**

**Tang Wenbin:** Indeed, if I were a user of the glasses, I wouldn't want to share my data with everyone either. However, for training, we can ask some third-party data collectors to record workflows through daily wear of the glasses, and then the data will also be recorded.

Of course, we also hope that the glasses themselves can have more powerful functions, such as having stereoscopic vision and multi-purpose capabilities. In the future, we may also add devices like wristbands and gloves for data collection.

So overall, the objects we collect data from are diverse. **The first category is the robot itself, which can be remotely operated; the second category is non-embodied devices like claws, which are "human body + robotic end" devices; the third category is completely focused on human data collection; and the fourth category is descriptions of the physical world.**

**All Weather Technology: For example, with end-effector sensors, is force the main data collected?**

**Tang Wenbin:** Not just force; we also hope the data is multimodal, for example by adding more viewpoints.

In practice, because the arm may occlude part of the view, we can mount a camera at the eye position, and there may also be two cameras on each wrist, forming multi-perspective data.

**All Weather Technology: Will this collection cost be very high?**

**Tang Wenbin:** This is actually a complex issue of data quality, quantity, and diversity. If we need to collect data from all modules, the cost will become very high. Therefore, we adopt a distributed collection strategy: for some data, we try to ensure completeness, while for other data, in order to reduce costs, increase quantity, and improve speed, we may not focus so much on completeness. This is a matter of trade-offs. We have our own collection tools and collaborate extensively with other industries.

**All Weather Technology: In February this year, you collaborated with Huaqin Technology to launch a data collection robot. Can you share some details about this robot?**

**Tang Wenbin:** This robot is mainly used in research scenarios and is somewhat similar to the ALOHA robot, which peers are also developing. (Note: ALOHA stands for "A Low-cost Open-source Hardware System for Bimanual Teleoperation.")

However, there are two major pain points with current market data collection robots.

On one hand, there is reliability; the product performance is indeed unsatisfactory. For example, frequent failures can negatively impact research work and reduce work efficiency.

Currently, we cannot ensure long-term stability of the product, so our improvement is to simplify the repair process with a modular, detachable product structure. Once a component is damaged, users can quickly replace it. For instance, many connection points use knobs rather than screws, so a fix might take just 30 seconds.

On the other hand, costs are still relatively high, so we designed a product similar to ALOHA through our collaboration with Huaqin, supporting master-slave and drag-and-drop operations. The core aspect is that it can be repaired quickly and is cost-effective. (Note: Master-slave refers to a person controlling the master arm to achieve real-time remote control of the slave arm, with zero latency replication of actions, thus enabling low-cost, high-precision data collection for dual-arm fine operations.)

**All Weather Technology: Have peers purchased this robot to collect data?**

**Tang Wenbin:** Yes, actually the pain points in the industry are quite consistent, so everyone tends to buy products from peers to use in combination.

## The World Model Route is Not Feasible

**All Weather Technology: Can you talk about your views on world models and VLA?**

**Tang Wenbin:** Here, we need to distinguish two points: understanding the world and generating the world are different.

The large model capabilities we are discussing today are generally focused on their ability to understand the world. The world model is actually trying to predict the future, that is, predicting what the next frame might look like, while VLA essentially interacts with the world.

These models have commonalities but can solve problems from different angles.

We believe the best strategy is to combine them. Only in this way can we truly understand and generate content, as well as understand and interact with the world.

Theoretically, if we can predict the future world, we can infer how we should operate. If we know how to operate, it means we can predict future developments.

Thus, in our current technical framework, the world model and VLA are unified; we hope for a model that can both understand this world and predict what comes next.

In this way, the model can not only execute actions but also predict how the world will change after executing those actions.

**All Weather Technology: Is the technical framework of the industry different from yours?**

**Tang Wenbin:** Indeed, currently some companies advocate using only world models. There is a viewpoint that generating data through world models allows robots to learn by imitation, thus creating an infinite data source.

**However, I personally believe this path is not feasible, because if the world model has already been realized, then the problem of generation has already been solved, and there is no need for everyone to train robots with generated data.**

**Another path is what we and many peers are doing, which is to use the model to predict the future world and then deduce the required actions from that prediction. This method involves first predicting future scenarios or world states, and then calculating the corresponding action sequences. This paradigm is the combined, unified model framework I just mentioned.**
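The "predict first, then deduce the action" loop described above can be illustrated with a toy sketch. This is not Yuanli Lingji's architecture: the one-dimensional state, the `MAX_STEP` limit, and the functions `predict_next_state`, `infer_action`, and `rollout` are all hypothetical stand-ins, with simple arithmetic in place of learned models.

```python
from typing import List

MAX_STEP = 1.0  # hypothetical actuator limit per time step


def predict_next_state(state: float, goal: float) -> float:
    """Stand-in for a learned world model: imagine the next state on the way to the goal."""
    return state + 0.5 * (goal - state)


def infer_action(state: float, predicted_state: float) -> float:
    """Stand-in for action deduction: the displacement that best reaches the predicted state."""
    delta = predicted_state - state
    return max(-MAX_STEP, min(MAX_STEP, delta))  # respect the actuator limit


def rollout(state: float, goal: float, steps: int) -> List[float]:
    """Alternate prediction and action deduction in a trivial environment."""
    trajectory = [state]
    for _ in range(steps):
        target = predict_next_state(state, goal)
        action = infer_action(state, target)
        state = state + action  # toy dynamics: the action is a displacement
        trajectory.append(state)
    return trajectory


print(rollout(state=0.0, goal=4.0, steps=4))  # [0.0, 1.0, 2.0, 3.0, 3.5]
```

In this sketch the predicted state can be out of reach in a single step, so the deduced action only approximates it; a real system would have to handle the same gap between what the model imagines and what the hardware can execute.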

**All Weather Technology: From a scenario perspective, given the current high level of automation in factory production lines, will robots have no place in factories?**

**Tang Wenbin:** Indeed, the current automation solutions in factories are quite mature. But what we want to solve are problems that could not be solved before, or problems that were very costly to solve.

However, many automated production lines that people see do not have such high requirements for generalization, meaning they do not require generalization of objects, environments, and tasks. For example, there may only be a few SKUs, and external environmental conditions like lighting have already been adjusted.

The problems that cannot be solved currently are actually the diversification of objects, the ever-changing environments, and possibly many different tasks.

Taking the logistics scenario as an example, robots today mainly do transport work, but they are not yet good at the operations people perform by hand, because these require strong generalization.

For instance, if you purchase a bottle of cola and a bag of chips, the operator will package the cola and chips separately. Due to the wide variety of products and constantly changing environmental conditions, this is actually very difficult to solve with automated equipment.

There is also the packaging scenario; for example, with bottled shower gel, when we receive the product, we find that the bottle neck is wrapped with a plastic film to prevent leakage.

In actual operations, it is usually the case that operators, based on experience, wrap the cling film and then place it in a foam bag, and label the seal, which cannot be completed by automated equipment.

We are currently mainly making some attempts in logistics and industry.

**All Weather Technology: Are you inclined to focus on concentrated development in specific scenarios, or do you want to expand simultaneously across multiple scenarios?**

**Tang Wenbin:** This needs to be discussed from two ends. Observing the development of large models, especially the latest progress, we can see a common trend. If we only build a model for one vertical field now, we will not get a truly generalizable model, so that route is not feasible.

Therefore, **from the perspective of the model, we must firmly pursue generalization and seek more universal technical capabilities.**

**But from the perspective of applying scenarios on the ground, we indeed need to implement one scenario at a time.** So we often emphasize two core points for product implementation internally. First, our solutions must be able to form a closed loop, meaning they should address all issues and anomalies in the client's business and meet all process requirements. Second, we need to ensure that costs are controllable, making clients feel that the cooperation is worthwhile.

Only when these two prerequisites are met will clients consider scaling up the application of our products.

Therefore, for every scenario we implement, we must clearly understand the client value and ensure that both points can be realized. This is a process of placing orders year by year.

Internally, we describe this process as the relationship between model development and application implementation, which exists at a 45-degree angle, meaning they are related but not absolutely correlated.

Of course, our model needs to develop towards that universal direction.

## Have Respect for Scenarios

**All Weather Technology: So you advocate for a universal robot approach?**

**Tang Wenbin: Personally, I believe the model has universality, but hardware is difficult to achieve.**

In fact, our hands are very flexible; one can perform fine operations while also lifting 20 pounds, and even more impressively, can lift 50 pounds.

However, due to the limitations of physics and materials science, a robotic arm that can lift 2 kilograms is definitely different from one that can lift 20 kilograms, as their power densities are different.

Therefore, we believe that if you adopt a universal design and apply it to specific scenarios, it is easy to find that it is either under-designed or over-designed.

Under-design means that payload requirements may not be met, or the installation space for sensors is too narrow, leaving problems unresolved. Alternatively, the design may just barely work but be over-designed, making the price very high.

Take a wheeled dual-arm robot as an example: with a high center of gravity, once it picks up speed it is hard to stop without tipping over.

At this point, we may find that in certain scenarios, remaining stationary might be a better choice, allowing moving vehicles to deliver items.

Thus, there may be issues of over-design in these situations.

Our internal logic is to make the model universal and adaptable to different hardware platforms.

**All Weather Technology: So now investors are more focused on your capabilities in modeling?**

**Tang Wenbin: Yes, our team's uniqueness lies in not only engaging in the research and development of robotic scenarios but also deeply understanding the model. We have accumulated rich experience in the logistics field at Megvii and have a certain scale, so we have a deep understanding of the product, and we have a group of professionals focused on model optimization.**

**All Weather Technology: Because many companies within a specific industry may have a better understanding of the demands of that industry, but you started with modeling; will your understanding of scenario demands be relatively weak?**

**Tang Wenbin:** Actually, we worked on a lot of scenarios when we were at Megvii, so I think we have been well schooled by them.

This is actually a mindset issue; the robotics industry needs two groups of people: one group understands technology better, and the other understands scenarios better. We are actually standing in the middle. In fact, those who focus solely on technology tend to make many assumptions about scenarios, thinking it's just these things. But the devil in real scenarios is hidden in the details. For example, when problems arise, the production process cannot stop, so there must be a complete exception handling process.

Therefore, those working in technology must have a sense of reverence for the scenarios.

However, there are also many issues within the industry. Historically, many colleagues have swung between two states regarding technology: initially they believe technology can do anything, and once AI is involved, they expect it to solve all problems. However, when they find that certain problems cannot be solved, they become extremely disappointed and choose to revert to traditional, rule-based methods.

But today, the development of models is neither omnipotent nor completely ineffective; it is in an intermediate stage with a high slope, in a state of rapid development.

Thus, we need people who can both judge scenarios and understand algorithms and their development speed. At the same time, we need someone to design how to tackle current problems so that projects can be launched quickly.

All the work we are engaged in today is essentially about meeting demands. We will certainly have limitations in our vision.

Therefore, I advocate for broad learning and multi-angle observation, but we should also have our own standards for judgment, choosing those scenarios that can survive sustainably.

**All Weather Technology: How do you position your target customer group? Is it robot companies or scenario application parties?**

**Tang Wenbin:** Actually, it’s still the scenario application parties.

To be frank, whether domestically or internationally, the models used by peers are not very mature. Therefore, today, no one has reached a state where models can be directly deployed to robot company equipment and used after simple training.

I believe that in the case of immature models, vertical integration is necessary to achieve scenario landing applications.

If we cannot handle this scenario ourselves but expect partners and customers to solve it, it is undoubtedly a beautiful fantasy. I believe that one day, we may create some vertical scenarios ourselves, and more scenarios can be completed through an open platform in collaboration with our partners. They can use our hardware or just our brains to explore more possibilities independently.

**All Weather Technology: So is this why you open-sourced your model, hoping more people can join in?**

**Tang Wenbin:** There were two considerations behind open-sourcing. First, we hope more people use our framework and models so that everyone can explore more application scenarios together and promote the landing of technology. Second, although the industry is very hot right now, the overall maturity of models is still at an early stage, and promoting mutual communication and progress is crucial.

**All Weather Technology: You previously mentioned that the core goal for 2026 is to deploy 1,000 sustainable operating devices for each scenario. Can you share the progress on this goal?**

**Tang Wenbin:** Continuous operation may not be achieved until the second half of the year. Currently, we are still conducting POC testing. We are quite confident in the potential for mass implementation in our own scenarios.

In fact, to ensure that robots can operate continuously, we must find fault-tolerant mechanisms. Frankly speaking, the current model-driven approach cannot achieve 100% accuracy.

What if the task fails? This question must have an answer. We need to explore ways to take over tasks so that failed tasks can be recovered. At the same time, we need to assess the impact of such failures on the business and determine whether this impact is acceptable.

After implementing a fallback plan, we also need to confirm the ROI of the entire system.

**All Weather Technology: Speaking of ROI, will customers directly ask how much money you can help save on the production line?**

**Tang Wenbin:** Customers usually ask us how long it will take to break even.

**If a project takes more than five years to break even, then it's not worth doing.**

**If it is expected to break even within two to three years, then we should proceed immediately.** In the current B2B environment, most of our decisions are based on rational analysis, calculating how much efficiency we can actually improve for the customer. For example, robots can extend the operating time of certain production processes, utilize existing equipment more efficiently, and bring value to customers.
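The break-even rule of thumb above comes down to simple arithmetic. The following is a minimal sketch, assuming an undiscounted payback period; the function names `payback_years` and `decide` and all figures are hypothetical, and the interview does not say how projects that fall between three and five years are treated.

```python
def payback_years(upfront_cost: float, annual_net_savings: float) -> float:
    """Simple (undiscounted) payback period in years."""
    if annual_net_savings <= 0:
        return float("inf")  # the project never pays for itself
    return upfront_cost / annual_net_savings


def decide(years: float) -> str:
    """Map a payback period onto the thresholds quoted in the interview."""
    if years > 5:
        return "not worth doing"
    if years <= 3:
        return "proceed"
    return "judgment call"  # assumption: the interview gives no rule for 3 to 5 years


# Hypothetical figures, for illustration only.
years = payback_years(upfront_cost=600_000, annual_net_savings=250_000)
print(f"{years:.1f} years -> {decide(years)}")  # 2.4 years -> proceed
```

A fuller analysis would also account for the cost of the fallback plan and for discounting, neither of which is modeled here.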

**All Weather Technology: Can you give a sneak peek into the upcoming model updates?**

**Tang Wenbin:** This year, our core topic will focus on generalization.

**All Weather Technology: You just started a company last year to work on embodied intelligence models. Do you think it's too late?**

**Tang Wenbin:** In fact, we wanted to create a general-purpose robot many years ago, but we felt that the technology was not mature at that time. However, with the development of large models like DeepSeek, I have indeed become more confident about this matter.

**All Weather Technology: If you had to give one keyword for the embodied intelligence industry in 2026, what would it be?**

**Tang Wenbin:** I would like to give two keywords: one is the enhancement of model capabilities, and the other is the continuous operation of scenarios.

I believe the current models are still in the early stages, but they are developing rapidly, so we need to work hard to improve the algorithmic capabilities of the models, including enhancements in object and environmental adaptability and task generalization. The generalization ability of the model is crucial. Secondly, regarding the application of scenarios, I believe that a simple POC is not very meaningful; it is just a starting point. The focus should be on how to operate continuously in real scenarios, and this year is indeed the right time.
