---
title: "Top funds in Silicon Valley collectively bet! Morgan Stanley provides a detailed analysis of the next frontier of AI - \"World Models\""
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/280103970.md"
description: "Morgan Stanley's latest report indicates that the language dividend of large models is reaching its peak, and the next battlefield in the AI arms race is the \"world model\"—enabling machines to truly understand three-dimensional space, physical laws, and the evolution of time. From Waymo's billions of miles of virtual road testing to Microsoft's AI rendering playable versions of \"Quake II,\" applications extend beyond robotics, with the gaming, film, and design industries all facing transformation"
datetime: "2026-03-23T06:17:01.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/280103970.md)
  - [en](https://longbridge.com/en/news/280103970.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/280103970.md)
---



# Top funds in Silicon Valley collectively bet! Morgan Stanley provides a detailed analysis of the next frontier of AI - "World Models"

Large language models have carried "language" about as far as it can go, and their limits are increasingly clear: they excel at writing, searching, editing, and programming, but once a question involves three-dimensional space, temporal evolution, or physical constraints, the existing paradigm begins to struggle. Morgan Stanley is betting that the next phase of growth lies in "world models": teaching AI to understand, simulate, and make decisions in environments, with applications not only in robotics and autonomous driving but also in digital content industries such as gaming, design, and film production.

According to the Wind Trading Desk, Adam Jonas, an equity analyst on Morgan Stanley's North America team, put it bluntly in a recent report: "**AI is moving beyond language toward models that understand, simulate and navigate the physical world.**" The subtext: the next round of competition is not about whose chatbot sounds more human, but about who can compress the laws of the real world into a usable internal representation and then turn it into an interactive "imagination engine."

**The evidence in the report rests not on visionary narratives but on engineering work that has already happened:** Waymo has conducted "billions of miles" of virtual road testing using a world model based on DeepMind Genie 3; Microsoft created a "fully AI-rendered, playable" version of the 1997 game Quake II using Muse; Roblox has publicly shared its research direction of generating immersive environments with self-developed world models and iterating on games using natural language. Large incumbents are already in the race (DeepMind, Meta, Microsoft, Tesla, NVIDIA), and new companies are competing for talent and funding.

Even more noteworthy, Morgan Stanley singles out two emerging companies in the report: Fei-Fei Li's World Labs leans towards "generating navigable 3D worlds," while Yann LeCun's AMI Labs focuses on "learning efficient latent space representations for prediction and reasoning." Behind these two routes lies the same question: in what form should AI "understand the world," and when can this understanding move from demo to productivity?

## **From Language to Physics: World Models Target the Hard Shortcomings of LLMs**

The report describes the "physical world" as a harder battlefield: it is governed by the laws of matter, thermodynamics, fluids, lighting, and so on, and it unfolds in a constantly changing three-dimensional space. LLMs are trained primarily on text and its variants and excel at white-collar tasks (coding, searching, writing). But for questions like "What will happen in the next second?" or "What consequences will my action cause?", what is missing is not more corpus but the ability to maintain a consistent representation of the environment and reason over it across long horizons.

Therefore, world models are defined as a type of "internally usable environmental representation": they must not only reproduce what is seen but also be able to roll the state forward and provide different future branches when "action conditions" change—this is the metaphor repeatedly used in the report: AI's "imagination engine."
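The idea of rolling a state forward under different action conditions can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_world_model` is a hand-written stand-in for a learned model, and the `State` fields and step size are hypothetical, not taken from the report.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class State:
    position: float
    velocity: float


def toy_world_model(state: State, action: float, dt: float = 0.1) -> State:
    """Hypothetical stand-in for a learned model: predict the next state."""
    velocity = state.velocity + action * dt      # the action acts as an acceleration
    position = state.position + velocity * dt
    return State(position, velocity)


def rollout(model: Callable[[State, float], State],
            start: State,
            actions: Sequence[float]) -> List[State]:
    """Roll the state forward, feeding each prediction back in as the next input."""
    states = [start]
    for action in actions:
        states.append(model(states[-1], action))
    return states


start = State(position=0.0, velocity=1.0)

# Same starting state, two different action sequences: two imagined futures.
accelerate = rollout(toy_world_model, start, [1.0] * 10)
brake = rollout(toy_world_model, start, [-1.0] * 10)

print("accelerate ends at", round(accelerate[-1].position, 3))
print("brake ends at", round(brake[-1].position, 3))
```

The two rollouts share a starting point but diverge because the actions differ, which is the sense in which such a model provides "future branches."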

## **World Models Are Not One Thing: Five Mainstream Approaches Running in Parallel**

Morgan Stanley roughly categorizes current practices into several types (emphasizing that boundaries will gradually blur):

-   **Interactive, Action-Conditioned World Models**: Like "learned game engines," where the environment changes in real-time based on agent actions (e.g., DeepMind Genie).
    
-   **Consistent 3D World Generators**: Emphasizing spatial geometric consistency and the ability to explore from multiple perspectives (e.g., World Labs Marble).
    
-   **Abstract Representation/Non-Generative Models**: Not pursuing pixel-level generation, but rather predicting higher-level latent space structures and dynamics, focusing on efficiency and reasoning (e.g., Meta V-JEPA, AMI Labs).
    
-   **Predictive Generative World Models**: More like "predicting the next frame/next state," used for planning, forecasting, and driving reasoning (e.g., Wayve GAIA, NVIDIA Cosmos Predict).
    
-   **Physics-Constrained Simulation Data Engines**: Combining world models with simulation/physics engines and data pipelines to produce more "physically consistent" synthetic data for robot training (e.g., NVIDIA Cosmos Transfer).
    

This classification has a practical significance: although they are all called world models, some pursue "generating a world to explore," while others aim to "compress the world into computable states," resulting in different product forms, computational structures, and commercialization paths.

## **Games and Content Production Come First: Replacing Engines Is Tempting, But Not So Fast**

Games are the most "intuitive" use case in the report: world models can generate interactive environments from a small number of prompts, potentially speeding up content production by an order of magnitude. Microsoft's playable version of "Quake II," made with Muse, offers a sharp contrast with the traditional approach: instead of an engine rendering frame by frame, the model predicts each frame from player input.

However, Morgan Stanley's video game analysts (the report cites Matt Cost's framework) take an unromantic view. In the long term there are two scenarios: **existing giants integrate AI into their toolchains and adapt, or they are replaced or seriously disrupted by a new paradigm**. At first glance replacement looks like the simpler path, because today's models can already "generate playable worlds using natural language."

The difficulties come later: computational speed and cost may be solvable, but issues like "meta-systems, latency" will be harder, and problems such as "determinism, memory, updates" may prove tough nuts to crack under the world model paradigm. Short-term constraints therefore give incumbents a window of opportunity, while the long-term threat remains real.

## **Autonomous Driving and Robotics Are More Pragmatic: Virtual Worlds Are First Used to "Supplement Data" and "Think Before Doing"**

The case for autonomous driving is clearer: move the edge cases that are dangerous, rare, and expensive to reproduce in reality into a virtual environment for large-scale testing. The report mentions that Waymo uses a world model based on DeepMind Genie 3 to conduct "billions of miles" of virtual driving tests, training and validating the system on rare edge cases that are either hard to encounter on real roads or carry uncontrollable risks.

The logic on the robotics side is more engineering-driven: world models may address two issues, **training data volume** and **pre-execution reasoning**. The report cites research showing that robots trained on data generated by world models can achieve results comparable to those trained on real interaction data. However, Morgan Stanley also draws a clear boundary: in the short term, world models and simulation data are more likely to supplement real data pipelines than to replace them.
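"Pre-execution reasoning" can be sketched as planning inside the model: imagine many candidate action sequences, score each imagined outcome, and only then act. The sketch below uses random-shooting search over a hand-written one-dimensional `toy_model`; both the dynamics and the planner are illustrative assumptions, not the method of any system named in the report.

```python
import random
from typing import List, Tuple


def toy_model(position: float, action: float) -> float:
    """Hypothetical learned dynamics: each action nudges the position."""
    return position + 0.5 * action


def imagined_cost(position: float, actions: List[float], goal: float) -> float:
    """Roll the plan forward inside the model and measure distance to the goal."""
    for action in actions:
        position = toy_model(position, action)
    return abs(goal - position)


def plan(position: float, goal: float, horizon: int = 5,
         candidates: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Random-shooting planner: sample plans, keep the best imagined one."""
    rng = random.Random(seed)
    best_actions: List[float] = []
    best_cost = float("inf")
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = imagined_cost(position, actions, goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


best_actions, best_cost = plan(position=0.0, goal=1.0)
do_nothing_cost = imagined_cost(0.0, [0.0] * 5, 1.0)
print("imagined cost of best plan:", best_cost)
print("imagined cost of doing nothing:", do_nothing_cost)
```

No real action is taken while the candidates are evaluated; the quality of the chosen plan is only as good as the model's predictions, which is why the report treats simulated data as a supplement to real data.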

**The details that truly hold back progress come from "contact and friction":** The report emphasizes that small physical quantities outsiders often overlook are crucial. Subtle forces applied by fingers, differences between new and worn actuators, slight changes in surface friction and material properties, and even static friction in joints can all cause significant discrepancies in "simulation to reality" transfer.

## **The hardest challenges are "long-term stability" and "controllability": several hurdles remain before usability**

The report lists the challenges specifically and candidly:

-   **Error accumulation and time drift:** The longer the interaction, the higher the probability of object drift, geometric deformation, and deviation from physical rules. Even the state-of-the-art Genie 3 currently supports only "a few minutes" of continuous interaction.
    
-   **Insufficient controllability:** No matter how beautiful the visuals, if the action space is limited to basic movements, the product's value will be constrained.
    
-   **Multi-agent and social dynamics:** Simultaneous interactions among multiple people/vehicles/robots are much more challenging than navigating with a single camera, and DeepMind specifically identifies this as one of the difficulties of Genie 3.
    
-   **Data scale and diversity:** Especially in the robotics field, collecting real sensor data is expensive and slow.
    
-   **Lack of unified benchmarks:** There is no recognized standard for quantifying the quality of long-horizon interaction, so progress is often demonstrated through demos and task tests.
    

These constraints dictate a realistic pace: world models are likely to first spread in the "high fault tolerance, fast iteration" digital content field before gradually penetrating industries that require strict physical consistency.
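The error-accumulation point in the list above can be made concrete with a toy calculation. The sketch assumes a model whose one-step prediction is off by 1% (the `MODEL_GAIN` value is an arbitrary illustration, not a figure from the report) and shows how that small error compounds when each prediction is fed back in as the next input.

```python
from typing import List


def rollout(gain: float, x0: float, steps: int) -> List[float]:
    """Repeatedly apply x_next = gain * x, feeding each output back in."""
    xs = [x0]
    for _ in range(steps):
        xs.append(gain * xs[-1])
    return xs


TRUE_GAIN = 1.00   # the real system holds its state steady
MODEL_GAIN = 1.01  # assumed learned model with a 1% one-step error

truth = rollout(TRUE_GAIN, 1.0, 300)
imagined = rollout(MODEL_GAIN, 1.0, 300)
drift = [abs(i - t) for i, t in zip(imagined, truth)]

for step in (1, 10, 100, 300):
    print(f"step {step:>3}: drift = {drift[step]:.3f}")
```

A per-step error that looks negligible in a short demo grows geometrically with the horizon in this toy setting, which is one way to read the report's emphasis on long-term stability.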

## Fei-Fei Li's bet: making AI "understand" three-dimensional space

Morgan Stanley places World Labs in a representative position for "generating consistent 3D worlds." The company was founded by Fei-Fei Li and her team in 2023 and emerged from stealth in 2024; its flagship product **Marble** was publicly released in November 2025, **with the goal of generating "persistent, explorable" three-dimensional environments from text, images, short videos, or rough 3D inputs, and supporting editing and expansion.** The features listed in the report resemble a workstation for creation and production: deleting and modifying objects after generation, using "Chisel" to block out a rough model before adding details, generatively expanding selected regions, composing multiple worlds into larger scenes, exporting to external 3D software and engines, and providing APIs for developers to integrate.

It also emphasizes interfaces with industry toolchains: the ability to export to Unreal Engine and Unity; integration with simulation platforms like NVIDIA Isaac Sim; and showcases usage scenarios in architectural design, robotic simulation, and more.

The report also notes the investor enthusiasm: PitchBook estimates that World Labs has raised approximately $1.29 billion in total financing, with a post-money valuation of about $5.4 billion after a funding round in February 2026.

## Yann LeCun's Alternative Path: Predicting Structure Without Rendering Images

AMI Labs' story is more of a "research paradigm" play: the company, co-founded by Yann LeCun, emerged from stealth in March 2026, **with a path leaning towards the JEPA framework, which focuses not on reconstructing every pixel but on predicting the latent representations (latent embeddings) of occluded or future parts, using more abstract structures to learn how the world evolves.** Morgan Stanley categorizes it on the "abstract representation/non-generative model" side, emphasizing its potential value in reasoning, planning, and physical AI systems (especially robotics).
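The difference between reconstructing pixels and predicting latent embeddings can be shown with a toy loss function. This is a sketch under loud assumptions: the encoders and predictor are fixed random or identity linear maps standing in for learned networks, the sizes are arbitrary, and nothing here reflects AMI Labs' or Meta's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D_PIXELS, D_LATENT = 64, 8  # illustrative sizes, not from the report

# Hypothetical stand-ins for learned networks: small linear maps.
context_encoder = rng.normal(size=(D_LATENT, D_PIXELS)) / np.sqrt(D_PIXELS)
target_encoder = context_encoder.copy()  # separate copy used only to build targets
predictor = np.eye(D_LATENT)             # untrained placeholder predictor


def latent_prediction_loss(visible: np.ndarray, hidden: np.ndarray) -> float:
    """Squared error between predicted and actual embeddings of the hidden part."""
    z_context = context_encoder @ visible
    z_target = target_encoder @ hidden    # the target is 8 numbers, not 64 pixels
    z_predicted = predictor @ z_context
    return float(np.mean((z_predicted - z_target) ** 2))


current_frame = rng.normal(size=D_PIXELS)
future_frame = current_frame + 0.5 * rng.normal(size=D_PIXELS)

loss_same = latent_prediction_loss(current_frame, current_frame)
loss_future = latent_prediction_loss(current_frame, future_frame)
print("loss when nothing changes:", loss_same)
print("loss against a changed future frame:", loss_future)
```

The loss is computed entirely in the low-dimensional embedding space, so the model is never asked to render the hidden frame; training the predictor to reduce this loss is the part a real system would add.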

The report discloses very limited specifics about AMI's products, only listing possible application directions: robotics, autonomous driving, video understanding/analysis, and AR/VR with cameras and smart assistants. In terms of financing, the report mentions that AMI Labs debuted with over $1 billion in seed funding, with a post-money valuation exceeding $4.5 billion according to PitchBook.

## **Capital and Talent Are Already Gathering: The Race for Spatial Intelligence Is Starting to "Accelerate"**

The most important signal in this Morgan Stanley report may not be a specific model parameter or demo, but the shift in landscape it describes: from DeepMind, Meta, Microsoft, Tesla, and NVIDIA to a wave of new startups, world models are becoming "the common language of the next phase." This helps explain why gaming, film, and design are seeing a leap in productivity, and why autonomous driving and robotics are increasingly moving training, validation, and planning into virtual worlds.

World models are not a plug-and-play universal component. The report's conclusions read like a roadmap: workable scenarios have already emerged, and the real challenges are on the table, namely long-term stability, controllability, multi-agent systems, physical details, and evaluation systems. The next key question is who can turn these hard problems into closed engineering loops; that will be the watershed for how far the journey from "digital to physical" can go.
