
Deciphering the End-to-End Puzzle: Computing Power Miracles, Diverse Architectures, and Implementation Challenges

"How exactly does Tesla's end-to-end autonomous driving solution work?"
At an end-to-end AI symposium, someone posed this question to a panel of experts.
Present were Zhao Xing (Assistant Professor at Tsinghua's Institute for Interdisciplinary Information Sciences), Xu Chunjing (Chief AI Scientist of Huawei's Intelligent Driving BU), Wang Naiyan (Distinguished Scientist at Xiaomi's intelligent driving division), and Jia Peng (VP of Algorithm R&D at Li Auto) - yet none could give a definitive answer.
Nobody fully understands the specific model architecture of Tesla's FSD V12, yet Tesla alone has stirred the tides of end-to-end innovation.
We've pieced together clues from Musk's statements and Tesla's disclosures: a single neural network handles everything from perception to decision-making, likely built on generative AI that extends Tesla's Occupancy network into a world model.
One certainty emerges: end-to-end solutions demand unprecedented cloud computing power.
As Musk repeatedly stated: "FSD V12's end-to-end model iteration is primarily constrained by cloud computing resources."
Thus Tesla is investing heavily, planning to spend over $1 billion on its DOJO supercomputer by the end of 2024, targeting 100 exaFLOPS of training compute.
If computing power is end-to-end's prerequisite, this signals a new arms race where victory favors those who "brute-force breakthroughs".
With Tesla's methodology opaque, everyone simply chases the direction of the wave.
Suddenly, end-to-end solutions proliferate as players scramble to keep pace or face obsolescence.
01. End-to-End Driving: "Brute Force" Breaks Barriers
The AI-modeled approach inherently fuels computational demands, igniting a computing arms race.
Intelligent computing centers enter land-grab mode, sparking a power struggle.
Tesla, Changan, Geely and others aggressively build or partner for computing infrastructure.
Tesla's DOJO aims for 100 exaFLOPS by October 2024 - equivalent to 300,000 Nvidia A100s.
Domestic automakers like Geely, Changan, and startups Nio, XPeng, Li Auto race to keep up.
Notably, Nio's Tencent-partnered center remains undisclosed, but CEO Li Bin calls their computing strategy "insanely ambitious," claiming global leadership for years.
Suppliers like Huawei, SenseTime's Jueying, and Haomo hold their ground.
Huawei's QIANKUN ADS 3.0 runs on 3.5 exaFLOPS of cloud training compute, learning from 30 million km of driving data daily - enough to cover Earth's 64 million km of roads in about 2.1 days.
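The road-coverage figure is simple division; taking the reported 30 million km/day and 64 million km of global roads at face value:

```python
# Quick arithmetic check of the road-coverage claim above.
daily_training_km = 30_000_000   # km of driving data processed per day (as reported)
global_road_km = 64_000_000      # total length of the world's roads (as reported)

days_to_cover = global_road_km / daily_training_km
print(f"{days_to_cover:.1f} days")  # prints "2.1 days"
```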
SenseTime's latest report cites 45,000 GPUs delivering 12 exaFLOPS, double its 2023 capacity. Haomo's Volcano Engine-powered "Snow Lake Oasis" center hits 670 petaFLOPS.
Clearly, computing centers become end-to-end's prerequisite, with demand growing exponentially.
"End-to-end players without computing centers are unqualified," a Haomo expert states, noting computing power accelerates model iteration and issue resolution.
SenseTime's VP Shi Jianping adds that abundant computing enables experimentation, yielding superior models.
Does this mean end-to-end requires brute force breakthroughs?
The industry diverges:
- Some embrace "violent computing" through heavy investment;
- Others pursue algorithm-focused "craftsmanship".
While all agree autonomous driving's triad (algorithms, data, computing) must balance, priorities differ.
Proponents argue algorithms show little differentiation - the edge lies in training on data efficiently.
One insider notes that with academia publishing viable architectures, industry must accumulate computing and data advantages.
Opposing voices counter that algorithmic breakthroughs are more urgent.
DeepRoute emphasizes building networks that follow the Scaling Law - where performance improves predictably as parameters, data, and computing grow.
Thus model optimization precedes brute-force scaling.
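The Scaling Law behavior DeepRoute invokes is usually written as a power law relating loss to each resource. The general form below comes from the language-model scaling literature, not from any DeepRoute publication:

```latex
% Loss falls as a power law in parameters N, data D, and compute C,
% so long as the other two resources are not the bottleneck:
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```

Here \(N_c, D_c, C_c\) and the exponents \(\alpha\) are empirically fitted constants. The practical point is that an architecture which obeys such a law keeps improving just by scaling up, which is why getting the model design right comes before brute-force scaling.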
Neither approach is universally superior - strategies align with corporate resources.
But Tesla and Huawei's massive investments suggest computing power elevates end-to-end's ceiling.
So what scale suffices?
Chengtai Capital's report shows that around 100 high-performance GPUs can train a model, though that is likely insufficient for mass production.
Haomo suggests 1,000 GPUs as baseline for iterative algorithms.
Upper limits remain undefined - companies must gauge their capacities against Tesla's dominance.
Tesla plans to deploy 85,000 Nvidia H100 GPUs in 2024, on par with Google and Amazon - at $25,000-$40,000 per H100, a cluster costing over $2 billion, a scale out of reach for domestic firms.
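The "over $2 billion" figure follows from the low end of that price range:

```python
# Rough cost range for Tesla's reported 2024 H100 buildout.
gpus = 85_000
price_low, price_high = 25_000, 40_000   # reported per-unit H100 price range (USD)

cost_low = gpus * price_low    # $2.125 billion
cost_high = gpus * price_high  # $3.4 billion
print(f"${cost_low / 1e9:.3f}B to ${cost_high / 1e9:.1f}B")
```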
Tesla's ambitions span embodied AI, Robotaxis, and robots - justifying such investment.
Domestic players focus on urban NOA deployment - Haomo states 2,000-5,000 GPUs suffice nationwide L2-L3, though L4-L5 demands will rise.
02. End-to-End Enigma: Who's Authentic?
The end-to-end frenzy creates confusion - everyone claims the label before achieving it.
Without unified standards, definitions split:
- Broadly: lossless data transfer enabling holistic optimization
- Narrowly: single neural network from sensors to controls
Current implementations vary:
1. Perception-Cognition Modeling: Split into perception (Huawei's GOD network) and cognition (PDP network) stages
2. Modular End-to-End: Unified training of connected modules (OpenDriveLab's UniAD)
3. Single Neural Network: Pure end-to-end (Wayve's GAIA-1/LINGO-2)
Traditional players transition gradually through four phases: perception modeling → decision modeling → modular → single model.
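The three implementation styles differ mainly in where the interfaces sit and whether training crosses them. A minimal sketch - every function here is a toy placeholder, not any vendor's actual network:

```python
import numpy as np

# Toy stand-ins for the real networks (illustration only).
def perception_net(x):    return x.mean(axis=0)   # sensors -> object-list-like summary
def cognition_net(s):     return np.tanh(s)       # summary -> control signal
def perception_module(x): return x * 0.5          # sensors -> learned features
def prediction_module(f): return f + 0.1          # features -> predicted features
def planning_module(f):   return np.tanh(f.mean())
def driving_net(x):       return np.tanh(x).mean()

def two_stage(sensors):
    """Perception-cognition modeling (the GOD + PDP style split):
    two networks trained separately, joined by a hand-defined interface."""
    return cognition_net(perception_net(sensors))

def modular_e2e(sensors):
    """Modular end-to-end (UniAD-style): distinct modules connected by
    learned features and trained jointly, so gradients cross boundaries."""
    return planning_module(prediction_module(perception_module(sensors)))

def single_net(sensors):
    """Single neural network: raw sensors in, controls out,
    with no explicit intermediate representation."""
    return driving_net(sensors)

cam = np.random.rand(4, 8)  # fake multi-camera input
for policy in (two_stage, modular_e2e, single_net):
    print(policy.__name__, float(np.asarray(policy(cam)).mean()))
```

The distinction that matters is the second docstring: in modular end-to-end, one driving loss backpropagates through every module, whereas the two-stage split freezes a human-designed interface between them.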
Source: Chengtai Capital "End-to-End Autonomous Driving Industry Report"
Nio's Ren Shaoqing notes perception modeling is mature, but decision modeling remains incomplete industry-wide.
XPeng and Haomo advocate radical restructuring for efficiency.
Authenticating claims remains problematic - neither a BEV+Transformer architecture nor the choice between vision and LiDAR proves a system is genuinely end-to-end.
As SenseTime's Shi notes, high-definition maps aren't a prerequisite - their "mapless" system still incorporates ordinary navigation maps.
Ultimately, only code inspection or driving experience reveals truth.
One insider states: "Authentic end-to-end shows dramatic improvement - comparable performance means it's fake."
03. End-to-End: Not Ultimate, But Optimal
From UniAD's CVPR 2023 award to FSD V12 and Wayve's $1B funding, academia, industry, and capital converge on this revolution.
Nvidia's Wu Xinzhu calls it autonomous driving's "final movement"; XPeng's He Xiaopeng predicts disruptive change.
Yet debates continue - end-to-end doesn't wholly surpass modular approaches, especially regarding verification and safety.
Currently optimal though not necessarily ultimate, it solves edge cases and reduces manual coding.
Three development trends address key challenges:
1. Cost Control: Momenta's dual-path approach (long/short-term memory) cuts training costs 10-100x
2. Safety Nets: Li Auto's dual-system combines end-to-end (routine) with VLM (edge cases)
3. Verification: Simulation replaces expensive real-world testing (CARLA, Lightwheel's solutions)
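Li Auto's dual-system design can be caricatured as a fast path with a slow fallback. The routing rule, names, and confidence values below are illustrative assumptions, not Li Auto's actual code:

```python
def end_to_end_policy(scene):
    """System 1: fast end-to-end network handling routine driving."""
    return {"action": "follow_lane", "confidence": scene.get("familiarity", 1.0)}

def vlm_reasoner(scene):
    """System 2: slower vision-language model reasoning about rare scenes."""
    return {"action": "slow_and_reassess", "confidence": 0.9}

def dual_system_drive(scene, threshold=0.5):
    """Route routine scenes to the fast policy; hand edge cases to the VLM."""
    fast = end_to_end_policy(scene)
    if fast["confidence"] >= threshold:
        return fast
    return vlm_reasoner(scene)  # edge case: defer to the slower reasoner

print(dual_system_drive({"familiarity": 0.9}))  # routine scene -> fast path
print(dual_system_drive({"familiarity": 0.2}))  # edge case -> VLM fallback
```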
The transition from rule-based to deep learning represents a paradigm shift - with autonomous driving at its forefront.


