--- title: "NVDA (GTC Trans): LPU re-architects AI inference; AI factories aim for space---" type: "Topics" locale: "zh-HK" url: "https://longbridge.com/zh-HK/topics/39303796.md" description: "For detailed commentary, see 'NVIDIA GTC: AI’s Spring Gala — High Hopes or Letdown?'.Below is the full transcript of $NVIDIA(NVDA.US) GTC.Jensen Huang, NVIDIA’s founder and CEO, delivered the keynote at GTC 2026.Key topics included the 20th anniversary of the CUDA platform, the inference inflection and a surge in compute demand, the Vera Rubin system architecture, Groq integration, the OpenClaw agent revolution, and physical AI and robotics. CUDA at 20 and the platform flywheel: CUDA has been around for 20 years..." datetime: "2026-03-17T07:40:58.000Z" locales: - [en](https://longbridge.com/en/topics/39303796.md) - [zh-CN](https://longbridge.com/zh-CN/topics/39303796.md) - [zh-HK](https://longbridge.com/zh-HK/topics/39303796.md) author: "[Dolphin Research](https://longbridge.com/zh-HK/news/dolphin.md)" --- > 支持的語言: [English](https://longbridge.com/en/topics/39303796.md) | [简体中文](https://longbridge.com/zh-CN/topics/39303796.md) # NVDA (GTC Trans): LPU re-architects AI inference; AI factories aim for space--- **For a detailed take, cf. '**[**NVIDIA GTC: AI’s Spring Gala — High Hopes, Mixed Feelings?**](https://longbridge.com/en/topics/39303579)**'.** **Below is the full content from**$NVIDIA(NVDA.US) **GTC:** NVIDIA founder and CEO Jensen Huang delivered the GTC 2026 keynote, covering the **CUDA platform’s 20th anniversary, the inference inflection and an explosion in compute demand, the Vera Rubin system architecture, Groq integration, the OpenClaw agent revolution, and physical AI & robotics**. These were the core themes. **CUDA at 20 and the platform flywheel** CUDA turns 20 this year. Over two decades, NVIDIA kept investing in the architecture, evolving from SIMT to the recent Tiles to ease Tensor Core programming. 
CUDA is now embedded across every ecosystem, with hundreds of thousands of open-source projects. NVIDIA’s core strategy fits on one slide: an expanding install base attracts developers, who create breakthrough algorithms (e.g., deep learning); breakthroughs spawn new markets and ecosystems, which in turn grow the install base and spin the flywheel faster. Library downloads on NVIDIA’s stack continue to accelerate off a large base.

Because CUDA supports the full AI lifecycle, all data-processing platforms, and scientific solvers, GPUs enjoy very long effective lives. Ampere, launched six years ago, is even appreciating in cloud pricing. This is unusual longevity for compute.

CUDA’s roots trace back 25 years to GeForce’s programmable shaders (Pixel Shader), the world’s first programmable accelerator. GeForce took CUDA to the world, and Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, and Andrew Ng discovered that GPUs turbocharge deep learning. That discovery ignited the AI boom.

**Neural Rendering and DLSS 5**

Huang unveiled NVIDIA’s next generation of graphics, Neural Rendering (the fusion of 3D graphics and AI), branded as DLSS 5. The core idea combines controllable 3D graphics (structured data, the virtual world’s ground truth) with GenAI (probabilistic computation): one is fully predictable, the other probabilistic yet highly photoreal. The result is content that is both beautiful and controllable. This fusion of "structured information + GenAI" will recur industry after industry; structured data is the bedrock of trustworthy AI.

**Data platforms: cuDF and cuVS**

NVIDIA built two foundational libraries: cuDF for structured data (dataframes) and cuVS for vector stores (semantic, unstructured data). Roughly 90% of new data generated each year is unstructured (PDFs, video, speech) and was historically unsearchable and unretrievable. Multimodal AI now makes it possible to index that unstructured corpus.
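Since the data-platform story hinges on semantic search over embeddings, here is a toy, self-contained sketch of the primitive cuVS accelerates: ranking a corpus of embedding vectors by cosine similarity to a query. All vectors and file names below are made up for illustration; production systems run this on GPUs over billions of learned multimodal embeddings.

```python
import math

# Toy "unstructured corpus": each document is a pre-computed embedding.
# Real embeddings come from a multimodal model; these are hypothetical.
corpus = {
    "report.pdf":   [0.9, 0.1, 0.0],
    "earnings.mp4": [0.1, 0.8, 0.1],
    "memo.txt":     [0.7, 0.2, 0.1],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    # Rank documents by similarity to the query embedding, return top-k.
    ranked = sorted(corpus, key=lambda d: cosine(corpus[d], query_vec), reverse=True)
    return ranked[:k]

print(search([1.0, 0.1, 0.0]))  # → ['report.pdf', 'memo.txt']
```

A vector store like cuVS replaces the exhaustive `sorted` scan with an approximate nearest-neighbor index, which is what makes indexing the 90% of unstructured data tractable at scale.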
Partnerships include:

- IBM, using cuDF to accelerate watsonx data;
- Dell, co-building the Dell AI Data Platform on cuDF and cuVS;
- Google Cloud, integrating with BigQuery and cutting compute costs nearly 80% in work with Snapchat.

Acceleration yields a triple benefit: speed, scale, and cost.

**Accelerated computing: vertically integrated, horizontally open**

NVIDIA calls itself the first company that is vertically integrated yet horizontally open. Accelerated computing is fundamentally application acceleration: you must grasp the application, domain, and algorithms, then deploy across data center, cloud, edge, and robotic systems. NVIDIA integrates from chips to systems to libraries, while opening horizontally to global clouds and OEMs.

Cloud partners include Google Cloud (accelerating Vertex AI, BigQuery, JAX/XLA, PyTorch), AWS (accelerating EMR, SageMaker, and Bedrock, and bringing OpenAI onto AWS), Microsoft Azure (AI Foundry, Bing Search, confidential computing), Oracle (NVIDIA's first AI customer), CoreWeave (the first AI-native cloud), and Palantir, working with Dell to deploy AI platforms in any country or air-gapped environment. Coverage is broad and deep. NVIDIA is the only accelerator that excels on both PyTorch and JAX/XLA.

**Industry verticals**

GTC spans every layer of the AI "five-layer cake": infrastructure, chips, platforms, models, and apps. NVIDIA is deeply positioned across verticals:

- autonomous driving (Alpamayo);
- financial services (the largest cohort at GTC, shifting from traditional quant to large-scale deep learning);
- healthcare (AI drug discovery and diagnostics);
- industrial manufacturing (a global wave of AI-factory builds);
- media, entertainment, and gaming;
- quantum (35 firms co-building quantum–GPU hybrids);
- retail (a $35 tn sector moving to agentic shopping);
- robotics and manufacturing (a $50 tn arena, with 110 robots on show);
- telecom (AI-RAN with Nokia and T-Mobile).

The breadth is unprecedented.
CUDA-X libraries are NVIDIA’s "crown jewels": this GTC announced about 100 libraries, ~70 of them new, plus ~40 models. cuDNN transformed AI, helped spark the modern boom, and remains foundational.

**AI-native companies and the VC wave**

AI-native startups have raised $150 bn in venture capital, the largest such wave in history. For the first time, checks jumped from millions to hundreds of millions or even billions, since every company needs large-scale compute and tokens. Firms will either mint their own tokens or add value atop tokens from providers like Anthropic and OpenAI.

**The inference inflection: 1,000,000x compute demand**

Three pivotal shifts occurred in two years:

1. **GenAI (ChatGPT, 2022/23)**: a move from retrieval to generative computation, fundamentally changing computing.
2. **Reasoning AI (o1/o3)**: enables reflection, planning, problem decomposition, and research-based self-verification, making GenAI grounded and credible.
3. **Agentic AI (Claude Code)**: the first agent model that reads files, codes, compiles, tests, evaluates, and iterates. It has transformed software engineering; 100% of NVIDIA engineers use a mix of Claude Code, Codex, and Cursor.

The inference moment is here: thinking requires reasoning, acting requires reasoning, and reading requires reasoning. Over the past two years, compute per task rose ~10,000x and usage rose ~100x, so the combined compute requirement is ~1,000,000x higher. All AI companies are compute-constrained; more compute would translate directly into revenue.

At last year’s GTC, Huang saw $500 bn of high-confidence demand (Blackwell and Rubin through 2026). Now, through 2027, he sees at least $1 tn, with true demand likely well above that. The bar has moved materially.

**Grace Blackwell’s inference showing**

2025 is NVIDIA’s "Year of Inference".
SemiAnalysis ran the most comprehensive AI inference benchmark to date and found:

- NVIDIA leads globally on both tokens/watt (throughput) and token speed (capability).
- Grace Blackwell NVLink 72 delivers a 35x per-watt uplift vs. Hopper H200 (50x observed), against the ~1.5x a Moore’s Law cadence would predict. This is a step change.
- NVIDIA has the world’s lowest token cost: "basically untouchable". SemiAnalysis’ Dylan Patel said Huang "sandbagged".

Fireworks is one example: on the same hardware, a software update boosted token speed from ~700 tokens/s to nearly ~5,000 tokens/s, a 7x jump. That uplift was purely software-driven.

**The token-factory economics**

Data centers are shifting from "file stores" to "token factories". Each plant is power-limited (e.g., 1 GW), and CEOs must manage token throughput and token speed. Tokens will tier like commodities:

- Free tier: high throughput, low speed.
- $3 per mn tokens.
- $6 per mn tokens.
- $45 per mn tokens.
- Premium: $150 per mn tokens.

These tiers are illustrative. In a 1 GW data center allocating 25% of power per tier, Grace Blackwell can generate 5x the revenue of Hopper, and Vera Rubin can add another 5x. The revenue stack compounds.

**Vera Rubin system architecture**

Vera Rubin is NVIDIA’s new AI system, featuring:

- 100% liquid cooling (45°C hot water) and cableless racks, cutting install time from two days to two hours.
- 6th-gen NVLink scale-up switching (neither Ethernet nor InfiniBand), fully liquid-cooled.
- A new CPU focused on extreme single-thread performance, massive data egress, and peak energy efficiency using LPDDR5 (the only data-center CPU with LPDDR5), with an independent CPU business set to reach bn-scale.
- A new Groq system (3rd-gen LP30, Samsung fab), already in volume production.
- The world’s first CPO (co-packaged optics) Spectrum-X switch in full production, with the COUPE process co-invented with TSMC.
- The BlueField-4 storage platform (Vera CPU + CX9).

This is a full-stack upgrade.
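To make the token-factory tier arithmetic concrete, here is a minimal sketch of the revenue model for a power-limited plant, splitting 1 GW evenly across the tiers named above. Only the per-million-token prices come from the keynote; the tokens-per-second-per-MW figures are invented purely for illustration (higher-priced tiers trade throughput for interactive token speed).

```python
# Minimal sketch of the token-factory revenue model. Tier prices are from
# the keynote; throughput-per-MW figures are hypothetical placeholders.
GIGAWATT_MW = 1000
SECONDS_PER_YEAR = 365 * 24 * 3600

# (tier name, $ per 1M tokens, tokens/s generated per MW of power)
TIERS = [
    ("free",   0.0,   1.0e6),
    ("$3/M",   3.0,   7.0e5),
    ("$6/M",   6.0,   5.0e5),
    ("$45/M",  45.0,  1.5e5),
    ("$150/M", 150.0, 4.0e4),
]

def annual_revenue(mw_per_tier: float) -> float:
    """Yearly revenue if each tier is allocated `mw_per_tier` MW of the plant."""
    total = 0.0
    for _name, price_per_m, tok_per_s_per_mw in TIERS:
        tokens = tok_per_s_per_mw * mw_per_tier * SECONDS_PER_YEAR
        total += tokens / 1e6 * price_per_m
    return total

# Split a 1 GW plant evenly across the five tiers.
share = GIGAWATT_MW / len(TIERS)
print(f"${annual_revenue(share) / 1e9:.1f}B / year")  # → $112.6B / year
```

The point of the sketch is the knob it exposes: a per-watt efficiency gain (Hopper → Grace Blackwell → Vera Rubin) multiplies every tier's token output, and therefore revenue, under the same power cap.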
Vera Rubin is live on Microsoft Azure (first rack), confirmed by Satya Nadella. NVIDIA’s supply chain can ship thousands of systems per week, enabling multiple gigawatts of AI-factory capacity per month. Scale is the strategy.

**Rubin Ultra**: 144 GPUs within one NVLink domain, with the new Kyber rack, vertical node insertion, and NVLink switches replacing copper behind the midplane. This maximizes density. In a 1 GW factory, the token-generation rate rises from 22 mn to 700 mn in two years, a **350x increase**. That is dramatic.

**Groq integration: disaggregated inference**

NVIDIA acquired the Groq team and licensed its technology. Groq is a deterministic dataflow processor with static compilation, compiler scheduling, and large on-die SRAM, purpose-built for low-latency inference. A single Groq die has 500 MB of SRAM versus 288 GB of memory on a single Rubin GPU, so Groq alone cannot host mainstream model weights and KV cache; the two are complementary in capacity.

**The solution is disaggregated inference via Dynamo: prefill runs on Vera Rubin; the attention portion of decode runs on Vera Rubin (compute-heavy); and the MLP/token-generation portion of decode is offloaded to Groq (ultra-low latency, high bandwidth).** The two are tightly coupled over Ethernet, with a special mode halving latency. This balances performance and cost.

Result: up to 35x performance at the highest-value tiers. If most workloads are high-throughput, go 100% Vera Rubin; if there is significant coding/high-value token demand, provision roughly 25% Groq + 75% Vera Rubin. Groq LP30 (Samsung) is in volume production, with shipments expected in Q3.

**Product roadmap**

- **Blackwell/Rubin**: Oberon systems (standard racks); copper scale-up (NVLink 72); optical scale-up to NVLink 576.
- **Rubin Ultra**: Kyber racks; copper scale-up to NVLink 144.
- **Next-gen Rubin Ultra**: new GPU + LP35 (first NVFP4 compute), with Oberon + Spectrum-6 CPO.
- **Feynman** (next): new GPU + LP40 (LPU) + Rosa CPU (named after Rosalind) + BlueField-5 + CX10; Kyber copper scale-up + Kyber CPO scale-up, the first platform to support both copper and CPO scale-up.

**One new architecture every year. NVIDIA will keep investing across copper, optics, and CPO interconnects.**

**AI factories and the DSX platform**

NVIDIA is evolving from a chip company into an AI factory/infrastructure company. Its new NVIDIA DSX platform (built on Omniverse) designs GW-scale AI factories in simulation: mechanical, thermal, electrical, and network design for racks; grid interaction for power modulation; and Max-Q to dynamically tune system power and cooling. The goal is to waste not a single watt, with "2x optimization" still on the table.

Separately, NVIDIA announced Vera Rubin Space-1, aiming to deploy data centers in space, where heat must be shed via radiation alone, with no conduction or convection. This presents unique engineering challenges.

**The OpenClaw agent revolution**

OpenClaw has become the most popular open-source project in history, surpassing 30 years of Linux progress within weeks. Huang likens it to the "OS for agent computers": just as Windows enabled PCs, OpenClaw enables personal agents. It is a platform moment.

OpenClaw provides resource management, tool invocation, file-system access, LLM connectivity, task scheduling (cron jobs), problem decomposition, sub-agent orchestration, and multimodal I/O. The stack is comprehensive.

**Every company needs an OpenClaw strategy**, akin to prior Linux, HTTP/HTML, and Kubernetes strategies. Every SaaS company will become a GaaS (Agent-as-a-Service) firm. Yet agents inside enterprise networks can access sensitive data, execute code, and communicate externally, so they require enterprise-grade security.
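To make the problem-decomposition and sub-agent orchestration idea concrete, here is a generic, hypothetical sketch of that loop. None of these names come from OpenClaw's actual API; they are illustrative stand-ins, and a real agent would call an LLM where the comments indicate.

```python
# Hypothetical sketch of the agent pattern described above: decompose a
# goal, fan out to sub-agents, collect results. Names are invented.
from dataclasses import dataclass, field

@dataclass
class Task:
    goal: str
    subtasks: list = field(default_factory=list)

def decompose(task: Task) -> Task:
    # A real agent would ask an LLM to split the goal; we hard-code 3 steps.
    task.subtasks = [f"{task.goal}: step {i}" for i in (1, 2, 3)]
    return task

def run_subagent(subtask: str) -> str:
    # Stand-in for a sub-agent invocation (tool use, code execution, etc.).
    return f"done({subtask})"

def orchestrate(goal: str) -> list:
    # Top-level loop: plan, dispatch each subtask, gather the results.
    task = decompose(Task(goal))
    return [run_subagent(s) for s in task.subtasks]

print(orchestrate("summarize quarterly filings"))
```

The security concern in the text maps directly onto `run_subagent`: that is the point where an enterprise deployment would interpose guardrails before the sub-agent touches files, code execution, or the network.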
NVIDIA partnered with OpenClaw founder Peter Steinberger to launch **NemoClaw**, an enterprise-security reference design for OpenClaw that integrates OpenShell, adds a network guardrail and a privacy router, and hooks into SaaS policy engines. Security is built in.

Huang predicts every engineer will receive an annual token budget, potentially half of base pay, to 10x productivity. "How many tokens are in your offer?" is becoming a Silicon Valley recruiting pitch. This could reshape compensation.

**NVIDIA’s Open Models initiative and the Nemotron Coalition**

NVIDIA is at the frontier across model families: Nemotron (language), Cosmos (world foundation model), GR00T (general-purpose robotics), Alpamayo (autonomous driving), BioNeMo (digital biology), and Earth-2 (AI physics). Coverage is broad.

Nemotron-3 ranks top-three globally on OpenClaw. Nemotron-3 Ultra aims to be the best base model and to support sovereign-AI efforts across countries. It is positioned as a foundation layer.

The **Nemotron Coalition** partners announced: Black Forest Labs (imaging), Cursor (coding), LangChain (agent framework, 1 bn downloads), Mistral, Perplexity, Reflection, Sarvam (India), and Thinking Machines Lab (Mira Murati’s lab). The ecosystem is expanding.

**Physical AI and robotics**

Nearly all robot builders are working with NVIDIA. The company provides three computers: a training computer, a synthetic-data and simulation computer, and an onboard robot computer. This covers the full stack.

**Autonomous driving**: "The ChatGPT moment for autonomy is here." Four new robotaxi-ready platform partners (BYD, Hyundai, Nissan, and Geely) add up to 18 mn units/year, alongside Mercedes, Toyota, and GM. NVIDIA also announced robotaxi deployments with Uber across multiple cities. The partner roster is broadening.

**Industrial robots**: ABB, Universal Robots, and KUKA are integrating NVIDIA’s physical-AI models into their simulation systems and deploying them to manufacturing lines. Adoption is scaling.
**Humanoids**: 110 robots were exhibited. Disney’s Olaf robot demoed live: powered by Jetson, trained to walk in Omniverse, and running the Newton physics solver (built on NVIDIA Warp and co-developed with Disney and DeepMind). It showcased rapid iteration.

**Risk Disclosure & Disclaimer:** [**Dolphin Research Disclosure**](https://support.longbridge.global/topics/misc/dolphin-disclaimer)

### Related stocks

- [OpenAI (OpenAI.NA)](https://longbridge.com/zh-HK/quote/OpenAI.NA.md)
- [Palantir Tech (PLTR.US)](https://longbridge.com/zh-HK/quote/PLTR.US.md)
- [T-Mobile US (TMUS.US)](https://longbridge.com/zh-HK/quote/TMUS.US.md)
- [Oracle (ORCL.US)](https://longbridge.com/zh-HK/quote/ORCL.US.md)
- [Destiny Tech100 (DXYZ.US)](https://longbridge.com/zh-HK/quote/DXYZ.US.md)
- [NVIDIA (NVDA.US)](https://longbridge.com/zh-HK/quote/NVDA.US.md)
- [ORACLE CORP DEPOSITARY SH REP 1/2000TH PFD SER D (ORCL-D.US)](https://longbridge.com/zh-HK/quote/ORCL-D.US.md)

## Comments (1)

- **资深韭菜 · 2026-03-18T01:20:58.000Z**: There's no Disney robot in the picture 😂