---
title: "Jensen Huang's full speech at GTC: The era of inference has arrived, with revenue expected to reach at least one trillion dollars by 2027, and lobster is the new operating system"
type: "News"
locale: "zh-CN"
url: "https://longbridge.com/zh-CN/news/279337105.md"
description: "At the GTC 2026 conference, NVIDIA CEO Jensen Huang positioned the company as a builder of \"AI factories,\" stating that \"by 2027, we will see at least $1 trillion in high-confidence demand.\" He introduced the concept of \"Token Factory Economics,\" emphasizing that performance per watt is the core of commercial monetization. Jensen Huang asserted that Agents will end the traditional SaaS model, and in the future, \"salary + Token budget\" will become the new standard in the workplace."
datetime: "2026-03-16T23:37:15.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/279337105.md)
  - [en](https://longbridge.com/en/news/279337105.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/279337105.md)
---

> Supported languages: [English](https://longbridge.com/en/news/279337105.md) | [繁體中文](https://longbridge.com/zh-HK/news/279337105.md)

# Jensen Huang's full speech at GTC: The era of inference has arrived, with revenue expected to reach at least one trillion dollars by 2027, and lobster is the new operating system

On March 16, 2026, the NVIDIA GTC 2026 conference officially opened, with NVIDIA founder and CEO Jensen Huang delivering the keynote speech. At this conference, regarded as the "annual pilgrimage of the AI industry," Jensen Huang elaborated on NVIDIA's transformation from a "chip company" into an "AI infrastructure and factory company." Addressing the market's primary concerns about performance sustainability and growth potential, he detailed the underlying business logic driving future growth: "Token Factory Economics."

## Performance guidance extremely optimistic: "At least $1 trillion demand by 2027"

In the past two years, global AI computing demand has exploded exponentially.
As large models evolve from "perception" and "generation" to "reasoning" and "action" (executing tasks), the consumption of computing power has surged dramatically. In response to the market's keen interest in orders and revenue ceilings, Jensen Huang provided a very strong outlook, stating in his speech:

> Last year at this time, I mentioned that we saw a high-confidence demand of $500 billion, covering Blackwell and Rubin until 2026. Now, right here and now, I see at least $1 trillion of demand by 2027.

The trillion-dollar forecast briefly drove NVIDIA's stock price up over 4.3%. He further elaborated on this figure:

> **Is this reasonable? That's what I'm going to talk about next. In fact, we may even face a supply shortage. I'm certain that the actual computing demand will be much higher than this.**

**Jensen Huang pointed out that today's NVIDIA systems have proven to be the world's "lowest-cost infrastructure." Because NVIDIA can run AI models across almost all fields, this versatility ensures that the $1 trillion invested by customers can be fully utilized and maintained over a long lifecycle.**

Currently, 60% of NVIDIA's business comes from the top five hyperscale cloud service providers, while the remaining 40% is widely distributed across sovereign clouds, enterprises, industrial sectors, robotics, and edge computing.

## Token Factory Economics: performance per watt determines the business lifeline

To explain the rationale behind this $1 trillion in demand, Jensen Huang presented a new business mindset to global CEOs. He pointed out that future data centers will no longer be warehouses for storing files but "factories" for producing Tokens (the basic units of AI output). Jensen Huang emphasized:

> Every data center and every factory is, by definition, power-limited. A 1GW (gigawatt) factory will never become 2GW; this is a law of physics and atoms.
At fixed power, whoever has the highest Token throughput per watt will have the lowest production costs.

Jensen Huang categorized future AI services into five commercial tiers:

> - Free tier (high throughput, low speed)
> - Intermediate tier (~$3 per million tokens)
> - Advanced tier (~$6 per million tokens)
> - High-speed tier (~$45 per million tokens)
> - Ultra-high-speed tier (~$150 per million tokens)

He pointed out that as models become larger and contexts longer, AI will become smarter, but the Token generation rate will decrease. Jensen Huang stated:

> In this Token factory, your throughput and Token generation speed will directly translate into your precise revenue for next year.

**Jensen Huang emphasized that NVIDIA's architecture allows customers to achieve extremely high throughput in the free tier while delivering an astonishing 35-fold performance improvement at the highest-value inference tier.**

## Vera Rubin achieves 350x acceleration in two years; Groq fills the gap for ultra-fast inference

Under the constraints of physical limits, NVIDIA introduced its most complex AI computing system ever, Vera Rubin. Jensen Huang stated:

> In the past, when mentioning Hopper, I would hold up a chip, which was cute. But when mentioning Vera Rubin, everyone thinks of the entire system.

In this 100% liquid-cooled system, which completely eliminates traditional cables, racks that used to take two days to install can now be installed in two hours. Jensen Huang pointed out that through extreme end-to-end hardware and software co-design, Vera Rubin achieved an astonishing leap within the same 1GW data center:

> **In just two years, we increased the Token generation rate from 22 million to 700 million, achieving a 350-fold growth.
Moore's Law during the same period could only bring about a 1.5-fold improvement.**

To address the bandwidth bottleneck under ultra-fast inference conditions (such as 1,000 Tokens/second), NVIDIA presented the final piece of its integration of the acquired company Groq: asymmetric disaggregated inference. Jensen Huang explained:

> **These two processors have completely different characteristics. The Groq chip has 500MB of SRAM, while a Rubin chip has 288GB of memory.**

Jensen Huang pointed out that NVIDIA has assigned the "prefill" stage, which requires massive computation and GPU memory, to Vera Rubin, while the "decode" stage, which is extremely sensitive to latency, has been assigned to Groq. He also offered suggestions for enterprise computing power allocation:

> If your work primarily involves high throughput, use 100% Vera Rubin; if you have a large demand for high-value, programming-grade Token generation, allocate 25% of your data center capacity to Groq.

It has been revealed that the Groq LP30 chip, manufactured by Samsung, has entered mass production and is expected to ship in the third quarter, while the first Vera Rubin rack is already running on Microsoft Azure cloud.

**In addition, regarding optical interconnect technology, Jensen Huang showcased the world's first mass-produced Co-Packaged Optics (CPO) switch, Spectrum X, and quelled market concerns about the "copper to optical" route debate:**

> **We need more copper cable capacity, more optical chip capacity, and more CPO capacity.**

## Agents End Traditional SaaS; "Annual Salary + Token" Becomes Standard in Silicon Valley

In addition to hardware barriers, Jensen Huang devoted a significant portion of his speech to the revolution in AI software and ecosystems, particularly the explosion of Agents. He described the open-source project OpenClaw as "the most popular open-source project in human history," claiming it surpassed in just a few weeks what Linux achieved over the past 30 years.

**Jensen Huang stated that OpenClaw is essentially the "operating system" for Agent computers.** He asserted:

> Every SaaS (Software as a Service) company will become an AaaS (Agent-as-a-Service) company. There is no doubt about it.

To safely deploy these agents, which can access sensitive data and execute code, NVIDIA has launched an enterprise-level NeMo Claw reference design, which includes a policy engine and a privacy router.

For ordinary professionals, this transformation is also just around the corner. Jensen Huang envisioned a new workplace model for the future:

> **In the future, every engineer in our company will need an annual Token budget. Their base salary may be hundreds of thousands of dollars, and I will allocate about half of that amount as a Token quota to help them achieve a 10x efficiency improvement. This has already become a new bargaining chip in Silicon Valley recruiting: how many Tokens are included in your offer?**

**At the end of the speech, Jensen Huang also teased the next-generation computing architecture Feynman, which will achieve the first joint horizontal scaling of copper wires and CPO. More intriguingly, NVIDIA is developing a data center computer, "Vera Rubin Space-1," to be deployed in space, completely opening up the imagination for AI computing power extending beyond Earth.**

**Jensen Huang's full speech at GTC 2026, translated as follows (assisted by AI tools):**

> **Host:** Welcome NVIDIA founder and CEO Jensen Huang to the stage.
>
> **Jensen Huang, Founder and CEO:**
>
> Welcome to GTC. I want to remind everyone that this is a technology conference. I am very pleased to see so many people lining up to enter so early in the morning and to see all of you here.
>
> At GTC, we will focus on three major themes: technology, platform, and ecosystem. NVIDIA currently has three major platforms: the CUDA-X platform, the system platform, and our newly launched AI factory platform.
> > Before we officially begin, I want to thank our warm-up session hosts—Sarah Guo from Conviction, Alfred Lin from Sequoia Capital (NVIDIA's first venture capitalist), and Gavin Baker, NVIDIA's first major institutional investor. These three have profound insights into technology and have a wide influence across the entire technology ecosystem. Of course, I also want to thank all the distinguished guests I personally invited to attend today. Thank you to this all-star team. > > I also want to thank all the companies present today. NVIDIA is a platform company, and we have technology, platforms, and a rich ecosystem. The companies represented here today encompass almost all participants in the $100 trillion industry, with 450 companies sponsoring this event, for which I am deeply grateful. > > This conference features 1,000 technical forums and 2,000 speakers, covering every level of the artificial intelligence "five-layer cake" architecture—from infrastructure such as land, power, and data centers, to chips, platforms, models, and various applications that ultimately drive the entire industry forward. > > ## CUDA: Two Decades of Technological Accumulation > > The starting point of everything is right here. This year marks the 20th anniversary of CUDA. > > For twenty years, we have been dedicated to the development of this architecture. CUDA is a revolutionary invention—SIMT (Single Instruction Multiple Threads) technology allows developers to write programs in scalar code and extend them into multi-threaded applications, with programming difficulty far lower than that of previous SIMD architectures. We have recently added the Tiles feature to help developers program Tensor Cores more conveniently, as well as various mathematical operation structures relied upon by today's artificial intelligence. 
Currently, CUDA has thousands of tools, compilers, frameworks, and libraries; hundreds of thousands of public projects in the open-source community; and deep integration into every technology ecosystem.
>
> This chart reveals NVIDIA's 100% strategic logic, and I have been presenting this slide since the beginning. The most difficult and core element to achieve is the "installed base" at the bottom of the chart. After twenty years, we have accumulated hundreds of millions of GPUs and computing systems running CUDA worldwide.
>
> Our GPUs cover all cloud platforms, serving almost all computer manufacturers and industries. The vast installed base of CUDA is the fundamental reason this flywheel continues to accelerate. The installed base attracts developers; developers create new algorithms and achieve breakthroughs; breakthroughs give rise to new markets; new markets form new ecosystems and attract more companies to join, thereby expanding the installed base. This flywheel is continuously accelerating.
>
> Downloads of NVIDIA libraries are growing at an astonishing rate, already large in scale and still accelerating. This flywheel enables our computing platform to support massive applications and an endless stream of new breakthroughs.
>
> More importantly, it also gives these infrastructures an extremely long lifespan. The reason is obvious: a wealth of applications can run on NVIDIA CUDA, covering every stage of the AI lifecycle, various data processing platforms, and various scientific solvers. Therefore, once an NVIDIA GPU is installed, its actual usage value is extremely high. This is also why the cloud price of the Ampere architecture GPU we released six years ago has actually increased.
>
> The fundamental reason for all this is: a large installed base, a strong flywheel, and a broad developer ecosystem.
When these factors work together, coupled with our continuous software updates, computing costs will continue to decline. Accelerated computing significantly enhances application performance, and as we maintain and iterate software over the long term, users can not only achieve performance leaps in the early stages but also continuously enjoy declining computing costs. We are willing to provide long-term support for every GPU in the world because they are completely compatible at the architectural level. > > The reason we are willing to do this is that the installation volume is so large—every time a new optimization is released, it benefits millions of users. This dynamic combination allows the NVIDIA architecture to continuously expand its coverage, accelerate its own growth, and continuously lower computing costs, ultimately stimulating new growth. CUDA is at the core of all this. > > ## From GeForce to CUDA: A Twenty-Five-Year Evolution > > Our journey with CUDA actually began twenty-five years ago. > > GeForce—many of you have grown up alongside GeForce. GeForce is NVIDIA's most successful marketing project. We started cultivating future customers when you couldn't afford the products—your parents became NVIDIA's earliest users, purchasing our products year after year, until one day you grew up to become excellent computer scientists, becoming true customers and developers. > > This is the foundation laid by GeForce twenty-five years ago. Twenty-five years ago, we invented the programmable shader—an obvious yet profoundly significant invention that enabled accelerators to become programmable, and the world's first programmable accelerator, namely the pixel shader. Five years later, we created CUDA—one of the most important investments in our history. At that time, the company's financial resources were limited, but we bet most of our profits on this, committed to extending CUDA from GeForce to every computer. 
We were so determined because we firmly believed in its potential. Despite hardships in the early stages, the company held onto this belief for 13 generations, a full twenty years, and today CUDA is everywhere.
>
> It was the pixel shader that drove the GeForce revolution. About eight years ago, we launched RTX, a comprehensive architectural innovation for the modern computer graphics era. GeForce brought CUDA to the world, and because of this, scholars like Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton, and Andrew Ng discovered that GPUs could be powerful tools for accelerating deep learning, igniting the explosion of artificial intelligence a decade ago.
>
> Ten years ago, we decided to merge programmable shading with two new concepts: one is hardware ray tracing, which is technically challenging; the other is a forward-looking idea, as we foresaw about a decade ago that AI would completely transform computer graphics. Just as GeForce brought AI to the world, AI is now reshaping the entire implementation of computer graphics in return.
>
> Today, I want to show you the future. This is our next-generation graphics technology, which we call Neural Rendering: an in-depth integration of 3D graphics and artificial intelligence. This is DLSS 5, please take a look.
>
> ## Neural Rendering: The Fusion of Structured Data and Generative AI
>
> Isn't it breathtaking? Computer graphics is thus revitalized.
>
> What have we done? We have combined controllable 3D graphics (the real foundation of the virtual world) and its structured data with generative AI and probabilistic computation. One is completely deterministic, while the other is probabilistic yet highly realistic. We have merged these two concepts into one, achieving precise control through structured data while generating in real time. Ultimately, the content is both stunningly beautiful and fully controllable.
>
> The idea of merging structured information with generative AI will continue to replicate across various industries. Structured data is the cornerstone of trustworthy AI.
>
> ## Accelerated Platform for Structured and Unstructured Data
>
> Now I want to show you a technical architecture diagram.
>
> Structured data (familiar SQL, Spark, Pandas, Velox, as well as important platforms like Snowflake, Databricks, Amazon EMR, Azure Fabric, and Google BigQuery) is all about processing data frames. These data frames are like giant spreadsheets, carrying all the information of the business world and serving as the ground truth of enterprise computing.
>
> In the AI era, we need to enable AI to use structured data, with extreme acceleration. In the past, accelerating structured data processing was aimed at making enterprises operate more efficiently. In the future, AI will use these data structures at speeds far exceeding human capabilities, and AI agents will rely heavily on structured databases.
>
> As for unstructured data, vector databases, PDFs, videos, audio, and so on constitute the vast majority of data in the world; about 90% of the data generated each year is unstructured. In the past, this data was almost completely unusable: we read it, stored it in file systems, and that was it. We could not easily query or retrieve it, because unstructured data lacks a simple indexing method and requires understanding its meaning and context. Now AI can do this: thanks to multimodal perception and understanding, AI can read PDF documents, understand their meaning, and embed them into a larger, queryable structure. NVIDIA has created two foundational libraries for this purpose:
>
> - cuDF: for accelerated processing of data frames and structured data
> - cuVS: for processing vector storage, semantic data, and unstructured AI data
>
> These two will become among the most important foundational platforms in the future.
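The query-by-meaning workflow described here can be sketched in miniature. The pure-Python nearest-neighbor search below illustrates the kind of embedding lookup that a GPU vector-search library such as cuVS accelerates at scale; the documents, vectors, and the `query` helper are invented for illustration, not part of any NVIDIA API:

```python
import math

def cosine_similarity(a, b):
    # Similarity of two embedding vectors: dot product divided
    # by the product of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": each document is embedded as a tiny vector.
# Real systems use model-produced embeddings with hundreds of dimensions.
store = {
    "q3_report.pdf":  [0.9, 0.1, 0.0],
    "onboarding.mp4": [0.1, 0.8, 0.3],
    "arch_notes.txt": [0.7, 0.2, 0.6],
}

def query(vector, k=2):
    # Rank every stored document by similarity to the query embedding
    # and return the k best matches -- the semantic lookup that is
    # impossible over raw, unindexed unstructured files.
    ranked = sorted(store, key=lambda doc: cosine_similarity(store[doc], vector),
                    reverse=True)
    return ranked[:k]

print(query([1.0, 0.0, 0.1]))  # ['q3_report.pdf', 'arch_notes.txt']
```

The brute-force scan here is O(documents); production vector stores use approximate-nearest-neighbor indexes to keep queries fast at billion-vector scale.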
>
> Today, we announced partnerships with several companies. IBM, the inventor of SQL, will use cuDF to accelerate its WatsonX Data platform. Dell has collaborated with us to create the Dell AI Data Platform, integrating cuDF and cuVS and achieving significant performance improvements in real projects with NTT Data. On the Google Cloud front, we are now accelerating not only Vertex AI but also BigQuery, and we have partnered with Snapchat to reduce their computing costs by nearly 80%.
>
> The benefits of accelerated computing are threefold: speed, scale, and cost. This follows the logic of Moore's Law: achieve performance leaps through accelerated computing while continuously optimizing algorithms, so that everyone enjoys continuously decreasing computing costs.
>
> NVIDIA has built an accelerated computing platform that brings together numerous libraries: RTX, cuDF, cuVS, and more. These libraries are integrated into global cloud services and OEM systems, reaching users worldwide.
>
> ## Deep Collaboration with Cloud Service Providers
>
> Google Cloud: We accelerate Vertex AI and BigQuery, deeply integrating with JAX/XLA while performing excellently on PyTorch. NVIDIA is the only accelerator in the world that excels on both PyTorch and JAX/XLA. We are bringing customers like Base10, CrowdStrike, Puma, and Salesforce into the Google Cloud ecosystem.
>
> AWS: We accelerate EMR, SageMaker, and Bedrock, with deep integration with AWS. What excites me this year is that we will bring OpenAI to AWS, which will significantly drive the consumption growth of AWS cloud computing and help OpenAI expand regional deployments and computing scale.
>
> Microsoft Azure: NVIDIA's 100 PFLOPS supercomputer is the first supercomputer we built and the first deployed on Azure, laying an important foundation for collaboration with OpenAI.
We accelerate Azure cloud services and AI Foundry, collaborate to promote Azure regional expansion, and work closely on Bing search. It is worth mentioning our **Confidential Computing** capability, which ensures that even operators cannot view user data and models. NVIDIA GPUs are among the first GPUs in the world to support Confidential Computing, enabling the confidential deployment of OpenAI and Anthropic models in cloud environments across the globe. For example, with Synopsys, we accelerate its entire EDA and CAD workflows and deploy them on Microsoft Azure.
>
> Oracle: We are Oracle's first AI customer, and I am proud to have been the first to explain the concept of an AI cloud to Oracle. Since then, they have developed rapidly, and we have introduced many partners such as Cohere, Fireworks, and OpenAI.
>
> CoreWeave: The world's first AI-native cloud, born for GPU hosting and AI cloud services, with an excellent customer base and strong growth momentum.
>
> Palantir + Dell: The three parties jointly created a brand-new AI platform based on Palantir's Ontology Platform and AI platform, capable of fully localized deployment of AI in any country and any air-gapped environment, covering everything from data processing (vectorization or structuring) to the complete accelerated computing stack for AI.
>
> NVIDIA has established this special partnership with global cloud service providers: we bring customers to the cloud, creating a mutually beneficial ecosystem.
>
> ## Vertical Integration, Horizontal Openness: NVIDIA's Core Strategy
>
> NVIDIA is the world's first vertically integrated and horizontally open company.
>
> The necessity of this model is very simple: accelerated computing is not just a chip issue or a system issue; its complete expression is application acceleration. CPUs can make computers run faster overall, but this path has reached a bottleneck.
In the future, only through application- or domain-specific acceleration can we continue to achieve performance leaps and cost reductions.
>
> This is precisely why NVIDIA must delve into one library after another, one field after another, and one vertical industry after another. We are a vertically integrated computing company with no other path to take. We must understand applications, understand fields, deeply understand algorithms, and be able to deploy them in any scenario: data centers, cloud, on-premises, edge, and even robotic systems.
>
> At the same time, NVIDIA remains horizontally open, willing to integrate its technology into any partner's platform, allowing the whole world to enjoy the dividends of accelerated computing.
>
> The structure of attendees at this GTC fully reflects this. Among the attendees, the proportion from the financial services industry is the highest (and we hope it is developers who come, not traders). Our ecosystem covers the upstream and downstream supply chains. For the companies represented here, whether established for 50 years, 70 years, or 150 years, last year was the best year in their history.
We are at the starting point of something very, very significant.
>
> ## CUDA-X: Accelerated Computing Engine for Various Industries
>
> NVIDIA has built deep positions in various verticals:
>
> - Autonomous Driving: Wide coverage and far-reaching impact
> - Financial Services: Quantitative investment is shifting from manual feature engineering to supercomputer-driven deep learning, ushering in its "Transformer moment"
> - Healthcare: It is experiencing its own "ChatGPT moment," covering AI-assisted drug discovery, AI agent-supported diagnosis, medical customer service, and more
> - Industry: The largest construction wave in the world is unfolding, with AI factories, chip factories, and data center factories being established
> - Entertainment and Gaming: Real-time AI platforms support translation, live streaming, game interaction, and smart shopping agents
> - Robotics: With over a decade of deep cultivation, three major computing architectures (training computers, simulation computers, onboard computers) are in place, with 110 robots showcased at this exhibition
> - Telecommunications: An industry of roughly $2 trillion in scale; base stations will evolve from single-purpose communication equipment into AI infrastructure platforms. The related platform is named Aerial, with deep cooperation with companies like Nokia and T-Mobile
>
> The core of all these fields is our CUDA-X library suite; this is what makes NVIDIA fundamentally an algorithm company. These libraries are the company's most valuable assets, enabling computing platforms to deliver practical value across industries.
>
> One of the most important libraries is cuDNN (CUDA Deep Neural Network Library), which has completely revolutionized artificial intelligence and triggered the modern AI explosion.
>
> (Play CUDA-X demonstration video)
>
> Everything you just saw is simulation, including physics-based solvers, AI agent physical models, and physical AI robot models.
Everything is simulated, with no manual animation or joint rigging. This is precisely where NVIDIA's core capability lies: unlocking these opportunities through a profound understanding of algorithms combined organically with computing platforms.
>
> ## AI-Native Enterprises and the New Computing Era
>
> You just saw the industry giants defining today's society, such as Walmart, L'Oréal, JP Morgan, Roche, and Toyota, as well as a large number of companies you may never have heard of; we call them AI-native enterprises. This list is extremely large, including OpenAI, Anthropic, and many emerging companies serving different verticals.
>
> In the past two years, this industry has experienced an astonishing takeoff. Venture capital flowing into startups reached $150 billion, a record in human history. More importantly, the size of individual investments has jumped for the first time from millions of dollars to hundreds of millions and even billions of dollars. The reason is simple: for the first time in history, every such company requires massive computing resources and a large number of tokens. This industry is creating and generating tokens, or adding value to tokens from institutions like Anthropic and OpenAI.
>
> Just as the PC revolution, internet revolution, and mobile cloud revolution each gave birth to a number of epoch-making companies, this generation of computing platform transformation will also produce a batch of highly influential companies that become an important force in the future world.
>
> ## Three Historic Breakthroughs Driving All This
>
> What has happened in the past two years? Three major events.
>
> First: ChatGPT, ushering in the era of generative AI (end of 2022 to 2023). It can not only perceive and understand but also generate unique content. I demonstrated the integration of generative AI with computer graphics.
Generative AI fundamentally changes the way computing works: computing has shifted from retrieval-based to generation-based, profoundly impacting computer architecture, deployment methods, and overall significance.
>
> Second: Reasoning AI, represented by o1. Reasoning capabilities enable AI to self-reflect, plan, and decompose problems, breaking down issues it cannot directly solve into manageable steps. o1 makes generative AI trustworthy, capable of reasoning over real information. To achieve this, the amount of input context tokens and output tokens used for thinking has increased significantly, leading to a substantial rise in computational demands.
>
> Third: Claude Code, the first intelligent agent model. It can read files, write code, compile, test, evaluate, and iterate. Claude Code has completely revolutionized software engineering: 100% of NVIDIA's engineers use one or more of Claude Code, Codex, and Cursor; not a single software engineer works without the assistance of AI.
>
> This marks a new turning point: you are no longer asking AI "what is it, where is it, how do I do it," but rather letting it "create, execute, build," allowing it to actively use tools, read files, decompose problems, and take action. AI has evolved from perception to generation, to reasoning, and is now truly capable of completing tasks.
>
> In the past two years, the computational demand for reasoning has increased by about 10,000 times, and usage has grown by about 100 times. I have always believed that computational demand has grown a million-fold over the past two years; this is a feeling shared by everyone, including OpenAI and Anthropic. If more computing power can be obtained, more tokens can be generated, revenues will increase, and AI will become smarter. The reasoning inflection point has already arrived.
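The compute amplification that reasoning brings can be made concrete with back-of-the-envelope arithmetic. The sketch below compares a hypothetical single-shot query with a hypothetical reasoning query; all token counts are invented placeholders, not figures from the speech:

```python
def tokens_per_query(context, thinking, answer):
    # Inference cost grows roughly with the total number of tokens
    # the model must ingest (context) and generate (thinking + answer).
    return context + thinking + answer

# Hypothetical single-shot generation: short prompt, direct answer.
single_shot = tokens_per_query(context=500, thinking=0, answer=200)

# Hypothetical reasoning query: long context plus an extended
# chain of thought produced before the final answer.
reasoning = tokens_per_query(context=20_000, thinking=30_000, answer=1_000)

print(round(reasoning / single_shot))  # roughly 73x more tokens per query
```

Multiply a per-query amplification like this by a ~100x rise in usage and the orders-of-magnitude demand growth described above stops looking surprising.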
>
> ## The Trillion-Dollar Era of AI Infrastructure
>
> At this time last year, I expressed high confidence in the demand and purchase orders for Blackwell and Rubin through 2026, estimated at about $500 billion. Today, one year later at GTC, I stand here to tell you: looking ahead to 2027, the number I see is at least $1 trillion. Moreover, I am confident that the actual computational demand will go far beyond this.
>
> ## 2025: The Year of Inference for NVIDIA
>
> 2025 will be the Year of Inference for NVIDIA. We want to ensure excellence not only in training and post-training but at every stage of the AI lifecycle, allowing the invested infrastructure to operate efficiently over the long term, with longer effective lifespans and lower unit costs.
>
> At the same time, Anthropic and Meta officially joined the NVIDIA platform, together representing one-third of global AI computing demand. Open-source models are approaching the cutting edge and are ubiquitous.
>
> NVIDIA is currently the only platform in the world capable of running all AI models across all fields (language, biology, computer graphics, computer vision, speech, proteins and chemistry, robotics, and so on), whether at the edge or in the cloud, regardless of the language. The NVIDIA architecture is versatile across all these scenarios, making us the lowest-cost and highest-confidence platform.
>
> Currently, 60% of NVIDIA's business comes from the top five hyperscale cloud service providers globally, with the remaining 40% spread across regional clouds, sovereign clouds, enterprises, industries, robotics, edge computing, and other fields. The breadth of AI coverage is itself its resilience; this is undoubtedly a transformative shift in computing platforms.
> > ## Grace Blackwell and NVLink 72: Bold Architectural Innovation > > While the Hopper architecture was still at its peak, we decided to completely re-architect the system, expanding NVLink from 8-way to NVLink 72, thoroughly decomposing and reconstructing the computing system. Grace Blackwell NVLink 72 is a significant technological bet, not easy for all partners, and I sincerely thank everyone for their support. > > At the same time, we launched NVFP4—not just an ordinary FP4, but a brand new type of tensor core and computing unit. We have demonstrated that NVFP4 can achieve inference without any loss of precision while delivering significant performance and energy efficiency improvements, and it is also applicable for training. Additionally, a series of new algorithms such as Dynamo and TensorRT-LLM have emerged, and we even invested billions of dollars to build a supercomputer specifically for optimizing kernels, called DGX Cloud. > > The results show that our inference performance is remarkable. Data from Semi Analysis—so far the most comprehensive AI inference performance evaluation—shows that NVIDIA leads significantly in both the number of tokens per watt and the cost per token. Originally, Moore's Law might have given the H200 a 1.5 times performance boost, but we achieved 35 times. Dylan Patel from Semi Analysis even said, "Jensen Huang was conservative; it's actually 50 times." He was right. > > I quote him: "Jensen sandbagged." > > NVIDIA's cost per token is the lowest in the world, currently unmatched. The reason lies in extreme co-design. > > Take Fireworks as an example; before NVIDIA updated the entire suite of software and algorithms, its average token speed was about 700 tokens per second; after the update, it approached 5,000 tokens per second, an increase of about 7 times. 
> This is the power of extreme co-design.
>
> ## AI Factory: From Data Center to Token Factory
>
> Data centers were once places for storing files; now they are factories for producing tokens. In the future, every cloud service provider and AI company will use "token factory efficiency" as a core operational metric.
>
> This is my core argument:
>
> - Vertical axis: throughput, the number of tokens generated per second at fixed power
> - Horizontal axis: token speed, the response speed of each inference; the faster the speed, the larger the usable model and the longer the context, making the AI smarter
>
> Tokens are the new commodity, and once the market matures they will be priced in tiers:
>
> - Free tier (high throughput, low speed)
> - Mid tier (~$3 per million tokens)
> - High tier (~$6 per million tokens)
> - High-speed tier (~$45 per million tokens)
> - Ultra-high-speed tier (~$150 per million tokens)
>
> Compared to Hopper, Grace Blackwell increases throughput by 35x at the highest-value tier and introduces a new tier. With a simplified model that allocates 25% of power to each of the four paid tiers, Grace Blackwell can generate 5x more revenue than Hopper.
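The tiered-revenue argument above can be sketched numerically. The per-tier prices come from the keynote; the throughput figures below are illustrative assumptions chosen only so that a 35x top-tier gain plus a new ultra tier reproduces the claimed 5x revenue ratio, not NVIDIA data.

```python
# Hypothetical sketch of "token factory economics": revenue from a fixed
# power budget split equally across four paid pricing tiers. Prices are
# from the keynote; all throughput numbers are made-up assumptions.

PRICES = {"mid": 3.0, "high": 6.0, "high_speed": 45.0, "ultra": 150.0}  # $/M tokens

def yearly_revenue(throughput_tok_s: dict, power_share: dict) -> float:
    """Yearly revenue: tokens produced in each tier times that tier's price."""
    seconds_per_year = 365 * 24 * 3600
    total = 0.0
    for tier, price in PRICES.items():
        tokens = throughput_tok_s[tier] * power_share[tier] * seconds_per_year
        total += tokens / 1e6 * price
    return total

# Equal 25% power split, as in the keynote's simplified model.
share = {tier: 0.25 for tier in PRICES}

# Illustrative throughputs (tokens/s at full power). Hopper has no ultra
# tier; Grace Blackwell is 35x at the highest-value tier and adds one.
hopper = {"mid": 2e6, "high": 1e6, "high_speed": 1e4,   "ultra": 0.0}
gb     = {"mid": 4e6, "high": 2e6, "high_speed": 3.5e5, "ultra": 1.5e5}

ratio = yearly_revenue(gb, share) / yearly_revenue(hopper, share)
print(f"Grace Blackwell / Hopper revenue ratio: {ratio:.1f}x")
```

With these assumed throughputs, the split reproduces the keynote's 5x claim; the point of the model is that most of the extra revenue comes from the high-priced, low-latency tiers that Hopper cannot serve.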
> ## Vera Rubin: Next-Generation AI Computing System
>
> (Vera Rubin system introduction video plays)
>
> Vera Rubin is a complete, end-to-end optimized system designed for agentic workloads:
>
> - Large language model computing core: an NVLink 72 GPU cluster handling prefill and KV cache
> - New Vera CPU: designed for extremely high single-thread performance, using LPDDR5 memory with excellent energy efficiency; it is the world's only data center CPU using LPDDR5, well suited to AI agent tool calls
> - Storage system: BlueField 4 + CX 9, a new storage platform for the AI era, with 100% participation from the global storage industry
> - CPO Spectrum-X switch: the world's first co-packaged-optics Ethernet switch, now in full production
> - Kyber rack: a new rack system that joins 144 GPUs into a single NVLink domain, with front-end computing and back-end NVLink switching, forming one giant computer
> - Rubin Ultra: a next-generation supercomputing node with a vertical design, paired with the Kyber rack to support larger-scale NVLink interconnects
>
> Vera Rubin is 100% liquid-cooled, cutting installation time from two days to two hours, and uses 45°C warm-water cooling, significantly easing cooling pressure in data centers. Satya (Nadella) has confirmed in a post that the first Vera Rubin rack is now operational on Microsoft Azure, which I find very exciting.
>
> ## Groq Integration: The Ultimate Extension of Inference Performance
>
> We have acquired the Groq team and licensed its technology. Groq builds a deterministic dataflow processor that relies on static compilation and compiler scheduling, carries a large amount of SRAM, and is optimized for single-workload inference, with extremely low latency and very high token generation speed.
>
> However, Groq has limited memory capacity (500MB of on-chip SRAM), making it difficult to hold large model parameters and the KV cache on its own, which has restricted its large-scale deployment.
> The solution is Dynamo, a suite of inference scheduling software. Through Dynamo, we disaggregate the inference pipeline:
>
> - **Prefill and attention decoding** run on Vera Rubin (which needs substantial computing power and KV cache storage)
> - **Feed-forward network decoding**, the token generation part, runs on Groq (which needs extremely high bandwidth and low latency)
>
> The two are tightly coupled over Ethernet, with a special mode cutting latency roughly in half. Under the unified scheduling of Dynamo, the "operating system of the AI factory," overall performance improves 35x, opening a level of inference performance previously unreachable even by NVLink 72.
>
> Recommendations for combining Groq and Vera Rubin:
>
> - If the workload is primarily high-throughput, use 100% Vera Rubin
> - If a large share of the workload is high-value token generation such as code generation, introduce Groq at a suggested ratio of about 25% Groq to 75% Vera Rubin
>
> Groq LP30 is being manufactured by Samsung and has entered mass production, with shipments expected to begin in Q3. Thanks to Samsung for their full cooperation.
>
> ## A Historic Leap in Inference Performance
>
> To quantify these advances: within two years, the token generation rate of a 1-gigawatt AI factory will rise from 22 million tokens per second to 700 million tokens per second, a more than 30-fold increase. This is the power of extreme co-design.
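The disaggregation described above can be sketched as a toy scheduler: one pool for compute-heavy prefill/attention (Vera Rubin) and one for bandwidth-bound decode (Groq). Only the phase split and the suggested 25%/75% capacity mix come from the keynote; the pool abstraction and all names are illustrative assumptions, not the Dynamo API.

```python
# Toy sketch of Dynamo-style disaggregated inference scheduling.
# Hypothetical abstraction: a Pool tracks capacity; route() sends each
# request phase to the pool suited to it.
from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    capacity: int      # concurrent requests the pool can serve
    in_flight: int = 0

    def admit(self) -> bool:
        """Accept a request if the pool has spare capacity."""
        if self.in_flight < self.capacity:
            self.in_flight += 1
            return True
        return False

def build_pools(total_nodes: int, groq_fraction: float = 0.25):
    """Split a cluster into a prefill pool (Vera Rubin) and a decode pool (Groq),
    using the keynote's suggested 25% Groq / 75% Vera Rubin mix by default."""
    groq_nodes = int(total_nodes * groq_fraction)
    return (Pool("vera-rubin/prefill+attention", total_nodes - groq_nodes),
            Pool("groq/ffn-decode", groq_nodes))

def route(phase: str, prefill_pool: Pool, decode_pool: Pool) -> Pool:
    # Compute-heavy prefill and attention (KV cache) stay on Vera Rubin;
    # bandwidth-bound token generation goes to the low-latency Groq pool.
    return prefill_pool if phase == "prefill" else decode_pool

prefill, decode = build_pools(total_nodes=100)
target = route("decode", prefill, decode)
print(target.name, target.capacity)  # → groq/ffn-decode 25
```

In a real system the two pools would stream KV cache state over the Ethernet link between them; here routing is the only part modeled.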
> ## Technical Roadmap
>
> - Blackwell: currently in production; Oberon standard rack system; copper scale-up to NVLink 72, with optional optical expansion to NVLink 576
> - Vera Rubin (current): Kyber rack with NVLink 144 over copper; Oberon rack with NVLink 72 plus optics, expandable to NVLink 576; Spectrum 6, the world's first CPO switch
> - Vera Rubin Ultra (coming soon): next-generation Rubin Ultra GPU; LP35 chip (the first to integrate NVFP4), lifting performance by several times again
> - Feynman (next generation): a brand-new GPU; LP40 chip (jointly developed by NVIDIA and the Groq team, integrating NVFP4); new CPU, Rosa (Rosalyn); BlueField 5; CX 10; Kyber rack supporting both copper and CPO expansion
>
> The roadmap is clear: three parallel paths of copper scale-up, optical scale-up, and optical scale-out. We need all partners to keep expanding production capacity in copper, optical fiber, and CPO.
>
> ## NVIDIA DSX: Digital Twin Platform for AI Factories
>
> AI factories are becoming ever more complex, yet the many technology suppliers that make them up have never collaborated during the design phase; they only "meet" in the data center, which is clearly not enough.
>
> To address this, we created Omniverse and, on top of it, the NVIDIA DSX platform: a place for all partners to collaboratively design and operate gigawatt-scale AI factories in the virtual world. DSX provides:
>
> - Rack-level mechanical, thermal, electrical, and network simulation
> - Connection to the power grid for coordinated energy-saving scheduling
> - Dynamic power and cooling optimization inside the data center based on Max-Q
>
> Conservatively, this system can roughly double energy utilization efficiency, a significant benefit at the scale we are discussing.
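The Max-Q optimization DSX performs is not specified in detail; one plausible ingredient, sketched here purely as an illustration, is allocating a facility power budget to the racks that deliver the most tokens per watt. All rack figures and the greedy policy below are assumptions.

```python
# Hypothetical Max-Q-style power allocation inside a DSX digital twin:
# given a facility power budget, power up racks in descending order of
# tokens-per-watt until the budget is exhausted. Figures are made up.

def allocate_power(racks, budget_kw):
    """Greedy allocation: best tokens-per-watt first. Returns (powered racks, kW used)."""
    plan, remaining = [], budget_kw
    for name, draw_kw, tokens_per_sec in sorted(
            racks, key=lambda r: r[2] / r[1], reverse=True):
        if draw_kw <= remaining:       # rack fits in the remaining budget
            plan.append(name)
            remaining -= draw_kw
    return plan, budget_kw - remaining

racks = [  # (name, power draw in kW, tokens/s) — illustrative only
    ("GB-NVL72-a", 120, 9.0e6),
    ("GB-NVL72-b", 120, 8.5e6),
    ("H200-a",     100, 2.0e6),
]
plan, used = allocate_power(racks, budget_kw=250)
print(plan, used)  # → ['GB-NVL72-a', 'GB-NVL72-b'] 240
```

A real facility optimizer would also model cooling limits and grid signals, the other two DSX inputs listed above; this sketch covers only the power-to-tokens trade-off.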
> Omniverse starts from the digital Earth and will support digital twins at every scale; together with global partners, we are building the largest computer in human history.
>
> NVIDIA is also venturing into space. The Thor chip has passed radiation certification and is operating in satellites, and we are developing Vera Rubin Space-1 with partners to build space data centers. In space, heat can only be shed by radiation, which makes thermal management the core challenge, and we are gathering top engineers to tackle it.
>
> ## OpenClaw: Operating System for the Age of Intelligent Agents
>
> Peter Steinberger has developed a piece of software called OpenClaw. It is the most popular open-source project in human history, surpassing in just a few weeks what Linux accumulated over thirty years.
>
> OpenClaw is essentially an agentic system that can:
>
> - Manage resources and access tools, file systems, and large language models
> - Execute scheduled and timed tasks
> - Decompose problems step by step and invoke sub-agents
> - Support input and output in arbitrary modalities (voice, video, text, email, and so on)
>
> By the definition of an operating system, it is indeed one: the operating system of the agentic computer. Windows made the personal computer possible; OpenClaw makes the personal agent possible.
>
> Every enterprise needs to formulate its own OpenClaw strategy, just as we all needed Linux strategies, HTML strategies, and Kubernetes strategies.
>
> ## Comprehensive Restructuring of Enterprise IT
>
> Enterprise IT before OpenClaw: data and files enter the system and flow through tools and workflows, ultimately becoming tools for humans to use. Software companies build the tools, and global system integrators (GSIs) and consulting firms help enterprises use them.
> Enterprise IT after OpenClaw: every SaaS company will transform into an AaaS (Agentic as a Service) company, providing not just tools but AI agents specialized in specific fields.
>
> But there is a key challenge: internal agents can access sensitive data, execute code, and communicate with the outside world. In the enterprise environment, this must be strictly controlled.
>
> To address this, we worked with Peter to build security into an enterprise edition, launching:
>
> - NeMo Claw (reference design): an enterprise-grade reference framework based on OpenClaw, integrating NVIDIA's full suite of agentic AI toolkits
> - Open Shield (security layer): integrated into OpenClaw, providing policy engines, network barriers, and privacy routing to keep enterprise data secure
> - NeMo Cloud: available for download, and integrates with the policy engines of all SaaS companies
>
> This is a renaissance in enterprise IT: an industry originally worth $2 trillion is set to grow to multi-trillion-dollar scale, shifting from providing tools to providing specialized AI agent services.
>
> I can fully foresee that in the future, every engineer in a company will have an annual token budget. Their salary may be several hundred thousand dollars, and on top of that I will give them a token allocation worth half their salary, multiplying their output tenfold. "How many tokens come with the job" has already become a new hiring topic in Silicon Valley.
>
> In the future, every enterprise will be both a consumer of tokens (for its engineers) and a producer of tokens (in the services it sells to customers). The significance of OpenClaw cannot be overstated; it is as important as HTML and Linux.
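The agentic loop attributed to OpenClaw above, decomposing a task, invoking tools or sub-agents, and gating each step through a policy engine like Open Shield, can be sketched minimally. Every class, function, and tool name here is hypothetical; the keynote does not show OpenClaw's actual API.

```python
# Minimal sketch of an OpenClaw-style agentic loop: decompose a task,
# check each step against a policy (as an Open Shield-style policy
# engine would), then dispatch to a tool or a sub-agent. All names are
# hypothetical stand-ins for illustration.

def decompose(task: str) -> list[str]:
    # Stand-in for the model-driven planner: split a task into steps.
    return [s.strip() for s in task.split(";") if s.strip()]

def policy_allows(step: str) -> bool:
    # Stand-in for a policy engine: block steps touching sensitive data.
    return "customer_db" not in step

TOOLS = {
    "search": lambda q: f"results for {q!r}",
    "write_file": lambda p: f"wrote {p}",
}

def run_agent(task: str) -> list[str]:
    log = []
    for step in decompose(task):
        if not policy_allows(step):
            log.append(f"BLOCKED: {step}")
            continue
        tool, _, arg = step.partition(" ")
        handler = TOOLS.get(tool)
        # Unknown tools are delegated to a sub-agent, as OpenClaw is
        # described as doing for decomposed sub-problems.
        log.append(handler(arg) if handler else f"sub-agent handles: {step}")
    return log

print(run_agent("search GTC recap; write_file notes.md; export customer_db"))
```

The point of the sketch is the control flow, not the components: a real system replaces `decompose` with a model call and `policy_allows` with an enterprise policy engine, but the plan/check/dispatch loop stays the same.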
> ## NVIDIA Open Model Initiative
>
> For custom agents (Custom Claw), we provide NVIDIA's own frontier models:
>
> | Model | Domain |
> | --- | --- |
> | Nemotron | Large language model |
> | Cosmos | World foundation model |
> | GROOT | General humanoid robot model |
> | Alpamayo | Autonomous driving |
> | BioNeMo | Digital biology |
> | Phys-AI | AI physics |
>
> We are at the technological forefront in every one of these fields and are committed to continuous iteration: after Nemotron 3 comes Nemotron 4, after Cosmos 1 comes Cosmos 2, and GROOT will likewise iterate to its second generation.
>
> Nemotron 3 ranks among the top three models in the world on OpenClaw, at the frontier level. Nemotron 3 Ultra will become the strongest foundational model ever, supporting countries in building sovereign AI.
>
> Today, we announce the establishment of the Nemotron Alliance, investing billions of dollars to advance research and development of AI foundation models. Alliance members include BlackForest Labs, Cursor, LangChain, Mistral, Perplexity, Reflection, Sarvam (India), Thinking Machines (Mira Murati's lab), and others. Enterprise software companies are joining one after another, integrating the NeMo Claw reference design and NVIDIA's agentic AI toolkit into their products.
>
> ## Physical AI and Robotics
>
> Digital agents operate in the digital world, writing code and analyzing data; physical AI refers to embodied agents, that is, robots.
>
> This GTC features 110 robots, covering nearly every robot R&D company in the world. NVIDIA provides three computers (a training computer, a simulation computer, and an onboard computer) along with a complete software stack and AI models.
>
> In autonomous driving, the "ChatGPT moment" for self-driving has arrived. Today we announce four new partners joining NVIDIA's RoboTaxi Ready platform: BYD, Hyundai, Nissan, and Geely, with a combined annual production of 18 million vehicles.
> Along with existing partners Mercedes-Benz, Toyota, and General Motors, the lineup has grown even stronger. We also announce a major collaboration with Uber to deploy and integrate RoboTaxi Ready vehicles across multiple cities.
>
> In industrial robotics, companies such as ABB, Universal Robots, and KUKA are working with us to combine physical AI models with simulation systems, pushing robots onto manufacturing lines worldwide.
>
> In telecommunications, Caterpillar and T-Mobile are on board as well. In the future, a wireless base station will no longer be just a communication node; it will become an NVIDIA Aerial AI-RAN, an intelligent edge computing platform capable of real-time traffic awareness and beamforming adjustment to save energy and improve efficiency.
>
> ## Special Segment: Olaf Robot Debut
>
> (Disney's Olaf robot demonstration video plays)
>
> Jensen Huang: The snowman is here! Newton is running smoothly! Omniverse is running smoothly too! Olaf, how are you?
>
> Olaf: I'm really happy to see you.
>
> Jensen Huang: Yes, because I gave you a computer: Jetson!
>
> Olaf: What is that?
>
> Jensen Huang: It's inside your belly.
>
> Olaf: That's amazing.
>
> Jensen Huang: You learned to walk in Omniverse.
>
> Olaf: I love walking. It's so much better than riding a reindeer and looking up at the beautiful sky.
>
> Jensen Huang: That's because of physical simulation: Newton solvers running on NVIDIA Warp, developed in collaboration with Disney and DeepMind, let you adapt to the real physical world.
>
> Olaf: I was just about to say that.
>
> Jensen Huang: That's where you're smart.
>
> Olaf: I'm a snowman, not a snowball.
>
> Jensen Huang: Can you imagine? The Disneyland of the future, with all these robotic characters wandering freely through the park. But honestly, I thought you would be taller. I've never seen such a short snowman.
> Olaf: (noncommittal)
>
> Jensen Huang: Can you help me wrap up today's speech?
>
> Olaf: Awesome!
>
> ## Keynote Summary
>
> **Jensen Huang: Today, we explored the following core themes together:**
>
> 1. The arrival of the inference inflection point: inference has become the core workload of AI, tokens are the new commodity, and inference performance directly determines revenue.
> 2. The era of AI factories: data centers have evolved from file-storage facilities into token-production factories, and in the future every company will measure its competitiveness by "AI factory efficiency."
> 3. The OpenClaw agent revolution: OpenClaw has ushered in the era of agentic computing; enterprise IT is moving from the tool era to the agent era, and every company needs to formulate an OpenClaw strategy.
> 4. Physical AI and robotics: embodied intelligence is scaling up, and autonomous driving, industrial robots, and humanoid robots together form the next great opportunity in physical AI.
>
> Thank you all, and enjoy GTC!