--- title: "Jensen Huang GTC Interview: Low-latency inference will become the next explosive engine of the AI economy, and the supply-demand balance of power chips will continue for a long time" type: "News" locale: "zh-HK" url: "https://longbridge.com/zh-HK/news/279427406.md" description: "The improvement of AI reasoning capabilities is shifting models from \"generating information\" to \"executing tasks,\" beginning to generate real economic value for the first time. Low-latency reasoning will become a new commercial payment engine. On the supply side, power, chips, and data center construction are almost all lacking redundancy, and tight balance may become a longer-term industry backdrop" datetime: "2026-03-17T12:05:45.000Z" locales: - [zh-CN](https://longbridge.com/zh-CN/news/279427406.md) - [en](https://longbridge.com/en/news/279427406.md) - [zh-HK](https://longbridge.com/zh-HK/news/279427406.md) --- > 支持的語言: [简体中文](https://longbridge.com/zh-CN/news/279427406.md) | [English](https://longbridge.com/en/news/279427406.md) # Jensen Huang GTC Interview: Low-latency inference will become the next explosive engine of the AI economy, and the supply-demand balance of power chips will continue for a long time AI is moving from "generating information" to "executing tasks," with **low-latency, high-throughput inference scenarios represented by coding agents** opening the next important phase of AI infrastructure commercialization. On the supply side, power, chips, and data center construction are almost all lacking redundancy, and **tight balance may become a longer-term industry backdrop.** After the keynote speech at GTC 2026, NVIDIA CEO Jensen Huang accepted an exclusive interview with Ben Thompson, founder of Stratechery, where he expressed systematic views on core topics such as the AI inference economy, CPU strategy, the logic behind acquiring Groq, and supply chain tensions. 
In the interview, Huang argued that AI has crossed a critical threshold over the past year: **improved inference capabilities have allowed models to begin generating real economic value for the first time,** and the explosion of coding agents is the clearest manifestation of this shift. NVIDIA has now formally added ultra-high-speed, low-latency inference to its product portfolio.

On the supply side, Huang stated candidly that **"almost every link is tight,"** and that neither power nor chip supply can easily be doubled. Although NVIDIA says its supply chain has been planned for "this year and next," he hopes that "land, power, and data centers" can come online faster, as their pace directly affects the expansion of computing capacity and the path to realizing capital expenditures.

## **Inference Economy: Low Latency Becomes the Next Monetization Engine**

Huang attributes the core breakthrough in AI over the past year to the maturing of "inference" capabilities. Generative AI, he said, was hard to commercialize early on because of hallucination, while inference lets models achieve "grounding" through reflection, retrieval, and search, elevating them from providing information to actually completing tasks.

"Search is a service no one pays for, because the barrier to obtaining information is not high enough to make people spend money," Huang said. "We have now crossed that threshold: AI can not only converse with people but also do tasks for them."

Programming is the most typical example he cited. Code generation, he noted, is not an ordinary language modality; it requires the model to reflect on and validate the results of executing code blocks. As this capability matures, engineers can shift their focus from writing code line by line to architecture and specification design.
He revealed that NVIDIA's internal software engineers now use coding agents 100% of the time: "Many people haven't personally written a line of code in a while, yet their productivity is extremely high."

Based on this judgment, NVIDIA decided to add low-latency inference to its product line. Huang explained that existing GPU systems face an inherent tension between maximizing throughput and maximizing the quality of intelligent tokens, **and high-value coding-agent users are willing to pay a premium for a 10x increase in token-generation speed.**

> "If Anthropic launched a Claude Code service tier that made coding 10 times faster, I would pay for it, no doubt. I am building this product for myself."

## **Acquiring Groq: A Strategic Move to Deconstruct the Inference Pipeline**

In Huang's view, NVIDIA's decision to acquire Groq is not a sudden move but a natural extension of its years of investment in inference infrastructure. NVIDIA began considering how to break the inference process down more granularly across heterogeneous infrastructure when it released the Dynamo inference-scheduling framework a year ago, and the collaboration with Groq began about six months before the acquisition was announced.

The core of the transaction is acquiring the Groq team and licensing its technology, not its cloud-service business. On a technical level, NVIDIA will extend the disaggregation of the inference pipeline into the decode stage: the Vera Rubin GPU will handle the high-FLOP attention computation, while Groq's LPU architecture will take on the parts that demand extremely high token rates and very low latency. Related products are planned for launch within the year.

He said:

> "But if your business is like Anthropic's or OpenAI's, where Codex **is generating real economic value** and you want to generate more tokens, then adding **this accelerator can significantly boost revenue.**"
He also acknowledged that this solution is not for every customer. For platforms made up mostly of free users with low paid-conversion rates, introducing Groq would add cost and complexity without paying off.

Huang compared Groq to the earlier Mellanox acquisition: both reflect NVIDIA's consistent logic of folding external specialized architectures into its computing stack to achieve system-level co-optimization. "NVIDIA is an accelerated computing company, not a GPU company; we are not fixated on where the computation happens, we just want to accelerate applications."

## **CPU Strategy: Redefining Server Architecture for the AI Agent Era**

Given that NVIDIA has long been positioned as a GPU company, Huang systematically laid out the logic behind its entry into the CPU market and the design philosophy of its in-house Vera CPU.

CPU design over the past decade, he pointed out, has been optimized for hyperscale cloud computing, where the goal is maximizing the number of rentable cores and single-thread performance has not been a priority. In AI-agent scenarios, however, **the CPU's single-thread performance directly determines overall system efficiency while the GPU waits on tool-invocation results.** "You can never let the GPU sit idle," he said.

The Vera CPU's core differentiation lies in memory bandwidth and I/O bandwidth: **its per-core bandwidth is three times that of any current CPU, designed specifically so that I/O never becomes a bottleneck that slows the GPU down.** He also pointed to the NVLink collaboration with Intel as a way to meet the enterprise computing market's need for continuity in the x86 ecosystem.
Huang categorized AI agents' tool use into two types: structured tools, including CLIs, APIs, and database queries; and unstructured tools, including PC applications where the model must operate web interfaces through multimodal perception. NVIDIA is investing in both paths.

## **Supply in Tight Balance: Neither Power nor Chip Capacity Has Slack**

Responding to the market's ongoing concerns about AI computing supply, Huang offered his most direct judgment to date: **both electricity and chip capacity are in a tight balance, with no room to double in the short term.**

"I don't believe we have twice the electricity, nor twice the chip supply; there is no redundancy anywhere," he said. "But from what I see in the current outlook, our supply chain can support it."

NVIDIA, he said, has roughly 200 long-term supply-chain partners and has planned proactively both upstream and downstream, leaving him optimistic about large-scale growth over the next two years. He admitted, however, that **the biggest bottleneck may not be the chips themselves but the speed of securing land, power, and construction for data centers.** "What I hope for most is that this infrastructure can be completed faster."
When asked whether NVIDIA is the biggest beneficiary of the scarcity of computing power, Huang acknowledged that the company is the largest and has the best-prepared supply chain, but attributed this to long-term planning rather than a coincidental market advantage.

## Related Stocks

- [SPDR S&P Software (XSW.US)](https://longbridge.com/zh-HK/quote/XSW.US.md)
- [GraniteShares 2x Long NVDA Daily ETF (NVDL.US)](https://longbridge.com/zh-HK/quote/NVDL.US.md)
- [Spdr Select Tech (XLK.US)](https://longbridge.com/zh-HK/quote/XLK.US.md)
- [Invesco S&P 500 Equal Weight Tech ETF (RSPT.US)](https://longbridge.com/zh-HK/quote/RSPT.US.md)
- [Leverage Shares 2X Long TSM Daily ETF (TSMG.US)](https://longbridge.com/zh-HK/quote/TSMG.US.md)
- [Taiwan Semiconductor (TSM.US)](https://longbridge.com/zh-HK/quote/TSM.US.md)
- [GraniteShares 2x Long TSM Daily ETF (TSMU.US)](https://longbridge.com/zh-HK/quote/TSMU.US.md)
- [Samsung Electronics (SSNGY.US)](https://longbridge.com/zh-HK/quote/SSNGY.US.md)
- [iShares Expanded Tech Software Sector ETF (IGV.US)](https://longbridge.com/zh-HK/quote/IGV.US.md)
- [Direxion Daily TSM Bull 2X Shares (TSMX.US)](https://longbridge.com/zh-HK/quote/TSMX.US.md)
- [Direxion Semicon Bull 3X (SOXL.US)](https://longbridge.com/zh-HK/quote/SOXL.US.md)
- [T-Rex 2X Long NVIDIA Daily Target ETF (NVDX.US)](https://longbridge.com/zh-HK/quote/NVDX.US.md)
- [Direxion Daily NVDA Bull 2X Shares (NVDU.US)](https://longbridge.com/zh-HK/quote/NVDU.US.md)
- [YieldMax NVDA Option Income Strategy ETF (NVDY.US)](https://longbridge.com/zh-HK/quote/NVDY.US.md)
- [T-Rex 2X Inverse NVIDIA Daily Target ETF (NVDQ.US)](https://longbridge.com/zh-HK/quote/NVDQ.US.md)
- [AXS 1.5X NVDA Bear Daily ETF (NVDS.US)](https://longbridge.com/zh-HK/quote/NVDS.US.md)
- [NVIDIA (NVDA.US)](https://longbridge.com/zh-HK/quote/NVDA.US.md)
- [iShares Semiconductor ETF (SOXX.US)](https://longbridge.com/zh-HK/quote/SOXX.US.md)
- [VanEck Semiconductor ETF (SMH.US)](https://longbridge.com/zh-HK/quote/SMH.US.md)
- [SPDR S&P Semicon (XSD.US)](https://longbridge.com/zh-HK/quote/XSD.US.md)

## Related News & Research

- [Jensen Huang's GTC Speech: The Era of Reasoning Has Arrived; Lobsters Are the New Operating Systems](https://longbridge.com/zh-HK/news/279356941.md)
- [Aixia, Evroc And Opper Partner To Launch Nordic AI Platform Initiative](https://longbridge.com/zh-HK/news/279371840.md)
- [KX Launches Agentic AI Blueprints Powered by NVIDIA at GTC 2026, Featuring a Capital Markets Research Assistant and Trading Signal Agent](https://longbridge.com/zh-HK/news/278716439.md)
- [Samsung Elec showcases Nvidia's new inference chip made using 4 nanometer process](https://longbridge.com/zh-HK/news/279332138.md)
- [Synopsys Showcases NVIDIA Partnership Impact and Ecosystem Innovation at GTC 2026 | SNPS Stock News](https://longbridge.com/zh-HK/news/279319543.md)