Dolphin Research
2026.01.07 09:31

Jensen Huang unveils Rubin, fueling AI storage.

With AI demand driving a new 'super' cycle in memory, the group rallied again overnight ($Sandisk(SNDK.US) +27%, $Micron Technology(MU.US) +10%). The immediate catalyst was Jensen Huang's CES 2026 keynote, which poured more fuel on an already hot memory market.

Source: Longport.app

First, a quick recap. Jensen Huang's CES 2026 keynote rested on two pillars: the broad outlook for and deployment of physical AI, and progress on the Rubin architecture. The latest surge in memory names stems from Rubin's materially larger memory footprint.

Overall: $NVIDIA(NVDA.US) Rubin adds a new 'storage & security layer' atop the compute and networking layers. Downstream demand growth will likely worsen the existing supply-demand imbalance in memory.

Rubin's impact on memory, by component: ① HBM: in line. Rubin upgrades to HBM4, with capacity per GPU unchanged. ② DDR: slightly better than expected. DDR per CPU triples, implying roughly 0.5TB of DDR per GPU in 2026 on a blended basis. ③ NAND: the clear positive surprise. The new inference context memory system (ICMS) uses NAND as quasi-attached memory to relieve HBM pressure.

Net-net post-CES 2026: HBM demand expectations are unchanged. DDR demand rises by ~1EB, widening the supply gap by ~1ppt. NAND demand jumps by ~45EB directly, expanding the supply gap by ~4–5ppts.

Since NAND is the biggest beneficiary, price action reflected as much: Sandisk and Kioxia outperformed, while SK hynix and Samsung lagged.

Below are Dolphin Research's key takeaways on Rubin and Jensen Huang's CES 2026 remarks:

I. Rubin architecture and memory demand

From Blackwell to Rubin, HBM remains the core of AI servers and will not be replaced by the added NAND layer. As inference models scale, HBM faces capacity pressure, which is partly alleviated by moving portions of the KV cache from HBM to BlueField-4 plus NAND.

Jensen noted a nomenclature shift from NVL144 back to NVL72, but the configuration is effectively the same: the old 144 counted dies, while the new 72 counts GPUs (1 GPU = 2 dies). Accordingly, Dolphin Research frames the estimates below in the NVL72 convention.

1.1 HBM: non-substitutable, mission-critical

Versus Blackwell, Rubin adopts HBM4. While HBM4 bandwidth per GPU could reach ~22TB/s (2.8x HBM3E), capacity per GPU remains 288GB, same as HBM3E. Growth thus comes from GPU shipments, in line with market expectations.

1.2 DDR: system control, data pre-processing, context management on CPU

A single Vera CPU will carry ~1.5TB of DDR, 3x Grace. At the NVL72 level (72 GPUs + 36 CPUs), that implies ~54TB of DDR per system, about 3x Blackwell's ~18TB.

Based on expectations that NVDA's 2026 CoWoS split is roughly B300 vs. Rubin at ~1:1, the blended DDR per NVL72 is ~36TB. This translates to ~0.5TB DDR per GPU, modestly above prior market assumptions.
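For readers who want to reproduce the blend, here is a minimal back-of-envelope sketch in Python (the ~1:1 B300/Rubin split and the per-CPU DDR capacities are the assumptions stated above; the Grace figure is simply implied by the 3x multiple):

```python
# Blended DDR per NVL72 rack, assuming a ~1:1 B300/Rubin CoWoS split
# and the per-CPU DDR capacities cited above.
CPUS_PER_RACK = 36          # NVL72: 72 GPUs + 36 CPUs
GPUS_PER_RACK = 72
DDR_PER_VERA_TB = 1.5       # Vera: ~1.5TB LPDDR5X per CPU (3x Grace)
DDR_PER_GRACE_TB = DDR_PER_VERA_TB / 3      # implied Grace figure, ~0.5TB

rubin_rack_tb = CPUS_PER_RACK * DDR_PER_VERA_TB        # ~54TB
blackwell_rack_tb = CPUS_PER_RACK * DDR_PER_GRACE_TB   # ~18TB
blended_rack_tb = (rubin_rack_tb + blackwell_rack_tb) / 2   # ~36TB at 1:1
ddr_per_gpu_tb = blended_rack_tb / GPUS_PER_RACK       # ~0.5TB per GPU

print(rubin_rack_tb, blackwell_rack_tb, blended_rack_tb, ddr_per_gpu_tb)
# 54.0 18.0 36.0 0.5
```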

Tying this to current 2026 CoWoS volume expectations, Dolphin Research estimates AI servers will require ~9.2EB of traditional DDR in 2026 (+250% YoY). That is ~1EB higher than the prior consensus.

Specifically, with higher CoWoS output and more DDR per system, AI servers add ~6.6EB of DDR demand in 2026. That equals ~15–20% of total DRAM supply, crowding out PC and smartphone allocations.
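The +250% YoY figure also pins down the implied 2025 base, and hence the ~6.6EB increment; a quick check under the same assumptions:

```python
# Implied 2025 base and 2026 increment for AI-server DDR demand,
# taking the ~9.2EB (2026) estimate and +250% YoY as given.
demand_2026_eb = 9.2
yoy_growth = 2.5                                  # +250% YoY

base_2025_eb = demand_2026_eb / (1 + yoy_growth)  # ~2.6EB
increment_eb = demand_2026_eb - base_2025_eb      # ~6.6EB

print(round(base_2025_eb, 1), round(increment_eb, 1))  # 2.6 6.6
```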

1.3 NAND: system boot and model weight pre-load, plus Rubin's new ICMS

Under Blackwell, NAND primarily handled system boot and pre-loading model weights, with ~500–1,200TB per NVL72 (midpoint ~850TB). In Rubin, NVIDIA adds the ICMS for inference context, which is a positive surprise.

The dedicated context store offloads KV cache from HBM to a more cost-effective medium, freeing HBM bandwidth for compute. This is one of the core innovations enabling ~90% lower inference cost.

Each Rubin GPU can attach an extra 16TB of NAND as quasi-memory, adding ~1,152TB per NVL72. Since Rubin still needs ~850TB of NVMe SSD, total NAND per NVL72 is ~2,000TB.

The market currently models ~350k CoWoS units for Rubin in 2026, translating to ~39k NVL72 racks. With the incremental 1,152TB NAND per rack, Rubin lifts 2026 NAND demand by another ~44.8EB post-keynote. That equals ~4–5% of 2025 global NAND supply, widening the supply gap and stoking an already tight market.
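Putting the rack math together (per-rack figures from above, the ~39k-rack market assumption, and decimal units where 1EB = 1,000,000TB), a short sketch:

```python
# Incremental 2026 NAND demand from ICMS, using the per-rack figures
# and the ~39k NVL72 rack assumption quoted above.
GPUS_PER_RACK = 72
ICMS_NAND_PER_GPU_TB = 16            # quasi-attached memory per Rubin GPU
BASE_SSD_PER_RACK_TB = 850           # boot + weight pre-load (midpoint)
RACKS_2026 = 39_000

icms_per_rack_tb = GPUS_PER_RACK * ICMS_NAND_PER_GPU_TB      # 1,152TB
total_per_rack_tb = icms_per_rack_tb + BASE_SSD_PER_RACK_TB  # ~2,000TB
incremental_eb = RACKS_2026 * icms_per_rack_tb / 1_000_000   # TB -> EB

print(icms_per_rack_tb, total_per_rack_tb, round(incremental_eb, 1))
# 1152 2002 44.9  (~45EB, in line with the ~44.8EB cited above)
```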

II. CES 2026: NVDA pivots from GPU vendor to full-stack AI infra provider

NVDA CEO Jensen Huang centered his CES 2026 keynote on two main thrusts: physical AI and the Vera Rubin platform.

2.1 Physical AI — the 'ChatGPT moment'

AI evolution: Perception AI → GenAI → agents → physical AI. Definition: teaching AI the laws of physics such as gravity, inertia, and causality, enabling reasoning, planning, action, and explanation in the real world.

These are no longer pre-scripted programs but agents that 'think' in real time. Tech underpinnings: (i) synthetic data rooted in physics as ground truth, addressing training data scarcity; (ii) three cooperating computers — GPUs for training, a robot computer for inference, and Omniverse for simulation; (iii) core models — the Cosmos world model and GR00T for humanoids — to understand and interact with the physical world.

Use cases: (i) autonomous driving (Alpamayo end-to-end, from camera input to execution, with inference and trajectory planning); (ii) industrial manufacturing (Siemens partnership for digital twins, factory automation, chip design simulation); (iii) robotics (humanoids, AMRs, surgical robots with environment interaction); (iv) weather forecasting (Earth-2 with FourCastNet/CorrDiff).

2.2 NVIDIA Rubin platform: biggest delta is context memory

At CES 2026, Jensen announced Rubin has entered full production, with shipments expected to begin in 2H26. The industry is shifting from training-led to inference-led: training demand is stabilizing, while inference is growing exponentially.

Token cost for inference is now the key bottleneck for AI commercialization, setting pricing and profitability. Rubin targets a roughly 90% reduction in token cost vs. Blackwell via a ground-up, six-chip system redesign.

The Rubin platform comprises six custom chips (Vera CPU, Rubin GPU, ConnectX-9, BlueField-4, NVLink 6 Switch, Spectrum-6). Structurally, NVDA upgrades from compute + networking to compute + networking + storage & security. The added storage & security layer is the biggest incremental change, driving last night's memory rally.

1) Compute layer: Vera CPU + Rubin GPU remain the core.

① Vera CPU is designed for agentic reasoning in large-scale AI factories, orchestrating model cooperation, task decomposition, and resource scheduling. It features 88 custom Olympus cores with spatial multi-threading for 176 full-performance threads, 1.8TB/s NVLink-C2C, and 1.5TB LPDDR5X system memory (3x Grace) with 1.2TB/s bandwidth.

② Rubin GPU is the compute engine for giant-model training and high-throughput inference. It delivers 50 PFLOPS inference (NVFP4, 5x Blackwell) and 35 PFLOPS training (NVFP4, 3.5x Blackwell), with HBM4 bandwidth up to 22TB/s (2.8x) and NVLink at 3.6TB/s per GPU (2x).
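Backing the quoted multipliers out gives the implied Blackwell-generation per-GPU baselines, a useful consistency check (illustrative arithmetic only; all inputs are the figures quoted above):

```python
# Implied Blackwell per-GPU baselines, backed out of the multipliers
# quoted for Rubin above.
rubin = {"inference_pflops": 50, "training_pflops": 35,
         "hbm_tbps": 22, "nvlink_tbps": 3.6}
multiplier = {"inference_pflops": 5.0, "training_pflops": 3.5,
              "hbm_tbps": 2.8, "nvlink_tbps": 2.0}

for k in rubin:
    print(k, round(rubin[k] / multiplier[k], 1))
# inference_pflops 10.0
# training_pflops 10.0
# hbm_tbps 7.9
# nvlink_tbps 1.8
```

The implied ~1.8TB/s NVLink and ~8TB/s HBM baselines line up with published Blackwell-generation figures (e.g., NVLink 5 at 1.8TB/s per GPU), suggesting the quoted multipliers are internally consistent.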

2) Networking layer: ConnectX-9 + Spectrum-6 for 'AI Ethernet'

① ConnectX-9 handles AI traffic between nodes, addressing the high-latency and congestion issues of traditional Ethernet. It supports 800Gb/s per-port Ethernet and 200G PAM4 SERDES.

② Spectrum-6 interconnects thousands of Rubin racks to scale up to GigaWatt-class data centers. It supports 128x 800Gb/s or 512x 200Gb/s ports, enabling a 102.4Tb/s scalable switching fabric.
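A quick sanity check that the two port configurations describe the same 102.4Tb/s aggregate fabric:

```python
# Both Spectrum-6 port configurations yield the same aggregate bandwidth.
cfg_a_tbps = 128 * 800 / 1000   # 128 ports x 800Gb/s
cfg_b_tbps = 512 * 200 / 1000   # 512 ports x 200Gb/s
assert cfg_a_tbps == cfg_b_tbps == 102.4
print(cfg_a_tbps, "Tb/s")       # 102.4 Tb/s
```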

3) Storage & security layer (new): BlueField-4

As AI shifts from training to inference, context memory becomes the choke point. To address this, Rubin adds BlueField-4 for storage offload, security isolation, and KV cache management.

With BlueField-4, Rubin enables rack-level KV cache pooling and secure isolation (a simplified sketch of the tiering idea follows after this list). a) Each BlueField-4 can manage ~150TB of context memory, adding 16TB of NAND per Rubin GPU as attached memory to support long-context tasks. b) KV cache access is 5x faster vs. traditional storage, lifting token throughput and energy efficiency by 5x while reducing GPU idle time.

c) The ASTRA security architecture establishes end-to-end encryption, multi-tenant isolation, and auditability. This addresses data privacy and multi-tenancy at the system level.
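To make the offload mechanics concrete, below is a deliberately simplified Python sketch of a two-tier KV cache: hot contexts live in a small fast tier (standing in for HBM), and cold contexts spill to a large cheap tier (standing in for the NAND pool behind BlueField-4). This is a toy LRU model for illustration only, not NVIDIA's actual ICMS design; all names and capacities here are hypothetical.

```python
# Toy two-tier KV cache: a small fast tier (stand-in for HBM) spills
# least-recently-used contexts to a large cheap tier (stand-in for the
# NAND pool behind BlueField-4). Illustrative only; not NVIDIA's design.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_slots: int):
        self.hbm = OrderedDict()   # hot tier: active contexts, LRU order
        self.nand = {}             # cold tier: spilled contexts
        self.hbm_slots = hbm_slots

    def put(self, ctx_id: str, kv) -> None:
        self.hbm[ctx_id] = kv
        self.hbm.move_to_end(ctx_id)
        while len(self.hbm) > self.hbm_slots:
            # Spill the least-recently-used context out of the hot tier.
            old_id, old_kv = self.hbm.popitem(last=False)
            self.nand[old_id] = old_kv

    def get(self, ctx_id: str):
        if ctx_id in self.hbm:         # hot hit: no data movement
            self.hbm.move_to_end(ctx_id)
            return self.hbm[ctx_id]
        kv = self.nand.pop(ctx_id)     # cold hit: promote back to HBM
        self.put(ctx_id, kv)
        return kv

cache = TieredKVCache(hbm_slots=2)
for ctx in ("user_a", "user_b", "user_c"):   # third put evicts user_a
    cache.put(ctx, kv=f"kv_tensors_{ctx}")
assert "user_a" in cache.nand and "user_c" in cache.hbm
cache.get("user_a")                          # promoted back, user_b spills
assert "user_b" in cache.nand
```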

Bottom line, Jensen Huang's CES 2026 remarks underscore NVDA's pivot from a GPU maker to a full-stack AI infra provider. The company is building across the stack: (i) chips (Vera CPU, Rubin GPU, Orin/Thor), (ii) systems (Vera Rubin supercomputer, DGX Cloud), (iii) models (Nemotron, Cosmos, and vertical models), (iv) tools (NeMo libraries, blueprint frameworks), and (v) ecosystem (with Palantir, Siemens, etc.).
