--- title: "SemiAnalysis GTC Deep Dive: Behind Three New Systems, NVIDIA is Redefining the Boundaries of AI Infrastructure" type: "News" locale: "zh-HK" url: "https://longbridge.com/zh-HK/news/280321049.md" description: "NVIDIA announced three systems at GTC 2026: the LPX inference rack integrating the Groq LP30 chip, the Vera ETL256 liquid-cooled rack packed with 256 CPUs, and the STX storage reference architecture. SemiAnalysis believes these three systems collectively signal a strategic shift: NVIDIA is no longer just a GPU supplier but is evolving into a full-stack AI infrastructure platform provider, extending its reach into areas previously dominated by other vendors, such as inference optimization, CPU density, and storage orchestration. This will profoundly impact the competitive landscape of the entire AI hardware supply chain" datetime: "2026-03-24T12:58:53.000Z" locales: - [zh-CN](https://longbridge.com/zh-CN/news/280321049.md) - [en](https://longbridge.com/en/news/280321049.md) - [zh-HK](https://longbridge.com/zh-HK/news/280321049.md) --- > 支持的語言: [简体中文](https://longbridge.com/zh-CN/news/280321049.md) | [English](https://longbridge.com/en/news/280321049.md) # SemiAnalysis GTC Deep Dive: Behind Three New Systems, NVIDIA is Redefining the Boundaries of AI Infrastructure At the GTC 2026 conference, NVIDIA unveiled three new systems simultaneously—the Groq LPX inference rack, the Vera ETL256 CPU rack, and the STX storage reference architecture—**systematically expanding its product footprint from core GPU computing power to encompass low-latency inference, CPU orchestration, and storage layers, marking NVIDIA's systematic redefinition of the boundaries of AI infrastructure.** The Groq LPX system has garnered the most market attention. This is the first productized outcome launched less than four months after NVIDIA's $20 billion acquisition of Groq's intellectual property licensing and core team. 
**The LPX rack deeply integrates Groq's LP30 chip with NVIDIA GPUs** and introduces "Attention FFN Disaggregation" (AFD) technology to specifically compress decoding latency in high-interaction inference scenarios, opening up previously non-existent optimization paths for large-scale inference systems.

Concurrently, **the Vera ETL256 packs 256 CPUs into a single liquid-cooled rack**, utilizing a copper cabling topology for full intra-rack connectivity and directly targeting the increasingly prominent CPU supply bottleneck that accompanies AI's expansion in scale; **STX, through its standardized storage reference architecture**, formally extends NVIDIA's control from the compute and network layers to the storage infrastructure layer.

SemiAnalysis believes the three systems collectively point to a single strategic signal: NVIDIA is no longer merely a GPU supplier but is evolving into a full-stack AI infrastructure platform provider, its reach extending to inference optimization, CPU density, and storage orchestration—areas previously dominated by other vendors—which will profoundly impact the competitive landscape of the entire AI hardware supply chain.

## **LPX and LP30: Groq Architecture Officially Integrated into NVIDIA's Inference Stack**

The NVIDIA-Groq transaction was structured as an intellectual property license and talent acquisition rather than a traditional merger. This granted NVIDIA almost immediate access to Groq's complete IP and core team, leading to the launch of the LP30 chip, based on Groq's third-generation LPU architecture, and the LPX rack system in less than four months.

The LP30 utilizes Samsung's SF4 process, features 500MB of on-chip SRAM, and delivers 1.2 PFLOPS of compute power at FP8 precision. This represents a significant improvement over Groq's first-generation LPU (230MB SRAM, 750 TFLOPS INT8), primarily driven by the migration from GF16 to SF4 process nodes.
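The generational jump can be checked with a quick back-of-envelope comparison using only the figures quoted above. Note that the two compute figures are stated in different precisions (INT8 vs FP8), so that ratio is indicative at best:

```python
# Spec comparison between Groq's first-generation LPU and the LP30,
# using the figures quoted in the article. The compute numbers are in
# different precisions (INT8 vs FP8), so the ratio is only indicative.
gen1 = {"sram_mb": 230, "compute_tflops": 750}    # first-gen LPU (INT8)
lp30 = {"sram_mb": 500, "compute_tflops": 1200}   # LP30 on Samsung SF4 (FP8)

sram_ratio = lp30["sram_mb"] / gen1["sram_mb"]
compute_ratio = lp30["compute_tflops"] / gen1["compute_tflops"]
print(f"On-chip SRAM: {sram_ratio:.2f}x")     # ~2.17x
print(f"Peak compute: {compute_ratio:.2f}x")  # 1.60x
```

The SRAM capacity roughly doubles while peak compute grows about 1.6x, which is consistent with the process-node migration being the primary driver.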
The LP30 exists as a single, monolithic die, eliminating the need for advanced packaging. Notably, the SF4 process does not consume NVIDIA's scarce N3 wafer allocation at TSMC, nor does it strain equally tight HBM resources. The LPX system therefore represents genuinely incremental capacity and revenue, a differentiated advantage that SemiAnalysis points out competitors cannot replicate.

**Core Value and Inherent Limitations of the LPU Architecture**

The competitive advantage of the LPU architecture lies in its high-bandwidth SRAM and deterministic pipeline execution mechanism, enabling it to achieve superior first-token generation speeds compared to GPUs in single-user, low-latency scenarios. However, the trade-off for high-density SRAM is limited capacity—after weights are loaded, very little space remains. As batch sizes increase, the KV Cache quickly saturates, leading to significantly lower overall throughput than GPUs.

According to SemiAnalysis, LPU systems deployed independently are not economical for large-scale token serving. However, they can command a considerable premium in latency-sensitive applications, which forms the basis for the LPU's positioning in disaggregated decoding systems.

**AFD Technology: Role Division Between GPUs and LPUs**

AFD technology splits the attention computation (Attention) and feed-forward network computation (FFN) in large model inference onto different hardware. Attention computation, because it involves dynamic KV Cache loading, is naturally suited to GPU processing; FFN computation, being stateless and statically schedulable, is highly compatible with the LPU's deterministic architecture.

Within this framework, GPUs focus on attention computation, allowing their HBM capacity to be fully dedicated to the KV Cache and thereby increasing the total number of tokens the system can process concurrently. LPUs, on the other hand, handle FFN computation, leveraging their low-latency advantage.
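The role division above can be illustrated with a toy sketch. All names here are illustrative assumptions (the article does not describe NVIDIA's actual AFD API); the attention and FFN bodies are numeric stand-ins, and a thread pool stands in for the LPU side of the handoff:

```python
# Toy sketch of Attention-FFN Disaggregation (AFD). Per transformer layer,
# attention runs on the "GPU" side (stateful: it touches the KV cache held
# in HBM) while the FFN runs on the "LPU" side (stateless, statically
# schedulable). All names are illustrative, not NVIDIA's actual API.
from concurrent.futures import ThreadPoolExecutor

def attention(layer, hidden, kv_cache):
    kv_cache[layer].append(hidden)   # dynamic state stays with the GPU's HBM
    return hidden + 0.1              # numeric stand-in for attention math

def ffn(layer, hidden):
    return hidden * 2                # numeric stand-in for the FFN

def decode_step(hidden, n_layers, kv_cache, lpu_pool):
    for layer in range(n_layers):
        hidden = attention(layer, hidden, kv_cache)       # GPU role
        # Hand off to the LPU side; a real system would overlap this
        # transfer with the next micro-batch to hide communication latency.
        hidden = lpu_pool.submit(ffn, layer, hidden).result()
    return hidden

with ThreadPoolExecutor(max_workers=1) as lpu_pool:
    out = decode_step(1.0, n_layers=2, kv_cache=[[], []], lpu_pool=lpu_pool)
print(out)  # ~4.6 after two attention+FFN layer passes
```

The key structural point is that the per-layer state (the KV cache) never leaves the GPU side; only activations cross the boundary, which is what makes the FFN side amenable to a deterministic, batch-friendly accelerator.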
Communication between GPUs and LPUs uses All-to-All collective operations for token distribution and aggregation, with a ping-pong pipeline masking the communication latency.

Furthermore, LPUs can be employed within a Speculative Decoding framework: deploying draft models or multi-token prediction (MTP) layers on the LPU further reduces the latency overhead per decoding step, typically increasing the number of output tokens per decoding step by 1.5 to 2 times.

**LPX Rack Architecture**

The LPX rack consists of 32 1U LPU compute trays and two Spectrum-X switches. Each compute tray houses 16 LP30s, two Altera FPGAs (referred to by NVIDIA as "Fabric Expansion Logic"), one Intel Granite Rapids host CPU, and one BlueField-4 frontend module.

The FPGAs serve multiple critical functions in the system: translating the LPU's C2C protocol to Ethernet for connection to the Spectrum-X scale-out network, providing PCIe bridging between the LPU and the host CPU, and supplying up to 256GB of DDR5 expanded memory per board for KV Cache storage. The total scale-out bandwidth of the entire rack is approximately 640 TB/s.

The LPU modules are mounted back-to-back on both sides of the PCB, eight on top and eight on the bottom, to shorten the X- and Y-direction routing traces required for the full mesh interconnect. The 16 LPUs within a node are connected in a fully connected mesh topology. Inter-node connections are made via copper backplane, and cross-rack connections are established through the front-panel OSFP interfaces.

## **Vera ETL256: The Density Limit of 256 CPUs**

As AI workloads increasingly demand data preprocessing, scheduling orchestration, and reinforcement learning verification, CPUs are becoming a new bottleneck restricting GPU utilization. This is particularly pronounced in reinforcement learning scenarios, where CPUs must run simulation environments, execute code, and verify outputs in parallel.
The rate of GPU expansion far outpaces that of CPUs, so the CPU cluster size required to keep GPUs fully utilized keeps growing. **NVIDIA's solution is the Vera ETL256, integrating 256 Vera CPUs into a single rack, relying on liquid cooling to achieve this density.**

The design logic of this system is consistent with the NVL server rack: increase compute density to the critical point where copper cabling can cover all intra-rack connections, thereby eliminating the need for optical transceivers at the network backbone level. The cost savings from copper cabling are sufficient to offset the additional expenses introduced by liquid cooling.

Specifically, the Vera ETL rack comprises 32 compute trays, 16 on top and 16 on the bottom, symmetrically arranged around four 1U MGX ETL switch trays (based on Spectrum-6). This symmetrical layout deliberately minimizes the difference in cable lengths between each compute tray and the backbone switch trays, ensuring all connections remain within the reach of copper cabling. The rear ports of each switch tray handle intra-rack copper backbone communication, while the 32 front-facing OSFP ports provide optical fiber connectivity to the rest of the POD.

The intra-rack network employs a Spectrum-X multi-plane topology, distributing 200 Gb/s channels across the four switches to achieve Ethernet connectivity for all 256 CPUs within a single network layer. Each compute tray hosts 8 Vera CPUs.

## **STX: NVIDIA's Systematic Extension into the Storage Layer**

STX is the storage reference rack architecture launched by NVIDIA at GTC 2026. Together with the previously introduced CMX context storage platform, it forms NVIDIA's comprehensive layout for penetrating the storage infrastructure layer. STX builds upon CMX by establishing a reference architecture that precisely specifies the number of disk drives, Vera CPUs, BF-4 DPUs, CX-9 NICs, and Spectrum-X switches required for a cluster.
Each STX chassis contains 2 BF-4 units, totaling 2 Vera CPUs, 4 CX-9 NICs, and 4 SOCAMM modules. The entire STX rack consists of 16 chassis, corresponding to 32 Vera CPUs, 64 CX-9 NICs, and 64 SOCAMMs.

Alongside the STX release, NVIDIA notably named several major storage vendors—including DDN, Dell Technologies, HPE, IBM, NetApp, Supermicro, and VAST Data—indicating that these vendors will support the STX standard. This continues NVIDIA's consistent practice of leveraging industry endorsement to strengthen the influence of its reference architectures.

According to SemiAnalysis, the combination of BlueField-4, CMX, and STX represents **NVIDIA's systematic push into the storage, software, and infrastructure operations layers, after establishing its dominant position in the compute layer (GPUs) and network layer (Spectrum-X and NVLink).** The three new systems collectively broaden NVIDIA's product moat and imply that a larger proportion of the AI infrastructure supply chain market share will continue to concentrate towards NVIDIA.
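As a closing sanity check, the STX per-chassis counts quoted above roll up consistently to the stated rack totals. This is a minimal sketch using only the article's figures (the one-Vera-CPU-per-BF-4 reading is inferred from "2 BF-4 units, totaling 2 Vera CPUs"):

```python
# Roll-up of the STX reference-architecture counts quoted in the article.
chassis_per_rack = 16
vera_per_chassis = 2     # one Vera CPU per BF-4, per the article's totals
cx9_per_chassis = 4
socamm_per_chassis = 4

total_vera = chassis_per_rack * vera_per_chassis       # 32 Vera CPUs
total_cx9 = chassis_per_rack * cx9_per_chassis         # 64 CX-9 NICs
total_socamm = chassis_per_rack * socamm_per_chassis   # 64 SOCAMMs
print(total_vera, total_cx9, total_socamm)  # 32 64 64, matching the article
```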
### Related Stocks

- [NVIDIA (NVDA.US)](https://longbridge.com/zh-HK/quote/NVDA.US.md)
- [First Trust Nasdaq Food & Semicon (FTXL.US)](https://longbridge.com/zh-HK/quote/FTXL.US.md)
- [SPDR S&P Semicon (XSD.US)](https://longbridge.com/zh-HK/quote/XSD.US.md)
- [Invesco Semiconductors ETF (PSI.US)](https://longbridge.com/zh-HK/quote/PSI.US.md)
- [T-Rex 2X Inverse NVIDIA Daily Target ETF (NVDQ.US)](https://longbridge.com/zh-HK/quote/NVDQ.US.md)
- [Spdr Select Tech (XLK.US)](https://longbridge.com/zh-HK/quote/XLK.US.md)
- [iShares Semiconductor ETF (SOXX.US)](https://longbridge.com/zh-HK/quote/SOXX.US.md)
- [Direxion Semicon Bull 3X (SOXL.US)](https://longbridge.com/zh-HK/quote/SOXL.US.md)
- [Direxion Daily NVDA Bull 2X Shares (NVDU.US)](https://longbridge.com/zh-HK/quote/NVDU.US.md)
- [VanEck Semiconductor ETF (SMH.US)](https://longbridge.com/zh-HK/quote/SMH.US.md)
- [T-Rex 2X Long NVIDIA Daily Target ETF (NVDX.US)](https://longbridge.com/zh-HK/quote/NVDX.US.md)
- [GraniteShares 2x Long NVDA Daily ETF (NVDL.US)](https://longbridge.com/zh-HK/quote/NVDL.US.md)
- [XL2CSOPNVDA (07788.HK)](https://longbridge.com/zh-HK/quote/07788.HK.md)
- [XI2CSOPNVDA (07388.HK)](https://longbridge.com/zh-HK/quote/07388.HK.md)
- [YieldMax NVDA Option Income Strategy ETF (NVDY.US)](https://longbridge.com/zh-HK/quote/NVDY.US.md)
- [Direxion Daily NVDA Bear 1X ETF (NVDD.US)](https://longbridge.com/zh-HK/quote/NVDD.US.md)

## Related News & Research

- [GCT Semiconductor (GCTS) to Release Earnings on Wednesday](https://longbridge.com/zh-HK/news/280209681.md)
- [Nvidia Asked by US Senators to Provide Details on Groq Deal](https://longbridge.com/zh-HK/news/279931631.md)
- [NVIDIA could shake up AI economy with Rubin platform](https://longbridge.com/zh-HK/news/280132707.md)
- [Nvidia Stock Forecast: Buy This 'Too Cheap to Ignore' Stock, Says Wolfe Research](https://longbridge.com/zh-HK/news/280027122.md)
- [NVIDIA GTC Keynote: Huang Unveils “AI Factories,” Blackwell-Rubin Roadmap, and $1T Demand View](https://longbridge.com/zh-HK/news/280016667.md)