---
type: "Topics"
locale: "en"
url: "https://longbridge.com/en/topics/37933288.md"
description: "AI inference is shipping, yet as much as 99% of compute still sits idle. The core bottleneck is the 'memory wall': data movement can't keep up with compute, so GPUs spend far more time waiting to be fed than crunching numbers. For example, generating one token may take ~10 μs of compute but ~9 ms to load data, meaning most of the time is spent waiting. Fixing this is a two-step process: quick wins in the near term, and a more structural, tech-led cure over the medium to long term. Near term, combine 'faster pipes + shorter distance'. On one hand, upgrade HBM (GPU-attached DRAM) from 12-Hi to 16-Hi stacks to push bandwidth to 16–32 TB/s, effectively speeding up and widening the data lanes. On the other hand, add 3D-stacked SRAM adjacent to the GPU to keep hot data nearby, cutting latency from ~100 ns to ~2 ns. SRAM handles speed-sensitive transfers, while HBM provides capacity. NVIDIA's acquisition of Groq targets SRAM know-how, and the Rubin platform slated for 2H26 is set to integrate this, boosting memory throughput. Over the medium to long term, move to compute-in-memory (CIM), embedding some compute into storage so data isn't shuttled back and forth; this removes the memory wall at the source. CIM hasn't been deployed in data centers yet, with rollouts expected after 2027. With HBM4 entering mass production in 2026 and SRAM commercialization ramping, followed by CIM adoption, the 99% idle-compute problem should gradually ease, and AI inference can then run much closer to full throttle."
datetime: "2026-01-20T06:35:22.000Z"
locales:
  - [en](https://longbridge.com/en/topics/37933288.md)
  - [zh-CN](https://longbridge.com/zh-CN/topics/37933288.md)
  - [zh-HK](https://longbridge.com/zh-HK/topics/37933288.md)
author: "[Dolphin Research](https://longbridge.com/en/news/dolphin.md)"
---

> Supported Languages: [简体中文](https://longbridge.com/zh-CN/topics/37933288.md) | [繁體中文](https://longbridge.com/zh-HK/topics/37933288.md)

# AI inference is shipping, yet as much as 99% of compute still sits idle

### Related Stocks

- [NVIDIA Corporation (NVDA.US)](https://longbridge.com/en/quote/NVDA.US.md)

## Comments (6)

- **方圆9269 · 2026-01-20T19:24:27.000Z · 👍 1**: Technical post! Very reliable!
- **HIC · 2026-01-20T11:29:11.000Z · 👍 1**: Good article, learned a lot
- **珠穆朗玛8848 · 2026-01-20T08:12:10.000Z · 👍 1**: Concisely explained the storage bottleneck in inference acceleration
  - **Dolphin Research** (2026-01-21T02:22:47.000Z): 💪🏻💪🏻
- **洒家特地来赚刀乐 · 2026-01-20T06:40:34.000Z · 👍 3**: There is both a memory wall and an I/O wall: data transfer is relatively slow, and optical-electrical conversion happens several times along the path
  - **Dolphin Research** (2026-01-20T07:21:23.000Z): Yes, mass production of HBM4 and the Rubin architecture with more high-speed storage are expected to alleviate the issue.
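The article's headline "99% idle" claim follows directly from its own per-token numbers: if one token takes ~10 μs of compute but ~9 ms of data loading, the GPU does useful work only about 0.1% of the time. A minimal back-of-envelope sketch (timings taken from the article's illustrative example, not measured):

```python
# Sanity-check the memory-wall arithmetic cited in the article.
# Per-token timings are the article's illustrative figures, not benchmarks.
compute_s = 10e-6  # ~10 us of arithmetic per generated token
load_s = 9e-3      # ~9 ms spent moving weights/KV data per token

# Fraction of wall-clock time the compute units are actually busy.
utilization = compute_s / (compute_s + load_s)
idle = 1.0 - utilization

print(f"compute utilization: {utilization:.2%}")  # → compute utilization: 0.11%
print(f"idle fraction:       {idle:.2%}")         # → idle fraction:       99.89%
```

This is why the article frames both near-term fixes in data-movement terms: raising HBM bandwidth shrinks `load_s`, while nearby SRAM cuts the per-access latency, and either one moves the utilization ratio far more than faster arithmetic would.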