---
title: "As Demand for Fast AI Tokens Grows, D-Matrix Develops Fast NIC"
type: "News"
locale: "zh-HK"
url: "https://longbridge.com/zh-HK/news/273537723.md"
description: "D-Matrix is responding to the growing demand for low-latency AI tokens by developing new AI accelerators and NIC cards. Their Corsair inference accelerator utilizes a compute-in-memory scheme to enhance memory bandwidth, while a 3D stacked DRAM technology aims to improve memory capacity and efficiency. The company has also introduced the Jetstream PCIe Gen5 NIC chip, designed to handle 400 Gbps with low latency, facilitating faster communication for distributed inference systems. D-Matrix is focused on breaking barriers in memory and communication technology to meet the needs of evolving AI workloads."
datetime: "2026-01-23T16:15:33.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/273537723.md)
  - [en](https://longbridge.com/en/news/273537723.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/273537723.md)
---

> Available languages: [简体中文](https://longbridge.com/zh-CN/news/273537723.md) | [English](https://longbridge.com/en/news/273537723.md)

# As Demand for Fast AI Tokens Grows, D-Matrix Develops Fast NIC

SANTA CLARA, Calif. — Fast LLM token generation is getting a lot of attention as demand grows. D-Matrix is banking on market demand for low-latency tokens, combined with trends toward inference disaggregation and heterogeneity in data center hardware, to sell its AI accelerators and new, specially designed low-latency NIC cards, Sree Ganesan, VP of product at D-Matrix, told EE Times.

LLM inference workloads are growing with the rise of techniques like reasoning and chain-of-thought, plus agentic AI, which will mean models communicating with each other without being limited to human reading speed.
Even for small agents (those that use models below 1B parameters—small language models, or SLMs), these trends mean many more tokens will be required and latency will be even more critical, putting more pressure on memory bandwidth.

“Even if it’s just a couple of agents in a compound system that are collaborating, we’re starting to see a lot more SLMs coming into play,” Ganesan said. “That brings us back to the memory wall—\[the industry\] can keep up a good pace from a compute perspective, but not for bandwidth. We really need a breakthrough in bandwidth because the gap is widening—so we think this type of memory-compute integration is here to stay.”

D-Matrix’s Corsair inference accelerator uses a proprietary compute-in-memory scheme: multiplication is performed in its custom SRAM memory cells, combined with a digital adder tree. However, SRAM doesn’t scale as well as DRAM does, especially at advanced process nodes, Ganesan said.

“We went from the 2D way of doing in-memory computing, where we saw the value in terms of getting hundreds of terabytes per second of bandwidth, and now we need to knock down the second barrier, which is memory capacity,” she said. “The way to do it is to go vertical.”

D-Matrix has been working on a 3D stack of custom DRAM dies to augment its Corsair compute-in-memory chiplets. Future D-Matrix chips will still have both performance memory (the modified SRAM that does computation) and capacity memory (off-die DRAM that stores data), but the capacity memory will be expanded into three dimensions. Stacking DRAM has meant having to develop a way for the dies to communicate vertically—the logic/SRAM die sits above the stacked DRAM, which sits directly on the interposer.

“What we ended up building is the ability to increase memory capacity significantly,” she said.
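The compute-in-memory arrangement described above, with multiplies performed where the weights are stored and a digital adder tree reducing the partial products, can be sketched numerically. This is a minimal dataflow illustration, not D-Matrix's actual SRAM circuit; all function names here are illustrative.

```python
# Illustrative sketch of digital compute-in-memory: each memory cell holds one
# weight and multiplies it by the broadcast activation in place; a binary adder
# tree then reduces the partial products in log2(n) stages.
# This models the dataflow only, not the real circuit.

def adder_tree(values):
    """Pairwise reduction, as a hardware adder tree would perform it."""
    while len(values) > 1:
        if len(values) % 2:                      # pad odd-length levels
            values = values + [0]
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]

def cim_dot(weights, activations):
    """Dot product: per-cell multiplies followed by an adder-tree reduce."""
    partials = [w * a for w, a in zip(weights, activations)]  # in-cell multiplies
    return adder_tree(partials)

print(cim_dot([1, 2, 3, 4], [5, 6, 7, 8]))  # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

The bandwidth advantage Ganesan cites comes from this structure: every stored weight contributes a partial product on the same cycle, rather than being streamed out of memory one row at a time.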
“We don’t sacrifice any of the memory bandwidth because the entire surface area is available for communication; the same kind of advantage we have in memory bandwidth persists, with the added advantage of increasing the capacity.”

3D stacking brings up complex yield and thermal stability issues, but the risks are reduced by using small dies—nowhere near reticle size—and by minimizing the picojoules per bit to keep thermal aspects constrained, Ganesan said.

D-Matrix’s 3D custom DRAM test chip, Pavehawk, is up and running in the company’s lab. The company’s next-generation product, Raptor, will incorporate this 3D stacked technology and will target 10× better memory bandwidth and 10× better energy efficiency compared with moving to expensive HBM4.

“We have very high confidence taking this into our next generation,” Ganesan said. “The DNA of this company is building out technology that’s breaking barriers, but also validating them before we take it into a commercial product.”

### **Fast NIC**

D-Matrix has also been working hard on scale-out. “Whatever we do in terms of distributed inference has to enable Corsair to shine,” Ganesan said. Today, Corsair cards can be connected in a PCIe server with spare slots for NICs to allow scaling out.

D-Matrix has developed a PCIe Gen5 NIC chip, now in production, which is designed to break another bottleneck: I/O. Called Jetstream, it can handle 400 Gbps with 2-µs latency, and it has a TDP of 150 W.

“What we’re finding is that customers want to use not only the capacity memory, they want to use the performance memory, which is the ultra-low-latency batch inference capability,” Ganesan said.

The performance memory alone on an 8-card Corsair server node can hold an 8-10B-parameter (8-bit) model, but a single rack could be configured to hold a 100B-parameter (8-bit) model in performance memory for ultra-low latency, provided the chips could communicate fast enough. PCIe and Ethernet didn’t offer the required speed, Ganesan said.
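The figures above reduce to simple arithmetic: at 8-bit precision one parameter is one byte, so a 100B-parameter model needs roughly 100 GB of performance memory spread across a rack, and the quoted 400-Gbps, 2-µs Jetstream link sets how quickly cards can exchange data. A back-of-envelope sketch, using only the numbers from the article plus a standard latency-plus-bandwidth (alpha-beta) cost model, which is my assumption rather than a D-Matrix spec:

```python
# Back-of-envelope arithmetic from the figures quoted above.
# At 8-bit precision, one parameter is one byte.

def model_gigabytes(params_billions, bits_per_param=8):
    """Memory needed to hold the weights, in GB."""
    return params_billions * bits_per_param / 8

# An 8-card Corsair node holding a 10B-parameter model in performance memory:
per_card = model_gigabytes(10) / 8
print(f"10B params @ 8-bit: {model_gigabytes(10):.0f} GB, ~{per_card:.2f} GB/card")

# A rack holding a 100B-parameter model needs ~100 GB of performance memory,
# so cards must exchange data fast. First-order cost of one transfer over a
# 400 Gbps link with ~2 us latency (standard alpha-beta model, assumed here):
LINK_GBPS = 400
LATENCY_S = 2e-6

def transfer_time_s(message_bytes):
    return LATENCY_S + message_bytes * 8 / (LINK_GBPS * 1e9)

for size in (4 * 1024, 1024 * 1024):            # 4 KiB and 1 MiB messages
    print(f"{size:>8} B -> {transfer_time_s(size) * 1e6:.2f} us")
```

The model shows why the fixed 2-µs term matters for distributed inference: the small, frequent messages exchanged during token generation are latency-dominated, while only large transfers approach the 400-Gbps line rate.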
Jetstream enables device-initiated communication (no trip through the host is needed) so that communication can keep up with compute speed.

“This is all asynchronous communication happening in the background,” Ganesan said. “That separates the data plane and the control plane, which allows us to go really fast, keeps up with the compute and gets us the compatibility with industry standards.”

D-Matrix took parts of the PCIe stack, optimized them for Corsair’s communication semantics, and added parts of the Ethernet stack. Taking only parts of each stack helped minimize the software overhead. Jetstream cards plug into a Corsair server where an industry-standard PCIe NIC would have gone, and connect to top-of-rack switches to build a multi-rack cluster. A sweet spot might be a cluster somewhere in the region of 500-1,000 Corsair cards, Ganesan said, based on the demand the company is anticipating in the market.

Accordingly, D-Matrix’s roadmap now has an I/O dimension. Jetstream builds out the current generation alongside Corsair. The company’s second-generation compute-in-memory platform, Raptor, will need a different approach.

“Jetstream is the starting point \[for this roadmap\] where we took the fast path to solve the problem for Corsair,” Ganesan said. “As we look ahead, we want to build an electrical I/O chiplet aligned with industry standards…there are opportunities to take those and put them into a chiplet and integrate them in the Raptor family.”

The company’s third-generation compute-in-memory architecture, dubbed Lightning, will use some form of optical I/O.

### **Hardware heterogeneity**

Current inference hardware trends include breaking LLM inference workloads into two stages—prefill and decode—with different compute and memory needs, and running them on different hardware. “Our core hypothesis is that the world is going to get heterogeneous,” Ganesan said.
D-Matrix uses the same hardware for prefill and decode, but that hardware can be configured differently for the two workloads. “If you have a very highly compute-bound prefill phase, you can just use the DDR memory, which we have plenty of, and do your compute-bound prefill part with what we call the capacity memory piece, then transfer it over to use performance memory,” Ganesan said.

Heterogeneity will expand beyond just the prefill and decode stages, Ganesan said, noting that there are also parts of the workload that are extremely latency-sensitive and demand small batch sizes. There is customer interest in Corsair for these parts of a workload, she added.

“Heterogeneity will start coming more and more; we’re already seeing that,” she said. “We’ve said that for a long time—there’s plenty of places for inference to get more and more differentiated with heterogeneity.”

Heterogeneity could also mean installing D-Matrix Corsair cards alongside Nvidia GPUs so that latency-critical parts of the workload can be offloaded from GPUs as needed. D-Matrix’s current customer interest is coming from hyperscalers and neoclouds, where a number of Corsair trials are up and running, Ganesan said.
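The prefill/decode split that drives this heterogeneity comes down to arithmetic intensity: prefill amortizes each weight read over a whole prompt, while decode reads every weight to emit a single token. A first-order sketch using the common 2-FLOPs-per-parameter-per-token rule of thumb for dense transformers; the numbers are illustrative, not Corsair specs.

```python
# Why prefill is compute-bound and decode is memory-bound, to first order.
# Rule of thumb for a dense transformer: ~2 FLOPs per parameter per token,
# with the full weight set read from memory once per forward pass.

def arithmetic_intensity(params, tokens_per_pass, bytes_per_param=1):
    """FLOPs performed per byte of weights moved from memory."""
    flops = 2 * params * tokens_per_pass        # multiply-accumulates
    weight_bytes = params * bytes_per_param     # one read of the weights
    return flops / weight_bytes

P = 10e9  # a 10B-parameter model at 8-bit precision, as in the article

# Prefill: thousands of prompt tokens share one weight read -> compute-bound.
print("prefill (2048-token prompt):", arithmetic_intensity(P, 2048), "FLOPs/byte")
# Decode: one new token per pass -> limited by memory bandwidth.
print("decode (1 token/pass):      ", arithmetic_intensity(P, 1), "FLOPs/byte")
```

The three-orders-of-magnitude gap between the two phases is why running them on differently configured, or entirely different, hardware pays off.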
_**Editor’s note:** To listen to our podcast on test-time scaling with D-Matrix CEO Sid Sheth, click here._

### Related Stocks

- [iShares Semiconductor ETF (SOXX.US)](https://longbridge.com/zh-HK/quote/SOXX.US.md)
- [SPDR S&P Semicon (XSD.US)](https://longbridge.com/zh-HK/quote/XSD.US.md)
- [ISHRS S&P Glb It (IXN.US)](https://longbridge.com/zh-HK/quote/IXN.US.md)
- [Direxion Semicon Bull 3X (SOXL.US)](https://longbridge.com/zh-HK/quote/SOXL.US.md)
- [Invesco Semiconductors ETF (PSI.US)](https://longbridge.com/zh-HK/quote/PSI.US.md)
- [VanEck Semiconductor ETF (SMH.US)](https://longbridge.com/zh-HK/quote/SMH.US.md)