---
title: "SemiAnalysis: Downstream Large Model Companies Are Already Highly Profitable; NVIDIA and Taiwan Semiconductor Can Earn Even More"
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/284955261.md"
description: "The AI value chain is undergoing a structural revaluation, with chipmakers facing rapid catch-up from downstream model providers. SemiAnalysis points out that Anthropic's annualized revenue surged from $9 billion to over $44 billion within months, while inference gross margins rose from 38% to above 70%. The pricing frameworks of NVIDIA and Taiwan Semiconductor have not yet reflected market changes, suggesting an upside potential of more than 40%. The value depression in AI is shifting, with profits at the infrastructure layer gradually concentrating at the model layer, and the economic logic of AI is expected to be rewritten in 2025."
datetime: "2026-05-02T04:13:51.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/284955261.md)
  - [en](https://longbridge.com/en/news/284955261.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/284955261.md)
---

# SemiAnalysis: Downstream Large Model Companies Are Already Highly Profitable; NVIDIA and Taiwan Semiconductor Can Earn Even More

The AI value chain is undergoing a structural revaluation. Chipmakers, which previously captured the majority of profits, now face rapid catch-up from downstream model providers, yet upstream profit margins are far from their ceiling. SemiAnalysis's analysis indicates that Anthropic's annualized revenue rose from $9 billion to over $44 billion within a few months, with inference gross margins climbing from 38% to over 70%. NVIDIA's current pricing framework remains cost-oriented and has not yet reflected the changed economics of inference workloads; once the framework is adjusted, NVIDIA's system pricing has upside potential of more than 40%.
Taiwan Semiconductor's N3 process capacity is also at the core of this value redistribution. The key support for this judgment lies in a structural mismatch between supply and demand: **N3 process utilization is expected to exceed 100% in the second half of 2026, DRAM fabs are already running at over 90% capacity, and token demand from frontier models continues to expand at a compound rate.** In this context, the window for NVIDIA to achieve differentiated pricing through SOCAMM memory modules has opened.

## **Shift in the AI Value Depression: The Infrastructure Layer Gives Way to the Model Layer**

**From 2023 to early 2025, the vast majority of profits in the AI value chain accumulated at the infrastructure layer.** NVIDIA exploded first, followed by power asset companies Vistra and GE Vernova, which rose 265% and 146% respectively in 2024. Storage manufacturers SanDisk, Western Digital, Seagate, and Micron all posted gains of over 200% in 2025. On the flip side of this landscape, model creators and inference service providers long suffered from low gross margins. At the time, AI's practical value was limited, and the market remained persistently skeptical of AI investment returns.

The turning point arrived in December 2025. As agentic AI moved into genuine practical use, the economic logic of AI was completely rewritten. SemiAnalysis disclosed that its own annualized token spending had approached 30% of employee compensation, with each employee consuming nearly 5 billion tokens per month, more than five times the per-capita usage within Meta. Many tasks that previously required junior analysts to spend hours on—including financial modeling, data visualization, and earnings analysis—can now be completed for just a few dollars in token spend.
SemiAnalysis estimates that its team's peak annualized spending on Anthropic Claude reached $10.95 million, yet the competitive advantage gained far exceeded this cost. Anthropic benefited in turn: ARR skyrocketed from $9 billion to over $44 billion, and inference gross margins rose from 38% to over 70%.

## **Sharp Drop in Token Costs, Sustainable Expansion of Model Provider Margins**

**Another core driver of the surge in model providers' gross margins is the steep decline in token production costs.** On the hardware side, for a standard inference task with 8K input and 1K output, a fully software-optimized B300 system (with wide expert parallelism, prefill/decode disaggregation, and multi-token prediction) can generate roughly 14,000 tokens per second per GPU, versus only about 1,000 for an unoptimized setup. On the same hardware, software optimization alone delivers a 14-fold throughput increase. Combined with hardware upgrades, an optimally configured GB300 NVL72 offers about 17 times the FP8 throughput of the H100; switching to FP4 precision, which the H100 does not natively support, widens the gap to 32 times, while the total cost of ownership per GPU for the GB300 is only about 70% higher.

On the pricing side, agentic workloads feature extremely high input-to-output ratios (roughly 300:1 for Claude Code use cases) and very high cache hit rates (over 90%), so the majority of tokens fall into the lowest billing tier. SemiAnalysis estimates that the true blended cost of Opus 4.7 for agentic tasks is approximately $0.99 per million tokens, far below the listed price of $5 per million input tokens.
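The mechanics of that blended figure can be sketched in a few lines. This is a hedged illustration only: the cache-read and output token prices below are assumptions chosen to be broadly consistent with Anthropic-style tiered pricing, not figures from the article; only the $5/M input list price, the ~300:1 ratio, and the >90% cache hit rate come from the text.

```python
# Sketch: why a high input-to-output ratio plus a high cache hit rate
# pulls the blended $/M-token cost far below the input list price.
# Assumed tiers (NOT from the article): $25/M output, $0.50/M cache reads.

def blended_price_per_m(input_price, output_price, cache_read_price,
                        input_output_ratio, cache_hit_rate):
    """Average $ per million tokens across cached input, fresh input, and output."""
    inp = input_output_ratio              # input tokens per 1 output token
    cached = inp * cache_hit_rate         # billed at the cheap cache-read tier
    fresh = inp * (1 - cache_hit_rate)    # billed at the full input price
    total = cached * cache_read_price + fresh * input_price + 1 * output_price
    return total / (inp + 1)

p = blended_price_per_m(input_price=5.0, output_price=25.0,
                        cache_read_price=0.50,
                        input_output_ratio=300, cache_hit_rate=0.91)
print(f"${p:.2f} per million tokens")  # ≈ $0.99 with these assumptions
```

With a ~91% hit rate the blended cost lands right around the $0.99/M figure the article cites, showing that the estimate is plausible arithmetic rather than a pricing error.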
Even in the face of Anthropic's significant price cuts to the Opus series—Opus 4.5 pricing was reduced by two-thirds from previous levels—SemiAnalysis believes Anthropic's unit gross profit actually increased: **on one hand, production costs fell further with hardware upgrades; on the other, large-scale user migration from Sonnet to Opus pushed up the blended ASP.**

More strategically, Anthropic retains pricing dominance in its high-end product lines. Opus Fast is priced at six times regular Opus, while the announced Mythos is priced at $25/$125 per million tokens, five times regular Opus. SemiAnalysis explicitly stated that even if Anthropic offered a Mythos Fast at $150/$750 per million tokens, its team would still buy it—because the value of the productivity gains far exceeds the cost.

## **Why Model Providers' Pricing Power Is Hard to Erode Through Competition**

Regarding the sustainability of high frontier-model margins, the most common challenge is competitive pressure. SemiAnalysis offers two counterarguments.

First, **the capability gap between frontier closed-source models and open-source models remains significant and is difficult to close in the short term.** Low-priced open-source models such as Kimi K2.6 ($0.95/$4 per million tokens) exert almost no real downward pressure on Opus pricing.

Second, **compute constraints mean no single frontier lab can serve the entire market alone.** Anthropic has actively managed demand-side pressure by gating Claude Code behind a monthly subscription of over $100 and restricting third-party access. Token demand will continue to exceed supply for the foreseeable future, so labs that can deliver truly frontier quality can price on the economic value tokens create rather than on competitors' costs.
## **NVIDIA's Pricing Restraint: Regulatory Logic or Strategic Misjudgment**

Amid this profound restructuring of the AI value chain, NVIDIA has made no substantial adjustment to its pricing framework to date, which is a structural issue worth noting. NVIDIA's current pricing is still anchored primarily to cost, reflecting an old paradigm in which demand value diminishes over time—an assumption that no longer holds. Demand growth today is not linear but compounding, driven by the explosion of agentic workloads and the continuous leap in token consumption per workflow.

SemiAnalysis believes NVIDIA's pricing restraint is partly rooted in regulatory concerns. Its dominance across GPUs, interconnects, and the software stack has drawn increasingly close antitrust scrutiny. With downstream AI labs now visibly profitable, aggressive price hikes could heighten regulatory risk and accelerate customer migration to alternative platforms such as TPU and Trainium.

In this sense, NVIDIA's behavior resembles Taiwan Semiconductor's. For years, even while running at full capacity as the bottleneck for advanced-process supply, Taiwan Semiconductor did not push pricing to the limit of its scarcity premium, instead prioritizing long-term ecosystem stability and customer relationships. **This logic can be summarized as the "AI central bank"—supporting downstream ecosystem expansion through moderate concession of profits, rather than maximizing short-term extraction, to secure long-term dominance in the AI era.** This strategy carries real opportunity costs, however: in a structural environment where compute demand consistently exceeds supply, holding scarce resources without fully pricing them hands value to the midstream and downstream of the ecosystem chain.
Taiwan Semiconductor's N3 situation is similar—SemiAnalysis calls it outright a "strategic error," arguing TSMC should at least demand larger-scale prepayment arrangements.

## **Rubin Pricing Space: SOCAMM Becomes the New Profit Lever**

**NVIDIA's upcoming Vera Rubin VR NVL72 system provides an opportunity to reassess the pricing framework.** From a cost perspective, the minimum GPU rental the VR NVL72 needs to match the GB300 NVL72's 15.6% project IRR (5-year term, 15% prepayment) is estimated at roughly $4.92 per hour. From a value perspective, anchoring to the current GB300 rental of about $0.70 per FP8 dense PFLOP implies a theoretical maximum of roughly $12.25 per GPU per hour for the VR NVL72, about 2.5 times the cost floor.

This wide spread indicates NVIDIA has ample room to raise VR NVL72 prices. SemiAnalysis estimates that even if NVIDIA lifted system pricing by about 40%, Neoclouds would retain sufficient margin: even with rentals above $8 per hour, cost per PFLOP would remain below the historical trend line.

In terms of specific mechanisms, **SOCAMM is the most critical pricing lever.** Unlike the GB300, which solders LPDDR5X directly onto the motherboard and folds it into overall system pricing, the VR NVL72 adopts pluggable SOCAMM modules, letting NVIDIA list and price memory as a separate billing item. SOCAMM (Small Outline Compression Attached Memory Module) is a new modular memory standard led by NVIDIA and developed jointly with memory makers such as Samsung, SK Hynix, and Micron. Based on LPDDR5X (and, in future, LPDDR6) DRAM technology, it is designed for AI servers and personal AI supercomputers.
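The cost-floor versus value-anchor spread above can be sanity-checked with a few lines of arithmetic. Note that the implied per-GPU FP8 dense throughput here is backed out from the article's own numbers ($12.25/hr at $0.70 per PFLOP-hour), not an official Rubin specification.

```python
# Sanity check on the VR NVL72 pricing spread described in the article.
cost_floor = 4.92        # $/GPU-hr for the VR NVL72 to match a 15.6% project IRR
value_ceiling = 12.25    # $/GPU-hr when anchored to GB300 rental economics
pflop_anchor = 0.70      # $/hr per FP8 dense PFLOP (GB300 rental basis)

implied_pflops = value_ceiling / pflop_anchor   # ~17.5 PFLOPs/GPU (derived, not spec)
spread = value_ceiling / cost_floor             # ~2.5x headroom over the cost floor

# Even at an $8/hr Neocloud rental, $/PFLOP stays below the $0.70 anchor,
# consistent with the claim that a ~40% system price hike still leaves margin.
dollars_per_pflop_at_8 = 8.0 / implied_pflops   # ~$0.46 per PFLOP-hour

print(f"spread: {spread:.2f}x, $/PFLOP at $8/hr: ${dollars_per_pflop_at_8:.2f}")
```

The ~2.5x gap between floor and ceiling is the quantitative core of the "ample room to raise prices" argument.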
Models show that the contract price NVIDIA paid for SOCAMM in the first quarter of 2026 was approximately $8 per GB, a significant increase from the prior quarter, mainly reflecting tight LPDDR5X supply and rising DRAM prices overall. Based on mobile DRAM price forecasts, SOCAMM pricing could exceed $13 per GB by the end of 2026, making a full-year average of around $10 a reasonable assumption.

On this basis, SemiAnalysis believes it is reasonable for NVIDIA to charge a 60% gross margin on SOCAMM: **first, memory supply is tight across the board and NVIDIA has priority access in SOCAMM procurement; second, the VR NVL72 far exceeds competing products on performance/TCO, leaving customers no viable alternative; third, NVIDIA itself faces sharply rising SOCAMM procurement costs, a reasonable basis for passing them downstream.** Moreover, memory pricing does not face the same antitrust concerns as GPU pricing, giving NVIDIA more room for differentiated pricing, including charging Neoclouds and hyperscale cloud providers different rates. NVIDIA already charges Neoclouds roughly twice what hyperscalers pay for networking equipment, and the same logic extends readily to memory.

Risk Warning and Disclaimer

The market involves risks, and investment requires caution. This article does not constitute personal investment advice, nor does it take into account the specific investment objectives, financial situation, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investment decisions made based on this content are the sole responsibility of the investor.
### Related Stocks

- [TSM.US](https://longbridge.com/en/quote/TSM.US.md)
- [NVDA.US](https://longbridge.com/en/quote/NVDA.US.md)
- [SOXL.US](https://longbridge.com/en/quote/SOXL.US.md)
- [NVD.US](https://longbridge.com/en/quote/NVD.US.md)
- [NVDS.US](https://longbridge.com/en/quote/NVDS.US.md)
- [TSMG.US](https://longbridge.com/en/quote/TSMG.US.md)
- [TSMU.US](https://longbridge.com/en/quote/TSMU.US.md)
- [SOXX.US](https://longbridge.com/en/quote/SOXX.US.md)
- [NVDL.US](https://longbridge.com/en/quote/NVDL.US.md)
- [NVDD.US](https://longbridge.com/en/quote/NVDD.US.md)
- [TSMX.US](https://longbridge.com/en/quote/TSMX.US.md)
- [NVDX.US](https://longbridge.com/en/quote/NVDX.US.md)
- [SMH.US](https://longbridge.com/en/quote/SMH.US.md)
- [NVDY.US](https://longbridge.com/en/quote/NVDY.US.md)
- [NVDQ.US](https://longbridge.com/en/quote/NVDQ.US.md)
- [NVDU.US](https://longbridge.com/en/quote/NVDU.US.md)
- [XSD.US](https://longbridge.com/en/quote/XSD.US.md)
- [PSI.US](https://longbridge.com/en/quote/PSI.US.md)
- [07788.HK](https://longbridge.com/en/quote/07788.HK.md)
- [07388.HK](https://longbridge.com/en/quote/07388.HK.md)

## Related News & Research

- [Altera Brings Determinism to Physical AI Systems with Latest Release of FPGA AI Suite](https://longbridge.com/en/news/284800493.md)
- [Delta bullish on rising AI demand](https://longbridge.com/en/news/284918443.md)
- [Mag 7 Just Committed $710 Billion To AI Capex](https://longbridge.com/en/news/284955675.md)
- [Intel Stock (NASDAQ:INTC) Surges as SambaNova Deal Clears Regulators](https://longbridge.com/en/news/284931575.md)
- [Verity Asset Management Inc. Trims Stock Holdings in Taiwan Semiconductor Manufacturing Company Ltd. $TSM](https://longbridge.com/en/news/284901417.md)