The Token Economy: How AI Stopped Looking Like a SaaS Business

The unit economics of AI are now stranger than most investors realise.

Between early 2023 and early 2025, the price of a million GPT-4-level output tokens fell from roughly US$60 to under US$1.50 — a more than 40-fold reduction in two years. In the year since, the average cost across major frontier providers fell again, from about US$10 to US$2.50 per million tokens (per Artefact and industry trackers). And yet, in the same period, total enterprise spending on AI tripled, reaching approximately US$37 billion in 2025.
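The size of these declines is easy to check. The sketch below uses only the prices quoted above; the function names are illustrative rather than drawn from any tracker's methodology.

```python
def fold_reduction(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def percent_decline(old_price: float, new_price: float) -> float:
    """Percentage drop from old_price to new_price."""
    return (1 - new_price / old_price) * 100


# Prices in US$ per million tokens, as quoted in the text.
quoted_prices = [
    ("GPT-4-level output, early 2023 to early 2025", 60.00, 1.50),
    ("Frontier average, the following year", 10.00, 2.50),
]

for label, old, new in quoted_prices:
    print(f"{label}: {fold_reduction(old, new):.0f}x cheaper, "
          f"{percent_decline(old, new):.1f}% decline")
```

On the quoted prices, the first move is a 40-fold (97.5%) decline and the second a 4-fold (75%) decline.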

On the GPT-4-level figures above, token prices have fallen by more than 97% from their peak. Bills have not gone down.

This is the central puzzle of the AI economy in 2026 — and it is the puzzle that explains why every major AI company has stopped competing on subscription plans and started competing, almost exclusively, on something they used to bury in API documentation: the price of a token.

A Paradox

Most reporting frames the token price collapse as a race to the bottom. That framing is incomplete. It misses what is actually happening at the unit level.

The conventional view of an AI company is that it is a SaaS business. You pay a monthly fee, you get access to a product, and the seller's economics depend on user count and retention. Margins come from scaling marketing while holding cost per user roughly flat. This is how Microsoft sells Office, how Google sells Workspace, how Salesforce sells everything.

AI is not that.

When you query an AI model, the seller does real work in real time — running an inference job that has a measurable cost in GPU-seconds, electricity, and data-centre capacity. That cost does not get amortised over the user's lifetime. It happens, and it costs money, every single time the model is invoked. AI is not really a SaaS business in any meaningful sense. It is a per-unit pricing business that resembles utility metering more than software licensing.

The unit is the token.

What a Token Actually Is

A token is a fragment of text that the model treats as one atomic step of input or output. Sometimes a token is a whole word ('apple'). Often it is part of a word ('apolo' + 'gise'). Punctuation, spaces, and even bits of code are tokens. The average English word is about 1.3 tokens. A typical paragraph is roughly 100 tokens.
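The 1.3-tokens-per-word rule of thumb is enough for a back-of-envelope estimate. The sketch below assumes whitespace-separated English text and applies that ratio; it is not a real tokeniser, and actual counts vary by model and by language. The name estimate_tokens is hypothetical.

```python
TOKENS_PER_WORD = 1.3  # rule of thumb from the text; real tokenisers vary


def estimate_tokens(text: str) -> int:
    """Rough token count: whitespace-separated words times 1.3."""
    word_count = len(text.split())
    return round(word_count * TOKENS_PER_WORD)


# A paragraph of about 77 words lands near the 100-token mark.
sample_paragraph = " ".join(["word"] * 77)
print(estimate_tokens(sample_paragraph))
```

For billing purposes a provider's own tokeniser is the only reliable count; an estimate like this is useful mainly for sizing budgets in advance.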

What matters for the economy: every token costs computation. To a first approximation, inference cost scales linearly with token count. The longer the response, the more expensive it was to produce.

When OpenAI charges US$14 per million output tokens for GPT-5.2, or Anthropic charges US$25 for Claude Opus 4.6 output, or Google offers Gemini 3 Flash at US$3 (all per provider API documentation, as of May 2026), what they are selling is not access to a product. They are selling a unit of computation that happens to produce intelligible text on demand. It is closer to selling kilowatt-hours than to selling Office 365.
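To see what metered pricing means for a single response, the sketch below applies the output prices quoted above to a given token count. It deliberately ignores input-token charges, caching discounts, and volume tiers, which the text does not quote; the dictionary and function names are illustrative.

```python
# Output prices in US$ per million tokens, as quoted in the text (May 2026).
OUTPUT_PRICE_PER_MILLION = {
    "GPT-5.2": 14.00,
    "Claude Opus 4.6": 25.00,
    "Gemini 3 Flash": 3.00,
}


def output_cost(model: str, output_tokens: int) -> float:
    """Cost in US$ of generating output_tokens, counting output tokens only."""
    return output_tokens * OUTPUT_PRICE_PER_MILLION[model] / 1_000_000


# Cost of one roughly paragraph-length (100-token) answer on each model.
for model in OUTPUT_PRICE_PER_MILLION:
    print(f"{model}: US${output_cost(model, 100):.6f} per 100-token answer")
```

The per-answer figures are fractions of a cent, which is why the totals only become visible once usage runs to billions of tokens.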

Once you see this, the strategic positioning of every frontier AI company snaps into focus.

What Did Tokens Replace?

The traditional SaaS metric stack was built around DAU, MAU, ARR, and churn. These metrics measure user engagement and recurring revenue, and they work when the cost-to-serve is roughly fixed per user.

AI companies still report these numbers, but internally they care about something else: tokens shipped per dollar of revenue, tokens shipped per dollar of compute cost, and the spread between the two. That is the real margin equation. When Anthropic more than doubles its quarterly revenue from US$4.8 billion to a projected US$10.9 billion (per company guidance to investors, May 2026), the question that matters is not "how many seats did you add" but "how many tokens did you ship, and at what unit margin."
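The margin equation described here can be written down in a few lines. The sketch below is one possible formulation, not any company's actual accounting: the class name, field names, and sample figures are all hypothetical and chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass
class TokenQuarter:
    """One quarter of a hypothetical token seller's results."""
    tokens_shipped: float    # total tokens served in the quarter
    revenue_usd: float       # revenue attributed to those tokens
    compute_cost_usd: float  # compute cost of serving them

    @property
    def tokens_per_revenue_dollar(self) -> float:
        return self.tokens_shipped / self.revenue_usd

    @property
    def tokens_per_compute_dollar(self) -> float:
        return self.tokens_shipped / self.compute_cost_usd

    @property
    def unit_margin_per_million(self) -> float:
        """Revenue minus compute cost, in US$ per million tokens shipped."""
        margin = self.revenue_usd - self.compute_cost_usd
        return margin / self.tokens_shipped * 1_000_000


# Illustrative numbers only; not any company's reported figures.
quarter = TokenQuarter(tokens_shipped=2e15, revenue_usd=10e9, compute_cost_usd=6e9)
print(f"Tokens per revenue dollar: {quarter.tokens_per_revenue_dollar:,.0f}")
print(f"Tokens per compute dollar: {quarter.tokens_per_compute_dollar:,.0f}")
print(f"Unit margin: US${quarter.unit_margin_per_million:.2f} per million tokens")
```

The seller is healthy when tokens per compute dollar sits well above tokens per revenue dollar: each dollar of compute then produces more tokens than the seller has to hand over for each dollar earned.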

The shift from ARR to tokens-per-dollar is not just an internal accounting nuance. It changes who wins.

In a SaaS business, you win by acquiring and retaining users. In a token business, you win by producing more tokens at a lower marginal cost than your competitors — which means winning on chip access, on inference software optimisation, on power contracts, and on data-centre density. The race is not about product polish; it is about industrial throughput.

It is, more precisely, about being a low-cost producer of a commodity that happens to be intelligence.

Who Captures the Surplus?

Falling prices alongside rising bills is what economists call a Jevons paradox: a phenomenon first observed in 19th-century British coal markets, where falling per-unit costs drove total consumption up rather than down. The cheaper the token, the more tokens enterprises use; the more they use, the larger the absolute spend, even as price per unit collapses.

This is why AI spending tripled while token prices fell by more than 97%. The unit became cheap; the use-case explosion was bigger than the price collapse.
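Since spend is price times volume, the implied growth in token volume follows from the two numbers already quoted. The sketch below assumes that the tripling of spend and the fall in average frontier price from US$10 to US$2.50 cover the same period, and that the average price is a fair proxy for what enterprises actually paid; both are simplifying assumptions, and the function name is illustrative.

```python
def implied_volume_multiple(spend_multiple: float,
                            old_price: float,
                            new_price: float) -> float:
    """Volume growth implied by spend = price x volume.

    spend_multiple is new spend divided by old spend.
    """
    price_multiple = new_price / old_price
    return spend_multiple / price_multiple


# Spend tripled while the average price fell from US$10 to US$2.50 per million.
multiple = implied_volume_multiple(spend_multiple=3.0, old_price=10.0, new_price=2.5)
print(f"Implied token volume growth: {multiple:.0f}x")
```

Under those assumptions a 4-fold price cut paired with a 3-fold rise in spend implies roughly 12 times as many tokens consumed, which is the Jevons pattern in miniature.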

But — and this is the part the headline numbers don't capture — the margin is not flowing to the model companies. It is flowing past them, downstream. Per a NavyaAI cost analysis, 72% of enterprise AI costs hide outside of raw inference spending: agentic workflows multiply usage 50-500x, orchestration tools take their cut, integrators charge implementation fees, and the picks-and-shovels providers (compute clouds, GPU rental neoclouds, networking suppliers) capture the share of value that the model companies are competing away in their pricing wars.
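The 72% figure and the agentic multiplier can be turned into rough sizing arithmetic. The sketch below takes both numbers at face value from the analysis cited above and treats the 72% share as constant, which is a simplification; the function names and the 1,000-token baseline are hypothetical.

```python
NON_INFERENCE_SHARE = 0.72  # share of cost outside raw inference, per the cited analysis


def total_cost(inference_spend_usd: float) -> float:
    """Total cost implied when raw inference is only 28% of the bill."""
    return inference_spend_usd / (1 - NON_INFERENCE_SHARE)


def agentic_token_range(baseline_tokens: int,
                        low: int = 50,
                        high: int = 500) -> tuple[int, int]:
    """Token usage after the 50-500x agentic multiplier quoted in the text."""
    return baseline_tokens * low, baseline_tokens * high


print(f"Total cost per US$1 of inference: US${total_cost(1.0):.2f}")

low_tokens, high_tokens = agentic_token_range(1_000)
print(f"A 1,000-token task becomes {low_tokens:,} to {high_tokens:,} tokens")
```

Every dollar of visible inference spend therefore implies about US$3.57 of total cost, before the agentic multiplier raises the inference line itself.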

This is the central irony of the token economy. The companies that sell the tokens — OpenAI, Anthropic, Google's AI division — are racing each other to the floor on unit price. The companies that surround them — NVIDIA, Broadcom, CoreWeave, Snowflake, Salesforce — are quietly capturing the value the token sellers are giving away.

My read is that this is structurally permanent, not transitory. As long as the tokens themselves are commoditised, the value will sit one layer up (in orchestration) and one layer down (in infrastructure).

Where This Ends

The next phase, per academic research on token market dynamics, is supply-demand rebalancing. Token demand is now scaling faster than data centres can be built — and power contracts negotiated — to support it. The price decline will slow. There may be intermittent price rebounds.

That does not change the structural picture. AI is no longer being sold as a SaaS product, and it will not return to being one. The unit of pricing is now a token. The unit of competition is tokens per dollar. The unit of capture is whatever sits above and below the tokens.

If the question is who wins the AI economy, the answer is no longer about who builds the smartest model. It is about who owns the most efficient token factory — and who can profit from what gets done with the tokens once they leave the factory.