
Zhipu (Transcript): Tech-Driven, Focused on Advancing the Model Frontier
Below is Dolphin Research's transcript of Zhipu's FY2025 earnings call.
I. Key takeaways
1. Top-line surged: FY2025 revenue reached RMB 724 mn (+132% YoY). The cloud open platform and APIs contributed 26.3% of total revenue.
2. API ARR inflected: as of Mar 2026, API ARR hit $250 mn, up 60x over the last 12 months. API call pricing is up 83% vs. Dec 2025, with demand accelerating post-hike.
3. Structural GPM improvement: FY gross profit was RMB 297 mn, with GPM at 41%. Cloud API GPM jumped from 3.4% in 2024 to 18.9% in 2025, a more than 5x uplift (see the arithmetic cross-check after this list).
4. R&D and P&L: FY R&D spend was RMB 3.18 bn (+44.9% YoY), and Adj. net loss was RMB 3.18 bn. Gross profit has begun to cover S&M and G&A after excluding SBC.
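As a quick arithmetic cross-check of the headline figures quoted above, here is a minimal sketch using only the numbers in this summary (not the underlying filings):

```python
# Sanity check of the headline figures quoted above (amounts in RMB mn).
revenue = 724          # FY2025 revenue
gross_profit = 297     # FY2025 gross profit

print(f"Implied overall GPM: {gross_profit / revenue:.1%}")  # ~41.0%, matching the stated 41%

cloud_api_gpm_2024 = 0.034   # 3.4%
cloud_api_gpm_2025 = 0.189   # 18.9%
print(f"Cloud API GPM uplift: {cloud_api_gpm_2025 / cloud_api_gpm_2024:.1f}x")  # ~5.6x
```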
II. Details from the call
2.1 Management highlights
1. Model roadmap: five paradigm shifts, from AI Coding to an autonomous OS
a. Phase I (AI Coding): Pre-2024, LLMs acted as advanced code-completion tools, with humans driving step-by-step debugging and execution. Zhipu was an early mover here, releasing the CodeGeeX code model.
b. Phase II (Vibe Coding): By 2025 users no longer needed to know code; they could describe feelings and scenarios to build projects, requiring intent understanding and project-level logic. GLM-4.5 (Jul 2025) natively fused reasoning, coding and Agent capabilities in a 355 bn-parameter MoE, ranking No.1 domestically on 12 benchmarks, No.3 globally and No.1 among open-source; GLM-4.6 followed two months later with 200K-token context, tying for No.1 globally on LMArena’s coding leaderboard. GLM-4.7 (Dec 2025) strengthened planning and tool orchestration, surpassing Claude 3.5 in coding, with API ARR up 10x vs. earlier generations.
c. Phase III (Agentic Engineering): GLM-5 launched in Feb 2026 with a 744 bn-parameter MoE, achieving the top open-source score on SWE-bench Verified and ranking No.4 globally (No.1 among open-source) on AI indices. Within 24 hours it was officially integrated by ByteDance, Alibaba Group, Tencent, Meituan, Kuaishou, Baidu, and WPS. Key innovations included an improved MLA256 to shrink the KV cache, a dynamic absorption attention mechanism that cuts deployment cost by 50%, an asynchronous RL framework, and over 10k verifiable real engineering environments. In Mar 2026, GLM-5 Turbo debuted as the first base model deeply optimized for Agentic Coding; the companion AutoClaw local release surpassed 100k users in two days, and the Claw Plan subscription, priced 20% above the Coding Plan, still faced supply constraints.
d. Phase IV (Long-horizon Tasks): The model will take over complex workflows spanning days to weeks, requiring orchestration of working and long-term memory. Multi-agent collaboration and asynchronous RL are the current focus areas.
e. Phase V (Autonomous OS): LLMs will evolve from dialogue interfaces to operating systems (LMOS), acting as intent schedulers. App stores will be replaced by API stores.
2. Business development
a. MaaS platform: BigModel.cn / Z.AI has served over 4 mn SMEs and developers across 218 countries/regions. Nine of China’s top 10 internet companies are using GLM.
b. Coding Plan: As of Mar 2026, paying developers exceeded 242k across 196 countries. Paid token calls rose 15x within six months of launch.
c. Claw Plan: Launched alongside GLM-5 Turbo, it added 100k users in two days and 400k in 20 days, and ranks top two by calls on OpenRouter. Demand continues to outpace supply.
d. Globalization: A national-level MaaS platform, Z.AI Lab, was launched in Malaysia. From Mar 2026, Zhipu began exploring partnerships with overseas inference platforms to list closed-source models and share revenue per call.
3. Outlook
a. The TAC era (Token Architecture Capability): tokens become a new factor of production, and the ability to mobilize intelligent resources and convert ideas into economic output becomes the core competitive edge. This will define the winners of the next cycle.
b. LMOS evolution: the LLM OS will directly parse fuzzy intent, decompose long-horizon tasks, and orchestrate platform-wide resources. Zhipu aims to make GLM the core engine of an autonomous system, scaling from cloud APIs to native on-device intelligence.
c. Output revolution: exponential token growth raises the revenue ceiling for the industry. Leveraging China’s full-stack advantages in energy, chips, algorithm-hardware fit and IDC ops, Zhipu plans to export tokens globally as a high-quality, cost-competitive factor of production.
2.2 Q&A
Q: Has compute become the core bottleneck to revenue growth? How will you plan and allocate compute going forward?
A: Tight compute is a sector-wide phenomenon rather than company-specific, both domestically and overseas among leading model vendors. Viewed differently, it also signals very strong real demand.
For us, requested concurrency from major platforms and users implies underlying demand at roughly 1–2x our current daily call capacity. With more supply, both call volume and revenue could scale materially.
Near term, we are filling critical gaps via external compute procurement and internal reallocation, while prioritizing higher-value use cases and partners to improve token supply efficiency under tight resources. Overseas, we are testing deployments with local inference platforms and sharing per-call revenue, which both accelerates penetration and partially externalizes compute load.
Longer term, the fundamental fix lies in model–chip co-optimization, including architectural improvements, inference efficiency gains, and low-level co-design and adaptation with domestic chipmakers to keep cutting per-token compute. We are progressing on this front and expect staged results this year. As co-optimization deepens, compute will shift from a hard constraint to an optimizable variable.
Q: Where are Agent products on commercialization? Any revenue contribution and paid conversion proof points? Have high-frequency, must-have use cases emerged?
A: Agent products have moved from early validation to scale-up. For example, the newly launched Claw Plan is seeing exponential growth in both users and calls.
Notably, we raised prices again when launching Claw, and the acceleration came on top of an 83% hike vs. end-Dec 2025. This shows demand is capability-led rather than price-led. Globally, peers like Anthropic show a similar pattern: each jump in model capability unlocks higher-value scenarios, supporting simultaneous growth in price and demand. Thus, the Claw-led Agent ramp is not a low-margin, price-for-volume phase but natural scaling as high-quality models prove value in real use.
Commercially, while Agents are still early in revenue terms at the aggregate level, signs are strong in paid conversion and usage depth in developer-facing scenarios. In complex task settings, users have shifted from novelty-driven trials to sustained reliance, with call frequency and time spent both rising. From coding into copilot workflows, high-frequency needs cluster around development efficiency and process automation.
Beyond consumer developers, the bigger opportunity in 2H should be enterprise. At their core, Agents are a 24x7 long-chain task architecture, and sticky 24x7 workflows are more prevalent in enterprises, which have many automatable, continuously invoked processes.
Over the medium term, as capability rises and usage barriers fall, Agents will evolve from tools to general-purpose productivity, covering more complex, long-duration tasks. In short, Agents are in the early scale-up phase but have validated real demand and willingness to pay, with substantial upside across both consumer developers and enterprises.
Q: Is pricing more cost-plus driven, or value-based around capability uplift? How should we think about margin trajectory?
A: It depends on the end-state of token economics: will tokens become fully commoditized like mobile data, or stratify into differentiated products? We believe pricing will stratify by token quality, so it will not be simple cost-plus.
The market will naturally split into two. One is low-complexity, consumer-grade chat and Q&A, where token prices trend low or even free, with ads as a possible model. The other is high-complexity, high-reliability tokens that solve real productivity problems and embed stronger model capability, carrying clear value creation. This apex layer should confer sustained pricing power and bargaining leverage to providers.
Our pricing reflects some cost factors such as storage, but it is primarily value-based around capability. Strategically, we focus on apex, high-intelligence models, similar to Anthropic globally — building pricing power in high-value use cases by pushing the intelligence frontier, rather than winning on low-price, high-volume competition. Apex demand is less price-sensitive and more demanding on performance and stability.
On cost pass-through, compute does affect model pricing but not linearly. LLM vendors are aggressively optimizing inference, driving down per-token cost, and in high-value use cases customers focus on ROI rather than unit token price. That is why high-priced coding products still see strong adoption by major US tech firms. In short, cost is a floor, not the core driver of pricing.
Take developers: the US has ~4 mn software developers with a ~$130k median salary. Top-tier models price high-end subscriptions at ~$200/month; a developer may subscribe to 2–3 models, spending ~5% of salary, and even at these levels closed-source APIs remain supply-constrained. In China, IT professionals average ~$33k in salary; at a 5% share, that implies ~$138/month. Domestic leaders are priced well below this, leaving ample room to move up. Also, among top models the game is not zero-sum; user overlap is large, suggesting that once you reach the apex, competition hinges more on capability boundaries and UX than on price.
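For illustration, a back-of-envelope sketch of the spend comparison described above, using only the salary and subscription figures quoted in the answer (the 2.5-model midpoint is an assumption for the cited 2–3 range):

```python
# Rough developer AI-spend comparison, using only the figures quoted above (USD).
us_median_salary = 130_000     # US software developer median salary, per year
subscription_price = 200       # high-end subscription, per month
models_subscribed = 2.5        # assumed midpoint of the cited 2-3 models

us_annual_spend = subscription_price * 12 * models_subscribed
print(f"US spend as a share of salary: {us_annual_spend / us_median_salary:.1%}")  # ~4.6%, i.e. roughly 5%

cn_avg_salary = 33_000         # China IT professional average salary, per year
cn_monthly_budget = cn_avg_salary * 0.05 / 12
print(f"Implied China budget at a 5% share: ${cn_monthly_budget:.0f}/month")       # ~$138/month
```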
Q: With price hikes, token consumption and developer counts still grow fast. Is this volume–price rise driven by capability or by ecosystem/customer mix? Is it sustainable?
A: The volume–price rise is not a transient effect; it is driven by both capability improvements and demand-side shifts.
On demand, two changes matter. First, the expansion of application entry points. In late 2024 we staked out coding, when even Anthropic had not fully educated the market and coding still looked like a niche for professional programmers. Over the past two years, coding has become an entry point into enterprise software. As capability rises, more companies use coding as a wedge into broader internal development needs, spanning automation scripts, data processing and system integration, all done in a coding paradigm. This materially widens token use cases and call volumes.
Second, user base expansion. It started with professional developers, then moved to product managers, data analysts and BI-level citizen developers. Now, with Agentic Engineering, even non-developers can accomplish complex tasks via natural language. Fundamentally, capability gains are lowering barriers and democratizing development capacity.
These two shifts reinforce each other and expand demand capacity. Top domestic internet companies and major overseas platforms are integrating our models and deepening usage, and their high bar for performance and stability suggests that the core driver is the intelligence frontier, not price. In this context, price increases have not curbed demand; instead, demand expands alongside price, because apex-quality tokens keep moving into higher-value, more complex tasks that are less price-sensitive and more capability-dependent.
On sustainability, capability is advancing into more complex dimensions, unlocking new use cases, while enterprise adoption is still early with room to grow both users and per-user depth. The volume–price rise reflects the transition from tools to foundational productivity infrastructure.
Q: Given recent price hikes, how sensitive are customers to token usage? Any short-term demand suppression?
A: Our data splits into 2025 and 2026. In 2025 we did not raise prices; the first hikes came around the 2026 Spring Festival. But the Coding Plan’s API pricing in 2025 was already on the high side domestically and still drove nearly 10x ARR growth from Mar to year-end.
In 2026 we only have half a quarter of data — we raised prices in a series of steps from Feb, totaling +83%. Even so, both Claw and legacy Coding products kept strong momentum; Claw hit 100k users in two days and 400k in 20 days, vs. our 242k existing Coding Plan users.
So token consumption and user growth stayed in the fast lane despite higher prices, pointing to demand mix as the key driver. Three reasons. First, token stratification: we do not primarily target low-complexity, standardized scenarios where price is sensitive, while in high-complexity, high-value scenarios clients focus on ROI, so price hikes do not suppress demand. Second, for core key accounts (KA), including nine of the top ten internet companies, higher prices did not dent volumes; as each token's intelligence rises, what 1 mn tokens can accomplish keeps increasing in complexity and value, supporting a higher price band. Third, at the apex, more scenarios unlock and the application boundary expands, lifting the share of high-value tokens.
In short, higher prices create stratification — they curb low-price, high-volume use, but high-value demand is less sensitive. Capability-driven expansion outweighs any price drag, underpinning our view that token consumption and ARR can keep growing over the next year.
Q: You advocate a return to base models, with solid progress in text and coding — a non-consensus approach. What is the strategic trade-off?
A: It is not a binary choice. Over the next 2–3 years, we will lean toward scaling standardized API capabilities.
First, in earlier stages when capability and customer understanding were immature, the industry packaged models into higher value-add products or solutions to match needs and monetize better — also because APIs had yet to scale and unit economics were not fully unlocked. As capability rises and calls scale fast, API-level unit economics improve rapidly; once volumes reach higher tiers, the API itself can deliver significant GP and profitability. In that case, focusing on standardized, high-quality token supply can scale very efficiently.
Second, we are not weakening on-prem deployments; we are redefining their role. We increasingly view enterprise Agents and enterprise-grade general LLM services as acquisition and wedge vehicles: solving specific problems to land quickly, then cross-selling or opening capabilities to shift customers from solutions toward direct token consumption.
Third, structurally, large cloud models keep advancing in parameters and inference efficiency, lowering unit labor and compute costs while extending capability boundaries. This challenges the economics of local models. Fundamentally, the industry is shifting to intelligent output from the cloud, delivered via standardized interfaces. Providers of stable, high-quality token supply are best placed to sit at the center of the stack.
We therefore see open platforms and APIs as the core vehicles for revenue and margin over the next 2–3 years, with Agent solutions as important complements for customer acquisition and scenario validation. The emphasis will tilt toward standardized capability output.
Q: How will you sustain an industry-leading R&D cadence and keep pushing the capability frontier over the next 1–2 years?
A: Our lead in intelligence and iteration cadence is not from a single breakthrough, but from several long-term capabilities working together.
First, long-term strategic focus. We have targeted AGI from the outset, maintaining steady investment and focus on the chosen technical path. Commercialization follows technical progress, rather than constraining the tech roadmap around a pre-set business model, enabling continuous and forward-looking capability breakthroughs.
Second, a distinctive talent system built over time. The core team has deep academic roots in LLMs and maintains strong ties with top domestic universities, attracting and nurturing top young AI talent early. Deep industry–academia integration keeps the talent pipeline full.
Third, compounding first-mover advantages. As capability grows and we accumulate data, training methods and engineering systems, R&D efficiency rises, shortening optimization cycles and increasing the frequency of minor releases.
Sustainability looks solid over the next 1–2 years. The capability frontier is still expanding rapidly and the tech curve is steepening. Differences among leaders increasingly lie in systems-level strengths such as engineering infrastructure, data assets and talent density, which are durable. Overall, our approach is closer to Anthropic — led by the capability frontier and technology — rather than competing on price or a single product form.
Risk disclosure and disclaimer: Dolphin Research Disclaimer and General Disclosure
