With an initial stock of 500,000 units, who is the Doubao AI phone going to be sold to?

Wallstreetcn
2025.12.01 11:50

ByteDance's Doubao team has released a technical preview version of the Doubao Phone Assistant and, in collaboration with ZTE, launched the engineering prototype nubia M153, priced at 3,499 yuan. The Doubao Phone Assistant is built on the Doubao large model and relies on authorization from phone manufacturers, aiming to reconstruct the interaction logic of the mobile internet. The initial sales stock is 500,000 units, with the goal of moving from a geek toy to a broader user market.

On December 1st, ByteDance's Doubao team released the technical preview version of the Doubao Phone Assistant.

According to reports, the Doubao Phone Assistant is AI assistant software based on the Doubao app and developed in collaboration with smartphone manufacturers at the operating system level. Leveraging the capabilities of the Doubao large model and authorization from smartphone manufacturers, the Doubao Phone Assistant can provide users with more convenient interactions and richer experiences.

At this stage, developers and tech enthusiasts can try the technical preview version of the Doubao Phone Assistant on the engineering prototype nubia M153, developed in collaboration with ZTE. The device is currently on limited sale to this audience, priced at 3,499 yuan.

The Doubao Phone Assistant aims to use AI Agents to bridge the gaps between apps, reconstructing the interaction logic of the mobile internet.

Although the current demonstration still carries a disclaimer about "technical uncertainty," this attempt to reach into the underlying operating system and pursue "intent-driven services" may be more innovative than a simple chatbot.

Doubao Phone Design | Image Source: Doubao Official

Perhaps, whoever can first solve the stability issue of "operating a phone" will define the "iPhone moment" of the AI era.

Previously, according to a former hardware product manager at ZTE, ByteDance and Nubia have prepared a first-sale stock of 500,000 units for this phone and have ordered the corresponding number of key components.

In the current smartphone market, mainstream flagship models from domestic brands typically have a first-sale stock of 2-3 million units. Therefore, while the 500,000 units for the Doubao Phone may not compare to flagship phones from leading manufacturers with annual shipments exceeding 10 million, the goal of the Doubao Phone moving away from being a "geek toy" towards a broader user market is already quite clear.

A first-sale stock of 500,000 units, if fully released to the market, is still large enough to have some impact on the industry. For comparison, Black Shark, the leading player in the gaming phone vertical, shipped 1-1.5 million smartphones in 2022-2023.

01 From "Dialogue Box" to "Action-Oriented"

In the past two years, we have become accustomed to chatbots that can write poetry and draw pictures, but for ordinary users, the biggest pain point on their phones is often the cumbersome operation flow. The highlight of the Doubao Phone Assistant this time lies in its attempt to leap from "dialogue" to "action."

In the demonstration of the technical preview version, Doubao showcased a capability often mentioned in previous GUI Agent research: it can "understand" the screen like a human and directly simulate click operations. This ability to "understand the screen" and simulate human operations comes from the accumulated multimodal capabilities of the Doubao large model.

According to official sources, the model's performance in visual understanding, reasoning, and image creation is already in the international top tier. Because the model recognizes graphical user interfaces (GUIs) accurately, it scores highly on multiple authoritative evaluations and can understand what "buttons" and "input boxes" mean the way a human does, rather than just recognizing a bunch of code.

According to the official user documentation for the Doubao phone, Doubao automatically determines whether to invoke its AI Agent capabilities based on the user's intent. If the user's message begins with "help me operate my phone," it will always complete the task by having the AI operate the phone.

The more detailed the task description, the more efficiently and effectively the task is executed. For example: "Open Meituan Waimai and help me write positive reviews for my recent orders." Additionally, the AI operates the phone through a virtual screen, which by default does not open in the foreground and does not affect other ongoing tasks; you can return to the desktop and use other applications at any time.

Users can also simply talk to Doubao and state their needs, and Doubao will decide on its own whether to complete the task by operating the phone. Alternatively, users can tap the "Operate Phone" button at the bottom of the Doubao dialogue box to describe their needs manually or to set task conditions such as a scheduled time.
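The routing rule described in the user documentation can be pictured with a minimal sketch. This is an illustration only, not ByteDance's implementation: the function names, the return labels, and the keyword check standing in for the model's intent judgment are all assumptions made for the example.

```python
# Illustrative sketch only; every name here is hypothetical.
TRIGGER_PREFIX = "help me operate my phone"


def looks_like_phone_task(utterance: str) -> bool:
    """Hypothetical stand-in for the model's own intent judgment."""
    keywords = ("open ", "order ", "book ", "add to cart")
    text = utterance.lower()
    return any(keyword in text for keyword in keywords)


def route(utterance: str) -> str:
    """Return "operate_phone" or "chat" for a user message."""
    text = utterance.strip().lower()
    # Documented rule: the explicit prefix always triggers phone operation.
    if text.startswith(TRIGGER_PREFIX):
        return "operate_phone"
    # Otherwise the assistant decides based on the inferred intent.
    return "operate_phone" if looks_like_phone_task(text) else "chat"
```

In this sketch, a message that starts with the trigger phrase skips the intent check entirely, which mirrors the documentation's statement that such requests are always handled by phone operation.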

Imagine you come across a product on social media and are tempted to buy it. In the past, you would need to take a screenshot, exit the app, open an e-commerce platform, search, and compare prices.

However, in Doubao's demonstration, you only need to say, "Help me compare prices and place an order across all platforms," and the AI can automatically jump across applications, search for the same product, compare prices and specifications, collect coupons, and even help you select the lowest-priced item to add to the shopping cart.

Image source: Doubao mobile user guide document

Although the payment step still requires manual confirmation for security reasons, the series of mechanical clicks and switches has already been handled by AI.

Even complex tasks can be executed. In an official demonstration of travel planning, when a user issues a command like "Next month I'm going to Paris, help me mark the restaurants I've saved on the map, see which day has an exhibition and book tickets," the AI can quickly break down the request into six sub-tasks: from checking social media favorites, to marking on Amap, to booking tickets on Ctrip, and finally organizing everything into a memo.

This ability to execute "task chains" across applications and multiple steps can be considered one of the key milestones in AI's transition from "toys" to "tools."

To achieve this "human-like" interaction, Doubao has opened up multiple permissions at the system level.

The Doubao phone offers several ways to reach these AI capabilities: users can wake the assistant through the side button, by voice, or even through headphones, and in the photo album it can directly understand and execute commands like "remove the passerby."

Image Source: Doubao Mobile User Guide Document

In the more complex "Pro Mode," it can also call system tools, combine memory functions, and directly complete complex tasks such as "recommending gifts and adding them to the shopping cart," which require multi-step reasoning.

Image Source: Doubao Mobile User Guide Document

Of course, handing over screen control and personal preferences to AI always raises concerns about privacy and security. Therefore, the Doubao team also emphasizes that this feature supports on-demand activation and promises to strictly protect data privacy.

Because this is a "technical preview version," the Doubao team also specifically notes at the end of the video that, due to the uncertainty of large model technology, the "smooth" experience demonstrated cannot yet be fully replicated, and the product is still some distance from the team's final expectations.

This also reflects the most realistic state of AI Agents at present: the direction is extremely attractive, but implementation still requires time for refinement.

02 The "Third Path" of Not Creating Hardware

In the wave of AI smartphones, there have always been two schools of thought. One, like Google with its Pixel phones, develops its own models and a complete set of AI software experiences and integrates them into its own system; the other consists of pure software vendors trying to seize entry points through super apps.

Image Source: Google

Doubao has chosen the third path: not making hardware, but creating an ecosystem.

While releasing the preview version, Doubao clearly stated that it has "no plans to develop its own mobile phones." Its strategy is very pragmatic: by negotiating with multiple mobile phone manufacturers, it aims to embed Doubao's large model capabilities into different brands' devices through "operating system-level cooperation."

This deep coupling of "mobile phone manufacturers + large model manufacturers" is becoming a new trend in the industry.

Just like the collaboration between Google Gemini and Samsung, specialization is gradually becoming a consensus.

For mobile phone manufacturers, building a model from scratch with top-level reasoning, visual understanding, and complex task planning capabilities is extremely costly. For internet giants like ByteDance, lacking a hardware carrier means that AI will always be separated from users by the glass wall of an app, unable to reach their most important data and scenarios.

The current nubia M153 engineering prototype is just a start. The price of 3,499 yuan may be more of an "invitation" aimed at developers and geek audiences, intended to validate the technical feasibility of this cross-industry collaboration and gather user feedback.

03 Just Having an App Is No Longer Enough in the AI Era

The emergence of the Doubao Phone Assistant may essentially be a reconstruction of the interaction logic of the mobile internet.

As the capabilities of large models become stronger, simply having an app is no longer sufficient in the AI era.

AI Agents need to take on more complex tasks, perceive richer contexts, and perform real functions to have more practical value. This means they must step outside the walls of software and integrate deeply with the underlying permissions of the operating system and with hardware capabilities.

In the past, ByteDance has always been a powerful "air force," possessing top-tier algorithms and a vast application ecosystem. However, compared to Google, which has Android, or Huawei, which has a full-scenario terminal lineup, ByteDance has always lacked a grounded "territory" in operating systems and terminal hardware.

In the mobile internet era, this may not have been a problem, but in the current situation where AI needs to deeply intervene in user scenarios, the lack of hardware carriers may mean a loss of perceptual ability in those scenarios.

The launch of the Doubao Phone Assistant appears to be an exploratory move by ByteDance at this stage.

From Pico to Ola Friend, and now to the assistant deeply integrated into the mobile OS layer, ByteDance is cautiously addressing the shortcoming of "hardware touchpoints."

This may not be the final form the industry takes over the next two to three years, but one thing is clear: ByteDance has realized that to make AI truly practical, it must take this crucial step of "combining software and hardware."

Source: Geek Park

Risk Warning and Disclaimer

The market has risks, and investment requires caution. This article does not constitute personal investment advice and does not consider the specific investment goals, financial conditions, or needs of individual users. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investing based on this article is at one's own risk.