---
title: "Meta released Muse Spark: The Chinese dream team rebuilds the ruins, and the one who hates Llama the most is indeed Mark Zuckerberg himself"
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/282118707.md"
description: "Meta has released its first model, Muse Spark, marking a comprehensive restart for the company after the collapse of Llama. Zuckerberg dismantled the old team and formed a new AI research and development team primarily composed of Chinese scientists, overturning the technical architecture of the Llama era. Muse Spark is a lightweight multimodal reasoning model with native multimodal capabilities, able to think in visual space and construct relationships between visual elements"
datetime: "2026-04-09T01:45:30.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/282118707.md)
  - [en](https://longbridge.com/en/news/282118707.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/282118707.md)
---

# Meta released Muse Spark: The Chinese dream team rebuilds the ruins, and the one who hates Llama the most is indeed Mark Zuckerberg himself

After Llama's complete "collapse," Meta founder and CEO Mark Zuckerberg personally dismantled the old teams and structures, went fully "anti-Llama," and invested billions to build an AI research team staffed primarily by Chinese scientists. Nine months later, under the gaze (and no small amount of mockery) of Silicon Valley, he and this new team have finally delivered their first model, attempting to prove that a complete AI stack built from scratch actually works.

On April 8, Meta officially released Muse Spark, the first model since the establishment of MSL (Meta Superintelligence Labs). Nine months ago, Alexandr Wang joined Meta as Chief AI Officer, bringing with him a group of core researchers from OpenAI and overhauling the entire technology stack of the Llama era: new infrastructure, new architecture, new data pipelines, all built from the ground up.
Muse Spark is the first output of this new stack, and it is already live powering Meta AI. With Llama 4 still on the back foot over benchmark-gaming allegations, this is a comprehensive restart for Meta.

## What is Muse Spark

It is a model designed to be the opposite of Llama: a deliberately compact, lightweight, fast-responding, natively multimodal reasoning model, and a closed-source one. First, its core capabilities:

- **Native multimodality:** It is not a "stitched" architecture in which a visual encoder is bolted onto a text model. From the pre-training stage, text, images, and speech are trained in the same high-dimensional feature space. This means it processes images without first translating them into text descriptions, extracting information directly at the pixel level.
- **Visual Chain of Thought (VCoT):** Traditional chain-of-thought reasoning is purely textual; the model decomposes a problem step by step in text. Muse Spark brings the same mechanism into the visual space: it can "think" within images, autonomously constructing spatial and logical relationships between visual elements.
- **Contemplating Mode:** A maximum-effort reasoning mode comparable to Gemini Deep Think and GPT Pro. The difference is that it does not reason in a single serial thread; instead, it activates multiple parallel sub-agents in the background, each handling a different dimension of the task, with a main control system merging the results. In Contemplating Mode, Muse Spark scores 58% on Humanity's Last Exam and 38% on FrontierScience Research.
- **Tool invocation and multi-agent orchestration:** Natively supported, not bolted on later.

Muse Spark is now live on meta.ai and the Meta AI app, with Contemplating Mode being rolled out gradually and a private API preview open to a small number of partners.
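Meta has not published how Contemplating Mode orchestrates its parallel sub-agents. As a rough illustration of the fan-out/merge pattern described above, here is a minimal sketch in Python; `sub_agent`, `contemplate`, and the dimension names are all hypothetical, not Meta's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sub-agent: analyzes one dimension of a task and returns a
# (dimension, finding) pair. All names here are illustrative, not Meta's.
def sub_agent(dimension: str, task: str) -> tuple[str, str]:
    return (dimension, f"analysis of '{task}' along the {dimension} axis")

def contemplate(task: str, dimensions: list[str]) -> dict[str, str]:
    """Fan out one sub-agent per dimension in parallel, then merge results."""
    with ThreadPoolExecutor(max_workers=len(dimensions)) as pool:
        # The "main control system" here is simply a dict merge of findings;
        # a real system would reconcile or vote over the sub-agents' outputs.
        return dict(pool.map(lambda d: sub_agent(d, task), dimensions))

report = contemplate("summarize the chart", ["spatial", "numeric", "textual"])
print(sorted(report))  # ['numeric', 'spatial', 'textual']
```

The point of the pattern, per the charts discussed later in the piece, is that the sub-agents run concurrently, so total latency is bounded by the slowest agent rather than the sum of all of them.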
## Technical Highlights: What the Chinese Team is Saying

Today the MSL team posted on X almost in unison, and several pieces of information are worth noting.

Meta's official blog released an extremely important data point: during the pre-training phase, the compute required for the new stack to reach equivalent capability levels has dropped by more than an order of magnitude compared to the previous-generation Llama 4 Maverick. That is not a percentage-level optimization but an efficiency gain of over 10x. The blog states "over an order of magnitude less compute" and calls the model "significantly more efficient than the leading base models available for comparison," i.e. more efficient even than other companies' base models.

The most important sentence in Alexandr Wang's nine-post thread is: "we saw predictable scaling across pretraining, RL, & test-time reasoning." Predictable scaling was observed along all three axes: pre-training, reinforcement learning, and test-time reasoning. This may matter more than any benchmark number, because it means the stack is not a lucky shot but a system with a smooth scaling curve.

Chief Scientist Zhao Shengjia (@shengjia_zhao) gave a more concrete description: the model's training path is "end-to-end education": school (pre-training), homework (RL), and on-the-job training (continuous learning after product deployment). He emphasized, "we just got started."

There is an interesting technical detail in the RL section. Bi Shuchao (@shuchaobi) described the most painful part of training: the instability of large-scale RL and "fighting reward hacking," i.e. counteracting models that cheat the reward mechanism. The official blog, however, shows that they ultimately got RL to a state of "smooth, predictable gains," with both pass@1 and pass@16 growing log-linearly and generalizing smoothly to unseen evaluation sets.
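The pass@1 and pass@16 figures above are standard sampling metrics: the probability that at least one of k sampled answers to a problem is correct. Meta has not said how it computes them, but the unbiased estimator widely used in code-generation evaluations, given n samples per problem of which c pass, can be sketched as follows (the function name is ours, not Meta's):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k: the chance that at least one of k samples,
    drawn without replacement from n generations (c of which are correct),
    passes. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so every draw contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(16, 4, 1))   # 0.25 (4 of 16 samples pass)
print(pass_at_k(16, 4, 16))  # 1.0  (drawing all 16 guarantees a pass)
```

Log-linear growth in both pass@1 and pass@16, as the blog claims, would mean the gains come from the model genuinely solving more problems, not merely from sampling more widely.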
Even more interesting is a "phase transition" that occurred during RL training. The team introduced a thinking-time penalty: the model initially improved by thinking longer, then under the pressure of the penalty learned "thought compression," solving the same problems with fewer tokens, and then extended its reasoning again to reach even higher performance. Ananya Kumar (@ananyaku) called this process "pretty neat" in a post.

Another set of charts released by Ananya shows the key insight behind multi-agent reasoning: multiple agents reasoning in parallel can reach higher performance than a single agent at the same latency. In other words, Contemplating Mode is not just "letting the model think longer" but "letting multiple models think about different things simultaneously."

Yu Jiahui (@jhyuxm), chief architect of the multimodal foundation, said something telling: "It's been a fulfilling journey not just building the model, but the team and culture behind it." Building a model is one thing; building a team and culture is another. They did both in nine months.

Jason Wei (@_jasonwei) has the most vivid memory: "In the first week, we had a long dinner in the cafeteria, discussing research directions, and then returned to the table to write a basic inference llama script. Now we have a fairly complete tech stack, and the first model has been released."

## Benchmark: What leads and what doesn't, back to the table first

Let's look at the benchmark data:

- **HealthBench Hard** (extremely difficult medical Q&A): Muse Spark 42.8, versus 40.1 for GPT-5.4, only 20.6 for Gemini 3.1 Pro, and only 14.8 for Claude Opus 4.6. A clear lead, two to three times the scores of Gemini and Claude.
- **CharXiv Reasoning** (deep understanding of scientific-paper charts): 86.4, the highest in the industry.
- **SWE-bench Pro** (real software-engineering tasks): 55.0%, surpassing Claude Opus 4.6's 51.9%.
- **Artificial Analysis Intelligence Index** (composite): 52 points, versus 57 for both GPT-5.4 and Gemini 3.1 Pro.

The picture Meta wants to paint: Muse Spark is indisputably first in the two domains that require "truly understanding the image," medical multimodality and scientific-chart understanding, and it has entered the first tier in software engineering. Its overall capability, however, still trails GPT-5.4 and Gemini 3.1 Pro by 5 points, and it has not yet shaken Anthropic's and Google's lead in advanced text reasoning.

That performance has drawn criticism. Ndea co-founder François Chollet flatly called Muse Spark "a disappointing model," arguing that it over-optimized for public benchmarks at the expense of practical usability. Alexandr Wang's response was restrained: he acknowledged that the model performed poorly on evaluations like ARC-AGI-2 and emphasized that this data was proactively disclosed.

Chollet's doubts are not baseless. In the Llama 4 era, Meta's reputation took a hit from a benchmark-gaming scandal. This time, Muse Spark still trails GPT-5.4 and Gemini 3.1 Pro by five points on the Artificial Analysis composite index even as it leads on medical and research-chart benchmarks. Whether that split reflects targeted optimization for specific benchmarks or genuine capability from the native multimodal architecture is a question only more independent third-party testing can answer.

Muse Spark matters, but its most significant meaning does not lie in today's benchmark scores. From the model's design to the technical highlights the researchers emphasized, everything points to a rejection of Llama: the failure of Llama 4 is a page Zuckerberg wants to turn completely.
Therefore not only did the open-source route have to change; the model architecture had to be revised, and, more importantly, the entire training infrastructure had to be rebuilt. The core authors' posts on X all seem to revolve around this reconstruction of the underlying tech stack.

The release of Muse Spark also clarifies why Zuckerberg brought in Alexandr Wang. The one who hates Llama the most is Zuckerberg himself; he had to tear it down completely and rebuild from the ruins.

This release is also the first model delivered by the Chinese team Meta recruited: Yu Jiahui (former head of the perception team at OpenAI and core developer of GPT-4o), Zhao Shengjia (former leader of synthetic-data research at OpenAI and co-creator of ChatGPT), Ren Hongyu (former core contributor to OpenAI's o1/o3 reasoning), Bi Shuchao (former head of multimodal post-training at OpenAI), and Lin Ji (former core optimization expert at OpenAI). These scientists, lured to Meta with signing bonuses reportedly topping a hundred million dollars, form a star team on paper. Their first job was to use a model to bring Meta back to the table; that was Zuckerberg's most urgent priority.

Nine months ago, Zuckerberg handed them a blank slate. Today, the answer they delivered is less a single model than a complete stack spanning pre-training, RL, and test-time reasoning, and, crucially, its scaling curve is smooth and predictable.
A larger model is already on the way.

### Related Stocks

- [METU.US](https://longbridge.com/en/quote/METU.US.md)
- [IXP.US](https://longbridge.com/en/quote/IXP.US.md)
- [CLOU.US](https://longbridge.com/en/quote/CLOU.US.md)
- [METW.US](https://longbridge.com/en/quote/METW.US.md)
- [FDN.US](https://longbridge.com/en/quote/FDN.US.md)
- [XLC.US](https://longbridge.com/en/quote/XLC.US.md)
- [IDGT.US](https://longbridge.com/en/quote/IDGT.US.md)
- [XSW.US](https://longbridge.com/en/quote/XSW.US.md)
- [METD.US](https://longbridge.com/en/quote/METD.US.md)
- [FBL.US](https://longbridge.com/en/quote/FBL.US.md)
- [IGV.US](https://longbridge.com/en/quote/IGV.US.md)
- [FCOM.US](https://longbridge.com/en/quote/FCOM.US.md)
- [XDAT.US](https://longbridge.com/en/quote/XDAT.US.md)
- [VOX.US](https://longbridge.com/en/quote/VOX.US.md)
- [DAT.US](https://longbridge.com/en/quote/DAT.US.md)
- [META.US](https://longbridge.com/en/quote/META.US.md)
- [DTCR.US](https://longbridge.com/en/quote/DTCR.US.md)

## Related News & Research

- [Meta Ditches Llama, Analyst Expects Muse Spark To Revamp AI Roadmap](https://longbridge.com/en/news/282224416.md)
- [KeyBanc Analyst Slashes Meta Platforms Stock Forecast to $760 as 'Llama 4' AI Costs Start to Bite](https://longbridge.com/en/news/282178006.md)
- [Meta rides first major AI launch under Alexandr Wang to 9% stock surge](https://longbridge.com/en/news/282176632.md)
- [Meta transfers top engineers into new AI tooling team](https://longbridge.com/en/news/282256242.md)
- [BREAKINGVIEWS-Meta ignites a spark of big-spending AI hope](https://longbridge.com/en/news/282232885.md)