---
title: "Tencent HY-WU aims to break through the model ceiling: allowing the model to generate a new brain for each task"
type: "News"
locale: "en"
url: "https://longbridge.com/en/news/278248938.md"
description: "Tencent's Hunyuan team released a technical report, HY-WU, aiming to break through the capability limitations of current large models, arguing that a fixed set of parameters cannot meet the demands of diverse and contradictory tasks. Despite the AI industry's massive investment in training large models, a model must still compromise within one fixed set of parameters when handling user requests, degrading performance. The report proposes a new paradigm that could change how large models are trained and applied."
datetime: "2026-03-08T13:20:30.000Z"
locales:
  - [zh-CN](https://longbridge.com/zh-CN/news/278248938.md)
  - [en](https://longbridge.com/en/news/278248938.md)
  - [zh-HK](https://longbridge.com/zh-HK/news/278248938.md)
---

# Tencent HY-WU aims to break through the model ceiling: allowing the model to generate a new brain for each task

Have you ever found that the same model others praise as remarkably useful falls short of expectations when you use it? When GPT-5 was first released, it led the benchmarks across the board, yet a large number of users complained that it lacked a human touch: its writing was stiff, and it was less comforting after a breakup than the older GPT-4o. Heavy users went so far as to say it "isn't far from becoming a rock." OpenAI's response was to train several models, including ones for coding, general capabilities, and conversation.

Behind this lies a fundamental issue: **a single set of parameters cannot do everything well.** In the past three years, the AI industry has spent hundreds of billions of dollars training large models, with parameter counts climbing from tens of billions to hundreds of billions.
However, few people have stopped to consider that no matter how large the model is, after fine-tuning it uses the same fixed set of parameters for every user request. When tasks multiply and pull in conflicting directions, that one set of parameters is forced to compromise between them, so every task takes a hit.

On March 6, the Tencent Hunyuan team published a technical report, HY-WU, that takes aim at the ceiling limiting today's large models: when tasks are sufficiently diverse, or even contradictory, no single set of parameters can perform all of them well at once. This is a structural dead end, unrelated to how adequately the model was trained. If their solution holds up, a new paradigm for large models may emerge.

## A single set of parameters cannot serve everyone

A pre-trained large model is a jack-of-all-trades: it understands a bit of everything but lacks precision on specific tasks. To improve performance, it must be retrained on task-specific data, a process known as fine-tuning. Full fine-tuning adjusts all parameters, which is costly. LoRA, introduced in 2022, took a different approach: it leaves the original parameters untouched, adds a small set of new parameters alongside them, and trains only that set. The new parameters amount to less than 1% of the original model, yet the effect comes close to full fine-tuning, and it quickly became an industry standard.

However, neither LoRA nor full fine-tuning changes one fact: **after tuning, the parameters are fixed, and all requests share the same set.** If you have experience with AI image generation, you know that each run requires loading the corresponding LoRA, and choosing the wrong one can easily produce bizarre images.
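The LoRA idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any real model: the matrix sizes are made up, and `forward` stands in for one linear layer of a network. A frozen base weight `W` is augmented by a trainable low-rank pair `B·A`, and because `B` starts at zero the adapter is initially a no-op.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank = 2048, 8                          # rank << d, so the adapter is tiny

W = rng.standard_normal((d, d))            # frozen base weight, never updated
A = rng.standard_normal((rank, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, rank))                    # zero init: adapter starts as a no-op

def forward(x):
    # Base path plus low-rank adapter path; only A and B would be trained.
    return W @ x + B @ (A @ x)

ratio = (A.size + B.size) / W.size
print(f"adapter parameters: {ratio:.2%} of the base weight")  # well under 1%
```

At this size the adapter is 2·8·2048 parameters against 2048² in the base, about 0.8%, which matches the "less than 1%" figure cited above.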
Hunyuan gives a more extreme example in the report: a model may need to handle both "restoring old photos" and "aging new photos," where the former makes the blurry clear and the latter makes the clear blurry. A single fixed set of parameters trying to learn both tasks ends up compromising on each. To validate this hypothesis, the report ran a gradient analysis over 60 editing tasks and 12,000 samples, and the results confirmed it: **different tasks often require opposite adjustments to the parameters, and forcing them into a single set makes the adjustments cancel each other out.**

So should we train a separate set of parameters for each task? That avoids the conflicts but leads to over-specialization, and task demands are endless; matching each one with its own parameters would be unsustainable in storage and management cost. Retrieval-augmented generation (RAG) and similar methods do not help either: they can change what the model "sees," but not how the model "processes" information. When the core of a task is a change of rules rather than missing facts, adding more context is useless.

Traditional methods treat adaptation as "finding an optimal point in parameter space," but when tasks are diverse and contradictory, that point does not exist.

## On-site Parameter Generation

Let's look at how Hunyuan's HY-WU does it. Traditional solutions all rely on "static parameter memory": new knowledge is compressed into a fixed point in parameter space, and all requests share that point during inference. HY-WU changes the form of memory itself, calling its approach functional memory: **rather than searching for a fixed point in parameter space, it trains a parameter generator that synthesizes an exclusive set of parameters in real time for each input, discarding them after use.** What the model remembers is not a fixed set of weights but the mapping of "what kind of weights should be generated under what conditions."
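The cancellation effect behind the gradient analysis can be illustrated with a toy calculation. The vectors below are synthetic stand-ins, not the report's actual gradients: when two tasks' gradients on a shared parameter vector point in nearly opposite directions, their average has almost no magnitude left, so neither task gets its adjustment.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for per-task gradients on one shared parameter vector:
# "restore" wants to sharpen, "age" wants roughly the opposite adjustment.
rng = np.random.default_rng(1)
g_restore = rng.standard_normal(1000)
g_age = -g_restore + 0.1 * rng.standard_normal(1000)

sim = cosine(g_restore, g_age)
print(f"gradient cosine similarity: {sim:.2f}")  # strongly negative: conflict

# Averaging conflicting gradients nearly cancels the update.
g_avg = (g_restore + g_age) / 2
shrink = np.linalg.norm(g_avg) / np.linalg.norm(g_restore)
print(f"averaged-gradient norm vs single-task norm: {shrink:.2f}")
```

A shared parameter update built from such an average moves almost nowhere, which is the structural dead end the report describes.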
Take image editing as an example: when the model receives a request to restore an old photo, it generates parameters that sharpen details and boost saturation; when it receives a request to age a photo, it generates the opposite parameters. Specifically, HY-WU works in three steps. For ease of understanding, think of HY-WU as a tailor, cutting custom parameters for each request.

**Step 1: Measurement.** A vision-language encoder looks at the input image and the text instruction together, establishing two things: what the image looks like and what the user wants done to it. This information is compressed into a set of conditional features, the equivalent of the customer's measurements and style preferences.

**Step 2: Tailoring.** The conditional features are fed into an 8B-parameter Transformer. This Transformer differs from the usual kind: it outputs neither text nor images but a complete set of LoRA weights, totaling 0.72B parameters. Think of it as computing a cutting plan on the spot from the measurements. Given a "restore old photo" request, it tailors parameters that enhance detail; given an "age the photo" request, it tailors parameters in the completely opposite direction. On an 80B base model, the whole process takes only a few seconds.

**Step 3: Fitting.** The generated LoRA is inserted into the base model to perform the edit. The base model stays unchanged; each inference only temporarily swaps in one set of LoRA weights and discards it afterward.

HY-WU also solves an engineering challenge: the LoRA matrices at each layer of the base model have different shapes. The paper designs an anchoring-and-chunking scheme based on the LoRA rank that unifies matrices of different shapes into same-size tokens, letting the generator emit parameter blocks one by one, the way it would process a text sequence.
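One way to picture the chunking idea is the rough sketch below. The fixed token width, the flatten-and-pad rule, and the layer names are all assumptions for illustration; the paper's actual anchoring scheme is keyed to the LoRA rank and may differ. The point is only that matrices of any shape become a sequence of equal-size tokens that a Transformer can emit one by one.

```python
import numpy as np

TOKEN_SIZE = 64  # hypothetical fixed token width, not the paper's actual value

def matrix_to_tokens(m, token_size=TOKEN_SIZE):
    """Flatten a weight matrix and cut it into equal-size chunks ("tokens"),
    zero-padding the tail so every token has the same length."""
    flat = m.ravel()
    pad = (-flat.size) % token_size
    flat = np.concatenate([flat, np.zeros(pad)])
    return flat.reshape(-1, token_size)

def tokens_to_matrix(tokens, shape):
    """Invert the chunking: drop the padding and restore the original shape."""
    n = int(np.prod(shape))
    return tokens.ravel()[:n].reshape(shape)

# LoRA factors for two layers with different shapes (names are made up).
rng = np.random.default_rng(2)
layers = {"attn.A": rng.standard_normal((8, 320)),
          "mlp.B": rng.standard_normal((1280, 8))}

for name, m in layers.items():
    t = matrix_to_tokens(m)
    print(name, m.shape, "->", t.shape)  # every token row has the same width
```

Whatever the layer's shape, the generator only ever deals in 64-wide tokens, and the mapping back to each layer's matrix is exact.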
With the architecture in place, the next question is how to train this generator (the tailor). Earlier hypernetwork methods were somewhat like having 100 tailors each make a sample garment, collecting the garments as templates, and then training a new tailor to imitate the templates. HY-WU skips the template-collection step. Training is end-to-end: the generator produces a set of LoRA weights from the input, the base model applies them to perform the edit, the result is evaluated, and the feedback adjusts the generator. There is no need to pre-collect checkpoints or maintain a library of LoRA weights. Over millions of iterations, the generator gradually works out, starting from random outputs, what parameters to generate for each kind of input.

## How effective is HY-WU?

In human preference evaluations (GBS), HY-WU's win rate against mainstream open-source image editors ranges from 67% to 78%. It also holds an edge over closed-source commercial models, with a 55.6% win rate against Seedream 4.5 and 55.5% against GPT Image 1.5, and trails only the Nano Banana series slightly.

Beyond the scores, one question needs answering: where does HY-WU's improvement come from? Is it the extra parameters added by the 8B generator, or the mechanism of "customizing parameters to the input"? The paper designed two experiments to tease this apart.

**The first experiment** averaged the LoRA weights the generator produced over a large number of samples into a single "uniform LoRA," then fixed this uniform LoRA for all requests. The generator was still there and the parameter count unchanged, but every request received the same LoRA. This is like keeping the tailor on staff but handing every customer the same size. The result: performance immediately dropped back to the baseline, as if HY-WU were not there at all.
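The end-to-end loop can be miniaturized into a toy sketch. Everything here is an illustrative assumption, from the 2×2 "base model" to the linear "generator" trained by hand-derived gradient descent; the real system trains an 8B Transformer on editing quality. The shape of the loop is the same, though: generate per-request parameters from a condition, evaluate the edit, and feed the error back into the generator, with no LoRA library stored anywhere.

```python
import numpy as np

rng = np.random.default_rng(3)

# A frozen 2x2 "base model" and two conflicting tasks: task 0 wants the
# base weights shifted by +D, task 1 by -D (opposite adjustments).
W = rng.standard_normal((2, 2))
D = np.array([[1.0, 0.0], [0.0, -1.0]])
deltas = {0: D, 1: -D}

# Generator: a linear map from a one-hot task condition (2,) to a
# flattened weight delta (4,).
G = np.zeros((4, 2))
lr = 0.02

for step in range(4000):
    task = step % 2
    c = np.eye(2)[task]                  # condition features ("measurement")
    x = rng.standard_normal(2)           # one input example
    y = (W + deltas[task]) @ x           # the edit we want

    delta = (G @ c).reshape(2, 2)        # generate per-request parameters
    err = (W + delta) @ x - y            # evaluate the edit
    grad_delta = 2 * np.outer(err, x)    # d(loss)/d(delta) for squared error
    G -= lr * np.outer(grad_delta.ravel(), c)  # feed the error back

# The trained generator now emits opposite parameter sets for the two tasks.
for task in (0, 1):
    print(f"task {task} delta:\n{(G @ np.eye(2)[task]).reshape(2, 2).round(2)}")
```

Despite the two tasks demanding opposite adjustments, the generator learns both, because each condition gets its own output rather than one compromise point.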
**The second experiment** let the generator work as usual but randomly scrambled the input conditions, generating LoRA weights from image A paired with instruction B. The generator was still generating dynamically, but the parameters no longer matched the actual input, like a tailor who takes one customer's measurements and cuts the garment for another. Performance was similarly poor. Together, the two experiments **verify that the quantity of parameters is not the key; what matters is that each input obtains the set of parameters matched to it.**

## The Next Paradigm Shift in Model Development?

Looking back at the history of large models, few technical milestones have truly changed the industry's direction. The Transformer architecture laid the foundation in 2017. LoRA solved the cost of fine-tuning in 2022, so adapting large models was no longer the preserve of big companies. MoE broke the constraint that "more parameters means slower inference" with a routing mechanism that activates only part of the model while keeping the parameter count large. Chain-of-thought let models learn step-by-step reasoning, producing the breakthroughs in math and programming of the o1 and R1 series.

These technologies share a commonality: each answers "how to build" or "how to think" for a model. One question, however, has gone unaddressed: once a model is built, how can the same set of parameters give differentiated, optimal responses to different users and tasks? The industry's default answer has been to train more models: big companies now ship more variants than anyone can keep track of, and the open-source community has piled up tens of thousands of LoRA weight sets. HY-WU targets exactly this gap. Where MoE routes inside the model, HY-WU routes outside it.
Of course, it is still too early to say HY-WU will match the industry impact of MoE or chain-of-thought; so far it has only been validated on image editing. The authors also propose several directions for future exploration, including how to handle the turnover between old and new memories, how to manage capacity allocation, and whether a more universal interface could extend the approach from images to video and agents.

The evolution of models should not only be about "bigger" or "better reasoning," but also about "better understanding individual differences." If similar results can be replicated in language models, video generation, and agents, this could become the next paradigm shift after MoE.