
From operator development to inference acceleration: A Gen-Z developer's 'leveling up' journey

More than two years ago, Zheng Hui was a soon-to-graduate senior student, immersed in the world of code every day, busy preparing for his postgraduate re-examination, and never gave much thought to AI.
No one would have imagined that two years later, Zheng Hui would become a seasoned AI developer. The NonZero operator he independently developed was merged into the mainline of the heterogeneous computing architecture CANN. The deep learning framework he contributed to lowered the threshold for large model training. Currently, he is working on distributed inference acceleration for large models, hoping to help other developers efficiently deploy online inference services in production environments.
In the Ascend AI ecosystem, developers like Zheng Hui, who silently contribute to the wave of large models, are countless. They may not be the "main characters" of the world, but they are changing it in their own way.
01 "The First Project Assigned by My Advisor"
Rewind to April 2022. Due to his outstanding written exam results, Zheng Hui smoothly entered the postgraduate re-examination at Hangzhou Dianzi University. When his advisor asked about his research interests, Zheng Hui shared his experience of solving bugs and was introduced to distributed machine learning for the first time.
Also in April 2022, the Ascend AI Developer Day was held in Xi'an, officially launching the "Ascend Collective Intelligence Plan 2022," which included over 4,000 tasks covering operator development, model development, and innovative applications. Hangzhou Dianzi University was one of the universities that signed cooperation agreements.
Zheng Hui, who had just joined the lab, was assigned his first project by his advisor—developing and optimizing the NonZero operator using C++.
Deep learning models are built from computational units called operators: code modules that each implement a specific piece of computational logic, such as matrix multiplication, convolution, or gathering the indices of nonzero elements. Higher-level tasks like text generation, translation, and sentiment analysis are composed from many such operators. A well-designed operator can not only improve a model's efficiency and performance but also reduce resource consumption, enabling more tasks to be completed with the same computational resources.
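To make the idea concrete, here is a minimal sketch of what the NonZero operator computes. This is an illustration of the operator's semantics only, not the CANN or MindSpore implementation:

```python
def nonzero_2d(tensor):
    """Return the (row, col) index of every nonzero element of a 2-D
    tensor (given as a nested list), in row-major order. This mirrors
    the semantics of NonZero in frameworks like PyTorch and MindSpore."""
    indices = []
    for i, row in enumerate(tensor):
        for j, value in enumerate(row):
            if value != 0:
                indices.append((i, j))
    return indices

nonzero_2d([[0, 3], [5, 0]])  # → [(0, 1), (1, 0)]
```

Even an operator this simple hides optimization questions, such as how to split the scan across cores, which is exactly what the project below addresses.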
Because operators play a crucial role in models and this was his first such project, Zheng Hui had doubts: "Am I capable enough? Can I do it well? Will I slow others down?" After consulting with a senior lab member, Zheng Hui "bit the bullet" and took on the task.
To learn about operator development, Zheng Hui watched many video courses on Bilibili and found detailed documentation and ready-made operator libraries on MindSpore. He could also directly communicate with Ascend AI engineers when encountering problems, gradually gaining confidence in completing the project.
What impressed Zheng Hui deeply was: "When developing the operator, I saw there was already a for loop in the code and didn’t understand why I needed to implement parallelization for it—it seemed like the gains would be minimal. But the Ascend AI engineer told me that even a tiny improvement could yield significant benefits when dealing with massive data."
After nearly two months of hard work, Zheng Hui's pull request was approved. By introducing a parallel-for pattern (Parallel.For), he enabled multi-core execution and multi-threaded acceleration of the operator, bringing MindSpore's NonZero operator up to the precision of its TensorFlow and PyTorch counterparts. The code has since been merged into the CANN mainline.
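The parallel-for pattern behind this optimization can be sketched as follows. This is a simplified illustration of the technique, not the actual C++ code merged into CANN; the chunking scheme and worker count are assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def nonzero_parallel(data, num_workers=4):
    """Find the indices of nonzero elements of a flat list using a
    parallel-for pattern: each worker scans one contiguous chunk,
    then the per-chunk results are concatenated in chunk order so
    the output matches a sequential scan."""
    n = len(data)
    if n == 0:
        return []
    chunk = (n + num_workers - 1) // num_workers  # ceil division

    def scan(start):
        end = min(start + chunk, n)
        return [i for i in range(start, end) if data[i] != 0]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = pool.map(scan, range(0, n, chunk))  # chunks run concurrently
    return [i for part in parts for i in part]
```

As the engineer's advice suggests, the per-element gain is tiny, but spread over tensors with billions of elements the chunked scan amortizes into a substantial speedup.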
More than the final result, Zheng Hui believes he gained greater value from the process: "Operator development was my first project. Going through the entire workflow—requirement analysis, performance analysis, development, testing, and optimization—gave me a deep understanding of the internal mechanisms of large models, including data flow, computational graphs, and parallel computing. More importantly, it gave me the courage to take on bigger projects. Whenever I face difficulties, I tell myself to try—how else would I know if I can succeed?"
Two months wasn’t long, but it marked Zheng Hui's transformation from a novice to a "Collective Intelligence Developer." It also validated Ascend AI's vision for the Collective Intelligence Plan: accelerating breakthroughs in foundational software by pooling industry wisdom and strength, fostering the development and ecosystem prosperity of diverse computing.
02 "The Challenge Lies in Finding Application Scenarios"
Zheng Hui, who loves solving problems, didn’t want to be an academic confined to an ivory tower. In his view, the biggest challenge for AI applications is no longer technical implementation but selecting the right scenarios—how to make generative AI deliver value in more contexts.
Zheng Hui’s perspective isn’t unfounded. AI engineering has long been a hot topic.
Even at tech giants like Google, AI scientists and engineers often face situations where "development takes a week, but deployment takes months": before a model reaches production, it must pass months of checks on robustness, data consistency, and other dimensions. Whether AI can deliver better results and greater value in business scenarios is key to its adoption across industries.
Beyond foundational work like operator and deep learning framework development, Zheng Hui didn’t want to miss any opportunity to apply AI in real-world scenarios. When the Ascend AI Innovation Contest 2023 was announced, he joined without hesitation—even opting for the more challenging application track instead of the MindSpore track, where he had prior experience.
"In the team, I was mainly responsible for selecting scenarios and designing the entire ship monitoring system platform. Scenarios like smart coastal defense monitoring have extremely high data security requirements, making them ideal for domestically developed hardware-software ecosystems. So, based on Ascend’s computing platform, we applied AI to maritime fishing operations, predicting vessel routes to enhance safety."
Unlike some who participate in contests just for the sake of it, Zheng Hui’s team continued working on their project six months after the Ascend AI Innovation Contest 2023 ended: "This year, we plan to expand beyond specific regional ports and include data from coastal provinces nationwide. Our goal is to implement this project in all coastal cities, integrating broader data resources to build an invisible safety net for fishing vessels."
Many call 2023 the "Year of Generative AI," with industries racing to train their own large models. But "model training" is just the first step in AI’s industrial adoption. Between reality and intelligence lies a vast, uncharted "wild sea" that requires countless developers to act as "ferries," illuminating the path to industry-wide intelligence through practical applications.
As a second-year graduate student, Zheng Hui embodies the hope of China’s AI future.
For example, in the "fishing vessel route prediction" project, Zheng Hui specifically mentioned commercialization: "We can collaborate with insurance companies to integrate vessel routes into their risk assessment systems, providing more comprehensive risk analysis when offering financial services to fishing vessels."
Developers driven by passion may eventually lose their zeal, but those who spot commercial opportunities are the ideal "ferries." Starting as mere "rafts," they can evolve into skiffs, sailboats, and cargo ships, carrying more scenarios toward the shores of intelligence.
03 "Speeding Up Large Model Inference"
Thanks to operator development, Zheng Hui stepped into the river of AI. Through the Ascend AI Innovation Contest 2023, he recognized the pain points of real-world applications. The once AI-indifferent young man gradually developed more ideas and took on more challenging projects.
For instance, when ChatGPT went viral, Zheng Hui immediately tested it with various questions and found its response speed frustratingly slow, believing that "this poor experience dampens user interest." The root cause is the autoregressive nature of Transformer-based models: tokens are generated one at a time, with each step depending on the previous output, which leads to load imbalance and underutilized computational resources during inference. In practice, inference speed often becomes a major bottleneck.
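The sequential dependency can be seen in a toy decoding loop. Here `next_token` stands in for a full Transformer forward pass; the function body is purely illustrative:

```python
def generate(prompt_tokens, steps, next_token):
    """Toy autoregressive decoding loop: each new token depends on
    everything generated so far, so the steps cannot run in parallel.
    In a real model, every call to next_token is one full forward
    pass, which is why latency grows with output length."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        tokens.append(next_token(tokens))  # must wait for the previous step
    return tokens

# A stand-in "model" for demonstration; real next-token prediction
# would run the network over the whole token sequence.
generate([1, 2], 3, next_token=lambda toks: sum(toks) % 7)
```

Because step N cannot start until step N-1 finishes, throwing more compute at a single request helps only so much; serving systems instead try to keep the hardware busy across many concurrent requests.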
Over the past year, large model training has dominated tech discussions, but inference acceleration is even harder. Training acceleration is mainly influenced by data and model parallelism, while inference acceleration depends on model architecture, computational graph optimization, memory access, and real-time requirements—processing vast data with minimal latency.
To tackle large model inference, Huawei’s 2012 Labs and Hangzhou Dianzi University formed a "task force." Zheng Hui volunteered to join, contributing to MindSpore Serving’s development to help developers efficiently deploy online inference services.
Unlike approaches that trade accuracy for speed, Zheng Hui's strategy was to implement and optimize the ideas from the FastServe paper: multi-level feedback queues that divide requests into different priority levels, preemptive scheduling to reduce latency, starvation prevention that promotes long-waiting low-priority requests, and proactive KV cache management to maximize resource utilization. Together, these improved system throughput while lowering average completion time.
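The core scheduling idea can be sketched as a simplified multi-level feedback queue. This is in the spirit of FastServe but omits its skip-join admission, starvation promotion, and KV cache handling; the quantum values and job representation are assumptions for illustration:

```python
from collections import deque

def mlfq_schedule(jobs, quanta=(2, 4, 8)):
    """Simplified multi-level feedback queue. `jobs` maps a request id
    to its remaining work (e.g. decode steps). A request runs for the
    quantum of its current level; if unfinished, it is preempted and
    demoted one level, so short requests finish ahead of long ones.
    Returns the order in which requests complete."""
    queues = [deque() for _ in quanta]
    remaining = dict(jobs)
    for rid in jobs:                       # every request starts at the top level
        queues[0].append(rid)
    finished = []
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)
        rid = queues[level].popleft()
        remaining[rid] -= quanta[level]    # run for this level's quantum
        if remaining[rid] <= 0:
            finished.append(rid)
        else:                              # preempt and demote
            queues[min(level + 1, len(quanta) - 1)].append(rid)
    return finished

# A short request ("a") completes before long ones, cutting average
# completion time versus running requests to completion in arrival order.
mlfq_schedule({"a": 1, "b": 10, "c": 3})  # → ["a", "c", "b"]
```

The preemption is what lowers average completion time: a 1-step request no longer waits behind a 10-step request, at the cost of extra state (the KV cache of preempted requests) that FastServe manages proactively.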
"Without my initial operator development experience, I might not have dared to tackle large model inference acceleration. From an operator’s perspective, inference acceleration is essentially operator optimization, followed by scripting to improve speed and throughput," Zheng Hui recalled.
Compared to his solo operator development, collaborating on Fastserve gave Zheng Hui deeper insights into teamwork: "An individual can move faster, but a team can go further. In a team, you’re like a gear—technical depth determines how deeply it meshes with the machine, while collaboration smooths its rotation."
At 23, Zheng Hui hasn’t been in AI development for long, but his growth trajectory is emblematic: serendipitously entering the Ascend AI ecosystem, embarking on an "upgrading journey" through projects that sharpen his understanding of scenarios and technology, and learning to collaborate effectively to solve complex challenges.
04 Epilogue
A dream-driven sprint will eventually lead to a radiant life.
It’s young talents like Zheng Hui—technically skilled, visionary, and business-savvy—who fearlessly dive into AI, sweat over code, and solve technical challenges with wisdom and perseverance, that give us hope for AI’s integration across industries.

