<p>Nvidia is stepping up its push into infrastructure for autonomous AI agents, a strategic shift that takes the chip giant beyond its role as a hardware supplier and deep into the model layer of the artificial intelligence (AI) race.</p>
<p>On Wednesday, November 11th (Eastern Time), Nvidia announced Nemotron 3 Super, a next-generation open-source large language model designed specifically for enterprise multi-agent systems. Built on a new mixture-of-experts (MoE) architecture, it delivers more than five times the inference throughput of the previous-generation model. The model has 120 billion total parameters, of which only 12 billion are active during inference, and it natively supports a context window of 1 million tokens.</p>
<p>According to Nvidia, Nemotron 3 Super tops the Artificial Analysis rankings for efficiency and openness, leads models of the same scale in accuracy, and powers Nvidia's AI-Q research agent to first place on both the DeepResearch Bench and DeepResearch Bench II leaderboards.</p>
<p><img src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/e57a9da9-6e5d-4d1d-93f4-9f659816993a.jpeg?x-oss-process=image/auto-orient,1/interlace,1/resize,w_1440,h_1440/quality,q_95/format,jpg" alt="" original-src="https://imageproxy.pbkrs.com/https://wpimg-wscn.awtmt.com/e57a9da9-6e5d-4d1d-93f4-9f659816993a.jpeg"/></p>
<p>Nvidia disclosed the first batch of partners for Nemotron 3 Super. AI search company Perplexity has become the first partner to access this model for executing agent tasks, providing users with multi-agent orchestration services in search and computer products. Enterprise software giants such as Palantir, Siemens, Cadence, Dassault Systèmes, and Amdocs have also announced plans to deploy this model for workflow automation in telecommunications, cybersecurity, semiconductor design, and manufacturing.</p>
<p>The Nemotron 3 Super model is now available to developers through Nvidia's build.nvidia.com, Hugging Face, and OpenRouter channels.</p>
<h2>Two Major Bottlenecks Give Rise to New Architecture</h2>
<p>In a blog post, Nvidia pointed out that enterprises face two core constraints when moving from chatbots to multi-agent applications.</p>
<p>The first is &#34;context explosion&#34;: multi-agent workflows require the complete historical record (including tool outputs and intermediate reasoning steps) to be retransmitted with each interaction, resulting in a token count that can be up to 15 times that of standard conversations. As tasks extend, this massive context not only increases costs but can also lead to &#34;goal drift&#34;—agents gradually deviating from their original objectives.</p>
<p>The second is the &#34;thinking tax&#34;: complex agents must reason at every step, and if every sub-task calls a large model, multi-agent applications become too expensive and too slow to be practical.</p>
<p>Nemotron 3 Super addresses context explosion directly with its native 1-million-token context window, which lets agents keep their state coherent across very long tasks and guards against goal drift. The hybrid architecture is designed specifically to reduce the thinking tax.</p>
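<p>The context-explosion effect described above follows from simple arithmetic: if every call resends the full history, the number of input tokens processed grows roughly with the square of the number of steps. The sketch below illustrates this with made-up numbers. The function name and the 500-token step size are assumptions for illustration, not figures from Nvidia.</p>

```python
def tokens_with_full_resend(step_tokens: list[int]) -> int:
    """Total input tokens when each step resends the entire history so far."""
    total = 0
    history = 0
    for tokens in step_tokens:
        history += tokens      # the new step is appended to the history
        total += history       # the whole history is sent again
    return total


# Hypothetical workflow: 10 steps, each adding 500 tokens of tool output or reasoning.
steps = [500] * 10
resent = tokens_with_full_resend(steps)
sent_once = sum(steps)
print(resent, sent_once, resent / sent_once)
```

<p>With these assumed numbers the workflow processes 27,500 input tokens, against 5,000 if each step's output were sent only once. That is a 5.5x overhead, and it keeps growing as the task gets longer.</p>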
<h2>Three Architectural Innovations Support Fivefold Acceleration</h2>
<p>According to Nvidia's blog post, Nemotron 3 Super's performance leap comes from three core innovations at the architectural level.</p>
<ul>
<li>Hybrid Mamba-Transformer Backbone: The model interleaves Mamba-2 layers with Transformer attention layers. The Mamba layers handle most of the sequence processing and, thanks to their linear time complexity, provide a fourfold improvement in memory and compute efficiency, which makes a million-token context window practical. Transformer layers are inserted at critical depths to preserve precise associative recall.</li>
<li>Latent Mixture of Experts (MoE): Before routing, token embeddings are compressed into a low-rank latent space. Expert computation takes place in this smaller dimension, and the result is projected back to the full dimension. Nvidia states that this design lets the model activate four times as many experts at the same inference cost, enabling finer-grained specialization in routing, such as activating different experts for Python syntax and SQL logic.</li>
<li>Multi-Token Prediction (MTP): The model predicts several future tokens in a single forward pass rather than generating them one at a time. Nvidia claims that this strengthens the model's grasp of long-range logical dependencies during training and gives it built-in speculative decoding at inference time, yielding up to three times faster generation for structured tasks such as code and tool calls without a separate draft model.</li>
</ul>
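<p>The latent MoE idea in the list above can be sketched in a few lines of plain Python. This is a toy illustration of the steps as described (compress, route, run experts in the latent dimension, project back), not Nvidia's implementation. All names and sizes are hypothetical, the weights are random, and each expert is reduced to a single matrix where a real expert would be a small neural network.</p>

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def latent_moe_forward(x, down, router, experts, up, top_k):
    """Send one token embedding through a toy latent-MoE layer."""
    z = matvec(down, x)                      # compress d_model -> d_latent
    scores = matvec(router, z)               # one routing score per expert
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]                  # keep only the top-k experts

    # Softmax over the chosen experts' scores gives their mixing weights.
    peak = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - peak) for i in chosen]
    weights = [e / sum(exps) for e in exps]

    mixed = [0.0] * len(z)
    for index, weight in zip(chosen, weights):
        out = matvec(experts[index], z)      # expert runs in the latent dimension
        mixed = [m + weight * o for m, o in zip(mixed, out)]

    return matvec(up, mixed), chosen         # project d_latent -> d_model


rng = random.Random(0)
d_model, d_latent, n_experts = 8, 2, 4       # toy sizes, chosen for illustration
down = random_matrix(d_latent, d_model, rng)
router = random_matrix(n_experts, d_latent, rng)
experts = [random_matrix(d_latent, d_latent, rng) for _ in range(n_experts)]
up = random_matrix(d_model, d_latent, rng)

token = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]
output, chosen = latent_moe_forward(token, down, router, experts, up, top_k=2)
print(len(output), chosen)
```

<p>In this sketch each expert multiplies a d_latent by d_latent matrix where a full-width expert would need d_model by d_model. That smaller per-expert cost is what would leave room to activate more experts for the same budget.</p>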
<p>On Nvidia's Blackwell platform, the model runs at NVFP4 precision, which Nvidia says delivers up to four times the inference speed of FP8 on its Hopper platform with no loss in accuracy.</p>
<h2>Open Weights Backed by a Multi-Layer Ecosystem</h2>
<p>Unlike most leading frontier models, which are generally accessible only through an API, Nvidia has released the weights, datasets, and training recipes for Nemotron 3 Super under a permissive license, allowing developers to deploy and customize the model freely on workstations, in data centers, or in the cloud.</p>
<p>Nvidia has also published the complete training and evaluation recipe, covering the entire process from pre-training to alignment, along with more than 10 trillion tokens of pre-training and post-training data, 21 reinforcement learning training environments, and evaluation recipes. In pre-training, the model was trained on 25 trillion tokens at native NVFP4 precision, so it learned to stay accurate under four-bit floating-point arithmetic from the first gradient update instead of being quantized after training.</p>
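<p>To make four-bit floating-point arithmetic more concrete, the sketch below rounds a small block of values to a 4-bit E2M1-style grid with one shared scale per block. It is a simplified illustration, not Nvidia's NVFP4 implementation: the function names are hypothetical, the input values are made up, and details such as the block size and how the scale itself is stored are left out.</p>

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block):
    """Return (scale, codes): a shared scale plus one 4-bit grid value per element."""
    peak = max(abs(v) for v in block)
    scale = peak / 6.0 if peak > 0 else 1.0   # map the largest magnitude to 6.0
    codes = []
    for v in block:
        target = abs(v) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(math.copysign(nearest, v))
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values from the shared scale and the 4-bit codes."""
    return [scale * c for c in codes]


block = [0.11, -0.62, 0.29, 1.2]              # made-up weights
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
print(scale, codes, restored)
```

<p>Every value lands on one of only eight magnitudes, so the rounding error can be large relative to small values. Training at this precision from the start, as the article describes, lets the model adapt to that error where post-training quantization would add it afterwards.</p>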
<p>On the ecosystem side, Nvidia has partnered with major cloud service providers and hardware manufacturers, including Google Cloud Vertex AI, Oracle Cloud Infrastructure, Dell Technologies, and HPE. Availability on Amazon Bedrock and Microsoft Azure is also in preparation. Software development agent companies such as CodeRabbit, Factory, and Greptile, as well as life sciences organizations Edison Scientific and Lila Sciences, have also announced plans to integrate the model into their agent workflows.</p>
<h2>&#34;Super+Nano&#34; Combination Deployment</h2>
<p>Nvidia's blog post also explains how the models in the Nemotron 3 series are meant to be deployed together. The Nano version of Nemotron 3, launched last December, is suited to targeted single-step tasks within agent workflows, while Nemotron 3 Super is designed for complex multi-step tasks that require deep planning and reasoning.</p>
<p>Taking the software development scenario as an example, Nvidia suggests: simple merge requests can be handled by Nano, while complex coding tasks that involve a deep understanding of the codebase should be undertaken by Super, and expert-level tasks can further call upon third-party proprietary models. This layered architecture aims to help enterprises seek an optimal balance between cost and capability.</p>
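<p>The layered scheme described above amounts to a small routing decision made before each task. A minimal sketch follows, assuming hypothetical task attributes and tier labels. The strings returned are placeholders, not real model identifiers, and a production router would need a much better signal for task difficulty.</p>

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    multi_step: bool = False         # needs planning across several steps
    needs_codebase: bool = False     # needs deep understanding of the codebase
    expert_level: bool = False       # beyond what the open models should handle


def pick_tier(task: Task) -> str:
    """Map a task to the tier the article's layered scheme would suggest."""
    if task.expert_level:
        return "third-party-proprietary"
    if task.multi_step or task.needs_codebase:
        return "super"
    return "nano"


tasks = [
    Task("review a simple merge request"),
    Task("refactor a module across the codebase", multi_step=True, needs_codebase=True),
    Task("design a novel concurrency protocol", expert_level=True),
]
for task in tasks:
    print(task.description, "->", pick_tier(task))
```

<p>Checking the most demanding condition first keeps the rule order easy to audit: a task falls through to the cheaper tier only when nothing has flagged it as harder.</p>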
<p>Nvidia's blog post cites several application scenarios. Software development agents can load an entire codebase into context at once for end-to-end code generation and debugging. In financial analysis, thousands of pages of reports can be loaded into memory, removing the need to re-reason over them across long conversations. In cybersecurity, autonomous security orchestration benefits from high-precision tool calls, which helps avoid execution errors in high-risk environments.</p>
<h2>Extending the Hardware Moat to the Model Layer</h2>
<p>There is a clear business logic behind Nvidia's open-model strategy. Until now, Nvidia has built its dominant position in AI mainly by selling GPUs to model providers such as OpenAI and Google. If Nemotron becomes the mainstream foundation model for enterprise agents, running it at scale will still require Nvidia GPU infrastructure, so openness at the model layer reinforces demand at the hardware layer.</p>
<p>Nemotron 3 Super is already packaged and delivered through Nvidia's NIM microservices, supporting flexible deployment from on-premises systems to the cloud. Whether the performance figures hold up under production workloads, and how enterprise customers weigh the flexibility of open models against the capabilities of competitors' proprietary models, will be the key variables in judging whether this strategy works.</p>

Wallstreetcn


Nvidia focuses on intelligent agents! The open-source model Nemotron 3 Super has 120 billion parameters and a fivefold increase in throughput