NVIDIA Unveils Nemotron 3 Open Models to Power Multi-Agent AI Systems

Perplexity CEO Aravind Srinivas said the company is using Nemotron within its agent routing system to optimise performance.


NVIDIA on Monday announced the NVIDIA Nemotron 3 family of open models, datasets and libraries aimed at supporting the development of transparent and efficient multi-agent AI systems across industries.

The Nemotron 3 lineup includes Nano, Super and Ultra models built on a hybrid latent mixture-of-experts (MoE) architecture, which NVIDIA says is designed to reduce inference costs, limit context drift and improve coordination among multiple AI agents.

“Open innovation is the foundation of AI progress,” NVIDIA founder and CEO Jensen Huang said. “With Nemotron, we’re transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale.”

Among the three models, Nemotron 3 Nano is available immediately. It is a 30-billion-parameter model that activates up to 3 billion parameters per task and is optimised for low-cost inference use cases such as software debugging, summarisation and AI assistants. NVIDIA said the model delivers up to four times higher token throughput than Nemotron 2 Nano and reduces reasoning token generation by up to 60%.

The model is available on Hugging Face and through inference providers such as Baseten, DeepInfra, Fireworks, FriendliAI, OpenRouter and Together AI. It is also offered as an NVIDIA NIM microservice for deployment on NVIDIA-accelerated infrastructure.

Nemotron 3 Nano will also be available on AWS via Amazon Bedrock and supported on multiple cloud platforms in the coming months.

Nemotron 3 Super, a roughly 100-billion-parameter model, is designed for multi-agent applications requiring low latency, while Nemotron 3 Ultra, with about 500 billion parameters, is intended for deep reasoning and long-horizon planning tasks.

Both Super and Ultra use NVIDIA’s 4-bit NVFP4 training format on Blackwell GPUs to reduce memory requirements. These models are expected to be available in the first half of 2026.
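To give a rough sense of why a 4-bit format matters at this scale, the back-of-the-envelope sketch below estimates the storage needed for model weights alone at 16 bits versus 4 bits per parameter. It uses the approximate parameter counts quoted above and deliberately ignores the per-block scaling factors that low-precision formats carry, as well as activations, gradients and optimiser state, so the figures are illustrative rather than NVIDIA's own.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Approximate parameter counts as quoted in the article.
MODELS = {"Super": 100e9, "Ultra": 500e9}

for name, params in MODELS.items():
    at_16_bit = weight_memory_gb(params, 16)
    at_4_bit = weight_memory_gb(params, 4)
    # Weights-only estimate: excludes scale factors, activations and optimiser state.
    print(f"Nemotron 3 {name}: ~{at_16_bit:.0f} GB at 16-bit, ~{at_4_bit:.0f} GB at 4-bit")
```

On these assumptions, moving from 16-bit to 4-bit weights cuts weight storage to a quarter, whatever the model size.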

The launch comes as companies move beyond single AI chatbots toward collaborative agent-based systems, where multiple models work together on complex workflows. 

According to NVIDIA, Nemotron 3 allows developers to route tasks between frontier proprietary models and open Nemotron models within the same workflow to balance reasoning capability and cost efficiency.
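The routing idea can be pictured with a minimal sketch. Everything in it is hypothetical: the model identifiers, the difficulty score and the threshold are placeholders chosen for illustration, not part of any NVIDIA or partner API. It simply sends harder tasks to a proprietary model and everything else to a cheaper open model.

```python
from dataclasses import dataclass

# Hypothetical identifiers and cut-off, chosen only for illustration.
OPEN_MODEL = "open-nemotron-model"
PROPRIETARY_MODEL = "frontier-proprietary-model"
DIFFICULTY_THRESHOLD = 0.7


@dataclass(frozen=True)
class Route:
    model: str
    reason: str


def route_task(difficulty: float) -> Route:
    """Pick a model for a task given a difficulty score between 0 and 1."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    if difficulty >= DIFFICULTY_THRESHOLD:
        # Hard tasks go to the more capable, more expensive model.
        return Route(PROPRIETARY_MODEL, "task needs maximum reasoning capability")
    # Everything else goes to the cheaper open model.
    return Route(OPEN_MODEL, "open model is sufficient and cheaper to run")


print(route_task(0.2).model)
print(route_task(0.9).model)
```

A production router would more likely derive the score from a classifier or from past task outcomes than from a fixed threshold, but the cost-versus-capability trade-off is the same.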

NVIDIA said the Nemotron 3 family also aligns with its sovereign AI strategy, allowing governments and enterprises to deploy models tailored to local data, regulations and policy requirements. Organisations across Europe and South Korea are among those adopting the open models, the company said.

Several enterprise customers and partners, including Accenture, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens, Synopsys and Zoom, are integrating Nemotron models into AI workflows spanning manufacturing, cybersecurity, software development and communications.

Perplexity CEO Aravind Srinivas said the company is using Nemotron within its agent routing system to optimise performance. “We can direct workloads to fine-tuned open models like Nemotron 3 Ultra or use proprietary models when tasks require it,” he said.

Alongside the models, NVIDIA released three trillion tokens of pretraining, post-training and reinforcement learning datasets, including an Agentic Safety Dataset for evaluating multi-agent systems. The company also open-sourced NeMo Gym, NeMo RL and NeMo Evaluator to support training, customisation and evaluation of agentic AI.


Staff Writer
