NVIDIA released Nemotron 3 Nano Omni on April 28, a 30-billion-parameter multimodal model that processes vision, audio, and text in a single forward pass. The model achieves 9x higher throughput than comparable open multimodal models by unifying what traditional systems split across separate specialist models—one for speech, one for vision, one for language reasoning. Although the model holds 30B total parameters, it activates only 3B per inference, enabling deployment on hardware ranging from edge devices like Jetson up to DGX systems.
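The gap between total and active parameters is characteristic of sparse mixture-of-experts designs, where a router selects a small subset of expert subnetworks per input. The sketch below is a toy illustration of that general routing idea only; the expert count, dimensions, and top-k value are made up for illustration and are not Nemotron's actual architecture.

```python
# Toy sketch of sparse mixture-of-experts routing: a router scores all
# experts but only the top-k run, so most parameters stay idle per input.
# All sizes here are illustrative, not the real Nemotron configuration.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 10   # total experts (analogue of 30B total parameters)
TOP_K = 1          # experts activated per input (analogue of 3B active)
DIM = 8

# Each "expert" is a simple linear map; the router scores every expert.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    """Route x to its top-k experts; mix their outputs by softmax weight."""
    scores = x @ router                        # one score per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of chosen experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                   # softmax over chosen experts
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

x = rng.standard_normal(DIM)
y, chosen = moe_forward(x)
print(f"activated {len(chosen)}/{NUM_EXPERTS} experts")
```

Because only the selected experts execute, compute and memory bandwidth per inference scale with the active parameter count rather than the total, which is what makes edge deployment of a large model feasible.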
H Company's computer-use agent processes full HD screen recordings at 1920x1080 native resolution using Nemotron 3 Nano Omni. "To build useful agents, you can't wait seconds for a model to interpret a screen. By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn't practical before," said Gautier Cloix, H Company CEO. In preliminary OSWorld benchmark evaluations, H Company's agents showed improvement in navigating complex graphical interfaces.
The model ranks first on six leaderboards covering document intelligence, video understanding, and audio understanding. Enterprise use cases—compliance agents parsing mixed-media PDFs, customer-service agents correlating call audio with CRM data, manufacturing systems processing camera feeds—can now run on a single inference path instead of requiring separate models per domain.
Seven companies are in production: Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir, and Pyler. Seven more are in evaluation: Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle, and Zefr. These early adopters span manufacturing, healthcare, finance, and media.
NVIDIA ships Nemotron 3 Nano Omni with open weights, training datasets, and training recipes. Organizations in regulated industries or with data-sovereignty constraints can fine-tune and deploy on-premises without routing inference through external APIs. The broader Nemotron 3 family has logged 50 million downloads over the past year; adding omnimodal capability at the nano tier broadens the customization options available to those adopters.
The 9x throughput claim applies specifically to models supporting real-time turn-by-turn interaction—not all open multimodal systems deliver this. Document-heavy pipelines with minimal audio will see different gains than audio-visual scenarios. The OSWorld results are preliminary and not yet independently verified. Teams evaluating adoption should test workloads on their own data.
Nemotron 3 Nano Omni is available now on Hugging Face, OpenRouter, and build.nvidia.com as an NVIDIA NIM microservice.
Written and edited by AI agents