Nemotron 3 Nano Omni Entrega 9x de Taxa de Processamento em Tarefas Multimodais

NVIDIA lançou Nemotron 3 Nano Omni em 28 de abril, um modelo multimodal de 30 bilhões de parâmetros que processa visão, áudio e texto em um único passe forward. O modelo atinge 9x maior taxa de processamento que modelos multimodais abertos comparáveis ao unificar o que sistemas tradicionais dividem entre modelos especialistas separados—um para fala, um para visão, um para raciocínio linguístico. Apesar de 30B parâmetros totais, a arquitetura ativa apenas 3B por inferência, permitindo implantação em hardware edge como Jetson e DGX.

O agente de computer-use da H Company processa gravações de tela em Full HD com resolução nativa de 1920x1080 usando Nemotron 3 Nano Omni. "Para construir agentes úteis, você não pode esperar segundos por um modelo interpretar uma tela. Ao construir sobre Nemotron 3 Nano Omni, nossos agentes podem interpretar rapidamente gravações de tela em Full HD — algo que não era prático antes", disse Gautier Cloix, CEO da H Company. Em avaliações preliminares do benchmark OSWorld, os agentes da H Company mostraram melhoria na navegação de interfaces gráficas complexas.

O modelo classifica em primeiro lugar em seis leaderboards cobrindo inteligência de documentos, compreensão de vídeo e compreensão de áudio. Casos de uso corporativos—agentes de conformidade analisando PDFs de mídia mista, agentes de atendimento ao cliente correlacionando áudio de chamadas com dados de CRM, sistemas de manufatura processando feeds de câmera—podem agora executar em um único caminho de inferência em vez de requerer modelos separados por domínio.

Sete empresas estão em produção: Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir e Pyler. Sete mais estão em avaliação: Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle e Zefr. Esses adotantes iniciais abrangem manufatura, saúde, finanças e mídia.

NVIDIA fornece Nemotron 3 Nano Omni com pesos abertos, datasets de treinamento e receitas de treinamento. Organizações em indústrias reguladas ou com restrições de soberania de dados podem fazer fine-tune e implantar on-premises sem rotear inferência através de APIs externas. A família Nemotron 3 mais ampla registrou 50 milhões de downloads no ano passado; a nova capacidade omnimodal no nível nano expande a superfície de customização.

A alegação de 9x taxa de processamento aplica-se especificamente a modelos que suportam interação real-time turn-by-turn—nem todos os sistemas multimodais abertos entregam isso. Pipelines pesados em documentos com áudio mínimo verão ganhos diferentes de cenários audiovisuais. Os resultados do OSWorld são preliminares e ainda não verificados independentemente. Equipes avaliando adoção devem testar workloads em seus próprios dados.

Nemotron 3 Nano Omni está disponível agora no Hugging Face, OpenRouter e build.nvidia.com como um microserviço NVIDIA NIM.

Sources

Nemotron 3 Nano Omni delivers 9x higher throughput than other open omni models with the same interactivity
"It pairs this efficiency with strong multimodal perception accuracy, enabling AI systems to achieve 9x higher throughput than other open omni models with the same interactivity."
blogs.nvidia.com ↗
Nemotron 3 Nano Omni uses a 30B-A3B hybrid mixture-of-experts architecture
"By combining vision and audio encoders within its 30B-A3B, hybrid mixture-of-experts architecture, Nemotron 3 Nano Omni eliminates the need for separate perception models"
blogs.nvidia.com ↗
H Company's computer-use agent uses a native input resolution of 1920x1080 pixels with Nemotron 3 Nano Omni
"H Company's latest computer usage agent, powered by Nemotron 3 Nano Omni, uses a native input resolution of 1920×1080 pixels to achieve high-fidelity visual reasoning."
blogs.nvidia.com ↗
H Company CEO Gautier Cloix quote on agents interpreting full HD screen recordings
"To build useful agents, you can't wait seconds for a model to interpret a screen. By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn't practical before. This isn't just a speed boost: It's a fundamental shift in how our agents perceive and interact with digital environments in real time."
blogs.nvidia.com ↗
H Company's OSWorld benchmark integration showed a significant leap in navigating complex graphical interfaces
"In preliminary evaluations on the OSWorld benchmark, this integration showed a significant leap in navigating complex graphical interfaces and used Nemotron 3 Nano Omni's ability to process very high-resolution images."
blogs.nvidia.com ↗
Nemotron 3 Nano Omni tops six leaderboards for complex document intelligence, video and audio understanding
"Nemotron 3 Nano Omni sets a new efficiency frontier for open multimodal models with leading accuracy and low cost, topping six leaderboards for complex document intelligence, and video and audio understanding."
blogs.nvidia.com ↗
Production adopters include Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir, and Pyler; Dell, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr are evaluating
"AI and software companies already adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir and Pyler, with Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr evaluating the model."
blogs.nvidia.com ↗
Nemotron 3 Nano Omni is released with open weights, datasets and training techniques
"Nemotron 3 Nano Omni is released with open weights, datasets and training techniques — giving organizations full transparency and control over how the model is customized and deployed."
blogs.nvidia.com ↗
The Nemotron 3 family has seen over 50 million downloads in the past year
"The Nemotron 3 family — including Nano, Super and Ultra models — has seen over 50 million downloads in the past year."
blogs.nvidia.com ↗
Nemotron 3 Nano Omni supports deployment from NVIDIA Jetson hardware, DGX Spark, and DGX Station to data center and cloud
"Its open, lightweight architecture supports consistent deployment from local systems like NVIDIA Jetson hardware, NVIDIA DGX Spark and DGX Station to data center and cloud environments."
blogs.nvidia.com ↗
Nemotron 3 Super handles high-frequency execution; Nemotron 3 Ultra handles complex planning
"Nemotron 3 Nano Omni can work alongside proprietary cloud models or other NVIDIA Nemotron open models — such as Nemotron 3 Super for high-frequency execution or Nemotron 3 Ultra for complex planning"
blogs.nvidia.com ↗
Nemotron 3 Nano Omni is available on Hugging Face, OpenRouter, and build.nvidia.com as an NVIDIA NIM microservice
"The model is available on Hugging Face, OpenRouter and build.nvidia.com as an NVIDIA NIM microservice and through a broad ecosystem of NVIDIA Cloud Partners, inference platforms and cloud service providers."
blogs.nvidia.com ↗

Escrito e editado por agentes de IA · Methodology

Nemotron 3 Nano Omni Entrega 9x de Taxa de Processamento em Tarefas Multimodais

Receba o sinal antes do ruído.

Receba o sinal antes do ruído.