NVIDIA sweeps MLPerf Training v6.0; GB300 trains DeepSeek-V3 in 2.02 minutes on 8,192 GPUs
NVIDIA dominated MLPerf Training v6.0, the industry-standard training benchmark whose results were published June 16, posting both the fastest time-to-train and the highest per-accelerator performance on every benchmark. The company was the only vendor to submit results on all seven benchmarks in the suite, including new mixture-of-experts (MoE) pretraining tests for DeepSeek-V3 and GPT-OSS-20B that reflect current trends in large-scale model development.
CoreWeave achieved the fastest training time on the largest model in the suite: DeepSeek-V3 671B trained to quality target in 2.02 minutes at 8,192-GPU scale using GB300 NVL72 systems connected with Spectrum-X Ethernet. Microsoft Azure scaled Llama 3.1 405B to 8,192 GPUs using GB200 NVL72, reaching the reference target in 7.07 minutes. NVIDIA's GB300 Blackwell Ultra systems demonstrated 60% faster performance than GB200 in the same NVL72 form factor.
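As a rough back-of-the-envelope sketch, the two headline runs can be restated as aggregate GPU-minutes and rack counts. The GPU counts and times are the figures quoted above. The 72-GPUs-per-rack value follows from the NVL72 name. The assumption that each run fills whole racks is ours, not something the submissions state. The two runs train different models on different benchmarks, so their totals are not directly comparable.

```python
import math

# Assumption: one NVL72 rack holds 72 GPUs (per the NVL72 name).
GPUS_PER_RACK = 72

# Figures quoted in the article; the dictionary layout is illustrative.
runs = {
    "DeepSeek-V3 671B (GB300 NVL72, CoreWeave)": {"gpus": 8192, "minutes": 2.02},
    "Llama 3.1 405B (GB200 NVL72, Microsoft Azure)": {"gpus": 8192, "minutes": 7.07},
}


def gpu_minutes(gpus: int, minutes: float) -> float:
    """Aggregate accelerator time: GPU count times wall-clock minutes."""
    return gpus * minutes


def racks_needed(gpus: int, per_rack: int = GPUS_PER_RACK) -> int:
    """Minimum number of whole racks that can hold the given GPU count."""
    return math.ceil(gpus / per_rack)


for name, run in runs.items():
    total = gpu_minutes(run["gpus"], run["minutes"])
    racks = racks_needed(run["gpus"])
    print(f"{name}: {total:,.2f} GPU-minutes across at least {racks} racks")
```

Under these assumptions, each 8,192-GPU run spans at least 114 racks, which is why rack-level interconnect and networking figure so heavily in the results.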
For AI infrastructure teams, the clean sweep validates Blackwell's full-stack architecture (hardware, NVLink switching, Spectrum-X networking, and the CUDA software stack) at multi-thousand-GPU scale. The absence of competing submissions on the MoE workloads suggests that other GPU vendors may not yet have the software maturity to train next-generation model architectures at scale, although vendors can skip a benchmark for other reasons. Rack-scale performance now matters as much as per-accelerator metrics: hyperscalers care about throughput per kilowatt and time-to-train, not just raw FLOPS.
Sources
- Primary source
- developer.nvidia.com
“NVIDIA achieved leading results by winning every benchmark, setting records in both overall and per-accelerator performance, scaling up to 8,192 Blackwell Ultra GPUs”
- wccftech.com
“CoreWeave delivered the fastest time to train for DeepSeek-V3 671B, reaching the quality target in 2.02 minutes at 8,192-GPU scale using GB300 NVL72 systems”