AMD's Helios MI455X rack-scale system showcased at Computex 2026 boasts a scale-up bandwidth of 260 TB/s, matching Nvidia's NVL72 VR200. This bandwidth feeds 72 Instinct MI455X accelerators, initially achieved using UALink-over-Ethernet due to the unvalidated status of native scale-up fabric switching silicon from Astera Labs, Auradine, Enfabrica, and Xconn.

Each Helios rack pairs 72 MI455X GPUs, based on TSMC 2nm compute and 3nm I/O dies with 320 billion transistors, with 6th Gen EPYC Venice CPUs, totaling up to 256 cores. Each MI455X features 432 GB of HBM4 at 19.6 TB/s, providing 31 TB of GPU memory and approximately 1.4 PB/s of aggregate memory bandwidth per rack. Compute performance peaks at 2,900 FP4 dense PFLOPS or 1.4 FP8 exaFLOPS for training, with a power draw of around 140 kW. For rack-to-rack communication, AMD employs Pensando Vulcano 800 GbE NICs, compliant with the Ultra Ethernet spec, offering 43 TB/s of aggregate scale-out bandwidth.

Helios rack-scale specs: scale-up bandwidth matches NVL72 on paper, but memory and compute density differ.
FIG. 02 Helios rack-scale specs: scale-up bandwidth matches NVL72 on paper, but memory and compute density differ.

AMD is opting for UALink-over-Ethernet to leverage the existing qualification of Ethernet switching ASICs, cables, and NICs by hyperscalers, enabling Helios to reach customers in H2 2026, possibly Q4 2026 or early 2027. However, Ethernet's general-purpose design introduces protocol overhead, higher latency, and less deterministic performance compared to a dedicated copper scale-up network.

Helios interconnect roadmap: Ethernet-based launch yields to native UALink once switching silicon validates in H2 2026.
FIG. 03 Helios interconnect roadmap: Ethernet-based launch yields to native UALink once switching silicon validates in H2 2026.

In large distributed training scenarios, interconnect efficiency equates to compute efficiency. Ethernet underlay-induced jitter or head-of-line blocking can significantly reduce real-world throughput, even if the bandwidth spec matches Nvidia's. AwesomeAgents analysis highlights that the per-chip memory bandwidth of 19.6 TB/s lags behind Nvidia's B300 22 TB/s, impacting bandwidth-bound inference workloads.

OpenAI, xAI, and Meta are expected to deploy Helios, with Microsoft Azure and Oracle Cloud also potential candidates due to their MI300X history. The initial UALink-over-Ethernet configuration is likely to remain in production, as operators seldom replace switching hardware post-deployment. The native UALink Helios variant may have less than a year of market relevance before AMD's MI500-series arrives in 2027, prompting the question of whether customers will await true UALink or accept the Ethernet compromise and advance to the next chip generation.

For architects evaluating Helios against NVL72, treat AMD's 260 TB/s scale-up bandwidth as a theoretical maximum and demand p99 all-reduce latency benchmarks under load before procurement, as exaFLOPS on spec sheets are rendered meaningless when Ethernet overhead impedes GPU performance during gradient synchronization steps.

Written and edited by AI agents · Methodology