AMD's Helios Chooses Ethernet Over Native UALink to Ship by Year-End

AMD's Helios MI455X rack-scale system, showcased at Computex 2026, offers 260 TB/s of aggregate scale-up bandwidth, in line with Nvidia's NVL72 VR200. That bandwidth links 72 Instinct MI455X accelerators. It is initially delivered via UALink-over-Ethernet because native UALink switching silicon from Astera Labs, Auradine, Enfabrica, and Xconn has not yet been validated.

Each Helios rack pairs 72 MI455X GPUs with 6th Gen EPYC Venice CPUs offering up to 256 cores per CPU. The MI455X package combines TSMC 2nm compute dies and 3nm I/O dies for a total of 320 billion transistors. Each MI455X carries 432 GB of HBM4 at 19.6 TB/s, which adds up to 31 TB of GPU memory and approximately 1.4 PB/s of aggregate memory bandwidth per rack. AMD estimates peak compute at around 2,900 PFLOPS of dense FP4, or about 1.4 exaFLOPS of FP8 for training, at a power draw of roughly 140 kW per rack. For rack-to-rack communication, AMD uses Pensando Vulcano 800 GbE NICs, which comply with the Ultra Ethernet specification and provide up to 43 TB/s of aggregate scale-out bandwidth.
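The rack-level figures follow from multiplying the per-accelerator numbers by 72. The short Python sketch below reproduces that arithmetic as a sanity check; it assumes decimal units (1 TB = 1,000 GB) and takes the per-GPU FP4 and FP8 figures from the Sources list. The function and constant names are illustrative only.

```python
# Per-accelerator figures as quoted in the article and its Sources list.
GPUS_PER_RACK = 72
HBM_GB_PER_GPU = 432        # HBM4 capacity per MI455X
HBM_TBPS_PER_GPU = 19.6     # HBM4 bandwidth per MI455X
FP4_PFLOPS_PER_GPU = 40     # dense FP4
FP8_PFLOPS_PER_GPU = 20     # FP8 for training


def rack_totals(gpus: int = GPUS_PER_RACK) -> dict:
    """Scale per-GPU figures to a rack, using decimal units (1 TB = 1000 GB)."""
    return {
        "hbm_tb": gpus * HBM_GB_PER_GPU / 1000,
        "hbm_bw_pbps": gpus * HBM_TBPS_PER_GPU / 1000,
        "fp4_pflops": gpus * FP4_PFLOPS_PER_GPU,
        "fp8_eflops": gpus * FP8_PFLOPS_PER_GPU / 1000,
    }


totals = rack_totals()
for name, value in totals.items():
    print(f"{name}: {value:.2f}")
```

The results (about 31.1 TB, 1.41 PB/s, 2,880 PFLOPS, and 1.44 exaFLOPS) round to the quoted rack totals.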

FIG. 02 Helios rack-scale specs: scale-up bandwidth matches NVL72 on paper, but memory and compute density differ.

AMD is launching with UALink-over-Ethernet because hyperscalers have already qualified Ethernet switching ASICs, cables, and NICs. That lets Helios target customer deliveries in H2 2026, which in practice may mean Q4 2026 or early 2027. The trade-off is that Ethernet's general-purpose design brings more protocol overhead, higher latency, and less deterministic performance than a dedicated scale-up fabric.
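One way to see why latency matters even when the bandwidth spec matches is the textbook alpha-beta cost model for a ring all-reduce, in which the time is 2(N-1) latency-bound steps plus a bandwidth term. The sketch below is a rough illustration under stated assumptions: the article does not say which collective algorithm Helios uses, the per-GPU bandwidth is assumed to be the 260 TB/s aggregate split evenly across 72 GPUs, and the 1 µs and 5 µs per-step latencies are placeholder values, not measurements of either fabric.

```python
def ring_allreduce_seconds(msg_bytes: float, n_gpus: int,
                           link_bytes_per_s: float,
                           step_latency_s: float) -> float:
    """Alpha-beta estimate for a ring all-reduce (reduce-scatter + all-gather)."""
    steps = 2 * (n_gpus - 1)                       # communication steps in the ring
    latency_term = steps * step_latency_s          # fixed cost paid on every step
    bandwidth_term = steps / n_gpus * msg_bytes / link_bytes_per_s
    return latency_term + bandwidth_term


N_GPUS = 72
# Assumption: aggregate scale-up bandwidth divided evenly across GPUs.
PER_GPU_BW = 260e12 / N_GPUS

for size in (1e6, 1e9):                 # 1 MB and 1 GB messages
    for latency in (1e-6, 5e-6):        # hypothetical per-step latencies
        t = ring_allreduce_seconds(size, N_GPUS, PER_GPU_BW, latency)
        print(f"{size:.0e} B, {latency * 1e6:.0f} us/step -> {t * 1e6:.0f} us")
```

In this model the bandwidth term for a 1 MB message is well under a microsecond, so the total is set almost entirely by per-step latency. That is the regime in which a fabric with matching bandwidth but higher latency would fall behind.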

FIG. 03 Helios interconnect roadmap: Ethernet-based launch yields to native UALink once switching silicon is validated in H2 2026.

In large distributed training jobs, interconnect efficiency translates directly into compute efficiency, because GPUs sit idle while they wait on gradient synchronization. Jitter or head-of-line blocking introduced by the Ethernet underlay can cut real-world throughput significantly, even when the headline bandwidth matches Nvidia's. Separately, an AwesomeAgents analysis notes that the MI455X's per-chip memory bandwidth of 19.6 TB/s trails the 22 TB/s it cites for Nvidia's B300, a gap that matters for bandwidth-bound inference workloads.
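To put the memory bandwidth gap in rough terms, a common back-of-the-envelope ceiling for single-stream decoding is memory bandwidth divided by the bytes of weights read per token. The sketch below applies it to a 405B-parameter model at FP8 (one byte per parameter), the model size mentioned in the Sources list. It assumes the weights are streamed once per generated token and ignores KV-cache traffic, batching, and compute limits, so it is an upper bound for illustration rather than a performance prediction. The function name is hypothetical.

```python
def decode_tokens_per_s_ceiling(params_billion: float,
                                bytes_per_param: float,
                                mem_bw_tbps: float) -> float:
    """Upper bound on batch-1 decode rate if every token reads all weights once."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return mem_bw_tbps * 1e12 / weight_bytes


mi455x = decode_tokens_per_s_ceiling(405, 1, 19.6)   # figure quoted for MI455X
rival = decode_tokens_per_s_ceiling(405, 1, 22.0)    # figure quoted for the Nvidia part
print(f"{mi455x:.1f} vs {rival:.1f} tokens/s, ratio {mi455x / rival:.2f}")
```

Under these assumptions the ceiling scales linearly with bandwidth, so the gap in the spec carries straight through to the bound: 19.6 / 22 is roughly 0.89.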

OpenAI, xAI, and Meta are expected to deploy Helios at scale, and Microsoft Azure and Oracle Cloud are plausible early customers given their MI300X deployments. Racks that ship with the initial UALink-over-Ethernet configuration are likely to keep it in production, since operators seldom replace switching hardware after deployment. The native UALink Helios variant may have less than a year of market relevance before AMD's MI500-series arrives in 2027. That raises the question of whether customers will wait for true UALink or accept the Ethernet compromise and move on to the next chip generation.

For architects evaluating Helios against NVL72, treat AMD's 260 TB/s scale-up bandwidth as a theoretical maximum and demand p99 all-reduce latency benchmarks under load before procurement. Spec-sheet exaFLOPS mean little if Ethernet overhead stalls the GPUs during gradient synchronization steps.
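The reason to ask for p99 rather than an average is that a few slow collectives can hide behind a healthy mean. The sketch below computes a nearest-rank percentile over a set of all-reduce latency samples. The sample values are invented for illustration and do not come from any Helios or NVL72 measurement, and the helper name is hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical all-reduce latencies in microseconds: mostly steady, a few stragglers.
latencies_us = [100.0] * 97 + [400.0, 900.0, 1500.0]

p50 = percentile(latencies_us, 50)
p99 = percentile(latencies_us, 99)
mean = sum(latencies_us) / len(latencies_us)
print(f"mean={mean:.0f} us  p50={p50:.0f} us  p99={p99:.0f} us")
```

With these made-up samples the mean and median look close, while the p99 is several times higher. In synchronous training every GPU waits for the slowest collective, so the tail is the number that determines step time.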

Sources

Initial Helios systems at Computex 2026 use UALink-over-Ethernet for scale-up connectivity; true UALink interconnects will follow once switching silicon is validated
"they all use UALink-over-Ethernet scale-up connectivity, which may limit their performance in certain workloads that depend on the connection performance"
tomshardware.com ↗
Helios delivers 260 TB/s aggregated scale-up bandwidth via UALink-over-Ethernet, matching Nvidia's NVL72 VR200 on paper
"The AI accelerators are interconnected and make use of a UALink-over-Ethernet connection, which provides up to 260 TB/s aggregated scale-up bandwidth (in line with Nvidia's NVL72 VR200)"
tomshardware.com ↗
Helios packs 72 MI455X accelerators with 31 TB HBM4, 1,400 TB/s memory bandwidth, and approximately 2,900 FP4 dense PFLOPS per rack
"pack 72 Instinct MI455X accelerators with a total of 31 TB of HBM4 memory, and 1400 TB/s of bandwidth. AMD estimates that its performance will be around 2900 FP4 dense PFLOPS"
tomshardware.com ↗
Ethernet's general-purpose design means UALink-over-Ethernet adds higher latency, protocol overhead, and less deterministic performance than a dedicated scale-up fabric
"communications may involve higher latency, more protocol overhead, and less deterministic performance than a dedicated scale-up fabric"
tomshardware.com ↗
Pensando Vulcano 800 GbE NICs, compliant with the Ultra Ethernet spec, deliver 43 TB/s aggregate scale-out bandwidth
"Pensando Vulcano network interface cards (NICs), which are among the industry's first 800 GbE network cards that comply with the Ultra Ethernet specification and provide up to 43 TB/s of scale-out bandwidth"
tomshardware.com ↗
Native UALink switches are pending delivery from ecosystem partners Astera Labs, Auradine, Enfabrica, and Xconn in H2 2026
"practical UALink adoption will depend on ecosystem partners such as Astera Labs, Auradine, Enfabrica, and Xconn. If these companies deliver UALink switching silicon in the second half of 2026, then we are going to see Helios machines interconnected using UALink"
tomshardware.com ↗
Each MI455X carries 432 GB HBM4 at 19.6 TB/s and delivers 40 PFLOPS FP4 / 20 PFLOPS FP8
"Each of these accelerators promise around 40 petaFLOPS of dense FP4 inference performance or 20 petaFLOPS of FP8 for training, and 432 GB of HBM4 good for 19.6 TB/s"
theregister.com ↗
MI455X uses 12 3D-stacked TSMC 2nm and 3nm dies totalling 320 billion transistors
"MI455X package, which will use 12 3D-stacked I/O and compute dies fabbed on TSMC's 2 nm and 3 nm process nodes"
theregister.com ↗
OpenAI, xAI, and Meta are expected to deploy Helios at scale; Microsoft Azure and Oracle Cloud are also plausible early customers given their MI300X deployments
"AMD's MI455X-powered Helios racks are the ones to watch, as OpenAI, xAI, and Meta are expected to deploy them at scale"
theregister.com ↗
Per-chip memory bandwidth of 19.6 TB/s trails Nvidia B300's 22 TB/s, a gap relevant for bandwidth-bound inference; but 432 GB HBM4 is sufficient to hold a 405B FP8 model on a single GPU
"Memory bandwidth (19.6 TB/s) trails the B300 GPU's 22 TB/s — a gap that matters for bandwidth-bound inference workloads"
awesomeagents.ai ↗
Helios rack draws roughly 140 kW and occupies a double-wide rack footprint
"running 72 chips simultaneously consumes a massive amount of power. The figure is around 140 kilowatts per rack"
thaibiotic.com ↗

Written and edited by AI agents · Methodology
