AWS Cuts Data-Center Network Hardware by 69% With RNG

Amazon Web Services (AWS) is building new data centers on Resilient Network Graphs (RNG), a flat topology that replaces the hierarchical fat-tree with a quasi-random expander fabric built from commodity switches and passive optical patch panels. Compared with legacy fat-tree architectures, AWS says the design uses 69% fewer networking devices, delivers up to 33% higher throughput, and cuts network power consumption by 40%. After a first deployment in Dublin in 2024, followed by facilities in Germany and Spain, AWS has made RNG the default for most new builds globally.

The architecture condenses the traditional multi-tier tree into two fabrics: an oversubscribed "server mesh" that connects Top-of-Rack (ToR) switches as an expander, and a non-blocking "edge mesh" that carries traffic between the server mesh and remote data centers. The randomness of the cabling is physically encoded in ShuffleBoxes, passive optical panels whose internal fibers are shuffled so that the resulting wiring forms an expander graph with a spectral gap close to that of a truly random topology. Routing is handled by Spraypoint, a custom protocol built by extending Amazon's shortest-path link-state implementation. Spraypoint sprays packets randomly to neighbors; once a packet reaches a "waypoint" associated with its destination, standard shortest-path routing completes the delivery. According to Amazon, this yields nearly twice as many independent paths between routers as standard shortest-path routing, and the implementation reuses the protocol's existing topology-dissemination component, changing only next-hop computation on commodity hardware.
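The sources do not describe ShuffleBox internals, so the sketch below only illustrates the graph property a random fiber shuffle targets. It assumes a ToR fabric can be modeled as a random d-regular graph: each ToR's d uplink "stubs" are paired uniformly at random (the configuration model, standing in for a fiber shuffle), and the spectral gap d − λ₂ is estimated by power iteration. The helper names `random_regular_graph` and `spectral_gap` are hypothetical and are not AWS code.

```python
import math
import random


def random_regular_graph(n, d, seed=0, max_tries=10_000):
    """Hypothetical stand-in for a ShuffleBox wiring plan: pair up n*d uplink
    stubs uniformly at random (configuration model) and retry until the result
    is a simple graph (no self-loops, no parallel links)."""
    if (n * d) % 2:
        raise ValueError("n * d must be even")
    rng = random.Random(seed)
    for _ in range(max_tries):
        stubs = [tor for tor in range(n) for _ in range(d)]
        rng.shuffle(stubs)  # the random permutation a passive panel would fix in glass
        edges = set()
        for a, b in zip(stubs[::2], stubs[1::2]):
            edge = (min(a, b), max(a, b))
            if a == b or edge in edges:
                break  # self-loop or duplicate link: discard this shuffle
            edges.add(edge)
        else:
            adj = [[] for _ in range(n)]
            for a, b in edges:
                adj[a].append(b)
                adj[b].append(a)
            return adj
    raise RuntimeError("no simple d-regular graph found; try another seed")


def spectral_gap(adj, iters=500, seed=0):
    """Estimate d - lambda_2 for a connected d-regular graph.

    Runs power iteration on A + d*I (all eigenvalues >= 0) restricted to vectors
    orthogonal to the all-ones vector, which is the top eigenvector of any
    regular graph. A larger gap means a better expander."""
    n, d = len(adj), len(adj[0])
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    rayleigh = 0.0
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]  # project out the trivial eigenvector
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
        y = [sum(x[j] for j in adj[i]) + d * x[i] for i in range(n)]
        rayleigh = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient of A + d*I
        x = y
    lambda_2 = rayleigh - d  # undo the shift
    return d - lambda_2


adj = random_regular_graph(n=32, d=4, seed=7)
print(f"estimated gap:        {spectral_gap(adj):.3f}")
print(f"Ramanujan reference:  {4 - 2 * math.sqrt(3):.3f}")
```

For comparison, Ramanujan graphs, the best possible expanders for large n, satisfy λ₂ ≤ 2√(d−1), giving a gap of at least d − 2√(d−1); uniformly random regular graphs tend to approach that value as n grows. The sketch says nothing about how closely a real panel's fixed shuffle matches this idealized model.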

FIG. 02 RNG's two-fabric topology (right) replaces the traditional fat-tree (left) with a non-blocking edge mesh and an oversubscribed server mesh, using 69% fewer networking devices. (Amazon Science, 2026)

For AI architects, the operational figures that matter most are throughput and latency uniformity. Spraypoint does not guarantee equal-length paths, so packets between the same pair of endpoints may traverse different hop counts, but because RNG is a low-diameter graph, path-length variance stays small. The arXiv paper (2604.15261) does not publish a p99 latency figure for these path-length differentials; architects should benchmark it against their own topology parameters. AWS claims infrastructure cost reductions of 9% to 45% depending on workload, but does not say whether any of those savings will reach EC2 or S3 pricing. AWS also reported a 2024 global Power Usage Effectiveness (PUE) of 1.15; since PUE is total facility power divided by IT power, each watt cut from the network saves only about 1.15 W at the facility level. The 40% network-power reduction, together with buying fewer switches, therefore mainly lowers AWS's own capital, power, and cooling costs rather than measurably reducing tenants' per-instance carbon footprint.
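One way to start that benchmark is to tabulate hop counts over a candidate topology. The sketch below assumes a simplified, hypothetical reading of Spraypoint: a router may spray to any neighbor that is no farther from the destination than itself, and that neighbor then forwards on a shortest path. This rule is loop-free by construction but is not the published algorithm, and hop counts are only a proxy for latency. `bfs_distances`, `spray_hop_counts`, and `summarize` are illustrative names.

```python
import math
from collections import deque


def bfs_distances(adj, src):
    """Unweighted shortest-path hop counts from src (assumes a connected graph)."""
    dist = [None] * len(adj)
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def spray_hop_counts(adj):
    """Hop count of every spray-then-shortest-path route over all ordered
    (src, dst) pairs, under the hypothetical 'no farther' spraying rule."""
    dist = [bfs_distances(adj, s) for s in range(len(adj))]
    hops = []
    for src in range(len(adj)):
        for dst in range(len(adj)):
            if src == dst:
                continue
            for nbr in adj[src]:
                if dist[nbr][dst] <= dist[src][dst]:  # never move away from dst
                    hops.append(1 + dist[nbr][dst])
    return hops


def summarize(hops):
    """Mean, population variance, nearest-rank p99, and max of the hop counts."""
    ordered = sorted(hops)
    mean = sum(ordered) / len(ordered)
    variance = sum((h - mean) ** 2 for h in ordered) / len(ordered)
    p99 = ordered[math.ceil(0.99 * len(ordered)) - 1]
    return {"mean": mean, "variance": variance, "p99": p99, "max": ordered[-1]}
```

Under this rule every sprayed route is either a shortest path or exactly one hop longer, which is why a low-diameter fabric keeps the spread small. Feeding `summarize(spray_hop_counts(adj))` a real wiring plan, rather than a toy graph, is what would make the numbers meaningful.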

Deployment details for AI workloads remain unresolved. Spraypoint is demand-oblivious: it does not adapt to the traffic matrix, so bursty, synchronized patterns such as allreduce or checkpoint sharding are sprayed randomly rather than traffic-engineered around hot spots. The server mesh keeps the same oversubscription ratio as a fat-tree, so rack-level bisection bandwidth is not inherently higher; the 33% throughput gain comes from better capacity fungibility across the fabric (idle capacity anywhere in the mesh can absorb load from anywhere else), not from fatter pipes to every GPU host. Neither the Amazon Science write-up nor the arXiv paper discusses RDMA, RoCE, or InfiniBand integration, details that are crucial for latency-sensitive LLM inference. Without evidence that RNG preserves lossless Ethernet or priority flow control (PFC) semantics on its sprayed paths, architects should treat the fabric as an improved underlay whose benefits for GPU clusters are still theoretical.
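To illustrate what demand-oblivious spraying means for a synchronized burst, the toy model below sends each packet of a burst to a uniformly random uplink without looking at load. It is not a model of Spraypoint or of any AWS traffic; `spray_burst` and `imbalance` are hypothetical helpers. The point is statistical: random spraying evens out large volumes, but a short burst can still land unevenly, and nothing in the rule steers traffic away from a link that is already hot.

```python
import random
from collections import Counter


def spray_burst(num_packets, num_links, seed=0):
    """Assign each packet to a uniformly random uplink, ignoring current load
    (demand-oblivious), and return the packet count per link."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_links) for _ in range(num_packets))
    return [counts[link] for link in range(num_links)]


def imbalance(loads):
    """Hottest-link load divided by mean link load (1.0 means perfectly even)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


for packets in (64, 1_024, 65_536):
    loads = spray_burst(packets, num_links=16, seed=42)
    print(f"{packets:>6} packets over 16 links: max/mean = {imbalance(loads):.2f}")
```

In this toy model the max/mean ratio drifts toward 1.0 as the burst grows; a traffic-engineered scheme would instead consult observed load before choosing a link.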

Operational risk shifts with RNG. A mis-cabled or failed ShuffleBox requires physical intervention rather than a routing-table rollback, and a quasi-random topology is harder to map mentally during a tail-latency hunt than a symmetric fat-tree. Convergence time after a failure is reported to match the legacy protocol, but the paper does not publish p99 convergence numbers; it says only that the metrics are "similar."

The transferable pattern is Spraypoint: nearly doubling path diversity on commodity switches by spraying traffic randomly to neighbors and then waypointing to destinations, without replacing the control plane or buying custom silicon.
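Because the change is confined to next-hop computation, the pattern can be sketched on top of an unchanged link-state topology database, represented here as an adjacency list. The sketch assumes a hypothetical, simplified rule: classic ECMP keeps only neighbors strictly closer to the destination, while the spraying variant also admits neighbors at the same distance, each acting as a waypoint that then forwards on shortest paths. This is not AWS's published algorithm, and next-hop fan-out is only a rough proxy for the independent-path count the paper reports. `distances_to`, `next_hops`, and `mean_fanout` are illustrative names.

```python
from collections import deque


def distances_to(adj, dst):
    """Hop count from every node to dst, via BFS on the flooded topology."""
    dist = [None] * len(adj)
    dist[dst] = 0
    queue = deque([dst])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def next_hops(adj, src, dst, dist=None):
    """Return (ecmp, spray) next-hop sets for traffic from src to dst.

    ecmp:  neighbors strictly closer to dst (standard shortest-path ECMP).
    spray: neighbors no farther from dst than src; each is a waypoint that
           then forwards on shortest paths, so routes stay loop-free."""
    if dist is None:
        dist = distances_to(adj, dst)
    ecmp = {n for n in adj[src] if dist[n] < dist[src]}
    spray = {n for n in adj[src] if dist[n] <= dist[src]}
    return ecmp, spray


def mean_fanout(adj):
    """Average next-hop set size over all ordered (src, dst) pairs."""
    ecmp_total = spray_total = pairs = 0
    for dst in range(len(adj)):
        dist = distances_to(adj, dst)  # one shortest-path run per destination
        for src in range(len(adj)):
            if src == dst:
                continue
            ecmp, spray = next_hops(adj, src, dst, dist)
            ecmp_total += len(ecmp)
            spray_total += len(spray)
            pairs += 1
    return ecmp_total / pairs, spray_total / pairs
```

On a complete graph of four nodes, for example, ECMP offers one next hop per destination while the spray rule offers three; the topology input and the distance computation are identical for both, which mirrors the claim that only next-hop selection changes.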

Sources

RNG uses 69% fewer networking devices, delivers up to 33% higher throughput, and reduces network power consumption by 40% vs. traditional fat-tree architectures
"Amazon says the design uses 69% fewer networking devices than traditional architectures and can reduce infrastructure costs by up to 45%"
tomshardware.com
RNG is now the default data center network for most AWS workloads; first deployed in Dublin in 2024, then in Germany and Spain
"The company first deployed RNG in a Dublin data center in 2024 before expanding the architecture into facilities in Germany and Spain."
tomshardware.com
Spraypoint provides nearly twice as many independent paths between routers as standard shortest-path routing; ShuffleBox is a passive optical device encoding a quasi-random topology
"By spraying to neighbors, Spraypoint provides nearly twice as many independent paths between routers as standard shortest-path routing techniques."
amazon.science
RNG is the default for most new AWS builds globally; arXiv paper 2604.15261 describes the first scalable flat-network datacenter deployment
"The resulting network design — which we call RNG, for resilient network graphs — is now used in AWS data centers and is the default for most new builds globally."
amazon.science
Spraypoint is implemented by extending Amazon's existing shortest-path link-state protocol; convergence time after failure matches the legacy protocol
"We implemented Spraypoint by extending Amazon's shortest-paths based link-state protocol. We reused the topology dissemination component and modified next hop computation."
arxiv.org
RNG uses two fabric layers: an oversubscribed server mesh and a non-blocking edge mesh
"The first, called the 'server mesh', connects ToRs as an expander. The second, called the 'edge mesh,' connects to the server mesh and to remote datacenters, and it provides transit between these networks."
arxiv.org
arXiv paper 2604.15261, published in April 2026, describes what its authors call the first large-scale production deployment of expander-based network fabrics
"The research paper detailing the deployment appeared on arXiv in April 2026, marking what the authors describe as the first large-scale production deployment of expander-based network fabrics."
cryptobriefing.com
Infrastructure cost reductions range from 9% to 45% through simpler cabling and fewer switches
"RNG matches or exceeds the performance of those legacy architectures while cutting costs by 9-45% through simpler cabling and fewer switches."
cryptobriefing.com
AWS global Power Usage Effectiveness (PUE) was 1.15 across its 2024 data centers
"AWS reported a global Power Usage Effectiveness of 1.15 across its 2024 data centers."
cryptobriefing.com

Written and edited by AI agents
