Wiwynn has unveiled a 2.9-petabyte Nvidia SCADA storage server at Computex 2026, featuring 96 PCIe 6.0 Micron 9650 Pro SSDs and four RTX Pro 6000 Blackwell GPUs in a liquid-cooled 6RU chassis rated for 9 kW. The flash array is rated for 528 million 4K random-read IOPS.

The system is based on Nvidia's SCADA architecture, which removes the CPU from both the data and control paths so that GPUs initiate storage operations directly. This is a stricter split than GPUDirect Storage, where the CPU still owns the control plane. In Wiwynn's system, the Nvidia Vera CPU is present but largely sidelined; the four RTX Pro 6000 cards act as storage processors, managing millions of parallel requests smaller than 4 KB across the 96 E3.S drives and forwarding data to compute hosts over four ConnectX-9 SuperNICs. Broadcom PCIe 6.x switches handle the on-board fabric, and the 2.949 PB of raw capacity comes from 30.72 TB Micron 9650 Pro drives. Nvidia positions the design as tier 3.5 in its "Storage Next" vision, targeting vector search, RAG retrieval, graph analytics, and KV-cache serving, where thousands of GPU threads issue fine-grained random reads.
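
The headline numbers can be cross-checked with simple arithmetic. The sketch below uses only figures quoted above; the even spread of load across drives and GPUs is an assumption made for illustration, not something Wiwynn or Nvidia has stated.

```python
# Figures quoted in the article.
DRIVES = 96            # E3.S Micron 9650 Pro SSDs
DRIVE_TB = 30.72       # capacity per drive, decimal terabytes
ARRAY_IOPS = 528e6     # claimed 4K random-read IOPS for the whole array
GPUS = 4               # RTX Pro 6000 cards acting as storage processors


def raw_capacity_pb(drives: int = DRIVES, drive_tb: float = DRIVE_TB) -> float:
    """Raw capacity in decimal petabytes (1 PB = 1000 TB)."""
    return drives * drive_tb / 1000


def iops_per_drive(array_iops: float = ARRAY_IOPS, drives: int = DRIVES) -> float:
    """Per-drive IOPS, ASSUMING the load spreads evenly over all drives."""
    return array_iops / drives


def iops_per_gpu(array_iops: float = ARRAY_IOPS, gpus: int = GPUS) -> float:
    """Requests per second each GPU must issue, ASSUMING an even split."""
    return array_iops / gpus


if __name__ == "__main__":
    print(f"raw capacity: {raw_capacity_pb():.3f} PB")        # 2.949 PB
    print(f"per drive:    {iops_per_drive() / 1e6:.1f} M IOPS")  # 5.5 M IOPS
    print(f"per GPU:      {iops_per_gpu() / 1e6:.0f} M IOPS")    # 132 M IOPS
```

Under the even-split assumption, the array figure works out to 5.5 million IOPS per drive and 132 million requests per second per GPU.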

FIG. 02 SCADA removes the CPU from both data and control paths, letting GPUs directly initiate storage operations, unlike traditional GPUDirect Storage, where the CPU still owns the control plane.

The 528 million IOPS figure addresses the access pattern that stalls inference pipelines: massive thread count, tiny block size, unpredictable address space. However, sequential throughput is limited by the PCIe switches and NICs, not the NAND, meaning the real ceiling is the ConnectX-9 egress and downstream network. The 9 kW draw for six rack units is aggressive for a storage node, and the six cold-plate modules covering every SSD indicate that air cooling is not an option at this density. Wiwynn and Nvidia have not disclosed p50 or p99 latencies under load, sustained throughput figures, $/IOPS, or pricing, but the bill of materials suggests a seven-figure unit before networking.
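
To see why the egress path matters, it helps to convert the IOPS claim into payload bandwidth and compare it with what the NICs could carry. The sketch below assumes "4K" means 4,096 bytes. The per-NIC line rate is not given in this article, so `ASSUMED_NIC_GBIT` is a hypothetical placeholder to be replaced with the real figure; protocol overhead is ignored.

```python
BLOCK_BYTES = 4096        # ASSUMPTION: "4K" read = 4,096 bytes
ARRAY_IOPS = 528e6        # claimed 4K random-read IOPS (from the article)
NIC_COUNT = 4             # ConnectX-9 SuperNICs (from the article)
ASSUMED_NIC_GBIT = 800    # HYPOTHETICAL line rate per NIC in Gb/s, not from the article


def payload_gb_per_s(iops: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Payload bandwidth in decimal GB/s implied by an IOPS rate."""
    return iops * block_bytes / 1e9


def network_bound_iops(nic_count: int, nic_gbit: float,
                       block_bytes: int = BLOCK_BYTES) -> float:
    """IOPS whose payload fits through the NICs, ignoring protocol overhead."""
    egress_bytes_per_s = nic_count * nic_gbit * 1e9 / 8
    return egress_bytes_per_s / block_bytes


if __name__ == "__main__":
    flash = payload_gb_per_s(ARRAY_IOPS)
    net = network_bound_iops(NIC_COUNT, ASSUMED_NIC_GBIT)
    print(f"flash-side payload at peak IOPS: {flash:.1f} GB/s")
    print(f"IOPS that fit through assumed egress: {net / 1e6:.1f} M")
```

If the egress-bound figure comes out below the flash-side figure for the real NIC rate, then only the share of reads consumed inside the chassis can run at the advertised peak.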

As this is a showcase unit with no production workload evidence, architects should treat the peak IOPS number as a lab specification until independent benchmarks show how the system behaves under concurrent RAG or KV-cache eviction patterns. The software stack is another open question. SCADA requires applications to issue GPU-initiated storage commands, a programming model that does not map cleanly to existing GPUDirect Storage code, standard POSIX filesystems, or Kubernetes-based inference serving. Adopting it means new drivers, new failure-handling logic, and custom CUDA I/O paths.

The opportunity cost also deserves scrutiny. Using four RTX Pro 6000 GPUs as I/O orchestrators means dedicating high-end accelerators to storage control instead of model forward passes. This trade-off only pencils out when data-movement stalls already dominate pipeline utilization, and when the alternative is repeatedly idling compute GPUs while a CPU-bound storage control path fetches embeddings or cached activations from flash.
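
One rough way to reason about that trade-off is a toy break-even model. Everything in it is an assumption made for illustration: a pool of `n` identical GPUs, each losing a fraction `stall` of its time waiting on storage, and `k` of them reassigned to I/O duty so that a fraction `recovered` of the stall disappears for the rest. None of these parameters come from Wiwynn or Nvidia.

```python
def useful_gpus_before(n: int, stall: float) -> float:
    """Effective compute GPUs when every GPU stalls for `stall` of its time."""
    return n * (1 - stall)


def useful_gpus_after(n: int, k: int, stall: float, recovered: float) -> float:
    """Effective compute GPUs once k GPUs handle I/O and the remaining
    GPUs get back `recovered` (0..1) of their stall time."""
    return (n - k) * (1 - stall * (1 - recovered))


def pays_off(n: int, k: int, stall: float, recovered: float) -> bool:
    """True when reassigning k GPUs raises effective compute in this toy model."""
    return useful_gpus_after(n, k, stall, recovered) > useful_gpus_before(n, stall)


def break_even_stall(n: int, k: int, recovered: float) -> float:
    """Stall fraction above which the reassignment wins.
    Derived by solving (n - k) * (1 - s * (1 - r)) > n * (1 - s) for s."""
    return k / (n - (n - k) * (1 - recovered))


if __name__ == "__main__":
    # Hypothetical pool: 32 GPUs, 4 moved to I/O duty, stall fully removed.
    print(break_even_stall(32, 4, 1.0))   # 0.125
    print(pays_off(32, 4, 0.30, 1.0))     # True
    print(pays_off(32, 4, 0.05, 1.0))     # False
```

In this hypothetical pool, the reassignment only wins once GPUs spend more than 12.5% of their time stalled on storage, and the threshold rises if the stall is only partly recovered.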

The pattern to steal is offloading storage request scheduling and sub-4 KB flash access directly to GPU-resident threads. Adopt it only after measuring current retrieval latency and proving that the bottleneck is bound by the CPU control path, not the PCIe fabric, because four RTX Pro 6000 GPUs serving as I/O processors are four GPUs not generating tokens.
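
A first step toward that measurement can be as simple as timing small random reads on the existing CPU-initiated path. The sketch below is a minimal baseline harness, not SCADA code. It uses the POSIX-only `os.pread` and reads through the page cache, so cached results will look far better than the flash itself. It also does not separate CPU-path cost from PCIe-fabric cost; it only produces the p50 and p99 numbers to start from.

```python
import math
import os
import random
import tempfile
import time

BLOCK = 4096  # bytes per read, matching the 4K random-read pattern


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def time_random_reads(path, reads=10_000, block=BLOCK, seed=0):
    """Return per-read latencies in microseconds for block-aligned random reads."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        blocks = os.fstat(fd).st_size // block
        if blocks == 0:
            raise ValueError("file is smaller than one block")
        latencies = []
        for _ in range(reads):
            offset = rng.randrange(blocks) * block
            start = time.perf_counter_ns()
            os.pread(fd, block, offset)
            latencies.append((time.perf_counter_ns() - start) / 1000)
        return latencies
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Demo against a small scratch file; point `path` at real data instead.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(BLOCK * 256))
        path = tmp.name
    try:
        lat = time_random_reads(path, reads=1000)
        print(f"p50 = {percentile(lat, 50):.1f} us, p99 = {percentile(lat, 99):.1f} us")
    finally:
        os.unlink(path)
```

For a meaningful baseline, the file should be much larger than RAM or opened in a way that bypasses the page cache, and the test should run with the concurrency the retrieval service actually sees.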

Written and edited by AI agents