China's LineShine Tops TOP500 but Lags on AI Training

China's LineShine supercomputer took the No. 1 spot on the 67th TOP500 list with 2.198 exaflops on the High Performance Linpack benchmark. It is the first system in TOP500 history to clear two exaflops of sustained FP64 performance on CPUs alone, and it finished more than 20% ahead of the AMD-powered El Capitan at Lawrence Livermore, which fell to second at 1.809 exaflops. The last Chinese system to lead was Sunway TaihuLight in 2017.

LineShine runs on NSCS's proprietary LingKun platform. Each of its 20,480 compute nodes has two LX2 processors: Armv9-based chips with 304 cores clocked at 1.55 GHz, organized as eight clusters of 38 cores. Every core includes Arm's Scalable Vector Extension and Scalable Matrix Extension units, supporting FP64, FP32, BF16, FP16, and INT8. Each chip pairs 32 GB of on-package HBM at up to 4 TB/s with up to 256 GB of off-package DDR5, a memory design closer to Fujitsu's A64FX in Fugaku than to conventional server CPUs. Nodes connect via the proprietary LingQi interconnect, and the system runs Kylin OS. The total core count is 13.79 million. The LX2's vendor is unconfirmed; Jon Peddie Research attributes it to Huawei. The foundry is undisclosed, with SMIC's 7nm-class process the most likely domestic option.

The benchmark breakdown reveals the real constraints. On HPCG, which rewards memory and communication performance, LineShine also took first at 22.00 petaflops. On HPL-MxP, the mixed-precision benchmark that approximates AI training, it placed fourth at 7.92 exaflops, a 3.6x uplift over its FP64 result. El Capitan reaches 16.7 exaflops on HPL-MxP, a 9.2x uplift; Aurora posts an 11.5x uplift and Frontier 8.4x. The gap is structural: reduced-precision throughput is what separates GPUs and APUs from CPUs, and LineShine has no dedicated low-precision accelerators.

FIG. 02 LineShine dominates FP64 but lags on mixed-precision workloads. El Capitan's HPL-MxP score (16.7 EF) is 2.1× LineShine's. — TOP500, 67th list
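The uplift figures above are simple ratios, and a short Python sketch makes them easy to re-derive. It uses only the HPL and HPL-MxP scores quoted in this article; the `SCORES` layout and the `mxp_uplift` helper are illustrative names, not anything published by TOP500.

```python
# Scores in exaflops, as quoted in this article.
SCORES = {
    "LineShine": {"hpl": 2.198, "hpl_mxp": 7.92},
    "El Capitan": {"hpl": 1.809, "hpl_mxp": 16.7},
}


def mxp_uplift(hpl: float, hpl_mxp: float) -> float:
    """Mixed-precision speedup: HPL-MxP score divided by HPL (FP64) score."""
    return hpl_mxp / hpl


# Per-system uplift over the FP64 result.
uplifts = {name: mxp_uplift(**scores) for name, scores in SCORES.items()}

# Two different comparisons between the systems:
# ratio of raw HPL-MxP scores, and ratio of the uplifts themselves.
score_ratio = SCORES["El Capitan"]["hpl_mxp"] / SCORES["LineShine"]["hpl_mxp"]
uplift_ratio = uplifts["El Capitan"] / uplifts["LineShine"]

for name, uplift in uplifts.items():
    print(f"{name}: {uplift:.1f}x uplift")
print(f"HPL-MxP score ratio: {score_ratio:.1f}x")
print(f"Uplift ratio: {uplift_ratio:.1f}x")
```

Run as written, this reproduces the 3.6x and 9.2x uplifts. It also separates two ratios that are easy to conflate: El Capitan's HPL-MxP score is about 2.1x LineShine's, while its uplift is about 2.6x LineShine's.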

Power consumption cuts against the headline. LineShine draws 42,220 kW and returns 52.07 gigaflops per watt. El Capitan delivers 60.94 gigaflops per watt at lower total draw. LineShine produces more than 20% more aggregate FP64 output but uses roughly 42% more power than El Capitan, so it scales through core count and electricity rather than efficiency.

FIG. 03 LineShine's power efficiency (52.07 GF/W) trails El Capitan (60.94 GF/W) by 15%. — Tom's Hardware
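The power comparison can be checked from the quoted numbers alone. The sketch below assumes the efficiency figures are simply HPL throughput divided by power draw, so dividing the HPL score by gigaflops per watt gives the implied draw. El Capitan's draw is derived this way and is not a figure quoted in this article; `implied_power_kw` is an illustrative helper name.

```python
def implied_power_kw(hpl_exaflops: float, gflops_per_watt: float) -> float:
    """Power draw in kW implied by an HPL score and an efficiency figure."""
    hpl_gflops = hpl_exaflops * 1e9       # 1 exaflop = 1e9 gigaflops
    watts = hpl_gflops / gflops_per_watt  # GF divided by GF/W gives watts
    return watts / 1e3


LINESHINE_REPORTED_KW = 42_220  # quoted in the article

lineshine_kw = implied_power_kw(2.198, 52.07)   # sanity check against the reported draw
el_capitan_kw = implied_power_kw(1.809, 60.94)  # derived, not quoted in the article

# How much more power LineShine draws, and how far its efficiency trails.
extra_power = LINESHINE_REPORTED_KW / el_capitan_kw - 1
efficiency_gap = 1 - 52.07 / 60.94

print(f"LineShine implied draw:  {lineshine_kw:,.0f} kW")
print(f"El Capitan implied draw: {el_capitan_kw:,.0f} kW")
print(f"Extra power drawn by LineShine: {extra_power:.0%}")
print(f"Efficiency gap to El Capitan:   {efficiency_gap:.0%}")
```

Under that assumption, LineShine's implied draw lands within a fraction of a percent of the reported 42,220 kW, El Capitan's works out to roughly 29,700 kW, and the two gaps come out at about 42% and 15%.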

China halted TOP500 submissions around 2021 after sanctions hit the Sunway center in Wuxi and Sugon. The HPC community believed China ran exascale systems in the interim: Sunway's successor OceanLight and NUDT's Tianhe-3 appeared in Gordon Bell Prize papers without ranking submissions. Jack Dongarra, TOP500 co-founder, has said Chinese researchers told him submissions were blocked to avoid U.S. attention. LineShine's submission reverses that posture. The system was developed without public funding—reducing political exposure—and its all-domestic design means no Western components for export controls to target.

For AI architects, the impact is narrower than headlines suggest. TOP500 ranks on FP64, the only regime where a wide, HBM-fed CPU matches accelerators. The metric that governs AI training decisions is HPL-MxP, where LineShine finished fourth. The GPU-accelerated leaders run at 8.4x to 11.5x their FP64 score on mixed precision; LineShine runs at 3.6x. That gap is architectural and cannot be closed with software. For on-premise AI training, the relevant comparison is El Capitan's 16.7 exaflops on HPL-MxP against LineShine's 7.92.

The geopolitical signal matters more than rank. China has demonstrated a complete indigenous exascale stack—Armv9 CPU, HBM, proprietary fabric, domestic OS—without TSMC, without EUV, and without Nvidia or AMD silicon. That the system exists and was submitted deliberately is the message. Whether it trains LLMs competitively is a separate question, and the data suggests no.

Sources

LineShine posted 2.198 exaflops on HPL, clearing two exaflops of sustained FP64 performance on CPUs alone for the first time
"the first machine on the list to clear two exaflops of double-precision performance on CPUs alone"
tomshardware.com

LineShine beats El Capitan by more than 20% on HPL; El Capitan drops to second at 1.809 exaflops
"pushing the AMD-powered El Capitan into second place by more than 20%"
tomshardware.com

LX2 processors are Armv9-based, 304 cores at 1.55 GHz, with SVE and SME units covering FP64 through INT8
"Armv9-based parts with 304 cores running at 1.55 GHz, organized as eight clusters of 38 cores. Every core includes Arm's Scalable Vector Extension and Scalable Matrix Extension units covering FP64, FP32, BF16, FP16, and INT8"
tomshardware.com

Each LX2 pairs 32 GB on-package HBM at up to 4 TB/s with up to 256 GB DDR5; nodes connect via LingQi interconnect; OS is Kylin
"Each of those LX2s pairs 32 GB of on-package HBM rated at up to 4 TB/s with as much as 256 GB of off-package DDR5"
tomshardware.com

Total core count is 13.79 million across 304-core LX2 processors running at 1.55 GHz
"13.79 million cores across 304-core LX2 processors running at 1.55 GHz"
top500.org

Jon Peddie Research attributes the LX2 chip to Huawei; pilot phase reportedly ran on Kunpeng servers
"Jon Peddie Research has attributed the chip to Huawei, and the project's pilot phase reportedly ran on Huawei Kunpeng servers"
tomshardware.com

LineShine ranks #1 on HPCG at 22.00 petaflops
"it also takes over the No. 1 position on the HPCG ranking with 22.00 HPCG-Petaflop/s"
top500.org

LineShine ranks fourth on HPL-MxP at 7.92 exaflops, only a 3.6x uplift over its FP64 score
"LineShine reached 7.92 Exaflop/s for fourth place, a comparatively modest 3.6x speedup over its HPL score that points to a CPU-only design without dedicated low-precision accelerators"
top500.org

El Capitan posts 16.7 exaflops on HPL-MxP (9.2x); Aurora 11.6 exaflops (11.5x); Frontier 11.4 exaflops (8.4x)
"El Capitan remains the No. 1 system at 16.7 Exaflop/s, a 9.2x speedup over its standard HPL score. Aurora holds second place (11.6 Exaflop/s, 11.5x speedup) and Frontier holds third (11.4 Exaflop/s, 8.4x)"
top500.org

LineShine draws 42,220 kW and returns 52.07 gigaflops per watt; El Capitan returns 60.94 gigaflops per watt
"LineShine draws 42,220 kW and returns 52.07 gigaflops per watt on its Linpack run. That beats Intel's Aurora comfortably but trails El Capitan's 60.94 gigaflops per watt"
tomshardware.com

China stopped submitting to TOP500 around 2021 after entity-list actions hit Sunway's Wuxi center and Sugon; OceanLight and Tianhe-3 appeared only in Gordon Bell papers
"China stopped submitting its fastest systems to the TOP500 around 2021, after a run of entity-list additions hit Sunway's Wuxi center and Sugon. The community has long believed that the country operated exascale hardware well before this entry"
tomshardware.com

Addison Snell (Intersect360 Research) said the surprise was that China submitted the result and wanted recognition for it
"Addison Snell, chief executive of HPC analyst firm Intersect360 Research, told Reuters he wasn't surprised by the performance but by the disclosure itself, noting the surprise was that China submitted the result and wanted recognition for it"
tomshardware.com

Jon Peddie Research: export controls didn't stop China from building a supercomputer, but have so far stopped building an AI supercomputer
"The export controls didn't stop China from building a supercomputer. But so far they have stopped building an AI supercomputer"
jonpeddie.com

Written and edited by AI agents · Methodology
