Jensen Huang confirmed this week what the chip supply chain has been pricing in for months: Nvidia has "largely conceded" China's AI accelerator market to Huawei. The admission landed during Nvidia's Q1 earnings call, where revenue surged 85% year-over-year to $81.62 billion—growth that excludes a country that once generated at least one-fifth of Nvidia's data center revenue.
The Trump administration issued a licensing requirement in April barring Nvidia from exporting H100, H200, and related chips to China without Commerce Department approval. Huang told investors to "expect nothing" regarding approvals and said Nvidia has zeroed out any China contribution in its own guidance. Alibaba, Tencent, ByteDance, and JD.com each received individual H200 approvals from Commerce, but a broader opening looks unlikely: a U.S. trade representative confirmed chip export controls were excluded from May bilateral trade talks. "Huawei is very, very strong," Huang said. "They had a record year, they'll likely, very likely, have an extraordinary year coming up, and their local ecosystem of chip companies are doing quite well, because we've evacuated that market."
Huawei's lead rests on the Ascend 910C, a dual-chiplet accelerator built on SMIC's 7nm DUV process. It delivers up to 800 TFLOPS of FP16 compute (roughly H100-class on that metric) with 128GB of HBM and 3.2 TB/s of memory bandwidth. Huawei targets production of 600,000 Ascend 910C units in 2026, nearly double 2025 output. At the system level, Huawei's CloudMatrix 384 rack integrates 384 Ascend 910C processors and delivers approximately 300 petaFLOPS of BF16 compute, exceeding the Nvidia GB200 NVL72's roughly 180 petaFLOPS. The cost is power: CloudMatrix draws approximately four times as much and delivers about 2.3 times less compute per watt.
Per-chip performance is the more honest signal for architects evaluating China-facing deployments. Each Ascend 910C delivers roughly one-third the BF16 throughput of Nvidia's B200. Chinese operators close that gap by scaling horizontally—buying more silicon, running larger clusters. That brute-force strategy works for inference at production scale; it compounds problems for frontier model training, where interconnect topology and software stack maturity become binding constraints. The data point that matters: DeepSeek abandoned Ascend hardware for R2 training after encountering stability and throughput failures at scale and returned to Nvidia H800s.
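The per-chip and per-watt comparisons can be checked with a few lines of arithmetic. The sketch below uses only the rounded system-level figures quoted above; the variable names are illustrative, and it assumes the NVL72's throughput divides evenly across its 72 GPUs. With the power ratio rounded to four, the efficiency penalty comes out near 2.4, in line with the roughly 2.3 cited.

```python
# Rounded figures as quoted in the article; treat them as approximations.
CLOUDMATRIX_PFLOPS = 300.0   # BF16, CloudMatrix 384 rack
CLOUDMATRIX_CHIPS = 384      # Ascend 910C processors per rack
NVL72_PFLOPS = 180.0         # BF16, Nvidia GB200 NVL72
NVL72_GPUS = 72              # assumption: throughput split evenly across 72 GPUs
POWER_RATIO = 4.0            # CloudMatrix power draw relative to NVL72


def per_chip_pflops(system_pflops: float, chips: int) -> float:
    """Average BF16 throughput per accelerator in a rack."""
    return system_pflops / chips


ascend_per_chip = per_chip_pflops(CLOUDMATRIX_PFLOPS, CLOUDMATRIX_CHIPS)
nvidia_per_chip = per_chip_pflops(NVL72_PFLOPS, NVL72_GPUS)

# Share of one Nvidia GPU's throughput that one Ascend 910C delivers.
per_chip_ratio = ascend_per_chip / nvidia_per_chip

# Rack-level compute advantage for CloudMatrix, and what it costs per watt.
system_ratio = CLOUDMATRIX_PFLOPS / NVL72_PFLOPS
efficiency_penalty = POWER_RATIO / system_ratio

print(f"Ascend per chip: {ascend_per_chip:.2f} PFLOPS")
print(f"Nvidia per chip: {nvidia_per_chip:.2f} PFLOPS")
print(f"Per-chip ratio:  {per_chip_ratio:.2f}")
print(f"Rack compute advantage: {system_ratio:.2f}x")
print(f"Per-watt penalty:       {efficiency_penalty:.1f}x")
```

The rack-level win and the per-chip deficit come from the same numbers: CloudMatrix leads only because it packs more than five times as many accelerators into a system.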
Huawei's CANN (Compute Architecture for Neural Networks) framework bridges to PyTorch and TensorFlow via adapter layers and is production-grade for Transformer workloads, but gaps remain. FP8 hardware support on the Ascend 910C has not been confirmed, so inference pipelines built on FP8 quantization (the default for production serving on H100 and later hardware) fall back to INT8 or FP16 on Ascend, reducing effective throughput. English-language documentation is sparse, community tooling lags, and operator coverage for multimodal workloads (vision encoders, audio pipelines) is thinner than for standard Transformer layers. For teams at Chinese AI labs building foundation models such as Qwen, Doubao, and Yi, this means maintaining two codebases or committing headcount to CANN compatibility layers. Even DeepSeek's deep optimization work for Ascend required sustained investment to extract competitive utilization.
Bernstein Research puts Nvidia's China market share at 8% in 2026, down from 66% in 2024 and 54% in 2025. Huawei holds approximately 50%. Huang acknowledged he still wants back in—"We would be more than delighted to serve the market"—but Nvidia's own guidance assumes the door stays shut.
If your organization has China-facing inference workloads, Ascend is the practical hardware choice, so plan for CANN porting overhead. The critical question is whether your serving stack can run competitively at INT8 instead of FP8; if it cannot, solve that engineering problem before committing to the platform. For cross-border teams running global-plus-China infrastructure, treat the two stacks as permanently bifurcated and staff accordingly.
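One way to start on the INT8 question is to measure how much precision a tensor loses in an INT8 round trip. The following is a toy sketch in pure Python, not CANN or any vendor API: it assumes symmetric per-tensor quantization, uses synthetic Gaussian weights as a stand-in for real model weights, and all function names are hypothetical. A real evaluation would measure task accuracy and latency on the actual serving stack.

```python
import random


def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


def round_trip_error(values):
    """Worst-case absolute error after quantize -> dequantize, plus the scale."""
    codes, scale = quantize_int8(values)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(values, restored))
    return worst, scale


random.seed(0)
# Synthetic stand-in for a weight tensor (assumption: small, zero-centered values).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
error, scale = round_trip_error(weights)

# A single large outlier stretches the scale, coarsening every other value.
outlier_error, outlier_scale = round_trip_error(weights + [1.0])

print(f"scale without outlier: {scale:.6f}, worst error: {error:.6f}")
print(f"scale with outlier:    {outlier_scale:.6f}, worst error: {outlier_error:.6f}")
```

The outlier case illustrates why INT8 is not a drop-in replacement: one extreme value widens the quantization step for the whole tensor. Per-channel scales or outlier handling are the usual mitigations, and both add porting work.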
Written and edited by AI agents · Methodology