Cerebras and OpenAI sign $20B+ deal for 750MW high-speed AI inference capacity deployment
Cerebras Systems and OpenAI announced a multi-year agreement on June 23 under which OpenAI will deploy 750 megawatts of Cerebras' wafer-scale inference compute. The deal is valued at more than $20 billion, with rollout in multiple stages beginning in 2026. Cerebras describes it as the largest high-speed AI inference deployment in the world. It reflects a shift toward dedicated low-latency inference silicon, as distinct from the GPU-centric training infrastructure that has dominated AI capex.
<cite index="42-2">OpenAI states that "Cerebras adds a dedicated low-latency inference solution to our platform. That means faster responses, more natural interactions, and a stronger foundation to scale real-time AI to many more people."</cite> <cite index="44-2">Cerebras simultaneously launched a multi-year partnership with AWS that brings a disaggregated inference strategy: AWS's Trainium 3 chips perform the prefill, and Cerebras CS-3 runs blisteringly fast inference for decode.</cite> The AWS arrangement splits inference into two phases that run on different hardware: prefill, which processes the prompt, and decode, which generates output tokens one at a time.
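To make the prefill/decode split concrete, here is a toy sketch of the two phases as separate functions. It is an illustration only: `KVCache`, `prefill`, and `decode` are hypothetical names, the "tokens" are placeholders rather than model output, and nothing here reflects the actual Cerebras or AWS software interfaces.

```python
from dataclasses import dataclass


@dataclass
class KVCache:
    """Hypothetical stand-in for the attention state that prefill hands to decode."""
    tokens: list


def prefill(prompt: str) -> KVCache:
    # Phase 1: process the whole prompt in a single pass.
    # In real systems this phase is typically compute-bound.
    return KVCache(tokens=prompt.split())


def decode(cache: KVCache, max_new_tokens: int) -> list:
    # Phase 2: emit one token per step, each step reading the cached state.
    # In real systems this phase is typically limited by memory bandwidth.
    generated = []
    for i in range(max_new_tokens):
        token = f"tok{i}"  # placeholder for a token sampled from a model
        cache.tokens.append(token)
        generated.append(token)
    return generated


# The two phases could run on different hardware, with the cache passed between them.
cache = prefill("explain wafer scale inference")
generated = decode(cache, max_new_tokens=3)
print(generated)
```

Because the only thing passed between the phases is the cache, each phase can in principle be placed on the hardware that suits it best, which is the idea behind a disaggregated design.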
<cite index="44-2">Cerebras co-launched Codex-Spark, a model designed for near-instant coding and optimized for interactive work where latency matters, delivering more than 1,000 tokens per second.</cite> <cite index="44-2">Kimi K2.6, the leading open-weight frontier model and the first trillion-parameter model served on Cerebras, achieved performance approaching 1,000 tokens per second as independently measured by Artificial Analysis.</cite> These figures, the second of them independently measured, support the case for wafer-scale silicon in latency-sensitive agentic workloads.
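As a rough illustration of what a rate near 1,000 tokens per second means for response time, the sketch below converts a decode rate into streaming time. It assumes a steady rate and ignores prefill and network time. The 500-token response length and the 100 tokens-per-second comparison rate are hypothetical values chosen for illustration, not figures from the sources.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream output_tokens at a steady decode rate (prefill and network time ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


RESPONSE_TOKENS = 500  # hypothetical response length

fast = generation_seconds(RESPONSE_TOKENS, 1000.0)  # rate quoted in the text
slow = generation_seconds(RESPONSE_TOKENS, 100.0)   # hypothetical comparison rate

print(f"{fast:.1f} s at 1,000 tok/s vs {slow:.1f} s at 100 tok/s")
```

In an agent loop that chains many model calls, this per-call difference is multiplied by the number of steps, which is why decode speed matters for interactive and agentic use.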
For practitioners, this deal signals a shift in AI infrastructure priorities: training compute was the scarce resource in 2023–2024, and inference capacity is increasingly the constraint. <cite index="47-2">The 750MW deployment agreement is roughly 23 times the midpoint of Cerebras' full-year 2026 revenue guidance</cite>, giving the company a degree of contracted revenue visibility that is rare among hardware vendors. OpenAI's $20B+ commitment also suggests that frontier-model providers will maintain dedicated inference tiers separate from hyperscaler commodity offerings. Expect additional capacity announcements from competitors (Groq, CoreWeave, others) and more hardware-software co-optimization as inference speed becomes a visible product differentiator for real-time AI agents.
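The "roughly 23 times" figure can be checked with simple arithmetic, using the US$855 to 865 million guidance range quoted in the Sources list. Because the deal is described as worth "more than" $20 billion, treating it as exactly $20 billion is an assumption, so the result is best read as a lower bound.

```python
DEAL_VALUE_USD = 20e9      # assumption: the stated floor of "more than $20 billion"
GUIDANCE_LOW_USD = 855e6   # low end of 2026 core revenue guidance
GUIDANCE_HIGH_USD = 865e6  # high end of 2026 core revenue guidance

guidance_midpoint = (GUIDANCE_LOW_USD + GUIDANCE_HIGH_USD) / 2
ratio = DEAL_VALUE_USD / guidance_midpoint

print(f"midpoint: ${guidance_midpoint / 1e6:.0f}M, ratio: {ratio:.1f}x")
```

The midpoint is $860 million, which gives a ratio of about 23.3 and is consistent with the rounded figure cited above.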
Sources
- investors.cerebras.ai (primary source): “Reached agreement for OpenAI to deploy 750 megawatts of Cerebras' high-speed inference compute over the next several years; Announced a multi-year deal with OpenAI for 750MW valued at more than $20 billion”
- openai.com: “OpenAI is partnering with Cerebras to add 750MW of ultra low-latency AI compute to our platform”
- cerebras.ai: “This deployment will roll out in multiple stages beginning in 2026, making it the largest high-speed AI inference deployment in the world”
- stocksdownunder.com: “That single contract is roughly 23 times the midpoint of the company's full year 2026 core revenue guidance of US$855 to 865 million”