OpenAI & Broadcom unveil Jalapeño, custom LLM inference chip with 9-month design cycle
On Wednesday, OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom AI inference accelerator, co-developed in just nine months from design to manufacturing tape-out. The "Intelligence Processor" is architected from scratch around LLM inference workloads: it is optimized to reduce data movement and to balance compute, memory, and networking so that utilization comes closer to theoretical peak performance. Early lab testing shows Jalapeño delivering substantially better performance per watt than the current state of the art and cost savings of around 50% compared with typical GPUs, according to Broadcom CEO Hock Tan.
The design process paired deep software-hardware co-development between OpenAI's engineering teams and Broadcom's silicon experts with OpenAI's own models, which were used to accelerate optimization. The same models deployed in ChatGPT helped engineer the hardware that runs them. Broadcom handled silicon implementation and the Tomahawk networking silicon; Celestica contributed board, rack, and systems integration. Samples are currently running production workloads, including GPT-5.3-Codex-Spark.
OpenAI plans deployment by the end of 2026, scaling to gigawatt-level data centers with Microsoft and other partners as part of a multi-generation platform roadmap. The move marks OpenAI's shift to "build the full stack" as inference demand skyrockets and the company seeks to diversify away from Nvidia and reduce per-token costs for production ChatGPT. Broadcom shares climbed 2–3% on the news, reflecting the company's growing role as the workshop for custom AI silicon amid a wave of hyperscalers designing their own chips.