d-Matrix Corsair inference accelerator enters full production; company claims 10x faster inference than GPU-only setups and up to 5x better energy efficiency
d-Matrix announced that its Corsair inference accelerator platform entered full production on June 9, with volume shipments beginning to priority hyperscalers, neoclouds, and frontier AI labs. The SRAM-based chiplet accelerator, manufactured on TSMC's N6 process through Alchip Technologies, is designed specifically for the decode phase of inference in heterogeneous compute clusters, where it is paired with GPUs. The company cites independent testing by Gimlet Labs showing that Corsair-plus-GPU setups cut inference response times from approximately 24 seconds to under two seconds, a speedup of roughly 10x over GPU-only approaches.
Corsair is designed to get around the memory wall by integrating computation tightly with on-chip SRAM, which also avoids the DRAM and high-bandwidth memory (HBM) supply constraints that affect competing architectures. Each PCIe card carries 4 GB of Performance Memory with 300 TB/s of bandwidth and reaches peak compute of 4,800 TFLOPs for MXINT8 and 19,200 TFLOPs for MXINT4. d-Matrix positions Corsair as a complement to GPUs rather than a replacement, targeting latency-sensitive agentic AI applications such as Claude Code, voice agents, and interactive coding assistants that demand rapid token generation.
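To put the per-card figures above in perspective, here is a back-of-envelope sketch in Python. It uses only the quoted peak numbers and assumes decimal units (1 GB = 1e9 bytes, 1 TB/s = 1e12 bytes/s); the helper names are illustrative and not part of any d-Matrix tooling. Decode tends to be limited by how fast weights can be read from memory, so the time for one full pass over on-card memory is a useful, if idealized, reference point.

```python
# Back-of-envelope arithmetic from the per-card peak figures quoted in the article.
# These are theoretical peaks; real workloads will land below them.

SRAM_BYTES = 4e9          # 4 GB Performance Memory (decimal GB assumed)
BANDWIDTH_BPS = 300e12    # 300 TB/s memory bandwidth
PEAK_MXINT8 = 4800e12     # 4,800 TFLOPs
PEAK_MXINT4 = 19200e12    # 19,200 TFLOPs


def sweep_time_s(num_bytes: float, bandwidth: float) -> float:
    """Time to read num_bytes once at the given bandwidth."""
    return num_bytes / bandwidth


def ops_per_byte(peak_ops: float, bandwidth: float) -> float:
    """Operations available per byte moved when both run at peak."""
    return peak_ops / bandwidth


sweep = sweep_time_s(SRAM_BYTES, BANDWIDTH_BPS)
print(f"One full pass over on-card SRAM: {sweep * 1e6:.1f} microseconds")
print(f"Upper bound on full passes per second: {1 / sweep:,.0f}")
print(f"MXINT8 ops per byte at peak: {ops_per_byte(PEAK_MXINT8, BANDWIDTH_BPS):.0f}")
print(f"MXINT4 ops per byte at peak: {ops_per_byte(PEAK_MXINT4, BANDWIDTH_BPS):.0f}")
print(f"MXINT4 / MXINT8 peak ratio: {PEAK_MXINT4 / PEAK_MXINT8:.0f}x")
```

Under these assumptions a single pass over the 4 GB takes about 13 microseconds, and the MXINT4 peak is four times the MXINT8 peak. Models larger than one card's memory would span multiple cards, which this sketch does not model.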
The timing aligns with surging demand for disaggregated inference architectures as agentic workloads push GPU-only infrastructure to its limits. d-Matrix has secured multi-year supply and fabrication services. The company also acquired GigaIO's data center business in April, bringing in rack-scale systems expertise that culminates in SquadRack, a production-ready reference design built with Arista, Broadcom, and Supermicro. Microsoft's M12 venture arm and Temasek are investors, and the startup raised $275 million in its Series C round.
For infrastructure teams, Corsair's move into volume production could mark a shift in inference economics: heterogeneous clusters that route prefill to GPUs and decode to specialized accelerators now have a shipping alternative to GPU-only scaling, backed by secured supply. The N6 process and SRAM architecture sidestep HBM allocation bottlenecks, which may give operators an edge in latency-sensitive agentic deployments.
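The prefill/decode split described above can be sketched as a toy router in Python. This is a minimal illustration of the general disaggregated-inference pattern, not d-Matrix's or any vendor's software: the `Pool` and `DisaggregatedRouter` classes and the pool names are hypothetical, and the handoff of the prompt's cached state between pools is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"  # process the whole prompt at once (compute-heavy)
    DECODE = "decode"    # generate tokens one at a time (bandwidth-heavy)


@dataclass
class Pool:
    """Hypothetical stand-in for a group of devices that serves one phase."""
    name: str
    handled: list = field(default_factory=list)

    def run(self, request_id: str, phase: Phase) -> str:
        # Record the work instead of executing a real model.
        self.handled.append((request_id, phase))
        return f"{request_id}:{phase.value}@{self.name}"


@dataclass
class DisaggregatedRouter:
    prefill_pool: Pool
    decode_pool: Pool

    def serve(self, request_id: str) -> list[str]:
        # 1. Prefill runs on the GPU pool and builds the prompt's cached state.
        # 2. That state is handed to the decode pool (transfer not modeled here).
        # 3. Decode runs on the accelerator pool and produces the output tokens.
        return [
            self.prefill_pool.run(request_id, Phase.PREFILL),
            self.decode_pool.run(request_id, Phase.DECODE),
        ]


router = DisaggregatedRouter(Pool("gpu"), Pool("decode-accelerator"))
print(router.serve("req-1"))
```

In a real deployment the interesting engineering sits in the step this sketch skips: moving the cached prompt state between pools quickly enough that the handoff does not erase the decode-side latency gain.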
Sources
- prnewswire.com (primary source)
“d-Matrix, the pioneer in low-latency AI inference for data centers, today announced its Corsair inference accelerator platform is in full production, with products to begin shipping in volume to priority customers”
- cnbc.com
“When paired with an Nvidia Blackwell GPU, D-Matrix says, citing research from Gimlet Labs, that Corsair can run inference 10 times faster, three times cheaper and up to five times more energy efficiently than a standalone GPU”
- cryptobriefing.com
“the Corsair platform entered volume production in June 2026, meaning these aren't vaporware slides at a conference. They're shipping hardware”