Enthusiast runs 1-trillion-parameter LLM locally on single GPU with 768GB Intel Optane
A developer ran Kimi K2.5, a 1-trillion-parameter language model, on a single machine that pairs 768GB of Intel Optane DIMMs with one GPU.
The setup generated roughly 4 tokens per second, demonstrating that large-model inference is feasible on commodity hardware with extended memory.
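A rough sense of why 768GB matters comes from the weight footprint alone. The sketch below is a back-of-envelope estimate, not a description of the developer's actual configuration: the article does not say what precision the weights were stored at, so the bit widths are hypothetical, and the helper name `weight_footprint_gb` is made up for illustration. It also ignores the KV cache, activations, and any weights held in GPU memory.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 1e12  # 1 trillion parameters, as stated in the article
MEMORY_GB = 768    # Optane capacity, as stated in the article

# Hypothetical precisions; the article does not state which one was used.
for bits in (16, 8, 4):
    size = weight_footprint_gb(NUM_PARAMS, bits)
    verdict = "fits" if size <= MEMORY_GB else "does not fit"
    print(f"{bits:>2}-bit: {size:,.0f} GB -> {verdict} in {MEMORY_GB} GB")
```

Under these assumptions, only a quantization of roughly 6 bits per weight or lower would let the full set of weights sit in 768GB.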