Perplexity CEO: latency wins the AI race, not just benchmark scores
Perplexity CEO Aravind Srinivas told CNBC that inference latency, rather than raw accuracy, will be the decisive metric in enterprise AI adoption over the next 12 months. He argued that sub-100 ms response times for agentic workflows will separate winners from legacy vendors struggling with slower inference stacks.
For infrastructure buyers evaluating model-serving platforms and GPU allocation strategies, the argument signals a shift in RFP priorities: expect RFPs to demand latency SLAs alongside accuracy benchmarks. Such a shift favors NVIDIA's inference-optimization roadmap (TensorRT-LLM, Llama 3 optimizations) and smaller, purpose-built inference engines over heavyweight training-first vendors.