AI inference costs spike; enterprises shift to Chinese LLMs and open-source models to manage budgets
Enterprise AI inference costs are surging as subscriptions hit pricing walls, forcing CIOs to reconsider their dependence on proprietary APIs. Firms are increasingly adopting Chinese models (DeepSeek, Qwen) and open-source alternatives (Llama, Mistral) to stretch their AI budgets and reduce vendor lock-in.
This reflects a broader trend: proprietary API costs are pricing out mid-market buyers and cost-conscious deployments, accelerating the move to self-hosted and alternative stacks. For NVIDIA and cloud providers, it signals pressure on hosted-inference margins unless pricing strategies change.