Enterprise AI token usage heavily favors high-cost frontier models, with 95% of tokens allocated to them despite a reported median blended cost of $2.31 per million tokens for tiered model architectures, compared to $18.40 for frontier-only stacks, according to the AI.cc 2026 Infrastructure Report. This 87.4% cost gap results in significant operational disparity, with CFOs facing depleted annual AI budgets within months, while architects can still meet their financial targets.
A shift towards cost-effective stacks is evident, as open-source and open-weight models have captured 38% of enterprise token volume in Q1 2026, up from 11% a year prior. The emerging architecture is a Tiered Intelligence Stack, where lightweight orchestrators direct tasks such as classification, extraction, and summarization to small text-only or open-weight models, reserve multimodal work for specialized endpoints, and delegate complex multi-step reasoning to frontier models only when necessary. OpenRouter, which raised $113 million in May and now processes about 25 trillion tokens per week, and Factory AI, which automates routing for engineering tasks, exemplify this approach. Glean, with a $300 million ARR, says its context graph—connecting AI to internal enterprise systems—results in far fewer tokens consumed than unleashing AI onto those systems directly, and Jain told TechCrunch the product can "reduce your AI bill significantly."
Cost reductions are outpacing procurement cycles, with blended enterprise token costs dropping 67% year-over-year from $18.40 per million in Q1 2025 to $6.07 in Q1 2026, based on AI.cc's analysis of 2.4 billion API calls across over 8,000 accounts. Yet bills keep climbing: Zylo's 2026 SaaS Management Index found that 78% of IT leaders encountered unexpected charges tied to consumption-based and AI pricing models, even as enterprise AI spending jumped 108% year-over-year to an average of $1.2 million per organization. Volume growth is outpacing price declines. Frontier labs exacerbate this issue, with each new generation being roughly twice as expensive per token as its predecessor. GPT-5.5 inference costs $5 per million input tokens and $30 per million output tokens, double that of GPT-5.4, as reported by Silicon Angle. Arvind Jain stated on CNBC that enterprise AI spending is on an "unsustainable path" where technology costs are directly compared to human labor costs.
Organizational inertia disguised as risk management leads to the default use of frontier models for every call, resulting in silent overspending on tasks that do not require such capability and rapid exhaustion of annual allocations. The average enterprise now uses 4.7 models, up from 2.1 a year ago, increasing cold-start latencies, rate-limit choreography, and prompt-injection surface area. The challenge lies in building the eval harness and routing logic to trust cheaper endpoints with production traffic, an investment most teams have yet to make, leading to absorption of generational price increases as a blanket tax.
The bills are not decreasing because volume growth outpaces price declines, and frontier model prices are increasing with each release. Until the classifier layer is treated as production infrastructure, the $2.31-million-token architecture remains a pilot metric rather than a production guarantee.
Written and edited by AI agents · Methodology