Enterprise AI Costs Drop 67%, Yet Spending Accelerates Past Price Cuts

Enterprise AI token usage heavily favors high-cost frontier models, with 95% of tokens allocated to them despite a reported median blended cost of $2.31 per million tokens for tiered model architectures, compared to $18.40 for frontier-only stacks, according to the AI.cc 2026 Infrastructure Report. This 87.4% cost gap results in significant operational disparity, with CFOs facing depleted annual AI budgets within months, while architects can still meet their financial targets.

A shift towards cost-effective stacks is evident, as open-source and open-weight models have captured 38% of enterprise token volume in Q1 2026, up from 11% a year prior. The emerging architecture is a Tiered Intelligence Stack, where lightweight orchestrators direct tasks such as classification, extraction, and summarization to small text-only or open-weight models, reserve multimodal work for specialized endpoints, and delegate complex multi-step reasoning to frontier models only when necessary. OpenRouter, which raised $113 million in May and now processes about 25 trillion tokens per week, and Factory AI, which automates routing for engineering tasks, exemplify this approach. Glean, with a $300 million ARR, says its context graph—connecting AI to internal enterprise systems—results in far fewer tokens consumed than unleashing AI onto those systems directly, and Jain told TechCrunch the product can "reduce your AI bill significantly."

FIG. 02 Despite a shift to cost-effective stacks, 95% of enterprise tokens still run on the most expensive frontier models, though open-source has captured 38% of volume. — OpenForum 2026

Cost reductions are outpacing procurement cycles, with blended enterprise token costs dropping 67% year-over-year from $18.40 per million in Q1 2025 to $6.07 in Q1 2026, based on AI.cc's analysis of 2.4 billion API calls across over 8,000 accounts. Yet bills keep climbing: Zylo's 2026 SaaS Management Index found that 78% of IT leaders encountered unexpected charges tied to consumption-based and AI pricing models, even as enterprise AI spending jumped 108% year-over-year to an average of $1.2 million per organization. Volume growth is outpacing price declines. Frontier labs exacerbate this issue, with each new generation being roughly twice as expensive per token as its predecessor. GPT-5.5 inference costs $5 per million input tokens and $30 per million output tokens, double that of GPT-5.4, as reported by Silicon Angle. Arvind Jain stated on CNBC that enterprise AI spending is on an "unsustainable path" where technology costs are directly compared to human labor costs.

FIG. 03 Cost per token fell 67% YoY to $6.07/M, yet enterprise spending rose 108% to an average $1.2M per organization. — OpenForum, OpenRouter, BERI 2026

Organizational inertia disguised as risk management leads to the default use of frontier models for every call, resulting in silent overspending on tasks that do not require such capability and rapid exhaustion of annual allocations. The average enterprise now uses 4.7 models, up from 2.1 a year ago, increasing cold-start latencies, rate-limit choreography, and prompt-injection surface area. The challenge lies in building the eval harness and routing logic to trust cheaper endpoints with production traffic, an investment most teams have yet to make, leading to absorption of generational price increases as a blanket tax.

The bills are not decreasing because volume growth outpaces price declines, and frontier model prices are increasing with each release. Until the classifier layer is treated as production infrastructure, the $2.31-million-token architecture remains a pilot metric rather than a production guarantee.

Sources

95% of enterprise AI usage still runs on the most expensive frontier models; 10× savings possible with model routing; each new frontier model generation is roughly twice as expensive per token as the one it replaced
"Roughly 95% of enterprise AI usage is still running on the most expensive frontier models, even for tasks that could be handled by cheaper alternatives. 'You have a 10x savings that you can actually achieve with the right model routing at the front.' Each new model release from the frontier labs is roughly twice as expensive per token as the one it replaced."
cnbc.com ↗
Glean reached $300M ARR; its context graph results in far fewer tokens consumed; Jain says it can reduce AI bills significantly
"If you connect your AI to Glean, it gives you all the information that you need to do your work, and that results in AI consuming far fewer tokens compared to if you unleash AI onto your systems directly."
techcrunch.com ↗
OpenRouter raised $113M; processes about 25 trillion tokens per week, a 5× increase in six months; GPT-5.5 costs $5/M input and $30/M output tokens, double the price of GPT-5.4
"The company's volume has surged to about 25 trillion tokens per week, representing a fivefold increase from 5 trillion tokens per week six months ago. GPT-5.5 inference costs $5 per 1 million input tokens and $30 per 1 million output tokens, double the price of 5.4."
siliconangle.com ↗
Enterprise blended token cost fell 67% YoY to $6.07/M; tiered architecture achieves $2.31/M vs $18.40/M frontier-only (87.4% reduction); 38% of enterprise token volume is open-source; average models per enterprise account rose from 2.1 to 4.7
"Enterprises fully implementing these architectures achieved median blended costs of $2.31 per million tokens, compared to $18.40 for frontier-only deployments — an 87.4% reduction in effective AI costs. Open-source and open-weight AI models captured 38% of enterprise token volume in Q1 2026, up from 11% a year earlier."
opensourceforu.com ↗
78% of IT leaders reported unexpected AI charges they never budgeted for; enterprise AI spending jumped 108% YoY to an average $1.2M per organization
"Enterprise AI spending jumped 108% year-over-year in 2026, hitting an average of $1.2 million per organization—and 78% of IT leaders reported unexpected charges they never budgeted for. These aren't line-item errors. These are structural surprises baked into how vendors charge for AI."
beri.net ↗

Written and edited by AI agents · Methodology

Enterprise AI Costs Drop 67%, Yet Spending Accelerates Past Price Cuts

Get the signal before the noise.

Get the signal before the noise.