Armstrong predicts shift to cheaper AI models as cost pressures mount

Mounting operational costs and rising token prices are prompting a potential industry pivot away from the assumption that larger models are always superior, with early tests showing significant savings without quality loss.

By Owen Mercer, Markets and Finance Editor

Source: TechCrunch

Can tech companies learn to love cheaper AI models?

Coinbase co-founder forecasts 80% of workloads will move to lower-cost options within 18 months, challenging the scaling-first strategies of major labs ahead of IPOs

The artificial intelligence sector is starting to move away from the long-held assumption that larger, more compute-intensive models are inherently superior. Major laboratories such as OpenAI and Anthropic have pursued a "scaling-first" approach, driven by the belief that the most powerful models would win, while clients relied on heavily subsidised prices to access them. Mounting operational costs and rising token prices are now forcing a re-evaluation of that strategy. Coinbase co-founder Brian Armstrong predicts that within 12 to 18 months, 80% of AI workloads will shift to significantly cheaper models, with only 20% reserved for tasks requiring maximum intelligence.

Early tests by legal AI tool Harvey, run in partnership with Fireworks AI, support the case: they showed a threefold reduction in inference costs without compromising quality. The test used a hybrid approach, with Claude Opus handling intensive tasks and Fireworks’ GLM 5.1 handling the rest, which significantly lowered server time and overall costs. Harvey co-founder Gabe Pereyra noted that quality remains paramount, but that its definition is evolving from using the most powerful model for everything to using the best model that gets the right answer most efficiently.

The trend challenges the strategy of OpenAI and Anthropic and could affect their financial outlook as they prepare for initial public offerings. Armstrong, who outlined his prediction on X, argues that demand for intelligence is near infinite but that the vast majority of tasks will run on models that are 99% cheaper. That would be a significant departure from the previous industry norm, in which companies competed on quality by defaulting to the most advanced options available.

The real industry divide is increasingly seen as one between large and small models, rather than between proprietary and open-weight models. Switching from GPT-5.5 to DeepSeek’s V4 Flash offers savings, but switching to GPT-5.4-mini works just as well, which suggests that the specific type of smaller model matters less than the move away from frontier compute. A price war is emerging between in-house inference from the big labs and independently served open-weight models, further pressuring the economics of training frontier models.

With token prices rising and investor subsidies slowing, users are facing cost pressure for the first time. It remains unclear whether enterprise users will fully adopt smaller models or simply reduce usage, but the potential for a major shift in AI economics is evident. If most deployments can run effectively on smaller models, it could dampen the growing demand for inference and raise new questions about how to justify the high cost of training the most compute-intensive models possible.
