Token Is the New COGS: Why AI Cost Discipline Became an Operating Discipline

The cost of running an AI model has collapsed. According to the Stanford AI Index 2025, the inference cost for a system performing at GPT-3.5 level fell from about $20 per million tokens in November 2022 to $0.07 by October 2024, a more than 280-fold drop in just under two years. By every intuition, AI should be getting cheaper to operate. Instead, enterprise AI bills are climbing, and cost control has quietly become the single most important operational discipline in AI-native companies.

The data

The clearest signal comes from the people whose job is watching cloud and software spend. In the State of FinOps 2026 survey of 1,192 practitioners, 98% now report managing AI spend, up from just 31% two years earlier. “FinOps for AI” now ranks as the top forward-looking priority, and AI cost management is the number-one skill teams are trying to hire for. In two years, governing AI spend went from a minority concern to a near-universal one.

The paradox resolves once you look at how AI gets consumed. Per-token prices fell, but consumption exploded, and the shape of consumption changed. A single agentic task no longer makes one model call; it fans out into many, as the agent plans, retrieves, calls tools, reflects, and retries. Token spend is also volatile in a way a SaaS license never was: cost scales with context length, retrieval payloads, and retry behaviour, so a minor prompt change can double the bill overnight. The result is that AI spend behaves like cost of goods sold, moving with every unit of usage, rather than like a fixed subscription. That single fact breaks the seat-based mental model most companies still budget with.
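To make the arithmetic concrete, here is a rough sketch of how the cost of one agentic task builds up. Every name and number in it (the function, the per-million-token prices, the step count, the context size, the retry rate) is a hypothetical assumption chosen for illustration, not a figure from the sources cited here.

```python
def task_cost_usd(steps, context_tokens, output_tokens,
                  retry_rate, in_price_per_m, out_price_per_m):
    """Estimate the expected cost of one agentic task in USD.

    Assumes each step resends the full context as input and produces
    some output; retry_rate is the expected fraction of steps repeated.
    """
    expected_calls = steps * (1 + retry_rate)
    input_cost = expected_calls * context_tokens * in_price_per_m / 1_000_000
    output_cost = expected_calls * output_tokens * out_price_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
single_call = task_cost_usd(1, 2_000, 500, 0.0, 3.0, 15.0)
agentic = task_cost_usd(12, 8_000, 500, 0.25, 3.0, 15.0)
bigger_prompt = task_cost_usd(12, 16_000, 500, 0.25, 3.0, 15.0)
```

Under these assumed inputs the single call costs about $0.0135 and the twelve-step agentic task about $0.47, roughly 35 times more at the same per-token price. Doubling the context carried in each step lifts the task to about $0.83, which is why a small prompt or retrieval change can move the bill so sharply.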

Why it matters

If token spend is COGS, then it belongs in engineering, not just procurement. The FinOps Foundation’s emerging playbook for AI reflects this: model-routing layers that send simple queries to small, cheap models and reserve frontier models for hard reasoning; per-task token budgets; and a shift from “cost per call” to “cost per outcome.” These are not finance controls bolted on after the fact. They are architectural decisions made while the system is being built.
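A minimal sketch of two of these ideas, model routing and cost per outcome, might look like the following. The tier names, prices, and task categories are hypothetical placeholders, and this is not code from the FinOps Foundation playbook; it only illustrates the shape of the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical tier label, not a real model ID
    price_per_m_tokens: float  # hypothetical blended price in USD


SMALL = ModelTier("small", 0.20)
FRONTIER = ModelTier("frontier", 10.00)


def route(task_kind: str) -> ModelTier:
    """Send simple work to the cheap tier; reserve the frontier tier for the rest."""
    simple_kinds = {"classification", "extraction", "lookup"}
    return SMALL if task_kind in simple_kinds else FRONTIER


def cost_per_outcome(total_spend_usd: float, successful_outcomes: int) -> float:
    """Spend divided by tasks that actually succeeded, not by calls made."""
    if successful_outcomes <= 0:
        raise ValueError("no successful outcomes to divide by")
    return total_spend_usd / successful_outcomes
```

In this sketch, route("classification") returns the small tier, and a batch that spent $12 to complete 40 tasks successfully has a cost per outcome of $0.30 regardless of how many calls, retries, or failed attempts sat behind those 40 results.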

For operators, the implication is concrete. Every agent should have a budget, every call should be traceable to an owner, and the choice of model should be a deliberate routing decision rather than a default. A team that defaults every task to its most powerful model will watch margins evaporate as usage scales — precisely when the product is succeeding. The companies that win the AI-native era will treat a unit of intelligence the way a manufacturer treats a unit of input: measured, sourced at the right grade, and never over-spent.

The Charaka View

This is a discipline Manthan Intelligence built in from the start rather than retrofitting. Every metered call in our architecture accepts an explicit spending cap and refuses to exceed it — a hard limit, not a guideline — with defaults set per product tier. Model selection runs through a resolver that routes work across a tiered stack: the heaviest reasoning to the most capable model, routine operations to a mid-tier, and high-volume classification to the cheapest — the same routing logic the FinOps community is now codifying as best practice. And our standing rule is zero external API spend by default: no agent incurs metered cost without explicit, capped authorisation.
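As a generic illustration of what a hard cap with a zero default can look like, consider the sketch below. It is not Manthan Intelligence's actual implementation: the class, method, and parameter names are hypothetical, and it assumes the cost of a call can be estimated before the call is made.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push spend past its cap."""


class CappedMeter:
    def __init__(self, cap_usd: float = 0.0):
        # Default cap of zero: nothing metered runs without an explicit cap.
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, estimated_cost_usd: float, owner: str) -> None:
        """Record a charge, or refuse it up front if it would breach the cap."""
        if self.spent_usd + estimated_cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"{owner}: {estimated_cost_usd:.4f} USD would exceed "
                f"cap of {self.cap_usd:.2f} USD"
            )
        self.spent_usd += estimated_cost_usd
```

A meter created with no arguments rejects every non-zero charge, while CappedMeter(1.0) accepts a first $0.60 charge and refuses a second, leaving recorded spend at $0.60. Checking before spending, rather than alerting afterwards, is what separates a hard limit from a guideline.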

The reason isn’t frugality for its own sake. It’s that cost per analysis is a first-class metric we track alongside accuracy, because an intelligence product that can’t predict its own unit economics can’t be priced, scaled, or trusted. The era of treating model spend as an invisible line item is over. Token is the new COGS — and the firms that learn to manage it like one will be the ones still standing when the subsidised pricing ends.

This analysis draws on the Stanford HAI 2025 AI Index Report, the State of FinOps 2026 data, the Linux Foundation’s State of FinOps survey release, and the FinOps Foundation’s FinOps for AI working group. Human editorial oversight applied.

This analysis is informational and does not constitute investment advice, a research report, or a recommendation to buy, sell, or hold any security.

Charaka Notes by Manthan Intelligence.