In November 2022, running GPT-3 via API cost approximately $20 per million tokens. By 2026, equivalent inference costs under $0.07 per million tokens, a decline tracked at roughly 300x since 2023, according to TokenCost’s AI Price Index. The curve has not flattened. Gemini 1.5 Flash is now priced at $0.075 per million input tokens; DeepSeek V3 at $0.14 per million. Open-weight models with benchmark performance comparable to the GPT-4 of 18 months ago are free to download and self-host.
This is not a feature. It is a structural market signal — and it has very specific implications for where the next 18 months of AI investment will flow.
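As a quick check on the headline figure, the two prices quoted above imply the ratio below. This is a minimal sketch using only the numbers in the text; the function name fold_decline is illustrative and not part of any index methodology.

```python
# Prices quoted in the text, in USD per million tokens.
GPT3_NOV_2022 = 20.00
EQUIVALENT_2026 = 0.07


def fold_decline(old_price: float, new_price: float) -> float:
    """Return how many times cheaper new_price is than old_price."""
    if old_price <= 0 or new_price <= 0:
        raise ValueError("prices must be positive")
    return old_price / new_price


decline = fold_decline(GPT3_NOV_2022, EQUIVALENT_2026)
print(f"{decline:.0f}x cheaper")  # prints "286x cheaper"
```

The result, about 286x, is consistent with the "roughly 300x" figure cited from the index.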
What the curve looks like at the application layer.
The cost collapse is most visible at the application layer, where companies built on AI inference are seeing their COGS fall faster than their pricing. A customer service chatbot that cost $8,000–$15,000 per month to run in 2023 now costs under $200 for equivalent throughput, according to pricing analysis across major providers. This is not incremental improvement. It is a complete restructuring of the unit economics for any company that consumes inference at scale.
The first-order effect was predictable: AI products that were economically unviable at 2023 pricing are now viable. Autonomous agents that run long context windows, multi-step reasoning chains, or continuous background tasks were prohibitively expensive 18 months ago. At current pricing, they are deployable at scale by a two-person team.
The second-order signal that investors are underweighting.
The more interesting signal is not that existing AI applications are getting cheaper — it’s that a large class of applications that didn’t get built because of cost constraints is now becoming buildable. Inference unit economics analysis consistently shows that the true cost per million tokens — when accounting for batching, caching, context compression, and model routing — is often 60–80% below list price for sophisticated operators. Companies that understand inference architecture are not just saving money; they are accessing capabilities that were genuinely unavailable at 2023 economics.
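The gap between list price and effective price can be sketched as a simple blended-cost calculation. Everything below is an assumption for illustration: the Workload fields, the example shares and discounts, and the simplification that routing, caching, and batching savings multiply independently. Context compression is not modelled separately; it would reduce token volume rather than price per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload profile; all fields are illustrative assumptions."""
    list_price: float      # USD per million tokens on the primary model
    cheap_price: float     # USD per million tokens on a smaller routed model
    routed_share: float    # fraction of tokens routed to the smaller model
    cached_share: float    # fraction of tokens served from a prompt cache
    cache_discount: float  # price reduction on cached tokens (0.5 = half price)
    batched_share: float   # fraction of tokens sent through a batch endpoint
    batch_discount: float  # price reduction on batched tokens


def effective_price(w: Workload) -> float:
    """Blended USD per million tokens, treating each saving as independent."""
    blended = w.routed_share * w.cheap_price + (1 - w.routed_share) * w.list_price
    cache_factor = 1 - w.cached_share * w.cache_discount
    batch_factor = 1 - w.batched_share * w.batch_discount
    return blended * cache_factor * batch_factor


# Example values are made up to show the mechanics, not taken from any provider.
example = Workload(
    list_price=1.00, cheap_price=0.10, routed_share=0.5,
    cached_share=0.4, cache_discount=0.5,
    batched_share=0.3, batch_discount=0.5,
)
price = effective_price(example)
savings = 1 - price / example.list_price
print(f"effective: ${price:.3f} per million tokens, {savings:.0%} below list")
```

With these assumed inputs the effective price lands at about $0.374 against a $1.00 list price, roughly 63% below list, which falls inside the 60–80% range described above. Different shares and discounts will move the result substantially.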
This matters for investment thesis construction in a specific way. The first wave of AI investment (2022–2024) went predominantly into infrastructure (foundation models, GPU clusters, vector databases) and into AI-enabled versions of existing software categories (AI writing, AI customer service, AI code). The infrastructure is now commoditising. The AI-enabled versions of existing software are getting squeezed on pricing as incumbents add AI features.
The third category — applications that require inference at a cost point that only became viable in 2025–2026 — has barely been built. These are products where the business model depends on running thousands of inference calls per user per day: autonomous background agents, continuous monitoring systems, real-time document intelligence, always-on advisory products. At $20/million tokens, these products didn’t work economically. At $0.07/million tokens, they do.
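The viability threshold can be made concrete with a per-user cost calculation. The two prices come from the text; the call volume and tokens per call are assumptions chosen only to illustrate a product that makes thousands of inference calls per user per day.

```python
def monthly_cost_per_user(calls_per_day: int, tokens_per_call: int,
                          price_per_million: float, days: int = 30) -> float:
    """Monthly inference spend for one user, in USD."""
    tokens = calls_per_day * tokens_per_call * days
    return tokens / 1_000_000 * price_per_million


CALLS_PER_DAY = 2_000    # assumption: an always-on background agent
TOKENS_PER_CALL = 1_500  # assumption: prompt plus completion

for label, price in [("2022 pricing", 20.00), ("2026 pricing", 0.07)]:
    cost = monthly_cost_per_user(CALLS_PER_DAY, TOKENS_PER_CALL, price)
    print(f"{label}: ${cost:,.2f} per user per month")
# prints:
# 2022 pricing: $1,800.00 per user per month
# 2026 pricing: $6.30 per user per month
```

Under these assumed volumes, inference alone would have cost $1,800 per user per month at the 2022 price, far more than most software subscriptions could recover. At $0.07 per million tokens the same workload costs about $6.30.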
Where the signal points.
Three investment implications follow from the cost curve:
First, look for inference-intensive moats. As list prices fall toward zero, the competitive advantage shifts to companies that can extract more value per token through better prompting, better context management, and better task decomposition. With inference priced near its floor, only commodity value accrues to the infrastructure layer; differentiated value accrues to the application layer that uses it intelligently.
Second, watch vertical AI deployments in cost-sensitive markets. Healthcare, legal, and financial services delayed AI deployment partly because the inference cost made the unit economics unworkable. At current pricing, the constraint has largely been removed. Verticals with high-volume, repetitive cognitive tasks are now approaching the cost-per-analysis thresholds at which AI substitution becomes economically rational.
Third, track the agent-hours economy. The emerging pricing model for AI is not per-seat or per-query; it is per-outcome or per-hour of autonomous work. As inference becomes a commodity, the pricing premium moves to the orchestration layer that chains inference calls into durable, useful work. This is where the next wave of enterprise AI revenue will be priced.
The Charaka View.
Manthan Intelligence’s calibration data tracks a consistent finding across the 1,000+ companies in our backtest corpus: the companies that generate the highest returns to investors are not those that arrived first in a category, but those that arrived when the enabling cost dropped below a viability threshold. The ~300x inference cost decline is that drop. The application layer built on 2022-era economics is not the application layer that will be built at 2026-era economics. Investors who are screening for the next Cursor or Perplexity should be looking for products whose daily-workflow ownership only became economically viable in the last 18 months — not products that have been running since the expensive era and are now just cheaper to operate.
This analysis draws on tokencost.app’s AI Price Index tracking inference cost decline, navyaai.com’s cost report on AI billing at the application layer, and aimagicx.com’s LLM API pricing comparison for 2026. Human editorial oversight applied.
This analysis is informational and does not constitute investment advice, a research report, or a recommendation to buy, sell, or hold any security.
Charaka Notes by Manthan Intelligence.