Cognition raised $1B+ at $26B: the evaluation-function death diagnosis for AI agent companies

Cognition just raised $1B+ at a $26B valuation. The autonomous coding agent is real.

And the raise is actually bad news for most of the AI agent companies in your portfolio.

Here’s the death diagnosis.

Cognition works because code has a near-perfect evaluation function. Write a function. Run the test suite. A thousand simulations. Pass or fail. The agent knows within seconds whether it was right. Devin, Cognition’s coding agent, can compound its intelligence because there’s an objective score. Raise the score. Ship the product. Repeat.

This is a structural advantage that $492M of run-rate revenue validated this week.
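
A minimal sketch of that loop in Python, to make the idea concrete. The test cases, the two candidate functions, and the names `evaluate` and `pick_best` are all hypothetical. This is not Cognition's or Devin's actual pipeline, only an illustration of how a test suite turns "was the agent right?" into a number available in seconds.

```python
from typing import Callable, List, Tuple

# Hypothetical test suite: (input, expected output) pairs for an
# absolute-value function the agent has been asked to write.
TEST_SUITE: List[Tuple[int, int]] = [(3, 3), (-4, 4), (0, 0), (-1, 1)]


def evaluate(candidate: Callable[[int], int]) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    passed = 0
    for arg, expected in TEST_SUITE:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            # A crashing candidate fails that test; the evaluation keeps going.
            pass
    return passed / len(TEST_SUITE)


def pick_best(candidates: List[Callable[[int], int]]) -> Callable[[int], int]:
    """Keep whichever attempt scores highest on the objective function."""
    return max(candidates, key=evaluate)


# Two hypothetical attempts an agent might produce.
def attempt_identity(x: int) -> int:
    return x


def attempt_abs(x: int) -> int:
    return -x if x < 0 else x


best = pick_best([attempt_identity, attempt_abs])
print(evaluate(attempt_identity), evaluate(attempt_abs), best.__name__)
```

The point of the sketch is the shape of the loop: every attempt gets a score immediately, so "raise the score" is a well-defined instruction.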

Now ask yourself: how many of the “autonomous agent” companies launched since 2023 have an equivalent evaluation function?

“Did the legal advice hold up in court?” Outcome known in 18 months. No fast feedback loop.

“Did the investment thesis prove correct?” Known in 5–7 years. No compounding signal.

“Did the consulting recommendation improve revenue?” Clean causal attribution is nearly impossible.

Without an objective evaluation function, the agent can’t learn. It can’t calibrate. It can’t improve. What you have isn’t an agent — it’s a sophisticated prompt wrapper with a SaaS pricing model.

Cognition at $26B validates one category: agents with hard evaluation functions. Code. Mathematics. Anything with a test suite or a simulation.

It quietly invalidates the business model for most of the rest.

This is why Manthan Intelligence spent its first four months building evaluation infrastructure before building the product.

We didn’t launch with an impressive demo. We launched with a backtest.

Every company we assess gets a verdict locked before we look at what actually happened to it. The calibration score is public at getmanthan.com/live: K1 weighted accuracy 63.35%, INVEST reliability 96.3%, across 284 graded deals. On a weighted basis we are wrong 36.65% of the time, and we know exactly which deal types we’re wrong about (CLP over-issuance, biotech blind spots, EdTech pattern gaps).

That’s not an AI demo. That’s a measurement system.
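
As a rough illustration of what such a measurement could compute, here is a Python sketch. The post does not define K1 or INVEST reliability, so the definitions below are assumptions: "weighted accuracy" is taken as the weight-adjusted share of verdicts that matched the outcome, and "reliability" as the share of INVEST verdicts that turned out to be correct. The `GradedDeal` type, the weights, and the sample deals are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GradedDeal:
    verdict: str   # locked before the outcome was looked at, e.g. "INVEST" or "PASS"
    outcome: str   # what actually happened, expressed on the same scale
    weight: float  # hypothetical importance weight for this deal


def weighted_accuracy(deals: List[GradedDeal]) -> float:
    """Assumed definition: weight of correct verdicts over total weight."""
    total = sum(d.weight for d in deals)
    correct = sum(d.weight for d in deals if d.verdict == d.outcome)
    return correct / total


def verdict_reliability(deals: List[GradedDeal], verdict: str = "INVEST") -> float:
    """Assumed definition: of the deals given this verdict, the share that were right."""
    issued = [d for d in deals if d.verdict == verdict]
    if not issued:
        raise ValueError(f"no deals were given the verdict {verdict!r}")
    return sum(d.outcome == verdict for d in issued) / len(issued)


# Made-up sample data, not Manthan's deals.
deals = [
    GradedDeal("INVEST", "INVEST", 2.0),
    GradedDeal("PASS", "INVEST", 1.0),   # a miss: passed on a deal that worked
    GradedDeal("PASS", "PASS", 1.0),
    GradedDeal("INVEST", "INVEST", 1.0),
]

print(weighted_accuracy(deals))    # weighted share of correct verdicts
print(verdict_reliability(deals))  # share of INVEST verdicts that were right
```

Under these assumed definitions the two numbers measure different things, which is how a system can be highly reliable on the INVEST verdicts it issues while being wrong more often overall.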

Investment intelligence is a domain without a natural fast feedback loop: an outcome takes 5–7 years. We built a synthetic one, a blind backtesting protocol that compresses that signal into 48 hours. That infrastructure is the moat. Not the model. Not the prompt.
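
The post does not say how verdicts are locked. One common way to make "locked before we look" checkable is a hash commitment: publish a digest of the verdict first, reveal the verdict later. The Python sketch below assumes that approach. The function names, the `|` separator, and the company identifier are hypothetical, and this is not a description of Manthan's protocol.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def _digest(company_id: str, verdict: str, nonce: str) -> str:
    """SHA-256 over the verdict plus a random nonce, hex encoded."""
    payload = f"{company_id}|{verdict}|{nonce}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def lock_verdict(company_id: str, verdict: str) -> Tuple[str, str]:
    """Commit to a verdict. Publish the digest now; keep the nonce private."""
    nonce = secrets.token_hex(16)  # stops anyone guessing the verdict from the digest
    return _digest(company_id, verdict, nonce), nonce


def verify_verdict(company_id: str, verdict: str, nonce: str, digest: str) -> bool:
    """After the outcome is known, check the revealed verdict against the commitment."""
    return hmac.compare_digest(_digest(company_id, verdict, nonce), digest)


# Hypothetical usage: lock first, grade later.
digest, nonce = lock_verdict("acme-robotics", "INVEST")
print(verify_verdict("acme-robotics", "INVEST", nonce, digest))  # matches the commitment
print(verify_verdict("acme-robotics", "PASS", nonce, digest))    # a changed verdict does not
```

A commitment like this only shows that the verdict was not edited after the fact. It says nothing about whether the 48-hour grading itself is sound.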

The death diagnosis for the AI agent category:

Cognition’s success will attract a thousand imitators who misread the lesson.

The lesson isn’t “autonomous agents win.”

The lesson is: autonomous agents win when you have a fast, objective evaluation function. Build that first.

If you can’t articulate what your evaluation function is, ask hard questions about what compounds in your system — and what just drifts.

Manthan Intelligence runs a calibrated investment intelligence system: K1 63.35% weighted accuracy, 284 graded deals, blind backtesting every Sunday. Intelligence infrastructure at getmanthan.com
