In April 2026, Uber's CTO Praveen Neppalli Naga said something most CTOs are saying privately. He said it publicly, on the record:
"I'm back to the drawing board, because the budget I thought I would need is blown away already."
— Praveen Neppalli Naga, CTO, Uber (ByteIota, May 2026)
Uber's annual AI budget — sized against a $3.4B R&D base — was exhausted by April. Four months in. The cause was not strategic miscalculation. It was a structural gap between the published seat price of an AI dev tool and the realized cost per developer at production scale.
Every engineering organization deploying AI dev tools in 2026 is on a path that ends somewhere on Uber's curve. This post is about what to do before you get there.
What Uber publicly disclosed:
The arithmetic, conservatively: $200/engineer/month × 5,000 engineers × 12 months = $12M/year for Claude Code alone, against an advertised $20 × 5,000 × 12 = $1.2M/year.
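The same arithmetic as a small Python sketch, which makes it easy to rerun with your own headcount and realized per-engineer cost. The $20 and $200 monthly figures and the 5,000-engineer headcount come from the numbers above; the function and constant names are illustrative.

```python
# Seat-price vs realized-cost arithmetic. Figures from the text;
# names are illustrative.

def annual_cost(monthly_per_engineer: float, engineers: int, months: int = 12) -> float:
    """Annual spend for a flat per-engineer monthly cost."""
    return monthly_per_engineer * engineers * months

ADVERTISED_SEAT = 20.0   # published seat price, $/engineer/month
REALIZED_COST = 200.0    # realized cost, $/engineer/month (conservative)
ENGINEERS = 5_000

advertised = annual_cost(ADVERTISED_SEAT, ENGINEERS)
realized = annual_cost(REALIZED_COST, ENGINEERS)

print(f"advertised: ${advertised:,.0f}/year")       # advertised: $1,200,000/year
print(f"realized:   ${realized:,.0f}/year")         # realized:   $12,000,000/year
print(f"spread:     {realized / advertised:.0f}x")  # spread:     10x
```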
The 10× spread is not a Claude Code anomaly. It is a structural property of any agentic AI dev tool at production-team scale.
Three reasons, all worth understanding because they apply to any AI dev tool you might roll out:
The advertised seat price assumes a developer occasionally asks the AI a question. In production, the developer is running:
Each of these is multi-call agentic activity. The pricing model that assumes "one chat per day per user" doesn't survive contact with developers who are using AI as a primary coding partner.
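A rough model shows why multi-call agentic activity outgrows a per-seat price. It assumes, hypothetically, that each agent turn re-sends the full accumulated context and then appends a fixed number of new tokens (tool output, model reply). The function name and token counts are illustrative, not measurements of Claude Code, and caching is ignored.

```python
# Toy model of input tokens billed across an agent session.
# Hypothetical assumptions: every call re-sends the whole context,
# each turn appends a fixed number of tokens, no prompt caching.

def session_input_tokens(base_context: int, tokens_per_turn: int, turns: int) -> int:
    """Total input tokens billed across a multi-turn session."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # the full context is sent on every call
        context += tokens_per_turn  # history grows after each call
    return total

single_chat = session_input_tokens(base_context=2_000, tokens_per_turn=500, turns=1)
agent_run = session_input_tokens(base_context=20_000, tokens_per_turn=3_000, turns=30)

print(single_chat)               # 2000
print(agent_run)                 # 1905000
print(agent_run // single_chat)  # 952
```

Because the context grows on every turn, total input tokens grow roughly with the square of the number of turns. In this toy model, one 30-step agent run bills about 950 times the input of a single question.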
Computeleap documented in May 2026 that Claude Code's prompt cache TTL was reduced from 1 hour to 5 minutes on March 6, 2026. The change wasn't loudly announced. The result was measurable:
Any organization that budgeted on January-February economics was underwater as soon as the TTL change took effect, without its budget assumptions ever being formally invalidated. The vendor's effective pricing changed silently; the customers' projections didn't.
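A TTL cut matters because cache hits depend on how long a developer pauses between requests. The sketch below replays one hypothetical developer day under a 60-minute and a 5-minute TTL. Its simplifying assumptions (one cached prefix per developer, each hit refreshes the expiry, each miss rewrites the cache) are illustrative, not vendor documentation.

```python
# How a shorter prompt-cache TTL lowers the hit rate for the same usage.
# Illustrative assumptions: one shared cached prefix per developer,
# a hit refreshes the expiry, a miss rewrites the cache.

def cache_hit_rate(request_minutes: list[float], ttl_minutes: float) -> float:
    """Fraction of requests served from a warm cache."""
    hits = 0
    expires_at = None  # no cache entry before the first request
    for t in sorted(request_minutes):
        if expires_at is not None and t <= expires_at:
            hits += 1
        expires_at = t + ttl_minutes  # hit or miss, the entry is now fresh
    return hits / len(request_minutes)

# Hypothetical day: bursts of requests separated by 10-40 minute gaps
# (meetings, code review, reading diffs).
request_times = [0, 2, 4, 15, 17, 40, 42, 44, 80, 82, 120, 121, 160]

print(f"1-hour TTL: {cache_hit_rate(request_times, 60):.0%}")  # 1-hour TTL: 92%
print(f"5-min TTL:  {cache_hit_rate(request_times, 5):.0%}")   # 5-min TTL:  54%
```

Each miss is typically billed as uncached input, often with a cache-write premium, instead of a discounted cache read. The same developer behavior therefore costs more without anyone changing how they work.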
A team of 5,000 engineers where 95% use AI monthly and 70% of code is AI-generated has saturated the tool. The cost-per-engineer at saturation is the relevant number — not the cost-per-engineer in pilot, when adoption was 30%.
Pilot economics are wildly flattering for two reasons:
Production economics are realized only after the tool has been in use long enough for adoption depth to plateau. For engineering tools, that's usually 6-12 months.
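One way a low-adoption pilot flatters the numbers is dilution: spend divided across every seat, including idle ones, looks cheap. This sketch uses entirely hypothetical pilot figures to compare a naive per-seat extrapolation with a projection at saturation adoption. It holds per-active cost constant, so growing adoption depth would push the saturation figure higher still.

```python
# Why per-seat averages from a low-adoption pilot understate production
# spend. All figures are hypothetical.

def per_seat_cost(total_spend: float, seats: int) -> float:
    return total_spend / seats

def projected_monthly_spend(per_active_cost: float, engineers: int, adoption: float) -> float:
    """Spend once `adoption` (0-1) of engineers are active."""
    return per_active_cost * engineers * adoption

# Pilot: 200 seats, 30% active, $12,000 billed this month.
pilot_spend, pilot_seats, pilot_adoption = 12_000.0, 200, 0.30
per_active = pilot_spend / (pilot_seats * pilot_adoption)  # $200 per active engineer

naive = per_seat_cost(pilot_spend, pilot_seats) * 5_000  # diluted average, extrapolated
saturated = projected_monthly_spend(per_active, 5_000, 0.95)

print(f"naive: ${naive:,.0f}/month")          # naive: $300,000/month
print(f"saturation: ${saturated:,.0f}/month")  # saturation: $950,000/month
```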
If you have an AI dev tool deployment of more than 100 engineers (or are planning one), take three actions before the surprise arrives:
Track realized cost per engineer, not the seat price: the actual API token cost billed against each engineer's usage. If your vendor doesn't surface this, your gateway should. If neither does, add instrumentation before scaling further.
The number to track: realized cost per active engineer per month, by week. The week-over-week trend is the leading indicator.
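A minimal sketch of that metric, assuming your gateway or vendor export can produce records of (week, engineer_id, cost_usd). The record shape, the helper names, and the rule that an engineer counts as active in any week with nonzero cost are all assumptions for illustration.

```python
# Realized cost per active engineer, computed weekly, expressed as a
# monthly rate, plus the week-over-week change.
# Hypothetical record shape: (week, engineer_id, cost_usd).

from collections import defaultdict

WEEKS_PER_MONTH = 52 / 12  # about 4.33

def monthly_cost_per_active(records):
    """Return {week: realized $ per active engineer, as a monthly rate}."""
    spend = defaultdict(float)
    active = defaultdict(set)
    for week, engineer, cost in records:
        spend[week] += cost
        if cost > 0:  # assumption: any nonzero cost means active that week
            active[week].add(engineer)
    return {w: spend[w] / len(active[w]) * WEEKS_PER_MONTH
            for w in sorted(spend) if active[w]}

def week_over_week(series):
    """Percent change between consecutive weeks."""
    weeks = sorted(series)
    return {cur: (series[cur] - series[prev]) / series[prev] * 100
            for prev, cur in zip(weeks, weeks[1:])}

records = [
    (1, "a", 30.0), (1, "b", 10.0),
    (2, "a", 36.0), (2, "b", 14.0), (2, "c", 10.0),
    (3, "a", 45.0), (3, "b", 20.0), (3, "c", 16.0),
]
per_active = monthly_cost_per_active(records)
for week, pct in week_over_week(per_active).items():
    print(f"week {week}: ${per_active[week]:.0f}/engineer/month ({pct:+.1f}% WoW)")
# week 2: $87/engineer/month (+0.0% WoW)
# week 3: $117/engineer/month (+35.0% WoW)
```

Note that total spend rose 50% from week 1 to week 2 while per-active cost stayed flat; the jump in week 3 is the one that reflects deeper usage per engineer.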
Cache TTLs. Batch tier discounts. Retention policies. Read/write ratio shifts. These are levers vendors pull silently. If you're not monitoring them, you're paying the new price without knowing the old price changed.
A useful diagnostic: at the end of every month, compute your blended cost per token. If it moves by more than 10% without an obvious reason, investigate. A vendor-side change you missed is the likeliest explanation.
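The month-end check might look like the sketch below, assuming you can export total billed dollars and total tokens (input, output, and cache) per month. The history values and function names are hypothetical; the 10% threshold comes from the diagnostic above.

```python
# Month-end diagnostic: blended cost per million tokens, flagged when it
# moves more than 10% month over month. History values are hypothetical.

THRESHOLD = 0.10  # a 10% change triggers an investigation

def cost_per_million_tokens(billed_usd: float, tokens: int) -> float:
    return billed_usd / tokens * 1_000_000

def flag_unexplained_shifts(monthly):
    """monthly: list of (month, billed_usd, tokens), oldest first."""
    flags = []
    prev = None
    for month, usd, tokens in monthly:
        rate = cost_per_million_tokens(usd, tokens)
        if prev is not None:
            change = (rate - prev) / prev
            if abs(change) > THRESHOLD:
                flags.append((month, round(change * 100, 1)))
        prev = rate
    return flags

history = [
    ("2026-01", 90_000.0, 30_000_000_000),
    ("2026-02", 96_000.0, 32_000_000_000),
    ("2026-03", 140_000.0, 33_000_000_000),
]
print(flag_unexplained_shifts(history))  # [('2026-03', 41.4)]
```

Including cache reads in the token denominator matters: a falling cache hit rate then surfaces as a rising blended rate even when list prices are unchanged.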
The right ceiling is not the published seat price. Set a soft warning at 2-3× the published seat price and a hard cap at 5-10×, which is the structural multiplier most agentic dev tools land at.
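Those ceilings translate into a simple per-engineer check. This sketch picks 3× and 10× from within the bands above and a $20 seat price; the function name and status labels are hypothetical, and what a hard cap actually triggers is left as a policy decision.

```python
# Classify an engineer's realized monthly cost against ceilings set as
# multiples of the published seat price. Chosen values are illustrative.

SEAT_PRICE = 20.0     # published seat price, $/month
SOFT_MULTIPLIER = 3   # within the 2-3x soft-warning band
HARD_MULTIPLIER = 10  # within the 5-10x hard-cap band

def budget_status(realized_monthly: float,
                  seat_price: float = SEAT_PRICE,
                  soft: float = SOFT_MULTIPLIER,
                  hard: float = HARD_MULTIPLIER) -> str:
    if realized_monthly >= seat_price * hard:
        return "hard_cap"
    if realized_monthly >= seat_price * soft:
        return "warn"
    return "ok"

for cost in (45.0, 150.0, 240.0):
    print(cost, budget_status(cost))
# 45.0 ok
# 150.0 warn
# 240.0 hard_cap
```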
If your CFO has not been briefed on this multiplier, brief them now. The $20/seat/month figure in the procurement contract is informational; the $150-250/seat/month realized cost is what you should budget against.
This is not a "Claude Code is bad" post. Claude Code is, by most engineering accounts, the strongest AI coding tool available. The 70% AI-generated code at Uber is a productivity story, not a failure story. Companies that don't roll out AI dev tools in 2026 will lose ground on engineering productivity to companies that do.
The point of the Uber lesson is not "don't deploy these tools." It is: deploy them with your eyes open to the unit economics. The realized cost is high. The productivity gains may justify it. But finance has to be in the conversation from Day 1, not from Q2 of next year.
Distilled:
The Uber disclosure is the most useful AI cost overrun case study published in 2026 because it's specific, public, and from a CTO with no incentive to misrepresent the numbers. Every CFO and engineering leader has a ten-minute task: read the realized cost numbers, compare them to your own assumed seat price model, and decide whether your projections survive the comparison.
If they don't — and most don't — you have time to act. That's the point of writing this down before you get to your own four-month budget burn.
Trimio is the LLM API gateway built for AI cost governance, including realized-cost-per-engineer tracking, automated alerts when vendor pricing patterns shift, and routing that captures savings even on heavy agentic dev tool usage.