Trimio Field Notes

Goodhart's Law: Meets Your AI Budget

May 22, 2026 7 min read finopsai-budgetengineering

70% of committed code AI-generated, annual AI budget exhausted in four months, zero per-team attribution. This is not a hypothetical. It is what happens when you set "AI adoption" as a target metric without building the financial infrastructure to govern it.

In 2023, Uber published an internal engineering memo describing exactly this dynamic. Engineers were measured on AI adoption — how much they used AI coding tools, how much they integrated AI into their workflows. Rational engineers, responding to rational incentives, maximized the metric. They used AI tools constantly, aggressively, on every task regardless of marginal value. Monthly AI spend per engineer reached $500–$2,000 depending on role and workflow. The annual budget projection, calibrated in Q4 of the prior year, was burned through by April.

4 months
$0 → Budget Gone
Uber's annual AI budget exhausted by April. Calibrated in Q4 the year before.
7–12×
Seat vs. Realized Cost
Published seat price: $20/month. Actual per-engineer cost: $150–250/month.
$200/engineer
Realized Monthly
The midpoint realized cost at full adoption — the number no one budgeted against.

This is Goodhart's Law in its most expensive form: when a measure becomes a target, it ceases to be a good measure.

Essential
Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. "AI adoption rate" is safe to optimize only when you have per-team attribution data that tells you what each point of adoption actually costs.

Why "AI Adoption" Is a Dangerous KPI Without Attribution

The problem isn't that engineers used AI tools. The problem is that there was no feedback loop connecting usage to cost to team. Finance saw a budget line item: "AI Tools — $X/month." Engineering saw a usage metric: "AI adoption — Y%." Neither team had a view that connected them.

The CFO could not answer: which team is driving 40% of our AI spend? Which use case costs $18 per request when the median is $0.40? Is the backend team's usage returning proportionally more value than the data team's? Without per-team attribution, these questions are unanswerable — and the budget conversation reduces to a single lever: cut or don't cut.

That's a terrible lever. "Cut AI spend" means "reduce adoption" means you're now optimizing against the original KPI. You've created a governance paradox: the metric you want to grow is the metric you need to constrain, because you can't see which part of it to constrain.

"I'm back to the drawing board, because the budget I thought I would need is blown away already."
— Uber CTO Praveen Neppalli Naga, April 2026

Published Seat Price
$20/mo
Advertised developer seat price
  • What was budgeted in Q4
  • Baseline interaction only
  • No agentic loop cost included
Realized Cost
$150–250/mo
Actual API token cost per engineer
  • What the April bill reflected
  • Includes agentic loops + re-prompting
  • 7–12× the published seat price
The Core Problem
AI spend is structurally untagged at the source. Every call looks identical to the billing system. Without a proxy layer that adds attribution metadata, per-team cost visibility is impossible.

The Anatomy of an Attribution Gap

Most engineering organizations today have the following visibility into AI spend:

This is roughly equivalent to receiving a cloud infrastructure bill that shows total compute costs but has no resource tags — no breakdown by service, environment, or team. Every serious engineering organization fixed that cloud problem five years ago with tagging strategies and cost allocation tools. The AI cost problem is structurally identical, and most organizations haven't fixed it yet.

The reason: AI spend flows through a single API endpoint (or a handful of them) rather than through dozens of discrete infrastructure resources. There's no natural tagging boundary. Every call to api.openai.com/v1/chat/completions looks identical to the billing system, regardless of which team triggered it, which product feature it powered, or which workflow it served.

KPIWithout AttributionWith Attribution
AI adoption rateA single number — no team breakdownRate by team, with cost-per-point context
Monthly AI spendTotal provider bill — opaqueBy team, use case, and model
Budget forecast accuracy±40–60% miss typicalWithin ±10% with per-team caps
The Fix
Per-team cost attribution converts "AI adoption" from a dangerous single-dimensional target into a governed, multi-dimensional metric. You can optimize adoption and cost simultaneously — but only once you can see both.

What Attribution Actually Requires

Attribution requires a layer that sits between your application and the LLM provider — a layer that can:

  1. Receive metadata about each call (team, product, use case, user) as part of the request
  2. Log that metadata alongside the token counts and model selection
  3. Aggregate and surface cost by those dimensions in real time
  4. Enable budget rules and spend caps at the team or use-case level

This is not a complex engineering problem. It's a routing and logging problem. The proxy receives a request tagged with team: backend and use_case: code-review, passes it through to the provider, and records the token cost against those dimensions. At the end of the month, finance gets a breakdown that looks like a cloud cost report — by team, by use case, by model, by period.

The engineering change required is minimal: add a header or metadata field to your existing LLM calls. No restructuring of application code, no changes to model selection logic, no new infrastructure. One proxy URL swap and a tagging convention.

Making "AI Adoption" Safe to Optimize

Goodhart's Law doesn't mean you can't use adoption as a metric. It means you can't use it in isolation. The fix is a companion metric that makes the adoption rate legible: cost per adoption unit, broken down by team.

With per-team attribution in place, the dashboard that matters looks like this:

Now you can see that the data team's requests cost 6× more on average than the backend team's — and ask why. Maybe it's justified (complex document processing). Maybe it isn't (using GPT-4 for tasks where Gemini Flash would suffice). The adoption metric is the same; the cost-per-unit metric tells you where to look.

Budget governance also becomes precise. Instead of "cut AI spend by 15%," you can say "data team: requests over $1.00/call require an engineering review." You're not slowing adoption — you're applying routing intelligence to the expensive outlier cases.

The Budget Cycle Problem
Enterprise budget cycles run annually. AI model pricing changes monthly. Attribution solves the "where is the money going" question. Real-time pricing awareness solves the "is this the right model for this cost" question. Both are required for a CFO to sign off confidently.

The Budget Cycle Problem

There's a second dimension to the Uber failure that attribution alone doesn't solve: the budget cycle mismatch.

Enterprise budget cycles run annually. AI model pricing changes monthly. The Gemini 2.5 Flash price point at announcement was different from the price at GA. GPT-4o's price has changed multiple times since launch. Claude Sonnet 3.5 launched at a price significantly lower than its predecessor. If your annual AI budget was set assuming last year's pricing, and a model you use heavily dropped 40% in price, you may be running significant budget headroom without knowing it — or you may be running over budget because you migrated to a model that's more expensive than the one in your baseline.

Attribution solves the "where is the money going" question. Real-time pricing awareness solves the "is this the right model for this cost" question. Both are required for a CFO to feel confident signing off on AI spend as a budget line.

The Trimio Approach
The proxy layer adds per-team, per-use-case attribution to every LLM call. When Gemini Flash drops to $2/M output tokens and routing rules say "use cheapest capable model," the price drop automatically routes more traffic to Flash — budget doesn't need renegotiation.

The Trimio Approach

Trimio's Finance Alpha is built around this problem. The proxy layer adds per-team, per-use-case attribution to every LLM call. The cost dashboard provides the breakdown finance needs for period close. Budget overlays let engineering leads set spend caps at the team level — the data team's cap triggers a routing change (shift to a cheaper model) rather than an outage.

The routing layer also addresses the budget cycle problem: when Gemini Flash drops to $2/M output tokens and your routing rules say "use cheapest capable model," the price drop automatically routes more traffic to Flash. Your budget doesn't need to be re-negotiated — the system captures the savings without a manual intervention.

Goodhart's Law is a warning, not a prohibition. The metric is fine. The problem is running it without the infrastructure to make it legible. Per-team attribution is that infrastructure.

Trimio is the LLM gateway built for teams where AI adoption is the goal — and AI cost governance is the prerequisite. Per-team attribution, budget caps, and routing intelligence in one layer. See how it works.

Trimio
Stop guessing. Start governing.
trimio is the LLM API gateway purpose-built for AI cost governance — visibility, routing, caching, and budget enforcement in one layer.