When the cloud bill climbs, the first reflex is to find someone to blame: "Who ordered this m5.4xlarge?" or "Why does this service have 8 GB of memory for a 200 MB workload?" It is a convenient model, and a false one. Cloud overspend is almost never the result of negligence. It is the predictable sum of many locally rational decisions that add up to a collectively irrational outcome. And as long as FinOps tries to cure it with moralizing, nothing changes.
Defensive overprovisioning: everyone is right individually
Take a typical scenario. A service was once OOM-killed at three in the morning. The on-call engineer doubled the memory request, and was right to: nobody wants to be the person whose name is on the incident report. Another time latency spiked during an ad campaign, so CPU was bumped "with headroom across the board." Also reasonable: a latency spike during a campaign costs more than the extra hardware.
Each of these decisions, taken alone, is cautious and justified. But add up such decisions across hundreds of services over a couple of years and you get the picture that audits routinely reveal: 70–80% of provisioned capacity sits idle. No one acted stupidly. What was stupid was the system that rewarded caution and never showed its price.
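The compounding is easy to see with a little arithmetic. The sketch below is illustrative only: the service, its 400 MiB steady P99 usage, and its 512 MiB starting request are hypothetical numbers, not measurements. It shows how two individually defensible "double it to be safe" decisions leave roughly four fifths of the request idle.

```python
def idle_fraction(requested_mib: float, used_mib: float) -> float:
    """Share of the requested memory that the workload never touches."""
    if requested_mib <= 0:
        raise ValueError("requested_mib must be positive")
    return max(0.0, 1.0 - used_mib / requested_mib)


def request_after_incidents(initial_request_mib: int, doublings: int) -> int:
    """Memory request after `doublings` rounds of 'double it to be safe'."""
    return initial_request_mib * 2 ** doublings


# Hypothetical service: steady P99 usage of 400 MiB, initial request of 512 MiB.
P99_USAGE_MIB = 400
INITIAL_REQUEST_MIB = 512

for doublings in range(3):
    request = request_after_incidents(INITIAL_REQUEST_MIB, doublings)
    idle = idle_fraction(request, P99_USAGE_MIB)
    print(f"{doublings} doublings: request={request} MiB, idle={idle:.0%}")
# 0 doublings: request=512 MiB, idle=22%
# 1 doublings: request=1024 MiB, idle=61%
# 2 doublings: request=2048 MiB, idle=80%
```

Usage did not change between the first line and the last; only the request did.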
Hence the conclusion that changes the whole practice: blaming an engineer for overprovisioning is pointless. This is not an individual problem but a systemic one — and it has to be solved systemically:
- Visibility at the point of decision. A cost annotation right in the PR: "this change adds $40/month." Not a report two months later when the context is forgotten, but a number at the moment the engineer is still holding the change in their head.
- Sensible defaults on the paved path. For example, an HPA utilization target of 70% and requests sized to P95/P99 usage rather than to the worst spike ever seen. Most people accept the default, so make the default correct.
- VPA in advisory mode (recommendations only, no automatic updates). "Requested: 1 GB; observed P99: 200 MB." Not automatic trimming, but information: the decision stays with the engineer, except now it is an informed one.
- Collective accountability instead of personal. An error-budget-style mechanism: if a team overshoots its quarterly cost target, the next sprint includes a rightsizing pass. Accountability rests on the team, not on a scapegoat.
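As a rough illustration of the first three points, the following sketch produces an advisory line of the kind that could be posted on a PR or dashboard. It is a minimal sketch under stated assumptions, not how VPA computes its recommendations: the nearest-rank percentile, the 20% headroom, the price of 4 USD per GiB-month, and the function names are all hypothetical choices made for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that covers pct% of the data."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def advise(requested_mib, usage_samples_mib, headroom_pct=20, usd_per_gib_month=4.0):
    """Compare a memory request with observed P99 usage; advise, never enforce."""
    p99 = percentile(usage_samples_mib, 99)
    suggested = math.ceil(p99 * (100 + headroom_pct) / 100)
    # Assumed flat price per GiB-month; a real price depends on provider and region.
    monthly_delta = (suggested - requested_mib) / 1024 * usd_per_gib_month
    return {
        "requested_mib": requested_mib,
        "p99_mib": p99,
        "suggested_mib": suggested,
        "monthly_delta_usd": monthly_delta,
    }


# Hypothetical samples: mostly 150 MiB, with one 900 MiB outlier spike.
samples = [150] * 97 + [180, 200, 900]
advice = advise(requested_mib=1024, usage_samples_mib=samples)
print(
    f"Requested {advice['requested_mib']} MiB, observed P99 {advice['p99_mib']} MiB, "
    f"suggested {advice['suggested_mib']} MiB "
    f"({advice['monthly_delta_usd']:+.2f} USD/month at the assumed price)"
)
```

Sizing to P99 means the single 900 MiB spike does not set the request. Whether that spike deserves its own headroom remains the engineer's call, which is the point of keeping the tool advisory.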
Showback before chargeback
The same logic dictates the order of rollout. Chargeback, where spend is actually debited from a team's budget, looks like the "honest" mechanism. But rolled out unprepared, it creates friction: people start arguing with the numbers, hiding resources under someone else's tags, and treating FinOps as the tax office.
Showback ("here is how much your team spent," with no debit) often delivers a meaningful cost reduction on its own. Visibility works without coercion: show engineers what their service costs and many will start optimizing it, because professional pride is a stronger motivator than fear of the budget. Show first; charge later, if the organization's maturity gets there. Chargeback without prior showback is pain nobody signed up for.
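A showback report can start as nothing more than billing line items grouped by a team tag. The sketch below assumes a hypothetical export format (a list of dicts with `cost_usd` and `tags`); real billing exports differ by provider. Untagged spend gets its own visible bucket instead of being silently dropped.

```python
from collections import defaultdict


def showback(line_items):
    """Total cost per team, largest first; items without a team tag go to 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        team = item.get("tags", {}).get("team") or "untagged"
        totals[team] += item["cost_usd"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical line items for one month.
items = [
    {"resource": "db-1", "cost_usd": 120.0, "tags": {"team": "payments"}},
    {"resource": "api-1", "cost_usd": 40.5, "tags": {"team": "payments"}},
    {"resource": "idx-1", "cost_usd": 80.0, "tags": {"team": "search"}},
    {"resource": "tmp-vm", "cost_usd": 15.0, "tags": {}},
]

report = showback(items)
for team, cost in report.items():
    print(f"{team}: {cost:.2f} USD")
```

Nothing is debited here. The report only makes each team's number, and the size of the untagged remainder, visible.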
The LLM as a new line of variable cost
In 2026 this logic gained a fresh dimension. AI agents in DevOps (triage, Slack bots, agentic workflows) turned LLM API calls into a visible line on the cloud bill. The same trap of local rationality appears here, only inverted: the bill is high not because someone over-insured, but because the tool was chosen without regard to the task.
A reasoning model at $75 per million output tokens, fired off for a routine kubectl get pods, is "a principal engineer called in to change a light bulb." Each call seems defensible ("well, it is smarter"); in aggregate they add up to a bill nobody budgeted for. The discipline is the same as for infrastructure:
- Per-task model routing. Frequent, repetitive operations go to cheap/free models; reasoning is reserved for rare, deep investigations.
- Mismatch detection. An expensive model running a cheap task is a reliable indicator of hidden spend.
- Cost tags on API keys. Treat a key like an EC2 instance: team / service / cost-center are mandatory. Otherwise, six months from now nobody will remember whose agent burned the budget.
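The three rules above can be sketched in a few lines. Everything named here is an assumption made for illustration: the task types, the two model tiers, and the 0.5 USD price for the cheap tier are hypothetical. Only the 75 USD per million output tokens figure comes from the example in the text.

```python
from dataclasses import dataclass

REQUIRED_TAGS = ("team", "service", "cost_center")

# USD per million output tokens. "cheap" is a made-up price for illustration.
PRICE_PER_M_OUTPUT = {"cheap": 0.5, "reasoning": 75.0}

# Hypothetical task types considered routine enough for the cheap tier.
ROUTINE_TASKS = {"pod_status", "log_summary", "cron_healthcheck"}


@dataclass
class Call:
    task_type: str
    model: str
    output_tokens: int
    tags: dict


def route(task_type):
    """Per-task routing: routine work goes to the cheap tier, the rest to reasoning."""
    return "cheap" if task_type in ROUTINE_TASKS else "reasoning"


def missing_tags(tags):
    """Required cost tags that are absent or empty on an API key."""
    return [name for name in REQUIRED_TAGS if not tags.get(name)]


def find_mismatches(calls):
    """Calls that ran on a pricier model than the router would pick, with overspend in USD."""
    flagged = []
    for call in calls:
        expected = route(call.task_type)
        extra = PRICE_PER_M_OUTPUT[call.model] - PRICE_PER_M_OUTPUT[expected]
        if extra > 0:
            flagged.append((call, extra * call.output_tokens / 1_000_000))
    return flagged


calls = [
    Call("pod_status", "reasoning", 1_000_000, {"team": "sre"}),
    Call("incident_root_cause", "reasoning", 200_000,
         {"team": "sre", "service": "triage-bot", "cost_center": "cc-42"}),
]

flagged = find_mismatches(calls)
for call, overspend in flagged:
    print(f"{call.task_type} on {call.model}: about {overspend:.2f} USD avoidable, "
          f"missing tags: {missing_tags(call.tags)}")
```

In this sketch the deep investigation on the reasoning model is not flagged. Only the routine task running on it is, and the same pass reports which cost tags its key lacks.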
A real case: four monitoring crons were driving hundreds of calls a day to an expensive reasoning model. Moving that routine work to a free tier cut LLM cost by 80–85%. The architecture did not change; what changed was the fit between tool and task.
What follows from this
FinOps is not a "cost-cutting team" for engineers to resist. It is an engineering discipline built on one observation: people in the cloud behave rationally within the information they have. If you want a different outcome, change the information environment, not the people: visibility at the right moment, correct defaults, feedback at the team level. Cost then becomes part of the engineering decision, not a surprise at the end of the month.