I Fixed a $30K/Year Anomaly in the Time It Takes to Make Coffee

By Larry Advey // Director, Cloud Platform & FinOps

Contents

The Signal
What Usually Happens Next
What Happened Instead
The Part That Actually Surprised Me
What This Means For The Backlog

If you work in FinOps, you know the feeling. You open your recommendations queue on a Monday morning. There are 47 items. You worked through 12 of them last week. You’re back up to 47 again.

All of them represent real money leaving the building, but not all of it is “bad money.” A significant share of those 47 will turn out to be “this is OK, it’s expected, we got value for it.” That’s what really kills FinOps enthusiasm, and it’s why engineers are skeptical of the FinOps person.

The cloud bill is the public-facing problem. The recommendations backlog is the private one. Every FinOps practitioner has one.

This is the FinOps tax. Not the cloud bill but the investigation. The time it takes to go from “Optimize found something” to “an engineer actually fixed it.” And if you’ve ever tried to manually correlate a cost spike with a Git commit late on a Friday, you know exactly what I mean.

So when I tell you we fixed a $30,000/year waste problem in the time it takes to make a coffee, I want you to understand: I’m not bragging about the technology. I’m bragging about getting an engineer to act on a cost recommendation the same day I found it.

The Signal

CloudZero Optimize runs daily on our infrastructure (not just our customers’). It found an S3 anomaly: an aggregation job writing at the wrong granularity, generating thousands of tiny files instead of a few sensible ones. One resource went from about $1/day to $100/day.

That’s ~$30,000 annualized. For a process accomplishing exactly nothing extra. If you’ve been in FinOps for any length of time, you’ve seen this flavor of waste before. Small config error, huge compounding cost, invisible until someone actually looks. The frustrating part is never the finding. It’s everything that happens next.

What Usually Happens Next

You get the recommendation to fix the waste. Now you need to figure out what actually happened.

You pull cost data and look at timestamps. You open GitHub in another tab and start scrolling through recent commits hoping something jumps out. You message the Slack channel for the team that might own this resource. You get a thumbs-up emoji and no other response for two business days. This is fine. This is normal. This is FinOps.

Eventually you piece it together. A good FinOps analyst can do the actual analysis for something like this in an hour or two. Add the tab-hopping and the waiting, though, and you’ve burned half a day on one recommendation, and you have dozens more.

This is why recommendation backlogs exist. The cost of acting on each recommendation is high enough that the math doesn’t work.

What Happened Instead

I used a Claude skill from CloudZero’s AI Hub to enrich the recommendation, passing it through our MCP server to pull context from the tools my team actually lives in.

Here’s what happened, and I’ll be specific:

The skill retrieved the Snowflake query history and type, which confirmed exactly when the cost behavior changed: not a range, but a specific date.
It cross-referenced GitHub pull requests around that window and identified the specific code change that introduced the problem.
It surfaced the engineer who pushed it, with the context already assembled.
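
The three steps above boil down to a date lookup followed by a join. Here is a minimal Python sketch of that logic, not the skill's actual implementation: the data is invented, the `PullRequest` type and both helper functions are hypothetical names, and a real workflow would pull the cost series and pull requests from live systems rather than from in-memory lists.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PullRequest:
    # Hypothetical record shape, for illustration only.
    number: int
    author: str
    title: str
    merged_on: date


def find_change_date(daily_cost, factor=10.0):
    """Return the first date whose cost is at least `factor` times the prior day's."""
    days = sorted(daily_cost)
    for prev, cur in zip(days, days[1:]):
        if daily_cost[prev] > 0 and daily_cost[cur] >= factor * daily_cost[prev]:
            return cur
    return None


def candidate_prs(prs, change_date, window_days=3):
    """PRs merged in the `window_days` before the change date (inclusive), newest first."""
    start = change_date - timedelta(days=window_days)
    hits = [pr for pr in prs if start <= pr.merged_on <= change_date]
    return sorted(hits, key=lambda pr: pr.merged_on, reverse=True)


# Invented data: a resource that jumps from about $1/day to about $100/day.
daily_cost = {
    date(2024, 6, 1): 1.0,
    date(2024, 6, 2): 1.1,
    date(2024, 6, 3): 0.9,
    date(2024, 6, 4): 98.0,
    date(2024, 6, 5): 101.0,
}
prs = [
    PullRequest(101, "alice", "Bump logging library", date(2024, 6, 1)),
    PullRequest(102, "bob", "Trigger aggregation per connection", date(2024, 6, 3)),
    PullRequest(103, "carol", "Fix README typo", date(2024, 6, 6)),
]

changed_on = find_change_date(daily_cost)
suspects = candidate_prs(prs, changed_on)
for pr in suspects:
    print(f"#{pr.number} by {pr.author}, merged {pr.merged_on}: {pr.title}")
```

The sketch only narrows the list of suspects. Deciding which change actually caused the jump still takes someone (or something) reading the diffs, which is the part the enrichment step handled for me.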

The output wasn’t vague. It was: here’s what changed, when, who wrote it, why it probably caused this. Go confirm and fix it.

I sent that to the engineer. His response:

[Itinerant software engineer]: hot damn

[Itinerant software engineer]: I just read the first paragraph 😲

[Itinerant software engineer]: thanks for the report. I think it’s a good report, and something I will fix

He created a Jira ticket. He made the fix. He sent a detailed explanation:

[Itinerant software engineer]: I did a deeper investigation and it turns out we're triggering aggregation at the wrong grain. I made an assumption when creating the aggregator that each account would have a lot of data, but it turns out that assumption was wrong, so the aggregator isn't actually achieving one of its goals, which is to aggregate many small files into a few very large (~250-500MB) files. Instead, the files we're producing are on the order of 50KB. I'm going to change the way we trigger the aggregation to do it at an org level instead of connection level. That will result in much larger files and eliminate this extra cost. I made a jira ticket for this. Thanks again for the report!
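
To put the engineer's numbers in perspective, here is a rough Python sketch of how many files the same volume of data becomes at each file size. The 100 GB volume is a made-up figure for illustration, not our actual data volume, and `files_needed` is a hypothetical helper. Because S3 bills per request as well as per byte stored, object count is one plausible driver of a jump like this.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def files_needed(total_bytes, file_bytes):
    """Number of files of size `file_bytes` needed to hold `total_bytes` (rounded up)."""
    return -(-total_bytes // file_bytes)


total = 100 * GB  # hypothetical volume, chosen only for illustration

small = files_needed(total, 50 * KB)    # what the aggregator was producing
large = files_needed(total, 250 * MB)   # low end of the intended target size

print(f"50 KB files:  {small:,}")
print(f"250 MB files: {large:,}")
print(f"ratio: about {small // large:,}x more objects")
```

Whatever the real volume is, the ratio holds: 50 KB files mean roughly 5,000 times as many objects as 250 MB files.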

The recommendation closed automatically and the savings hit the tracker. We went from signal to triage to evidence to action in one workflow.

The Part That Actually Surprised Me

I’ve been in FinOps long enough to be skeptical of anything that promises to make engineering care about cost recommendations. The bottleneck usually isn’t detection. It’s everything that comes after.

What I didn’t expect was how much the framing mattered. When you send an engineer a recommendation that says “S3 costs went up on this resource,” they have to figure out whether it’s their problem and what to do about it. The answer is usually “I’ll look at it later.”

When you send an engineer a recommendation that says “This commit you pushed on February 16th changed how this job aggregates data, it’s been costing $100/day ever since, and here’s the Jira ticket,” the answer is “Oh, yeah, I can fix that.”

Context converts recommendations into actions.

What This Means For The Backlog

Great story, right? But what about the 46 other recommendations?

Here’s what I’ll say: the enrichment step is the unlock, not the recommendation itself. Most of what keeps FinOps recommendations sitting untouched is the investigation overhead. If that overhead drops from hours to minutes, everything changes.

CloudZero’s MCP server already supports fetching and querying Optimize recommendations directly from AI coding tools, including Claude Code and, now, 12 others.

If you have a backlog of Optimize recommendations, this is what gets the backlog moving.

See how CloudZero Optimize + AI enrichment works, and what it can surface in your environment. Read about our Claude Code Plugin and 12 additional AI coding agents you can use with CloudZero.

Author Spotlight

Larry Advey

Larry Advey brings decades of hands-on FinOps experience to his role as Director, Cloud Platform & FinOps at CloudZero. As Fred FinOps, he's a champion for cost optimization best practices across the FinOps community. Now he's focused on building FinOps education and enablement at scale.