Table Of Contents

- The gap that existed and why it mattered
- What's changed
- Why this matters for AI-driven organizations
- Built for the way GPU workloads actually work
- Get started

If your organization is running AI or machine learning workloads on Kubernetes, the bill is real. GPU instances are among the most expensive resources in cloud infrastructure: a single high-end node can run $30 to $40 per hour, and a multi-day training job on a cluster can cost tens of thousands of dollars before anyone looks up from their terminal.

What most engineering and FinOps teams haven’t been able to do (until now) is connect that spend to the workloads that caused it.

CloudZero has just released GPU reservation attribution for Kubernetes. GPU requests are now factored into CloudZero’s Kubernetes cost attribution model, and their costs are attributed directly to the workloads that reserved them. For organizations running ML training jobs, AI inference pipelines, or GPU-accelerated containers, this changes what cost visibility actually means.

The gap that existed and why it mattered

Kubernetes cost attribution requires a cost model, or a way to divide the expense of shared infrastructure across the workloads running on it. CloudZero’s attribution engine has long distributed CPU and memory costs to workloads based on their resource requests. But GPU resources were not part of that calculation.
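The request-based cost model described above can be sketched in a few lines. This is an illustrative simplification, not CloudZero's actual engine: it splits a node's hourly cost evenly across resource dimensions, attributes each dimension in proportion to workload requests, and sends unrequested capacity to an idle pool. All names and numbers are hypothetical.

```python
# Illustrative sketch of request-based cost attribution.
# Not CloudZero's actual model; numbers are hypothetical.

NODE_HOURLY_COST = 2.00                          # $/hr for the node
NODE_CAPACITY = {"cpu": 8.0, "memory_gib": 32.0}  # allocatable capacity

# Each workload's declared resource requests (from its pod spec)
workloads = {
    "api":    {"cpu": 2.0, "memory_gib": 4.0},
    "worker": {"cpu": 4.0, "memory_gib": 8.0},
}

def attribute(node_cost, capacity, workloads):
    """Split node cost per resource dimension by requested share;
    capacity nobody requested lands in the idle pool."""
    per_dim = node_cost / len(capacity)  # weight each dimension equally
    costs = {name: 0.0 for name in workloads}
    idle = 0.0
    for dim, cap in capacity.items():
        for name, requests in workloads.items():
            costs[name] += per_dim * requests[dim] / cap
        requested = sum(w[dim] for w in workloads.values())
        idle += per_dim * (cap - requested) / cap
    return costs, idle

costs, idle = attribute(NODE_HOURLY_COST, NODE_CAPACITY, workloads)
# Every dollar of the node is accounted for: workloads plus idle
# sum to the node's hourly cost.
```

Note what this model implies: if a resource dimension (such as GPU) is left out of the capacity and request maps entirely, its share of the node cost has nowhere to go but the idle pool, which is exactly the gap described here.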

The result could be frustrating. GPU costs, among the most significant line items in any AI-driven organization’s cloud bill, were landing in idle and unattributed cost pools. A model training job requesting four GPUs would show CPU and memory costs. The GPU portion? Gone into the idle pool.

This created three specific problems for engineering and FinOps teams:

First, GPU-heavy workloads appeared cheaper than they were — making accurate decisions about efficiency, scheduling, or architecture impossible.

Second, idle costs appeared inflated. GPU spend in unattributed pools made it look like organizations were wasting more on idle infrastructure than they actually were.

Third, the most important question in AI infrastructure — what does it actually cost to run this workload? — had no reliable answer.

FinOps In The AI Era: A Critical Recalibration

What 475 executives told us about AI and cloud efficiency.

What’s changed

CloudZero now includes GPU requests in the Kubernetes cost attribution model, using the same methodology applied to CPU and memory.
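For context, a "GPU request" is the reservation a workload declares in its pod spec. A minimal sketch, using the common NVIDIA device-plugin resource name (the image and pod names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer          # hypothetical workload
spec:
  containers:
  - name: train
    image: registry.example.com/trainer:latest   # hypothetical image
    resources:
      requests:
        cpu: "8"
        memory: 32Gi
        nvidia.com/gpu: 2   # this reservation is now attributed to the workload
      limits:
        nvidia.com/gpu: 2   # Kubernetes requires GPU requests to equal limits
```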

Idle costs go down. GPU spend previously sitting in unattributed pools is now assigned to the workloads responsible for it.

Workload costs become accurate. If a workload requests GPUs, its cost now reflects the full resource reservation — CPU, memory, and GPU combined.

No action required. For customers running CZ-agent v1.2.0 or above, this change takes effect automatically. Nothing to configure, no migration to run.

All of CloudZero’s attribution capabilities apply. GPU costs flow into the same custom dimensions you already use — by team, product, customer, or any other business lens. Cost per ML training job, cost per inference workload, cost per team running GPU-accelerated containers — all of it is now computable inside CloudZero.

Why this matters for AI-driven organizations

The organizations feeling this gap most acutely are the ones moving fastest on AI: ML engineers running multi-day training jobs, platform teams managing GPU node pools, and product teams shipping AI features that hit inference endpoints thousands of times per hour.

For those teams, GPU spend is not a line item to check once a month. It raises questions that come up every day: Was that training run worth the cost? Is our inference serving layer efficient at scale? Which team is driving the majority of our GPU spend, and is it generating the outcomes we expect?

These are not questions you can answer with a billing dashboard that lumps GPU costs into an idle pool. They require attribution — the kind CloudZero has delivered for CPU and memory costs for years, now extended to the resource type that matters most.

Built for the way GPU workloads actually work

GPU cost attribution is not a simple problem. GPUs are expensive, shared across workloads on the same node, and often reserved but not fully utilized — so the cost model must account for reservation, not just active usage.

CloudZero attributes GPU costs based on GPU requests. A workload that requests two GPUs bears the cost of reserving two GPUs, regardless of how intensively it uses them at any given moment. This mirrors how cloud providers charge for GPU instances and reflects the economic reality of GPU reservation in Kubernetes.
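The reservation-based rule can be made concrete with a small sketch. This is illustrative only (the function, rate, and node shape are hypothetical, not CloudZero internals): cost follows the requested GPU share, and utilization does not enter the calculation.

```python
# Sketch of reservation-based GPU attribution. Illustrative;
# all names and rates are hypothetical.

GPU_NODE_HOURLY_COST = 32.00   # e.g. a 4-GPU instance at $32/hr
GPUS_ON_NODE = 4

def gpu_cost_share(gpus_requested, hours):
    """Cost follows the reservation, not active usage. This mirrors
    how cloud providers bill for the instance itself."""
    share = gpus_requested / GPUS_ON_NODE
    return share * GPU_NODE_HOURLY_COST * hours

# A training job reserving 2 of 4 GPUs for 10 hours bears half the
# node's GPU cost for that window, even if the GPUs sit partly idle.
cost = gpu_cost_share(gpus_requested=2, hours=10)
# cost == 160.0
```

The design choice matters: a utilization-based model would push under-used reservations back into the idle pool, hiding exactly the waste (reserved-but-idle GPUs) that teams most need to see.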

Get started

GPU reservation attribution is available now to all CloudZero customers running CZ-agent v1.2.0 or above. No setup required. If you’re on a compatible agent version and running GPU workloads, your cost data is already more accurate as of today.

If you have questions about how GPU attribution works in your environment, reach out to your CloudZero account team or visit cloudzero.com.

The cloud bill has always included your GPU spend. Now your cost intelligence does too.
