The Modern Guide To Managing Cloud Costs

Introduction

The promise of the cloud was efficiency at scale — and not just build efficiency but cost-efficiency. However, throughout the 2010s, we saw organizations fall in love with the cloud’s build potential, racing each other to scale without much concern for their cloud bills.

The results: a trajectory toward $1 trillion in global annual cloud spend, a third of which goes to waste. If the trends continue, we’ll see more than $300 billion of wasted cloud spend by 2028. That’s more than $34 million wasted per hour.

That volume of waste is frightening enough, but scarier yet is investors’ shift from growth to profitability.

Throughout 2024, as markets saw prolonged contractions for the first time in over a decade, investors began weighing profitability more heavily than growth in company valuations. The more companies spend on the cloud with no ROI, the more it eats into their profitability and slashes their valuations.

Why is so much cloud spend wasted?

Organizations have not been blind to the issue of cloud waste — they’ve just taken a limited approach to managing it. Most have focused on optimization: reducing their overall cloud spend through provider savings plans, reservations, and usage discounts.

But calling it “optimization” is a little misleading. Yes, employing these methods can lop off a percentage of your cloud bill — but what if you’re paying for inefficient cloud resources? What if one of your customers costs you more money than all the rest? What if one of your freemium features is racking up more than its share of compute spend?

Optimization can’t answer deeper questions about inefficiencies like these — it can only make them cost a little less. But as you scale, these inefficiencies only become more pronounced, and no level of “optimization” can make them cost-efficient.

The age of optimization-only cloud cost management is drawing to a close. Organizations must transition to cloud cost intelligence to curb exponential cloud waste, turn cloud bills from ciphers into skeleton keys, and unlock profitability.

Traditional Cloud Cost Management Is Broken — Here’s Why

1. The traditional approach is reactive, not proactive

Before the advent of the cloud, IT procurement fell squarely on finance teams. Engineering teams would submit requests for additional IT resources, finance would evaluate the requests, and engineering teams would wait anywhere from 6-20 weeks to get the new servers if accepted.

The cloud destroyed this chain of command.

For the last decade or so, cloud-native engineers have had near-total autonomy over what resources they use and when. Cloud providers like AWS, Google Cloud Platform (GCP), and Microsoft Azure enable engineers to spin up virtual servers in seconds and innovate at breakneck speed.

The result: Finance has largely been at the mercy of engineers. Engineers rack up IT resources without fully understanding their cost, finance gets a bill at the end of the month without fully understanding the resources, and all they can do to “manage” the cost is negotiate better discounts.

But a discount doesn’t fix the underlying problem.

It would be like negotiating a better price on your heating bill rather than fixing your insulation. Fixing the insulation translates to spending less on heating, and a dollar unspent is infinitely more efficient than a dollar spent at a discount.

Building better in the cloud means cost-conscious engineering.

In a cost-conscious engineering culture, engineers understand and care about the cost consequences of their cloud resources, can identify and root out the sources of cost spikes, and, over time, build cloud-efficient software whose cost benefits go deeper than discounts ever could.

2. Growth priorities historically sidelined cost awareness

“Stop Focusing on Profitability and Go for Growth,” read the headline of a 2017 Harvard Business Review article.

The article encapsulates the prevailing corporate narrative of the era: growth at all costs.

The cloud has perhaps been the central element of the growth-at-all-costs philosophy. By providing access to a potentially global customer base, the cloud expanded companies’ total addressable markets (TAMs) by unprecedented multiples.

The race was on: Whoever could innovate the quickest could grab the biggest market share. And from one perspective, the cost of being the fastest to scale is nothing compared to the potential rewards.

Thus, cost became a second-order metric — if that. And, throughout a decade-long bull market, companies could afford to keep it that way.

But considering cost a second-class metric promoted the second-class cost management solutions we discussed. Now, with markets faltering and cloud costs continuing to mount, companies are scrambling to promote cost to a first-class metric — and feeling the limitations of sheer cost optimization.

3. Traditional cost optimization tools don’t cut it

Consider these popular AWS cost management techniques:

AWS Reserved Instances

If you make a long-term (one- or three-year) commitment to a certain level of resource utilization, AWS will sell the resources to you at a discounted rate. AWS calls these resources Reserved Instances (RIs), and most traditional cost management tools help you optimize your RI planning.

These tools may recommend purchasing new RIs, modifying existing RIs for better coverage, or identifying underutilized RIs.

But there are several issues:

These tools’ effectiveness depends entirely on how well you’ve configured your AWS tags;
Cloud usage can vary a lot over a couple of months, let alone a couple of years; and
RI data doesn’t give you much insight into the ROI of your cloud investment. It just tells you whether you’re getting more or less of an overall discount.
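To make the second issue concrete, here is a minimal sketch of the break-even arithmetic behind a reservation. The hourly rates are hypothetical placeholders, not AWS prices, and the model assumes a no-upfront reservation that bills for every hour of the term whether or not the instance runs.

```python
HOURS_PER_YEAR = 8760


def on_demand_cost(rate_per_hour: float, utilization: float) -> float:
    """Yearly cost when paying only for the hours actually used."""
    return rate_per_hour * HOURS_PER_YEAR * utilization


def reserved_cost(rate_per_hour: float) -> float:
    """Yearly cost of a reservation: billed for every hour, used or not."""
    return rate_per_hour * HOURS_PER_YEAR


def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of the year the instance must run for the RI to pay off."""
    return reserved_rate / on_demand_rate


# Hypothetical rates, for illustration only.
ON_DEMAND_RATE = 0.10
RESERVED_RATE = 0.06

threshold = break_even_utilization(ON_DEMAND_RATE, RESERVED_RATE)
print(f"RI pays off above {threshold:.0%} utilization")
for utilization in (0.4, 0.6, 0.9):
    saving = on_demand_cost(ON_DEMAND_RATE, utilization) - reserved_cost(RESERVED_RATE)
    print(f"utilization {utilization:.0%}: RI saves ${saving:,.2f} per year")
```

If usage drops below the break-even point partway through the term, the reservation costs more than paying on demand would have, which is why swings in usage matter so much for long commitments.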

AWS Savings Plans

Savings Plans (SPs) are nearly identical to RIs, except RIs work according to committed utilization, whereas SPs work according to committed spend (a fixed dollar amount per hour over the term). SPs seemingly offer more flexibility for companies that expect significant fluctuations in their cloud utilization.

Here are some examples of AWS SPs and types available now.

But again, the issue with SPs is that they don’t offer insights beyond raw usage data. You can’t use SPs to determine which customer is costing you the most in the cloud, how much a new product’s individual features cost, or how to remediate cost anomalies as they arise.
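As a rough illustration of how a spend commitment behaves, the sketch below bills a few hours of usage against an hourly commitment. The commitment, discount, and usage figures are invented for the example, and the model is simplified: it applies a single discount rate to all covered usage.

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Bill for one hour under a spend commitment.

    on_demand_usage: what the hour's usage would cost at on-demand rates.
    commitment: committed spend per hour, charged whether used or not.
    discount: fractional discount the plan applies to covered usage.
    """
    # On-demand value of usage that the commitment is able to cover.
    covered_capacity = commitment / (1.0 - discount)
    # Usage beyond the covered capacity is billed at on-demand rates.
    overage = max(0.0, on_demand_usage - covered_capacity)
    return commitment + overage


def total_bill(usage_by_hour, commitment, discount):
    """Sum the hourly bills across a series of hours."""
    return sum(hourly_bill(usage, commitment, discount) for usage in usage_by_hour)


# Hypothetical figures: on-demand value of usage in four consecutive hours.
usage_by_hour = [10.0, 10.0, 4.0, 20.0]
COMMITMENT = 7.5   # dollars per hour
DISCOUNT = 0.25    # 25% off covered usage

plan_total = total_bill(usage_by_hour, COMMITMENT, DISCOUNT)
on_demand_total = sum(usage_by_hour)
print(f"with plan: ${plan_total:.2f}, on demand: ${on_demand_total:.2f}")
```

Note the third hour: usage is worth less than the commitment, so part of the commitment goes unused. The total still tells you nothing about which customer or feature drove the usage.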

Rightsizing

“Rightsizing” means picking the right instances for the job at hand — for example, not using an a1.large when an a1.medium will do. (We actually developed a free tool, CloudZero Advisor, to explore the cost of all AWS instance types.)

Engineering teams periodically forget to terminate or suspend resources like Amazon EC2 instances when they’re not in active use. Rightsizing tools that identify these underutilized resources can help companies get some easy cost savings.

But altering resources might negatively impact features or customers — and rightsizing tools can’t anticipate those adverse effects. So, while you might be able to identify rightsizing opportunities, executing becomes a risky endeavor.

To reduce these risks, you need accurate unit cost (cost per feature, per product, per customer, etc.) data. This data gives you a better sense of what levers to pull without affecting mission-critical components, frustrating customers with sensitive workloads, or failing to meet service-level agreement (SLA) targets.
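Here is one way that idea could look in practice: a small sketch that flags underutilized instances but separates the ones backing SLA-sensitive workloads for human review. The instance IDs, the CPU threshold, and the serves_sla_workload flag are all assumptions made for illustration; in a real setup that flag would come from your own unit cost and ownership data.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    avg_cpu_percent: float     # average CPU utilization over the review window
    monthly_cost: float
    serves_sla_workload: bool  # hypothetical flag derived from unit cost data


def rightsizing_candidates(instances, cpu_threshold=10.0):
    """Split underutilized instances into safe and needs-review groups."""
    safe, needs_review = [], []
    for inst in instances:
        if inst.avg_cpu_percent >= cpu_threshold:
            continue  # busy enough; not a rightsizing candidate
        if inst.serves_sla_workload:
            needs_review.append(inst)
        else:
            safe.append(inst)
    return safe, needs_review


# Hypothetical fleet, for illustration only.
fleet = [
    Instance("i-aaa", 4.0, 300.0, False),
    Instance("i-bbb", 6.0, 900.0, True),
    Instance("i-ccc", 55.0, 700.0, False),
]

safe, needs_review = rightsizing_candidates(fleet)
print("safe to downsize:", [inst.instance_id for inst in safe])
print("review first:", [inst.instance_id for inst in needs_review])
```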

Spot Instances

AWS Spot Instances enable companies to use idle EC2 capacity on the AWS platform. Available capacity varies by time and location, but AWS offers steeply discounted Spot pricing when capacity is available.

Spot Instances are useful for workloads that are fault-tolerant, can handle interruptions without breaking, and don’t require high availability. But if you need consistent, reliable performance, Spot Instances are less than ideal.
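What "can handle interruptions without breaking" means is easiest to see in code. The sketch below simulates a batch job that records a checkpoint after every item, so a replacement instance can pick up where the interrupted one stopped. The interruption is simulated with a parameter, and the in-memory dictionary stands in for durable storage; none of this calls a real AWS API.

```python
def process_batch(items, checkpoint, interrupted_at=None):
    """Process items starting from the checkpoint; stop if interrupted.

    checkpoint: dict with 'next_index' and 'results', persisted between runs.
    interrupted_at: index at which a simulated interruption occurs (None = never).
    Returns True if the whole batch finished, False if it was interrupted.
    """
    for i in range(checkpoint["next_index"], len(items)):
        if interrupted_at is not None and i == interrupted_at:
            return False  # capacity reclaimed; progress so far is in the checkpoint
        checkpoint["results"].append(items[i] * 2)  # stand-in for real work
        checkpoint["next_index"] = i + 1
    return True


items = [1, 2, 3, 4, 5]
checkpoint = {"next_index": 0, "results": []}

# First run is interrupted before the fourth item.
finished = process_batch(items, checkpoint, interrupted_at=3)
# A replacement instance resumes from the saved checkpoint.
finished_after_resume = process_batch(items, checkpoint)
print(finished, finished_after_resume, checkpoint["results"])
```

A workload that cannot be structured this way, for example one holding long-lived state in memory, is a poor fit for Spot capacity.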

Why Shift From Traditional Cloud Cost Management To Cloud Cost Intelligence?

1. It gives you 100% cost allocation — without tagging

In a perfect world, perfect tagging would be more than a myth.

Anyone who’s relied on tagging as a primary means of cloud cost allocation knows how hazardous it can be. There’s the simple challenge of keeping tags current. Then, there are numerous deeper challenges:

Keeping tag formats consistent. Tags are case- and spelling-sensitive; “Unit1” and “unit1” will show up as two different tag values, splitting one unit’s costs across two entries.
Normalizing tags after an acquisition. If your company acquires another company with a different tagging schema, it’s incredibly painful to change all of one organization’s tags to fit the new context.
Mapping untagged and untaggable resources. Some resources slip through the cracks and don’t have tags, while others can’t be tagged. Relying on tagging alone won’t incorporate these resources into your cost analyses.
Defunct tag names. Organizations often name tag categories after the relevant teams or leaders responsible for them. If team structure or leadership changes, the tag names apply to people/orgs that no longer exist, but there’s no easy way to change them all at once.
Combining tag categories to understand business dimensions. The aforementioned challenges make it very difficult to extract higher-order business intelligence (e.g., cost per customer per product feature) from tags alone.
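A short sketch shows how much cleanup even a small tag set needs. It folds case and separator variants together, maps a defunct team name to its successor, and keeps untagged spend visible. The tag values, the alias table, and the decision to treat "unit-1" and "unit1" as the same unit are assumptions for the example.

```python
from collections import defaultdict


def normalize_tag(value: str) -> str:
    """Collapse case and separator differences so variants compare equal."""
    return value.strip().lower().replace("-", "").replace("_", "").replace(" ", "")


def cost_by_tag(line_items, aliases=None):
    """Sum cost per normalized tag value; untagged spend is kept visible."""
    aliases = aliases or {}
    totals = defaultdict(float)
    for tag_value, cost in line_items:
        if not tag_value:
            key = "untagged"
        else:
            key = normalize_tag(tag_value)
            key = aliases.get(key, key)  # map defunct names to current ones
        totals[key] += cost
    return dict(totals)


# Hypothetical billing line items: (tag value, cost).
line_items = [
    ("Unit1", 120.0),
    ("unit1", 80.0),
    ("unit-1", 40.0),
    (None, 60.0),
    ("TeamAlice", 25.0),
]
aliases = {"teamalice": "platform"}  # hypothetical rename after a reorg

print(cost_by_tag(line_items, aliases))
```

Every rule here has to be maintained by hand as teams and naming conventions change, which is part of why tag-only allocation tends to decay.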

In our experience talking with customers, working toward a “perfect tagging strategy” ends up feeling like banging your head against the wall. At the very least, companies get frustrated; at worst, they give up entirely.

Cloud cost intelligence breaks the tag barrier entirely. Instead of relying on tags, it uses code-driven allocation to break down the costs of different resources. Then, it lets you combine different dimensions of spend to get more granular insights: cost per video game feature per customer, for example.
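To give a feel for what allocation defined in code can look like, here is a generic sketch of rule-based allocation. It is not CloudZero’s implementation; the rule names, field names, and line items are hypothetical. Each rule is a predicate over a billing line item, and the first matching rule decides which business dimension gets the cost.

```python
# Ordered rules: (dimension value, predicate over a billing line item).
RULES = [
    ("checkout", lambda item: item["service"] == "lambda"
        and item["resource"].startswith("checkout-")),
    ("search", lambda item: item["account"] == "search-prod"),
    ("data-platform", lambda item: item["service"] in {"s3", "athena"}),
]


def allocate(line_items, rules=RULES, default="unallocated"):
    """Assign each line item's cost to the first rule that matches it."""
    totals = {}
    for item in line_items:
        owner = next((name for name, matches in rules if matches(item)), default)
        totals[owner] = totals.get(owner, 0.0) + item["cost"]
    return totals


# Hypothetical billing data; note that none of it relies on tags.
line_items = [
    {"service": "lambda", "resource": "checkout-api", "account": "shop-prod", "cost": 40.0},
    {"service": "ec2", "resource": "i-123", "account": "search-prod", "cost": 250.0},
    {"service": "s3", "resource": "logs-bucket", "account": "shop-prod", "cost": 30.0},
    {"service": "ec2", "resource": "i-456", "account": "shop-prod", "cost": 80.0},
]

print(allocate(line_items))
```

Anything no rule matches lands in an explicit "unallocated" bucket, so gaps are visible and can be closed by adding a rule rather than retagging resources.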

2. It provides Kubernetes cost insights

Kubernetes — the name alone is enough to make you shudder.

Kubernetes is the open-source container orchestration system that engineers often consider a must-have. If you’ve tried and failed to understand what it is and how it provides value, you’re not alone — there’s an entire meme subculture dedicated to Kubernetes frustrations.

Organizations usually migrate to Kubernetes at the insistence of engineering teams who believe in the tool’s potential for greater resource flexibility. But cost tracking methods that worked before Kubernetes migration aren’t compatible with Kubernetes, and organizations that don’t anticipate this can lose all cost visibility.

Cloud cost intelligence platforms have easy-to-digest ways to present Kubernetes (and other containerized) cost information — like so:

You can then break up Kubernetes cost by namespaces, so you know where your Kubernetes spend is going (and why):
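As a simplified illustration of a namespace breakdown, the sketch below splits the cost of a single node across namespaces in proportion to their CPU requests and reports unrequested capacity as idle cost. Real platforms typically blend CPU, memory, and other signals; the node cost, core count, and namespace names are invented for the example.

```python
def namespace_costs(node_cost, node_cpu_cores, cpu_requests_by_namespace):
    """Split a node's cost by each namespace's share of requested CPU cores.

    Capacity that no namespace requested is reported as 'idle'.
    """
    cost_per_core = node_cost / node_cpu_cores
    costs = {
        namespace: cores * cost_per_core
        for namespace, cores in cpu_requests_by_namespace.items()
    }
    requested = sum(cpu_requests_by_namespace.values())
    costs["idle"] = max(0.0, node_cpu_cores - requested) * cost_per_core
    return costs


# Hypothetical node: $800 per month, 16 cores.
requests = {"checkout": 6, "search": 4, "batch": 2}
costs = namespace_costs(800.0, 16, requests)
for namespace, cost in costs.items():
    print(f"{namespace}: ${cost:.2f}")
```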

3. Traditional cloud cost management no longer provides the same competitive advantage

Traditional cloud cost management gave early adopters a competitive advantage in the early days of cloud adoption. Companies that paid any attention to cloud spend spent more efficiently than those that kicked the can down the road. Then, companies adopting AI-powered optimization tools could stay optimized without investing much manual effort.

But optimization tools are rapidly becoming standard, and more organizations are stepping up their FinOps games. As a result, cloud spend is now a board-level issue. FinOps is practiced in all major industries, and FinOps teams are growing. Optimization is a key part of FinOps — but is just one ingredient in a very complex recipe.

The good news: The deeper you dive below the surface of discounts, the richer the insights become.

Benefits of switching to cloud cost intelligence:

Understand how different customers impact your cloud spend
Map cloud spend to specific business activities
Explore your cloud costs with resource-level granularity in just a few clicks
Replace cumbersome tagging exercises with automated, telemetry-based cost allocation
Achieve higher levels of FinOps maturity
Get real-time notifications about cost spikes (and the resources responsible)
Unite finance and engineering teams around common language and metrics
Nurture a culture of cost-consciousness among all cloud users in your organization

So, how can you get to cloud cost intelligence?

Evolving From Traditional Cloud Management To Cloud Cost Intelligence

Evolving to cloud cost intelligence starts with a mindset. Traditional cloud management frames cloud cost as a liability — something to manage and minimize. Cloud cost intelligence flips the script, framing cloud cost not as a liability but as an asset.

How?

If analyzed properly, cloud cost is rich with insights that can drive strategic business decisions.

Given the abstracted nature of business in the cloud (relative to the physical parameters of a business like a membership gym), a single service can provide wildly different levels of value to different customers — at wildly different costs to you.

Managing these costs may knock a few tenths of a percentage point off your cost of goods sold (COGS). But gleaning intelligence from these costs can refine your business model, enhance cross-team collaboration, and prime you for sustainable profit at scale.

1. Start tracking and allocating cloud costs as soon as possible

There’s a reason cost allocation is “the most important thing you will do as someone involved in FinOps.”

All cloud insights depend on knowing what you’re spending and who you’re spending it on. The earlier you start tracking and allocating your cloud costs (in a way that goes beyond simple tagging), the more valuable these insights become.

Startups often wait until they approach $1 million in annual cloud spend to start tracking their cloud costs. Until then, they prioritize other metrics, like daily active users, monthly recurring revenue (MRR), and churn rate.

But by the time your annual cloud spend hits $1 million, you’ll have made numerous architectural choices that are hard to modify or undo. If you’ve committed to inefficient architecture, this could result in high recurring costs, lower gross margin, higher COGS, a slower path to profitability, and a lower overall valuation.

That’s not to say metrics like MRR and daily active users don’t matter — they do. But so do business metrics that tell you how efficiently you support those users and drive that revenue. The earlier businesses (especially cloud-native businesses) start tracking cloud cost, the more likely their growth will be sustainable.
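One simple efficiency metric of this kind is cloud cost per active user, tracked month over month. The sketch below computes it and checks whether cost is growing faster than the user base. The monthly figures and the 5% tolerance are hypothetical.

```python
def unit_cost_trend(months):
    """months: list of (label, cloud_cost, active_users) tuples."""
    return [(label, cost / users) for label, cost, users in months]


def cost_outpacing_growth(months, tolerance=0.05):
    """True if cost per user rose by more than `tolerance` from first to last month."""
    first = months[0][1] / months[0][2]
    last = months[-1][1] / months[-1][2]
    return (last - first) / first > tolerance


# Hypothetical figures: (month, cloud cost in dollars, active users).
months = [
    ("Jan", 50_000.0, 1_000),
    ("Feb", 60_000.0, 1_200),
    ("Mar", 90_000.0, 1_500),
]

for label, unit_cost in unit_cost_trend(months):
    print(f"{label}: ${unit_cost:.2f} per active user")
print("cost outpacing growth:", cost_outpacing_growth(months))
```

In this example, the February bill is higher than January’s but cost per user is flat, so the increase tracks growth. March is different: cost per user rises, which is the signal worth investigating.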

2. Foster a culture of cloud cost-consciousness

A cloud cost-conscious culture is one in which every cloud user takes ownership of their cloud spending. This means the engineers who use public cloud infrastructure know what they’re spending, why they’re spending it, and whether their expenses are as economical as possible.

As your company grows, your cloud costs will naturally increase. In a cost-conscious culture, teams can tell whether a cost increase corresponds to a natural growth cycle or signals an inefficiency.

This becomes especially relevant in a few key scenarios:

Product/feature launches

Cloud cost can help you plan, execute, and evaluate the success of product and feature launches. Correlating customer uptake with cloud cost impact (and, for more sophisticated cost trackers, identifying the resources responsible for cost inefficiencies) can give you insights to inform future product strategy.

Fundraising rounds

Cloud cost data can help you quantify your overall business efficiency — a central factor when investors evaluate whether a company is ready for scale. A more efficient business is much more enticing to capital partners.

Day-to-day cost spikes

Cost spikes are inevitable. But without granular visibility, it’s extremely difficult to track down their sources, much less mitigate them. Daily insight into inefficiencies like these yields cost-saving opportunities and reinforces an attitude of cloud cost consciousness.
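A basic version of daily spike detection can be sketched in a few lines: compare each day’s cost with the mean and standard deviation of the preceding week. The window, threshold, and cost series below are assumptions for illustration; production anomaly detection usually also accounts for seasonality and attributes the spike to specific resources.

```python
from statistics import mean, stdev


def find_cost_spikes(daily_costs, window=7, threshold=3.0):
    """Return indices of days whose cost exceeds the trailing mean
    by more than `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline, spread = mean(history), stdev(history)
        if spread > 0 and (daily_costs[i] - baseline) / spread > threshold:
            spikes.append(i)
    return spikes


# Hypothetical daily cloud costs in dollars.
daily_costs = [100, 102, 98, 101, 99, 103, 100, 180, 101, 99]
spikes = find_cost_spikes(daily_costs)
print("spike on day index:", spikes)
```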

Cloud cost management should be a daily practice, not just a monthly conversation among the senior leadership team.

3. Present cloud spend in a business context

Seeing that your spend increased or decreased means nothing without business context. To view your cloud spend in a business context, you need to slice it up into unit-level cost dimensions, giving you cloud unit metrics.

Here are a few common examples of cloud unit metrics:

Cost per customer

It’s practically unheard of for any two customers to cost you exactly the same amount in the cloud. This is especially true if you serve multiple different types of customers — small-to-medium-sized businesses (SMBs) and enterprise customers, for example.

Understanding your precise cloud cost per customer enables a range of strategic actions. If certain customers cost much more than others, you can direct engineering resources to understand why and/or revise your pricing model to more accurately reflect the value you’re delivering to different types of customers.
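Once cost per customer is known, comparing it with revenue is straightforward. The sketch below computes a per-customer margin from hypothetical revenue and cost figures; the customer names and amounts are invented for the example.

```python
def customer_margins(revenue_by_customer, cost_by_customer):
    """Return cloud cost and margin (as a fraction of revenue) per customer."""
    report = {}
    for customer, revenue in revenue_by_customer.items():
        cost = cost_by_customer.get(customer, 0.0)
        report[customer] = {"cost": cost, "margin": (revenue - cost) / revenue}
    return report


# Hypothetical monthly figures in dollars.
revenue = {"acme": 10_000.0, "globex": 10_000.0, "initech": 2_000.0}
cloud_cost = {"acme": 2_000.0, "globex": 7_000.0, "initech": 500.0}

report = customer_margins(revenue, cloud_cost)
least_profitable = min(report, key=lambda name: report[name]["margin"])
for name, row in report.items():
    print(f"{name}: cost ${row['cost']:,.0f}, margin {row['margin']:.0%}")
print("lowest margin:", least_profitable)
```

Two customers paying the same price can sit at very different margins, which is exactly the kind of gap that prompts either an engineering investigation or a pricing change.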

Cost per product (and per feature)

Likewise, different products might shoulder different cloud burdens and incur varied costs as a result. Certain features within products might be architected inefficiently, and optimizing them might be the solution to optimizing the product as a whole.

Cost per team

Evaluating the cost of different engineering teams can show you who’s achieving the greatest cloud efficiency, who’s lagging behind, and how efficiency principles can apply across teams.

What are the primary uses of unit cost insights?

The hallmark of FinOps maturity is delivering cloud insights to different teams in a language each of them can understand.

Finance and engineering teams, for example, will make different — but symbiotic — use of cloud insights. Here are the stakeholders you’ll want to inform and the overarching objectives for each:

Engineering team(s)

Overarching goal: Send targeted cost insights to relevant engineering teams, and make it easy to act on the insights
Key metrics: Cost per feature, cost per deployment, cost per development team, cost anomalies
Outcomes: Make (and document) efficient architectural decisions, address cost spikes early

Finance team(s)

Overarching goal: Turn cloud spend into business-relevant data (that doesn’t require specialized engineering knowledge to understand/use)
Key metrics: Cost per customer/tenant, cloud unit costs, COGS
Outcomes: Determine which types of customers are most profitable, optimize pricing models, improve renewals

Investors

Overarching goal: Attract more investors by presenting cloud efficiency as a proxy for cloud-native scalability
Key metric: Cloud efficiency
Outcomes: Attract more investment through informed ROI data, improve overall valuation

Unlock Cloud Cost Intelligence With CloudZero

CloudZero is the cloud cost intelligence platform that puts any cloud or software spend into context for your business.

CloudZero enables better strategic decisions, stronger unit economics, and more efficient spending by aligning engineering, infrastructure, and finance teams around common metrics. Trusted by top cloud-driven companies like New Relic, Rapid7, and Malwarebytes, CloudZero helps organizations of all sizes achieve cloud cost maturity.

How does it work?

CloudZero starts by ingesting billing data from all of your cloud providers. We then use a combination of code-driven cost allocation and telemetry-based customer usage information to break your billing data into whatever dimensions are most relevant to your business.

CloudZero helps customers overcome common obstacles, like:

Tagging. Tagging is the traditional method of cost allocation. But it’s a cumbersome, limited allocation method; according to our State of Cloud Cost Intelligence 2024 report, only 13% of companies have allocated 75% or more of their cloud costs. CloudZero’s code-driven cost allocation gives companies 100% cost allocation — without perfect tagging.
Shared resources. When multiple customers share the same cloud resources, it’s difficult to understand what costs to attribute to which customers. Because CloudZero can correlate billing data with usage-based telemetry streams, we can attribute percentages of cloud spend to specific customers and derive per-customer costs from shared resources.
Kubernetes. CloudZero combines container and non-container cost data in a single view, giving customers a comprehensive view of their cloud spend. Most other cost tools present container spend (Kubernetes, typically) in a separate view, making it extra challenging to derive granular cost insights.
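The shared-resources idea can be illustrated with a generic proportional split: divide a shared bill across customers according to a usage signal. This is a simplified sketch, not CloudZero’s implementation; the request counts and cost are hypothetical, and request count is just one possible telemetry stream.

```python
def split_shared_cost(shared_cost, usage_by_customer):
    """Attribute a shared resource's cost to customers in proportion to usage."""
    total_usage = sum(usage_by_customer.values())
    if total_usage == 0:
        # No usage recorded: nothing can be attributed.
        return {customer: 0.0 for customer in usage_by_customer}
    return {
        customer: shared_cost * usage / total_usage
        for customer, usage in usage_by_customer.items()
    }


# Hypothetical telemetry: requests served per customer by a shared database
# that cost $1,000 for the month.
requests_by_customer = {"acme": 600_000, "globex": 300_000, "initech": 100_000}
shares = split_shared_cost(1_000.0, requests_by_customer)
for customer, cost in shares.items():
    print(f"{customer}: ${cost:.2f}")
```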

The first generation of cloud cost solutions focused solely on reducing your cloud costs. However, on their own, cost-reduction exercises have dubious value.

Why?

Because as your business grows, your cloud costs will naturally grow.

Reducing your costs by some arbitrary percentage says nothing about the efficiency of your underlying architecture.

Plus, cost reduction exercises present an interesting catch-22. In the moment, they direct engineering resources away from innovation work: adding functionality, courting new customer segments, and finding new ways to delight your users. But if you skip them, your business succeeds, and you grow to the next level, your underlying architecture remains inefficient, and you face the same cost challenges as before, this time at a new scale.

So, if you choose to address cost challenges, it inhibits innovation. But if you choose to innovate, it compounds cost challenges that simple optimization exercises can’t fix.

Cloud cost intelligence gives you the best of both worlds. It sends you real-time cost insights as you innovate, letting you iterate more quickly and scalably. It puts an end to the vicious cycle of cloud cost billing, turning cloud spend from a liability into a strategic asset.

Curious to learn more? Take a tour of CloudZero or
