Kubernetes cost management is a big challenge. Here’s how you can analyze your costs — and how CloudZero makes it easy.
Kubernetes’ benefits for innovation are clear: it lets small teams deliver more value, more rapidly. However, cost discussions around Kubernetes, and Kubernetes cost management in general, can be difficult.
You have disposable and replaceable compute resources constantly coming and going, on a range of types of infrastructure. Yet at the end of the month, you just get a billing line item for EKS cost and a bunch of EC2 instances.
When you try to do a Kubernetes cost analysis, the bill doesn’t have any context about the workloads being orchestrated — and it certainly doesn’t align the amount to business contexts like cost per tenant.
How do you calculate how much it really costs to run this or that feature in Kubernetes?
This seems like a simple question, but it requires some work to answer. It’s one thing to say how much your Kubernetes cluster itself costs to operate — that, after all, is a group of real servers, associated with a specific number.
If you’re running Kubernetes via EKS on AWS (which we’ll assume for the rest of this post, even though a lot of the information generalizes pretty well) that would be the line item costs associated with a set of EC2 instances.
Removing Kubernetes from the equation, knowing these server costs gets us pretty close to understanding the cost of operating the service. Let’s imagine we’re building a social media site that currently consists of three distinct backend services: “content,” “users,” and “chat.”
In a conventional server-based architecture, we might run this with a load balancer in front of a distinct collection of EC2 instances running each service. So, if our “chat” service is running on three t3.medium instances, we can roll the cost of operating those into the total cost of that service.
Does that tell us everything we need to know about the cost of operating our services? Not quite; the “content” service probably hits S3 to store block data, and has a database to store metadata about posts. Maybe it shares this database with the “users” service directly, or exposes it via an internal API. In any case, the “chat” service also needs its own database, along with somewhere to store ephemeral data about ongoing conversations (“user xyz is typing a message …”).
To completely understand all this, we need to distribute the costs of using these external services to the parts of our business that are using them. But, at least regarding direct server costs — the piece that Kubernetes will impact — the answer is basic enough: The server cost of any given service is the cost of the servers it runs on.
So, why is Kubernetes cost optimization different? As noted, we could give you a basic answer about your Kubernetes cluster itself — the cost of a cluster is the cost of the servers it runs on. But that doesn’t necessarily answer the questions we need answered to run our business effectively; for that we need a detailed Kubernetes cost analysis.
Now, imagine our monthly bills are skyrocketing and we need to "debug" our costs. Consider these two scenarios:
Scenario 1: Our “chat” service is a compute monster sitting on twenty c5.18xlarge instances and still running out of CPU, but “users” and “content” are both happily plugging along with clusters of three m5.large each. Clearly it’s “chat” that’s driving our costs and probably needs some serious rethinking.
Scenario 2: All of our services are running on a single Kubernetes cluster of twenty-one c5.18xlarge machines. The total cost of running that cluster doesn’t by itself tell us anything about that kind of imbalance, or about which of our features might be responsible for most of our costs. It would be like looking at just a bottom line in place of our entire AWS bill, without anything broken down into individual line items, and then guessing from there.
“Debugging” cost gets a lot more complicated when you’re running Kubernetes.
To get closer to thinking about Kubernetes service costs, first let’s reconfigure our raw server-based architecture a little. What if, instead of running each service on its own separate cluster, we just had one cluster of machines, and each of them hosted some subset of our services? In fact, let’s make it a little easier to do this on paper — what if we had each of our instances always host a copy of each of our services? We’d have chosen a pretty silly architecture with some illogical scaling properties, true, but bear with us for a minute. Now we still have specific, concrete costs for operating each server, but we need some intermediate model to say how much of that cost belongs to each service.
What drives the need to scale our cluster up or down? After all, that’s the clearest meaning of “driving costs” — what’s making us unable to operate our cluster using fewer, cheaper resources?
Generally, scaling is driven primarily by two things: CPU and memory. So, if “chat” is using 80% of an instance’s CPU and 20% of its memory while “users” and “content” are both using 3% of each, we can look at these numbers and distribute the total cost of the machine into four buckets: “chat,” “users,” “content,” and “unused.”
It’s still a little bit tricky — we need some way of deciding how to weigh the relative cost of memory and CPU. At CloudZero, we’ve approached this by building an econometric model that estimates — all else being equal — the relative marginal costs of adding one additional vCPU and one additional GB of memory to a given instance (to be clear: this is a useful modeling conceit, not an actually available exchange).
Here’s the model we use:
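The original illustration of the model is not reproduced here. As a hedged reconstruction, consistent with the worked numbers that follow but not necessarily CloudZero’s exact formulation, the split can be written with assumed symbols: C for the instance’s hourly cost, V for its vCPU count, M for its memory in GB, r for the marginal cost ratio of one vCPU to one GB, and u_s and m_s for the fractions of CPU and memory used by service s.

```latex
\begin{aligned}
C_{\mathrm{cpu}} &= C \cdot \frac{rV}{rV + M}, &
C_{\mathrm{mem}} &= C \cdot \frac{M}{rV + M}, \\
C_s &= u_s \, C_{\mathrm{cpu}} + m_s \, C_{\mathrm{mem}}, &
C_{\mathrm{unused}} &= C - \sum_s C_s .
\end{aligned}
```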
Let’s say our c5.18xlarge costs $3/hour; it has 72 vCPUs and 144 GB of memory. Let’s say one additional vCPU costs 8.9x as much as one more GB of RAM. This would mean that $2.45 of our hourly cost is attributable to compute cost, $0.55 to memory. And, further, $2.07 of the $3 belongs to the “chat” service, $0.09 each to “users” and “content,” and $0.75 is unused. Now we’re back to a model that shows what engineering-meaningful unit is driving costs (that pesky “chat” service), and one that we’ll be able to carry over directly to Kubernetes.
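The arithmetic above can be checked with a short script. This is a minimal sketch using only the figures from the example; the function names `split_instance_cost` and `allocate` are made up for illustration and are not part of any CloudZero API.

```python
# Figures taken from the worked example; everything else is illustrative.
HOURLY_COST = 3.00        # dollars per hour for the instance
VCPUS = 72
MEMORY_GB = 144
VCPU_TO_GB_RATIO = 8.9    # one vCPU is assumed to cost 8.9x one GB of RAM


def split_instance_cost(hourly_cost, vcpus, memory_gb, ratio):
    """Split an instance's hourly cost into (cpu_cost, memory_cost)."""
    cpu_weight = vcpus * ratio
    total_weight = cpu_weight + memory_gb
    cpu_cost = hourly_cost * cpu_weight / total_weight
    return cpu_cost, hourly_cost - cpu_cost


def allocate(cpu_cost, mem_cost, usage):
    """Allocate cost to services; usage maps name -> (cpu_fraction, mem_fraction)."""
    costs = {
        name: cpu * cpu_cost + mem * mem_cost
        for name, (cpu, mem) in usage.items()
    }
    # Whatever no service claims lands in the "unused" bucket.
    costs["unused"] = cpu_cost + mem_cost - sum(costs.values())
    return costs


cpu_cost, mem_cost = split_instance_cost(
    HOURLY_COST, VCPUS, MEMORY_GB, VCPU_TO_GB_RATIO
)
usage = {"chat": (0.80, 0.20), "users": (0.03, 0.03), "content": (0.03, 0.03)}
costs = allocate(cpu_cost, mem_cost, usage)

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```

Because the unused bucket is computed as the remainder, the four buckets always sum back to the instance’s hourly cost.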
Kubernetes itself is a way of running all of these services across a single underlying cluster of servers, even if it’s a considerably smarter one. Here, instead of just spinning up an instance of each service on each server and letting it do what it does, each of these services will be encapsulated in a logical set of pods, and then Kubernetes will do its container orchestration magic to schedule those pods onto the cluster as needed.
Exactly the same logic discussed here applies to breaking node costs out into the pods that are using them — the only Kubernetes-specific part of the procedure comes from collecting those metrics about compute and memory usage. Really, “pod” is the only Kubernetes abstraction that we need directly, because it’s the atomic unit of scheduling, and based on it we can reconstruct any higher-level abstractions we might want to use.
AWS billing data reports EC2 instance costs at hourly granularity, but a variety of pods belonging to various services, namespaces, and so on could spin up and down on a given instance over the course of that hour.
Fortunately, Kubernetes exposes a number of Prometheus-formatted metrics on its /metrics endpoints, and collectors can turn these into per-pod figures we can use: pod_cpu_utilization and pod_memory_utilization (the metric names Container Insights reports) tell us what is going on minute-to-minute.
A Kubernetes pod can also reserve memory and CPU, setting a floor on the resources it partitions off for its own usage while running, so that really a running pod is “using” the maximum of pod_cpu_reserved_capacity (if present) and pod_cpu_utilization. If we’re reserving much more than we’re actually using, our “optimization” might be as trivial as changing that reservation number. But, even so, we’re still driving costs by demanding a ton of CPUs.
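The “maximum of reserved and utilized” rule is small enough to state in code. This is a sketch only: `effective_usage` is a hypothetical helper, and both arguments are assumed to be percentages of the instance’s capacity for the same resource.

```python
def effective_usage(utilization, reserved=None):
    """Return the usage a pod is charged for in one collection period.

    A pod that reserves more than it uses still holds that capacity,
    so it is charged for the larger of the two values. The reservation
    is optional because not every pod sets one.
    """
    if reserved is None:
        return utilization
    return max(utilization, reserved)


# A pod using 5% of CPU but reserving 25% is charged for 25%.
print(effective_usage(5.0, reserved=25.0))
# A pod with no reservation is charged for what it actually used.
print(effective_usage(5.0))
```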
Now we have enough information to answer how much our Kubernetes service costs to operate. First, we take our AWS bill with per-instance costs for the cluster nodes. Then, we collect Prometheus usage metrics from our Kubernetes cluster. We use Amazon’s Container Insights for this collection process, which gives us minute-by-minute data. However, any collection process will work, so long as we get those usage/reservation metrics and a way of correlating Kubernetes pods back to EC2 instance IDs (and thus to our AWS bill). That correlation is also available directly in Container Insights.
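The correlation step amounts to a join on instance ID and hour. The sketch below assumes the billing rows and pod samples have already been exported into plain dictionaries; the field names (`instance_id`, `hour`, `cpu_pct`, `mem_pct`) and the sample values are invented for illustration and are not the actual billing or Container Insights schemas.

```python
from collections import defaultdict

# Hypothetical billing rows: one hourly cost per EC2 instance.
billing_rows = [
    {"instance_id": "i-0aaa", "hour": "2024-01-01T00", "cost_usd": 3.00},
    {"instance_id": "i-0bbb", "hour": "2024-01-01T00", "cost_usd": 3.00},
]

# Hypothetical pod samples, already tagged with the node's instance ID.
pod_samples = [
    {"instance_id": "i-0aaa", "hour": "2024-01-01T00", "pod": "chat-1",
     "cpu_pct": 80.0, "mem_pct": 20.0},
    {"instance_id": "i-0aaa", "hour": "2024-01-01T00", "pod": "users-1",
     "cpu_pct": 3.0, "mem_pct": 3.0},
    {"instance_id": "i-0bbb", "hour": "2024-01-01T00", "pod": "content-1",
     "cpu_pct": 3.0, "mem_pct": 3.0},
]


def join_costs_with_samples(billing_rows, pod_samples):
    """Attach each instance-hour's pod samples to its billing row."""
    by_instance_hour = defaultdict(list)
    for sample in pod_samples:
        key = (sample["instance_id"], sample["hour"])
        by_instance_hour[key].append(sample)

    joined = []
    for row in billing_rows:
        key = (row["instance_id"], row["hour"])
        # Instances with no pod samples keep an empty list (fully unused).
        joined.append({**row, "samples": by_instance_hour.get(key, [])})
    return joined


joined = join_costs_with_samples(billing_rows, pod_samples)
for row in joined:
    pods = sorted({s["pod"] for s in row["samples"]})
    print(row["instance_id"], row["cost_usd"], pods)
```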
Now, we can get a pod’s hourly utilization as the sum of its per-collection-period values divided by the number of collection periods per hour. Each period’s value is the max of reserved and utilized, as discussed before, and is effectively zero for collection periods in which the pod didn’t run and so is absent from the metrics. Break out the instance’s costs into memory and CPU like before, partition those costs based on utilization, and voilà! Per-pod costs!
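As a rough sketch of that calculation, assume one collection period per minute and per-period values expressed as percentages of the instance’s capacity (already the max of reserved and utilized). The helper names and the $2.45 / $0.55 split, borrowed from the earlier c5.18xlarge example, are illustrative only.

```python
PERIODS_PER_HOUR = 60  # assumes one collection period per minute


def hourly_share(period_values, periods_per_hour=PERIODS_PER_HOUR):
    """Average per-period usage (in percent) over the whole hour.

    Periods in which the pod did not run are simply missing from
    period_values, so dividing by the full period count treats them as zero.
    """
    return sum(period_values) / periods_per_hour / 100.0


def pod_cost(cpu_periods, mem_periods, cpu_cost, mem_cost):
    """Charge a pod its share of the instance's CPU and memory cost."""
    return (hourly_share(cpu_periods) * cpu_cost
            + hourly_share(mem_periods) * mem_cost)


# A pod that ran for 30 of the 60 minutes at 80% CPU and 20% memory.
cpu_periods = [80.0] * 30
mem_periods = [20.0] * 30
cost = pod_cost(cpu_periods, mem_periods, cpu_cost=2.45, mem_cost=0.55)
print(f"${cost:.3f}")
```

Running for half the hour halves the pod’s share, so this pod is charged for 40% of the CPU cost and 10% of the memory cost.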
So, service costs are just the sum of pod costs belonging to that service. And, identically, we can construct costs for other higher-level Kubernetes abstractions like namespace in exactly the same way, by summing over pods.
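Rolling pod costs up to a higher-level abstraction is a group-by and sum. In this sketch the pod records, their labels, and the dollar amounts are made up for illustration; in practice the service and namespace would come from pod metadata.

```python
from collections import defaultdict

# Hypothetical per-pod hourly costs with the labels we want to group by.
pod_costs = [
    {"pod": "chat-1", "service": "chat", "namespace": "prod", "cost_usd": 1.25},
    {"pod": "chat-2", "service": "chat", "namespace": "prod", "cost_usd": 0.75},
    {"pod": "users-1", "service": "users", "namespace": "prod", "cost_usd": 0.25},
    {"pod": "users-2", "service": "users", "namespace": "staging", "cost_usd": 0.50},
]


def roll_up(pod_costs, key):
    """Sum pod costs by any label, such as 'service' or 'namespace'."""
    totals = defaultdict(float)
    for row in pod_costs:
        totals[row[key]] += row["cost_usd"]
    return dict(totals)


by_service = roll_up(pod_costs, "service")
by_namespace = roll_up(pod_costs, "namespace")
print(by_service)
print(by_namespace)
```

The same function covers every grouping because the pod is the unit being summed; only the label changes.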
So that’s how we can calculate the cost of Kubernetes. Drop that all into a spreadsheet and off you go — for hours of mind-numbing data analysis.
CloudZero now offers an incredibly simple way to do a Kubernetes cost analysis and view detailed breakdowns of real cost by cluster, namespace, or pod down to the hour. And those costs can be understood in the context of what’s important to your business — by product and feature or by team and business unit, for example.
Are there unexpectedly high costs for a given feature? Alert the team responsible for it in their Slack channel. Want to see the impact that a given release from your CI/CD pipeline had on COGS for a product? We can do that, too.
How does it work? In brief, we bring container utilization data, AWS cost data, and information about your business context all together in the CloudZero platform and apply our own proprietary algorithms to accurately and automatically allocate costs within your Kubernetes clusters. There are no manual rules for you to create, just configure the data ingestion and sit back — the CloudZero platform does the rest.
For runtime insight about your containerized workloads, what better source than the platform where those workloads are running?
AWS CloudWatch Container Insights is a service that collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices. It discovers all of the running containers in a cluster and collects performance and operational data, which you can view on dashboards or use with CloudWatch alarms.
CloudZero ingests a small amount of Container Insights data, which is generated by the CloudWatch Agent, to power the container cost allocation feature.
If you’re using (or would like to use) the full Container Insights service, great — all you need to do is set the permissions for the CloudZero platform to read the CloudWatch data. But if you don’t want to use the full Container Insights service, a custom configuration is available that will only log the information needed for the cost allocation capability.
CloudZero combines the metrics from the AWS Container Insights service with AWS billing information to automatically allocate costs to the workloads being orchestrated by Kubernetes. Generally speaking, this proportional algorithm works across a broad range of EC2 instance types, including those with SSD, NVMe SSD, GPU cores, GPU memory, and networking enhancements.
CloudZero customers can see the cost of each individual pod, namespace, and cluster within Kubernetes. More importantly for their business, they can see which product features those costs correlate to — helping them answer key questions about what it costs to build and run their products.
With this information, you can understand the cost of each individual containerized workload just like you would any other non-containerized resource — or, as is often the case, along with related non-containerized resources like storage or networking — to get a complete understanding of your software’s COGS.
And you can bring that understanding to the individual engineering teams responsible for each component of your solution so they can make better decisions to improve your company’s bottom line. Best of all, you can do it without crazy spreadsheets or a dedicated financial analyst to help.
CloudZero is the only cloud cost intelligence platform that combines metrics from the AWS Container Insights service with AWS billing information to automatically calculate Kubernetes costs.
With CloudZero, you can easily go beyond saving on your cloud costs to focus on what really matters: how your spending aligns with your business strategies and objectives.