How much does your Kubernetes service cost to operate? This seems like a simple question, right? It’s one thing to say how much your Kubernetes cluster itself costs to operate — that, after all, is a group of real servers, associated with a specific number. If you’re running Kubernetes via EKS on AWS — which I’ll assume for the rest of this post, even though a lot of the information generalizes pretty well — that would be the line item costs associated with a set of EC2 instances.
Now, if we consider the world without Kubernetes, knowing these server costs gets us pretty close to understanding the cost of operating the service. Let’s imagine that I’m building a social media site that currently consists of three distinct backend services: a “content” service, which contains the full text of users’ posts, together with things like comments and reactions; a “users” service, which contains lists of time-ordered references to the content each user has created or reposted; and a “chat” service, which allows real-time text communication between specific users. In a conventional server-based architecture, I might run this with a load balancer in front of a distinct collection of EC2 instances running each service. So, if my “chat” service is running on three t3.medium instances, I can roll the cost of operating those into the total cost of that service.
Does that tell me everything I need to know about the cost of operating my services? Well, no, not quite. The “content” service probably hits S3 to store blob data, and has a database to store metadata about posts. Maybe it shares this database with the “users” service directly, or exposes it via an internal API. In any case, the “chat” service needs its own database, too, along with somewhere to store ephemeral state about ongoing conversations (“user xyz is typing a message…”). To completely understand all this, I need to distribute the costs of using these external services to the parts of my business that are using them. But, at least regarding direct server costs — the piece that Kubernetes will impact — the answer is basic enough: the server cost of any given service is the cost of the servers it runs on.
Kubernetes Cost Optimization Requires a Different Approach
So, why is Kubernetes cost optimization different? As I said in the beginning, I could give you pretty much the same basic answer about your Kubernetes cluster itself — the cost of a cluster is the cost of the servers it runs on. But that doesn’t necessarily let me answer the questions I need to run my business effectively. Now, imagine your monthly bills are skyrocketing and you need to “debug” your costs. Consider these two scenarios:
Scenario 1: If my “chat” service is a compute monster sitting on twenty c5.18xlarge instances and still running out of CPU, while “users” and “content” are both happily plugging along on clusters of three m5.large each, then it’s clearly “chat” that’s driving my costs and probably needs some serious rethinking.
Scenario 2: If all of my services were running on a single Kubernetes cluster of twenty-one c5.18xlarge machines, the total cost of running that cluster wouldn’t, by itself, tell me anything about that kind of imbalance, or about which of my features might be responsible for most of my costs. It would be a little like looking at just the bottom line of my AWS bill, without anything broken down into individual line items, and then guessing from there.
“Debugging” cost gets a lot more complicated when you’re running Kubernetes.
To get closer to thinking about Kubernetes service costs, first let’s reconfigure our raw server-based architecture a little. What if, instead of running each service on its own separate cluster, I just had one cluster of machines, and each of them hosted some subset of my services? In fact, let’s make it a little easier to do this on paper — what if I had each of my instances always host a copy of each of my services? “Well then, you’d have chosen a pretty silly architecture with some hard-to-reason-about scaling properties.” Yes, true, but bear with me for a minute. Now I still have specific, concrete costs of operating each server, but I need some intermediate model to say how much of that cost belongs to each service.
To Measure Cost, Look To What Scales Your Infrastructure Up or Down
Well, what makes me need to scale my cluster up or down? After all, that’s the clearest meaning of “driving costs” — what’s making me unable to operate my cluster using fewer, cheaper resources? Generally, scaling is driven primarily by two things: CPU and memory. So, if “chat” is using 80% of an instance’s CPU and 20% of its memory, while “users” and “content” are both using 3% of each, I can look at these numbers and distribute the total cost of the machine into four buckets: “chat”, “users”, “content”, and “unused”. It’s still a little bit tricky — I need some way of deciding how to weigh the relative cost of memory and CPU. At CloudZero, we’ve approached this by building an econometric model that estimates, all else being equal, the relative marginal costs of adding one additional vCPU and one additional GB of memory to a given instance (to be clear: this is a useful modeling conceit, not an actually available exchange). So, let’s say my c5.18xlarge costs $3/hour; it has 72 vCPUs and 144 GB of memory. Let’s say one additional vCPU costs 8.9x as much as one more GB of RAM. This would mean that $2.45 of my hourly cost is attributable to compute, $0.55 to memory. And, further, $2.07 of the $3 belongs to the “chat” service, $0.09 each to “users” and “content”, and $0.75 is unused. Cool! Now we’re back to a model that shows what engineering-meaningful unit is driving costs — that pesky “chat” service — and with a model that we’ll be able to carry over directly to Kubernetes.
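The arithmetic above is simple enough to sketch out. Here’s a minimal version in Python, using the figures from the example; the 8.9x CPU-to-memory price ratio is the modeling assumption from the text, not a published AWS rate:

```python
# Split an instance's hourly cost into CPU and memory buckets, then
# allocate those buckets to services by fractional utilization.

HOURLY_COST = 3.00      # c5.18xlarge hourly price (illustrative)
VCPUS, MEM_GB = 72, 144
CPU_TO_MEM_RATIO = 8.9  # assumption: 1 vCPU ~ 8.9x the marginal cost of 1 GB RAM

# Solve: VCPUS * ratio * mem_unit + MEM_GB * mem_unit = HOURLY_COST
mem_unit = HOURLY_COST / (VCPUS * CPU_TO_MEM_RATIO + MEM_GB)
cpu_cost = VCPUS * CPU_TO_MEM_RATIO * mem_unit   # ~ $2.45/hour for compute
mem_cost = MEM_GB * mem_unit                      # ~ $0.55/hour for memory

# Fractional utilization per service: (cpu_share, mem_share)
usage = {"chat": (0.80, 0.20), "users": (0.03, 0.03), "content": (0.03, 0.03)}

costs = {svc: cpu * cpu_cost + mem * mem_cost for svc, (cpu, mem) in usage.items()}
costs["unused"] = HOURLY_COST - sum(costs.values())

for name, dollars in costs.items():
    print(f"{name}: ${dollars:.2f}")
```

Running this reproduces the breakdown above: roughly $2.07 for “chat”, $0.09 each for “users” and “content”, and $0.75 unused.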
Kubernetes itself is a way of running all of these services across a single underlying cluster of servers, even if it’s a considerably smarter one. Here, instead of just spinning up an instance of each service on each server and letting it do what it does, each of these services will be encapsulated in a logical set of pods, and then Kubernetes will do its container orchestration magic to schedule those pods onto the cluster as needed. Exactly the same logic discussed here applies to breaking node costs out into the pods that are using them — the only Kubernetes-specific part of the procedure comes from collecting those metrics about compute and memory usage. Really, “pod” is the only Kubernetes abstraction that we need directly, because it’s the atomic unit of scheduling, and based on it we can reconstruct any higher-level abstractions we might want to use.
AWS bills EC2 instances on an hourly basis, but a variety of pods belonging to various services, namespaces, and so on could spin up and down on a given instance over the course of that hour. Fortunately, Kubernetes exposes a number of Prometheus-formatted metrics on its /metrics endpoints that we can use: pod_cpu_utilization and pod_memory_utilization tell us what is going on minute-to-minute. A Kubernetes pod can also reserve memory and CPU (via resource requests), setting a floor on the resources it partitions off for its own usage while running, so that really a running pod is “using” the maximum of pod_cpu_reserved_capacity (if present) and pod_cpu_utilization — if I’m reserving much more than I’m actually using, my “optimization” might be as trivial as changing that reservation number. But, even so, I’m still driving costs by demanding a ton of CPUs.
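The max(reserved, utilized) rule is the one Kubernetes-specific wrinkle here, so it’s worth pinning down. A minimal sketch (the function name is mine, not from any library):

```python
from typing import Optional

def effective_usage(utilized: float, reserved: Optional[float]) -> float:
    """A running pod 'uses' at least what it reserves, even when idle.
    Both arguments are fractions of the node's capacity."""
    return max(utilized, reserved) if reserved is not None else utilized

print(effective_usage(0.30, 0.50))  # reservation dominates: 0.5
print(effective_usage(0.70, 0.50))  # actual usage dominates: 0.7
print(effective_usage(0.40, None))  # no reservation set: 0.4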
Pulling It All Together
So now, we have enough information to answer how much my Kubernetes service costs to operate. First, I take my AWS bill with per-instance costs for the cluster nodes. Then, I collect Prometheus usage metrics from my Kubernetes cluster. At CloudZero, we use Amazon’s Container Insights for this collection process, which gives us minute-by-minute data. However, any collection process will work, so long as we get those usage/reservation metrics and a way of correlating Kubernetes pods back to EC2 instance IDs (and thus to our AWS bill). This is also available directly in Container Insights. Now, we can get a pod’s hourly utilization as the sum of its per-collection-period totals — the max of reserved and utilized like we discussed before, and effectively zero for collection periods in which a pod didn’t run and so is absent from the metrics — divided by the number of collection periods per hour. Break out the instance’s costs into memory and CPU like before, partition those costs based on utilization, and voilà! Per-pod costs! So: service costs are just the sum of pod costs belonging to that service. And we can construct costs for other higher-level Kubernetes abstractions, like namespaces, in exactly the same way: by summing over pods.
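The hourly roll-up described above can be sketched as follows. The samples and the instance’s cost split are illustrative numbers, assuming one-minute collection periods as with Container Insights:

```python
# Roll per-minute samples for one pod up to an hourly cost. Minutes in
# which the pod wasn't running are simply absent and contribute zero.

SAMPLES_PER_HOUR = 60  # one-minute collection periods

def hourly_fraction(samples):
    """samples: (utilized, reserved) capacity fractions for each minute
    the pod actually ran; the effective demand per minute is the max."""
    total = sum(max(utilized, reserved) for utilized, reserved in samples)
    return total / SAMPLES_PER_HOUR

# Example: pod ran 30 of 60 minutes, using 40% of CPU but reserving 60%,
# and using (and reserving) 10% of memory.
cpu_frac = hourly_fraction([(0.40, 0.60)] * 30)  # reservation wins: 0.30
mem_frac = hourly_fraction([(0.10, 0.10)] * 30)  # 0.05

# Instance's hourly CPU/memory cost split, as computed earlier.
cpu_cost, mem_cost = 2.45, 0.55
pod_cost = cpu_frac * cpu_cost + mem_frac * mem_cost
print(f"pod cost this hour: ${pod_cost:.4f}")
```

Summing these per-pod figures by service label (or namespace) then gives exactly the per-service costs we were after.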
Of course, you don’t have to do this manually: all of this logic is built into CloudZero’s platform, and we’ll break down your Kubernetes costs for you, presenting them in a form that’s as easy to understand as any other concrete cost. Here’s a simplified version of what that looks like:
Our customers can see the cost of each individual pod, namespace, and cluster within Kubernetes. However, more importantly for their business, they can see which product feature those costs contribute to, helping them answer key questions about what it costs to build and run their products.
Learn More About CloudZero
CloudZero is the first real-time cloud cost platform designed specifically for engineering and DevOps teams.