If cost optimization is your only reason for adopting Kubernetes and containers, you might be in for a rude surprise — many companies find that costs increase after moving to Kubernetes.
Even companies that adopt Kubernetes for other reasons, like time-to-market advantages, should follow basic cost control best practices to stay within budget.
Optimizing cloud costs related to running Kubernetes doesn’t have to involve trade-offs for performance or availability. As with most types of cloud cost control, the key to following Kubernetes cost optimization best practices is to get visibility into how you are using cloud resources and reduce waste.
In most cases, organizations can reduce costs substantially before they have to think about making trade-offs.
As with most Kubernetes-related best practices, only some of these are directly related to technology choices and architectural decisions.
Cultural and organizational decisions, like how you talk about costs, how you integrate cost management into your workflow and how you tackle cost issues can be just as important for continued success.
Non-Architectural Best Practices For Kubernetes Cost Optimization
1. Get deep visibility
A look at how CloudZero enables teams to break down Kubernetes spend by cluster, namespace, label, and pod — allowing engineers to filter, zoom in, and explore the cost of any aspect of their Kubernetes infrastructure.
Understanding the total cost of running an application per day isn’t enough to start making changes that can bring costs down. Organizations should have access to the following information about their Kubernetes deployment:
- Memory, CPU, and disk usage
- What jobs are running at any given moment, and where they are running
- How traffic is moving throughout the system
- The costs of everything other than compute, including storage, data transfer, and networking
- A map of how things run on the cluster
- How much the application costs to run right now as well as trends in cost change
- Your complete cost picture on an hourly basis
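Cost-per-pod is the piece teams most often lack. As a rough sketch of how that number can be derived — assuming a hypothetical node price and a simple allocation-by-CPU-request scheme, not any particular vendor’s methodology:

```python
# Illustrative sketch: splitting a node's hourly cost across the pods
# scheduled on it, weighted by each pod's CPU request. The node price,
# pod names, and CPU figures are all hypothetical.

NODE_HOURLY_COST = 0.192  # hypothetical 4-vCPU node rate

pods = {               # pod -> CPU request in vCPUs (hypothetical)
    "api-server": 1.5,
    "worker": 2.0,
    "sidecar": 0.5,
}

total_requested = sum(pods.values())

def pod_hourly_cost(cpu_request: float) -> float:
    """Allocate the node's cost proportionally to CPU requested."""
    return NODE_HOURLY_COST * cpu_request / total_requested

for name, cpu in pods.items():
    print(f"{name}: ${pod_hourly_cost(cpu):.4f}/hour")
```

Real allocation also has to weigh memory, storage, and idle capacity, but even this simple proportional split is enough to start comparing workloads against each other.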
With this information, organizations can make informed decisions about adjusting resource provisioning and/or changing the application architecture to reduce costs without impacting performance or availability.
CloudZero can help you drill down into each of these details and paint a complete picture of your cloud spending.
2. Measure before and after costs
Organizations should start treating cost as one of the operational metrics tracked as part of the engineering process.
Just as it’s normal to measure performance and uptime before and after major and minor changes, measuring cost changes should be a part of the operational practice.
Similarly, just as organizations have service level objectives related to performance and availability, they should have internal guidelines related to how much it’s acceptable for an application to cost to run.
They should be able to measure, understand and then accept or decline those costs after a change is made to the application.
Each application has a different role and different priorities. The key is to become aware of how costs fit into the decisions made related to that application.
There’s no ‘good’ or ‘bad’ cost, necessarily, as long as the organization is allocating its resources in a way that matches priorities. Some applications might be very costly but also mission-critical and/or very profitable — simply knowing the raw dollar amount an application costs to run doesn’t provide enough information about whether or not it’s ‘worth it.’
3. Buy better
Carefully selecting discount plans can cut a significant percentage of costs overnight.
With time and experience, you will have a fairly solid understanding of which resources are most critical to your operations. Ideally, you should also keep an eye on which resources are stable as opposed to fluctuating.
Additionally, if you have successfully achieved granular visibility into your Kubernetes costs, you know which of these resources cost the most money and how much you typically spend on them.
That knowledge opens up the opportunity to make savvy purchasing decisions rather than sticking with your cloud provider’s base rate. In particular, you can start looking for reserved instances, savings plans, and spot instances. Which is best for your company depends on the nature of your business and your needs.
Reserved Instances and Savings Plans
These can both be great options if your usage remains relatively stable and you know with some degree of certainty how much you typically spend for that usage.
Reserved instances allow you to book your server space ahead of time in exchange for a discount.
Savings plans provide a discount based on time commitments; if you commit to three years of use, for example, you can save some significant money compared to purchasing standard, on-demand instances.
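To see why the commitment matters, here is a back-of-the-envelope comparison. The discount rates below are hypothetical round numbers, not quotes from any provider’s price list; check current pricing before committing.

```python
# Sketch of commitment-based savings vs. on-demand pricing.
# All rates and discounts are hypothetical illustrations.

ON_DEMAND_HOURLY = 0.192   # hypothetical instance rate
HOURS_PER_YEAR = 8760

def annual_cost(hourly_rate: float, discount: float) -> float:
    """Yearly cost of running one instance 24/7 at a given discount."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR

on_demand = annual_cost(ON_DEMAND_HOURLY, 0.0)
one_year_commit = annual_cost(ON_DEMAND_HOURLY, 0.30)    # ~30%: hypothetical
three_year_commit = annual_cost(ON_DEMAND_HOURLY, 0.50)  # ~50%: hypothetical

print(f"on-demand:      ${on_demand:,.0f}/yr")
print(f"1-year commit:  ${one_year_commit:,.0f}/yr")
print(f"3-year commit:  ${three_year_commit:,.0f}/yr")
```

The flip side of the math is that a commitment you don’t fully use becomes its own form of waste, which is why the stable-usage caveat above matters.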
The difficulty comes in finding and managing the opportunities that best fit your company.
We highly recommend ProsperOps as a solution.
Using automatic, intelligent optimization algorithms — sometimes called Optimization 2.0, to distinguish from the old days of manual optimization — ProsperOps will get the best results possible for your situation.
Spot Instances
Spot works differently: rather than purchasing a stable amount or duration of server space, you bid for open “spots” of unused resources. Truthfully, not every business can harness the benefits of spot without making too steep a trade-off.
Just as you want to avoid wasting money on idle resources in your business, your cloud provider feels the same way. To avoid a total loss, they will often auction off these idle spaces to the highest bidder.
If you’re savvy, you might save as much as 90% on costs compared to on-demand instances. However, as soon as those resources are needed elsewhere — or a higher bidder comes along — your workloads will be shut down to make room for the higher-priority customer.
This makes spot instances a fantastic opportunity for businesses that don’t mind unexpected interruptions in some processes, and potentially a terrible choice for businesses that require constant, steady service.
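Whether spot is worth it comes down to whether the headline discount beats the cost of rerunning interrupted work. A sketch of that trade-off, using hypothetical interruption and rerun figures rather than measured data:

```python
# Expected cost of spot once interruptions are priced in.
# The interruption rate and rerun overhead are hypothetical inputs.

def effective_spot_cost(on_demand_rate: float,
                        spot_discount: float,
                        interruption_rate: float,
                        rerun_fraction: float) -> float:
    """Hourly spot cost, inflated by work redone after interruptions."""
    spot_rate = on_demand_rate * (1 - spot_discount)
    overhead = 1 + interruption_rate * rerun_fraction
    return spot_rate * overhead

on_demand = 0.192  # hypothetical rate
# 90% discount, 5% of hours interrupted, half an hour of work redone each time
spot = effective_spot_cost(on_demand, 0.90, 0.05, 0.5)
print(f"on-demand ${on_demand:.4f}/hr vs effective spot ${spot:.4f}/hr")
```

With a deep discount, even generous rerun overhead barely dents the savings — the real question is whether your workloads tolerate being interrupted at all.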
The one exception to this rule is if you use a service called Xosphere to intelligently manage your spot instances and avoid disruptions.
We recommend Xosphere to any business that wants the best of both worlds: the savings opportunities of spot and great reliability.
Architectural Best Practices
4. Reduce nodes
Architecturally speaking, the most efficient way to lower your Kubernetes costs is to reduce the number of nodes you have running.
At the end of the day, there’s no way around that. You can take many of the other steps listed here and achieve some improvements, but the real cost savings comes from using fewer resources.
The goal is to use only the number of nodes needed without going too far in either direction, hindering performance or having extra idle resources.
To accomplish this, you can use three tools:
- A horizontal autoscaler allows you to control the number of pods to fit your current needs.
- A vertical autoscaler can moderate the requests and limits of those pods to make sure they are not too busy or too idle.
- A cluster autoscaler functions similarly, controlling the number and size of nodes instead of pods.
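The horizontal autoscaler’s core rule is simple: scale the replica count by the ratio of the observed metric to its target, rounding up. A minimal sketch of that formula, with hypothetical CPU numbers:

```python
import math

# The horizontal pod autoscaler's documented scaling rule:
# desired = ceil(current_replicas * current_metric / target_metric)

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Replica count that brings the average metric back to target."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 80% CPU against a 50% target -> scale out
print(desired_replicas(4, 80.0, 50.0))  # 7

# 4 pods averaging 20% CPU against a 50% target -> scale in
print(desired_replicas(4, 20.0, 50.0))  # 2
```

Note that the cost savings only materialize once the cluster autoscaler can actually remove the nodes those surplus pods vacated — the three tools work together.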
One open-source cluster autoscaler you can use is Karpenter, but there are several other options on the market if you have specific needs.
5. Reduce traffic
Reducing or eliminating traffic between availability zones and regions means you’ll avoid unnecessary data transfer charges.
Of special importance are regional charges. Sometimes, Kubernetes clusters span two or more geographical regions, and nodes within a cluster must communicate over long distances. This can rack up data transfer charges quickly.
When possible, it can be better to separate nodes or clusters into their own regions, so that information transfers stay within one region instead of multiple.
If you have a pod that needs to use S3, for example, you might want to set up an S3 VPC endpoint so that traffic doesn’t have to cross the public internet to reach the resources it needs. Similarly, setting up AWS service endpoints can help you route your traffic more effectively and avoid sending your pods onto the internet.
While regional costs are the priority, availability zone costs also shouldn’t be overlooked. You can use one namespace per availability zone, so you wind up with an assortment of single-zone namespace deployments.
That way, communication between pods should remain within each individual availability zone and therefore would not accrue data transfer fees.
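The cross-AZ math adds up faster than most teams expect, because transfer is typically billed on both the sending and receiving side. An illustrative calculation — the per-GB rate mirrors common cloud pricing but is used here as a hypothetical; confirm against your provider’s price list:

```python
# Illustrative cross-AZ data transfer cost.
# The $0.01/GB rate (charged each direction) is hypothetical.

CROSS_AZ_RATE_PER_GB = 0.01

def monthly_cross_az_cost(gb_per_day: float) -> float:
    """30-day cost; both the sending and receiving side are billed."""
    return gb_per_day * 30 * CROSS_AZ_RATE_PER_GB * 2

# A chatty service pair exchanging 500 GB/day across zones
print(f"${monthly_cross_az_cost(500):,.2f}/month")  # $300.00/month
```

Keeping that same traffic inside one zone drops the figure to zero, which is the whole argument for single-zone deployments above.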
CloudZero can help you break apart your node costs and see what network data fees are costing your business. You’ll be able to visualize how each change to your network affects data costs for each region or availability zone, so you can optimize on a deep level.
A look at how CloudZero helps you identify your most expensive workloads and find opportunities for optimization. For example, you can explore the underlying nodes that power your cluster to determine if they are wrongly sized for your workloads.
And, if you need guidance, our FinOps specialists are always available to help you identify areas ripe for optimization and suggest solutions to fix the problems.
6. Reduce local storage
Every node has to have some type of storage, and every data point stored on a cloud server costs money. If you can maintain smaller local drives and upload your data to a database instead, you could shed some unnecessary fees.
Reducing stored data shouldn’t be the highest priority on your list of optimizations, but if you’ve made the sweeping changes and now you’re working toward fine-tuning, this can be a great way to achieve more control over your costs.
7. Review logging and monitoring practices
Similarly, every character you log costs a little bit of money. Many developers unknowingly rack up costs by turning on a debug statement — and leaving it on — when an application is already live and in use.
These printed logs can wind up far more voluminous than the developers intend. In fact, CloudZero has seen cases where a small pod uses only one-tenth of a node — and therefore shouldn’t cost very much — but its log is so massive it costs three times the amount of the pod’s actual operations.
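The arithmetic behind that anecdote is worth seeing once. All rates and volumes below are hypothetical round numbers chosen for illustration, not CloudZero’s measured figures:

```python
# Rough sketch: a pod using a tenth of a node, dwarfed by its own log bill.
# Node rate, ingestion rate, and log volume are all hypothetical.

NODE_HOURLY_COST = 0.192   # hypothetical node rate
LOG_INGEST_PER_GB = 0.50   # hypothetical log ingestion rate

pod_compute_per_hour = NODE_HOURLY_COST / 10   # pod uses 1/10 of the node
debug_log_gb_per_hour = 0.12                   # hypothetical debug volume
log_cost_per_hour = debug_log_gb_per_hour * LOG_INGEST_PER_GB

print(f"compute: ${pod_compute_per_hour:.4f}/hr, logs: ${log_cost_per_hour:.4f}/hr")
print(f"logs cost {log_cost_per_hour / pod_compute_per_hour:.1f}x the compute")
```

A forgotten debug flag doesn’t show up in CPU graphs, which is exactly why it hides in the bill instead.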
Sometimes, of course, developers do need to use debugging statements and logs to review their code. Another option is to use third-party monitoring programs such as Dynatrace, New Relic, or Datadog to keep an eye on applications.
The financial tradeoffs of using a third party service versus a standard log aren’t always clear, however; this is where cost visibility from CloudZero can make all the difference when deciding which is more worthwhile.
With CloudZero, users can see the fees from third parties and other resources aggregated into one place alongside the rest of the application costs. This gives engineers deep visibility into what it truly costs to run the application and how their changes affect the total.
8. Lock yourself in
This might seem counterintuitive, but it’s often best to avoid building the application to be portable between clouds. Not everyone sees this as a “general” architectural best practice.
However, many experts find multi-cloud particularly challenging for cost optimization. Not only can it lead to high network costs, but it also prevents you from using the best-of-breed services your cloud provider has to offer.
Ideally, organizations will follow cost best practices from the beginning. In real life, understanding these best practices and combining them with deep visibility into the cost ramifications of the different parts of the application allows teams to continually improve the cost-effectiveness of applications, often without any other trade-offs.
Monitor Your Kubernetes Spend In The Context Of Your Business
Achieving in-depth cost intelligence isn’t as hard as you might think.
For more information, check out this resource explaining how the CloudZero platform works with Kubernetes and read our container cost tracking documentation here.