While the cloud offers significant benefits, its inherent elasticity and scalability tend to give rise to uncontrolled cloud costs. Cloud costs can be opaque and difficult to analyze; without some system of identifying the source of costs and managing them, they can quickly undermine your profit margin.
A number of tools have emerged over the past decade to help engineering teams manage cloud costs. In this article, we’ll look at some of the AWS cost optimization tools offered by AWS and others. We’ll also introduce the concept of cloud cost intelligence, which goes beyond cloud cost management to help you connect costs to business metrics, while empowering your engineering team with cost autonomy.
There are several categories of AWS cost optimization tools. These include the free tools offered by AWS (AWS native tools) as well as AWS cloud cost management tools, continuous cloud optimization tools, and Kubernetes cost management solutions available from Amazon and a variety of other vendors.
We’ve broken them down by category below, and listed reasons why you might or might not want to use these tools, according to your situation and needs.
Cloud cost intelligence is a new category of cost optimization solutions that focuses on giving engineering teams autonomous control of their cloud costs. Instead of routing every cost question through a centralized cost center, engineers can see the cost of the features they’re working on, with relevant context, so they can treat cost like any other performance metric, from the decisions they make during development to how they detect and debug an issue in production.
Integrating cloud cost intelligence into your DevOps processes makes cost control more proactive and helps SaaS companies build more profitable software.
CloudZero is an example of a cloud cost intelligence solution. Designed for engineering teams, it offers dev teams specific views, easily explorable context, and automatic cost anomaly alerts — without extensive manual tagging. To give companies a complete understanding of their COGS, CloudZero offers unit cost analysis (including cost per customer), Kubernetes cost intelligence, Snowflake cost intelligence, and more.
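To make "cost per customer" concrete, here is a minimal sketch of one way to derive it: allocate a shared monthly bill in proportion to a usage metric, then compare each customer's allocated cost with their revenue. The `CustomerUsage` type, the request counts, and the dollar figures are all hypothetical, and this is not a description of how CloudZero or any other vendor computes unit cost.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CustomerUsage:
    customer: str
    requests: int           # hypothetical usage metric used as the allocation key
    monthly_revenue: float  # what the customer pays you, in dollars


def cost_per_customer(total_cost: float, usage: list[CustomerUsage]) -> dict[str, dict[str, float]]:
    """Allocate a shared bill in proportion to usage and derive a gross margin."""
    total_requests = sum(u.requests for u in usage)
    if total_requests == 0:
        raise ValueError("no usage recorded; cannot allocate cost")
    report = {}
    for u in usage:
        allocated = total_cost * u.requests / total_requests
        # Margin is undefined for a customer with no revenue.
        margin = (u.monthly_revenue - allocated) / u.monthly_revenue if u.monthly_revenue else float("nan")
        report[u.customer] = {"cost": round(allocated, 2), "margin": round(margin, 4)}
    return report


# Made-up example data.
usage = [
    CustomerUsage("acme", requests=600_000, monthly_revenue=5_000.0),
    CustomerUsage("globex", requests=300_000, monthly_revenue=4_000.0),
    CustomerUsage("initech", requests=100_000, monthly_revenue=500.0),
]
report = cost_per_customer(total_cost=6_000.0, usage=usage)
for name, row in report.items():
    print(f"{name}: cost=${row['cost']:.2f} margin={row['margin']:.1%}")
```

In this made-up data, the smallest customer costs more to serve than it pays, which is the kind of signal a total bill alone would hide.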
There are several AWS native tools for cloud cost management; these are free but basic, and may not be enough if you need granular detail on costs. If your organization is small and you have a relatively straightforward cloud bill, they are a good starting point.
However, once you reach significant spend across many AWS accounts, you may need to upgrade to something more robust. We usually see this happen when companies reach the range of $50,000–$100,000 in monthly cloud costs. While AWS is continually improving its suite of tools, AWS Cost Explorer has some limitations that can make it challenging to manage cost at this scale.
When your company reaches this point, you may want to provide cost views to your developers — which can be difficult with a centralized console. Additionally, AWS Cost Explorer relies on tagging and requires a knowledgeable user to manually pull together reports, a time-consuming and cumbersome task in most cases.
If you need a tool that helps you gain visibility across your entire infrastructure, while providing relevant views to engineering teams, it’s probably best to upgrade to a paid solution.
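To illustrate why tag-based reporting takes manual effort, the sketch below groups cost by a single tag using the request and response shape of Cost Explorer's `GetCostAndUsage` API (the `get_cost_and_usage` method of the boto3 `ce` client). The `team` tag key and the sample response are made up for illustration; the `fetch` helper is only needed for a live call, which requires boto3 and AWS credentials and may incur a per-request API charge. Pagination is ignored here.

```python
from __future__ import annotations

from collections import defaultdict


def build_request(start: str, end: str, tag_key: str) -> dict:
    """Keyword arguments for GetCostAndUsage, grouped by one cost allocation tag."""
    return {
        "TimePeriod": {"Start": start, "End": end},  # the End date is exclusive
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }


def fetch(request: dict) -> dict:
    """Live call; not exercised in this sketch."""
    import boto3  # imported lazily so the rest of the sketch runs offline

    return boto3.client("ce").get_cost_and_usage(**request)


def summarize_by_tag(response: dict) -> dict[str, float]:
    """Total cost per tag value; resources without the tag land in '(untagged)'."""
    totals: dict[str, float] = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            # Group keys look like "team$platform"; an empty value means untagged.
            value = group["Keys"][0].split("$", 1)[-1] or "(untagged)"
            totals[value] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(totals)


# Hand-written sample in the shape of a real response; the numbers are invented.
sample_response = {
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2024-01-01", "End": "2024-02-01"},
            "Groups": [
                {"Keys": ["team$platform"], "Metrics": {"UnblendedCost": {"Amount": "1250.50", "Unit": "USD"}}},
                {"Keys": ["team$data"], "Metrics": {"UnblendedCost": {"Amount": "830.25", "Unit": "USD"}}},
                {"Keys": ["team$"], "Metrics": {"UnblendedCost": {"Amount": "410.00", "Unit": "USD"}}},
            ],
        }
    ]
}

request = build_request("2024-01-01", "2024-02-01", "team")
totals = summarize_by_tag(sample_response)
untagged_share = totals["(untagged)"] / sum(totals.values())
print(totals, f"untagged: {untagged_share:.1%}")
```

The untagged share is the number to watch: any spend that lands there cannot be attributed to a team until someone fixes the tags.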
AWS Cost Explorer is a free tool with an easy-to-use interface that lets you visualize, understand, and manage your AWS costs and usage over time.
AWS Cost Anomaly Detection leverages advanced machine learning technologies to identify anomalous spend and root causes, so you can quickly take action.
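For intuition about what "anomalous spend" means, here is a deliberately simple stand-in: flag any day whose cost sits far above the mean of the preceding week, measured in standard deviations. This is not how AWS Cost Anomaly Detection works internally (the source only says it uses machine learning); the window size, the threshold, and the daily costs are assumptions chosen for illustration.

```python
from __future__ import annotations

from statistics import mean, stdev


def flag_anomalies(daily_costs: list[float], window: int = 7, threshold: float = 3.0) -> list[int]:
    """Return indexes of days whose cost is far above the trailing window's mean."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any increase as notable, avoid dividing by zero.
            if daily_costs[i] > mu:
                anomalies.append(i)
            continue
        if (daily_costs[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Invented daily costs in dollars, with one obvious spike on day index 8.
costs = [100, 102, 98, 101, 99, 103, 100, 101, 240, 102]
spikes = flag_anomalies(costs)
print(spikes)
```

A rule this simple misses slow drifts and lets a past spike inflate the baseline for the following week, which is part of why dedicated tools use more sophisticated models.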
This category represents a new generation of cost optimization solutions to help with reserved instances and savings plans. These solutions apply more advanced techniques than traditional cost optimization or AWS cloud-native tools.
ProsperOps states that with their application, “Algorithms, advanced techniques, and continuous execution automatically blend Savings Plans and Reserved Instances to deliver superior financial outcomes.”
If you’re looking for a fully automated point solution for managing RIs and Savings Plans, these types of DevOps tools can be very useful.
However, they are limited to this particular sub-segment of cost optimization, so they will likely need to be used in conjunction with other tools for a more comprehensive approach.
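The reason commitment management benefits from automation comes down to simple arithmetic: a Reserved Instance or Savings Plan is paid for whether or not you use it, so it only saves money when utilization stays above a break-even point. The sketch below uses a simplified model (a single hourly rate, no upfront payment options) with hypothetical rates and discounts; it does not reflect any vendor's algorithm.

```python
def break_even_utilization(discount: float) -> float:
    """Share of committed hours you must actually use for the commitment to pay off."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def commitment_savings(on_demand_rate: float, discount: float,
                       committed_hours: float, used_hours: float) -> float:
    """Savings (positive) or loss (negative) versus paying on demand for the same usage."""
    committed_cost = on_demand_rate * (1 - discount) * committed_hours
    # Usage beyond the commitment is billed on demand either way, so it cancels out.
    covered_hours = min(used_hours, committed_hours)
    return on_demand_rate * covered_hours - committed_cost


# Hypothetical numbers: $0.10/hour on demand, 30% discount, one month (730 hours) committed.
savings_full = commitment_savings(0.10, 0.30, committed_hours=730, used_hours=730)
savings_low = commitment_savings(0.10, 0.30, committed_hours=730, used_hours=400)
print(f"fully used: {savings_full:+.2f}, 400 hours used: {savings_low:+.2f}")
print(f"break-even utilization: {break_even_utilization(0.30):.0%}")
```

With these numbers the commitment pays off only above 70% utilization, so a workload that shrinks after purchase can turn a discount into a loss. Keeping commitments matched to changing usage is the job this category of tools takes on.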
ProsperOps relies on algorithms, advanced techniques, and continuous execution to automatically blend Savings Plans and Reserved Instances to deliver superior financial outcomes.
Traditional cloud cost management tools and optimization tools were built to help companies reduce wasteful cloud spend and make optimized purchasing decisions. Most of them were released about a decade ago, at a time when AWS offered little to help engineering and finance teams understand their bills.
Each company defines cloud cost optimization differently. CloudCheckr says it is “the process of reducing your overall cloud spend by identifying mismanaged resources, eliminating waste, reserving capacity for higher discounts, and Right Sizing computing services to scale.”
VMware defines cloud cost management as “the organizational planning that allows an enterprise to understand and manage the costs and needs associated with its cloud technology. In particular, this means finding cost-effective ways to maximize cloud usage and efficiency.”
These tools are widely used and can be effective for reporting on cost, helping with Reserved Instance and Savings Plan purchasing, and waste reduction. Many of these tools are geared toward finance or FinOps, a centralized team or user that interfaces between engineering and finance.
A limitation of these tools is that they rely heavily on tagging, so they can be challenging to maintain and often require frequent oversight to ensure they are accurate.
Some of these tools have also been acquired in recent years, and have evolved based on the strategy and clientele of their parent company. For example, CloudHealth was acquired by VMware and has since shifted toward a multi-cloud and hybrid-cloud focus.
When choosing a tool, it’s worth considering the future direction of the company to make sure it’s moving in the direction you want to go.
CloudHealth provides cloud computing services related to cost management, governance, automation, security, and performance.
Apptio Cloudability optimizes cloud resources and translates bills and tags into insights to provide real-time clarity and accountability for consumption.
CloudCheckr provides visibility and insight to lower costs, maintain security and compliance, and optimize resources.
Densify offers enterprise cloud and container cost optimization and control.
This category of tools automates and continuously optimizes cloud compute infrastructure. Most use some kind of AI or machine learning to automatically make changes to infrastructure and applications to reduce overall cost and improve efficiency.
Spot.io “provide[s] insights, recommendations, and automation to continuously right-size your infrastructure, maximize utilization and leverage the most cost-efficient compute resources available.” Their “machine learning and automation scale to exactly meet application needs using the most efficient mix of instances and pricing models, eliminating overprovisioning and waste.”
Similarly, Opsani “autonomously manages your runtime environment so that your software runs at its best and its leanest form.” Opsani describes its product as “the only solution on the market that has the ability to autonomously tune applications at scale, either for a single application or across the entire service delivery platform...simultaneously and continuously.”
Granulate “helps organizations optimize infrastructure performance and costs through AI-driven dynamic and continuous OS-level adaptations.”
These solutions are a great first step to reduce overall costs, and can create immediate savings with minimal work. Many of them are also priced based on savings, so they don’t cost anything upfront.
That said, each of these solutions can be complicated to adopt and requires more advanced allocation of compute resources. In return, they provide a more customizable, manageable system for optimizing the type of computing resources you use.
You probably won’t use these solutions if your workloads are not easily managed by an outside solution — or if you have already developed more advanced applications using optimization tools in AWS.
Your workloads might also be incompatible with these types of solutions, particularly if they are stateful and can’t easily be turned on and off.
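The right-sizing idea and the stateful caveat can be sketched together: pick the smallest size that covers observed peak usage plus some headroom, and leave stateful workloads alone. The size ladder, the 25% headroom, and the `Workload` fields are hypothetical, and real tools weigh many more signals (memory, I/O, pricing models) than this toy rule does.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical size ladder: (name, vCPUs). Real instance catalogs are far larger.
SIZES = [("small", 2), ("medium", 4), ("large", 8), ("xlarge", 16)]


@dataclass
class Workload:
    name: str
    current_vcpus: int
    peak_cpu_utilization: float  # observed peak, as a 0.0-1.0 share of current vCPUs
    stateful: bool


def recommend_size(w: Workload, headroom: float = 0.25) -> Optional[str]:
    """Smallest size covering peak usage plus headroom, or None for 'no recommendation'."""
    if w.stateful:
        return None  # can't safely be stopped and restarted by an automated tool
    needed_vcpus = w.current_vcpus * w.peak_cpu_utilization * (1 + headroom)
    for size_name, vcpus in SIZES:
        if vcpus >= needed_vcpus:
            return size_name
    return None  # nothing in the ladder is large enough


api = Workload("api", current_vcpus=8, peak_cpu_utilization=0.30, stateful=False)
db = Workload("db", current_vcpus=16, peak_cpu_utilization=0.20, stateful=True)
print(recommend_size(api), recommend_size(db))
```

Here the lightly used `api` workload would move from 8 vCPUs down to the 4-vCPU size, while the stateful `db` workload gets no recommendation even though it looks overprovisioned.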
Cloud cost optimization is a stepping stone to a more cloud-native approach and can be helpful if you’re trying to improve resource utilization, but you may not be able to use these tools for other reasons.
Spot.io automates cloud infrastructure to give your workloads infrastructure that’s always available, always scalable, and always at the lowest possible cost.
Opsani maximizes cloud workload performance and efficiency using the latest in AI and Machine Learning to continuously reconfigure and tune with every code release, load profile change, and infrastructure upgrade.
Granulate provides unprecedented workload performance with the lowest possible costs at scale.
You might use one of these tools if the majority of your workloads run in Kubernetes. These solutions focus only on activity inside clusters, so they are most useful when clusters make up most or all of your system. Some of these tools provide recommendations, while others only report on costs.
However, most developers are using managed databases and a combination of services to deliver their software to the world. Because these tools don’t cover all cloud costs, they may not be as helpful as a more comprehensive solution.
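To show what cluster-scoped cost reporting involves, here is a minimal sketch that splits one node's hourly cost across pods in proportion to their CPU requests and reports the unrequested remainder as idle. The node price, pod names, and the choice of CPU requests as the only allocation key are assumptions; this is not the model used by Kubecost or any other tool named here.

```python
from __future__ import annotations


def allocate_node_cost(node_hourly_cost: float, node_cpu_millicores: int,
                       pod_requests: dict[str, int]) -> dict[str, float]:
    """Split a node's cost by CPU request; capacity nobody requested counts as idle."""
    requested = sum(pod_requests.values())
    if requested > node_cpu_millicores:
        raise ValueError("pods request more CPU than the node has")
    costs = {
        pod: node_hourly_cost * millicores / node_cpu_millicores
        for pod, millicores in pod_requests.items()
    }
    costs["(idle)"] = node_hourly_cost * (node_cpu_millicores - requested) / node_cpu_millicores
    return costs


# Hypothetical node: $0.40/hour with 4 vCPUs (4000 millicores).
pod_costs = allocate_node_cost(0.40, 4000, {"checkout": 1000, "search": 2000})
for pod, cost in pod_costs.items():
    print(f"{pod}: ${cost:.2f}/hour")
```

Note what is missing from the result: the managed database, storage, and data transfer that the same services depend on never appear, because they are billed outside the cluster.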
Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service.
Amazon Elastic Kubernetes Service (Amazon EKS) gives you the flexibility to start, run, and scale Kubernetes applications in the AWS cloud or on-premises.
Kubecost gives you visibility into your Kubernetes resources to reduce spend and prevent resource-based outages.
Organizations rely on GitLab’s source code management, CI/CD, security, and more to deliver software rapidly.
Replex.io offers Kubernetes governance and cost management for the cloud-native enterprise.
Spot Ocean is a serverless infrastructure engine for containers.
There are a lot of options for cost optimization, and deciding which direction to go can be overwhelming. While cost optimization has traditionally focused on waste reduction and purchasing plans (like reserved instances), companies today are increasingly focusing on engineering enablement and architectural optimization.
For inspiration, read how companies like Netflix, Lyft, Segment, and more have designed their own cost optimization tools.