Discover what Kubernetes monitoring is and the tools you can use to improve container visibility, performance, and related costs.
The Kubernetes platform is the standard for orchestrating containerized applications. It’s ideal for large applications running on distributed instances. The problem is that monitoring Kubernetes infrastructure can be notoriously challenging.
In this guide, we'll cover Kubernetes monitoring in more detail, including what Kubernetes metrics to track to improve visibility and control over your K8s containers, apps, microservices, etc.
In addition to covering Kubernetes monitoring best practices, we'll share how to break down your Kubernetes costs into cost insights you can quickly understand and act on with confidence.
Kubernetes monitoring is the process of continuously tracking, measuring, and analyzing the performance, health, and cost characteristics of containerized apps running in a Kubernetes system.
The goal of monitoring in DevOps is often to proactively ensure optimal performance and health of the containers to prevent issues from affecting customer experiences.
DevOps engineers analyze specific metrics that your Kubernetes infrastructure outputs, determining what is working and what isn't. But that’s not all.
Kubernetes monitoring has several other benefits, such as:
So, what Kubernetes monitoring metrics should you track?
Monitoring metrics help engineers analyze how K8s applications and containers perform after deployment. There are several Kubernetes monitoring metrics to keep track of, including those that indicate:
That said, how do you collect and analyze Kubernetes metrics?
Engineering teams do this by evaluating different abstraction levels, like containers, pods, nodes, and Kubernetes clusters. Engineers often collect as many metrics as possible before they can zero in on select Key Performance Indicators (KPIs) that suit their unique K8s monitoring needs.
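To make the idea of abstraction levels concrete, here is a minimal sketch that rolls container-level readings up to the pod, node, and cluster levels. The `ContainerSample` type, its field names, and the sample values are all hypothetical and chosen for illustration; a real collector would supply its own data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSample:
    """One usage reading for a single container (hypothetical shape)."""
    cluster: str
    node: str
    pod: str
    container: str
    cpu_millicores: int
    memory_mib: int


def roll_up(samples, level):
    """Sum CPU and memory usage at the given level: 'pod', 'node', or 'cluster'."""
    keys = {
        "pod": lambda s: (s.cluster, s.node, s.pod),
        "node": lambda s: (s.cluster, s.node),
        "cluster": lambda s: (s.cluster,),
    }
    if level not in keys:
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(lambda: {"cpu_millicores": 0, "memory_mib": 0})
    for s in samples:
        bucket = totals[keys[level](s)]
        bucket["cpu_millicores"] += s.cpu_millicores
        bucket["memory_mib"] += s.memory_mib
    return dict(totals)


# Made-up readings: two containers in one pod, plus a pod on a second node.
samples = [
    ContainerSample("prod", "node-a", "api-1", "app", 250, 512),
    ContainerSample("prod", "node-a", "api-1", "sidecar", 50, 64),
    ContainerSample("prod", "node-b", "worker-1", "app", 400, 1024),
]

pod_totals = roll_up(samples, "pod")
cluster_totals = roll_up(samples, "cluster")
```

Keeping the raw container readings and aggregating on demand lets you start broad and narrow down to the KPIs you care about later.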
In general, Kubernetes metrics are collected in one of two ways:
The first way uses DaemonSets. A DaemonSet is a Kubernetes workload that runs a copy of a required pod on every node, or on a selected subset of nodes. Deploying a monitoring agent this way places it on each desired node, where it collects health and performance metrics. Many tools use this approach since DaemonSets are easy to provision.
The second way uses Metrics Server. Engineers install Metrics Server as a regular pod inside Kubernetes to collect resource usage data from pods and containers within a cluster. Metrics Server replaced Heapster after Heapster was deprecated. It is an excellent choice if you work with large workloads because it can monitor clusters with up to 5,000 nodes.
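Whichever collection method you use, usage values typically arrive as Kubernetes quantity strings such as `250m` for CPU or `128Mi` for memory. The sketch below shows one way to normalize them and total usage per pod. The `sample_pod` dictionary is a hand-written example shaped like a pod metrics object, not output captured from a real cluster, and the parser covers only the common suffixes.

```python
from decimal import Decimal

# Binary suffixes (powers of 1024) and decimal suffixes used in quantity strings.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9"),
}


def parse_quantity(text):
    """Convert a quantity string such as '250m' or '128Mi' to a Decimal base value."""
    # Check two-letter binary suffixes first so '5Mi' is not read as '5M' plus 'i'.
    for suffix, factor in _BINARY.items():
        if text.endswith(suffix):
            return Decimal(text[:-len(suffix)]) * factor
    if text and text[-1] in _DECIMAL:
        return Decimal(text[:-1]) * _DECIMAL[text[-1]]
    return Decimal(text)  # plain number, e.g. '2' cores


def pod_usage(pod_metrics):
    """Total CPU cores and memory bytes across the containers of one pod."""
    containers = pod_metrics["containers"]
    cpu = sum((parse_quantity(c["usage"]["cpu"]) for c in containers), Decimal(0))
    memory = sum((parse_quantity(c["usage"]["memory"]) for c in containers), Decimal(0))
    return {"cpu_cores": cpu, "memory_bytes": memory}


# Illustrative input only: names and values are made up.
sample_pod = {
    "metadata": {"name": "api-1", "namespace": "default"},
    "containers": [
        {"name": "app", "usage": {"cpu": "250m", "memory": "128Mi"}},
        {"name": "sidecar", "usage": {"cpu": "50000000n", "memory": "65536Ki"}},
    ],
}

usage = pod_usage(sample_pod)
```

Using `Decimal` avoids floating-point drift when you add up many small nanocore readings.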
Now, the following tools simplify monitoring Kubernetes for engineers and team leaders alike.
There are both proprietary and open-source monitoring tools for Kubernetes. Many open-source options are free, but may require quite a bit of configuration to meet your needs. Proprietary solutions are paid. In exchange, proprietary solutions come near-ready-to-use, with regular updates, professional technical support, and some vendor-managed elements.
Different teams rely on different monitoring tools, and some use several at once, suggesting they may not have found a single tool that does it all for them (image credit: The New Stack).
Either way, what are the best tools for monitoring Kubernetes clusters today?
Most Kubernetes monitoring platforms struggle to present costs in a granular, easy-to-understand, actionable format. With CloudZero’s Kubernetes cost intelligence approach, you can view your costs down to the hour as cost per cluster, cost per pod, and cost per namespace.
Here’s another angle:
While most platforms only present total costs and averages, CloudZero goes further by providing cost context around unit costs. That includes viewing your:
This approach makes it so much easier to understand your cost of goods sold. That way you can not only tell precisely where your Kubernetes budget is going but also pinpoint where you could cut costs without negatively impacting performance.
CloudZero AnyCost also empowers you to combine your containerized and non-containerized costs to simplify calculations. You can also track idle costs to optimize them and correlate costs from AWS, Azure, GCP, Snowflake, New Relic, MongoDB, and Databricks.
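To illustrate what a cost-per-namespace view involves, here is a simplified sketch that splits one node's hourly price across namespaces in proportion to the CPU their pods request. This is not CloudZero's method. It is a common textbook-style allocation under stated assumptions: the node price, namespaces, and pod names are made up, memory is ignored, and unrequested capacity is spread proportionally.

```python
def allocate_node_cost(node_hourly_cost, pod_requests):
    """Split one node's hourly cost across namespaces by share of CPU requested.

    pod_requests is a list of (namespace, pod_name, cpu_millicores) tuples.
    Returns a {namespace: cost} dictionary for that hour.
    """
    total_cpu = sum(cpu for _, _, cpu in pod_requests)
    if total_cpu == 0:
        return {}  # nothing scheduled, so nothing to allocate
    costs = {}
    for namespace, _pod, cpu in pod_requests:
        share = node_hourly_cost * cpu / total_cpu
        costs[namespace] = costs.get(namespace, 0.0) + share
    return costs


# Hypothetical node priced at $0.40 per hour with three pods scheduled on it.
requests = [
    ("checkout", "cart-1", 500),
    ("checkout", "cart-2", 500),
    ("search", "index-1", 1000),
]
hourly_costs = allocate_node_cost(0.40, requests)
```

Repeating this for every node and every hour, then summing by namespace, gives a rough per-namespace cost trend you can compare against a dedicated cost platform.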
With Kubernetes Dashboard, you can monitor, manage, and troubleshoot a Kubernetes environment using a UI add-on that runs in your web browser. It presents essential metrics like CPU and memory utilization across all nodes, along with workload health statistics.
Since it is part of the Kubernetes ecosystem (like Kube-state-metrics and Fluentd/Fluent Bit for logging), some people do not consider it a tool. But it does the job, so we'll include it here, especially since it's an excellent place to start before deploying more advanced Kubernetes monitoring services, tools, or platforms.
Prometheus is one of the most popular open-source monitoring solutions for Kubernetes for several reasons. Chief among them is that it combines a powerful query language (PromQL) with a multi-dimensional data model, which sets it apart from alternatives such as InfluxDB, Cassandra, and Graphite.
In addition, it pulls metrics from targets rather than waiting for them to be pushed, has a large developer community that helps improve the platform, and provides real-time alerting tools. You can also run it on top of Kubernetes with the Prometheus Operator.
However, it does not come with a built-in visualization tool. For that, you will need to use another tool, such as Grafana.
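As a rough sketch of how PromQL fits into a workflow, the snippet below builds an instant-query URL for the Prometheus HTTP API and parses the JSON shape that API returns for a vector result. The server address `http://prometheus.example:9090` is a placeholder, the response body is a hand-written sample, and no network call is made here.

```python
import json
from urllib.parse import urlencode

# CPU usage rate per namespace over the last 5 minutes, using a cAdvisor metric.
PROMQL = "sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))"


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Map each series' namespace label to its float value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    values = {}
    for series in payload["data"]["result"]:
        _timestamp, raw_value = series["value"]  # value is [timestamp, "number as string"]
        values[series["metric"].get("namespace", "")] = float(raw_value)
    return values


# Placeholder server address; swap in your own Prometheus endpoint.
url = build_query_url("http://prometheus.example:9090/", PROMQL)

# Hand-written sample of a successful vector response.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"namespace": "default"}, "value": [1700000000, "0.25"]},
            {"metric": {"namespace": "kube-system"}, "value": [1700000000, "0.1"]},
        ],
    },
})
cpu_by_namespace = parse_instant_vector(sample_body)
```

A dashboard tool such as Grafana issues queries like this for you, which is why the two are so often paired.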
New Relic is a full-stack observability platform, so it’ll help you track myriad metrics, logs, and traces to gauge the health, performance, and security of your Kubernetes infrastructure. Its eBPF-based Kubernetes integration enables you to analyze services without altering source code and observe the relationships between apps, containers, nodes, and pods. With its 16 tools and more than 470 integrations in one platform, New Relic can be your one-stop monitoring tool for K8s, except for Kubernetes cost monitoring.
Container Advisor (cAdvisor) is also a native Kubernetes monitoring tool for gathering, analyzing, and reporting resource utilization, historical data, and performance statistics at both the container and node levels.
cAdvisor automatically discovers active containers so you can monitor their metrics, such as CPU, network, and memory usage (at the node level, not per pod). However, as with the Kubernetes Dashboard, it focuses primarily on collecting metrics (not logs, traces, or events) and does not store long-term data.
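cAdvisor can expose its readings in the Prometheus text format, where each line carries a metric name, optional labels, and a value. The sketch below is a deliberately small parser for that format, meant only to show what the raw data looks like. The sample text is hand-written, and the regular expressions are a simplification that does not cover every corner of the format.

```python
import re

# metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and empty lines carry no samples
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything this simplified pattern cannot read
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hand-written sample in the text format; values are illustrative.
SAMPLE_TEXT = """
# HELP container_cpu_usage_seconds_total Cumulative cpu time consumed in seconds.
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{container="app",pod="api-1"} 12.5
machine_cpu_cores 4
"""

parsed = parse_exposition(SAMPLE_TEXT)
```

Because cAdvisor does not keep long-term history, output like this is usually scraped on a schedule and stored elsewhere.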
Grafana is a robust, open-source solution for querying, visualizing, monitoring, alerting, and analyzing metrics, traces, and logs. You will typically find engineers using it in combination with Prometheus, Graphite, or InfluxDB. That’s because Grafana offers excellent visualization and monitoring dashboards to use with these databases.
Many engineers use it because it offers robust alerting, can query multiple entities at once, supports Elasticsearch, and is compatible with many data sources. Grafana also allows for some log browsing.
Jaeger is a distributed tracing and monitoring tool for complex distributed systems such as a Kubernetes environment. It also acts as a troubleshooting tool, allowing your team to perform distributed transaction monitoring and context propagations, root cause and service dependency analyses, as well as latency and performance optimizations.
It supports several storage backends, including Cassandra, Elasticsearch, Kafka, and in-memory storage, and has been open-source since 2016.
As with Grafana and cAdvisor, you can deploy it using a DaemonSet configuration. Alternatively, you can use the Jaeger Operator.
The ELK Stack is an open-source toolset for Kubernetes logging. The acronym stands for Elasticsearch, Logstash, and Kibana, which form the basis of a small logging pipeline. In practice, the stack usually also includes Beats for data collection and Kafka for buffering when dealing with massive amounts of data.
The stack combines Elasticsearch’s scalability, Logstash’s log aggregation and processing capabilities, and Kibana’s rich visualization and analysis capabilities.
Despite being challenging to maintain at scale, the ELK Stack deploys easily and has a robust developer community to support it.
Monitoring Kubernetes with Sematext provides full-stack visibility for container orchestrators and containers if you want a comprehensive logging and monitoring solution. Yet, it is also compatible with traditional systems.
You can collect all events, logs, and metrics running in a Kubernetes cluster, structure them, and visualize them in custom monitoring dashboards. This all happens in real-time.
With Sematext, you can also detect anomalies in real time and receive alerts regarding pod-level issues. Besides monitoring resource usage, it captures network throughput. It is also easy to install as a DaemonSet, a Helm chart, or the Sematext Operator.
Weave Scope is a robust improvement over Kubernetes-native Kube-state-metrics because it allows engineers to run diagnostic commands on and manage containers within the interface. You’ll have access to a drill-down view of your app, the infrastructure you deploy it on, and the connections between components through the user interface.
Weave Scope also displays contextual logs, metrics, and metadata for Docker and Kubernetes containers.
If you do not want to run monitoring infrastructure or storage yourself, Datadog can help. Datadog allows your team to aggregate service states, metrics, and events within your Kubernetes environment in real time. With it, you can monitor entire clusters down to a single host.
Datadog lets you see inside apps or stacks, from anywhere, at any scale. It offers comprehensive DevOps services, including network, security, and real-time monitoring. It also provides log management, including filters, search functions, and a logs analysis tool for troubleshooting needs.
The tool offers full API access across all apps and infrastructure to increase visibility. In addition, it can be run on every cluster node by using a DaemonSet.
Dynatrace is also a full-stack monitoring solution for Kubernetes infrastructure. You can use it to monitor the availability and health of applications and processes, dependencies, and connections among hosts, containers, and cloud instances.
Dynatrace enables you to unify and harness insights from over 500 different tools you probably already use, including AWS, Azure, OpenShift, Google Cloud, and Kubernetes, among others. Even better, it uses events, traces, metrics, and behavioral information to reveal the inner workings of Kubernetes applications.
Dynatrace, like Datadog, provides superior APM integration and is best suited for complex, distributed systems. It does require a substantial investment, though.
kubewatch is a K8s watcher that tracks specific Kubernetes events. As soon as those events or changes occur, it pushes notifications about them to endpoints such as Slack and PagerDuty. The Kubernetes resources you can set it up to watch include pods, daemon sets, services, deployments, secrets, replica sets, replication controllers, and configuration maps. You can configure and deploy kubewatch via Helm or a custom deployment.
Note that VMware’s Bitnami no longer maintains the project on GitHub. However, the project is actively being maintained as a Robusta fork.
Other basic K8s watchers include kube-state-metrics, kubetail, and kube-ops-view.
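The watch-and-notify pattern these watchers follow can be sketched in a few lines. The example below is not kubewatch's code. It takes a dictionary shaped like a Kubernetes watch event, filters by resource kind, and builds a JSON body of the form Slack incoming webhooks accept. The `WATCHED_KINDS` set and the sample event are assumptions made for illustration, and nothing is sent over the network.

```python
import json

# Resource kinds we pretend the watcher was configured to track (illustrative).
WATCHED_KINDS = {"Pod", "Deployment", "Service"}


def to_slack_message(event):
    """Turn a watch event into a Slack webhook body, or None if the kind is ignored.

    event is a dict like {"type": "ADDED", "object": {"kind": ..., "metadata": {...}}}.
    """
    obj = event["object"]
    kind = obj.get("kind", "")
    if kind not in WATCHED_KINDS:
        return None
    meta = obj.get("metadata", {})
    namespace = meta.get("namespace", "default")
    name = meta.get("name", "?")
    text = f"{event['type']}: {kind} {namespace}/{name}"
    return json.dumps({"text": text})


# Hand-written sample event: a pod was deleted in the 'prod' namespace.
sample_event = {
    "type": "DELETED",
    "object": {"kind": "Pod", "metadata": {"name": "api-1", "namespace": "prod"}},
}
message = to_slack_message(sample_event)
```

In a real setup, the returned body would be posted to your webhook URL, and the filter would come from the watcher's configuration rather than a hard-coded set.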
ContainIQ is an eBPF-based, SOC 2-compliant K8s monitoring tool that tracks metrics, logs, and traces. It also provides Kubernetes profiling, real-time latency tracking, and deployment health observability. You can deploy it on top of Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), and AWS Fargate, either as SaaS or on-premises.
You can monitor Kubernetes with Sensu Go either on its own or in combination with Prometheus. To collect current state data on your containers, you can use Sensu’s sidecar pattern, run the Sensu agent as a DaemonSet (which places an agent on every K8s node), or run a Sensu agent on the Kube host (or VM). The tool is also multi-cloud and supports real-time monitoring and autoscaling.
These best practices will help you derive useful insights across your Kubernetes environment:
Monitoring Kubernetes provides insight into your system's performance, health, security, and cost. Continuously monitoring your Kubernetes containers also helps you detect inconsistencies before they become costly problems.
The solutions we've shared here all provide robust observability features you can use immediately. But CloudZero goes beyond basic K8s cost monitoring.
CloudZero’s Kubernetes cost intelligence empowers you to:
Cody Slingerland, a FinOps certified practitioner, is an avid content creator with over 10 years of experience creating content for SaaS and technology companies. Cody collaborates with internal team members and subject matter experts to create expert-written content on the CloudZero blog.