The Kubernetes platform is the standard for orchestrating containerized applications. It’s ideal for large applications running on distributed instances. The problem is that monitoring Kubernetes infrastructure can be notoriously challenging.
In this guide, we’ll cover Kubernetes monitoring in more detail, including what Kubernetes metrics to track to improve visibility and control over your K8s containers, apps, microservices, etc.
In addition to covering Kubernetes monitoring best practices, we’ll share how to break down your Kubernetes costs into cost insights you can quickly understand and act on with confidence.
Table Of Contents
- What Is Kubernetes Monitoring?
- Why Monitor Kubernetes?
- Kubernetes Monitoring Metrics: What Metrics Should You Monitor?
- Kubernetes Monitoring Tools: What Are the Best Tools For Monitoring Kubernetes?
- Kubernetes Monitoring Best Practices to Implement Right Away
- How to Collect, Understand, and Optimize Who, Why, and What’s Driving Your Kubernetes Costs
What Is Kubernetes Monitoring?
Kubernetes monitoring is the process of continuously tracking, measuring, and analyzing the performance, health, and cost characteristics of containerized apps running in a Kubernetes system.
The goal of monitoring in DevOps is often to proactively ensure optimal performance and health of the containers to prevent issues from affecting customer experiences.
DevOps engineers analyze specific metrics that your Kubernetes infrastructure outputs, determining what is working and what isn’t. But that’s not all.
Why Monitor Kubernetes?
Kubernetes monitoring has several other benefits, such as:
- Enables you to track changes in highly distributed and dynamic K8s environments continuously.
- Helps you assess the impact of such changes with context, so you can prevent unwanted anomalies or downtime.
- Helps visualize how all containers and other components work together.
- Lets you identify and act on threats to your Kubernetes security and compliance requirements before they become a problem.
- Empowers you to track resource usage in Kubernetes and the associated costs, so you can more accurately allocate them.
- Helps detect the root causes of performance, security, and cost anomalies in a K8s setup, so you can reduce downtime.
- Enables you to keep track of user experiences, so you can optimize them for your target customers.
- Facilitates business continuity by detecting issues before they cause outages.
- Boosts visibility into your Kubernetes networking, empowering you to assess your switches, routers, servers, virtual machines, and other components’ configuration and performance.
So, what Kubernetes monitoring metrics should you track?
Kubernetes Monitoring Metrics: What Metrics Should You Monitor?
Monitoring metrics help engineers analyze how K8s applications and containers perform after deployment. There are several Kubernetes monitoring metrics to keep track of, including those that indicate:
- Kubernetes cluster health, such as its state, running containers, network input/output, and resource consumption indicators.
- Latency of API requests.
- The running and deployment of Kubernetes pods.
- Resource metrics, such as CPU utilization, memory utilization, and disk utilization at both the node and pod levels.
- Control plane metrics, such as etcd data stores, API servers, controller life cycles, and scheduler status.
- Metrics related to cost, such as Kubernetes cost per environment, product, team, or cluster.
- Container-native and application metrics.
That said, how do you collect and analyze Kubernetes metrics?
Engineering teams do this by evaluating different abstraction levels, like containers, pods, nodes, and Kubernetes clusters. Engineers often collect as many metrics as possible before they can zero in on select Key Performance Indicators (KPIs) that suit their unique K8s monitoring needs.
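To make the idea of abstraction levels concrete, here is a minimal sketch of rolling container-level CPU readings up to pod, node, and cluster totals. The node, pod, and container names and the millicore values are made up for illustration; a real agent would supply them.

```python
from collections import defaultdict

# Illustrative samples: (node, pod, container, CPU usage in millicores).
# All names and numbers are hypothetical.
SAMPLES = [
    ("node-a", "web-1", "app", 300),
    ("node-a", "web-1", "sidecar", 50),
    ("node-a", "db-1", "postgres", 500),
    ("node-b", "web-2", "app", 250),
]


def roll_up(samples):
    """Sum container-level usage into pod, node, and cluster totals."""
    pods = defaultdict(int)
    nodes = defaultdict(int)
    for node, pod, _container, millicores in samples:
        pods[pod] += millicores
        nodes[node] += millicores
    cluster = sum(nodes.values())
    return dict(pods), dict(nodes), cluster


pods, nodes, cluster = roll_up(SAMPLES)
print("per pod:", pods)
print("per node:", nodes)
print("cluster:", cluster)
```

Starting from the finest level (containers) and aggregating upward means you can still drill back down when a node or cluster total looks off.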
In general, Kubernetes metrics are collected in one of two ways:
1. Using DaemonSets
A DaemonSet is a Kubernetes workload resource that runs a copy of a specified pod on every node (or on a selected subset of nodes). Deploying a monitoring agent this way places one agent on each desired node to collect health and performance metrics. Many tools use this approach since DaemonSets are easy to provision.
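As a rough illustration of the DaemonSet approach, the sketch below builds a DaemonSet manifest as a Python dictionary and prints it as JSON. The agent name, image, namespace, and resource limits are placeholders, not values from any particular monitoring tool.

```python
import json


def monitoring_daemonset(name, image, namespace="monitoring"):
    """Build a DaemonSet manifest that schedules one agent pod per node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Cap the agent so it cannot starve workloads.
                            "resources": {
                                "limits": {"cpu": "200m", "memory": "256Mi"}
                            },
                        }
                    ]
                },
            },
        },
    }


# Hypothetical agent name and image; use the ones your tool documents.
manifest = monitoring_daemonset("metrics-agent", "example.com/metrics-agent:1.0")
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`. In practice, most tools ship their own manifest or Helm chart, so treat this only as a picture of what that manifest contains.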
2. Using Metrics Server
Engineers install Metrics Server as a regular pod inside Kubernetes to collect resource usage data for the nodes, pods, and containers within a cluster. Metrics Server replaced Heapster, which has since been deprecated. Metrics Server is an excellent choice if you work with large workloads because it can monitor clusters with up to 5,000 nodes.
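Metrics Server exposes its readings through the Kubernetes Metrics API, where CPU and memory arrive as quantity strings such as `250m` or `2048Mi`. The sketch below converts those strings into plain numbers. The sample payload is hand-written for illustration and only mimics the general shape of a node metrics response, and the parser handles just a few common suffixes.

```python
import json

# Only a few common suffixes are handled here; this is not a full parser.
CPU_SUFFIXES = {"n": 1e-9, "u": 1e-6, "m": 1e-3}
MEM_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_cores(quantity):
    """Convert a CPU quantity such as '250m' into cores."""
    for suffix, factor in CPU_SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)


def memory_to_bytes(quantity):
    """Convert a memory quantity such as '2048Mi' into bytes."""
    for suffix, factor in MEM_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


# Hand-written sample with made-up node names and values.
SAMPLE = json.loads("""
{"kind": "NodeMetricsList", "apiVersion": "metrics.k8s.io/v1beta1",
 "items": [
   {"metadata": {"name": "node-a"},
    "usage": {"cpu": "250m", "memory": "2048Mi"}},
   {"metadata": {"name": "node-b"},
    "usage": {"cpu": "1500000000n", "memory": "1048576Ki"}}
 ]}
""")

usage = {
    item["metadata"]["name"]: (
        cpu_to_cores(item["usage"]["cpu"]),
        memory_to_bytes(item["usage"]["memory"]),
    )
    for item in SAMPLE["items"]
}
for node, (cores, mem_bytes) in usage.items():
    print(f"{node}: {cores:.2f} cores, {mem_bytes / 1024**3:.1f} GiB")
```

On a cluster with Metrics Server installed, a live payload can be fetched with `kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes`, and `kubectl top nodes` shows the same data in table form.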
Now, the following tools simplify monitoring Kubernetes for engineers and team leaders alike.
Kubernetes Monitoring Tools: What Are The Best Tools For Monitoring Kubernetes?
There are both proprietary and open-source monitoring tools for Kubernetes. Many open-source options are free, but may require quite a bit of configuration to meet your needs. Proprietary solutions are paid. In exchange, proprietary solutions come near-ready-to-use, with regular updates, professional technical support, and some vendor-managed elements.
Now, different teams use different tracking tools, and others use multiple tools, suggesting they may not have found a single tool to do it all for them (survey charts credit: The New Stack).
Either way, what are the best tools for monitoring Kubernetes clusters today?
1. CloudZero – Granular Kubernetes cost intelligence
Most Kubernetes monitoring platforms struggle to present costs in a granular, easy-to-understand, actionable format. With CloudZero’s Kubernetes cost intelligence approach, you can view your costs down to the hour as cost per cluster, cost per pod, and cost per namespace.
Here’s another angle:
While most platforms only present total costs and averages, CloudZero goes further by providing cost context around unit costs. That includes viewing your:
- Cost per customer
- Cost per feature
- Cost per project
- Cost per team
- Cost per environment
- Cost per product
This approach makes it so much easier to understand your cost of goods sold. That way you can not only tell precisely where your Kubernetes budget is going but also pinpoint where you could cut costs without negatively impacting performance.
CloudZero AnyCost also empowers you to combine the costs of your containerized and non-containerized workloads to ease calculation. You can also track idle costs to optimize them and correlate costs from AWS, Azure, GCP, Snowflake, New Relic, MongoDB, and Databricks.
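To show what cost per namespace, idle cost, and unit cost mean in arithmetic terms, here is a generic sketch that splits one node’s hourly price across namespaces in proportion to their CPU requests. This is not CloudZero’s allocation method, which the article does not describe; the price, capacities, namespace names, and customer count are all invented.

```python
def allocate_node_cost(hourly_cost, node_millicores, requests_by_namespace):
    """Split a node's hourly cost by CPU requests; the remainder is idle."""
    costs = {
        namespace: hourly_cost * millicores / node_millicores
        for namespace, millicores in requests_by_namespace.items()
    }
    costs["__idle__"] = hourly_cost - sum(costs.values())
    return costs


def unit_cost(cost, units):
    """Cost per unit, for example per customer served in that hour."""
    return cost / units


# Hypothetical node: $0.40 per hour, 4,000 millicores allocatable.
costs = allocate_node_cost(0.40, 4000, {"checkout": 2000, "search": 1000})
per_customer = unit_cost(costs["checkout"], 500)

for namespace, cost in costs.items():
    print(f"{namespace}: ${cost:.3f}/hour")
print(f"checkout cost per customer: ${per_customer:.5f}/hour")
```

Real allocation usually weighs memory and other resources as well as CPU, but the principle is the same: a shared bill becomes useful once it is divided by something the business recognizes.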
2. Kubernetes Dashboard – Native Kubernetes monitoring tool
With Kubernetes Dashboard, you can monitor, manage, and troubleshoot a Kubernetes environment using a UI add-on that runs in your web browser. It presents essential metrics like CPU and memory utilization across all nodes, along with workload health statistics.
Since it is part of the Kubernetes ecosystem (like Kube-state-metrics and Fluentd/Fluent Bit for logging), some people do not consider it a tool. But it does the job, so we’ll include it here, especially since it’s an excellent place to start before deploying more advanced Kubernetes monitoring services, tools, or platforms.
3. Prometheus – Open-source K8s monitoring tool
Prometheus is one of the most popular open-source monitoring solutions for Kubernetes for several reasons. Chief among those is that it combines a powerful querying language (PromQL) with a multi-dimensional data model, unlike alternative time-series databases like InfluxDB, Cassandra, and Graphite.
In addition, it uses a pull method rather than a push one, has a large developer community that helps improve the platform, and provides real-time alerting tools. You can also run the platform on top of your Kubernetes with the Prometheus Operator.
However, it does not come with a built-in visualization tool. For that, you will need to use another tool, such as Grafana.
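For a feel of how PromQL is used, the sketch below builds an instant-query URL for the Prometheus HTTP API and parses a response into a per-namespace dictionary. The server address and the sample response are invented, and no network call is made. The query relies on `container_cpu_usage_seconds_total`, a metric that cAdvisor exposes.

```python
import json
from urllib.parse import urlencode

# CPU cores used per namespace, averaged over the last five minutes.
PROMQL = "sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))"


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(response_text):
    """Map each namespace label to its value in an instant-vector result."""
    body = json.loads(response_text)
    if body["status"] != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    return {
        sample["metric"].get("namespace", ""): float(sample["value"][1])
        for sample in body["data"]["result"]
    }


# Hand-written response in the shape the API returns; values are made up.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"namespace": "checkout"}, "value": [1700000000, "0.42"]},
            {"metric": {"namespace": "search"}, "value": [1700000000, "0.1"]},
        ],
    },
})

# Hypothetical server address.
url = instant_query_url("http://prometheus.example:9090", PROMQL)
cores_by_namespace = parse_vector(SAMPLE_RESPONSE)
print(url)
print(cores_by_namespace)
```

The same query pasted into a Grafana panel backed by Prometheus would produce the chart that Prometheus itself does not draw.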
4. New Relic – Instant, no-code K8s monitoring
New Relic is a full-stack observability platform, so it’ll help you track myriad metrics, logs, and traces to gauge the health, performance, and security of your Kubernetes infrastructure. Its eBPF-based Kubernetes integration enables you to analyze services without altering source code and to observe the relationships between apps, containers, nodes, and pods. With its 16 tools and more than 470 integrations in one platform, New Relic can be your one-stop monitoring tool for K8s, except for Kubernetes cost monitoring.
5. cAdvisor – Container resource usage and performance analysis
Container Advisor is also a native Kubernetes monitoring tool for gathering, analyzing, and reporting resource utilization as well as historical data and performance statistics both at the container and cluster levels.
cAdvisor automatically discovers active containers so you can monitor their metrics, such as CPU, network, and memory usage (at the node level, not per pod). However, as with the Kubernetes Dashboard, it primarily focuses on collecting metrics, not logs, traces, or events, and does not store long-term data.
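CPU usage from cAdvisor is reported as a cumulative counter of CPU seconds, so a usage figure comes from comparing two samples over time. Here is a minimal sketch of that calculation, with made-up timestamps and counter values. Treating a drop in the counter as a restart from zero is a simplifying assumption.

```python
def cpu_cores_used(earlier, later):
    """Average cores used between two (timestamp_s, cumulative_cpu_s) samples."""
    (t1, v1), (t2, v2) = earlier, later
    if t2 <= t1:
        raise ValueError("samples must be in increasing time order")
    # If the counter went down, assume the container restarted from zero.
    delta = v2 - v1 if v2 >= v1 else v2
    return delta / (t2 - t1)


# 30 CPU-seconds consumed over a 60-second window.
steady = cpu_cores_used((0, 100.0), (60, 130.0))
# Counter dropped from 100 to 12, so only the 12 seconds since restart count.
after_restart = cpu_cores_used((0, 100.0), (60, 12.0))
print(steady, after_restart)
```

Since cAdvisor does not keep long-term history, the earlier sample has to be stored somewhere else, which is one reason it is commonly paired with a time-series database.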
6. Grafana – Open, Kubernetes visualization platform
Grafana is a robust and open-source solution for querying, visualizing, monitoring, alerting, and analyzing metrics, traces, and logs. You will typically find engineers using it in combination with Prometheus, Graphite, or InfluxDB. That’s because Grafana offers excellent visualization and monitoring dashboards to use with the databases.
Many engineers use it because it offers robust alerting, can query multiple entities at once, supports Elasticsearch, and is compatible with many data sources. Grafana also allows for some log browsing.
7. Jaeger – Distributed tracing and monitoring tool
Jaeger is a distributed tracing and monitoring tool for complex distributed systems such as a Kubernetes environment. It also acts as a troubleshooting tool, allowing your team to perform distributed transaction monitoring and context propagations, root cause and service dependency analyses, as well as latency and performance optimizations.
It supports several data sources, including Cassandra, memory, Kafka, and Elasticsearch, and has been open-source since 2016.
As with Grafana and cAdvisor, you can deploy it using a DaemonSet configuration, or you can use the Jaeger Operator.
8. The ELK Stack – Open-source Kubernetes observability platform
The ELK Stack is an open-source monitoring tool for logging Kubernetes. The acronym stands for Elasticsearch, Logstash, and Kibana, which form the basis of a small logging pipeline. But the stack usually includes Beats for data collection and Kafka for buffering when dealing with massive amounts of data.
The stack combines Elasticsearch’s scalability, Logstash’s log aggregation and processing capabilities, and Kibana’s rich visualization and analysis capabilities.
Despite being challenging to maintain at scale, the ELK Stack deploys easily and has a robust developer community to support it.
9. Sematext – Full-Stack Kubernetes monitoring
Monitoring Kubernetes with Sematext provides full-stack visibility for container orchestrators and containers if you want a comprehensive logging and monitoring solution. Yet, it is also compatible with traditional systems.
You can collect all events, logs, and metrics running in a Kubernetes cluster, structure them, and visualize them in custom monitoring dashboards. This all happens in real-time.
With Sematext, you can also detect anomalies in real time and receive alerts regarding pod-level issues. Besides monitoring resource usage, it captures network throughput. It is also easy to install as a Sematext Operator, DaemonSet, Helm chart, or Kubernetes Operator.
10. Weave Scope – Automatically detect containers, hosts, and processes
Weave Scope is a robust improvement over Kubernetes-native Kube-state-metrics because it allows engineers to run diagnostic commands on and manage containers within the interface. You’ll have access to a drill-down view of your app, the infrastructure you deploy it on, and the connections between components through the user interface.
Weave Scope also displays contextual logs, metrics, and metadata for Docker and Kubernetes containers.
11. Datadog – Full-stack Kubernetes observability
If you do not want to monitor infrastructure or run any storage yourself, Datadog can help. Datadog allows your team to aggregate service states, metrics, and events within your Kubernetes environment in real time. With it, you can monitor entire clusters down to a single host.
Datadog lets you see inside apps or stacks, from anywhere, at any scale. It offers comprehensive DevOps services, including network, security, and real-time monitoring. It also provides log management, including filters, search functions, and a logs analysis tool for troubleshooting needs.
The tool offers full API access across all apps and infrastructure to increase visibility. In addition, it can be run on every cluster node by using a DaemonSet.
12. Dynatrace – K8s observability, AIOps, and security monitoring
Dynatrace is also a full-stack monitoring solution for Kubernetes infrastructure. You can use it to monitor the availability and health of applications and processes, dependencies, and connections among hosts, containers, and cloud instances.
Dynatrace enables you to unify and harness insights from over 500 different tools you probably already use, including AWS, Azure, OpenShift, Google Cloud, and Kubernetes, among others. Even better, it uses events, traces, metrics, and behavioral information to reveal the inner workings of Kubernetes applications.
Dynatrace, like Datadog, provides superior APM integration and is best suited for complex, distributed systems. It also requires a substantial investment.
13. Kubewatch – Open-source events and trigger handlers watcher
kubewatch is a K8s watcher that tracks specific Kubernetes events. As soon as those events or changes occur, it pushes notifications about them to multiple endpoints, like Slack and PagerDuty. Kubernetes resources you can set it up to watch include pods, daemon sets, services, deployments, secrets, replica sets, replication controllers, and configuration maps. You can configure and deploy kubewatch via a Helm chart or a custom deployment.
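The watch-and-notify pattern can be pictured with a few lines of code. This sketch is not kubewatch’s implementation or configuration format. The event fields, resource names, and message wording are invented, and the messages are collected in a list rather than sent to Slack or PagerDuty.

```python
# Resource kinds we choose to watch in this illustration.
WATCHED_KINDS = {"Pod", "Deployment", "Service"}


def notifications(events, watched_kinds=WATCHED_KINDS):
    """Yield one message per event whose resource kind is being watched."""
    for event in events:
        if event["kind"] in watched_kinds:
            yield (
                f"{event['kind']} {event['namespace']}/{event['name']} "
                f"was {event['action']}"
            )


# Hypothetical events, as a watcher might receive them from the cluster.
EVENTS = [
    {"kind": "Pod", "namespace": "checkout", "name": "web-1", "action": "deleted"},
    {"kind": "Secret", "namespace": "checkout", "name": "db-creds", "action": "updated"},
    {"kind": "Deployment", "namespace": "search", "name": "indexer", "action": "created"},
]

messages = list(notifications(EVENTS))
for message in messages:
    print(message)
```

Filtering by resource kind before notifying is what keeps a watcher’s channel readable. Here the Secret update is dropped because secrets are not in the watched set.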
Note that VMware’s Bitnami no longer maintains the project on GitHub. However, the project is actively being maintained as a Robusta fork.
Other basic K8s watchers include kube-state-metrics, kubetail, and kube-ops-view.
14. ContainIQ – Kubernetes profiling and monitoring
This eBPF-based, SOC 2 compliant K8s monitoring tool tracks metrics, logs, and traces. ContainIQ also provides Kubernetes profiling, real-time latency tracking, and deployment health observability. You can deploy it on top of Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), and AWS Fargate, either as a SaaS or on-premises.
15. Sumo Logic’s Sensu Go
You can monitor Kubernetes with Sensu Go either on its own or as a Sensu + Prometheus combo. To collect current state data on your containers, you can use Sensu’s sidecar pattern method, run a DaemonSet for the Sensu agent (which runs the agent on each K8s node), or run a Sensu agent with the Kube host (or VM). The tool is also multi-cloud and supports real-time monitoring and autoscaling.
Kubernetes Monitoring Best Practices To Implement Right Away
These best practices will help you derive useful insights across your Kubernetes environment:
- While it is important to pick the Kubernetes monitoring telemetry you want to observe, be sure to modify your choices as your needs change.
- Ensure that all cluster components are visible by collecting kernel-level data about running processes and their relationships.
- Use a SaaS-based K8s monitoring solution instead of an on-premises one to simplify management, reduce infrastructure development and costs, and get regular updates.
- Use libraries and sidecar agents to instrument your system, ensuring you collect all data.
- Enhance root cause analysis by tracking and analyzing historical performance data and event logs.
- Regularly monitor Kubernetes control plane and node components, such as the API server, kube-dns, kubelet, kube-proxy, etcd, and controller manager, to ensure cluster services are running smoothly.
- Automate correlations between data using a robust K8s observability platform, linking metrics to events, traces to logs, and events to logs.
- Leverage anomaly detection and alerting to catch issues before they become problems. If you can find a tool with intelligent alerting, which sends you alerts only for meaningful events or changes to minimize alert fatigue, even better.
- When using an open-source tool, use a managed service, such as an AWS or GCP Prometheus offering, to avoid further complexity.
- Use tools that permit user permissions and role-based access controls, such as at the namespace and/or cluster levels.
- Understand who, why, and what’s driving your Kubernetes costs. How? Well, rather than collecting totals and averages, analyze your unit Kubernetes costs, such as by business unit, project, and environment.
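The anomaly detection and alerting practice in the list above can be sketched in a few lines. This minimal example flags a reading when it sits more than a set number of standard deviations from a trailing window, then suppresses repeat alerts for a cooldown period to limit alert fatigue. The window size, threshold, cooldown, and latency values are arbitrary choices for illustration; commercial tools use more sophisticated models.

```python
from statistics import mean, stdev


def anomalies(values, window=5, threshold=3.0, cooldown=3):
    """Return indexes of readings that deviate sharply from recent history."""
    alerts = []
    last_alert = None
    for i in range(window, len(values)):
        history = values[i - window:i]
        sigma = stdev(history)
        if sigma == 0:
            continue  # Flat history: no meaningful deviation to measure.
        z_score = abs(values[i] - mean(history)) / sigma
        in_cooldown = last_alert is not None and i - last_alert <= cooldown
        if z_score > threshold and not in_cooldown:
            alerts.append(i)
            last_alert = i
    return alerts


# Made-up request latencies in milliseconds with a spike at index 6.
latencies_ms = [100, 102, 98, 101, 99, 100, 400, 410, 101, 100]
alert_indexes = anomalies(latencies_ms)
print(alert_indexes)
```

Comparing each reading against its own recent history, rather than a fixed limit, lets the same check work for services with very different normal ranges.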
What Next: Understand Who, Why, And What’s Driving Your Kubernetes Costs
Monitoring Kubernetes provides insight into your system’s performance, health, security, and cost. Continuously monitoring your Kubernetes containers also helps you detect inconsistencies before they become costly problems.
The solutions we’ve shared here all provide robust observability features you can use immediately. But CloudZero goes beyond basic K8s cost monitoring.
CloudZero’s Kubernetes cost intelligence empowers you to:
- View Kubernetes unit costs — See how much you spend on a specific feature, product, customer, team, business unit, and more.
- Narrow down to the hourly cost of a cluster, pod, or namespace.
- Get real-time Kubernetes cost detection and alerting.
- Keep on top of your Kubernetes spend by receiving only the most relevant cost alerts via Slack, email, PagerDuty, or another incident response tool.
- Analyze the cost of containerized and non-containerized workloads in a single platform.
- Compare Kubernetes costs in the context of services like AWS, GCP, Azure, Snowflake, Databricks, and New Relic.