Monitoring Kubernetes clusters is essential for understanding system performance, debugging issues, and ensuring resource efficiency. Prometheus and Grafana are two powerful tools that together provide robust metrics collection, visualization, and alerting capabilities. This tutorial guides you through setting up Kubernetes monitoring using Prometheus with the kube-prometheus-stack, collecting node and pod metrics, creating custom Grafana dashboards, and configuring alerting rules.
Table of Contents
- What are Prometheus & Grafana?
- Setting up Prometheus with kube-prometheus-stack
- Monitoring Node and Pod Metrics
- Creating Custom Dashboards in Grafana
- Configuring Alerting Rules
- Final Thoughts
What are Prometheus & Grafana?
Prometheus and Grafana are widely used tools for monitoring and observability in Kubernetes environments.
Prometheus
A time-series database designed for collecting and querying metrics. It scrapes metrics from Kubernetes nodes, pods, and applications by periodically pulling them over HTTP from the endpoints they expose (conventionally /metrics).
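To make the pull model concrete, here is a minimal Python sketch of what a scraper reads from a target: plain-text lines of the form name{labels} value. The SAMPLE payload and the parse_exposition helper are illustrative assumptions, not Prometheus code, and the parser deliberately ignores timestamps, escaped characters, and other details of the real exposition format.

```python
import re

# Illustrative payload in the Prometheus text exposition format (values are made up).
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1045.5
node_cpu_seconds_total{cpu="0",mode="user"} 12.25
node_memory_MemTotal_bytes 8.0e+09
"""

# metric name, optional {label set}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_exposition(SAMPLE):
    print(name, labels, value)
```

Each distinct combination of metric name and labels becomes its own time series, which is the multi-dimensional data model PromQL queries against.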
Grafana
A visualization tool that integrates with Prometheus to display metrics on customizable dashboards. Grafana makes it easy to create and share insightful graphs and charts.
Key Features:
- Prometheus: Time-series storage, multi-dimensional data model, powerful query language (PromQL).
- Grafana: Interactive dashboards, integrations with Prometheus and other data sources, and advanced alerting.
Together, these tools enable real-time observability for your Kubernetes environment.
Setting up Prometheus with kube-prometheus-stack
The kube-prometheus-stack Helm chart bundles Prometheus, Grafana, and related tools for quick and easy setup. Here’s how you can install it:
Step 1 – Install Helm
Ensure Helm is installed on your local system:
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
Step 2 – Add Prometheus Community Repo
Add the Helm repository for kube-prometheus-stack:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Step 3 – Install kube-prometheus-stack
Deploy the stack into a dedicated namespace:
kubectl create namespace monitoring
helm install kube-stack prometheus-community/kube-prometheus-stack -n monitoring
Step 4 – Verify Installation
Ensure all Pods are running in the monitoring namespace:
kubectl get pods -n monitoring
Expected Output:
NAME                                  READY   STATUS    RESTARTS   AGE
kube-stack-grafana-xxxx               1/1     Running   0          2m
kube-stack-kube-state-metrics-xxxx    1/1     Running   0          2m
kube-stack-prometheus-xxxx            2/2     Running   0          2m
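If you would rather script this check than read the table, the following Python sketch lists pods that are not fully ready. It assumes the input is the JSON printed by kubectl get pods -n monitoring -o json; the not_ready_pods helper and the sample data are hypothetical and only illustrate the idea.

```python
import json


def not_ready_pods(pod_list_json):
    """Return names of pods that are not Running with every container ready."""
    problems = []
    for pod in json.loads(pod_list_json)["items"]:
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        # Note: pods of completed Jobs (phase "Succeeded") are reported too.
        if status.get("phase") != "Running" or not all_ready:
            problems.append(pod["metadata"]["name"])
    return problems


# Made-up sample in the shape of `kubectl get pods -o json` output.
SAMPLE_PODS = json.dumps({
    "items": [
        {"metadata": {"name": "grafana-xxxx"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": True}]}},
        {"metadata": {"name": "prometheus-xxxx"},
         "status": {"phase": "Running",
                    "containerStatuses": [{"ready": True}, {"ready": False}]}},
        {"metadata": {"name": "pending-xxxx"},
         "status": {"phase": "Pending"}},
    ]
})

print(not_ready_pods(SAMPLE_PODS))
```

In practice you would feed the function real kubectl output, for example by reading it from standard input, and treat an empty result as a successful installation.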
Prometheus and Grafana are now deployed. You can access Grafana by setting up port forwarding:
kubectl port-forward svc/kube-stack-grafana 3000:80 -n monitoring
Access Grafana at http://localhost:3000 and log in with the default credentials (username admin, password prom-operator).
Monitoring Node and Pod Metrics
Node and pod metrics are essential for tracking resource usage, cluster health, and workload performance.
Node Metrics
Metrics like CPU and memory usage per node are collected by the node-exporter component, part of kube-prometheus-stack.
Example PromQL to Query Node Usage:
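- CPU Usage per Node (node_cpu_seconds_total is a cumulative counter, so wrap it in rate() to get the busy fraction over the last 5 minutes):
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))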
- Memory Usage per Node:
node_memory_Active_bytes / node_memory_MemTotal_bytes
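Because node_cpu_seconds_total only ever increases, a usage figure comes from how fast it grows between two readings. The Python sketch below shows that arithmetic for a single core. It is a simplification of what PromQL's rate() does (the real function also extrapolates to the window edges), and the function names and sample readings are made up for illustration.

```python
def counter_rate(earlier, later, seconds):
    """Average per-second increase of a counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if later < earlier:
        # The counter restarted from zero (for example after a reboot),
        # so count only what accumulated since the reset.
        return later / seconds
    return (later - earlier) / seconds


def cpu_utilization(idle_earlier, idle_later, seconds):
    """Busy fraction of one CPU core, derived from its idle-seconds counter."""
    return 1.0 - counter_rate(idle_earlier, idle_later, seconds)


# Hypothetical idle-seconds readings for one core, taken 60 seconds apart.
print(counter_rate(1000.0, 1045.0, 60))     # idle seconds accrued per second
print(cpu_utilization(1000.0, 1045.0, 60))  # fraction of time the core was busy
```

With these readings the core was idle for 45 of the 60 seconds, so it was busy a quarter of the time.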
Pod Metrics
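Pod-specific metrics such as CPU usage and memory consumption come from cAdvisor, which is built into the kubelet and scraped by Prometheus. kube-state-metrics complements them with object state such as pod phase and restart counts.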
Example PromQL to Query Pod Metrics:
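- CPU Usage per Pod (in cores; container!="" skips the pod-level aggregate series so usage is not counted twice):
sum(rate(container_cpu_usage_seconds_total{namespace="default", container!=""}[2m])) by (pod)
- Memory Usage per Pod:
sum(container_memory_working_set_bytes{namespace="default", container!=""}) by (pod)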
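The same queries can be run outside the UI through the Prometheus HTTP API endpoint /api/v1/query. The Python sketch below builds the request URL and reads an instant-vector response into a per-pod mapping. The helper names, the sample response, and the http://localhost:9090 base URL (which assumes a port-forward to the Prometheus Service) are assumptions for illustration.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload):
    """Map each pod label to its sample value from an instant-vector response."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    values = {}
    for series in body["data"]["result"]:
        pod = series["metric"].get("pod", "<no pod label>")
        _timestamp, value = series["value"]  # the value arrives as a string
        values[pod] = float(value)
    return values


# Made-up response in the shape the API returns for an instant query.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "web-1"}, "value": [1700000000, "0.25"]},
            {"metric": {"pod": "web-2"}, "value": [1700000000, "1.5"]},
        ],
    },
})

query = 'sum(rate(container_cpu_usage_seconds_total{namespace="default"}[2m])) by (pod)'
print(build_query_url("http://localhost:9090", query))
print(parse_vector(SAMPLE_RESPONSE))
```

To run it against a live cluster, fetch the URL with urllib.request.urlopen and pass the response body to parse_vector.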
Visual Reference:
[Diagram: data flow from metrics sources to Prometheus and Grafana]
Creating Custom Dashboards in Grafana
Grafana supports creating custom dashboards to visualize metrics tailored to your use case.
Step 1 – Add Prometheus as a Data Source
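- Log in to Grafana (http://localhost:3000).
- Go to Configuration > Data Sources > Add Data Source (Connections > Data sources in newer Grafana versions).
- Select Prometheus and configure the URL as http://kube-stack-prometheus.monitoring.svc.cluster.local:9090. Run kubectl get svc -n monitoring to confirm the exact Service name for your release.
Note: kube-prometheus-stack normally provisions a Prometheus data source in its bundled Grafana, so check the existing data sources before adding a duplicate.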
Step 2 – Create a New Dashboard
- Navigate to Dashboards > New Dashboard.
- Add panels for CPU and memory metrics.
Example Panel Query (CPU Usage by Node):
sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
Step 3 – Save and Share
Save the dashboard for later use or export it as JSON to share with your team.
Example Dashboard Layout:
- Panel 1: Node CPU Usage
- Panel 2: Pod Memory Usage
- Panel 3: Active Kubernetes Pods
Custom dashboards make it easier to track specific metrics at a glance.
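As a rough illustration of what an exported dashboard contains, the Python sketch below generates JSON for the three-panel layout. The field names follow the Grafana dashboard JSON model as commonly exported (title, panels, gridPos, targets), but treat the exact schema as an assumption and compare it with an export from your own Grafana version. The make_panel helper, the dashboard title, and the choice of queries are hypothetical.

```python
import json


def make_panel(panel_id, title, expr, x, y):
    """Describe one time-series panel backed by a single PromQL query."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        # Grafana lays panels out on a grid that is 24 columns wide.
        "gridPos": {"h": 8, "w": 12, "x": x, "y": y},
        "targets": [{"refId": "A", "expr": expr}],
    }


dashboard = {
    "title": "Cluster Overview (example)",
    "panels": [
        make_panel(1, "Node CPU Usage",
                   'sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))',
                   x=0, y=0),
        make_panel(2, "Pod Memory Usage",
                   # container!="" leaves out pod-level aggregate series
                   'sum by (pod) (container_memory_working_set_bytes'
                   '{namespace="default", container!=""})',
                   x=12, y=0),
        make_panel(3, "Active Kubernetes Pods",
                   'sum(kube_pod_status_phase{phase="Running"})',
                   x=0, y=8),
    ],
}

print(json.dumps(dashboard, indent=2))
```

Generating the JSON from code keeps panel definitions reviewable in version control; the output can then be pasted into Grafana's dashboard import dialog.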
Configuring Alerting Rules
Alerting ensures you’re notified when resources exceed thresholds or unusual behavior is detected.
Step 1 – Define Alerts in Prometheus
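With kube-prometheus-stack, the Prometheus Operator generates the rule-file ConfigMap from PrometheusRule resources and overwrites manual edits to it, so define alerts as a PrometheusRule instead of editing the ConfigMap. Save the manifest below (for example as pod-alerts.yaml, a file name chosen for this example) and apply it:
kubectl apply -f pod-alerts.yaml
Example YAML for Alerts:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-alerts
  namespace: monitoring
  labels:
    release: kube-stack  # assumed to match the chart's default rule selector
spec:
  groups:
    - name: pod-alerts
      rules:
        - alert: HighPodCPU
          expr: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod) > 1
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage detected for pod {{ $labels.pod }}"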
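The for: 1m field means the expression must stay true for a full minute before the alert fires; until then it is only pending. The Python sketch below simulates that behavior over a series of evaluations. It is a simplified model of the rule evaluator, and the alert_states function and the sample readings are made up for illustration.

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state after each (timestamp_seconds, value) evaluation."""
    states = []
    active_since = None  # when the condition first became true
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held_for = timestamp - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # condition cleared, so the timer restarts
            states.append("inactive")
    return states


# Hypothetical CPU readings (in cores) evaluated every 30 seconds.
readings = [(0, 0.5), (30, 1.2), (60, 1.4), (90, 1.3), (120, 0.9)]
print(alert_states(readings, threshold=1, for_seconds=60))
```

A short spike that clears before the hold time elapses never leaves the pending state, which is how the for clause filters out transient noise.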
Step 2 – Add Notification Channel in Grafana
- Go to Alerting > Contact points in Grafana (called Notification Channels in Grafana's legacy alerting).
- Add a channel (e.g., email, Slack, or PagerDuty).
Step 3 – Link Alerts to Notifications
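Enable notifications for alerts defined in Grafana panels. This ensures you’re notified when metrics breach configured thresholds. Alerts defined as Prometheus rules, such as HighPodCPU, are routed separately by the Alertmanager instance that kube-prometheus-stack deploys.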
Testing Alerts
Simulate high CPU usage:
kubectl run cpu-hog --image=busybox -- sh -c "while true; do :; done"
Check that Prometheus fires the HighPodCPU alert based on the simulation.
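One way to check this without the UI is the Prometheus HTTP API endpoint /api/v1/alerts, which lists active alerts and their state. The Python sketch below filters such a response for a given alert name. The alerts_in_state helper and the sample payload are illustrative assumptions.

```python
import json


def alerts_in_state(payload, alertname, state="firing"):
    """Return alerts from an /api/v1/alerts response matching name and state."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("alerts request failed")
    return [
        alert for alert in body["data"]["alerts"]
        if alert["labels"].get("alertname") == alertname
        and alert.get("state") == state
    ]


# Made-up response in the shape returned by /api/v1/alerts.
SAMPLE_ALERTS = json.dumps({
    "status": "success",
    "data": {
        "alerts": [
            {"labels": {"alertname": "HighPodCPU", "pod": "cpu-hog",
                        "severity": "warning"},
             "annotations": {"summary": "High CPU usage detected for pod cpu-hog"},
             "state": "firing"},
            {"labels": {"alertname": "HighPodCPU", "pod": "web-1",
                        "severity": "warning"},
             "annotations": {},
             "state": "pending"},
        ]
    },
})

for alert in alerts_in_state(SAMPLE_ALERTS, "HighPodCPU"):
    print(alert["labels"]["pod"], alert["state"])
```

Once the check passes, remove the test workload with kubectl delete pod cpu-hog so it stops consuming CPU.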
Final Thoughts
Prometheus and Grafana form a powerful duo for Kubernetes monitoring, providing deep insights into node health, pod performance, and application metrics. With features like custom dashboards, real-time visualizations, and configurable alerts, your cluster can operate efficiently while giving you peace of mind.
Follow this guide to set up these tools in your environment, and ensure you regularly tune your dashboards and alerts to match your evolving needs. Happy monitoring!