Prometheus has become the de facto standard for monitoring Kubernetes environments, with over 90% of Cloud Native Computing Foundation members adopting it for their observability needs. Detecting and resolving issues before they impact users requires robust monitoring systems that can handle the dynamic nature of containerized applications.
Kubernetes environments present unique monitoring challenges due to their ephemeral workloads and distributed architecture. Prometheus effectively addresses these challenges through its pull-based metrics collection model and powerful query language. Additionally, when paired with visualization tools like Grafana and complementary logging solutions such as Elasticsearch and Kibana, Prometheus creates a comprehensive observability stack. Recent advancements even include AI-powered agents that can automatically analyze metrics and suggest optimizations.
This guide explores how to build a production-grade Prometheus monitoring system for your Kubernetes clusters, covering everything from architecture fundamentals to advanced scaling techniques. You’ll learn practical steps for installation, configuration, alerting, and long-term storage solutions that ensure your monitoring infrastructure grows alongside your applications.
Prometheus Architecture in Kubernetes Clusters
The architecture of Prometheus within Kubernetes environments follows a modular approach that enables flexible monitoring of dynamic containerized workloads. Understanding this architecture is essential for building resilient monitoring systems that can scale alongside your applications.
Components: Prometheus Server, Exporters, Alertmanager
At the core of the Prometheus ecosystem lies the Prometheus server, which handles metric collection, storage, and query processing. This central component scrapes metrics from configured targets and stores them in a time-series database (TSDB) optimized for numeric data. The server also evaluates alerting rules against collected metrics to identify potential issues.
Exporters serve as bridges between systems and Prometheus by translating metrics into the Prometheus format. Notable exporters in Kubernetes environments include:
Node Exporter: Exposes Linux system-level metrics
Kube-state-metrics: Provides insights into Kubernetes object states like deployments, nodes, and pods
cAdvisor: Collects container resource usage metrics (runs as part of Kubelet)
The Alertmanager handles alert processing and notification delivery. It receives alerts from Prometheus servers, then manages deduplication, grouping, routing, silencing, and inhibition of alerts. Furthermore, Alertmanager supports multiple notification channels including Slack, PagerDuty, email, and custom webhooks, allowing teams to receive alerts through their preferred communication tools.
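Silences, for instance, can be managed through the Alertmanager web UI or the amtool CLI. A minimal sketch (the alert name and Alertmanager address are placeholders):

amtool silence add alertname="HighRequestLatency" \
  --alertmanager.url=http://localhost:9093 \
  --comment="planned database maintenance" \
  --duration=2h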
Pull-based Metrics Collection Model
Unlike many monitoring systems that rely on agents pushing data, Prometheus actively pulls metrics from targets over HTTP. This approach offers several advantages in Kubernetes environments:
First, the pull model prevents overwhelming the monitoring system with misconfigured agents pushing excessive data. Consequently, Prometheus maintains better stability during unexpected metric spikes. Additionally, it simplifies determining target health status – if Prometheus cannot scrape a target, it clearly indicates the target is unreachable.
The scraping process follows configured intervals specified in the Prometheus configuration file. During each scrape, Prometheus sends HTTP requests to target endpoints (typically /metrics), retrieves metrics in its text-based format, and stores them in its database for querying.
For short-lived jobs that complete before being scraped, Prometheus offers the Pushgateway component. This intermediate service allows temporary processes to push metrics, which Prometheus then collects during its regular scraping cycles.
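For instance, a batch job might push a completion timestamp just before it exits. A minimal sketch using curl (the Pushgateway address, job name, and metric name are illustrative):

echo "batch_job_last_success_timestamp_seconds $(date +%s)" | \
  curl --data-binary @- http://pushgateway.monitoring.svc:9091/metrics/job/nightly_backup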
Kubernetes Service Discovery Integration
In dynamic Kubernetes environments where pods are ephemeral, static configuration of monitoring targets becomes impractical. Therefore, Prometheus integrates with the Kubernetes API to automatically discover and monitor resources as they appear, change, and disappear.
Service discovery in Kubernetes operates through multiple role configurations:
endpoints role: Most commonly used, discovers pods through service endpoints
node role: Discovers all cluster nodes for infrastructure monitoring
pod role: Discovers all pods directly, regardless of service membership
service role: Monitors Kubernetes services themselves
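A minimal scrape job using the endpoints role might look like the sketch below; in practice you would add relabeling (covered later in this guide) to keep only the targets you actually want to scrape:

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
    - role: endpoints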
Prometheus leverages Kubernetes labels and annotations for fine-grained control of monitoring behavior. Specifically, annotations like prometheus.io/scrape: "true" mark resources for automatic discovery, while others such as prometheus.io/path and prometheus.io/port customize scraping parameters.
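For example, a Deployment's pod template might carry annotations like the following (the port and path values here are illustrative and depend on your application):

template:
  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8080"
      prometheus.io/path: "/metrics"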
This integration with Kubernetes’ label system enables teams to implement standardized monitoring patterns across clusters without manual configuration for each new workload. As a result, monitoring becomes a seamless part of the deployment process rather than an afterthought.
Installing Prometheus using Helm Charts
Deploying Prometheus in Kubernetes environments becomes significantly easier with Helm, the package manager that streamlines complex installations. Helm charts package all necessary Kubernetes manifests into reusable units, making monitoring infrastructure deployment both consistent and maintainable.
Helm v3 Installation Prerequisites
Before installing Prometheus, ensure your environment meets the following requirements:
Kubernetes cluster version 1.19 or higher
Helm version 3.7+ installed on your workstation or CI server
For those new to Helm, installation is straightforward with the official script:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
After installing Helm, add the Prometheus community repository that contains all official charts:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
This repository houses various Prometheus-related charts, including the core Prometheus server, exporters, and integrated stacks. You can view available charts by running helm search repo prometheus-community.
kube-prometheus-stack vs standalone Prometheus
The Prometheus community offers two primary Helm charts for Kubernetes monitoring, each with distinct advantages depending on your requirements:
kube-prometheus-stack provides an integrated solution combining Prometheus, Grafana, Alertmanager, and supporting components preconfigured for Kubernetes monitoring. This chart is ideal for users seeking a complete monitoring solution with minimal setup effort. It includes:
Preconfigured dashboards for Kubernetes resources
Default alerting rules
Node exporter for host metrics
kube-state-metrics for Kubernetes object state monitoring
However, this convenience comes with less flexibility for customization and higher resource consumption compared to more streamlined options.
In contrast, the standalone Prometheus chart (prometheus-community/prometheus) offers greater flexibility for users who need precise control over their monitoring configuration or want to deploy only specific components. While it requires more manual setup, it’s better suited for environments with existing monitoring tools or specialized requirements.
Customizing values.yaml for Cluster-specific Needs
Both Prometheus charts support extensive customization through the values.yaml file. To begin customizing, extract the default values:
helm show values prometheus-community/prometheus > values.yaml
Common customizations include:
Storage configuration – Modify persistent volume settings to match your cluster’s storage capabilities:
server:
  persistentVolume:
    size: 50Gi
    storageClass: "gp2"
Component enablement – Selectively enable or disable bundled components:
alertmanager:
  enabled: false
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: true
Authentication settings – For example, setting Grafana credentials in kube-prometheus-stack:
grafana:
  adminPassword: "securePassword123"
Once customized, deploy your configuration using:
helm install prometheus prometheus-community/prometheus -f values.yaml -n monitoring
For kube-prometheus-stack, the installation is similar:
helm install prom-stack prometheus-community/kube-prometheus-stack -n monitoring
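Both commands assume the monitoring namespace already exists. If it doesn't, create it first with kubectl, or let Helm create it by appending the --create-namespace flag to either install command:

kubectl create namespace monitoring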
After deployment, verify all components are running:
kubectl get all -n monitoring
Access the Prometheus interface through port-forwarding:
kubectl port-forward svc/prometheus-server 9090:9090 -n monitoring
Throughout this process, following best practices for values customization ensures your monitoring infrastructure meets your specific operational requirements while maintaining the benefits of Helm’s declarative approach.
Configuring Scrape Targets and Exporters
Effective monitoring in Kubernetes depends on proper configuration of metrics sources and collection patterns. Prometheus shines in this area through its flexible target discovery mechanisms and specialized exporters.
Node Exporter for Node-level Metrics
Node Exporter provides hardware and OS-level metrics from Kubernetes cluster nodes. Typically deployed as a DaemonSet, it ensures one instance runs on each node, exposing system-level metrics on port 9100 via the /metrics endpoint.
To deploy Node Exporter using Helm:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install node-exporter prometheus-community/prometheus-node-exporter
Alternatively, create a DaemonSet manifest with appropriate configurations:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter
          args:
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --no-collector.wifi
            - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
          ports:
            - containerPort: 9100
          volumeMounts:
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: root
              mountPath: /host/root
              readOnly: true
      volumes:
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
After deployment, create a service to expose Node Exporter pods, making them discoverable by Prometheus:
kind: Service
apiVersion: v1
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/name: node-exporter
  ports:
    - port: 9100
      targetPort: 9100
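With the service in place, Prometheus can discover the exporter pods either through the pod annotations shown above or through an explicit endpoints-based scrape job. The following is a sketch based on the manifests above (the job name and namespace are assumptions):

- job_name: 'node-exporter'
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: ['monitoring']
  relabel_configs:
    # Keep only endpoints belonging to the node-exporter service
    - source_labels: [__meta_kubernetes_service_name]
      action: keep
      regex: node-exporter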
Kube-state-metrics for Cluster State Monitoring
Kube-state-metrics complements Node Exporter by focusing on Kubernetes objects rather than node hardware. This component listens to the Kubernetes API server and generates metrics about deployments, nodes, pods, and other resources.
Notably, kube-state-metrics is simply a metrics endpoint—it doesn’t store data itself but provides it for Prometheus to scrape. Many Helm-based Prometheus installations include kube-state-metrics by default. For standalone deployment:
git clone https://github.com/kubernetes/kube-state-metrics.git
kubectl apply -f kube-state-metrics/examples/standard
Subsequently, add a scrape job to your Prometheus configuration:
- job_name: 'kube-state-metrics'
  static_configs:
    - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
Kube-state-metrics produces various metrics including:
kube_pod_container_info: Container details within pods
kube_pod_status_ready: Pod readiness state
kube_deployment_status_replicas: Deployment replica counts
These metrics enable powerful queries and alerts. For instance, this PromQL query identifies pods that aren’t ready:
sum by (namespace, pod) (kube_pod_status_ready{condition="false"}) > 0
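Similarly, a comparison between two other kube-state-metrics series highlights deployments running fewer replicas than desired (shown here as a plain query; it could equally back an alerting rule):

kube_deployment_status_replicas_available < kube_deployment_spec_replicas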
Service Discovery using relabel_configs
In dynamic Kubernetes environments, static target configuration becomes impractical. Prometheus addresses this through relabeling—a powerful mechanism for modifying and filtering targets before scraping.
The relabel_configs directive exists in several parts of the Prometheus configuration:
Within scrape jobs: Applied before scraping targets
As metric_relabel_configs: Applied after scraping but before storage
As write_relabel_configs: Applied before sending to remote storage
A basic relabeling configuration contains these components:
source_labels: Input labels to consider
target_label: Label to modify
regex: Pattern to match against source labels
replacement: New value to apply
action: Operation to perform (replace, keep, drop, etc.)
This example uses annotations to control scraping behavior:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
This configuration only scrapes pods with the prometheus.io/scrape: "true" annotation, customizes the metrics path based on prometheus.io/path, and sets the scrape port according to prometheus.io/port.
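The same mechanism is available after scraping: metric_relabel_configs can drop series you never query before they reach storage. A common pattern (the regex here is just an example) is discarding the Go runtime metrics many exporters expose:

metric_relabel_configs:
  - source_labels: [__name__]
    action: drop
    regex: 'go_.*'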
Through these mechanisms, Prometheus can automatically discover and monitor the ever-changing landscape of services in your Kubernetes cluster without manual reconfiguration.
Alerting and Notification Setup with Alertmanager
Alertmanager serves as the critical notification component in a Prometheus monitoring system, handling alerts generated when metrics cross predefined thresholds. This component transforms raw metric anomalies into actionable notifications for operations teams.
Defining Alerting Rules in Prometheus
Alerting rules in Prometheus utilize PromQL expressions to detect conditions that require attention. These rules are defined in YAML format and loaded into Prometheus at startup or via runtime reloads. Each rule includes several key components:
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High request latency"
          description: "Service is experiencing high latency"
The for clause prevents alert flapping by requiring the condition to persist for a specified duration before firing. Meanwhile, the labels field adds metadata like severity levels, primarily used for routing. Additionally, annotations provide human-readable context about the alert.
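Before loading a rule file, it is worth validating it. Assuming the group above is saved as alert-rules.yml, a quick check with promtool followed by a configuration reload might look like this (the reload endpoint only responds if Prometheus was started with --web.enable-lifecycle):

promtool check rules alert-rules.yml
curl -X POST http://localhost:9090/-/reload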
Routing Alerts to Slack, Email, and PagerDuty
After Prometheus identifies alert conditions, Alertmanager handles notification delivery through its routing configuration:
route:
  receiver: 'slack-default'
  routes:
    - match:
        team: support
      receiver: 'email-notifications'
    - match:
        team: on-call
        severity: critical
      receiver: 'pagerduty-prod'
receivers:
  - name: 'slack-default'
    slack_configs:
      - channel: '#monitoring-alerts'
        send_resolved: true
  - name: 'email-notifications'
    email_configs:
      - to: 'team@example.com'
  - name: 'pagerduty-prod'
    pagerduty_configs:
      - service_key: '<integration_key>'
This configuration creates a tree-like structure where alerts are matched against criteria and routed to appropriate channels. Essentially, different teams receive notifications through their preferred platforms based on alert severity and other labels.
Grouping and Inhibition Configuration
Grouping prevents notification storms by consolidating similar alerts into single notifications:
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 30m
This configuration groups alerts by name, cluster, and service, waiting 30 seconds before sending the initial notification. Afterward, new alerts in the same group appear after 5 minutes, with repeats limited to 30-minute intervals.
Inhibition suppresses less critical alerts when more severe related alerts are firing:
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['cluster', 'service']
This rule prevents warning notifications when critical alerts exist for the same cluster and service, reducing unnecessary noise and focusing attention on the root issue.
Scaling Prometheus for Production Workloads
As Prometheus deployments grow, standard single-instance setups often hit resource limitations. A Prometheus instance handling millions of metrics can consume over 100GB of RAM [1], necessitating scaling strategies beyond vertical growth.
Horizontal Scaling with Thanos Sidecar
Thanos extends Prometheus capabilities through a sidecar architecture that runs alongside each Prometheus server. This sidecar process scrapes data from Prometheus and stores it in object storage while presenting a store API for queries [2]. The approach maintains Prometheus’ reliability while adding distributed capabilities:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
spec:
  replicas: 3
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.24.0
          args:
            - 'query'
            - '--store=dnssrv+_grpc._tcp.thanos-store'
            - '--store=dnssrv+_grpc._tcp.thanos-sidecar'
This configuration enables a global query view across multiple Prometheus instances, effectively distributing the monitoring load.
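On the Prometheus side, the sidecar is simply another container in the Prometheus pod. A minimal sketch is shown below; the volume names, mount paths, and the object-storage config file are assumptions that must match your actual Prometheus StatefulSet and bucket credentials:

- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.24.0
  args:
    - sidecar
    - --tsdb.path=/prometheus                          # must match Prometheus' storage path
    - --prometheus.url=http://localhost:9090           # Prometheus runs in the same pod
    - --objstore.config-file=/etc/thanos/objstore.yml  # bucket type and credentials
    - --grpc-address=0.0.0.0:10901                     # Store API queried by thanos-query
  volumeMounts:
    - name: prometheus-data
      mountPath: /prometheus
    - name: thanos-objstore-config
      mountPath: /etc/thanos
      readOnly: true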
Long-term Storage with Cortex or Thanos
Both Cortex and Thanos address Prometheus’ local storage limitations by integrating with object storage systems like Amazon S3 or Google Cloud Storage.
Cortex operates on a push model, serving as a remote write target for Prometheus servers [2]. In contrast, Thanos uploads two-hour batches of data to object storage [3]. Each approach offers distinct advantages:
Thanos provides modularity with components like Store Gateway for accessing historical data
Cortex excels in multi-tenant environments with strong resource isolation between users
These solutions enable virtually limitless retention periods while maintaining compatibility with Prometheus Query Language (PromQL).
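With Cortex, the integration point is Prometheus' own remote_write configuration. A hedged sketch follows; the service URL and tenant header value are placeholders for your Cortex deployment:

remote_write:
  - url: http://cortex-distributor.cortex.svc.cluster.local:8080/api/v1/push
    headers:
      X-Scope-OrgID: tenant-1   # Cortex tenant identifier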
Federation for Multi-cluster Monitoring
Federation creates hierarchical monitoring topologies where higher-level Prometheus servers scrape aggregated metrics from lower-level instances [4]. This pattern works particularly well for global visibility across multiple Kubernetes clusters:
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node"}'
        - '{job="kubernetes-pods"}'
    static_configs:
      - targets:
          - 'prometheus-app:9090'
          - 'prometheus-infra:9090'
Though functional, federation requires careful planning with recording rules to aggregate data effectively, primarily because shipping all metrics to higher-level servers is impractical [2].
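As an illustration, a recording rule like the following pre-aggregates per-node CPU usage on each lower-level server, so only the aggregated series needs to be federated (the rule name follows common naming conventions; adjust the selector to your setup):

groups:
  - name: federation-aggregates
    rules:
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))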
Conclusion
Prometheus stands as the cornerstone of Kubernetes monitoring infrastructure, offering a robust solution for organizations navigating the complexities of containerized environments. Throughout this guide, we’ve explored the fundamental components that make Prometheus exceptionally well-suited for Kubernetes deployments.
The pull-based metrics collection model provides reliability advantages over traditional push systems, especially for dynamic workloads. Additionally, the seamless integration with Kubernetes service discovery eliminates manual configuration overhead as applications scale. These architectural decisions make Prometheus particularly valuable for cloud-native environments where change represents the only constant.
Setting up a production-grade monitoring system requires several key elements working in concert. Helm charts significantly simplify deployment, whether through the comprehensive kube-prometheus-stack or more tailored standalone installations. Consequently, teams can focus on monitoring rather than infrastructure management. Node Exporter and kube-state-metrics complement each other perfectly, providing both infrastructure and application-level insights necessary for comprehensive observability.
Effective alerting transforms monitoring from passive observation into active incident management. Alertmanager handles this critical function through sophisticated grouping, routing, and inhibition mechanisms that deliver the right alerts to the right teams via their preferred channels. This targeted approach prevents alert fatigue while ensuring critical issues receive immediate attention.
Large-scale deployments face unique challenges that single-instance Prometheus setups cannot address. Therefore, solutions like Thanos and Cortex extend Prometheus capabilities with long-term storage and multi-cluster visibility without sacrificing query performance or reliability. Federation offers another scaling approach, creating hierarchical monitoring topologies for organizations with complex infrastructure needs.
Building a monitoring system represents an ongoing journey rather than a destination. Start with core components, establish baseline metrics and alerts, then gradually expand coverage as your understanding of system behavior deepens. Most importantly, remember that monitoring exists to support applications and users – metrics should drive actionable insights that improve reliability and performance.
The investment in proper Kubernetes monitoring pays dividends through reduced downtime, faster incident resolution, and data-driven capacity planning. Prometheus provides these benefits while maintaining the flexibility to grow alongside your infrastructure, making it the foundation of choice for organizations seeking production-grade observability in containerized environments.
References
[1] – https://sysdig.com/blog/challenges-scale-prometheus/
[2] – https://logz.io/blog/prometheus-architecture-at-scale/
[3] – https://www.opsramp.com/guides/prometheus-monitoring/prometheus-thanos/
[4] – https://chronosphere.io/learn/how-to-address-prometheus-scaling-challenges/