Prometheus has become the de facto standard for monitoring Kubernetes environments, with over 90% of Cloud Native Computing Foundation members adopting it for their observability needs. Detecting and resolving issues before they impact users requires robust monitoring systems that can handle the dynamic nature of containerized applications.
Kubernetes environments present unique monitoring challenges due to their ephemeral workloads and distributed architecture. Prometheus effectively addresses these challenges through its pull-based metrics collection model and powerful query language. Additionally, when paired with visualization tools like Grafana and complementary logging solutions such as Elasticsearch and Kibana, Prometheus creates a comprehensive observability stack. Recent advancements even include AI-powered agents that can automatically analyze metrics and suggest optimizations.
This guide explores how to build a production-grade Prometheus monitoring system for your Kubernetes clusters, covering everything from architecture fundamentals to advanced scaling techniques. You’ll learn practical steps for installation, configuration, alerting, and long-term storage solutions that ensure your monitoring infrastructure grows alongside your applications.
Prometheus Architecture in Kubernetes Clusters
The architecture of Prometheus within Kubernetes environments follows a modular approach that enables flexible monitoring of dynamic containerized workloads. Understanding this architecture is essential for building resilient monitoring systems that can scale alongside your applications.
Components: Prometheus Server, Exporters, Alertmanager
At the core of the Prometheus ecosystem lies the Prometheus server, which handles metric collection, storage, and query processing. This central component scrapes metrics from configured targets and stores them in a time-series database (TSDB) optimized for numeric data. The server also evaluates alerting rules against collected metrics to identify potential issues.
Exporters serve as bridges between systems and Prometheus by translating metrics into the Prometheus format. Notable exporters in Kubernetes environments include:
Node Exporter: Exposes Linux system-level metrics
Kube-state-metrics: Provides insights into Kubernetes object states like deployments, nodes, and pods
cAdvisor: Collects container resource usage metrics (runs as part of Kubelet)
The Alertmanager handles alert processing and notification delivery. It receives alerts from Prometheus servers, then manages deduplication, grouping, routing, silencing, and inhibition of alerts. Furthermore, Alertmanager supports multiple notification channels including Slack, PagerDuty, email, and custom webhooks, allowing teams to receive alerts through their preferred communication tools.
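Silences, for instance, can be managed through the Alertmanager web UI or the amtool CLI. A minimal sketch (the alert name and Alertmanager address are placeholders):

amtool silence add alertname="HighRequestLatency" \
  --alertmanager.url=http://localhost:9093 \
  --comment="planned database maintenance" \
  --duration=2h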
Pull-based Metrics Collection Model
Unlike many monitoring systems that rely on agents pushing data, Prometheus actively pulls metrics from targets over HTTP. This approach offers several advantages in Kubernetes environments:
First, the pull model prevents overwhelming the monitoring system with misconfigured agents pushing excessive data. Consequently, Prometheus maintains better stability during unexpected metric spikes. Additionally, it simplifies determining target health status – if Prometheus cannot scrape a target, it clearly indicates the target is unreachable.
The scraping process follows configured intervals specified in the Prometheus configuration file. During each scrape, Prometheus sends HTTP requests to target endpoints (typically /metrics), retrieves metrics in its text-based format, and stores them in its database for querying.
For short-lived jobs that complete before being scraped, Prometheus offers the Pushgateway component. This intermediate service allows temporary processes to push metrics, which Prometheus then collects during its regular scraping cycles.
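For instance, a batch job might push a completion timestamp just before it exits. A minimal sketch using curl (the Pushgateway address, job name, and metric name are illustrative):

echo "batch_job_last_success_timestamp_seconds $(date +%s)" | \
  curl --data-binary @- http://pushgateway.monitoring.svc:9091/metrics/job/nightly_backup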
Kubernetes Service Discovery Integration
In dynamic Kubernetes environments where pods are ephemeral, static configuration of monitoring targets becomes impractical. Therefore, Prometheus integrates with the Kubernetes API to automatically discover and monitor resources as they appear, change, and disappear.
Service discovery in Kubernetes operates through multiple role configurations:
endpoints role: Most commonly used, discovers pods through service endpoints
node role: Discovers all cluster nodes for infrastructure monitoring
pod role: Discovers all pods directly, regardless of service membership
service role: Monitors Kubernetes services themselves
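A minimal scrape job using the endpoints role might look like the sketch below; in practice you would add relabeling (covered later in this guide) to keep only the targets you actually want to scrape:

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
    - role: endpoints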
Prometheus leverages Kubernetes labels and annotations for fine-grained control of monitoring behavior. Specifically, annotations like prometheus.io/scrape: "true" mark resources for automatic discovery, while others such as prometheus.io/path and prometheus.io/port customize scraping parameters.
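For example, a Deployment's pod template might carry annotations like the following (the port and path values here are illustrative and depend on your application):

template:
  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8080"
      prometheus.io/path: "/metrics"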
This integration with Kubernetes’ label system enables teams to implement standardized monitoring patterns across clusters without manual configuration for each new workload. As a result, monitoring becomes a seamless part of the deployment process rather than an afterthought.
Installing Prometheus using Helm Charts
Deploying Prometheus in Kubernetes environments becomes significantly easier with Helm, the package manager that streamlines complex installations. Helm charts package all necessary Kubernetes manifests into reusable units, making monitoring infrastructure deployment both consistent and maintainable.
Helm v3 Installation Prerequisites
Before installing Prometheus, ensure your environment meets the following requirements:
Kubernetes cluster version 1.19 or higher
Helm version 3.7+ installed on your workstation or CI server
For those new to Helm, installation is straightforward with the official script:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
After installing Helm, add the Prometheus community repository that contains all official charts:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
This repository houses various Prometheus-related charts, including the core Prometheus server, exporters, and integrated stacks. You can view available charts by running helm search repo prometheus-community.
kube-prometheus-stack vs standalone Prometheus
The Prometheus community offers two primary Helm charts for Kubernetes monitoring, each with distinct advantages depending on your requirements:
kube-prometheus-stack provides an integrated solution combining Prometheus, Grafana, Alertmanager, and supporting components preconfigured for Kubernetes monitoring. This chart is ideal for users seeking a complete monitoring solution with minimal setup effort. It includes:
Preconfigured dashboards for Kubernetes resources
Default alerting rules
Node exporter for host metrics
kube-state-metrics for Kubernetes object state monitoring
However, this convenience comes with less flexibility for customization and higher resource consumption compared to more streamlined options.
In contrast, the standalone Prometheus chart (prometheus-community/prometheus) offers greater flexibility for users who need precise control over their monitoring configuration or want to deploy only specific components. While it requires more manual setup, it’s better suited for environments with existing monitoring tools or specialized requirements.
Customizing values.yaml for Cluster-specific Needs
Both Prometheus charts support extensive customization through the values.yaml file. To begin customizing, extract the default values:
helm show values prometheus-community/prometheus > values.yaml
Common customizations include:
Storage configuration – Modify persistent volume settings to match your cluster’s storage capabilities:
server:
  persistentVolume:
    size: 50Gi
    storageClass: "gp2"
Component enablement – Selectively enable or disable bundled components:
alertmanager:
  enabled: false
kubeStateMetrics:
  enabled: true
nodeExporter:
  enabled: true
Authentication settings – For example, setting Grafana credentials in kube-prometheus-stack:
grafana:
  adminPassword: "securePassword123"
Once customized, deploy your configuration using:
helm install prometheus prometheus-community/prometheus -f values.yaml -n monitoring
For kube-prometheus-stack, the installation is similar:
helm install prom-stack prometheus-community/kube-prometheus-stack -n monitoring
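Both commands assume the monitoring namespace already exists. If it doesn't, create it first with kubectl, or let Helm create it by appending the --create-namespace flag to either install command:

kubectl create namespace monitoring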
After deployment, verify all components are running:
kubectl get all -n monitoring
Access the Prometheus interface through port-forwarding:
kubectl port-forward svc/prometheus-server 9090:9090 -n monitoring
Throughout this process, following best practices for values customization ensures your monitoring infrastructure meets your specific operational requirements while maintaining the benefits of Helm’s declarative approach.
Configuring Scrape Targets and Exporters
Effective monitoring in Kubernetes depends on proper configuration of metrics sources and collection patterns. Prometheus shines in this area through its flexible target discovery mechanisms and specialized exporters.
Node Exporter for Node-level Metrics
Node Exporter provides hardware and OS-level metrics from Kubernetes cluster nodes. Typically deployed as a DaemonSet, it ensures one instance runs on each node, exposing system-level metrics on port 9100 via the /metrics endpoint.
To deploy Node Exporter using Helm:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install node-exporter prometheus-community/prometheus-node-exporter
Alternatively, create a DaemonSet manifest with appropriate configurations:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter
          args:
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --no-collector.wifi
            - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
          ports:
            - containerPort: 9100
          volumeMounts:
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: root
              mountPath: /host/root
              readOnly: true
      volumes:
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
After deployment, create a service to expose Node Exporter pods, making them discoverable by Prometheus:
kind: Service
apiVersion: v1
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    app.kubernetes.io/name: node-exporter
  ports:
    - port: 9100
      targetPort: 9100
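With the service in place, Prometheus can discover the exporter pods either through the pod annotations shown above or through an explicit endpoints-based scrape job. The following is a sketch based on the manifests above (the job name and namespace are assumptions):

- job_name: 'node-exporter'
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names: ['monitoring']
  relabel_configs:
    # Keep only endpoints belonging to the node-exporter service
    - source_labels: [__meta_kubernetes_service_name]
      action: keep
      regex: node-exporter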
Kube-state-metrics for Cluster State Monitoring
Kube-state-metrics complements Node Exporter by focusing on Kubernetes objects rather than node hardware. This component listens to the Kubernetes API server and generates metrics about deployments, nodes, pods, and other resources.
Notably, kube-state-metrics is simply a metrics endpoint—it doesn’t store data itself but provides it for Prometheus to scrape. Many Helm-based Prometheus installations include kube-state-metrics by default. For standalone deployment:
git clone https://github.com/kubernetes/kube-state-metrics.git
kubectl apply -f kube-state-metrics/examples/standard
Subsequently, add a scrape job to your Prometheus configuration:
- job_name: 'kube-state-metrics'
  static_configs:
    - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
Kube-state-metrics produces various metrics including:
kube_pod_container_info: Container details within pods
kube_pod_status_ready: Pod readiness state
kube_deployment_status_replicas: Deployment replica counts
These metrics enable powerful queries and alerts. For instance, this PromQL query identifies pods that aren’t ready:
sum by (namespace, pod) (kube_pod_status_ready{condition="false"}) > 0
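Similarly, a comparison between two other kube-state-metrics series highlights deployments running fewer replicas than desired (shown here as a plain query; it could equally back an alerting rule):

kube_deployment_status_replicas_available < kube_deployment_spec_replicas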
Service Discovery using relabel_configs
In dynamic Kubernetes environments, static target configuration becomes impractical. Prometheus addresses this through relabeling—a powerful mechanism for modifying and filtering targets before scraping.
The relabel_configs directive exists in several parts of the Prometheus configuration:
Within scrape jobs: Applied before scraping targets
As metric_relabel_configs: Applied after scraping but before storage
As write_relabel_configs: Applied before sending to remote storage
A basic relabeling configuration contains these components:
source_labels: Input labels to consider
target_label: Label to modify
regex: Pattern to match against source labels
replacement: New value to apply
action: Operation to perform (replace, keep, drop, etc.)
This example uses annotations to control scraping behavior:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
This configuration only scrapes pods with the prometheus.io/scrape: "true" annotation, customizes the metrics path based on prometheus.io/path, and sets the scrape port according to prometheus.io/port.
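The same mechanism is available after scraping: metric_relabel_configs can drop series you never query before they reach storage. A common pattern (the regex here is just an example) is discarding the Go runtime metrics many exporters expose:

metric_relabel_configs:
  - source_labels: [__name__]
    action: drop
    regex: 'go_.*'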
Through these mechanisms, Prometheus can automatically discover and monitor the ever-changing landscape of services in your Kubernetes cluster without manual reconfiguration.
Alerting and Notification Setup with Alertmanager
Alertmanager serves as the critical notification component in a Prometheus monitoring system, handling alerts generated when metrics cross predefined thresholds. This component transforms raw metric anomalies into actionable notifications for operations teams.
Defining Alerting Rules in Prometheus
Alerting rules in Prometheus utilize PromQL expressions to detect conditions that require attention. These rules are defined in YAML format and loaded into Prometheus at startup or via runtime reloads. Each rule includes several key components:
groups:
  - name: example
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High request latency"
          description: "Service is experiencing high latency"
The for clause prevents alert flapping by requiring the condition to persist for a specified duration before firing. Meanwhile, the labels field adds metadata like severity levels, primarily used for routing. Additionally, annotations provide human-readable context about the alert.
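Before loading a rule file, it is worth validating it. Assuming the group above is saved as alert-rules.yml, a quick check with promtool followed by a configuration reload might look like this (the reload endpoint only responds if Prometheus was started with --web.enable-lifecycle):

promtool check rules alert-rules.yml
curl -X POST http://localhost:9090/-/reload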
Routing Alerts to Slack, Email, and PagerDuty
After Prometheus identifies alert conditions, Alertmanager handles notification delivery through its routing configuration:
route:
  receiver: 'slack-default'
  routes:
    - match:
        team: support
      receiver: 'email-notifications'
    - match:
        team: on-call
        severity: critical
      receiver: 'pagerduty-prod'
receivers:
  - name: 'slack-default'
    slack_configs:
      - channel: '#monitoring-alerts'
        send_resolved: true
  - name: 'email-notifications'
    email_configs:
      - to: 'team@example.com'
  - name: 'pagerduty-prod'
    pagerduty_configs:
      - service_key: '<integration_key>'
This configuration creates a tree-like structure where alerts are matched against criteria and routed to appropriate channels. Essentially, different teams receive notifications through their preferred platforms based on alert severity and other labels.
Grouping and Inhibition Configuration
Grouping prevents notification storms by consolidating similar alerts into single notifications:
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 30m
This configuration groups alerts by name, cluster, and service, waiting 30 seconds before sending the initial notification. Afterward, new alerts in the same group appear after 5 minutes, with repeats limited to 30-minute intervals.
Inhibition suppresses less critical alerts when more severe related alerts are firing:
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['cluster', 'service']
This rule prevents warning notifications when critical alerts exist for the same cluster and service, reducing unnecessary noise and focusing attention on the root issue.
Scaling Prometheus for Production Workloads
As Prometheus deployments grow, standard single-instance setups often hit resource limitations. A Prometheus instance handling millions of metrics can consume over 100GB of RAM [1], necessitating scaling strategies beyond vertical growth.
Horizontal Scaling with Thanos Sidecar
Thanos extends Prometheus capabilities through a sidecar architecture that runs alongside each Prometheus server. This sidecar process scrapes data from Prometheus and stores it in object storage while presenting a store API for queries [2]. The approach maintains Prometheus’ reliability while adding distributed capabilities:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
spec:
  replicas: 3
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.24.0
          args:
            - 'query'
            - '--store=dnssrv+_grpc._tcp.thanos-store'
            - '--store=dnssrv+_grpc._tcp.thanos-sidecar'
This configuration enables a global query view across multiple Prometheus instances, effectively distributing the monitoring load.
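On the Prometheus side, the sidecar is simply another container in the Prometheus pod. A minimal sketch is shown below; the volume names, mount paths, and the object-storage config file are assumptions that must match your actual Prometheus StatefulSet and bucket credentials:

- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.24.0
  args:
    - sidecar
    - --tsdb.path=/prometheus                          # must match Prometheus' storage path
    - --prometheus.url=http://localhost:9090           # Prometheus runs in the same pod
    - --objstore.config-file=/etc/thanos/objstore.yml  # bucket type and credentials
    - --grpc-address=0.0.0.0:10901                     # Store API queried by thanos-query
  volumeMounts:
    - name: prometheus-data
      mountPath: /prometheus
    - name: thanos-objstore-config
      mountPath: /etc/thanos
      readOnly: true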
Long-term Storage with Cortex or Thanos
Both Cortex and Thanos address Prometheus’ local storage limitations by integrating with object storage systems like Amazon S3 or Google Cloud Storage.
Cortex operates on a push model, serving as a remote write target for Prometheus servers [2]. In contrast, Thanos uploads two-hour batches of data to object storage [3]. Each approach offers distinct advantages:
Thanos provides modularity with components like Store Gateway for accessing historical data
Cortex excels in multi-tenant environments with strong resource isolation between users
These solutions enable virtually limitless retention periods while maintaining compatibility with Prometheus Query Language (PromQL).
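With Cortex, the integration point is Prometheus' own remote_write configuration. A hedged sketch follows; the service URL and tenant header value are placeholders for your Cortex deployment:

remote_write:
  - url: http://cortex-distributor.cortex.svc.cluster.local:8080/api/v1/push
    headers:
      X-Scope-OrgID: tenant-1   # Cortex tenant identifier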
Federation for Multi-cluster Monitoring
Federation creates hierarchical monitoring topologies where higher-level Prometheus servers scrape aggregated metrics from lower-level instances [4]. This pattern works particularly well for global visibility across multiple Kubernetes clusters:
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node"}'
        - '{job="kubernetes-pods"}'
    static_configs:
      - targets:
          - 'prometheus-app:9090'
          - 'prometheus-infra:9090'
Though functional, federation requires careful planning with recording rules to aggregate data effectively, primarily because shipping all metrics to higher-level servers is impractical [2].
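As an illustration, a recording rule like the following pre-aggregates per-node CPU usage on each lower-level server, so only the aggregated series needs to be federated (the rule name follows common naming conventions; adjust the selector to your setup):

groups:
  - name: federation-aggregates
    rules:
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))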
Conclusion
Prometheus stands as the cornerstone of Kubernetes monitoring infrastructure, offering a robust solution for organizations navigating the complexities of containerized environments. Throughout this guide, we’ve explored the fundamental components that make Prometheus exceptionally well-suited for Kubernetes deployments.
The pull-based metrics collection model provides reliability advantages over traditional push systems, especially for dynamic workloads. Additionally, the seamless integration with Kubernetes service discovery eliminates manual configuration overhead as applications scale. These architectural decisions make Prometheus particularly valuable for cloud-native environments where change represents the only constant.
Setting up a production-grade monitoring system requires several key elements working in concert. Helm charts significantly simplify deployment, whether through the comprehensive kube-prometheus-stack or more tailored standalone installations. Consequently, teams can focus on monitoring rather than infrastructure management. Node Exporter and kube-state-metrics complement each other perfectly, providing both infrastructure and application-level insights necessary for comprehensive observability.
Effective alerting transforms monitoring from passive observation into active incident management. Alertmanager handles this critical function through sophisticated grouping, routing, and inhibition mechanisms that deliver the right alerts to the right teams via their preferred channels. This targeted approach prevents alert fatigue while ensuring critical issues receive immediate attention.
Large-scale deployments face unique challenges that single-instance Prometheus setups cannot address. Therefore, solutions like Thanos and Cortex extend Prometheus capabilities with long-term storage and multi-cluster visibility without sacrificing query performance or reliability. Federation offers another scaling approach, creating hierarchical monitoring topologies for organizations with complex infrastructure needs.
Building a monitoring system represents an ongoing journey rather than a destination. Start with core components, establish baseline metrics and alerts, then gradually expand coverage as your understanding of system behavior deepens. Most importantly, remember that monitoring exists to support applications and users – metrics should drive actionable insights that improve reliability and performance.
The investment in proper Kubernetes monitoring pays dividends through reduced downtime, faster incident resolution, and data-driven capacity planning. Prometheus provides these benefits while maintaining the flexibility to grow alongside your infrastructure, making it the foundation of choice for organizations seeking production-grade observability in containerized environments.
References
[1] – https://sysdig.com/blog/challenges-scale-prometheus/
[2] – https://logz.io/blog/prometheus-architecture-at-scale/
[3] – https://www.opsramp.com/guides/prometheus-monitoring/prometheus-thanos/
[4] – https://chronosphere.io/learn/how-to-address-prometheus-scaling-challenges/