Grafana + Prometheus on Docker: Build a Full Homelab Monitoring Stack

Your homelab runs 24/7. Containers spin up, services restart, disks fill, and RAM gets eaten alive by that one VM you forgot about. Without visibility into what’s actually happening, you’re flying blind — and nothing says “weekend ruined” like waking up to a dead Plex server because your data drive quietly hit 100% three days ago.

Grafana and Prometheus fix that. Together they give you real-time metrics, beautiful dashboards, and alerting that catches problems before they become disasters. In this guide I’ll walk through building a complete homelab monitoring stack — Prometheus for metric collection, Node Exporter for host-level stats, cAdvisor for Docker container metrics, and Grafana for visualization — all running as Docker containers with a single Compose file.

If you’re new to Docker or just getting your homelab off the ground, check out the complete Docker beginner’s guide first. And if you already have Uptime Kuma running for uptime checks, this stack complements it perfectly — Uptime Kuma tells you when something is down, Grafana and Prometheus tell you why.

What We’re Building

Here’s the full architecture:

  • Prometheus — a time-series database that scrapes metrics from configured targets on a set interval. Think of it as the collector and storage engine.
  • Node Exporter — runs on each host you want to monitor, exposing CPU, memory, disk, network, and filesystem metrics in Prometheus format.
  • cAdvisor (Container Advisor) — a Google-built exporter that surfaces per-container CPU, memory, network, and I/O metrics. Essential if you’re running a lot of Docker workloads.
  • Grafana — the visualization layer. Pulls data from Prometheus and renders it into dashboards you can actually read.

All four services run as containers on your homelab host, connected via a shared Docker network. Prometheus scrapes Node Exporter and cAdvisor every 15 seconds, stores the data locally, and Grafana queries Prometheus on demand when you load a dashboard.

The data flow is straightforward: exporters expose metrics on HTTP endpoints → Prometheus pulls and stores them → Grafana queries and visualizes. No agents, no cloud services, no subscription fees.
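
To make the first hop of that flow concrete, here is a minimal Python sketch of what "scraping" means: fetch plain text over HTTP, then parse one sample per line. The SAMPLE text is a hand-written excerpt in the style of Node Exporter output, not captured from a real host, and the parser is an illustration only. It ignores cases such as escaped label values and timestamps that Prometheus itself handles.

```python
# Illustrative sketch: a tiny parser for simple lines of the Prometheus
# text exposition format. SAMPLE is hand-written, not real exporter output.
import re

SAMPLE = """\
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 6.1e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
"""

# metric_name, optional {labels}, then the value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # anything this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Every exporter in this stack speaks this same line-oriented format, which is why Prometheus can scrape Node Exporter and cAdvisor without any exporter-specific code.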

Prerequisites

  • A Linux host (bare metal, VM, or Proxmox guest) with Docker and Docker Compose v2 installed
  • At least 2 GB of RAM free — Prometheus and Grafana together use roughly 300–500 MB under typical homelab load
  • Basic familiarity with YAML and the Docker CLI
  • Ports 3000 (Grafana), 9090 (Prometheus), 9100 (Node Exporter), and 8080 (cAdvisor) available on your host

I’m running this on an Ubuntu 24.04 LTS VM. The setup is identical on a Raspberry Pi 5, a bare metal server, or any modern Linux box. If you plan to run this alongside other services, it pairs naturally with a reverse proxy like Traefik or Nginx Proxy Manager.

Directory Structure

Create a working directory for this stack:

mkdir -p ~/monitoring/prometheus
cd ~/monitoring

Your final layout will be:

~/monitoring/
├── docker-compose.yml
└── prometheus/
    └── prometheus.yml

Grafana persists its dashboards, data sources, and user settings in a named Docker volume so everything survives container restarts and image upgrades.

The Docker Compose File

Create ~/monitoring/docker-compose.yml:

networks:
  monitoring:
    driver: bridge

volumes:
  grafana_data:
  prometheus_data:

services:
  prometheus:
    image: prom/prometheus:v2.52.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:v1.8.1
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    devices:
      - /dev/kmsg
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /cgroup:/cgroup:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana-oss:11.1.0
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=changeme_now
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://YOUR_HOST_IP:3000
    ports:
      - "3000:3000"
    networks:
      - monitoring
    depends_on:
      - prometheus

Key points before you launch:

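  • Change GF_SECURITY_ADMIN_PASSWORD before starting the stack. Don’t skip this: the value is only applied when Grafana first initializes its database, so editing it later has no effect on an existing grafana_data volume.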
  • --storage.tsdb.retention.time=30d keeps 30 days of metrics. Drop it to 15d if your disk is constrained.
  • --web.enable-lifecycle lets you hot-reload Prometheus config without restarting the container: curl -X POST http://localhost:9090/-/reload
  • Node Exporter mounts /proc, /sys, and the root filesystem because it needs them to collect host metrics. The :ro flag makes those mounts read-only, so the container cannot modify host files.
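  • cAdvisor runs with privileged: true so it can read cgroup and kernel data from the host. Some distributions work without it, but privileged mode is the setup most likely to work everywhere. Be aware that it grants the container broad access to the host.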
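
For the password in particular, one option is to generate a random value rather than invent one. The sketch below uses Python’s standard secrets module. The function name and the 24-byte length are arbitrary choices for illustration, not anything Grafana requires.

```python
# Sketch: generate a random Grafana admin password.
# generate_admin_password is a hypothetical helper name; 24 random bytes
# (32 URL-safe characters) is an arbitrary but comfortable length.
import secrets


def generate_admin_password(num_bytes=24):
    """Return a URL-safe random string suitable for an environment variable."""
    return secrets.token_urlsafe(num_bytes)


print(f"GF_SECURITY_ADMIN_PASSWORD={generate_admin_password()}")
```

The URL-safe alphabet contains only letters, digits, hyphens, and underscores, so the value never includes a $ character that Compose would try to interpolate.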

Prometheus Configuration

Create ~/monitoring/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

rule_files: []

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'homelab-host'

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
    metric_relabel_configs:
      - source_labels: [container_label_com_docker_compose_service]
        target_label: service

  - job_name: 'remote-node'
    static_configs:
      - targets: ['192.168.1.50:9100']
    relabel_configs:
      - source_labels: [__address__]
        replacement: 'nas-box'
        target_label: instance

The remote-node job demonstrates how to monitor additional hosts. Install Node Exporter on each remote machine, add its IP here, and reload Prometheus. Replace 192.168.1.50 with your actual NAS, Pi, or secondary VM IP — or remove that block entirely if you’re only monitoring one host to start.

Notice how service names are used as scrape targets (node-exporter:9100, cadvisor:8080) rather than IP addresses. Because all containers share the monitoring network, Docker’s internal DNS resolves these names automatically.
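
The relabel_configs blocks above rely on Prometheus’s default relabel action, replace. As a rough mental model, here is a simplified Python sketch of that action: join the source label values, match them against a fully anchored regex (the default is (.*)), and write the replacement into the target label. This is an illustration of the behavior, not Prometheus’s actual code, and it ignores the other relabel actions.

```python
# Simplified model of the Prometheus "replace" relabel action.
# Not Prometheus's implementation; an illustration of the default behavior.
import re


def relabel_replace(labels, source_labels, target_label, replacement,
                    regex="(.*)", separator=";"):
    """Return a new label dict with target_label rewritten if the regex matches."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # Prometheus anchors relabel regexes
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    # Translate Prometheus-style $1 / ${1} references into Python's \g<1>.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    updated = dict(labels)
    updated[target_label] = match.expand(template)
    return updated


target = {"__address__": "node-exporter:9100", "job": "node-exporter"}

# Equivalent of the node-exporter job above: a static replacement.
print(relabel_replace(target, ["__address__"], "instance", "homelab-host"))

# Variation (not in the config above): keep the host part, drop the port.
print(relabel_replace(target, ["__address__"], "instance", "$1",
                      regex=r"(.*):\d+"))
```

Because the default regex matches any address, the static replacement always wins, which is why every series from that job carries instance="homelab-host" instead of node-exporter:9100.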

Launch the Stack

cd ~/monitoring
docker compose up -d

Give it 30–60 seconds, then verify all targets are healthy:

# Check Prometheus targets via API
curl -s http://localhost:9090/api/v1/targets | \
  python3 -c "import json,sys; [print(t['labels']['job'], '-', t['health']) for t in json.load(sys.stdin)['data']['activeTargets']]"

Expected output (line order may vary, and a remote-node line also appears if you kept that job in prometheus.yml):

cadvisor - up
node-exporter - up
prometheus - up
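
If the one-liner gets unwieldy, the same logic reads more comfortably as a small script. The sketch below works on an already-parsed response. SAMPLE_RESPONSE is a hand-trimmed, hypothetical payload that keeps only the fields used here; a real /api/v1/targets response carries many more.

```python
# Sketch: summarize target health from a parsed /api/v1/targets response.
# SAMPLE_RESPONSE is a trimmed, hypothetical payload for illustration.
import json

SAMPLE_RESPONSE = json.loads("""
{"status": "success", "data": {"activeTargets": [
  {"labels": {"job": "prometheus", "instance": "localhost:9090"}, "health": "up"},
  {"labels": {"job": "node-exporter", "instance": "homelab-host"}, "health": "up"},
  {"labels": {"job": "cadvisor", "instance": "cadvisor:8080"}, "health": "up"},
  {"labels": {"job": "remote-node", "instance": "nas-box"}, "health": "down"}
]}}
""")


def summarize_targets(payload):
    """Return sorted (job, health) pairs for all active targets."""
    active = payload["data"]["activeTargets"]
    return sorted((t["labels"]["job"], t["health"]) for t in active)


def unhealthy_jobs(payload):
    """Return the jobs whose targets are not reporting 'up'."""
    return [job for job, health in summarize_targets(payload) if health != "up"]


for job, health in summarize_targets(SAMPLE_RESPONSE):
    print(job, "-", health)
print("needs attention:", unhealthy_jobs(SAMPLE_RESPONSE))
```

To run it against a live stack, load the payload from http://localhost:9090/api/v1/targets (for example with urllib.request and json.load) in place of the sample.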

If a target shows down, debug it like this:

# Can Prometheus reach Node Exporter?
docker exec prometheus wget -qO- http://node-exporter:9100/metrics | head -10

# Check Prometheus config for syntax errors
docker exec prometheus promtool check config /etc/prometheus/prometheus.yml

# View Prometheus logs
docker logs prometheus --tail 50

Setting Up Grafana

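Open http://YOUR_HOST_IP:3000 and log in with admin and the password you configured. Grafana only forces a password change when you log in with the default admin/admin credentials, so if you launched the stack with the placeholder password, change it now from your profile settings.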

Add Prometheus as a Data Source

  1. Navigate to Connections → Data sources → Add data source
  2. Select Prometheus
  3. Set the URL to http://prometheus:9090 — use the Docker service name, not localhost
  4. Leave scrape interval at 15s to match your Prometheus config
  5. Click Save & Test — you should see “Successfully queried the Prometheus API.”

If the connection fails, confirm Grafana is on the same Docker network as Prometheus. Both services have networks: - monitoring in the Compose file, so this shouldn’t be an issue unless you modified that.

Import Pre-Built Dashboards

Don’t build dashboards from scratch — the Grafana community has already done the work. Go to Dashboards → New → Import and import these by ID:

  • 1860 — Node Exporter Full. Comprehensive host metrics: CPU per-core breakdown, memory (including buffers/cache), disk I/O, network throughput, filesystem usage. This is the one dashboard you should import first.
  • 14282 — cAdvisor Exporter. Per-container CPU, memory, network, and block I/O via cAdvisor. Great for seeing which containers are eating your resources.
  • 3662 — Prometheus 2.0 Overview. Meta-monitoring: tracks Prometheus’s own memory, ingestion rate, query performance, and storage size.

After importing dashboard 1860, you’ll immediately see per-core CPU graphs, memory breakdown, filesystem usage per mount, and network I/O per interface — all populated with live data from your host. No panel configuration required.

PromQL Essentials

Once you’re comfortable with the pre-built dashboards, you’ll want to create custom panels for your specific setup. Prometheus Query Language (PromQL) is how you do that. Here are the queries I use most often:

# CPU usage percentage (5m average)
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Available memory in GB
node_memory_MemAvailable_bytes / 1024 / 1024 / 1024

# Disk usage percentage per mount point
(node_filesystem_size_bytes{fstype!="tmpfs"} - node_filesystem_free_bytes{fstype!="tmpfs"})
  / node_filesystem_size_bytes{fstype!="tmpfs"} * 100

# Network receive throughput in Mbit/s (excludes loopback)
rate(node_network_receive_bytes_total{device!="lo"}[5m]) * 8 / 1000 / 1000

# Container CPU usage as a percentage of one core (can exceed 100 on multi-core hosts)
rate(container_cpu_usage_seconds_total{name!=""}[5m]) * 100

# Container memory usage in MB
container_memory_usage_bytes{name!=""} / 1024 / 1024

# Disk I/O utilization percentage
rate(node_disk_io_time_seconds_total[5m]) * 100

The rate() function converts cumulative counters (monotonically increasing values like bytes received or CPU seconds) into per-second rates — this is fundamental to PromQL. The by(instance) clause aggregates results per host, which becomes important once you add multiple machines. Labels like {mode="idle"} and {device!="lo"} filter metrics the same way WHERE clauses work in SQL.
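
To see what rate() does with a counter, here is a simplified Python sketch: sum the increases between consecutive samples, treat any drop as a counter reset, and divide by the elapsed time. Prometheus’s real rate() also extrapolates to the edges of the range window, which this sketch deliberately leaves out, so treat it as an approximation.

```python
# Simplified model of PromQL's rate(): per-second increase of a counter.
# Handles counter resets; omits Prometheus's window-edge extrapolation.

def simple_rate(samples):
    """samples: list of (timestamp_seconds, counter_value) in time order."""
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset (for example, the exporter restarted):
            # assume it restarted from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Bytes received, scraped every 15 seconds (made-up numbers).
steady = [(0, 100), (15, 250), (30, 400)]
print(simple_rate(steady))  # 300 bytes over 30 s = 10.0 bytes/s

# The same counter with a restart between the second and third scrape.
with_reset = [(0, 100), (15, 160), (30, 20), (45, 80)]
print(simple_rate(with_reset))
```

This is also why graphing a raw counter such as node_network_receive_bytes_total is rarely useful on its own: the number only ever climbs, and the interesting signal is its slope.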

Alerting in Grafana

Grafana’s built-in alerting engine handles homelab use cases well without needing a separate Alertmanager deployment. Navigate to Alerting → Alert rules → New alert rule.

A practical disk space alert:

  • Rule name: Disk Usage Critical
  • Query A: (node_filesystem_size_bytes{fstype!="tmpfs"} - node_filesystem_free_bytes{fstype!="tmpfs"}) / node_filesystem_size_bytes{fstype!="tmpfs"} * 100
  • Condition: WHEN last() OF A IS ABOVE 85
  • Pending period: 5 minutes (avoids flapping alerts on temporary spikes)

Set up contact points under Alerting → Contact points. Grafana supports email, Slack, Discord, PagerDuty, and webhooks. For ntfy.sh integration, add a webhook contact point with your ntfy.sh topic URL — you’ll get push alerts on your phone when disk space, CPU, or memory crosses your thresholds.
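
The pending period is the part of the rule that people most often misread, so here is a small Python sketch of the idea: the condition has to stay true for the whole period before the alert fires, and a single healthy evaluation resets the clock. This is a simplified model written for illustration, not Grafana’s implementation. The state names and the 60-second evaluation spacing in the examples are assumptions.

```python
# Simplified model of an alert rule with a pending period.
# Not Grafana's implementation; state names are illustrative.

def alert_state(evaluations, threshold=85.0, pending_seconds=300):
    """evaluations: list of (timestamp_seconds, value) in time order.

    Returns 'Normal', 'Pending', or 'Firing' as of the last evaluation.
    """
    breach_started = None
    state = "Normal"
    for ts, value in evaluations:
        if value > threshold:
            if breach_started is None:
                breach_started = ts  # first evaluation above the threshold
            if ts - breach_started >= pending_seconds:
                state = "Firing"
            else:
                state = "Pending"
        else:
            breach_started = None  # one healthy evaluation resets the clock
            state = "Normal"
    return state


# Disk usage percentages evaluated once a minute (made-up numbers).
short_spike = [(0, 80), (60, 90), (120, 91), (180, 82)]
sustained = [(t, 90) for t in range(0, 301, 60)]

print(alert_state(short_spike))  # Normal: the spike ended before 5 minutes
print(alert_state(sustained))    # Firing: above 85 for the full 5 minutes
```

A longer pending period means fewer false alarms but a slower page, so for a disk that fills over days, five minutes costs almost nothing.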

Monitoring Additional Hosts

Installing Node Exporter on a remote host — a NAS, a Raspberry Pi, a second VM — is straightforward:

# On the remote host, run Node Exporter in host network mode
docker run -d \
  --name node-exporter \
  --restart unless-stopped \
  --net host \
  --pid host \
  -v "/:/host:ro,rslave" \
  prom/node-exporter:v1.8.1 \
  --path.rootfs=/host

Add the remote host’s IP to your prometheus.yml and reload:

curl -X POST http://localhost:9090/-/reload

The new host shows up in your dashboards after the next successful scrape, usually within 15–30 seconds. Use the instance dropdown to switch between hosts. If you have several machines to configure, this is an ideal use case for Ansible playbooks: write a single role that installs and starts Node Exporter on any host in your inventory, then run it against your entire homelab at once.
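
Once the list of hosts grows, hand-editing one static_configs block per machine gets tedious. One alternative, not used in the configuration above, is Prometheus’s file-based service discovery (file_sd_configs), which reads target lists from JSON files. The Python sketch below only generates such a file’s contents. The host names and the second IP address are hypothetical, and wiring the file in still requires adding a file_sd_configs entry to a scrape job and mounting the file into the Prometheus container.

```python
# Sketch: build a target list in the JSON shape that Prometheus's
# file-based service discovery reads. Host names and IPs are hypothetical.
import json


def build_file_sd(hosts, port=9100):
    """hosts maps a friendly name to an IP or hostname.

    Returns one entry per host, with the friendly name as the instance label.
    """
    return [
        {"targets": [f"{address}:{port}"], "labels": {"instance": name}}
        for name, address in sorted(hosts.items())
    ]


HOSTS = {
    "nas-box": "192.168.1.50",
    "pi-5": "192.168.1.51",  # hypothetical second host
}

print(json.dumps(build_file_sd(HOSTS), indent=2))
```

Setting the instance label per target does the same job as the relabel rule in the remote-node block, without a separate job for every machine.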

Storage Planning

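Prometheus uses an efficient compressed time-series format. Rough estimates at 15-second scrape intervals, assuming a lean set of metrics per host (actual usage grows with the number of series each exporter exposes):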

  • ~2–3 MB per day per monitored host
  • ~60–90 MB per host over 30 days
  • 5 hosts at 30-day retention: roughly 300–450 MB total

That’s negligible on any modern disk. Storage scales linearly with the number of time series (unique metric + label combinations), scrape frequency, and retention period. Monitor Prometheus’s own footprint via the Prometheus meta-monitoring dashboard (ID 3662), or check directly:

docker exec prometheus du -sh /prometheus

If you need to reduce storage, either shorten retention (--storage.tsdb.retention.time=15d) or drop high-cardinality metrics at scrape time using metric_relabel_configs with action: drop.
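
For a back-of-the-envelope check, the Prometheus documentation gives a sizing rule: needed disk space is roughly retention time in seconds × ingested samples per second × bytes per sample, with samples averaging around 1–2 bytes each. The Python sketch below applies that rule. The series count is an assumption you must supply; you can read the real number from the prometheus_tsdb_head_series metric. The 1.5 bytes per sample is simply the midpoint of the documented range.

```python
# Sketch: estimate Prometheus disk usage from the documented sizing rule
#   disk = retention_seconds * samples_per_second * bytes_per_sample
# active_series is an assumption; check prometheus_tsdb_head_series for yours.

def estimate_disk_bytes(active_series, scrape_interval_s=15,
                        retention_days=30, bytes_per_sample=1.5):
    """Return the estimated on-disk size in bytes."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample


for series in (300, 1000, 5000):
    size_mb = estimate_disk_bytes(series) / 1_000_000
    print(f"{series:>5} series -> about {size_mb:.0f} MB over 30 days")
```

At 1,000 active series the rule gives roughly 259 MB for 30 days, so a host that exposes many series can land well above the per-host figures listed earlier. Measuring with du, as shown above, beats estimating.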

What’s Next

Reverse proxy Grafana: Rather than exposing port 3000 directly, put Grafana behind Nginx or Traefik with a clean local hostname like grafana.home.lab. Traefik integrates natively with Docker Compose via labels.

Grafana provisioning: You can define data sources and dashboards as YAML files mounted into the Grafana container. This makes your setup fully reproducible — commit the provisioning files to git and you can rebuild the entire monitoring stack from scratch in minutes.

Blackbox Exporter: Probes HTTP endpoints, TCP ports, ICMP, and DNS from within your network. Pairs well with Uptime Kuma for external monitoring — Blackbox handles internal endpoint health checks with detailed latency histograms in Grafana.

SNMP Exporter: If you have managed switches or a NAS that speaks SNMP, the Prometheus SNMP Exporter pulls interface stats, CPU load, and hardware health into your dashboards. Configure it with the snmp.yml generator for your specific devices.

Wrapping Up

The Grafana + Prometheus stack is one of those homelab tools that, once running, you’ll wonder how you managed without it. The ability to correlate a slow Plex stream with a spike in disk I/O, or catch a container silently consuming 12 GB of RAM before it OOM-kills something important — that visibility pays for itself in the first week.

The whole stack comes up in under two minutes with a single docker compose up -d, uses roughly 300–500 MB of RAM under typical homelab load, and scales gracefully as you add more hosts and services. Start with the pre-built dashboards, spend an hour learning PromQL, then build custom panels for whatever matters most in your environment. Once alerting is configured and tested, you’ve got a genuinely professional monitoring setup running entirely on hardware you already own.
