AI-Driven Monitoring Pipeline

Overview

Every hour, a pipeline wakes up, polls four data sources across the homelab, runs a local LLM over the results, and drops a plain-English health report into a dashboard at status.nbkelley.com. No cloud, no subscriptions — just a workflow and a model running on hardware I already had.

Why I Built It

I had Prometheus and Uptime Kuma running and collecting plenty of data, but I was never actually reading it. The dashboards were great if I went looking, but I wanted something that would proactively tell me “hey, two services are degraded and your gateway is running hot” without me having to know to check.

The appeal of using a local LLM for this — rather than alerting rules or threshold triggers — was that I could describe what I wanted in plain English and get an answer in plain English. Threshold alerts are brittle; you have to know in advance what to look for. A language model can notice that three different metrics are trending in the same direction and just say so. It turned into a genuinely fun architecture problem.

How It Works

The pipeline is orchestrated by n8n on an hourly schedule. It collects data from four sources in parallel, runs a local Ollama model over each, and synthesizes the results into a single status report.

Component: Role
Prometheus: 7 per-host metrics (CPU, memory, disk, load, network RX/TX, up/down) across all VMs and LXCs
Uptime Kuma: 17 named service monitors, each with HTTP uptime %, current status, and last ping latency
UniFi UCG Express: gateway stats, including WAN latency, packet drops, client counts, CPU temp, and per-VLAN traffic
Synology NAS: disk health, volume utilization, CPU/memory load, UPS charge and runtime
n8n: orchestrates the full workflow on an hourly schedule trigger
Ollama (gemma4:e4b): 5 inference calls total, 4 per-source summaries + 1 final synthesis
Postgres: stores analysis results and per-hour raw snapshots for delta computation
status.nbkelley.com: Node.js/Express dashboard that reads from Postgres and displays the reports

All four data sources are collected simultaneously, so the collection phase takes as long as the slowest one — not all four added together.
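The parallel collection phase can be sketched with async tasks. This is an illustrative stand-in, not the actual n8n workflow: the fetcher function and the delays are hypothetical placeholders for the real HTTP calls to Prometheus, Uptime Kuma, the UniFi gateway, and the Synology NAS.

```python
import asyncio

# Hypothetical per-source fetcher; in the real pipeline each of these
# would be an HTTP call to the corresponding API.
async def fetch(source: str, delay: float) -> dict:
    await asyncio.sleep(delay)  # stand-in for the network round trip
    return {"source": source, "ok": True}

async def collect_all() -> list[dict]:
    # All four fetches run concurrently, so total wall time is bounded
    # by the slowest source, not the sum of all four.
    return await asyncio.gather(
        fetch("prometheus", 0.02),
        fetch("uptime_kuma", 0.01),
        fetch("unifi", 0.03),
        fetch("synology", 0.01),
    )

results = asyncio.run(collect_all())
```

The same shape applies whatever the orchestrator is: fan out, wait on everything, then hand the combined results to the summarization stage.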

The two-section prompt trick

The most interesting part of the design is how the individual source summaries feed into the final synthesis. Each of the four LLM calls is prompted to produce exactly two sections: a clean human-readable summary, and a “context for next run” block of concise bullet metrics. The summaries go straight to the dashboard. The context bullets feed the final synthesis call.

This keeps the synthesis prompt lean — it sees just the key signals from each source rather than having to re-read and re-summarize full paragraphs. The final call has one job: write a 2-3 sentence overall status based on what the four sources flagged as interesting.
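A minimal sketch of the split, assuming the model is instructed to emit two labeled sections and the orchestrator separates them with a plain string search. The prompt wording, section headers, and `split_sections` helper here are all hypothetical, not the author's actual prompts.

```python
# Hypothetical per-source prompt asking for exactly two labeled sections.
PROMPT = """Summarize these metrics for a human reader.
Produce exactly two sections:

## Summary
(2-3 readable sentences for the dashboard)

## Context for next run
(terse bullet metrics for the synthesis step)
"""

def split_sections(model_output: str) -> tuple[str, str]:
    """Split an LLM response into (dashboard summary, context bullets)."""
    marker = "## Context for next run"
    head, _, tail = model_output.partition(marker)
    summary = head.replace("## Summary", "").strip()
    context = tail.strip()
    return summary, context

# Canned response standing in for a live Ollama call:
fake = "## Summary\nAll hosts healthy.\n\n## Context for next run\n- cpu: 12%\n- disk: 48%"
summary, context = split_sections(fake)
```

The summary string goes to the dashboard as-is; only the short context bullets from each of the four calls are concatenated into the final synthesis prompt.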

Delta computation for UniFi

The gateway reports WAN metrics (packet drops, bytes transferred) as cumulative totals since last reboot. So the raw answer to “how many packet drops?” is something like 14,823 — which is meaningless without knowing the time window. The pipeline stores a raw snapshot to Postgres every run and diffs it against the previous one, so the model sees “3 drops this hour” instead of a since-boot total that grows forever.

Challenges

Result

It’s been running every hour since mid-April and it works. The dashboard shows a rolling view of recent hourly snapshots, each with a generated summary and per-source breakdowns. Glancing at it actually tells me something.

The funniest thing it’s consistently surfaced: the gateway CPU runs at ~71°C in every single report, no exceptions. That’s within spec for the hardware, but it’s become a kind of mascot data point — the one thing I can always count on the pipeline to mention.