AI-Driven Monitoring Pipeline
Overview
Every hour, a pipeline wakes up, polls four data sources across the homelab, runs a local LLM over the results, and drops a plain-English health report into a dashboard at status.nbkelley.com. No cloud, no subscriptions — just a workflow and a model running on hardware I already had.
Why I Built It
I had Prometheus and Uptime Kuma running and collecting plenty of data, but I was never actually reading it. The dashboards were great if I went looking, but I wanted something that would proactively tell me “hey, two services are degraded and your gateway is running hot” without me having to know to check.
The appeal of using a local LLM for this — rather than alerting rules or threshold triggers — was that I could describe what I wanted in plain English and get an answer in plain English. Threshold alerts are brittle; you have to know in advance what to look for. A language model can notice that three different metrics are trending in the same direction and just say so. It turned into a genuinely fun architecture problem.
How It Works
The pipeline is orchestrated by n8n on an hourly schedule. It collects data from four sources in parallel, runs a local Ollama model over each, and synthesizes the results into a single status report.
| Component | Role |
|---|---|
| Prometheus | 7 per-host metrics (CPU, memory, disk, load, network RX/TX, up/down) across all VMs and LXCs |
| Uptime Kuma | 17 named service monitors — HTTP uptime %, current status, last ping latency |
| UniFi UCG Express | Gateway stats: WAN latency, packet drops, client counts, CPU temp, per-VLAN traffic |
| Synology NAS | Disk health, volume utilization, CPU/memory load, UPS charge and runtime |
| n8n | Orchestrates the full workflow on an hourly schedule trigger |
| Ollama (gemma3n:e4b) | 5 inference calls per run: 4 per-source summaries + 1 final synthesis |
| Postgres | Stores analysis results and per-hour raw snapshots for delta computation |
| status.nbkelley.com | Node.js/Express dashboard that reads from Postgres and displays the reports |
All four data sources are collected simultaneously, so the collection phase takes as long as the slowest one — not all four added together.
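The fan-out is done with parallel n8n branches, but the shape of it can be sketched in a few lines. This is a minimal stand-in, assuming four hypothetical fetch functions with stubbed latencies (the function names and timings are mine, not from the real workflow); the point is that total wall time tracks the slowest fetch, not the sum.

```python
# Sketch of the fan-out collection phase. The sleep() calls stand in for
# HTTP round trips to each source; the real pipeline uses n8n branches.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_prometheus():
    time.sleep(0.3)
    return {"source": "prometheus", "hosts": 9}

def fetch_uptime_kuma():
    time.sleep(0.2)
    return {"source": "uptime_kuma", "monitors": 17}

def fetch_unifi():
    time.sleep(0.5)  # slowest source dominates total collection time
    return {"source": "unifi", "wan_drops_total": 14823}

def fetch_synology():
    time.sleep(0.1)
    return {"source": "synology", "volume_used_pct": 62}

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda f: f(),
                            [fetch_prometheus, fetch_uptime_kuma,
                             fetch_unifi, fetch_synology]))
elapsed = time.monotonic() - start
# elapsed is roughly 0.5 s (the slowest fetch), not 1.1 s (the sum)
```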
The two-section prompt trick
The most interesting part of the design is how the individual source summaries feed into the final synthesis. Each of the four LLM calls is prompted to produce exactly two sections: a clean human-readable summary, and a “context for next run” block of concise bullet metrics. The summaries go straight to the dashboard. The context bullets feed the final synthesis call.
This keeps the synthesis prompt lean — it sees just the key signals from each source rather than having to re-read and re-summarize full paragraphs. The final call has one job: write a 2-3 sentence overall status based on what the four sources flagged as interesting.
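The contract can be sketched as a small parser. The section headers and helper below are my invention, assuming each per-source call is instructed to emit exactly those two headers; the real prompts may use different markers.

```python
# Sketch of the two-section contract: split a per-source response into a
# dashboard summary and a "context for next run" block, then feed only
# the context blocks into the final synthesis prompt.
def split_sections(response: str) -> tuple[str, str]:
    """Return (human summary, concise context bullets) from one LLM call."""
    summary, _, context = response.partition("## Context for next run")
    return summary.replace("## Summary", "").strip(), context.strip()

raw = """## Summary
All 17 monitors are up; median latency is stable.

## Context for next run
- uptime: 17/17 up
- latency: stable"""

summary, context = split_sections(raw)
# summary goes straight to the dashboard; context feeds synthesis.

# The four context blocks become one lean synthesis prompt:
synthesis_prompt = (
    "Write a 2-3 sentence overall status from these signals:\n" + context
)
```

This is why the synthesis call stays cheap: it reads bullets, not four full paragraphs.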
Delta computation for UniFi
The gateway reports WAN metrics (packet drops, bytes transferred) as cumulative totals since last reboot. So the raw answer to “how many packet drops?” is something like 14,823 — which is meaningless without knowing the time window. The pipeline stores a raw snapshot to Postgres every run and diffs it against the previous one, so the model sees “3 drops this hour” instead of a since-boot total that grows forever.
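The diff itself is simple once the snapshot is stored. A minimal sketch, assuming flat dicts of cumulative counters; the reset guard (a counter going backwards means the gateway rebooted, so fall back to the raw value for that run) is my assumption, not something the pipeline is confirmed to do.

```python
# Sketch of the snapshot diff. The real pipeline persists the previous
# snapshot in Postgres and diffs it against the current one each hour.
def compute_deltas(current: dict, previous: dict) -> dict:
    if previous is None:
        return {}  # first run: nothing to diff against
    deltas = {}
    for key, value in current.items():
        prev = previous.get(key)
        if prev is None:
            continue
        # Reset guard: a counter below its previous value implies a reboot.
        deltas[key] = value - prev if value >= prev else value
    return deltas

prev_snapshot = {"wan_drops": 14820, "wan_tx_bytes": 9_000_000}
curr_snapshot = {"wan_drops": 14823, "wan_tx_bytes": 9_350_000}
deltas = compute_deltas(curr_snapshot, prev_snapshot)
# deltas["wan_drops"] == 3: the model sees the per-hour change,
# not the since-boot total.
```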
Challenges
- UniFi’s cumulative counters. I spent a session confused about why the drop count was consistently in the tens of thousands before realizing the API just doesn’t give you rates. Had to design a snapshot table and build the diff logic myself.
- n8n’s silent input merging. When multiple nodes wire into the same downstream node, n8n silently passes only one input — the others are dropped. This is non-obvious behavior that cost some debugging time. The fix is an explicit Merge (Combine) node, but you have to know to add it.
- Prompt structure took iteration. The two-section format (summary / context) was the key to keeping the synthesis call clean, but I arrived at it after a few approaches that produced either too much redundancy or too little context in the final call.
- Apostrophes in AI output. Raw SQL INSERTs do not survive unescaped single quotes in the values. Learned this the hard way when the first few runs failed silently in the reshape step.
- Runtime. The pipeline takes about 13 minutes end-to-end on a CPU-only machine (Ollama on an i7 Pavilion, no usable GPU). That’s acceptable for an hourly job, and a Mac Studio M1 Max recently joined the fleet — runtime should drop substantially once inference moves there.
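The apostrophe bug disappears entirely with parameterized queries, since the driver handles quoting. A minimal sketch using stdlib sqlite3 as a stand-in for Postgres (the table name and columns are illustrative, not the real schema):

```python
# Parameter placeholders make apostrophes in LLM output safe to store;
# no string escaping, no silently broken INSERTs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (run_at TEXT, summary TEXT)")

summary = "The gateway's CPU is at 71C; it's within spec."  # apostrophes intact
conn.execute("INSERT INTO reports VALUES (?, ?)",
             ("2025-04-20T12:00", summary))

(stored,) = conn.execute("SELECT summary FROM reports").fetchone()
# stored round-trips byte-for-byte, quotes and all.
```

The same idea applies to Postgres; only the placeholder syntax differs (`%s` with psycopg, `$1` with node-postgres).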
Result
It’s been running every hour since mid-April and it works. The dashboard shows a rolling view of recent hourly snapshots, each with a generated summary and per-source breakdowns. Glancing at it actually tells me something.
The funniest thing it’s consistently surfaced: the gateway CPU runs at ~71°C in every single report, no exceptions. That’s within spec for the hardware, but it’s become a kind of mascot data point — the one thing I can always count on the pipeline to mention.