AI-Driven Monitoring Pipeline
Overview
Every hour, a pipeline wakes up, polls four data sources across the homelab, runs a local LLM over the results, and drops a plain-English health report into a dashboard at status.nbkelley.com. No cloud, no subscriptions — just a workflow and a model running on hardware I already had.
Why I Built It
I had Prometheus and Uptime Kuma running and collecting plenty of data, but I was never actually reading it. The dashboards were great if I went looking, but I wanted something that would proactively tell me “hey, two services are degraded and your gateway is running hot” without me having to know to check.
The appeal of using a local LLM for this — rather than alerting rules or threshold triggers — was that I could describe what I wanted in plain English and get an answer in plain English. Threshold alerts are brittle; you have to know in advance what to look for. A language model can notice that three different metrics are trending in the same direction and just say so. It turned into a genuinely fun architecture problem.
How It Works
The pipeline is orchestrated by n8n on an hourly schedule. It collects data from four sources in parallel, runs a local Ollama model over each, and synthesizes the results into a single status report.
| Component | Role |
|---|---|
| Prometheus | 7 per-host metrics (CPU, memory, disk, load, network RX/TX, up/down) across all VMs and LXCs |
| Uptime Kuma | 17 named service monitors — HTTP uptime %, current status, last ping latency |
| UniFi UCG Express | Gateway stats: WAN latency, packet drops, client counts, CPU temp, per-VLAN traffic |
| Synology NAS | Disk health, volume utilization, CPU/memory load, UPS charge and runtime |
| n8n | Orchestrates the full workflow on an hourly schedule trigger |
| Ollama (gemma3n:e4b) | 5 inference calls per run: 4 per-source summaries + 1 final synthesis |
| Postgres | Stores analysis results and per-hour raw snapshots for delta computation |
| status.nbkelley.com | Node.js/Express dashboard that reads from Postgres and displays the reports |
All four data sources are collected simultaneously, so the collection phase takes as long as the slowest one — not all four added together.
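The fan-out is done with parallel n8n branches, but the shape of it can be sketched in a few lines. This is a minimal stand-in, assuming four hypothetical fetch functions with stubbed latencies (the function names and timings are mine, not from the real workflow); the point is that total wall time tracks the slowest fetch, not the sum.

```python
# Sketch of the fan-out collection phase. The sleep() calls stand in for
# HTTP round trips to each source; the real pipeline uses n8n branches.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_prometheus():
    time.sleep(0.3)
    return {"source": "prometheus", "hosts": 9}

def fetch_uptime_kuma():
    time.sleep(0.2)
    return {"source": "uptime_kuma", "monitors": 17}

def fetch_unifi():
    time.sleep(0.5)  # slowest source dominates total collection time
    return {"source": "unifi", "wan_drops_total": 14823}

def fetch_synology():
    time.sleep(0.1)
    return {"source": "synology", "volume_used_pct": 62}

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda f: f(),
                            [fetch_prometheus, fetch_uptime_kuma,
                             fetch_unifi, fetch_synology]))
elapsed = time.monotonic() - start
# elapsed is roughly 0.5 s (the slowest fetch), not 1.1 s (the sum)
```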
The two-section prompt trick
The most interesting part of the design is how the individual source summaries feed into the final synthesis. Each of the four LLM calls is prompted to produce exactly two sections: a clean human-readable summary, and a “context for next run” block of concise bullet metrics. The summaries go straight to the dashboard. The context bullets feed the final synthesis call.
This keeps the synthesis prompt lean — it sees just the key signals from each source rather than having to re-read and re-summarize full paragraphs. The final call has one job: write a 2-3 sentence overall status based on what the four sources flagged as interesting.
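The contract can be sketched as a small parser. The section headers and helper below are my invention, assuming each per-source call is instructed to emit exactly those two headers; the real prompts may use different markers.

```python
# Sketch of the two-section contract: split a per-source response into a
# dashboard summary and a "context for next run" block, then feed only
# the context blocks into the final synthesis prompt.
def split_sections(response: str) -> tuple[str, str]:
    """Return (human summary, concise context bullets) from one LLM call."""
    summary, _, context = response.partition("## Context for next run")
    return summary.replace("## Summary", "").strip(), context.strip()

raw = """## Summary
All 17 monitors are up; median latency is stable.

## Context for next run
- uptime: 17/17 up
- latency: stable"""

summary, context = split_sections(raw)
# summary goes straight to the dashboard; context feeds synthesis.

# The four context blocks become one lean synthesis prompt:
synthesis_prompt = (
    "Write a 2-3 sentence overall status from these signals:\n" + context
)
```

This is why the synthesis call stays cheap: it reads bullets, not four full paragraphs.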
Delta computation for UniFi
The gateway reports WAN metrics (packet drops, bytes transferred) as cumulative totals since last reboot. So the raw answer to “how many packet drops?” is something like 14,823 — which is meaningless without knowing the time window. The pipeline stores a raw snapshot to Postgres every run and diffs it against the previous one, so the model sees “3 drops this hour” instead of a since-boot total that grows forever.
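The diff itself is simple once the snapshot is stored. A minimal sketch, assuming flat dicts of cumulative counters; the reset guard (a counter going backwards means the gateway rebooted, so fall back to the raw value for that run) is my assumption, not something the pipeline is confirmed to do.

```python
# Sketch of the snapshot diff. The real pipeline persists the previous
# snapshot in Postgres and diffs it against the current one each hour.
def compute_deltas(current: dict, previous: dict) -> dict:
    if previous is None:
        return {}  # first run: nothing to diff against
    deltas = {}
    for key, value in current.items():
        prev = previous.get(key)
        if prev is None:
            continue
        # Reset guard: a counter below its previous value implies a reboot.
        deltas[key] = value - prev if value >= prev else value
    return deltas

prev_snapshot = {"wan_drops": 14820, "wan_tx_bytes": 9_000_000}
curr_snapshot = {"wan_drops": 14823, "wan_tx_bytes": 9_350_000}
deltas = compute_deltas(curr_snapshot, prev_snapshot)
# deltas["wan_drops"] == 3: the model sees the per-hour change,
# not the since-boot total.
```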
Challenges
- UniFi’s cumulative counters. I spent a session confused about why the drop count was consistently in the tens of thousands before realizing the API just doesn’t give you rates. Had to design a snapshot table and build the diff logic myself.
- n8n’s silent input merging. When multiple nodes wire into the same downstream node, n8n silently passes only one input — the others are dropped. This is non-obvious behavior that cost some debugging time. The fix is an explicit Merge (Combine) node, but you have to know to add it.
- Prompt structure took iteration. The two-section format (summary / context) was the key to keeping the synthesis call clean, but I arrived at it after a few approaches that produced either too much redundancy or too little context in the final call.
- Apostrophes in AI output. Raw SQL INSERTs do not survive unescaped single quotes in the values. Learned this the hard way when the first few runs failed silently in the reshape step.
- Runtime. The pipeline takes about 13 minutes end-to-end on a CPU-only machine (Ollama on an i7 Pavilion, no usable GPU). That’s acceptable for an hourly job, and a Mac Studio M1 Max recently joined the fleet — runtime should drop substantially once inference moves there.
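The apostrophe bug disappears entirely with parameterized queries, since the driver handles quoting. A minimal sketch using stdlib sqlite3 as a stand-in for Postgres (the table name and columns are illustrative, not the real schema):

```python
# Parameter placeholders make apostrophes in LLM output safe to store;
# no string escaping, no silently broken INSERTs.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (run_at TEXT, summary TEXT)")

summary = "The gateway's CPU is at 71C; it's within spec."  # apostrophes intact
conn.execute("INSERT INTO reports VALUES (?, ?)",
             ("2025-04-20T12:00", summary))

(stored,) = conn.execute("SELECT summary FROM reports").fetchone()
# stored round-trips byte-for-byte, quotes and all.
```

The same idea applies to Postgres; only the placeholder syntax differs (`%s` with psycopg, `$1` with node-postgres).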
Result
It’s been running every hour since mid-April and it works. The dashboard shows a rolling view of recent hourly snapshots, each with a generated summary and per-source breakdowns. Glancing at it actually tells me something.
The funniest thing it’s consistently surfaced: the gateway CPU runs at ~71°C in every single report, no exceptions. That’s within spec for the hardware, but it’s become a kind of mascot data point — the one thing I can always count on the pipeline to mention.