Building Log Patrol: AI-Assisted Log Monitoring That Actually Reduces Alert Fatigue

The Problem With Naive Log Alerting

Every production system eventually develops the same problem: alerting that cries wolf. The standard pattern is simple: watch for ERROR or CRITICAL level log events, fire an alert, open a ticket. In a quiet system, this works. In any real system with meaningful traffic, it produces a constant stream of low-signal notifications that teams learn to tune out. Once your engineers stop reading alerts, you've lost the entire value of the monitoring system.

I hit this problem while running a personal Loki instance that aggregates logs from several services. The noise-to-signal ratio was bad enough that I started ignoring entire log streams. That's worse than no alerting, because it creates false confidence.

I built Log Patrol to solve this. The design requirement was simple to state and hard to implement: open a GitLab issue only when something genuinely warrants human attention, and stay quiet the rest of the time.

Three Layers, Three Failure Modes

The core insight is that "is this log event incident-worthy?" is a question that can't be answered well by any single technique. I ended up with three complementary layers, each designed to catch what the others miss.

Layer 1: Deterministic Rules (Fast Path)

Some events should always produce an incident: an out-of-memory kill, a database connection failure, an expired TLS certificate. These are unambiguous. For these, I use deterministic pattern matching: if the log line matches one of a small set of regexes, it's an incident. No ML, no inference overhead, no ambiguity about why it fired. The fast path runs first and short-circuits evaluation for clear-cut cases.

The key discipline here is keeping the fast path narrow. If you add too many patterns, you're just rebuilding the naive approach with extra syntax. I only put a rule in the fast path if I'd be comfortable calling it a pager-worthy event at 3 AM.
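
A minimal sketch of what a narrow fast path can look like. The rule names and regexes below are hypothetical illustrations, not Log Patrol's actual rule set; a match returns the rule name, and anything else falls through to the slow path.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FastPathRule:
    """A hypothetical pager-worthy pattern: a match is reported as an incident."""

    name: str
    pattern: re.Pattern[str]


# Illustrative rules only; these are not Log Patrol's actual patterns.
FAST_PATH_RULES: tuple[FastPathRule, ...] = (
    FastPathRule("oom_kill", re.compile(r"out of memory: killed process \d+|oom-kill", re.IGNORECASE)),
    FastPathRule("db_connection_failure", re.compile(r"could not connect to (?:server|database)", re.IGNORECASE)),
    FastPathRule("tls_certificate_expired", re.compile(r"certificate (?:has )?expired", re.IGNORECASE)),
)


def match_fast_path(line: str) -> str | None:
    """Return the first matching rule name, or None to fall through to the slow path."""
    for rule in FAST_PATH_RULES:
        if rule.pattern.search(line):
            return rule.name
    return None
```

Keeping the rules as data in one tuple makes the "would I page on this at 3 AM?" review easy: every entry is visible in one place.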

Layer 2: Drain3 Log Template Mining (Slow Path)

The harder problem is the log events that aren't obvious: novel patterns, unusual frequencies, errors that only appear after a specific sequence of operations. For these I use Drain3, a log template mining library that clusters log messages into structured templates by stripping variable parts (IP addresses, request IDs, timestamps) and extracting the invariant pattern.
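
Drain3 itself clusters lines with a fixed-depth parse tree and token similarity, and it supports configurable regex masking of variable tokens. The sketch below is not Drain3's algorithm; it only illustrates the masking idea in plain Python, with regexes that are my own assumptions rather than Drain3 defaults or Log Patrol's configuration.

```python
from __future__ import annotations

import re

# Illustrative masks applied in order. Timestamps and UUIDs go first so the
# generic number mask does not chop them into pieces.
MASKS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"), "<TS>"),
    (re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable tokens with placeholders, leaving the invariant text."""
    template = line
    for pattern, placeholder in MASKS:
        template = pattern.sub(placeholder, template)
    return template
```

Two lines that differ only in variable parts, such as `user 42 logged in from 192.168.1.10` and `user 7 logged in from 10.0.0.1`, both reduce to `user <NUM> logged in from <IP>`.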

Drain3 runs on the slow path, processing the log stream in the background. When it encounters a template it hasn't seen before, or one that suddenly increases in frequency, that's a candidate for investigation. The value of template mining over raw pattern matching is that it generalizes: it can surface an error message format it has never seen before, not just the ones I thought to write rules for.
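
The post doesn't say how Log Patrol measures "suddenly increases in frequency", so here is one possible approach, assuming templates are counted per patrol loop. The `TemplateTracker` class, the exponentially weighted baseline, and all thresholds are hypothetical choices for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TemplateTracker:
    """Hypothetical tracker: flags unseen templates and per-window frequency spikes."""

    spike_factor: float = 5.0  # assumed: flag when count >= factor * baseline
    min_count: int = 10        # assumed: ignore spikes with tiny absolute counts
    alpha: float = 0.2         # EWMA weight given to the newest window
    baseline: dict[str, float] = field(default_factory=dict)

    def observe_window(self, templates: list[str]) -> list[tuple[str, str]]:
        """Return (template, reason) candidates for one patrol window."""
        counts = Counter(templates)
        candidates: list[tuple[str, str]] = []
        for template, count in counts.items():
            if template not in self.baseline:
                candidates.append((template, "new_template"))
            elif count >= self.min_count and count >= self.spike_factor * self.baseline[template]:
                candidates.append((template, "frequency_spike"))
        # Update every baseline, decaying templates that were absent this window.
        for template in set(self.baseline) | set(counts):
            previous = self.baseline.get(template, float(counts[template]))
            self.baseline[template] = (1 - self.alpha) * previous + self.alpha * counts[template]
        return candidates
```

The `min_count` floor matters: without it, a template that normally appears once per loop would be flagged the first time it appears five times.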

Layer 3: LLM Sentiment Gate

Both layers above produce candidates, not verdicts. Before Log Patrol opens or updates a GitLab issue, it runs the candidate finding through a locally-hosted Ollama model with a structured prompt asking a single question: is this finding genuinely incident-worthy, or is it noise?

The LLM gate exists to handle the gap between "structurally unusual" and "actually a problem." Drain3 might flag a new log template that looks alarming in isolation but represents expected behavior during a maintenance window. A deterministic rule might fire on a log line that matches the pattern but appears in a context where it's harmless. The LLM has enough contextual understanding to make that distinction with reasonable accuracy.

Running Ollama locally was a deliberate choice. Log data is sensitive and I didn't want to route it through an external API. The latency tradeoff is acceptable because this path is asynchronous; nothing is waiting on the LLM to return before the next patrol loop runs.
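
A sketch of a gate built on Ollama's local HTTP API: a non-streaming `POST /api/generate` on the default port 11434, with `"format": "json"` to request JSON output; the model's text comes back in the `response` field. The prompt wording, the `incident_worthy` schema, and the decision to treat malformed answers as incident-worthy are my assumptions, not Log Patrol's published behavior.

```python
from __future__ import annotations

import json
import urllib.request

# Ollama's default local endpoint; adjust host/port for a Compose service name.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt: Log Patrol's real wording is not published in this post.
PROMPT_TEMPLATE = """You triage log findings for an on-call engineer.
Reply with JSON only: {{"incident_worthy": true or false, "reason": "<one sentence>"}}

Service: {service}
Finding: {kind}
Log template: {template}
Sample lines:
{samples}
"""


def build_prompt(service: str, kind: str, template: str, samples: list[str]) -> str:
    """Fill the prompt, capping samples so the context stays small."""
    return PROMPT_TEMPLATE.format(service=service, kind=kind, template=template, samples="\n".join(samples[:5]))


def parse_verdict(raw: str) -> bool:
    """Err toward alerting: anything but an explicit JSON boolean counts as incident-worthy."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return True
    verdict = data.get("incident_worthy") if isinstance(data, dict) else None
    return verdict if isinstance(verdict, bool) else True


def ask_ollama(prompt: str, model: str, timeout_s: float = 120.0) -> bool:
    """Send one non-streaming request; Ollama returns the model text in "response"."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False, "format": "json"}).encode("utf-8")
    request = urllib.request.Request(OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.load(response)
    return parse_verdict(payload.get("response", ""))
```

Separating `parse_verdict` from the network call keeps the failure policy unit-testable without a running model, and a misbehaving model cannot silently suppress findings.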

Deduplication and State Management

A recurring incident should produce one issue, not one per patrol loop. Log Patrol fingerprints every finding (a hash of the log template, source service, and error class) and persists patrol state in SQLite. If a finding's fingerprint already maps to an open issue, it updates that issue rather than creating a new one. If a fingerprint is absent for several consecutive patrol loops, the associated issue is automatically closed.
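
A minimal sketch of the fingerprint-and-reconcile cycle using the standard library's `sqlite3`. The table layout, the `reconcile` and `record_issue` helpers, the three-loop close threshold, and forgetting a fingerprint once its issue closes (rather than reopening it later) are all hypothetical choices; the GitLab calls themselves are left to the caller.

```python
from __future__ import annotations

import hashlib
import sqlite3

# Assumed value for "several consecutive patrol loops"; the post does not give one.
CLOSE_AFTER_MISSED_LOOPS = 3

SCHEMA = """
CREATE TABLE IF NOT EXISTS findings (
    fingerprint TEXT PRIMARY KEY,
    issue_iid INTEGER NOT NULL,
    missed_loops INTEGER NOT NULL DEFAULT 0
)
"""


def fingerprint(template: str, service: str, error_class: str) -> str:
    """Hash the identifying fields; the separator keeps field boundaries unambiguous."""
    key = "\x1f".join((template, service, error_class))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def record_issue(conn: sqlite3.Connection, fp: str, issue_iid: int) -> None:
    """Remember which GitLab issue (by project-level IID) tracks this fingerprint."""
    with conn:
        conn.execute("INSERT INTO findings (fingerprint, issue_iid) VALUES (?, ?)", (fp, issue_iid))


def reconcile(conn: sqlite3.Connection, seen: set[str]) -> tuple[set[str], set[str], list[int]]:
    """Return (fingerprints needing a new issue, fingerprints to update, issue IIDs to close)."""
    known = {row[0] for row in conn.execute("SELECT fingerprint FROM findings")}
    to_create = seen - known
    to_update = seen & known
    with conn:  # one transaction per patrol loop
        for fp in to_update:
            conn.execute("UPDATE findings SET missed_loops = 0 WHERE fingerprint = ?", (fp,))
        for fp in known - seen:
            conn.execute("UPDATE findings SET missed_loops = missed_loops + 1 WHERE fingerprint = ?", (fp,))
        stale = conn.execute(
            "SELECT fingerprint, issue_iid FROM findings WHERE missed_loops >= ?",
            (CLOSE_AFTER_MISSED_LOOPS,),
        ).fetchall()
        for fp, _ in stale:
            conn.execute("DELETE FROM findings WHERE fingerprint = ?", (fp,))
    return to_create, to_update, [iid for _, iid in stale]
```

Resetting `missed_loops` whenever a fingerprint reappears is what makes the close condition "consecutive" rather than cumulative.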

This deduplication design means the GitLab issue queue reflects the current state of the system, not a historical log of every anomaly. An issue being open means the problem is likely still occurring. An issue being closed means Log Patrol stopped seeing it.

Deployment and Testing

The entire system deploys via Docker Compose. Configuration is environment-variable driven: Loki endpoint, GitLab project ID and token, Ollama model name, patrol interval. I didn't want changing a deployment target to require touching the codebase.
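
A sketch of loading that configuration into a typed, immutable object that fails fast at startup. The variable names and the 60-second default interval are hypothetical; check the repository for the real ones.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical variable names; the post lists the settings but not their names.
REQUIRED_VARS = ("LOKI_URL", "GITLAB_PROJECT_ID", "GITLAB_TOKEN", "OLLAMA_MODEL")


@dataclass(frozen=True)
class PatrolConfig:
    loki_url: str
    gitlab_project_id: str
    gitlab_token: str
    ollama_model: str
    patrol_interval_s: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> PatrolConfig:
        """Build config from environment variables, failing fast on anything missing."""
        source: Mapping[str, str] = os.environ if env is None else env
        missing = [name for name in REQUIRED_VARS if not source.get(name)]
        if missing:
            raise ValueError(f"missing required environment variables: {', '.join(missing)}")
        interval = int(source.get("PATROL_INTERVAL_SECONDS", "60"))  # assumed default
        if interval <= 0:
            raise ValueError("PATROL_INTERVAL_SECONDS must be a positive integer")
        return cls(
            loki_url=source["LOKI_URL"],
            gitlab_project_id=source["GITLAB_PROJECT_ID"],
            gitlab_token=source["GITLAB_TOKEN"],
            ollama_model=source["OLLAMA_MODEL"],
            patrol_interval_s=interval,
        )
```

Accepting an optional mapping instead of always reading `os.environ` lets tests pass a plain dict, which also suits a mypy-strict codebase.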

Testing is split between a pytest suite for business logic and a smoke test harness that exercises the end-to-end path against a real Loki instance with synthetic log injection. The code is fully typed (mypy strict) and linted (pylint, pydocstyle Google convention). The typing discipline was worth the overhead. Several subtle bugs in the fingerprinting logic would have been runtime surprises without it.
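
Synthetic log injection can go through Loki's push endpoint, `POST /loki/api/v1/push`, whose JSON body is a list of streams, each with a label set and `[timestamp, line]` pairs where the timestamp is Unix nanoseconds encoded as a string. The helper names and labels below are hypothetical, not taken from Log Patrol's smoke harness.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Any


def build_push_payload(labels: dict[str, str], lines: list[str], start_ns: int | None = None) -> dict[str, Any]:
    """Build a Loki push body: one stream, nanosecond timestamps encoded as strings."""
    base = time.time_ns() if start_ns is None else start_ns
    # Offset each line by 1 ns so the injected lines keep a strict order.
    values = [[str(base + offset), line] for offset, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def push_synthetic_logs(loki_url: str, labels: dict[str, str], lines: list[str]) -> None:
    """POST synthetic lines to Loki's push endpoint; urlopen raises HTTPError on 4xx/5xx."""
    body = json.dumps(build_push_payload(labels, lines)).encode("utf-8")
    request = urllib.request.Request(
        f"{loki_url.rstrip('/')}/loki/api/v1/push",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10):
        pass
```

A smoke test would push a burst of lines under a dedicated label, wait one patrol interval, and then check GitLab for the expected issue.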

What I'd Do Differently

The LLM gate adds meaningful latency to the slow path, and its accuracy is model-dependent in ways that are hard to test systematically. A better approach might be to build a small labeled dataset of "incident-worthy" vs. "noise" findings from real patrol runs and use that to fine-tune a smaller, faster classifier rather than relying on a general-purpose LLM. That's future work.
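
A labeled dataset would also make the current gate testable: the LLM gate and any fine-tuned replacement can be scored on identical findings. A minimal scoring harness, assuming a hypothetical `LabeledFinding` record and a gate exposed as a plain callable from finding text to a verdict:

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledFinding:
    """A finding from a real patrol run plus a human label (hypothetical format)."""

    summary: str
    incident_worthy: bool


def evaluate_gate(gate: Callable[[str], bool], dataset: Iterable[LabeledFinding]) -> dict[str, float]:
    """Score any incident/noise classifier against labeled findings."""
    tp = fp = fn = tn = 0
    for finding in dataset:
        predicted = gate(finding.summary)
        if predicted and finding.incident_worthy:
            tp += 1  # real incident, kept
        elif predicted:
            fp += 1  # noise that would still open an issue
        elif finding.incident_worthy:
            fn += 1  # real incident the gate would suppress
        else:
            tn += 1  # noise correctly suppressed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "suppressed_incidents": float(fn), "suppressed_noise": float(tn)}
```

Precision tracks how much noise still reaches the issue queue; recall tracks how many real incidents the gate would hide, which is the failure this whole design is meant to avoid.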

The SQLite state store works, but it's a single point of failure for a system that's supposed to monitor other services. In a more critical deployment I'd want the state replicated or backed by something more durable.

Source

Log Patrol is open source at github.com/Tuxprogrammer/log-patrol. Issues and pull requests welcome.