The Blind Spot
I had a monitoring script running on the server via cron every 5 minutes. It checked services, disk usage, memory, CPU, SSL cert expiry, SSH attacks, and fail2ban status. Alerts went out through ntfy.sh. It felt comprehensive.
But there was a fundamental problem: the monitoring lived on the same machine it was monitoring. If the VM went down — kernel panic, OCI reboot, network failure — the script died with it. No alert. No notification. Just silence.
And silence looked exactly like "everything is fine."
Two External Layers
The fix required something outside the server. I set up two independent mechanisms, each catching failures the other misses.
Layer 1: External uptime check. UptimeRobot hits https://omarmerroun.duckdns.org/blog/infrastructure every 5 minutes. Not the home page — a blog post route, because it exercises more of the stack. The request passes through nginx, Gunicorn, Flask, the markdown parser, and Jinja2 templating. If any of those break, the check fails and I get alerted.
This required zero changes to the server. Port 443 was already open. The monitoring service is just another visitor.
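To make the external check concrete, here is a minimal sketch in Python of what a single probe amounts to. It is not how UptimeRobot is implemented; the function name `probe` and the injectable `opener` parameter are assumptions made for illustration and testing.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (is_up, detail) for a single HTTP GET against url.

    `opener` is a hypothetical injection point so the function can be
    exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here, e.g. nginx answering 502
        # when the upstream app server is gone.
        return False, f"HTTP {err.code}"
    except OSError as err:
        # DNS failure, refused connection, TLS error, or timeout
        # (URLError and socket timeouts derive from OSError).
        return False, f"unreachable: {err}"
    return status == 200, f"HTTP {status}"
```

Run from any machine other than the server, `probe("https://omarmerroun.duckdns.org/blog/infrastructure")` would return `(True, "HTTP 200")` only when the whole request path answered successfully.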
Layer 2: Dead man's switch. healthchecks.io expects a heartbeat from my server every 5 minutes. The heartbeat is a simple curl to a unique URL, appended to my existing crontab line with a semicolon after the monitoring script, so it runs whether or not the script exits cleanly. If the heartbeat stops, I get alerted after a 10-minute grace period.
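The setup described here uses a plain curl. For anyone who prefers the heartbeat in a script, the following is a rough Python equivalent. The function name `send_heartbeat`, the retry count, and the pause are assumptions, and the ping URL is whatever unique URL the heartbeat service issues for the check.

```python
import time
import urllib.request


def send_heartbeat(ping_url, retries=3, pause=2.0, timeout=10.0,
                   opener=urllib.request.urlopen):
    """GET ping_url, retrying on errors; return True once a 2xx arrives.

    `opener` is a hypothetical injection point for testing offline.
    """
    for attempt in range(retries):
        try:
            with opener(ping_url, timeout=timeout) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError, and socket timeouts all derive
            # from OSError, so one handler covers them.
            pass
        if attempt < retries - 1:
            time.sleep(pause)  # brief pause before the next attempt
    return False
```

The retries are there so that one dropped request does not count as a missed heartbeat. The grace period on the receiving side covers the same risk, so this is belt and braces rather than a requirement.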
What Each Layer Catches
The uptime check catches application-level failures. Gunicorn and the Flask app go down but nginx is still running? UptimeRobot sees a 502 instead of a 200.
The dead man's switch catches infrastructure-level failures. The entire VM is offline? The heartbeat stops arriving, and healthchecks.io notices the silence.
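The "notices the silence" logic can be modelled in a few lines. This is a simplified sketch, not healthchecks.io's actual implementation. It assumes the alert fires once the expected period plus the grace period has passed since the last heartbeat, and the state names and the function `heartbeat_state` are illustrative.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(minutes=5)   # expected gap between heartbeats (from the post)
GRACE = timedelta(minutes=10)   # extra slack before alerting (from the post)


def heartbeat_state(last_ping, now, period=PERIOD, grace=GRACE):
    """Classify a check as 'up', 'late', or 'down' from the silence so far."""
    silence = now - last_ping
    if silence <= period:
        return "up"      # the next heartbeat is not due yet
    if silence <= period + grace:
        return "late"    # overdue, but still inside the grace window
    return "down"        # grace exhausted: this is when the alert goes out
```

Under these assumptions, a VM that dies right after sending a heartbeat would be reported roughly 15 minutes later: 5 minutes until the next ping is due, plus the 10-minute grace period.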
Neither alone is sufficient. Together, they close the gap that on-server monitoring can never fill: detecting its own death.