Home Lab Observability: Logs, Metrics, and Proactive Alerts with Graylog & Zabbix
Our home lab has evolved from a simple collection of devices into a fully observable network and security environment. By combining Graylog and Zabbix, we achieve both log-based security visibility and metric-based performance monitoring, mirroring enterprise-level monitoring architectures.
What You Will Learn
- How to set up Graylog for centralized firewall and network logging
- How to deploy Zabbix for metrics monitoring and proactive alerts
- Secure SNMPv3 configuration for your ASA and switches
- Segmentation using Zabbix Proxy to reduce server load
- Monitoring of switch uplinks and ICMP latency/packet loss
- Implementing alert escalation strategies
- Tuning Zabbix database for high performance
- Advanced event correlation to reduce alert noise and detect root causes
This multi-part post walks through:
- Graylog centralized logging
- Zabbix metrics monitoring
- SNMPv3 secure telemetry
- Zabbix Proxy segmentation
- Switch uplinks and ICMP monitoring
- Alert escalation strategies
- Database tuning
- Advanced event correlation
Part 1: Graylog — Centralized Log Collection
We started with Graylog to centralize firewall logs from our Cisco ASA 5525-X:
- Blocked connection tracking
- Source/destination IP visibility
- Rule hit frequency dashboards
- Security incident analysis
Graylog provides event visibility, answering “What happened?”, but it is not built to track performance metrics such as WAN utilization or interface saturation over time.
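To make the blocked-connection tracking concrete, here is a minimal Python sketch of the kind of field extraction a Graylog extractor or pipeline rule performs on ASA deny messages. It assumes the common `%ASA-4-106023` TCP/UDP deny format; the sample lines, addresses, and ACL name are made up for illustration, and ICMP denies (which carry type/code instead of ports) are not handled.

```python
import re
from collections import Counter

# Pattern for ASA message 106023 (packet denied by an access list), TCP/UDP form.
DENY_RE = re.compile(
    r"%ASA-\d-106023: Deny (?P<proto>\S+) "
    r"src (?P<src_if>[^:]+):(?P<src_ip>[^/\s]+)/(?P<src_port>\d+) "
    r"dst (?P<dst_if>[^:]+):(?P<dst_ip>[^/\s]+)/(?P<dst_port>\d+) "
    r'by access-group "(?P<acl>[^"]+)"'
)

# Hypothetical sample lines, not taken from a real firewall.
SAMPLE_LINES = [
    '%ASA-4-106023: Deny tcp src outside:203.0.113.5/51234 dst inside:192.168.10.20/22 by access-group "OUTSIDE_IN" [0x0, 0x0]',
    '%ASA-4-106023: Deny tcp src outside:203.0.113.5/51240 dst inside:192.168.10.20/23 by access-group "OUTSIDE_IN" [0x0, 0x0]',
    '%ASA-4-106023: Deny udp src outside:198.51.100.7/5353 dst inside:192.168.10.30/53 by access-group "OUTSIDE_IN" [0x0, 0x0]',
    '%ASA-6-302013: Built inbound TCP connection 1 for outside:203.0.113.9/4000',
]


def parse_deny(line):
    """Return the deny fields as a dict, or None if the line is not a 106023 deny."""
    match = DENY_RE.search(line)
    return match.groupdict() if match else None


def top_blocked_sources(lines, n=3):
    """Count denied packets per source IP and return the n most frequent."""
    hits = Counter()
    for line in lines:
        event = parse_deny(line)
        if event:
            hits[event["src_ip"]] += 1
    return hits.most_common(n)
```

Calling `top_blocked_sources(SAMPLE_LINES)` gives the per-source counts that a "top blocked sources" dashboard widget would chart.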
Screenshot Placeholder: Graylog Dashboard showing ASA firewall blocked connections
Part 2: Introducing Zabbix for Metrics Monitoring
Why Zabbix?
Zabbix adds metric collection, historical trends, and proactive alerts. We now monitor:
- Interface bandwidth utilization
- Packet drops and errors
- Firewall CPU/memory usage
- ICMP latency and packet loss
- Switch uplinks
This complements Graylog’s logs with performance visibility — answering “How is the network behaving over time?”
Architecture Overview
Network diagram showing Graylog, Zabbix server, Zabbix proxy, Cisco ASA 5525-X, switches, and device connectivity.
Part 3: Deploying Zabbix
Server Sizing (Lab Scale)
| Component | Spec |
|---|---|
| CPU | 2 vCPU |
| RAM | 8 GB |
| Disk | 150 GB SSD |
| Database | PostgreSQL |
| Retention | 60 days history, 365 days trends |
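As a rough sanity check on the 150 GB disk, the retention settings above can be turned into a storage estimate. The sketch below follows the sizing rule of thumb we recall from the Zabbix requirements guidance (about 90 bytes per stored value, one trend row per item per hour); the byte figure, item count, and polling interval are assumptions, so treat the result as an order-of-magnitude estimate only.

```python
SECONDS_PER_DAY = 24 * 3600
# Assumed average row size; the real figure varies by database and value type.
BYTES_PER_VALUE = 90


def history_bytes(items, interval_s, days):
    """Estimated history storage: one row per item per polling interval."""
    return days * items * SECONDS_PER_DAY * BYTES_PER_VALUE // interval_s


def trends_bytes(items, days):
    """Estimated trends storage: one aggregated row per item per hour."""
    return days * items * 24 * BYTES_PER_VALUE


if __name__ == "__main__":
    # Hypothetical lab: 1000 items polled every 60 seconds.
    total = history_bytes(1000, 60, 60) + trends_bytes(1000, 365)
    print(f"estimated storage: {total / 1e9:.1f} GB")
```

With those hypothetical numbers the history and trends tables stay well under the disk size in the table, which leaves headroom for events, indexes, and growth.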
Installation (Ubuntu 22.04 example)
wget https://repo.zabbix.com/zabbix/6.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_latest_6.0+ubuntu22.04_all.deb
dpkg -i zabbix-release_latest_6.0+ubuntu22.04_all.deb
apt update
apt install zabbix-server-pgsql zabbix-frontend-php php8.1-pgsql zabbix-nginx-conf zabbix-sql-scripts zabbix-agent postgresql
Database Setup
sudo -u postgres createuser --pwprompt zabbix
sudo -u postgres createdb -O zabbix zabbix
zcat /usr/share/zabbix-sql-scripts/postgresql/server.sql.gz | sudo -u zabbix psql zabbix
Zabbix Server Config
DBName=zabbix
DBUser=zabbix
DBPassword=<password>
Start Services
systemctl enable zabbix-server zabbix-agent nginx php8.1-fpm
systemctl start zabbix-server zabbix-agent nginx php8.1-fpm
Screenshot Placeholder: Zabbix Server dashboard
Part 4: ASA 5525 SNMPv3 Monitoring
ASA Configuration
conf t
snmp-server group ZABBIX-GROUP v3 priv
snmp-server user zabbix-user ZABBIX-GROUP v3 auth sha AuthPass123 priv aes 128 PrivPass123
snmp-server host inside 192.168.10.10 version 3 zabbix-user
write memory
Zabbix Host Configuration
- Version: SNMPv3
- Security Name: zabbix-user
- Auth Protocol: SHA
- Auth Pass: AuthPass123
- Privacy Protocol: AES
- Privacy Pass: PrivPass123
Test Connectivity
snmpwalk -v3 -u zabbix-user -l authPriv -a SHA -A AuthPass123 -x AES -X PrivPass123 192.168.10.1
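If you want to script this connectivity check, the sketch below builds the same `snmpwalk` invocation from Python and reads the passphrases from environment variables instead of hard-coding them. The variable names `SNMP_AUTH_PASS` and `SNMP_PRIV_PASS` and the helper names are our own assumptions; the default OID `1.3.6.1.2.1.1` is the standard MIB-2 system group. Note that the passphrases still appear in the process argument list while the command runs.

```python
import os
import subprocess


def build_snmpwalk_cmd(host, user, auth_pass, priv_pass, oid="1.3.6.1.2.1.1"):
    """Build the argument list for an SNMPv3 authPriv walk (SHA + AES)."""
    return [
        "snmpwalk", "-v3",
        "-u", user,
        "-l", "authPriv",
        "-a", "SHA", "-A", auth_pass,
        "-x", "AES", "-X", priv_pass,
        host, oid,
    ]


def snmp_reachable(host, user, timeout_s=10):
    """Return True if the walk exits cleanly. Requires net-snmp's snmpwalk on PATH."""
    cmd = build_snmpwalk_cmd(
        host,
        user,
        os.environ["SNMP_AUTH_PASS"],  # hypothetical variable name
        os.environ["SNMP_PRIV_PASS"],  # hypothetical variable name
    )
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    return result.returncode == 0
```

A call such as `snmp_reachable("192.168.10.1", "zabbix-user")` could run from cron or a pre-deployment script before the host is added to Zabbix.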
Screenshot Placeholder: SNMPv3 test output
Part 5: Zabbix Proxy for Segmentation
Install Proxy
apt install zabbix-proxy-pgsql postgresql
sudo -u postgres createuser --pwprompt zabbix_proxy
sudo -u postgres createdb -O zabbix_proxy zabbix_proxy
zcat /usr/share/zabbix-sql-scripts/postgresql/proxy.sql.gz | sudo -u zabbix_proxy psql zabbix_proxy
Proxy Config
Server=192.168.1.50
Hostname=ZBX-PROXY-01
DBName=zabbix_proxy
DBUser=zabbix_proxy
DBPassword=<password>
ProxyMode=0
systemctl enable zabbix-proxy
systemctl start zabbix-proxy
Screenshot Placeholder: Zabbix Proxy registration
Part 6: Switch Uplink & ICMP Monitoring
Switch Uplink Trigger (1Gbps)
avg(/Switch01/net.if.out.util[GigabitEthernet1/0/48],5m) > 75
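For readers wondering where a utilization percentage like this comes from, the following sketch shows one common way to derive it from two readings of a 64-bit octet counter such as `ifHCOutOctets`. The function name and the sample numbers are illustrative assumptions, not the exact preprocessing used by any particular Zabbix template.

```python
COUNTER64_MAX = 2**64


def out_utilization_pct(prev_octets, curr_octets, interval_s, speed_bps):
    """Percent of link capacity used between two counter readings.

    The modulo handles a single wrap of the 64-bit counter.
    """
    delta_octets = (curr_octets - prev_octets) % COUNTER64_MAX
    return delta_octets * 8 * 100 / (interval_s * speed_bps)


if __name__ == "__main__":
    # 5,625,000,000 octets sent in 60 s on a 1 Gbps uplink.
    print(out_utilization_pct(0, 5_625_000_000, 60, 1_000_000_000))
```

That example works out to 75%, exactly the point where the uplink trigger threshold sits.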
ICMP Latency & Packet Loss
Latency: avg(/ASA5525/icmppingsec,5m) > 0.05
Packet Loss: avg(/ASA5525/icmppingloss,5m) > 5
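Both triggers compare a five-minute average rather than the latest sample, which keeps a single slow ping from raising an alert. The sketch below illustrates that behavior with made-up latency samples in seconds; `avg_exceeds` is a hypothetical helper, not a Zabbix function.

```python
def avg_exceeds(samples, threshold):
    """Return True if the mean of the window is above the threshold."""
    if not samples:
        return False
    return sum(samples) / len(samples) > threshold


if __name__ == "__main__":
    one_spike = [0.02, 0.03, 0.09, 0.02, 0.03]   # single 90 ms outlier
    sustained = [0.06, 0.07, 0.08, 0.06, 0.07]   # consistently slow
    print(avg_exceeds(one_spike, 0.05), avg_exceeds(sustained, 0.05))
```

The window with one outlier stays below the 50 ms threshold, while the consistently slow window crosses it.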
Screenshot Placeholder: Uplink and ICMP graphs
Part 7: Alert Escalation Strategy
WAN Interface Utilization Example
| Severity | Threshold | Action |
|---|---|---|
| Warning | 70–80% | Slack notification |
| High | 80–90% | Email + Slack |
| Disaster | >90% | Phone call / PagerDuty |
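The table can be read as a simple mapping from utilization to severity and notification channels. Here is a minimal sketch of that mapping; the boundary handling (80% counts as High, exactly 90% still counts as High) and the channel labels are our assumptions, since the table leaves the edges open.

```python
# Channel labels are illustrative placeholders for Zabbix media types.
ACTIONS = {
    "Warning": ["slack"],
    "High": ["email", "slack"],
    "Disaster": ["pagerduty"],
}


def wan_severity(util_pct):
    """Map WAN utilization (percent) to a severity name, or None below 70%."""
    if util_pct > 90:
        return "Disaster"
    if util_pct >= 80:
        return "High"
    if util_pct >= 70:
        return "Warning"
    return None


def channels_for(util_pct):
    """Return the notification channels for a given utilization."""
    return ACTIONS.get(wan_severity(util_pct), [])
```

In Zabbix itself this would typically be three triggers with different severities, each matched by its own action condition.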
ICMP Packet Loss Escalation
- >5% → Initial email
- >15% sustained → Escalate to senior engineer
- >30% → Notify ISP
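The packet loss ladder adds a time element: the middle step fires only when loss stays high. A rough sketch of that logic follows, assuming "sustained" means three consecutive checks above 15%; that count and the action labels are hypothetical choices.

```python
def loss_escalation(loss_samples, sustained_checks=3):
    """Pick an escalation step from recent packet loss samples (percent, newest last)."""
    if not loss_samples:
        return None
    latest = loss_samples[-1]
    if latest > 30:
        return "notify_isp"
    recent = loss_samples[-sustained_checks:]
    # Escalate only when every one of the last N checks exceeded 15%.
    if len(recent) == sustained_checks and all(s > 15 for s in recent):
        return "escalate_senior"
    if latest > 5:
        return "email"
    return None
```

A single 20% reading therefore produces only the initial email, while three such readings in a row escalate to the senior engineer.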
Screenshot Placeholder: Alert action configuration
Part 8: Tuning Zabbix Database Performance
- SSD storage is critical
- Enable housekeeping and retention (History 60 days, Trends 365 days)
- Consider TimescaleDB for high-frequency metrics
- Keep the history tables indexed on (itemid, clock); the stock Zabbix schema already indexes these columns
- Optimize polling intervals
- Use Zabbix proxies to offload writes
TimescaleDB Example
CREATE EXTENSION IF NOT EXISTS timescaledb;
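-- clock is an integer epoch column, so the chunk interval (one day, in seconds) must be given explicitly
SELECT create_hypertable('history', 'clock', chunk_time_interval => 86400, migrate_data => true);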
Screenshot Placeholder: DB performance dashboard
Part 9: Observability Outcome
- Historical WAN & interface utilization
- Proactive saturation alerts
- Packet loss & latency detection
- CPU/memory monitoring
- Switch uplink visibility
- Segmented polling via Proxy
- Security events from Graylog intact
Screenshot Placeholder: Combined dashboards
Part 10: Next Steps
- Add additional devices (APs, routers, switches)
- Build SLA dashboards in Zabbix/Grafana
- Implement event correlation rules
- Explore TimescaleDB compression
- Document alert runbooks
Part 11: Advanced Event Correlation
Why Event Correlation?
- Reduce noisy alerts
- Combine related events
- Trigger escalations only for true root causes
WAN Dependency Example
| Trigger Name | Depends On |
|---|---|
| ASA ICMP Latency High | WAN Interface Down |
| ASA ICMP Packet Loss | WAN Interface Down |
| ASA VPN CPU Spike | WAN Interface Down |
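The effect of these dependencies is that dependent problems stay quiet while their parent is active. The sketch below models that suppression using the trigger names from the table; it is a conceptual illustration, not how Zabbix stores or evaluates dependencies internally.

```python
# Child trigger -> parent trigger, taken from the dependency table.
DEPENDS_ON = {
    "ASA ICMP Latency High": "WAN Interface Down",
    "ASA ICMP Packet Loss": "WAN Interface Down",
    "ASA VPN CPU Spike": "WAN Interface Down",
}


def problems_to_notify(active_problems):
    """Drop problems whose parent trigger is also active; return the rest sorted."""
    active = set(active_problems)
    return sorted(p for p in active if DEPENDS_ON.get(p) not in active)
```

When the WAN interface drops and latency and loss alarms follow, only "WAN Interface Down" is left to notify on; a latency alarm on its own still goes out.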
Event Correlation Rule Example
- Tag triggers: type=network, device=ASA5525
- Condition: >2 events in 5 minutes
- Operations: Notify Admin via Slack + Email, optionally PagerDuty
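The rule boils down to counting tag-matching events inside a sliding five-minute window. Here is a small sketch of that counting logic; the class name and parameters are hypothetical, and the code illustrates the idea rather than reproducing Zabbix's own correlation engine.

```python
from collections import deque


class CorrelationWindow:
    """Counts tag-matching events inside a sliding time window."""

    def __init__(self, required_tags, threshold=2, window_s=300):
        self.required_tags = required_tags
        self.threshold = threshold
        self.window_s = window_s
        self.events = deque()

    def observe(self, timestamp, tags):
        """Record an event; return True once more than `threshold` matches are in the window."""
        # Ignore events that do not carry every required tag.
        if not self.required_tags.items() <= tags.items():
            return False
        self.events.append(timestamp)
        # Drop events that have aged out of the window.
        while timestamp - self.events[0] > self.window_s:
            self.events.popleft()
        return len(self.events) > self.threshold
```

With `CorrelationWindow({"type": "network", "device": "ASA5525"})`, the third matching event within five minutes returns True, which is the point where the Slack and email operations would run.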
Screenshot Placeholder: Event correlation rule setup
Integration Summary
| Layer | Tool | Function |
|---|---|---|
| Logs | Graylog | Security & firewall events |
| Metrics | Zabbix Server | Performance monitoring & availability |
| Segmentation | Zabbix Proxy | Distributed polling |
| Network | SNMPv3 | Encrypted telemetry |
| Health | ICMP | Latency & packet loss |
| Event Correlation | Zabbix Rules | Intelligent alert escalation |
With these configurations, your home lab now mirrors enterprise NOC/SOC observability workflows.