obmp-docker/postgres/scripts/010_churn_metrics.sql
sam d7084aba54 Add fast-path churn monitor and churn-storm load tool
obmp-churn-monitor: a decoupled fast-path BGP churn consumer. Reads
openbmp.parsed.unicast_prefix with its own Kafka consumer group and only
counts announcements/withdrawals per (router,peer) into churn_metrics
(010_churn_metrics.sql) -- no relational RIB write. Storm-tested: it
stayed real-time (tracked 1k->85k msg/s) while the psql-app bulk
pipeline lag grew 3.8M->5.6M. Live BGP Churn dashboard reads it.

tools/churn_storm.py: programmatic churn-storm generator (flaps GoBGP's
eBGP sessions to the lab cores) for load testing.

Stress-test finding: a fleet-wide full table from 18 routers exceeds the
memory of this 31 GiB host. The bottleneck is RAM, not CPU -- at 16 cores the host
still hit load 33 because it was swap-thrashing (swap 2/2 full, <1.5 GiB
free). Lag ran away 3.8M->20M+. Recourse: more host RAM for bulk
throughput; the fast-path consumer for visibility regardless.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 13:17:09 -07:00


-- 010_churn_metrics.sql
-- Fast-path BGP churn metrics, written by the obmp-churn-monitor service.
--
-- obmp-churn-monitor reads openbmp.parsed.unicast_prefix from Kafka with its
-- own consumer group and only COUNTS announcements/withdrawals per
-- (router, peer) -- no relational RIB maintenance. Because counting is far
-- cheaper than psql-app's per-route upserts, it stays real-time even when the
-- main ingestion pipeline lags minutes behind under a churn storm. This is the
-- decoupled "visibility path": it does not speed up the bulk DB write, it
-- guarantees churn visibility survives a storm the bulk pipeline cannot.
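CREATE TABLE IF NOT EXISTS churn_metrics (
    ts        timestamptz NOT NULL DEFAULT now(),  -- time the counter row was written
    router_ip inet,                                -- router half of the (router, peer) key
    peer_ip   inet,                                -- peer half of the (router, peer) key
    peer_asn  bigint,
    adds      integer,                             -- announcements counted
    dels      integer                              -- withdrawals counted
);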
SELECT create_hypertable('churn_metrics', 'ts', if_not_exists => TRUE);
CREATE INDEX IF NOT EXISTS idx_churn_ts ON churn_metrics (ts DESC);
CREATE INDEX IF NOT EXISTS idx_churn_router_ts ON churn_metrics (router_ip, ts DESC);
DO $$
BEGIN
    PERFORM add_retention_policy('churn_metrics', INTERVAL '7 days');
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'churn_metrics retention policy not added: %', SQLERRM;
END $$;
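
-- Example read query (illustrative sketch only, not necessarily what the
-- BGP Churn dashboard runs). It rolls the counters up per minute and per
-- (router, peer) over the last 15 minutes, busiest pairs first. The
-- 1-minute bucket, the 15-minute window, and the output column names are
-- assumptions chosen for the example. It is read-only; move it out of this
-- init script if you do not want it executed when the database is created.
SELECT time_bucket('1 minute', ts) AS bucket,
       router_ip,
       peer_ip,
       sum(adds) AS total_adds,
       sum(dels) AS total_dels
FROM churn_metrics
WHERE ts > now() - INTERVAL '15 minutes'
GROUP BY bucket, router_ip, peer_ip
ORDER BY bucket DESC, sum(adds) + sum(dels) DESC;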