obmp-churn-monitor: a decoupled fast-path BGP churn consumer

obmp-churn-monitor reads openbmp.parsed.unicast_prefix with its own Kafka consumer group. It only counts announcements and withdrawals per (router, peer) into churn_metrics (010_churn_metrics.sql) and performs no relational RIB writes. In a storm test it stayed real-time, keeping pace as the message rate ramped from 1k to 85k msg/s, while the psql-app bulk pipeline's lag grew from 3.8M to 5.6M. The live BGP Churn dashboard reads from it.

tools/churn_storm.py: a programmatic churn-storm generator for load testing. It flaps GoBGP's eBGP sessions to the lab cores.

Stress-test finding: ingesting full tables fleet-wide from 18 routers exceeds what this 31 GiB host can handle. The bottleneck is RAM, not CPU: even with 16 cores the host reached a load average of 33 because it was swap-thrashing (swap 2/2 full, <1.5 GiB free), and lag ran away from 3.8M to more than 20M. Recourse: add host RAM to raise bulk throughput, and rely on the fast-path consumer for churn visibility regardless.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
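The counting path described above can be illustrated with a small sketch. This is not the service's code: it assumes each Kafka message has already been decoded into a dict with hypothetical keys `action` ("add" or "del"), `router_ip`, `peer_ip` and `peer_asn`, and that counts are flushed on some fixed interval. Each flush yields one row per active (router, peer) in churn_metrics column order.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical aggregation key: (router_ip, peer_ip, peer_asn).
PeerKey = tuple[str, str, int]


class ChurnCounter:
    """Counts announcements ("add") and withdrawals ("del") per (router, peer).

    Sketch of the fast-path idea only: no per-prefix RIB state is kept,
    just two integers per active peer between flushes.
    """

    def __init__(self) -> None:
        # key -> [adds, dels]; an entry is created on the first counted message.
        self._counts: dict[PeerKey, list[int]] = defaultdict(lambda: [0, 0])

    def observe(self, msg: dict) -> None:
        """Count one decoded unicast_prefix message (assumed dict shape)."""
        key = (msg["router_ip"], msg["peer_ip"], int(msg["peer_asn"]))
        if msg["action"] == "add":
            self._counts[key][0] += 1
        elif msg["action"] == "del":
            self._counts[key][1] += 1
        # Any other action is ignored in this sketch.

    def flush(self, ts: datetime) -> list[tuple]:
        """Return rows shaped like churn_metrics and reset the counters.

        Row order: (ts, router_ip, peer_ip, peer_asn, adds, dels).
        """
        rows = [
            (ts, router_ip, peer_ip, peer_asn, adds, dels)
            for (router_ip, peer_ip, peer_asn), (adds, dels) in sorted(self._counts.items())
        ]
        self._counts.clear()
        return rows


if __name__ == "__main__":
    counter = ChurnCounter()
    for action in ("add", "add", "del"):
        counter.observe({"action": action, "router_ip": "192.0.2.1",
                         "peer_ip": "198.51.100.7", "peer_asn": 64500})
    # One row for the single active peer: 2 adds, 1 del.
    print(counter.flush(datetime.now(timezone.utc)))
```

In this sketch, memory and rows written per flush grow with the number of active peers rather than with the number of messages or routes. That is the property that lets a counting consumer stay real-time while per-route upserts fall behind.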
```sql
-- 010_churn_metrics.sql
-- Fast-path BGP churn metrics, written by the obmp-churn-monitor service.
--
-- obmp-churn-monitor reads openbmp.parsed.unicast_prefix from Kafka with its
-- own consumer group and only COUNTS announcements/withdrawals per
-- (router, peer) -- no relational RIB maintenance. Because counting is far
-- cheaper than psql-app's per-route upserts, it stays real-time even when the
-- main ingestion pipeline lags minutes behind under a churn storm. This is the
-- decoupled "visibility path": it does not speed up the bulk DB write, it
-- guarantees churn visibility survives a storm the bulk pipeline cannot.

CREATE TABLE IF NOT EXISTS churn_metrics (
    ts        timestamptz NOT NULL DEFAULT now(),
    router_ip inet,
    peer_ip   inet,
    peer_asn  bigint,
    adds      integer,
    dels      integer
);

SELECT create_hypertable('churn_metrics', 'ts', if_not_exists => TRUE);

CREATE INDEX IF NOT EXISTS idx_churn_ts ON churn_metrics (ts DESC);
CREATE INDEX IF NOT EXISTS idx_churn_router_ts ON churn_metrics (router_ip, ts DESC);

DO $$ BEGIN
    PERFORM add_retention_policy('churn_metrics', INTERVAL '7 days');
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'churn_metrics retention policy not added: %', SQLERRM;
END $$;
```
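A batch of counted rows can be written to churn_metrics with one parameterized multi-row INSERT. The sketch below only builds the SQL text and the flat parameter list. It assumes a psycopg2-style DB-API driver (`%s` placeholders) and rows already ordered as (ts, router_ip, peer_ip, peer_asn, adds, dels); the helper name `build_churn_insert` is hypothetical. IP addresses are passed as text and PostgreSQL casts them to `inet` on insert.

```python
from datetime import datetime, timezone

# Column order of churn_metrics as defined in 010_churn_metrics.sql.
COLUMNS = ("ts", "router_ip", "peer_ip", "peer_asn", "adds", "dels")


def build_churn_insert(rows):
    """Build one multi-row INSERT for churn_metrics (hypothetical helper).

    rows: sequence of (ts, router_ip, peer_ip, peer_asn, adds, dels) tuples.
    Returns (sql, params) for a driver using %s placeholders, or (None, [])
    when there is nothing to write.
    """
    if not rows:
        return None, []
    placeholder = "(" + ", ".join(["%s"] * len(COLUMNS)) + ")"
    sql = (
        "INSERT INTO churn_metrics (" + ", ".join(COLUMNS) + ") VALUES "
        + ", ".join([placeholder] * len(rows))
    )
    params = []
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} values, got {len(row)}")
        params.extend(row)  # flatten in placeholder order
    return sql, params


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sql, params = build_churn_insert([(now, "192.0.2.1", "198.51.100.7", 64500, 2, 1)])
    print(sql)
    print(params)
    # With a live psycopg2 cursor (not created here): cur.execute(sql, params)
```

Passing `ts` explicitly records the flush-window time. If the column were omitted, the table's `DEFAULT now()` would stamp the insert time instead. psycopg2's `execute_values` helper is an alternative way to batch the same rows.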