obmp-docker

Author	SHA1	Message	Date
sam	2a82bd9a94	ip_rib perf tuning: per-table autovacuum + drop 4 unused indexes Derived from the 2026-05-19 ingestion stress-test session. psql-app's unicast_prefix drain rate caps at a few-hundred msg/s under continuous Postgres maintenance (autovacuum on ip_rib + update_global_ip_rib() / update_chg_stats() / update_peer_rib_counts() crons) competing for ip_rib disk I/O. ALTER TABLE ip_rib SET autovacuum_vacuum_scale_factor=0.02 -- run more often on smaller chunks. cost_limit kept at its OpenBMP-default 3000 so each run finishes fast; the consumer runs flat out between bursts instead of being throttled continuously. DROP INDEX for four unused/redundant indexes (every INSERT updates every index; these all had 0 scans in ~2h of heavy activity): - ip_rib_hash_id_idx (907 MB) - ip_rib_base_attr_hash_id_idx (558 MB) - ip_rib_prefix_idx (1538 MB, GiST) - ip_rib_origin_as_idx (364 MB) 9 -> 5 indexes; ~3.4 GB freed (6,715 MB -> 3,348 MB). Reduces index write-amplification per UPSERT by ~45% and shortens autovacuum on ip_rib by ~the same. Measurement note: across-cycle 25-min runs were inconclusive on the sustained-rate effect (inflow was near-zero by then -- gobgp stopped -- so the consumer was largely idle). The real test is re-enabling the fleet-wide feed with the consumer-replica + 62 GiB RAM and seeing whether unicast_prefix keeps up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 16:50:15 -07:00
sam	d7084aba54	Add fast-path churn monitor and churn-storm load tool obmp-churn-monitor: a decoupled fast-path BGP churn consumer. Reads openbmp.parsed.unicast_prefix with its own Kafka consumer group and only counts announcements/withdrawals per (router,peer) into churn_metrics (010_churn_metrics.sql) -- no relational RIB write. Storm-tested: it stayed real-time (tracked 1k->85k msg/s) while the psql-app bulk pipeline lag grew 3.8M->5.6M. Live BGP Churn dashboard reads it. tools/churn_storm.py: programmatic churn-storm generator (flaps GoBGP's eBGP sessions to the lab cores) for load testing. Stress-test finding: fleet-wide full table from 18 routers exceeds this 31 GiB host. The bottleneck is RAM, not CPU -- at 16 cores the host still hit load 33 because it was swap-thrashing (swap 2/2 full, <1.5 GiB free). Lag ran away 3.8M->20M+. Recourse: more host RAM for bulk throughput; the fast-path consumer for visibility regardless. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 13:17:09 -07:00
sam	b681c473c0	Add Policy Diff, fleet-wide full-table feed, and Kafka lag monitoring Policy Diff (roadmap E2 follow-up): obmp-rib-poller pulls per-router post-policy accepted/advertised prefix counts and route-policy bindings over CLI+NETCONF (BMP on XRv9000 24.3.1 carries only pre-policy Adj-RIB-In). New tables in 008_obmp_policy_diff.sql; Policy Diff dashboard joins them against BMP ip_rib for received-vs-kept-vs-rejected. GoBGP fleet-wide feed: GoBGP re-advertises the full Bromirski table to both labs' core routers (CML AS65020, PROX AS65021) over eBGP; as route reflectors the cores propagate it to every R9K client, so all 18 lab routers carry and BMP-export a full table -- an intentional stress test of the ingestion/storage path. cml/gobgp_peering_config.py applies and rolls back the core-side config; gobgp/README.md documents the rollback. Kafka lag monitoring: kafka-lag-monitor samples consumer-group lag every 30s into TimescaleDB (009_kafka_lag.sql); Kafka Ingestion Lag dashboard gives visibility into the pipeline under churn load. Peer Detail dashboard: the Peer selector is now router-qualified (router -> peer) so it is unambiguous in an iBGP route-reflector mesh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 12:42:25 -07:00
sam	2d83d6c02e	Add evpn_rib schema; update production sizing with measured data - postgres/scripts/007_obmp_evpn.sql: the evpn_rib landing table (roadmap E5 step 1), applied to the live DB. Mirrors l3vpn_rib; a dedicated consumer will populate it. - production-sizing.md: corrected retention figures to the actual policy values, added a measured-data section (one full feed ≈ +5 GB current state; DB now ~30 GB), and a horizontal-scaling section — the bottleneck is the psql-app consumer + disk IOPS, so scale psql-app as a Kafka consumer group (cap = partition count), treat multi-collector as HA/locality not throughput. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 08:44:09 -07:00
sam	cc0d20bf9e	Back AS Relationship Map with a materialized view The AS map previously exploded ~4.4M base_attrs AS_PATH rows live, three times per load (one per panel), ~1.8s each — slow enough that navigating away cancelled the queries mid-flight. Add mv_as_adjacency: undirected consecutive-AS pairs with occurrence counts over the full RIB (17k rows), refreshed hourly by pg_cron via REFRESH ... CONCURRENTLY. The dashboard panels now read the view in ~1ms. Min-occurrence options rescaled for full-RIB counts (2000/5000/10000/50000, default 2000 -> ~63-node graph). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 07:04:38 -07:00
Tim Evens	3f38af5312	Version 2.2.3 updates * collector v2.2.3 * collector using debian-stable-slim * dev-image updated to use debian-stable-slim * Upgraded librdkafka to v1.9.2 * Fixed permission problems with postgres * Grafana upgraded to 9.1.7 * psql-app v2.2.2 * postgres updated to use timescaledb-ha:pg14-ts2.8	2022-10-20 07:12:08 -07:00
Tim Evens	6e616efe10	Updates for 2.2.0 * Use timescaleDB CE instead of OSS * Have psql-app wait for psql to start up during init db * Add version file to postgres container	2022-06-12 11:04:59 -07:00
Tim Evens	84bec5293b	version 2.2.0 updates	2022-06-08 11:53:55 -07:00
Tim Evens	0a0d2ceec1	2.1.1 updates * Fix vpnv6/l3vpn next-hop decoding * Fix ip_rib_log enabling compression to be after hypertable creation * Add pg_cron to postgres container * Upgraded postgres container to timescaledb 2.6.0-pg14	2022-03-28 12:43:37 -07:00
RaviTeja Buddabathuni (rbuddaba)	620bd517cc	fix: added build stage to reduce the image size	2022-03-15 18:10:20 -05:00
RaviTeja Buddabathuni (rbuddaba)	36ef1e478b	fix: add pg_cron	2022-03-15 15:04:44 -05:00
RaviTeja Buddabathuni (rbuddaba)	a630c5db7d	feat: add pg_cron extension for cron jobs	2022-03-15 13:08:48 -05:00
Tim Evens	05737d2682	v2.1.0 updates * Add peeringdb script and cron job * Fix running more than one cronjob at a time * Update upgrade script for psql-app	2022-03-04 07:27:23 -08:00
Tim Evens	492c000ce9	Add whois and upgrade to 2.1.0	2022-02-22 14:30:05 -08:00
Tim Evens	aae49149af	initial 2.1.0	2022-02-10 21:02:19 -08:00
Tim Evens	fd2874d00e	Fix for collector kafka startup issue When first deploying the collector and kafka, it takes kafka a couple of minutes to start. In some cases, the collector would proceed to start up without waiting for kafka. This resulted in the first few messages being dropped, such as the router init and peer up messages.	2022-02-01 12:49:17 -08:00
Tim Evens	a0e6a5bc6f	Fixes to psql-app, version 2.0.2	2022-01-31 11:05:58 -08:00
Tim Evens	c3839aa8fb	Security fixes, issues resolved, and more * Upgrades to all containers * Resolves #7, resolves #6, resolves #2 * Compose changed to use versions instead of latest * OBMP containers now use a version tag instead of build numbers	2022-01-28 15:12:01 -08:00
Tim Evens	3847a19ea9	Fix psql-app cron entry and update postgres install	2021-04-12 11:47:32 -07:00
Tim Evens	eba244cdf7	Fix postgres to create ts. Update compose to use latest	2021-03-31 00:13:09 -07:00
Tim Evens	c61f766cc3	Adjust defaults in compose and fix postgres mem setting	2021-03-30 22:31:06 -07:00
Tim Evens	574bf5e8a9	Add psql-app container and docker compose	2021-03-30 14:25:24 -07:00
Tim Evens	8b3356086b	Updates to dev-image and added postgres	2021-03-29 11:13:57 -07:00

23 Commits