Policy Diff (roadmap E2 follow-up): obmp-rib-poller pulls per-router
post-policy accepted/advertised prefix counts and route-policy bindings
over CLI+NETCONF (BMP on XRv9000 24.3.1 carries only pre-policy
Adj-RIB-In). New tables in 008_obmp_policy_diff.sql; Policy Diff
dashboard joins them against BMP ip_rib for received-vs-kept-vs-rejected.
GoBGP fleet-wide feed: GoBGP re-advertises the full Bromirski table to
both labs' core routers (CML AS65020, PROX AS65021) over eBGP; as route
reflectors the cores propagate it to every R9K client, so all 18 lab
routers carry and BMP-export a full table -- an intentional stress test
of the ingestion/storage path. cml/gobgp_peering_config.py applies and
rolls back the core-side config; gobgp/README.md documents the rollback.
Kafka lag monitoring: kafka-lag-monitor samples consumer-group lag every
30s into TimescaleDB (009_kafka_lag.sql); Kafka Ingestion Lag dashboard
gives visibility into the pipeline under churn load.
Peer Detail dashboard: the Peer selector is now router-qualified
(router -> peer) so it is unambiguous in an iBGP route-reflector mesh.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- postgres/scripts/007_obmp_evpn.sql: the evpn_rib landing table
(roadmap E5 step 1), applied to the live DB. Mirrors l3vpn_rib;
a dedicated consumer will populate it.
- production-sizing.md: corrected retention figures to the actual
policy values, added a measured-data section (one full feed ≈
+5 GB current state; DB now ~30 GB), and a horizontal-scaling
section — the bottleneck is the psql-app consumer + disk IOPS, so
scale psql-app as a Kafka consumer group (cap = partition count),
treat multi-collector as HA/locality not throughput.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AS map previously exploded ~4.4M base_attrs AS_PATH rows live,
three times per load (one per panel), ~1.8s each — slow enough that
navigating away cancelled the queries mid-flight.
Add mv_as_adjacency: undirected consecutive-AS pairs with occurrence
counts over the full RIB (17k rows), refreshed hourly by pg_cron via
REFRESH ... CONCURRENTLY. The dashboard panels now read the view in
~1ms. Min-occurrence options rescaled for full-RIB counts
(2000/5000/10000/50000, default 2000 -> ~63-node graph).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* collector v2.2.3
* collector using debian-stable-slim
* dev-image updated to use debian-stable-slim
* Upgraded librdkafka to v1.9.2
* Fixed permission problems with postgres
* Grafana upgraded to 9.1.7
* psql-app v2.2.2
* postgres updated to use timescaledb-ha:pg14-ts2.8
When first deploying the collector and kafka, it takes
kafka a couple minutes to start. In some cases, the
collector would proceed to startup without waiting for
kafka. This resulted in the first few messages to be dropped,
such as dropping the router init and peer up messages.
* Upgrades to all containers
* Resolves#7, resolves#6, resolves#2
* Compose changed to use versions instead of latest
* OBMP containers now use a version tag instead of build numbers