sam/obmp-docker

Commit Graph

Author	SHA1	Message	Date

sam

ef932fe1e8

Dashboard QoL: fill the viewport, push legends to bottom

Two recurring layout issues across dashboards I built this session:

  1) Right-placed legend tables ate 30% of each panel width.
  2) Default h:9 panels left ~50% of the viewport empty on a 1080p
     display (total dashboard height ~18 grid rows vs ~30 available).

Stack Resources (Telemetry-3001/stack_resources.json):
  * 3 timeseries: legend placement right -> bottom, calcs [max] -> [last,max],
    added sortBy: Max desc so top consumers float to the top of the legend.
  * Bumped all 4 panels h: 9 -> 14 (dashboard total 18 -> 28 rows).
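A minimal sketch of the legend change, applied to a dashboard dict such as the one loaded from stack_resources.json. The key names (`displayMode`, `placement`, `calcs`, `sortBy`, `sortDesc`) follow the Grafana timeseries panel JSON model as we understand it and should be checked against your Grafana version. The helper name `move_legends_to_bottom` is hypothetical and not part of this repo.

```python
import copy
from typing import Any

# Hypothetical target legend: table at the bottom, showing Last and Max,
# sorted by Max descending so the top consumers float up.
BOTTOM_LEGEND: dict[str, Any] = {
    "showLegend": True,
    "displayMode": "table",
    "placement": "bottom",
    "calcs": ["last", "max"],
    "sortBy": "Max",
    "sortDesc": True,
}


def move_legends_to_bottom(dashboard: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the dashboard with every top-level timeseries legend updated."""
    patched = copy.deepcopy(dashboard)
    for panel in patched.get("panels", []):
        if panel.get("type") != "timeseries":
            continue  # stat/table panels have no timeseries legend
        legend = panel.setdefault("options", {}).setdefault("legend", {})
        # Deep-copy so panels never share the same calcs list object.
        legend.update(copy.deepcopy(BOTTOM_LEGEND))
    return patched
```

Panels nested inside collapsed rows are not visited here; a real script would need to recurse into each row's own `panels` list.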

Kafka Ingestion Lag and Live BGP Churn (Telemetry-3001/*):
  * Bumped timeseries panels h: 9 -> 12; second-row y: 13 -> 16.
    Dashboard total 22 -> 28 rows.
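The row totals above follow from stacking full-width rows: each row's `y` is the sum of the heights above it. The sketch below checks that arithmetic. It assumes a 4-unit stat strip above the two timeseries rows, which is inferred from "second-row y: 13 -> 16" and not stated in the commit. `stack_rows` is a hypothetical helper.

```python
def stack_rows(row_heights: list[int]) -> tuple[list[int], int]:
    """Given full-width row heights in grid units, return each row's y and the total height."""
    ys: list[int] = []
    y = 0
    for h in row_heights:
        ys.append(y)  # a row starts where the previous one ended
        y += h
    return ys, y


# Assumed layout: [stat strip, timeseries row 1, timeseries row 2].
before = stack_rows([4, 9, 9])    # ([0, 4, 13], 22)
after = stack_rows([4, 12, 12])   # ([0, 4, 16], 28)
```

The same helper reproduces the Stack Resources totals with two rows: `[9, 9]` gives 18 and `[14, 14]` gives 28.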

Policy Diff (obmp/History-1002/policy_diff.json):
  * Bumped bottom-row panels h: 8 -> 11. Total 24 -> 27 rows.

Untouched (already adequate, scrollable by design, or built earlier):
  evpn_rib (30 rows), global_table (38), router_diff (52), and the
  Maps-1006 dashboards (already h:22-28 single panels).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-19 19:58:33 -07:00

sam

d7084aba54

Add fast-path churn monitor and churn-storm load tool

obmp-churn-monitor: a decoupled fast-path BGP churn consumer. Reads
openbmp.parsed.unicast_prefix with its own Kafka consumer group and only
counts announcements/withdrawals per (router, peer) into churn_metrics
(010_churn_metrics.sql) -- no relational RIB write. Storm-tested: it
stayed real-time (kept pace as input rose 1k->85k msg/s) while the
psql-app bulk pipeline's lag grew 3.8M->5.6M messages. The Live BGP
Churn dashboard reads from churn_metrics.
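Because the monitor uses its own consumer group, Kafka tracks its committed offsets separately from psql-app's, so the bulk pipeline's backlog does not hold it back. The counting side can be sketched as below. This is not the obmp-churn-monitor source. The `PrefixEvent` fields, the `"add"`/`"del"` action values, the 10-second bucket and the row shape are all assumptions, and the Kafka poll loop and the SQL insert into churn_metrics are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixEvent:
    # Assumed, already-decoded fields of one unicast_prefix record.
    router: str
    peer: str
    action: str  # assumed "add" (announcement) or "del" (withdrawal)
    ts: float    # event time, seconds since epoch


class ChurnCounter:
    """Counts announcements/withdrawals per (bucket, router, peer); no RIB state."""

    def __init__(self, bucket_seconds: int = 10) -> None:
        self.bucket_seconds = bucket_seconds  # hypothetical aggregation window
        self.counts: dict[tuple[int, str, str], list[int]] = {}

    def observe(self, ev: PrefixEvent) -> None:
        idx = {"add": 0, "del": 1}.get(ev.action)
        if idx is None:
            return  # ignore actions we don't recognise
        bucket = int(ev.ts) // self.bucket_seconds * self.bucket_seconds
        self.counts.setdefault((bucket, ev.router, ev.peer), [0, 0])[idx] += 1

    def flush(self) -> list[tuple[int, str, str, int, int]]:
        """Drain into rows of (bucket, router, peer, announcements, withdrawals)."""
        rows = sorted((b, r, p, a, w) for (b, r, p), (a, w) in self.counts.items())
        self.counts = {}
        return rows
```

Each flush yields one small row per (bucket, router, peer) no matter how many prefixes churned. That bounded write volume is what lets a consumer like this keep up while a per-prefix RIB writer falls behind.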

tools/churn_storm.py: programmatic churn-storm generator (flaps GoBGP's
eBGP sessions to the lab cores) for load testing.
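A churn storm can be driven by repeatedly taking GoBGP's neighbors down and back up, so the lab cores withdraw and re-announce what they learned. The sketch below is not the actual tools/churn_storm.py. It assumes the `gobgp neighbor <addr> disable|enable` CLI subcommands (check `gobgp neighbor --help` on your version). The neighbor addresses, cycle count and hold time are placeholders. It defaults to a dry run that only prints commands.

```python
import subprocess
import time


def flap_phases(neighbors: list[str], cycles: int) -> list[list[list[str]]]:
    """One phase per (cycle, action): all neighbors disabled together, then all re-enabled."""
    phases = []
    for _ in range(cycles):
        for action in ("disable", "enable"):
            phases.append([["gobgp", "neighbor", addr, action] for addr in neighbors])
    return phases


def run_storm(neighbors: list[str], cycles: int,
              hold_seconds: float = 5.0, dry_run: bool = True) -> None:
    for phase in flap_phases(neighbors, cycles):
        for cmd in phase:
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)
        if not dry_run:
            time.sleep(hold_seconds)  # let withdrawals/re-announcements propagate


# Dry run against placeholder (documentation-range) addresses:
run_storm(["192.0.2.1", "192.0.2.2"], cycles=1)
```

Flapping all neighbors in one phase, rather than one at a time, makes the cores emit their withdrawals in a burst, which is what a storm test wants.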

Stress-test finding: ingesting full tables fleet-wide from 18 routers
exceeds this 31 GiB host. The bottleneck is RAM, not CPU -- even with 16
cores the host hit a load average of 33 because it was swap-thrashing
(swap 2/2 full, <1.5 GiB free). Lag ran away 3.8M->20M+. Remedies: more
host RAM for bulk throughput; rely on the fast-path consumer for
visibility regardless.
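On Linux the load average also counts tasks blocked in uninterruptible sleep, so heavy swap I/O can push load well past the core count while the CPUs are mostly waiting. The sketch below flags the condition described above from `/proc/meminfo`-style text. The thresholds are assumptions taken from the observed numbers rather than tuned values: swap counts as full at 95% used, and "free" is read as `MemAvailable` under 1.5 GiB.

```python
GIB_KB = 1024 * 1024  # /proc/meminfo reports sizes in kB (KiB)


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    out: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            out[key.strip()] = int(parts[0])
    return out


def swap_pressure(meminfo: dict[str, int], min_available_gib: float = 1.5) -> bool:
    """True when swap is (nearly) exhausted and MemAvailable is below the threshold."""
    swap_total = meminfo.get("SwapTotal", 0)
    swap_used = swap_total - meminfo.get("SwapFree", 0)
    swap_full = swap_total > 0 and swap_used >= 0.95 * swap_total
    low_mem = meminfo.get("MemAvailable", 0) < min_available_gib * GIB_KB
    return swap_full and low_mem
```

On a live host, pass it the contents of `/proc/meminfo`. A check like this only says the host is out of headroom; the fix is still more RAM.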

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-19 13:17:09 -07:00

2 Commits