obmp-docker

sam/obmp-docker

Commit Graph

Author	SHA1	Message	Date

sam

ef932fe1e8

Dashboard QoL: fill the viewport, push legends to bottom

Two recurring layout issues across dashboards I built this session:

  1) Right-placed legend tables ate 30% of each panel width.
  2) Default h:9 panels left ~50% of the viewport empty on a 1080p
     display (total dashboard height ~18 grid rows vs ~30 available).

Stack Resources (Telemetry-3001/stack_resources.json):
  * 3 timeseries: legend placement right -> bottom, calcs [max] -> [last,max],
    added sortBy: Max desc so top consumers float to the top of the legend.
  * Bumped all 4 panels h: 9 -> 14 (dashboard total 18 -> 28 rows).
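The legend and height changes listed above could be applied to a panel's JSON roughly as follows. This is a minimal sketch, assuming the timeseries panel keeps its legend settings under `options.legend` with `placement`, `calcs`, `sortBy`, and `sortDesc` keys. The helper name `move_legend_to_bottom` and the sample panel are hypothetical and are not taken from the repository.

```python
import copy


def move_legend_to_bottom(panel, height=14):
    """Return a copy of a panel dict with the legend below the graph.

    Hypothetical helper: the key names are an assumption about the
    timeseries panel schema, not a copy of the committed JSON.
    """
    updated = copy.deepcopy(panel)  # leave the caller's dict untouched
    legend = updated.setdefault("options", {}).setdefault("legend", {})
    legend["placement"] = "bottom"     # was "right"
    legend["calcs"] = ["last", "max"]  # was ["max"]
    legend["sortBy"] = "Max"           # sort legend rows by the Max column
    legend["sortDesc"] = True          # largest consumers first
    updated.setdefault("gridPos", {})["h"] = height  # was 9
    return updated


# Made-up panel that mirrors the "before" state described in the message.
sample_panel = {
    "type": "timeseries",
    "gridPos": {"x": 0, "y": 0, "w": 12, "h": 9},
    "options": {"legend": {"placement": "right", "calcs": ["max"]}},
}

new_panel = move_legend_to_bottom(sample_panel)
```

Returning a copy rather than mutating in place keeps the original panel available for a before/after comparison.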

Kafka Ingestion Lag and Live BGP Churn (Telemetry-3001/*):
  * Bumped timeseries panels h: 9 -> 12; second-row y: 13 -> 16.
    Dashboard total 22 -> 28 rows.
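The row totals quoted above follow from the grid positions: a dashboard's height is the largest `y + h` over its panels. The sketch below checks the 22 -> 28 figure. It assumes the first timeseries row starts at y = 4, which is inferred from the old second-row position (13 - 9) and is not stated in the message. `dashboard_rows` and `two_row_layout` are hypothetical names.

```python
def dashboard_rows(panels):
    """Total height in grid rows: the lowest bottom edge of any panel."""
    return max((p["gridPos"]["y"] + p["gridPos"]["h"] for p in panels), default=0)


def two_row_layout(first_row_y, panel_h):
    """Two stacked rows of timeseries panels of equal height."""
    return [
        {"gridPos": {"y": first_row_y, "h": panel_h}},
        {"gridPos": {"y": first_row_y + panel_h, "h": panel_h}},
    ]


# Assumption: the first timeseries row starts at y = 4 (13 - 9).
before = two_row_layout(first_row_y=4, panel_h=9)
after = two_row_layout(first_row_y=4, panel_h=12)

print(before[1]["gridPos"]["y"], dashboard_rows(before))  # 13 22
print(after[1]["gridPos"]["y"], dashboard_rows(after))    # 16 28
```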

Policy Diff (obmp/History-1002/policy_diff.json):
  * Bumped bottom-row panels h: 8 -> 11. Total 24 -> 27 rows.

Untouched (already adequate, scrollable by design, or built earlier):
  evpn_rib (30 rows), global_table (38), router_diff (52), and the
  Maps-1006 dashboards (already h:22-28 single panels).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-19 19:58:33 -07:00

sam

9d74940614

Fix ExaBGP OOM, add container health checks and resource monitoring

RCA: the exabgp container was OOM-killed — its 512m mem_limit was far too
small for the full-table feature (900K route objects in memory). Raises the
limit to a parameterized 6g default (EXABGP_MEM_LIMIT).
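A rough per-route budget shows why the old limit could not hold the full table. The sketch below assumes `512m` and `6g` are binary units (MiB and GiB) and that the whole limit is available to route objects, which overstates the real headroom. `bytes_per_route` is a hypothetical helper.

```python
MIB = 2**20
GIB = 2**30
ROUTES = 900_000  # "900K route objects" from the RCA


def bytes_per_route(limit_bytes, routes=ROUTES):
    """Upper bound on the memory available per route object, in bytes."""
    return limit_bytes // routes


old_budget = bytes_per_route(512 * MIB)  # 596 bytes per route
new_budget = bytes_per_route(6 * GIB)    # 7158 bytes per route
print(old_budget, new_budget)
```

Under 600 bytes per route, before counting the interpreter and process overhead, leaves very little room for each route object, so an OOM kill at 512m is unsurprising.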

Adds Docker healthchecks to 14 services (port/HTTP probes) so unhealthy
containers are visible. Adds a Telegraf docker input that collects per-
container CPU/memory/IO into InfluxDB, plus a "Stack Resources" dashboard —
so resource pressure is caught before it causes an OOM crash. telegraf runs
with an overridden entrypoint so it keeps root and can read the docker socket.
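A port probe of the kind mentioned above boils down to attempting a TCP connection and reporting success or failure. The sketch below illustrates that idea only. It is not the healthcheck command used in the repository, and `port_is_open` is a hypothetical name.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # Demo against a throwaway listener on an OS-assigned local port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    demo_port = listener.getsockname()[1]
    print(port_is_open("127.0.0.1", demo_port))
    listener.close()
```

A healthcheck built on such a probe would exit non-zero when the function returns False, so the container is reported as unhealthy.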

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 22:03:52 -07:00

2 Commits