11 Commits

Author SHA1 Message Date
sam
7e3370b5a5 Rework Grafana dashboard information architecture
Reorganizes 31 dashboards into an operator-first structure with real
navigation. Adds Router Detail and Peer Detail drilldown dashboards; merges
LS Nodes+Links and the two L3VPN dashboards; modernizes all deprecated panels
(table-old/graph/worldmap). Every dashboard gets the obmp-nav dropdown so the
whole set is reachable from anywhere. Graduates the operational "Learning"
dashboards into Operations/Routing/LinkState folders, retires the Tops folder,
and relabels folders (Base->Operations, History->Routing, Learning->Reference).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:55:03 -07:00
sam
f430758992 Scope NOC Overview "Peers Down" panels to the dashboard time range
The scorecard and table counted every bgp_peers row in a down state,
including peers removed long ago (OpenBMP never prunes bgp_peers). They now
filter on the peer's last state-change timestamp via $__timeFilter, so the
panel reflects current/recent problems rather than all-time history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:29:59 -07:00
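
The scoping change in f430758992 can be pictured with a minimal sketch, assuming Grafana's `$__timeFilter(column)` macro restricts a timestamp column to the dashboard's selected range. The field names (`state`, `last_change`) and the peer names are hypothetical, not taken from the OpenBMP schema, and the real filter lives in the panel's SQL rather than in Python.

```python
from datetime import datetime, timedelta, timezone

def peers_down_in_range(peers, range_start, range_end):
    """Keep only down peers whose last state change falls inside the range."""
    return [
        p for p in peers
        if p["state"] == "down" and range_start <= p["last_change"] <= range_end
    ]

# Hypothetical sample rows standing in for bgp_peers.
now = datetime(2026, 5, 18, 20, 0, tzinfo=timezone.utc)
peers = [
    {"peer": "hypothetical-peer-recent", "state": "down",
     "last_change": now - timedelta(minutes=30)},
    {"peer": "hypothetical-peer-stale", "state": "down",
     "last_change": now - timedelta(days=200)},   # removed long ago, never pruned
    {"peer": "hypothetical-peer-up", "state": "up",
     "last_change": now - timedelta(minutes=5)},
]

# A 6-hour dashboard range drops the stale row that the old query counted.
recent = peers_down_in_range(peers, now - timedelta(hours=6), now)
print([p["peer"] for p in recent])
```

With the unscoped logic both down peers would be counted; with the range applied only the recent one remains.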
sam
960806fc06 Add NOC Overview dashboard and rebuild home as a navigation hub
NOC Overview is the new flagship operator landing dashboard: health
scorecards, peer session timeline, BGP update rate, and attention tables for
peers down, churning prefixes, RPKI invalids, and topology changes. All counts
come from stats_* aggregate tables so it stays fast at production scale.
OBMP-Home is rebuilt as a lightweight navigation hub pointing at NOC Overview.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:04:37 -07:00
sam
541f018bc5 Add RR Loc-RIB diff dashboard and route diversity config
Dashboard compares Adj-RIB-In tables between two Route Reflectors via BMP,
showing missing prefixes, attribute diffs (next-hop, AS path), and per-client
consistency. Route diversity script deploys 29 prefixes across R9K-01-07 via
NETCONF to create verifiable next-hop differences between RRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:23:19 -07:00
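
The comparison described in 541f018bc5 (missing prefixes plus next-hop and AS path differences between two Route Reflectors) might look roughly like the sketch below. It assumes each RR's table has already been loaded into a dict keyed by prefix; the function name, the dict layout, and the sample routes are illustrative assumptions, not the dashboard's actual queries.

```python
def diff_ribs(rib_a, rib_b, attrs=("next_hop", "as_path")):
    """Compare two prefix->attributes maps.

    Returns (missing_in_a, missing_in_b, attr_diffs), where attr_diffs maps
    each shared prefix to {attribute: (value_in_a, value_in_b)} for the
    attributes that differ.
    """
    prefixes_a, prefixes_b = set(rib_a), set(rib_b)
    missing_in_a = sorted(prefixes_b - prefixes_a)
    missing_in_b = sorted(prefixes_a - prefixes_b)

    attr_diffs = {}
    for prefix in sorted(prefixes_a & prefixes_b):
        changed = {
            name: (rib_a[prefix].get(name), rib_b[prefix].get(name))
            for name in attrs
            if rib_a[prefix].get(name) != rib_b[prefix].get(name)
        }
        if changed:
            attr_diffs[prefix] = changed
    return missing_in_a, missing_in_b, attr_diffs

# Hypothetical routes as seen by two Route Reflectors.
rr1 = {
    "10.0.0.0/24": {"next_hop": "192.0.2.1", "as_path": "65100"},
    "10.0.1.0/24": {"next_hop": "192.0.2.1", "as_path": "65100"},
}
rr2 = {
    "10.0.0.0/24": {"next_hop": "192.0.2.2", "as_path": "65100"},
}

missing_in_rr1, missing_in_rr2, diffs = diff_ribs(rr1, rr2)
print(missing_in_rr2, diffs)
```

Keeping the missing-prefix lists separate from the attribute diffs mirrors the panel split the commit message describes.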
sam
422b98d555 Fix telemetry dashboards: update Flux queries and InfluxDB datasource URL
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:22:58 -07:00
sam
c28c9b2527 Fix gNMI telemetry: OpenConfig paths, json_ietf encoding, SSH config
- Switch Telegraf from native IOS-XR YANG paths to OpenConfig
  (openconfig-interfaces:interfaces/interface/state/counters)
- Use json_ietf encoding instead of proto (IOS-XR 24.3.1 compat)
- Target only CORE-01/CORE-02 (R9K routers blocked by CML mgmt net)
- Update all 3 Grafana dashboard queries to match OpenConfig field
  names (in-octets, out-octets, in-pkts, out-pkts, in-errors, etc.)
- Rewrite gnmi_grpc_config.py to use SSH/CLI via paramiko instead of
  NETCONF (IOS-XR 24.3.1 rejects NETCONF gRPC edit-config)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 16:19:16 -07:00
sam
dcebf15bb3 Add Phase 4: gNMI streaming telemetry and traffic generator
- gNMI integration: NETCONF script to enable gRPC on all 9 routers,
  Telegraf container with gnmi input plugin, InfluxDB for time-series
  storage, 3 Grafana telemetry dashboards (utilization, errors, combined)
- Traffic generator: Scapy-based dual-mode container (sender/responder)
  with Flask API, RFC 2544 test suite (throughput, latency, frame-loss,
  back-to-back), Vue 3 web UI with flow builder, test runner, real-time
  stats monitor, and results export
- docker-compose.yml updated with influxdb, telegraf, traffic-gen,
  traffic-gen-ui services
- Full documentation in DOCS.md sections 15-16

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 15:29:44 -07:00
sam
f23e222bc0 Add Phase 3: TE/SR analytics, anomaly detection, DB schema reference
- 4 new Grafana dashboards:
  - Database Schema Map (obmp-learn-07): interactive schema reference
    with live row counts, relationship diagrams, column details
  - TE & Segment Routing Analytics (obmp-learn-08): exposes BGP-LS TE/SR
    fields (bandwidth, admin groups, SRLG, SR SIDs, protection types)
  - Topology Change & Anomaly Detection (obmp-learn-09): link state
    change tracking, origin AS hijack detection, convergence timeline
  - Link Utilization & TE Thought Experiment (obmp-learn-10): capacity
    data from BGP-LS + streaming telemetry integration guide
- DB_SCHEMA.md: standalone database reference (33 tables, 11 views)
- 3 new ExaBGP scenarios: te_community_steering, origin_shift, path_diversity
- Updated DOCS.md with Phase 3 dashboards and scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 13:31:03 -07:00
sam
f4d5bd7c85 Fix LS Topology dashboard: relax igp_metric filter for CML lab
IOS-XR 9000v in CML uses igp_metric=16000000 on all IS-IS links.
The stock dashboard filter (< 16000000) excluded all links, making
the Node dropdown empty and topology panel show no data. Changed
to <= 16777215 (IS-IS wide metric max) so lab links are included.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 12:49:48 -07:00
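
The effect of the threshold change in f4d5bd7c85 can be checked with a small sketch. The two thresholds and the metric value come from the commit message; the link names are hypothetical, and the real comparison is done in the dashboard query, not in Python.

```python
ISIS_WIDE_METRIC_MAX = 2**24 - 1   # 16777215, the 24-bit wide-metric ceiling
STOCK_THRESHOLD = 16_000_000       # bound used by the stock dashboard filter

def stock_filter(igp_metric: int) -> bool:
    """Original behaviour: strictly below 16000000."""
    return igp_metric < STOCK_THRESHOLD

def relaxed_filter(igp_metric: int) -> bool:
    """Fixed behaviour: anything up to the IS-IS wide metric maximum."""
    return igp_metric <= ISIS_WIDE_METRIC_MAX

# Hypothetical lab links, all carrying the metric the commit reports for CML.
lab_links = [
    {"link": "hypothetical-link-1", "igp_metric": 16_000_000},
    {"link": "hypothetical-link-2", "igp_metric": 16_000_000},
]

kept_by_stock = [l for l in lab_links if stock_filter(l["igp_metric"])]
kept_by_relaxed = [l for l in lab_links if relaxed_filter(l["igp_metric"])]
print(len(kept_by_stock), len(kept_by_relaxed))  # prints: 0 2
```

Because the lab metric sits exactly on the old bound, the strict comparison excluded every link, which is consistent with the empty Node dropdown described above.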
sam
6621942032 Add Phase 2: Vue 3 control panel, 6 learning dashboards, new BGP scenarios
- exabgp-ui/: Vue 3 + Vite SPA served by NGINX on :5001; proxies /api/ to
  ExaBGP Flask on :5050; includes StatusBar, ScenarioPanel, RouteTable,
  AnnounceForm, PeerStatus, ChurnControl components
- docker-compose.yml: add obmp-exabgp-ui service (host network, port 5001)
- exabgp/scenarios/__init__.py: add convergence_test, route_leak,
  hijack_simulation scenarios for structured BGP learning exercises
- exabgp/inject.py: add 'peers' and 'monitor' subcommands; live-refresh
  terminal status view with ANSI cursor repositioning
- obmp-grafana/dashboards/Learning/: 6 new OBMP-Learning dashboards
  (update rate, peer health, AS path, RPKI, churn, attributes)
- obmp-grafana/provisioning/dashboards/openbmp-dashboards.yml: add
  OpenBMP-Learning folder provider pointing to dashboards/Learning/
- DOCS.md: document Web UI, 3 new scenarios, 6 learning dashboards;
  fix section numbering (10-14) and architecture diagram (23 dashboards)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 15:37:16 -07:00
sam
233dadbb41 Add ExaBGP route injector, Grafana dashboards, and full documentation
- Add exabgp/ container: ExaBGP 5.x + Flask REST API for on-demand BGP
  route injection into CML IOS-XR lab (AS 65020 via eBGP from AS 65100)
- Add 6 injection scenarios: internet_sample, churn, blackhole, anycast,
  full_table, lab_prefixes
- Add inject.py CLI wrapper for the ExaBGP API
- Add iosxr_bgp_config.md with IOS-XR neighbor config and NETCONF script
- Add obmp-grafana/ dashboards and provisioning (17 dashboards)
- Update docker-compose.yml: add exabgp service, fix Kafka external
  listener IP, extend log retention from 90min to 720min
- Add DOCS.md: full project documentation including architecture, setup,
  user guide, sanity checks, troubleshooting, and command reference
- Update .gitignore: exclude .env and .claude/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 14:46:37 -07:00