64 Commits

Author SHA1 Message Date
sam
4e9bd7cc5a Add container memory limits to all services
Sets mem_limit on every service to limit the risk of OOM kills and swap
exhaustion (the lab host had only 5 MiB of swap free). The three heavy services (psql, kafka,
psql-app) read their limits from .env so production can raise them; the rest
use lab-appropriate fixed values. Total ~25 GB, leaving headroom on the 31 GB
lab host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:04:37 -07:00
sam
8ac156ce86 Add second-lab ExaBGP peering and bulk BMP config script
Generalizes exabgp/startup.sh to template BGP neighbors from an EXABGP_PEERS
list (ip:peer_as:description), so ExaBGP peers with multiple labs. Adds
cml/proxmox_bmp_config.py to apply the bmp server block to a lab's IOS-XR
routers over SSH (BMP config is not exposed via NETCONF YANG on current XR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 19:21:11 -07:00
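The commit gives only the per-entry format of EXABGP_PEERS (ip:peer_as:description), not the full templating logic in startup.sh. The sketch below shows one way to expand such a list into ExaBGP neighbor blocks. It rests on several assumptions: entries are comma-separated, peers are IPv4 (an IPv6 address would collide with the `:` separator), and the function names are hypothetical. The neighbor syntax is the common ExaBGP 4.x/5.x form and should be checked against the installed version.

```python
def parse_peers(raw):
    """Parse a hypothetical comma-separated EXABGP_PEERS value.

    Each entry is ip:peer_as:description; the description may contain spaces.
    """
    peers = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        # maxsplit=2 keeps any further colons inside the description
        ip, peer_as, description = entry.split(":", 2)
        peers.append((ip, int(peer_as), description))
    return peers


def render_neighbors(peers, router_id, local_address, local_as):
    """Render one ExaBGP neighbor block per (ip, peer_as, description) tuple."""
    blocks = []
    for ip, peer_as, description in peers:
        blocks.append(
            f"neighbor {ip} {{\n"
            f"    router-id {router_id};\n"
            f"    local-address {local_address};\n"
            f"    local-as {local_as};\n"
            f"    peer-as {peer_as};\n"
            f'    description "{description}";\n'
            "}\n"
        )
    return "\n".join(blocks)
```

For example, `render_neighbors(parse_peers("10.0.0.1:65100:lab-a"), "192.0.2.10", "192.0.2.10", 65020)` produces a single neighbor block with `peer-as 65100;`. Because the list is parsed first, adding another lab only means appending an entry to the variable.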
sam
cf4e5b07c6 Add Compose profiles, setup.sh bootstrap, and config templates for portable deployment
Pins the Compose project name and splits services into core / test / auth
profiles so the BMP collector core can deploy standalone. Adds setup.sh
(idempotent bootstrap), .env.example, and repo-resident Authelia config
templates so a fresh host deploys without manual steps. Parameterizes
hardcoded host IP and domain; points the Grafana InfluxDB datasource at the
container name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 19:21:04 -07:00
sam
31286d5d3e Add platform roadmap: multi-lab CML integration and production deployment
Four-track roadmap covering configuration centralization (inventory.yaml),
CML API automation (virl2_client), production ISP deployment (multi-vendor
IOS-XR + Junos), and packaging for distribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:23:38 -07:00
sam
da49b3e462 Add CML integration: XRd and ExaBGP node/image definitions and build scripts
CML 2.9 node definitions for XRd Control-Plane (third RR) and ExaBGP route
injector as Docker-based CML nodes. Includes build scripts to export Docker
images as tars for CML import, with IOS-XR startup configs for IS-IS, BGP,
and BMP.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:23:30 -07:00
sam
541f018bc5 Add RR Loc-RIB diff dashboard and route diversity config
Dashboard compares Adj-RIB-In tables between two Route Reflectors via BMP,
showing missing prefixes, attribute diffs (next-hop, AS path), and per-client
consistency. Route diversity script deploys 29 prefixes across R9K-01-07 via
NETCONF to create verifiable next-hop differences between RRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:23:19 -07:00
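The dashboard most likely runs this comparison as SQL over the OpenBMP schema. The Python sketch below only illustrates the underlying set logic: it finds prefixes present on one RR but not the other, plus prefixes whose next hop or AS path differ. The dict layout (prefix mapped to a (next_hop, as_path) tuple) and the function name are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical per-RR table: prefix -> (next_hop, as_path)
Rib = Dict[str, Tuple[str, Tuple[int, ...]]]


def diff_ribs(rr_a: Rib, rr_b: Rib) -> dict:
    """Compare two RR tables: missing prefixes on each side and attribute diffs."""
    missing_on_b = sorted(set(rr_a) - set(rr_b))
    missing_on_a = sorted(set(rr_b) - set(rr_a))
    attr_diffs = {}
    for prefix in sorted(set(rr_a) & set(rr_b)):
        nh_a, path_a = rr_a[prefix]
        nh_b, path_b = rr_b[prefix]
        changed = []
        if nh_a != nh_b:
            changed.append("next_hop")
        if path_a != path_b:
            changed.append("as_path")
        if changed:
            attr_diffs[prefix] = changed
    return {
        "missing_on_a": missing_on_a,
        "missing_on_b": missing_on_b,
        "attr_diffs": attr_diffs,
    }
```

The route diversity prefixes exist to make the `next_hop` branch fire: with identical tables on both RRs, every list in the result would be empty.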
sam
45f4c9859d Add Authelia auth gateway, portal landing page, and subpath routing
Adds Authelia (forward-auth) and an nginx portal container for single-endpoint
authenticated access via the Caddy reverse proxy. Configures Grafana auth proxy
for header-based auto-login. Updates Vue UI base paths and API routes for
/exabgp/ and /traffic/ subpath serving. Adds traffic-gen responder container
on dedicated Docker network.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:23:09 -07:00
sam
422b98d555 Fix telemetry dashboards: update Flux queries and InfluxDB datasource URL
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:22:58 -07:00
sam
d691b512f9 Add full internet table injection with background worker and progress tracking
Generates realistic IPv4 routing tables (1K-900K prefixes) with DFZ-like
prefix length distribution, varied AS paths, and transit ASN diversity.
Background injection with progress API, CLI follow mode, and Vue UI
component with preset sizes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:22:51 -07:00
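The commit does not include the generator's weights, so the following is only a minimal sketch of sampling unique IPv4 prefixes from a weighted prefix-length distribution. The `LENGTH_WEIGHTS` values are illustrative placeholders chosen so that /24 dominates. They are not measured DFZ data and not the commit's values. AS path and transit ASN generation are out of scope here.

```python
import ipaddress
import random

# Placeholder weights (NOT measured DFZ data, NOT the commit's values):
# prefix length -> relative frequency, skewed toward /24 like a real table.
LENGTH_WEIGHTS = {8: 1, 12: 2, 16: 10, 19: 20, 20: 25, 22: 60, 23: 45, 24: 300}


def generate_prefixes(count, seed=0):
    """Return `count` unique, globally routable IPv4 prefixes, sorted."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    lengths = list(LENGTH_WEIGHTS)
    weights = list(LENGTH_WEIGHTS.values())
    seen = set()
    while len(seen) < count:
        length = rng.choices(lengths, weights=weights, k=1)[0]
        # Random 32-bit address masked down to the chosen length
        net = ipaddress.ip_network((rng.getrandbits(32), length), strict=False)
        # Skip private, reserved, and multicast space
        if net.is_global and not net.is_multicast and not net.is_reserved:
            seen.add(net)
    return sorted(seen)
```

Rejecting non-global space keeps injected routes from overlapping the lab's own RFC 1918 addressing. The set guarantees uniqueness even at the 900K preset.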
sam
1f0936763b Add traffic generator improvements: mode switching, ping, responder echo, RFC2544 fixes
Adds sender/responder mode switching via API, QuickPing component, echo-mode
responder with dedicated container, improved flow state sync, and RFC2544
test runner enhancements. Includes UI improvements across all traffic-gen
components.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:22:41 -07:00
sam
c28c9b2527 Fix gNMI telemetry: OpenConfig paths, json_ietf encoding, SSH config
- Switch Telegraf from native IOS-XR YANG paths to OpenConfig
  (openconfig-interfaces:interfaces/interface/state/counters)
- Use json_ietf encoding instead of proto (IOS-XR 24.3.1 compat)
- Target only CORE-01/CORE-02 (R9K routers blocked by CML mgmt net)
- Update all 3 Grafana dashboard queries to match OpenConfig field
  names (in-octets, out-octets, in-pkts, out-pkts, in-errors, etc.)
- Rewrite gnmi_grpc_config.py to use SSH/CLI via paramiko instead of
  NETCONF (IOS-XR 24.3.1 rejects NETCONF gRPC edit-config)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 16:19:16 -07:00
sam
6b45f124f0 Remove __pycache__ from tracking and add to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 15:40:14 -07:00
sam
dcebf15bb3 Add Phase 4: gNMI streaming telemetry and traffic generator
- gNMI integration: NETCONF script to enable gRPC on all 9 routers,
  Telegraf container with gnmi input plugin, InfluxDB for time-series
  storage, 3 Grafana telemetry dashboards (utilization, errors, combined)
- Traffic generator: Scapy-based dual-mode container (sender/responder)
  with Flask API, RFC 2544 test suite (throughput, latency, frame-loss,
  back-to-back), Vue 3 web UI with flow builder, test runner, real-time
  stats monitor, and results export
- docker-compose.yml updated with influxdb, telegraf, traffic-gen,
  traffic-gen-ui services
- Full documentation in DOCS.md sections 15-16

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 15:29:44 -07:00
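RFC 2544 defines throughput as the highest rate at which no frames are lost. A common way to find it, though not necessarily what this traffic generator does, is a binary search over offered load. In the sketch below, `trial` is a hypothetical callback that runs one trial at a given percentage of line rate and returns the number of lost frames.

```python
from typing import Callable


def rfc2544_throughput(
    trial: Callable[[float], int],
    low: float = 0.0,
    high: float = 100.0,
    resolution: float = 0.1,
) -> float:
    """Binary-search the highest zero-loss rate (percent of line rate).

    Invariant: `low` is either the starting floor or a rate that passed,
    and `high` is a rate that lost frames.
    """
    if trial(high) == 0:
        return high  # line rate itself is loss-free
    best = low
    while high - low > resolution:
        mid = (low + high) / 2
        if trial(mid) == 0:
            best = low = mid  # no loss: search higher
        else:
            high = mid        # loss: search lower
    return best
```

Each iteration halves the search window, so reaching 0.1% resolution from 0 to 100% takes about 10 trials instead of 1,000 fixed steps.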
sam
f23e222bc0 Add Phase 3: TE/SR analytics, anomaly detection, DB schema reference
- 4 new Grafana dashboards:
  - Database Schema Map (obmp-learn-07): interactive schema reference
    with live row counts, relationship diagrams, column details
  - TE & Segment Routing Analytics (obmp-learn-08): exposes BGP-LS TE/SR
    fields (bandwidth, admin groups, SRLG, SR SIDs, protection types)
  - Topology Change & Anomaly Detection (obmp-learn-09): link state
    change tracking, origin AS hijack detection, convergence timeline
  - Link Utilization & TE Thought Experiment (obmp-learn-10): capacity
    data from BGP-LS + streaming telemetry integration guide
- DB_SCHEMA.md: standalone database reference (33 tables, 11 views)
- 3 new ExaBGP scenarios: te_community_steering, origin_shift, path_diversity
- Updated DOCS.md with Phase 3 dashboards and scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 13:31:03 -07:00
sam
f4d5bd7c85 Fix LS Topology dashboard: relax igp_metric filter for CML lab
IOS-XR 9000v in CML uses igp_metric=16000000 on all IS-IS links.
The stock dashboard filter (< 16000000) excluded all links, making
the Node dropdown empty and topology panel show no data. Changed
to <= 16777215 (IS-IS wide metric max) so lab links are included.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 12:49:48 -07:00
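The fix comes down to a comparison against the largest value of the 24-bit IS-IS wide metric (the extended IS reachability metric of RFC 5305), which is 2^24 - 1 = 16,777,215. The check below shows why the stock `< 16000000` filter dropped every lab link and the relaxed filter keeps them. The function names are illustrative only.

```python
ISIS_WIDE_METRIC_MAX = 2**24 - 1   # 24-bit wide metric ceiling = 16,777,215
LAB_IGP_METRIC = 16_000_000        # value seen on all IOS-XR 9000v links in CML


def stock_filter(igp_metric):
    """Original dashboard predicate: excludes the lab's 16000000 links."""
    return igp_metric < 16_000_000


def relaxed_filter(igp_metric):
    """Relaxed predicate: anything representable as a wide metric."""
    return igp_metric <= ISIS_WIDE_METRIC_MAX
```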
sam
39a130922a Add BGP-LS config script and fix ExaBGP peer event tracking
- exabgp/bgpls_config.py: NETCONF script that audits and fixes BGP-LS
  config on all 9 spoke routers; adds IS-IS distribute and link-state (lsls) AF
  activation toward both COREs where missing; handles routers needing
  global AF initialization before per-neighbor activation
- exabgp/startup.sh: add neighbor-changes to ExaBGP api blocks so peer
  up/down events are sent to Flask server.py stdin

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 12:39:40 -07:00
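With neighbor-changes enabled, ExaBGP writes peer state messages to the API process's stdin. The sketch below shows how a reader such as server.py could pick out those events. It assumes the api block uses `encoder json`, where state messages carry `"type": "state"` and a `neighbor` object with `address.peer` and `state`. The function name is hypothetical, and the message shape should be verified against the ExaBGP version in use.

```python
import json


def parse_peer_event(line):
    """Return (peer_ip, state) for an ExaBGP JSON state message, else None."""
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return None  # ignore non-JSON lines
    if not isinstance(msg, dict) or msg.get("type") != "state":
        return None  # updates, keepalives, etc. are not peer events
    neighbor = msg.get("neighbor", {})
    peer = neighbor.get("address", {}).get("peer")
    state = neighbor.get("state")
    if peer is None or state is None:
        return None
    return peer, state
```

In the API process this would run inside `for line in sys.stdin:`, and peer status would be updated whenever the function returns a tuple such as `("10.0.0.1", "up")` or `("10.0.0.1", "down")`.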
sam
6621942032 Add Phase 2: Vue 3 control panel, 6 learning dashboards, new BGP scenarios
- exabgp-ui/: Vue 3 + Vite SPA served by NGINX on :5001; proxies /api/ to
  ExaBGP Flask on :5050; includes StatusBar, ScenarioPanel, RouteTable,
  AnnounceForm, PeerStatus, ChurnControl components
- docker-compose.yml: add obmp-exabgp-ui service (host network, port 5001)
- exabgp/scenarios/__init__.py: add convergence_test, route_leak,
  hijack_simulation scenarios for structured BGP learning exercises
- exabgp/inject.py: add 'peers' and 'monitor' subcommands; live-refresh
  terminal status view with ANSI cursor repositioning
- obmp-grafana/dashboards/Learning/: 6 new OBMP-Learning dashboards
  (update rate, peer health, AS path, RPKI, churn, attributes)
- obmp-grafana/provisioning/dashboards/openbmp-dashboards.yml: add
  OpenBMP-Learning folder provider pointing to dashboards/Learning/
- DOCS.md: document Web UI, 3 new scenarios, 6 learning dashboards;
  fix section numbering (10-14) and architecture diagram (23 dashboards)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 15:37:16 -07:00
sam
233dadbb41 Add ExaBGP route injector, Grafana dashboards, and full documentation
- Add exabgp/ container: ExaBGP 5.x + Flask REST API for on-demand BGP
  route injection into CML IOS-XR lab (AS 65020 via eBGP from AS 65100)
- Add 6 injection scenarios: internet_sample, churn, blackhole, anycast,
  full_table, lab_prefixes
- Add inject.py CLI wrapper for the ExaBGP API
- Add iosxr_bgp_config.md with IOS-XR neighbor config and NETCONF script
- Add obmp-grafana/ dashboards and provisioning (17 dashboards)
- Update docker-compose.yml: add exabgp service, fix Kafka external
  listener IP, extend log retention from 90min to 720min
- Add DOCS.md: full project documentation including architecture, setup,
  user guide, sanity checks, troubleshooting, and command reference
- Update .gitignore: exclude .env and .claude/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 14:46:37 -07:00
Tim Evens
3f38af5312 Version 2.2.3 updates
* collector v2.2.3
* collector using debian-stable-slim
* dev-image updated to use debian-stable-slim
* Upgraded librdkafka to v1.9.2
* Fixed permission problems with postgres
* Grafana upgraded to 9.1.7
* psql-app v2.2.2
* postgres updated to use timescaledb-ha:pg14-ts2.8
2022-10-20 07:12:08 -07:00
Tim Evens
0f3312a719 Updates for v2.2.1 2022-06-17 18:20:05 -07:00
Tim Evens
6e616efe10 Updates for 2.2.0
* Use TimescaleDB CE instead of OSS
* Have psql-app wait for psql to start up during DB init
* Add version file to postgres container
2022-06-12 11:04:59 -07:00
Tim Evens
e19e5ac73a Fix psql container rm file issue 2022-06-10 12:53:24 -07:00
Tim Evens
c1dd8fc15c Add db-import and update irr cron jobs in psql-app container 2022-06-10 06:47:01 -07:00
Tim Evens
f7f13db676 Update logging files for cron jobs 2022-06-08 16:53:03 -07:00
Tim Evens
237345b476 Add ENABLE_DBIP to ``psql-app`` container to auto import DB-IP geo data 2022-06-08 14:53:55 -07:00
Tim Evens
a50c553f66 Version 2.2.0 schema upgrade 2022-06-08 11:54:17 -07:00
Tim Evens
84bec5293b version 2.2.0 updates 2022-06-08 11:53:55 -07:00
Tim Evens
e7fad858d9 Change global_ip_rib function cron job 2022-05-17 10:38:19 -07:00
Tim Evens
eb52eace41 Updates for 2.1.1 2022-03-31 12:13:46 -07:00
Tim Evens
c5f3d6ef59 2.1.1 Updates
* Update psql-app container to use MEM for the heap setting.
  This fixes an issue where psql-app would run out of memory.
* Update psql-app container to restart the psql consumer if
  it stops. This handles restarts after an out-of-memory exit.
2022-03-28 15:51:15 -07:00
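The restart behavior lives in the psql-app container's scripts, which this log does not show. The following is a minimal Python sketch of the same supervise-and-restart pattern. The function name, backoff, and restart cap are hypothetical, and the cap exists only so the loop can be tested. A production supervisor would leave it unset.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=None, backoff_seconds=5.0):
    """Run `cmd` and restart it whenever it exits (for example, after an OOM kill).

    Returns the list of exit codes once `max_restarts` is exceeded;
    with max_restarts=None it loops forever.
    """
    exit_codes = []
    while True:
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        print(f"consumer exited with code {code}; restarting", file=sys.stderr)
        if max_restarts is not None and len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(backoff_seconds)  # avoid a tight crash loop
```

The backoff keeps a consumer that fails immediately on startup from spinning the CPU while Postgres or Kafka recovers.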
Tim Evens
0a0d2ceec1 2.1.1 updates
* Fix vpnv6/l3vpn next-hop decoding
* Fix ip_rib_log so compression is enabled after hypertable creation
* Add pg_cron to postgres container
* Upgraded postgres container to timescaledb 2.6.0-pg14
2022-03-28 12:43:37 -07:00
Tim Evens
611bfbbc2b
Merge pull request #12 from ravitejb/main
feat: add pg_cron extension for cron jobs
2022-03-22 15:32:34 -07:00
RaviTeja Buddabathuni (rbuddaba)
620bd517cc fix: added build stage to reduce the image size 2022-03-15 18:10:20 -05:00
RaviTeja Buddabathuni (rbuddaba)
36ef1e478b fix: add pg_cron 2022-03-15 15:04:44 -05:00
RaviTeja Buddabathuni (rbuddaba)
a630c5db7d feat: add pg_cron extension for cron jobs 2022-03-15 13:08:48 -05:00
Tim Evens
b9b8c44713 Change max_wal_size to 10GB by default and add missing upgrade sql file 2022-03-09 10:48:58 -08:00
Tim Evens
43efeb5049 Change collector to log to stdout and peeringdb to 12 hours 2022-03-06 09:51:35 -08:00
Tim Evens
05737d2682 v2.1.0 updates
* Add peeringdb script and cron job
* Fix running more than one cronjob at a time
* Update upgrade script for psql-app
2022-03-04 07:27:23 -08:00
Tim Evens
b0511daf00
Merge pull request #9 from OpenBMP/2.1.0
2.1.0
2022-02-22 14:55:15 -08:00
Tim Evens
492c000ce9 Add whois and upgrade to 2.1.0 2022-02-22 14:30:05 -08:00
Tim Evens
a1d00198dd Updates for l3vpn 2022-02-14 14:36:36 -08:00
Tim Evens
7d4480a558 Merge branch 'main' into 2.1.0 2022-02-14 13:38:55 -08:00
Tim Evens
b4ff872aa9
Merge pull request #8 from pae23/pae23-patch-1
typo in collector scripts run
2022-02-14 13:36:40 -08:00
Tim Evens
aae49149af initial 2.1.0 2022-02-10 21:02:19 -08:00
pae23
6580338253
typo in collector scripts run 2022-02-07 23:36:42 +01:00
Tim Evens
fd2874d00e Fix for collector kafka startup issue
When first deploying the collector and Kafka, Kafka takes
a couple of minutes to start. In some cases, the collector
would start up without waiting for Kafka, so the first few
messages were dropped, such as the router init and peer up
messages.
2022-02-01 12:49:17 -08:00
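The collector's actual wait logic is in its startup scripts and is not shown here. A common version of the idea is to block until the broker's TCP port accepts connections, sketched below in Python with hypothetical names and timeouts. An open port only shows that the broker is listening, not that it will accept produce requests, so a real check may also need a metadata request or retries on the first send.

```python
import socket
import time


def wait_for_port(host, port, timeout=180.0, interval=2.0):
    """Poll host:port until a TCP connect succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # broker is listening
        except OSError:
            if time.monotonic() >= deadline:
                return False  # give up; caller decides whether to abort
            time.sleep(interval)
```

Starting the collector only after this returns `True` means the router init and peer up messages are not sent into a broker that is still starting.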
Tim Evens
a0e6a5bc6f Fixes to psql-app, version 2.0.2 2022-01-31 11:05:58 -08:00
Tim Evens
c3839aa8fb Security fixes, issues resolved, and more
* Upgrades to all containers
* Resolves #7, resolves #6, resolves #2
* Compose changed to use versions instead of latest
* OBMP containers now use a version tag instead of build numbers
2022-01-28 15:12:01 -08:00
Tim Evens
bb5df212df
Merge pull request #4 from sydon7/rpkifixup3
Rpkifixup3
2021-08-11 10:38:20 -07:00
Tim Evens
a9234b0a9a
Merge pull request #3 from sydon7/rpkifixup2
fixing two typos in psql-app/scripts/run
2021-08-11 10:38:05 -07:00