OpenBMP Platform Roadmap
Context
This BMP monitoring platform is being developed against CML virtual labs (IOS-XR) and will be deployed into an ISP production network running IOS-XR and Juniper routers/route reflectors. The two tracks share a common foundation: configuration must be environment-agnostic so the same stack runs identically against virtual or production routers.
Currently, router IPs, AS numbers, and credentials are hardcoded across 8+ files, tightly coupling the stack to a single CML lab. This roadmap addresses both the multi-lab development workflow and production deployment.
Track A: Configuration Centralization (Foundation for Both Tracks)
A1. Create inventory.yaml — unified topology inventory
File: inventory.yaml (new)
Single source of truth for all environments. Structure:
platform:
  host_ip: 10.40.40.202
  bmp_port: 5000
  exabgp_port: 5050
environments:
  cml-lab1:
    type: cml  # cml | production
    description: "CML RR cluster - 9 IOS-XR virtual routers"
    cml_server: "https://10.40.40.174"
    cml_user: webui
    bgp_as: 65020
    netconf: { user: webui, password: cisco, port: 830 }
    exabgp:
      local_as: 65100
      peers:
        - { ip: 10.100.0.100, name: CORE-01, peer_as: 65020 }
        - { ip: 10.100.0.200, name: CORE-02, peer_as: 65020 }
    routers:
      CORE-01: { mgmt: 10.100.0.100, loopback: 10.10.255.0, role: rr, vendor: iosxr, gnmi: true }
      CORE-02: { mgmt: 10.100.0.200, loopback: 10.10.255.20, role: rr, vendor: iosxr, gnmi: true }
      R9K-01: { mgmt: 10.100.0.1, loopback: 10.10.255.1, role: client, vendor: iosxr }
      # ...
  cml-lab2:
    type: cml
    description: "Second CML Lab (TBD topology)"
    cml_server: "https://<lab2-ip>"
    routers: {}
  production:
    type: production
    description: "ISP production network"
    bgp_as: <prod-as>
    netconf: { user: <prod-user>, port: 830 }
    routers:
      # IOS-XR and Juniper RRs + routers
      PROD-RR1: { mgmt: x.x.x.x, role: rr, vendor: iosxr, gnmi: true }
      PROD-RR2: { mgmt: x.x.x.x, role: rr, vendor: junos }
      # ...
Key design decisions:
- vendor (iosxr | junos): drives NETCONF dialect, gNMI paths, and config templates
- type (cml | production): CML environments have cml_server for API automation; production does not
- Credentials in inventory.yaml (gitignored) or pulled from env vars
A2. Create config_loader.py — Python inventory helper
File: config_loader.py (new)
Functions: get_env(name), get_all_routers(), get_routers_by_vendor(vendor), get_exabgp_peers(), get_gnmi_targets(), get_routers_for_env(env_name)
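The helpers listed above could look roughly like the following. This is a minimal sketch, not the planned module: it assumes the inventory layout from A1, passes the parsed inventory explicitly (the signatures above take no such argument, so a real module would presumably cache it), and the SAMPLE data and the extra name/env keys on each router are illustrative additions.

```python
"""Sketch of config_loader.py helpers operating on a parsed inventory dict."""
from typing import Any, Dict, List

Inventory = Dict[str, Any]


def load_inventory(path: str = "inventory.yaml") -> Inventory:
    """Parse the inventory file. PyYAML is imported lazily so the pure
    helpers below stay usable without it."""
    import yaml  # third-party: PyYAML

    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def get_env(inv: Inventory, name: str) -> Dict[str, Any]:
    return inv["environments"][name]


def get_routers_for_env(inv: Inventory, env_name: str) -> List[Dict[str, Any]]:
    # Flatten {name: attrs} into a list, tagging each router with name and env
    # (hypothetical convenience keys, not part of the inventory itself).
    routers = get_env(inv, env_name).get("routers") or {}
    return [dict(attrs, name=name, env=env_name) for name, attrs in routers.items()]


def get_all_routers(inv: Inventory) -> List[Dict[str, Any]]:
    out: List[Dict[str, Any]] = []
    for env_name in inv.get("environments", {}):
        out.extend(get_routers_for_env(inv, env_name))
    return out


def get_routers_by_vendor(inv: Inventory, vendor: str) -> List[Dict[str, Any]]:
    return [r for r in get_all_routers(inv) if r.get("vendor") == vendor]


def get_gnmi_targets(inv: Inventory) -> List[Dict[str, Any]]:
    return [r for r in get_all_routers(inv) if r.get("gnmi")]


def get_exabgp_peers(inv: Inventory, env_name: str) -> List[Dict[str, Any]]:
    return get_env(inv, env_name).get("exabgp", {}).get("peers", [])


# Illustrative data mirroring a subset of the A1 example.
SAMPLE: Inventory = {
    "environments": {
        "cml-lab1": {
            "type": "cml",
            "exabgp": {
                "local_as": 65100,
                "peers": [{"ip": "10.100.0.100", "name": "CORE-01", "peer_as": 65020}],
            },
            "routers": {
                "CORE-01": {"mgmt": "10.100.0.100", "role": "rr", "vendor": "iosxr", "gnmi": True},
                "R9K-01": {"mgmt": "10.100.0.1", "role": "client", "vendor": "iosxr"},
            },
        },
        "production": {
            "type": "production",
            "routers": {"PROD-RR2": {"mgmt": "x.x.x.x", "role": "rr", "vendor": "junos"}},
        },
    }
}
```

Keeping the query helpers free of file I/O means the refactored scripts in A3 can be exercised against a small in-memory inventory before any real inventory.yaml exists.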
A3. Refactor hardcoded Python scripts
Replace ROUTERS dicts/lists with config_loader calls:
- exabgp/route_diversity_config.py (line 47)
- exabgp/bgpls_config.py (line 35)
- gnmi/gnmi_grpc_config.py (line 25)
A4. Expand .env and parameterize docker-compose.yml
Add to .env:
OBMP_DATA_ROOT=/var/openbmp
DOCKER_HOST_IP=10.40.40.202
EXABGP_LOCAL_IP=10.40.40.202
EXABGP_LOCAL_AS=65100
EXABGP_PEER_AS=65020
EXABGP_PEER_1=10.100.0.100
EXABGP_PEER_2=10.100.0.200
Replace hardcoded IPs in docker-compose.yml (Kafka listener, ExaBGP env vars).
A5. Telegraf config parameterization
Replace hardcoded gNMI addresses in telegraf/telegraf.conf with env var substitution. Pass GNMI_TARGETS from docker-compose.yml.
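One way to derive the GNMI_TARGETS value from the inventory is sketched below. Several things here are assumptions rather than settled design: the function name build_gnmi_targets is hypothetical, the inventory carries no gNMI port so a default is supplied, and whether the value should be a bare comma-separated list or quoted TOML array elements depends on how telegraf.conf ends up referencing the variable.

```python
from typing import Any, Dict

# Assumption: the inventory does not carry a gNMI port, so one is supplied
# here. 57400 is a commonly used IOS-XR gRPC port; adjust per platform.
DEFAULT_GNMI_PORT = 57400


def build_gnmi_targets(
    routers: Dict[str, Dict[str, Any]],
    port: int = DEFAULT_GNMI_PORT,
    quoted: bool = False,
) -> str:
    """Return the gNMI-enabled routers as a comma-separated host:port list.

    With quoted=True each element is wrapped in double quotes so the value
    can be dropped inside a TOML array.
    """
    targets = [
        f"{attrs['mgmt']}:{port}"
        for _name, attrs in sorted(routers.items())  # sorted for stable output
        if attrs.get("gnmi")
    ]
    if quoted:
        targets = [f'"{t}"' for t in targets]
    return ",".join(targets)
```

Sorting by router name keeps the generated value stable between runs, which avoids spurious container restarts when the inventory has not changed.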
A6. Fix InfluxDB datasource URL
obmp-grafana/provisioning/datasources/influxdb-ds.yml: replace http://10.40.40.202:8086 with http://obmp-influxdb:8086.
Track B: Multi-Lab CML Development
B1. Dynamic ExaBGP multi-peer support
File: exabgp/startup.sh
Accept EXABGP_PEERS env var (comma-separated ip:as:description), generate N neighbor blocks. Keep PEER_1/PEER_2 fallback.
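The parsing and fallback logic can be sketched as follows. startup.sh itself is a shell script, so this Python version only illustrates the intended behaviour; the function names are hypothetical, the fallback reads the EXABGP_PEER_1, EXABGP_PEER_2 and EXABGP_PEER_AS variables from A4, and the rendered neighbor block uses only a handful of common ExaBGP keys. Note that the colon-delimited format as written cannot carry IPv6 peer addresses.

```python
from typing import List, Mapping, NamedTuple


class Peer(NamedTuple):
    ip: str
    peer_as: int
    description: str


def parse_peers(spec: str) -> List[Peer]:
    """Parse 'ip:as:description,ip:as:description,...' into Peer tuples."""
    peers = []
    for entry in filter(None, (e.strip() for e in spec.split(","))):
        # Split into at most 3 fields; the description is optional and
        # falls back to the peer IP.
        ip, asn, *rest = entry.split(":", 2)
        peers.append(Peer(ip, int(asn), rest[0] if rest else ip))
    return peers


def peers_from_env(env: Mapping[str, str]) -> List[Peer]:
    """Prefer EXABGP_PEERS; otherwise fall back to the PEER_1/PEER_2 pair."""
    spec = env.get("EXABGP_PEERS", "").strip()
    if spec:
        return parse_peers(spec)
    ips = [ip for ip in (env.get("EXABGP_PEER_1"), env.get("EXABGP_PEER_2")) if ip]
    if not ips:
        return []
    peer_as = int(env["EXABGP_PEER_AS"])
    return [Peer(ip, peer_as, ip) for ip in ips]


def render_neighbors(peers: List[Peer], local_ip: str, local_as: int) -> str:
    """Render one ExaBGP neighbor block per peer."""
    blocks = []
    for p in peers:
        blocks.append(
            f"neighbor {p.ip} {{\n"
            f'    description "{p.description}";\n'
            f"    local-address {local_ip};\n"
            f"    local-as {local_as};\n"
            f"    peer-as {p.peer_as};\n"
            f"}}\n"
        )
    return "\n".join(blocks)
```

Calling peers_from_env(os.environ) and passing the result to render_neighbors with EXABGP_LOCAL_IP and EXABGP_LOCAL_AS would produce the neighbor section of the generated config.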
B2. CML API client module
File: cml/cml_client.py (new)
Python module using virl2_client SDK:
- Connect to CML server (creds from inventory.yaml)
- Upload node/image definitions
- Import/export topology YAML
- Start/stop/destroy labs
- Get node status
B3. Topology template system
File: cml/templates/xrd_rr.j2 (new)
Jinja2 templates for XRd startup config. Parameterize: hostname, loopback, link IPs, IS-IS NET, BGP AS, neighbor IPs, BMP target.
B4. CLI deployment tool
File: cml/deploy.py (new)
python3 cml/deploy.py --env cml-lab1 status
python3 cml/deploy.py --env cml-lab1 upload-images
python3 cml/deploy.py --env cml-lab2 create
python3 cml/deploy.py --env cml-lab2 start
python3 cml/deploy.py --env cml-lab2 destroy
B5. Update build scripts with API push
cml/build-cml-image.sh and cml/build-xrd-image.sh get a --push <env-name> flag.
Track C: Production ISP Deployment
C1. Multi-vendor NETCONF support
Current scripts assume IOS-XR NETCONF only. For Juniper RRs:
- config_loader.py provides a vendor field per router
- NETCONF scripts branch on vendor for dialect differences (with ncclient, device_params={'name': 'iosxr'} vs device_params={'name': 'junos'})
- Route diversity, BGP-LS config scripts get Junos templates alongside IOS-XR
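The vendor branch can be kept in one place by building the connection arguments from the inventory entry. The sketch below assumes the scripts use ncclient, which selects its device handler from a device_params dict with a name key; the function name netconf_connect_kwargs and the shape of the creds mapping (taken from the netconf entry in A1) are illustrative.

```python
from typing import Any, Dict, Mapping

# ncclient picks its device handler from device_params={"name": ...}.
VENDOR_DEVICE_PARAMS: Dict[str, Dict[str, str]] = {
    "iosxr": {"name": "iosxr"},
    "junos": {"name": "junos"},
}


def netconf_connect_kwargs(
    router: Mapping[str, Any], creds: Mapping[str, Any]
) -> Dict[str, Any]:
    """Build keyword arguments for ncclient's manager.connect()."""
    vendor = router.get("vendor")
    if vendor not in VENDOR_DEVICE_PARAMS:
        raise ValueError(f"unsupported vendor: {vendor!r}")
    return {
        "host": router["mgmt"],
        "port": creds.get("port", 830),
        "username": creds["user"],
        "password": creds["password"],
        "device_params": VENDOR_DEVICE_PARAMS[vendor],
    }


# Usage (requires ncclient and a reachable device):
#   from ncclient import manager
#   with manager.connect(**netconf_connect_kwargs(router, creds)) as m:
#       ...
```

Failing fast on an unknown vendor surfaces inventory typos before any session is attempted.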
C2. Multi-vendor gNMI paths
Telegraf gNMI subscriptions currently use OpenConfig paths, which work for both IOS-XR and Junos, but:
- Verify Juniper gNMI support on target hardware
- Add vendor-specific path overrides in inventory.yaml if needed
- Telegraf can subscribe to multiple targets with different configs via [[inputs.gnmi]] blocks
C3. BMP considerations for production
- BMP collector (port 5000) accepts connections from any router — no changes needed
- Production routers need BMP config pushed (manual or via NETCONF automation)
- Consider: separate BMP server IDs per environment for dashboard filtering
- Juniper BMP config differs from IOS-XR — add Junos BMP config templates
C4. Dashboard multi-environment awareness
- Add a Grafana template variable for environment filtering (by router name prefix or a tag)
- Consider a "Network Overview" dashboard that shows all environments side-by-side
- Existing dashboards work as-is — router dropdowns will show all BMP-reporting routers
C5. Security hardening for production
- Move credentials out of inventory.yaml into environment variables or a secrets manager
- Authelia config: stronger passwords, TOTP enforcement, session timeouts
- PostgreSQL: restrict access, enable SSL
- Kafka: consider authentication if exposed beyond localhost
- BMP port: firewall to only accept connections from known router management IPs
C6. Scalability considerations
- Monitor PostgreSQL disk usage and query performance with production-scale RIBs
- TimescaleDB compression policies for historical data (ip_rib_log, ls_*_log)
- Kafka topic partitioning if message throughput is high
- Consider read replicas or materialized views for heavy Grafana queries
Track D: Packaging & Distribution
D1. Configuration templates
- inventory.yaml.example: documented example with placeholder values
- .env.example: all environment variables with descriptions
D2. Bootstrap script
setup.sh that:
- Creates required directories ($OBMP_DATA_ROOT/authelia, etc.)
- Copies example configs if originals don't exist
- Validates inventory.yaml syntax
- Generates Telegraf config from inventory
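The inventory validation step could go beyond YAML syntax and check the constraints this roadmap already states. The following is a sketch under that assumption: the specific rules (known type and vendor values, cml_server present for CML environments, a mgmt address per router, hostnames unique across environments) are inferred from A1 and the Risks section, and validate_inventory is a hypothetical name.

```python
from typing import Any, Dict, List

VALID_TYPES = {"cml", "production"}
VALID_VENDORS = {"iosxr", "junos"}


def validate_inventory(inv: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors: List[str] = []
    seen: Dict[str, str] = {}  # router name -> environment it was first seen in
    for env_name, env in (inv.get("environments") or {}).items():
        env_type = env.get("type")
        if env_type not in VALID_TYPES:
            errors.append(f"{env_name}: unknown type {env_type!r}")
        if env_type == "cml" and not env.get("cml_server"):
            errors.append(f"{env_name}: cml environment needs cml_server")
        for name, attrs in (env.get("routers") or {}).items():
            if "mgmt" not in attrs:
                errors.append(f"{env_name}/{name}: missing mgmt address")
            vendor = attrs.get("vendor")
            if vendor not in VALID_VENDORS:
                errors.append(f"{env_name}/{name}: unknown vendor {vendor!r}")
            if name in seen:
                errors.append(
                    f"{name}: hostname used in both {seen[name]} and {env_name}"
                )
            else:
                seen[name] = env_name
    return errors
```

setup.sh could run this through a small Python entry point and abort when the returned list is non-empty, printing one problem per line.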
D3. Published Docker images
Push custom images to a registry (Docker Hub or GHCR):
- obmp-exabgp
- obmp-exabgp-ui
- obmp-traffic-gen
- obmp-traffic-gen-ui
- obmp-portal
Replace build: with image: in docker-compose.yml (keep build as override).
D4. Documentation
- docs/quickstart.md: 5-minute setup guide
- docs/adding-a-lab.md: how to add a CML lab environment
- docs/production-deployment.md: production hardening checklist
- docs/architecture.md: system diagram, data flow, port map
Track E: Internet-Scale Routing Analytics
Adds a local copy of the real global routing table, generalizes router comparison to an N-way diff, and threads VRF/RD scoping through the dashboards. The full-table feed (E1) is the foundation — E2/E3 consume it.
E1. GoBGP full-table feed → BMP → ip_rib
Files: docker-compose.yml (new gobgp service), gobgp/gobgpd.conf (new), gobgp/mrt-refresh.sh (new)
Stand up a GoBGP container that obtains a full Internet table (IPv4 ~1M +
IPv6 ~200k) and BMP-exports it to the existing OpenBMP collector, so the
global table lands in ip_rib as an ordinary monitored peer — every
existing dashboard and the diff then work against it for free.
- Primary feed: eBGP multihop session to Łukasz Bromirski's lab route server, AS57355 (85.232.240.179, 2001:1a68:2c:2::179). Local ASN private (e.g. 65199); announce nothing; ebgp-multihop TTL ~64; receive-only.
- BMP export: GoBGP [[bmp-servers]] block pointing at the collector (port 5000), route-monitoring-policy = pre-policy.
- Fallback / seed: gobgp/mrt-refresh.sh, run every 2h (host cron or a sidecar), downloads the latest RouteViews (archive.routeviews.org) or RIPE-RIS MRT RIB dump and loads it into the same instance with gobgp mrt inject.
- Identification: distinct BMP router name (e.g. GLOBAL-FEED) so dashboards can include/exclude it.
Caveats:
- The route server is a single volunteer-run host, no SLA — the MRT fallback is the reliability backstop, not optional.
- A full table roughly triples ip_rib size (see E-scale below).
- The feed carries no VRF/L3VPN routes; it is global unicast only.
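For the MRT fallback, the refresh job needs to work out which RIB dump is the most recent one available. The sketch below shows that calculation only. It rests on assumptions that should be checked against the archive before use: that RIB dumps are published on 2-hour UTC boundaries under a bgpdata/YYYY.MM/RIBS/rib.YYYYMMDD.HHMM.bz2 layout, and that allowing a 30-minute publication lag is enough. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed archive layout; verify against archive.routeviews.org before relying on it.
ROUTEVIEWS_BASE = "http://archive.routeviews.org/bgpdata"


def latest_rib_slot(
    now: datetime,
    interval_hours: int = 2,
    lag: timedelta = timedelta(minutes=30),  # assumed publication delay
) -> datetime:
    """Floor (now - lag) to the previous interval boundary, in UTC."""
    t = (now - lag).astimezone(timezone.utc)
    hour = t.hour - t.hour % interval_hours
    return t.replace(hour=hour, minute=0, second=0, microsecond=0)


def routeviews_rib_url(slot: datetime) -> str:
    """Build the dump URL for a given slot under the assumed layout."""
    return f"{ROUTEVIEWS_BASE}/{slot:%Y.%m}/RIBS/rib.{slot:%Y%m%d.%H%M}.bz2"
```

The dump is bz2-compressed, so the refresh script would likely need to decompress it before handing the file to gobgp mrt inject.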
E2. Generic multi-router diff dashboard
File: obmp-grafana/dashboards/.../router_diff.json (new, uid router-diff), generalized from rr_locrib_diff.json
Replace the hardwired RR1-vs-RR2 model with up to 4 selectable routers:
- Template vars router1-router4 (query type); router1/router2 required, router3/router4 default to a "— none —" sentinel and their panels hide when unset.
- Presence matrix: rows = prefixes, columns = selected routers, cell = present / next-hop / origin-AS; the core view.
- Divergence view: table of prefixes where the selected routers disagree (missing on some, or differing best-path attributes).
- Keep the per-prefix all-paths drill-down from the RR diff.
- The global feed (E1) is selectable as any of the 4, giving "lab vs the real Internet." The existing rr-locrib-diff stays as the RR-specific quick view.
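The presence matrix and divergence view reduce to a small set operation over per-router best paths. The dashboard itself would express this as SQL over ip_rib; the Python below is only a sketch of the logic, with each router's RIB simplified to a prefix-to-(next-hop, origin-AS) mapping and all names hypothetical.

```python
from typing import Dict, List, Optional, Tuple

BestPath = Tuple[str, int]    # (next_hop, origin_as)
Rib = Dict[str, BestPath]     # prefix -> best path on one router


def presence_matrix(ribs: Dict[str, Rib]) -> Dict[str, List[Optional[BestPath]]]:
    """Rows are prefixes, columns follow the order of `ribs`.

    A cell is the router's best path, or None when the prefix is absent.
    """
    routers = list(ribs)
    prefixes = sorted(set().union(*(rib.keys() for rib in ribs.values())))
    return {p: [ribs[r].get(p) for r in routers] for p in prefixes}


def divergent(
    matrix: Dict[str, List[Optional[BestPath]]]
) -> Dict[str, List[Optional[BestPath]]]:
    """Keep rows where routers disagree: missing on some, or differing attributes."""
    return {p: cells for p, cells in matrix.items() if len(set(cells)) > 1}
```

Because absence is represented as None in the same cell type as a best path, a single distinct-value check covers both kinds of disagreement, and the same code works for two, three, or four selected routers.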
E3. Global table exploration dashboard
File: obmp-grafana/dashboards/.../global_table.json (new)
Explorable dashboard over the GLOBAL-FEED peer: prefix count by AFI,
origin-AS distribution, prefix-length histogram, search by prefix/AS,
more-/less-specific lookups. Doubles as the comparison baseline for E2.
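The prefix-length histogram and the more-/less-specific lookups can be illustrated with Python's standard ipaddress module. In the dashboard these would be database queries against the GLOBAL-FEED peer; this sketch only shows the semantics, and the function names are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Tuple


def length_histogram(prefixes: Iterable[str]) -> Counter:
    """Count prefixes per (IP version, prefix length)."""
    hist: Counter = Counter()
    for p in prefixes:
        net = ipaddress.ip_network(p)
        hist[(net.version, net.prefixlen)] += 1
    return hist


def _related(prefixes: Iterable[str], query: str, more: bool) -> List[str]:
    q = ipaddress.ip_network(query)
    out = []
    for p in prefixes:
        n = ipaddress.ip_network(p)
        # subnet_of/supernet_of raise TypeError across IP versions, so filter first.
        if n.version != q.version or n == q:
            continue
        if (more and n.subnet_of(q)) or (not more and n.supernet_of(q)):
            out.append(p)
    return out


def more_specifics(prefixes: Iterable[str], query: str) -> List[str]:
    """Prefixes strictly contained in `query`."""
    return _related(prefixes, query, more=True)


def less_specifics(prefixes: Iterable[str], query: str) -> List[str]:
    """Prefixes that strictly contain `query`."""
    return _related(prefixes, query, more=False)
```

A linear scan like this is fine for illustration but would not scale to a full table; at roughly a million prefixes the lookup belongs in the database or in a trie.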
E4. VRF / RD awareness
Files: existing unicast + L3VPN dashboards
Thread a Route-Distinguisher / VRF scoping dimension through the dashboards:
- Add a vrf/rd template variable to the L3VPN dashboards and unicast dashboards where applicable.
- VRF/RD columns and filters on RIB tables.
- The diff (E2) gains a per-VRF scope.
Constraint (stated plainly): CML IOS-XR images can't originate L3VPN routes and the global feed carries none — so E4 is built to the L3VPN schema and unverifiable in this lab; it validates only against production routers. Keep E4 scope minimal until there's a real L3VPN source.
E-scale. PostgreSQL sizing for a full table
A full v4+v6 table is ~1.2M prefixes; with attributes and history this is a
multi-GB addition to ip_rib / ip_rib_log. Before enabling E1 continuously,
confirm disk headroom on $OBMP_DATA_ROOT and apply TimescaleDB compression to
ip_rib_log (also flagged in C6). The mv_as_adjacency materialized view
(already in place — postgres/scripts/006_obmp_matviews.sql) becomes far
more valuable once real-Internet AS paths are present.
Implementation Order
| Priority | Step | Track | Description |
|---|---|---|---|
| 1 | A1 | Foundation | Create inventory.yaml |
| 2 | A2 | Foundation | Create config_loader.py |
| 3 | A3 | Foundation | Refactor hardcoded Python scripts |
| 4 | A4 | Foundation | Parameterize .env + docker-compose |
| 5 | A5-A6 | Foundation | Telegraf + InfluxDB datasource fixes |
| 6 | B1 | CML Dev | Dynamic ExaBGP multi-peer |
| 7 | B2-B4 | CML Dev | CML API client + deploy CLI |
| 8 | C1 | Production | Multi-vendor NETCONF (Junos support) |
| 9 | C3 | Production | Junos BMP config templates |
| 10 | C5 | Production | Security hardening |
| 11 | D1-D2 | Packaging | Config templates + bootstrap script |
| 12 | D3 | Packaging | Publish Docker images to registry |
| 13 | D4 | Packaging | Documentation |
| 14 | E1 | Analytics | GoBGP full-table feed (AS57355 live + MRT fallback) |
| 15 | E2 | Analytics | Generic 4-router diff dashboard |
| 16 | E3 | Analytics | Global table exploration dashboard |
| 17 | E4 | Analytics | VRF/RD scoping (to schema, lab-unverifiable) |
Steps 1-5 (Track A) unblock everything else. Steps 6-7 and 8-10 can proceed in parallel once the foundation is in place. Track E is independent of A-D: E1 is the foundation for E2/E3; E4 can proceed any time but is lab-unverifiable.
Verification
- Config centralization: Change a router IP in inventory.yaml, verify all scripts pick it up
- ExaBGP multi-peer: Set 3+ peers, restart, verify BGP sessions establish
- CML API: deploy.py --env cml-lab1 status connects and lists nodes
- BMP multi-source: Router from lab 2 sends BMP, appears in SELECT * FROM routers and Grafana
- Junos support: NETCONF script connects to a Juniper router, pushes config
- Production dry-run: Point a test router from the ISP network at the collector, verify end-to-end
- Clean deploy: Clone repo on a fresh host, run setup.sh, docker compose up, confirm stack starts
Risks
- Router name collisions: Enforce unique hostnames across all environments
- Address space overlap: Each environment needs distinct management subnets
- Juniper BMP differences: Junos BMP implementation may differ in supported tables/TLVs — test early
- Production scale: 500K-route labs are slow; production full tables will stress PostgreSQL more
- Credentials in inventory: Must be gitignored; consider env var fallback for CI/CD
- Volunteer route server (E1): the AS57355 full-table feed has no SLA and can flap or be retired — the 2-hourly MRT fallback is mandatory, not optional
- Full-table DB growth (E1): a live global feed roughly triples ip_rib; size disk and enable ip_rib_log compression before turning it on continuously
- VRF work unverifiable (E4): no L3VPN source in the CML lab; E4 ships to schema correctness only, validated later against production