# OpenBMP Platform Roadmap

## Context

This BMP monitoring platform is being developed against CML virtual labs (IOS-XR). It will be deployed into an ISP production network that runs IOS-XR and Juniper routers and route reflectors. The two tracks share a common foundation: configuration must be environment-agnostic, so the same stack runs identically against virtual or production routers.

Router IPs, AS numbers, and credentials are currently hardcoded across 8+ files. This ties the stack to a single CML lab. This roadmap covers both the multi-lab development workflow and the production deployment.

---

## Track A: Configuration Centralization (Foundation for Both Tracks)

### A1. Create `inventory.yaml`: unified topology inventory

**File**: `inventory.yaml` (new)

Single source of truth for all environments. Structure:

```yaml
platform:
  host_ip: 10.40.40.202
  bmp_port: 5000
  exabgp_port: 5050

environments:
  cml-lab1:
    type: cml                  # cml | production
    description: "CML RR cluster - 9 IOS-XR virtual routers"
    cml_server: "https://10.40.40.174"
    cml_user: webui
    bgp_as: 65020
    netconf: { user: webui, password: cisco, port: 830 }
    exabgp:
      local_as: 65100
      peers:
        - { ip: 10.100.0.100, name: CORE-01, peer_as: 65020 }
        - { ip: 10.100.0.200, name: CORE-02, peer_as: 65020 }
    routers:
      CORE-01: { mgmt: 10.100.0.100, loopback: 10.10.255.0, role: rr, vendor: iosxr, gnmi: true }
      CORE-02: { mgmt: 10.100.0.200, loopback: 10.10.255.20, role: rr, vendor: iosxr, gnmi: true }
      R9K-01: { mgmt: 10.100.0.1, loopback: 10.10.255.1, role: client, vendor: iosxr }
      # ...

  cml-lab2:
    type: cml
    description: "Second CML Lab (TBD topology)"
    cml_server: "https://"
    routers: {}

  production:
    type: production
    description: "ISP production network"
    bgp_as:
    netconf: { user: , port: 830 }
    routers:                   # IOS-XR and Juniper RRs + routers
      PROD-RR1: { mgmt: x.x.x.x, role: rr, vendor: iosxr, gnmi: true }
      PROD-RR2: { mgmt: x.x.x.x, role: rr, vendor: junos }
      # ...
```

Key design decisions:

- `vendor: iosxr | junos`: drives the NETCONF dialect, the gNMI paths, and the config templates.
- `type: cml | production`: CML environments carry `cml_server` for API automation. Production environments do not.
- Credentials live either in `inventory.yaml`, which must be gitignored, or in environment variables.

### A2. Create `config_loader.py`: Python inventory helper

**File**: `config_loader.py` (new)

Functions: `get_env(name)`, `get_all_routers()`, `get_routers_by_vendor(vendor)`, `get_exabgp_peers()`, `get_gnmi_targets()`, `get_routers_for_env(env_name)`

### A3. Refactor hardcoded Python scripts

Replace the hardcoded `ROUTERS` dicts and lists with `config_loader` calls in:

- `exabgp/route_diversity_config.py` (line 47)
- `exabgp/bgpls_config.py` (line 35)
- `gnmi/gnmi_grpc_config.py` (line 25)

### A4. Expand `.env` and parameterize `docker-compose.yml`

Add to `.env`:

```env
OBMP_DATA_ROOT=/var/openbmp
DOCKER_HOST_IP=10.40.40.202
EXABGP_LOCAL_IP=10.40.40.202
EXABGP_LOCAL_AS=65100
EXABGP_PEER_AS=65020
EXABGP_PEER_1=10.100.0.100
EXABGP_PEER_2=10.100.0.200
```

Replace the hardcoded IPs in `docker-compose.yml` (Kafka listener, ExaBGP env vars) with these variables.

### A5. Telegraf config parameterization

Replace the hardcoded gNMI addresses in `telegraf/telegraf.conf` with environment variable substitution. Pass `GNMI_TARGETS` in from `docker-compose.yml`.

### A6. Fix InfluxDB datasource URL

In `obmp-grafana/provisioning/datasources/influxdb-ds.yml`, replace `http://10.40.40.202:8086` with `http://obmp-influxdb:8086`.

---

## Track B: Multi-Lab CML Development

### B1. Dynamic ExaBGP multi-peer support

**File**: `exabgp/startup.sh`

Accept an `EXABGP_PEERS` environment variable. It holds comma-separated `ip:as:description` entries, and the script generates one neighbor block per entry. Keep `EXABGP_PEER_1`/`EXABGP_PEER_2` as a fallback when `EXABGP_PEERS` is unset.

### B2. CML API client module

**File**: `cml/cml_client.py` (new)

Python module built on the `virl2_client` SDK:

- Connect to the CML server (credentials from `inventory.yaml`)
- Upload node/image definitions
- Import/export topology YAML
- Start/stop/destroy labs
- Get node status

### B3. Topology template system

**File**: `cml/templates/xrd_rr.j2` (new)

Jinja2 templates for the XRd startup config. Parameterize the following:

- hostname
- loopback
- link IPs
- IS-IS NET
- BGP AS
- neighbor IPs
- BMP target

### B4. CLI deployment tool

**File**: `cml/deploy.py` (new)

```bash
python3 cml/deploy.py --env cml-lab1 status
python3 cml/deploy.py --env cml-lab1 upload-images
python3 cml/deploy.py --env cml-lab2 create
python3 cml/deploy.py --env cml-lab2 start
python3 cml/deploy.py --env cml-lab2 destroy
```

### B5. Update build scripts with API push

`cml/build-cml-image.sh` and `cml/build-xrd-image.sh` gain a `--push ` flag.

---

## Track C: Production ISP Deployment

### C1. Multi-vendor NETCONF support

The current scripts assume IOS-XR NETCONF only. To support Juniper RRs:

- `config_loader.py` provides a `vendor` field per router.
- The NETCONF scripts branch on `vendor` to handle dialect differences (ncclient `device_params={'name': 'iosxr'}` vs `device_params={'name': 'junos'}`).
- The route diversity and BGP-LS config scripts get Junos templates alongside the IOS-XR ones.

### C2. Multi-vendor gNMI paths

Telegraf gNMI subscriptions currently use OpenConfig paths, which both IOS-XR and Junos support in principle. However:

- Verify Juniper gNMI support on the target hardware.
- Add vendor-specific path overrides in `inventory.yaml` if needed.
- Telegraf can subscribe to multiple targets with different configs via separate `[[inputs.gnmi]]` blocks.

### C3. BMP considerations for production

- The BMP collector (port 5000) accepts connections from any router, so the collector itself needs no changes. C5 covers restricting the port.
- Production routers need BMP config pushed, either manually or via NETCONF automation.
- Consider separate BMP server IDs per environment for dashboard filtering.
- Juniper BMP config differs from IOS-XR, so add Junos BMP config templates.

### C4. Dashboard multi-environment awareness

- Add a Grafana template variable for environment filtering, keyed on a router name prefix or a tag.
- Consider a "Network Overview" dashboard that shows all environments side by side.
- Existing dashboards work as-is. Their router dropdowns will show every router that reports BMP.

### C5. Security hardening for production

- Move credentials out of `inventory.yaml` into environment variables or a secrets manager.
- Authelia config: stronger passwords, TOTP enforcement, session timeouts.
- PostgreSQL: restrict access, enable SSL.
- Kafka: consider authentication if it is exposed beyond localhost.
- BMP port: firewall it to accept connections only from known router management IPs.

### C6. Scalability considerations

- Monitor PostgreSQL disk usage and query performance with production-scale RIBs.
- Add TimescaleDB compression policies for historical data (`ip_rib_log`, `ls_*_log`).
- Partition Kafka topics if message throughput is high.
- Consider read replicas or materialized views for heavy Grafana queries.

---

## Track D: Packaging & Distribution

### D1. Configuration templates

- `inventory.yaml.example`: documented example with placeholder values
- `.env.example`: all environment variables, with descriptions

### D2. Bootstrap script

A `setup.sh` script that:

- Creates the required directories (`$OBMP_DATA_ROOT/authelia`, etc.)
- Copies the example configs into place if the originals don't exist
- Validates `inventory.yaml` syntax
- Generates the Telegraf config from the inventory

### D3. Published Docker images

Push the custom images to a registry (Docker Hub or GHCR):

- `obmp-exabgp`
- `obmp-exabgp-ui`
- `obmp-traffic-gen`
- `obmp-traffic-gen-ui`
- `obmp-portal`

Replace `build:` with `image:` in `docker-compose.yml`. Keep `build:` available through an override file for local development.

### D4. Documentation

- `docs/quickstart.md`: 5-minute setup guide
- `docs/adding-a-lab.md`: how to add a CML lab environment
- `docs/production-deployment.md`: production hardening checklist
- `docs/architecture.md`: system diagram, data flow, port map

---

## Implementation Order

| Priority | Step  | Track      | Description                          |
|----------|-------|------------|--------------------------------------|
| 1        | A1    | Foundation | Create `inventory.yaml`              |
| 2        | A2    | Foundation | Create `config_loader.py`            |
| 3        | A3    | Foundation | Refactor hardcoded Python scripts    |
| 4        | A4    | Foundation | Parameterize `.env` + docker-compose |
| 5        | A5-A6 | Foundation | Telegraf + InfluxDB datasource fixes |
| 6        | B1    | CML Dev    | Dynamic ExaBGP multi-peer            |
| 7        | B2-B4 | CML Dev    | CML API client + deploy CLI          |
| 8        | C1    | Production | Multi-vendor NETCONF (Junos support) |
| 9        | C3    | Production | Junos BMP config templates           |
| 10       | C5    | Production | Security hardening                   |
| 11       | D1-D2 | Packaging  | Config templates + bootstrap script  |
| 12       | D3    | Packaging  | Publish Docker images to registry    |
| 13       | D4    | Packaging  | Documentation                        |

Steps 1-5 (Track A) unblock everything else. Once the foundation is in place, steps 6-7 and steps 8-10 can proceed in parallel.

---

## Verification

1. **Config centralization**: Change a router IP in `inventory.yaml` and verify that all scripts pick it up.
2. **ExaBGP multi-peer**: Configure 3+ peers, restart, and verify that the BGP sessions establish.
3. **CML API**: `python3 cml/deploy.py --env cml-lab1 status` connects to the CML server and lists the nodes.
4. **BMP multi-source**: A router from lab 2 sends BMP and appears both in `SELECT * FROM routers` and in Grafana.
5. **Junos support**: A NETCONF script connects to a Juniper router and pushes config.
6. **Production dry-run**: Point a test router from the ISP network at the collector and verify data flows end to end.
7. **Clean deploy**: Clone the repo on a fresh host, run `setup.sh` and then `docker compose up`, and confirm the stack starts.

---

## Risks

- **Router name collisions**: Enforce unique hostnames across all environments.
- **Address space overlap**: Each environment needs distinct management subnets.
- **Juniper BMP differences**: Junos BMP support may differ in supported tables and TLVs, so test early.
- **Production scale**: Lab runs with 500K routes are already slow, and production full tables will stress PostgreSQL further.
- **Credentials in inventory**: `inventory.yaml` must be gitignored. Consider an environment variable fallback for CI/CD.
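The first two risks can be checked mechanically before deploy, for example as part of the inventory validation step in `setup.sh` (D2). The code below is a minimal sketch that assumes the inventory has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`.

- `validate_inventory` is a hypothetical helper. It is not one of the `config_loader.py` functions listed in A2.
- It only catches exact duplicates: router names and management IPs reused across environments. It also flags placeholder addresses such as `x.x.x.x`.
- Detecting overlapping management *subnets* would need per-environment subnet data, which the inventory schema above does not carry.

```python
import ipaddress
from collections import defaultdict


def validate_inventory(inventory: dict) -> list[str]:
    """Return human-readable problems found across all environments.

    Hypothetical helper (not part of config_loader.py). Checks:
      * router names are unique across every environment
      * management IPs are valid addresses and unique across every environment
    """
    problems: list[str] = []
    name_owners: dict[str, list[str]] = defaultdict(list)  # router name -> envs using it
    ip_owners: dict[str, list[str]] = defaultdict(list)    # normalized IP -> "env/router"

    for env_name, env in (inventory.get("environments") or {}).items():
        # Empty sections such as `routers: {}` or `routers:` (None) are allowed.
        routers = (env or {}).get("routers") or {}
        for router_name, attrs in routers.items():
            where = f"{env_name}/{router_name}"
            name_owners[router_name].append(env_name)
            mgmt = str((attrs or {}).get("mgmt", ""))
            try:
                addr = ipaddress.ip_address(mgmt)
            except ValueError:
                # Catches missing values and placeholders like "x.x.x.x".
                problems.append(f"{where}: invalid or placeholder mgmt address {mgmt!r}")
                continue
            ip_owners[str(addr)].append(where)

    for name, envs in sorted(name_owners.items()):
        if len(envs) > 1:
            problems.append(f"router name {name!r} used in: {', '.join(envs)}")
    for addr, owners in sorted(ip_owners.items()):
        if len(owners) > 1:
            problems.append(f"mgmt address {addr} shared by: {', '.join(owners)}")
    return problems
```

A bootstrap script could print each returned problem and exit non-zero if the list is not empty. This keeps the check separate from YAML parsing, so the syntax check and the consistency check fail with distinct messages.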