From 31286d5d3e983585204dfe0250bd490dbf3024fd Mon Sep 17 00:00:00 2001
From: sam
Date: Fri, 15 May 2026 14:23:38 -0700
Subject: [PATCH] Add platform roadmap: multi-lab CML integration and production deployment

Four-track roadmap covering configuration centralization (inventory.yaml),
CML API automation (virl2_client), production ISP deployment (multi-vendor
IOS-XR + Junos), and packaging for distribution.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/ROADMAP.md | 269 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 269 insertions(+)
 create mode 100644 docs/ROADMAP.md

diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
new file mode 100644
index 0000000..6312c6e
--- /dev/null
+++ b/docs/ROADMAP.md
@@ -0,0 +1,269 @@
+# OpenBMP Platform Roadmap
+
+## Context
+
+This BMP monitoring platform is being developed against CML virtual labs (IOS-XR) and will be deployed into an ISP production network running IOS-XR and Juniper routers and route reflectors. Both deployment targets share a common foundation: configuration must be environment-agnostic so that the same stack runs identically against virtual or production routers.
+
+Currently, router IPs, AS numbers, and credentials are hardcoded across 8+ files, tightly coupling the stack to a single CML lab. This roadmap addresses both the multi-lab development workflow and production deployment.
+
+---
+
+## Track A: Configuration Centralization (Foundation for Lab and Production)
+
+### A1. Create `inventory.yaml`: unified topology inventory
+
+**File**: `inventory.yaml` (new)
+
+Single source of truth for all environments.
Structure: + +```yaml +platform: + host_ip: 10.40.40.202 + bmp_port: 5000 + exabgp_port: 5050 + +environments: + cml-lab1: + type: cml # cml | production + description: "CML RR cluster - 9 IOS-XR virtual routers" + cml_server: "https://10.40.40.174" + cml_user: webui + bgp_as: 65020 + netconf: { user: webui, password: cisco, port: 830 } + exabgp: + local_as: 65100 + peers: + - { ip: 10.100.0.100, name: CORE-01, peer_as: 65020 } + - { ip: 10.100.0.200, name: CORE-02, peer_as: 65020 } + routers: + CORE-01: { mgmt: 10.100.0.100, loopback: 10.10.255.0, role: rr, vendor: iosxr, gnmi: true } + CORE-02: { mgmt: 10.100.0.200, loopback: 10.10.255.20, role: rr, vendor: iosxr, gnmi: true } + R9K-01: { mgmt: 10.100.0.1, loopback: 10.10.255.1, role: client, vendor: iosxr } + # ... + + cml-lab2: + type: cml + description: "Second CML Lab (TBD topology)" + cml_server: "https://" + routers: {} + + production: + type: production + description: "ISP production network" + bgp_as: + netconf: { user: , port: 830 } + routers: + # IOS-XR and Juniper RRs + routers + PROD-RR1: { mgmt: x.x.x.x, role: rr, vendor: iosxr, gnmi: true } + PROD-RR2: { mgmt: x.x.x.x, role: rr, vendor: junos } + # ... +``` + +Key design decisions: +- `vendor: iosxr | junos` — drives NETCONF dialect, gNMI paths, and config templates +- `type: cml | production` — CML environments have `cml_server` for API automation; production does not +- Credentials in `inventory.yaml` (gitignored) or pulled from env vars + +### A2. Create `config_loader.py` — Python inventory helper + +**File**: `config_loader.py` (new) + +Functions: `get_env(name)`, `get_all_routers()`, `get_routers_by_vendor(vendor)`, `get_exabgp_peers()`, `get_gnmi_targets()`, `get_routers_for_env(env_name)` + +### A3. Refactor hardcoded Python scripts + +Replace `ROUTERS` dicts/lists with `config_loader` calls: +- `exabgp/route_diversity_config.py` (line 47) +- `exabgp/bgpls_config.py` (line 35) +- `gnmi/gnmi_grpc_config.py` (line 25) + +### A4. 
Expand `.env` and parameterize `docker-compose.yml` + +Add to `.env`: +```env +OBMP_DATA_ROOT=/var/openbmp +DOCKER_HOST_IP=10.40.40.202 +EXABGP_LOCAL_IP=10.40.40.202 +EXABGP_LOCAL_AS=65100 +EXABGP_PEER_AS=65020 +EXABGP_PEER_1=10.100.0.100 +EXABGP_PEER_2=10.100.0.200 +``` + +Replace hardcoded IPs in `docker-compose.yml` (Kafka listener, ExaBGP env vars). + +### A5. Telegraf config parameterization + +Replace hardcoded gNMI addresses in `telegraf/telegraf.conf` with env var substitution. Pass `GNMI_TARGETS` from docker-compose.yml. + +### A6. Fix InfluxDB datasource URL + +`obmp-grafana/provisioning/datasources/influxdb-ds.yml`: replace `http://10.40.40.202:8086` with `http://obmp-influxdb:8086`. + +--- + +## Track B: Multi-Lab CML Development + +### B1. Dynamic ExaBGP multi-peer support + +**File**: `exabgp/startup.sh` + +Accept `EXABGP_PEERS` env var (comma-separated `ip:as:description`), generate N neighbor blocks. Keep `PEER_1`/`PEER_2` fallback. + +### B2. CML API client module + +**File**: `cml/cml_client.py` (new) + +Python module using `virl2_client` SDK: +- Connect to CML server (creds from `inventory.yaml`) +- Upload node/image definitions +- Import/export topology YAML +- Start/stop/destroy labs +- Get node status + +### B3. Topology template system + +**File**: `cml/templates/xrd_rr.j2` (new) + +Jinja2 templates for XRd startup config. Parameterize: hostname, loopback, link IPs, IS-IS NET, BGP AS, neighbor IPs, BMP target. + +### B4. CLI deployment tool + +**File**: `cml/deploy.py` (new) + +```bash +python3 cml/deploy.py --env cml-lab1 status +python3 cml/deploy.py --env cml-lab1 upload-images +python3 cml/deploy.py --env cml-lab2 create +python3 cml/deploy.py --env cml-lab2 start +python3 cml/deploy.py --env cml-lab2 destroy +``` + +### B5. Update build scripts with API push + +`cml/build-cml-image.sh` and `cml/build-xrd-image.sh` get `--push ` flag. + +--- + +## Track C: Production ISP Deployment + +### C1. 
Multi-vendor NETCONF support + +Current scripts assume IOS-XR NETCONF only. For Juniper RRs: +- `config_loader.py` provides `vendor` field per router +- NETCONF scripts branch on vendor for dialect differences (`device_params='iosxr'` vs `device_params='junos'`) +- Route diversity, BGP-LS config scripts get Junos templates alongside IOS-XR + +### C2. Multi-vendor gNMI paths + +Telegraf gNMI subscriptions currently use OpenConfig paths which work for both IOS-XR and Junos, but: +- Verify Juniper gNMI support on target hardware +- Add vendor-specific path overrides in `inventory.yaml` if needed +- Telegraf can subscribe to multiple targets with different configs via `[[inputs.gnmi]]` blocks + +### C3. BMP considerations for production + +- BMP collector (port 5000) accepts connections from any router — no changes needed +- Production routers need BMP config pushed (manual or via NETCONF automation) +- Consider: separate BMP server IDs per environment for dashboard filtering +- Juniper BMP config differs from IOS-XR — add Junos BMP config templates + +### C4. Dashboard multi-environment awareness + +- Add a Grafana template variable for environment filtering (by router name prefix or a tag) +- Consider a "Network Overview" dashboard that shows all environments side-by-side +- Existing dashboards work as-is — router dropdowns will show all BMP-reporting routers + +### C5. Security hardening for production + +- Move credentials out of `inventory.yaml` into environment variables or a secrets manager +- Authelia config: stronger passwords, TOTP enforcement, session timeouts +- PostgreSQL: restrict access, enable SSL +- Kafka: consider authentication if exposed beyond localhost +- BMP port: firewall to only accept connections from known router management IPs + +### C6. 
Scalability considerations + +- Monitor PostgreSQL disk usage and query performance with production-scale RIBs +- TimescaleDB compression policies for historical data (ip_rib_log, ls_*_log) +- Kafka topic partitioning if message throughput is high +- Consider read replicas or materialized views for heavy Grafana queries + +--- + +## Track D: Packaging & Distribution + +### D1. Configuration templates + +- `inventory.yaml.example` — documented example with placeholder values +- `.env.example` — all environment variables with descriptions + +### D2. Bootstrap script + +`setup.sh` that: +- Creates required directories (`$OBMP_DATA_ROOT/authelia`, etc.) +- Copies example configs if originals don't exist +- Validates inventory.yaml syntax +- Generates Telegraf config from inventory + +### D3. Published Docker images + +Push custom images to a registry (Docker Hub or GHCR): +- `obmp-exabgp` +- `obmp-exabgp-ui` +- `obmp-traffic-gen` +- `obmp-traffic-gen-ui` +- `obmp-portal` + +Replace `build:` with `image:` in docker-compose.yml (keep build as override). + +### D4. 
Documentation + +- `docs/quickstart.md` — 5-minute setup guide +- `docs/adding-a-lab.md` — how to add a CML lab environment +- `docs/production-deployment.md` — production hardening checklist +- `docs/architecture.md` — system diagram, data flow, port map + +--- + +## Implementation Order + +| Priority | Step | Track | Description | +|----------|------|-------|-------------| +| 1 | A1 | Foundation | Create `inventory.yaml` | +| 2 | A2 | Foundation | Create `config_loader.py` | +| 3 | A3 | Foundation | Refactor hardcoded Python scripts | +| 4 | A4 | Foundation | Parameterize `.env` + docker-compose | +| 5 | A5-A6 | Foundation | Telegraf + InfluxDB datasource fixes | +| 6 | B1 | CML Dev | Dynamic ExaBGP multi-peer | +| 7 | B2-B4 | CML Dev | CML API client + deploy CLI | +| 8 | C1 | Production | Multi-vendor NETCONF (Junos support) | +| 9 | C3 | Production | Junos BMP config templates | +| 10 | C5 | Production | Security hardening | +| 11 | D1-D2 | Packaging | Config templates + bootstrap script | +| 12 | D3 | Packaging | Publish Docker images to registry | +| 13 | D4 | Packaging | Documentation | + +Steps 1-5 (Track A) unblock everything else. Steps 6-7 and 8-10 can proceed in parallel once the foundation is in place. + +--- + +## Verification + +1. **Config centralization**: Change a router IP in `inventory.yaml`, verify all scripts pick it up +2. **ExaBGP multi-peer**: Set 3+ peers, restart, verify BGP sessions establish +3. **CML API**: `deploy.py --env cml-lab1 status` connects and lists nodes +4. **BMP multi-source**: Router from lab 2 sends BMP, appears in `SELECT * FROM routers` and Grafana +5. **Junos support**: NETCONF script connects to a Juniper router, pushes config +6. **Production dry-run**: Point a test router from the ISP network at the collector, verify end-to-end +7. 
**Clean deploy**: Clone repo on a fresh host, run `setup.sh`, `docker compose up`, confirm stack starts + +--- + +## Risks + +- **Router name collisions**: Enforce unique hostnames across all environments +- **Address space overlap**: Each environment needs distinct management subnets +- **Juniper BMP differences**: Junos BMP implementation may differ in supported tables/TLVs — test early +- **Production scale**: 500K-route labs are slow; production full tables will stress PostgreSQL more +- **Credentials in inventory**: Must be gitignored; consider env var fallback for CI/CD
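The name-collision, address-overlap and credential risks above are checks that the `setup.sh` validation step in D2 could run against `inventory.yaml`. The sketch below is a minimal Python version. It assumes the file has already been parsed into a dict in the A1 layout, for example with PyYAML's `yaml.safe_load`. The names `validate_inventory` and `resolve_netconf_password`, the `NETCONF_PASSWORD_<ENV>` variable naming, and the rule that each management address belongs to a /24 are all hypothetical choices, not existing project code.

```python
"""Hypothetical inventory checks for the Risks section (illustrative only).

Assumes inventory.yaml was already parsed into a dict in the A1 layout.
Function names, the NETCONF_PASSWORD_<ENV> variable, and the /24 mgmt
prefix are assumptions, not existing project code.
"""
from __future__ import annotations

import ipaddress
import os
from itertools import combinations


def validate_inventory(inv: dict, mgmt_prefix: int = 24) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: list[str] = []
    owner: dict[str, str] = {}     # hostname -> environment that defined it first
    env_nets: dict[str, set] = {}  # environment -> management networks it uses

    for env_name, env in (inv.get("environments") or {}).items():
        nets = set()
        for hostname, router in ((env or {}).get("routers") or {}).items():
            # Risk: router name collisions across environments.
            if hostname in owner:
                problems.append(
                    f"duplicate hostname {hostname!r} in {env_name} "
                    f"(already defined in {owner[hostname]})"
                )
            else:
                owner[hostname] = env_name
            # Assumption: each mgmt address lives in a /mgmt_prefix subnet.
            # Placeholders such as "x.x.x.x" (or a missing mgmt key) are skipped.
            mgmt = (router or {}).get("mgmt")
            try:
                nets.add(ipaddress.ip_network(f"{mgmt}/{mgmt_prefix}", strict=False))
            except ValueError:
                continue
        env_nets[env_name] = nets

    # Risk: address space overlap between environments.
    for (env_a, nets_a), (env_b, nets_b) in combinations(env_nets.items(), 2):
        for net_a in nets_a:
            for net_b in nets_b:
                if net_a.overlaps(net_b):
                    problems.append(
                        f"management subnet overlap: {env_a} {net_a} vs {env_b} {net_b}"
                    )
    return problems


def resolve_netconf_password(env_name: str, env: dict) -> str | None:
    """Prefer an environment variable; fall back to the gitignored inventory."""
    var = "NETCONF_PASSWORD_" + env_name.upper().replace("-", "_")
    return os.environ.get(var) or ((env or {}).get("netconf") or {}).get("password")
```

Suppose `cml-lab1` has `CORE-01` at `10.100.0.100` and `cml-lab2` defines another `CORE-01` at `10.100.0.5`. In that case `validate_inventory` reports both the duplicate hostname and the shared `10.100.0.0/24` network. Placeholder addresses such as `x.x.x.x` are skipped rather than flagged, so the production entry can stay incomplete until real values are filled in.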