# OpenBMP + ExaBGP Route Injector — Full Documentation
## Table of Contents
1. [What Is This Project?](#1-what-is-this-project)
2. [Architecture](#2-architecture)
3. [Prerequisites](#3-prerequisites)
4. [Initial Setup (First Time)](#4-initial-setup-first-time)
5. [IOS-XR Router Configuration](#5-ios-xr-router-configuration)
6. [Starting and Stopping](#6-starting-and-stopping)
7. [Route Injection User Guide](#7-route-injection-user-guide)
8. [ExaBGP Control Panel (Web UI)](#8-exabgp-control-panel-web-ui)
9. [Grafana Dashboards](#9-grafana-dashboards)
10. [Sanity Checks](#10-sanity-checks)
11. [Relevant Commands Reference](#11-relevant-commands-reference)
12. [Troubleshooting](#12-troubleshooting)
13. [Data Retention](#13-data-retention)
14. [Environment Variables Reference](#14-environment-variables-reference)
15. [gNMI Streaming Telemetry (Phase 4)](#15-gnmi-streaming-telemetry-phase-4)
16. [Traffic Generator (Phase 4)](#16-traffic-generator-phase-4)
---
## 1. What Is This Project?
This is a **BGP Monitoring Protocol (BMP) lab stack** deployed via Docker Compose. It collects, stores, and visualizes BGP routing data from a Cisco IOS-XR lab network running in Cisco Modeling Labs (CML).
**What it does:**
- Receives BMP (BGP Monitoring Protocol, RFC 7854) telemetry from routers on TCP port 5000
- Streams BMP data through Kafka into a TimescaleDB/PostgreSQL database
- Provides **30 Grafana dashboards** (17 operational + 6 learning + 4 advanced analytics + 3 streaming telemetry) for real-time and historical BGP analysis
- Includes an **ExaBGP route injector** that peers with the two CORE routers and injects synthetic BGP routes, enabling testing of BGP policy, route propagation, and Grafana dashboards without needing internet connectivity
- Provides a **Vue 3 web UI** at `:5001` for point-and-click scenario management, live route tables, and peer monitoring
**The lab network:**
- AS 65020 — 9 Cisco IOS-XR routers in CML (iBGP full mesh via route-reflectors)
- AS 65100 — ExaBGP container (eBGP peer to both CORE routers)
- CORE-01: `10.100.0.100` (CML-R9K-CORE-01)
- CORE-02: `10.100.0.200` (CML-R9K-CORE-02)
- Host IP: `10.40.40.202` (ExaBGP binds here; reachable from CML management network)
---
## 2. Architecture
```
IOS-XR Routers (9x, AS 65020)
BMP telemetry on TCP 5000
|
v
obmp-collector (openbmp/collector:2.2.3)
|
v
obmp-kafka (confluentinc/cp-kafka:7.1.1)
+ obmp-zookeeper (confluentinc/cp-zookeeper:7.1.1)
|
v
obmp-psql-app (openbmp/psql-app:2.2.2)
Java consumer — writes parsed BGP data to PostgreSQL
|
v
obmp-psql (openbmp/postgres:2.2.1)
PostgreSQL 14 + TimescaleDB
|
+---------> obmp-grafana (grafana/grafana:9.1.7) :3000
| 30 dashboards, PostgreSQL + InfluxDB datasources
+---------> obmp-whois (openbmp/whois:2.2.0) :4300
WHOIS query server backed by the DB
ExaBGP (obmp-exabgp, built locally)
python:3.11-slim + exabgp 5.x + Flask API
Peers eBGP to CORE-01 and CORE-02 (AS 65100 -> AS 65020)
HTTP API on :5050 — inject/withdraw routes on demand
Routes propagate via iBGP mesh to all 9 routers -> BMP -> DB -> Grafana
gNMI Streaming Telemetry (Phase 4):
IOS-XR Routers (gRPC :57400)
|
v
obmp-telegraf (telegraf:1.28 + gnmi plugin)
|
v
obmp-influxdb (influxdb:2.7) :8086
|
v
obmp-grafana (InfluxDB datasource -> Telemetry dashboards)
Traffic Generator (Phase 4):
obmp-traffic-gen (python:3.11 + Scapy + Flask) :5051
Dual-mode: sender (generate traffic) / responder (echo/log)
RFC 2544 testing, custom packet flows
obmp-traffic-gen-ui (Vue 3 + NGINX) :5002
```
### Container Summary
| Container | Image | Port(s) | Role |
|-----------|-------|---------|------|
| obmp-zookeeper | confluentinc/cp-zookeeper:7.1.1 | 2181 (internal) | Kafka coordination |
| obmp-kafka | confluentinc/cp-kafka:7.1.1 | 9092 | Message broker |
| obmp-collector | openbmp/collector:2.2.3 | 5000 | BMP receiver |
| obmp-psql-app | openbmp/psql-app:2.2.2 | 9005 | Kafka→PostgreSQL consumer |
| obmp-psql | openbmp/postgres:2.2.1 | 5432 | TimescaleDB storage |
| obmp-grafana | grafana/grafana:9.1.7 | 3000 | Visualization |
| obmp-whois | openbmp/whois:2.2.0 | 4300 | WHOIS query server |
| obmp-exabgp | local build | 5050 (host net) | BGP route injector |
| obmp-exabgp-ui | local build | 5001 (host net) | Route injector web UI |
| obmp-influxdb | influxdb:2.7 | 8086 | Time-series DB for telemetry |
| obmp-telegraf | local build | - (host net) | gNMI telemetry collector |
| obmp-traffic-gen | local build | 5051 (host net) | Scapy traffic generator |
| obmp-traffic-gen-ui | local build | 5002 (host net) | Traffic generator web UI |
---
## 3. Prerequisites
- Docker Engine (20.10+) and Docker Compose v2
- Host IP `10.40.40.202` reachable from the CML management network
- CML routers with BMP configured pointing to `10.40.40.202:5000`
- CML CORE routers configured with ExaBGP as eBGP neighbor (see Section 5)
- `OBMP_DATA_ROOT` directory created (default: `/var/openbmp`)
---
## 4. Initial Setup (First Time)
### 4.0 Quick deploy (recommended)
`setup.sh` bootstraps a fresh host — it creates the data directories, syncs
Grafana provisioning, generates Authelia secrets, and renders config. It is
idempotent and safe to re-run.
```bash
git clone <this-repo-url>
cd obmp-docker
cp .env.example .env
$EDITOR .env # set HOST_IP, OBMP_DOMAIN, OBMP_COOKIE_DOMAIN, credentials
./setup.sh
docker compose up -d # BMP collector core only
docker compose --profile test --profile auth up -d # full stack (lab tools + auth)
```
The stack uses Docker Compose **profiles**:
| Command | Brings up |
|---------|-----------|
| `docker compose up -d` | Collector core only — zookeeper, kafka, collector, psql, psql-app, grafana, whois |
| `docker compose --profile test up -d` | Core **+** ExaBGP, traffic generator, telegraf, influxdb |
| `docker compose --profile auth up -d` | Core **+** Authelia gateway and portal |
| `docker compose --profile test --profile auth up -d` | Everything |
The bare `docker compose up` is the shippable standalone BMP collector — it has
no dependency on the lab/test tooling.
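For scripted deployments, the profile combinations in the table map directly onto repeated `--profile` flags. The following is a minimal Python sketch, assuming only the profile names `test` and `auth` from the table; the helper name `compose_up_command` is hypothetical and not part of this repository.

```python
from typing import Iterable, List


def compose_up_command(profiles: Iterable[str] = ()) -> List[str]:
    """Build the `docker compose ... up -d` argv for the given profiles."""
    cmd = ["docker", "compose"]
    for profile in profiles:
        # Each enabled profile becomes its own --profile flag.
        cmd += ["--profile", profile]
    return cmd + ["up", "-d"]


# Core only, then the full stack (lab tools + auth).
print(" ".join(compose_up_command()))
print(" ".join(compose_up_command(["test", "auth"])))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root.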
The sections below (4.1 to 4.6) document the equivalent **manual** steps if you
prefer not to use `setup.sh`.
### 4.1 Clone the repository
```bash
git clone <this-repo-url>
cd obmp-docker
```
### 4.2 Create persistent data directories
```bash
export OBMP_DATA_ROOT=/var/openbmp
sudo mkdir -p $OBMP_DATA_ROOT
# The root directory was created with sudo, so the subdirectories need it too
sudo mkdir -p ${OBMP_DATA_ROOT}/config
sudo mkdir -p ${OBMP_DATA_ROOT}/kafka-data
sudo mkdir -p ${OBMP_DATA_ROOT}/zk-data
sudo mkdir -p ${OBMP_DATA_ROOT}/zk-log
sudo mkdir -p ${OBMP_DATA_ROOT}/postgres/data
sudo mkdir -p ${OBMP_DATA_ROOT}/postgres/ts
sudo mkdir -p ${OBMP_DATA_ROOT}/grafana
sudo mkdir -p ${OBMP_DATA_ROOT}/grafana/dashboards
sudo chmod -R 777 $OBMP_DATA_ROOT
```
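The same directory layout can be created from a script. This is a rough Python equivalent of the `mkdir` commands above, assuming it runs with enough privileges to write under the chosen root; it does not replicate the `chmod -R 777` step, and `create_data_dirs` is a hypothetical helper name.

```python
from pathlib import Path

# Subdirectories listed in section 4.2, relative to OBMP_DATA_ROOT.
SUBDIRS = [
    "config", "kafka-data", "zk-data", "zk-log",
    "postgres/data", "postgres/ts", "grafana", "grafana/dashboards",
]


def create_data_dirs(root: str) -> list:
    """Create every data directory under `root`; safe to call repeatedly."""
    created = []
    for sub in SUBDIRS:
        path = Path(root) / sub
        # parents=True creates intermediate directories such as postgres/.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_data_dirs("/var/openbmp")` a second time is a no-op, which mirrors the behaviour of `mkdir -p`.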
### 4.3 Initialise the database (first run only)
Create the init trigger file — this causes psql-app to create all tables on startup:
```bash
touch ${OBMP_DATA_ROOT}/config/init_db
```
> **Warning:** Do not create this file on subsequent runs unless you want to wipe and recreate the entire database.
### 4.4 Copy Grafana provisioning files
```bash
cp -r obmp-grafana/provisioning ${OBMP_DATA_ROOT}/grafana/
cp -r obmp-grafana/dashboards ${OBMP_DATA_ROOT}/grafana/
```
### 4.5 Start the stack
```bash
OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d
```
Wait ~2 minutes for all services to initialise, especially PostgreSQL and psql-app, which run schema migrations.
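Instead of waiting a fixed time, a script can poll the published ports from the Container Summary (for example 5432, 3000, and 5000). The sketch below is one way to do that with the standard library; `wait_for_port` is a hypothetical helper. An open port only shows that the process is listening, not that schema migrations have finished, so the psql-app logs are still worth checking.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0,
                  interval: float = 2.0) -> bool:
    """Return True once a TCP connect to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or unreachable: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5432)` blocks for up to two minutes until PostgreSQL accepts connections.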
### 4.6 Verify everything is running
```bash
docker compose -p obmp ps
docker compose -p obmp logs --tail=20 psql-app
```
---
## 5. IOS-XR Router Configuration
The ExaBGP container peers eBGP with both CORE routers. Each CORE router must be configured with:
### 5.1 Route policies (apply once per router)
```
route-policy EXABGP_IN
pass
end-policy
route-policy EXABGP_OUT
drop
end-policy
```
### 5.2 BGP neighbor block
```
router bgp 65020
neighbor 10.40.40.202
remote-as 65100
description ExaBGP-Route-Injector
ebgp-multihop 5
update-source MgmtEth0/RP0/CPU0/0
!
address-family ipv4 unicast
route-policy EXABGP_IN in
route-policy EXABGP_OUT out
next-hop-self
!
!
!
```
### 5.3 Static route for next-hop resolution
IOS-XR BGP does not use the default route (0.0.0.0/0) to resolve BGP next-hops. A more-specific static route for the ExaBGP host subnet is required in the default VRF:
```
router static
address-family ipv4 unicast
10.40.40.0/24 10.100.0.254
!
!
```
### 5.4 Config notes
| Knob | Why |
|------|-----|
| `remote-as 65100` | ExaBGP presents as AS 65100 (eBGP to your AS 65020 mesh) |
| `ebgp-multihop 5` | Host and router are on different subnets |
| `update-source MgmtEth0/RP0/CPU0/0` | ExaBGP is reachable via the management interface |
| `next-hop-self` | Replace ExaBGP's next-hop (10.40.40.202) with the CORE router's address when reflecting into iBGP — ensures all routers can resolve the next-hop |
| `EXABGP_OUT` drops | Prevents the lab from advertising its own prefixes back to ExaBGP |
| Static route | Required: IOS-XR BGP will not install injected routes as bestpaths without a specific route to the next-hop |
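When the same neighbor block has to be adapted to another lab (different AS numbers, host IP, or management interface), it can help to render it from parameters. The following Python sketch reproduces the section 5.2 block line for line; the function name and its parameters are hypothetical, the defaults are the values used in this document, and the indentation is cosmetic.

```python
def render_exabgp_neighbor(local_as: int = 65020, peer_as: int = 65100,
                           peer_ip: str = "10.40.40.202",
                           update_source: str = "MgmtEth0/RP0/CPU0/0",
                           multihop: int = 5) -> str:
    """Render the section 5.2 BGP neighbor block as IOS-XR CLI text."""
    lines = [
        f"router bgp {local_as}",
        f" neighbor {peer_ip}",
        f"  remote-as {peer_as}",
        "  description ExaBGP-Route-Injector",
        f"  ebgp-multihop {multihop}",
        f"  update-source {update_source}",
        "  !",
        "  address-family ipv4 unicast",
        "   route-policy EXABGP_IN in",
        "   route-policy EXABGP_OUT out",
        "   next-hop-self",
        "  !",
        " !",
        "!",
    ]
    return "\n".join(lines)


print(render_exabgp_neighbor())
```

The output is plain text intended for review and manual paste; it is not a replacement for the NETCONF script referenced in section 5.5.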
### 5.5 NETCONF alternative
See `exabgp/iosxr_bgp_config.md` for a Python/ncclient script that pushes all of the above config programmatically.
Credentials: `username=webui`, `password=cisco`, port 830.
### 5.6 Bulk BMP config (`cml/proxmox_bmp_config.py`)
To point a whole lab of IOS-XR routers at the BMP collector at once,
`cml/proxmox_bmp_config.py` applies the `bmp server 1` block over SSH (IOS-XR
BMP config is not exposed via NETCONF YANG on current releases). It is
idempotent.
```bash
pip install paramiko
python3 cml/proxmox_bmp_config.py # all routers in the inventory
python3 cml/proxmox_bmp_config.py r9k-05 # a single router (smoke test)
```
Edit the `ROUTERS` list at the top of the script for your inventory and the
`COLLECTOR_HOST` constant for the collector address.
---
## 6. Starting and Stopping
### Start all services
```bash
OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d
```
### Stop all services (preserve data)
```bash
docker compose -p obmp down
```
### Stop and remove all data (full reset)
```bash
docker compose -p obmp down -v
sudo rm -rf /var/openbmp
```
### Rebuild the ExaBGP container (after code changes)
```bash
docker compose -p obmp build exabgp
docker compose -p obmp up -d exabgp
```
### Restart a single service
```bash
docker compose -p obmp restart <service>
# e.g.:
docker compose -p obmp restart exabgp
docker compose -p obmp restart psql-app
```
---
## 7. Route Injection User Guide
The ExaBGP container exposes a Flask REST API on port 5050 (host network). The `inject.py` CLI wraps this API.
### 7.1 Setup
```bash
cd exabgp
pip install requests # only needed if running inject.py from the host
```
### 7.2 Check status
```bash
python3 inject.py status
```
Output shows API health, active route count, and peer states:
```json
{
"status": "ok",
"active_routes": 77,
"peers": {
"10.100.0.100": {"state": "up", "updated": "2026-03-05T10:00:00Z"},
"10.100.0.200": {"state": "up", "updated": "2026-03-05T10:00:00Z"}
}
}
```
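A monitoring script can consume this JSON directly. The sketch below assumes only the structure shown in the sample output above; the second peer is deliberately set to `down` here to show a non-empty result, and `peers_not_up` is a hypothetical helper name.

```python
import json

# Sample shaped like the status output above; the second peer is set to
# "down" purely for illustration.
SAMPLE = """{"status": "ok", "active_routes": 77, "peers": {
  "10.100.0.100": {"state": "up", "updated": "2026-03-05T10:00:00Z"},
  "10.100.0.200": {"state": "down", "updated": "2026-03-05T10:00:00Z"}}}"""


def peers_not_up(status: dict) -> list:
    """Return the sorted peer addresses whose state is anything but "up"."""
    peers = status.get("peers", {})
    return sorted(ip for ip, info in peers.items() if info.get("state") != "up")


status = json.loads(SAMPLE)
print(peers_not_up(status))
```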
### 7.3 List available scenarios
```bash
python3 inject.py scenarios
```
| Scenario | Routes | Description |
|----------|--------|-------------|
| `internet_sample` | ~94 | Partial internet table — real public prefixes, realistic AS paths (Cloudflare, Google, AWS, Azure, etc.) |
| `churn` | 30 | RFC documentation prefixes for announce/withdraw churn testing |
| `blackhole` | 5 | /32 prefixes with RTBH community (65100:666 + 65535:666) |
| `anycast` | 3 | Same prefixes with varying AS paths and MEDs (best-path testing) |
| `full_table` | 500+ | Large partial internet table with synthetic /24s |
| `lab_prefixes` | 8 | Enterprise/SP-style routes with communities and local-pref |
| `convergence_test` | 10 | Prefixes for timing BGP convergence — announce then check ip_rib_log timestamps |
| `route_leak` | 10 | Real prefixes re-announced with short AS paths — simulates a route leak (community 65100:999) |
| `hijack_simulation` | 10 | Prefixes claimed directly by AS 65100 — simulates a prefix hijack (community 65100:hijack) |
| `te_community_steering` | 15 | Routes tagged with TE communities for color-based steering (65020:100=red, 65020:200=blue, 65020:300=green) |
| `origin_shift` | 5 | Prefixes with changed origin AS — simulates origin migration for anomaly detection |
| `path_diversity` | 10 | Same prefixes with different AS paths/MEDs — demonstrates best-path selection |
### 7.4 Load a scenario
```bash
python3 inject.py scenario internet_sample
```
Routes propagate: ExaBGP → CORE-01/CORE-02 (eBGP) → all 9 routers (iBGP) → BMP → Kafka → PostgreSQL → Grafana.
### 7.5 Withdraw a scenario
```bash
python3 inject.py withdraw-scenario internet_sample
```
### 7.6 Announce individual prefixes
```bash
python3 inject.py announce 10.0.0.0/8 \
--as-path 65100 3356 15169 \
--community 65100:100 \
--med 100
```
### 7.7 Withdraw individual prefixes
```bash
python3 inject.py withdraw 10.0.0.0/8
```
### 7.8 Withdraw everything
```bash
python3 inject.py withdraw-all
```
### 7.9 Generate route churn (populate history tables)
The `churn` command cycles the churn scenario repeatedly, generating `ip_rib_log` and `stats_chg_*` entries that power Grafana's history dashboards.
```bash
# 5 cycles, 30 seconds apart
python3 inject.py churn --count 5 --interval 30
# Run indefinitely until Ctrl+C
python3 inject.py churn
```
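Conceptually, a churn cycle is an announce followed by a withdraw, repeated with a pause in between. The Python sketch below illustrates that loop using only the endpoints listed in section 7.10; it is not the implementation in `inject.py`, which is not shown here. The `post` and `sleep` callables are injected so the loop can be exercised without a running API, and the exact meaning of `--interval` in the real CLI is an assumption.

```python
import time
from typing import Callable


def run_churn(post: Callable[[str], None], count: int, interval: float,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Announce then withdraw the churn scenario `count` times."""
    cycles = 0
    for _ in range(count):
        post("/scenario/churn")   # announce the churn prefixes
        sleep(interval)
        post("/withdraw/all")     # withdraws ALL active routes, not only churn
        sleep(interval)
        cycles += 1
    return cycles
```

Because `/withdraw/all` removes every active route, this sketch is only suitable when no other scenario is loaded.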
### 7.10 REST API directly (curl)
```bash
BASE=http://localhost:5050
# Health
curl $BASE/healthz
# List scenarios
curl $BASE/scenarios
# Load scenario
curl -X POST $BASE/scenario/internet_sample
# Announce custom prefix
curl -X POST $BASE/announce \
-H 'Content-Type: application/json' \
-d '{"prefixes":["10.0.0.0/8"],"as_path":[65100,3356,15169],"communities":["65100:100"]}'
# Withdraw all
curl -X POST $BASE/withdraw/all
# Peer state
curl $BASE/peers
```
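The same announce call can be made from Python with only the standard library. This sketch builds the request that corresponds to the `curl -X POST $BASE/announce` example above, using the same JSON fields; `build_announce` is a hypothetical helper, and the request is constructed but not sent.

```python
import json
import os
import urllib.request

# Same default as the EXABGP_API variable used by inject.py.
BASE = os.environ.get("EXABGP_API", "http://localhost:5050")


def build_announce(prefixes, as_path, communities, base=BASE):
    """Build (but do not send) the POST /announce request from section 7.10."""
    payload = {"prefixes": prefixes, "as_path": as_path,
               "communities": communities}
    return urllib.request.Request(
        f"{base}/announce",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_announce(["10.0.0.0/8"], [65100, 3356, 15169], ["65100:100"])
print(req.get_method(), req.full_url)
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=5)` and read the response body.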
### 7.11 Adding custom scenarios
Edit `exabgp/scenarios/__init__.py`. Add an entry to `SCENARIOS` following the existing pattern:
```python
SCENARIOS['my_scenario'] = {
'description': 'My custom routes',
'routes': [
_r('192.0.2.0/24', [65100, 65200], communities=['65100:100']),
],
}
```
The `scenarios/` directory is volume-mounted into the container, so edits do not require an image rebuild. The Python module is imported only once at container start, however, so **restart the container** after editing:
```bash
docker compose -p obmp restart exabgp
```
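Since a malformed prefix in a scenario is only discovered after the restart, it can be worth checking prefixes beforehand. The sketch below uses the standard `ipaddress` module and is independent of the repository's `_r` helper, whose signature is not documented here; `invalid_prefixes` is a hypothetical name.

```python
import ipaddress


def invalid_prefixes(prefixes):
    """Return the entries that are not valid, canonical CIDR prefixes."""
    bad = []
    for prefix in prefixes:
        try:
            # strict=True rejects host bits, e.g. 192.0.2.1/24.
            ipaddress.ip_network(prefix, strict=True)
        except ValueError:
            bad.append(prefix)
    return bad


print(invalid_prefixes(["192.0.2.0/24", "192.0.2.1/24", "300.0.0.0/8"]))
```

An empty list means every prefix parsed cleanly as either IPv4 or IPv6.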
---
## 8. ExaBGP Control Panel (Web UI)
Access: `http://10.40.40.202:5001`
A Vue 3 single-page app served by NGINX that proxies `/api/` to the ExaBGP Flask API on port 5050. No login required.
### Layout
```
┌─────────────────────────────────────────────────────────────┐
│ OpenBMP Route Injector [API OK] [77 routes] [2/2 UP] │
├──────────────────────┬──────────────────────────────────────┤
│ SCENARIOS │ [Routes] [Inject] [Peers] tabs │
│ │ │
│ [internet_sample] │ Routes tab: searchable/paginated │
│ [LOAD] [UNLOAD] │ table with per-row Withdraw button │
│ │ │
│ [churn] │ Inject tab: manual prefix form │
│ [LOAD] [START CHURN]│ (prefix, AS path, communities, MED) │
│ │ │
│ [blackhole] ... │ Peers tab: per-peer UP/DOWN cards │
├──────────────────────┴──────────────────────────────────────┤
│ Refreshing every 5s │
└─────────────────────────────────────────────────────────────┘
```
### Features
- **Live status bar** — API health, active route count, peer UP/DOWN badges; auto-refreshes every 5 seconds
- **Scenario panel** — Load/Unload buttons for each scenario with loading states and feedback
- **Churn control** — Start/stop churn cycles with configurable count and interval sliders directly in the browser
- **Route table** — Searchable, paginated (20/page) table of active routes; per-row Withdraw button; Withdraw All
- **Manual inject form** — Announce any prefix with custom AS path, communities, MED, local-pref
- **Peer cards** — Per-peer state display with UP (green) / DOWN (red pulsing) indicators
### Rebuild after code changes
```bash
docker compose -p obmp build exabgp-ui
docker compose -p obmp up -d exabgp-ui
```
---
## 9. Grafana Dashboards
Access: `http://10.40.40.202:3000`
Default credentials: `admin` / `openbmp` (anonymous access also enabled)
### Dashboard Categories
| Category | Dashboard | Description |
|----------|-----------|-------------|
| General | OBMP Home | Overview / landing page |
| Base | Inventory | Router and peer inventory |
| Base | Looking Glass | Real-time RIB lookup by prefix |
| Base | ASN View | ASN-level routing view |
| History | Prefix History | Route change history for a prefix |
| History | Prefix History by ASN | Filtered by origin AS |
| History | Prefix History by Community | Filtered by BGP community |
| Tops | Top Prefixes | Most-updated prefixes |
| Tops | Top L3VPN Prefixes | L3VPN equivalent |
| Link State | LS Nodes | IS-IS link-state node database |
| Link State | LS Links | IS-IS link-state link database |
| Link State | LS Topology | Network topology map |
| Link State | LS Prefixes | Link-state prefix database |
| Link State | LS History | Link-state change history |
| L3VPN | L3VPN Looking Glass | VPN RIB lookup |
| L3VPN | L3VPN Prefix History | VPN route change history |
| L3VPN | L3VPN RIB Browser | Full VPN RIB browser |
> History dashboards require `ip_rib_log` and `stats_chg_*` table data. Run `inject.py churn` to populate these.
### OBMP-Learning Dashboards (folder: `OBMP-Learning`)
Six learning-focused dashboards in a separate folder, designed to teach BGP concepts using live lab data.
| Dashboard | UID | What it teaches |
|-----------|-----|-----------------|
| BGP Update Rate & Churn | `obmp-learn-01` | Network stability — advertisements vs withdrawals over time from `ip_rib_log`; per-peer update counts |
| Peer Session Health & Flap Analysis | `obmp-learn-02` | BGP session stability — state timeline, flap count, uptime %, last reset reason |
| AS Path Analysis | `obmp-learn-03` | Internet topology — path length distribution, longest paths, top origin ASNs, transit frequency |
| RPKI Validation Status | `obmp-learn-04` | BGP security — Valid / Invalid / NotFound breakdown; invalid routes (potential hijacks) table |
| Route Churn & Stability Score | `obmp-learn-05` | Prefix stability — tiered churn score (Very Stable / Stable / Moderate / Unstable) per prefix |
| BGP Attribute Explorer | `obmp-learn-06` | BGP path attributes — community list distribution, MED values, local-pref spread per peer |
> **RPKI note:** The `rpki_validator` table is populated by a cron job in `psql-app` every 2 hours. Dashboard `obmp-learn-04` will show zero counts until the cron runs — check `ENABLE_RPKI=1` in `docker-compose.yml`.
### Advanced Analytics Dashboards (folder: `OBMP-Learning`)
Four advanced dashboards that go beyond basic BMP monitoring, unlocking TE/SR data and providing heuristic analysis.
| Dashboard | UID | What it provides |
|-----------|-----|-----------------|
| Database Schema Map | `obmp-learn-07` | Interactive schema reference — live table row counts, entity relationships, column details for all 33 tables and 11 views |
| TE & Segment Routing Analytics | `obmp-learn-08` | Exposes TE/SR fields from BGP-LS: link bandwidth, admin groups, SRLG, SR SIDs, adjacency SIDs, protection types |
| Topology Change & Anomaly Detection | `obmp-learn-09` | Heuristic analysis: link state changes over time, origin AS hijack detection, convergence timeline, route consistency |
| Link Utilization & TE Thought Experiment | `obmp-learn-10` | BGP-LS capacity data (bandwidth, TE metrics) + integration guide for streaming telemetry (gNMI/MDT) |
> **TE/SR data note:** Some TE fields (admin_group, max_link_bw, srlg, sr_adjacency_sids) may be NULL if routers don't advertise those TLVs. Enable `mpls traffic-eng` under IS-IS and `segment-routing mpls` for full data.
### Database Schema Reference
A standalone database schema reference is also available at `DB_SCHEMA.md` in the repo root. It documents all 33 tables, 11 views, TE/SR columns, enum types, and common query patterns.
---
## 10. Sanity Checks
### 10.1 All containers running
```bash
docker compose -p obmp ps
```
All containers should show `running`. If any are restarting, check logs:
```bash
docker compose -p obmp logs --tail=50 <service>
```
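This check can be automated by parsing the JSON form of the same command. The Python sketch below assumes the `Service` and `State` fields that `docker compose ps --format json` emits, and accepts both a single JSON array and one object per line because the output shape has differed between Compose releases; `not_running` is a hypothetical helper.

```python
import json


def not_running(ps_output: str) -> list:
    """Return service names whose State is not "running"."""
    text = ps_output.strip()
    if not text:
        return []
    if text.startswith("["):
        # Older Compose v2 releases: one JSON array.
        rows = json.loads(text)
    else:
        # Newer releases: one JSON object per line.
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    return sorted(r.get("Service", r.get("Name", "?"))
                  for r in rows if r.get("State") != "running")
```

The input can be captured with `subprocess.run(["docker", "compose", "-p", "obmp", "ps", "--format", "json"], capture_output=True, text=True).stdout`.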
### 10.2 ExaBGP peers up
```bash
python3 exabgp/inject.py status
```
Both `10.100.0.100` and `10.100.0.200` should show `"state": "up"`.
Or check from the router side:
```
show bgp neighbors 10.40.40.202
show bgp summary | inc 10.40.40.202
```
### 10.3 Routes accepted by CORE routers
After loading `internet_sample`:
```
! On CORE-01 or CORE-02:
show bgp summary
! Expect: 77 accepted prefixes, 77 are bestpaths from 10.40.40.202
show bgp 8.8.8.0/24
! Expect: best path via 10.40.40.202 (eBGP), also iBGP copies from other routers
```
### 10.4 Routes in OpenBMP database
```bash
docker exec -it obmp-psql psql -U openbmp -c "
SELECT count(DISTINCT prefix) AS unique_prefixes,
count(DISTINCT peer_hash_id) AS peers_reporting
FROM ip_rib
WHERE isIPv4 = true AND isWithdrawn = false;
"
```
Expect `~129 unique prefixes` and `56 peers_reporting` (9 routers × ~6 peers each) after loading `internet_sample`.
### 10.5 Kafka is healthy
```bash
docker exec -it obmp-kafka kafka-topics --bootstrap-server localhost:29092 --list
```
Should show topics like `openbmp.parsed.unicast_prefix`, `openbmp.parsed.peer`, etc.
### 10.6 Grafana datasource
Open `http://10.40.40.202:3000` → Configuration → Data Sources → OpenBMP → Test.
Should return "Database Connection OK".
### 10.7 BMP collector receiving data
```bash
docker compose -p obmp logs --tail=30 collector
```
Should show connections from router management IPs.
### 10.8 psql-app consumer is caught up
```bash
docker compose -p obmp logs --tail=30 psql-app
```
Should show periodic cron job outputs (RPKI sync, IRR sync, global_ip_rib updates).
---
## 11. Relevant Commands Reference
### Docker Compose
```bash
# Start stack
OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d
# Stop stack
docker compose -p obmp down
# Show status
docker compose -p obmp ps
# Follow logs (all services)
docker compose -p obmp logs -f
# Follow logs (specific service)
docker compose -p obmp logs -f exabgp
docker compose -p obmp logs -f psql-app
docker compose -p obmp logs -f collector
# Rebuild and restart ExaBGP
docker compose -p obmp build exabgp && docker compose -p obmp up -d exabgp
# Restart a service
docker compose -p obmp restart psql-app
```
### Route Injection (from `exabgp/` directory)
```bash
# API health and peer states
python3 inject.py status
# List active routes
python3 inject.py routes
# List scenarios
python3 inject.py scenarios
# Load a scenario
python3 inject.py scenario internet_sample
python3 inject.py scenario churn
python3 inject.py scenario blackhole
python3 inject.py scenario full_table
python3 inject.py scenario lab_prefixes
# Withdraw a scenario
python3 inject.py withdraw-scenario internet_sample
# Withdraw all active routes
python3 inject.py withdraw-all
# Announce a specific prefix
python3 inject.py announce 10.0.0.0/8 --as-path 65100 3356 15169 --community 65100:100
# Withdraw a specific prefix
python3 inject.py withdraw 10.0.0.0/8
# Run churn (populate history tables)
python3 inject.py churn --count 5 --interval 30
```
### Database Queries
```sql
-- Run these inside psql. Connect first from the host shell with:
--   docker exec -it obmp-psql psql -U openbmp -d openbmp
-- Count unique prefixes in RIB
SELECT count(DISTINCT prefix) FROM ip_rib WHERE isIPv4=true AND isWithdrawn=false;
-- Show recent route changes
SELECT prefix, origin_as, iswithdrawn, timestamp
FROM ip_rib_log
ORDER BY timestamp DESC LIMIT 20;
-- Show peer summary
SELECT name, state, timestamp_last_updated
FROM bgp_peers
ORDER BY state, name;
-- Show routes from ExaBGP peer
SELECT prefix, origin_as, as_path
FROM ip_rib
WHERE peer_hash_id IN (
SELECT hash_id FROM bgp_peers WHERE peer_addr = '10.40.40.202'
)
AND isWithdrawn = false;
```
### IOS-XR Verification (on router CLI)
```
show bgp neighbors 10.40.40.202
show bgp neighbors 10.40.40.202 received routes
show bgp summary
show bgp 8.8.8.0/24
show bgp 1.1.1.0/24
show route 8.8.8.0/24
```
---
## 12. Troubleshooting
### ExaBGP container keeps restarting
Check logs:
```bash
docker compose -p obmp logs --tail=50 exabgp
```
Common causes and fixes:
| Symptom | Cause | Fix |
|---------|-------|-----|
| Exits after "welcome" banner | Missing or wrong env file path | `startup.sh` generates `/usr/local/etc/exabgp/exabgp.env` — verify this path exists in container |
| Process `api` killed 5 times | Wrong Python path in conf | Conf uses `/usr/local/bin/python3` — correct for python:3.11-slim |
| `drop = true` in env | ExaBGP drops privileges to nobody, can't bind 179 | `startup.sh` patches `drop = false` — check the sed lines ran |
| `__pycache__ Permission denied` during build | Root-owned cache from previous container run | `.dockerignore` excludes `**/__pycache__` — confirm file exists |
### BGP sessions not establishing
1. Verify host IP `10.40.40.202` is reachable from CML management network: `ping 10.40.40.202` from router
2. Check ExaBGP peer state: `python3 exabgp/inject.py status`
3. On router: `show bgp neighbors 10.40.40.202` — look for error codes
4. Common IOS-XR errors:
- `no-update-source-config` — add `update-source MgmtEth0/RP0/CPU0/0`
- `no-ipv6-address` — ensure only IPv4 unicast AF is configured (no IPv6)
- TCP refused — check port 179 is reachable (ExaBGP uses `network_mode: host`)
### Routes received but not bestpath
IOS-XR BGP requires a specific route to resolve the BGP next-hop (10.40.40.202). The default route (0.0.0.0/0) is insufficient.
```
router static
address-family ipv4 unicast
10.40.40.0/24 10.100.0.254
```
Verify: `show bgp 1.1.1.0/24` — should show `Status: s (active), bestpath`.
### Grafana shows no data
1. Check datasource: Configuration → Data Sources → OpenBMP → Test
2. Verify psql-app is writing: `docker compose -p obmp logs psql-app`
3. Check the database directly (see database queries above)
4. History dashboards need route churn — run `python3 inject.py churn`
### Kafka not starting
Zookeeper must be healthy first. Check:
```bash
docker compose -p obmp logs zookeeper
docker compose -p obmp restart kafka
```
### psql-app fails to start
Usually a PostgreSQL connection issue or schema mismatch. Check:
```bash
docker compose -p obmp logs psql-app
# If "relation does not exist" errors: re-trigger DB init
# (this wipes and recreates the entire database; see Section 4.3)
touch /var/openbmp/config/init_db
docker compose -p obmp restart psql-app
```
---
## 13. Data Retention
Configured in `docker-compose.yml` via `POSTGRES_DROP_*` environment variables:
| Table | Default Retention |
|-------|-------------------|
| peer_event_log | 1 year |
| stat_reports | 4 weeks |
| ip_rib_log | 4 weeks |
| alerts | 4 weeks |
| ls_nodes_log | 4 months |
| ls_links_log | 4 months |
| ls_prefixes_log | 4 months |
| stats_chg_byprefix | 4 weeks |
| stats_chg_byasn | 4 weeks |
| stats_chg_bypeer | 4 weeks |
| stats_ip_origins | 4 weeks |
| stats_peer_rib | 4 weeks |
| stats_peer_update_counts | 4 weeks |
Adjust in `docker-compose.yml` under the `psql-app` service environment block.
---
## 14. Environment Variables Reference
### ExaBGP container
| Variable | Default | Description |
|----------|---------|-------------|
| `EXABGP_LOCAL_IP` | `10.40.40.202` | Host IP ExaBGP binds to and uses as router-id |
| `EXABGP_LOCAL_AS` | `65100` | ExaBGP's AS number |
| `EXABGP_PEER_AS` | `65020` | AS of the IOS-XR lab |
| `EXABGP_PEER_1` | `10.100.0.100` | First CORE router to peer with |
| `EXABGP_PEER_2` | `10.100.0.200` | Second CORE router to peer with |
| `EXABGP_API_PORT` | `5050` | Flask API port |
### psql-app container (key variables)
| Variable | Default | Description |
|----------|---------|-------------|
| `MEM` | `3` | JVM heap in GB |
| `ENABLE_RPKI` | `1` | Enable RPKI sync from Cloudflare |
| `ENABLE_IRR` | `1` | Enable IRR sync |
| `ENABLE_DBIP` | `1` | Enable DB-IP geolocation import |
| `POSTGRES_REPORT_WINDOW` | `8 minute` | Aggregation window for summary tables |
### inject.py (CLI)
| Variable | Default | Description |
|----------|---------|-------------|
| `EXABGP_API` | `http://localhost:5050` | ExaBGP API base URL |
---
## 15. gNMI Streaming Telemetry (Phase 4)
### Overview
gNMI (gRPC Network Management Interface) adds **data-plane visibility** alongside BMP's control-plane monitoring. Telegraf collects real-time interface counters from all 9 IOS-XR routers via gNMI subscriptions and stores them in InfluxDB. Grafana queries InfluxDB for telemetry dashboards.
### Architecture
```
IOS-XR Routers (9x, gRPC port 57400)
|
gNMI subscriptions (10s sample)
|
v
obmp-telegraf (telegraf:1.28 + gnmi input plugin)
host networking → reaches routers on 10.100.0.x
|
v
obmp-influxdb (influxdb:2.7, port 8086)
bucket: "telemetry", org: "openbmp"
|
v
obmp-grafana (InfluxDB datasource, Flux queries)
3 dashboards in OBMP-Telemetry folder
```
### Enabling gRPC on Routers
The routers need gRPC enabled before Telegraf can collect telemetry. A NETCONF script is provided:
```bash
# From the host (requires ncclient: pip install ncclient)
cd /home/user/obmp-docker/gnmi
python3 gnmi_grpc_config.py
```
This connects to all 9 routers via NETCONF (port 830, credentials webui/cisco) and pushes:
```
grpc
port 57400
no-tls
```
**Verify on router:**
```
show grpc status
```
Expected: gRPC listening on port 57400.
### Telemetry Data Collected
Telegraf subscribes to two IOS-XR YANG paths at 10-second intervals:
| Subscription | YANG Path | Data |
|-------------|-----------|------|
| interface_counters | `Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters` | bytes/packets in/out, errors, drops, CRC |
| interface_rates | `Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/data-rate` | bits/sec in/out, packet rate |
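The `interface_rates` subscription already reports rates, but a rate can also be derived from two successive `interface_counters` samples, which is useful for cross-checking. The Python sketch below shows the arithmetic, assuming the byte counters are monotonically increasing totals sampled at the 10-second interval; the function name and the 64-bit counter width are assumptions.

```python
def bits_per_second(prev_bytes: int, cur_bytes: int, interval_s: float,
                    counter_bits: int = 64) -> float:
    """Average bit rate between two byte-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = cur_bytes - prev_bytes
    if delta < 0:
        # Counter wrapped around (or was cleared); assume a single wrap.
        delta += 2 ** counter_bits
    return delta * 8 / interval_s


# 1,250,000 bytes in one 10-second sample interval is 1 Mbit/s.
print(bits_per_second(5_000_000, 6_250_000, 10))
```

A cleared counter is indistinguishable from a wrap in this sketch, so an implausibly large result after a router reload should be discarded.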
### InfluxDB Access
- **URL:** `http://localhost:8086`
- **Org:** `openbmp`
- **Bucket:** `telemetry`
- **Token:** `openbmp-telemetry-token`
- **Retention:** 30 days
### Grafana Telemetry Dashboards
Three dashboards in the **OBMP-Telemetry** folder:
| Dashboard | UID | Description |
|-----------|-----|-------------|
| Interface Utilization | obmp-telem-01 | Input/output bytes rate, packets rate, top interfaces by throughput |
| Interface Errors | obmp-telem-02 | CRC errors, input/output errors, drops, overruns |
| Combined BMP + Telemetry | obmp-telem-03 | Mixed datasource — BGP peer status (PostgreSQL) alongside interface counters (InfluxDB) |
All dashboards have `$router` and `$interface` template variables for filtering.
### Troubleshooting gNMI
```bash
# Check Telegraf logs for gNMI connection status
docker logs obmp-telegraf --tail 50
# Verify InfluxDB has data
curl -s -X POST "http://localhost:8086/api/v2/query?org=openbmp" \
  -H "Authorization: Token openbmp-telemetry-token" \
  -H "Content-Type: application/vnd.flux" \
  --data 'from(bucket:"telemetry") |> range(start: -5m) |> limit(n:5)'
# Check InfluxDB health
curl http://localhost:8086/health
```
---
## 16. Traffic Generator (Phase 4)
### Overview
A portable, containerized traffic generator with a web UI for RFC 2544 testing and custom packet flows. Built with Scapy + Flask (backend) and Vue 3 + NGINX (frontend). The container supports **dual-mode operation**: sender (generate traffic) or responder (receive/echo packets).
### Accessing the UI
- **Web UI:** `http://localhost:5002`
- **API:** `http://localhost:5051`
### Dual-Mode Operation
Set via `TRAFFIC_GEN_MODE` environment variable in `docker-compose.yml`:
| Mode | Description |
|------|-------------|
| `sender` (default) | Generates traffic, runs RFC 2544 tests, sends custom flows |
| `responder` | Listens for incoming test packets, echoes/timestamps them, reports receive stats |
**Typical deployment:** One instance as `sender` on the host, optionally a second instance as `responder` on another endpoint. Without a responder, the sender uses ICMP echo for latency measurement (routers respond natively).
### Creating Flows
Use the **Flow Builder** panel (left sidebar) in the UI:
| Field | Default | Description |
|-------|---------|-------------|
| Name | - | Human-readable flow name |
| Destination IP | `10.100.0.100` | Target router IP |
| Source IP | `10.40.40.202` | Host IP |
| Protocol | UDP | UDP, TCP, or ICMP |
| Source Port | 50000 | (UDP/TCP only) |
| Destination Port | 5001 | (UDP/TCP only) |
| Frame Size | 512 | Packet size in bytes |
| Rate (pps) | 1000 | Packets per second |
| Duration | 30 | Seconds (0 = infinite) |
| DSCP | 0 | Differentiated Services Code Point |
After creating a flow, use the **Flows** tab to Start/Stop/Delete flows.
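Before starting a flow it is useful to know what load the chosen frame size and rate imply. The Python sketch below works that out for the table defaults. It counts only the configured frame bytes, and whether the generator's Frame Size includes Ethernet framing overhead is an assumption to verify; both function names are hypothetical.

```python
def offered_load_mbps(frame_size_bytes: int, rate_pps: int) -> float:
    """Offered load in Mbit/s, counting only the configured frame bytes."""
    return frame_size_bytes * 8 * rate_pps / 1_000_000


def total_packets(rate_pps: int, duration_s: int) -> int:
    """Packets sent over a finite duration (0 means infinite in the UI)."""
    if duration_s == 0:
        raise ValueError("duration 0 means run until stopped")
    return rate_pps * duration_s


# Defaults from the table: 512-byte frames at 1000 pps for 30 seconds.
print(offered_load_mbps(512, 1000))
print(total_packets(1000, 30))
```

With the defaults this comes to about 4.1 Mbit/s and 30,000 packets, small enough that a virtual lab router should not be stressed.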
### RFC 2544 Testing
Use the **Tests** tab to configure and run RFC 2544 tests:
| Test Type | Description |
|-----------|-------------|
| **Throughput** | Binary search for maximum zero-loss forwarding rate |
| **Latency** | Measure round-trip time at determined throughput rate |
| **Frame Loss** | Loss percentage vs. offered load curve |
| **Back-to-Back** | Maximum burst length at line rate with zero loss |
**Parameters:**
- **Base Flow:** Select a previously created flow as the test template
- **Frame Sizes:** Standard sizes: 64, 128, 256, 512, 1024, 1280, 1518 bytes
- **Trial Duration:** Per-frame-size test duration (5-300 sec)
- **Max Rate (pps):** Upper bound for binary search
- **Acceptable Loss %:** Threshold for pass/fail
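The throughput test's binary search can be illustrated in a few lines. The Python sketch below searches between 0 and Max Rate for the highest rate whose loss stays within Acceptable Loss %. It is a generic illustration, not the traffic generator's actual code: `loss_at` stands in for running one trial, and `simulated_loss` is a made-up device that starts dropping above 7300 pps.

```python
from typing import Callable


def find_throughput(loss_at: Callable[[int], float], max_rate_pps: int,
                    acceptable_loss_pct: float = 0.0) -> int:
    """Binary-search the highest rate (pps) whose loss is acceptable.

    `loss_at(rate)` runs one trial and returns the loss percentage.
    Assumes loss does not decrease as the rate increases.
    """
    low, high = 0, max_rate_pps       # low: highest rate known to pass
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always progresses
        if loss_at(mid) <= acceptable_loss_pct:
            low = mid                 # passed: search higher
        else:
            high = mid - 1            # failed: search lower
    return low


def simulated_loss(rate_pps: int) -> float:
    """Hypothetical device: lossless up to 7300 pps, drops the excess."""
    if rate_pps <= 7300:
        return 0.0
    return (rate_pps - 7300) / rate_pps * 100


print(find_throughput(simulated_loss, 10_000))
```

Each probe costs one Trial Duration, so a search over 10,000 pps needs about 14 trials per frame size; a real tester would usually stop at a coarser resolution than 1 pps.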
### Quick Presets
Six built-in presets are available in the **Tests** tab:
| Preset | Description |
|--------|-------------|
| quick_icmp | ICMP ping to CORE-01 at 10 pps |
| udp_flood_small | 64-byte UDP at 5000 pps |
| udp_flood_large | 1518-byte UDP at 1000 pps |
| rfc2544_throughput | Full throughput test with standard frame sizes |
| rfc2544_latency | Latency measurement with standard frame sizes |
| tcp_session | TCP flow at 500 pps |
### API Reference
| Method | Path | Description |
|--------|------|-------------|
| GET | `/healthz` | Health check + engine status |
| GET | `/interfaces` | Available network interfaces |
| GET | `/mode` | Current mode (sender/responder) |
| GET/POST | `/flows` | List / create flows |
| GET/PUT/DELETE | `/flows/<id>` | Get / update / delete flow |
| POST | `/flows/<id>/start` | Start sending |
| POST | `/flows/<id>/stop` | Stop sending |
| GET | `/flows/<id>/stats` | Real-time stats for a flow |
| GET/POST | `/tests` | List / create RFC 2544 tests |
| GET | `/tests/<id>` | Test details + results |
| POST | `/tests/<id>/start` | Start test execution |
| POST | `/tests/<id>/stop` | Abort test |
| GET | `/tests/<id>/results` | Exportable results |
| GET | `/presets` | Available test presets |
| POST | `/presets/<name>` | Create flow + test from preset |
| GET | `/stats/history` | Stats ring buffer (300 samples) |
| GET | `/responder/stats` | Responder-mode receive stats |
| POST | `/responder/reset` | Reset responder counters |
### Integration with gNMI Telemetry
The key value of combining the traffic generator with gNMI: **send traffic while watching real-time interface counters**.
1. Create a UDP flow targeting a router (e.g., R9K-01 at 10.100.0.1)
2. Open the Grafana **Interface Utilization** dashboard, select that router
3. Start the flow — gNMI counters show traffic appearing on the interface
4. Run an RFC 2544 throughput test — Grafana shows the stepped traffic pattern from binary search iterations
5. Compare Scapy-reported stats with gNMI-reported counters for cross-validation
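The cross-validation in step 5 amounts to comparing the packets the generator reports as sent with the change in the router's interface packet counter over the same window. A minimal Python sketch follows, assuming both numbers have already been collected. The 2 % tolerance is an arbitrary illustration, since the interface also carries other traffic and gNMI samples arrive only every 10 seconds.

```python
def counters_agree(sent_packets: int, gnmi_delta_packets: int,
                   tolerance_pct: float = 2.0) -> bool:
    """True if the router-side packet delta is within tolerance of what was sent."""
    if sent_packets <= 0:
        raise ValueError("sent_packets must be positive")
    diff_pct = abs(gnmi_delta_packets - sent_packets) / sent_packets * 100
    return diff_pct <= tolerance_pct


print(counters_agree(30_000, 30_150))  # 0.5 % difference
print(counters_agree(30_000, 27_000))  # 10 % difference
```

A mismatch well beyond the tolerance suggests drops before the measured interface or a flow taking a different path than expected.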
The **Combined BMP + Telemetry** dashboard shows both control-plane (BMP BGP updates) and data-plane (gNMI interface counters) side by side, enabling correlation of BGP changes with traffic impact.
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `TRAFFIC_GEN_API_PORT` | `5051` | Flask API listen port |
| `TRAFFIC_GEN_MODE` | `sender` | Operating mode: `sender` or `responder` |
| `INFLUXDB_TOKEN` | `openbmp-telemetry-token` | InfluxDB auth token (Telegraf) |