obmp-docker/DOCS.md
sam dcebf15bb3 Add Phase 4: gNMI streaming telemetry and traffic generator
- gNMI integration: NETCONF script to enable gRPC on all 9 routers,
  Telegraf container with gnmi input plugin, InfluxDB for time-series
  storage, 3 Grafana telemetry dashboards (utilization, errors, combined)
- Traffic generator: Scapy-based dual-mode container (sender/responder)
  with Flask API, RFC 2544 test suite (throughput, latency, frame-loss,
  back-to-back), Vue 3 web UI with flow builder, test runner, real-time
  stats monitor, and results export
- docker-compose.yml updated with influxdb, telegraf, traffic-gen,
  traffic-gen-ui services
- Full documentation in DOCS.md sections 15-16

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 15:29:44 -07:00

# OpenBMP + ExaBGP Route Injector — Full Documentation
## Table of Contents
1. [What Is This Project?](#1-what-is-this-project)
2. [Architecture](#2-architecture)
3. [Prerequisites](#3-prerequisites)
4. [Initial Setup (First Time)](#4-initial-setup-first-time)
5. [IOS-XR Router Configuration](#5-ios-xr-router-configuration)
6. [Starting and Stopping](#6-starting-and-stopping)
7. [Route Injection User Guide](#7-route-injection-user-guide)
8. [ExaBGP Control Panel (Web UI)](#8-exabgp-control-panel-web-ui)
9. [Grafana Dashboards](#9-grafana-dashboards)
10. [Sanity Checks](#10-sanity-checks)
11. [Relevant Commands Reference](#11-relevant-commands-reference)
12. [Troubleshooting](#12-troubleshooting)
13. [Data Retention](#13-data-retention)
14. [Environment Variables Reference](#14-environment-variables-reference)
15. [gNMI Streaming Telemetry (Phase 4)](#15-gnmi-streaming-telemetry-phase-4)
16. [Traffic Generator (Phase 4)](#16-traffic-generator-phase-4)
---
## 1. What Is This Project?
This is a **BGP Monitoring Protocol (BMP) lab stack** deployed via Docker Compose. It collects, stores, and visualizes BGP routing data from a Cisco IOS-XR lab network running in Cisco Modeling Labs (CML).
**What it does:**
- Receives BMP (BGP Monitoring Protocol, RFC 7854) telemetry from routers on TCP port 5000
- Streams BMP data through Kafka into a TimescaleDB/PostgreSQL database
- Provides **30 Grafana dashboards** (17 operational + 6 learning + 4 advanced analytics + 3 streaming telemetry) for real-time and historical BGP analysis
- Includes an **ExaBGP route injector** that peers with the two CORE routers and injects synthetic BGP routes, enabling testing of BGP policy, route propagation, and Grafana dashboards without needing internet connectivity
- Provides a **Vue 3 web UI** at `:5001` for point-and-click scenario management, live route tables, and peer monitoring
**The lab network:**
- AS 65020 — 9 Cisco IOS-XR routers in CML (iBGP full mesh via route-reflectors)
- AS 65100 — ExaBGP container (eBGP peer to both CORE routers)
- CORE-01: `10.100.0.100` (CML-R9K-CORE-01)
- CORE-02: `10.100.0.200` (CML-R9K-CORE-02)
- Host IP: `10.40.40.202` (ExaBGP binds here; reachable from CML management network)
---
## 2. Architecture
```
IOS-XR Routers (9x, AS 65020)
  BMP telemetry on TCP 5000
        |
        v
obmp-collector (openbmp/collector:2.2.3)
        |
        v
obmp-kafka (confluentinc/cp-kafka:7.1.1)
  + obmp-zookeeper (confluentinc/cp-zookeeper:7.1.1)
        |
        v
obmp-psql-app (openbmp/psql-app:2.2.2)
  Java consumer — writes parsed BGP data to PostgreSQL
        |
        v
obmp-psql (openbmp/postgres:2.2.1)
  PostgreSQL 14 + TimescaleDB
        |
        +---------> obmp-grafana (grafana/grafana:9.1.7) :3000
        |             30 dashboards, PostgreSQL + InfluxDB datasources
        +---------> obmp-whois (openbmp/whois:2.2.0) :4300
                      WHOIS query server backed by the DB

ExaBGP (obmp-exabgp, built locally)
  python:3.11-slim + exabgp 5.x + Flask API
  Peers eBGP to CORE-01 and CORE-02 (AS 65100 -> AS 65020)
  HTTP API on :5050 — inject/withdraw routes on demand
  Routes propagate via iBGP mesh to all 9 routers -> BMP -> DB -> Grafana

gNMI Streaming Telemetry (Phase 4):
  IOS-XR Routers (gRPC :57400)
        |
        v
  obmp-telegraf (telegraf:1.28 + gnmi plugin)
        |
        v
  obmp-influxdb (influxdb:2.7) :8086
        |
        v
  obmp-grafana (InfluxDB datasource -> Telemetry dashboards)

Traffic Generator (Phase 4):
  obmp-traffic-gen (python:3.11 + Scapy + Flask) :5051
    Dual-mode: sender (generate traffic) / responder (echo/log)
    RFC 2544 testing, custom packet flows
  obmp-traffic-gen-ui (Vue 3 + NGINX) :5002
```
### Container Summary
| Container | Image | Port(s) | Role |
|-----------|-------|---------|------|
| obmp-zookeeper | confluentinc/cp-zookeeper:7.1.1 | 2181 (internal) | Kafka coordination |
| obmp-kafka | confluentinc/cp-kafka:7.1.1 | 9092 | Message broker |
| obmp-collector | openbmp/collector:2.2.3 | 5000 | BMP receiver |
| obmp-psql-app | openbmp/psql-app:2.2.2 | 9005 | Kafka→PostgreSQL consumer |
| obmp-psql | openbmp/postgres:2.2.1 | 5432 | TimescaleDB storage |
| obmp-grafana | grafana/grafana:9.1.7 | 3000 | Visualization |
| obmp-whois | openbmp/whois:2.2.0 | 4300 | WHOIS query server |
| obmp-exabgp | local build | 5050 (host net) | BGP route injector |
| obmp-exabgp-ui | local build | 5001 (host net) | Route injector web UI |
| obmp-influxdb | influxdb:2.7 | 8086 | Time-series DB for telemetry |
| obmp-telegraf | local build | - (host net) | gNMI telemetry collector |
| obmp-traffic-gen | local build | 5051 (host net) | Scapy traffic generator |
| obmp-traffic-gen-ui | local build | 5002 (host net) | Traffic generator web UI |
---
## 3. Prerequisites
- Docker Engine (20.10+) and Docker Compose v2
- Host IP `10.40.40.202` reachable from the CML management network
- CML routers with BMP configured pointing to `10.40.40.202:5000`
- CML CORE routers configured with ExaBGP as eBGP neighbor (see Section 5)
- `OBMP_DATA_ROOT` directory created (default: `/var/openbmp`)
---
## 4. Initial Setup (First Time)
### 4.1 Clone the repository
```bash
git clone <this-repo-url>
cd obmp-docker
```
### 4.2 Create persistent data directories
```bash
export OBMP_DATA_ROOT=/var/openbmp
# /var is root-owned, so every mkdir needs sudo until the chmod below runs
sudo mkdir -p ${OBMP_DATA_ROOT}
sudo mkdir -p ${OBMP_DATA_ROOT}/config
sudo mkdir -p ${OBMP_DATA_ROOT}/kafka-data
sudo mkdir -p ${OBMP_DATA_ROOT}/zk-data
sudo mkdir -p ${OBMP_DATA_ROOT}/zk-log
sudo mkdir -p ${OBMP_DATA_ROOT}/postgres/data
sudo mkdir -p ${OBMP_DATA_ROOT}/postgres/ts
sudo mkdir -p ${OBMP_DATA_ROOT}/grafana
sudo mkdir -p ${OBMP_DATA_ROOT}/grafana/dashboards
sudo chmod -R 777 $OBMP_DATA_ROOT
```
### 4.3 Initialise the database (first run only)
Create the init trigger file — this causes psql-app to create all tables on startup:
```bash
touch ${OBMP_DATA_ROOT}/config/init_db
```
> **Warning:** Do not create this file on subsequent runs unless you want to wipe and recreate the entire database.
### 4.4 Copy Grafana provisioning files
```bash
cp -r obmp-grafana/provisioning ${OBMP_DATA_ROOT}/grafana/
cp -r obmp-grafana/dashboards ${OBMP_DATA_ROOT}/grafana/
```
### 4.5 Start the stack
```bash
OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d
```
Wait ~2 minutes for all services to initialise, especially PostgreSQL and psql-app, which run schema migrations.
### 4.6 Verify everything is running
```bash
docker compose -p obmp ps
docker compose -p obmp logs --tail=20 psql-app
```
---
## 5. IOS-XR Router Configuration
The ExaBGP container peers eBGP with both CORE routers. Each CORE router must be configured with:
### 5.1 Route policies (apply once per router)
```
route-policy EXABGP_IN
  pass
end-policy
route-policy EXABGP_OUT
  drop
end-policy
```
### 5.2 BGP neighbor block
```
router bgp 65020
 neighbor 10.40.40.202
  remote-as 65100
  description ExaBGP-Route-Injector
  ebgp-multihop 5
  update-source MgmtEth0/RP0/CPU0/0
  !
  address-family ipv4 unicast
   route-policy EXABGP_IN in
   route-policy EXABGP_OUT out
   next-hop-self
  !
 !
!
```
### 5.3 Static route for next-hop resolution
IOS-XR BGP does not use the default route (0.0.0.0/0) to resolve BGP next-hops. A more-specific static route for the ExaBGP host subnet is required in the default VRF:
```
router static
 address-family ipv4 unicast
  10.40.40.0/24 10.100.0.254
 !
!
```
### 5.4 Config notes
| Knob | Why |
|------|-----|
| `remote-as 65100` | ExaBGP presents as AS 65100 (eBGP to your AS 65020 mesh) |
| `ebgp-multihop 5` | Host and router are on different subnets |
| `update-source MgmtEth0/RP0/CPU0/0` | ExaBGP is reachable via the management interface |
| `next-hop-self` | Replace ExaBGP's next-hop (10.40.40.202) with the CORE router's address when reflecting into iBGP — ensures all routers can resolve the next-hop |
| `EXABGP_OUT` drops | Prevents the lab from advertising its own prefixes back to ExaBGP |
| Static route | Required: IOS-XR BGP will not install injected routes as bestpaths without a specific route to the next-hop |
### 5.5 NETCONF alternative
See `exabgp/iosxr_bgp_config.md` for a Python/ncclient script that pushes all of the above config programmatically.
Credentials: `username=webui`, `password=cisco`, port 830.
---
## 6. Starting and Stopping
### Start all services
```bash
OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d
```
### Stop all services (preserve data)
```bash
docker compose -p obmp down
```
### Stop and remove all data (full reset)
```bash
docker compose -p obmp down -v
sudo rm -rf /var/openbmp
```
### Rebuild the ExaBGP container (after code changes)
```bash
docker compose -p obmp build exabgp
docker compose -p obmp up -d exabgp
```
### Restart a single service
```bash
docker compose -p obmp restart <service>
# e.g.:
docker compose -p obmp restart exabgp
docker compose -p obmp restart psql-app
```
---
## 7. Route Injection User Guide
The ExaBGP container exposes a Flask REST API on port 5050 (host network). The `inject.py` CLI wraps this API.
### 7.1 Setup
```bash
cd exabgp
pip install requests # only needed if running inject.py from the host
```
### 7.2 Check status
```bash
python3 inject.py status
```
Output shows API health, active route count, and peer states:
```json
{
  "status": "ok",
  "active_routes": 77,
  "peers": {
    "10.100.0.100": {"state": "up", "updated": "2026-03-05T10:00:00Z"},
    "10.100.0.200": {"state": "up", "updated": "2026-03-05T10:00:00Z"}
  }
}
```
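A script can gate on this output before loading a scenario. The following is a minimal sketch, assuming the response has exactly the shape shown above; `peers_down` is a hypothetical helper and is not part of `inject.py`.

```python
import json


def peers_down(status: dict) -> list[str]:
    """Return the addresses of peers whose state is not "up" (hypothetical helper)."""
    peers = status.get("peers", {})
    return sorted(addr for addr, info in peers.items() if info.get("state") != "up")


# Sample payload modelled on the status output above, with one peer forced down.
sample = json.loads("""
{
  "status": "ok",
  "active_routes": 77,
  "peers": {
    "10.100.0.100": {"state": "up", "updated": "2026-03-05T10:00:00Z"},
    "10.100.0.200": {"state": "down", "updated": "2026-03-05T10:00:00Z"}
  }
}
""")

print(peers_down(sample))  # ['10.100.0.200']
```

An empty list means both CORE sessions are established and it is safe to inject routes.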
### 7.3 List available scenarios
```bash
python3 inject.py scenarios
```
| Scenario | Routes | Description |
|----------|--------|-------------|
| `internet_sample` | ~94 | Partial internet table — real public prefixes, realistic AS paths (Cloudflare, Google, AWS, Azure, etc.) |
| `churn` | 30 | RFC documentation prefixes for announce/withdraw churn testing |
| `blackhole` | 5 | /32 prefixes with RTBH community (65100:666 + 65535:666) |
| `anycast` | 3 | Same prefixes with varying AS paths and MEDs (best-path testing) |
| `full_table` | 500+ | Large partial internet table with synthetic /24s |
| `lab_prefixes` | 8 | Enterprise/SP-style routes with communities and local-pref |
| `convergence_test` | 10 | Prefixes for timing BGP convergence — announce then check ip_rib_log timestamps |
| `route_leak` | 10 | Real prefixes re-announced with short AS paths — simulates a route leak (community 65100:999) |
| `hijack_simulation` | 10 | Prefixes claimed directly by AS 65100 — simulates a prefix hijack (community 65100:hijack) |
| `te_community_steering` | 15 | Routes tagged with TE communities for color-based steering (65020:100=red, 65020:200=blue, 65020:300=green) |
| `origin_shift` | 5 | Prefixes with changed origin AS — simulates origin migration for anomaly detection |
| `path_diversity` | 10 | Same prefixes with different AS paths/MEDs — demonstrates best-path selection |
### 7.4 Load a scenario
```bash
python3 inject.py scenario internet_sample
```
Routes propagate: ExaBGP → CORE-01/CORE-02 (eBGP) → all 9 routers (iBGP) → BMP → Kafka → PostgreSQL → Grafana.
### 7.5 Withdraw a scenario
```bash
python3 inject.py withdraw-scenario internet_sample
```
### 7.6 Announce individual prefixes
```bash
python3 inject.py announce 10.0.0.0/8 \
--as-path 65100 3356 15169 \
--community 65100:100 \
--med 100
```
### 7.7 Withdraw individual prefixes
```bash
python3 inject.py withdraw 10.0.0.0/8
```
### 7.8 Withdraw everything
```bash
python3 inject.py withdraw-all
```
### 7.9 Generate route churn (populate history tables)
The `churn` command cycles the churn scenario repeatedly, generating `ip_rib_log` and `stats_chg_*` entries that power Grafana's history dashboards.
```bash
# 5 cycles, 30 seconds apart
python3 inject.py churn --count 5 --interval 30
# Run indefinitely until Ctrl+C
python3 inject.py churn
```
### 7.10 REST API directly (curl)
```bash
BASE=http://localhost:5050
# Health
curl $BASE/healthz
# List scenarios
curl $BASE/scenarios
# Load scenario
curl -X POST $BASE/scenario/internet_sample
# Announce custom prefix
curl -X POST $BASE/announce \
-H 'Content-Type: application/json' \
-d '{"prefixes":["10.0.0.0/8"],"as_path":[65100,3356,15169],"communities":["65100:100"]}'
# Withdraw all
curl -X POST $BASE/withdraw/all
# Peer state
curl $BASE/peers
```
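The same announce call can be made from Python using only the standard library. This is a sketch under stated assumptions: the JSON field names are copied from the curl example above, while `build_announce_payload` and `announce` are hypothetical helper names, and the assumption that the API replies with a JSON body is not confirmed by this document.

```python
import ipaddress
import json
import urllib.request


def build_announce_payload(prefixes, as_path, communities=()):
    """Validate the prefixes and return the JSON body used by the curl example."""
    for prefix in prefixes:
        # Raises ValueError on bad syntax or when host bits are set (e.g. 10.0.0.1/8).
        ipaddress.ip_network(prefix, strict=True)
    return {
        "prefixes": list(prefixes),
        "as_path": [int(asn) for asn in as_path],
        "communities": list(communities),
    }


def announce(base_url, payload, timeout=5):
    """POST the payload to <base_url>/announce; assumes the reply is JSON."""
    req = urllib.request.Request(
        f"{base_url}/announce",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


payload = build_announce_payload(["10.0.0.0/8"], [65100, 3356, 15169], ["65100:100"])
print(json.dumps(payload))
# To send it against a running stack: announce("http://localhost:5050", payload)
```

Validating the prefix locally catches typos before they reach ExaBGP, where a malformed announcement is harder to diagnose.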
### 7.11 Adding custom scenarios
Edit `exabgp/scenarios/__init__.py`. Add an entry to `SCENARIOS` following the existing pattern:
```python
SCENARIOS['my_scenario'] = {
    'description': 'My custom routes',
    'routes': [
        _r('192.0.2.0/24', [65100, 65200], communities=['65100:100']),
    ],
}
```
The `scenarios/` directory is volume-mounted into the container, so edits do not require a rebuild. However, the Python module is imported only at container start, so **restart the container** after editing:
```bash
docker compose -p obmp restart exabgp
```
---
## 8. ExaBGP Control Panel (Web UI)
Access: `http://10.40.40.202:5001`
A Vue 3 single-page app served by NGINX that proxies `/api/` to the ExaBGP Flask API on port 5050. No login required.
### Layout
```
┌─────────────────────────────────────────────────────────────┐
│ OpenBMP Route Injector [API OK] [77 routes] [2/2 UP] │
├──────────────────────┬──────────────────────────────────────┤
│ SCENARIOS │ [Routes] [Inject] [Peers] tabs │
│ │ │
│ [internet_sample] │ Routes tab: searchable/paginated │
│ [LOAD] [UNLOAD] │ table with per-row Withdraw button │
│ │ │
│ [churn] │ Inject tab: manual prefix form │
│ [LOAD] [START CHURN]│ (prefix, AS path, communities, MED) │
│ │ │
│ [blackhole] ... │ Peers tab: per-peer UP/DOWN cards │
├──────────────────────┴──────────────────────────────────────┤
│ Refreshing every 5s │
└─────────────────────────────────────────────────────────────┘
```
### Features
- **Live status bar** — API health, active route count, peer UP/DOWN badges; auto-refreshes every 5 seconds
- **Scenario panel** — Load/Unload buttons for every available scenario with loading states and feedback
- **Churn control** — Start/stop churn cycles with configurable count and interval sliders directly in the browser
- **Route table** — Searchable, paginated (20/page) table of active routes; per-row Withdraw button; Withdraw All
- **Manual inject form** — Announce any prefix with custom AS path, communities, MED, local-pref
- **Peer cards** — Per-peer state display with UP (green) / DOWN (red pulsing) indicators
### Rebuild after code changes
```bash
docker compose -p obmp build exabgp-ui
docker compose -p obmp up -d exabgp-ui
```
---
## 9. Grafana Dashboards
Access: `http://10.40.40.202:3000`
Default credentials: `admin` / `openbmp` (anonymous access also enabled)
### Dashboard Categories
| Category | Dashboard | Description |
|----------|-----------|-------------|
| General | OBMP Home | Overview / landing page |
| Base | Inventory | Router and peer inventory |
| Base | Looking Glass | Real-time RIB lookup by prefix |
| Base | ASN View | ASN-level routing view |
| History | Prefix History | Route change history for a prefix |
| History | Prefix History by ASN | Filtered by origin AS |
| History | Prefix History by Community | Filtered by BGP community |
| Tops | Top Prefixes | Most-updated prefixes |
| Tops | Top L3VPN Prefixes | L3VPN equivalent |
| Link State | LS Nodes | IS-IS link-state node database |
| Link State | LS Links | IS-IS link-state link database |
| Link State | LS Topology | Network topology map |
| Link State | LS Prefixes | Link-state prefix database |
| Link State | LS History | Link-state change history |
| L3VPN | L3VPN Looking Glass | VPN RIB lookup |
| L3VPN | L3VPN Prefix History | VPN route change history |
| L3VPN | L3VPN RIB Browser | Full VPN RIB browser |
> History dashboards require `ip_rib_log` and `stats_chg_*` table data. Run `inject.py churn` to populate these.
### OBMP-Learning Dashboards (folder: `OBMP-Learning`)
Six learning-focused dashboards in a separate folder, designed to teach BGP concepts using live lab data.
| Dashboard | UID | What it teaches |
|-----------|-----|-----------------|
| BGP Update Rate & Churn | `obmp-learn-01` | Network stability — advertisements vs withdrawals over time from `ip_rib_log`; per-peer update counts |
| Peer Session Health & Flap Analysis | `obmp-learn-02` | BGP session stability — state timeline, flap count, uptime %, last reset reason |
| AS Path Analysis | `obmp-learn-03` | Internet topology — path length distribution, longest paths, top origin ASNs, transit frequency |
| RPKI Validation Status | `obmp-learn-04` | BGP security — Valid / Invalid / NotFound breakdown; invalid routes (potential hijacks) table |
| Route Churn & Stability Score | `obmp-learn-05` | Prefix stability — tiered churn score (Very Stable / Stable / Moderate / Unstable) per prefix |
| BGP Attribute Explorer | `obmp-learn-06` | BGP path attributes — community list distribution, MED values, local-pref spread per peer |
> **RPKI note:** The `rpki_validator` table is populated by a cron job in `psql-app` every 2 hours. Dashboard `obmp-learn-04` will show zero counts until the cron runs — check `ENABLE_RPKI=1` in `docker-compose.yml`.
### Advanced Analytics Dashboards (folder: `OBMP-Learning`)
Four advanced dashboards that go beyond basic BMP monitoring, unlocking TE/SR data and providing heuristic analysis.
| Dashboard | UID | What it provides |
|-----------|-----|-----------------|
| Database Schema Map | `obmp-learn-07` | Interactive schema reference — live table row counts, entity relationships, column details for all 33 tables and 11 views |
| TE & Segment Routing Analytics | `obmp-learn-08` | Exposes TE/SR fields from BGP-LS: link bandwidth, admin groups, SRLG, SR SIDs, adjacency SIDs, protection types |
| Topology Change & Anomaly Detection | `obmp-learn-09` | Heuristic analysis: link state changes over time, origin AS hijack detection, convergence timeline, route consistency |
| Link Utilization & TE Thought Experiment | `obmp-learn-10` | BGP-LS capacity data (bandwidth, TE metrics) + integration guide for streaming telemetry (gNMI/MDT) |
> **TE/SR data note:** Some TE fields (admin_group, max_link_bw, srlg, sr_adjacency_sids) may be NULL if routers don't advertise those TLVs. Enable `mpls traffic-eng` under IS-IS and `segment-routing mpls` for full data.
### Database Schema Reference
A standalone database schema reference is also available at `DB_SCHEMA.md` in the repo root. It documents all 33 tables, 11 views, TE/SR columns, enum types, and common query patterns.
---
## 10. Sanity Checks
### 10.1 All containers running
```bash
docker compose -p obmp ps
```
All containers should show `running`. If any are restarting, check logs:
```bash
docker compose -p obmp logs --tail=50 <service>
```
### 10.2 ExaBGP peers up
```bash
python3 exabgp/inject.py status
```
Both `10.100.0.100` and `10.100.0.200` should show `"state": "up"`.
Or check from the router side:
```
show bgp neighbors 10.40.40.202
show bgp summary | inc 10.40.40.202
```
### 10.3 Routes accepted by CORE routers
After loading `internet_sample`, run the following on CORE-01 or CORE-02:
```
show bgp summary
! Expect: 77 accepted prefixes, 77 are bestpaths from 10.40.40.202
show bgp 8.8.8.0/24
! Expect: best path via 10.40.40.202 (eBGP), also iBGP copies from other routers
```
### 10.4 Routes in OpenBMP database
```bash
docker exec -it obmp-psql psql -U openbmp -c "
SELECT count(DISTINCT prefix) AS unique_prefixes,
count(DISTINCT peer_hash_id) AS peers_reporting
FROM ip_rib
WHERE isIPv4 = true AND isWithdrawn = false;
"
```
Expect `~129 unique prefixes` and `56 peers_reporting` (9 routers × ~6 peers each) after loading `internet_sample`.
### 10.5 Kafka is healthy
```bash
docker exec -it obmp-kafka kafka-topics --bootstrap-server localhost:29092 --list
```
Should show topics like `openbmp.parsed.unicast_prefix`, `openbmp.parsed.peer`, etc.
### 10.6 Grafana datasource
Open `http://10.40.40.202:3000` → Configuration → Data Sources → OpenBMP → Test.
Should return "Database Connection OK".
### 10.7 BMP collector receiving data
```bash
docker compose -p obmp logs --tail=30 collector
```
Should show connections from router management IPs.
### 10.8 psql-app consumer is caught up
```bash
docker compose -p obmp logs --tail=30 psql-app
```
Should show periodic cron job outputs (RPKI sync, IRR sync, global_ip_rib updates).
---
## 11. Relevant Commands Reference
### Docker Compose
```bash
# Start stack
OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d
# Stop stack
docker compose -p obmp down
# Show status
docker compose -p obmp ps
# Follow logs (all services)
docker compose -p obmp logs -f
# Follow logs (specific service)
docker compose -p obmp logs -f exabgp
docker compose -p obmp logs -f psql-app
docker compose -p obmp logs -f collector
# Rebuild and restart ExaBGP
docker compose -p obmp build exabgp && docker compose -p obmp up -d exabgp
# Restart a service
docker compose -p obmp restart psql-app
```
### Route Injection (from `exabgp/` directory)
```bash
# API health and peer states
python3 inject.py status
# List active routes
python3 inject.py routes
# List scenarios
python3 inject.py scenarios
# Load a scenario
python3 inject.py scenario internet_sample
python3 inject.py scenario churn
python3 inject.py scenario blackhole
python3 inject.py scenario full_table
python3 inject.py scenario lab_prefixes
# Withdraw a scenario
python3 inject.py withdraw-scenario internet_sample
# Withdraw all active routes
python3 inject.py withdraw-all
# Announce a specific prefix
python3 inject.py announce 10.0.0.0/8 --as-path 65100 3356 15169 --community 65100:100
# Withdraw a specific prefix
python3 inject.py withdraw 10.0.0.0/8
# Run churn (populate history tables)
python3 inject.py churn --count 5 --interval 30
```
### Database Queries
```sql
-- Connect to the database first, from the host shell:
--   docker exec -it obmp-psql psql -U openbmp -d openbmp

-- Count unique prefixes in RIB
SELECT count(DISTINCT prefix) FROM ip_rib WHERE isIPv4 = true AND isWithdrawn = false;

-- Show recent route changes
SELECT prefix, origin_as, iswithdrawn, timestamp
  FROM ip_rib_log
 ORDER BY timestamp DESC LIMIT 20;

-- Show peer summary
SELECT name, state, timestamp_last_updated
  FROM bgp_peers
 ORDER BY state, name;

-- Show routes from ExaBGP peer
SELECT prefix, origin_as, as_path
  FROM ip_rib
 WHERE peer_hash_id IN (
         SELECT hash_id FROM bgp_peers WHERE peer_addr = '10.40.40.202'
       )
   AND isWithdrawn = false;
```
### IOS-XR Verification (on router CLI)
```
show bgp neighbors 10.40.40.202
show bgp neighbors 10.40.40.202 received routes
show bgp summary
show bgp 8.8.8.0/24
show bgp 1.1.1.0/24
show route 8.8.8.0/24
```
---
## 12. Troubleshooting
### ExaBGP container keeps restarting
Check logs:
```bash
docker compose -p obmp logs --tail=50 exabgp
```
Common causes and fixes:
| Symptom | Cause | Fix |
|---------|-------|-----|
| Exits after "welcome" banner | Missing or wrong env file path | `startup.sh` generates `/usr/local/etc/exabgp/exabgp.env` — verify this path exists in container |
| Process `api` killed 5 times | Wrong Python path in conf | Conf uses `/usr/local/bin/python3` — correct for python:3.11-slim |
| `drop = true` in env | ExaBGP drops privileges to nobody, can't bind 179 | `startup.sh` patches `drop = false` — check the sed lines ran |
| `__pycache__ Permission denied` during build | Root-owned cache from previous container run | `.dockerignore` excludes `**/__pycache__` — confirm file exists |
### BGP sessions not establishing
1. Verify host IP `10.40.40.202` is reachable from CML management network: `ping 10.40.40.202` from router
2. Check ExaBGP peer state: `python3 exabgp/inject.py status`
3. On router: `show bgp neighbors 10.40.40.202` — look for error codes
4. Common IOS-XR errors:
- `no-update-source-config` — add `update-source MgmtEth0/RP0/CPU0/0`
- `no-ipv6-address` — ensure only IPv4 unicast AF is configured (no IPv6)
- TCP refused — check port 179 is reachable (ExaBGP uses `network_mode: host`)
### Routes received but not bestpath
IOS-XR BGP requires a specific route to resolve the BGP next-hop (10.40.40.202). The default route (0.0.0.0/0) is insufficient.
```
router static
 address-family ipv4 unicast
  10.40.40.0/24 10.100.0.254
```
Verify: `show bgp 1.1.1.0/24` — should show `Status: s (active), bestpath`.
### Grafana shows no data
1. Check datasource: Configuration → Data Sources → OpenBMP → Test
2. Verify psql-app is writing: `docker compose -p obmp logs psql-app`
3. Check the database directly (see database queries above)
4. History dashboards need route churn — run `python3 inject.py churn`
### Kafka not starting
Zookeeper must be healthy first. Check:
```bash
docker compose -p obmp logs zookeeper
docker compose -p obmp restart kafka
```
### psql-app fails to start
Usually a PostgreSQL connection issue or schema mismatch. Check:
```bash
docker compose -p obmp logs psql-app
# If "relation does not exist" errors: re-trigger DB init.
# WARNING: init_db wipes and recreates the entire database (see Section 4.3).
touch /var/openbmp/config/init_db
docker compose -p obmp restart psql-app
```
---
## 13. Data Retention
Configured in `docker-compose.yml` via `POSTGRES_DROP_*` environment variables:
| Table | Default Retention |
|-------|-------------------|
| peer_event_log | 1 year |
| stat_reports | 4 weeks |
| ip_rib_log | 4 weeks |
| alerts | 4 weeks |
| ls_nodes_log | 4 months |
| ls_links_log | 4 months |
| ls_prefixes_log | 4 months |
| stats_chg_byprefix | 4 weeks |
| stats_chg_byasn | 4 weeks |
| stats_chg_bypeer | 4 weeks |
| stats_ip_origins | 4 weeks |
| stats_peer_rib | 4 weeks |
| stats_peer_update_counts | 4 weeks |
Adjust in `docker-compose.yml` under the `psql-app` service environment block.
---
## 14. Environment Variables Reference
### ExaBGP container
| Variable | Default | Description |
|----------|---------|-------------|
| `EXABGP_LOCAL_IP` | `10.40.40.202` | Host IP ExaBGP binds to and uses as router-id |
| `EXABGP_LOCAL_AS` | `65100` | ExaBGP's AS number |
| `EXABGP_PEER_AS` | `65020` | AS of the IOS-XR lab |
| `EXABGP_PEER_1` | `10.100.0.100` | First CORE router to peer with |
| `EXABGP_PEER_2` | `10.100.0.200` | Second CORE router to peer with |
| `EXABGP_API_PORT` | `5050` | Flask API port |
### psql-app container (key variables)
| Variable | Default | Description |
|----------|---------|-------------|
| `MEM` | `3` | JVM heap in GB |
| `ENABLE_RPKI` | `1` | Enable RPKI sync from Cloudflare |
| `ENABLE_IRR` | `1` | Enable IRR sync |
| `ENABLE_DBIP` | `1` | Enable DB-IP geolocation import |
| `POSTGRES_REPORT_WINDOW` | `8 minute` | Aggregation window for summary tables |
### inject.py (CLI)
| Variable | Default | Description |
|----------|---------|-------------|
| `EXABGP_API` | `http://localhost:5050` | ExaBGP API base URL |
---
## 15. gNMI Streaming Telemetry (Phase 4)
### Overview
gNMI (gRPC Network Management Interface) adds **data-plane visibility** alongside BMP's control-plane monitoring. Telegraf collects real-time interface counters from all 9 IOS-XR routers via gNMI subscriptions and stores them in InfluxDB. Grafana queries InfluxDB for telemetry dashboards.
### Architecture
```
IOS-XR Routers (9x, gRPC port 57400)
        |
        |  gNMI subscriptions (10s sample)
        |
        v
obmp-telegraf (telegraf:1.28 + gnmi input plugin)
  host networking → reaches routers on 10.100.0.x
        |
        v
obmp-influxdb (influxdb:2.7, port 8086)
  bucket: "telemetry", org: "openbmp"
        |
        v
obmp-grafana (InfluxDB datasource, Flux queries)
  3 dashboards in OBMP-Telemetry folder
```
### Enabling gRPC on Routers
The routers need gRPC enabled before Telegraf can collect telemetry. A NETCONF script is provided:
```bash
# From the host (requires ncclient: pip install ncclient)
cd /home/user/obmp-docker/gnmi
python3 gnmi_grpc_config.py
```
This connects to all 9 routers via NETCONF (port 830, credentials webui/cisco) and pushes:
```
grpc
port 57400
no-tls
```
**Verify on router:**
```
show grpc status
```
Expected: gRPC listening on port 57400.
### Telemetry Data Collected
Telegraf subscribes to two IOS-XR YANG paths at 10-second intervals:
| Subscription | YANG Path | Data |
|-------------|-----------|------|
| interface_counters | `Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters` | bytes/packets in/out, errors, drops, CRC |
| interface_rates | `Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/data-rate` | bits/sec in/out, packet rate |
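The `generic-counters` values are cumulative, so a rate has to be derived from two consecutive samples. The sketch below shows that arithmetic for the 10-second sample interval. `bits_per_second` is a hypothetical helper, and treating a negative delta as a 64-bit counter wrap is a simplifying assumption (a router reload also resets counters). It is not a description of how the shipped dashboards compute their panels.

```python
def bits_per_second(prev_bytes: int, curr_bytes: int, interval_s: float,
                    counter_bits: int = 64) -> float:
    """Convert two cumulative byte-counter samples into an average bit rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # Assumption: the counter wrapped once between the two samples.
        delta += 1 << counter_bits
    return delta * 8 / interval_s


# Two samples taken 10 seconds apart: 512,000 bytes were received in between.
print(bits_per_second(1_000_000, 1_512_000, 10))  # 409600.0
```

The `data-rate` subscription already reports bits per second, so a calculation like this is mainly useful for checking the two paths against each other.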
### InfluxDB Access
- **URL:** `http://localhost:8086`
- **Org:** `openbmp`
- **Bucket:** `telemetry`
- **Token:** `openbmp-telemetry-token`
- **Retention:** 30 days
### Grafana Telemetry Dashboards
Three dashboards in the **OBMP-Telemetry** folder:
| Dashboard | UID | Description |
|-----------|-----|-------------|
| Interface Utilization | obmp-telem-01 | Input/output bytes rate, packets rate, top interfaces by throughput |
| Interface Errors | obmp-telem-02 | CRC errors, input/output errors, drops, overruns |
| Combined BMP + Telemetry | obmp-telem-03 | Mixed datasource — BGP peer status (PostgreSQL) alongside interface counters (InfluxDB) |
All dashboards have `$router` and `$interface` template variables for filtering.
### Troubleshooting gNMI
```bash
# Check Telegraf logs for gNMI connection status
docker logs obmp-telegraf --tail 50
# Verify InfluxDB has data
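# (the InfluxDB 2.x query API expects a POST with the Flux script as the body)
curl -s -X POST "http://localhost:8086/api/v2/query?org=openbmp" \
-H "Authorization: Token openbmp-telemetry-token" \
-H "Content-Type: application/vnd.flux" \
-H "Accept: application/csv" \
--data 'from(bucket:"telemetry") |> range(start: -5m) |> limit(n:5)'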
# Check InfluxDB health
curl http://localhost:8086/health
```
---
## 16. Traffic Generator (Phase 4)
### Overview
A portable, containerized traffic generator with a web UI for RFC 2544 testing and custom packet flows. Built with Scapy + Flask (backend) and Vue 3 + NGINX (frontend). The container supports **dual-mode operation**: sender (generate traffic) or responder (receive/echo packets).
### Accessing the UI
- **Web UI:** `http://localhost:5002`
- **API:** `http://localhost:5051`
### Dual-Mode Operation
Set via `TRAFFIC_GEN_MODE` environment variable in `docker-compose.yml`:
| Mode | Description |
|------|-------------|
| `sender` (default) | Generates traffic, runs RFC 2544 tests, sends custom flows |
| `responder` | Listens for incoming test packets, echoes/timestamps them, reports receive stats |
**Typical deployment:** One instance as `sender` on the host, optionally a second instance as `responder` on another endpoint. Without a responder, the sender uses ICMP echo for latency measurement (routers respond natively).
### Creating Flows
Use the **Flow Builder** panel (left sidebar) in the UI:
| Field | Default | Description |
|-------|---------|-------------|
| Name | - | Human-readable flow name |
| Destination IP | `10.100.0.100` | Target router IP |
| Source IP | `10.40.40.202` | Host IP |
| Protocol | UDP | UDP, TCP, or ICMP |
| Source Port | 50000 | (UDP/TCP only) |
| Destination Port | 5001 | (UDP/TCP only) |
| Frame Size | 512 | Packet size in bytes |
| Rate (pps) | 1000 | Packets per second |
| Duration | 30 | Seconds (0 = infinite) |
| DSCP | 0 | Differentiated Services Code Point |
After creating a flow, use the **Flows** tab to Start/Stop/Delete flows.
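Frame size and rate together determine how much bandwidth a flow offers to the network, which is worth estimating before pointing a flow at a lab router. The sketch below works through the default values from the table. It assumes "Frame Size" means the full Ethernet frame, and the helper names are hypothetical rather than part of the traffic generator.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def offered_load_bps(frame_size: int, pps: int, include_overhead: bool = False) -> int:
    """Bits per second offered by a flow; optionally count on-the-wire overhead."""
    per_frame = frame_size + (ETHERNET_OVERHEAD_BYTES if include_overhead else 0)
    return per_frame * 8 * pps


def total_packets(pps: int, duration_s: int):
    """Packets sent over the flow's lifetime; None when duration is 0 (infinite)."""
    return None if duration_s == 0 else pps * duration_s


# Defaults from the Flow Builder table: 512-byte frames at 1000 pps for 30 s.
print(offered_load_bps(512, 1000))                         # 4096000
print(offered_load_bps(512, 1000, include_overhead=True))  # 4256000
print(total_packets(1000, 30))                             # 30000
```

At the defaults the flow offers roughly 4.1 Mbit/s, which is small enough to leave BGP and BMP sessions on the same links undisturbed.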
### RFC 2544 Testing
Use the **Tests** tab to configure and run RFC 2544 tests:
| Test Type | Description |
|-----------|-------------|
| **Throughput** | Binary search for maximum zero-loss forwarding rate |
| **Latency** | Measure round-trip time at the previously determined throughput rate |
| **Frame Loss** | Loss percentage vs. offered load curve |
| **Back-to-Back** | Maximum burst length at line rate with zero loss |
**Parameters:**
- **Base Flow:** Select a previously created flow as the test template
- **Frame Sizes:** Standard sizes: 64, 128, 256, 512, 1024, 1280, 1518 bytes
- **Trial Duration:** Per-frame-size test duration (5-300 sec)
- **Max Rate (pps):** Upper bound for binary search
- **Acceptable Loss %:** Threshold for pass/fail
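The throughput test's binary search can be sketched as follows. This is an illustration of the general technique, not the traffic generator's actual code: `find_throughput` and `simulated_loss` are hypothetical names, the device under test is simulated, and the search assumes loss never decreases as the offered rate rises.

```python
from typing import Callable


def find_throughput(measure_loss: Callable[[int], float], max_rate_pps: int,
                    acceptable_loss_pct: float = 0.0) -> int:
    """Return the highest rate (pps) whose measured loss is within the threshold."""
    low, high, best = 1, max_rate_pps, 0
    while low <= high:
        rate = (low + high) // 2
        if measure_loss(rate) <= acceptable_loss_pct:
            best = rate      # this trial passed, so try a higher rate
            low = rate + 1
        else:
            high = rate - 1  # this trial failed, so back off
    return best


def simulated_loss(rate_pps: int, capacity_pps: int = 7300) -> float:
    """Stand-in for a real trial: everything above capacity is dropped."""
    if rate_pps <= capacity_pps:
        return 0.0
    return (rate_pps - capacity_pps) / rate_pps * 100


print(find_throughput(simulated_loss, max_rate_pps=10_000))  # 7300
```

Each loop iteration corresponds to one trial of the configured duration, so a search bounded at 10,000 pps needs at most 14 trials per frame size.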
### Quick Presets
Six built-in presets are available in the **Tests** tab:
| Preset | Description |
|--------|-------------|
| quick_icmp | ICMP ping to CORE-01 at 10 pps |
| udp_flood_small | 64-byte UDP at 5000 pps |
| udp_flood_large | 1518-byte UDP at 1000 pps |
| rfc2544_throughput | Full throughput test with standard frame sizes |
| rfc2544_latency | Latency measurement with standard frame sizes |
| tcp_session | TCP flow at 500 pps |
### API Reference
| Method | Path | Description |
|--------|------|-------------|
| GET | `/healthz` | Health check + engine status |
| GET | `/interfaces` | Available network interfaces |
| GET | `/mode` | Current mode (sender/responder) |
| GET/POST | `/flows` | List / create flows |
| GET/PUT/DELETE | `/flows/<id>` | Get / update / delete flow |
| POST | `/flows/<id>/start` | Start sending |
| POST | `/flows/<id>/stop` | Stop sending |
| GET | `/flows/<id>/stats` | Real-time stats for a flow |
| GET/POST | `/tests` | List / create RFC 2544 tests |
| GET | `/tests/<id>` | Test details + results |
| POST | `/tests/<id>/start` | Start test execution |
| POST | `/tests/<id>/stop` | Abort test |
| GET | `/tests/<id>/results` | Exportable results |
| GET | `/presets` | Available test presets |
| POST | `/presets/<name>` | Create flow + test from preset |
| GET | `/stats/history` | Stats ring buffer (300 samples) |
| GET | `/responder/stats` | Responder-mode receive stats |
| POST | `/responder/reset` | Reset responder counters |
### Integration with gNMI Telemetry
The key value of combining the traffic generator with gNMI: **send traffic while watching real-time interface counters**.
1. Create a UDP flow targeting a router (e.g., R9K-01 at 10.100.0.1)
2. Open the Grafana **Interface Utilization** dashboard, select that router
3. Start the flow — gNMI counters show traffic appearing on the interface
4. Run an RFC 2544 throughput test — Grafana shows the stepped traffic pattern from binary search iterations
5. Compare Scapy-reported stats with gNMI-reported counters for cross-validation
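The cross-validation in step 5 amounts to comparing the sender's packet count with the change in the router's interface counter over the same window. The sketch below is a minimal version; `counters_agree` and the 2% default tolerance are assumptions chosen for illustration. Some deviation is expected because the interface also carries other traffic and gNMI samples arrive only every 10 seconds.

```python
def counters_agree(sent_packets: int, counter_before: int, counter_after: int,
                   tolerance_pct: float = 2.0) -> bool:
    """True when the interface counter delta is within tolerance of the sent count."""
    if sent_packets <= 0:
        raise ValueError("sent_packets must be positive")
    observed = counter_after - counter_before
    deviation_pct = abs(observed - sent_packets) / sent_packets * 100
    return deviation_pct <= tolerance_pct


# 30,000 packets sent; the router counted 30,150 (0.5% extra background traffic).
print(counters_agree(30_000, 1_200_000, 1_230_150))  # True
# The router counted only 27,000, so 10% went missing along the path.
print(counters_agree(30_000, 0, 27_000))             # False
```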
The **Combined BMP + Telemetry** dashboard shows both control-plane (BMP BGP updates) and data-plane (gNMI interface counters) side by side, enabling correlation of BGP changes with traffic impact.
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `TRAFFIC_GEN_API_PORT` | `5051` | Flask API listen port |
| `TRAFFIC_GEN_MODE` | `sender` | Operating mode: `sender` or `responder` |
| `INFLUXDB_TOKEN` | `openbmp-telemetry-token` | InfluxDB auth token (Telegraf) |