obmp-docker/DOCS.md
sam 233dadbb41 Add ExaBGP route injector, Grafana dashboards, and full documentation
- Add exabgp/ container: ExaBGP 5.x + Flask REST API for on-demand BGP
  route injection into CML IOS-XR lab (AS 65020 via eBGP from AS 65100)
- Add 6 injection scenarios: internet_sample, churn, blackhole, anycast,
  full_table, lab_prefixes
- Add inject.py CLI wrapper for the ExaBGP API
- Add iosxr_bgp_config.md with IOS-XR neighbor config and NETCONF script
- Add obmp-grafana/ dashboards and provisioning (17 dashboards)
- Update docker-compose.yml: add exabgp service, fix Kafka external
  listener IP, extend log retention from 90min to 720min
- Add DOCS.md: full project documentation including architecture, setup,
  user guide, sanity checks, troubleshooting, and command reference
- Update .gitignore: exclude .env and .claude/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 14:46:37 -07:00


OpenBMP + ExaBGP Route Injector — Full Documentation

Table of Contents

  1. What Is This Project?
  2. Architecture
  3. Prerequisites
  4. Initial Setup (First Time)
  5. IOS-XR Router Configuration
  6. Starting and Stopping
  7. Route Injection User Guide
  8. Grafana Dashboards
  9. Sanity Checks
  10. Relevant Commands Reference
  11. Troubleshooting
  12. Data Retention
  13. Environment Variables Reference

1. What Is This Project?

This is a BGP monitoring lab stack, deployed with Docker Compose and built on OpenBMP and the BGP Monitoring Protocol (BMP). It collects, stores, and visualizes BGP routing data from a Cisco IOS-XR lab network running in Cisco Modeling Labs (CML).

What it does:

  • Receives BMP (BGP Monitoring Protocol, RFC 7854) telemetry from routers on TCP port 5000
  • Streams BMP data through Kafka into a TimescaleDB/PostgreSQL database
  • Provides 17 Grafana dashboards for real-time and historical BGP analysis
  • Includes an ExaBGP route injector that peers with the two CORE routers and injects synthetic BGP routes, enabling testing of BGP policy, route propagation, and Grafana dashboards without needing internet connectivity

The lab network:

  • AS 65020: 9 Cisco IOS-XR routers in CML (iBGP via route reflectors)
  • AS 65100: ExaBGP container (eBGP peer to both CORE routers)
  • CORE-01: 10.100.0.100 (CML-R9K-CORE-01)
  • CORE-02: 10.100.0.200 (CML-R9K-CORE-02)
  • Host IP: 10.40.40.202 (ExaBGP binds here; reachable from CML management network)

2. Architecture

IOS-XR Routers (9x, AS 65020)
  BMP telemetry on TCP 5000
         |
         v
  obmp-collector (openbmp/collector:2.2.3)
         |
         v
  obmp-kafka (confluentinc/cp-kafka:7.1.1)
    + obmp-zookeeper (confluentinc/cp-zookeeper:7.1.1)
         |
         v
  obmp-psql-app (openbmp/psql-app:2.2.2)
    Java consumer — writes parsed BGP data to PostgreSQL
         |
         v
  obmp-psql (openbmp/postgres:2.2.1)
    PostgreSQL 14 + TimescaleDB
         |
         +---------> obmp-grafana (grafana/grafana:9.1.7)  :3000
         |              17 dashboards, PostgreSQL datasource
         +---------> obmp-whois (openbmp/whois:2.2.0)      :4300
                       WHOIS query server backed by the DB

ExaBGP (obmp-exabgp, built locally)
  python:3.11-slim + exabgp 5.x + Flask API
  Peers eBGP to CORE-01 and CORE-02 (AS 65100 -> AS 65020)
  HTTP API on :5050 — inject/withdraw routes on demand
  Routes propagate via iBGP mesh to all 9 routers -> BMP -> DB -> Grafana

Container Summary

Container | Image | Port(s) | Role
obmp-zookeeper | confluentinc/cp-zookeeper:7.1.1 | 2181 (internal) | Kafka coordination
obmp-kafka | confluentinc/cp-kafka:7.1.1 | 9092 | Message broker
obmp-collector | openbmp/collector:2.2.3 | 5000 | BMP receiver
obmp-psql-app | openbmp/psql-app:2.2.2 | 9005 | Kafka→PostgreSQL consumer
obmp-psql | openbmp/postgres:2.2.1 | 5432 | TimescaleDB storage
obmp-grafana | grafana/grafana:9.1.7 | 3000 | Visualization
obmp-whois | openbmp/whois:2.2.0 | 4300 | WHOIS query server
obmp-exabgp | local build | 5050 (host network) | BGP route injector

3. Prerequisites

  • Docker Engine (20.10+) and Docker Compose v2
  • Host IP 10.40.40.202 reachable from the CML management network
  • CML routers with BMP configured pointing to 10.40.40.202:5000
  • CML CORE routers configured with ExaBGP as eBGP neighbor (see Section 5)
  • OBMP_DATA_ROOT directory created (default: /var/openbmp)

4. Initial Setup (First Time)

4.1 Clone the repository

git clone <this-repo-url>
cd obmp-docker

4.2 Create persistent data directories

export OBMP_DATA_ROOT=/var/openbmp
# The root directory is created by root, so its subdirectories need sudo too
sudo mkdir -p "${OBMP_DATA_ROOT}/config"
sudo mkdir -p "${OBMP_DATA_ROOT}/kafka-data"
sudo mkdir -p "${OBMP_DATA_ROOT}/zk-data"
sudo mkdir -p "${OBMP_DATA_ROOT}/zk-log"
sudo mkdir -p "${OBMP_DATA_ROOT}/postgres/data"
sudo mkdir -p "${OBMP_DATA_ROOT}/postgres/ts"
sudo mkdir -p "${OBMP_DATA_ROOT}/grafana/dashboards"
sudo chmod -R 777 "${OBMP_DATA_ROOT}"
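To confirm the layout before starting the stack, a small Python check like the one below can help. It is a sketch, not part of the repository: the function name missing_data_dirs is hypothetical, and it only checks for the directories listed above.

```python
import os
from pathlib import Path
from typing import List, Optional

# Subdirectories created in step 4.2, relative to OBMP_DATA_ROOT.
REQUIRED_SUBDIRS = [
    "config",
    "kafka-data",
    "zk-data",
    "zk-log",
    "postgres/data",
    "postgres/ts",
    "grafana",
    "grafana/dashboards",
]


def missing_data_dirs(root: Optional[str] = None) -> List[str]:
    """Return the required subdirectories that do not exist under root.

    root defaults to $OBMP_DATA_ROOT, falling back to /var/openbmp.
    """
    base = Path(root or os.environ.get("OBMP_DATA_ROOT", "/var/openbmp"))
    return [sub for sub in REQUIRED_SUBDIRS if not (base / sub).is_dir()]


missing = missing_data_dirs()
print("Missing:", ", ".join(missing) if missing else "none")
```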

4.3 Initialise the database (first run only)

Create the init trigger file — this causes psql-app to create all tables on startup:

touch ${OBMP_DATA_ROOT}/config/init_db

Warning: Do not create this file on subsequent runs unless you want to wipe and recreate the entire database.

4.4 Copy Grafana provisioning files

cp -r obmp-grafana/provisioning ${OBMP_DATA_ROOT}/grafana/
cp -r obmp-grafana/dashboards   ${OBMP_DATA_ROOT}/grafana/

4.5 Start the stack

OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d

Wait ~2 minutes for all services to initialise (especially PostgreSQL and psql-app which run schema migrations).

4.6 Verify everything is running

docker compose -p obmp ps
docker compose -p obmp logs --tail=20 psql-app
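Instead of waiting a fixed two minutes, you can poll the published ports. The following is a minimal sketch using only the Python standard library; the PORTS mapping is taken from the Container Summary table, and the helper names are hypothetical. An open port only means something is listening, so psql-app schema migrations may still be running when it reports success.

```python
import socket
import time
from typing import Dict, List

# Host ports from the Container Summary table (Kafka and psql-app omitted).
PORTS = {"collector": 5000, "psql": 5432, "grafana": 3000, "whois": 4300, "exabgp": 5050}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_ports(host: str, ports: Dict[str, int],
                   deadline_s: float = 180.0, poll_s: float = 5.0) -> List[str]:
    """Poll until every port accepts TCP connections or the deadline passes.

    Returns the names of services that are still unreachable (empty on success).
    """
    end = time.monotonic() + deadline_s
    pending = dict(ports)
    while pending:
        # Keep only the services whose port is still closed.
        pending = {name: p for name, p in pending.items() if not port_open(host, p)}
        if not pending or time.monotonic() >= end:
            break
        time.sleep(poll_s)
    return sorted(pending)
```

For example, wait_for_ports("127.0.0.1", PORTS) returns an empty list once every port accepts connections, or the names of the services still unreachable after about three minutes.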

5. IOS-XR Router Configuration

The ExaBGP container peers over eBGP with both CORE routers. Apply the following configuration to each CORE router in configuration mode (configure) and activate it with commit:

5.1 Route policies (apply once per router)

route-policy EXABGP_IN
  pass
end-policy

route-policy EXABGP_OUT
  drop
end-policy

5.2 BGP neighbor block

router bgp 65020
 neighbor 10.40.40.202
  remote-as 65100
  description ExaBGP-Route-Injector
  ebgp-multihop 5
  update-source MgmtEth0/RP0/CPU0/0
  !
  address-family ipv4 unicast
   route-policy EXABGP_IN in
   route-policy EXABGP_OUT out
   next-hop-self
  !
 !
!

5.3 Static route for next-hop resolution

IOS-XR BGP does not use the default route (0.0.0.0/0) to resolve BGP next-hops. A more-specific static route for the ExaBGP host subnet is required in the default VRF:

router static
 address-family ipv4 unicast
  10.40.40.0/24 10.100.0.254
 !
!

5.4 Config notes

Knob | Why
remote-as 65100 | ExaBGP presents as AS 65100 (eBGP to the AS 65020 lab)
ebgp-multihop 5 | The host and the router are on different subnets
update-source MgmtEth0/RP0/CPU0/0 | ExaBGP is reachable via the management interface
next-hop-self | Applies only to updates sent to this neighbor, and EXABGP_OUT drops all of those, so it has no effect here. For iBGP peers to see the CORE router's address instead of ExaBGP's next hop (10.40.40.202), configure next-hop-self on the iBGP neighbors (or their neighbor-group); otherwise every router needs its own route to 10.40.40.202
EXABGP_OUT drops | Prevents the lab from advertising its own prefixes back to ExaBGP
Static route | Required: IOS-XR BGP will not install injected routes as bestpaths without a specific route to the next hop
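The next-hop rule behind the static route can be modeled in a few lines. This is a simplified illustration, not IOS-XR's actual resolution logic: it only does a longest-prefix match over a list of prefixes and skips 0.0.0.0/0 unless allow_default is set. The function name and the RIB contents are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def resolving_route(next_hop: str, rib: Iterable[str],
                    allow_default: bool = False) -> Optional[str]:
    """Return the longest RIB prefix covering next_hop, or None if unresolved.

    The default route (prefix length 0) is ignored unless allow_default is True,
    a simplified model of the rule described in 5.3.
    """
    nh = ipaddress.ip_address(next_hop)
    covering = [
        net
        for net in (ipaddress.ip_network(p) for p in rib)
        if nh in net and (allow_default or net.prefixlen > 0)
    ]
    if not covering:
        return None
    return str(max(covering, key=lambda net: net.prefixlen))


# Hypothetical CORE router RIB: default route plus the management subnet.
rib = ["0.0.0.0/0", "10.100.0.0/24"]
print(resolving_route("10.40.40.202", rib))                      # → None
print(resolving_route("10.40.40.202", rib + ["10.40.40.0/24"]))  # → 10.40.40.0/24
```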

5.5 NETCONF alternative

See exabgp/iosxr_bgp_config.md for a Python/ncclient script that pushes all of the above config programmatically.

Credentials: username=webui, password=cisco, port 830.


6. Starting and Stopping

Start all services

OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d

Stop all services (preserve data)

docker compose -p obmp down

Stop and remove all data (full reset)

docker compose -p obmp down -v
sudo rm -rf /var/openbmp

Rebuild the ExaBGP container (after code changes)

docker compose -p obmp build exabgp
docker compose -p obmp up -d exabgp

Restart a single service

docker compose -p obmp restart <service>
# e.g.:
docker compose -p obmp restart exabgp
docker compose -p obmp restart psql-app

7. Route Injection User Guide

The ExaBGP container exposes a Flask REST API on port 5050 (host network). The inject.py CLI wraps this API.

7.1 Setup

cd exabgp
pip install requests   # only needed if running inject.py from the host

7.2 Check status

python3 inject.py status

Output shows API health, active route count, and peer states:

{
  "status": "ok",
  "active_routes": 77,
  "peers": {
    "10.100.0.100": {"state": "up", "updated": "2026-03-05T10:00:00Z"},
    "10.100.0.200": {"state": "up", "updated": "2026-03-05T10:00:00Z"}
  }
}
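If you want to script this check (for example from cron), the following sketch returns the CORE peers that are not up. It assumes the status JSON has the shape of the sample above; the function name down_peers is hypothetical.

```python
import json
from typing import Dict, Iterable, List

# CORE-01 and CORE-02 from section 1.
EXPECTED_PEERS = ("10.100.0.100", "10.100.0.200")


def down_peers(status: Dict, expected: Iterable[str] = EXPECTED_PEERS) -> List[str]:
    """Return expected peers that are missing from status or not in state "up"."""
    peers = status.get("peers", {})
    return [ip for ip in expected if peers.get(ip, {}).get("state") != "up"]


# Example using the sample output shown above.
sample = json.loads("""
{"status": "ok", "active_routes": 77,
 "peers": {"10.100.0.100": {"state": "up"}, "10.100.0.200": {"state": "up"}}}
""")
print(down_peers(sample))  # → []
```

Assuming inject.py status prints plain JSON to stdout, an empty list from down_peers means both sessions are established.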

7.3 List available scenarios

python3 inject.py scenarios
Scenario | Routes | Description
internet_sample | ~94 | Partial internet table: real public prefixes with realistic AS paths (Cloudflare, Google, AWS, Azure, etc.)
churn | 30 | RFC documentation prefixes for announce/withdraw churn testing
blackhole | 5 | /32 prefixes with RTBH communities (65100:666 + 65535:666)
anycast | 3 | Same prefixes with varying AS paths and MEDs (best-path testing)
full_table | 500+ | Large partial internet table with synthetic /24s
lab_prefixes | 8 | Enterprise/SP-style routes with communities and local-pref
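The anycast scenario exercises best-path selection. For intuition, here is a deliberately reduced model of how a router compares paths: higher LOCAL_PREF, then shorter AS_PATH, then lower MED (by default compared only between paths from the same neighboring AS). Real BGP best-path selection has more steps, such as origin type, eBGP over iBGP, IGP metric to the next hop, and router-ID tie-breakers. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BgpPath:
    as_path: List[int]
    med: int = 0
    local_pref: int = 100


def prefer(a: BgpPath, b: BgpPath) -> BgpPath:
    """Pick between two paths using a small subset of BGP best-path rules."""
    if a.local_pref != b.local_pref:        # 1. higher LOCAL_PREF wins
        return a if a.local_pref > b.local_pref else b
    if len(a.as_path) != len(b.as_path):    # 2. shorter AS_PATH wins
        return a if len(a.as_path) < len(b.as_path) else b
    # 3. lower MED wins, compared only when both paths share the same
    #    neighboring AS (the first AS in the path)
    if a.as_path[:1] == b.as_path[:1] and a.med != b.med:
        return a if a.med < b.med else b
    return a  # still tied: real BGP continues with further tie-breakers


def best_path(paths: List[BgpPath]) -> BgpPath:
    """Fold prefer() over the candidates, left to right."""
    best = paths[0]
    for candidate in paths[1:]:
        best = prefer(best, candidate)
    return best
```

Pairwise folding like this can depend on candidate order when MED is involved, which is one reason vendors offer deterministic-MED options.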

7.4 Load a scenario

python3 inject.py scenario internet_sample

Routes propagate: ExaBGP → CORE-01/CORE-02 (eBGP) → all 9 routers (iBGP) → BMP → Kafka → PostgreSQL → Grafana.

7.5 Withdraw a scenario

python3 inject.py withdraw-scenario internet_sample

7.6 Announce individual prefixes

python3 inject.py announce 10.0.0.0/8 \
  --as-path 65100 3356 15169 \
  --community 65100:100 \
  --med 100

7.7 Withdraw individual prefixes

python3 inject.py withdraw 10.0.0.0/8

7.8 Withdraw everything

python3 inject.py withdraw-all

7.9 Generate route churn (populate history tables)

The churn command cycles the churn scenario repeatedly, generating ip_rib_log and stats_chg_* entries that power Grafana's history dashboards.

# 5 cycles, 30 seconds apart
python3 inject.py churn --count 5 --interval 30

# Run indefinitely until Ctrl+C
python3 inject.py churn
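The announce/withdraw cycle can be pictured as a simple loop. This sketch is one plausible shape for it, not the actual inject.py code: announce and withdraw are caller-supplied functions, and waiting interval seconds after both the announce and the withdraw is an assumption.

```python
import itertools
import time
from typing import Callable, Optional


def run_churn(
    announce: Callable[[], None],
    withdraw: Callable[[], None],
    count: Optional[int] = None,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Alternate announce/withdraw; count=None repeats until Ctrl+C.

    Returns the number of completed announce/withdraw cycles.
    """
    cycles = itertools.count() if count is None else range(count)
    done = 0
    try:
        for _ in cycles:
            announce()
            sleep(interval)
            withdraw()
            sleep(interval)
            done += 1
    except KeyboardInterrupt:
        withdraw()  # on Ctrl+C, withdraw so no churn routes are left announced
    return done
```

Against the REST API, announce could POST to /scenario/churn and withdraw could POST to /withdraw/all, keeping in mind that /withdraw/all also removes any other active routes.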

7.10 REST API directly (curl)

BASE=http://localhost:5050

# Health
curl $BASE/healthz

# List scenarios
curl $BASE/scenarios

# Load scenario
curl -X POST $BASE/scenario/internet_sample

# Announce custom prefix
curl -X POST $BASE/announce \
  -H 'Content-Type: application/json' \
  -d '{"prefixes":["10.0.0.0/8"],"as_path":[65100,3356,15169],"communities":["65100:100"]}'

# Withdraw all
curl -X POST $BASE/withdraw/all

# Peer state
curl $BASE/peers
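The same calls can be made from Python without installing requests. This is a minimal urllib-based sketch: it reuses the EXABGP_API variable documented for inject.py and the endpoints and JSON field names from the curl examples above, and it assumes the API replies with JSON. The helper names are hypothetical.

```python
import json
import os
import urllib.request

# Same variable and default that inject.py documents (see section 13).
BASE = os.environ.get("EXABGP_API", "http://localhost:5050").rstrip("/")


def build_announce(prefixes, as_path, communities=()):
    """Build an /announce body with the field names used in the curl example."""
    return {
        "prefixes": list(prefixes),
        "as_path": [int(asn) for asn in as_path],
        "communities": list(communities),
    }


def call(method, path, body=None, timeout=10.0):
    """Send one request to the ExaBGP API and decode its JSON reply (None if empty)."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        BASE + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None
```

For example, call("POST", "/announce", build_announce(["10.0.0.0/8"], [65100, 3356, 15169], ["65100:100"])) mirrors the curl announce example, and call("GET", "/peers") fetches peer state.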

7.11 Adding custom scenarios

Edit exabgp/scenarios/__init__.py. Add an entry to SCENARIOS following the existing pattern:

SCENARIOS['my_scenario'] = {
    'description': 'My custom routes',
    'routes': [
        _r('192.0.2.0/24', [65100, 65200], communities=['65100:100']),
    ],
}

Because the scenarios/ directory is volume-mounted into the container, edits do not require an image rebuild. The Python module is only imported when the container starts, though, so restart the container after editing:

docker compose -p obmp restart exabgp
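Before restarting, a quick sanity check on new entries can save a round trip. The sketch below is hypothetical: it assumes you can extract each route's prefix and AS path (the internal shape of _r's return value is not documented here). It flags prefixes with host bits set and AS paths that do not start with 65100. Every example in this guide begins the AS path with 65100, and Cisco routers typically reject eBGP updates whose first AS does not match the neighbor's AS.

```python
import ipaddress
from typing import List

LOCAL_AS = 65100  # EXABGP_LOCAL_AS default


def check_route(prefix: str, as_path: List[int]) -> List[str]:
    """Return human-readable problems with one scenario route (empty if none)."""
    problems = []
    try:
        # strict parsing rejects host bits, e.g. 192.0.2.1/24
        ipaddress.ip_network(prefix)
    except ValueError as exc:
        problems.append(f"bad prefix {prefix!r}: {exc}")
    if not as_path or as_path[0] != LOCAL_AS:
        problems.append(f"{prefix}: AS path should start with {LOCAL_AS}")
    return problems


# Hypothetical (prefix, as_path) pairs extracted from a new scenario entry.
for prefix, path in [("192.0.2.0/24", [65100, 65200]), ("198.51.100.1/24", [65200])]:
    for problem in check_route(prefix, path):
        print(problem)
```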

8. Grafana Dashboards

Access: http://10.40.40.202:3000
Default credentials: admin / openbmp (anonymous access is also enabled)

Dashboard Categories

Category | Dashboard | Description
General | OBMP Home | Overview / landing page
Base | Inventory | Router and peer inventory
Base | Looking Glass | Real-time RIB lookup by prefix
Base | ASN View | ASN-level routing view
History | Prefix History | Route change history for a prefix
History | Prefix History by ASN | Prefix history filtered by origin AS
History | Prefix History by Community | Prefix history filtered by BGP community
Tops | Top Prefixes | Most-updated prefixes
Tops | Top L3VPN Prefixes | Most-updated L3VPN prefixes
Link State | LS Nodes | IS-IS link-state node database
Link State | LS Links | IS-IS link-state link database
Link State | LS Topology | Network topology map
Link State | LS Prefixes | Link-state prefix database
Link State | LS History | Link-state change history
L3VPN | L3VPN Looking Glass | VPN RIB lookup
L3VPN | L3VPN Prefix History | VPN route change history
L3VPN | L3VPN RIB Browser | Full VPN RIB browser

History dashboards require ip_rib_log and stats_chg_* table data. Run inject.py churn to populate these.


9. Sanity Checks

9.1 All containers running

docker compose -p obmp ps

All containers should show running. If any are restarting, check logs:

docker compose -p obmp logs --tail=50 <service>

9.2 ExaBGP peers up

python3 exabgp/inject.py status

Both 10.100.0.100 and 10.100.0.200 should show "state": "up".

Or check from the router side:

show bgp neighbors 10.40.40.202
show bgp summary | inc 10.40.40.202

9.3 Routes accepted by CORE routers

After loading internet_sample:

# On CORE-01 or CORE-02:
show bgp summary
# Expect: 77 accepted prefixes, 77 are bestpaths from 10.40.40.202

show bgp 8.8.8.0/24
# Expect: best path via 10.40.40.202 (eBGP), also iBGP copies from other routers

9.4 Routes in OpenBMP database

docker exec -it obmp-psql psql -U openbmp -c "
  SELECT count(DISTINCT prefix) AS unique_prefixes,
         count(DISTINCT peer_hash_id) AS peers_reporting
  FROM ip_rib
  WHERE isIPv4 = true AND isWithdrawn = false;
"

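After loading internet_sample, expect roughly 129 unique prefixes and 56 peers_reporting (9 routers × ~6 peers each). The query counts every IPv4 prefix in ip_rib, not only the injected routes, so the prefix total also includes the lab's own prefixes.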

9.5 Kafka is healthy

docker exec -it obmp-kafka kafka-topics --bootstrap-server localhost:29092 --list

Should show topics like openbmp.parsed.unicast_prefix, openbmp.parsed.peer, etc.

9.6 Grafana datasource

Open http://10.40.40.202:3000 → Configuration → Data Sources → OpenBMP and click Save & test. It should return "Database Connection OK".

9.7 BMP collector receiving data

docker compose -p obmp logs --tail=30 collector

Should show connections from router management IPs.

9.8 psql-app consumer is caught up

docker compose -p obmp logs --tail=30 psql-app

Should show periodic cron job outputs (RPKI sync, IRR sync, global_ip_rib updates).


10. Relevant Commands Reference

Docker Compose

# Start stack
OBMP_DATA_ROOT=/var/openbmp docker compose -p obmp up -d

# Stop stack
docker compose -p obmp down

# Show status
docker compose -p obmp ps

# Follow logs (all services)
docker compose -p obmp logs -f

# Follow logs (specific service)
docker compose -p obmp logs -f exabgp
docker compose -p obmp logs -f psql-app
docker compose -p obmp logs -f collector

# Rebuild and restart ExaBGP
docker compose -p obmp build exabgp && docker compose -p obmp up -d exabgp

# Restart a service
docker compose -p obmp restart psql-app

Route Injection (from exabgp/ directory)

# API health and peer states
python3 inject.py status

# List active routes
python3 inject.py routes

# List scenarios
python3 inject.py scenarios

# Load a scenario
python3 inject.py scenario internet_sample
python3 inject.py scenario churn
python3 inject.py scenario blackhole
python3 inject.py scenario anycast
python3 inject.py scenario full_table
python3 inject.py scenario lab_prefixes

# Withdraw a scenario
python3 inject.py withdraw-scenario internet_sample

# Withdraw all active routes
python3 inject.py withdraw-all

# Announce a specific prefix
python3 inject.py announce 10.0.0.0/8 --as-path 65100 3356 15169 --community 65100:100

# Withdraw a specific prefix
python3 inject.py withdraw 10.0.0.0/8

# Run churn (populate history tables)
python3 inject.py churn --count 5 --interval 30

Database Queries

# Connect to database
docker exec -it obmp-psql psql -U openbmp -d openbmp

-- Count unique prefixes in RIB
SELECT count(DISTINCT prefix) FROM ip_rib WHERE isIPv4=true AND isWithdrawn=false;

-- Show recent route changes
SELECT prefix, origin_as, iswithdrawn, timestamp
FROM ip_rib_log
ORDER BY timestamp DESC LIMIT 20;

-- Show peer summary
SELECT name, state, timestamp_last_updated
FROM bgp_peers
ORDER BY state, name;

-- Show routes from ExaBGP peer (AS paths are stored in base_attrs, not ip_rib)
SELECT r.prefix, r.origin_as, a.as_path
FROM ip_rib r
JOIN base_attrs a
  ON a.hash_id = r.base_attr_hash_id AND a.peer_hash_id = r.peer_hash_id
WHERE r.peer_hash_id IN (
  SELECT hash_id FROM bgp_peers WHERE peer_addr = '10.40.40.202'
)
AND r.isWithdrawn = false;

IOS-XR Verification (on router CLI)

show bgp neighbors 10.40.40.202
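show bgp neighbors 10.40.40.202 routes
# "received routes" (pre-policy) requires soft-reconfiguration inbound always on the neighbor
show bgp neighbors 10.40.40.202 received routes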
show bgp summary
show bgp 8.8.8.0/24
show bgp 1.1.1.0/24
show route 8.8.8.0/24

11. Troubleshooting

ExaBGP container keeps restarting

Check logs:

docker compose -p obmp logs --tail=50 exabgp

Common causes and fixes:

Symptom | Cause | Fix
Exits after "welcome" banner | Missing or wrong env file path | startup.sh generates /usr/local/etc/exabgp/exabgp.env; verify that this path exists in the container
Process api killed 5 times | Wrong Python path in conf | The conf uses /usr/local/bin/python3, which is correct for python:3.11-slim
drop = true in env | ExaBGP drops privileges to nobody and cannot bind port 179 | startup.sh patches this to drop = false; check that its sed lines ran
__pycache__ Permission denied during build | Root-owned cache from a previous container run | .dockerignore excludes **/__pycache__; confirm the file exists

BGP sessions not establishing

  1. Verify host IP 10.40.40.202 is reachable from CML management network: ping 10.40.40.202 from router
  2. Check ExaBGP peer state: python3 exabgp/inject.py status
  3. On router: show bgp neighbors 10.40.40.202 — look for error codes
  4. Common IOS-XR errors:
    • no-update-source-config — add update-source MgmtEth0/RP0/CPU0/0
    • no-ipv6-address — ensure only IPv4 unicast AF is configured (no IPv6)
    • TCP refused — check port 179 is reachable (ExaBGP uses network_mode: host)

Routes received but not bestpath

IOS-XR BGP requires a specific route to resolve the BGP next-hop (10.40.40.202). The default route (0.0.0.0/0) is insufficient.

router static
 address-family ipv4 unicast
  10.40.40.0/24 10.100.0.254
 !
!

Verify: show bgp 1.1.1.0/24 — should show Status: s (active), bestpath.

Grafana shows no data

  1. Check the datasource: Configuration → Data Sources → OpenBMP → Save & test
  2. Verify psql-app is writing: docker compose -p obmp logs psql-app
  3. Check the database directly (see database queries above)
  4. History dashboards need route churn — run python3 inject.py churn

Kafka not starting

Zookeeper must be healthy first. Check:

docker compose -p obmp logs zookeeper
docker compose -p obmp restart kafka

psql-app fails to start

Usually a PostgreSQL connection issue or schema mismatch. Check:

docker compose -p obmp logs psql-app
# If "relation does not exist" errors: re-trigger DB init
touch /var/openbmp/config/init_db
docker compose -p obmp restart psql-app

12. Data Retention

Configured in docker-compose.yml via POSTGRES_DROP_* environment variables:

Table | Default retention
peer_event_log | 1 year
stat_reports | 4 weeks
ip_rib_log | 4 weeks
alerts | 4 weeks
ls_nodes_log | 4 months
ls_links_log | 4 months
ls_prefixes_log | 4 months
stats_chg_byprefix | 4 weeks
stats_chg_byasn | 4 weeks
stats_chg_bypeer | 4 weeks
stats_ip_origins | 4 weeks
stats_peer_rib | 4 weeks
stats_peer_update_counts | 4 weeks

Adjust in docker-compose.yml under the psql-app service environment block.


13. Environment Variables Reference

ExaBGP container

Variable | Default | Description
EXABGP_LOCAL_IP | 10.40.40.202 | Host IP that ExaBGP binds to and uses as its router ID
EXABGP_LOCAL_AS | 65100 | ExaBGP's AS number
EXABGP_PEER_AS | 65020 | AS of the IOS-XR lab
EXABGP_PEER_1 | 10.100.0.100 | First CORE router to peer with
EXABGP_PEER_2 | 10.100.0.200 | Second CORE router to peer with
EXABGP_API_PORT | 5050 | Flask API port

psql-app container (key variables)

Variable | Default | Description
MEM | 3 | JVM heap size in GB
ENABLE_RPKI | 1 | Enable RPKI sync from Cloudflare
ENABLE_IRR | 1 | Enable IRR sync
ENABLE_DBIP | 1 | Enable DB-IP geolocation import
POSTGRES_REPORT_WINDOW | 8 minute | Aggregation window for summary tables

inject.py (CLI)

Variable | Default | Description
EXABGP_API | http://localhost:5050 | ExaBGP API base URL
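For reference, this is how a script might read the ExaBGP container variables with the defaults listed above. It is an illustrative sketch only: startup.sh's actual handling of these variables is not shown in this document, and load_config is a hypothetical name.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults from the ExaBGP container table above.
DEFAULTS = {
    "EXABGP_LOCAL_IP": "10.40.40.202",
    "EXABGP_LOCAL_AS": "65100",
    "EXABGP_PEER_AS": "65020",
    "EXABGP_PEER_1": "10.100.0.100",
    "EXABGP_PEER_2": "10.100.0.200",
    "EXABGP_API_PORT": "5050",
}


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict:
    """Merge env over DEFAULTS and convert numeric values to int."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "local_ip": raw["EXABGP_LOCAL_IP"],
        "local_as": int(raw["EXABGP_LOCAL_AS"]),
        "peer_as": int(raw["EXABGP_PEER_AS"]),
        "peers": [raw["EXABGP_PEER_1"], raw["EXABGP_PEER_2"]],
        "api_port": int(raw["EXABGP_API_PORT"]),
    }
```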