sam b681c473c0 Add Policy Diff, fleet-wide full-table feed, and Kafka lag monitoring
Policy Diff (roadmap E2 follow-up): obmp-rib-poller pulls per-router
post-policy accepted/advertised prefix counts and route-policy bindings
over CLI+NETCONF (BMP on XRv9000 24.3.1 carries only pre-policy
Adj-RIB-In). New tables in 008_obmp_policy_diff.sql; Policy Diff
dashboard joins them against BMP ip_rib for received-vs-kept-vs-rejected.

GoBGP fleet-wide feed: GoBGP re-advertises the full Bromirski table to
both labs' core routers (CML AS65020, PROX AS65021) over eBGP; as route
reflectors the cores propagate it to every R9K client, so all 18 lab
routers carry and BMP-export a full table -- an intentional stress test
of the ingestion/storage path. cml/gobgp_peering_config.py applies and
rolls back the core-side config; gobgp/README.md documents the rollback.

Kafka lag monitoring: kafka-lag-monitor samples consumer-group lag every
30s into TimescaleDB (009_kafka_lag.sql); Kafka Ingestion Lag dashboard
gives visibility into the pipeline under churn load.

Peer Detail dashboard: the Peer selector is now router-qualified
(router -> peer) so it is unambiguous in an iBGP route-reflector mesh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:42:25 -07:00

GoBGP global Internet table feed (roadmap E1)

This service runs GoBGP to pull the full real Internet routing table (IPv4 ~1M + IPv6 ~200k routes) from Łukasz Bromirski's lab route server (AS57355) and BMP-export every received route to the OpenBMP collector. The table lands in PostgreSQL ip_rib as a monitored peer.

  • Image: jauderho/gobgp:v4.5.0 — community-maintained, multi-arch, tracks upstream GoBGP releases (rebuilt within an hour of each release). Chosen because the official osrg/gobgp image is published less consistently.
  • Local AS: 65001 (private). Router-id: 10.40.40.250.
  • The session is receive-only — we announce nothing to the route server.

Files

File            Purpose
gobgpd.conf     GoBGP daemon config (global, neighbors, BMP export). TOML.
mrt-refresh.sh  MRT full-table fallback loader (cron-driven).
mrt/            Created at runtime; cached RouteViews RIB dumps.

Bring it up

The gobgp service is defined in the repo's docker-compose.yml, on the same default compose network as the collector service, and declares depends_on for it.

docker compose config            # validate compose is well-formed
docker compose up -d gobgp       # start (collector must be running)
docker logs -f obmp-gobgp

The live BGP cutover is performed by a human: bringing the container up is the only step required, because GoBGP initiates the eBGP-multihop sessions automatically.

Confirm the session and route count

# session state — expect both neighbors in "Establ"
docker exec obmp-gobgp gobgp neighbor

# received route counts — expect ~1M IPv4, ~200k IPv6
docker exec obmp-gobgp gobgp global rib summary -a ipv4
docker exec obmp-gobgp gobgp global rib summary -a ipv6
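
The session check above can also be scripted. The following is a minimal sketch, not part of this repo: it assumes the text table printed by gobgp neighbor has a header row starting with "Peer", one row per neighbor, and a State column that reads "Establ" once the session is up. The function names and the sample text (including its counts) are illustrative only.

```python
# Illustrative sample of `gobgp neighbor` output; layout and counts are assumed.
SAMPLE = """\
Peer                    AS  Up/Down State       |#Received  Accepted
85.232.240.179       57355 00:10:12 Establ      |  1000000   1000000
2001:1a68:2c:2::179  57355    never Active      |        0         0
"""


def established_peers(neighbor_output: str) -> dict[str, bool]:
    """Map each peer address to True when its row contains the 'Establ' state."""
    peers = {}
    for line in neighbor_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with 'Peer').
        if len(fields) < 4 or fields[0] == "Peer":
            continue
        # Search the whole row so an Up/Down value containing a space
        # (for example "1d 02:03:04") does not shift the check.
        peers[fields[0]] = "Establ" in fields
    return peers


def all_established(neighbor_output: str) -> bool:
    """True only when at least one peer is listed and every peer is Established."""
    peers = established_peers(neighbor_output)
    return bool(peers) and all(peers.values())
```

To use it against the running container, capture the output of docker exec obmp-gobgp gobgp neighbor (for example with subprocess.run(..., capture_output=True, text=True)) and pass the resulting text to all_established.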

How the data appears in OpenBMP

GoBGP opens an outbound BMP session to obmp-collector:5000 with route-monitoring-policy = "pre-policy" (Adj-RIB-In, pre import-policy — consistent with the rest of the OpenBMP fleet).

In OpenBMP / PostgreSQL the source is identified by the BMP router, which GoBGP reports using its router-id (10.40.40.250) and local-as (65001):

  • routers table — a row with ip_address / name derived from 10.40.40.250.
  • bgp_peers table — two peer rows for 85.232.240.179 and 2001:1a68:2c:2::179, both peer_as = 57355.
  • ip_rib — every prefix from the global table, attributed to those peers.

To find it in Grafana/SQL, filter on peer_as = 57355 or the router-id above.
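
As a rough illustration of that filter, the sketch below builds a psql invocation that counts live prefixes per AS57355 peer. The container name, user, database, peer_as and iswithdrawn come from this README; the join columns (ip_rib.peer_hash_id, bgp_peers.hash_id) and bgp_peers.peer_addr are assumptions about the OpenBMP PostgreSQL schema and should be checked against the deployed tables. The helper name psql_command is hypothetical.

```python
PEER_AS = 57355  # AS of the route server, as stated in this README

# Assumed schema: ip_rib.peer_hash_id references bgp_peers.hash_id,
# and bgp_peers exposes the peer address as peer_addr.
QUERY = (
    "SELECT p.peer_addr, count(*) AS prefixes "
    "FROM ip_rib r JOIN bgp_peers p ON r.peer_hash_id = p.hash_id "
    f"WHERE p.peer_as = {PEER_AS} AND r.iswithdrawn = false "
    "GROUP BY p.peer_addr;"
)


def psql_command(query: str) -> list[str]:
    """Build the argv for running one query inside the obmp-psql container."""
    return [
        "docker", "exec", "obmp-psql",
        "psql", "-U", "openbmp", "-d", "openbmp",
        "-c", query,
    ]
```

Passing psql_command(QUERY) to subprocess.run executes the query; building the argv as a list avoids shell quoting issues around the SQL string.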

Fleet-wide full-table feed into the CML lab (stress test)

GoBGP additionally re-advertises the full table to the two CML core routers (CORE-01/CORE-02, AS65020). As route reflectors the cores propagate it to all seven R9K clients, so every lab router carries and BMP-exports a full table — an intentional stress test of the OpenBMP ingestion/storage path (the database grows toward ~55-65 GB).

  • GoBGP side: gobgpd.conf neighbors 10.100.0.100 / 10.100.0.200 (peer-as 65020, eBGP-multihop, IPv4+IPv6, prefix-limit caps). The route-server sessions carry default-export-policy = "reject-route" so the lab's own routes can never leak back to AS57355.
  • Router side: cml/gobgp_peering_config.py adds the neighbor 10.40.40.202 config (with maximum-prefix 1.5M/400k caps) to both cores. GoBGP is host-networked, so it sources BGP TCP from the host IP 10.40.40.202, not its router-id 10.40.40.250; the cores therefore peer with the host IP.

Apply

python3 cml/gobgp_peering_config.py             # configure both cores
docker compose up -d --force-recreate gobgp     # load gobgpd.conf changes

A change to the volume-mounted config does NOT trigger a recreate on its own; --force-recreate is required for GoBGP to re-read gobgpd.conf.

Rollback

Emergency stop (fastest — feed off within seconds, no router change):

docker compose stop gobgp

Stopping GoBGP drops the eBGP sessions; the cores withdraw the full table and the withdrawal propagates to every client. The ip_rib rows are marked withdrawn and aged out by the existing TimescaleDB retention.

Full revert (also removes the router-side config):

python3 cml/gobgp_peering_config.py --remove    # delete neighbor from cores
docker compose stop gobgp

To keep the Bromirski feed running but drop only the lab injection, delete the two 10.100.0.x [[neighbors]] blocks from gobgpd.conf, then run docker compose up -d --force-recreate gobgp.

What to watch during convergence

docker exec obmp-gobgp gobgp neighbor                        # 4 sessions Establ
docker logs --tail 20 obmp-psql-app                          # consumer lag
docker exec obmp-psql psql -U openbmp -d openbmp -c \
  "SELECT count(*) FROM ip_rib WHERE iswithdrawn = false;"   # row growth

If psql-app consumer lag climbs without draining, or PostgreSQL CPU/IO saturates, use the emergency stop above.
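
One way to make "climbs without draining" concrete is sketched below. It is an assumption-laden illustration, not the repo's logic: the function name, the six-sample window, and the 10,000-message growth threshold are all hypothetical and would need tuning against real lag samples.

```python
from typing import Sequence


def lag_is_climbing(
    samples: Sequence[int],
    window: int = 6,           # hypothetical: number of most recent samples to inspect
    min_growth: int = 10_000,  # hypothetical: total growth that counts as "climbing"
) -> bool:
    """True when the last `window` samples never decrease and grow by `min_growth`."""
    if len(samples) < window:
        return False  # not enough history to judge
    recent = samples[-window:]
    # "Without draining": no sample in the window is lower than the one before it.
    never_drains = all(later >= earlier for earlier, later in zip(recent, recent[1:]))
    return never_drains and (recent[-1] - recent[0]) >= min_growth
```

For example, lag_is_climbing([0, 5000, 12000, 20000, 31000, 45000]) returns True, while a series that peaks and then falls back toward zero returns False. A flat but non-zero series also returns False, since it shows no growth.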

MRT fallback

AS57355 is a single volunteer-run host with no SLA — it can and does go away. mrt-refresh.sh keeps the global table in ip_rib warm when the live feed is down:

  1. If any AS57355 session is Established, the script does nothing — the live feed is authoritative and must not be overwritten with a stale dump.
  2. Otherwise it downloads the latest full RIB dump from RouteViews (https://archive.routeviews.org/route-views/bgpdata/YYYY.MM/RIBS/rib.YYYYMMDD.HHMM.bz2, published every 2 hours UTC) and runs gobgp mrt inject global <file>, which installs every prefix into the running daemon. BMP export to the collector then happens automatically.

The script is idempotent (re-uses an already-downloaded dump), guarded by a flock against overlapping runs, and prunes to the 4 most recent dumps.
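
The dump selection and pruning described above can be sketched as follows. This is not the contents of mrt-refresh.sh. It only assumes the URL pattern quoted in step 2 and that dumps land on even UTC hours, which is one reading of "every 2 hours UTC". The function names are hypothetical.

```python
from datetime import datetime, timezone

BASE = "https://archive.routeviews.org/route-views/bgpdata"


def latest_rib_url(now: datetime) -> str:
    """URL of the most recent 2-hour RIB slot at or before `now` (timezone-aware)."""
    now = now.astimezone(timezone.utc)
    # Assumption: dumps are published on even UTC hours (00:00, 02:00, ...).
    slot = now.replace(hour=now.hour - now.hour % 2, minute=0, second=0, microsecond=0)
    return f"{BASE}/{slot:%Y.%m}/RIBS/rib.{slot:%Y%m%d}.{slot:%H%M}.bz2"


def dumps_to_prune(filenames: list[str], keep: int = 4) -> list[str]:
    """Return the cached dumps to delete so only the `keep` newest remain."""
    # Timestamped names sort chronologically when sorted as plain strings.
    ribs = sorted(f for f in filenames if f.startswith("rib.") and f.endswith(".bz2"))
    return ribs[: max(len(ribs) - keep, 0)]
```

The newest slot may not be published yet at the moment it is computed, so a real script would likely need to fall back to the previous slot when the download fails; that retry is left out of the sketch.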

Schedule it (host crontab, 2-hour cadence)

0 */2 * * * docker exec obmp-gobgp /config/mrt-refresh.sh >> /var/log/gobgp-mrt.log 2>&1

Run it once manually to verify:

docker exec obmp-gobgp /config/mrt-refresh.sh

Caveats

  • No SLA. AS57355 is a volunteer lab route server; treat the live feed as best-effort and rely on the MRT fallback for continuity.
  • eBGP-multihop TTL is set to 64 — the route server is many hops away.
  • A full table is ~1M+ prefixes; expect a noticeable load spike in the collector and PostgreSQL when the session first establishes or an MRT dump is injected.