OpenBMP Production Sizing — 40 Full-Table-Edge Routers
Sizing guidance for deploying the OpenBMP stack against a production ISP
network of 40 full-table-edge routers with gNMI streaming telemetry.
Derived from the OpenBMP psql-app sizing guidance and measured lab behavior.
Workload assumptions
| Parameter | Value |
|---|---|
| Monitored routers | 40, full-table edge |
| BMP RIB scope | Adj-RIB-In (see recommendation below) |
| Full feeds per router | ~2–3 eBGP peers carrying the full DFZ |
| Routes per full feed | ~1.2M (≈1M IPv4 + ~0.2M IPv6) |
| Estimated total NLRIs | ~100–150M in Adj-RIB-In |
| Telemetry | gNMI via Telegraf → InfluxDB, ~50–200 interfaces/router, 10 s interval |
| History retention | ip_rib_log 2 months, LS logs 8 weeks, peer_event_log 4 months (lab policy defaults; tunable) |
The NLRI estimate (40 × ~2.5 feeds × 1.2M ≈ 120M) places this deployment near
the top of the OpenBMP psql-app guidance range (150M NLRIs → 64 GB heap).
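The estimate can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the linear interpolation between the two guidance points quoted in this document (4 GB ≈ 10M NLRIs, 64 GB ≈ 150M NLRIs) is an assumption, not part of the OpenBMP guidance itself.

```python
# Rough NLRI and heap estimate. The linear interpolation between the two
# guidance points is an assumption made for illustration only.


def estimate_nlris(routers: int, feeds_per_router: float, routes_per_feed: float) -> float:
    """Total Adj-RIB-In NLRIs = routers x full feeds per router x routes per feed."""
    return routers * feeds_per_router * routes_per_feed


def estimate_heap_gb(nlris: float) -> float:
    """Interpolate linearly between (10M NLRIs, 4 GB) and (150M NLRIs, 64 GB)."""
    lo_n, lo_gb = 10e6, 4.0
    hi_n, hi_gb = 150e6, 64.0
    slope = (hi_gb - lo_gb) / (hi_n - lo_n)
    return lo_gb + slope * (nlris - lo_n)


if __name__ == "__main__":
    # Sweep the 2-3 full feeds per router assumed in the workload table.
    for feeds in (2.0, 2.5, 3.0):
        n = estimate_nlris(40, feeds, 1.2e6)
        print(f"{feeds} feeds/router: {n / 1e6:.0f}M NLRIs, ~{estimate_heap_gb(n):.0f} GB heap")
```

Under these assumptions the 2 to 3 feed range lands at roughly 41 to 61 GB of heap, which is consistent with the 48–64 GB production target.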
Measured data point (lab, 2026)
Real numbers from the lab after adding one full-table feed (GoBGP → AS57355, ~1.04M IPv4 + ~0.25M IPv6 routes):
| Metric | Before feed | After 1 full feed |
|---|---|---|
| openbmp DB size | ~25 GB | ~30 GB |
| ip_rib (current state) | small | 5.3 GB |
| ip_rib_log (history hypertable) | n/a | 7.75 GB, 82/97 chunks compressed |
| base_attrs | ~1 GB | 2.3 GB |
| geo_ip (fixed reference data) | 8.8 GB | 8.8 GB |
So one full feed ≈ +5 GB current-state, plus history that accrues against
the 2-month ip_rib_log retention. The ~1.3M-route initial dump ingested in
minutes with no Kafka consumer lag. Extrapolating linearly, 40 routers × ~2.5
feeds ≈ 100 feed-equivalents → on the order of 0.5 TB current state before
history and indexes; the 2–4 TB storage target below holds with headroom.
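The extrapolation above can be written out as a short sketch. It assumes storage grows linearly with the number of full feeds, which the single lab data point cannot confirm; the constant and function names are hypothetical.

```python
# Linear extrapolation of current-state storage from one measured feed.
# Linear scaling is an assumption; real growth may differ.

MEASURED_GB_PER_FEED = 5.0  # lab: DB grew from ~25 GB to ~30 GB after one full feed


def current_state_gb(routers: int, feeds_per_router: float) -> float:
    """Current-state size, excluding history (ip_rib_log) and extra indexes."""
    feed_equivalents = routers * feeds_per_router
    return feed_equivalents * MEASURED_GB_PER_FEED


if __name__ == "__main__":
    gb = current_state_gb(40, 2.5)
    print(f"~{gb:.0f} GB (~{gb / 1000:.1f} TB) current state")
```

Re-running the calculation with a measured per-feed figure from production would be a more reliable basis than the lab value.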
BMP RIB scope — recommendation
Deploy with Adj-RIB-In only. It is the OpenBMP default, is what every dashboard is built on, and captures the highest-value data — what each peer advertises. Alternatives and their cost:
- Loc-RIB — adds a full post-best-path converged table per router (~40 × 1.2M ≈ +48M NLRIs). Add later, selectively, only where best-path analysis is needed; verify the IOS-XR release supports Loc-RIB BMP.
- Adj-RIB-Out — multiplies further (per advertised peer). Not recommended for the initial deployment.
- Post-policy Adj-RIB-In — if inbound policy is restrictive this trims volume meaningfully; with permissive import it is similar to pre-policy.
Compute & memory
| Component | Lab today | Production target | Rationale |
|---|---|---|---|
| Total RAM | 31 GB | 96–128 GB | psql-app heap 48–64 GB + PostgreSQL shared_buffers/cache + Kafka 4–8 GB + InfluxDB + Grafana + collector |
| CPU | 8 cores | 16–32 vCPU | PostgreSQL is CPU-bound under full-table churn — lab psql already sustains ~287% (3 cores) at 18 routers |
| psql-app JVM heap (MEM) | 3 GB | 48–64 GB | OpenBMP guidance: 4 GB ≈ 10M NLRIs, 64 GB ≈ 150M NLRIs |
| psql-app container mem_limit | 4 GB | heap + ~8 GB | Set PSQL_APP_MEM_LIMIT above the JVM heap |
| psql container mem_limit | 6 GB | 48–64 GB | Set PSQL_MEM_LIMIT; PostgreSQL wants ~25% as shared_buffers and the rest for OS cache |
| kafka container mem_limit | 4 GB | 8–12 GB | Set KAFKA_MEM_LIMIT; full-table initial dumps from 40 routers are bursty |
Storage
| Store | Lab today | Production target | Notes |
|---|---|---|---|
| PostgreSQL | 30 GB | 2–4 TB NVMe SSD | ip_rib current state (~100–150M rows) + ip_rib_log history (2-month retention, the dominant grower) + base_attrs + geo_ip (~9 GB fixed). OpenBMP guidance: 500 GB main + 1 TB TimescaleDB; add headroom. |
| Kafka | 0.2 GB | 100–500 GB | 12 h retention; sized for full-table initial-dump bursts × 40 routers |
| InfluxDB (telemetry) | minimal | 50–200 GB | 40 routers × ~50–200 interfaces × 10 s gNMI × 30 d; compresses well |
| Total | — | ~3–5 TB fast NVMe | Use NVMe; PostgreSQL random-IO under churn is the bottleneck on slow disks |
Put the PostgreSQL data directory and the TimescaleDB tablespace on NVMe.
ip_rib_log retention (2 months in the lab) is the main storage tuning knob
— revisit once production update volume is measured; halving it roughly
halves the dominant history table.
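For the InfluxDB row, the sample volume follows directly from the workload parameters. The sketch below only counts samples per collected metric; bytes per sample depend on the Telegraf subscription set and are deliberately not guessed. Function names are hypothetical.

```python
# Sample-count arithmetic for the gNMI telemetry row. Counts samples per
# collected metric only; on-disk size per sample is not estimated here.

SECONDS_PER_DAY = 86_400


def samples_per_second(routers: int, interfaces_per_router: int, interval_s: int) -> float:
    """Ingest rate for one metric sampled on every interface."""
    return routers * interfaces_per_router / interval_s


def samples_retained(routers: int, interfaces_per_router: int,
                     interval_s: int, retention_days: int) -> int:
    """Samples held for one metric across the whole retention window."""
    per_interface = retention_days * SECONDS_PER_DAY // interval_s
    return routers * interfaces_per_router * per_interface


if __name__ == "__main__":
    # Low and high ends of the ~50-200 interfaces/router assumption.
    for ifaces in (50, 200):
        rate = samples_per_second(40, ifaces, 10)
        held = samples_retained(40, ifaces, 10, 30)
        print(f"{ifaces} interfaces/router: {rate:.0f} samples/s, {held:,} retained per metric")
```

Multiplying the retained count by the number of metrics per interface and a measured bytes-per-sample figure would turn this into a disk estimate.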
Architecture
A single host is viable only if it is large (≥128 GB RAM, ≥32 vCPU, multi-TB NVMe). The preferred layout splits services across hosts:
| Host | Services | Profile |
|---|---|---|
| DB host (heaviest) | postgres | — |
| Pipeline host | kafka, zookeeper, collector, psql-app | core |
| Presentation host | grafana, influxdb, telegraf, whois | core + telemetry |
Whichever layout is chosen, every service already carries a Compose
mem_limit; raise PSQL_MEM_LIMIT / PSQL_APP_MEM_LIMIT / KAFKA_MEM_LIMIT in
.env for the production hosts.
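A minimal sketch of deriving the three .env values from the sizing targets follows. The variable names come from this document; the `48g`-style value format and the helper name `env_limits` are assumptions, so check how the compose file actually consumes these variables before using the output.

```python
# Derive .env memory limits from the sizing targets in this document.
# Assumption: the compose file accepts values such as "56g" for mem_limit.


def env_limits(heap_gb: int, psql_gb: int, kafka_gb: int, overhead_gb: int = 8) -> dict[str, str]:
    """psql-app container limit = JVM heap + ~8 GB, per the compute table."""
    return {
        "PSQL_APP_MEM_LIMIT": f"{heap_gb + overhead_gb}g",
        "PSQL_MEM_LIMIT": f"{psql_gb}g",
        "KAFKA_MEM_LIMIT": f"{kafka_gb}g",
    }


if __name__ == "__main__":
    # Low end of the production targets: 48 GB heap, 48 GB psql, 8 GB kafka.
    for key, value in env_limits(heap_gb=48, psql_gb=48, kafka_gb=8).items():
        print(f"{key}={value}")
```

Keeping the container limit above the JVM heap leaves room for off-heap memory, which is the reason the table asks for heap + ~8 GB.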
Horizontal scaling — where it actually helps
The ingestion bottleneck is not the collector or Kafka — it is the
psql-app consumer writing to PostgreSQL, and ultimately disk IOPS.
Plan scaling accordingly:
- Scale psql-app as a Kafka consumer group. Run multiple psql-app containers with the same group ID; Kafka rebalances partitions across them and fails over automatically. This is the real throughput lever and also provides HA. Hard cap = Kafka partition count: the compose sets KAFKA_NUM_PARTITIONS: 8, so ≤ 8 useful instances. Raise the partition count before scaling past a few consumers; it cannot easily be reduced later.
- Disk IOPS is the named bottleneck. Target ≥ 5000 IOPS (NVMe) for the PostgreSQL store; this buys more headroom than any container count.
- Multiple collectors are an HA / locality decision, not a throughput one. A BMP session is one stateful TCP connection and cannot be load balanced; you distribute routers by pointing each router's bmp server config at a specific collector. All collectors feed one Kafka. Shard collectors for fault isolation / POP locality, not for performance, and note a dead collector's routers go dark until reconfigured (no auto-failover at the collector tier).
- Within one psql-app, writer threads already auto-scale per type (writer_max_threads_per_type); the consumer group is the across-instance layer on top.
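The partition-count cap on consumer scaling can be illustrated with a small simulation. This is a simplified round-robin model of a single topic, not Kafka's actual assignor, and the function names are hypothetical; it only shows why instances beyond the partition count sit idle.

```python
# Simplified model of consumer-group partition assignment for one topic.
# Not Kafka's real assignor; it only illustrates the partition-count cap.


def assign_partitions(num_partitions: int, num_consumers: int) -> list[list[int]]:
    """Spread partitions round-robin across the consumers in one group."""
    if num_consumers < 1:
        raise ValueError("need at least one consumer")
    assignment: list[list[int]] = [[] for _ in range(num_consumers)]
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


def active_consumers(num_partitions: int, num_consumers: int) -> int:
    """Consumers that own at least one partition and therefore do work."""
    return sum(1 for parts in assign_partitions(num_partitions, num_consumers) if parts)


if __name__ == "__main__":
    partitions = 8  # KAFKA_NUM_PARTITIONS in the compose file
    for consumers in (1, 3, 8, 12):
        busy = active_consumers(partitions, consumers)
        print(f"{consumers} psql-app instances -> {busy} active, {consumers - busy} idle")
```

With 8 partitions, a twelfth instance adds no throughput: four of the twelve own no partition and only serve as standby capacity.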
Bursts (every collector restart triggers simultaneous full-table dumps from all peers) are absorbed by Kafka — size Kafka retention so a slow consumer never loses data during a convergence storm.
PostgreSQL tuning
- shared_buffers ≈ 25% of host RAM; large effective_cache_size.
- Raise work_mem (dashboard aggregate queries) and maintenance_work_mem.
- max_wal_size already 10 GB; keep or raise for churn bursts.
- Enable parallel query (max_parallel_workers_per_gather).
- Aggressive autovacuum on churn tables (ip_rib, base_attrs, ip_rib_log), already applied in the lab; persist these settings in production provisioning.
- TimescaleDB compression is already enabled on ip_rib_log and the stats_* hypertables; keep it.
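As a starting point for the two memory settings, the sketch below derives values from the RAM available to PostgreSQL. The 25% figure for shared_buffers comes from this document; the 75% figure for effective_cache_size is an assumption standing in for "large", and the helper name is hypothetical. The other parameters in the list need measured workload data and are not computed here.

```python
# Starting values for PostgreSQL memory settings from available RAM.
# shared_buffers at 25% follows this document; effective_cache_size at 75%
# is an assumed placeholder for "large" and should be tuned.


def pg_memory_settings(ram_gb: int, cache_fraction: float = 0.75) -> dict[str, str]:
    """Return postgresql.conf-style values for the RAM given to PostgreSQL."""
    shared_gb = ram_gb // 4
    cache_gb = int(ram_gb * cache_fraction)
    return {
        "shared_buffers": f"{shared_gb}GB",
        "effective_cache_size": f"{cache_gb}GB",
    }


if __name__ == "__main__":
    for name, value in pg_memory_settings(64).items():
        print(f"{name} = {value}")
```

For a 64 GB allocation this yields shared_buffers = 16GB and effective_cache_size = 48GB.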
Reference bill of materials (single-host option)
| Resource | Spec |
|---|---|
| CPU | 32 vCPU |
| RAM | 128 GB |
| Storage | 4 TB NVMe SSD |
| Network | 1 GbE+ to the routers' BMP source network |
For the split-host option, divide per the architecture table — the DB host takes the bulk of RAM and all of the fast storage.