sam 2d83d6c02e Add evpn_rib schema; update production sizing with measured data

- postgres/scripts/007_obmp_evpn.sql: the evpn_rib landing table
  (roadmap E5 step 1), applied to the live DB. Mirrors l3vpn_rib;
  a dedicated consumer will populate it.
- production-sizing.md: corrected retention figures to the actual
  policy values, added a measured-data section (one full feed ≈
  +5 GB current state; DB now ~30 GB), and a horizontal-scaling
  section — the bottleneck is the psql-app consumer + disk IOPS, so
  scale psql-app as a Kafka consumer group (cap = partition count),
  treat multi-collector as HA/locality not throughput.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-19 08:44:09 -07:00


OpenBMP Production Sizing — 40 Full-Table-Edge Routers

Sizing guidance for deploying the OpenBMP stack against a production ISP network of 40 full-table-edge routers with gNMI streaming telemetry. Derived from the OpenBMP psql-app sizing guidance and measured lab behavior.

Workload assumptions

Parameter	Value
Monitored routers	40, full-table edge
BMP RIB scope	Adj-RIB-In (see recommendation below)
Full feeds per router	~2–3 eBGP peers carrying the full DFZ
Routes per full feed	~1.2M (≈1M IPv4 + ~0.2M IPv6)
Estimated total NLRIs	~100–150M in Adj-RIB-In
Telemetry	gNMI via Telegraf → InfluxDB, ~50–200 interfaces/router, 10 s interval
History retention	`ip_rib_log` 2 months, LS logs 8 weeks, `peer_event_log` 4 months (lab policy defaults; tunable)

The midpoint NLRI estimate (40 routers × ~2.5 feeds × 1.2M routes ≈ 120M) puts this deployment in the top tier of the OpenBMP psql-app guidance (150M NLRIs → 64 GB heap).
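
The estimate can be reproduced with a few lines of arithmetic. The sketch below uses the workload figures from the table above and the two guidance points quoted in this document (4 GB ≈ 10M NLRIs, 64 GB ≈ 150M NLRIs). Interpolating linearly between those two points is an assumption made here for illustration, not part of the OpenBMP guidance, and the function names are hypothetical.

```python
# Workload figures from the assumptions table.
ROUTERS = 40
ROUTES_PER_FEED = 1_200_000  # ~1M IPv4 + ~0.2M IPv6


def total_nlris(feeds_per_router: float) -> int:
    """Estimated Adj-RIB-In NLRIs across all monitored routers."""
    return round(ROUTERS * feeds_per_router * ROUTES_PER_FEED)


def heap_gb(nlris: float) -> float:
    """Heap estimate between the two quoted guidance points.

    Assumption: linear interpolation, clamped to the 10M..150M range.
    """
    lo_n, lo_heap = 10e6, 4.0
    hi_n, hi_heap = 150e6, 64.0
    n = min(max(nlris, lo_n), hi_n)
    return lo_heap + (n - lo_n) * (hi_heap - lo_heap) / (hi_n - lo_n)


if __name__ == "__main__":
    for feeds in (2.0, 2.5, 3.0):
        n = total_nlris(feeds)
        print(f"{feeds} feeds/router: {n / 1e6:.0f}M NLRIs, ~{heap_gb(n):.0f} GB heap")
```

Under these assumptions the 2 to 3 feeds-per-router range maps to roughly 96M to 144M NLRIs, which is why the heap target is stated as a range rather than a single value.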

Measured data point (lab, 2026)

Real numbers from the lab after adding one full-table feed (GoBGP → AS57355, ~1.04M IPv4 + ~0.25M IPv6 routes):

Metric	Before feed	After 1 full feed
`openbmp` DB size	~25 GB	~30 GB
`ip_rib` (current state)	small	5.3 GB
`ip_rib_log` (history hypertable)	—	7.75 GB, 82/97 chunks compressed
`base_attrs`	~1 GB	2.3 GB
`geo_ip` (fixed reference data)	8.8 GB	8.8 GB

So one full feed adds roughly 5 GB of current state, plus history that accrues against the 2-month ip_rib_log retention. The ~1.3M-route initial dump was ingested in minutes with no Kafka consumer lag. Extrapolating linearly, 40 routers × ~2.5 feeds ≈ 100 feed-equivalents, and 100 × ~5 GB is on the order of 0.5 TB of current state before history and indexes; the 2–4 TB storage target below holds with headroom.
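
The linear extrapolation can be written out as a small calculation. This is a sketch only: it assumes current-state size scales linearly with feed count, which the single lab data point cannot confirm, and the helper names are hypothetical.

```python
GB_PER_FEED = 5.0  # measured in the lab: one full feed added ~5 GB of current state


def current_state_gb(routers: int, feeds_per_router: float,
                     gb_per_feed: float = GB_PER_FEED) -> float:
    """Current-state estimate, assuming linear growth per full feed."""
    return routers * feeds_per_router * gb_per_feed


def share_of_target(used_gb: float, target_tb: float) -> float:
    """Fraction of a storage target (in TB, decimal) taken by used_gb."""
    return used_gb / (target_tb * 1000)


if __name__ == "__main__":
    est = current_state_gb(40, 2.5)
    print(f"current state: ~{est:.0f} GB")
    for target in (2, 4):
        print(f"  {share_of_target(est, target):.1%} of a {target} TB target")
```

The remainder of the target is left for ip_rib_log history, indexes, and growth, none of which this sketch tries to model.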

BMP RIB scope — recommendation

Deploy with Adj-RIB-In only. It is the OpenBMP default, is what every dashboard is built on, and captures the highest-value data — what each peer advertises. Alternatives and their cost:

Loc-RIB: adds a full post-best-path table per router (40 × 1.2M ≈ +48M NLRIs). Add it later and selectively, only where best-path analysis is needed, and verify that the IOS-XR release supports Loc-RIB BMP.
Adj-RIB-Out: multiplies volume further, with one advertised table per peer the router sends routes to. Not recommended for the initial deployment.
Post-policy Adj-RIB-In: trims volume meaningfully if inbound policy is restrictive; with permissive import policy the volume is similar to pre-policy.
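
To compare the cost of the alternatives above, the extra NLRI load can be estimated per scope. The Loc-RIB figure follows the arithmetic in the list; the Adj-RIB-Out function is an upper bound that assumes every counted peer is sent a full table, and its peer count is a hypothetical input rather than a measured value.

```python
ROUTERS = 40
ROUTES_PER_TABLE = 1_200_000  # ~1M IPv4 + ~0.2M IPv6


def loc_rib_extra(routers_enabled: int = ROUTERS) -> int:
    """Extra NLRIs from one Loc-RIB table per enabled router."""
    return routers_enabled * ROUTES_PER_TABLE


def adj_rib_out_extra(routers_enabled: int, full_table_peers: int) -> int:
    """Upper bound: each counted peer is assumed to receive a full table."""
    return routers_enabled * full_table_peers * ROUTES_PER_TABLE


if __name__ == "__main__":
    print(f"Loc-RIB on all routers: +{loc_rib_extra() / 1e6:.0f}M NLRIs")
    print(f"Loc-RIB on 5 routers:   +{loc_rib_extra(5) / 1e6:.0f}M NLRIs")
    print(f"Adj-RIB-Out, 4 peers:   +{adj_rib_out_extra(40, 4) / 1e6:.0f}M NLRIs")
```

Enabling Loc-RIB on a handful of routers costs a small fraction of the fleet-wide figure, which is the case for adding it selectively.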

Compute & memory

Component	Lab today	Production target	Rationale
Total RAM	31 GB	96–128 GB	psql-app heap 48–64 GB + PostgreSQL shared_buffers/cache + Kafka 4–8 GB + InfluxDB + Grafana + collector
CPU	8 cores	16–32 vCPU	PostgreSQL is CPU-bound under full-table churn — lab psql already sustains ~287% (3 cores) at 18 routers
`psql-app` JVM heap (`MEM`)	3 GB	48–64 GB	OpenBMP guidance: 4 GB ≈ 10M NLRIs, 64 GB ≈ 150M NLRIs
`psql-app` container `mem_limit`	4 GB	heap + ~8 GB	Set `PSQL_APP_MEM_LIMIT` above the JVM heap
`psql` container `mem_limit`	6 GB	48–64 GB	Set `PSQL_MEM_LIMIT`; PostgreSQL wants ~25% as `shared_buffers` and the rest for OS cache
`kafka` container `mem_limit`	4 GB	8–12 GB	Set `KAFKA_MEM_LIMIT`; full-table initial dumps from 40 routers are bursty

Storage

Store	Lab today	Production target	Notes
PostgreSQL	30 GB	2–4 TB NVMe SSD	`ip_rib` current state (~100–150M rows) + `ip_rib_log` history (2-month retention, the dominant grower) + `base_attrs` + `geo_ip` (~9 GB fixed). OpenBMP guidance: 500 GB main + 1 TB TimescaleDB; add headroom.
Kafka	0.2 GB	100–500 GB	12 h retention; sized for full-table initial-dump bursts × 40 routers
InfluxDB (telemetry)	minimal	50–200 GB	40 routers × ~50–200 interfaces × 10 s gNMI × 30 d; compresses well
Total	—	~3–5 TB fast NVMe	Use NVMe; PostgreSQL random-IO under churn is the bottleneck on slow disks

Put the PostgreSQL data directory and the TimescaleDB tablespace on NVMe. ip_rib_log retention (2 months in the lab) is the main storage tuning knob — revisit once production update volume is measured; halving it roughly halves the dominant history table.
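
For the InfluxDB row in the storage table, the retained sample count gives a feel for the telemetry volume. The sketch below counts one sample per interface per interval; a real gNMI subscription emits several fields per interface, so treat the result as a per-field figure. It deliberately makes no bytes-per-sample claim, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def retained_samples(routers: int, interfaces_per_router: int,
                     interval_s: int = 10, days: int = 30) -> int:
    """Samples retained per field, at one sample per interface per interval."""
    per_day = SECONDS_PER_DAY // interval_s
    return routers * interfaces_per_router * per_day * days


if __name__ == "__main__":
    low = retained_samples(40, 50)
    high = retained_samples(40, 200)
    print(f"{low:,} to {high:,} samples per field over 30 days")
```

Multiply by the number of fields actually subscribed to get the total point count for a given Telegraf configuration.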

Architecture

A single host is viable only if it is large (≥128 GB RAM, ≥32 vCPU, multi-TB NVMe). The preferred layout splits services across hosts:

Host	Services	Profile
DB host (heaviest)	postgres	—
Pipeline host	kafka, zookeeper, collector, psql-app	core
Presentation host	grafana, influxdb, telegraf, whois	core + telemetry

Whichever layout is chosen, every service already carries a Compose mem_limit; raise PSQL_MEM_LIMIT, PSQL_APP_MEM_LIMIT, and KAFKA_MEM_LIMIT in .env for the production hosts.
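
As an illustration, the three limits can be derived from the targets in the compute table and rendered as .env lines. The variable names come from this document; the value format (a number with a `g` suffix, as Compose accepts for mem_limit) assumes the .env values are passed through to mem_limit unchanged, which should be checked against the compose file. The function name and the example figures are hypothetical.

```python
def env_lines(psql_app_heap_gb: int, psql_gb: int, kafka_gb: int,
              heap_overhead_gb: int = 8) -> list[str]:
    """Render .env lines; the psql-app limit is JVM heap plus overhead."""
    limits = {
        "PSQL_APP_MEM_LIMIT": psql_app_heap_gb + heap_overhead_gb,
        "PSQL_MEM_LIMIT": psql_gb,
        "KAFKA_MEM_LIMIT": kafka_gb,
    }
    # Assumed format: "<n>g", the byte-value style Compose accepts for mem_limit.
    return [f"{name}={value}g" for name, value in limits.items()]


if __name__ == "__main__":
    # Example values picked from inside the production target ranges.
    print("\n".join(env_lines(psql_app_heap_gb=56, psql_gb=56, kafka_gb=10)))
```

Keeping the psql-app limit derived from the heap avoids setting a container limit at or below the JVM heap.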

Horizontal scaling — where it actually helps

The ingestion bottleneck is not the collector or Kafka — it is the psql-app consumer writing to PostgreSQL, and ultimately disk IOPS. Plan scaling accordingly:

Scale psql-app as a Kafka consumer group. Run multiple psql-app containers with the same group ID; Kafka rebalances partitions across them and fails over automatically. This is the real throughput lever and also provides HA. Hard cap = Kafka partition count — the compose sets KAFKA_NUM_PARTITIONS: 8, so ≤ 8 useful instances. Raise the partition count before scaling past a few consumers — it cannot easily be reduced later.
Disk IOPS is the named bottleneck. Target ≥ 5000 IOPS (NVMe) for the PostgreSQL store; this buys more headroom than any container count.
Multiple collectors are an HA / locality decision, not a throughput one. A BMP session is one stateful TCP connection and cannot be load balanced; you distribute routers by pointing each router's bmp server config at a specific collector. All collectors feed one Kafka. Shard collectors for fault isolation / POP locality, not for performance, and note that a dead collector's routers go dark until reconfigured (no auto-failover at the collector tier).
Within one psql-app, writer threads already auto-scale per type (writer_max_threads_per_type); the consumer group is the cross-instance layer on top.
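
The partition-count cap on the consumer group can be shown with a toy model. The sketch below spreads the partitions of a single topic round-robin over the group members; Kafka's real assignors differ in detail, and psql-app consumes more than one topic, so this is a simplification. The cap itself follows from each partition being owned by at most one consumer in a group. All names are hypothetical.

```python
def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin partitions over consumers (simplified single-topic model)."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


def active_consumers(assignment: dict[str, list[int]]) -> int:
    """Members that own at least one partition; the rest sit idle."""
    return sum(1 for partitions in assignment.values() if partitions)


if __name__ == "__main__":
    for count in (3, 8, 10):
        members = [f"psql-app-{i}" for i in range(count)]
        plan = assign_partitions(8, members)
        print(f"{count} instances: {active_consumers(plan)} active")
```

With 8 partitions, a ninth and tenth instance receive nothing and only serve as standby capacity.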

Bursts (every collector restart triggers simultaneous full-table dumps from all peers) are absorbed by Kafka — size Kafka retention so a slow consumer never loses data during a convergence storm.

PostgreSQL tuning

shared_buffers ≈ 25% of host RAM; large effective_cache_size.
Raise work_mem (dashboard aggregate queries) and maintenance_work_mem.
max_wal_size is already 10 GB; keep or raise it for churn bursts.
Enable parallel query (max_parallel_workers_per_gather).
Aggressive autovacuum on churn tables (ip_rib, base_attrs, ip_rib_log) — applied in the lab; persist these settings in production provisioning.
TimescaleDB compression is already enabled on ip_rib_log and the stats_* hypertables — keep it.
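
A minimal sketch of turning the memory rules above into postgresql.conf lines. The 25% shared_buffers ratio and the 10 GB max_wal_size come from this document; the 75% figure for effective_cache_size is a common rule of thumb assumed here, not a value the document states. work_mem, maintenance_work_mem, and the parallel-query setting are left out because no target values are given. The function name is hypothetical.

```python
def pg_conf_lines(ram_gb: int) -> list[str]:
    """Render memory-related settings for the RAM available to PostgreSQL."""
    settings = {
        "shared_buffers": f"{ram_gb // 4}GB",            # ~25% of RAM (from this document)
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",  # ~75% of RAM (assumed rule of thumb)
        "max_wal_size": "10GB",                          # current lab value
    }
    return [f"{name} = {value}" for name, value in settings.items()]


if __name__ == "__main__":
    # 64 GB: the upper end of the psql container target.
    print("\n".join(pg_conf_lines(64)))
```

On a split DB host, pass the RAM actually available to PostgreSQL rather than the container limit of a shared host.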

Reference bill of materials (single-host option)

Resource	Spec
CPU	32 vCPU
RAM	128 GB
Storage	4 TB NVMe SSD
Network	1 GbE+ to the routers' BMP source network

For the split-host option, divide per the architecture table — the DB host takes the bulk of RAM and all of the fast storage.
