sam f1558946ae Add production sizing guide for 40 full-table-edge routers
Documents compute, memory, and storage requirements for a production
deployment: ~100-150M NLRI estimate, 96-128 GB RAM, 16-32 vCPU, 3-5 TB NVMe,
a split-host architecture option, PostgreSQL tuning, and a BMP RIB-scope
recommendation (Adj-RIB-In only initially).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:06:25 -07:00

# OpenBMP Production Sizing — 40 Full-Table-Edge Routers
Sizing guidance for deploying the OpenBMP stack against a production ISP
network of **40 full-table-edge routers** with gNMI streaming telemetry.
Derived from the OpenBMP `psql-app` sizing guidance and measured lab behavior.
## Workload assumptions
| Parameter | Value |
|-----------|-------|
| Monitored routers | 40, full-table edge |
| BMP RIB scope | Adj-RIB-In (see recommendation below) |
| Full feeds per router | ~2–3 eBGP peers carrying the full DFZ |
| Routes per full feed | ~1.2M (≈1M IPv4 + ~0.2M IPv6) |
| **Estimated total NLRIs** | **~100–150M** in Adj-RIB-In |
| Telemetry | gNMI via Telegraf → InfluxDB, ~50–200 interfaces/router, 10 s interval |
| History retention | `ip_rib_log` 4 weeks, LS logs 4 months, `peer_event_log` 1 year |

The NLRI estimate (40 routers × ~2.5 full feeds × ~1.2M routes ≈ 120M) places this deployment at the top
of the OpenBMP `psql-app` guidance tier (150M NLRIs → 64 GB heap).
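
The estimate above is plain arithmetic, and a minimal Python sketch makes the inputs explicit. It assumes that heap scales linearly between the two guidance points quoted in this guide (4 GB at 10M NLRIs, 64 GB at 150M). That interpolation is an assumption of this sketch, not an OpenBMP formula. Under it, 2, 2.5, and 3 feeds per router give 96M, 120M, and 144M NLRIs and roughly 41, 51, and 61 GB of heap. All three fall at or below the 64 GB upper end of the heap target.

```python
"""Back-of-envelope NLRI and psql-app heap estimate.

Assumption: heap scales linearly between the two OpenBMP guidance points
quoted in this guide (4 GB ~ 10M NLRIs, 64 GB ~ 150M NLRIs). Real heap use
also depends on update churn, so treat the result as a starting point only.
"""

ROUTES_PER_FULL_FEED = 1_200_000  # ~1M IPv4 + ~0.2M IPv6

# (NLRIs, heap GB) guidance points quoted in this guide.
GUIDANCE = [(10_000_000, 4.0), (150_000_000, 64.0)]


def estimate_nlris(routers: int, feeds_per_router: float,
                   routes_per_feed: int = ROUTES_PER_FULL_FEED) -> int:
    """Total Adj-RIB-In NLRIs: routers x full feeds x routes per feed."""
    return round(routers * feeds_per_router * routes_per_feed)


def heap_gb_for(nlris: int) -> float:
    """Linear interpolation between the guidance points.

    Clamped to the guidance range: below 10M it returns 4 GB; above 150M the
    guidance gives no figure, so it returns 64 GB (likely an underestimate).
    """
    (n_lo, h_lo), (n_hi, h_hi) = GUIDANCE
    if nlris <= n_lo:
        return h_lo
    if nlris >= n_hi:
        return h_hi
    return h_lo + (nlris - n_lo) * (h_hi - h_lo) / (n_hi - n_lo)


if __name__ == "__main__":
    for feeds in (2, 2.5, 3):
        n = estimate_nlris(40, feeds)
        print(f"{feeds} feeds/router -> {n / 1e6:.0f}M NLRIs, "
              f"~{heap_gb_for(n):.0f} GB heap")
```

Measured feed counts per router are a better input than the 2.5 average once the peering inventory is known.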
## BMP RIB scope — recommendation
**Deploy with Adj-RIB-In only.** It is the OpenBMP default, is what every
dashboard is built on, and captures the highest-value data — what each peer
advertises. Alternatives and their cost:
- **Loc-RIB** — adds a full post-best-path converged table per router
(~40 × 1.2M ≈ +48M NLRIs). Add later, selectively, only where best-path
analysis is needed; verify the IOS-XR release supports Loc-RIB BMP.
- **Adj-RIB-Out** — multiplies further (per advertised peer). Not recommended
for the initial deployment.
- **Post-policy Adj-RIB-In** — if inbound policy is restrictive this trims
volume meaningfully; with permissive import it is similar to pre-policy.
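
The sketch below compares the scopes listed above by counting NLRIs per scope under this guide's assumptions: 40 routers, ~2.5 full feeds per router, and ~1.2M routes per table. Adding Loc-RIB takes the total from 120M to 168M, which is past the 150M top of the `psql-app` guidance tier. Adj-RIB-Out and post-policy trimming are left out because they depend on per-router peering and policy that this guide does not quantify.

```python
# Rough NLRI totals per BMP RIB scope, using the multipliers from the list above.
# Illustrative assumptions, not measurements: 40 routers, ~2.5 full feeds per
# router, ~1.2M routes per table. Adj-RIB-Out is not modelled because it scales
# with the number of peers each router advertises to.

ROUTERS = 40
FEEDS_PER_ROUTER = 2.5
ROUTES_PER_TABLE = 1_200_000


def nlris_for_scopes(adj_rib_in: bool = True, loc_rib: bool = False) -> int:
    total = 0
    if adj_rib_in:
        # One copy of every full feed each router receives.
        total += round(ROUTERS * FEEDS_PER_ROUTER * ROUTES_PER_TABLE)
    if loc_rib:
        # One converged best-path table per router.
        total += ROUTERS * ROUTES_PER_TABLE
    return total


if __name__ == "__main__":
    print(f"Adj-RIB-In only:      {nlris_for_scopes() / 1e6:.0f}M")
    print(f"Adj-RIB-In + Loc-RIB: {nlris_for_scopes(loc_rib=True) / 1e6:.0f}M")
```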
## Compute & memory
| Component | Lab today | Production target | Rationale |
|-----------|-----------|-------------------|-----------|
| **Total RAM** | 31 GB | **96–128 GB** | psql-app heap 48–64 GB + PostgreSQL shared_buffers/cache + Kafka 4–8 GB + InfluxDB + Grafana + collector |
| **CPU** | 8 cores | **16–32 vCPU** | PostgreSQL is CPU-bound under full-table churn; in the lab, psql already sustains ~287% CPU (≈3 cores) with 18 routers |
| `psql-app` JVM heap (`MEM`) | 3 GB | **48–64 GB** | OpenBMP guidance: 4 GB ≈ 10M NLRIs, 64 GB ≈ 150M NLRIs |
| `psql-app` container `mem_limit` | 4 GB | **heap + ~8 GB** | Set `PSQL_APP_MEM_LIMIT` above the JVM heap to leave room for off-heap JVM memory |
| `psql` container `mem_limit` | 6 GB | **48–64 GB** | Set `PSQL_MEM_LIMIT`; PostgreSQL wants ~25% of this as `shared_buffers`, leaving the rest for OS cache |
| `kafka` container `mem_limit` | 4 GB | **8–12 GB** | Set `KAFKA_MEM_LIMIT`; full-table initial dumps from 40 routers are bursty |
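
The `.env` overrides can be derived from a chosen heap size. The following is a hypothetical helper. The variable names come from the table above. Two details are assumptions: that `MEM` is an integer number of GB (as the lab value 3 suggests), and that limits are written in Compose's `72g` byte-unit form. Match whatever format the repository's `.env` and `docker-compose.yml` already use.

```python
# Hypothetical helper: render .env overrides from a chosen psql-app heap.
# Variable names come from the sizing table; the "<n>g" value format and
# MEM being whole GB are assumptions to check against the repo's .env.

HEAP_HEADROOM_GB = 8  # "heap + ~8 GB" for the psql-app container


def env_overrides(heap_gb: int, psql_gb: int, kafka_gb: int) -> dict[str, str]:
    if heap_gb <= 0 or psql_gb <= 0 or kafka_gb <= 0:
        raise ValueError("memory sizes must be positive")
    return {
        "MEM": str(heap_gb),                                  # JVM heap, GB
        "PSQL_APP_MEM_LIMIT": f"{heap_gb + HEAP_HEADROOM_GB}g",
        "PSQL_MEM_LIMIT": f"{psql_gb}g",
        "KAFKA_MEM_LIMIT": f"{kafka_gb}g",
    }


def render_env(overrides: dict[str, str]) -> str:
    """One KEY=value line per override, in insertion order."""
    return "\n".join(f"{key}={value}" for key, value in overrides.items()) + "\n"


if __name__ == "__main__":
    print(render_env(env_overrides(heap_gb=64, psql_gb=64, kafka_gb=12)), end="")
```

With the upper ends of the table, this prints `MEM=64`, `PSQL_APP_MEM_LIMIT=72g`, `PSQL_MEM_LIMIT=64g`, and `KAFKA_MEM_LIMIT=12g`.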
## Storage
| Store | Lab today | Production target | Notes |
|-------|-----------|-------------------|-------|
| **PostgreSQL** | 25 GB | **2–4 TB NVMe SSD** | `ip_rib` current state (~100–150M rows) + `ip_rib_log` history (4-week retention, the dominant grower) + `base_attrs` + `geo_ip` (~7 GB fixed). OpenBMP guidance: 500 GB main + 1 TB TimescaleDB; add headroom. |
| **Kafka** | 0.2 GB | **100–500 GB** | 12 h retention; sized for full-table initial-dump bursts × 40 routers |
| **InfluxDB (telemetry)** | minimal | **50–200 GB** | 40 routers × ~50–200 interfaces × 10 s gNMI interval × 30 d retention; compresses well |
| **Total** | — | **~3–5 TB fast NVMe** | Use NVMe; on slower disks, PostgreSQL random I/O under churn becomes the bottleneck |

Put the PostgreSQL data directory and the TimescaleDB tablespace on NVMe.
The 4-week `ip_rib_log` retention is the main storage tuning knob; revisit it once
production update volume has been measured.
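
The storage total and the InfluxDB volume can be checked with a short sketch. It sums the per-store ranges from the table. The result is 2.15–4.7 TB before headroom, which the ~3–5 TB total rounds up. It also counts raw gNMI samples for 30 days at a 10 s interval, giving about 0.52B samples per field at 50 interfaces/router and about 2.07B at 200. It reports samples per field and leaves the byte cost per sample to be measured in the lab rather than assumed.

```python
# Sum the per-store ranges from the storage table (GB) and count raw gNMI
# samples retained in InfluxDB. Bytes per stored sample is not assumed here.

STORES_GB = {
    "postgresql": (2_000, 4_000),
    "kafka": (100, 500),
    "influxdb": (50, 200),
}


def total_range_gb(stores: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """(low, high) sum across stores, before any headroom."""
    low = sum(lo for lo, _ in stores.values())
    high = sum(hi for _, hi in stores.values())
    return low, high


def gnmi_samples(routers: int, interfaces: int, interval_s: int, days: int,
                 fields_per_interface: int = 1) -> int:
    """Samples retained: routers x interfaces x fields x (seconds / interval)."""
    seconds = days * 86_400
    return routers * interfaces * fields_per_interface * (seconds // interval_s)


if __name__ == "__main__":
    low, high = total_range_gb(STORES_GB)
    print(f"raw total: {low / 1000:.2f}-{high / 1000:.2f} TB before headroom")
    for ifaces in (50, 200):
        n = gnmi_samples(40, ifaces, interval_s=10, days=30)
        print(f"{ifaces} interfaces/router: {n / 1e9:.2f}B samples per field")
```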
## Architecture
A single host is viable only if it is large (**≥128 GB RAM, ≥32 vCPU, multi-TB
NVMe**). **Preferred: split services across hosts.**

| Host | Services | Profile |
|------|----------|---------|
| **DB host** (heaviest) | postgres | — |
| **Pipeline host** | kafka, zookeeper, collector, psql-app | core |
| **Presentation host** | grafana, influxdb, telegraf, whois | core + telemetry |

Whichever layout you choose, every service already carries a Compose `mem_limit`;
raise `PSQL_MEM_LIMIT`, `PSQL_APP_MEM_LIMIT`, and `KAFKA_MEM_LIMIT` in `.env` on
the production hosts.
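
For the split layout, the following hypothetical sketch builds one `docker compose` command per host from the architecture table. Service and profile names are copied from that table. Note that the memory table calls the PostgreSQL container `psql`. Whether `docker-compose.yml` defines exactly these names and profiles, and how services on different hosts reach each other, are assumptions to verify. `--no-deps` stops Compose from also starting `depends_on` services on a host that should not run them.

```python
# Hypothetical sketch: one `docker compose up` command per host for the split
# layout. Names come from the architecture table and must match the repo's
# docker-compose.yml; cross-host networking is out of scope here.

LAYOUT = {
    "db": {"profiles": [], "services": ["postgres"]},
    "pipeline": {"profiles": ["core"],
                 "services": ["kafka", "zookeeper", "collector", "psql-app"]},
    "presentation": {"profiles": ["core", "telemetry"],
                     "services": ["grafana", "influxdb", "telegraf", "whois"]},
}


def compose_up_argv(host: str) -> list[str]:
    spec = LAYOUT[host]
    argv = ["docker", "compose"]
    for profile in spec["profiles"]:
        argv += ["--profile", profile]
    # --no-deps: start only the listed services, not their depends_on targets.
    argv += ["up", "-d", "--no-deps", *spec["services"]]
    return argv


def check_each_service_once() -> None:
    """Fail if the layout places the same service on two hosts."""
    seen: dict[str, str] = {}
    for host, spec in LAYOUT.items():
        for service in spec["services"]:
            if service in seen:
                raise ValueError(f"{service} on both {seen[service]} and {host}")
            seen[service] = host


if __name__ == "__main__":
    check_each_service_once()
    for host in LAYOUT:
        print(f"{host}: {' '.join(compose_up_argv(host))}")
```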
## PostgreSQL tuning
- `shared_buffers` ≈ 25% of the RAM available to PostgreSQL (the `psql` container limit, or host RAM on a dedicated DB host); set a large `effective_cache_size`.
- Raise `work_mem` (dashboard aggregate queries) and `maintenance_work_mem`.
- `max_wal_size` is already 10 GB; keep it or raise it for churn bursts.
- Enable parallel query (`max_parallel_workers_per_gather`).
- Aggressive autovacuum on churn tables (`ip_rib`, `base_attrs`, `ip_rib_log`)
— applied in the lab; persist these settings in production provisioning.
- TimescaleDB compression is already enabled on `ip_rib_log` and the `stats_*`
hypertables — keep it.
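
The following minimal sketch turns the bullets above into concrete starting values. Only the ~25% `shared_buffers` rule and the 10 GB `max_wal_size` come from this guide. The 75% `effective_cache_size`, `work_mem`, `maintenance_work_mem`, worker count, and autovacuum scale factors are assumed starting points, not the lab's values. `max_parallel_workers_per_gather` is also bounded by `max_parallel_workers` and `max_worker_processes`. `ip_rib_log` is a TimescaleDB hypertable, so check how table-level autovacuum settings apply to its chunks.

```python
# Hypothetical sketch: memory-related settings derived from the RAM given to
# PostgreSQL, plus per-table autovacuum overrides for the churn tables.
# Only the 25% shared_buffers rule and 10GB max_wal_size come from the guide;
# every other value is an assumed starting point.

CHURN_TABLES = ("ip_rib", "base_attrs", "ip_rib_log")


def postgres_settings(pg_ram_gb: int, vcpus: int) -> dict[str, str]:
    return {
        "shared_buffers": f"{pg_ram_gb // 4}GB",               # ~25%
        "effective_cache_size": f"{pg_ram_gb * 3 // 4}GB",     # assumed 75%
        "work_mem": "64MB",              # assumed; applies per sort/hash node
        "maintenance_work_mem": "2GB",   # assumed
        "max_wal_size": "10GB",
        "max_parallel_workers_per_gather": str(max(2, vcpus // 8)),  # assumed
    }


def autovacuum_sql(scale_factor: float = 0.01) -> list[str]:
    """ALTER TABLE statements using PostgreSQL per-table storage parameters."""
    return [
        f"ALTER TABLE {table} SET (autovacuum_vacuum_scale_factor = {scale_factor}, "
        f"autovacuum_analyze_scale_factor = {scale_factor});"
        for table in CHURN_TABLES
    ]


if __name__ == "__main__":
    for key, value in postgres_settings(pg_ram_gb=64, vcpus=32).items():
        print(f"{key} = '{value}'")
    print("\n".join(autovacuum_sql()))
```

For a 64 GB `psql` limit and 32 vCPU, this yields `shared_buffers = '16GB'`, `effective_cache_size = '48GB'`, and 4 workers per gather.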
## Reference bill of materials (single-host option)
| Resource | Spec |
|----------|------|
| CPU | 32 vCPU |
| RAM | 128 GB |
| Storage | 4 TB NVMe SSD |
| Network | 1 GbE+ to the routers' BMP source network |

For the split-host option, divide resources per the architecture table: the DB host
takes the bulk of the RAM and all of the fast storage.
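
Before buying hardware, it is worth checking that the planned container limits fit each host. The rough sketch below uses the upper ends of the compute table: 64 GB for `psql`, 64 + 8 GB for `psql-app`, and 12 GB for `kafka`. The 16 GB reserve for the OS and the services without listed limits, and the 128 GB per host, are illustrative assumptions. With these numbers, the single host commits 148 GB of limits against 112 GB usable, while each split host fits. That is consistent with preferring the split layout, or with using the lower ends of the ranges on one host. A `mem_limit` is a cap rather than a reservation, so this check is conservative.

```python
# Rough fit check: do the planned container mem_limits stay under each host's
# RAM minus a reserve for the OS and services without listed limits?
# Limits are the upper ends of the compute table; reserve and host RAM are
# illustrative assumptions.

LIMITS_GB = {"psql": 64, "psql-app": 64 + 8, "kafka": 12}


def host_fits(services: list[str], host_ram_gb: int,
              reserve_gb: int = 16) -> tuple[int, bool]:
    """Return (GB of limits committed, whether they fit under RAM - reserve)."""
    committed = sum(LIMITS_GB[s] for s in services)
    return committed, committed <= host_ram_gb - reserve_gb


if __name__ == "__main__":
    plans = {
        "single host (128 GB)": (["psql", "psql-app", "kafka"], 128),
        "DB host (128 GB)": (["psql"], 128),
        "pipeline host (128 GB)": (["psql-app", "kafka"], 128),
    }
    for name, (services, ram) in plans.items():
        committed, ok = host_fits(services, ram)
        print(f"{name}: {committed} GB of limits -> {'fits' if ok else 'over budget'}")
```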