Add evpn_rib schema; update production sizing with measured data

- postgres/scripts/007_obmp_evpn.sql: the evpn_rib landing table (roadmap E5
  step 1), applied to the live DB. Mirrors l3vpn_rib; a dedicated consumer
  will populate it.
- production-sizing.md: corrected retention figures to the actual policy
  values, added a measured-data section (one full feed ≈ +5 GB current state;
  DB now ~30 GB), and added a horizontal-scaling section. The bottleneck is
  the psql-app consumer + disk IOPS, so scale psql-app as a Kafka consumer
  group (cap = partition count) and treat multi-collector as HA/locality,
  not throughput.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 08:44:09 -07:00
commit 2d83d6c02e
parent c18d11a48f
2 changed files with 98 additions and 4 deletions
--- a/docs/production-sizing.md
+++ b/docs/production-sizing.md
@@ -14,11 +14,30 @@ Derived from the OpenBMP `psql-app` sizing guidance and measured lab behavior.
 | Routes per full feed | ~1.2M (≈1M IPv4 + ~0.2M IPv6) |
 | **Estimated total NLRIs** | **~100–150M** in Adj-RIB-In |
 | Telemetry | gNMI via Telegraf → InfluxDB, ~50–200 interfaces/router, 10 s interval |
-| History retention | `ip_rib_log` 4 weeks, LS logs 4 months, `peer_event_log` 1 year |
+| History retention | `ip_rib_log` 2 months, LS logs 8 weeks, `peer_event_log` 4 months (lab policy defaults; tunable) |
 The NLRI estimate (40 × ~2.5 feeds × 1.2M) places this deployment at the top
 of the OpenBMP `psql-app` guidance tier (150M NLRIs → 64 GB heap).
 ## Measured data point (lab, 2026)
 Real numbers from the lab after adding **one** full-table feed (GoBGP →
 AS57355, ~1.04M IPv4 + ~0.25M IPv6 routes):
 | Metric | Before feed | After 1 full feed |
 |--------|-------------|-------------------|
 | `openbmp` DB size | ~25 GB | **~30 GB** |
 | `ip_rib` (current state) | small | 5.3 GB |
 | `ip_rib_log` (history hypertable) | — | 7.75 GB, 82/97 chunks compressed |
 | `base_attrs` | ~1 GB | 2.3 GB |
 | `geo_ip` (fixed reference data) | 8.8 GB | 8.8 GB |
 So **one full feed ≈ +5 GB current-state**, plus history that accrues against
 the 2-month `ip_rib_log` retention. The ~1.3M-route initial dump ingested in
 minutes with no Kafka consumer lag. Extrapolating linearly, 40 routers × ~2.5
 feeds ≈ 100 feed-equivalents → on the order of **0.5 TB current state** before
 history and indexes; the 2–4 TB storage target below holds with headroom.
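
The extrapolation above can be reproduced with a few lines of Python. This is a back-of-envelope sketch that uses only the figures quoted in this document; the constant and function names are illustrative, and the linear scaling is the same assumption the text makes, not a measured property.

```python
# Back-of-envelope sizing from the figures quoted in this document.
# All names here are illustrative; the inputs are the document's estimates.
ROUTERS = 40
FEEDS_PER_ROUTER = 2.5            # "~2.5 feeds" per router
ROUTES_PER_FEED = 1_200_000       # ~1.2M routes per full feed
GB_PER_FEED_CURRENT_STATE = 5     # lab: one full feed is roughly +5 GB of current state


def feed_equivalents(routers: int, feeds_per_router: float) -> float:
    """Number of full-table feeds the deployment carries."""
    return routers * feeds_per_router


def estimated_nlris(routers: int, feeds_per_router: float, routes_per_feed: int) -> float:
    """Total NLRIs held in Adj-RIB-In, assuming every feed is a full table."""
    return feed_equivalents(routers, feeds_per_router) * routes_per_feed


def current_state_gb(routers: int, feeds_per_router: float, gb_per_feed: float) -> float:
    """Current-state storage, assuming it grows linearly with the feed count."""
    return feed_equivalents(routers, feeds_per_router) * gb_per_feed


feeds = feed_equivalents(ROUTERS, FEEDS_PER_ROUTER)
nlris = estimated_nlris(ROUTERS, FEEDS_PER_ROUTER, ROUTES_PER_FEED)
state_gb = current_state_gb(ROUTERS, FEEDS_PER_ROUTER, GB_PER_FEED_CURRENT_STATE)

print(f"{feeds:.0f} feed-equivalents, ~{nlris / 1e6:.0f}M NLRIs, "
      f"~{state_gb / 1000:.1f} TB current state")
```

The result excludes history (`ip_rib_log`), indexes, and the fixed `geo_ip` data, which is why the storage target is set well above it.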
 ## BMP RIB scope — recommendation
 **Deploy with Adj-RIB-In only.** It is the OpenBMP default, is what every
@@ -48,14 +67,15 @@ advertises. Alternatives and their cost:
 | Store | Lab today | Production target | Notes |
 |-------|-----------|-------------------|-------|
-| **PostgreSQL** | 25 GB | **2–4 TB NVMe SSD** | `ip_rib` current state (~100–150M rows) + `ip_rib_log` history (4-week retention, the dominant grower) + `base_attrs` + `geo_ip` (~7 GB fixed). OpenBMP guidance: 500 GB main + 1 TB TimescaleDB; add headroom. |
+| **PostgreSQL** | 30 GB | **2–4 TB NVMe SSD** | `ip_rib` current state (~100–150M rows) + `ip_rib_log` history (2-month retention, the dominant grower) + `base_attrs` + `geo_ip` (~9 GB fixed). OpenBMP guidance: 500 GB main + 1 TB TimescaleDB; add headroom. |
 | **Kafka** | 0.2 GB | **100–500 GB** | 12 h retention; sized for full-table initial-dump bursts × 40 routers |
 | **InfluxDB (telemetry)** | minimal | **50–200 GB** | 40 routers × ~50–200 interfaces × 10 s gNMI × 30 d; compresses well |
 | **Total** | — | **~3–5 TB fast NVMe** | Use NVMe; PostgreSQL random-IO under churn is the bottleneck on slow disks |
 Put the PostgreSQL data directory and the TimescaleDB tablespace on NVMe.
-`ip_rib_log` 4-week retention is the main storage tuning knob — revisit once
+`ip_rib_log` retention (2 months in the lab) is the main storage tuning knob
-production update volume is measured.
+— revisit once production update volume is measured; halving it roughly
 halves the dominant history table.
 ## Architecture
@@ -72,6 +92,36 @@ Whichever layout: every service already carries a Compose `mem_limit`; raise
 `PSQL_MEM_LIMIT` / `PSQL_APP_MEM_LIMIT` / `KAFKA_MEM_LIMIT` in `.env` for the
 production hosts.
 ## Horizontal scaling — where it actually helps
 The ingestion bottleneck is **not** the collector or Kafka — it is the
 `psql-app` consumer writing to PostgreSQL, and ultimately **disk IOPS**.
 Plan scaling accordingly:
 - **Scale `psql-app` as a Kafka consumer group.** Run multiple `psql-app`
  containers with the **same group ID**; Kafka rebalances partitions across
  them and fails over automatically. This is the real throughput lever and
  also provides HA. **Hard cap = Kafka partition count:** the compose sets
  `KAFKA_NUM_PARTITIONS: 8`, so at most 8 instances do useful work. **Raise
  the partition count before scaling past a few consumers**, and pick it
  deliberately: Kafka can add partitions to a topic but cannot remove them.
 - **Disk IOPS is the underlying bottleneck.** Target **≥ 5000 IOPS** (NVMe) for
  the PostgreSQL store; this buys more headroom than adding containers.
 - **Multiple collectors are an HA / locality decision, not a throughput
  one.** A BMP session is one stateful TCP connection and cannot be load
  balanced — you distribute routers by pointing each router's `bmp server`
  config at a specific collector. All collectors feed one Kafka. Shard
  collectors for fault isolation / POP locality, not for performance, and
  note a dead collector's routers go dark until reconfigured (no auto-
  failover at the collector tier).
 - Within one `psql-app`, writer threads already auto-scale per type
  (`writer_max_threads_per_type`); the consumer group is the cross-instance
  layer on top.
 Bursts (every collector restart triggers simultaneous full-table dumps from
 all peers) are absorbed by Kafka — size Kafka retention so a slow consumer
 never loses data during a convergence storm.
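
The partition-count cap on `psql-app` instances can be illustrated with a small model. This is a simplified sketch, not Kafka's actual assignor: it spreads partitions round-robin over the group members, and the instance names (`psql-app-0`, ...) are hypothetical. Real assignors may place individual partitions differently, but the cap is the same, since each partition is owned by at most one consumer in a group.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over consumers round-robin (simplified model)."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    assignment: Dict[str, List[int]] = {name: [] for name in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def idle_consumers(assignment: Dict[str, List[int]]) -> List[str]:
    """Consumers that received no partition and therefore do no work."""
    return [name for name, parts in assignment.items() if not parts]


PARTITIONS = 8  # matches KAFKA_NUM_PARTITIONS: 8 from the compose file

three = assign_partitions(PARTITIONS, [f"psql-app-{i}" for i in range(3)])
ten = assign_partitions(PARTITIONS, [f"psql-app-{i}" for i in range(10)])

print(three)                # every instance owns two or three partitions
print(idle_consumers(ten))  # instances beyond the partition count sit idle
```

With 8 partitions, a ninth and tenth instance only add standby capacity for failover, not throughput.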
 ## PostgreSQL tuning
 - `shared_buffers` ≈ 25% of host RAM; large `effective_cache_size`.
--- /dev/null
+++ b/postgres/scripts/007_obmp_evpn.sql
@@ -0,0 +1,44 @@
 -- BGP EVPN RIB table (roadmap E5)
 --
 -- The OpenBMP collector already decodes EVPN and emits the
 -- 'openbmp.parsed.evpn' Kafka topic, but the stock psql-app consumer never
 -- subscribes to it and the base schema has no table for it. This table is
 -- the landing zone; a dedicated consumer (obmp-evpn-consumer, separate)
 -- subscribes to the topic and writes here.
 --
 -- Mirrors l3vpn_rib conventions. route_type is derived by the consumer from
 -- which fields are populated (the parsed EVPN message has no explicit type),
 -- so it is nullable.
 CREATE TABLE IF NOT EXISTS evpn_rib (
    hash_id                uuid NOT NULL,
    base_attr_hash_id      uuid,
    peer_hash_id           uuid NOT NULL,
    rd                     varchar(128) NOT NULL,
    rd_type                smallint,
    route_type             smallint,                 -- EVPN route type 1..5
    origin_as              bigint,
    eth_segment_id         varchar(255),             -- ESI
    eth_tag_id             bigint,
    mac                    macaddr,
    mac_len                smallint,
    ip                     inet,
    ip_len                 smallint,
    orig_router_ip         inet,
    mpls_label1            bigint,                   -- VXLAN VNI when encap = vxlan
    mpls_label2            bigint,
    ext_community_list     varchar(50)[],            -- route-targets
    path_id                bigint,
    timestamp              timestamp(6) without time zone NOT NULL DEFAULT (now() AT TIME ZONE 'utc'),
    first_added_timestamp  timestamp(6) without time zone NOT NULL DEFAULT (now() AT TIME ZONE 'utc'),
    iswithdrawn            boolean NOT NULL DEFAULT false,
    isprepolicy            boolean NOT NULL DEFAULT true,
    isadjribin             boolean NOT NULL DEFAULT true,
    PRIMARY KEY (peer_hash_id, hash_id)
 );
 CREATE INDEX IF NOT EXISTS evpn_rib_hash_id_idx    ON evpn_rib (hash_id);
 CREATE INDEX IF NOT EXISTS evpn_rib_base_attr_idx  ON evpn_rib (base_attr_hash_id);
 CREATE INDEX IF NOT EXISTS evpn_rib_rd_idx         ON evpn_rib (rd);
 CREATE INDEX IF NOT EXISTS evpn_rib_route_type_idx ON evpn_rib (route_type);
 CREATE INDEX IF NOT EXISTS evpn_rib_mac_idx        ON evpn_rib (mac);
 CREATE INDEX IF NOT EXISTS evpn_rib_extcomm_idx    ON evpn_rib USING gin (ext_community_list);
 CREATE INDEX IF NOT EXISTS evpn_rib_timestamp_idx  ON evpn_rib ("timestamp");
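
The header comment says the consumer derives `route_type` from which fields are populated. The commit does not include that logic, so the following is only a hypothetical sketch of one possible heuristic. It assumes the parsed message has been mapped to a dict keyed by the `evpn_rib` column names and that an all-zero ESI means "no segment"; the function name, the precedence order, and the ESI string format are assumptions, not the behavior of `obmp-evpn-consumer`.

```python
from typing import Optional


def derive_route_type(row: dict) -> Optional[int]:
    """Guess the EVPN route type (1..5) from populated fields.

    Hypothetical heuristic; returns None when nothing matches, which is
    why the route_type column is nullable.
    """
    esi = row.get("eth_segment_id") or ""
    # Assumption: the ESI is a hex string with ':' separators; all zeros = unset.
    has_esi = esi.strip("0:") != ""
    has_mac = bool(row.get("mac"))
    has_ip = bool(row.get("ip"))
    has_orig_router = bool(row.get("orig_router_ip"))

    if has_mac:
        return 2                      # MAC/IP Advertisement
    if has_orig_router:
        return 4 if has_esi else 3    # Ethernet Segment vs. Inclusive Multicast
    if has_ip:
        return 5                      # IP Prefix
    if has_esi:
        return 1                      # Ethernet Auto-Discovery
    return None


print(derive_route_type({"mac": "aa:bb:cc:dd:ee:ff", "ip": "10.0.0.1"}))
print(derive_route_type({"ip": "10.1.0.0", "ip_len": 24}))
```

A real consumer would need to validate this against the fields the collector actually emits on `openbmp.parsed.evpn`, since some route types can legitimately carry overlapping fields.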