Add evpn_rib schema; update production sizing with measured data
- postgres/scripts/007_obmp_evpn.sql: the evpn_rib landing table (roadmap E5 step 1), applied to the live DB. Mirrors l3vpn_rib; a dedicated consumer will populate it.
- production-sizing.md: corrected retention figures to the actual policy values, added a measured-data section (one full feed ≈ +5 GB current state; DB now ~30 GB), and a horizontal-scaling section — the bottleneck is the psql-app consumer + disk IOPS, so scale psql-app as a Kafka consumer group (cap = partition count), treat multi-collector as HA/locality not throughput.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent c18d11a48f
commit 2d83d6c02e
@@ -14,11 +14,30 @@ Derived from the OpenBMP `psql-app` sizing guidance and measured lab behavior.
 | Routes per full feed | ~1.2M (≈1M IPv4 + ~0.2M IPv6) |
 | **Estimated total NLRIs** | **~100–150M** in Adj-RIB-In |
 | Telemetry | gNMI via Telegraf → InfluxDB, ~50–200 interfaces/router, 10 s interval |
-| History retention | `ip_rib_log` 4 weeks, LS logs 4 months, `peer_event_log` 1 year |
+| History retention | `ip_rib_log` 2 months, LS logs 8 weeks, `peer_event_log` 4 months (lab policy defaults; tunable) |
 
 The NLRI estimate (40 × ~2.5 feeds × 1.2M) places this deployment at the top
 of the OpenBMP `psql-app` guidance tier (150M NLRIs → 64 GB heap).
 
+## Measured data point (lab, 2026)
+
+Real numbers from the lab after adding **one** full-table feed (GoBGP →
+AS57355, ~1.04M IPv4 + ~0.25M IPv6 routes):
+
+| Metric | Before feed | After 1 full feed |
+|--------|-------------|-------------------|
+| `openbmp` DB size | ~25 GB | **~30 GB** |
+| `ip_rib` (current state) | small | 5.3 GB |
+| `ip_rib_log` (history hypertable) | — | 7.75 GB, 82/97 chunks compressed |
+| `base_attrs` | ~1 GB | 2.3 GB |
+| `geo_ip` (fixed reference data) | 8.8 GB | 8.8 GB |
+
+So **one full feed ≈ +5 GB current-state**, plus history that accrues against
+the 2-month `ip_rib_log` retention. The ~1.3M-route initial dump ingested in
+minutes with no Kafka consumer lag. Extrapolating linearly, 40 routers × ~2.5
+feeds ≈ 100 feed-equivalents → on the order of **0.5 TB current state** before
+history and indexes; the 2–4 TB storage target below holds with headroom.
+
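As a rough check of the arithmetic in the hunk above, the following sketch recomputes the NLRI estimate and the linear current-state extrapolation from the quoted figures. The constant and function names are illustrative assumptions, not anything defined in the repository, and the model is the same linear simplification the text itself uses (it ignores history, indexes, and any attribute deduplication across feeds).

```python
# Back-of-envelope sizing from the figures quoted in production-sizing.md.
# All names below are hypothetical helpers, not part of the repository.

ROUTERS = 40
FEEDS_PER_ROUTER = 2.5            # "~2.5 feeds" per router
ROUTES_PER_FEED = 1.2e6           # ~1M IPv4 + ~0.2M IPv6
GB_PER_FEED_CURRENT_STATE = 5.0   # measured in the lab: one full feed is about +5 GB


def feed_equivalents(routers: int, feeds_per_router: float) -> float:
    """Number of full-table feeds the deployment is equivalent to."""
    return routers * feeds_per_router


def estimated_nlris(routers: int, feeds_per_router: float, routes_per_feed: float) -> float:
    """Total NLRIs held in Adj-RIB-In under the linear model."""
    return feed_equivalents(routers, feeds_per_router) * routes_per_feed


def current_state_gb(routers: int, feeds_per_router: float, gb_per_feed: float) -> float:
    """Current-state storage only; history and indexes are not included."""
    return feed_equivalents(routers, feeds_per_router) * gb_per_feed


if __name__ == "__main__":
    nlris = estimated_nlris(ROUTERS, FEEDS_PER_ROUTER, ROUTES_PER_FEED)
    gb = current_state_gb(ROUTERS, FEEDS_PER_ROUTER, GB_PER_FEED_CURRENT_STATE)
    print(f"feed-equivalents: {feed_equivalents(ROUTERS, FEEDS_PER_ROUTER):.0f}")
    print(f"estimated NLRIs:  {nlris / 1e6:.0f}M")
    print(f"current state:    {gb / 1000:.1f} TB")
```

With the quoted inputs this gives 100 feed-equivalents, 120M NLRIs (inside the ~100–150M range), and 0.5 TB of current state, which matches the figures in the text.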
 ## BMP RIB scope — recommendation
 
 **Deploy with Adj-RIB-In only.** It is the OpenBMP default, is what every
@@ -48,14 +67,15 @@ advertises. Alternatives and their cost:
 
 | Store | Lab today | Production target | Notes |
 |-------|-----------|-------------------|-------|
-| **PostgreSQL** | 25 GB | **2–4 TB NVMe SSD** | `ip_rib` current state (~100–150M rows) + `ip_rib_log` history (4-week retention, the dominant grower) + `base_attrs` + `geo_ip` (~7 GB fixed). OpenBMP guidance: 500 GB main + 1 TB TimescaleDB; add headroom. |
+| **PostgreSQL** | 30 GB | **2–4 TB NVMe SSD** | `ip_rib` current state (~100–150M rows) + `ip_rib_log` history (2-month retention, the dominant grower) + `base_attrs` + `geo_ip` (~9 GB fixed). OpenBMP guidance: 500 GB main + 1 TB TimescaleDB; add headroom. |
 | **Kafka** | 0.2 GB | **100–500 GB** | 12 h retention; sized for full-table initial-dump bursts × 40 routers |
 | **InfluxDB (telemetry)** | minimal | **50–200 GB** | 40 routers × ~50–200 interfaces × 10 s gNMI × 30 d; compresses well |
 | **Total** | — | **~3–5 TB fast NVMe** | Use NVMe; PostgreSQL random-IO under churn is the bottleneck on slow disks |
 
 Put the PostgreSQL data directory and the TimescaleDB tablespace on NVMe.
-`ip_rib_log` 4-week retention is the main storage tuning knob — revisit once
-production update volume is measured.
+`ip_rib_log` retention (2 months in the lab) is the main storage tuning knob
+— revisit once production update volume is measured; halving it roughly
+halves the dominant history table.
 
 ## Architecture
 
@@ -72,6 +92,36 @@ Whichever layout: every service already carries a Compose `mem_limit` — raise
 `PSQL_MEM_LIMIT` / `PSQL_APP_MEM_LIMIT` / `KAFKA_MEM_LIMIT` in `.env` for the
 production hosts.
 
+## Horizontal scaling — where it actually helps
+
+The ingestion bottleneck is **not** the collector or Kafka — it is the
+`psql-app` consumer writing to PostgreSQL, and ultimately **disk IOPS**.
+Plan scaling accordingly:
+
+- **Scale `psql-app` as a Kafka consumer group.** Run multiple `psql-app`
+  containers with the **same group ID**; Kafka rebalances partitions across
+  them and fails over automatically. This is the real throughput lever and
+  also provides HA. **Hard cap = Kafka partition count** — the compose sets
+  `KAFKA_NUM_PARTITIONS: 8`, so ≤ 8 useful instances. **Raise the partition
+  count before scaling past a few consumers** — it cannot easily be reduced
+  later.
+- **Disk IOPS is the named bottleneck.** Target **≥ 5000 IOPS** (NVMe) for
+  the PostgreSQL store; this buys more headroom than any container count.
+- **Multiple collectors are an HA / locality decision, not a throughput
+  one.** A BMP session is one stateful TCP connection and cannot be load
+  balanced — you distribute routers by pointing each router's `bmp server`
+  config at a specific collector. All collectors feed one Kafka. Shard
+  collectors for fault isolation / POP locality, not for performance, and
+  note a dead collector's routers go dark until reconfigured (no auto-
+  failover at the collector tier).
+- Within one `psql-app`, writer threads already auto-scale per type
+  (`writer_max_threads_per_type`); the consumer-group is the across-instance
+  layer on top.
+
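The "hard cap = partition count" rule in the list above can be illustrated with a small sketch. It models one topic and a simple round-robin spread, which is a simplification: Kafka's real assignors (range, round-robin, cooperative-sticky) differ in detail, but in all of them each partition is owned by exactly one member of a consumer group, so members beyond the partition count sit idle. The function names are hypothetical.

```python
from typing import List


def assign_partitions(num_partitions: int, num_consumers: int) -> List[List[int]]:
    """Spread partitions round-robin over consumers in one group (simplified model)."""
    if num_consumers <= 0:
        raise ValueError("need at least one consumer")
    assignment: List[List[int]] = [[] for _ in range(num_consumers)]
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


def useful_consumers(num_partitions: int, num_consumers: int) -> int:
    """Consumers that actually receive at least one partition."""
    return min(num_partitions, num_consumers)


if __name__ == "__main__":
    # 8 partitions as in the compose file, 3 psql-app instances: all three do work.
    print(assign_partitions(8, 3))
    # 10 instances on 8 partitions: two instances receive nothing.
    idle = sum(1 for owned in assign_partitions(8, 10) if not owned)
    print(f"useful: {useful_consumers(8, 10)}, idle: {idle}")
```

In this model, three instances on eight partitions split the load 3/3/2, while ten instances leave two with no partitions, which is why the partition count should be raised before adding consumers past that point.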
+Bursts (every collector restart triggers simultaneous full-table dumps from
+all peers) are absorbed by Kafka — size Kafka retention so a slow consumer
+never loses data during a convergence storm.
+
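One way to reason about "size Kafka retention so a slow consumer never loses data" is to compare the time needed to drain a burst backlog with the retention window. The sketch below is a conservative estimate under stated assumptions: the burst arrives all at once, steady-state updates keep arriving while the consumer catches up, and retention is purely time-based. The rates in the usage example are hypothetical placeholders, not measurements from the lab; only the 12 h retention figure comes from the sizing table.

```python
def drain_seconds(backlog_records: float, produce_rate: float, consume_rate: float) -> float:
    """Time for the consumer to clear a backlog while new records keep arriving.

    Rates are in records per second. Returns infinity if the consumer
    cannot keep up with the steady-state produce rate.
    """
    if consume_rate <= produce_rate:
        return float("inf")
    return backlog_records / (consume_rate - produce_rate)


def retention_is_safe(
    backlog_records: float,
    produce_rate: float,
    consume_rate: float,
    retention_seconds: float,
    safety_factor: float = 2.0,
) -> bool:
    """True if the backlog drains well inside the retention window."""
    return drain_seconds(backlog_records, produce_rate, consume_rate) * safety_factor <= retention_seconds


if __name__ == "__main__":
    backlog = 120e6            # 40 routers x ~2.5 feeds x 1.2M routes, all dumped at once
    produce = 2_000            # hypothetical steady-state records/s
    consume = 20_000           # hypothetical psql-app throughput in records/s
    retention = 12 * 3600      # 12 h Kafka retention from the sizing table
    hours = drain_seconds(backlog, produce, consume) / 3600
    print(f"drain time: {hours:.2f} h, safe: {retention_is_safe(backlog, produce, consume, retention)}")
```

If the measured consume rate is close to the produce rate, the drain time grows without bound and no retention setting helps, which is when adding `psql-app` instances or disk IOPS becomes the fix.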
 ## PostgreSQL tuning
 
 - `shared_buffers` ≈ 25% of host RAM; large `effective_cache_size`.
44
postgres/scripts/007_obmp_evpn.sql
Normal file
@@ -0,0 +1,44 @@
+-- BGP EVPN RIB table (roadmap E5)
+--
+-- The OpenBMP collector already decodes EVPN and emits the
+-- 'openbmp.parsed.evpn' Kafka topic, but the stock psql-app consumer never
+-- subscribes to it and the base schema has no table for it. This table is
+-- the landing zone; a dedicated consumer (obmp-evpn-consumer, separate)
+-- subscribes to the topic and writes here.
+--
+-- Mirrors l3vpn_rib conventions. route_type is derived by the consumer from
+-- which fields are populated (the parsed EVPN message has no explicit type),
+-- so it is nullable.
+CREATE TABLE IF NOT EXISTS evpn_rib (
+    hash_id               uuid NOT NULL,
+    base_attr_hash_id     uuid,
+    peer_hash_id          uuid NOT NULL,
+    rd                    varchar(128) NOT NULL,
+    rd_type               smallint,
+    route_type            smallint,          -- EVPN route type 1..5
+    origin_as             bigint,
+    eth_segment_id        varchar(255),      -- ESI
+    eth_tag_id            bigint,
+    mac                   macaddr,
+    mac_len               smallint,
+    ip                    inet,
+    ip_len                smallint,
+    orig_router_ip        inet,
+    mpls_label1           bigint,            -- VXLAN VNI when encap = vxlan
+    mpls_label2           bigint,
+    ext_community_list    varchar(50)[],     -- route-targets
+    path_id               bigint,
+    timestamp             timestamp(6) without time zone NOT NULL DEFAULT (now() AT TIME ZONE 'utc'),
+    first_added_timestamp timestamp(6) without time zone NOT NULL DEFAULT (now() AT TIME ZONE 'utc'),
+    iswithdrawn           boolean NOT NULL DEFAULT false,
+    isprepolicy           boolean NOT NULL DEFAULT true,
+    isadjribin            boolean NOT NULL DEFAULT true,
+    PRIMARY KEY (peer_hash_id, hash_id)
+);
+CREATE INDEX IF NOT EXISTS evpn_rib_hash_id_idx ON evpn_rib (hash_id);
+CREATE INDEX IF NOT EXISTS evpn_rib_base_attr_idx ON evpn_rib (base_attr_hash_id);
+CREATE INDEX IF NOT EXISTS evpn_rib_rd_idx ON evpn_rib (rd);
+CREATE INDEX IF NOT EXISTS evpn_rib_route_type_idx ON evpn_rib (route_type);
+CREATE INDEX IF NOT EXISTS evpn_rib_mac_idx ON evpn_rib (mac);
+CREATE INDEX IF NOT EXISTS evpn_rib_extcomm_idx ON evpn_rib USING gin (ext_community_list);
+CREATE INDEX IF NOT EXISTS evpn_rib_timestamp_idx ON evpn_rib ("timestamp");