Documents compute, memory, and storage requirements for a production
deployment: ~100-150M NLRI estimate, 96-128 GB RAM, 16-32 vCPU, 3-5 TB NVMe,
a split-host architecture option, PostgreSQL tuning, and a BMP RIB-scope
recommendation (Adj-RIB-In only initially).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NOC Overview is the new flagship operator landing dashboard: health
scorecards, peer session timeline, BGP update rate, and attention tables for
peers down, churning prefixes, RPKI invalids, and topology changes. All counts
come from stats_* aggregate tables so it stays fast at production scale.
OBMP-Home is rebuilt as a lightweight navigation hub pointing at NOC Overview.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sets mem_limit on every service to cap the OOM/swap-exhaustion risk (the lab
host had only 5 MiB swap free). The three heavy services (psql, kafka,
psql-app) read their limits from .env so production can raise them; the rest
use lab-appropriate fixed values. Total ~25 GB, leaving headroom on the 31 GB
lab host.
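A rough way to sanity-check such a budget is sketched below. It assumes hypothetical variable names (PSQL_MEM_LIMIT, KAFKA_MEM_LIMIT, PSQL_APP_MEM_LIMIT) and illustrative values; the real names and limits live in the repository's .env and compose file and are not reproduced here.

```python
# Sketch only: sums per-service memory limits, with .env overrides for the
# heavy services. Variable names and all sizes are hypothetical.
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Convert a Docker-style size such as '512m' or '8g' to bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # plain number: already bytes


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def total_limit(env: dict, defaults: dict, fixed: dict) -> int:
    """Total bytes: env-overridable limits plus fixed per-service limits."""
    total = sum(parse_size(env.get(name, default))
                for name, default in defaults.items())
    return total + sum(parse_size(size) for size in fixed.values())
```

Comparing the result against the host's RAM shows how much headroom a given .env leaves before raising a limit for production.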
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generalizes exabgp/startup.sh to template BGP neighbors from an EXABGP_PEERS
list (ip:peer_as:description), so ExaBGP can peer with multiple labs. Adds
cml/proxmox_bmp_config.py to apply the bmp server block to a lab's IOS-XR
routers over SSH (BMP config is not exposed via NETCONF YANG on current XR).
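The templating idea can be illustrated in Python, although the actual logic lives in the shell script exabgp/startup.sh. This sketch assumes comma-separated entries, IPv4 peer addresses, and a small subset of ExaBGP neighbor options; the function names are hypothetical.

```python
# Sketch only: parse an EXABGP_PEERS-style string and render neighbor blocks.
# The comma separator and IPv4-only addresses are assumptions.
from typing import NamedTuple


class Peer(NamedTuple):
    ip: str
    peer_as: int
    description: str


def parse_peers(raw: str) -> list:
    """Parse 'ip:peer_as:description' entries into Peer tuples."""
    peers = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # maxsplit=2 keeps any further colons inside the description.
        ip, peer_as, description = entry.split(":", 2)
        peers.append(Peer(ip, int(peer_as), description))
    return peers


def render_neighbor(peer: Peer, local_ip: str, local_as: int) -> str:
    """Render one ExaBGP-style neighbor block for a peer."""
    return (
        f"neighbor {peer.ip} {{\n"
        f"    description \"{peer.description}\";\n"
        f"    local-address {local_ip};\n"
        f"    local-as {local_as};\n"
        f"    peer-as {peer.peer_as};\n"
        f"}}\n"
    )
```

Joining render_neighbor() over parse_peers(...) would yield one neighbor block per lab, which is the effect the generalized startup script aims for.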
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins the Compose project name and splits services into core / test / auth
profiles so the BMP collector core can deploy standalone. Adds setup.sh
(idempotent bootstrap), .env.example, and repo-resident Authelia config
templates so a fresh host deploys without manual steps. Parameterizes
hardcoded host IP and domain; points the Grafana InfluxDB datasource at the
container name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four-track roadmap covering configuration centralization (inventory.yaml),
CML API automation (virl2_client), production ISP deployment (multi-vendor
IOS-XR + Junos), and packaging for distribution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CML 2.9 node definitions for XRd Control-Plane (third RR) and ExaBGP route
injector as Docker-based CML nodes. Includes build scripts to export Docker
images as tars for CML import, with IOS-XR startup configs for IS-IS, BGP,
and BMP.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dashboard compares Adj-RIB-In tables between two Route Reflectors via BMP,
showing missing prefixes, attribute diffs (next-hop, AS path), and per-client
consistency. Route diversity script deploys 29 prefixes across R9K-01 through R9K-07 via
NETCONF to create verifiable next-hop differences between RRs.
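The comparison the dashboard performs can be sketched in Python as below. The dashboard itself queries the database; this is only an illustration of the diff logic, assuming each RIB is a dict keyed by prefix with hypothetical next_hop and as_path fields.

```python
# Sketch only: diff two Adj-RIB-In snapshots keyed by prefix.
# The dict layout and attribute names are assumptions for illustration.
def diff_ribs(rib_a: dict, rib_b: dict, attrs=("next_hop", "as_path")):
    """Return (missing_in_a, missing_in_b, attr_diffs) for two RIBs."""
    missing_in_a = sorted(set(rib_b) - set(rib_a))
    missing_in_b = sorted(set(rib_a) - set(rib_b))
    attr_diffs = {}
    for prefix in sorted(set(rib_a) & set(rib_b)):
        # Record (value_on_a, value_on_b) for every attribute that differs.
        changed = {
            name: (rib_a[prefix].get(name), rib_b[prefix].get(name))
            for name in attrs
            if rib_a[prefix].get(name) != rib_b[prefix].get(name)
        }
        if changed:
            attr_diffs[prefix] = changed
    return missing_in_a, missing_in_b, attr_diffs
```

Prefixes injected with different next hops per RR would show up in attr_diffs, which is the kind of verifiable difference the route diversity script is meant to create.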
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds Authelia (forward-auth) and nginx portal container for single-endpoint
authenticated access via Caddy reverse proxy. Configures Grafana auth proxy
for header-based auto-login. Updates Vue UI base paths and API routes for
/exabgp/ and /traffic/ subpath serving. Adds traffic-gen responder container
on dedicated Docker network.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds sender/responder mode switching via API, QuickPing component, echo-mode
responder with dedicated container, improved flow state sync, and RFC2544
test runner enhancements. Includes UI improvements across all traffic-gen
components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch Telegraf from native IOS-XR YANG paths to OpenConfig
(openconfig-interfaces:interfaces/interface/state/counters)
- Use json_ietf encoding instead of proto (IOS-XR 24.3.1 compat)
- Target only CORE-01/CORE-02 (R9K routers blocked by CML mgmt net)
- Update all 3 Grafana dashboard queries to match OpenConfig field
names (in-octets, out-octets, in-pkts, out-pkts, in-errors, etc.)
- Rewrite gnmi_grpc_config.py to use SSH/CLI via paramiko instead of
NETCONF (IOS-XR 24.3.1 rejects NETCONF gRPC edit-config)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- gNMI integration: NETCONF script to enable gRPC on all 9 routers,
Telegraf container with gnmi input plugin, InfluxDB for time-series
storage, 3 Grafana telemetry dashboards (utilization, errors, combined)
- Traffic generator: Scapy-based dual-mode container (sender/responder)
with Flask API, RFC 2544 test suite (throughput, latency, frame-loss,
back-to-back), Vue 3 web UI with flow builder, test runner, real-time
stats monitor, and results export
- docker-compose.yml updated with influxdb, telegraf, traffic-gen,
traffic-gen-ui services
- Full documentation in DOCS.md sections 15-16
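For the RFC 2544 throughput test listed above, a common approach is a binary search for the highest offered rate with zero frame loss. The sketch below assumes a hypothetical run_trial(rate_pps) callback that returns the number of lost frames; it is not taken from the traffic generator's test suite.

```python
# Sketch only: binary search for the zero-loss rate, RFC 2544 style.
# run_trial is a hypothetical callback: offered rate (pps) -> frames lost.
def find_throughput(run_trial, line_rate_pps: float,
                    resolution_pps: float = 1.0) -> float:
    """Return the highest tested rate at which run_trial reports no loss."""
    low, high = 0.0, float(line_rate_pps)
    if run_trial(high) == 0:
        return high  # no loss even at line rate
    # Invariant: 'low' is loss-free (or untested zero), 'high' shows loss.
    while high - low > resolution_pps:
        mid = (low + high) / 2
        if run_trial(mid) == 0:
            low = mid
        else:
            high = mid
    return low
```

In practice each trial would run for a fixed duration per frame size, and the search would be repeated for every frame size under test.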
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IOS-XR 9000v in CML uses igp_metric=16000000 on all IS-IS links.
The stock dashboard filter (< 16000000) excluded all links, making
the Node dropdown empty and topology panel show no data. Changed
to <= 16777215 (IS-IS wide metric max) so lab links are included.
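The effect of the threshold change can be shown with a small Python sketch. The link records and names here are hypothetical; the dashboard applies the same comparison in its own query.

```python
# Sketch only: why '< 16000000' hid every lab link and '<= 16777215' does not.
ISIS_WIDE_METRIC_MAX = 2**24 - 1  # 16777215, the 24-bit wide-metric maximum
OLD_LIMIT = 16_000_000

# Hypothetical link rows; the lab reports igp_metric=16000000 on every link.
links = [
    {"name": "R9K-01 -> CORE-01", "igp_metric": 16_000_000},
    {"name": "R9K-02 -> CORE-02", "igp_metric": 16_000_000},
]


def old_filter(rows):
    """Stock dashboard filter: strictly below 16000000."""
    return [row for row in rows if row["igp_metric"] < OLD_LIMIT]


def new_filter(rows):
    """Revised filter: anything up to the wide-metric maximum."""
    return [row for row in rows if row["igp_metric"] <= ISIS_WIDE_METRIC_MAX]
```

With these rows, old_filter(links) is empty while new_filter(links) keeps both links, matching the empty Node dropdown before the change.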
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- exabgp/bgpls_config.py: NETCONF script that audits and fixes BGP-LS
config on all 9 spoke routers; adds IS-IS distribute and lsls AF
activation toward both COREs where missing; handles routers needing
global AF initialization before per-neighbor activation
- exabgp/startup.sh: add neighbor-changes to ExaBGP api blocks so peer
up/down events are sent to Flask server.py stdin
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* collector v2.2.3
* collector using debian-stable-slim
* dev-image updated to use debian-stable-slim
* Upgraded librdkafka to v1.9.2
* Fixed permission problems with postgres
* Grafana upgraded to 9.1.7
* psql-app v2.2.2
* postgres updated to use timescaledb-ha:pg14-ts2.8
* Update psql-app container to use MEM for heap setting
This fixes an issue where psql-app would run out of memory
* Update psql-app container to restart psql consumer if
it stops. This handles restart on out of memory exit.
When first deploying the collector and kafka, it takes
kafka a couple of minutes to start. In some cases, the
collector would proceed to start up without waiting for
kafka. This resulted in the first few messages being dropped,
such as the router init and peer up messages.
* Upgrades to all containers
* Resolves #7, resolves #6, resolves #2
* Compose changed to use versions instead of latest
* OBMP containers now use a version tag instead of build numbers
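The startup race noted in this list (the collector starting before kafka accepts connections) is typically avoided by polling the broker port before launching the dependent process. The following is a minimal Python sketch of that pattern, not the mechanism the containers actually use; the function name and defaults are hypothetical.

```python
# Sketch only: block until a TCP port accepts connections or a timeout hits.
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 180.0,
                  interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or unreachable: wait and retry until the deadline.
            time.sleep(interval)
    return False
```

An open port only shows that the listener is up; a broker may still need a moment before it serves requests, so a short extra delay or a client-level check may be needed on top of this.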