Explores the real DFZ table received from the AS57355 route server
via the GoBGP feed (the '$feed' / GoBGP BMP peer): IPv4/IPv6 prefix
counts, distinct origin ASes, prefix-length distribution, top origin
ASes by prefix count, and an overlap-based prefix lookup. Serves as
the comparison baseline for the Router Diff dashboard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GoBGP's BMP config requires a literal IP — 'obmp-collector' failed
to parse and the container crash-looped. Point BMP export at the
docker host IP (10.40.40.202) where the collector publishes port
5000; stable across container recreation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generalizes the RR-specific RR Loc-RIB Diff into a comparison of up
to four selectable routers. router1/router2 are required; router3/
router4 default to a '-- none --' sentinel and drop cleanly out of
every query (no empty-IN, no dangling predicates).
Panels: per-router prefix counts, divergent-prefix count, a presence
matrix (row per prefix, column per router, cell = best-path next-hop),
a divergence detail table classifying missing / next-hop / AS-path
disagreement, and a per-prefix all-paths drill-down. Once the GoBGP
global feed (E1) is up, GLOBAL-FEED is selectable as any of the four
for lab-vs-Internet diffing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New gobgp service: GoBGP peers eBGP-multihop with the AS57355 lab
route server (Bromirski) for the full real IPv4 + IPv6 Internet table
and BMP-exports it to the OpenBMP collector, landing in ip_rib as a
monitored peer.
Config follows the route server's published peering spec: local AS
65001, no password, keepalive 3600 / hold-time 7200, IPv4 feed on the
v4 session and IPv6 feed on the v6 session. gobgp/mrt-refresh.sh is a
cron-safe fallback that injects RouteViews MRT RIB dumps when the live
session is down. The live BGP session is not started here — bringing
gobgp up establishes the external session and loads ~1M routes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan for a local full-Internet routing table, a generalized N-way
router diff, and VRF/RD scoping:
- E1: GoBGP container peering AS57355 (Bromirski lab route server)
for a live full v4/v6 table, MRT RIB dumps as a 2-hourly fallback,
BMP-exported into ip_rib as a GLOBAL-FEED peer.
- E2: generic up-to-4-router diff dashboard (presence matrix),
generalized from the RR-specific rr_locrib_diff.
- E3: global table exploration dashboard.
- E4: VRF/RD scoping across unicast + L3VPN dashboards (built to
schema; not lab-verifiable with CML IOS-XR).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AS map previously exploded ~4.4M base_attrs AS_PATH rows live,
three times per load (one per panel), ~1.8s each — slow enough that
navigating away cancelled the queries mid-flight.
Add mv_as_adjacency: undirected consecutive-AS pairs with occurrence
counts over the full RIB (17k rows), refreshed hourly by pg_cron via
REFRESH ... CONCURRENTLY. The dashboard panels now read the view in
~1ms. Min-occurrence options rescaled for full-RIB counts
(2000/5000/10000/50000, default 2000 -> ~63-node graph).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The node graph rendered blank because the two CML/PROX labs formed
two disconnected components (iBGP-only meshes within each lab), and
Grafana's nodeGraph layout renders nothing for a disconnected graph.
Match BGP sessions to monitored routers by peer IP as well as peer
BGP-ID, so the real cross-lab eBGP sessions become graph edges. The
graph is now one connected component (30 iBGP + 4 eBGP edges) and
lays out. The companion external-neighbours table uses the same
peer-IP check so those sessions are no longer double-listed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The node graph rendered empty because the edges target returned a
string mainstat ('iBGP'/'eBGP'). Grafana's nodeGraph treats edge
mainStat as numeric for layout/labelling; a string value silently
breaks the layout so no nodes are drawn (the working LS map and the
original ls_topo both cast edge mainstat to an integer).
Edge mainstat is now COUNT(DISTINCT feed)::int (BMP peer-feed count
for the router pair); the iBGP/eBGP label moves to secondarystat and
detail__session_type, which accept strings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the route-reflector membership test to
= ANY(ARRAY[${rr_loopbacks:singlequote}]::text[]) — the singlequote
format is the one already proven to interpolate correctly in this
Grafana instance (rr_locrib_diff uses it), and the ARRAY[...]::text[]
wrapper stays valid (empty array) when the variable resolves empty.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The protocol variable has includeAll enabled, so Grafana auto-quotes
its value ('IS-IS_L2'); the SQL then wrapped it again, producing
''IS-IS_L2'' and a syntax error that blanked the node graph. Replace
the quoted equality filter with IN ($protocol) — Grafana already
emits a quoted CSV — and make the variable multi-select so "All"
expands cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the rr_loopbacks variable resolved empty, the IN
(${rr_loopbacks:singlequote}) clause expanded to IN (), a SQL
syntax error that blanked the topology panel. Switch to
= ANY(string_to_array('${rr_loopbacks:csv}', ',')), which yields
a no-match (not a syntax error) on an empty variable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Create a new OBMP-Maps Grafana folder (folderUid 1006) with four
data-visualization dashboards built on nodeGraph and geomap panels:
- BGP Peer Map: routers as nodes, BGP sessions as edges; iBGP/eBGP
edge typing and operator-editable rr_loopbacks variable to denote
route reflectors; companion table for sessions to non-monitored
neighbours.
- IGP / Link-State Topology Map: reworked from LinkState-1004 and
moved here (uid preserved); scoped by peer feed / protocol / AS so
the 489-node BGP-LS topology stays readable; SR-capability rings.
- AS Relationship Map: AS adjacency graph from consecutive AS_PATH
pairs over a 200k-route sample; min-occurrence and focus-AS
variables; nodes enriched from info_asn whois.
- Geographic Prefix Map: geomap of RIB prefixes and origin ASes by
IP geolocation, with a note that lab 10.x loopbacks do not
geolocate; bounded geo_ip join via a sample-size variable.
Also add a data link on the Looking Glass ASN Info panel's origin_as
column that jumps to the ASN View dashboard scoped to that AS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
asn_num was a fixed custom variable; converting it to a textbox lets an
operator look up any origin AS and see all of its RIB prefixes, upstreams,
and downstreams.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RCA: the exabgp container was OOM-killed — its 512m mem_limit was far too
small for the full-table feature (900K route objects in memory). Raises the
limit to a parameterized 6g default (EXABGP_MEM_LIMIT).
Adds Docker healthchecks to 14 services (port/HTTP probes) so unhealthy
containers are visible. Adds a Telegraf docker input that collects per-
container CPU/memory/IO into InfluxDB, plus a "Stack Resources" dashboard —
so resource pressure is caught before it causes an OOM crash. telegraf runs
with an overridden entrypoint so it keeps root and can read the docker socket.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The IOS-XR routers negotiate IPv6 unicast capability, but the generated
exabgp.conf declared only ipv4 unicast — producing repeated "route family
(ipv6/unicast) is not configured" errors that crashed ExaBGP. Declaring
ipv6 unicast on the neighbor matches the routers' capabilities and stops
the crash-restart cycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a panel that flags the next-hop-self-on-an-RR anti-pattern: reflected
routes (those carrying ORIGINATOR_ID) whose NEXT_HOP is an RR loopback while
the route was originated by a different router — meaning the RR rewrote
next-hop to itself and has been pulled into the forwarding path. RR-originated
routes and legitimately-imported eBGP routes (originator == next-hop) are
excluded. An editable rr_loopbacks template variable keeps it environment-
agnostic — useful for validating RR behavior during an IOS-XR to Junos
migration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The telemetry dashboards' router/interface variables used a keep|distinct
Flux pattern that returned only one source; switch to schema.tagValues so all
streaming routers and interfaces are listed. Parameterize telegraf.conf gNMI
addresses and credentials via GNMI_ADDRESSES/GNMI_USERNAME/GNMI_PASSWORD so
the telemetry fleet can scale without editing the config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a prioritized security-hardening checklist, a PostgreSQL logical-backup
script (pg-backup.sh) with a documented restore procedure, and Grafana
alerting provisioning (peer-down, flap-storm, RPKI-invalid, router-down rules
plus a contact-point template). The alerting YAML and contact points need
operator review before being relied on for paging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reorganizes 31 dashboards into an operator-first structure with real
navigation. Adds Router Detail and Peer Detail drilldown dashboards; merges
LS Nodes+Links and the two L3VPN dashboards; modernizes all deprecated panels
(table-old/graph/worldmap). Every dashboard gets the obmp-nav dropdown so the
whole set is reachable from anywhere. Graduates the operational "Learning"
dashboards into Operations/Routing/LinkState folders, retires the Tops folder,
and relabels folders (Base->Operations, History->Routing, Learning->Reference).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The scorecard and table counted every bgp_peers row in a down state,
including peers removed long ago (OpenBMP never prunes bgp_peers). They now
filter on the peer's last state-change timestamp via $__timeFilter, so the
panel reflects current/recent problems rather than all-time history.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents compute, memory, and storage requirements for a production
deployment: ~100-150M NLRI estimate, 96-128 GB RAM, 16-32 vCPU, 3-5 TB NVMe,
a split-host architecture option, PostgreSQL tuning, and a BMP RIB-scope
recommendation (Adj-RIB-In only initially).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NOC Overview is the new flagship operator landing dashboard — health
scorecards, peer session timeline, BGP update rate, and attention tables for
peers down, churning prefixes, RPKI invalids, and topology changes. All counts
come from stats_* aggregate tables so it stays fast at production scale.
OBMP-Home is rebuilt as a lightweight navigation hub pointing at NOC Overview.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sets mem_limit on every service to cap the OOM/swap-exhaustion risk (the lab
host had only 5 MiB swap free). The three heavy services (psql, kafka,
psql-app) read their limits from .env so production can raise them; the rest
use lab-appropriate fixed values. Total ~25 GB, leaving headroom on the 31 GB
lab host.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generalizes exabgp/startup.sh to template BGP neighbors from an EXABGP_PEERS
list (ip:peer_as:description), so ExaBGP peers with multiple labs. Adds
cml/proxmox_bmp_config.py to apply the bmp server block to a lab's IOS-XR
routers over SSH (BMP config is not exposed via NETCONF YANG on current XR).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins the Compose project name and splits services into core / test / auth
profiles so the BMP collector core can deploy standalone. Adds setup.sh
(idempotent bootstrap), .env.example, and repo-resident Authelia config
templates so a fresh host deploys without manual steps. Parameterizes
hardcoded host IP and domain; points the Grafana InfluxDB datasource at the
container name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four-track roadmap covering configuration centralization (inventory.yaml),
CML API automation (virl2_client), production ISP deployment (multi-vendor
IOS-XR + Junos), and packaging for distribution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CML 2.9 node definitions for XRd Control-Plane (third RR) and ExaBGP route
injector as Docker-based CML nodes. Includes build scripts to export Docker
images as tars for CML import, with IOS-XR startup configs for IS-IS, BGP,
and BMP.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dashboard compares Adj-RIB-In tables between two Route Reflectors via BMP,
showing missing prefixes, attribute diffs (next-hop, AS path), and per-client
consistency. Route diversity script deploys 29 prefixes across R9K-01-07 via
NETCONF to create verifiable next-hop differences between RRs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds Authelia (forward-auth) and nginx portal container for single-endpoint
authenticated access via Caddy reverse proxy. Configures Grafana auth proxy
for header-based auto-login. Updates Vue UI base paths and API routes for
/exabgp/ and /traffic/ subpath serving. Adds traffic-gen responder container
on dedicated Docker network.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds sender/responder mode switching via API, QuickPing component, echo-mode
responder with dedicated container, improved flow state sync, and RFC2544
test runner enhancements. Includes UI improvements across all traffic-gen
components.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch Telegraf from native IOS-XR YANG paths to OpenConfig
(openconfig-interfaces:interfaces/interface/state/counters)
- Use json_ietf encoding instead of proto (IOS-XR 24.3.1 compat)
- Target only CORE-01/CORE-02 (R9K routers blocked by CML mgmt net)
- Update all 3 Grafana dashboard queries to match OpenConfig field
names (in-octets, out-octets, in-pkts, out-pkts, in-errors, etc.)
- Rewrite gnmi_grpc_config.py to use SSH/CLI via paramiko instead of
NETCONF (IOS-XR 24.3.1 rejects NETCONF gRPC edit-config)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- gNMI integration: NETCONF script to enable gRPC on all 9 routers,
Telegraf container with gnmi input plugin, InfluxDB for time-series
storage, 3 Grafana telemetry dashboards (utilization, errors, combined)
- Traffic generator: Scapy-based dual-mode container (sender/responder)
with Flask API, RFC 2544 test suite (throughput, latency, frame-loss,
back-to-back), Vue 3 web UI with flow builder, test runner, real-time
stats monitor, and results export
- docker-compose.yml updated with influxdb, telegraf, traffic-gen,
traffic-gen-ui services
- Full documentation in DOCS.md sections 15-16
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IOS-XR 9000v in CML uses igp_metric=16000000 on all IS-IS links.
The stock dashboard filter (< 16000000) excluded all links, making
the Node dropdown empty and topology panel show no data. Changed
to <= 16777215 (IS-IS wide metric max) so lab links are included.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- exabgp/bgpls_config.py: NETCONF script that audits and fixes BGP-LS
config on all 9 spoke routers; adds IS-IS distribute and lsls AF
activation toward both COREs where missing; handles routers needing
global AF initialization before per-neighbor activation
- exabgp/startup.sh: add neighbor-changes to ExaBGP api blocks so peer
up/down events are sent to Flask server.py stdin
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* collector v2.2.3
* collector using debian-stable-slim
* dev-image updated to use debian-stable-slim
* Upgraded librdkafka to v1.9.2
* Fixed permission problems with postgres
* Grafana upgraded to 9.1.7
* psql-app v2.2.2
* postgres updated to use timescaledb-ha:pg14-ts2.8