31 Commits

Author SHA1 Message Date
sam
cffb835f30 Enable IPv6 feed: run GoBGP in host network mode
The IPv6 eBGP session never established because the Docker bridge
has no IPv6. Switch the gobgp container to network_mode: host so it
uses the host's real dual-stack connectivity — both sessions to
AS57355 now source from the host's public v4/v6 addresses.

Host mode binds the host's port namespace, so disable GoBGP's
inbound BGP listener (port = -1) — we only originate outbound
sessions, and a non-root container cannot bind privileged port 179.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 08:08:55 -07:00
sam
88a5546e29 Add GoBGP full-table feed container (roadmap E1)
New gobgp service: GoBGP peers eBGP-multihop with the AS57355 lab
route server (Bromirski) for the full real IPv4 + IPv6 Internet table
and BMP-exports it to the OpenBMP collector, landing in ip_rib as a
monitored peer.

Config follows the route server's published peering spec: local AS
65001, no password, keepalive 3600 / hold-time 7200, IPv4 feed on the
v4 session and IPv6 feed on the v6 session. gobgp/mrt-refresh.sh is a
cron-safe fallback that injects RouteViews MRT RIB dumps when the live
session is down. The live BGP session is not started here — bringing
gobgp up establishes the external session and loads ~1M routes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 07:39:12 -07:00
sam
9d74940614 Fix ExaBGP OOM, add container health checks and resource monitoring
RCA: the exabgp container was OOM-killed — its 512m mem_limit was far too
small for the full-table feature (900K route objects in memory). Raises the
limit to a parameterized 6g default (EXABGP_MEM_LIMIT).

Adds Docker healthchecks to 14 services (port/HTTP probes) so unhealthy
containers are visible. Adds a Telegraf docker input that collects per-
container CPU/memory/IO into InfluxDB, plus a "Stack Resources" dashboard —
so resource pressure is caught before it causes an OOM crash. telegraf runs
with an overridden entrypoint so it keeps root and can read the docker socket.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 22:03:52 -07:00
sam
a662496e53 Fix telemetry dashboard variables and parameterize gNMI targets
The telemetry dashboards' router/interface variables used a keep|distinct
Flux pattern that returned only one source; switch to schema.tagValues so all
streaming routers and interfaces are listed. Parameterize telegraf.conf gNMI
addresses and credentials via GNMI_ADDRESSES/GNMI_USERNAME/GNMI_PASSWORD so
the telemetry fleet can scale without editing the config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 21:10:57 -07:00
sam
4e9bd7cc5a Add container memory limits to all services
Sets mem_limit on every service to cap the OOM/swap-exhaustion risk (the lab
host had only 5 MiB swap free). The three heavy services (psql, kafka,
psql-app) read their limits from .env so production can raise them; the rest
use lab-appropriate fixed values. Total ~25 GB, leaving headroom on the 31 GB
lab host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:04:37 -07:00
sam
cf4e5b07c6 Add Compose profiles, setup.sh bootstrap, and config templates for portable deployment
Pins the Compose project name and splits services into core / test / auth
profiles so the BMP collector core can deploy standalone. Adds setup.sh
(idempotent bootstrap), .env.example, and repo-resident Authelia config
templates so a fresh host deploys without manual steps. Parameterizes
hardcoded host IP and domain; points the Grafana InfluxDB datasource at the
container name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 19:21:04 -07:00
sam
45f4c9859d Add Authelia auth gateway, portal landing page, and subpath routing
Adds Authelia (forward-auth) and an nginx portal container for single-endpoint
authenticated access via the Caddy reverse proxy. Configures Grafana auth proxy
for header-based auto-login. Updates Vue UI base paths and API routes for
/exabgp/ and /traffic/ subpath serving. Adds a traffic-gen responder container
on a dedicated Docker network.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 14:23:09 -07:00
sam
dcebf15bb3 Add Phase 4: gNMI streaming telemetry and traffic generator
- gNMI integration: NETCONF script to enable gRPC on all 9 routers,
  Telegraf container with gnmi input plugin, InfluxDB for time-series
  storage, 3 Grafana telemetry dashboards (utilization, errors, combined)
- Traffic generator: Scapy-based dual-mode container (sender/responder)
  with Flask API, RFC 2544 test suite (throughput, latency, frame-loss,
  back-to-back), Vue 3 web UI with flow builder, test runner, real-time
  stats monitor, and results export
- docker-compose.yml updated with influxdb, telegraf, traffic-gen,
  traffic-gen-ui services
- Full documentation in DOCS.md sections 15-16

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 15:29:44 -07:00
sam
6621942032 Add Phase 2: Vue 3 control panel, 6 learning dashboards, new BGP scenarios
- exabgp-ui/: Vue 3 + Vite SPA served by NGINX on :5001; proxies /api/ to
  ExaBGP Flask on :5050; includes StatusBar, ScenarioPanel, RouteTable,
  AnnounceForm, PeerStatus, ChurnControl components
- docker-compose.yml: add obmp-exabgp-ui service (host network, port 5001)
- exabgp/scenarios/__init__.py: add convergence_test, route_leak,
  hijack_simulation scenarios for structured BGP learning exercises
- exabgp/inject.py: add 'peers' and 'monitor' subcommands; live-refresh
  terminal status view with ANSI cursor repositioning
- obmp-grafana/dashboards/Learning/: 6 new OBMP-Learning dashboards
  (update rate, peer health, AS path, RPKI, churn, attributes)
- obmp-grafana/provisioning/dashboards/openbmp-dashboards.yml: add
  OpenBMP-Learning folder provider pointing to dashboards/Learning/
- DOCS.md: document Web UI, 3 new scenarios, 6 learning dashboards;
  fix section numbering (10-14) and architecture diagram (23 dashboards)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 15:37:16 -07:00
sam
233dadbb41 Add ExaBGP route injector, Grafana dashboards, and full documentation
- Add exabgp/ container: ExaBGP 5.x + Flask REST API for on-demand BGP
  route injection into CML IOS-XR lab (AS 65020 via eBGP from AS 65100)
- Add 6 injection scenarios: internet_sample, churn, blackhole, anycast,
  full_table, lab_prefixes
- Add inject.py CLI wrapper for the ExaBGP API
- Add iosxr_bgp_config.md with IOS-XR neighbor config and NETCONF script
- Add obmp-grafana/ dashboards and provisioning (17 dashboards)
- Update docker-compose.yml: add exabgp service, fix Kafka external
  listener IP, extend log retention from 90min to 720min
- Add DOCS.md: full project documentation including architecture, setup,
  user guide, sanity checks, troubleshooting, and command reference
- Update .gitignore: exclude .env and .claude/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 14:46:37 -07:00
Tim Evens
3f38af5312 Version 2.2.3 updates
* collector v2.2.3
* collector using debian-stable-slim
* dev-image updated to use debian-stable-slim
* Upgraded librdkafka to v1.9.2
* Fixed permission problems with postgres
* Grafana upgraded to 9.1.7
* psql-app v2.2.2
* postgres updated to use timescaledb-ha:pg14-ts2.8
2022-10-20 07:12:08 -07:00
Tim Evens
0f3312a719 Updates for v2.2.1 2022-06-17 18:20:05 -07:00
Tim Evens
e19e5ac73a Fix psql container rm file issue 2022-06-10 12:53:24 -07:00
Tim Evens
237345b476 Add ENABLE_DBIP to ``psql-app`` container to auto import DB-IP geo data 2022-06-08 14:53:55 -07:00
Tim Evens
84bec5293b version 2.2.0 updates 2022-06-08 11:53:55 -07:00
Tim Evens
e7fad858d9 Change global_ip_rib function cron job 2022-05-17 10:38:19 -07:00
Tim Evens
eb52eace41 Updates for 2.1.1 2022-03-31 12:13:46 -07:00
Tim Evens
c5f3d6ef59 2.1.1 Updates
* Update psql-app container to use MEM for heap setting
  This fixes an issue where psql-app would run out of memory
* Update psql-app container to restart the psql consumer if
  it stops. This handles restart on out-of-memory exit.
2022-03-28 15:51:15 -07:00
Tim Evens
0a0d2ceec1 2.1.1 updates
* Fix vpnv6/l3vpn next-hop decoding
* Fix ip_rib_log so compression is enabled after hypertable creation
* Add pg_cron to postgres container
* Upgraded postgres container to timescaledb 2.6.0-pg14
2022-03-28 12:43:37 -07:00
Tim Evens
b9b8c44713 Change max_wal_size to 10GB by default and add missing upgrade sql file 2022-03-09 10:48:58 -08:00
Tim Evens
05737d2682 v2.1.0 updates
* Add peeringdb script and cron job
* Fix running more than one cronjob at a time
* Update upgrade script for psql-app
2022-03-04 07:27:23 -08:00
Tim Evens
492c000ce9 Add whois and upgrade to 2.1.0 2022-02-22 14:30:05 -08:00
Tim Evens
fd2874d00e Fix for collector kafka startup issue
When first deploying the collector and kafka, it takes
kafka a couple of minutes to start. In some cases, the
collector would proceed to start up without waiting for
kafka. This resulted in the first few messages being dropped,
such as the router init and peer up messages.
2022-02-01 12:49:17 -08:00
Tim Evens
a0e6a5bc6f Fixes to psql-app, version 2.0.2 2022-01-31 11:05:58 -08:00
Tim Evens
c3839aa8fb Security fixes, issues resolved, and more
* Upgrades to all containers
* Resolves #7, resolves #6, resolves #2
* Compose changed to use versions instead of latest
* OBMP containers now use a version tag instead of build numbers
2022-01-28 15:12:01 -08:00
sydon7
cd25509e39 adding cron drops for all timeseries tables 2021-07-30 22:55:53 +00:00
sydon7
fc362aab60 rpki updates 2021-04-30 14:14:27 +00:00
Tim Evens
eba244cdf7 Fix postgres to create ts. Update compose to use latest 2021-03-31 00:13:09 -07:00
Tim Evens
c61f766cc3 Adjust defaults in compose and fix postgres mem setting 2021-03-30 22:31:06 -07:00
Tim Evens
74154229ad more changes to compose 2021-03-30 19:00:25 -07:00
Tim Evens
574bf5e8a9 Add psql-app container and docker compose 2021-03-30 14:25:24 -07:00