The bootstrap previously required OBMP_DOMAIN and OBMP_COOKIE_DOMAIN even
when a user only wanted a local lab deployment with Grafana's built-in
login, although those variables only feed Authelia's session-cookie domain
and the public URL it lives behind. On a fresh host with no FQDN,
./setup.sh could not complete without inventing dummy values.
Adds OBMP_AUTH_MODE=local|authelia to .env (default: local). The mode gates
FQDN validation, Authelia secret generation, Authelia config rendering, and
the auth-profile image pull/build. setup.sh also writes GF_SERVER_ROOT_URL
into .env -- http://HOST_IP:3000/grafana/ for local,
https://OBMP_DOMAIN/grafana/ for authelia -- and docker-compose.yml now
reads ${GF_SERVER_ROOT_URL} instead of hardcoding the apodacalab.com
fallback.
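The root-URL choice described above can be sketched as a small shell
function. This is an illustration only, not the code in setup.sh: the
function name grafana_root_url and its argument order are hypothetical.

```sh
# Hypothetical helper (not taken from setup.sh): print the Grafana root URL
# for a given auth mode.
#   $1 = auth mode (local|authelia)
#   $2 = host IP, used in local mode
#   $3 = public domain, used in authelia mode
grafana_root_url() {
    case "$1" in
        local)    printf 'http://%s:3000/grafana/\n' "$2" ;;
        authelia) printf 'https://%s/grafana/\n' "$3" ;;
        *)        echo "unknown OBMP_AUTH_MODE: $1" >&2; return 1 ;;
    esac
}
```

For example, grafana_root_url local 192.0.2.10 "" prints the plain-HTTP URL
on port 3000, while the authelia branch ignores the host IP entirely.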
Back-compat: an existing .env with no OBMP_AUTH_MODE but with a real
OBMP_DOMAIN or an existing AUTHELIA_SESSION_SECRET is inferred as 'authelia'
and the mode is persisted, so a re-run on a live Authelia host won't
silently flip it to local and break the next docker compose up.
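A minimal sketch of that inference, assuming a "real" OBMP_DOMAIN means any
value that is neither empty nor the placeholder example.com. The placeholder
value and the function name infer_auth_mode are assumptions for
illustration; the actual check in setup.sh may differ.

```sh
# Hypothetical helper (not taken from setup.sh): decide the auth mode.
#   $1 = current OBMP_AUTH_MODE (may be empty)
#   $2 = OBMP_DOMAIN (may be empty)
#   $3 = AUTHELIA_SESSION_SECRET (may be empty)
infer_auth_mode() {
    # An explicit mode always wins.
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
        return 0
    fi
    # Assumption: empty or example.com counts as "no real domain".
    case "$2" in
        ''|example.com) ;;
        *) printf 'authelia\n'; return 0 ;;
    esac
    # An existing Authelia secret also implies a live Authelia host.
    if [ -n "$3" ]; then
        printf 'authelia\n'
        return 0
    fi
    # Otherwise fall back to the new default.
    printf 'local\n'
}
```

The caller would then write the result back to .env so the next run sees an
explicit OBMP_AUTH_MODE and skips the inference.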
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
RCA: the exabgp container was OOM-killed because its 512m mem_limit was far
too small for the full-table feature (900K route objects in memory). Raises
the limit to a 6g default, parameterized via EXABGP_MEM_LIMIT.
Adds Docker healthchecks to 14 services (port/HTTP probes) so unhealthy
containers are visible. Adds a Telegraf docker input that collects
per-container CPU/memory/IO into InfluxDB, plus a "Stack Resources"
dashboard, so resource pressure is caught before it causes an OOM crash.
The telegraf service runs with an overridden entrypoint so it keeps root
and can read the docker socket.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The router/interface variables on the telemetry dashboards used a Flux
keep() |> distinct() pattern that returned only one source; switch to
schema.tagValues so all streaming routers and interfaces are listed.
Parameterize the telegraf.conf gNMI addresses and credentials via
GNMI_ADDRESSES/GNMI_USERNAME/GNMI_PASSWORD so the telemetry fleet can scale
without editing the config.
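The commit does not say how GNMI_ADDRESSES is formatted. One possible
approach, sketched here purely as an assumption, is to keep a
comma-separated list in .env and expand it into quoted TOML array elements
before Telegraf reads it. The helper name gnmi_toml_list is hypothetical.

```sh
# Hypothetical helper: turn "host1:port, host2:port" into
# "host1:port", "host2:port" (the inside of a TOML string array).
# The body runs in a subshell so the IFS change does not leak to the caller.
gnmi_toml_list() (
    out=''
    IFS=','
    for addr in $1; do
        # Trim whitespace around each entry.
        addr=$(printf '%s' "$addr" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        # Skip empty entries such as those from a trailing comma.
        [ -n "$addr" ] || continue
        out="${out:+$out, }\"$addr\""
    done
    printf '%s\n' "$out"
)
```

Adding a router then means appending one entry to the list in .env rather
than editing telegraf.conf.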
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sets mem_limit on every service to cap the OOM/swap-exhaustion risk (the lab
host had only 5 MiB of swap free). The three heavy services (psql, kafka,
psql-app) read their limits from .env so production can raise them; the rest
use fixed, lab-appropriate values. The limits total ~25 GB, leaving headroom
on the 31 GB lab host.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins the Compose project name and splits services into core / test / auth
profiles so the BMP collector core can deploy standalone. Adds setup.sh
(idempotent bootstrap), .env.example, and repo-resident Authelia config
templates so a fresh host deploys without manual steps. Parameterizes the
previously hardcoded host IP and domain, and points the Grafana InfluxDB
datasource at the container name.
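To illustrate what "idempotent" can mean for a bootstrap that writes .env,
here is a minimal sketch of an upsert helper: it replaces an existing
KEY=value line or appends a new one, so repeated runs leave a single entry
per key. The name set_env_var is hypothetical and this is not the code in
setup.sh; it assumes keys contain only letters, digits, and underscores, and
that values contain no backslashes (awk -v would interpret them).

```sh
# Hypothetical helper: set KEY=value in an env file without duplicating it.
#   $1 = file, $2 = key, $3 = value
set_env_var() {
    file=$1 key=$2 value=$3
    touch "$file"
    if grep -q "^${key}=" "$file"; then
        # Key exists: rewrite only that line, via a temp file.
        tmp=$(mktemp) || return 1
        awk -v k="$key" -v v="$value" '
            index($0, k "=") == 1 { print k "=" v; next }
            { print }
        ' "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        # Key missing: append it.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Writing back with cat rather than mv keeps the original file's permissions
and ownership, which matters for a .env that holds secrets.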
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>