# OpenBMP Production Security Hardening

A prioritized checklist for hardening the OpenBMP Docker stack before exposing
it to a production ISP network of 40 full-table edge routers. Work top to
bottom — items are ordered roughly by risk reduction per unit effort.

This document **recommends** changes. It does not modify `docker-compose.yml`
or any running service. Apply the changes in a maintenance window and test.

> Threat model in brief: the stack ingests BMP from production routers, stores
> the full DFZ in PostgreSQL, and exposes Grafana to operators. The crown
> jewels are (a) the database, (b) the Grafana admin plane, and (c) the BMP
> ingest port. Everything below protects one of those three.

---

## Priority 0 — Credentials (do this first)

Every service currently ships with the placeholder credential `openbmp` or a
close variant, and these defaults are committed in `docker-compose.yml`:

| Service | Setting | Current value |
|---------|---------|---------------|
| PostgreSQL | `POSTGRES_USER` / `POSTGRES_PASSWORD` | `openbmp` / `openbmp` |
| psql-app | `POSTGRES_PASSWORD` | `openbmp` |
| whois | `POSTGRES_PASSWORD` | `openbmp` |
| Grafana | `GF_SECURITY_ADMIN_PASSWORD` | `openbmp` |
| InfluxDB | `DOCKER_INFLUXDB_INIT_PASSWORD` | `openbmp123` |
| InfluxDB | `DOCKER_INFLUXDB_INIT_ADMIN_TOKEN` | `openbmp-telemetry-token` |
| Grafana datasource | `secureJsonData.password` | `openbmp` (in `openbmp-ds.yml`) |

### 0.1 Move every secret to `.env` (or a secrets manager)

`.env` is git-ignored.
At a minimum, replace the hardcoded literals in `docker-compose.yml` with
`${VAR}` references and define them in `.env`:

```env
# .env — never commit this file
POSTGRES_PASSWORD=
GF_SECURITY_ADMIN_PASSWORD=
INFLUXDB_ADMIN_PASSWORD=
INFLUXDB_ADMIN_TOKEN=
```

```yaml
# docker-compose.yml (recommended edit — operator applies)
grafana:
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:?set in .env}
psql:
  environment:
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:?set in .env}
```

The `:?` form makes the stack fail fast if a secret is unset or empty rather
than silently falling back to a default.

Generate strong values:

```bash
openssl rand -base64 32   # passwords
openssl rand -hex 32      # tokens
```

### 0.2 For a real production deployment, use a secrets manager

`.env` on disk is better than committed literals, but it is still a plaintext
file readable by anyone in the `docker` group. For production:

- **Docker Compose secrets** (`secrets:` block, files mounted at
  `/run/secrets/...`) — the lowest-friction upgrade; keep the secret files
  outside the repo, `chmod 600`, owned by root.
- **HashiCorp Vault**, **AWS Secrets Manager**, **Bitwarden Secrets**, or your
  existing ISP secret store — inject at deploy time via a wrapper that renders
  `.env` from the vault and shreds it after `docker compose up`.

Whatever the choice: rotate every credential in the table above on first
production deploy — they have been in git history as `openbmp` and must be
considered compromised.

### 0.3 Rotate the Grafana datasource password in lockstep

`obmp-grafana/provisioning/datasources/openbmp-ds.yml` carries
`secureJsonData.password`. It is read at Grafana start. When you change the
PostgreSQL password, update this file too (it supports `$__file{}` and env-var
expansion: `password: $POSTGRES_PASSWORD`) and restart Grafana.
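
To close out Priority 0, the `.env` file from 0.1 can be generated rather than typed by hand. The following is a minimal sketch, not part of OpenBMP: it uses Python's `secrets` module in place of the `openssl` commands from 0.1, the function names `render_env` and `write_env` are hypothetical, and the variable names are taken from the `.env` template in 0.1.

```python
import os
import secrets

# Variable names follow the .env template in section 0.1 (assumed complete).
PASSWORD_VARS = (
    "POSTGRES_PASSWORD",
    "GF_SECURITY_ADMIN_PASSWORD",
    "INFLUXDB_ADMIN_PASSWORD",
)
TOKEN_VARS = ("INFLUXDB_ADMIN_TOKEN",)


def render_env() -> str:
    """Return .env content with a fresh random value for every secret."""
    lines = ["# .env (generated; never commit this file)"]
    for name in PASSWORD_VARS:
        # 32 random bytes, URL-safe base64: no '$', '=', '+' or '/' in the value
        lines.append(f"{name}={secrets.token_urlsafe(32)}")
    for name in TOKEN_VARS:
        # 32 random bytes as 64 hex characters
        lines.append(f"{name}={secrets.token_hex(32)}")
    return "\n".join(lines) + "\n"


def write_env(path: str = ".env") -> None:
    """Create the file with mode 0600 and refuse to overwrite an existing one."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(render_env())
```

Calling `write_env()` once from the repository root would create the file; because `O_EXCL` makes a second call fail with `FileExistsError`, an existing set of secrets is never silently replaced. Rotation on a running stack still requires the SQL and datasource steps described elsewhere in this document.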

---

## Priority 1 — Network exposure / firewalling

The host currently publishes these ports to `0.0.0.0`: 5000 (BMP), 5432
(PostgreSQL), 9092 (Kafka), 3000 (Grafana), 8086 (InfluxDB), 4300 (whois),
9091 (Authelia). Most should not be world-reachable.

### 1.1 BMP collector (port 5000) — restrict to router management subnets

The collector accepts a BMP session from any source. A rogue BMP feed can
inject bogus routers/peers/prefixes into the database. Firewall it to the
router management subnets only.

`nftables` example (preferred on modern hosts):

```nft
# /etc/nftables.conf — adjust subnets to your router management ranges
table inet obmp {
  chain input {
    type filter hook input priority 0; policy accept;
    # BMP ingest — only from router management subnets
    tcp dport 5000 ip saddr { 10.100.0.0/24, 10.100.1.0/24 } accept
    tcp dport 5000 drop
  }
}
```

`iptables` equivalent:

```bash
iptables -A INPUT -p tcp --dport 5000 -s 10.100.0.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 5000 -s 10.100.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 5000 -j DROP
```

> Traffic to container-published ports is DNATed and forwarded, so it does
> not traverse `INPUT`. Docker reserves the `DOCKER-USER` chain for operator
> rules on that path. Put the rules there so Docker does not bypass them, and
> insert the `DROP` with `-I` rather than `-A`: on Docker versions that end
> the chain with a default `RETURN`, an appended `DROP` is never reached.
> ```bash
> # Each -I inserts at the top, so the DROP goes in first and the allow
> # rules land above it. --dport matches the container port (post-DNAT).
> iptables -I DOCKER-USER -p tcp --dport 5000 -j DROP
> iptables -I DOCKER-USER -p tcp --dport 5000 -s 10.100.0.0/24 -j RETURN
> iptables -I DOCKER-USER -p tcp --dport 5000 -s 10.100.1.0/24 -j RETURN
> ```

### 1.2 PostgreSQL (5432), Kafka (9092), InfluxDB (8086), whois (4300)

None of these need to be reachable from outside the stack:

- **PostgreSQL** — only `psql-app`, `whois`, and `grafana` connect, all on the
  Compose network. Bind the published port to loopback only, or drop the
  `ports:` mapping entirely:

  ```yaml
  # docker-compose.yml — psql service
  ports:
    - "127.0.0.1:5432:5432"   # localhost only; or remove entirely
  ```

- **Kafka 9092** — see Priority 2.
- **InfluxDB 8086** — only Grafana and Telegraf use it; bind to loopback or
  drop the mapping (Telegraf uses host networking and reaches it via
  localhost; Grafana reaches it on the Compose network).
- **whois 4300** — expose only if you actually offer a public whois service;
  otherwise bind to loopback.

For anything that genuinely must be reachable, restrict by source with the
firewall pattern from 1.1.

### 1.3 Grafana (3000) — keep it behind Authelia

Authelia already fronts Grafana (the `auth` profile + `GF_AUTH_PROXY_*`
settings). Make that the *only* path:

- Bind Grafana's published port to loopback: `127.0.0.1:3000:3000`, and let
  the reverse proxy / Authelia terminate TLS and reach it internally.
- Do **not** leave port 3000 directly reachable — `GF_AUTH_PROXY_ENABLED=true`
  trusts the `Remote-User` header, so any client that can reach 3000 directly
  and set that header bypasses authentication entirely.

---

## Priority 2 — Kafka transport security

Kafka is currently **PLAINTEXT** and advertises a host-IP listener:

```yaml
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092,PLAINTEXT_HOST://${HOST_IP}:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
```

The `obmp-kafka:29092` listener is internal to the Compose network and is the
only one the collector and psql-app use. The `PLAINTEXT_HOST://...:9092`
listener exists only for outside access and is not needed by the core stack.

**Recommended (simplest, most secure): remove the host listener.** If nothing
outside the Compose network consumes Kafka, drop the `9092` port mapping and
the `PLAINTEXT_HOST` advertised listener so Kafka is reachable only on the
internal Docker network:

```yaml
kafka:
  # remove the - "9092:9092" ports entry
  environment:
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
    KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
```

**If external Kafka access is genuinely required** (e.g.
a separate analytics consumer, or the split-host architecture in
`production-sizing.md` where Kafka and the DB are on different hosts), do
**not** leave it PLAINTEXT on a routed network. Enable SASL_SSL on the
external listener:

```yaml
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092,SASL_SSL://${HOST_IP}:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,SASL_SSL:SASL_SSL
KAFKA_SASL_ENABLED_MECHANISMS: SCRAM-SHA-512
KAFKA_SSL_KEYSTORE_LOCATION: /etc/kafka/secrets/kafka.keystore.jks
KAFKA_SSL_KEYSTORE_PASSWORD: ${KAFKA_KEYSTORE_PASSWORD}
KAFKA_SSL_KEY_PASSWORD: ${KAFKA_KEY_PASSWORD}
KAFKA_SSL_TRUSTSTORE_LOCATION: /etc/kafka/secrets/kafka.truststore.jks
KAFKA_SSL_TRUSTSTORE_PASSWORD: ${KAFKA_TRUSTSTORE_PASSWORD}
```

Treat this fragment as a starting point: SASL/SCRAM also needs a broker JAAS
configuration and SCRAM credentials created for each client, and the exact
keystore variable names depend on the image, so check them against the
`cp-kafka` documentation before deploying.

Keep the internal `PLAINTEXT://obmp-kafka:29092` listener for the collector
and psql-app — intra-Compose traffic on a private bridge does not need TLS and
adding SASL there means re-configuring both clients. At minimum, never publish
a PLAINTEXT Kafka listener on an IP that routes beyond the host.

---

## Priority 3 — PostgreSQL hardening

### 3.1 Change the default `openbmp` / `openbmp` credentials

Covered in Priority 0. Note that `POSTGRES_USER`/`POSTGRES_PASSWORD` only take
effect when the data directory is initialized. To rotate on an existing
database, change the password in SQL (substituting the new value for the
placeholder) and update every consumer:

```bash
docker exec -it obmp-psql psql -U openbmp -d openbmp \
  -c "ALTER ROLE openbmp WITH PASSWORD 'REPLACE_WITH_NEW_PASSWORD';"
```

Then update `POSTGRES_PASSWORD` for `psql-app` and `whois`, the
`secureJsonData.password` in `openbmp-ds.yml`, and restart those services.

### 3.2 Create a least-privilege role for Grafana

Grafana only needs to read.
Do not let it connect as the owning role:

```sql
-- Replace the placeholder password before running.
CREATE ROLE grafana_ro LOGIN PASSWORD 'REPLACE_WITH_GRAFANA_RO_PASSWORD';
GRANT CONNECT ON DATABASE openbmp TO grafana_ro;
GRANT USAGE ON SCHEMA public TO grafana_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO grafana_ro;
-- Default privileges apply to tables created by the role that runs this
-- statement, so run it as the role that creates the OpenBMP tables.
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO grafana_ro;
```

Point `openbmp-ds.yml` at `grafana_ro`. This contains a Grafana compromise to
read-only and blocks SQL-panel writes.

### 3.3 Restrict `pg_hba.conf`

The default OpenBMP image is permissive (`host all all all md5` or similar).
Tighten it so only the stack's own subnet can connect, and require
`scram-sha-256`:

```conf
# pg_hba.conf (inside the obmp-psql container / mounted)
# TYPE   DATABASE  USER        ADDRESS         METHOD
local    all       all                         scram-sha-256
host     openbmp   openbmp     172.16.0.0/12   scram-sha-256   # Docker bridge range
host     openbmp   grafana_ro  172.16.0.0/12   scram-sha-256
hostssl  openbmp   openbmp     0.0.0.0/0       scram-sha-256   # only if remote DB host
# reject everything else
host     all       all         0.0.0.0/0       reject
```

Identify the actual Compose network subnet with
`docker network inspect obmp_default` and scope `ADDRESS` to it. Reload with
`docker exec obmp-psql psql -U openbmp -c "SELECT pg_reload_conf();"`.

> `scram-sha-256` requires `password_encryption = scram-sha-256` in
> `postgresql.conf` and that passwords were set/rotated *after* that change.

### 3.4 Enable SSL/TLS

The Grafana datasource already requests `sslmode: "require"` — but the server
must actually present a certificate. In `postgresql.conf`:

```conf
ssl = on
ssl_cert_file = '/var/lib/postgresql/server.crt'
ssl_key_file  = '/var/lib/postgresql/server.key'
```

Generate a cert (self-signed is acceptable for an internal DB; use your
internal CA if you have one):

```bash
openssl req -new -x509 -days 825 -nodes -text \
  -out server.crt -keyout server.key -subj "/CN=obmp-psql"
chmod 600 server.key   # PostgreSQL refuses a group- or world-readable key
```

Mount both files into the container at the paths configured above; the key
must be owned by the user PostgreSQL runs as.
For the strongest posture, move clients to `sslmode: verify-full` once a
proper CA chain is in place. This is most important if PostgreSQL runs on a
separate host (the split-host architecture in `production-sizing.md`) —
intra-host Compose traffic is lower-risk but TLS is still recommended.

### 3.5 Limit listen addresses

If PostgreSQL must accept connections from another host (split-host layout),
keep `listen_addresses` scoped — do not leave it at `*` if a single interface
suffices:

```conf
# Example addresses only; each must exist in the network namespace
# PostgreSQL runs in.
listen_addresses = 'localhost,172.18.0.1'   # loopback + Docker bridge gateway
```

Inside a container PostgreSQL can bind only addresses that belong to the
container itself, so under Compose the host-IP part of the `ports:` mapping
(1.2) is usually the more practical way to scope exposure.

On a single-host deployment, drop the `5432` port mapping entirely (1.2) so
the listener is reachable only on the Compose network.

---

## Priority 4 — Drop `privileged: true` on the `psql` service

```yaml
psql:
  privileged: true        # <-- remove or replace
  shm_size: 1536m
  sysctls:
    - net.ipv4.tcp_keepalive_intvl=30
    - net.ipv4.tcp_keepalive_probes=5
    - net.ipv4.tcp_keepalive_time=180
```

**Why it is a risk:** `privileged: true` gives the container *all* Linux
capabilities, disables seccomp/AppArmor confinement, and grants access to all
host devices. A compromise of PostgreSQL — the process most exposed to
untrusted route data — would then be a near-complete host compromise. This is
the single largest container-isolation gap in the stack.

**Why it is probably there:** PostgreSQL needs adequate shared memory and
benefits from the TCP keepalive `sysctls`. The compose file already sets
`shm_size: 1536m` and the `sysctls:` list explicitly — both of which Docker
applies *without* needing privileged mode. So `privileged: true` is most
likely a leftover, not a hard requirement.

**Recommended action — test without it:**

1. In a maintenance window, remove `privileged: true` and start the service.
2. Confirm PostgreSQL starts, the namespaced `sysctls` apply
   (`docker exec obmp-psql sysctl net.ipv4.tcp_keepalive_time`), and shared
   memory is honored (`docker exec obmp-psql df -h /dev/shm` should report
   the configured size; `/proc/meminfo` inside a container reflects the host,
   so it is not a reliable check). Watch for
   `could not resize shared memory segment` errors in the log.
3. If everything is healthy, leave it removed. If a specific capability turns
   out to be needed, add only that one instead of going fully privileged:

   ```yaml
   psql:
     # privileged: true   <-- removed
     shm_size: 1536m
     cap_drop:
       - ALL
     cap_add:
       - CHOWN
       - SETUID
       - SETGID
       - DAC_OVERRIDE
       # add only capabilities proven necessary by testing
     sysctls:
       - net.ipv4.tcp_keepalive_intvl=30
       - net.ipv4.tcp_keepalive_probes=5
       - net.ipv4.tcp_keepalive_time=180
   ```

The `sysctls:` block stays — those are namespaced and do not require
privileged mode.

---

## Priority 5 — Container hardening (defense in depth)

Apply across services after the higher-priority items. Test each service
individually — `read_only` in particular will surface paths a service writes
to that then need explicit `tmpfs` mounts.

### 5.1 `no-new-privileges`

Prevents a process inside a container from gaining privileges via setuid
binaries. Safe to apply to every service:

```yaml
security_opt:
  - no-new-privileges:true
```

### 5.2 Drop capabilities

Most of these services need almost no Linux capabilities. Start from zero and
add back only what breaks:

```yaml
cap_drop:
  - ALL
```

- `grafana`, `whois`, `portal`, `zookeeper` — typically run fine with
  `cap_drop: [ALL]`.
- `collector`, `kafka`, `psql`, `psql-app` — drop ALL, then add back any
  capability proven necessary (see Priority 4 for `psql`).
- `traffic-gen*` legitimately need `NET_RAW`/`NET_ADMIN` (Scapy) — leave those
  `cap_add` entries; they are already minimal.
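
The settings from Priority 4, 5.1 and 5.2 lend themselves to a mechanical check. The sketch below is a hypothetical helper, not part of OpenBMP. It assumes the Compose model has already been parsed into a dictionary, for example from the JSON that `docker compose config --format json` is expected to emit on Compose v2, and the function name `missing_hardening` is made up for illustration.

```python
import json
import sys


def missing_hardening(compose: dict) -> dict:
    """Map each service name to the hardening settings it still lacks."""
    findings = {}
    for name, svc in compose.get("services", {}).items():
        gaps = []
        # Priority 4: privileged mode should be gone.
        if svc.get("privileged"):
            gaps.append("privileged")
        # 5.1: accept "no-new-privileges", ":true" and "=true" spellings.
        opts = {str(o).replace("=", ":").lower() for o in svc.get("security_opt") or []}
        if not opts & {"no-new-privileges", "no-new-privileges:true"}:
            gaps.append("no-new-privileges")
        # 5.2: cap_drop must contain ALL.
        caps = {str(c).upper() for c in svc.get("cap_drop") or []}
        if "ALL" not in caps:
            gaps.append("cap_drop")
        if gaps:
            findings[name] = gaps
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file name): python audit.py compose.json
    with open(sys.argv[1]) as handle:
        report = missing_hardening(json.load(handle))
    for service, gaps in sorted(report.items()):
        print(f"{service}: missing {', '.join(gaps)}")
    sys.exit(1 if report else 0)
```

The check reports what is configured, not what is safe: services such as `traffic-gen*` that legitimately keep `cap_add` entries still pass as long as they also drop `ALL` first.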
### 5.3 Read-only root filesystem

Make the root filesystem immutable where the service only writes to known
volumes:

```yaml
grafana:
  read_only: true
  tmpfs:
    - /tmp
  # /var/lib/grafana is already a bind mount — writes go there, not to rootfs

portal:
  read_only: true   # nginx:alpine static site; add tmpfs for nginx
  tmpfs:
    - /tmp
    - /var/cache/nginx
    - /var/run
```

`read_only` is straightforward for `grafana`, `portal`, and `whois`. It is
trickier for `psql`, `kafka`, and `zookeeper` (they write to data volumes but
also expect a writable rootfs in places) — test individually and add `tmpfs`
mounts for any write paths, or skip `read_only` for those and rely on
`cap_drop` + `no-new-privileges`.

### 5.4 Pin and scan images

Images are already version-pinned (`grafana:9.1.7`, `cp-kafka:7.1.1`,
`openbmp/postgres:2.2.1`, etc.) — good. Add periodic vulnerability scanning:

```bash
trivy image openbmp/postgres:2.2.1
trivy image grafana/grafana:9.1.7
```

Note Grafana 9.1.7 is old; review Grafana security advisories and plan an
upgrade path. Track CVEs for the pinned Confluent and OpenBMP images too.

### 5.5 Resource limits

Every service already has a `mem_limit`. For production also set `cpus:` (or
`deploy.resources.limits`) so a runaway query or ingest burst cannot starve
the host — this also mitigates local denial-of-service. See
`docs/production-sizing.md` for target values.

---

## Priority 6 — Authelia / access control

Authelia fronts Grafana (ROADMAP C5). For production:

- Enforce **TOTP / 2FA** for all operator accounts; do not allow `one_factor`
  for the Grafana route.
- Set short session timeouts and an inactivity expiry in the Authelia config.
- Use strong, unique passwords; back the user store with your IdP / LDAP if
  available rather than the file backend.
- Ensure Authelia's own secrets (`jwt_secret`, `session.secret`,
  `storage.encryption_key`) are strong and stored as secrets, not literals.
- Confirm the reverse proxy strips any client-supplied `Remote-User` header
  before Authelia sets it — otherwise the auth-proxy trust model is bypassable
  (see 1.3).

---

## Quick checklist

- [ ] Rotate all default credentials; remove literals from compose, move to `.env` / secrets manager
- [ ] Update `openbmp-ds.yml` datasource password to match
- [ ] Firewall BMP port 5000 to router management subnets (`DOCKER-USER` chain)
- [ ] Bind 5432 / 8086 / 4300 to loopback or drop the port mappings
- [ ] Bind Grafana 3000 to loopback; reach it only via Authelia
- [ ] Remove the Kafka `PLAINTEXT_HOST` listener + 9092 mapping (or enable SASL_SSL if external access needed)
- [ ] Create `grafana_ro` least-privilege DB role; repoint the datasource
- [ ] Tighten `pg_hba.conf`; require `scram-sha-256`
- [ ] Enable PostgreSQL `ssl = on` with a server certificate
- [ ] Test removing `privileged: true` from `psql`; replace with specific `cap_add` if needed
- [ ] Add `security_opt: [no-new-privileges:true]` to all services
- [ ] Add `cap_drop: [ALL]` and add back only required capabilities
- [ ] Add `read_only: true` + `tmpfs` to `grafana` / `portal` / `whois`
- [ ] Add `cpus:` limits per service
- [ ] Scan images with `trivy`; plan a Grafana upgrade off 9.1.7
- [ ] Enforce TOTP and short sessions in Authelia
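
The loopback-binding items in the checklist can be spot-checked after the edits are applied. This is a rough sketch under stated assumptions: it expects the long-form port entries (`published`, `host_ip`) that a normalized Compose model is assumed to contain, it skips short-syntax strings, and the function name `non_loopback_ports` and the default allow-list of port 5000 (the BMP ingest port, which is firewalled instead) are hypothetical choices.

```python
import ipaddress


def _is_loopback(host_ip: str) -> bool:
    """True only for a parseable loopback address such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host_ip).is_loopback
    except ValueError:
        return False


def non_loopback_ports(compose: dict, allowed=("5000",)) -> list:
    """List (service, published port, host IP) for ports not bound to loopback."""
    findings = []
    for name, svc in sorted(compose.get("services", {}).items()):
        for port in svc.get("ports") or []:
            if not isinstance(port, dict):
                continue  # short-syntax strings are not handled in this sketch
            published = str(port.get("published", ""))
            # A missing host_ip means the port is published on all interfaces.
            host_ip = port.get("host_ip") or "0.0.0.0"
            if not published or published in allowed or _is_loopback(host_ip):
                continue
            findings.append((name, published, host_ip))
    return findings
```

An empty result means every published port other than the allow-listed ones is bound to loopback. It says nothing about firewall rules, so the `DOCKER-USER` item for port 5000 still needs to be verified on the host.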