obmp-docker/docs/security-hardening.md
sam 0732ebfa07 Add production-readiness deliverables: security, backup, alerting
Adds a prioritized security-hardening checklist, a PostgreSQL logical-backup
script (pg-backup.sh) with a documented restore procedure, and Grafana
alerting provisioning (peer-down, flap-storm, RPKI-invalid, router-down rules
plus a contact-point template). The alerting YAML and contact points need
operator review before being relied on for paging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:55:03 -07:00

OpenBMP Production Security Hardening

A prioritized checklist for hardening the OpenBMP Docker stack before exposing it to a production ISP network of 40 full-table-edge routers. Work top to bottom — items are ordered roughly by risk reduction per unit effort.

This document recommends changes. It does not modify docker-compose.yml or any running service. Apply the changes in a maintenance window and test.

Threat model in brief: the stack ingests BMP from production routers, stores the full DFZ in PostgreSQL, and exposes Grafana to operators. The crown jewels are (a) the database, (b) the Grafana admin plane, and (c) the BMP ingest port. Everything below protects one of those three.


Priority 0 — Credentials (do this first)

Every service currently ships with the placeholder credential openbmp (or a close variant), and these defaults are committed in docker-compose.yml:

| Service | Setting | Current value |
|---|---|---|
| PostgreSQL | POSTGRES_USER / POSTGRES_PASSWORD | openbmp / openbmp |
| psql-app | POSTGRES_PASSWORD | openbmp |
| whois | POSTGRES_PASSWORD | openbmp |
| Grafana | GF_SECURITY_ADMIN_PASSWORD | openbmp |
| InfluxDB | DOCKER_INFLUXDB_INIT_PASSWORD | openbmp123 |
| InfluxDB | DOCKER_INFLUXDB_INIT_ADMIN_TOKEN | openbmp-telemetry-token |
| Grafana datasource | secureJsonData.password | openbmp (in openbmp-ds.yml) |
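
To confirm that no placeholder survives after the cleanup, a small search helper can be useful. The following is a minimal sketch, not part of the repository: the function name find_default_creds is hypothetical, and the pattern only covers the key styles listed in the table above (settings whose name contains PASSWORD, TOKEN, or password and whose value starts with openbmp).

```shell
# Sketch: list lines that still assign a value starting with the placeholder
# "openbmp" to a password/token setting. find_default_creds is a hypothetical
# helper name; pass it any config files you want to check.
find_default_creds() {
    # -n: line numbers, -H: always print the file name, -E: extended regex.
    # Matches e.g. "POSTGRES_PASSWORD=openbmp" and "password: openbmp",
    # but not "${POSTGRES_PASSWORD:?set in .env}" references.
    grep -nHE '(PASSWORD|TOKEN|password)[^=:]*[=:][[:space:]]*"?openbmp' "$@"
}
```

A possible invocation is find_default_creds docker-compose.yml obmp-grafana/provisioning/datasources/openbmp-ds.yml; no output (and a non-zero exit status from grep) means no match was found. It does not flag POSTGRES_USER, since a username is not a secret.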

0.1 Move every secret to .env (or a secrets manager)

.env is git-ignored. As a minimum, replace the hardcoded literals in docker-compose.yml with ${VAR} references and define them in .env:

# .env  — never commit this file
POSTGRES_PASSWORD=<long-random-string>
GF_SECURITY_ADMIN_PASSWORD=<long-random-string>
INFLUXDB_ADMIN_PASSWORD=<long-random-string>
INFLUXDB_ADMIN_TOKEN=<long-random-token>

# docker-compose.yml (recommended edit; the operator applies it)
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:?set in .env}
  psql:
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:?set in .env}

The :? form makes the stack fail fast if a secret is missing rather than silently falling back to a default.

Generate strong values:

openssl rand -base64 32        # passwords
openssl rand -hex 32           # tokens
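
The two steps above (generate values, store them in .env) can be combined. This is a sketch under stated assumptions: the function names gen_hex and write_env_file are hypothetical, the variable names are the ones from the .env example above, and hex is used for every value (instead of base64 for passwords) so that no character needs quoting.

```shell
# Sketch: write a fresh .env with random secrets and owner-only permissions.
# gen_hex and write_env_file are hypothetical helper names.

gen_hex() {
    # 32 random bytes as 64 hex characters; falls back to /dev/urandom
    # when openssl is not installed.
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex 32
    else
        od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
        echo
    fi
}

write_env_file() {
    # $1: path of the .env file to create
    local path="$1"
    # Refuse to overwrite, so live secrets are never silently replaced.
    if [ -e "$path" ]; then
        echo "refusing to overwrite $path" >&2
        return 1
    fi
    (
        umask 077   # the file is created 0600, never briefly world-readable
        {
            echo "POSTGRES_PASSWORD=$(gen_hex)"
            echo "GF_SECURITY_ADMIN_PASSWORD=$(gen_hex)"
            echo "INFLUXDB_ADMIN_PASSWORD=$(gen_hex)"
            echo "INFLUXDB_ADMIN_TOKEN=$(gen_hex)"
        } > "$path"
    )
}
```

Calling write_env_file .env in the repository root would produce the file; remember that changing POSTGRES_PASSWORD here does not change the password of an already-initialized database.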

0.2 For a real production deployment, use a secrets manager

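.env on disk is better than committed literals, but it is still a plaintext file: anyone who can read the file sees every secret, and anyone in the docker group can read the resulting environment variables with docker inspect. For production: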

  • Docker Compose secrets (secrets: block, files mounted at /run/secrets/...) — the lowest-friction upgrade; keep the secret files outside the repo, chmod 600, owned by root.
  • HashiCorp Vault, AWS Secrets Manager, Bitwarden Secrets, or your existing ISP secret store — inject at deploy time via a wrapper that renders .env from the vault and shreds it after docker compose up.

Whatever the choice: rotate every credential in the table above on first production deploy. They have been in git history as openbmp (or a variant of it) and must be considered compromised.

0.3 Rotate the Grafana datasource password in lockstep

obmp-grafana/provisioning/datasources/openbmp-ds.yml carries secureJsonData.password, which Grafana reads at startup. When you change the PostgreSQL password, update this file too and restart Grafana. The file supports $__file{} and environment-variable expansion (for example password: $POSTGRES_PASSWORD); for the latter, the variable must be set in the Grafana container's environment, not only in .env.


Priority 1 — Network exposure / firewalling

The host currently publishes these ports to 0.0.0.0: 5000 (BMP), 5432 (PostgreSQL), 9092 (Kafka), 3000 (Grafana), 8086 (InfluxDB), 4300 (whois), 9091 (Authelia). Most should not be world-reachable.

1.1 BMP collector (port 5000) — restrict to router management subnets

The collector accepts a BMP session from any source. A rogue BMP feed can inject bogus routers/peers/prefixes into the database. Firewall it to the router management subnets only.

nftables example (preferred on modern hosts):

# /etc/nftables.conf  — adjust subnets to your router management ranges
table inet obmp {
    chain input {
        type filter hook input priority 0; policy accept;

        # BMP ingest — only from router management subnets
        tcp dport 5000 ip saddr { 10.100.0.0/24, 10.100.1.0/24 } accept
        tcp dport 5000 drop
    }
}

iptables equivalent:

iptables -A INPUT -p tcp --dport 5000 -s 10.100.0.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 5000 -s 10.100.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 5000 -j DROP

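Docker publishes container ports with DNAT, so external traffic to them normally traverses the FORWARD path and does not reach the INPUT chain (or an nftables input hook). The rules above therefore protect only a collector that runs with host networking. For container-published ports, put the rules in the DOCKER-USER chain, which Docker evaluates before its own forwarding rules. Note that --dport there matches the port after DNAT, that is, the container-side port: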

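# -I inserts at the top of the chain, so add the DROP first and the allow rules after it.
# Appending the DROP with -A can place it below a catch-all RETURN, where it never matches.
iptables -I DOCKER-USER -p tcp --dport 5000 -j DROP
iptables -I DOCKER-USER -p tcp --dport 5000 -s 10.100.1.0/24 -j RETURN
iptables -I DOCKER-USER -p tcp --dport 5000 -s 10.100.0.0/24 -j RETURN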
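
With 40 routers the list of management subnets tends to grow, so it can help to generate the DOCKER-USER rules from a list instead of typing them. The sketch below only prints the commands for review; it does not run them. The function name emit_docker_user_rules is hypothetical. It emits the DROP first and uses -I for every rule, so each allow rule lands above the DROP regardless of what the chain already contains.

```shell
# Sketch: print (not execute) DOCKER-USER rules that allow a TCP port only
# from the given source subnets. emit_docker_user_rules is a hypothetical name.
emit_docker_user_rules() {
    # $1: container-side TCP port, remaining args: allowed source subnets
    if [ "$#" -lt 2 ]; then
        echo "usage: emit_docker_user_rules PORT SUBNET [SUBNET...]" >&2
        return 1
    fi
    local port="$1"
    shift
    # The DROP goes in first; every later -I is inserted above it.
    echo "iptables -I DOCKER-USER -p tcp --dport $port -j DROP"
    local subnet
    for subnet in "$@"; do
        echo "iptables -I DOCKER-USER -p tcp --dport $port -s $subnet -j RETURN"
    done
}
```

For example, emit_docker_user_rules 5000 10.100.0.0/24 10.100.1.0/24 prints three commands; after review they can be piped to sh as root. The rules are not persistent across reboots unless saved by whatever mechanism the host uses for firewall state.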

1.2 PostgreSQL (5432), Kafka (9092), InfluxDB (8086), whois (4300)

None of these need to be reachable from outside the stack:

  • PostgreSQL — only psql-app, whois, and grafana connect, all on the Compose network. Bind the published port to loopback only, or drop the ports: mapping entirely:
    # docker-compose.yml — psql service
    ports:
      - "127.0.0.1:5432:5432"   # localhost only; or remove entirely
    
  • Kafka 9092 — see Priority 2.
  • InfluxDB 8086: only Grafana and Telegraf use it. Grafana reaches it on the Compose network, but Telegraf uses host networking and reaches it via localhost, so bind the port to loopback (127.0.0.1:8086:8086) rather than dropping the mapping.
  • whois 4300 — expose only if you actually offer a public whois service; otherwise bind to loopback.

For anything that genuinely must be reachable, restrict by source with the firewall pattern from 1.1.
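
A quick way to find mappings that still listen on every interface is to search the Compose file for short-form port entries without a host IP. This is a heuristic sketch, not a YAML parser: the function name find_wide_open_ports is hypothetical, and it only recognizes the short "HOST:CONTAINER" syntax (optionally prefixed with 0.0.0.0), not the long-form ports: objects.

```shell
# Sketch: list short-form port mappings that bind to all interfaces.
# find_wide_open_ports is a hypothetical helper name.
find_wide_open_ports() {
    # $1: path to a Compose file
    # Matches  - "5432:5432"  /  - 9092:9092  /  - "0.0.0.0:8086:8086"
    # but not  - "127.0.0.1:5432:5432"  (a specific host IP is given).
    grep -nE '^[[:space:]]*-[[:space:]]*"?(0\.0\.0\.0:)?[0-9]+:[0-9]+(/(tcp|udp))?"?[[:space:]]*(#.*)?$' "$1"
}
```

Running find_wide_open_ports docker-compose.yml prints each offending line with its line number. After the changes in this section, only mappings you have deliberately left public should remain in the output.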

1.3 Grafana (3000) — keep it behind Authelia

Authelia already fronts Grafana (the auth profile + GF_AUTH_PROXY_* settings). Make that the only path:

  • Bind Grafana's published port to loopback: 127.0.0.1:3000:3000, and let the reverse proxy / Authelia terminate TLS and reach it internally.
  • Do not leave port 3000 directly reachable — GF_AUTH_PROXY_ENABLED=true trusts the Remote-User header, so any client that can reach 3000 directly and set that header bypasses authentication entirely.

Priority 2 — Kafka transport security

Kafka is currently PLAINTEXT and advertises a host-IP listener:

KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092,PLAINTEXT_HOST://${HOST_IP}:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT

The obmp-kafka:29092 listener is internal to the Compose network and is the only one the collector and psql-app use. The PLAINTEXT_HOST://...:9092 listener exists only for outside access and is not needed by the core stack.

Recommended (simplest, most secure): remove the host listener. If nothing outside the Compose network consumes Kafka, drop the 9092 port mapping and the PLAINTEXT_HOST advertised listener so Kafka is reachable only on the internal Docker network:

  kafka:
    # remove the  - "9092:9092"  ports entry
    environment:
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT

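If external Kafka access is genuinely required (e.g. a separate analytics consumer, or the split-host architecture in production-sizing.md where Kafka and the DB are on different hosts), do not leave it PLAINTEXT on a routed network. Enable SASL_SSL on the external listener. The settings below cover the listener and keystore side only; the broker also needs a JAAS configuration and SCRAM credentials for each client, and the exact environment-variable names depend on the Kafka image in use: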

KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092,SASL_SSL://${HOST_IP}:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,SASL_SSL:SASL_SSL
KAFKA_SASL_ENABLED_MECHANISMS: SCRAM-SHA-512
KAFKA_SSL_KEYSTORE_LOCATION: /etc/kafka/secrets/kafka.keystore.jks
KAFKA_SSL_KEYSTORE_PASSWORD: ${KAFKA_KEYSTORE_PASSWORD}
KAFKA_SSL_KEY_PASSWORD: ${KAFKA_KEY_PASSWORD}
KAFKA_SSL_TRUSTSTORE_LOCATION: /etc/kafka/secrets/kafka.truststore.jks
KAFKA_SSL_TRUSTSTORE_PASSWORD: ${KAFKA_TRUSTSTORE_PASSWORD}

Keep the internal PLAINTEXT://obmp-kafka:29092 listener for the collector and psql-app — intra-Compose traffic on a private bridge does not need TLS and adding SASL there means re-configuring both clients. At minimum, never publish a PLAINTEXT Kafka listener on an IP that routes beyond the host.
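
The rule "no PLAINTEXT listener beyond the internal hostname" can be checked mechanically from the two environment values. The following is a minimal sketch; exposed_plaintext_listeners is a hypothetical name, and it assumes the values contain already-expanded hosts (no ${HOST_IP} placeholders) and no shell glob characters.

```shell
# Sketch: print every advertised listener whose security protocol is
# PLAINTEXT and whose host is not the internal Compose hostname.
# exposed_plaintext_listeners is a hypothetical helper name.
exposed_plaintext_listeners() {
    # $1: value of KAFKA_ADVERTISED_LISTENERS
    # $2: value of KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
    # $3: hostname that is only resolvable inside the stack (e.g. obmp-kafka)
    local advertised="$1" map="$2" internal_host="$3"
    local entry pair name host proto
    local IFS=','               # split both values on commas
    for entry in $advertised; do
        name="${entry%%://*}"   # listener name, e.g. PLAINTEXT_HOST
        host="${entry#*://}"    # host:port
        host="${host%:*}"       # host only
        proto=""
        for pair in $map; do
            [ "${pair%%:*}" = "$name" ] && proto="${pair#*:}"
        done
        if [ "$proto" = "PLAINTEXT" ] && [ "$host" != "$internal_host" ]; then
            echo "$name://$host"
        fi
    done
}
```

With the current defaults and an example host address of 192.0.2.10, the function reports the PLAINTEXT_HOST listener; with either recommended configuration above it prints nothing.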


Priority 3 — PostgreSQL hardening

3.1 Change the default openbmp / openbmp credentials

Covered in Priority 0. Note that POSTGRES_USER/POSTGRES_PASSWORD only take effect when the data directory is initialized. To rotate on an existing database, change the password in SQL and update every consumer:

docker exec -it obmp-psql psql -U openbmp -d openbmp \
  -c "ALTER ROLE openbmp WITH PASSWORD '<new-strong-password>';"

Then update POSTGRES_PASSWORD for psql-app and whois, the secureJsonData.password in openbmp-ds.yml, and restart those services.
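
When the new password comes from a variable, quoting it by hand inside -c "..." is error-prone, and the statement also shows up in the host's process list. One option, sketched below, is to build the statement with proper SQL literal quoting and feed it to psql on standard input. The function names sql_literal and build_alter_role_sql are hypothetical.

```shell
# Sketch: build an ALTER ROLE statement with the password safely quoted.
# sql_literal and build_alter_role_sql are hypothetical helper names.

sql_literal() {
    # Wrap $1 in single quotes, doubling any embedded single quote
    # (standard SQL string-literal escaping).
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

build_alter_role_sql() {
    # $1: role name (letters, digits, underscore only), $2: new password
    case "$1" in
        ''|*[!A-Za-z0-9_]*)
            echo "invalid role name: $1" >&2
            return 1
            ;;
    esac
    printf 'ALTER ROLE %s WITH PASSWORD %s;\n' "$1" "$(sql_literal "$2")"
}
```

A possible use is build_alter_role_sql openbmp "$NEW_PASSWORD" | docker exec -i obmp-psql psql -U openbmp -d openbmp, where NEW_PASSWORD is a variable you set beforehand; the consumers listed above still need updating afterwards.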

3.2 Create a least-privilege role for Grafana

Grafana only needs to read. Do not let it connect as the owning role:

CREATE ROLE grafana_ro LOGIN PASSWORD '<strong-password>';
GRANT CONNECT ON DATABASE openbmp TO grafana_ro;
GRANT USAGE ON SCHEMA public TO grafana_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO grafana_ro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO grafana_ro;

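Run these statements as the role that owns the tables (openbmp here): ALTER DEFAULT PRIVILEGES only covers objects later created by the role that issues it. Then point openbmp-ds.yml at grafana_ro. This limits a Grafana compromise to read-only access and blocks writes from SQL panels.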

3.3 Restrict pg_hba.conf

The default OpenBMP image is permissive (host all all all md5 or similar). Tighten it so only the stack's own subnet can connect, and require scram-sha-256:

# pg_hba.conf  (inside the obmp-psql container / mounted)
# TYPE  DATABASE  USER        ADDRESS              METHOD
local   all       all                              scram-sha-256
host    openbmp   openbmp     172.16.0.0/12        scram-sha-256   # Docker bridge range
host    openbmp   grafana_ro  172.16.0.0/12        scram-sha-256
hostssl openbmp   openbmp     0.0.0.0/0            scram-sha-256   # only if remote DB host; narrow to that host's address
# reject everything else (IPv4 and IPv6)
host    all       all         0.0.0.0/0            reject
host    all       all         ::/0                 reject

Identify the actual Compose network subnet with docker network inspect obmp_default and scope ADDRESS to it. Reload with docker exec obmp-psql psql -U openbmp -c "SELECT pg_reload_conf();".

scram-sha-256 requires password_encryption = scram-sha-256 in postgresql.conf and that passwords were set/rotated after that change.
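
Because the Compose subnet differs between hosts, the host lines are easier to generate than to hard-code. The sketch below renders only the local and Compose-network entries for a given IPv4 subnet; the function name render_pg_hba is hypothetical, and it assumes the grafana_ro role from 3.2 exists. It relies on PostgreSQL rejecting any connection that matches no entry, so it emits no explicit reject line.

```shell
# Sketch: print pg_hba.conf entries scoped to one Compose subnet.
# render_pg_hba is a hypothetical helper name.
render_pg_hba() {
    # $1: IPv4 subnet in CIDR form, e.g. 172.18.0.0/16
    local subnet="$1"
    case "$subnet" in
        ''|*[!0-9./]*)
            echo "expected an IPv4 CIDR, got: $subnet" >&2
            return 1
            ;;
        */*) ;;
        *)
            echo "missing /prefix in: $subnet" >&2
            return 1
            ;;
    esac
    cat <<EOF
local   all       all                       scram-sha-256
host    openbmp   openbmp     $subnet   scram-sha-256
host    openbmp   grafana_ro  $subnet   scram-sha-256
EOF
}
```

The subnet can be read with docker network inspect obmp_default --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}' (adjust the network name if yours differs) and passed to render_pg_hba; review the output before mounting it into the container.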

3.4 Enable SSL/TLS

The Grafana datasource already requests sslmode: "require" — but the server must actually present a certificate. In postgresql.conf:

ssl = on
ssl_cert_file = '/var/lib/postgresql/server.crt'
ssl_key_file  = '/var/lib/postgresql/server.key'

Generate a cert (self-signed is acceptable for an internal DB; use your internal CA if you have one):

openssl req -new -x509 -days 825 -nodes -text \
  -out server.crt -keyout server.key -subj "/CN=obmp-psql"
chmod 600 server.key       # PostgreSQL refuses a key that is group- or world-readable

Mount both files into the container at the paths named in ssl_cert_file and ssl_key_file, with the key owned by the user PostgreSQL runs as (or by root). For the strongest posture, move clients to sslmode: verify-full once a proper CA chain is in place. This matters most if PostgreSQL runs on a separate host (the split-host architecture in production-sizing.md); intra-host Compose traffic is lower-risk, but TLS is still recommended.
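
A wrong key mode is a common reason for PostgreSQL failing to start after enabling TLS, so it is worth checking before the restart. The sketch below is deliberately stricter than PostgreSQL itself: it accepts only owner-only modes, whereas PostgreSQL also accepts a root-owned key with group read access. The function name key_mode_ok is hypothetical.

```shell
# Sketch: succeed only if the key file is accessible by its owner alone.
# key_mode_ok is a hypothetical helper name.
key_mode_ok() {
    # $1: path to the private key file
    [ -f "$1" ] || return 2
    # The first ten characters of "ls -l" are the file type and mode bits.
    case "$(ls -ld "$1" | cut -c1-10)" in
        -rw-------|-r--------) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, key_mode_ok server.key || chmod 600 server.key fixes the mode only when needed. Ownership is not checked here and still has to match the user PostgreSQL runs as inside the container.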

3.5 Limit listen addresses

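If PostgreSQL must accept connections from another host (split-host layout), keep the listener scoped; do not leave it at * if a single interface suffices. The example below applies to PostgreSQL running directly on a host. Inside a container, listen_addresses refers to the container's own interfaces, so a host-side address such as the Docker bridge gateway cannot be bound there; scope exposure with the host-IP part of the ports: mapping instead.

listen_addresses = 'localhost,172.18.0.1'   # loopback + one specific interface (example address)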

On a single-host deployment, drop the 5432 port mapping entirely (1.2) so the listener is reachable only on the Compose network.


Priority 4 — Drop privileged: true on the psql service

  psql:
    privileged: true        # <-- remove or replace
    shm_size: 1536m
    sysctls:
      - net.ipv4.tcp_keepalive_intvl=30
      - net.ipv4.tcp_keepalive_probes=5
      - net.ipv4.tcp_keepalive_time=180

Why it is a risk: privileged: true gives the container all Linux capabilities, disables seccomp/AppArmor confinement, and grants access to all host devices. A compromise of PostgreSQL — the process most exposed to untrusted route data — would then be a near-complete host compromise. This is the single largest container-isolation gap in the stack.

Why it is probably there: PostgreSQL needs adequate shared memory and benefits from the TCP keepalive sysctls. The compose file already sets shm_size: 1536m and the sysctls: list explicitly — both of which Docker applies without needing privileged mode. So privileged: true is most likely a leftover, not a hard requirement.

Recommended action — test without it:

  1. In a maintenance window, remove privileged: true and start the service.
  2. Confirm PostgreSQL starts, the namespaced sysctls apply (docker exec obmp-psql sysctl net.ipv4.tcp_keepalive_time), and shared memory is honored (docker exec obmp-psql df -h /dev/shm should report the 1536m size; /proc/meminfo is not namespaced and shows host-wide values). Also watch for could not resize shared memory segment errors in the log.
  3. If everything is healthy, leave it removed.

If a specific capability turns out to be needed, add only that one instead of going fully privileged:

  psql:
    # privileged: true   <-- removed
    shm_size: 1536m
    cap_drop:
      - ALL
    cap_add:
      - CHOWN
      - SETUID
      - SETGID
      - DAC_OVERRIDE       # add only capabilities proven necessary by testing
    sysctls:
      - net.ipv4.tcp_keepalive_intvl=30
      - net.ipv4.tcp_keepalive_probes=5
      - net.ipv4.tcp_keepalive_time=180

The sysctls: block stays — those are namespaced and do not require privileged mode.


Priority 5 — Container hardening (defense in depth)

Apply across services after the higher-priority items. Test each service individually — read_only in particular will surface paths a service writes to that then need explicit tmpfs mounts.

5.1 no-new-privileges

Prevents a process inside a container from gaining privileges via setuid binaries. Safe to apply to every service:

    security_opt:
      - no-new-privileges:true

5.2 Drop capabilities

Most of these services need almost no Linux capabilities. Start from zero and add back only what breaks:

    cap_drop:
      - ALL

  • grafana, whois, portal, zookeeper: typically run fine with cap_drop: [ALL].
  • collector, kafka, psql, psql-app — drop ALL, then add back any capability proven necessary (see Priority 4 for psql).
  • traffic-gen* legitimately need NET_RAW/NET_ADMIN (Scapy) — leave those cap_add entries; they are already minimal.

5.3 Read-only root filesystem

Make the root filesystem immutable where the service only writes to known volumes:

  grafana:
    read_only: true
    tmpfs:
      - /tmp
    # /var/lib/grafana is already a bind mount — writes go there, not to rootfs

  portal:
    read_only: true        # nginx:alpine static site; add tmpfs for nginx
    tmpfs:
      - /tmp
      - /var/cache/nginx
      - /var/run

read_only is straightforward for grafana, portal, and whois. It is trickier for psql, kafka, and zookeeper (they write to data volumes but also expect a writable rootfs in places) — test individually and add tmpfs mounts for any write paths, or skip read_only for those and rely on cap_drop + no-new-privileges.
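
Since these options are applied service by service, it helps to track which services still lack them. The following is a rough sketch for no-new-privileges only; services_missing_nnp is a hypothetical name. It is a line-based heuristic, not a YAML parser: it assumes service names are indented by exactly two spaces under a top-level services: key and that the option is written as no-new-privileges:true.

```shell
# Sketch: print the Compose services that do not set no-new-privileges:true.
# services_missing_nnp is a hypothetical helper name.
services_missing_nnp() {
    # $1: path to a Compose file
    awk '
        /^services:[ \t]*$/ { in_services = 1; next }
        # Any other top-level key ends the services section.
        in_services && /^[^ \t#]/ { in_services = 0 }
        # A key at two-space indent starts a new service block.
        in_services && /^  [A-Za-z0-9_.-]+:[ \t]*$/ {
            if (name != "" && !hardened) print name
            name = $1
            sub(/:$/, "", name)
            hardened = 0
            next
        }
        # Ignore commented-out lines when looking for the option.
        in_services && !/^[ \t]*#/ && /no-new-privileges:true/ { hardened = 1 }
        END { if (name != "" && !hardened) print name }
    ' "$1"
}
```

Running services_missing_nnp docker-compose.yml after each change gives a shrinking to-do list; an empty result means every service block contains the option. The same pattern could be adapted for cap_drop or read_only.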

5.4 Pin and scan images

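Images are already pinned by version tag (grafana:9.1.7, cp-kafka:7.1.1, openbmp/postgres:2.2.1, etc.), which is a good start. Tags are mutable, so pin by digest (image@sha256:<digest>) where strict reproducibility matters. Add periodic vulnerability scanning: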

trivy image openbmp/postgres:2.2.1
trivy image grafana/grafana:9.1.7

Note Grafana 9.1.7 is old; review Grafana security advisories and plan an upgrade path. Track CVEs for the pinned Confluent and OpenBMP images too.
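
To keep the scan list in step with the Compose file, the image names can be extracted instead of being typed by hand. This is a minimal sketch; list_images is a hypothetical name, and it assumes each image is declared on a single image: line.

```shell
# Sketch: print each distinct image referenced in a Compose file.
# list_images is a hypothetical helper name.
list_images() {
    # $1: path to a Compose file
    # Captures the value after "image:", dropping optional double quotes
    # and any trailing comment, then de-duplicates.
    sed -nE 's/^[[:space:]]*image:[[:space:]]*"?([^"[:space:]#]+)"?.*$/\1/p' "$1" | sort -u
}
```

A periodic job could then run list_images docker-compose.yml | while read -r img; do trivy image --severity HIGH,CRITICAL "$img"; done and alert on findings; the severity filter is one possible choice, not a requirement.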

5.5 Resource limits

Every service already has a mem_limit. For production also set cpus: (or deploy.resources.limits) so a runaway query or ingest burst cannot starve the host — this also mitigates local denial-of-service. See docs/production-sizing.md for target values.


Priority 6 — Authelia / access control

Authelia fronts Grafana (ROADMAP C5). For production:

  • Enforce TOTP / 2FA for all operator accounts; do not allow one_factor for the Grafana route.
  • Set short session timeouts and an inactivity expiry in the Authelia config.
  • Use strong, unique passwords; back the user store with your IdP / LDAP if available rather than the file backend.
  • Ensure Authelia's own secrets (jwt_secret, session.secret, storage.encryption_key) are strong and stored as secrets, not literals.
  • Confirm the reverse proxy strips any client-supplied Remote-User header before Authelia sets it — otherwise the auth-proxy trust model is bypassable (see 1.3).

Quick checklist

  • Rotate every default credential; remove literals from compose, move to .env / secrets manager
  • Update openbmp-ds.yml datasource password to match
  • Firewall BMP port 5000 to router management subnets (DOCKER-USER chain)
  • Bind 5432 / 8086 / 4300 to loopback or drop the port mappings
  • Bind Grafana 3000 to loopback; reach it only via Authelia
  • Remove the Kafka PLAINTEXT_HOST listener + 9092 mapping (or enable SASL_SSL if external access needed)
  • Create grafana_ro least-privilege DB role; repoint the datasource
  • Tighten pg_hba.conf; require scram-sha-256
  • Enable PostgreSQL ssl = on with a server certificate
  • Test removing privileged: true from psql; replace with specific cap_add if needed
  • Add security_opt: [no-new-privileges:true] to all services
  • Add cap_drop: [ALL] and add back only required capabilities
  • Add read_only: true + tmpfs to grafana / portal / whois
  • Add cpus: limits per service
  • Scan images with trivy; plan a Grafana upgrade off 9.1.7
  • Enforce TOTP and short sessions in Authelia