OpenBMP Production Security Hardening
A prioritized checklist for hardening the OpenBMP Docker stack before exposing it to a production ISP network of 40 full-table-edge routers. Work top to bottom — items are ordered roughly by risk reduction per unit effort.
This document recommends changes. It does not modify docker-compose.yml
or any running service. Apply the changes in a maintenance window and test.
Threat model in brief: the stack ingests BMP from production routers, stores the full DFZ in PostgreSQL, and exposes Grafana to operators. The crown jewels are (a) the database, (b) the Grafana admin plane, and (c) the BMP ingest port. Everything below protects one of those three.
Priority 0 — Credentials (do this first)
Every service currently ships with the placeholder credential openbmp or a
close variant of it, and these defaults are committed in docker-compose.yml:
| Service | Setting | Current value |
|---|---|---|
| PostgreSQL | POSTGRES_USER / POSTGRES_PASSWORD | openbmp / openbmp |
| psql-app | POSTGRES_PASSWORD | openbmp |
| whois | POSTGRES_PASSWORD | openbmp |
| Grafana | GF_SECURITY_ADMIN_PASSWORD | openbmp |
| InfluxDB | DOCKER_INFLUXDB_INIT_PASSWORD | openbmp123 |
| InfluxDB | DOCKER_INFLUXDB_INIT_ADMIN_TOKEN | openbmp-telemetry-token |
| Grafana datasource | secureJsonData.password | openbmp (in openbmp-ds.yml) |
0.1 Move every secret to .env (or a secrets manager)
.env is git-ignored. As a minimum, replace the hardcoded literals in
docker-compose.yml with ${VAR} references and define them in .env:
# .env — never commit this file
POSTGRES_PASSWORD=<long-random-string>
GF_SECURITY_ADMIN_PASSWORD=<long-random-string>
INFLUXDB_ADMIN_PASSWORD=<long-random-string>
INFLUXDB_ADMIN_TOKEN=<long-random-token>
# docker-compose.yml (recommended edit; operator applies)
grafana:
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:?set in .env}
psql:
  environment:
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:?set in .env}
The :? form makes the stack fail fast if a secret is missing rather than
silently falling back to a default.
Generate strong values:
openssl rand -base64 32 # passwords
openssl rand -hex 32 # tokens
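One way to generate the whole file in a single step is sketched below. The helper names write_env_file and rand_hex are hypothetical and not part of OpenBMP; the variable names are the ones from the .env example above. As a simplification, it uses hex for every value so that nothing needs quoting, and it falls back to /dev/urandom if openssl is not installed.

```shell
# Sketch: write a new .env with random secrets, readable by its owner only.
# write_env_file and rand_hex are hypothetical helpers, not part of OpenBMP.
rand_hex() {
  # 32 random bytes as 64 hex characters; falls back to /dev/urandom without openssl.
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

write_env_file() {
  local target="${1:-.env}"
  # Refuse to overwrite, so secrets already in use are not silently replaced.
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  (
    umask 077   # the file is created with mode 600
    {
      echo "POSTGRES_PASSWORD=$(rand_hex)"
      echo "GF_SECURITY_ADMIN_PASSWORD=$(rand_hex)"
      echo "INFLUXDB_ADMIN_PASSWORD=$(rand_hex)"
      echo "INFLUXDB_ADMIN_TOKEN=$(rand_hex)"
    } > "$target"
  )
}
```

Calling write_env_file with no argument targets .env in the current directory. The refusal to overwrite is deliberate: regenerating secrets for a running stack without also rotating them in PostgreSQL and InfluxDB would lock the services out.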
0.2 For a real production deployment, use a secrets manager
.env on disk is better than committed literals, but it is still a
plaintext file readable by anyone in the docker group. For production:
- Docker Compose secrets (secrets: block, files mounted at /run/secrets/...): the lowest-friction upgrade; keep the secret files outside the repo, chmod 600, owned by root.
- HashiCorp Vault, AWS Secrets Manager, Bitwarden Secrets, or your existing ISP secret store: inject at deploy time via a wrapper that renders .env from the vault and shreds it after docker compose up.
Whatever the choice: rotate every credential in the table above on first
production deploy. They have been in git history as openbmp (or a variant of
it) and must be considered compromised.
0.3 Rotate the Grafana datasource password in lockstep
obmp-grafana/provisioning/datasources/openbmp-ds.yml carries
secureJsonData.password, which Grafana reads at startup. When you change the
PostgreSQL password, update this file too and restart Grafana. The file
supports $__file{} and env-var expansion (for example
password: $POSTGRES_PASSWORD), provided the variable is set in the Grafana
container's environment.
Priority 1 — Network exposure / firewalling
The host currently publishes these ports to 0.0.0.0: 5000 (BMP), 5432
(PostgreSQL), 9092 (Kafka), 3000 (Grafana), 8086 (InfluxDB), 4300 (whois),
9091 (Authelia). Most should not be world-reachable.
1.1 BMP collector (port 5000) — restrict to router management subnets
The collector accepts a BMP session from any source. A rogue BMP feed can inject bogus routers/peers/prefixes into the database. Firewall it to the router management subnets only.
nftables example (preferred on modern hosts):
# /etc/nftables.conf: adjust subnets to your router management ranges
table inet obmp {
    chain input {
        type filter hook input priority 0; policy accept;
        # BMP ingest: only from router management subnets
        tcp dport 5000 ip saddr { 10.100.0.0/24, 10.100.1.0/24 } accept
        tcp dport 5000 drop
    }
}
iptables equivalent:
iptables -A INPUT -p tcp --dport 5000 -s 10.100.0.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 5000 -s 10.100.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 5000 -j DROP
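Docker's iptables integration forwards container-published ports without consulting INPUT, so the rules above must go in the DOCKER-USER chain or Docker bypasses them. Insert the DROP first and the RETURN rules after it: -I places each rule at the top of the chain, and on many Docker versions the chain already ends in a catch-all RETURN that an appended (-A) DROP would sit behind and never match:
iptables -I DOCKER-USER -p tcp --dport 5000 -j DROP
iptables -I DOCKER-USER -p tcp --dport 5000 -s 10.100.0.0/24 -j RETURN
iptables -I DOCKER-USER -p tcp --dport 5000 -s 10.100.1.0/24 -j RETURN
Rules in DOCKER-USER see the packet after DNAT, so --dport matches the container port; that equals the published port only for a 5000:5000 mapping.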
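With more than a couple of management subnets, generating the commands helps keep the rule order consistent. The sketch below only prints iptables commands for review and changes nothing on the host. print_bmp_rules is a hypothetical helper, and the port and subnets are the example values used above.

```shell
# Sketch: print DOCKER-USER rules for a BMP allow-list.
# print_bmp_rules is a hypothetical helper; it echoes commands and runs nothing.
print_bmp_rules() {
  local port="$1"
  shift
  # The DROP is inserted first, so it ends up below every RETURN inserted after it.
  echo "iptables -I DOCKER-USER -p tcp --dport ${port} -j DROP"
  local subnet
  for subnet in "$@"; do
    echo "iptables -I DOCKER-USER -p tcp --dport ${port} -s ${subnet} -j RETURN"
  done
}

# Example values from this section; replace with your router management ranges.
print_bmp_rules 5000 10.100.0.0/24 10.100.1.0/24
```

Review the printed commands, then run them as root. Because every rule is inserted at the top, running the output twice duplicates the rules, so flush or delete the old entries first when changing the subnet list.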
1.2 PostgreSQL (5432), Kafka (9092), InfluxDB (8086), whois (4300)
None of these need to be reachable from outside the stack:
- PostgreSQL: only psql-app, whois, and grafana connect, all on the Compose network. Bind the published port to loopback only, or drop the ports: mapping entirely:
  # docker-compose.yml, psql service
  ports:
    - "127.0.0.1:5432:5432" # localhost only; or remove entirely
- Kafka 9092: see Priority 2.
- InfluxDB 8086: only Grafana and Telegraf use it. Grafana reaches it on the Compose network, but Telegraf uses host networking and reaches it via localhost, so bind the port to loopback rather than dropping the mapping.
- whois 4300: expose only if you actually offer a public whois service; otherwise bind to loopback.
For anything that genuinely must be reachable, restrict by source with the firewall pattern from 1.1.
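A quick way to find mappings that are still published on all interfaces is to scan the Compose file. The sketch below is a rough text match, not a YAML parser: list_public_ports is a hypothetical helper, and it only understands the short string syntax (HOST:CONTAINER or IP:HOST:CONTAINER), not the long mapping form.

```shell
# Sketch: list "ports:" entries in a Compose file that are not bound to loopback.
# list_public_ports is a hypothetical helper; it matches text, it does not parse YAML.
list_public_ports() {
  local file="$1"
  # First grep: list items that look like port mappings, with line numbers.
  # Second grep: drop the ones already bound to 127.0.0.1.
  grep -nE '^[[:space:]]*-[[:space:]]*"?([0-9.]+:)?[0-9]+:[0-9]+' "$file" \
    | grep -vE '"?127\.0\.0\.1:' || true
}
```

Running list_public_ports docker-compose.yml prints one line per remaining world-reachable mapping, prefixed with its line number; no output means every short-syntax mapping is loopback-only.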
1.3 Grafana (3000) — keep it behind Authelia
Authelia already fronts Grafana (the auth profile + GF_AUTH_PROXY_*
settings). Make that the only path:
- Bind Grafana's published port to loopback (127.0.0.1:3000:3000), and let the reverse proxy / Authelia terminate TLS and reach it internally.
- Do not leave port 3000 directly reachable: GF_AUTH_PROXY_ENABLED=true trusts the Remote-User header, so any client that can reach 3000 directly and set that header bypasses authentication entirely.
Priority 2 — Kafka transport security
Kafka is currently PLAINTEXT and advertises a host-IP listener:
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092,PLAINTEXT_HOST://${HOST_IP}:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
The obmp-kafka:29092 listener is internal to the Compose network and is the
only one the collector and psql-app use. The PLAINTEXT_HOST://...:9092
listener exists only for outside access and is not needed by the core stack.
Recommended (simplest, most secure): remove the host listener. If nothing
outside the Compose network consumes Kafka, drop the 9092 port mapping and
the PLAINTEXT_HOST advertised listener so Kafka is reachable only on the
internal Docker network:
kafka:
  # remove the - "9092:9092" ports entry
  environment:
    KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
    KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
If external Kafka access is genuinely required (e.g. a separate analytics
consumer, or the split-host architecture in production-sizing.md where
Kafka and the DB are on different hosts), do not leave it PLAINTEXT on a
routed network. Enable SASL_SSL on the external listener:
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://obmp-kafka:29092,SASL_SSL://${HOST_IP}:9092
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,SASL_SSL:SASL_SSL
KAFKA_SASL_ENABLED_MECHANISMS: SCRAM-SHA-512
KAFKA_SSL_KEYSTORE_LOCATION: /etc/kafka/secrets/kafka.keystore.jks
KAFKA_SSL_KEYSTORE_PASSWORD: ${KAFKA_KEYSTORE_PASSWORD}
KAFKA_SSL_KEY_PASSWORD: ${KAFKA_KEY_PASSWORD}
KAFKA_SSL_TRUSTSTORE_LOCATION: /etc/kafka/secrets/kafka.truststore.jks
KAFKA_SSL_TRUSTSTORE_PASSWORD: ${KAFKA_TRUSTSTORE_PASSWORD}
Keep the internal PLAINTEXT://obmp-kafka:29092 listener for the collector
and psql-app — intra-Compose traffic on a private bridge does not need TLS and
adding SASL there means re-configuring both clients. At minimum, never publish
a PLAINTEXT Kafka listener on an IP that routes beyond the host.
Priority 3 — PostgreSQL hardening
3.1 Change the default openbmp / openbmp credentials
Covered in Priority 0. Note that POSTGRES_USER/POSTGRES_PASSWORD only take
effect when the data directory is initialized. To rotate on an existing
database, change the password in SQL and update every consumer:
docker exec -it obmp-psql psql -U openbmp -d openbmp \
-c "ALTER ROLE openbmp WITH PASSWORD '<new-strong-password>';"
Then update POSTGRES_PASSWORD for psql-app and whois, the
secureJsonData.password in openbmp-ds.yml, and restart those services.
3.2 Create a least-privilege role for Grafana
Grafana only needs to read. Do not let it connect as the owning role:
CREATE ROLE grafana_ro LOGIN PASSWORD '<strong-password>';
GRANT CONNECT ON DATABASE openbmp TO grafana_ro;
GRANT USAGE ON SCHEMA public TO grafana_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO grafana_ro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO grafana_ro;
Point openbmp-ds.yml at grafana_ro. This contains a Grafana compromise to
read-only and blocks SQL-panel writes.
3.3 Restrict pg_hba.conf
The default OpenBMP image is permissive (host all all all md5 or similar).
Tighten it so only the stack's own subnet can connect, and require
scram-sha-256:
# pg_hba.conf (inside the obmp-psql container / mounted)
# TYPE DATABASE USER ADDRESS METHOD
local all all scram-sha-256
host openbmp openbmp 172.16.0.0/12 scram-sha-256 # Docker bridge range
host openbmp grafana_ro 172.16.0.0/12 scram-sha-256
hostssl openbmp openbmp 0.0.0.0/0 scram-sha-256 # only if remote DB host
# reject everything else
host all all 0.0.0.0/0 reject
Identify the actual Compose network subnet with
docker network inspect obmp_default and scope ADDRESS to it. Reload with
docker exec obmp-psql psql -U openbmp -c "SELECT pg_reload_conf();".
scram-sha-256 requires password_encryption = scram-sha-256 in postgresql.conf and that passwords were set/rotated after that change.
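To keep ADDRESS tied to the real Compose subnet rather than the broad 172.16.0.0/12 range, the file can be rendered from the subnet value. The sketch below covers only the single-host rules from the example above and omits the optional hostssl line for a remote DB host; render_pg_hba is a hypothetical helper, and obmp_default is the network name assumed earlier in this section.

```shell
# Sketch: render a pg_hba.conf scoped to one Compose subnet.
# render_pg_hba is a hypothetical helper; roles and database are the ones used above.
render_pg_hba() {
  local subnet="$1"
  cat <<EOF
# TYPE  DATABASE  USER        ADDRESS        METHOD
local   all       all                        scram-sha-256
host    openbmp   openbmp     ${subnet}  scram-sha-256
host    openbmp   grafana_ro  ${subnet}  scram-sha-256
# reject everything else
host    all       all         0.0.0.0/0      reject
EOF
}

# Example (not run here): take the subnet from Docker and write the file.
# subnet="$(docker network inspect obmp_default --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}')"
# render_pg_hba "$subnet" > pg_hba.conf
```

Generating the file keeps the two host lines in sync when the Compose network is recreated with a different subnet.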
3.4 Enable SSL/TLS
The Grafana datasource already requests sslmode: "require" — but the server
must actually present a certificate. In postgresql.conf:
ssl = on
ssl_cert_file = '/var/lib/postgresql/server.crt'
ssl_key_file = '/var/lib/postgresql/server.key'
Generate a cert (self-signed is acceptable for an internal DB; use your internal CA if you have one):
openssl req -new -x509 -days 825 -nodes -text \
-out server.crt -keyout server.key -subj "/CN=obmp-psql"
chmod 600 server.key # PostgreSQL refuses a world-readable key
Mount both files into the container at the paths set in ssl_cert_file and
ssl_key_file; the key must be owned by the user PostgreSQL runs as (or by
root). For the strongest posture, move clients to sslmode: verify-full once a
proper CA chain is in place. This is most important if PostgreSQL runs on a
separate host (the split-host architecture in production-sizing.md);
intra-host Compose traffic is lower-risk, but TLS is still recommended.
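Because PostgreSQL will not start with a key that is too widely readable, a pre-flight check before mounting the file can save a failed restart. The sketch below is deliberately stricter than PostgreSQL itself (which also accepts mode 640 for a root-owned key): it passes only owner-only modes. key_mode_ok is a hypothetical helper and relies on GNU stat.

```shell
# Sketch: check that a TLS private key is readable by its owner only.
# key_mode_ok is a hypothetical helper; it relies on GNU stat (-c '%a').
key_mode_ok() {
  local mode
  mode="$(stat -c '%a' "$1")" || return 2   # 2: file missing or unreadable
  case "$mode" in
    600|400) return 0 ;;   # owner read/write or owner read-only
    *)       return 1 ;;   # anything wider is treated as too permissive
  esac
}
```

A typical call is key_mode_ok server.key, followed by chmod 600 on the file if it returns 1.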
3.5 Limit listen addresses
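If PostgreSQL must accept connections from another host (split-host layout),
keep listen_addresses scoped; do not leave it at * if a single interface
suffices. Every address listed must be assigned to an interface PostgreSQL
itself can see. Inside a container that means the container's own address on
the Compose network, not the Docker bridge gateway, which belongs to the host:
listen_addresses = 'localhost,172.18.0.2' # loopback + this container's Compose-network address (example value)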
On a single-host deployment, drop the 5432 port mapping entirely (1.2) so
the listener is reachable only on the Compose network.
Priority 4 — Drop privileged: true on the psql service
psql:
  privileged: true # <-- remove or replace
  shm_size: 1536m
  sysctls:
    - net.ipv4.tcp_keepalive_intvl=30
    - net.ipv4.tcp_keepalive_probes=5
    - net.ipv4.tcp_keepalive_time=180
Why it is a risk: privileged: true gives the container all Linux
capabilities, disables seccomp/AppArmor confinement, and grants access to all
host devices. A compromise of PostgreSQL — the process most exposed to
untrusted route data — would then be a near-complete host compromise. This is
the single largest container-isolation gap in the stack.
Why it is probably there: PostgreSQL needs adequate shared memory and
benefits from the TCP keepalive sysctls. The compose file already sets
shm_size: 1536m and the sysctls: list explicitly — both of which Docker
applies without needing privileged mode. So privileged: true is most
likely a leftover, not a hard requirement.
Recommended action — test without it:
- In a maintenance window, remove privileged: true and start the service.
- Confirm PostgreSQL starts, the namespaced sysctls apply (docker exec obmp-psql sysctl net.ipv4.tcp_keepalive_time), and shared memory is honored (docker exec obmp-psql df -h /dev/shm should report roughly 1.5G, since /proc/meminfo inside a container reflects the host; also watch for could not resize shared memory segment errors in the log).
- If everything is healthy, leave it removed.
If a specific capability turns out to be needed, add only that one instead of going fully privileged:
psql:
  # privileged: true <-- removed
  shm_size: 1536m
  cap_drop:
    - ALL
  cap_add:
    - CHOWN
    - SETUID
    - SETGID
    - DAC_OVERRIDE # add only capabilities proven necessary by testing
  sysctls:
    - net.ipv4.tcp_keepalive_intvl=30
    - net.ipv4.tcp_keepalive_probes=5
    - net.ipv4.tcp_keepalive_time=180
The sysctls: block stays — those are namespaced and do not require
privileged mode.
Priority 5 — Container hardening (defense in depth)
Apply across services after the higher-priority items. Test each service
individually — read_only in particular will surface paths a service writes
to that then need explicit tmpfs mounts.
5.1 no-new-privileges
Prevents a process inside a container from gaining privileges via setuid binaries. Safe to apply to every service:
security_opt:
- no-new-privileges:true
5.2 Drop capabilities
Most of these services need almost no Linux capabilities. Start from zero and add back only what breaks:
cap_drop:
- ALL
- grafana, whois, portal, zookeeper: typically run fine with cap_drop: [ALL].
- collector, kafka, psql, psql-app: drop ALL, then add back any capability proven necessary (see Priority 4 for psql).
- traffic-gen* legitimately need NET_RAW / NET_ADMIN (Scapy); leave those cap_add entries, they are already minimal.
5.3 Read-only root filesystem
Make the root filesystem immutable where the service only writes to known volumes:
grafana:
  read_only: true
  tmpfs:
    - /tmp
  # /var/lib/grafana is already a bind mount, so writes go there, not to rootfs
portal:
  read_only: true # nginx:alpine static site; add tmpfs for nginx
  tmpfs:
    - /tmp
    - /var/cache/nginx
    - /var/run
read_only is straightforward for grafana, portal, and whois. It is
trickier for psql, kafka, and zookeeper (they write to data volumes but
also expect a writable rootfs in places) — test individually and add tmpfs
mounts for any write paths, or skip read_only for those and rely on
cap_drop + no-new-privileges.
5.4 Pin and scan images
Images are already version-pinned (grafana:9.1.7, cp-kafka:7.1.1,
openbmp/postgres:2.2.1, etc.) — good. Add periodic vulnerability scanning:
trivy image openbmp/postgres:2.2.1
trivy image grafana/grafana:9.1.7
Note Grafana 9.1.7 is old; review Grafana security advisories and plan an upgrade path. Track CVEs for the pinned Confluent and OpenBMP images too.
5.5 Resource limits
Every service already has a mem_limit. For production also set cpus: (or
deploy.resources.limits) so a runaway query or ingest burst cannot starve
the host — this also mitigates local denial-of-service. See
docs/production-sizing.md for target values.
Priority 6 — Authelia / access control
Authelia fronts Grafana (ROADMAP C5). For production:
- Enforce TOTP / 2FA for all operator accounts; do not allow one_factor for the Grafana route.
- Set short session timeouts and an inactivity expiry in the Authelia config.
- Use strong, unique passwords; back the user store with your IdP / LDAP if available rather than the file backend.
- Ensure Authelia's own secrets (jwt_secret, session.secret, storage.encryption_key) are strong and stored as secrets, not literals.
- Confirm the reverse proxy strips any client-supplied Remote-User header before Authelia sets it; otherwise the auth-proxy trust model is bypassable (see 1.3).
Quick checklist
- Rotate all default credentials; remove literals from compose, move to .env / secrets manager
- Update openbmp-ds.yml datasource password to match
- Firewall BMP port 5000 to router management subnets (DOCKER-USER chain)
- Bind 5432 / 8086 / 4300 to loopback or drop the port mappings
- Bind Grafana 3000 to loopback; reach it only via Authelia
- Remove the Kafka PLAINTEXT_HOST listener + 9092 mapping (or enable SASL_SSL if external access needed)
- Create grafana_ro least-privilege DB role; repoint the datasource
- Tighten pg_hba.conf; require scram-sha-256
- Enable PostgreSQL ssl = on with a server certificate
- Test removing privileged: true from psql; replace with specific cap_add if needed
- Add security_opt: [no-new-privileges:true] to all services
- Add cap_drop: [ALL] and add back only required capabilities
- Add read_only: true + tmpfs to grafana / portal / whois
- Add cpus: limits per service
- Scan images with trivy; plan a Grafana upgrade off 9.1.7
- Enforce TOTP and short sessions in Authelia
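A few of the checklist items can be spot-checked mechanically. The sketch below greps a Compose file for three of the findings discussed above (privileged mode, the default openbmp literals, and the Kafka PLAINTEXT_HOST listener). audit_compose is a hypothetical helper, the patterns are rough text matches rather than a YAML parse, and a clean result does not replace working through the full list.

```shell
# Sketch: flag a few of the hardening gaps from this checklist in a Compose file.
# audit_compose is a hypothetical helper; it matches text, it does not parse YAML.
audit_compose() {
  local file="$1" findings=0

  if grep -Eq '^[[:space:]]*privileged:[[:space:]]*true' "$file"; then
    echo "FINDING: privileged container (Priority 4)"
    findings=$((findings + 1))
  fi

  # Matches literals such as POSTGRES_PASSWORD=openbmp or ..._TOKEN: openbmp-...
  if grep -Eq '(PASSWORD|TOKEN)[=:][[:space:]]*"?openbmp' "$file"; then
    echo "FINDING: default openbmp credential literal (Priority 0)"
    findings=$((findings + 1))
  fi

  if grep -q 'PLAINTEXT_HOST://' "$file"; then
    echo "FINDING: Kafka PLAINTEXT_HOST listener (Priority 2)"
    findings=$((findings + 1))
  fi

  # Exit status 0 only when nothing was flagged, so it can gate a deploy script.
  [ "$findings" -eq 0 ]
}
```

Because the function returns non-zero when anything is flagged, a deploy wrapper can call audit_compose docker-compose.yml and stop before docker compose up if the file still contains a known default.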