Adds a prioritized security-hardening checklist, a PostgreSQL logical-backup script (pg-backup.sh) with a documented restore procedure, and Grafana alerting provisioning (peer-down, flap-storm, RPKI-invalid, router-down rules plus a contact-point template). The alerting YAML and contact points need operator review before being relied on for paging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenBMP Backup & Restore
How to back up and restore the OpenBMP PostgreSQL database, what the backup covers, and what it deliberately does not.
What scripts/pg-backup.sh backs up
The script runs pg_dump inside the obmp-psql container and produces a
single timestamped, compressed, custom-format dump of the entire openbmp
database:
- All BMP/BGP operational tables: `routers`, `bgp_peers`, `ip_rib`, `base_attrs`, `global_ip_rib`, `l3vpn_rib`, and the `ls_*` link-state tables.
- All history / TimescaleDB hypertables: `ip_rib_log`, `peer_event_log`, `stat_reports`, and the `stats_*` aggregate tables.
- Reference / enrichment data: `geo_ip`, `info_asn`, `info_route`, `rpki_validator`, `pdb_exchange_peers`.
- Schema objects: table definitions, indexes, views, functions, triggers, enum types, and the TimescaleDB hypertable configuration.
The dump is taken against a live database. pg_dump reads from a single MVCC
snapshot, so the dump is consistent as of the moment it started and no downtime
or service stop is required. It is written atomically (to a .partial file,
renamed on success), so an interrupted run never leaves a truncated file under
the final .dump name.
Output: `${OBMP_DATA_ROOT:-/var/openbmp}/backups/openbmp-YYYYMMDD-HHMMSS.dump` (or under `OBMP_BACKUP_DIR` if that is set)
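The write-then-rename behaviour described above can be sketched as a small shell function. This is a minimal illustration under stated assumptions, not the code inside pg-backup.sh; `atomic_write` is a hypothetical helper name.

```shell
# Hypothetical helper illustrating write-then-rename; not the pg-backup.sh code.
# atomic_write OUT CMD [ARGS...]: run CMD with stdout sent to OUT.partial,
# and rename it to OUT only if CMD exits 0.
atomic_write() {
  out="$1"; shift
  tmp="${out}.partial"
  if "$@" > "$tmp"; then
    # rename within the same directory (same filesystem) is atomic
    mv -f -- "$tmp" "$out"
  else
    rc=$?
    # drop the incomplete file so nothing truncated is left behind
    rm -f -- "$tmp"
    return "$rc"
  fi
}

# Example (assumes the default container, user and database names):
# out="/var/openbmp/backups/openbmp-$(date +%Y%m%d-%H%M%S).dump"
# atomic_write "$out" docker exec obmp-psql pg_dump -U openbmp -Fc openbmp
```

Note that the example calls `docker exec` without `-t`: a pseudo-TTY translates line endings and would corrupt the binary custom-format stream.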
TimescaleDB note
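The OpenBMP database uses TimescaleDB hypertables (`ip_rib_log`,
`peer_event_log`, and the `stats_*` tables, with compression policies).
A pg_dump logical backup handles hypertables correctly: the dump captures
the `_timescaledb_catalog` metadata, and on restore the hypertable structure,
chunks, and compression settings are recreated. No special flags are needed
for the dump. The restore side has two requirements:
- The target must provide the TimescaleDB extension at the same version as the source. The openbmp/postgres image provides the extension, so restoring into a fresh obmp-psql built from the same image tag works.
- pg_restore must run between `SELECT timescaledb_pre_restore();` and `SELECT timescaledb_post_restore();` in the target database, as TimescaleDB's backup documentation requires. The pre-restore call stops TimescaleDB's background workers and puts the database into restoring mode; the post-restore call reverses this.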
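Before restoring onto a different host or image tag, it can help to compare TimescaleDB versions, since TimescaleDB's documentation expects source and target to run the same extension version. A minimal sketch, assuming the container, user and database defaults from the Configuration table; `timescaledb_version` is a hypothetical helper.

```shell
# Hypothetical helper. Prints "default_version|installed_version" for TimescaleDB:
#   default_version   = what CREATE EXTENSION would install in this container
#   installed_version = what the given database already uses (empty if none)
timescaledb_version() {
  db="${1:-${OBMP_PG_DB:-openbmp}}"
  docker exec -i "${OBMP_PG_CONTAINER:-obmp-psql}" \
    psql -U "${OBMP_PG_USER:-openbmp}" -d "$db" -tA \
    -c "SELECT default_version, installed_version FROM pg_available_extensions WHERE name = 'timescaledb';"
}
```

On the source host, `timescaledb_version` shows the installed version that the dump was taken from; on a fresh target, `timescaledb_version postgres` shows the default version a new database would get. The two should match.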
Scheduling
Make the script executable once:
```shell
chmod +x scripts/pg-backup.sh
```
Add a cron entry (crontab -e) to run daily at 02:30, logging to a file:
```shell
30 2 * * * OBMP_DATA_ROOT=/var/openbmp /home/user/obmp-docker/scripts/pg-backup.sh >> /var/openbmp/backups/pg-backup.log 2>&1
```
The cron user must be able to reach the Docker daemon — run it as a user in
the docker group, or as root. A systemd timer is an equally valid
alternative.
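For the systemd timer alternative, a minimal sketch follows. It writes a oneshot service and a daily timer mirroring the cron entry above. The unit names (`obmp-pg-backup.service` / `.timer`), the `write_backup_timer` helper, and the default script path are assumptions; adjust them to your install.

```shell
# Hypothetical helper: writes a oneshot service plus a daily 02:30 timer.
# Unit names and the default script path are assumptions.
write_backup_timer() {
  unit_dir="${1:-/etc/systemd/system}"
  script="${2:-/home/user/obmp-docker/scripts/pg-backup.sh}"

  cat > "$unit_dir/obmp-pg-backup.service" <<EOF
[Unit]
Description=OpenBMP PostgreSQL logical backup
Wants=docker.service
After=docker.service

[Service]
Type=oneshot
Environment=OBMP_DATA_ROOT=/var/openbmp
ExecStart=$script
EOF

  # Persistent=true: if a run was missed while the timer was inactive
  # (e.g. host powered off), it is triggered when the timer starts again
  cat > "$unit_dir/obmp-pg-backup.timer" <<EOF
[Unit]
Description=Daily OpenBMP PostgreSQL backup

[Timer]
OnCalendar=*-*-* 02:30:00
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Run it as root, then `systemctl daemon-reload` and `systemctl enable --now obmp-pg-backup.timer`. Output goes to the journal (`journalctl -u obmp-pg-backup.service`) instead of a log file. Because the service is a single unit, the timer does not start a second dump while a previous one is still running.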
Configuration
All settings are environment variables with sensible defaults:
| Variable | Default | Purpose |
|---|---|---|
| `OBMP_DATA_ROOT` | `/var/openbmp` | Base data dir; backups go to `${OBMP_DATA_ROOT}/backups` |
| `OBMP_BACKUP_DIR` | (unset) | Explicit backup dir; overrides the default |
| `OBMP_PG_CONTAINER` | `obmp-psql` | Postgres container name |
| `OBMP_PG_DB` | `openbmp` | Database name |
| `OBMP_PG_USER` | `openbmp` | Database user |
| `OBMP_BACKUP_RETENTION_DAYS` | `14` | Dumps older than this many days are pruned on each run |
Retention only prunes files matching the script's own openbmp-*.dump
naming pattern — nothing else in the directory is touched.
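A pattern-restricted prune can be sketched with GNU find as shown below. This illustrates the behaviour described above; it is not necessarily how pg-backup.sh implements it, and `prune_dumps` is a hypothetical name.

```shell
# Hypothetical helper: delete only openbmp-*.dump files older than DAYS in DIR.
prune_dumps() {
  dir="$1"; days="$2"
  # -maxdepth 1 : never descend into subdirectories
  # -name       : only files following the dump naming pattern
  # -mtime +N   : find drops the fractional part of the age in days,
  #               so +N matches files at least N+1 whole days old
  find "$dir" -maxdepth 1 -type f -name 'openbmp-*.dump' -mtime +"$days" -print -delete
}
```

Because of how `-mtime` rounds, `prune_dumps /var/openbmp/backups 14` keeps roughly 15 days of dumps, not exactly 14.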
Production recommendations
- Copy dumps off-host. A local backup does not survive host loss. Sync the backup directory to object storage or a backup server (e.g. nightly rclone, restic, or your existing ISP backup tooling).
- Size the backup volume. At production scale (~100–150M NLRIs) the dump can be tens of GB even compressed. See docs/production-sizing.md.
- Test restores periodically. An untested backup is not a backup.
- For a tighter RPO than once-daily logical dumps, consider PostgreSQL continuous archiving / PITR (WAL archiving + pg_basebackup). That is out of scope for this script but worth planning for a production deployment.
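As one way to copy dumps off-host, a minimal rclone sketch follows. The remote `obmp-offsite:openbmp-backups` is a hypothetical `remote:path` that you would set up with `rclone config`, and `offsite_copy` is a hypothetical helper name.

```shell
# Hypothetical helper: push finished dumps to an rclone remote.
offsite_copy() {
  src="${OBMP_BACKUP_DIR:-${OBMP_DATA_ROOT:-/var/openbmp}/backups}"
  dest="${1:-obmp-offsite:openbmp-backups}"   # assumed remote:path
  # `copy` (not `sync`) never deletes on the remote, so local pruning does
  # not propagate off-host. The include filter skips in-progress .partial
  # files and the log file.
  rclone copy "$src" "$dest" --include 'openbmp-*.dump'
}
```

Since `copy` never deletes, the remote keeps growing. Apply retention there separately, for example with a lifecycle rule on the bucket.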
Restore procedure
This restores a dump into a fresh, empty obmp-psql database. Restoring
over a populated database risks conflicts — start clean.
1. Stop the writers
Stop the services that write to the database so nothing races the restore:
```shell
docker compose -p obmp stop psql-app collector
```
Leave obmp-psql running.
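2. Recreate an empty database
Drop and recreate the openbmp database inside the running container, then create the TimescaleDB extension and switch the new database into restore mode:
```shell
docker exec -i obmp-psql psql -U openbmp -d postgres -v ON_ERROR_STOP=1 <<'EOSQL'
DROP DATABASE IF EXISTS openbmp;
CREATE DATABASE openbmp OWNER openbmp;
\c openbmp
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT timescaledb_pre_restore();
EOSQL
```
If DROP DATABASE fails because other sessions are still connected (for example Grafana's PostgreSQL datasource), stop those clients as well and retry.
Restoring into a brand-new container? Bring `obmp-psql` up first and let it initialize, but do not create the `config/init_db` trigger file. The schema comes from the dump, not from psql-app's first-run migration.
3. Restore the dump
Copy the dump into the container, run pg_restore, then take the database out of restore mode:
```shell
DUMP=/var/openbmp/backups/openbmp-YYYYMMDD-HHMMSS.dump
docker cp "${DUMP}" obmp-psql:/tmp/restore.dump
docker exec -i obmp-psql \
  pg_restore -U openbmp -d openbmp --no-owner --no-privileges \
  --jobs=4 /tmp/restore.dump
docker exec obmp-psql rm -f /tmp/restore.dump
# Leave TimescaleDB restore mode, then refresh planner statistics
docker exec -i obmp-psql psql -U openbmp -d openbmp \
  -c "SELECT timescaledb_post_restore();" -c "ANALYZE;"
```
- `--no-owner --no-privileges`: objects are recreated owned by the connecting role, without the source's GRANT/REVOKE statements. pg_dump ignores `--no-owner` for archive formats such as the custom format, so it has to be passed here at restore time even though the dump was created with the same flags.
- `--jobs=4`: parallel restore. Raise it on a many-core host to speed up the large `ip_rib` / `ip_rib_log` tables. Custom-format dumps support this when restoring from a file.
- pg_restore may report some non-fatal errors (e.g. about the TimescaleDB extension or objects that already exist) and exit non-zero because of them. Inspect the output before assuming the restore failed.

Alternatively, stream the restore without docker cp:
```shell
docker exec -i obmp-psql pg_restore -U openbmp -d openbmp \
  --no-owner --no-privileges < "${DUMP}"
```
(Streaming via stdin disables `--jobs` parallelism, so use docker cp for
large dumps. Run the `timescaledb_post_restore()` / `ANALYZE` step afterwards either way.)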
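Rather than typing the `DUMP=` path by hand, the newest dump can be selected automatically. A minimal sketch, assuming the default backup directory and the `openbmp-YYYYMMDD-HHMMSS.dump` naming. `latest_dump` is a hypothetical helper, and the assumption that in-progress files end in `.partial` comes from the atomic-write description earlier.

```shell
# Hypothetical helper: print the newest openbmp-*.dump in DIR.
latest_dump() {
  dir="${1:-${OBMP_BACKUP_DIR:-${OBMP_DATA_ROOT:-/var/openbmp}/backups}}"
  latest=""
  # Glob results are sorted, and YYYYMMDD-HHMMSS names sort chronologically.
  # In-progress *.dump.partial files do not match the pattern.
  for f in "$dir"/openbmp-*.dump; do
    [ -f "$f" ] && latest="$f"
  done
  [ -n "$latest" ] || { echo "no dumps found in $dir" >&2; return 1; }
  printf '%s\n' "$latest"
}
```

Usage: `DUMP=$(latest_dump) || exit 1`, then continue with the docker cp / pg_restore commands.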
4. Verify
```shell
docker exec -i obmp-psql psql -U openbmp -d openbmp -c "
SELECT (SELECT count(*) FROM routers) AS routers,
       (SELECT count(*) FROM bgp_peers) AS peers,
       (SELECT count(*) FROM ip_rib) AS rib_rows;"
```
Confirm hypertables came back:
```shell
docker exec -i obmp-psql psql -U openbmp -d openbmp -c "
SELECT hypertable_name FROM timescaledb_information.hypertables;"
```
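To turn the hypertable check into a pass/fail step (e.g. in a scripted restore test), the names can be piped into a small checker. A sketch, assuming the expected hypertables are the ones named in the TimescaleDB note (`ip_rib_log`, `peer_event_log`, and at least one `stats_*` table). `check_hypertables` is a hypothetical helper.

```shell
# Hypothetical helper: read hypertable names (one per line) on stdin and
# return non-zero if an expected one is missing.
check_hypertables() {
  names=$(cat)
  missing=0
  for t in ip_rib_log peer_event_log; do
    if ! printf '%s\n' "$names" | grep -qx "$t"; then
      echo "missing hypertable: $t" >&2
      missing=1
    fi
  done
  if ! printf '%s\n' "$names" | grep -q '^stats_'; then
    echo "no stats_* hypertables found" >&2
    missing=1
  fi
  return "$missing"
}

# Usage (-tA prints bare names, one per line):
# docker exec -i obmp-psql psql -U openbmp -d openbmp -tA \
#   -c "SELECT hypertable_name FROM timescaledb_information.hypertables;" \
#   | check_hypertables
```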
5. Restart the writers
```shell
docker compose -p obmp start collector psql-app
```
The routers re-establish their BMP sessions to the collector, and psql-app resumes consuming from Kafka. Live state catches up as the routers re-send their current tables.
What is NOT covered
This backup is PostgreSQL only. The following are out of scope and need their own handling:
- Kafka data is transient. The `obmp-kafka` topics are a short-retention pipeline buffer (`KAFKA_LOG_RETENTION_MINUTES: 720`, i.e. 12 hours). They are not a system of record and do not need backing up. After a restore, routers re-send BMP and the pipeline refills naturally.
- InfluxDB telemetry has its own backup. The gNMI streaming-telemetry data lives in `obmp-influxdb` (bucket `telemetry`), not in PostgreSQL, and `pg_dump` does not touch it. Back it up separately with the Influx CLI:
  ```shell
  # Backup
  docker exec obmp-influxdb influx backup /var/lib/influxdb2/backup \
    --token "$INFLUXDB_ADMIN_TOKEN"
  docker cp obmp-influxdb:/var/lib/influxdb2/backup \
    /var/openbmp/backups/influxdb-$(date +%Y%m%d)

  # Restore (on an instance that already has the telemetry bucket, see the
  # --full option of influx restore, which replaces all existing data)
  docker cp /var/openbmp/backups/influxdb-YYYYMMDD \
    obmp-influxdb:/var/lib/influxdb2/restore
  docker exec obmp-influxdb influx restore /var/lib/influxdb2/restore \
    --token "$INFLUXDB_ADMIN_TOKEN"
  ```
  Telemetry is also less critical than BMP data (30-day retention, data-plane counters). Back it up if you need historical telemetry to survive a host loss; otherwise the 30-day window simply refills.
- Grafana: dashboards and datasources are provisioned from files in the repo (`obmp-grafana/provisioning/` and `obmp-grafana/dashboards/`), so they are already version-controlled in git. The Grafana database under `${OBMP_DATA_ROOT}/grafana` (users, preferences, manually created dashboards, alert state) is not covered by this script. Back up that directory separately if it holds anything not reproducible from the repo.
- Configuration & secrets: `.env`, `docker-compose.yml`, and the `${OBMP_DATA_ROOT}/config` directory. Keep these in version control / your secrets manager.
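For the Grafana directory mentioned above, a dated tarball is one simple option. This is a minimal sketch. `backup_grafana_dir` is a hypothetical helper, and it assumes Grafana uses its default SQLite database inside that directory. Stop Grafana first, because copying a SQLite file that is being written can produce an inconsistent copy.

```shell
# Hypothetical helper: archive ${OBMP_DATA_ROOT}/grafana as a dated tarball.
# Assumption: Grafana is stopped, since its SQLite DB may be mid-write otherwise.
backup_grafana_dir() {
  root="${OBMP_DATA_ROOT:-/var/openbmp}"
  dest="${1:-$root/backups}"
  out="$dest/grafana-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest"
  # -C keeps paths relative (grafana/...), so the archive restores anywhere
  tar -czf "$out" -C "$root" grafana && printf '%s\n' "$out"
}
```

These `grafana-*.tar.gz` files do not match the `openbmp-*.dump` pattern, so pg-backup.sh's retention will not prune them. Rotate them separately.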