# OpenBMP Backup & Restore

How to back up and restore the OpenBMP PostgreSQL database, what the backup covers, and what it deliberately does not.

---

## What `scripts/pg-backup.sh` backs up

The script runs `pg_dump` inside the `obmp-psql` container and produces a single timestamped, compressed, custom-format dump of the **entire `openbmp` database**:

- All BMP/BGP operational tables: `routers`, `bgp_peers`, `ip_rib`, `base_attrs`, `global_ip_rib`, `l3vpn_rib`, the `ls_*` link-state tables.
- All history / TimescaleDB hypertables: `ip_rib_log`, `peer_event_log`, `stat_reports`, and the `stats_*` aggregate tables.
- Reference / enrichment data: `geo_ip`, `info_asn`, `info_route`, `rpki_validator`, `pdb_exchange_peers`.
- Schema objects: table definitions, indexes, views, functions, triggers, enum types, and the TimescaleDB hypertable configuration.

The dump is taken against a **live database**: `pg_dump` uses an MVCC snapshot, so no downtime and no service stop is required. It is written atomically (to a `.partial` file, renamed on success) so an interrupted run never leaves a dump that looks valid but is truncated.

Output: `${OBMP_DATA_ROOT:-/var/openbmp}/backups/openbmp-YYYYMMDD-HHMMSS.dump`

### TimescaleDB note

The OpenBMP database uses TimescaleDB hypertables (`ip_rib_log`, `peer_event_log`, the `stats_*` tables, with compression policies). **A `pg_dump` logical backup restores hypertables correctly**: the dump captures the `_timescaledb_catalog` metadata, and on restore the hypertable structure, chunks, and compression settings are recreated. No special flags are needed for the dump.

The only requirement is that the **restore target has the TimescaleDB extension available**, which the `openbmp/postgres` image provides, so restoring into a fresh `obmp-psql` works out of the box.
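The "extension available" requirement above can be checked before a restore. The following is a minimal sketch, not part of `scripts/pg-backup.sh`: the function names `ts_available_version` and `ts_extension_available` are hypothetical, and it assumes the default container and user names from this document. It queries the standard `pg_available_extensions` view through the `postgres` maintenance database, so it also works before `openbmp` has been recreated.

```bash
#!/usr/bin/env bash
# Hypothetical pre-restore check; not part of scripts/pg-backup.sh.
# Defaults mirror the names used in this document.
OBMP_PG_CONTAINER="${OBMP_PG_CONTAINER:-obmp-psql}"
OBMP_PG_USER="${OBMP_PG_USER:-openbmp}"

# Print the TimescaleDB version the server can install, or nothing if the
# extension is not shipped in the image.
ts_available_version() {
  docker exec -i "$OBMP_PG_CONTAINER" \
    psql -U "$OBMP_PG_USER" -d postgres -At \
    -c "SELECT default_version FROM pg_available_extensions WHERE name = 'timescaledb';"
}

# Return 0 if the extension is available on the restore target, 1 otherwise.
ts_extension_available() {
  local version
  version="$(ts_available_version)" || return 1
  [ -n "$version" ]
}
```

Calling `ts_extension_available || exit 1` at the top of a restore wrapper would stop early on an image without the extension. TimescaleDB's own documentation additionally describes `timescaledb_pre_restore()` and `timescaledb_post_restore()` for wrapping a `pg_restore`; whether this deployment needs them is not established here, so treat that as something to verify during a test restore.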

---

## Scheduling

Make the script executable once:

```bash
chmod +x scripts/pg-backup.sh
```

Add a cron entry (`crontab -e`). This one runs daily at 02:30 and logs to a file:

```cron
30 2 * * * OBMP_DATA_ROOT=/var/openbmp /home/user/obmp-docker/scripts/pg-backup.sh >> /var/openbmp/backups/pg-backup.log 2>&1
```

The cron user must be able to reach the Docker daemon: run it as a user in the `docker` group, or as root. A systemd timer is an equally valid alternative.

### Configuration

All settings are environment variables with sensible defaults:

| Variable | Default | Purpose |
|----------|---------|---------|
| `OBMP_DATA_ROOT` | `/var/openbmp` | Base data dir; backups go to `${OBMP_DATA_ROOT}/backups` |
| `OBMP_BACKUP_DIR` | (unset) | Explicit backup dir, overrides the default |
| `OBMP_PG_CONTAINER` | `obmp-psql` | Postgres container name |
| `OBMP_PG_DB` | `openbmp` | Database name |
| `OBMP_PG_USER` | `openbmp` | Database user |
| `OBMP_BACKUP_RETENTION_DAYS` | `14` | Dumps older than this are pruned each run |

Retention only prunes files matching the script's own `openbmp-*.dump` naming pattern; nothing else in the directory is touched.

### Production recommendations

- **Copy dumps off-host.** A local backup does not survive host loss. Sync the backup directory to object storage / a backup server (e.g. nightly `rclone`, `restic`, or your existing ISP backup tooling).
- **Size the backup volume.** At production scale (~100–150M NLRIs) the dump can be tens of GB even compressed. See `docs/production-sizing.md`.
- **Test restores periodically.** An untested backup is not a backup.
- For tighter RPO than once-daily logical dumps, consider PostgreSQL continuous archiving / PITR (WAL archiving + `pg_basebackup`). That is out of scope for this script but worth planning for a production deployment.

---

## Restore procedure

This restores a dump into a **fresh, empty** `obmp-psql` database. Restoring over a populated database risks conflicts, so start clean.

### 1. Stop the writers

Stop the services that write to the database so nothing races the restore:

```bash
docker compose -p obmp stop psql-app collector
```

Leave `obmp-psql` running.

### 2. Recreate an empty database

Drop and recreate the `openbmp` database inside the running container:

```bash
docker exec -i obmp-psql psql -U openbmp -d postgres <<'EOSQL'
DROP DATABASE IF EXISTS openbmp;
CREATE DATABASE openbmp OWNER openbmp;
EOSQL
```

`DROP DATABASE` fails while any other session is still connected to `openbmp`, so close remaining clients (dashboards, ad-hoc `psql` sessions) first.

> Restoring into a **brand-new container**? Bring `obmp-psql` up first and let
> it initialize, but **do not** create the `config/init_db` trigger file:
> the schema comes from the dump, not from psql-app's first-run migration.

### 3. Restore the dump

Copy the dump into the container and run `pg_restore`:

```bash
# Replace the placeholder timestamp with the dump you want to restore.
DUMP=/var/openbmp/backups/openbmp-YYYYMMDD-HHMMSS.dump

docker cp "${DUMP}" obmp-psql:/tmp/restore.dump
docker exec -i obmp-psql \
  pg_restore -U openbmp -d openbmp --no-owner --no-privileges \
  --jobs=4 /tmp/restore.dump
# Remove the temporary copy from the container once the restore finishes.
docker exec obmp-psql rm -f /tmp/restore.dump
```

- `--no-owner --no-privileges`: the dump was created with the same flags; objects are recreated owned by the connecting role.
- `--jobs=4`: parallel restore; raise it on a many-core host to speed up the large `ip_rib` / `ip_rib_log` tables. Custom-format dumps support this.
- Some non-fatal warnings (e.g. about the TimescaleDB extension or existing objects) are normal. A non-zero exit with only warnings is usually fine; inspect the output before assuming failure.

Alternatively, stream the restore without `docker cp`:

```bash
docker exec -i obmp-psql pg_restore -U openbmp -d openbmp \
  --no-owner --no-privileges < "${DUMP}"
```

(Streaming via stdin disables `--jobs` parallelism, so use `docker cp` for large dumps.)

### 4. Verify

```bash
docker exec -i obmp-psql psql -U openbmp -d openbmp -c "
  SELECT (SELECT count(*) FROM routers)   AS routers,
         (SELECT count(*) FROM bgp_peers) AS peers,
         (SELECT count(*) FROM ip_rib)    AS rib_rows;"
```

Confirm hypertables came back:

```bash
docker exec -i obmp-psql psql -U openbmp -d openbmp -c "
  SELECT hypertable_name FROM timescaledb_information.hypertables;"
```

### 5. Restart the writers

```bash
docker compose -p obmp start collector psql-app
```

The collector reconnects to the routers' BMP sessions and psql-app resumes consuming from Kafka. Live state catches up from the routers.

---

## What is NOT covered

This backup is **PostgreSQL only**. The following are out of scope and need their own handling:

- **Kafka data is transient.** The `obmp-kafka` topics are a short-retention pipeline buffer (`KAFKA_LOG_RETENTION_MINUTES: 720`, i.e. 12 hours). They are not a system of record and do not need backing up. After a restore, routers re-send BMP and the pipeline refills naturally.

- **InfluxDB telemetry has its own backup.** The gNMI streaming-telemetry data lives in `obmp-influxdb` (bucket `telemetry`), not in PostgreSQL. `pg_dump` does not touch it. Back it up separately with the Influx CLI:

  ```bash
  # Backup
  docker exec obmp-influxdb influx backup /var/lib/influxdb2/backup \
    --token "$INFLUXDB_ADMIN_TOKEN"
  docker cp obmp-influxdb:/var/lib/influxdb2/backup \
    /var/openbmp/backups/influxdb-$(date +%Y%m%d)

  # Restore
  docker cp /var/openbmp/backups/influxdb-YYYYMMDD \
    obmp-influxdb:/var/lib/influxdb2/restore
  docker exec obmp-influxdb influx restore /var/lib/influxdb2/restore \
    --token "$INFLUXDB_ADMIN_TOKEN"
  ```

  Telemetry is also less critical than BMP data (30-day retention, data-plane counters). Back it up if you need historical telemetry to survive a host loss; otherwise the 30-day window simply re-fills.

- **Grafana**: dashboards and datasources are provisioned from files in the repo (`obmp-grafana/provisioning/` and `obmp-grafana/dashboards/`), so they are already version-controlled in git. The Grafana database under `${OBMP_DATA_ROOT}/grafana` (users, preferences, manually-created dashboards, alert state) is *not* covered by this script. Back up that directory separately if it holds anything not reproducible from the repo.

- **Configuration & secrets**: `.env`, `docker-compose.yml`, and the `${OBMP_DATA_ROOT}/config` directory. Keep these in version control / your secrets manager.
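
For the Grafana directory mentioned above, one possible approach is a timestamped tarball written with the same `.partial`-then-rename pattern the PostgreSQL script uses. This is only a sketch under stated assumptions: `archive_dir` is a hypothetical helper, not something shipped in the repo, and the `grafana-` file prefix is an illustrative choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not part of scripts/pg-backup.sh.
# Archives directory <src> to <dest>/<prefix>-YYYYMMDD-HHMMSS.tar.gz.
# The archive is written to a .partial file and renamed only on success,
# so an interrupted run never leaves a truncated file under the final name.
archive_dir() {
  local src="$1" dest="$2" prefix="$3"
  local stamp out
  stamp="$(date +%Y%m%d-%H%M%S)"
  out="${dest}/${prefix}-${stamp}.tar.gz"

  mkdir -p "$dest" || return 1

  # -C keeps paths inside the archive relative to the parent of <src>.
  if tar -czf "${out}.partial" -C "$(dirname "$src")" "$(basename "$src")"; then
    mv "${out}.partial" "$out" && printf '%s\n' "$out"
  else
    rm -f "${out}.partial"
    return 1
  fi
}
```

A call such as `archive_dir "${OBMP_DATA_ROOT:-/var/openbmp}/grafana" "${OBMP_DATA_ROOT:-/var/openbmp}/backups" grafana` prints the path of the finished archive. Because the resulting name does not match `openbmp-*.dump`, the retention step described earlier leaves these archives alone, so they need their own pruning.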