# OpenBMP Backup & Restore

How to back up and restore the OpenBMP PostgreSQL database, what the backup
covers, and what it deliberately does not.

---

## What `scripts/pg-backup.sh` backs up

The script runs `pg_dump` inside the `obmp-psql` container and produces a
single timestamped, compressed, custom-format dump of the **entire `openbmp`
database**:

- All BMP/BGP operational tables: `routers`, `bgp_peers`, `ip_rib`,
  `base_attrs`, `global_ip_rib`, `l3vpn_rib`, the `ls_*` link-state tables.
- All history / TimescaleDB hypertables: `ip_rib_log`, `peer_event_log`,
  `stat_reports`, and the `stats_*` aggregate tables.
- Reference / enrichment data: `geo_ip`, `info_asn`, `info_route`,
  `rpki_validator`, `pdb_exchange_peers`.
- Schema objects: table definitions, indexes, views, functions, triggers,
  enum types, and the TimescaleDB hypertable configuration.

The dump is taken against a **live database**: `pg_dump` uses an MVCC
snapshot, so no downtime and no service stop is required. It is written
atomically (to a `.partial` file, renamed on success), so an interrupted run
never leaves a dump that looks valid but is truncated.

Output: `${OBMP_DATA_ROOT:-/var/openbmp}/backups/openbmp-YYYYMMDD-HHMMSS.dump`

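
The write-then-rename behaviour described above can be sketched as follows.
This is a minimal illustration, not the contents of `scripts/pg-backup.sh`:
the function names `dump_path` and `atomic_write` are hypothetical, and the
`pg_dump` invocation in the trailing comment is an assumption about how such
a script might call it.

```bash
#!/usr/bin/env bash
# Sketch of the ".partial, then rename" pattern. Helper names are
# hypothetical and are not taken from scripts/pg-backup.sh.

# dump_path DIR
# Prints DIR/openbmp-YYYYMMDD-HHMMSS.dump for the current local time.
dump_path() {
  printf '%s/openbmp-%s.dump\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# atomic_write DEST CMD [ARGS...]
# Runs CMD with stdout redirected to DEST.partial and renames it to DEST
# only if CMD exits 0. On failure the partial file is removed, so DEST
# never exists in a truncated state.
atomic_write() {
  local dest=$1
  shift
  local partial="${dest}.partial"
  if "$@" > "$partial"; then
    # Both paths are in the same directory, so this is a same-filesystem
    # rename, which replaces the name atomically.
    mv -f -- "$partial" "$dest"
  else
    local rc=$?
    rm -f -- "$partial"
    return "$rc"
  fi
}

# Example (assumed invocation; needs Docker and a running obmp-psql):
#   atomic_write "$(dump_path /var/openbmp/backups)" \
#     docker exec obmp-psql pg_dump -U openbmp -d openbmp \
#       --format=custom --no-owner --no-privileges
```

Because the final name only appears after a successful exit, anything that
looks for `openbmp-*.dump` (retention, off-host sync) never picks up a file
that is still being written.
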
### TimescaleDB note

The OpenBMP database uses TimescaleDB hypertables (`ip_rib_log`,
`peer_event_log`, the `stats_*` tables, with compression policies).
**A `pg_dump` logical backup restores hypertables correctly**: the dump
captures the `_timescaledb_catalog` metadata, and on restore the hypertable
structure, chunks, and compression settings are recreated. No special flags
are needed for the dump. The only requirement is that the **restore target
has the TimescaleDB extension available**, which the `openbmp/postgres`
image provides, so restoring into a fresh `obmp-psql` works out of the box.

If a restore does report TimescaleDB catalog errors, compare the procedure
with TimescaleDB's own backup documentation, which describes calling
`timescaledb_pre_restore()` before and `timescaledb_post_restore()` after
`pg_restore`, restoring into the same extension version the dump was taken
from, and avoiding `--jobs` for hypertable restores.

---

## Scheduling

Make the script executable once:

```bash
chmod +x scripts/pg-backup.sh
```

Add a cron entry (`crontab -e`) that runs daily at 02:30 and logs to a file:

```cron
30 2 * * * OBMP_DATA_ROOT=/var/openbmp /home/user/obmp-docker/scripts/pg-backup.sh >> /var/openbmp/backups/pg-backup.log 2>&1
```

The cron user must be able to reach the Docker daemon, so run it as a user in
the `docker` group, or as root. A systemd timer is an equally valid
alternative.

### Configuration

All settings are environment variables with sensible defaults:

| Variable | Default | Purpose |
|----------|---------|---------|
| `OBMP_DATA_ROOT` | `/var/openbmp` | Base data dir; backups go to `${OBMP_DATA_ROOT}/backups` |
| `OBMP_BACKUP_DIR` | (unset) | Explicit backup dir, overrides the default |
| `OBMP_PG_CONTAINER` | `obmp-psql` | Postgres container name |
| `OBMP_PG_DB` | `openbmp` | Database name |
| `OBMP_PG_USER` | `openbmp` | Database user |
| `OBMP_BACKUP_RETENTION_DAYS` | `14` | Dumps older than this are pruned each run |

Retention only prunes files matching the script's own `openbmp-*.dump`
naming pattern. Nothing else in the directory is touched.

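
How the directory defaults and the retention rule fit together can be
sketched as below. This is an illustration under assumptions, not the
script's actual code: `resolve_backup_dir` and `prune_old_dumps` are
hypothetical names, and `-maxdepth` / `-delete` assume GNU or BSD `find`.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not copied from scripts/pg-backup.sh.

# resolve_backup_dir
# OBMP_BACKUP_DIR wins when set and non-empty. Otherwise the directory is
# ${OBMP_DATA_ROOT}/backups, with /var/openbmp as the default data root.
resolve_backup_dir() {
  printf '%s\n' "${OBMP_BACKUP_DIR:-${OBMP_DATA_ROOT:-/var/openbmp}/backups}"
}

# prune_old_dumps DIR [DAYS]
# Deletes regular files directly inside DIR whose name matches
# openbmp-*.dump and whose age in whole days is greater than DAYS
# (default 14). Files with any other name are left alone.
prune_old_dumps() {
  local dir=$1
  local days=${2:-14}
  find "$dir" -maxdepth 1 -type f -name 'openbmp-*.dump' \
    -mtime +"$days" -delete
}

# Example:
#   prune_old_dumps "$(resolve_backup_dir)" "${OBMP_BACKUP_RETENTION_DAYS:-14}"
```

Matching on the name pattern is what keeps `pg-backup.log`, `.partial`
files, and anything else stored alongside the dumps out of reach of the
prune step.
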
### Production recommendations

- **Copy dumps off-host.** A local backup does not survive host loss. Sync
  the backup directory to object storage / a backup server (e.g. nightly
  `rclone`, `restic`, or your existing ISP backup tooling).
- **Size the backup volume.** At production scale (~100–150M NLRIs) the
  dump can be tens of GB even compressed. See `docs/production-sizing.md`.
- **Test restores periodically.** An untested backup is not a backup.
- For tighter RPO than once-daily logical dumps, consider PostgreSQL
  continuous archiving / PITR (WAL archiving + `pg_basebackup`). That is out
  of scope for this script but worth planning for a production deployment.

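
One way to act on the volume-sizing advice is a pre-flight check that
refuses to start a dump when the backup filesystem is nearly full. The
sketch below is an assumption about how that could look, not something the
script is known to do: `has_free_kib` is a hypothetical helper and the
50 GiB threshold in the comment is only an example figure.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical pre-flight check, not part of pg-backup.sh.

# has_free_kib DIR MIN_KIB
# Succeeds if the filesystem holding DIR reports at least MIN_KIB KiB
# available. Uses POSIX `df -P` output, where column 4 of the second line
# is the available space in 1024-byte blocks (with -k). Assumes the
# filesystem name contains no spaces.
has_free_kib() {
  local dir=$1
  local min_kib=$2
  local avail
  avail=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
  [ -n "$avail" ] && [ "$avail" -ge "$min_kib" ]
}

# Example: require 50 GiB free before dumping (threshold is illustrative).
#   has_free_kib /var/openbmp/backups $((50 * 1024 * 1024)) || {
#     echo "not enough free space for a backup" >&2
#     exit 1
#   }
```
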
---

## Restore procedure

This restores a dump into a **fresh, empty** `obmp-psql` database. Restoring
over a populated database risks conflicts, so start clean.

### 1. Stop the writers

Stop the services that write to the database so nothing races the restore:

```bash
docker compose -p obmp stop psql-app collector
```

Leave `obmp-psql` running.

### 2. Recreate an empty database

Drop and recreate the `openbmp` database inside the running container:

```bash
docker exec -i obmp-psql psql -U openbmp -d postgres <<'EOSQL'
DROP DATABASE IF EXISTS openbmp;
CREATE DATABASE openbmp OWNER openbmp;
EOSQL
```

`DROP DATABASE` fails while any other session is still connected to
`openbmp`. If it does, stop or disconnect the remaining clients (for example
dashboards that query PostgreSQL) and run it again.

> Restoring into a **brand-new container**? Bring `obmp-psql` up first and let
> it initialize, but **do not** create the `config/init_db` trigger file:
> the schema comes from the dump, not from psql-app's first-run migration.

### 3. Restore the dump

Copy the dump into the container and run `pg_restore`:

```bash
# Replace the placeholder with the dump you want to restore.
DUMP=/var/openbmp/backups/openbmp-YYYYMMDD-HHMMSS.dump

docker cp "${DUMP}" obmp-psql:/tmp/restore.dump

docker exec -i obmp-psql \
  pg_restore -U openbmp -d openbmp --no-owner --no-privileges \
  --jobs=4 /tmp/restore.dump

docker exec obmp-psql rm -f /tmp/restore.dump
```

- `--no-owner --no-privileges`: the dump was created with the same flags;
  objects are recreated owned by the connecting role.
- `--jobs=4`: parallel restore; raise it on a many-core host to speed up the
  large `ip_rib` / `ip_rib_log` tables. Custom-format dumps support this.
- Some non-fatal warnings (e.g. about the TimescaleDB extension or existing
  objects) are normal. A non-zero exit with only warnings is usually fine, but
  inspect the output before assuming either success or failure.

Alternatively, stream the restore without `docker cp`:

```bash
docker exec -i obmp-psql pg_restore -U openbmp -d openbmp \
  --no-owner --no-privileges < "${DUMP}"
```

(`pg_restore` cannot run a parallel `--jobs` restore from standard input, so
the streamed form is single-threaded. Use `docker cp` for large dumps.)

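
Before copying a dump into the container, it can help to pick the file
programmatically and run a cheap sanity check on it. The sketch below is one
possible approach, with hypothetical helper names (`latest_dump`,
`looks_like_custom_dump`). It relies on two facts: the `YYYYMMDD-HHMMSS`
stamp sorts in time order, and custom-format archives begin with the
five-byte signature `PGDMP`. The signature check only catches files that are
obviously not dumps. It does not prove the archive is complete, so
`pg_restore --list` on the file remains the stronger test.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for choosing and sanity-checking a dump.

# latest_dump DIR
# Prints the newest openbmp-*.dump in DIR, judged by file name. The shell
# expands globs in sorted order and the timestamp sorts chronologically,
# so the last regular file matched is the most recent dump.
latest_dump() {
  local dir=$1
  local f
  local last=
  for f in "$dir"/openbmp-*.dump; do
    [ -f "$f" ] && last=$f
  done
  [ -n "$last" ] && printf '%s\n' "$last"
}

# looks_like_custom_dump FILE
# Succeeds if FILE starts with "PGDMP", the signature at the start of a
# pg_dump custom-format archive. NUL bytes are stripped so the shell does
# not warn when reading binary data.
looks_like_custom_dump() {
  [ "$(head -c 5 "$1" 2>/dev/null | tr -d '\0')" = "PGDMP" ]
}

# Example:
#   DUMP=$(latest_dump /var/openbmp/backups) || { echo "no dump found" >&2; exit 1; }
#   looks_like_custom_dump "$DUMP" || { echo "$DUMP is not a custom-format dump" >&2; exit 1; }
```
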
### 4. Verify

```bash
docker exec -i obmp-psql psql -U openbmp -d openbmp -c "
  SELECT (SELECT count(*) FROM routers)   AS routers,
         (SELECT count(*) FROM bgp_peers) AS peers,
         (SELECT count(*) FROM ip_rib)    AS rib_rows;"
```

Confirm hypertables came back:

```bash
docker exec -i obmp-psql psql -U openbmp -d openbmp -c "
  SELECT hypertable_name FROM timescaledb_information.hypertables;"
```

### 5. Restart the writers

```bash
docker compose -p obmp start collector psql-app
```

The collector reconnects to the routers' BMP sessions and psql-app resumes
consuming from Kafka. Live state catches up from the routers.

---

## What is NOT covered

This backup is **PostgreSQL only**. The following are out of scope and need
their own handling:

- **Kafka data is transient.** The `obmp-kafka` topics are a short-retention
  pipeline buffer (`KAFKA_LOG_RETENTION_MINUTES: 720`, i.e. 12 hours). They are
  not a system of record and do not need backing up. After a restore, routers
  re-send BMP and the pipeline refills naturally.

- **InfluxDB telemetry has its own backup.** The gNMI streaming-telemetry
  data lives in `obmp-influxdb` (bucket `telemetry`), not in PostgreSQL.
  `pg_dump` does not touch it. Back it up separately with the Influx CLI:

  ```bash
  # Backup
  docker exec obmp-influxdb influx backup /var/lib/influxdb2/backup \
    --token "$INFLUXDB_ADMIN_TOKEN"
  docker cp obmp-influxdb:/var/lib/influxdb2/backup \
    /var/openbmp/backups/influxdb-$(date +%Y%m%d)

  # Restore
  docker cp /var/openbmp/backups/influxdb-YYYYMMDD \
    obmp-influxdb:/var/lib/influxdb2/restore
  docker exec obmp-influxdb influx restore /var/lib/influxdb2/restore \
    --token "$INFLUXDB_ADMIN_TOKEN"
  ```

  Telemetry is also less critical than BMP data (30-day retention,
  data-plane counters). Back it up if you need historical telemetry to
  survive a host loss; otherwise the 30-day window simply refills.

- **Grafana.** Dashboards and datasources are provisioned from files in the
  repo (`obmp-grafana/provisioning/` and `obmp-grafana/dashboards/`), so they
  are already version-controlled in git. The Grafana database under
  `${OBMP_DATA_ROOT}/grafana` (users, preferences, manually created
  dashboards, alert state) is *not* covered by this script. Back up that
  directory separately if it holds anything not reproducible from the repo.

- **Configuration & secrets.** `.env`, `docker-compose.yml`, and the
  `${OBMP_DATA_ROOT}/config` directory. Keep these in version control /
  your secrets manager.