# OpenBMP Backup & Restore

How to back up and restore the OpenBMP PostgreSQL database, what the backup
covers, and what it deliberately does not.

---

## What `scripts/pg-backup.sh` backs up

The script runs `pg_dump` inside the `obmp-psql` container and produces a
single timestamped, compressed, custom-format dump of the **entire `openbmp`
database**:

- All BMP/BGP operational tables — `routers`, `bgp_peers`, `ip_rib`,
  `base_attrs`, `global_ip_rib`, `l3vpn_rib`, the `ls_*` link-state tables.
- All history / TimescaleDB hypertables — `ip_rib_log`, `peer_event_log`,
  `stat_reports`, and the `stats_*` aggregate tables.
- Reference / enrichment data — `geo_ip`, `info_asn`, `info_route`,
  `rpki_validator`, `pdb_exchange_peers`.
- Schema objects — table definitions, indexes, views, functions, triggers,
  enum types, and the TimescaleDB hypertable configuration.

The dump is taken against a **live database**. `pg_dump` reads from an MVCC
snapshot, so no downtime and no service stop are required. The dump is written
atomically (to a `.partial` file, renamed on success), so an interrupted run
never leaves a file that looks valid but is truncated.

Output: `${OBMP_DATA_ROOT:-/var/openbmp}/backups/openbmp-YYYYMMDD-HHMMSS.dump`
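
The `.partial`-then-rename pattern described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `dump_atomically` is a hypothetical helper name, and the `pg_dump` flags in the usage comment are assumptions inferred from this document (custom format, no owner, no privileges).

```bash
# Sketch of an atomic dump write. dump_atomically is a hypothetical helper,
# not a function taken from scripts/pg-backup.sh.
#
# Usage: dump_atomically <target-file> <command> [args...]
# The command's stdout goes to <target-file>.partial. The file is renamed to
# <target-file> only if the command exits 0; otherwise it is deleted.
dump_atomically() {
  local target="$1"; shift
  local partial="${target}.partial"

  if "$@" > "${partial}"; then
    # A rename within one filesystem is atomic, so readers never see a
    # half-written file under the final name.
    mv -f -- "${partial}" "${target}"
  else
    local rc=$?
    rm -f -- "${partial}"
    return "${rc}"
  fi
}

# Example (assumed flags; adjust to match the real script):
#   out="${OBMP_DATA_ROOT:-/var/openbmp}/backups/openbmp-$(date +%Y%m%d-%H%M%S).dump"
#   dump_atomically "${out}" \
#     docker exec obmp-psql pg_dump -U openbmp -d openbmp -Fc --no-owner --no-privileges
```

Because the final name only ever appears after a successful run, anything that picks up `openbmp-*.dump` files (retention, off-host sync) can ignore `.partial` files safely.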

### TimescaleDB note

The OpenBMP database uses TimescaleDB hypertables (`ip_rib_log`,
`peer_event_log`, the `stats_*` tables, with compression policies).
**A `pg_dump` logical backup restores hypertables correctly**: the dump
captures the `_timescaledb_catalog` metadata, and on restore the hypertable
structure, chunks, and compression settings are recreated. No special flags
are needed for the dump. The requirement is on the restore side: the
**restore target must have the TimescaleDB extension available**, ideally at
the same version that produced the dump. The `openbmp/postgres` image
provides the extension, so restoring into a fresh `obmp-psql` built from the
same image works out of the box.
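
One way to check this requirement before a restore is to ask the target server which TimescaleDB version it can install. The following is a minimal sketch, assuming the default container and role names used elsewhere in this document; `timescaledb_available` is a hypothetical helper name.

```bash
# timescaledb_available: hypothetical helper. Prints the TimescaleDB version
# the target server is able to install, or nothing if the extension is not
# packaged in that image. Defaults mirror this document's configuration.
timescaledb_available() {
  local container="${OBMP_PG_CONTAINER:-obmp-psql}"
  local user="${OBMP_PG_USER:-openbmp}"

  # pg_available_extensions lists every extension the server can CREATE.
  # -A (unaligned) and -t (tuples only) leave just the bare version string.
  docker exec "${container}" psql -U "${user}" -d postgres -At \
    -c "SELECT default_version FROM pg_available_extensions WHERE name = 'timescaledb';"
}

# Usage:
#   timescaledb_available    # prints a version string, or an empty line
```

Running the same query on the source system and comparing the two results is a quick way to spot a version mismatch before starting a long restore.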

---

## Scheduling

Make the script executable once:

```bash
chmod +x scripts/pg-backup.sh
```

Add a cron entry (`crontab -e`). This example runs daily at 02:30 and appends
output to a log file; replace `/home/user/obmp-docker` with the path of your
checkout:

```cron
30 2 * * * OBMP_DATA_ROOT=/var/openbmp /home/user/obmp-docker/scripts/pg-backup.sh >> /var/openbmp/backups/pg-backup.log 2>&1
```

The cron user must be able to reach the Docker daemon — run it as a user in
the `docker` group, or as root. A systemd timer is an equally valid
alternative.

### Configuration

All settings are environment variables with sensible defaults:

| Variable | Default | Purpose |
|----------|---------|---------|
| `OBMP_DATA_ROOT` | `/var/openbmp` | Base data dir; backups go to `${OBMP_DATA_ROOT}/backups` |
| `OBMP_BACKUP_DIR` | (unset) | Explicit backup dir, overrides the default |
| `OBMP_PG_CONTAINER` | `obmp-psql` | Postgres container name |
| `OBMP_PG_DB` | `openbmp` | Database name |
| `OBMP_PG_USER` | `openbmp` | Database user |
| `OBMP_BACKUP_RETENTION_DAYS` | `14` | Dumps older than this are pruned each run |
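
The precedence between `OBMP_BACKUP_DIR` and `OBMP_DATA_ROOT` in the table above can be expressed with nested parameter expansion. This is a sketch of the documented behavior, not necessarily how the script implements it; `resolve_backup_dir` is a hypothetical name.

```bash
# resolve_backup_dir: hypothetical helper showing the documented precedence.
# 1. OBMP_BACKUP_DIR, if set and non-empty, wins outright.
# 2. Otherwise the directory is ${OBMP_DATA_ROOT}/backups.
# 3. If OBMP_DATA_ROOT is also unset, it falls back to /var/openbmp.
resolve_backup_dir() {
  printf '%s\n' "${OBMP_BACKUP_DIR:-${OBMP_DATA_ROOT:-/var/openbmp}/backups}"
}

# Usage:
#   OBMP_DATA_ROOT=/data resolve_backup_dir    # /data/backups
```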

Retention only prunes files matching the script's own `openbmp-*.dump`
naming pattern — nothing else in the directory is touched.
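
Pattern-scoped pruning of this kind is commonly done with `find`. The sketch below illustrates the idea under that assumption; the script's real implementation is not shown in this document, and `prune_old_dumps` is a hypothetical name.

```bash
# prune_old_dumps: hypothetical helper illustrating pattern-scoped retention.
# Deletes only regular files named openbmp-*.dump directly inside <dir> whose
# modification time is more than <days> whole days in the past, and prints
# the path of each file it removes.
#
# Usage: prune_old_dumps <dir> [days]
prune_old_dumps() {
  local dir="$1"
  local days="${2:-14}"

  # -maxdepth 1 keeps find out of subdirectories; -name restricts the match
  # to the script's own naming pattern, so logs and other files are ignored.
  find "${dir}" -maxdepth 1 -type f -name 'openbmp-*.dump' \
       -mtime "+${days}" -print -delete
}
```

Note that `-mtime +14` matches files whose age, rounded down to whole days, is greater than 14, so a dump is removed once it is at least 15 days old.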

### Production recommendations

- **Copy dumps off-host.** A local backup does not survive host loss. Sync
  the backup directory to object storage / a backup server (e.g. nightly
  `rclone`, `restic`, or your existing ISP backup tooling).
- **Size the backup volume** — at production scale (~100–150M NLRIs) the
  dump can be tens of GB even compressed. See `docs/production-sizing.md`.
- **Test restores periodically** — an untested backup is not a backup.
- For tighter RPO than once-daily logical dumps, consider PostgreSQL
  continuous archiving / PITR (WAL archiving + `pg_basebackup`). That is out
  of scope for this script but worth planning for a production deployment.
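
Between full restore tests, a cheap sanity check is to ask `pg_restore` for the archive's table of contents. The sketch below assumes the default container name; `dump_is_listable` is a hypothetical helper. It only proves that the archive header and index parse, so it complements a real restore test rather than replacing it.

```bash
# dump_is_listable: hypothetical helper. Streams a dump from the host into
# the container and asks pg_restore to list its contents. Returns 0 only if
# pg_restore can parse the archive. No data is restored.
#
# Usage: dump_is_listable <path-to-dump-on-host>
dump_is_listable() {
  local dump="$1"
  local container="${OBMP_PG_CONTAINER:-obmp-psql}"

  [ -r "${dump}" ] || { echo "cannot read ${dump}" >&2; return 1; }

  # With no file argument pg_restore reads the archive from stdin;
  # --list prints the table of contents instead of restoring.
  docker exec -i "${container}" pg_restore --list < "${dump}" > /dev/null
}
```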

---

## Restore procedure

This restores a dump into a **fresh, empty** `obmp-psql` database. Restoring
over a populated database risks conflicts — start clean.

### 1. Stop the writers

Stop the services that write to the database so nothing races the restore:

```bash
docker compose -p obmp stop psql-app collector
```

Leave `obmp-psql` running.

### 2. Recreate an empty database

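Drop and recreate the `openbmp` database inside the running container.
`DROP DATABASE` fails while any session is still connected to `openbmp`, so
close remaining clients (for example a Grafana datasource or an open `psql`
shell) first: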

```bash
docker exec -i obmp-psql psql -U openbmp -d postgres <<'EOSQL'
DROP DATABASE IF EXISTS openbmp;
CREATE DATABASE openbmp OWNER openbmp;
EOSQL
```
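
If `DROP DATABASE` reports that the database is being accessed by other users, the leftover sessions can be closed from the `postgres` maintenance database. A minimal sketch, assuming the default names from the configuration table and a role that is allowed to signal those backends (a superuser, or the same role that owns the sessions); `close_db_sessions` is a hypothetical helper.

```bash
# close_db_sessions: hypothetical helper. Terminates every session connected
# to the target database except the one running this query.
close_db_sessions() {
  local container="${OBMP_PG_CONTAINER:-obmp-psql}"
  local user="${OBMP_PG_USER:-openbmp}"
  local db="${OBMP_PG_DB:-openbmp}"

  # The SQL is fed on stdin so psql expands :'db' as a quoted literal
  # (variable interpolation does not apply to -c commands).
  docker exec -i "${container}" psql -U "${user}" -d postgres -v db="${db}" <<'EOSQL'
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE datname = :'db'
   AND pid <> pg_backend_pid();
EOSQL
}
```

Run it immediately before the `DROP DATABASE` step; clients that reconnect automatically can otherwise reopen a session in between.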

> Restoring into a **brand-new container**? Bring `obmp-psql` up first and let
> it initialize, but **do not** create the `config/init_db` trigger file —
> the schema comes from the dump, not from psql-app's first-run migration.

### 3. Restore the dump

Copy the dump into the container and run `pg_restore`:

```bash
DUMP=/var/openbmp/backups/openbmp-YYYYMMDD-HHMMSS.dump

docker cp "${DUMP}" obmp-psql:/tmp/restore.dump

docker exec -i obmp-psql \
  pg_restore -U openbmp -d openbmp --no-owner --no-privileges \
             --jobs=4 /tmp/restore.dump

docker exec obmp-psql rm -f /tmp/restore.dump
```

- `--no-owner --no-privileges`: the dump was created with the same flags, so
  objects are recreated owned by the connecting role.
- `--jobs=4`: parallel restore, which custom-format dumps support. Raise it on
  a many-core host to speed up the large `ip_rib` / `ip_rib_log` tables.
- Some non-fatal messages (e.g. about the TimescaleDB extension or existing
  objects) are normal. `pg_restore` counts each one as an ignored error and
  exits non-zero if there were any, so a non-zero exit does not by itself
  mean the restore failed. Read the output, then confirm with the checks in
  step 4.

Alternatively, stream the restore without `docker cp`:

```bash
docker exec -i obmp-psql pg_restore -U openbmp -d openbmp \
  --no-owner --no-privileges < "${DUMP}"
```

(`pg_restore` cannot run parallel `--jobs` when it reads the archive from
stdin, so use the `docker cp` route for large dumps.)
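
If a plain `pg_restore` reports errors against the TimescaleDB catalog, one variant worth trying is the sequence TimescaleDB's own documentation describes: create the extension in the empty database, call `timescaledb_pre_restore()`, restore, then call `timescaledb_post_restore()`. The sketch below is an assumption-laden illustration, not part of the documented OpenBMP procedure: `obmp_sql` and `restore_with_timescale_hooks` are hypothetical helpers, and the connecting role may need superuser rights for these calls. It omits `--jobs` because the TimescaleDB documentation cautions against parallel restore.

```bash
# obmp_sql: hypothetical helper that runs one SQL statement in the target
# database and stops on the first error.
obmp_sql() {
  docker exec "${OBMP_PG_CONTAINER:-obmp-psql}" \
    psql -U "${OBMP_PG_USER:-openbmp}" -d "${OBMP_PG_DB:-openbmp}" \
    -v ON_ERROR_STOP=1 -c "$1"
}

# restore_with_timescale_hooks: hypothetical helper.
# Usage: restore_with_timescale_hooks <dump-path-inside-container>
restore_with_timescale_hooks() {
  local dump_in_container="$1"
  local rc=0

  # Put the database into restore mode before loading any data.
  obmp_sql "CREATE EXTENSION IF NOT EXISTS timescaledb;" || return 1
  obmp_sql "SELECT timescaledb_pre_restore();" || return 1

  # Single-process restore; remember the exit status instead of aborting.
  docker exec "${OBMP_PG_CONTAINER:-obmp-psql}" \
    pg_restore -U "${OBMP_PG_USER:-openbmp}" -d "${OBMP_PG_DB:-openbmp}" \
    --no-owner --no-privileges "${dump_in_container}" || rc=$?

  # Always leave restore mode, even if pg_restore reported errors.
  obmp_sql "SELECT timescaledb_post_restore();" || return 1
  return "${rc}"
}

# Usage (after docker cp, as in the main procedure):
#   restore_with_timescale_hooks /tmp/restore.dump
```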

### 4. Verify

```bash
docker exec -i obmp-psql psql -U openbmp -d openbmp -c "
  SELECT (SELECT count(*) FROM routers)    AS routers,
         (SELECT count(*) FROM bgp_peers)  AS peers,
         (SELECT count(*) FROM ip_rib)     AS rib_rows;"
```

Confirm hypertables came back:

```bash
docker exec -i obmp-psql psql -U openbmp -d openbmp -c "
  SELECT hypertable_name FROM timescaledb_information.hypertables;"
```
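
The hypertable check can be automated by comparing the query result against the names this document lists as hypertables. A minimal sketch: `missing_hypertables` and `EXPECTED_HYPERTABLES` are hypothetical names, and the list contains only the two tables named explicitly above, so extend it with the `stats_*` tables present in your schema.

```bash
# Hypertables named explicitly in this document. This list is an assumption
# to extend, not a complete inventory of the OpenBMP schema.
EXPECTED_HYPERTABLES="ip_rib_log peer_event_log"

# missing_hypertables: hypothetical helper. Reads hypertable names (one per
# line) on stdin and prints each expected name that is absent. Returns
# non-zero if anything is missing.
missing_hypertables() {
  local actual name missing=0
  actual="$(cat)"

  for name in ${EXPECTED_HYPERTABLES}; do
    # -x matches whole lines only, so ip_rib_log does not match ip_rib_log_v2.
    if ! printf '%s\n' "${actual}" | grep -qx -- "${name}"; then
      printf '%s\n' "${name}"
      missing=1
    fi
  done
  return "${missing}"
}

# Usage:
#   docker exec obmp-psql psql -U openbmp -d openbmp -At \
#     -c "SELECT hypertable_name FROM timescaledb_information.hypertables;" \
#     | missing_hypertables
```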

### 5. Restart the writers

```bash
docker compose -p obmp start collector psql-app
```

The collector accepts the routers' BMP sessions again and psql-app resumes
consuming from Kafka. Live state catches up as the routers re-send their BMP
data.

---

## What is NOT covered

This backup is **PostgreSQL only**. The following are out of scope and need
their own handling:

- **Kafka data is transient.** The `obmp-kafka` topics are a short-retention
  pipeline buffer (`KAFKA_LOG_RETENTION_MINUTES: 720` — 12 hours). They are
  not a system of record and do not need backing up. After a restore, routers
  re-send BMP and the pipeline refills naturally.

- **InfluxDB telemetry has its own backup.** The gNMI streaming-telemetry
  data lives in `obmp-influxdb` (bucket `telemetry`), not in PostgreSQL.
  `pg_dump` does not touch it. Back it up separately with the Influx CLI:

  ```bash
  # Backup
  docker exec obmp-influxdb influx backup /var/lib/influxdb2/backup \
    --token "$INFLUXDB_ADMIN_TOKEN"
  docker cp obmp-influxdb:/var/lib/influxdb2/backup \
    /var/openbmp/backups/influxdb-$(date +%Y%m%d)

  # Restore
  docker cp /var/openbmp/backups/influxdb-YYYYMMDD \
    obmp-influxdb:/var/lib/influxdb2/restore
  docker exec obmp-influxdb influx restore /var/lib/influxdb2/restore \
    --token "$INFLUXDB_ADMIN_TOKEN"
  ```

  Telemetry is also less critical than BMP data (30-day retention,
  data-plane counters) — back it up if you need historical telemetry to
  survive a host loss; otherwise the 30-day window simply re-fills.

- **Grafana** — dashboards and datasources are provisioned from files in the
  repo (`obmp-grafana/provisioning/` and `obmp-grafana/dashboards/`), so they
  are already version-controlled in git. The Grafana database under
  `${OBMP_DATA_ROOT}/grafana` (users, preferences, manually-created
  dashboards, alert state) is *not* covered by this script — back up that
  directory separately if it holds anything not reproducible from the repo.

- **Configuration & secrets** — `.env`, `docker-compose.yml`, and the
  `${OBMP_DATA_ROOT}/config` directory. Keep these in version control /
  your secrets manager.