sam 0732ebfa07 Add production-readiness deliverables: security, backup, alerting
Adds a prioritized security-hardening checklist, a PostgreSQL logical-backup
script (pg-backup.sh) with a documented restore procedure, and Grafana
alerting provisioning (peer-down, flap-storm, RPKI-invalid, router-down rules
plus a contact-point template). The alerting YAML and contact points need
operator review before being relied on for paging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 20:55:03 -07:00

# OpenBMP — Grafana contact points & notification policy provisioning
# Grafana 9.1.7 (apiVersion: 1)
#
# Defines WHERE alert notifications go (contact points) and WHICH alerts go
# there (the notification policy tree). Pairs with obmp-alerts.yaml in this
# directory.
#
# ----------------------------------------------------------------------
# OPERATOR REVIEW — this file ships with PLACEHOLDERS. Fill them in.
# ----------------------------------------------------------------------
#   * The 'obmp-ops' contact point below has BOTH an email and a webhook
#     receiver as examples. Delete whichever you do not use and fill in real
#     values for the one you keep.
#   * EMAIL requires Grafana SMTP to be configured (the [smtp] section of
#     grafana.ini, or GF_SMTP_* env vars on the obmp-grafana container).
#     Without working SMTP the email receiver silently fails.
#   * WEBHOOK url: point it at your alerting system (Slack incoming webhook,
#     PagerDuty Events API, Mattermost, an internal handler, etc.).
#   * After editing, restart Grafana and verify under
#     Alerting > Contact points > (test).
# ----------------------------------------------------------------------
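# A minimal sketch of the SMTP prerequisite noted above, written as a
# Compose-style environment block. The service name, relay host/port, and
# sender address are placeholder ASSUMPTIONS, not values taken from this
# repo. This fragment belongs in your compose file, not in this
# provisioning file.
#
#   services:
#     obmp-grafana:                                    # assumed service name
#       environment:
#         GF_SMTP_ENABLED: "true"
#         GF_SMTP_HOST: "smtp.example.net:587"         # assumed relay host:port
#         GF_SMTP_FROM_ADDRESS: "grafana@example.net"  # assumed sender
# ----------------------------------------------------------------------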
apiVersion: 1
# --- Contact points ----------------------------------------------------
contactPoints:
  - orgId: 1
    name: obmp-ops
    receivers:
      # ---- Email receiver (requires Grafana SMTP configured) ----
      - uid: obmp-ops-email
        type: email
        settings:
          # REPLACE with the real NOC / on-call distribution address(es).
          # Comma-separate multiple recipients.
          addresses: noc@example.net
          singleEmail: false
        disableResolveMessage: false
      # ---- Webhook receiver (Slack / PagerDuty / internal handler) ----
      # Delete this block if you only use email.
      - uid: obmp-ops-webhook
        type: webhook
        settings:
          # REPLACE with your real webhook endpoint.
          url: https://hooks.example.net/services/REPLACE-ME
          httpMethod: POST
        disableResolveMessage: false
# --- Notification policy tree -----------------------------------------
# The root policy routes every alert from obmp-alerts.yaml to 'obmp-ops'.
# Sub-routes split by the `severity` label so critical alerts can page
# faster / repeat sooner than warnings.
policies:
  - orgId: 1
    receiver: obmp-ops
    # Group alerts that share these labels into a single notification.
    group_by: ['alertname', 'service']
    # Timing for the default (warning-ish) path.
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    routes:
      # Critical alerts (peer down, router BMP down): notify fast, repeat
      # more often until resolved.
      - receiver: obmp-ops
        matchers:
          - severity = critical
        group_wait: 10s
        group_interval: 2m
        repeat_interval: 1h
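#
# For the critical sub-route to match, the alert rule itself must carry the
# `severity` label. A minimal sketch of how a rule in obmp-alerts.yaml might
# set it. The uid and title are hypothetical, and the rule's query and
# condition fields are omitted here:
#
#   rules:
#     - uid: obmp-peer-down        # hypothetical
#       title: BGP peer down       # hypothetical
#       labels:
#         severity: critical
#
# Rules without a matching `severity` label fall through to the root policy
# timings (30s / 5m / 4h).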