feat: landing zone backup to R2 via rclone + Litestream
- Landing files (append-only JSON.gz) synced to R2 every 30 min via systemd timer + rclone.
- Extraction state DB (.state.sqlite) continuously replicated via Litestream (second DB entry).
- Auto-restore on container startup for both app.db and .state.sqlite.
- Reuses existing R2 bucket and credentials; no new env vars needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -96,6 +96,27 @@ analytics.duckdb ← serving tables only, web app read-only
└── serving.* ← atomically replaced by export_serving.py
```
## Backup & disaster recovery

| Data | Tool | Target | Frequency |
|------|------|--------|-----------|
| `app.db` (auth, billing) | Litestream | R2 `padelnomics/app.db` | Continuous (WAL) |
| `.state.sqlite` (extraction state) | Litestream | R2 `padelnomics/state.sqlite` | Continuous (WAL) |
| `data/landing/` (JSON.gz files) | rclone sync | R2 `padelnomics/landing/` | Every 30 min (systemd timer) |
| `lakehouse.duckdb`, `analytics.duckdb` | N/A (derived) | Re-run pipeline | On demand |

Recovery:
```bash
# App database (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /app/data/app.db

# Extraction state (auto-restored by Litestream container on startup)
litestream restore -config /etc/litestream.yml /data/landing/.state.sqlite

# Landing zone files
source /opt/padelnomics/.env && bash infra/restore_landing.sh
```
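
The contents of `infra/restore_landing.sh` are not shown in this diff. As a rough illustration only, a landing-zone restore with rclone could look like the sketch below. It assumes an rclone remote named `r2` is already configured for the bucket; the `RCLONE_REMOTE`, `LANDING_DIR`, and `EXECUTE` variables are hypothetical names introduced here, not taken from the repository. Only the `padelnomics/landing/` and `data/landing/` paths come from the table above.

```bash
#!/usr/bin/env bash
# Sketch of a landing-zone restore; NOT the repository's actual script.
set -euo pipefail

# Assumed names: the remote "r2" and these variables are illustrative only.
RCLONE_REMOTE="${RCLONE_REMOTE:-r2}"
LANDING_DIR="${LANDING_DIR:-data/landing}"

# "rclone copy" only adds or updates files at the destination and never
# deletes local files, which suits an append-only landing zone.
cmd=(rclone copy "${RCLONE_REMOTE}:padelnomics/landing/" "${LANDING_DIR}/")

# Print the command by default; run it only when EXECUTE=1 is set.
if [ "${EXECUTE:-0}" = "1" ]; then
  "${cmd[@]}"
else
  printf '%s\n' "${cmd[*]}"
fi
```

Running the sketch without `EXECUTE=1` only prints the command, which makes it easy to check the source and destination paths before pulling anything from R2.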
## Environment variables
| Variable | Default | Description |