On first deploy to a new server, deploy.sh:
1. Installs age and sops binaries if missing
2. Generates an age keypair if missing
3. Prints the public key and exits with instructions
All checks are idempotent — subsequent deploys skip to decryption.
Removed duplicate sops/age setup from setup_server.sh (deploy.sh handles it).
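The key-generation step above could look roughly like the sketch below. It is not the actual deploy.sh code: the function name ensure_age_key is hypothetical, binary installation is left out because the source does not say how it is done, and the default key path is the one named for the decryption step.

```shell
# Hypothetical helper: create an age keypair once, then stop so the operator
# can register the public key. Returns 0 if the key already existed.
ensure_age_key() {
    local key_file="${1:-/opt/padelnomics/age-key.txt}"

    # Idempotent: an existing key means there is nothing to do on later deploys.
    if [ -f "$key_file" ]; then
        return 0
    fi

    mkdir -p "$(dirname "$key_file")"
    age-keygen -o "$key_file"
    chmod 600 "$key_file"

    echo "Generated a new age key. Public key:"
    # -y derives the public key (recipient) from the identity file.
    age-keygen -y "$key_file"
    echo "Add this public key as a sops recipient, re-encrypt, and deploy again."
    return 1
}
```

A caller would then use something like `ensure_age_key || exit 1` before attempting decryption.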
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reads age key from /opt/padelnomics/age-key.txt (overridable via
SOPS_AGE_KEY_FILE env var). Decrypts .env.prod.sops → .env with
chmod 600.
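A minimal sketch of the decryption step, under stated assumptions: the function name decrypt_env is hypothetical, and passing dotenv as the sops input/output type is a guess about the file format, not something the source confirms.

```shell
# Hypothetical helper: decrypt the production env file with sops + age.
decrypt_env() {
    local src="${1:-.env.prod.sops}"
    local dst="${2:-.env}"
    # Default key path from the source; the env var overrides it.
    local key_file="${SOPS_AGE_KEY_FILE:-/opt/padelnomics/age-key.txt}"

    if [ ! -f "$key_file" ]; then
        echo "age key not found at $key_file" >&2
        return 1
    fi

    # umask 077 in a subshell so a newly created file starts out private.
    # The dotenv input/output types are an assumption about the file format.
    (
        umask 077
        SOPS_AGE_KEY_FILE="$key_file" sops --decrypt \
            --input-type dotenv --output-type dotenv "$src" > "$dst"
    ) || return 1

    # Also covers the case where the target file already existed.
    chmod 600 "$dst"
}
```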
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nginx -t resolves upstream hostnames — if the config points to a stopped
slot from a previous failed deploy, the health check fails and the router
stays unhealthy indefinitely, blocking all future deploys.
Before up -d --wait, write the router config to point to the CURRENT live
slot (which is still running) and restart the router. This clears the
stale unhealthy state. After the new slot passes health checks, switch
the router config to the new slot and reload.
Also extracted _write_router_conf() to avoid duplicating the nginx config
template.
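For illustration, a helper in the spirit of _write_router_conf() might look like this. The green-app service name, the upstream port, the proxy headers, and the output path are all assumptions; only the router's port 5000, the blue-app service, and the default.conf file name appear in the commit history.

```shell
# Sketch of a config-writing helper; not the real template.
_write_router_conf() {
    local slot="$1"                            # "blue" or "green"
    local conf_path="${2:-router/default.conf}"  # hypothetical location

    mkdir -p "$(dirname "$conf_path")"
    # Unquoted heredoc: ${slot} expands, \$host stays literal for nginx.
    cat > "$conf_path" <<EOF
upstream app {
    server ${slot}-app:5000;
}

server {
    listen 5000;

    location / {
        proxy_pass http://app;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    }
}
EOF
}
```

Keeping the template in one function means the pre-wait write (current slot) and the post-health-check write (new slot) differ only in the argument passed.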
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docker compose requires --profile to access profiled services even for
the logs command. Without it, blue-app logs were empty in the failure
dump, hiding the actual crash reason.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 100-line combined log dump was entirely filled by litestream R2
errors, hiding the actual blue-app crash output. Now dumps blue-app
(60 lines), router (10 lines), and litestream (10 lines) separately.
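The per-service dump could be sketched as follows. The function name dump_failure_logs and the assumption that the profile is named after the slot ("blue", "green") are illustrative; the line counts and service names come from the commit messages.

```shell
# Hypothetical helper: dump logs per service so one noisy service
# cannot crowd out the others.
dump_failure_logs() {
    local slot="$1"   # "blue" or "green"

    echo "=== ${slot}-app (last 60 lines) ==="
    # --profile is needed even for logs, or the profiled service is invisible.
    docker compose --profile "$slot" logs --tail 60 "${slot}-app" || true

    echo "=== router (last 10 lines) ==="
    docker compose --profile "$slot" logs --tail 10 router || true

    echo "=== litestream (last 10 lines) ==="
    docker compose --profile "$slot" logs --tail 10 litestream || true
}
```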
Revert litestream image tag to latest — the R2 errors were caused by
misconfigured endpoint/bucket CI variables, not a litestream version
bug. The v0.5.8 tag may not exist on Docker Hub (tags omit 'v' prefix).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Router had no profile so it was always included in `up -d --wait`.
Writing the new target's config BEFORE the wait caused the router to become
unhealthy if the new slot failed — leaving it in a broken state for the next
deploy attempt.
Now: router keeps its old config (pointing to the still-running old slot)
during the health check wait, so it stays healthy throughout. Config is only
written and nginx -s reload triggered after the new slot passes its health
check. This is the correct blue-green pattern.
Also add `retries: 3` and `start_period: 10s` to the router health check
for resilience against transient startup failures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Router health check (nginx -t) fails when default.conf doesn't exist yet.
Move config write to before `up -d --wait` so nginx has a valid config
on first deploy or after a volume wipe. Router reload stays post-health-check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Migration atomicity:
- Remove conn.commit() and executescript() from all up() functions (0000,
  0011, 0012, 0013, 0014, 0015); executescript() issues an implicit COMMIT
  of any pending transaction, which broke the batch-rollback guarantee of
  the migration runner
- Rewrite 0000 with individual conn.execute() calls (was a single
  executescript block)
Deploy hardening:
- Add pre-migration DB backup step to deploy.sh: saves
  app.db.pre-deploy-<timestamp> in the volume before every migration
- On health-check failure: restore the backup, then stop + exit
- On success: clean up old backups (keep last 3)
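The backup, restore, and cleanup steps above can be sketched as three small functions. This is an illustration only: the function names, the timestamp format, and direct filesystem access to the data directory are assumptions, and a plain cp is only safe while nothing is writing to the database.

```shell
# Hypothetical helpers operating on the directory that holds app.db.

backup_db() {
    local data_dir="$1"
    local stamp
    stamp="$(date +%Y%m%d-%H%M%S)"   # assumed format; sorts chronologically
    if [ -f "$data_dir/app.db" ]; then
        cp "$data_dir/app.db" "$data_dir/app.db.pre-deploy-$stamp"
        echo "$data_dir/app.db.pre-deploy-$stamp"   # print the backup path
    fi
}

restore_db() {
    local data_dir="$1"
    local backup="$2"
    cp "$backup" "$data_dir/app.db"
    # Assumption: WAL mode is in use, so stale sidecar files must not
    # be replayed on top of the restored database.
    rm -f "$data_dir/app.db-wal" "$data_dir/app.db-shm"
}

prune_backups() {
    local data_dir="$1"
    local keep="${2:-3}"
    # Newest first, then delete everything after the first $keep entries.
    ls -1 "$data_dir"/app.db.pre-deploy-* 2>/dev/null \
        | sort -r \
        | tail -n +"$((keep + 1))" \
        | while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

A deploy would call backup_db before migrating, restore_db with the printed path if the health check fails, and prune_backups after a successful switch.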
Litestream:
- Enable R2 as primary replica in litestream.yml (env-var placeholders)
- Add local /app/data/backups as secondary replica
- docker-compose: add auto-restore on empty volume (sh entrypoint runs
  'litestream restore' before 'litestream replicate' if app.db missing)
- Add LITESTREAM_R2_* vars to .gitlab-ci.yml .env block and .env.example
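The auto-restore entrypoint described above might be written along these lines. The function wrapper and the /app/data/app.db default are assumptions made so the logic can be shown in isolation; the real entrypoint lives inline in docker-compose.

```shell
# Hypothetical entrypoint body: restore only when the volume has no database.
litestream_entrypoint() {
    local db_path="${1:-/app/data/app.db}"   # assumed database location

    if [ ! -f "$db_path" ]; then
        echo "No database at $db_path, attempting restore from replica"
        # A failed restore (for example, no replica yet) should not block startup.
        litestream restore "$db_path" || echo "Restore failed or no replica; starting fresh"
    fi

    # exec so litestream becomes the main process and receives signals directly.
    exec litestream replicate
}
```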
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docker compose --profile <slot> stop, run without service names, also stops
non-profiled services (router, litestream), causing 502s. The stop command
now explicitly names only the slot's services.
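A minimal sketch of the narrowed stop command, assuming the profile is named after the slot and that each slot has a single <slot>-app service (only blue-app is named in the history; stop_slot is a hypothetical name):

```shell
# Hypothetical helper: stop only the given slot's services.
stop_slot() {
    local slot="$1"   # "blue" or "green"
    # Naming the service keeps router and litestream running; any other
    # per-slot services would need to be listed here as well.
    docker compose --profile "$slot" stop "${slot}-app"
}
```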
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GitLab CI runs pytest + ruff on master/MRs, then auto-deploys via SSH.
Blue-green strategy using Docker Compose profiles with an nginx router
on port 5000 for zero-downtime switching between slots.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>