Commit Graph

11 Commits

Author SHA1 Message Date
Deeman
76814dade7 feat: landing zone backup to R2 via rclone + Litestream
Landing files (append-only JSON.gz) synced to R2 every 30 min via
systemd timer + rclone. Extraction state DB (.state.sqlite) continuously
replicated via Litestream (second DB entry). Auto-restore on container
startup for both app.db and .state.sqlite. Reuses existing R2 bucket
and credentials — no new env vars needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 14:06:16 +01:00
Deeman
13c264ca75 fix(deploy): split log dump by service, revert litestream to latest
The 100-line combined log dump was entirely filled by litestream R2
errors, hiding the actual blue-app crash output. Now dumps blue-app
(60 lines), router (10 lines), and litestream (10 lines) separately.

Revert litestream image tag to latest — the R2 errors were caused by
misconfigured endpoint/bucket CI variables, not a litestream version
bug. The v0.5.8 tag may not exist on Docker Hub (tags omit 'v' prefix).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 14:01:32 +01:00
Deeman
ad5e2516c4 fix(infra): pin litestream to v0.5.8 for R2 compatibility
latest tag may resolve to an older version that treats Cloudflare R2's
NoSuchKey response on empty-prefix ListObjectsV2 as a hard error instead
of an empty list, causing the replica sync to stall on first deployment.
v0.5.8 is the current stable release (2026-02-12).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 13:49:02 +01:00
Deeman
5f7e8f1200 fix(deploy): move router config write to after health check passes
Router had no profile so it was always included in `up -d --wait`.
Writing the new target's config BEFORE the wait caused the router to become
unhealthy if the new slot failed — leaving it in a broken state for the next
deploy attempt.

Now: router keeps its old config (pointing to the still-running old slot)
during the health check wait, so it stays healthy throughout. Config is only
written and nginx -s reload triggered after the new slot passes its health
check. This is the correct blue-green pattern.

Also add `retries: 3` and `start_period: 10s` to the router health check
for resilience against transient startup failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 13:22:50 +01:00
Deeman
22ad855c70 fix(deploy): update docker-compose.prod.yml paths after repo flatten
build context, env_file, and litestream volume mount all pointed at
./padelnomics/ which no longer exists after the flatten.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:17:47 +01:00
Deeman
358bc5c02f fix: use kill -0 1 for litestream healthcheck
pgrep may not be available in the litestream image. kill -0 1 checks
whether PID 1 (litestream, after exec) is alive — works in any container.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 13:08:00 +01:00
Deeman
76fc19c183 fix: litestream healthcheck gate + 1yr retention
Re-enable deploy gate on litestream: pgrep-based healthcheck with 6
retries (30s window) after a 15s start period — broken backups now
fail the deploy loudly instead of silently succeeding.

Extend retention from 7d to 1yr (8760h): WAL frames are tiny for a
low-traffic app, R2 free tier covers years of storage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 13:00:29 +01:00
Deeman
b0f36192a6 fix: litestream single replica + disable healthcheck gate
v0.5.8 dropped multi-replica support — remove the local path replica,
keeping only R2. Also disable litestream's healthcheck so deploy's
`up --wait` isn't gated on the backup service.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 12:54:19 +01:00
Deeman
c0c8607664 fix: migration atomicity + deploy hardening + Litestream R2
Migration atomicity:
- Remove conn.commit() and executescript() from all up() functions (0000,
  0011, 0012, 0013, 0014, 0015); executescript() issued implicit COMMITs
  which broke the batch-rollback guarantee of the migration runner
- Rewrite 0000 with individual conn.execute() calls (was a single
  executescript block)

Deploy hardening:
- Add pre-migration DB backup step to deploy.sh: saves
  app.db.pre-deploy-<timestamp> in the volume before every migration
- On health-check failure: restore the backup, then stop + exit
- On success: clean up old backups (keep last 3)

Litestream:
- Enable R2 as primary replica in litestream.yml (env-var placeholders)
- Add local /app/data/backups as secondary replica
- docker-compose: add auto-restore on empty volume (sh entrypoint runs
  'litestream restore' before 'litestream replicate' if app.db missing)
- Add LITESTREAM_R2_* vars to .gitlab-ci.yml .env block and .env.example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 10:28:59 +01:00
Deeman
337816c6c1 fix env_file path to use padelnomics/.env
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 16:44:33 +01:00
Deeman
fa09fc81c9 add CI/CD pipeline with blue-green deployment
GitLab CI runs pytest + ruff on master/MRs, then auto-deploys via SSH.
Blue-green strategy using Docker Compose profiles with an nginx router
on port 5000 for zero-downtime switching between slots.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:39:15 +01:00