7 Commits

Author SHA1 Message Date
Deeman
13c264ca75 fix(deploy): split log dump by service, revert litestream to latest
The 100-line combined log dump was entirely filled by litestream R2
errors, hiding the actual blue-app crash output. Now dumps blue-app
(60 lines), router (10 lines), and litestream (10 lines) separately.

Revert litestream image tag to latest — the R2 errors were caused by
misconfigured endpoint/bucket CI variables, not a litestream version
bug. The v0.5.8 tag may not exist on Docker Hub (tags omit 'v' prefix).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 14:01:32 +01:00
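
The per-service log dump described in 13c264ca75 could look roughly like the sketch below. It is an illustration only: the real change lives in the deploy script, and the use of `docker compose logs --tail`, the helper names `log_dump_commands` and `dump_logs`, and the assumption that `blue-app` is the slot being inspected are all hypothetical. Only the service names and line counts come from the commit message.

```python
import subprocess
from typing import Dict, List, Optional

# Line budgets per service, taken from the commit message.
# Treating "blue-app" as the slot under inspection is an assumption.
DEFAULT_TAIL_LINES: Dict[str, int] = {
    "blue-app": 60,
    "router": 10,
    "litestream": 10,
}


def log_dump_commands(tail_lines: Optional[Dict[str, int]] = None) -> List[List[str]]:
    """Build one `docker compose logs` command per service.

    Separate commands give each service its own line budget, so a noisy
    service cannot crowd out the others in a single combined dump.
    """
    budgets = DEFAULT_TAIL_LINES if tail_lines is None else tail_lines
    return [
        ["docker", "compose", "logs", "--tail", str(lines), service]
        for service, lines in budgets.items()
    ]


def dump_logs(tail_lines: Optional[Dict[str, int]] = None) -> None:
    """Run each command in turn; a failure in one does not stop the rest."""
    for cmd in log_dump_commands(tail_lines):
        subprocess.run(cmd, check=False)
```

Calling `dump_logs()` after a failed health check would print the three sections one after another, with the app output first.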
Deeman
5f7e8f1200 fix(deploy): move router config write to after health check passes
Router had no profile so it was always included in `up -d --wait`.
Writing the new target's config BEFORE the wait caused the router to become
unhealthy if the new slot failed — leaving it in a broken state for the next
deploy attempt.

Now: router keeps its old config (pointing to the still-running old slot)
during the health check wait, so it stays healthy throughout. Config is only
written and nginx -s reload triggered after the new slot passes its health
check. This is the correct blue-green pattern.

Also add `retries: 3` and `start_period: 10s` to the router health check
for resilience against transient startup failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 13:22:50 +01:00
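
The ordering that 5f7e8f1200 describes can be sketched as a small function. This is a sketch under assumptions, not the contents of deploy.sh: `switch_slot` and its four callables are hypothetical stand-ins for the script's steps (starting the slot, waiting on its health check, writing the nginx config, and running `nginx -s reload`), and the slot name `green` in the usage example is likewise assumed.

```python
from typing import Callable, List, Tuple


def switch_slot(
    new_slot: str,
    start_slot: Callable[[str], None],
    wait_healthy: Callable[[str], bool],
    write_router_config: Callable[[str], None],
    reload_router: Callable[[], None],
) -> bool:
    """Start new_slot and move traffic to it only once it is healthy.

    Returns True if the router was switched, False if the new slot failed
    its health check and the router was left pointing at the old slot.
    """
    start_slot(new_slot)
    if not wait_healthy(new_slot):
        # Router config is untouched here, so the router keeps serving
        # the old slot and stays healthy for the next deploy attempt.
        return False
    # Only a healthy slot gets traffic: write the config, then reload.
    write_router_config(new_slot)
    reload_router()
    return True


if __name__ == "__main__":
    events: List[Tuple[str, ...]] = []
    switched = switch_slot(
        "green",
        start_slot=lambda slot: events.append(("start", slot)),
        wait_healthy=lambda slot: False,  # simulate a failed health check
        write_router_config=lambda slot: events.append(("write", slot)),
        reload_router=lambda: events.append(("reload",)),
    )
    print(switched, events)
```

With the simulated failure, the only recorded event is the slot start; the config write and reload never run.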
Deeman
e39eaefb43 fix(deploy): dump app container logs on health check failure
Makes the crash reason visible in GitLab CI logs instead of just
"container is unhealthy".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 13:12:26 +01:00
Deeman
dc02563e52 fix: write nginx config before container start to fix first-deploy health check
Router health check (nginx -t) fails when default.conf doesn't exist yet.
Move config write to before `up -d --wait` so nginx has a valid config
on first deploy or after a volume wipe. Router reload stays post-health-check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 12:45:37 +01:00
Deeman
c0c8607664 fix: migration atomicity + deploy hardening + Litestream R2
Migration atomicity:
- Remove conn.commit() and executescript() from all up() functions (0000,
  0011, 0012, 0013, 0014, 0015); executescript() issued implicit COMMITs
  which broke the batch-rollback guarantee of the migration runner
- Rewrite 0000 with individual conn.execute() calls (was a single
  executescript block)

Deploy hardening:
- Add pre-migration DB backup step to deploy.sh: saves
  app.db.pre-deploy-<timestamp> in the volume before every migration
- On health-check failure: restore the backup, then stop + exit
- On success: clean up old backups (keep last 3)

Litestream:
- Enable R2 as primary replica in litestream.yml (env-var placeholders)
- Add local /app/data/backups as secondary replica
- docker-compose: add auto-restore on empty volume (sh entrypoint runs
  'litestream restore' before 'litestream replicate' if app.db missing)
- Add LITESTREAM_R2_* vars to .gitlab-ci.yml .env block and .env.example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 10:28:59 +01:00
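
The atomicity problem fixed in c0c8607664 can be reproduced with Python's standard `sqlite3` module. The sketch below is not the project's migration runner: `run_batch`, `up_with_execute`, `up_with_executescript`, `failing_up`, and the `users` table are hypothetical, and the runner is assumed to wrap a whole batch in one explicit transaction.

```python
import sqlite3


def up_with_execute(conn):
    # Runs inside the runner's open transaction.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")


def up_with_executescript(conn):
    # executescript() commits any pending transaction before running,
    # so this statement escapes the runner's transaction.
    conn.executescript("CREATE TABLE users (id INTEGER PRIMARY KEY);")


def failing_up(conn):
    raise RuntimeError("simulated migration failure")


def run_batch(conn, migrations):
    """Apply all migrations in one transaction; roll back on any error."""
    conn.execute("BEGIN")
    try:
        for up in migrations:
            up(conn)
    except Exception:
        conn.rollback()
        return False
    conn.commit()
    return True


def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def demo(first_migration):
    """Run first_migration followed by a failing one; report the outcome."""
    conn = sqlite3.connect(":memory:")
    ok = run_batch(conn, [first_migration, failing_up])
    exists = table_exists(conn, "users")
    conn.close()
    return ok, exists


if __name__ == "__main__":
    print(demo(up_with_execute))        # (False, False): table rolled back
    print(demo(up_with_executescript))  # (False, True): table survived
```

In the second case the batch reports failure, yet the table from the first migration is still there, which is the broken rollback guarantee the commit message refers to.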
Deeman
1e56087060 fix deploy.sh stopping router during blue-green switch
`docker compose --profile <slot> stop` also stops non-profiled services (router,
litestream), causing 502 responses. deploy.sh now names only the slot services
explicitly when stopping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:16:19 +01:00
Deeman
fa09fc81c9 add CI/CD pipeline with blue-green deployment
GitLab CI runs pytest + ruff on master/MRs, then auto-deploys via SSH.
Blue-green strategy using Docker Compose profiles with an nginx router
on port 5000 for zero-downtime switching between slots.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:39:15 +01:00