Add supervisor deployment with continuous pipeline orchestration

Implements automated supervisor instance deployment that runs scheduled
pipelines using a TigerBeetle-inspired continuous orchestration pattern.

Infrastructure changes:
- Update Pulumi to use existing R2 buckets (beanflows-artifacts, beanflows-data-prod)
- Rename scheduler → supervisor; right-size to CCX11 (€4/mo)
- Remove always-on worker (workers are now ephemeral only)
- Add artifacts bucket resource for CLI/pipeline packages

Supervisor architecture:
- supervisor.sh: Continuous loop checking schedules every 15 minutes
- Self-updating: Checks for new CLI versions hourly
- Fixed schedules: Extract at 2 AM UTC, Transform at 3 AM UTC
- systemd service for automatic restart on failure
- Logs to systemd journal for observability
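
The commit describes `supervisor.sh` only at this level of detail. A minimal sketch of that loop shape, assuming a `materia self-update` subcommand and per-pipeline stamp files under `/var/lib/supervisor` (both hypothetical, not the actual implementation):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of supervisor.sh; the `materia` subcommands and
# stamp-file paths are assumptions, not the real implementation.
set -euo pipefail

STATE_DIR="${STATE_DIR:-/var/lib/supervisor}"

# A pipeline is due if the current UTC hour matches its schedule and it
# has not already run today (guarded by a per-pipeline stamp file).
should_run() {
  local hour="$1" stamp_file="$2"
  [ "$(date -u +%H)" = "$hour" ] &&
    [ "$(cat "$stamp_file" 2>/dev/null || true)" != "$(date -u +%F)" ]
}

mark_ran() { date -u +%F > "$1"; }

supervisor_loop() {
  local last_update_check=0 now
  while true; do
    now=$(date +%s)
    if [ $((now - last_update_check)) -ge 3600 ]; then
      # Hourly self-update check; `materia self-update` is a placeholder name
      materia self-update || true
      last_update_check=$now
    fi
    if should_run 02 "$STATE_DIR/extract.stamp"; then
      materia pipeline run extract && mark_ran "$STATE_DIR/extract.stamp"
    fi
    if should_run 03 "$STATE_DIR/transform.stamp"; then
      materia pipeline run transform && mark_ran "$STATE_DIR/transform.stamp"
    fi
    sleep 900   # wake every 15 minutes
  done
}
```

The stamp files keep a pipeline from firing more than once during its scheduled hour even though the loop wakes four times in that window.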

CI/CD changes:
- deploy:infra now runs on every master push (not just on infra changes)
- New deploy:supervisor job:
  * Deploys supervisor.sh and systemd service
  * Installs latest materia CLI from R2
  * Configures environment with Pulumi ESC secrets
  * Restarts supervisor service
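
The systemd service the job deploys could look roughly like this; the unit name and script path are assumptions:

```ini
[Unit]
Description=beanflows supervisor (continuous pipeline orchestration loop)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/supervisor.sh
Restart=on-failure
RestartSec=30
# stdout/stderr land in the systemd journal: `journalctl -u supervisor -f`

[Install]
WantedBy=multi-user.target
```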

Future enhancements documented:
- SQLMesh-aware scheduling (check models before running)
- Model tags for worker sizing (heavy/distributed hints)
- Multi-pipeline support, distributed execution
- Cost optimization with multi-cloud spot pricing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: Deeman
Date: 2025-10-12 22:23:55 +02:00
parent 7e6ff29dea
commit f207fb441d
6 changed files with 648 additions and 79 deletions


@@ -195,8 +195,15 @@ Each artifact is a self-contained tarball with all dependencies.
   - Loads secrets from Pulumi ESC (`beanflows/prod`)
   - Only requires `PULUMI_ACCESS_TOKEN` in GitLab variables
   - All other secrets (R2 credentials, SSH keys, API tokens) come from ESC
-- **`deploy:infra`**: Runs `pulumi up` to deploy infrastructure changes
-  - Only triggers when `infra/**/*` files change
+- **`deploy:infra`**: Runs `pulumi up` to ensure supervisor instance exists
+  - Runs on every master push (not just on infra changes)
+  - Creates/updates Hetzner CCX11 supervisor instance
+  - Configures Cloudflare R2 buckets (`beanflows-artifacts`, `beanflows-data-prod`)
+- **`deploy:supervisor`**: Deploys supervisor script and materia CLI
+  - Runs after `deploy:r2` and `deploy:infra`
+  - Copies `supervisor.sh` and systemd service to supervisor instance
+  - Downloads and installs latest materia CLI from R2
+  - Restarts supervisor service to pick up changes
 
 ### Production Architecture: Ephemeral Worker Model
@@ -209,10 +216,15 @@ Each artifact is a self-contained tarball with all dependencies.
 **Components:**
 
 #### 1. Supervisor Instance (Small Hetzner VM)
-- Runs the `materia` management CLI
-- Small, always-on instance (cheap)
-- Pulls secrets from Pulumi ESC
-- Orchestrates worker lifecycle via cloud provider APIs
+- Runs `supervisor.sh` - continuous orchestration loop (inspired by TigerBeetle's CFO supervisor)
+- Hetzner CCX11: 2 vCPU, 4GB RAM (~€4/mo)
+- Always-on, minimal resource usage
+- Checks for new CLI versions every hour (self-updating)
+- Runs pipelines on schedule:
+  - Extract: Daily at 2 AM UTC
+  - Transform: Daily at 3 AM UTC
+- Uses systemd service for automatic restart on failure
+- Pulls secrets from Pulumi ESC and passes to workers
 
 #### 2. Ephemeral Workers (On-Demand)
 - Created for each pipeline execution
@@ -248,13 +260,16 @@ Worker: curl → extract → execute
 - No persistent database on workers
 
 **Execution Flow:**
-1. Supervisor receives schedule trigger (cron/manual)
-2. CLI runs: `materia pipeline run extract`
-3. Creates Hetzner worker with SSH key
-4. Worker downloads `materia-extract-latest.tar.gz` from R2
-5. CLI injects secrets via SSH: `export R2_ACCESS_KEY_ID=... && ./extract_psd`
-6. Pipeline executes, writes to R2 Iceberg catalog
-7. Worker destroyed (entire lifecycle ~5-10 minutes)
+1. Supervisor loop wakes up every 15 minutes
+2. Checks if current time matches pipeline schedule (e.g., 2 AM for extract)
+3. Checks for CLI updates (hourly) and self-updates if needed
+4. CLI runs: `materia pipeline run extract`
+5. Creates Hetzner worker with SSH key
+6. Worker downloads `materia-extract-latest.tar.gz` from R2
+7. CLI injects secrets via SSH: `export R2_ACCESS_KEY_ID=... && ./extract_psd`
+8. Pipeline executes, writes to R2 Iceberg catalog
+9. Worker destroyed (entire lifecycle ~5-10 minutes)
+10. Supervisor logs results and continues loop
 
 **Multi-Cloud Provider Abstraction:**
 - Protocol-based interface (data-oriented design, no OOP)