Add supervisor deployment with continuous pipeline orchestration

Implements automated supervisor instance deployment that runs scheduled pipelines using a TigerBeetle-inspired continuous orchestration pattern.

Infrastructure changes:
- Update Pulumi to use existing R2 buckets (beanflows-artifacts, beanflows-data-prod)
- Rename scheduler → supervisor, optimize to CCX11 (€4/mo)
- Remove always-on worker (workers are now ephemeral only)
- Add artifacts bucket resource for CLI/pipeline packages

Supervisor architecture:
- supervisor.sh: Continuous loop checking schedules every 15 minutes
- Self-updating: Checks for new CLI versions hourly
- Fixed schedules: Extract at 2 AM UTC, Transform at 3 AM UTC
- systemd service for automatic restart on failure
- Logs to systemd journal for observability

CI/CD changes:
- deploy:infra now runs on every master push (not just on changes)
- New deploy:supervisor job:
  * Deploys supervisor.sh and systemd service
  * Installs latest materia CLI from R2
  * Configures environment with Pulumi ESC secrets
  * Restarts supervisor service

Future enhancements documented:
- SQLMesh-aware scheduling (check models before running)
- Model tags for worker sizing (heavy/distributed hints)
- Multi-pipeline support, distributed execution
- Cost optimization with multi-cloud spot pricing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
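
Reviewer note: `supervisor.sh` itself is not part of this diff, so the following is only a minimal Python sketch of the decision logic the message describes (a 15-minute poll, fixed daily schedules, an hourly update check). The names `SCHEDULES`, `tick`, and `update_check_due` are hypothetical. It assumes a pipeline runs at most once per UTC day, which the commit does not state; some such guard seems necessary because a 15-minute loop wakes up several times within the scheduled hour.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schedule table: pipeline name -> hour of day (UTC) for its daily run.
SCHEDULES = {"extract": 2, "transform": 3}

# Interval between CLI update checks, per the commit message ("hourly").
UPDATE_INTERVAL = timedelta(hours=1)


def due_pipelines(now, last_run):
    """Return pipelines whose scheduled hour matches `now` and that have not run today.

    now: timezone-aware datetime in UTC.
    last_run: dict mapping pipeline name -> date of its most recent run.
    """
    return [
        name
        for name, hour in SCHEDULES.items()
        if now.hour == hour and last_run.get(name) != now.date()
    ]


def tick(now, last_run):
    """One loop iteration: pick the due pipelines and record them as run today."""
    due = due_pipelines(now, last_run)
    for name in due:
        last_run[name] = now.date()
    return due


def update_check_due(now, last_check):
    """True if no update check has happened yet or the last one is at least an hour old."""
    return last_check is None or now - last_check >= UPDATE_INTERVAL
```

In this sketch the caller would sleep 15 minutes between calls to `tick(datetime.now(timezone.utc), last_run)`. Because `last_run` lives only in memory, a systemd restart during the scheduled hour would trigger a second run; persisting it would be a separate design decision.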
41
CLAUDE.md
@@ -195,8 +195,15 @@ Each artifact is a self-contained tarball with all dependencies.
   - Loads secrets from Pulumi ESC (`beanflows/prod`)
   - Only requires `PULUMI_ACCESS_TOKEN` in GitLab variables
   - All other secrets (R2 credentials, SSH keys, API tokens) come from ESC
-- **`deploy:infra`**: Runs `pulumi up` to deploy infrastructure changes
-  - Only triggers when `infra/**/*` files change
+- **`deploy:infra`**: Runs `pulumi up` to ensure supervisor instance exists
+  - Runs on every master push (not just on infra changes)
+  - Creates/updates Hetzner CCX11 supervisor instance
+  - Configures Cloudflare R2 buckets (`beanflows-artifacts`, `beanflows-data-prod`)
+- **`deploy:supervisor`**: Deploys supervisor script and materia CLI
+  - Runs after `deploy:r2` and `deploy:infra`
+  - Copies `supervisor.sh` and systemd service to supervisor instance
+  - Downloads and installs latest materia CLI from R2
+  - Restarts supervisor service to pick up changes
 
 ### Production Architecture: Ephemeral Worker Model
 
@@ -209,10 +216,15 @@ Each artifact is a self-contained tarball with all dependencies.
 **Components:**
 
 #### 1. Supervisor Instance (Small Hetzner VM)
-- Runs the `materia` management CLI
-- Small, always-on instance (cheap)
-- Pulls secrets from Pulumi ESC
-- Orchestrates worker lifecycle via cloud provider APIs
+- Runs `supervisor.sh` - continuous orchestration loop (inspired by TigerBeetle's CFO supervisor)
+- Hetzner CCX11: 2 vCPU, 4GB RAM (~€4/mo)
+- Always-on, minimal resource usage
+- Checks for new CLI versions every hour (self-updating)
+- Runs pipelines on schedule:
+  - Extract: Daily at 2 AM UTC
+  - Transform: Daily at 3 AM UTC
+- Uses systemd service for automatic restart on failure
+- Pulls secrets from Pulumi ESC and passes to workers
 
 #### 2. Ephemeral Workers (On-Demand)
 - Created for each pipeline execution
@@ -248,13 +260,16 @@ Worker: curl → extract → execute
 - No persistent database on workers
 
 **Execution Flow:**
-1. Supervisor receives schedule trigger (cron/manual)
-2. CLI runs: `materia pipeline run extract`
-3. Creates Hetzner worker with SSH key
-4. Worker downloads `materia-extract-latest.tar.gz` from R2
-5. CLI injects secrets via SSH: `export R2_ACCESS_KEY_ID=... && ./extract_psd`
-6. Pipeline executes, writes to R2 Iceberg catalog
-7. Worker destroyed (entire lifecycle ~5-10 minutes)
+1. Supervisor loop wakes up every 15 minutes
+2. Checks if current time matches pipeline schedule (e.g., 2 AM for extract)
+3. Checks for CLI updates (hourly) and self-updates if needed
+4. CLI runs: `materia pipeline run extract`
+5. Creates Hetzner worker with SSH key
+6. Worker downloads `materia-extract-latest.tar.gz` from R2
+7. CLI injects secrets via SSH: `export R2_ACCESS_KEY_ID=... && ./extract_psd`
+8. Pipeline executes, writes to R2 Iceberg catalog
+9. Worker destroyed (entire lifecycle ~5-10 minutes)
+10. Supervisor logs results and continues loop
 
 **Multi-Cloud Provider Abstraction:**
 - Protocol-based interface (data-oriented design, no OOP)
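
Reviewer note: steps 5 to 9 of the new Execution Flow amount to a create, run, destroy lifecycle. The `materia` implementation is not shown in this diff, so the sketch below is only an illustration of one way to guarantee that the worker is destroyed even when the pipeline fails. `run_pipeline` and its three callable parameters are hypothetical stand-ins for the cloud provider and SSH operations, and the `materia-{name}-latest.tar.gz` pattern is an assumption generalized from the single artifact name in the diff.

```python
def run_pipeline(name, create_worker, run_on_worker, destroy_worker):
    """Create a worker, run one pipeline on it, and always destroy the worker.

    create_worker(name) -> worker handle (hypothetical provider call)
    run_on_worker(worker, artifact) -> exit status (hypothetical SSH step)
    destroy_worker(worker) -> None (hypothetical provider call)
    """
    # Assumed naming pattern, based on `materia-extract-latest.tar.gz` in the diff.
    artifact = f"materia-{name}-latest.tar.gz"
    worker = create_worker(name)
    try:
        # Download, secret injection, and execution would all happen in this step.
        return run_on_worker(worker, artifact)
    finally:
        # Runs on success and on failure, so a crashed pipeline cannot leave
        # a billable instance behind.
        destroy_worker(worker)
```

Passing the provider operations in as plain functions keeps the sketch in line with the "data-oriented design, no OOP" note in the diff and makes the lifecycle testable without a cloud account.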