add claude memory update

Deeman
2025-10-12 21:52:39 +02:00
parent 6c93021f2d
commit 7e6ff29dea

CLAUDE.md

@@ -164,13 +164,103 @@ sqlmesh test
pytest --cov=./ --cov-report=xml
```
## CI/CD Pipeline and Production Architecture
### CI/CD Pipeline (`.gitlab-ci.yml`)
**4 Stages: Lint → Test → Build → Deploy**
#### 1. Lint Stage
- Runs `ruff check` and `ruff format --check`
- Validates code quality on every commit
#### 2. Test Stage
- **`test:cli`**: Runs pytest against the materia CLI (71% coverage)
  - Tests secrets management (Pulumi ESC integration)
  - Tests worker lifecycle (create, list, destroy)
  - Tests pipeline execution (extract, transform)
  - Exports coverage reports to GitLab
- **`test:sqlmesh`**: Runs SQLMesh model tests in the transform layer
#### 3. Build Stage (only on master branch)
Creates separate artifacts for each workspace package:
- **`build:extract`**: Builds `materia-extract-latest.tar.gz` (psdonline package)
- **`build:transform`**: Builds `materia-transform-latest.tar.gz` (sqlmesh_materia package)
- **`build:cli`**: Builds `materia-cli-latest.tar.gz` (materia management CLI)

Each artifact is a self-contained tarball with all dependencies.
#### 4. Deploy Stage (only on master branch)
- **`deploy:r2`**: Uploads artifacts to Cloudflare R2 using rclone
  - Loads secrets from Pulumi ESC (`beanflows/prod`)
  - Only requires `PULUMI_ACCESS_TOKEN` in GitLab variables
  - All other secrets (R2 credentials, SSH keys, API tokens) come from ESC
- **`deploy:infra`**: Runs `pulumi up` to deploy infrastructure changes
  - Only triggers when `infra/**/*` files change
### Production Architecture: Ephemeral Worker Model
**Design Philosophy:**
- No always-on workers (cost optimization)
- Supervisor instance dynamically creates/destroys workers on-demand
- Language-agnostic artifacts enable future migration to C/Rust/Go
- Multi-cloud abstraction for pricing optimization

**Components:**
#### 1. Supervisor Instance (Small Hetzner VM)
- Runs the `materia` management CLI
- Small, always-on instance (cheap)
- Pulls secrets from Pulumi ESC
- Orchestrates worker lifecycle via cloud provider APIs
#### 2. Ephemeral Workers (On-Demand)
- Created for each pipeline execution
- Downloads pre-built artifacts from R2 (no git, no uv on worker)
- Receives secrets via SSH environment variable injection
- Destroyed immediately after job completion
- Different instance types per pipeline:
  - Extract: `ccx12` (2 vCPU, 8 GB RAM)
  - Transform: `ccx22` (4 vCPU, 16 GB RAM)
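A minimal Python sketch of the "created per execution, destroyed after completion" rule, assuming the provider's create and destroy calls are passed in as plain functions. `ephemeral_worker`, `INSTANCE_TYPES`, the `materia-<pipeline>` naming and both callable signatures are hypothetical; only the instance types come from the list above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# Instance type per pipeline, as listed above (Hetzner CCX line).
INSTANCE_TYPES: dict[str, str] = {
    "extract": "ccx12",    # 2 vCPU, 8 GB RAM
    "transform": "ccx22",  # 4 vCPU, 16 GB RAM
}


@contextmanager
def ephemeral_worker(
    pipeline: str,
    create: Callable[[str, str], str],
    destroy: Callable[[str], None],
) -> Iterator[str]:
    """Create a worker sized for `pipeline`, yield its id, always destroy it.

    `create(name, instance_type)` returns a worker id and `destroy(worker_id)`
    tears it down; both are hypothetical stand-ins for a provider API.
    """
    instance_type = INSTANCE_TYPES[pipeline]  # KeyError for unknown pipelines
    worker_id = create(f"materia-{pipeline}", instance_type)
    try:
        yield worker_id
    finally:
        # Runs on success and on failure, so no worker is left running.
        destroy(worker_id)
```

Putting the teardown in `finally` is what keeps the "no always-on workers" property even when a pipeline run raises.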
#### 3. Secrets Flow
```
Pulumi ESC (beanflows/prod)
    ↓
Supervisor Instance (materia CLI)
    ↓
Workers (injected as env vars via SSH)
```
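The last hop, injecting secrets as environment variables over SSH, can be sketched as building one remote command string. `with_secrets` is a hypothetical helper, and quoting values with `shlex.quote` is an assumption about how values are passed, not something taken from the materia CLI.

```python
import re
import shlex
from typing import Mapping

# POSIX shell variable names: letters, digits, underscore, no leading digit.
_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def with_secrets(command: str, secrets: Mapping[str, str]) -> str:
    """Prefix `command` with one `export NAME=value` per secret.

    Values go through shlex.quote so spaces, `$` or quotes inside a token
    cannot break out of the assignment in the worker's shell.
    """
    parts = []
    for name, value in secrets.items():
        if not _ENV_NAME.fullmatch(name):
            raise ValueError(f"invalid environment variable name: {name!r}")
        parts.append(f"export {name}={shlex.quote(value)}")
    parts.append(command)
    return " && ".join(parts)
```

The resulting string would be the remote command handed to `ssh`, in the same `export ... && ./extract_psd` shape as step 5 of the execution flow.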
#### 4. Artifact Flow
```
GitLab CI: uv build → tar.gz
    ↓
Cloudflare R2 (artifact storage)
    ↓
Worker: curl → extract → execute
```
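The worker's extract step could look like this stdlib-only sketch. `extract_artifact` is hypothetical; the source only says the worker runs curl, then extracts, then executes. Because the tarball arrives over the network, the sketch rejects members that would escape the target directory and anything that is not a regular file or directory. It also uses tarfile's `data` extraction filter when the running Python provides one.

```python
import tarfile
from pathlib import Path


def extract_artifact(archive: Path, dest: Path) -> list[str]:
    """Unpack a downloaded .tar.gz artifact into `dest`; return member names.

    Hypothetical helper: refuses absolute paths, `..` traversal, links and
    device files before extracting anything.
    """
    dest = dest.resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Resolve where the member would land and require it to stay under dest.
            target = (dest / member.name).resolve()
            if not target.is_relative_to(dest):
                raise ValueError(f"unsafe path in artifact: {member.name}")
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"unsupported member type in artifact: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Pythons with extraction filters (3.12+ and security backports).
            tar.extractall(dest, members=members, filter="data")
        else:
            tar.extractall(dest, members=members)
    return [m.name for m in members]
```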
#### 5. Data Storage
- **Dev**: Local DuckDB file (`materia_dev.db`)
- **Prod**: DuckDB in-memory + Cloudflare R2 Data Catalog (Iceberg REST API)
  - ACID transactions on object storage
  - No persistent database on workers

**Execution Flow:**
1. Supervisor receives schedule trigger (cron/manual)
2. CLI runs: `materia pipeline run extract`
3. Creates Hetzner worker with SSH key
4. Worker downloads `materia-extract-latest.tar.gz` from R2
5. CLI injects secrets via SSH: `export R2_ACCESS_KEY_ID=... && ./extract_psd`
6. Pipeline executes, writes to R2 Iceberg catalog
7. Worker destroyed (entire lifecycle ~5-10 minutes)
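Between creating the worker (step 3) and using it (steps 4-5), the CLI has to wait until the new VM accepts SSH. The provider interface below names a `wait_for_ssh` operation for this. A minimal sketch, assuming "ready" simply means port 22 accepts a TCP connection; the signature, defaults and polling strategy are hypothetical.

```python
import socket
import time


def wait_for_ssh(host: str, port: int = 22, timeout: float = 180.0, interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds.

    A successful connect only shows that something (normally sshd) is
    listening; authentication still happens when the CLI really connects.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each connection attempt is capped at `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: the VM is probably still booting.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```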
**Multi-Cloud Provider Abstraction:**
- Protocol-based interface (data-oriented design, no OOP)
- Providers: Hetzner (implemented), OVH, Scaleway, Oracle (stubs)
- Allows switching providers for cost optimization
- Each provider implements: `create_instance`, `destroy_instance`, `list_instances`, `wait_for_ssh`
## Key Design Patterns