add claude memory update
100
CLAUDE.md
@@ -164,13 +164,103 @@ sqlmesh test
```
pytest --cov=./ --cov-report=xml
```
## CI/CD Pipeline

## CI/CD Pipeline and Production Architecture

GitLab CI runs three stages (`.gitlab-ci.yml`):

### CI/CD Pipeline (`.gitlab-ci.yml`)

1. **Lint:** Runs ruff check and format validation, plus pip-audit
2. **Test:** Runs pytest with coverage
3. **Build:** Creates distribution packages (on tags only)

**4 Stages: Lint → Test → Build → Deploy**

#### 1. Lint Stage
- Runs `ruff check` and `ruff format --check`
- Validates code quality on every commit

#### 2. Test Stage
- **`test:cli`**: Runs pytest on materia CLI with 71% coverage
  - Tests secrets management (Pulumi ESC integration)
  - Tests worker lifecycle (create, list, destroy)
  - Tests pipeline execution (extract, transform)
  - Exports coverage reports to GitLab
- **`test:sqlmesh`**: Runs SQLMesh model tests in transform layer

#### 3. Build Stage (only on master branch)
Creates separate artifacts for each workspace package:
- **`build:extract`**: Builds `materia-extract-latest.tar.gz` (psdonline package)
- **`build:transform`**: Builds `materia-transform-latest.tar.gz` (sqlmesh_materia package)
- **`build:cli`**: Builds `materia-cli-latest.tar.gz` (materia management CLI)

Each artifact is a self-contained tarball with all dependencies.

#### 4. Deploy Stage (only on master branch)
- **`deploy:r2`**: Uploads artifacts to Cloudflare R2 using rclone
  - Loads secrets from Pulumi ESC (`beanflows/prod`)
  - Only requires `PULUMI_ACCESS_TOKEN` in GitLab variables
  - All other secrets (R2 credentials, SSH keys, API tokens) come from ESC
- **`deploy:infra`**: Runs `pulumi up` to deploy infrastructure changes
  - Only triggers when `infra/**/*` files change

### Production Architecture: Ephemeral Worker Model

**Design Philosophy:**
- No always-on workers (cost optimization)
- Supervisor instance dynamically creates/destroys workers on-demand
- Language-agnostic artifacts enable future migration to C/Rust/Go
- Multi-cloud abstraction for pricing optimization

**Components:**

#### 1. Supervisor Instance (Small Hetzner VM)
- Runs the `materia` management CLI
- Small, always-on instance (cheap)
- Pulls secrets from Pulumi ESC
- Orchestrates worker lifecycle via cloud provider APIs

#### 2. Ephemeral Workers (On-Demand)
- Created for each pipeline execution
- Downloads pre-built artifacts from R2 (no git, no uv on worker)
- Receives secrets via SSH environment variable injection
- Destroyed immediately after job completion
- Different instance types per pipeline:
  - Extract: `ccx12` (2 vCPU, 8GB RAM)
  - Transform: `ccx22` (4 vCPU, 16GB RAM)
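
The per-pipeline sizing above could be kept as a plain lookup table. This is only a sketch: `WORKER_TYPES` and `worker_type_for` are hypothetical names, not taken from the `materia` CLI. Only the two instance types come from the list above.

```python
# Hypothetical mapping from pipeline name to Hetzner server type.
# The names WORKER_TYPES and worker_type_for are assumptions for illustration.
WORKER_TYPES = {
    "extract": "ccx12",    # 2 vCPU, 8GB RAM
    "transform": "ccx22",  # 4 vCPU, 16GB RAM
}


def worker_type_for(pipeline: str) -> str:
    """Return the server type for a pipeline, failing loudly on unknown names."""
    try:
        return WORKER_TYPES[pipeline]
    except KeyError:
        known = ", ".join(sorted(WORKER_TYPES))
        raise ValueError(
            f"unknown pipeline {pipeline!r}; expected one of: {known}"
        ) from None
```

Keeping the mapping as data means adding a pipeline or resizing a worker is a one-line change.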

#### 3. Secrets Flow
```
Pulumi ESC (beanflows/prod)
↓
Supervisor Instance (materia CLI)
↓
Workers (injected as env vars via SSH)
```
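
The last hop of this flow, injecting secrets as environment variables over SSH, might look roughly like the sketch below. `build_remote_command` is a hypothetical helper, not a function confirmed to exist in the `materia` CLI. It uses the standard library's `shlex.quote` so that secret values containing spaces or shell metacharacters survive the remote shell.

```python
import shlex


def build_remote_command(secrets: dict[str, str], command: str) -> str:
    """Prefix `command` with `export` statements for each secret.

    Hypothetical helper: the secrets exist only in the remote shell session
    and are never written to the worker's disk.
    """
    exports = []
    for name, value in sorted(secrets.items()):
        # Reject names that are not valid identifiers (e.g. containing spaces).
        if not name.isidentifier():
            raise ValueError(f"invalid environment variable name: {name!r}")
        exports.append(f"export {name}={shlex.quote(value)}")
    return " && ".join([*exports, command])


print(build_remote_command({"R2_ACCESS_KEY_ID": "abc123"}, "./extract_psd"))
# prints: export R2_ACCESS_KEY_ID=abc123 && ./extract_psd
```

The resulting string would be passed as the remote command of an SSH session opened by the supervisor.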

#### 4. Artifact Flow
```
GitLab CI: uv build → tar.gz
↓
Cloudflare R2 (artifact storage)
↓
Worker: curl → extract → execute
```
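
The worker itself uses `curl` and needs no Python tooling, so the sketch below only illustrates the naming scheme and the extract step. It uses Python's standard library, and `artifact_name` and `unpack_artifact` are hypothetical names chosen for this example.

```python
import tarfile
from pathlib import Path


def artifact_name(package: str) -> str:
    """File name of a package's latest build, e.g. materia-extract-latest.tar.gz."""
    return f"materia-{package}-latest.tar.gz"


def unpack_artifact(tarball: Path, dest: Path) -> list[str]:
    """Extract a downloaded tarball into `dest` and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return names
```

Because each artifact is a plain tarball, the same two steps work regardless of the language the packaged program is written in.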

#### 5. Data Storage
- **Dev**: Local DuckDB file (`materia_dev.db`)
- **Prod**: DuckDB in-memory + Cloudflare R2 Data Catalog (Iceberg REST API)
  - ACID transactions on object storage
  - No persistent database on workers
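
One way to express the dev/prod split is a small function that picks the database argument handed to DuckDB. This is a sketch under assumptions: `duckdb_database` is a hypothetical name, and attaching the R2 Data Catalog in production is deliberately left out. `:memory:` is DuckDB's special name for an in-memory database.

```python
def duckdb_database(env: str) -> str:
    """Return the DuckDB database argument for an environment (hypothetical helper)."""
    if env == "dev":
        return "materia_dev.db"  # local file, persists between runs
    if env == "prod":
        return ":memory:"        # nothing persists on the ephemeral worker
    raise ValueError(f"unknown environment {env!r}; expected 'dev' or 'prod'")
```

With the `duckdb` package installed, the result would be passed to `duckdb.connect(...)`.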

**Execution Flow:**
1. Supervisor receives schedule trigger (cron/manual)
2. CLI runs: `materia pipeline run extract`
3. Creates Hetzner worker with SSH key
4. Worker downloads `materia-extract-latest.tar.gz` from R2
5. CLI injects secrets via SSH: `export R2_ACCESS_KEY_ID=... && ./extract_psd`
6. Pipeline executes, writes to R2 Iceberg catalog
7. Worker destroyed (entire lifecycle ~5-10 minutes)
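
The property that matters in this flow is that the worker is destroyed even when the job fails. A minimal sketch of that guarantee follows. The function name and the three callables are assumptions for illustration, not the actual `materia` implementation.

```python
from typing import Callable


def run_on_ephemeral_worker(
    create: Callable[[], str],
    run: Callable[[str], int],
    destroy: Callable[[str], None],
) -> int:
    """Create a worker, run the job on it, and always destroy it afterwards.

    `create` returns a worker id, `run` returns the job's exit code, and
    `destroy` tears the worker down. All three are hypothetical callbacks.
    """
    worker_id = create()
    try:
        return run(worker_id)
    finally:
        # Runs on success and on failure, so no worker is left running.
        destroy(worker_id)
```

Steps 4 to 6 would live inside `run`. The `try`/`finally` covers step 7 on every path once the worker exists.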

**Multi-Cloud Provider Abstraction:**
- Protocol-based interface (data-oriented design, no OOP)
- Providers: Hetzner (implemented), OVH, Scaleway, Oracle (stubs)
- Allows switching providers for cost optimization
- Each provider implements: `create_instance`, `destroy_instance`, `list_instances`, `wait_for_ssh`
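
A protocol-based interface with these four operations could be sketched with `typing.Protocol`. Only the four function names come from the list above. The parameter lists, the `Instance` record, `FakeProvider`, and `destroy_all` are assumptions made for this example. A small in-memory class is used here for brevity. Because protocols are matched structurally, a plain module exposing the same four functions would satisfy the interface as well.

```python
from dataclasses import dataclass
from itertools import count
from typing import Protocol


@dataclass(frozen=True)
class Instance:
    """Plain data describing a worker; the fields are assumptions."""
    id: str
    name: str
    instance_type: str


class Provider(Protocol):
    """Structural interface; signatures are illustrative, not the project's."""
    def create_instance(self, name: str, instance_type: str) -> Instance: ...
    def destroy_instance(self, instance_id: str) -> None: ...
    def list_instances(self) -> list[Instance]: ...
    def wait_for_ssh(self, instance: Instance, timeout: float = 300.0) -> bool: ...


class FakeProvider:
    """In-memory stand-in used only to exercise the interface."""

    def __init__(self) -> None:
        self._instances: dict[str, Instance] = {}
        self._ids = count(1)

    def create_instance(self, name: str, instance_type: str) -> Instance:
        inst = Instance(f"fake-{next(self._ids)}", name, instance_type)
        self._instances[inst.id] = inst
        return inst

    def destroy_instance(self, instance_id: str) -> None:
        self._instances.pop(instance_id, None)

    def list_instances(self) -> list[Instance]:
        return list(self._instances.values())

    def wait_for_ssh(self, instance: Instance, timeout: float = 300.0) -> bool:
        # A real provider would poll the SSH port until `timeout` expires.
        return instance.id in self._instances


def destroy_all(provider: Provider) -> int:
    """Provider-agnostic cleanup: destroy every listed instance, return the count."""
    instances = provider.list_instances()
    for inst in instances:
        provider.destroy_instance(inst.id)
    return len(instances)
```

Code such as `destroy_all` depends only on the four operations, which is what would allow swapping Hetzner for another provider without touching the callers.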

## Key Design Patterns