add claude memory update
100
CLAUDE.md
@@ -164,13 +164,103 @@ sqlmesh test
pytest --cov=./ --cov-report=xml
```

-## CI/CD Pipeline
## CI/CD Pipeline and Production Architecture

-GitLab CI runs three stages (`.gitlab-ci.yml`):
### CI/CD Pipeline (`.gitlab-ci.yml`)

-1. **Lint:** Runs ruff check and format validation, plus pip-audit
-2. **Test:** Runs pytest with coverage
-3. **Build:** Creates distribution packages (on tags only)
**4 Stages: Lint → Test → Build → Deploy**

#### 1. Lint Stage
- Runs `ruff check` and `ruff format --check`
- Validates code quality on every commit

#### 2. Test Stage
- **`test:cli`**: Runs pytest on materia CLI with 71% coverage
  - Tests secrets management (Pulumi ESC integration)
  - Tests worker lifecycle (create, list, destroy)
  - Tests pipeline execution (extract, transform)
  - Exports coverage reports to GitLab
- **`test:sqlmesh`**: Runs SQLMesh model tests in transform layer

#### 3. Build Stage (only on master branch)
Creates separate artifacts for each workspace package:
- **`build:extract`**: Builds `materia-extract-latest.tar.gz` (psdonline package)
- **`build:transform`**: Builds `materia-transform-latest.tar.gz` (sqlmesh_materia package)
- **`build:cli`**: Builds `materia-cli-latest.tar.gz` (materia management CLI)

Each artifact is a self-contained tarball with all dependencies.

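As a rough illustration of what a build job produces, the sketch below packs a directory into a `materia-<name>-latest.tar.gz` tarball using Python's standard `tarfile` module. The function name `build_artifact` and the directory layout are assumptions made for this example; the actual jobs run `uv build` in `.gitlab-ci.yml`, and their exact commands are not shown in this document.

```python
import tarfile
from pathlib import Path


def build_artifact(package_dir: Path, name: str, out_dir: Path) -> Path:
    """Pack package_dir into out_dir/materia-<name>-latest.tar.gz (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    artifact = out_dir / f"materia-{name}-latest.tar.gz"
    with tarfile.open(artifact, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to a top-level <name>/ folder.
        tar.add(package_dir, arcname=name)
    return artifact
```

Calling `build_artifact(Path("extract"), "extract", Path("dist"))` would write `dist/materia-extract-latest.tar.gz`, matching the naming scheme listed above.
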
#### 4. Deploy Stage (only on master branch)
- **`deploy:r2`**: Uploads artifacts to Cloudflare R2 using rclone
  - Loads secrets from Pulumi ESC (`beanflows/prod`)
  - Only requires `PULUMI_ACCESS_TOKEN` in GitLab variables
  - All other secrets (R2 credentials, SSH keys, API tokens) come from ESC
- **`deploy:infra`**: Runs `pulumi up` to deploy infrastructure changes
  - Only triggers when `infra/**/*` files change

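GitLab evaluates the `infra/**/*` path filter itself, so no project code is needed for it. To make the intended behaviour concrete, here is a minimal Python sketch of the same decision: deploy infrastructure only if at least one changed file lies under the top-level `infra/` directory. The function name `touches_infra` is hypothetical.

```python
from pathlib import PurePosixPath


def touches_infra(changed_files: list[str]) -> bool:
    """Return True if any changed path lies under the top-level infra/ directory."""
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Require at least one component below infra/, mirroring infra/**/*.
        if len(parts) > 1 and parts[0] == "infra":
            return True
    return False
```

Under this sketch, a commit touching only `CLAUDE.md` would skip `deploy:infra`, while one touching `infra/__main__.py` would run it.
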
### Production Architecture: Ephemeral Worker Model

**Design Philosophy:**
- No always-on workers (cost optimization)
- Supervisor instance dynamically creates/destroys workers on-demand
- Language-agnostic artifacts enable future migration to C/Rust/Go
- Multi-cloud abstraction for pricing optimization

**Components:**

#### 1. Supervisor Instance (Small Hetzner VM)
- Runs the `materia` management CLI
- Small, always-on instance (cheap)
- Pulls secrets from Pulumi ESC
- Orchestrates worker lifecycle via cloud provider APIs

#### 2. Ephemeral Workers (On-Demand)
- Created for each pipeline execution
- Downloads pre-built artifacts from R2 (no git, no uv on worker)
- Receives secrets via SSH environment variable injection
- Destroyed immediately after job completion
- Different instance types per pipeline:
  - Extract: `ccx12` (2 vCPU, 8GB RAM)
  - Transform: `ccx22` (4 vCPU, 16GB RAM)

#### 3. Secrets Flow
```
Pulumi ESC (beanflows/prod)
↓
Supervisor Instance (materia CLI)
↓
Workers (injected as env vars via SSH)
```

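The last hop of the secrets flow, injecting values as environment variables over SSH, can be sketched as building a single remote command string. This is an illustration only: `build_remote_command` is a hypothetical helper, not the materia CLI's actual function, and it assumes the secrets are already available as a plain dictionary on the supervisor.

```python
import shlex


def build_remote_command(secrets: dict[str, str], command: str) -> str:
    """Prefix command with export statements so it sees the secrets as env vars."""
    exports = []
    for name, value in secrets.items():
        if not name.isidentifier():
            raise ValueError(f"invalid environment variable name: {name!r}")
        # shlex.quote protects values containing spaces or shell metacharacters.
        exports.append(f"export {name}={shlex.quote(value)}")
    return " && ".join([*exports, command])
```

For example, `build_remote_command({"R2_ACCESS_KEY_ID": "abc123"}, "./extract_psd")` yields `export R2_ACCESS_KEY_ID=abc123 && ./extract_psd`, which the supervisor could then pass to `ssh` as the remote command.
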
#### 4. Artifact Flow
```
GitLab CI: uv build → tar.gz
↓
Cloudflare R2 (artifact storage)
↓
Worker: curl → extract → execute
```

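The worker side of this flow (download, extract, execute) is done with `curl` and shell tools in the design above. The following Python sketch mirrors the same three steps using only the standard library, purely to make the sequence explicit. The function names, the `artifact.tar.gz` filename, and the idea of passing the artifact URL as an argument are assumptions for illustration.

```python
import subprocess
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url: str, workdir: Path) -> Path:
    """Download a tar.gz artifact from url and unpack it into workdir."""
    workdir.mkdir(parents=True, exist_ok=True)
    archive = workdir / "artifact.tar.gz"
    urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where the interpreter supports it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(workdir, **extra)
    return workdir


def run_entrypoint(workdir: Path, argv: list[str]) -> int:
    """Run the pipeline command inside the unpacked artifact; return its exit code."""
    return subprocess.run(argv, cwd=workdir, check=False).returncode
```

Because the artifact is self-contained, the worker needs nothing beyond a way to download, unpack, and run it, which is what keeps git and uv off the worker.
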
#### 5. Data Storage
- **Dev**: Local DuckDB file (`materia_dev.db`)
- **Prod**: DuckDB in-memory + Cloudflare R2 Data Catalog (Iceberg REST API)
  - ACID transactions on object storage
  - No persistent database on workers


**Execution Flow:**
1. Supervisor receives schedule trigger (cron/manual)
2. CLI runs: `materia pipeline run extract`
3. Creates Hetzner worker with SSH key
4. Worker downloads `materia-extract-latest.tar.gz` from R2
5. CLI injects secrets via SSH: `export R2_ACCESS_KEY_ID=... && ./extract_psd`
6. Pipeline executes, writes to R2 Iceberg catalog
7. Worker destroyed (entire lifecycle ~5-10 minutes)

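The execution flow above amounts to a create, run, destroy sequence in which the worker must be destroyed even when the pipeline fails. A minimal sketch of that control flow, assuming the provider operations are passed in as plain functions (the name `run_pipeline` and all four callable signatures are hypothetical):

```python
from typing import Callable

# Instance types per pipeline, as listed under "Ephemeral Workers".
WORKER_TYPES = {"extract": "ccx12", "transform": "ccx22"}


def run_pipeline(
    pipeline: str,
    create: Callable[[str], str],
    wait_for_ssh: Callable[[str], None],
    execute: Callable[[str, str], int],
    destroy: Callable[[str], None],
) -> int:
    """Create a worker, run one pipeline on it, and always destroy it afterwards."""
    instance_id = create(WORKER_TYPES[pipeline])
    try:
        wait_for_ssh(instance_id)
        return execute(instance_id, pipeline)
    finally:
        # Runs on success and on failure, so no worker is left running.
        destroy(instance_id)
```

The `try`/`finally` is the important design choice here: with no always-on workers, a leaked instance is the main way this model could cost more than intended.
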

**Multi-Cloud Provider Abstraction:**
- Protocol-based interface (data-oriented design, no OOP)
- Providers: Hetzner (implemented), OVH, Scaleway, Oracle (stubs)
- Allows switching providers for cost optimization
- Each provider implements: `create_instance`, `destroy_instance`, `list_instances`, `wait_for_ssh`

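One way such a protocol-based interface could look in Python is a `typing.Protocol` that names the four operations. Only the four function names come from this document; every parameter and return type below is an assumption. Since PEP 544 lets a plain module satisfy a protocol structurally, no provider classes are required, which fits the "no OOP" note; a `SimpleNamespace` of functions stands in for such a module in this sketch.

```python
from types import SimpleNamespace
from typing import Protocol


class Provider(Protocol):
    """Structural interface; argument and return types here are assumptions."""

    def create_instance(self, name: str, instance_type: str) -> str: ...
    def destroy_instance(self, instance_id: str) -> None: ...
    def list_instances(self) -> list[str]: ...
    def wait_for_ssh(self, instance_id: str, timeout_s: float = 60.0) -> bool: ...


def make_fake_provider() -> SimpleNamespace:
    """In-memory stand-in for illustration only; no cloud API is called."""
    instances: dict[str, str] = {}

    def create_instance(name: str, instance_type: str) -> str:
        instances[name] = instance_type
        return name

    def destroy_instance(instance_id: str) -> None:
        instances.pop(instance_id, None)

    def list_instances() -> list[str]:
        return sorted(instances)

    def wait_for_ssh(instance_id: str, timeout_s: float = 60.0) -> bool:
        # A real provider would poll port 22; the fake only checks existence.
        return instance_id in instances

    return SimpleNamespace(
        create_instance=create_instance,
        destroy_instance=destroy_instance,
        list_instances=list_instances,
        wait_for_ssh=wait_for_ssh,
    )
```

Switching providers then means swapping which set of four functions is handed to the orchestration code, without changing the code that calls them.
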
## Key Design Patterns
