# Materia SQLMesh Transform Layer

Data transformation pipeline using SQLMesh and DuckDB, implementing a 4-layer architecture.
## Quick Start

```bash
cd transform/sqlmesh_materia

# Local development (virtual environment)
sqlmesh plan dev_<username>

# Production
sqlmesh plan prod

# Run tests
sqlmesh test

# Format SQL
sqlmesh format
```
## Architecture

### Gateway Configuration

Single gateway: all environments connect to Cloudflare R2 Data Catalog (Apache Iceberg).

- Production: `sqlmesh plan prod`
- Development: `sqlmesh plan dev_<username>` (isolated virtual environment)

SQLMesh manages environment isolation automatically, so there is no need for separate local databases.
## 4-Layer Data Model

See [models/README.md](models/README.md) for detailed architecture documentation:

- **Raw** - Immutable source data
- **Staging** - Schema, types, basic cleansing
- **Cleaned** - Business logic, integration
- **Serving** - Analytics-ready (facts, dimensions, aggregates)
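To illustrate how one of these layers is expressed, here is a minimal sketch of a staging-layer SQLMesh model. The model name, source table, and column names are hypothetical placeholders, not part of this project:

```sql
-- models/staging/stg_example.sql (hypothetical example)
MODEL (
  name staging.stg_example,
  kind VIEW
);

-- Staging layer: apply schema, types, and basic cleansing only
SELECT
  CAST(report_date AS DATE) AS report_date,
  TRIM(port_name) AS port_name,
  CAST(stock_tonnes AS DOUBLE) AS stock_tonnes
FROM raw.example_source;
```

Business logic and joins belong in the cleaned layer; staging models stay limited to type casts and light cleanup.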
## Configuration

Config: `config.yaml`

- DuckDB in-memory with R2 Iceberg catalog
- Extensions: `httpfs`, `iceberg`
- Auto-apply enabled (no prompts)
- Initialization hooks for R2 secret/catalog attachment
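The bullets above roughly correspond to a gateway definition like the following sketch. Key names follow SQLMesh's documented DuckDB connection config; the gateway name is a placeholder, and the actual `config.yaml` (including the R2 secret/catalog initialization hooks) may differ:

```yaml
# Hypothetical sketch of config.yaml — not the project's actual file
gateways:
  r2:
    connection:
      type: duckdb          # in-memory when no database path is given
      extensions:
        - httpfs
        - iceberg
default_gateway: r2

model_defaults:
  dialect: duckdb

plan:
  auto_apply: true          # apply plans without prompting
```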
## Commands

```bash
# Plan changes for dev environment
sqlmesh plan dev_yourname

# Plan changes for prod
sqlmesh plan prod

# Run tests
sqlmesh test

# Validate models
sqlmesh validate

# Run audits
sqlmesh audit

# Format SQL files
sqlmesh format

# Start web UI
sqlmesh ui
```
## Environment Variables (Prod)

Required for the production R2 Iceberg catalog:

- `CLOUDFLARE_API_TOKEN` - R2 API token
- `ICEBERG_REST_URI` - R2 catalog REST endpoint
- `R2_WAREHOUSE_NAME` - Warehouse name (default: `materia`)

These are injected via Pulumi ESC (`beanflows/prod`) on the supervisor instance.
## Development Workflow

1. Make changes to models in `models/`
2. Test locally: `sqlmesh test`
3. Plan changes: `sqlmesh plan dev_yourname`
4. Review and apply changes
5. Commit and push to trigger CI/CD

SQLMesh handles environment isolation, table versioning, and incremental updates automatically.