26 Commits

Author SHA1 Message Date
Deeman
38897617e7 Refactor PSD extraction: simplify to latest-only + add R2 support
## Key Changes

1. **Simplified extraction logic**
   - Changed from downloading 220+ historical archives to checking only the latest available month
   - Tries the current month and falls back up to 3 months (handles USDA publication lag)
   - Architecture advisor insight: ETags already deduplicate downloads, so the historical year/month structure was unnecessary

2. **Flat storage structure**
   - Old: `data/{year}/{month}/{etag}.zip`
   - New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2)
   - Migrated 226 existing files to flat structure

3. **Dual storage modes**
   - **Local mode**: Downloads to local directory (development)
   - **R2 mode**: Uploads to Cloudflare R2 (production)
   - Mode determined by presence of R2 environment variables
   - Added boto3 dependency for S3-compatible R2 API

4. **Updated raw SQLMesh model**
   - Changed pattern from `**/*.zip` to `*.zip` to match flat structure
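A minimal Python sketch of the latest-only lookup and the two storage modes, assuming the behaviour described in this commit. The USDA download URL and the actual extractor code are not part of this log, so they are left out. The R2 variable names (`R2_ENDPOINT`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET`) and all function names are hypothetical, and "falls back up to 3 months" is read as the current month plus three earlier ones.

```python
import os
from collections.abc import Mapping
from datetime import date
from pathlib import Path

# Hypothetical variable names: the commit only says that R2 mode is
# enabled when the R2 environment variables are present.
R2_ENV_VARS = ("R2_ENDPOINT", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY", "R2_BUCKET")


def candidate_months(today: date, max_back: int = 3) -> list[tuple[int, int]]:
    """(year, month) pairs to try: the current month, then up to `max_back` earlier ones."""
    year, month = today.year, today.month
    months = []
    for _ in range(max_back + 1):
        months.append((year, month))
        month -= 1
        if month == 0:  # wrap from January back to December of the previous year
            year, month = year - 1, 12
    return months


def storage_mode(env: Mapping[str, str] = os.environ) -> str:
    """'r2' when every R2 variable is set and non-empty, otherwise 'local'."""
    return "r2" if all(env.get(name) for name in R2_ENV_VARS) else "local"


def object_key(etag: str, mode: str) -> str:
    """Flat layout: data/{etag}.zip locally, psd/{etag}.zip in R2."""
    etag = etag.strip('"')  # assumption: drop the quotes HTTP puts around ETag values
    prefix = "psd" if mode == "r2" else "data"
    return f"{prefix}/{etag}.zip"


def already_downloaded(etag: str, root: Path = Path(".")) -> bool:
    """Local-mode dedup: a file named after the ETag means this archive is already stored."""
    return (root / object_key(etag, "local")).exists()


def upload_to_r2(payload: bytes, etag: str, env: Mapping[str, str] = os.environ) -> None:
    """R2 mode: write the archive through R2's S3-compatible API with boto3."""
    import boto3  # imported lazily so local mode works without boto3 installed

    s3 = boto3.client(
        "s3",
        endpoint_url=env["R2_ENDPOINT"],
        aws_access_key_id=env["R2_ACCESS_KEY_ID"],
        aws_secret_access_key=env["R2_SECRET_ACCESS_KEY"],
        region_name="auto",  # R2 expects the region name "auto"
    )
    s3.put_object(Bucket=env["R2_BUCKET"], Key=object_key(etag, "r2"), Body=payload)
```

An extractor built this way would walk `candidate_months(date.today())` in order, stop at the first month whose archive is available, and use its ETag to skip archives that are already stored. With the flat layout, the raw model's `*.zip` pattern sees every file without recursing.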

## Benefits

- Simpler: Single file check instead of 220+ URL attempts
- Efficient: ETag-based deduplication works naturally
- Flexible: Supports both local dev and production R2 storage
- Maintainable: Removed unnecessary complexity

## Testing

- Local extraction works and respects ETags
- Falls back correctly when current month unavailable
- Linting passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 22:02:15 +02:00
Deeman
2d248a2eef Fix SQLMesh config to use correct Pulumi ESC env var names
- Update secret token: CLOUDFLARE_API_TOKEN → R2_ADMIN_API_TOKEN
- Update warehouse name: R2_WAREHOUSE_NAME → ICEBERG_WAREHOUSE_NAME
- Update endpoint: ICEBERG_REST_URI → ICEBERG_CATALOG_URI

- Remove CREATE SCHEMA and USE statements
  - DuckDB has a bug with the Iceberg REST catalog (missing Content-Type header)
  - Schema creation via SQL is currently not supported
  - Models will use fully-qualified table names instead
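One way to catch a stale variable name before `sqlmesh plan` runs is a fail-fast check on the three renamed variables. This Python helper is a hypothetical sketch, not part of the SQLMesh config; only the variable names come from this commit.

```python
import os
from collections.abc import Mapping

# Names exposed by the Pulumi ESC environment after this commit,
# keyed by hypothetical setting names used only in this sketch.
REQUIRED_VARS = {
    "token": "R2_ADMIN_API_TOKEN",          # previously CLOUDFLARE_API_TOKEN
    "warehouse": "ICEBERG_WAREHOUSE_NAME",  # previously R2_WAREHOUSE_NAME
    "catalog_uri": "ICEBERG_CATALOG_URI",   # previously ICEBERG_REST_URI
}


def iceberg_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the catalog settings, failing fast if any variable is missing or empty."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

Running it against an environment that still uses `CLOUDFLARE_API_TOKEN` reports every new name that is absent, instead of failing later inside the catalog attachment.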

Successfully tested with real R2 credentials:
- Iceberg catalog attachment works ✓
- Plan dry-run executes ✓
- Only fails on missing source data (expected) ✓

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 22:21:27 +02:00
Deeman
05ef15bfdf Configure Iceberg catalog with proper secret reference
- Add catalog ATTACH statement in before_all with SECRET parameter
  - References r2_secret created by connection configuration
  - Uses proper DuckDB ATTACH syntax per Cloudflare docs
  - Single-line format to avoid Jinja parsing issues

- Remove manual CREATE SECRET from before_all hooks
  - Secret automatically created by SQLMesh from connection config
  - Cleaner separation: connection defines credentials, hooks use them
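For illustration, a hypothetical Python helper that renders the single-line ATTACH statement a before_all hook would run. The catalog alias `iceberg_catalog` and the parameter names are assumptions; the `TYPE`, `ENDPOINT` and `SECRET` options are assumed to follow the DuckDB iceberg extension's ATTACH syntax and should be checked against the DuckDB and Cloudflare documentation.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def attach_statement(
    warehouse: str,
    catalog_uri: str,
    alias: str = "iceberg_catalog",  # hypothetical catalog alias
    secret: str = "r2_secret",       # secret created from the connection config
) -> str:
    """Render the Iceberg ATTACH as one line ending in a semicolon."""
    return (
        f"ATTACH {sql_literal(warehouse)} AS {alias} "
        f"(TYPE ICEBERG, ENDPOINT {sql_literal(catalog_uri)}, SECRET {secret});"
    )
```

Keeping the whole statement on one line matches the single-line format this commit adopted to avoid Jinja parsing issues, and the secret is referenced by name rather than recreated in the hook.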

Successfully tested - config validates without warnings.
Only fails on missing env vars (expected locally).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 22:10:51 +02:00
Deeman
2ad344abf4 Refactor SQLMesh config to use connection-level secrets
- Move Iceberg secret from before_all hook to connection.secrets
  - Fixes SQLMesh warning about unsupported @env_var syntax
  - Uses Jinja templating {{ env_var() }} instead of @env_var()

- Remove database: ':memory:' (incompatible with catalogs)
  - DuckDB doesn't allow both database and catalogs config
  - Connection defaults to in-memory when no database specified

- Simplify before_all hooks to only handle ATTACH and schema setup
  - Secret is now created automatically by SQLMesh
  - Cleaner separation: connection config vs runtime setup

Based on:
- https://developers.cloudflare.com/r2/data-catalog/config-examples/duckdb/
- https://sqlmesh.readthedocs.io/en/latest/integrations/engines/duckdb/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 22:04:25 +02:00
Deeman
120fef369a Fix SQLMesh config and CI/CD deployment issues
- Fix SQLMesh config: Add semicolons to SQL statements in before_all hooks
  - Resolves "unsupported syntax" warning for CREATE SECRET and ATTACH
  - DuckDB requires semicolons to terminate statements properly

- Fix deploy:infra job: Update Pulumi authentication
  - Remove `pulumi login --token` (not supported in Docker image)
  - Use PULUMI_ACCESS_TOKEN environment variable directly
  - Chain commands with && to avoid "unknown command 'sh'" error

- Fix deploy:supervisor job: Update esc login syntax
  - Change `esc login --token` to `esc login` (--token flag doesn't exist)
  - esc CLI reads token from PULUMI_ACCESS_TOKEN env var
  - Simplify Pulumi CLI installation (remove apk fallback logic)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 21:58:43 +02:00
Deeman
d2352c1876 Simplify SQLMesh to use single prod gateway with virtual environments
- Remove dev gateway (local DuckDB file no longer needed)
- Single prod gateway connects to R2 Iceberg catalog
- Use virtual environments for dev isolation (e.g., dev_<username>)
- Update CLAUDE.md with new workflow and environment strategy
- Create comprehensive transform/sqlmesh_materia/README.md

Benefits:
- Simpler configuration (one gateway instead of two)
- All environments use same R2 Iceberg catalog
- SQLMesh handles environment isolation automatically
- No need to maintain local 13GB materia_dev.db file
- before_all hooks only run for prod gateway (no conditional logic needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 21:47:04 +02:00
Deeman
6536724e00 Fix SQLMesh config: remove invalid init_script parameter
- Remove init_script from DuckDB connection config (not a valid parameter)
- Move R2 Iceberg catalog initialization to before_all hooks
- Hooks run before sqlmesh plan/run commands
- Uses SQLMesh @env_var() macro syntax for environment variables

Fixes CI/CD error: 'invalid duckdb connection config: invalid field init_script'

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-13 21:31:56 +02:00
Deeman
55bb84f0fa implement cli/infra update cicd 2025-10-12 21:00:41 +02:00
Deeman
025dda16c6 update dedupe logic -> much faster now 2025-10-07 22:32:45 +02:00
Deeman
da89c2bf6e update staging pipeline 2025-10-07 22:20:48 +02:00
Deeman
0a409acbea update path 2025-09-10 18:56:32 +02:00
Deeman
85704a4bf1 Change layer naming 2025-09-10 18:46:18 +02:00
Deeman
f5f2dbc7a5 refactor 2025-08-25 20:50:25 +02:00
Simon Dmsn
5588be152b Update 3 files
- /notebooks/03_Extraction.ipynb
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql
2025-08-01 14:52:55 +00:00
Simon Dmsn
1c87488cc7 Update 4 files
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_0.sql
2025-08-01 14:45:34 +00:00
Simon Dmsn
82b27e7c55 Update 2 files
- /transform/sqlmesh_materia/seeds/commodity_exchange_codes.csv
- /transform/sqlmesh_materia/seeds/psd_codes_exchange_codes_merge.csv
2025-08-01 14:41:48 +00:00
Simon Dmsn
9d7cc4e1fb Update file commodity_exchange_codes.csv 2025-08-01 14:26:19 +00:00
Simon Dmsn
4ad4386ccc Update 2 files
- /transform/sqlmesh_materia/models/staging/Commodity Exchange Codes.xls
- /transform/sqlmesh_materia/seeds/commodity_exchange_codes.csv
2025-08-01 14:24:26 +00:00
Simon Dmsn
918b0071b1 Update file Commodity Exchange Codes.xls 2025-08-01 14:22:01 +00:00
Deeman
91f8968990 remove comment 2025-07-31 19:48:18 +02:00
Deeman
641f794d61 fix seeds; update models 2025-07-27 22:49:37 +02:00
Deeman
c0d8f60d1c add reference data 2025-07-27 18:28:30 +02:00
Deeman
8b5d05b3c2 raw ingest model 2025-07-27 15:40:41 +02:00
Deeman
f5c73e32c5 testing sqlmesh 2025-07-27 00:18:14 +02:00
Deeman
9baa0d185c testing sqlmesh 2025-07-27 00:18:03 +02:00
Deeman
f0de8a505b update projects to packages 2025-07-26 22:32:47 +02:00