Commit Graph

24 Commits

Author SHA1 Message Date
Deeman
67c048485b Add Phase 1A-C + ICE warehouse stocks: prices, methodology, pipeline automation
Phase 1A — KC=F Coffee Futures Prices:
- New extract/coffee_prices/ package (yfinance): downloads KC=F daily OHLCV,
  stores as gzip CSV with SHA256-based idempotency
- SQLMesh models: raw/coffee_prices → foundation/fct_coffee_prices →
  serving/coffee_prices (with 20d/50d SMA, 52-week high/low, daily return %)
- Dashboard: 4 metric cards + dual-line chart (close, 20d MA, 50d MA)
- API: GET /commodities/<ticker>/prices
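
Note on the serving-layer metrics above: the commit computes them in SQLMesh models, and that SQL is not shown here. As a rough illustration only, the trailing moving average and daily return could be expressed in plain Python as below. The function names are hypothetical, and returning `None` until a full window is available is an assumption, not necessarily how the model handles warm-up rows.

```python
from typing import List, Optional


def simple_moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until a full window is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def daily_return_pct(closes: List[float]) -> List[Optional[float]]:
    """Percentage change versus the previous close; None for the first row."""
    if not closes:
        return []
    out: List[Optional[float]] = [None]
    for previous, current in zip(closes, closes[1:]):
        out.append((current - previous) / previous * 100.0)
    return out
```

With `window=20` and `window=50` this would correspond to the 20d and 50d lines on the dashboard chart.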

Phase 1B — Data Methodology Page:
- New /methodology route with full-page template (base.html)
- 6 anchored sections: USDA PSD, CFTC COT, KC=F price, ICE warehouse stocks,
  data quality model, update schedule table
- "Methodology" link added to marketing footer

Phase 1C — Automated Pipeline:
- supervisor.sh updated: runs extract_cot, extract_prices, extract_ice in
  sequence before transform
- Webhook failure alerting via ALERT_WEBHOOK_URL env var (ntfy/Slack/Telegram)
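
Note on the webhook alerting above: the actual alert is sent from supervisor.sh, which is not shown. The sketch below is a Python stand-in for the same idea, assuming a plain-text POST body (which ntfy-style endpoints accept; Slack and Telegram expect their own payload formats). Only the `ALERT_WEBHOOK_URL` variable comes from the commit; the function names are hypothetical.

```python
import os
import urllib.request
from typing import Optional


def build_alert_request(
    message: str, webhook_url: Optional[str] = None
) -> Optional[urllib.request.Request]:
    """Build a POST request for the alert, or None if no webhook is configured."""
    url = webhook_url or os.environ.get("ALERT_WEBHOOK_URL")
    if not url:
        return None
    return urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def send_alert(
    message: str, webhook_url: Optional[str] = None, timeout_seconds: float = 10.0
) -> bool:
    """Send the alert; never raise, so a failing webhook cannot mask the original error."""
    request = build_alert_request(message, webhook_url)
    if request is None:
        return False
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return False
```

Treating an unset variable as "alerting disabled" keeps local runs quiet without extra configuration.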

ICE Warehouse Stocks:
- New extract/ice_stocks/ package (niquests): normalizes ICE Report Center CSV
  to canonical schema, hash-based idempotency, soft-fail on 404 with guidance
- SQLMesh models: raw/ice_warehouse_stocks → foundation/fct_ice_warehouse_stocks
  → serving/ice_warehouse_stocks (30d avg, WoW change, 52w drawdown)
- Dashboard: 4 metric cards + line chart (certified bags + 30d avg)
- API: GET /commodities/<code>/stocks
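
Note on hash-based idempotency (used by both the price and ICE extractors above): the extractor code is not shown, so the following is only a minimal sketch of the general technique. It assumes the landing file is named after the SHA256 of the uncompressed payload; that naming scheme, the `.csv.gz` suffix, and `write_if_new` are illustrative assumptions.

```python
import gzip
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def write_if_new(payload: bytes, landing_dir: Path) -> Tuple[Path, bool]:
    """Write payload as gzip unless a file with the same content hash already exists.

    Returns the target path and whether a new file was written.
    """
    digest = hashlib.sha256(payload).hexdigest()
    target = landing_dir / f"{digest}.csv.gz"
    if target.exists():
        return target, False  # Same content was landed before: skip.
    landing_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(gzip.compress(payload))
    return target, True


# Usage: landing the same bytes twice writes only one file.
with tempfile.TemporaryDirectory() as tmp:
    first = write_if_new(b"date,close\n2026-02-20,1.0\n", Path(tmp))
    second = write_if_new(b"date,close\n2026-02-20,1.0\n", Path(tmp))
    files_written = len(list(Path(tmp).iterdir()))
```

Hashing the uncompressed bytes keeps the check stable even if gzip settings change between runs.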

Foundation:
- dim_commodity: added ticker (KC=F) and ice_stock_report_code (COFFEE-C) columns
- macros/__init__.py: added prices_glob() and ice_stocks_glob()
- pipelines.py: added extract_prices and extract_ice entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 11:41:43 +01:00
Deeman
2962bf5e3b Fix COT pipeline: TRY_CAST nulls, dim_commodity leading zeros, correct CFTC codes
- config.yaml: remove ambiguousorinvalidcolumn linter rule (false positives on read_csv TVFs)
- fct_cot_positioning: use TRY_CAST throughout — CFTC uses '.' as null in many columns
- raw/cot_disaggregated: add columns() declaration for 33 varchar cols
- dim_commodity: switch from SEED to FULL model with SQL VALUES to preserve leading zeros
  Pandas auto-converts '083' → 83 even with varchar column declarations in SEED models
- seeds/dim_commodity.csv: correct cftc_commodity_code from '083731' (contract market code)
  to '083' (3-digit CFTC commodity code); add CSV quoting
- test_cot_foundation.yaml: fix output key name, vars for time range, partial: true,
  and correct cftc_commodity_code to '083'
- analytics.py: COFFEE_CFTC_CODE '083731' → '083' to match actual data

Result: serving.cot_positioning has 685 rows (2013-01-08 to 2026-02-17), 23/23 tests pass.
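
Note on the two fixes above: the real changes live in SQL (`TRY_CAST`) and in the dim_commodity model. The Python below is only a rough analogue to show why both were needed: a strict cast fails on the '.' placeholder, and any numeric round trip drops leading zeros. `try_cast_int` is a hypothetical helper and does not reproduce every `TRY_CAST` conversion rule.

```python
from typing import Optional


def try_cast_int(raw: Optional[str]) -> Optional[int]:
    """Return an int, or None when the value cannot be parsed (e.g. the '.' placeholder)."""
    if raw is None:
        return None
    try:
        return int(raw.strip())
    except ValueError:
        return None


# A code that passes through a numeric type loses its leading zero,
# so it no longer matches the 3-character code used as a join key.
code_as_text = "083"
code_after_numeric_round_trip = str(int(code_as_text))
```

This is why the commodity code has to stay a string end to end rather than being inferred as a number.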

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 23:28:10 +01:00
Deeman
0a83b2cb74 Add CFTC COT data integration with foundation data model layer
- New extraction package (cftc_cot): downloads yearly Disaggregated Futures ZIPs
  from CFTC, etag-based dedup, dynamic inner filename discovery, gzip normalization
- SQLMesh 3-layer architecture: raw (technical) → foundation (business model) → serving (mart)
- dim_commodity seed: conformed dimension mapping USDA ↔ CFTC codes — the commodity ontology
- fct_cot_positioning: typed, deduplicated weekly positioning facts for all commodities
- obt_cot_positioning: Coffee C mart with COT Index (26w/52w), WoW delta, OI ratios
- Analytics functions + REST API endpoints: /commodities/<code>/positioning[/latest]
- Dashboard widget: Managed Money net, COT Index card, dual-axis Chart.js chart
- 23 passing tests (10 unit + 2 SQLMesh model + existing regression suite)
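
Note on the COT Index above: the commit does not show the formula used in obt_cot_positioning. A commonly used definition scales the current net position between its minimum and maximum over a trailing window, which is what the sketch below assumes. The function name, the `None` warm-up rows, and the handling of a flat window are all assumptions.

```python
from typing import List, Optional


def cot_index(net_positions: List[float], window: int) -> List[Optional[float]]:
    """Percentile-style index: 0 at the window low, 100 at the window high."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[Optional[float]] = []
    for i, current in enumerate(net_positions):
        if i < window - 1:
            out.append(None)  # Not enough history yet.
            continue
        trailing = net_positions[i - window + 1 : i + 1]
        low, high = min(trailing), max(trailing)
        if high == low:
            out.append(None)  # Flat window: the index is undefined.
        else:
            out.append(100.0 * (current - low) / (high - low))
    return out
```

Under this definition, `window=26` and `window=52` on weekly data would give the 26w and 52w variants.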

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 23:28:10 +01:00
Deeman
423fb8c619 Fix extract and SQLMesh pipeline to build DuckDB lakehouse
extract: wrap response.content in BytesIO before passing to
normalize_zipped_csv, and call .read() on the returned BytesIO before
write_bytes (two bugs: wrong type in, wrong type out)

sqlmesh: {{ var() }} inside SQL string literals is not substituted by
SQLMesh's Jinja (SQL parser treats them as opaque strings). Replace with
a @psd_glob() macro that evaluates LANDING_DIR at render time and returns
a quoted glob path string.
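
Note on the macro fix above: the idea is to build the path in Python at render time and hand SQL a finished, quoted literal. The sketch below shows only that string-building step as a plain function; registering it as a SQLMesh macro is omitted. The default directory and the `psd/**/*.csv.gzip` pattern are assumptions based on the landing layout described in an earlier commit.

```python
import os
from typing import Optional


def psd_glob(landing_dir: Optional[str] = None) -> str:
    """Return a single-quoted SQL string literal containing the PSD glob path."""
    # "data/landing" is a placeholder default, not a path taken from the repo.
    base = landing_dir or os.environ.get("LANDING_DIR", "data/landing")
    pattern = f"{base.rstrip('/')}/psd/**/*.csv.gzip"
    # Double any single quotes so the result stays a valid SQL literal.
    return "'" + pattern.replace("'", "''") + "'"
```

For example, `psd_glob("/data/landing/")` yields `'/data/landing/psd/**/*.csv.gzip'`, quotes included.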

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 17:02:59 +01:00
Deeman
c1d00dcdc4 Refactor to local-first architecture on Hetzner NVMe
Remove distributed R2/Iceberg/SSH pipeline architecture in favor of
local subprocess execution with NVMe storage. Landing data backed up
to R2 via rclone timer.

- Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow
- Pipelines run via subprocess.run() with bounded timeouts
- Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip
- SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH
- Delete unused provider stubs (ovh, scaleway, oracle)
- Add rclone systemd timer for R2 backup every 6h
- Update supervisor to run pipelines with env vars
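
Note on "subprocess.run() with bounded timeouts": a minimal sketch of that pattern, assuming each pipeline is an external command and that a timeout is treated the same as a failure. `run_pipeline` and its parameters are hypothetical names, not the repo's own.

```python
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def run_pipeline(
    command: Sequence[str],
    timeout_seconds: float,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Run one pipeline step; return True only on a clean exit within the time limit."""
    try:
        completed = subprocess.run(
            list(command),
            env=env,  # None inherits the parent environment.
            timeout=timeout_seconds,
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return False
    return completed.returncode == 0


# Usage: a trivial command that finishes well within the limit.
ok = run_pipeline([sys.executable, "-c", "print('ok')"], timeout_seconds=30)
```

A bounded timeout means a hung download cannot block every later step indefinitely.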

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 19:50:19 +01:00
Deeman
2748c606e9 Add BeanFlows MVP: coffee analytics dashboard, API, and web app
- Fix pipeline granularity: add market_year to cleaned/serving SQL models
- Add DuckDB data access layer with async query functions (analytics.py)
- Build Chart.js dashboard: supply/demand, STU ratio, top producers, YoY table
- Add country comparison page with multi-select picker
- Replace items CRUD with read-only commodity API (list, metrics, countries, CSV)
- Configure BeanFlows plan tiers (Free/Starter/Pro) with feature gating
- Rewrite public pages for coffee market intelligence positioning
- Remove boilerplate items schema, update health check for DuckDB
- Add test suite: 139 tests passing (dashboard, API, billing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 16:11:50 +01:00
Deeman
6d4377ccf9 cleanup and prefect service setup 2026-02-04 22:24:55 +01:00
Deeman
38897617e7 Refactor PSD extraction: simplify to latest-only + add R2 support
## Key Changes

1. **Simplified extraction logic**
   - Changed from downloading 220+ historical archives to checking only latest available month
   - Tries current month and falls back up to 3 months (handles USDA publication lag)
   - Architecture advisor insight: ETags naturally deduplicate, historical year/month structure was unnecessary

2. **Flat storage structure**
   - Old: `data/{year}/{month}/{etag}.zip`
   - New: `data/{etag}.zip` (local) or `psd/{etag}.zip` (R2)
   - Migrated 226 existing files to flat structure

3. **Dual storage modes**
   - **Local mode**: Downloads to local directory (development)
   - **R2 mode**: Uploads to Cloudflare R2 (production)
   - Mode determined by presence of R2 environment variables
   - Added boto3 dependency for S3-compatible R2 API

4. **Updated raw SQLMesh model**
   - Changed pattern from `**/*.zip` to `*.zip` to match flat structure
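
The month fallback described in point 1 can be sketched as follows. This is an illustration under assumptions: `candidate_months` is a hypothetical helper, and "falls back up to 3 months" is read as the current month plus the three before it.

```python
from datetime import date
from typing import List, Tuple


def candidate_months(today: date, max_fallback_months: int = 3) -> List[Tuple[int, int]]:
    """Return (year, month) pairs to try, newest first, crossing year boundaries."""
    candidates: List[Tuple[int, int]] = []
    year, month = today.year, today.month
    for _ in range(max_fallback_months + 1):
        candidates.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return candidates
```

The extractor would then request each candidate in order and stop at the first one that exists.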

## Benefits

- Simpler: Single file check instead of 220+ URL attempts
- Efficient: ETag-based deduplication works naturally
- Flexible: Supports both local dev and production R2 storage
- Maintainable: Removed unnecessary complexity

## Testing

- Local extraction works and respects ETags
- Falls back correctly when current month unavailable
- Linting passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-20 22:02:15 +02:00
Deeman
025dda16c6 update dedupe logic -> much faster now 2025-10-07 22:32:45 +02:00
Deeman
da89c2bf6e update staging pipeline 2025-10-07 22:20:48 +02:00
Deeman
0a409acbea update path 2025-09-10 18:56:32 +02:00
Deeman
85704a4bf1 Change layer naming 2025-09-10 18:46:18 +02:00
Deeman
f5f2dbc7a5 refactor 2025-08-25 20:50:25 +02:00
Simon Dmsn
5588be152b Update 3 files
- /notebooks/03_Extraction.ipynb
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql
2025-08-01 14:52:55 +00:00
Simon Dmsn
1c87488cc7 Update 4 files
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_1_filter_silver_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_2_filter_gold_layer.sql
- /transform/sqlmesh_materia/models/staging/stg_psd_alldata_0.sql
2025-08-01 14:45:34 +00:00
Simon Dmsn
4ad4386ccc Update 2 files
- /transform/sqlmesh_materia/models/staging/Commodity Exchange Codes.xls
- /transform/sqlmesh_materia/seeds/commodity_exchange_codes.csv
2025-08-01 14:24:26 +00:00
Simon Dmsn
918b0071b1 Update file Commodity Exchange Codes.xls 2025-08-01 14:22:01 +00:00
Deeman
91f8968990 remove comment 2025-07-31 19:48:18 +02:00
Deeman
641f794d61 fix seeds; update models 2025-07-27 22:49:37 +02:00
Deeman
c0d8f60d1c add reference data 2025-07-27 18:28:30 +02:00
Deeman
8b5d05b3c2 raw ingest model 2025-07-27 15:40:41 +02:00
Deeman
f5c73e32c5 testing sqlmesh 2025-07-27 00:18:14 +02:00
Deeman
9baa0d185c testing sqlmesh 2025-07-27 00:18:03 +02:00
Deeman
f0de8a505b update projects to packages 2025-07-26 22:32:47 +02:00