beanflows

Author	SHA1	Message	Date
Deeman	80c1163a7f	feat: extraction framework overhaul — extract_core shared package + SQLite state tracking - Add extract/extract_core/ workspace package with three modules: - state.py: SQLite run tracking (open_state_db, start_run, end_run, get_last_cursor) - http.py: niquests session factory + etag normalization helpers - files.py: landing_path, content_hash, write_bytes_atomic (atomic gzip writes) - State lives at {LANDING_DIR}/.state.sqlite — no extra env var needed - SQLite chosen over DuckDB: state tracking is OLTP (row inserts/updates), not analytical - Refactor all 4 extractors (psdonline, cftc_cot, coffee_prices, ice_stocks): - Replace inline boilerplate with extract_core helpers - Add start_run/end_run tracking to every extraction entry point - extract_cot_year returns int (bytes_written) instead of bool - Update tests: assert result == 0 (not `is False`) for the return type change Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-22 14:37:50 +01:00
Deeman	ff7301d6a8	ICE extraction overhaul: API discovery + aging report + historical backfill - Replace brittle ICE_STOCKS_URL env var with API-based URL discovery via the private ICE Report Center JSON API (no auth required) - Add rolling CSV → XLS fallback in extract_ice_stocks() using find_latest_report() from ice_api.py - Add ice_api.py: fetch_report_listings(), find_latest_report() with pagination up to MAX_API_PAGES - Add xls_parse.py: detect_file_format() (magic bytes), xls_to_rows() using xlrd for OLE2/BIFF XLS files - Add extract_ice_aging(): monthly certified stock aging report by age bucket × port → ice_aging/ landing dir - Add extract_ice_historical(): 30-year EOM by-port stocks from static ICE URL → ice_stocks_by_port/ landing dir - Add xlrd>=2.0.1 (parse XLS), xlwt>=1.3.0 (dev, test fixtures) - Add SQLMesh raw + foundation models for both new datasets - Add ice_aging_glob(), ice_stocks_by_port_glob() macros - Add extract_ice_aging + extract_ice_historical pipeline entries - Add 12 unit tests (format detection, XLS roundtrip, API mock, CSV output) Seed files (data/landing/ice_aging/seed/ and ice_stocks_by_port/seed/) must be created locally — data/ is gitignored. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 21:13:18 +01:00
Deeman	b167a0a9f4	Add scout MCP server for browser recon + msgspec workspace dep - tools/scout/: browser automation MCP server using Pydoll (CDP, no WebDriver) - scout_visit, scout_elements (text-first), scout_click, scout_fill, scout_select - scout_scroll, scout_text, scout_screenshot (opt-in) - scout_har_start / scout_har_stop (asyncio task holds recording context open) - scout_analyze: HAR parsing with HarEntry/HarSummary msgspec structs - Standalone project (not workspace member — websockets conflict with prefect) - Runs via: uv run --directory tools/scout scout-server - .mcp.json: registers scout as Claude Code MCP server (project scope) - msgspec>=0.19 added to root project deps (workspace-wide struct/validation) - coding_philosophy.md: document msgspec as approved dep, usage rules Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 15:44:02 +01:00
Deeman	67c048485b	Add Phase 1A-C + ICE warehouse stocks: prices, methodology, pipeline automation Phase 1A — KC=F Coffee Futures Prices: - New extract/coffee_prices/ package (yfinance): downloads KC=F daily OHLCV, stores as gzip CSV with SHA256-based idempotency - SQLMesh models: raw/coffee_prices → foundation/fct_coffee_prices → serving/coffee_prices (with 20d/50d SMA, 52-week high/low, daily return %) - Dashboard: 4 metric cards + dual-line chart (close, 20d MA, 50d MA) - API: GET /commodities/<ticker>/prices Phase 1B — Data Methodology Page: - New /methodology route with full-page template (base.html) - 6 anchored sections: USDA PSD, CFTC COT, KC=F price, ICE warehouse stocks, data quality model, update schedule table - "Methodology" link added to marketing footer Phase 1C — Automated Pipeline: - supervisor.sh updated: runs extract_cot, extract_prices, extract_ice in sequence before transform - Webhook failure alerting via ALERT_WEBHOOK_URL env var (ntfy/Slack/Telegram) ICE Warehouse Stocks: - New extract/ice_stocks/ package (niquests): normalizes ICE Report Center CSV to canonical schema, hash-based idempotency, soft-fail on 404 with guidance - SQLMesh models: raw/ice_warehouse_stocks → foundation/fct_ice_warehouse_stocks → serving/ice_warehouse_stocks (30d avg, WoW change, 52w drawdown) - Dashboard: 4 metric cards + line chart (certified bags + 30d avg) - API: GET /commodities/<code>/stocks Foundation: - dim_commodity: added ticker (KC=F) and ice_stock_report_code (COFFEE-C) columns - macros/__init__.py: added prices_glob() and ice_stocks_glob() - pipelines.py: added extract_prices and extract_ice entries Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-21 11:41:43 +01:00
Deeman	0a83b2cb74	Add CFTC COT data integration with foundation data model layer - New extraction package (cftc_cot): downloads yearly Disaggregated Futures ZIPs from CFTC, etag-based dedup, dynamic inner filename discovery, gzip normalization - SQLMesh 3-layer architecture: raw (technical) → foundation (business model) → serving (mart) - dim_commodity seed: conformed dimension mapping USDA ↔ CFTC codes — the commodity ontology - fct_cot_positioning: typed, deduplicated weekly positioning facts for all commodities - obt_cot_positioning: Coffee C mart with COT Index (26w/52w), WoW delta, OI ratios - Analytics functions + REST API endpoints: /commodities/<code>/positioning[/latest] - Dashboard widget: Managed Money net, COT Index card, dual-axis Chart.js chart - 23 passing tests (10 unit + 2 SQLMesh model + existing regression suite) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 23:28:10 +01:00
Deeman	d6d2aa8efe	Merge remote-tracking branch 'origin/master' # Conflicts: # infra/readme.md	2026-02-18 21:09:24 +01:00
Deeman	c1d00dcdc4	Refactor to local-first architecture on Hetzner NVMe Remove distributed R2/Iceberg/SSH pipeline architecture in favor of local subprocess execution with NVMe storage. Landing data backed up to R2 via rclone timer. - Strip Iceberg catalog, httpfs, boto3, paramiko, prefect, pyarrow - Pipelines run via subprocess.run() with bounded timeouts - Extract writes to {LANDING_DIR}/psd/{year}/{month}/{etag}.csv.gzip - SQLMesh reads LANDING_DIR variable, writes to DUCKDB_PATH - Delete unused provider stubs (ovh, scaleway, oracle) - Add rclone systemd timer for R2 backup every 6h - Update supervisor to run pipelines with env vars Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-18 19:50:19 +01:00
Deeman	09ae88be19	cleanup and prefect service setup	2026-02-05 20:01:50 +01:00
Deeman	6d4377ccf9	cleanup and prefect service setup	2026-02-04 22:24:55 +01:00
Hendrik Dreesmann	b702e6565a	Update SQLMesh for R2 data access & Convert psd data to gzip	2025-11-02 00:26:01 +01:00
Deeman	6c93021f2d	remove stupid rules	2025-10-12 21:44:56 +02:00
Deeman	7e06eae5ac	Add comprehensive ruff linting rules and migrate to uv build backend - Configure ruff with strict linting rules (pycodestyle, pyflakes, isort, pylint, etc.) - Exclude notebooks folder from linting - Set line length to 88 characters and target Python 3.13 - Migrate build backend from hatchling to uv_build for better integration - Add per-file ignores for __init__.py and scripts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-12 21:41:39 +02:00
Deeman	5ce112f44d	Add comprehensive E2E tests for materia CLI - Add pytest and pytest-cov for testing - Add niquests for modern HTTP/2 support (keep requests for hcloud compatibility) - Create 13 E2E tests covering CLI, workers, pipelines, and secrets (71% coverage) - Fix Pulumi ESC environment path (beanflows/prod) and secret key names - Update GitLab CI to run CLI tests with coverage reporting 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-12 21:32:51 +02:00
Deeman	55bb84f0fa	implement cli/infra update cicd	2025-10-12 21:00:41 +02:00
Deeman	77dd277ebf	updates	2025-10-12 14:26:37 +02:00
Deeman	9baa0d185c	testing sqlmesh	2025-07-27 00:18:03 +02:00
Deeman	bd65ddcac8	adding incremental load abilities	2025-07-26 21:10:02 +02:00
Deeman	b8ad73202c	finish historical extraction	2025-07-13 23:20:50 +02:00
Deeman	c3c281fcd8	update structure	2025-07-08 22:41:59 +02:00
Deeman	5368c1e521	changes '	2025-06-11 11:49:20 +02:00
Deeman	265250864c	add dlt script to extract data from fas.usda.gov	2025-04-30 22:35:31 +02:00
Simon Dmsn	bff78f1e72	Psd	2025-04-30 18:56:39 +02:00
Deeman	2839757cf8	add cicd and precommit	2025-03-01 18:23:56 +01:00
Deeman	2a4e7fe668	update	2025-03-01 18:15:34 +01:00
Deeman	32b7df714e	add yfinance and more readme	2025-03-01 18:13:38 +01:00
Deeman	96a6abf1e0	Initial commit	2025-03-01 18:11:57 +01:00

26 Commits
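
The most recent commit (80c1163a7f) describes SQLite run tracking in state.py and atomic gzip writes in files.py, with state stored at {LANDING_DIR}/.state.sqlite. The block below is a minimal sketch of how those helpers could fit together. It is not the repository's code: only the function names come from the commit message. The signatures, the `runs` table schema, the `cursor_value` column, and the use of SHA-256 in `content_hash` are assumptions made for illustration.

```python
import gzip
import hashlib
import os
import sqlite3
import tempfile
from pathlib import Path


def content_hash(data: bytes) -> str:
    """Hex digest of the raw payload (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def write_bytes_atomic(path: Path, data: bytes) -> int:
    """Gzip data to path via temp file + rename; return compressed size."""
    path.parent.mkdir(parents=True, exist_ok=True)
    compressed = gzip.compress(data)
    # The temp file lives in the target directory so os.replace stays
    # on one filesystem, where the rename is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(compressed)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return len(compressed)


def open_state_db(landing_dir: Path) -> sqlite3.Connection:
    """Open (or create) the state DB; the schema here is hypothetical."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(landing_dir / ".state.sqlite")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               extractor TEXT NOT NULL,
               started_at TEXT NOT NULL DEFAULT (datetime('now')),
               finished_at TEXT,
               status TEXT NOT NULL DEFAULT 'running',
               bytes_written INTEGER,
               cursor_value TEXT
           )"""
    )
    conn.commit()
    return conn


def start_run(conn: sqlite3.Connection, extractor: str) -> int:
    """Insert a 'running' row and return its id."""
    cur = conn.execute("INSERT INTO runs (extractor) VALUES (?)", (extractor,))
    conn.commit()
    return cur.lastrowid


def end_run(conn, run_id, status, bytes_written=0, cursor_value=None) -> None:
    """Mark a run finished and record what it wrote."""
    conn.execute(
        "UPDATE runs SET finished_at = datetime('now'), status = ?, "
        "bytes_written = ?, cursor_value = ? WHERE id = ?",
        (status, bytes_written, cursor_value, run_id),
    )
    conn.commit()


def get_last_cursor(conn, extractor):
    """Return the cursor of the latest successful run, or None."""
    row = conn.execute(
        "SELECT cursor_value FROM runs WHERE extractor = ? "
        "AND status = 'success' AND cursor_value IS NOT NULL "
        "ORDER BY id DESC LIMIT 1",
        (extractor,),
    ).fetchone()
    return row[0] if row else None
```

Under these assumptions, an extractor entry point would call `start_run`, write its payload with `write_bytes_atomic`, and pass the returned byte count to `end_run`. That fits the commit's note that `extract_cot_year` now returns an int (bytes_written), so a run that writes nothing reports 0 instead of False.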