- Add extract/extract_core/ workspace package with three modules:
- state.py: SQLite run tracking (open_state_db, start_run, end_run, get_last_cursor)
- http.py: niquests session factory + etag normalization helpers
- files.py: landing_path, content_hash, write_bytes_atomic (atomic gzip writes)
- State lives at {LANDING_DIR}/.state.sqlite — no extra env var needed
- SQLite chosen over DuckDB: state tracking is OLTP (row inserts/updates), not analytical
- Refactor all 4 extractors (psdonline, cftc_cot, coffee_prices, ice_stocks):
- Replace inline boilerplate with extract_core helpers
- Add start_run/end_run tracking to every extraction entry point
- extract_cot_year returns int (bytes_written) instead of bool
- Update tests: assert result == 0 (not `is False`) for the return type change
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ICE API at /marketdata/api/reports/293/results stores all historical
daily XLS reports date-descending. Previously the extractor only fetched
the latest. New extract_ice_backfill entry point pages through the API
and downloads all matching 'Daily Warehouse Stocks' reports.
- ice_api.py: add find_all_reports() alongside find_latest_report()
- execute.py: add extract_ice_stocks_backfill(max_pages=3) — default
covers ~6 months; max_pages=20 fetches ~3 years of history
- pyproject.toml: register extract_ice_backfill entry point
Ran backfill: 131 files, 2025-08-15 → 2026-02-20
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ICE changed the daily stocks XLS header from 'As of: 1/30/2026' to
'As of: Feb 20, 2026 1:35:39PM'. Expand _build_canonical_csv_from_xls
to try multiple strptime formats (%m/%d/%Y, %b %d, %Y, etc.) on both
single-token and three-token date candidates.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace brittle ICE_STOCKS_URL env var with API-based URL discovery via
the private ICE Report Center JSON API (no auth required)
- Add rolling CSV → XLS fallback in extract_ice_stocks() using
find_latest_report() from ice_api.py
- Add ice_api.py: fetch_report_listings(), find_latest_report() with
pagination up to MAX_API_PAGES
- Add xls_parse.py: detect_file_format() (magic bytes), xls_to_rows()
using xlrd for OLE2/BIFF XLS files
- Add extract_ice_aging(): monthly certified stock aging report by
age bucket × port → ice_aging/ landing dir
- Add extract_ice_historical(): 30-year EOM by-port stocks from static
ICE URL → ice_stocks_by_port/ landing dir
- Add xlrd>=2.0.1 (parse XLS), xlwt>=1.3.0 (dev, test fixtures)
- Add SQLMesh raw + foundation models for both new datasets
- Add ice_aging_glob(), ice_stocks_by_port_glob() macros
- Add extract_ice_aging + extract_ice_historical pipeline entries
- Add 12 unit tests (format detection, XLS roundtrip, API mock, CSV output)
Seed files (data/landing/ice_aging/seed/ and ice_stocks_by_port/seed/)
must be created locally — data/ is gitignored.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>