Merge worktree-cot-integration: Phase 1 + scout MCP server

- Phase 1A-C: KC=F price extraction, SQLMesh models, dashboard charts, API endpoints
- ICE warehouse stocks: extraction package, SQLMesh models, dashboard + API
- Methodology page (/methodology) with all data sources documented
- Supervisor pipeline automation with webhook alerting
- Scout MCP server (tools/scout/) for browser recon via Pydoll
- msgspec added as workspace dependency for typed boundary structs
- vision.md updated to reflect Phase 1 completion (Feb 2026)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-21 15:57:49 +01:00
34 changed files with 3316 additions and 53 deletions


@@ -91,61 +91,47 @@ We move fast, ship incrementally, and prioritize value over vanity metrics.
## Current State (February 2026)
### What's Working
- USDA PSD Online extraction (2006-present, monthly archives)
- 4-layer SQLMesh pipeline (raw → staging → cleaned → serving)
- DuckDB backend (local dev + production lakehouse)
- Incremental-by-time-range models with deduplication
- Development environment with pre-commit hooks, linting, formatting
- **Web app (BeanFlows.coffee)** — Quart + HTMX, deployed via Docker
- Magic-link auth + signup with waitlist flow
- Coffee analytics dashboard: time series, top producers, stock-to-use trend, supply/demand balance, YoY change
- Country comparison view
- User settings + account management
- API key management (create, revoke, prefix display)
- Plan-based access control (free / starter / pro) with 5-year history cap on free tier
- Billing via Paddle (subscriptions + webhooks)
- Admin panel (users, waitlist, feedback, tasks)
- REST API with Bearer token auth, rate limiting (1000 req/hr), CSV export
- Feedback + waitlist capture
- GitLab CI pipeline (lint, test, build), regression tests for billing/auth/API
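The stock-to-use and YoY panels come down to simple balance-sheet arithmetic. The sketch below assumes stock-to-use means ending stocks divided by total use, with total use taken as domestic consumption plus exports for a single-country balance sheet. The `BalanceRow` type, its field names, and the numbers are hypothetical. They are not the serving-layer schema or real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalanceRow:
    """One marketing year of a single-country coffee balance sheet (hypothetical fields)."""

    year: int
    ending_stocks: float
    domestic_consumption: float
    exports: float


def stock_to_use(row: BalanceRow) -> float:
    """Ending stocks as a percentage of total use (domestic consumption + exports)."""
    total_use = row.domestic_consumption + row.exports
    if total_use == 0:
        raise ValueError(f"total use is zero for {row.year}")
    return 100.0 * row.ending_stocks / total_use


def yoy_change(current: float, previous: float) -> float:
    """Year-over-year change in percent; undefined when the previous value is zero."""
    if previous == 0:
        raise ValueError("previous value is zero")
    return 100.0 * (current - previous) / previous


# Illustrative numbers only (units cancel in the ratio).
rows = [
    BalanceRow(2023, ending_stocks=30_000, domestic_consumption=120_000, exports=30_000),
    BalanceRow(2024, ending_stocks=27_000, domestic_consumption=125_000, exports=25_000),
]
ratios = [stock_to_use(r) for r in rows]
print(ratios)  # → [20.0, 18.0]
print(yoy_change(ratios[1], ratios[0]))  # → -10.0
```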
### What's Shipped
- USDA PSD Online extraction + full SQLMesh pipeline (raw→staging→cleaned→serving)
- CFTC COT disaggregated futures: weekly positioning, COT index, managed money net
- KC=F Coffee C futures prices: daily OHLCV, 20d/50d SMA, 52-week range (1971-present)
- ICE certified warehouse stocks: extractor ready, awaiting URL confirmation
- Web app (Quart + HTMX): dashboard with supply/demand + COT + price + ICE charts
- REST API with key auth + rate limiting: /metrics, /positioning, /prices, /stocks
- Paddle billing (Starter/Pro plans), magic-link auth, admin panel
- /methodology page with full data source documentation
- Automated supervisor: all extractors + webhook alerting on failure
- 23 passing tests, GitLab CI pipeline
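The derived KC=F columns are rolling-window statistics over the daily OHLCV series. This plain-Python sketch assumes a few things: rows are ordered oldest first, "52-week" means the last 252 trading sessions, and the range is built from daily highs and lows rather than closes. The function names are hypothetical. The pipeline itself probably computes these columns inside SQLMesh models, not in application code.

```python
from collections import deque
from typing import Optional, Sequence


def rolling_sma(closes: Sequence[float], window: int) -> list[Optional[float]]:
    """Simple moving average of the last `window` closes; None until the window is full."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf: deque[float] = deque(maxlen=window)  # maxlen drops the oldest close automatically
    out: list[Optional[float]] = []
    for close in closes:
        buf.append(close)
        # Re-summing the window avoids the float drift a running total can accumulate.
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


def range_52w(
    highs: Sequence[float], lows: Sequence[float], trading_days: int = 252
) -> tuple[float, float]:
    """(low, high) over the most recent `trading_days` sessions (252 assumed per year)."""
    if not highs or len(highs) != len(lows):
        raise ValueError("highs and lows must be non-empty and the same length")
    return min(lows[-trading_days:]), max(highs[-trading_days:])


# Usage with daily series ordered oldest first:
#   sma20 = rolling_sma(closes, 20)
#   sma50 = rolling_sma(closes, 50)
#   low_52w, high_52w = range_52w(highs, lows)
```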
### What We Have
- Comprehensive commodity supply/demand data (USDA PSD, 2006-present)
- Established naming conventions and data quality patterns
- Full product pipeline: data → DB → API → web dashboard
- Paddle billing integration (Starter + Pro tiers)
- Working waitlist to capture early interest
### What's Missing
- ICE stocks: report URL still to be confirmed and backfill still to run (URL needs manual discovery at theice.com/report-center)
- Python SDK
- Public API documentation
## Roadmap
### Phase 1: Coffee Market Foundation (In Progress → ~70% done)
### Phase 1: Coffee Market Foundation (COMPLETE — ready for outreach)
**Goal:** Build complete coffee analytics from supply to price
**Data Sources to Integrate:**
**Data Sources:**
- ✅ USDA PSD Online (production, stocks, consumption)
- CFTC COT data (trader positioning — weekly, Coffee C futures code 083731)
- Coffee futures prices — KC=F via Yahoo Finance / yfinance, or Databento for tick-level
- ICO (International Coffee Organization) data — trade volumes, consumption stats
- ⬜ ICE certified warehouse stocks (daily CSV from ICE Report Center — free)
- ⬜ Weather data for growing regions — ECMWF/Open-Meteo (free), Brazil frost alerts
- CFTC COT data (trader positioning, COT index)
- ✅ KC=F Coffee futures prices (daily OHLCV, moving averages)
- ICE warehouse stocks (extractor built, seed models deployed)
- ⬜ ICO (International Coffee Organization) — future
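The COT index rescales the managed money net position (longs minus shorts) to a 0-100 score relative to its own trailing range: 100 × (net − min) / (max − min). A reading of 100 is the most net-long week in the lookback window and 0 is the most net-short. This is a minimal sketch. It assumes the commonly used 156-week (three-year) lookback, which may not be the one BeanFlows uses. The function names are hypothetical.

```python
from typing import Optional, Sequence


def managed_money_net(longs: Sequence[int], shorts: Sequence[int]) -> list[int]:
    """Weekly managed money net position: long contracts minus short contracts."""
    if len(longs) != len(shorts):
        raise ValueError("longs and shorts must be the same length")
    return [long_ - short for long_, short in zip(longs, shorts)]


def cot_index(net: Sequence[int], lookback: int = 156) -> list[Optional[float]]:
    """Scale each week's net position to 0-100 within its trailing `lookback`-week range."""
    if lookback <= 0:
        raise ValueError("lookback must be positive")
    out: list[Optional[float]] = []
    for i, current in enumerate(net):
        if i + 1 < lookback:
            out.append(None)  # not enough history yet
            continue
        window = net[i + 1 - lookback : i + 1]
        lo, hi = min(window), max(window)
        # A flat window has no range to scale against, so the index is undefined.
        out.append(None if hi == lo else 100.0 * (current - lo) / (hi - lo))
    return out
```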
**Features to Build:**
- Web dashboard (supply/demand, stock-to-use trend, YoY, country comparison)
- ✅ REST API with key auth, plan-based access, rate limiting
- CSV export
- ⬜ CFTC COT integration → trader sentiment indicators
- Historical price data → price/supply correlation analysis
- Python SDK (`pip install beanflows`), critical for the quant analyst beachhead
- ⬜ Data methodology documentation page — P0 for trust (see strategy doc)
- ⬜ Parquet export endpoint
- ⬜ Example Jupyter notebooks (show how to pipe data into common models)
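Rate limiting at 1000 req/hr is applied per API key. One way to enforce that is a sliding-window log, and the sketch below shows it with a single-process, in-memory store. That setup is an assumption. The app may instead use a fixed window, a token bucket, or database-backed counters. The `SlidingWindowLimiter` name is hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per API key within any trailing `window_s` seconds.

    In-memory and single-process only; multiple workers would need shared state.
    """

    def __init__(
        self,
        limit: int = 1000,
        window_s: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, api_key: str) -> bool:
        """Record and allow the request if the key is under its limit, else reject it."""
        now = self.clock()
        hits = self._hits[api_key]
        # Evict timestamps that have slid out of the trailing window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

A request handler would call `allow(key)` before serving and return HTTP 429 Too Many Requests when it returns False. The log stores one timestamp per accepted request, so memory per key is bounded by the limit.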
**Features:**
- Dashboard: supply/demand + COT + price + ICE warehouse charts
- ✅ REST API: all 4 data sources
- Data methodology page
- ✅ Automated daily pipeline with alerting
- Python SDK
- Historical correlation analysis
**Infrastructure:**
- ⬜ Cloudflare R2 for raw data storage (rclone sync is partly planned)
- Automated daily pipeline on Hetzner (SQLMesh prod + cron)
- Pipeline monitoring + alerting (failure notifications)
- ⬜ Published SLA for data freshness
- ✅ Supervisor loop with all extractors
- Move to Cloudflare R2 for raw data backup
- Deploy to Hetzner production
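The supervisor runs every extractor and sends a webhook alert for each failure. In the sketch below, a failure does not abort the remaining runs. The sketch assumes extractors are zero-argument callables and that the webhook accepts a JSON body with a `text` field, as Slack-style incoming webhooks do. `run_extractors` and `post_webhook` are hypothetical names. The real supervisor may run subprocesses, retry, or schedule differently.

```python
import json
import urllib.request
from typing import Callable, Mapping


def post_webhook(url: str, text: str, timeout_s: float = 10.0) -> None:
    """POST {"text": ...} as JSON to a webhook URL (payload shape is an assumption)."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        resp.read()  # drain the response; urlopen raises HTTPError on 4xx/5xx


def run_extractors(
    extractors: Mapping[str, Callable[[], None]],
    alert: Callable[[str], None],
) -> list[str]:
    """Run every extractor, alert on each failure, and return the names that failed."""
    failed: list[str] = []
    for name, extract in extractors.items():
        try:
            extract()
        except Exception as exc:  # one broken source should not block the rest
            failed.append(name)
            alert(f"extractor {name!r} failed: {exc!r}")
    return failed
```

In production, the alert callback would wrap the webhook, for example `alert=lambda msg: post_webhook(WEBHOOK_URL, msg)`, with the hypothetical `WEBHOOK_URL` read from configuration. Passing the callback in keeps the loop testable without network access.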
### Phase 2: Product Market Fit
**Goal:** Validate with real traders, iterate on feedback