Replace monolithic Overview (8 charts, 24 metric cards, no filters) with
a JTBD-driven 5-page dashboard optimised for the data-drop moment.

Navigation (sidebar + mobile nav):
- Pulse (/dashboard/): full-picture overview, 10-second read
- Supply (/dashboard/supply): USDA WASDE deep dive, range + metric filters
- Positioning (/dashboard/positioning): KC=F price + CFTC COT, range filter
- Warehouse (/dashboard/warehouse): ICE certified stocks, range + view filters
- Origins (/dashboard/countries): unchanged (HTMX already live)
- Settings: unchanged

New templates:
- pulse.html: 4 metric cards + freshness bar + 2×2 sparkline grid
- supply.html + supply_canvas.html: HTMX partial with 5Y/10Y/Max and
  Production/Exports/Imports/Stocks filter pills; free plan gated at 5Y
- positioning.html + positioning_canvas.html: price chart + COT dual-axis;
  client-side MA toggles (no server round-trip)
- warehouse.html + warehouse_canvas.html: Daily Stocks / Aging / By Port
  view switcher; only active view's queries fire

routes.py:
- RANGE_MAP dict maps URL param → {days, weeks, months, years}
- _safe() helper absorbs asyncio.gather exceptions with defaults
- index() rewritten: 8 lightweight queries, renders pulse.html
- supply(), positioning(), warehouse() routes added; HX-Request detection
  returns canvas partial; full request returns page shell

input.css:
- All cc-* component classes moved from countries.html inline style to
  global stylesheet (cc-chart-card, cc-trow 3-col grid, cc-empty, etc.)
- filter-bar, filter-pills, filter-pill, canvas-loading, freshness-badge
- cc-chart-body canvas max-height 340px (prevents gigantic charts on 4K)

_feedback_widget.html:
- Mobile: collapses to circular icon button at bottom:72px to clear 5-item
  nav bar; "Feedback" label hidden on mobile

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# VISION.md

## Mission

Build the fastest, most accurate, and most affordable commodity analytics platform for independent traders and small firms, without enterprise sales bullshit.

## Product: BeanFlows.coffee

**Tagline:** Real-time commodity intelligence for traders who think for themselves.

**Beachhead Market:** Coffee commodities

**Long-term Vision:** Expand to all major commodity markets (~35-40 global contracts)

## Why We Exist

Platforms like Kpler dominate the commodity analytics space but are:
- Slow and complex
- Prohibitively expensive
- Designed for enterprise buyers with bloated sales processes
- Built on legacy infrastructure that prioritizes features over performance

We're building the anti-Kpler: **better, faster, cheaper**.

## Who We Are

A two-person indie hacker startup:
- **Data Engineer:** Building the platform
- **Commodity Trader:** Domain expertise and product direction

We move fast, ship incrementally, and prioritize value over vanity metrics.

## Technical Philosophy

### Core Principles

1. **Simplicity over complexity**
   - Minimal dependencies
   - Clear, readable code
   - Avoid premature abstraction

2. **Performance over features**
   - DuckDB over Spark
   - Hetzner/Cloudflare over AWS
   - SQL/Python/C over heavyweight frameworks

3. **Accuracy over speed-to-market**
   - Data quality is non-negotiable
   - Rigorous validation at every layer
   - Build trust through reliability

4. **Build over buy**
   - We're not afraid to write code from scratch
   - Third-party tools must earn their place
   - Control our destiny, minimize vendor lock-in

### Technology Stack

**Languages:**
- SQL (primary transformation language)
- Python (web, orchestration, extraction, APIs)
- C (performance-critical extensions)

**Infrastructure:**
- **Storage:** Bare-metal NVMe drives (backup: Cloudflare R2)
- **Compute:** Hetzner bare metal (not AWS/GCP; maybe later for ephemeral pipelines if needed)
- **Database:** SQLite/DuckDB
- **Orchestration:** SQLMesh + custom Python (not Airflow)

**Development:**
- **Monorepo:** uv workspace
- **Package Manager:** uv (not pip/poetry)
- **Version Control:** Git (GitLab)
- **CI/CD:** GitLab CI

### Architectural Philosophy

**Data-Oriented Design:**
- No OOP spaghetti
- Data flows are explicit and traceable
- Functions transform data, not objects with hidden state

**Layered Architecture:**
- Raw → Staging → Cleaned → Serving
- Each layer has a single, clear purpose
- Immutable raw data, reproducible transformations

**Incremental Everything:**
- Models update incrementally by time ranges
- Avoid full table scans
- Pay only for what changed

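
To make "update incrementally by time ranges" concrete, here is a minimal sketch of the idea in plain Python. It is not how SQLMesh implements interval tracking; `missing_days()` and `run_incremental()` are hypothetical helpers, and daily granularity is an assumption. The point is that each run computes which days are still unprocessed and touches only those.

```python
from datetime import date, timedelta


def missing_days(start, end, processed):
    """Days in [start, end] that have not been processed yet."""
    span = (end - start).days + 1
    candidates = (start + timedelta(days=i) for i in range(span))
    return [d for d in candidates if d not in processed]


def run_incremental(start, end, processed, transform):
    """Apply `transform` to new days only.

    Returns the updated set of processed days and the list of days
    that were actually worked on in this run.
    """
    todo = missing_days(start, end, processed)
    for day in todo:
        transform(day)  # a pure function of that day's raw data
    return processed | set(todo), todo
```

Re-running with the same range after a successful run yields an empty `todo` list, which is the "pay only for what changed" property: the cost of a run scales with the new interval, not with the full history.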
## Current State (February 2026)

### What's Shipped
- USDA PSD Online extraction + full SQLMesh pipeline (raw→staging→cleaned→serving)
- CFTC COT disaggregated futures: weekly positioning, COT index, managed money net
- KC=F Coffee C futures prices: daily OHLCV, 20d/50d SMA, 52-week range (1971–present)
- ICE certified warehouse stocks: daily rolling + aging report + historical by-port (Nov 1996–present)
  - API-based URL discovery (no env var needed), XLS + CSV parsing, full SQLMesh pipeline
  - Dashboard: daily stocks chart, aging stacked bar, 30-year by-port stacked area
  - API endpoints: /stocks, /stocks/aging, /stocks/by-port
- Web app (Quart + HTMX): dashboard with supply/demand + COT + price + all ICE charts
- Origin Intelligence page: HATEOAS country comparison, click-to-update via HTMX
- REST API with key auth + rate limiting
- Paddle billing (Starter/Pro plans), magic-link auth, admin panel
- /methodology page with full data source documentation
- Automated supervisor: all extractors (extract_all meta-pipeline) + webhook alerting
- 23 passing tests, GitLab CI pipeline

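
Two of the derived series named in the list above, the 20d/50d SMA and the COT index, are not defined in this document. The sketch below shows one common way to compute them. The 0-100 min/max scaling for the COT index is a widely used convention, but the exact formula and lookback window used by the pipeline are assumptions here, and the function names are hypothetical.

```python
def sma(values, window):
    """Simple moving average; None until `window` points are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def cot_index(net_positions, lookback):
    """Latest net position scaled to 0-100 within the trailing lookback range."""
    recent = net_positions[-lookback:]
    lo, hi = min(recent), max(recent)
    if hi == lo:
        return 50.0  # flat range: midpoint chosen arbitrarily (assumption)
    return 100.0 * (recent[-1] - lo) / (hi - lo)
```

A reading near 100 means the latest net position sits at the top of its range over the lookback window, and a reading near 0 means it sits at the bottom.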
### What's Missing (Phase 1 remainder)
- Python SDK (`pip install beanflows`)
- Deploy to Hetzner production
- Cloudflare R2 raw data backup
- Example Jupyter notebooks

## Roadmap

### Phase 1: Coffee Market Foundation (COMPLETE, ready for outreach)
**Goal:** Build complete coffee analytics from supply to price

**Data Sources:**
- ✅ USDA PSD Online (production, stocks, consumption)
- ✅ CFTC COT data (trader positioning, COT index)
- ✅ KC=F Coffee futures prices (daily OHLCV, moving averages)
- ✅ ICE warehouse stocks (daily + aging + historical by-port, full pipeline + API + dashboard)
- ⬜ ICO (International Coffee Organization): future

**Features:**
- ✅ Dashboard: supply/demand + COT + price + ICE warehouse charts
- ✅ REST API: all 4 data sources
- ✅ Data methodology page
- ✅ Automated daily pipeline with alerting
- ⬜ Python SDK
- ⬜ Historical correlation analysis

**Infrastructure:**
- ✅ Supervisor loop with all extractors
- ⬜ Move to Cloudflare R2 for raw data backup
- ⬜ Deploy to Hetzner production

### Phase 2: Product Market Fit
**Goal:** Validate with real traders, iterate on feedback

- ⬜ Beta access for small group of coffee traders
- ⬜ Usage analytics (what queries matter?)
- ⬜ Performance optimization based on real workloads
- ⬜ Pricing model experimentation ($X/month, pay-as-you-go?)

### Phase 3: Expand Commodity Coverage
**Goal:** Prove architecture scales across commodities

**Priority Markets:**
1. Other softs (cocoa, sugar, cotton, OJ)
2. Grains (corn, wheat, soybeans)
3. Energy (crude oil, natural gas)
4. Metals (gold, silver, copper)

**Reusable Patterns:**
- Abstract extraction logic (API connectors, scrapers)
- Standardized staging layer for price/volume data
- Common serving models (time series, correlations, anomalies)

### Phase 4: Advanced Analytics
**Goal:** Differentiation through unique insights

- ⬜ Satellite imagery integration (NASA, Planet) for crop monitoring
- ⬜ Custom yield forecasting models
- ⬜ Real-time alert system (price thresholds, supply shocks)
- ⬜ Historical backtesting framework for trading strategies
- ⬜ Sentiment analysis from news/reports (USDA GAIN, FAO)

### Phase 5: Scale & Polish
**Goal:** Handle growth, maintain performance advantage

- ⬜ Multi-region deployment (low latency globally)
- ⬜ Advanced caching strategies
- ⬜ Self-service onboarding (no sales calls)
- ⬜ Public documentation and API reference
- ⬜ Community/forum for traders

## Key Decisions & Trade-offs

### Why DuckDB over Spark?
- **Speed:** In-process OLAP is faster for our workloads
- **Simplicity:** No cluster management, no JVM
- **Cost:** Runs on a single beefy server, not 100 nodes
- **Developer experience:** SQL-first, Python-friendly

### Why SQLMesh over dbt/Airflow?
- **Unified:** Orchestration + transformation in one tool
- **Performance:** Built for incremental execution
- **Virtual environments:** Test changes without breaking prod
- **Python-native:** Extend with custom macros

### Why Cloudflare R2 over S3?
- **Cost:** No egress fees (huge for data-heavy platform)
- **Performance:** Global edge network
- **Simplicity:** S3-compatible API, easy migration path

### Why Hetzner over AWS?
- **Cost:** 10x cheaper for equivalent compute
- **Performance:** Bare metal = no noisy neighbors
- **Simplicity:** Less surface area, fewer services to manage

### Why Monorepo?
- **Atomic changes:** Update extraction + transformation together
- **Shared code:** Reusable utilities across packages
- **Simplified CI:** One pipeline, consistent tooling

## Anti-Goals

Things we explicitly do NOT want:

- ❌ Enterprise sales team
- ❌ Complex onboarding processes
- ❌ Vendor lock-in (AWS, Snowflake, etc.)
- ❌ OOP frameworks (Django ORM, SQLAlchemy magic)
- ❌ Microservices (until we need them, which is not now)
- ❌ Kubernetes (overkill for our scale)
- ❌ Feature bloat (every feature has a performance cost)

## Success Metrics

**Phase 1 (Foundation):**
- All coffee data sources integrated
- Daily pipeline runs reliably (<5% failure rate)
- Query latency <500ms for common analytics

**Phase 2 (PMF):**
- 10+ paying beta users
- 90%+ data accuracy (validated against spot checks)
- Monthly churn <10%

**Phase 3 (Expansion):**
- 5+ commodity markets covered
- 100+ active users
- Break-even on infrastructure costs

**Long-term (Scale):**
- Cover all ~35-40 major commodity contracts
- 1000+ traders using the platform
- Recognized as the go-to alternative to Kpler for indie traders

## Guiding Questions

When making decisions, ask:

1. **Does this make us faster?** (Performance)
2. **Does this make us more accurate?** (Data quality)
3. **Does this make us simpler?** (Maintainability)
4. **Does this help traders make better decisions?** (Value)
5. **Can we afford to run this at scale?** (Unit economics)

If the answer to any of these is "no," reconsider.

## Current Priorities (Q1 2026)

**Goal: Complete Phase 1 "whole product" and start beachhead outreach**

### Immediate (ship first):
1. **CFTC COT data:** extract weekly positioning data (CFTC code 083731), add to SQLMesh pipeline, expose via API. Completes the "USDA + CFTC" V1 promise from the strategy doc.
2. **Coffee futures price (KC=F):** daily close via yfinance or Databento. Enables price/supply correlation in the dashboard. Core hook for trader interest.
3. **Data methodology page:** transparent docs for every field, every source, lineage. The #1 trust driver per the strategy doc. Required before outreach.
4. **Python SDK** (`pip install beanflows`): one-line data access for quant analysts. The beachhead segment runs Python; this removes their biggest switching friction.

### Then (before Series A of customers):
5. **Automated daily pipeline** on Hetzner: cron + SQLMesh prod, with failure alerting
6. **Cloudflare R2** raw data backup + pipeline source
7. **Example Jupyter notebooks:** show before/after vs. manual WASDE workflow
8. **ICE warehouse stocks:** daily certified Arabica/Robusta inventory data (free from ICE Report Center)

### Business (parallel, not blocking):
- Start direct outreach to 20–30 named analysts at mid-size commodity funds
- Weekly "BeanFlows Coffee Data Brief" newsletter (content marketing + credibility signal)
- Identify 1–2 early beta users willing to give feedback

---

**Last Updated:** February 2026

**Next Review:** End of Q1 2026