
VISION.md

Mission

Build the fastest, most accurate, and most affordable commodity analytics platform for independent traders and small firms—without enterprise sales bullshit.

Product: BeanFlows.coffee

Tagline: Real-time commodity intelligence for traders who think for themselves.

Beachhead Market: Coffee commodities

Long-term Vision: Expand to all major commodity markets (~35-40 global contracts)

Why We Exist

Platforms like Kpler dominate the commodity analytics space but are:

  • Slow and complex
  • Prohibitively expensive
  • Designed for enterprise buyers with bloated sales processes
  • Built on legacy infrastructure that prioritizes features over performance

We're building the anti-Kpler: better, faster, cheaper.

Who We Are

A two-person indie hacker startup:

  • Data Engineer: Building the platform
  • Commodity Trader: Domain expertise and product direction

We move fast, ship incrementally, and prioritize value over vanity metrics.

Technical Philosophy

Core Principles

  1. Simplicity over complexity

    • Minimal dependencies
    • Clear, readable code
    • Avoid premature abstraction
  2. Performance over features

    • DuckDB over Spark
    • Hetzner/Cloudflare over AWS
    • SQL/Python/C over heavyweight frameworks
  3. Accuracy over speed-to-market

    • Data quality is non-negotiable
    • Rigorous validation at every layer
    • Build trust through reliability
  4. Build over buy

    • We're not afraid to write code from scratch
    • Third-party tools must earn their place
    • Control our destiny, minimize vendor lock-in

Technology Stack

Languages:

  • SQL (primary transformation language)
  • Python (orchestration, extraction, APIs)
  • C (performance-critical extensions)

Infrastructure:

  • Storage: Cloudflare R2 (not S3)
  • Compute: Hetzner bare metal (not AWS/GCP)
  • Database: DuckDB (not Spark/Snowflake)
  • Orchestration: SQLMesh + custom Python (not Airflow)

Development:

  • Monorepo: uv workspace
  • Package Manager: uv (not pip/poetry)
  • Version Control: Git (GitLab)
  • CI/CD: GitLab CI

Architectural Philosophy

Data-Oriented Design:

  • No OOP spaghetti
  • Data flows are explicit and traceable
  • Functions transform data, not objects with hidden state

Layered Architecture:

  • Raw → Staging → Cleaned → Serving
  • Each layer has a single, clear purpose
  • Immutable raw data, reproducible transformations

Incremental Everything:

  • Models update incrementally by time ranges
  • Avoid full table scans
  • Pay only for what changed
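The incremental pattern above can be sketched as a small upsert routine: reprocess only the affected time window and deduplicate within the batch. This is a minimal illustration, not the real pipeline; sqlite3 stands in for DuckDB purely so the snippet is self-contained, and the table and column names are invented.

```python
import sqlite3

# Illustrative schema only -- not the real serving layer.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE serving_psd (
        country TEXT, period TEXT, production REAL, loaded_at TEXT
    )
""")

def upsert_window(rows, start, end):
    """Replace the [start, end] window, deduplicating on (country, period)."""
    # 1. Drop everything previously loaded for the affected window
    #    (no full table scan: only the changed range is touched).
    con.execute("DELETE FROM serving_psd WHERE period BETWEEN ? AND ?", (start, end))
    # 2. Keep only the newest record per key within this batch.
    latest = {}
    for r in rows:
        key = (r["country"], r["period"])
        if key not in latest or r["loaded_at"] > latest[key]["loaded_at"]:
            latest[key] = r
    con.executemany(
        "INSERT INTO serving_psd VALUES (:country, :period, :production, :loaded_at)",
        latest.values(),
    )

# A revised figure arrives for the same period; only the newer row survives.
upsert_window(
    [
        {"country": "Brazil", "period": "2025-12", "production": 66.0, "loaded_at": "t1"},
        {"country": "Brazil", "period": "2025-12", "production": 66.4, "loaded_at": "t2"},
    ],
    "2025-01", "2025-12",
)
```

In the actual stack, SQLMesh's incremental-by-time-range models handle the windowing; the sketch just makes the "pay only for what changed" idea concrete.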

Current State (February 2026)

What's Working

  • USDA PSD Online extraction (2006-present, monthly archives)
  • 4-layer SQLMesh pipeline (raw → staging → cleaned → serving)
  • DuckDB backend (local dev + production lakehouse)
  • Incremental-by-time-range models with deduplication
  • Development environment with pre-commit hooks, linting, formatting
  • Web app (BeanFlows.coffee) — Quart + HTMX, deployed via Docker
    • Magic-link auth + signup with waitlist flow
    • Coffee analytics dashboard: time series, top producers, stock-to-use trend, supply/demand balance, YoY change
    • Country comparison view
    • User settings + account management
    • API key management (create, revoke, prefix display)
    • Plan-based access control (free / starter / pro) with 5-year history cap on free tier
    • Billing via Paddle (subscriptions + webhooks)
    • Admin panel (users, waitlist, feedback, tasks)
    • REST API with Bearer token auth, rate limiting (1000 req/hr), CSV export
    • Feedback + waitlist capture
  • GitLab CI pipeline (lint, test, build), regression tests for billing/auth/API
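The API's 1000 req/hr cap mentioned above can be sketched as a fixed-window counter keyed on API key. This is a hedged illustration of the idea only; the production implementation may use a different algorithm (sliding window, Redis-backed counters) and different names.

```python
import time
from collections import defaultdict

class RateLimiter:
    """Fixed-window rate limiter: at most `limit` hits per `window_s` seconds."""

    def __init__(self, limit=1000, window_s=3600):
        self.limit = limit
        self.window_s = window_s
        self.counts = defaultdict(int)  # (api_key, window index) -> hit count

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_s)  # hour bucket for the default window
        key = (api_key, window)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True

# Tiny limit so the behavior is visible; "bf_live_abc" is a made-up key.
limiter = RateLimiter(limit=2, window_s=3600)
print(limiter.allow("bf_live_abc", now=0))     # True
print(limiter.allow("bf_live_abc", now=10))    # True
print(limiter.allow("bf_live_abc", now=20))    # False -- limit hit
print(limiter.allow("bf_live_abc", now=3700))  # True -- new hour window
```

A fixed window is the simplest scheme that matches a "1000 req/hr" contract; its known trade-off is a burst of up to 2x the limit straddling a window boundary.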

What We Have

  • Comprehensive commodity supply/demand data (USDA PSD, 2006–present)
  • Established naming conventions and data quality patterns
  • Full product pipeline: data → DB → API → web dashboard
  • Paddle billing integration (Starter + Pro tiers)
  • Working waitlist to capture early interest

Roadmap

Phase 1: Coffee Market Foundation (In Progress → ~70% done)

Goal: Build complete coffee analytics from supply to price

Data Sources to Integrate:

  • USDA PSD Online (production, stocks, consumption)
  • CFTC COT data (trader positioning — weekly, Coffee C futures code 083731)
  • Coffee futures prices — KC=F via Yahoo Finance / yfinance, or Databento for tick-level
  • ICO (International Coffee Organization) data — trade volumes, consumption stats
  • ICE certified warehouse stocks (daily CSV from ICE Report Center — free)
  • Weather data for growing regions — ECMWF/Open-Meteo (free), Brazil frost alerts

Features to Build:

  • Web dashboard (supply/demand, stock-to-use trend, YoY, country comparison)
  • REST API with key auth, plan-based access, rate limiting
  • CSV export
  • CFTC COT integration → trader sentiment indicators
  • Historical price data → price/supply correlation analysis
  • Python SDK (pip install beanflows) — critical for the quant analyst beachhead
  • Data methodology documentation page — P0 for trust (see strategy doc)
  • Parquet export endpoint
  • Example Jupyter notebooks (show how to pipe data into common models)
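The core dashboard metrics in the feature list reduce to two simple formulas: stock-to-use (ending stocks over total use, the classic supply-tightness gauge) and year-over-year change. A minimal sketch, with illustrative field names and made-up numbers:

```python
def stock_to_use(ending_stocks: float, total_use: float) -> float:
    """Stock-to-use ratio: ending stocks as a fraction of total use."""
    return ending_stocks / total_use

def yoy_change(current: float, prior: float) -> float:
    """Year-over-year change as a fraction (0.05 == +5%)."""
    return (current - prior) / prior

# Invented example figures (million bags), purely for illustration:
print(f"stock-to-use: {stock_to_use(33.8, 170.1):.1%}")   # ~19.9%
print(f"YoY use:      {yoy_change(170.1, 168.3):+.1%}")
```

The same two functions, applied per country and per marketing year over the serving layer, are enough to drive the trend and comparison views.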

Infrastructure:

  • Cloudflare R2 for raw data storage (rclone sync planned, partially in place)
  • Automated daily pipeline on Hetzner (SQLMesh prod + cron)
  • Pipeline monitoring + alerting (failure notifications)
  • Published SLA for data freshness

Phase 2: Product Market Fit

Goal: Validate with real traders, iterate on feedback

  • Beta access for small group of coffee traders
  • Usage analytics (what queries matter?)
  • Performance optimization based on real workloads
  • Pricing model experimentation ($X/month, pay-as-you-go?)

Phase 3: Expand Commodity Coverage

Goal: Prove architecture scales across commodities

Priority Markets:

  1. Other softs (cocoa, sugar, cotton, OJ)
  2. Grains (corn, wheat, soybeans)
  3. Energy (crude oil, natural gas)
  4. Metals (gold, silver, copper)

Reusable Patterns:

  • Abstract extraction logic (API connectors, scrapers)
  • Standardized staging layer for price/volume data
  • Common serving models (time series, correlations, anomalies)

Phase 4: Advanced Analytics

Goal: Differentiation through unique insights

  • Satellite imagery integration (NASA, Planet) for crop monitoring
  • Custom yield forecasting models
  • Real-time alert system (price thresholds, supply shocks)
  • Historical backtesting framework for trading strategies
  • Sentiment analysis from news/reports (USDA GAIN, FAO)

Phase 5: Scale & Polish

Goal: Handle growth, maintain performance advantage

  • Multi-region deployment (low latency globally)
  • Advanced caching strategies
  • Self-service onboarding (no sales calls)
  • Public documentation and API reference
  • Community/forum for traders

Key Decisions & Trade-offs

Why DuckDB over Spark?

  • Speed: In-process OLAP is faster for our workloads
  • Simplicity: No cluster management, no JVM
  • Cost: Runs on a single beefy server, not 100 nodes
  • Developer experience: SQL-first, Python-friendly

Why SQLMesh over dbt/Airflow?

  • Unified: Orchestration + transformation in one tool
  • Performance: Built for incremental execution
  • Virtual environments: Test changes without breaking prod
  • Python-native: Extend with custom macros

Why Cloudflare R2 over S3?

  • Cost: No egress fees (huge for data-heavy platform)
  • Performance: Global edge network
  • Simplicity: S3-compatible API, easy migration path

Why Hetzner over AWS?

  • Cost: 10x cheaper for equivalent compute
  • Performance: Bare metal = no noisy neighbors
  • Simplicity: Less surface area, fewer services to manage

Why Monorepo?

  • Atomic changes: Update extraction + transformation together
  • Shared code: Reusable utilities across packages
  • Simplified CI: One pipeline, consistent tooling

Anti-Goals

Things we explicitly do NOT want:

  • Enterprise sales team
  • Complex onboarding processes
  • Vendor lock-in (AWS, Snowflake, etc.)
  • OOP frameworks (Django ORM, SQLAlchemy magic)
  • Microservices (until we need them, which is not now)
  • Kubernetes (overkill for our scale)
  • Feature bloat (every feature has a performance cost)

Success Metrics

Phase 1 (Foundation):

  • All coffee data sources integrated
  • Daily pipeline runs reliably (<5% failure rate)
  • Query latency <500ms for common analytics

Phase 2 (PMF):

  • 10+ paying beta users
  • 90%+ data accuracy (validated against spot checks)
  • Monthly churn <10%

Phase 3 (Expansion):

  • 5+ commodity markets covered
  • 100+ active users
  • Break-even on infrastructure costs

Long-term (Scale):

  • Cover all ~35-40 major commodity contracts
  • 1000+ traders using the platform
  • Recognized as the go-to alternative to Kpler for indie traders

Guiding Questions

When making decisions, ask:

  1. Does this make us faster? (Performance)
  2. Does this make us more accurate? (Data quality)
  3. Does this make us simpler? (Maintainability)
  4. Does this help traders make better decisions? (Value)
  5. Can we afford to run this at scale? (Unit economics)

If the answer to any of these is "no," reconsider.

Current Priorities (Q1 2026)

Goal: Complete Phase 1 "whole product" and start beachhead outreach

Immediate (ship first):

  1. CFTC COT data — extract weekly positioning data (CFTC code 083731), add to SQLMesh pipeline, expose via API. Completes the "USDA + CFTC" V1 promise from the strategy doc.
  2. Coffee futures price (KC=F) — daily close via yfinance or Databento. Enables price/supply correlation in the dashboard. Core hook for trader interest.
  3. Data methodology page — transparent docs for every field, every source, lineage. The #1 trust driver per the strategy doc. Required before outreach.
  4. Python SDK (pip install beanflows) — one-line data access for quant analysts. The beachhead segment runs Python; this removes their biggest switching friction.
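To make the SDK priority concrete: the goal is one-line data access, i.e. an API key in, a filtered dataset out. The sketch below is entirely hypothetical — the base URL, endpoint path, and parameter names are assumptions for illustration; only the shape (API key mapped to a Bearer header, filters mapped to a query string) reflects the API described in this document.

```python
import urllib.parse
import urllib.request

class BeanFlows:
    """Hypothetical request layer for a `beanflows` SDK -- names are invented."""

    BASE = "https://api.beanflows.coffee/v1"  # assumed host, not confirmed

    def __init__(self, api_key: str):
        self.api_key = api_key

    def _build_request(self, path: str, **params) -> urllib.request.Request:
        # Filters become a deterministic, sorted query string.
        url = f"{self.BASE}/{path}"
        if params:
            url += "?" + urllib.parse.urlencode(sorted(params.items()))
        # The API key rides in the Bearer header, per the REST API above.
        return urllib.request.Request(
            url, headers={"Authorization": f"Bearer {self.api_key}"}
        )

# "bf_live_xxx" and "psd/coffee" are placeholders, not real values.
req = BeanFlows("bf_live_xxx")._build_request(
    "psd/coffee", country="Brazil", start="2020"
)
print(req.full_url)
print(req.headers["Authorization"])
```

Wrapping this so the response lands directly in a DataFrame is what removes the quant analyst's switching friction.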

Then (before the first wave of paying customers):

  1. Automated daily pipeline on Hetzner — cron + SQLMesh prod, with failure alerting
  2. Cloudflare R2 raw data backup + pipeline source
  3. Example Jupyter notebooks — show before/after vs. manual WASDE workflow
  4. ICE warehouse stocks — daily certified Arabica/Robusta inventory data (free from ICE Report Center)

Business (parallel, not blocking):

  • Start direct outreach to 20–30 named analysts at mid-size commodity funds
  • Weekly "BeanFlows Coffee Data Brief" newsletter (content marketing + credibility signal)
  • Identify 1–2 early beta users willing to give feedback

Last Updated: February 2026

Next Review: End of Q1 2026