
VISION.md

Mission

Build the fastest, most accurate, and most affordable commodity analytics platform for independent traders and small firms—without enterprise sales bullshit.

Product: BeanFlows.coffee

Tagline: Real-time commodity intelligence for traders who think for themselves.

Beachhead Market: Coffee commodities

Long-term Vision: Expand to all major commodity markets (~35-40 global contracts)

Why We Exist

Platforms like Kpler dominate the commodity analytics space but are:

  • Slow and complex
  • Prohibitively expensive
  • Designed for enterprise buyers with bloated sales processes
  • Built on legacy infrastructure that prioritizes features over performance

We're building the anti-Kpler: better, faster, cheaper.

Who We Are

A two-person indie hacker startup:

  • Data Engineer: Building the platform
  • Commodity Trader: Domain expertise and product direction

We move fast, ship incrementally, and prioritize value over vanity metrics.

Technical Philosophy

Core Principles

  1. Simplicity over complexity

    • Minimal dependencies
    • Clear, readable code
    • Avoid premature abstraction
  2. Performance over features

    • DuckDB over Spark
    • Hetzner/Cloudflare over AWS
    • SQL/Python/C over heavyweight frameworks
  3. Accuracy over speed-to-market

    • Data quality is non-negotiable
    • Rigorous validation at every layer
    • Build trust through reliability
  4. Build over buy

    • We're not afraid to write code from scratch
    • Third-party tools must earn their place
    • Control our destiny, minimize vendor lock-in

Technology Stack

Languages:

  • SQL (primary transformation language)
  • Python (orchestration, extraction, APIs)
  • C (performance-critical extensions)

Infrastructure:

  • Storage: Cloudflare R2 (not S3)
  • Compute: Hetzner bare metal (not AWS/GCP)
  • Database: DuckDB (not Spark/Snowflake)
  • Orchestration: SQLMesh + custom Python (not Airflow)

Development:

  • Monorepo: uv workspace
  • Package Manager: uv (not pip/poetry)
  • Version Control: Git (GitLab)
  • CI/CD: GitLab CI

Architectural Philosophy

Data-Oriented Design:

  • No OOP spaghetti
  • Data flows are explicit and traceable
  • Functions transform data, not objects with hidden state

Layered Architecture:

  • Raw → Staging → Cleaned → Serving
  • Each layer has a single, clear purpose
  • Immutable raw data, reproducible transformations
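As a minimal sketch of these two ideas together (hypothetical field names and data, not the production pipeline), each layer can be a pure function over plain records: raw input is never mutated, and every transformation is explicit and traceable:

```python
# Hypothetical sketch of the Raw -> Staging -> Cleaned -> Serving flow.
# Each layer is a pure function: records in, records out, no hidden state.

def staging(raw_rows):
    """Normalize field names and types; the raw input is never mutated."""
    return [
        {"country": r["Country"].strip(),
         "year": int(r["Year"]),
         "production": float(r["Production"])}
        for r in raw_rows
    ]

def cleaned(staged_rows):
    """Drop records that fail basic validation."""
    return [r for r in staged_rows if r["production"] >= 0]

def serving(cleaned_rows):
    """Aggregate to the shape the dashboard/API reads."""
    totals = {}
    for r in cleaned_rows:
        totals[r["year"]] = totals.get(r["year"], 0.0) + r["production"]
    return totals

raw = [
    {"Country": " Brazil ", "Year": "2024", "Production": "70.0"},
    {"Country": "Vietnam", "Year": "2024", "Production": "29.0"},
    {"Country": "Unknown", "Year": "2024", "Production": "-1"},  # fails validation
]
print(serving(cleaned(staging(raw))))  # {2024: 99.0}
```

Because each layer is a plain function, any serving-layer number can be traced back through `cleaned` and `staging` to the immutable raw rows that produced it.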

Incremental Everything:

  • Models update incrementally by time ranges
  • Avoid full table scans
  • Pay only for what changed
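A minimal sketch of what "pay only for what changed" means in practice (hypothetical keys and values, not the SQLMesh implementation): a refresh replaces only the rows inside the affected date range, and re-extracted rows deduplicate against what the table already holds:

```python
# Hypothetical sketch of an incremental-by-time-range refresh with
# deduplication: only the changed range is rewritten, last write wins.

def merge_increment(table, new_rows, start, end):
    """table: {(date, key): row}. Replace only rows inside [start, end]."""
    kept = {k: v for k, v in table.items() if not (start <= k[0] <= end)}
    for row in new_rows:
        kept[(row["date"], row["key"])] = row  # dedup on (date, key)
    return kept

table = {
    ("2025-10-01", "KC"): {"date": "2025-10-01", "key": "KC", "close": 325.0},
    ("2025-10-02", "KC"): {"date": "2025-10-02", "key": "KC", "close": 330.0},
}
# A corrected re-extraction for Oct 2 arrives; Oct 1 is untouched.
table = merge_increment(
    table,
    [{"date": "2025-10-02", "key": "KC", "close": 331.5}],
    start="2025-10-02", end="2025-10-02",
)
print(table[("2025-10-02", "KC")]["close"])  # 331.5
```

The full-history rows outside the range are never touched, so the cost of a daily run is proportional to one day of data, not the whole table.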

Current State (October 2025)

What's Working

  • USDA PSD Online extraction (2006-present, monthly archives)
  • 4-layer SQLMesh pipeline (raw → staging → cleaned → serving)
  • DuckDB backend with 13GB dev database
  • Incremental-by-time-range models with deduplication
  • Development environment with pre-commit hooks, linting, formatting

What We Have

  • Comprehensive commodity supply/demand data (USDA PSD)
  • Established naming conventions and data quality patterns
  • GitLab CI pipeline (lint, test, build)
  • Documentation (CLAUDE.md, layer conventions)

Roadmap

Phase 1: Coffee Market Foundation (Current)

Goal: Build complete coffee analytics from supply to price

Data Sources to Integrate:

  • USDA PSD Online (production, stocks, consumption)
  • ICO (International Coffee Organization) data
  • Yahoo Finance / Alpha Vantage (coffee futures prices - KC=F)
  • Weather data for coffee-growing regions (OpenWeatherMap, NOAA)
  • CFTC COT data (trader positioning)
  • ICE warehouse stocks (web scraping)

Features to Build:

  • Historical price correlation analysis
  • Supply/demand balance modeling
  • Weather impact scoring
  • Trader sentiment indicators (COT)
  • Simple web dashboard (read-only analytics)
  • Data export APIs (JSON, CSV, Parquet)

Infrastructure:

  • Move to Cloudflare R2 for raw data storage
  • Deploy SQLMesh to Hetzner production environment
  • Set up automated daily extraction + transformation pipeline
  • Implement monitoring and alerting
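The daily pipeline plus alerting could look roughly like this sketch of the "custom Python" orchestration layer (step and function names are hypothetical; the real runner would invoke the extractors and SQLMesh):

```python
# Hypothetical sketch of a minimal daily pipeline runner: run each
# step in order, retry transient failures, and hand persistent
# failures to an alerting hook instead of failing silently.
import time

def run_pipeline(steps, alert, retries=2, delay=0.0):
    for name, step in steps:
        for attempt in range(retries + 1):
            try:
                step()
                break
            except Exception as exc:
                if attempt == retries:
                    alert(f"{name} failed after {retries + 1} attempts: {exc}")
                    return False
                time.sleep(delay)
    return True

log, alerts = [], []
steps = [
    ("extract_usda_psd", lambda: log.append("extract")),
    ("sqlmesh_run",      lambda: log.append("transform")),
]
ok = run_pipeline(steps, alerts.append)
print(ok, log)  # True ['extract', 'transform']
```

A cron entry on the Hetzner box is enough to trigger this once a day; no Airflow scheduler required.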

Phase 2: Product Market Fit

Goal: Validate with real traders, iterate on feedback

  • Beta access for small group of coffee traders
  • Usage analytics (what queries matter?)
  • Performance optimization based on real workloads
  • Pricing model experimentation ($X/month, pay-as-you-go?)

Phase 3: Expand Commodity Coverage

Goal: Prove architecture scales across commodities

Priority Markets:

  1. Other softs (cocoa, sugar, cotton, OJ)
  2. Grains (corn, wheat, soybeans)
  3. Energy (crude oil, natural gas)
  4. Metals (gold, silver, copper)

Reusable Patterns:

  • Abstract extraction logic (API connectors, scrapers)
  • Standardized staging layer for price/volume data
  • Common serving models (time series, correlations, anomalies)
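The abstract extraction pattern could be sketched as follows (class and field names are hypothetical): every source, whether an API connector or a scraper, implements one interface, so the staging layer never cares where rows came from:

```python
# Hypothetical sketch of the abstract extraction pattern: one
# interface per source type, uniform raw rows downstream.
from abc import ABC, abstractmethod

class Extractor(ABC):
    @abstractmethod
    def extract(self, start: str, end: str) -> list[dict]:
        """Return raw rows for the given date range."""

class UsdaPsdExtractor(Extractor):
    def extract(self, start, end):
        # A real implementation would call the USDA PSD endpoint;
        # stubbed here for illustration.
        return [{"source": "usda_psd", "date": start, "value": 1.0}]

rows = UsdaPsdExtractor().extract("2025-10-01", "2025-10-31")
print(rows[0]["source"])  # usda_psd
```

Adding a new commodity then means writing one new `Extractor` subclass; the standardized staging and serving layers are reused as-is.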

Phase 4: Advanced Analytics

Goal: Differentiation through unique insights

  • Satellite imagery integration (NASA, Planet) for crop monitoring
  • Custom yield forecasting models
  • Real-time alert system (price thresholds, supply shocks)
  • Historical backtesting framework for trading strategies
  • Sentiment analysis from news/reports (USDA GAIN, FAO)
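The alert system in the list above boils down to evaluating per-symbol rules against the latest observations, as in this sketch (rule format and symbols are hypothetical):

```python
# Hypothetical sketch of threshold-based alerting: compare the latest
# observation against per-symbol rules and emit alerts for breaches.
def check_thresholds(latest, rules):
    """latest: {symbol: price}; rules: [(symbol, op, level)]."""
    alerts = []
    for symbol, op, level in rules:
        price = latest.get(symbol)
        if price is None:
            continue
        if (op == ">" and price > level) or (op == "<" and price < level):
            alerts.append(f"{symbol} {op} {level} (now {price})")
    return alerts

latest = {"KC=F": 341.2, "CC=F": 5800.0}
rules = [("KC=F", ">", 340.0), ("CC=F", "<", 5000.0)]
print(check_thresholds(latest, rules))  # ['KC=F > 340.0 (now 341.2)']
```

The same rule shape extends to supply-shock alerts by pointing the rules at stock or production series instead of prices.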

Phase 5: Scale & Polish

Goal: Handle growth, maintain performance advantage

  • Multi-region deployment (low latency globally)
  • Advanced caching strategies
  • Self-service onboarding (no sales calls)
  • Public documentation and API reference
  • Community/forum for traders

Key Decisions & Trade-offs

Why DuckDB over Spark?

  • Speed: In-process OLAP is faster for our workloads
  • Simplicity: No cluster management, no JVM
  • Cost: Runs on a single beefy server, not 100 nodes
  • Developer experience: SQL-first, Python-friendly

Why SQLMesh over dbt/Airflow?

  • Unified: Orchestration + transformation in one tool
  • Performance: Built for incremental execution
  • Virtual environments: Test changes without breaking prod
  • Python-native: Extend with custom macros

Why Cloudflare R2 over S3?

  • Cost: No egress fees (huge for data-heavy platform)
  • Performance: Global edge network
  • Simplicity: S3-compatible API, easy migration path

Why Hetzner over AWS?

  • Cost: 10x cheaper for equivalent compute
  • Performance: Bare metal = no noisy neighbors
  • Simplicity: Less surface area, fewer services to manage

Why Monorepo?

  • Atomic changes: Update extraction + transformation together
  • Shared code: Reusable utilities across packages
  • Simplified CI: One pipeline, consistent tooling

Anti-Goals

Things we explicitly do NOT want:

  • Enterprise sales team
  • Complex onboarding processes
  • Vendor lock-in (AWS, Snowflake, etc.)
  • OOP frameworks (Django ORM, SQLAlchemy magic)
  • Microservices (until we need them, which is not now)
  • Kubernetes (overkill for our scale)
  • Feature bloat (every feature has a performance cost)

Success Metrics

Phase 1 (Foundation):

  • All coffee data sources integrated
  • Daily pipeline runs reliably (<5% failure rate)
  • Query latency <500ms for common analytics

Phase 2 (PMF):

  • 10+ paying beta users
  • 90%+ data accuracy (validated against spot checks)
  • Monthly churn <10%

Phase 3 (Expansion):

  • 5+ commodity markets covered
  • 100+ active users
  • Break-even on infrastructure costs

Long-term (Scale):

  • Cover all ~35-40 major commodity contracts
  • 1000+ traders using the platform
  • Recognized as the go-to alternative to Kpler for indie traders

Guiding Questions

When making decisions, ask:

  1. Does this make us faster? (Performance)
  2. Does this make us more accurate? (Data quality)
  3. Does this make us simpler? (Maintainability)
  4. Does this help traders make better decisions? (Value)
  5. Can we afford to run this at scale? (Unit economics)

If the answer to any of these is "no," reconsider.

Current Priorities (Q4 2025)

  1. Integrate coffee futures price data (Yahoo Finance)
  2. Build time-series serving models for price/supply correlation
  3. Deploy production pipeline to Hetzner
  4. Set up Cloudflare R2 for raw data storage
  5. Create simple read-only dashboard for coffee analytics
  6. Document API for beta testers

Last Updated: October 2025

Next Review: End of Q4 2025 (adjust based on Phase 1 progress)