beanflows/web/src/beanflows/public/templates/methodology.html
Deeman 67c048485b Add Phase 1A-C + ICE warehouse stocks: prices, methodology, pipeline automation
Phase 1A — KC=F Coffee Futures Prices:
- New extract/coffee_prices/ package (yfinance): downloads KC=F daily OHLCV,
  stores as gzip CSV with SHA256-based idempotency
- SQLMesh models: raw/coffee_prices → foundation/fct_coffee_prices →
  serving/coffee_prices (with 20d/50d SMA, 52-week high/low, daily return %)
- Dashboard: 4 metric cards + dual-line chart (close, 20d MA, 50d MA)
- API: GET /commodities/<ticker>/prices

Phase 1B — Data Methodology Page:
- New /methodology route with full-page template (base.html)
- 6 anchored sections: USDA PSD, CFTC COT, KC=F price, ICE warehouse stocks,
  data quality model, update schedule table
- "Methodology" link added to marketing footer

Phase 1C — Automated Pipeline:
- supervisor.sh updated: runs extract_cot, extract_prices, extract_ice in
  sequence before transform
- Webhook failure alerting via ALERT_WEBHOOK_URL env var (ntfy/Slack/Telegram)

ICE Warehouse Stocks:
- New extract/ice_stocks/ package (niquests): normalizes ICE Report Center CSV
  to canonical schema, hash-based idempotency, soft-fail on 404 with guidance
- SQLMesh models: raw/ice_warehouse_stocks → foundation/fct_ice_warehouse_stocks
  → serving/ice_warehouse_stocks (30d avg, WoW change, 52w drawdown)
- Dashboard: 4 metric cards + line chart (certified bags + 30d avg)
- API: GET /commodities/<code>/stocks

Foundation:
- dim_commodity: added ticker (KC=F) and ice_stock_report_code (COFFEE-C) columns
- macros/__init__.py: added prices_glob() and ice_stocks_glob()
- pipelines.py: added extract_prices and extract_ice entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 11:41:43 +01:00

{% extends "base.html" %}
{% block title %}Data Methodology — {{ config.APP_NAME }}{% endblock %}
{% block content %}
<main>
<!-- Hero -->
<section class="hero">
<div class="container-page">
<h1 class="heading-display">Data Methodology</h1>
<p>Every number on BeanFlows has a source, a frequency, and a known limitation. Here's exactly where the data comes from and how we process it.</p>
</div>
</section>
<!-- Table of Contents -->
<section class="container-page py-8 max-w-3xl mx-auto">
<nav class="bg-latte rounded-lg p-6 mb-12">
<h2 class="text-sm font-semibold text-espresso uppercase tracking-wide mb-3">On this page</h2>
<ul class="list-none p-0 space-y-1.5 text-sm">
<li><a href="#usda-psd" class="text-copper">USDA Production, Supply &amp; Distribution</a></li>
<li><a href="#cftc-cot" class="text-copper">CFTC Commitments of Traders</a></li>
<li><a href="#kc-price" class="text-copper">Coffee Futures Price (KC=F)</a></li>
<li><a href="#ice-stocks" class="text-copper">ICE Certified Warehouse Stocks</a></li>
<li><a href="#data-quality" class="text-copper">Data Quality</a></li>
<li><a href="#update-schedule" class="text-copper">Update Schedule</a></li>
</ul>
</nav>
<!-- USDA PSD -->
<section id="usda-psd" class="mb-12">
<h2 class="text-2xl mb-4">USDA Production, Supply &amp; Distribution</h2>
<p class="text-stone mb-4">The USDA's <strong>Production, Supply and Distribution (PSD) Online</strong> database is the definitive public source for agricultural commodity supply and demand balances. It is maintained by the USDA Foreign Agricultural Service and covers 160+ countries and 50+ commodities going back to the 1960s for some crops.</p>
<h3 class="text-lg font-semibold mb-2 mt-6">What we use</h3>
<ul class="list-disc list-inside text-stone space-y-1.5 mb-4">
<li><strong>Commodity:</strong> Coffee, Green — USDA commodity code <code class="bg-parchment px-1 rounded">0711100</code></li>
<li><strong>Coverage:</strong> 2006&ndash;present (monthly updates)</li>
<li><strong>Geography:</strong> Country-level + world aggregate</li>
<li><strong>Source URL:</strong> <code class="bg-parchment px-1 rounded">apps.fas.usda.gov/psdonlineapi</code></li>
</ul>
<h3 class="text-lg font-semibold mb-2 mt-6">Metrics</h3>
<div class="overflow-x-auto mb-4">
<table class="table text-sm">
<thead>
<tr><th>Metric</th><th>Definition</th><th>Unit</th></tr>
</thead>
<tbody>
<tr><td>Production</td><td>Harvested green coffee output</td><td>1,000 × 60-kg bags</td></tr>
<tr><td>Imports</td><td>Physical coffee imported into country</td><td>1,000 × 60-kg bags</td></tr>
<tr><td>Exports</td><td>Physical coffee exported from country</td><td>1,000 × 60-kg bags</td></tr>
<tr><td>Domestic Consumption</td><td>Coffee consumed within country</td><td>1,000 × 60-kg bags</td></tr>
<tr><td>Ending Stocks</td><td>Carry-over stocks at marketing year end</td><td>1,000 × 60-kg bags</td></tr>
<tr><td>Stock-to-Use Ratio</td><td>Ending stocks ÷ consumption × 100</td><td>%</td></tr>
</tbody>
</table>
</div>
<h3 class="text-lg font-semibold mb-2 mt-6">Release schedule</h3>
<p class="text-stone mb-4">USDA publishes PSD updates monthly, typically in the second week of the month as part of the <em>World Agricultural Supply and Demand Estimates (WASDE)</em> report. Our pipeline checks for updates daily and downloads new data when the file hash changes.</p>
<div class="bg-parchment rounded p-4 text-sm text-stone">
<strong>Note on marketing years:</strong> Coffee marketing years vary by origin country. Brazil's marketing year runs April&ndash;March; Colombia's runs October&ndash;September. USDA normalizes all data to a common market year basis for the global aggregate.
</div>
</section>
<!-- CFTC COT -->
<section id="cftc-cot" class="mb-12">
<h2 class="text-2xl mb-4">CFTC Commitments of Traders</h2>
<p class="text-stone mb-4">The <strong>Commitments of Traders (COT)</strong> report is published weekly by the U.S. Commodity Futures Trading Commission (CFTC). It shows the net positions of large traders in regulated futures markets. It is the primary public indicator of speculative positioning in agricultural commodities.</p>
<h3 class="text-lg font-semibold mb-2 mt-6">What we use</h3>
<ul class="list-disc list-inside text-stone space-y-1.5 mb-4">
<li><strong>Report type:</strong> Disaggregated Futures-Only</li>
<li><strong>Commodity:</strong> Coffee C — CFTC code <code class="bg-parchment px-1 rounded">083</code></li>
<li><strong>Snapshot date:</strong> Every Tuesday close-of-business</li>
<li><strong>Release date:</strong> The following Friday at 3:30 PM ET</li>
<li><strong>Coverage:</strong> June 2006&ndash;present</li>
<li><strong>Source:</strong> <code class="bg-parchment px-1 rounded">cftc.gov/files/dea/history/fut_disagg_txt_{year}.zip</code></li>
</ul>
<h3 class="text-lg font-semibold mb-2 mt-6">Trader categories</h3>
<div class="overflow-x-auto mb-4">
<table class="table text-sm">
<thead>
<tr><th>Category</th><th>Who they are</th><th>What to watch</th></tr>
</thead>
<tbody>
<tr><td>Managed Money</td><td>Hedge funds, CTAs, algorithmic traders</td><td>Primary speculative signal — net long = bullish</td></tr>
<tr><td>Producer / Merchant</td><td>Coffee exporters, processors, roasters</td><td>Commercial hedgers — usually net short</td></tr>
<tr><td>Swap Dealers</td><td>Banks providing OTC commodity exposure</td><td>Index fund replication — less directional signal</td></tr>
<tr><td>Other Reportables</td><td>Large traders not fitting other categories</td><td>Mixed motivations</td></tr>
<tr><td>Non-Reportable</td><td>Small speculators below CFTC threshold</td><td>Retail sentiment proxy</td></tr>
</tbody>
</table>
</div>
<h3 class="text-lg font-semibold mb-2 mt-6">COT Index</h3>
<p class="text-stone mb-4">The <strong>COT Index</strong> normalizes the managed money net position to a 0&ndash;100 scale over a trailing window (we publish both 26-week and 52-week). It is calculated as:</p>
<div class="bg-parchment rounded p-4 text-sm font-mono mb-4">
COT Index = (current net &minus; min over window) ÷ (max over window &minus; min over window) × 100
</div>
<p class="text-stone mb-4">A reading near 0 indicates managed money is at its most bearish extreme over the window. A reading near 100 indicates maximum bullish positioning. Think of it as an RSI for speculative positioning.</p>
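<p class="text-stone mb-4">The formula above can be sketched in a few lines of Python. This is an illustration only, not the production model: the function name <code class="bg-parchment px-1 rounded">cot_index</code>, the oldest-first list input, and the midpoint returned for a flat window are all assumptions made for the example.</p>

```python
from typing import Sequence


def cot_index(net_positions: Sequence[float], window: int = 26) -> float:
    """Return the COT Index (0 to 100) for the latest value in net_positions.

    net_positions: weekly managed money net positions, oldest first.
    window: trailing window length in weeks (26 or 52 on this page).
    """
    recent = list(net_positions)[-window:]
    if not recent:
        raise ValueError("net_positions must not be empty")
    lowest, highest = min(recent), max(recent)
    if highest == lowest:
        # Flat window: the formula would divide by zero. Returning the
        # midpoint is an assumption of this sketch, not documented behaviour.
        return 50.0
    return (recent[-1] - lowest) / (highest - lowest) * 100


# Hypothetical weekly net positions (contracts), oldest first.
weekly_net = [12000, 8000, -4000, 2000, 16000, 10000]
print(cot_index(weekly_net, window=6))
```

<p class="text-stone mb-4">With these made-up inputs the latest position sits 14,000 contracts above the window low, out of a 20,000-contract range, so the index lands in the upper part of the scale.</p>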
</section>
<!-- KC=F Price -->
<section id="kc-price" class="mb-12">
<h2 class="text-2xl mb-4">Coffee Futures Price (KC=F)</h2>
<p class="text-stone mb-4">The <strong>Coffee C contract</strong> (ticker: KC=F) is the global benchmark price for Arabica coffee, traded on ICE Futures U.S. (formerly New York Board of Trade). Each contract covers 37,500 lbs of green coffee. Price is quoted in US cents per pound (¢/lb).</p>
<h3 class="text-lg font-semibold mb-2 mt-6">What we use</h3>
<ul class="list-disc list-inside text-stone space-y-1.5 mb-4">
<li><strong>Ticker:</strong> KC=F (front-month continuous contract)</li>
<li><strong>Data:</strong> Daily OHLCV (Open, High, Low, Close, Adjusted Close, Volume)</li>
<li><strong>Source:</strong> Yahoo Finance via yfinance</li>
<li><strong>Coverage:</strong> 1971&ndash;present</li>
<li><strong>Delay:</strong> ~15-minute delayed (Yahoo Finance standard)</li>
</ul>
<h3 class="text-lg font-semibold mb-2 mt-6">Derived metrics</h3>
<ul class="list-disc list-inside text-stone space-y-1.5 mb-4">
<li><strong>Daily Return %:</strong> (close &minus; prev close) ÷ prev close × 100</li>
<li><strong>20-day SMA:</strong> Simple moving average of the last 20 trading days</li>
<li><strong>50-day SMA:</strong> Simple moving average of the last 50 trading days</li>
<li><strong>52-week High/Low:</strong> Rolling high/low over the trailing ~252 trading days</li>
</ul>
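<p class="text-stone mb-4">As a rough illustration of the derived metrics listed above, the following Python sketch computes them from a plain list of daily closes, oldest first. The function names are hypothetical, and the choice to return <code class="bg-parchment px-1 rounded">None</code> until a full moving-average window exists is an assumption of the example; the serving models may handle warm-up periods differently.</p>

```python
from typing import List, Optional, Tuple


def daily_return_pct(closes: List[float]) -> List[Optional[float]]:
    """(close - prev close) / prev close * 100; None for the first day."""
    out: List[Optional[float]] = [None]
    for prev, cur in zip(closes, closes[1:]):
        out.append((cur - prev) / prev * 100)
    return out


def sma(closes: List[float], n: int) -> List[Optional[float]]:
    """Simple moving average of the last n closes; None until n closes exist."""
    out: List[Optional[float]] = []
    for i in range(len(closes)):
        if i + 1 >= n:
            out.append(sum(closes[i + 1 - n: i + 1]) / n)
        else:
            out.append(None)  # assumption: no partial-window averages
    return out


def rolling_high_low(closes: List[float], n: int = 252) -> List[Tuple[float, float]]:
    """Rolling (high, low) over the trailing n trading days, partial at the start."""
    out = []
    for i in range(len(closes)):
        window = closes[max(0, i + 1 - n): i + 1]
        out.append((max(window), min(window)))
    return out


# Hypothetical closes in cents per pound, oldest first.
closes = [100.0, 110.0, 99.0, 121.0]
print(daily_return_pct(closes))
print(sma(closes, 2))
print(rolling_high_low(closes, n=3))
```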
<div class="bg-parchment rounded p-4 text-sm text-stone">
<strong>Front-month continuity:</strong> KC=F is the continuous front-month contract. At roll dates, there is a price gap between expiring and next-month contracts. Adjusted Close accounts for roll adjustments. We use raw Close for current price display and Adjusted Close for historical return calculations.
</div>
</section>
<!-- ICE Warehouse Stocks -->
<section id="ice-stocks" class="mb-12">
<h2 class="text-2xl mb-4">ICE Certified Warehouse Stocks</h2>
<p class="text-stone mb-4">ICE Futures U.S. publishes daily reports of <strong>certified warehouse stocks</strong> for Coffee C. These are physical bags of Arabica coffee that have been graded and stamped as meeting ICE delivery specifications — making them eligible for delivery against a futures contract at expiration.</p>
<h3 class="text-lg font-semibold mb-2 mt-6">Why certified stocks matter</h3>
<p class="text-stone mb-4">Certified stocks are the physical backing of the futures market. When certified stocks fall sharply while open interest is high, shorts cannot easily deliver physical coffee — this creates a <strong>squeeze dynamic</strong> that can drive explosive price rallies. Tracking certified stocks alongside positioning data is essential for understanding delivery risk.</p>
<h3 class="text-lg font-semibold mb-2 mt-6">What we track</h3>
<ul class="list-disc list-inside text-stone space-y-1.5 mb-4">
<li><strong>Total Certified Bags:</strong> All ICE-approved warehouse receipts (60-kg bags)</li>
<li><strong>Pending Grading:</strong> Coffee being evaluated for certification (may join or exit certified stock)</li>
<li><strong>Source:</strong> ICE Report Center (daily publication)</li>
<li><strong>Update frequency:</strong> Daily, after market close</li>
</ul>
<h3 class="text-lg font-semibold mb-2 mt-6">Derived metrics</h3>
<ul class="list-disc list-inside text-stone space-y-1.5 mb-4">
<li><strong>WoW Change:</strong> Week-over-week change in certified bags</li>
<li><strong>30-Day Average:</strong> Smoothed trend removing daily noise</li>
<li><strong>52-Week High:</strong> Rolling maximum over trailing 365 days</li>
<li><strong>Drawdown from 52w High:</strong> % decline from peak — measures how far stocks have been drawn down</li>
</ul>
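<p class="text-stone mb-4">The stock metrics above reduce to simple arithmetic over a daily series. The Python sketch below assumes one observation per day, oldest first; the function names, the generic <code class="bg-parchment px-1 rounded">lag</code> parameter for the change calculation, and the sample figures are all hypothetical.</p>

```python
from typing import List


def change_over(bags: List[int], lag: int) -> int:
    """Change in certified bags between the latest value and `lag` rows earlier."""
    return bags[-1] - bags[-1 - lag]


def trailing_average(bags: List[int], window: int = 30) -> float:
    """Mean of the last `window` observations (fewer if the series is shorter)."""
    recent = bags[-window:]
    return sum(recent) / len(recent)


def drawdown_from_high_pct(bags: List[int], window: int = 365) -> float:
    """Percent decline of the latest value from the rolling maximum."""
    recent = bags[-window:]
    peak = max(recent)
    return (peak - recent[-1]) / peak * 100


# Made-up certified stock levels in 60-kg bags, oldest first.
bags = [800000, 760000, 720000, 600000]
print(change_over(bags, lag=1))
print(trailing_average(bags, window=2))
print(drawdown_from_high_pct(bags))
```

<p class="text-stone mb-4">A drawdown of 0% means stocks are at their rolling peak; larger values mean more of the peak inventory has left certified warehouses.</p>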
</section>
<!-- Data Quality -->
<section id="data-quality" class="mb-12">
<h2 class="text-2xl mb-4">Data Quality</h2>
<h3 class="text-lg font-semibold mb-2 mt-6">Immutable raw layer</h3>
<p class="text-stone mb-4">All source files are stored as immutable gzip-compressed CSVs in a content-addressed landing directory. Files are never modified in place — a new download creates a new file only if the content hash differs from what is already stored. This means the full history of source corrections is preserved.</p>
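<p class="text-stone mb-4">A minimal Python sketch of this write-once behaviour, using only the standard library. Naming each file after its SHA-256 digest and the helper name <code class="bg-parchment px-1 rounded">land_if_new</code> are assumptions for illustration; the actual landing layout may differ.</p>

```python
import gzip
import hashlib
import tempfile
from pathlib import Path


def land_if_new(content: bytes, landing_dir: Path) -> bool:
    """Store content as a gzip file named by its SHA-256 digest.

    Returns True if a new file was written, False if identical content
    was already present. Existing files are never overwritten.
    """
    digest = hashlib.sha256(content).hexdigest()
    target = landing_dir / f"{digest}.csv.gz"  # hypothetical naming scheme
    if target.exists():
        return False  # same bytes already landed; nothing to do
    landing_dir.mkdir(parents=True, exist_ok=True)
    with gzip.open(target, "wb") as fh:
        fh.write(content)
    return True


with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp) / "landing"
    print(land_if_new(b"date,close\n2026-02-20,380.1\n", landing))  # first download
    print(land_if_new(b"date,close\n2026-02-20,380.1\n", landing))  # unchanged source
    print(land_if_new(b"date,close\n2026-02-20,380.4\n", landing))  # source correction
```

<p class="text-stone mb-4">Because a corrected source file hashes to a different digest, it lands next to the earlier version rather than replacing it.</p>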
<h3 class="text-lg font-semibold mb-2 mt-6">Incremental models with deduplication</h3>
<p class="text-stone mb-4">Foundation models are incremental and deduplicate via a hash key computed from business grain columns and key metrics. If a source issues a correction (CFTC re-states a COT figure, USDA revises a production estimate), the corrected row produces a different hash and is ingested on the next pipeline run. Serving models select the most recent revision per grain.</p>
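<p class="text-stone mb-4">The real models are SQLMesh SQL, but the idea can be sketched in Python under a few assumptions: rows are dictionaries, each carries a hypothetical <code class="bg-parchment px-1 rounded">ingested_at</code> ordering field, and the hash is a SHA-256 over the grain and metric values. None of these names come from the production code.</p>

```python
import hashlib
from typing import Dict, List


def row_hash(row: Dict, columns: List[str]) -> str:
    """Hash the grain and metric values of a row into a single key."""
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def dedupe_and_select_latest(rows: List[Dict], grain: List[str], metrics: List[str]) -> List[Dict]:
    """Drop exact duplicates by hash, then keep the newest revision per grain."""
    seen = set()
    latest = {}
    for row in sorted(rows, key=lambda r: r["ingested_at"]):
        key = row_hash(row, grain + metrics)
        if key in seen:
            continue  # identical to a row that was already ingested
        seen.add(key)
        latest[tuple(row[c] for c in grain)] = row  # later revision wins
    return list(latest.values())


# Made-up rows: an original, an exact re-download, and a restated figure.
rows = [
    {"report_date": "2026-02-10", "commodity": "083", "net_long": 100, "ingested_at": 1},
    {"report_date": "2026-02-10", "commodity": "083", "net_long": 100, "ingested_at": 2},
    {"report_date": "2026-02-10", "commodity": "083", "net_long": 120, "ingested_at": 3},
]
print(dedupe_and_select_latest(rows, ["report_date", "commodity"], ["net_long"]))
```

<p class="text-stone mb-4">The re-downloaded row is skipped because its hash is unchanged, while the restated figure produces a new hash and becomes the revision that is served.</p>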
<h3 class="text-lg font-semibold mb-2 mt-6">Known limitations</h3>
<ul class="list-disc list-inside text-stone space-y-1.5 mb-4">
<li>USDA PSD revisions can extend back multiple years — always treat historical figures as estimates subject to revision.</li>
<li>Yahoo Finance prices carry a ~15-minute delay and may have minor adjustments at roll dates.</li>
<li>COT data reflects Tuesday close positions; the market may move significantly before Friday's release.</li>
<li>ICE warehouse stocks are shown as a single total that does not distinguish between origins, and certified stock drawdowns at specific ports are not visible here.</li>
</ul>
</section>
<!-- Update Schedule -->
<section id="update-schedule" class="mb-12">
<h2 class="text-2xl mb-4">Update Schedule</h2>
<div class="overflow-x-auto">
<table class="table text-sm">
<thead>
<tr>
<th>Source</th>
<th>Frequency</th>
<th>Typical freshness</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>USDA PSD</td>
<td>Monthly</td>
<td>~2nd week of month</td>
<td>WASDE release day; daily pipeline detects hash change</td>
</tr>
<tr>
<td>CFTC COT</td>
<td>Weekly (Friday)</td>
<td>Friday 3:30 PM ET</td>
<td>Reflects prior Tuesday positions</td>
</tr>
<tr>
<td>KC=F Price</td>
<td>Daily</td>
<td>Next morning</td>
<td>Yahoo Finance ~15 min delayed; previous day close available next morning</td>
</tr>
<tr>
<td>ICE Warehouse Stocks</td>
<td>Daily</td>
<td>After market close</td>
<td>ICE publishes Report Center data daily after the close</td>
</tr>
</tbody>
</table>
</div>
<p class="text-stone mt-4 text-sm">Our pipeline runs continuously. Data is re-checked daily and new data is loaded within hours of publication. The dashboard shows the freshness date on each data section.</p>
</section>
<!-- Questions -->
<section class="bg-latte rounded-lg p-6">
<h2 class="text-xl mb-2">Questions about the data?</h2>
<p class="text-stone text-sm mb-4">If you spot an inconsistency or want to understand how a specific metric is calculated, use the feedback button on any page or reach out directly.</p>
<a href="{{ url_for('auth.signup') }}" class="btn">Try BeanFlows free</a>
</section>
</section>
</main>
{% endblock %}