Replace push-based SSH deploy (deploy:web stage with SSH credentials +
individual env var injection) with tag-based pull deploy:
- Add `tag` stage: creates v${CI_PIPELINE_IID} tag using CI_JOB_TOKEN
- Remove all SSH variables (SSH_PRIVATE_KEY, SSH_KNOWN_HOSTS, DEPLOY_USER,
DEPLOY_HOST) and all individual secret variables from CI
- Zero deploy secrets in CI — only CI_JOB_TOKEN (built-in) needed
Deployment is now handled by the on-server supervisor (src/materia/supervisor.py)
which polls for new v* tags every 60s and runs web/deploy.sh automatically.
Secrets live in .env.prod.sops (git-committed, age-encrypted), decrypted at
deploy time by deploy.sh — never stored in GitLab CI variables.
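A minimal sketch of the tag-selection step behind the 60s poll, assuming tags are plain `v<integer>` strings (as produced by `v${CI_PIPELINE_IID}`). The helper names `latest_version_tag` and `should_deploy` are hypothetical and not taken from supervisor.py.

```python
import re

# Tags created by the CI `tag` stage look like v123 (v${CI_PIPELINE_IID}).
_TAG_RE = re.compile(r"^v(\d+)$")


def latest_version_tag(tags):
    """Return the v<N> tag with the highest N, or None if there is none.

    Hypothetical helper: the real supervisor may select tags differently.
    """
    best_num, best_tag = -1, None
    for tag in tags:
        name = tag.strip()
        match = _TAG_RE.match(name)
        if match and int(match.group(1)) > best_num:
            best_num, best_tag = int(match.group(1)), name
    return best_tag


def should_deploy(tags, deployed_tag):
    """True when the newest v<N> tag differs from the one already deployed."""
    newest = latest_version_tag(tags)
    return newest is not None and newest != deployed_tag
```

Comparing the numeric part matters here: a plain string sort would rank v9 above v10.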
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
App containers need access to the serving DuckDB populated by the
pipeline supervisor. Bind-mounts /data/materia/analytics.duckdb as
read-only and sets SERVING_DUCKDB_PATH in container environment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-install sops + age binaries to web/bin/ if not present
- Generate age keypair at repo root age-key.txt if missing (prints public
key with instructions to add to .sops.yaml, then exits)
- Decrypt .env.prod.sops → web/.env at deploy time (no CI secrets needed)
- Backup SQLite DB before migration (timestamped, keeps last 3)
- Rollback on health check failure: dump logs + restore DB backup
- Reset nginx router to current slot before --wait to avoid upstream errors
- Remove web/scripts/deploy.sh (duplicate)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Load .env from repo root first (created by `make secrets-decrypt-dev`),
falling back to web/.env for legacy setups. Also fixes import sort order
and removes unused httpx import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
secrets.py: replace Pulumi ESC (esc CLI) with SOPS decrypt. Reads
.env.prod.sops via `sops --decrypt`, parses dotenv output. Same public
API: get_secret(), list_secrets(), test_connection().
cli.py: update secrets subcommand help text and test command messaging.
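A rough sketch of the decrypt-then-parse flow described above, assuming the `sops` binary is on PATH and can find the age key. Only `sops --decrypt` comes from this commit; the `parse_dotenv` and `load_secrets` helpers and the `get_secret` signature shown here are illustrative assumptions, not the actual secrets.py code.

```python
import subprocess


def parse_dotenv(text):
    """Parse KEY=VALUE lines; skip blanks and # comments, strip matching quotes."""
    secrets = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        secrets[key.strip()] = value
    return secrets


def load_secrets(path=".env.prod.sops"):
    """Decrypt with the sops CLI and parse its dotenv output.

    Assumption: depending on file naming, extra flags such as
    --input-type dotenv --output-type dotenv may be required.
    """
    result = subprocess.run(
        ["sops", "--decrypt", path],
        capture_output=True, text=True, check=True,
    )
    return parse_dotenv(result.stdout)


def get_secret(name, path=".env.prod.sops"):
    """Return one secret by name, or None if it is not defined."""
    return load_secrets(path).get(name)
```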
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- .sops.yaml: creation rules matching .env.{dev,prod}.sops (dotenv format)
- .env.dev.sops: encrypted dev defaults (blank API keys, local paths)
- .env.prod.sops: encrypted prod template (placeholder values to fill in)
- Makefile: root Makefile with secrets-decrypt-dev/prod, secrets-edit-dev/prod, css-build/watch
- .gitignore: add age-key.txt
Dev workflow: make secrets-decrypt-dev → .env (repo root) → web app picks it up.
Server: deploy.sh will auto-decrypt .env.prod.sops on each deploy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- obt_cot_positioning.sql: replace final SELECT * with explicit column list
so linter can resolve schema without foundation.fct_cot_positioning in DB
- fct_weather_daily.sql: fix HASH(location_id, src."date") → located."date"
(cast_and_clean CTE references FROM located, not FROM src)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds extract/openweathermap package with daily weather extraction for 8
coffee-growing regions across 7 countries (Brazil, Vietnam, Colombia,
Ethiopia, Honduras, Guatemala, Indonesia). Feeds the crop stress signal for
the commodity sentiment score.
Extractor:
- OWM One Call API 3.0 / Day Summary — one JSON.gz per (location, date)
- extract_weather: daily, fetches yesterday + today (16 calls max)
- extract_weather_backfill: fills 2020-01-01 to yesterday, capped at 500
calls/run with resume cursor '{location_id}:{date}' for crash safety
- Full idempotency via file existence check; state tracking via extract_core
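The resume-cursor idea for the backfill can be sketched as follows. This is an illustration under assumptions: `pending_work`, `run_backfill`, and the `fetch` callback are hypothetical names, and the real extractor may order or count its calls differently.

```python
from datetime import date, timedelta


def pending_work(location_ids, start, end, cursor=None):
    """Yield (location_id, day) pairs in a stable order, skipping everything
    up to and including the pair named by cursor ('{location_id}:{date}').

    Note: a cursor that never matches skips all work in this sketch.
    """
    skipping = cursor is not None
    for location_id in location_ids:
        day = start
        while day <= end:
            key = f"{location_id}:{day.isoformat()}"
            if skipping:
                if key == cursor:
                    skipping = False
            else:
                yield location_id, day
            day += timedelta(days=1)


def run_backfill(location_ids, start, end, fetch, cursor=None, max_calls=500):
    """Fetch at most max_calls pairs and return the cursor to persist."""
    calls = 0
    for location_id, day in pending_work(location_ids, start, end, cursor):
        if calls >= max_calls:
            break
        fetch(location_id, day)
        calls += 1
        # Advance the cursor only after a successful fetch (crash safety).
        cursor = f"{location_id}:{day.isoformat()}"
    return cursor
```

Persisting the returned cursor after each run lets the next run continue where the previous one stopped, whether it stopped at the call cap or because of a crash.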
SQLMesh:
- seeds.weather_locations (8 regions with lat/lon/variety)
- foundation.fct_weather_daily: INCREMENTAL_BY_TIME_RANGE, grain
(location_id, observation_date), dedup via hash key, crop stress flags:
is_frost (<2°C), is_heat_stress (>35°C), is_drought (<1mm), in_growing_season
Landing path: LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz
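For reference, the crop stress thresholds can be expressed outside SQL as below. This is a sketch only: the field names are hypothetical, and it assumes frost is judged on the daily minimum temperature, heat stress on the daily maximum, and drought on total precipitation. The in_growing_season flag is omitted because its rule is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyWeather:
    # Hypothetical field names; the real columns live in fct_weather_daily.sql.
    temp_min_c: float
    temp_max_c: float
    precipitation_mm: float


def stress_flags(obs):
    """Apply the thresholds from this commit to one daily observation."""
    return {
        "is_frost": obs.temp_min_c < 2.0,          # minimum below 2°C
        "is_heat_stress": obs.temp_max_c > 35.0,   # maximum above 35°C
        "is_drought": obs.precipitation_mm < 1.0,  # less than 1mm of rain
    }
```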
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Writes .env to web/, runs deploy.sh from web/. Pushes env vars
from GitLab CI/CD variables to the server on every master push.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Delete 6 raw data models (coffee_prices, cot_disaggregated, ice_*,
psd_data) — pure read_csv passthroughs with no added value
- Move 3 PSD seed models raw/ → seeds/, rename schema raw.* → seeds.*
- Update staging.psdalldata__commodity: read_csv(@psd_glob()) directly,
join seeds.psd_* instead of raw.psd_*
- Update 5 foundation models: inline read_csv() with src CTE, removing
raw.* dependency (fct_coffee_prices, fct_cot_positioning, fct_ice_*)
- Remove fixture-based SQLMesh test that depended on raw.cot_disaggregated
(unit tests incompatible with inline read_csv; integration run covers this)
- Update readme.md: 3-layer architecture (staging/foundation → serving)
Landing files are immutable and content-addressed — the landing directory
is the audit trail. A raw SQL layer duplicated file bytes into DuckDB
with no added value.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Creates the beanflows system user, /opt/beanflows directory, and an
ed25519 GitLab deploy key. Prints the public key to add as a read-only
deploy key on the repo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename env var to plural (CSV list) in CI yml to match the actual
config key. Add hendrik@beanflow.coffee and simon@beanflows.coffee
as hardcoded defaults so they get admin access without needing the
env var set explicitly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add extract/extract_core/ workspace package with three modules:
- state.py: SQLite run tracking (open_state_db, start_run, end_run, get_last_cursor)
- http.py: niquests session factory + etag normalization helpers
- files.py: landing_path, content_hash, write_bytes_atomic (atomic gzip writes)
- State lives at {LANDING_DIR}/.state.sqlite — no extra env var needed
- SQLite chosen over DuckDB: state tracking is OLTP (row inserts/updates), not analytical
- Refactor all 4 extractors (psdonline, cftc_cot, coffee_prices, ice_stocks):
- Replace inline boilerplate with extract_core helpers
- Add start_run/end_run tracking to every extraction entry point
- extract_cot_year returns int (bytes_written) instead of bool
- Update tests: assert result == 0 (not `is False`) for the return type change
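A minimal sketch of what SQLite run tracking with these four function names could look like. The table layout, column names, status values, and argument lists are assumptions for illustration; the actual state.py schema is not shown in this commit.

```python
import sqlite3
from datetime import datetime, timezone


def open_state_db(path):
    """Open (and create if needed) the state database, e.g. .state.sqlite."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
               run_id INTEGER PRIMARY KEY AUTOINCREMENT,
               extractor TEXT NOT NULL,
               started_at TEXT NOT NULL,
               finished_at TEXT,
               status TEXT NOT NULL DEFAULT 'running',
               bytes_written INTEGER,
               cursor_value TEXT
           )"""
    )
    conn.commit()
    return conn


def _now():
    return datetime.now(timezone.utc).isoformat()


def start_run(conn, extractor):
    """Insert a 'running' row and return its id."""
    cur = conn.execute(
        "INSERT INTO runs (extractor, started_at) VALUES (?, ?)",
        (extractor, _now()),
    )
    conn.commit()
    return cur.lastrowid


def end_run(conn, run_id, status, bytes_written=0, cursor_value=None):
    """Record the outcome of a run started with start_run()."""
    conn.execute(
        "UPDATE runs SET finished_at = ?, status = ?, bytes_written = ?, "
        "cursor_value = ? WHERE run_id = ?",
        (_now(), status, bytes_written, cursor_value, run_id),
    )
    conn.commit()


def get_last_cursor(conn, extractor):
    """Return the cursor of the most recent successful run, or None."""
    row = conn.execute(
        "SELECT cursor_value FROM runs WHERE extractor = ? "
        "AND status = 'success' AND cursor_value IS NOT NULL "
        "ORDER BY run_id DESC LIMIT 1",
        (extractor,),
    ).fetchone()
    return row[0] if row else None
```

Each call is a single-row insert or update, which is the OLTP access pattern the commit cites as the reason for choosing SQLite.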
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dashboard/routes.py (4 places) and admin/routes.py still checked
analytics._conn is not None after _conn was removed in the two-file
refactor — causing AttributeError → 500 on every dashboard page.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs fixed, plus a related temp-file rename (item 3):
1. Cross-connection COPY: DuckDB doesn't support referencing another
connection's tables as src.serving.table. Replace with Arrow as
intermediate: src reads to Arrow, dst.register() + CREATE TABLE.
2. Catalog/schema name collision: naming the export file serving.duckdb
made DuckDB assign catalog name "serving" — same as the schema we
create inside it. Every serving.table query became ambiguous. Rename
to analytics.duckdb (catalog "analytics", schema "serving" = no clash).
SERVING_DUCKDB_PATH values updated: serving.duckdb → analytics.duckdb
in supervisor, service, bootstrap, dev_run.sh, .env.example, docker-compose.
3. Temp file: use _export.duckdb (not serving.duckdb.tmp) to avoid
the same catalog collision during the write phase.
Verified: 6 tables exported, serving.* queries work read-only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On the first `./scripts/dev_run.sh` invocation (serving.duckdb absent),
automatically run extract → transform → export_serving from the repo root
so the dashboard is populated without any manual steps.
Subsequent runs skip the pipeline for a fast startup. Delete serving.duckdb
from the repo root to force a full pipeline re-run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The assert _db_path in fetch_analytics() would crash dashboard routes
locally when SERVING_DUCKDB_PATH is unset or serving.duckdb doesn't
exist yet. Change to graceful return [] so the app degrades cleanly.
Also add SERVING_DUCKDB_PATH=../serving.duckdb to local .env so the
web app will auto-connect once `materia pipeline run export_serving`
has been run for the first time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Split the single lakehouse.duckdb into two files to eliminate the exclusive
write-lock conflict between SQLMesh (pipeline) and the Quart web app (reader):
lakehouse.duckdb — SQLMesh exclusive (all pipeline layers)
serving.duckdb — web app reads (serving tables only, atomically swapped)
Changes:
web/src/beanflows/analytics.py
- Replace persistent global _conn with per-thread connections (threading.local)
- Add _get_conn(): opens read_only=True on first call per thread, reopens
automatically on inode change (~1μs os.stat) to pick up atomic file swaps
- Switch env var from DUCKDB_PATH → SERVING_DUCKDB_PATH
- Add module docstring documenting architecture + DuckLake migration path
web/src/beanflows/app.py
- Startup check: use SERVING_DUCKDB_PATH
- Health check: use _db_path instead of _conn
src/materia/export_serving.py (new)
- Reads all serving.* tables from lakehouse.duckdb (read_only)
- Writes to serving_new.duckdb, then os.rename → serving.duckdb (atomic)
- ~50 lines; runs after each SQLMesh transform
src/materia/pipelines.py
- Add export_serving pipeline entry (uv run python -c ...)
infra/supervisor/supervisor.sh
- Add SERVING_DUCKDB_PATH env var comment
- Add export step: uv run materia pipeline run export_serving
infra/supervisor/materia-supervisor.service
- Add Environment=SERVING_DUCKDB_PATH=/data/materia/serving.duckdb
infra/bootstrap_supervisor.sh
- Add SERVING_DUCKDB_PATH to .env template
web/.env.example + web/docker-compose.yml
- Document both env vars; switch web service to SERVING_DUCKDB_PATH
web/src/beanflows/dashboard/templates/settings.html
- Minor settings page fix from prior session
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove 'Write' scope checkbox from API key creation form — BeanFlows
is a read-only data platform, write keys are meaningless to users.
Scope is now always 'read' via hidden input.
- Add try/except in billing.manage route so Paddle API failures (e.g.
no live credentials in dev) show a user-facing flash error instead
of a 500.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ICE API at /marketdata/api/reports/293/results lists all historical
daily XLS reports in date-descending order. Previously the extractor fetched
only the latest report. The new extract_ice_backfill entry point pages through
the API and downloads all matching 'Daily Warehouse Stocks' reports.
- ice_api.py: add find_all_reports() alongside find_latest_report()
- execute.py: add extract_ice_stocks_backfill(max_pages=3) — default
covers ~6 months; max_pages=20 fetches ~3 years of history
- pyproject.toml: register extract_ice_backfill entry point
Ran backfill: 131 files, 2025-08-15 → 2026-02-20
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace Apply button flow with immediate HTMX partial fetches:
- toggleCountry() does an optimistic UI update (row + badge) then
calls htmx.ajax() targeting #cc-canvas with swap=innerHTML
- URL is pushed to history on every selection change (bookmarkable)
- Requests carrying the HX-Request header now get the countries_canvas.html
fragment (chips + chart/empty + inline IIFE that re-syncs globals and
re-inits Chart.js)
- Panel (dark) is never swapped; canvas fades during in-flight request
- PALETTE, buildRankings(), initChart() defined once on page load,
called by both initial render and partial IIFE after each swap
- Apply button removed; Clear triggers fetchCanvas() with empty codes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace generic multi-select + plain card with a two-panel layout:
- Dark espresso selector panel (sticky, searchable, click-to-toggle)
with country rows showing rank, name, production figure, checkbox
- Right canvas: metric segment tabs, selected-country chips (colored),
Chart.js line chart with dark espresso tooltip, and a JS-built
rankings table with proportional colored bars (latest year)
- Smooth fade-in animations, monospaced figures, copper accent palette
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ICE changed the daily stocks XLS header from 'As of: 1/30/2026' to
'As of: Feb 20, 2026 1:35:39PM'. Expand _build_canonical_csv_from_xls
to try multiple strptime formats ('%m/%d/%Y', '%b %d, %Y', etc.) on both
single-token and three-token date candidates.
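A minimal sketch of the multi-format approach, assuming the header always contains 'As of:' followed by the date. The function name `parse_as_of_date` is hypothetical, and the '%B %d, %Y' format is an extra assumption beyond the two formats named above.

```python
from datetime import datetime

# Formats seen so far in the header; extend the tuple if ICE changes it again.
_DATE_FORMATS = ("%m/%d/%Y", "%b %d, %Y", "%B %d, %Y")


def parse_as_of_date(header):
    """Extract the report date from a header such as 'As of: 1/30/2026'.

    Returns a datetime.date, or None if no candidate matches any format.
    """
    _, _, rest = header.partition("As of:")
    tokens = rest.split()
    # Candidate 1: a single token ('1/30/2026').
    # Candidate 2: the first three tokens joined ('Feb 20, 2026').
    candidates = [tokens[0]] if tokens else []
    if len(tokens) >= 3:
        candidates.append(" ".join(tokens[:3]))
    for candidate in candidates:
        for fmt in _DATE_FORMATS:
            try:
                return datetime.strptime(candidate, fmt).date()
            except ValueError:
                continue
    return None
```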
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sequences: extract (PSD) → extract_cot (CFTC) → extract_prices (KC=F) → extract_ice_all (ICE)
Stops and reports on first failure. META_PIPELINES dict makes it easy to
add more meta-pipelines as sources expand.
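A sketch of how a dict-driven meta-pipeline with stop-on-first-failure might look. The key `extract_all`, the `run_meta_pipeline` helper, and the boolean return convention of `run_pipeline` are assumptions; only the step order comes from this commit.

```python
# Hypothetical registry shape; the real META_PIPELINES may differ.
META_PIPELINES = {
    "extract_all": ["extract", "extract_cot", "extract_prices", "extract_ice_all"],
}


def run_meta_pipeline(name, run_pipeline):
    """Run each step in order and stop at the first failure.

    run_pipeline(step) is assumed to return True on success.
    Returns (ok, failed_step); failed_step is None when every step passed.
    """
    for step in META_PIPELINES[name]:
        if not run_pipeline(step):
            return False, step
    return True, None
```

Adding another meta-pipeline is then a one-line change to the dict.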
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- serving/ice_aging_stocks.sql: pass-through from foundation, parses age
bucket string to start/end days ints for correct sort order
- serving/ice_warehouse_stocks_by_port.sql: monthly by-port since 1996,
adds MoM change, MoM %, 12-month rolling average
- analytics.py: get_ice_aging_latest(), get_ice_aging_trend(),
get_ice_stocks_by_port_trend(), get_ice_stocks_by_port_latest()
- api/routes.py: GET /commodities/<code>/stocks/aging and
GET /commodities/<code>/stocks/by-port with auth + rate limiting
- dashboard/routes.py: add 3 new queries to asyncio.gather(), pass to template
- index.html: aging stacked bar chart (age buckets × port) with 4 metric
cards; by-port stacked area chart (30-year history) with 4 metric cards
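The by-port metrics (MoM change, MoM %, 12-month rolling average) amount to the following arithmetic, shown here in Python rather than SQL. This is an illustrative sketch with a hypothetical function name; it assumes the rolling average uses whatever months are available at the start of the series, as a SQL window over the preceding 11 rows would.

```python
def add_monthly_metrics(values, window=12):
    """values: monthly stock totals, oldest first. Returns one dict per month."""
    rows = []
    for i, value in enumerate(values):
        prev = values[i - 1] if i > 0 else None
        mom_change = value - prev if prev is not None else None
        # No percentage for the first month or when the previous value is 0.
        mom_pct = (mom_change / prev * 100.0) if prev else None
        recent = values[max(0, i - window + 1): i + 1]
        rows.append({
            "value": value,
            "mom_change": mom_change,
            "mom_pct": mom_pct,
            "rolling_avg": sum(recent) / len(recent),
        })
    return rows
```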
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace brittle ICE_STOCKS_URL env var with API-based URL discovery via
the private ICE Report Center JSON API (no auth required)
- Add rolling CSV → XLS fallback in extract_ice_stocks() using
find_latest_report() from ice_api.py
- Add ice_api.py: fetch_report_listings(), find_latest_report() with
pagination up to MAX_API_PAGES
- Add xls_parse.py: detect_file_format() (magic bytes), xls_to_rows()
using xlrd for OLE2/BIFF XLS files
- Add extract_ice_aging(): monthly certified stock aging report by
age bucket × port → ice_aging/ landing dir
- Add extract_ice_historical(): 30-year EOM by-port stocks from static
ICE URL → ice_stocks_by_port/ landing dir
- Add xlrd>=2.0.1 (parse XLS), xlwt>=1.3.0 (dev, test fixtures)
- Add SQLMesh raw + foundation models for both new datasets
- Add ice_aging_glob(), ice_stocks_by_port_glob() macros
- Add extract_ice_aging + extract_ice_historical pipeline entries
- Add 12 unit tests (format detection, XLS roundtrip, API mock, CSV output)
Seed files (data/landing/ice_aging/seed/ and ice_stocks_by_port/seed/)
must be created locally — data/ is gitignored.
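Magic-byte detection can be sketched as below. The OLE2 and ZIP signatures are standard file signatures; the return values and the fallback to "csv" are assumptions about how xls_parse.py classifies payloads, not its actual behavior.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2/BIFF)
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx files are ZIP containers


def detect_file_format(data):
    """Classify a downloaded payload by its leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    # Assumption: anything else is treated as plain-text CSV.
    return "csv"
```

Checking the bytes rather than the URL suffix guards against a server returning an XLS body under a .csv name, which is the situation the CSV → XLS fallback handles.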
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move scout MCP server out of tools/scout/ into its own repo at
/var/home/Deeman/Projects/scout. Update .mcp.json to use absolute path
so any project can reference it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add scout_click_coords for manual coordinate-based clicks (useful when
CSS selectors can't reach cross-origin iframes)
- Document in _dismiss_cookie_banner why Sourcepoint is not auto-dismissed:
HAR captures traffic regardless of banner visibility; coordinate clicks
are too brittle across screen sizes
- Add missing asyncio import to server.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>