Commit Graph

179 Commits

Author SHA1 Message Date
Deeman
6d716a83ae feat(secrets): rewrite secrets.py for SOPS, update cli.py
secrets.py: replace Pulumi ESC (esc CLI) with SOPS decrypt. Reads
.env.prod.sops via `sops --decrypt`, parses dotenv output. Same public
API: get_secret(), list_secrets(), test_connection().

cli.py: update secrets subcommand help text and test command messaging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:44:25 +01:00
Deeman
9d0e6843f4 feat(secrets): add SOPS+age secret management infrastructure
- .sops.yaml: creation rules matching .env.{dev,prod}.sops (dotenv format)
- .env.dev.sops: encrypted dev defaults (blank API keys, local paths)
- .env.prod.sops: encrypted prod template (placeholder values to fill in)
- Makefile: root Makefile with secrets-decrypt-dev/prod, secrets-edit-dev/prod, css-build/watch
- .gitignore: add age-key.txt

Dev workflow: make secrets-decrypt-dev → .env (repo root) → web app picks it up.
Server: deploy.sh will auto-decrypt .env.prod.sops on each deploy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:36:14 +01:00
Deeman
3ab0cd122f claude 2026-02-26 02:45:09 +01:00
Deeman
302ba07851 add untracked 2026-02-26 02:44:48 +01:00
Deeman
3629783bbf feat: add CMS/pSEO engine, feature flags, email log (template v0.17.0 backport) ... 2026-02-26 02:43:10 +01:00
Deeman
3ae8c7e98a merge: SQL fixes (cot_positioning SELECT *, fct_weather_daily src ref) 2026-02-26 01:32:19 +01:00
Deeman
690691ea36 fix(transform): expand SELECT * in cot_positioning, fix src ref in fct_weather_daily
- obt_cot_positioning.sql: replace final SELECT * with explicit column list
  so linter can resolve schema without foundation.fct_cot_positioning in DB
- fct_weather_daily.sql: fix HASH(location_id, src."date") → located."date"
  (cast_and_clean CTE references FROM located, not FROM src)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 01:32:16 +01:00
Deeman
8285daaa17 merge: Open-Meteo weather extractor (replaces OpenWeatherMap) 2026-02-26 01:01:29 +01:00
Deeman
9de3a3ba01 feat(extract): replace OpenWeatherMap with Open-Meteo weather extractor
Replaced the OWM extractor (8 locations, API key required, 14,600-call
backfill over 30+ days) with Open-Meteo (12 locations, no API key,
ERA5 reanalysis, full backfill in 12 API calls ~30 seconds).

- Rename extract/openweathermap → extract/openmeteo (git mv)
- Rewrite api.py: fetch_archive (ERA5, date-range) + fetch_recent (forecast,
  past_days=10 to cover ERA5 lag); 9 daily variables incl. et0 and VPD
- Rewrite execute.py: _split_and_write() unzips parallel arrays into per-day
  flat JSON; no cursor / rate limiting / call cap needed
- Update pipelines.py: --package openmeteo, timeout 120s (was 1200s)
- Update fct_weather_daily.sql: flat Open-Meteo field names (temperature_2m_*
  etc.), remove pressure_afternoon_hpa, add et0_mm + vpd_max_kpa + is_high_vpd
- Remove OPENWEATHERMAP_API_KEY from CLAUDE.md env vars table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 00:59:54 +01:00
Deeman
32c9d7ae07 merge: expand weather locations to 12 regions 2026-02-26 00:12:33 +01:00
Deeman
4817f7de2f feat(extract): add 4 weather locations (ES, PE, UG, CI)
Expands coverage from 8 to 12 coffee-growing regions:
- brazil_espirito_santo (Robusta/Conilon — largest BR Robusta state)
- peru_jaen (Arabica — fastest-growing origin, top-10 global producer)
- uganda_elgon (Robusta — 4th largest African producer)
- ivory_coast_daloa (Robusta — historically significant West African origin)

Now 8 Arabica + 4 Robusta regions = 12 calls/day (well within OWM free tier).
Backfill cost: ~21,900 additional calls over ~44 days at 500/run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 00:12:29 +01:00
Deeman
99055caaa2 merge: OpenWeatherMap daily weather extractor 2026-02-25 22:40:32 +01:00
Deeman
08e74665bb feat(extract): add OpenWeatherMap daily weather extractor
Adds extract/openweathermap package with daily weather extraction for 8
coffee-growing regions (Brazil, Vietnam, Colombia, Ethiopia, Honduras,
Guatemala, Indonesia). Feeds crop stress signal for commodity sentiment score.

Extractor:
- OWM One Call API 3.0 / Day Summary — one JSON.gz per (location, date)
- extract_weather: daily, fetches yesterday + today (16 calls max)
- extract_weather_backfill: fills 2020-01-01 to yesterday, capped at 500
  calls/run with resume cursor '{location_id}:{date}' for crash safety
- Full idempotency via file existence check; state tracking via extract_core

SQLMesh:
- seeds.weather_locations (8 regions with lat/lon/variety)
- foundation.fct_weather_daily: INCREMENTAL_BY_TIME_RANGE, grain
  (location_id, observation_date), dedup via hash key, crop stress flags:
  is_frost (<2°C), is_heat_stress (>35°C), is_drought (<1mm), in_growing_season

Landing path: LANDING_DIR/weather/{location_id}/{year}/{date}.json.gz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 22:40:27 +01:00
Deeman
817d9c16b7 ci: enable deploy stage with SSH-based blue/green deployment
Writes .env to web/, runs deploy.sh from web/. Pushes env vars
from GitLab CI/CD variables to the server on every master push.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 23:09:32 +01:00
Deeman
c3c8333407 refactor(transform): remove raw layer, read landing zone directly
- Delete 6 data raw models (coffee_prices, cot_disaggregated, ice_*,
  psd_data) — pure read_csv passthroughs with no added value
- Move 3 PSD seed models raw/ → seeds/, rename schema raw.* → seeds.*
- Update staging.psdalldata__commodity: read_csv(@psd_glob()) directly,
  join seeds.psd_* instead of raw.psd_*
- Update 5 foundation models: inline read_csv() with src CTE, removing
  raw.* dependency (fct_coffee_prices, fct_cot_positioning, fct_ice_*)
- Remove fixture-based SQLMesh test that depended on raw.cot_disaggregated
  (unit tests incompatible with inline read_csv; integration run covers this)
- Update readme.md: 3-layer architecture (staging/foundation → serving)

Landing files are immutable and content-addressed — the landing directory
is the audit trail. A raw SQL layer duplicated file bytes into DuckDB
with no added value.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 17:30:18 +01:00
Deeman
1814a76e74 legal: add imprint page, upgrade privacy policy to GDPR-proper
- Add /imprint route and template (§5 DDG compliant, Hendrik's details)
- Rewrite privacy.html: data controller, legal basis per GDPR Art. 6,
  sub-processors (Paddle/Resend/Umami/Hetzner), retention periods,
  GDPR rights with article references, BfDI supervisory authority link
- Add /imprint to sitemap.xml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 15:54:26 +01:00
Deeman
9a67617f6a infra: fix CRLF line endings in setup_server.sh 2026-02-22 15:24:22 +01:00
Deeman
7153be899c infra: rename app user to beanflows_service 2026-02-22 15:14:44 +01:00
Deeman
8d6d79345c infra: add setup_server.sh for one-time server provisioning
Creates the beanflows system user, /opt/beanflows directory, and an
ed25519 GitLab deploy key. Prints the public key to add as a read-only
deploy key on the repo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 15:12:21 +01:00
Deeman
930ebec259 fix: ADMIN_EMAIL → ADMIN_EMAILS, add default admin emails
Rename env var to plural (CSV list) in CI yml to match the actual
config key. Add hendrik@beanflow.coffee and simon@beanflows.coffee
as hardcoded defaults so they get admin access without needing the
env var set explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 14:59:52 +01:00
Deeman
80c1163a7f feat: extraction framework overhaul — extract_core shared package + SQLite state tracking
- Add extract/extract_core/ workspace package with three modules:
  - state.py: SQLite run tracking (open_state_db, start_run, end_run, get_last_cursor)
  - http.py: niquests session factory + etag normalization helpers
  - files.py: landing_path, content_hash, write_bytes_atomic (atomic gzip writes)
- State lives at {LANDING_DIR}/.state.sqlite — no extra env var needed
- SQLite chosen over DuckDB: state tracking is OLTP (row inserts/updates), not analytical
- Refactor all 4 extractors (psdonline, cftc_cot, coffee_prices, ice_stocks):
  - Replace inline boilerplate with extract_core helpers
  - Add start_run/end_run tracking to every extraction entry point
  - extract_cot_year returns int (bytes_written) instead of bool
- Update tests: assert result == 0 (not `is False`) for the return type change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 14:37:50 +01:00
Deeman
fc4121183c fix: replace stale analytics._conn checks with _db_path
dashboard/routes.py (4 places) and admin/routes.py still checked
analytics._conn is not None after _conn was removed in the two-file
refactor — causing AttributeError → 500 on every dashboard page.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 13:04:32 +01:00
Deeman
9ee7a3d9d3 fix: export_serving — Arrow-based copy, rename to analytics.duckdb
Two bugs fixed:

1. Cross-connection COPY: DuckDB doesn't support referencing another
   connection's tables as src.serving.table. Replace with Arrow as
   intermediate: src reads to Arrow, dst.register() + CREATE TABLE.

2. Catalog/schema name collision: naming the export file serving.duckdb
   made DuckDB assign catalog name "serving" — same as the schema we
   create inside it. Every serving.table query became ambiguous. Rename
   to analytics.duckdb (catalog "analytics", schema "serving" = no clash).

   SERVING_DUCKDB_PATH values updated: serving.duckdb → analytics.duckdb
   in supervisor, service, bootstrap, dev_run.sh, .env.example, docker-compose.

3. Temp file: use _export.duckdb (not serving.duckdb.tmp) to avoid
   the same catalog collision during the write phase.

Verified: 6 tables exported, serving.* queries work read-only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 12:54:39 +01:00
Deeman
ac8ab47448 feat: dev_run.sh — auto-run pipeline on first startup
On the first `./scripts/dev_run.sh` invocation (serving.duckdb absent),
automatically run extract → transform → export_serving from the repo root
so the dashboard is populated without any manual steps.

Subsequent runs skip the pipeline for a fast startup. Delete serving.duckdb
from the repo root to force a full pipeline re-run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:15:34 +01:00
Deeman
cb799ff019 fix: analytics fetch_analytics returns [] when DB not configured
The assert _db_path in fetch_analytics() would crash dashboard routes
locally when SERVING_DUCKDB_PATH is unset or serving.duckdb doesn't
exist yet. Change to graceful return [] so the app degrades cleanly.

Also add SERVING_DUCKDB_PATH=../serving.duckdb to local .env so the
web app will auto-connect once `materia pipeline run export_serving`
has been run for the first time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:10:36 +01:00
Deeman
b899bcbad4 feat: DuckDB two-file architecture — resolve SQLMesh/web-app lock contention
Split the single lakehouse.duckdb into two files to eliminate the exclusive
write-lock conflict between SQLMesh (pipeline) and the Quart web app (reader):

  lakehouse.duckdb  — SQLMesh exclusive (all pipeline layers)
  serving.duckdb    — web app reads (serving tables only, atomically swapped)

Changes:

web/src/beanflows/analytics.py
- Replace persistent global _conn with per-thread connections (threading.local)
- Add _get_conn(): opens read_only=True on first call per thread, reopens
  automatically on inode change (~1μs os.stat) to pick up atomic file swaps
- Switch env var from DUCKDB_PATH → SERVING_DUCKDB_PATH
- Add module docstring documenting architecture + DuckLake migration path

web/src/beanflows/app.py
- Startup check: use SERVING_DUCKDB_PATH
- Health check: use _db_path instead of _conn

src/materia/export_serving.py (new)
- Reads all serving.* tables from lakehouse.duckdb (read_only)
- Writes to serving_new.duckdb, then os.rename → serving.duckdb (atomic)
- ~50 lines; runs after each SQLMesh transform

src/materia/pipelines.py
- Add export_serving pipeline entry (uv run python -c ...)

infra/supervisor/supervisor.sh
- Add SERVING_DUCKDB_PATH env var comment
- Add export step: uv run materia pipeline run export_serving

infra/supervisor/materia-supervisor.service
- Add Environment=SERVING_DUCKDB_PATH=/data/materia/serving.duckdb

infra/bootstrap_supervisor.sh
- Add SERVING_DUCKDB_PATH to .env template

web/.env.example + web/docker-compose.yml
- Document both env vars; switch web service to SERVING_DUCKDB_PATH

web/src/beanflows/dashboard/templates/settings.html
- Minor settings page fix from prior session

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 11:06:55 +01:00
Deeman
ca7b2ab18b settings: remove Write scope, add billing portal error handling
- Remove 'Write' scope checkbox from API key creation form — BeanFlows
  is a read-only data platform, write keys are meaningless to users.
  Scope is now always 'read' via hidden input.
- Add try/except in billing.manage route so Paddle API failures (e.g.
  no live credentials in dev) show a user-facing flash error instead
  of a 500.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:38:51 +01:00
Deeman
c92e5a8e07 ice_stocks: add backfill extractor for historical daily stocks
The ICE API at /marketdata/api/reports/293/results stores all historical
daily XLS reports date-descending. Previously the extractor only fetched
the latest. New extract_ice_backfill entry point pages through the API
and downloads all matching 'Daily Warehouse Stocks' reports.

- ice_api.py: add find_all_reports() alongside find_latest_report()
- execute.py: add extract_ice_stocks_backfill(max_pages=3) — default
  covers ~6 months; max_pages=20 fetches ~3 years of history
- pyproject.toml: register extract_ice_backfill entry point

Ran backfill: 131 files, 2025-08-15 → 2026-02-20

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:35:57 +01:00
Deeman
090fcb4fdb dashboard: JTBD-driven restructure — Pulse, Supply, Positioning, Warehouse
Replace monolithic Overview (8 charts, 24 metric cards, no filters) with
a JTBD-driven 5-page dashboard optimised for the data-drop moment.

Navigation (sidebar + mobile nav):
- Pulse      /dashboard/         — full-picture overview, 10-second read
- Supply     /dashboard/supply   — USDA WASDE deep dive, range + metric filters
- Positioning /dashboard/positioning — KC=F price + CFTC COT, range filter
- Warehouse  /dashboard/warehouse — ICE certified stocks, range + view filters
- Origins    /dashboard/countries — unchanged (HTMX already live)
- Settings                       — unchanged

New templates:
- pulse.html: 4 metric cards + freshness bar + 2×2 sparkline grid
- supply.html + supply_canvas.html: HTMX partial with 5Y/10Y/Max and
  Production/Exports/Imports/Stocks filter pills; free plan gated at 5Y
- positioning.html + positioning_canvas.html: price chart + COT dual-axis;
  client-side MA toggles (no server round-trip)
- warehouse.html + warehouse_canvas.html: Daily Stocks / Aging / By Port
  view switcher; only active view's queries fire

routes.py:
- RANGE_MAP dict maps URL param → {days, weeks, months, years}
- _safe() helper absorbs asyncio.gather exceptions with defaults
- index() rewritten: 8 lightweight queries, renders pulse.html
- supply(), positioning(), warehouse() routes added; HX-Request detection
  returns canvas partial; full request returns page shell

input.css:
- All cc-* component classes moved from countries.html inline style to
  global stylesheet (cc-chart-card, cc-trow 3-col grid, cc-empty, etc.)
- filter-bar, filter-pills, filter-pill, canvas-loading, freshness-badge
- cc-chart-body canvas max-height 340px (prevents gigantic charts on 4K)

_feedback_widget.html:
- Mobile: collapses to circular icon button at bottom:72px to clear 5-item
  nav bar; "Feedback" label hidden on mobile

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-22 01:27:44 +01:00
Deeman
0d78a22023 changelog: bring up to date through Feb 2026
- [Unreleased]: ICE overhaul (aging + by-port + API discovery + XLS parsing),
  extract_all meta-pipeline, Origin Intelligence redesign + HTMX, axis labels
- [0.2.0]: CFTC COT, KC=F prices, ICE daily stocks, methodology page, supervisor
- [0.1.0]: initial dashboard, country comparison, REST API, plan tiers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 23:22:04 +01:00
Deeman
6d18a4a7c2 vision: update current state to reflect ICE overhaul + dashboard work shipped
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 23:00:03 +01:00
Deeman
fb8c6cdb3d overview: add missing axis labels to supply/demand, STU, and top-producers charts
- Supply & Demand chart: Y-axis → '1,000 60-kg bags'
- Stock-to-Use chart: Y-axis → 'Stock-to-Use (%)'
- Top Producers bar: X-axis → '1,000 60-kg bags'
- YoY table: Production column header → 'Production (1k bags)'

COT, price, ICE stocks, aging, and by-port charts already had labels.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 22:55:16 +01:00
Deeman
3f31e33d12 countries: show metric label on chart Y-axis and table value column
- Chart Y-axis title: "production  (1k 60-kg bags)" via Chart.js title
- Rankings table: column header row with "Country" / "production (1k bags)"
- Table section header changes to "Latest snapshot · <metric>"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 22:53:34 +01:00
Deeman
8af7d5e189 rename Countries nav item to Origins
Matches the 'Origin Intelligence' page heading.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 22:48:34 +01:00
Deeman
32e54f0381 countries: HATEOAS + HTMX — click origin to update chart instantly
Replace Apply button flow with immediate HTMX partial fetches:
- toggleCountry() does an optimistic UI update (row + badge) then
  calls htmx.ajax() targeting #cc-canvas with swap=innerHTML
- URL is pushed to history on every selection change (bookmarkable)
- HX-Request now returns countries_canvas.html fragment (chips +
  chart/empty + inline IIFE that re-syncs globals + re-inits Chart.js)
- Panel (dark) is never swapped; canvas fades during in-flight request
- PALETTE, buildRankings(), initChart() defined once on page load,
  called by both initial render and partial IIFE after each swap
- Apply button removed; Clear triggers fetchCanvas() with empty codes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 22:40:59 +01:00
Deeman
91a9fb83be redesign Countries page: commodity intelligence terminal aesthetic
Replace generic multi-select + plain card with a two-panel layout:
- Dark espresso selector panel (sticky, searchable, click-to-toggle)
  with country rows showing rank, name, production figure, checkbox
- Right canvas: metric segment tabs, selected-country chips (colored),
  Chart.js line chart with dark espresso tooltip, and a JS-built
  rankings table with proportional colored bars (latest year)
- Smooth fade-in animations, monospaced figures, copper accent palette

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 22:30:37 +01:00
Deeman
493ce64fde fix ice_stocks XLS date parsing: handle 'Feb 20, 2026' format
ICE changed the daily stocks XLS header from 'As of: 1/30/2026' to
'As of: Feb 20, 2026  1:35:39PM'. Expand _build_canonical_csv_from_xls
to try multiple strptime formats (%m/%d/%Y, %b %d, %Y, etc.) on both
single-token and three-token date candidates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 22:18:17 +01:00
Deeman
562e2d1847 Add extract_all meta-pipeline: runs all four data source extractors in sequence
Sequences: extract (PSD) → extract_cot (CFTC) → extract_prices (KC=F) → extract_ice_all (ICE)
Stops and reports on first failure. META_PIPELINES dict makes it easy to
add more meta-pipelines as sources expand.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 22:00:10 +01:00
Deeman
ff896685d2 Add extract_ice_all command to run all three ICE extractors in sequence
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 21:59:08 +01:00
Deeman
6ba1afd8c3 Merge worktree-ice-extraction-overhaul: ICE aging + by-port app integration
Serving models, API endpoints, and dashboard charts for both new ICE datasets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 21:52:39 +01:00
Deeman
ff956b0138 ICE aging + by-port: serving models, API endpoints, dashboard integration
- serving/ice_aging_stocks.sql: pass-through from foundation, parses age
  bucket string to start/end days ints for correct sort order
- serving/ice_warehouse_stocks_by_port.sql: monthly by-port since 1996,
  adds MoM change, MoM %, 12-month rolling average
- analytics.py: get_ice_aging_latest(), get_ice_aging_trend(),
  get_ice_stocks_by_port_trend(), get_ice_stocks_by_port_latest()
- api/routes.py: GET /commodities/<code>/stocks/aging and
  GET /commodities/<code>/stocks/by-port with auth + rate limiting
- dashboard/routes.py: add 3 new queries to asyncio.gather(), pass to template
- index.html: aging stacked bar chart (age buckets × port) with 4 metric
  cards; by-port stacked area chart (30-year history) with 4 metric cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 21:52:35 +01:00
Deeman
04f8df88fe Merge worktree-ice-extraction-overhaul: ICE extraction overhaul
API discovery + aging report + historical by-port backfill.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 21:13:21 +01:00
Deeman
ff7301d6a8 ICE extraction overhaul: API discovery + aging report + historical backfill
- Replace brittle ICE_STOCKS_URL env var with API-based URL discovery via
  the private ICE Report Center JSON API (no auth required)
- Add rolling CSV → XLS fallback in extract_ice_stocks() using
  find_latest_report() from ice_api.py
- Add ice_api.py: fetch_report_listings(), find_latest_report() with
  pagination up to MAX_API_PAGES
- Add xls_parse.py: detect_file_format() (magic bytes), xls_to_rows()
  using xlrd for OLE2/BIFF XLS files
- Add extract_ice_aging(): monthly certified stock aging report by
  age bucket × port → ice_aging/ landing dir
- Add extract_ice_historical(): 30-year EOM by-port stocks from static
  ICE URL → ice_stocks_by_port/ landing dir
- Add xlrd>=2.0.1 (parse XLS), xlwt>=1.3.0 (dev, test fixtures)
- Add SQLMesh raw + foundation models for both new datasets
- Add ice_aging_glob(), ice_stocks_by_port_glob() macros
- Add extract_ice_aging + extract_ice_historical pipeline entries
- Add 12 unit tests (format detection, XLS roundtrip, API mock, CSV output)

Seed files (data/landing/ice_aging/seed/ and ice_stocks_by_port/seed/)
must be created locally — data/ is gitignored.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 21:13:18 +01:00
Deeman
ff39d65dc6 scout: extract to standalone repo at Projects/scout
Move scout MCP server out of tools/scout/ into its own repo at
/var/home/Deeman/Projects/scout. Update .mcp.json to use absolute path
so any project can reference it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 17:58:03 +01:00
Deeman
079c189e0a scout: add scout_click_coords tool, document Sourcepoint limitation
- Add scout_click_coords for manual coordinate-based clicks (useful when
  CSS selectors can't reach cross-origin iframes)
- Document in _dismiss_cookie_banner why Sourcepoint is not auto-dismissed:
  HAR captures traffic regardless of banner visibility; coordinate clicks
  are too brittle across screen sizes
- Add missing asyncio import to server.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 17:34:08 +01:00
Deeman
d96f977c0f fix scout_js: reference browser._state not undefined _state
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 17:26:50 +01:00
Deeman
ab9dc62dd6 scout: add German DSGVO text patterns + Usercentrics shadow DOM support
- German accept texts: Alle akzeptieren, Akzeptieren, Zustimmen, Einverstanden, etc.
- Usercentrics (shadow DOM) support — very common with German publishers
  (Bild, Spiegel, Focus, etc.) — requires shadowRoot traversal, not addressable
  by normal CSS selectors
- Consentmanager selectors — another common German CMP
- Note: German sites tested (Spiegel, Zeit, finanzen.net, Bild) showed no banners
  because Pydoll reuses the existing Chrome user profile with stored consents.
  New-site behaviour will be handled by the added patterns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 17:23:33 +01:00
Deeman
ec7cfda605 scout: JS-based cookie dismiss + scout_js tool
- _dismiss_cookie_banner: switch to execute_script for CSS selector clicks
  (OneTrust on ICE uses pointer-events:none overlay — mouse clicks don't reach it,
  but JS .click() bypasses this). Falls back to text-based JS search.
- Selectors cover: OneTrust, Cookiebot, CookieYes, generic [id/class*=accept/consent]
- Text fallback covers: IAB TCF "Allow All" pattern (Reuters, etc.)
- Add scout_js tool: run arbitrary JS on current page — useful for shadow DOM,
  z-index overlays, and any element that resists normal CSS/text selectors
- Add _click_via_js helper for targeted JS injection clicks

Tested patterns:
  ICE (theice.com) — OneTrust #onetrust-accept-btn-handler — requires JS click
  CFTC (cftc.gov) — no banner
  Reuters — IAB TCF "Allow All" — text click works

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 17:19:34 +01:00
Deeman
3d3f375e01 Merge worktree-cot-integration: Phase 1 + scout MCP server
- Phase 1A-C: KC=F price extraction, SQLMesh models, dashboard charts, API endpoints
- ICE warehouse stocks: extraction package, SQLMesh models, dashboard + API
- Methodology page (/methodology) with all data sources documented
- Supervisor pipeline automation with webhook alerting
- Scout MCP server (tools/scout/) for browser recon via Pydoll
- msgspec added as workspace dependency for typed boundary structs
- vision.md updated to reflect Phase 1 completion (Feb 2026)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 15:57:49 +01:00
Deeman
b167a0a9f4 Add scout MCP server for browser recon + msgspec workspace dep
- tools/scout/: browser automation MCP server using Pydoll (CDP, no WebDriver)
  - scout_visit, scout_elements (text-first), scout_click, scout_fill, scout_select
  - scout_scroll, scout_text, scout_screenshot (opt-in)
  - scout_har_start / scout_har_stop (asyncio task holds recording context open)
  - scout_analyze: HAR parsing with HarEntry/HarSummary msgspec structs
  - Standalone project (not workspace member — websockets conflict with prefect)
  - Runs via: uv run --directory tools/scout scout-server

- .mcp.json: registers scout as Claude Code MCP server (project scope)

- msgspec>=0.19 added to root project deps (workspace-wide struct/validation)

- coding_philosophy.md: document msgspec as approved dep, usage rules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 15:44:02 +01:00