Commit Graph

12 Commits

Author SHA1 Message Date
Deeman
301f3b76c3 feat: add scripts/prod_query.py — SSH query tool for prod DuckDB
All checks were successful
CI / test (push) Successful in 56s
CI / tag (push) Successful in 3s
Runs read-only SQL against analytics.duckdb (default) or lakehouse.duckdb
on the prod server over SSH. SQL is base64-encoded to avoid shell escaping.
Supports TSV (default) and JSON output. Blocks mutation keywords.

For lakehouse, works around the DuckDB catalog naming issue (SQLMesh views
reference "local" but the file creates catalog "lakehouse") by attaching
the file as the "local" catalog.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 16:15:38 +01:00
Deeman
90754b8d9f chore: move ci.py to ~/.claude/scripts (uv inline script, no project dep)
All checks were successful
CI / test (push) Successful in 53s
CI / tag (push) Successful in 2s
Script now lives globally as a uv inline-dependency script.
Removes per-project scripts/ci.py and the msgspec dev dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 15:51:36 +01:00
Deeman
277c92e507 chore: add scripts/ci.py for Gitea CI pipeline status
Copies ci.py from beanflows (same script, shared across projects).
Adds msgspec dev dependency required by the script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 15:38:42 +01:00
Deeman
c3f15535b8 fix(pipeline): handle DuckDB catalog naming in diagnostic script
The lakehouse.duckdb file uses catalog "lakehouse" not "local", causing
SQLMesh logical views to break. Script now auto-detects the catalog via
USE and falls back to physical tables when views fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 17:06:44 +01:00
Deeman
6b7fa45bce feat(admin): add pipeline diagnostic script + extraction card UX improvements
- Add scripts/check_pipeline.py: read-only diagnostic for pricing pipeline
  row counts, date range analysis, HAVING filter impact, join coverage
- Add description field to all 12 workflows in workflows.toml
- Parse and display descriptions on extraction status cards
- Show spinner + "Running" state with blue-tinted card border
- Display start time with "running..." text for active extractions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 15:40:12 +01:00
Deeman
60fa2bc720 test(billing): add Stripe E2E test scripts for sandbox validation
- test_stripe_sandbox.py: API-only validation of all 17 products (67 tests)
- stripe_e2e_setup.py: webhook endpoint registration via ngrok
- stripe_e2e_test.py: live webhook tests with real DB verification (67 tests)
- stripe_e2e_checkout_test.py: checkout webhook tests for credit packs,
  sticky boosts, and business plan PDF purchases (40 tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 10:50:26 +01:00
Deeman
97c5846d51 feat(extract): GISCO extractor + wire all unscheduled extractors
- New gisco.py: proper extractor module replacing scripts/download_gisco_nuts.py.
  Writes uncompressed .geojson (ST_Read can't handle .gz). Fixed partition path
  gisco/2024/01/nuts2_boundaries.geojson; cursor tracking skips re-download monthly.
- all.py: import + register gisco in EXTRACTORS (9 independent, 1 dep)
- pyproject.toml: add extract-gisco entry point
- workflows.toml: add census_usa, census_usa_income, eurostat_city_labels,
  ons_uk, gisco — all monthly, no dependencies
- Delete scripts/download_gisco_nuts.py (superseded)

Unblocks: stg_nuts2_boundaries, stg_regional_income, stg_income_usa,
and 4 downstream models (dim_locations, pseo_city_costs_de,
location_opportunity_profile, pseo_country_overview).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 15:49:39 +01:00
Deeman
c3531bd75d feat(data): Phase 2b complete — EU NUTS-2 spatial join + US state income
- stg_regional_income: expanded NUTS-1+2 (LENGTH IN 3,4), nuts_code rename, nuts_level
- stg_nuts2_boundaries: new — ST_Read GISCO GeoJSON, bbox columns for spatial pre-filter
- stg_income_usa: new — Census ACS state-level income staging model
- dim_locations: spatial join replaces admin1_to_nuts1 VALUES CTE; us_income CTE with
  PPS normalisation (income/80610×30000); income cascade: NUTS-2→NUTS-1→US state→country
- init_landing_seeds: compress=False for ST_Read files; gisco GeoJSON + census income seeds
- CHANGELOG + PROJECT.md updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 11:03:16 +01:00
Deeman
409dc4bfac feat(data): Phase 2b step 1 — expand stg_regional_income + Census income extractor
- stg_regional_income.sql: accept NUTS-1 (3-char) + NUTS-2 (4-char) codes;
  rename nuts1_code → nuts_code; add nuts_level column; NUTS-2 rows were
  already in the landing zone but discarded by LENGTH(geo_code) = 3
- scripts/download_gisco_nuts.py: one-time download of GISCO NUTS-2 boundary
  GeoJSON (NUTS_RG_20M_2021_4326_LEVL_2.geojson, ~5MB) to landing zone;
  uncompressed because ST_Read cannot read .gz files
- census_usa_income.py: new extractor for ACS B19013_001E state-level median
  household income; follows census_usa.py pattern; 51 states + DC
- all.py + pyproject.toml: register census_usa_income extractor

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 10:58:12 +01:00
Deeman
5ade38eeaf feat(data): Phase 2a — NUTS-1 regional income for opportunity score
- eurostat.py: add nama_10r_2hhinc dataset config; append filter params to
  request URL so server pre-filters the large cube before download
- stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz,
  filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB
- dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer)
  + regional_income CTE; final SELECT uses COALESCE(regional, country) income
- init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz

Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of
both inheriting the same national average (~25.5K PPS).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 10:26:15 +01:00
Deeman
b33dd51d76 feat: standardise recheck availability to JSONL output
- extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz
  (one venue per line with date/captured_at_utc/recheck_hour injected);
  uses compress_jsonl_atomic; removes write_gzip_atomic import
- stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited
  read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL;
  old recheck_blob CTE kept for transition
- init_landing_seeds.py: add JSONL recheck seed alongside blob seed
- Docs: README landing structure + data sources table updated; CHANGELOG
  availability bullets updated; data-sources-inventory paths corrected

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 14:52:47 +01:00
Deeman
ec7f115f16 feat: add init_landing_seeds.py for empty-landing bootstrap
Creates minimal .jsonl.gz and .json.gz seed files so all SQLMesh staging
models can compile and run before real extraction data arrives.

Each seed has a single null record filtered by the staging model's WHERE
clause (tenant_id IS NOT NULL, geoname_id IS NOT NULL, type IS NOT NULL, etc).

Covers both formats (JSONL + blob) for the UNION ALL transition CTEs:
  playtomic/1970/01/: tenants.{jsonl,json}.gz, availability seeds (morning + recheck)
  geonames/1970/01/: cities_global.{jsonl,json}.gz
  overpass_tennis/1970/01/: courts.{jsonl,json}.gz
  overpass/1970/01/: courts.json.gz (padel, unchanged format)
  eurostat/1970/01/: urb_cpop1.json.gz, ilc_di03.json.gz
  eurostat_city_labels/1970/01/: cities_codelist.json.gz
  ons_uk/1970/01/: lad_population.json.gz
  census_usa/1970/01/: acs5_places.json.gz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 12:24:48 +01:00