padelnomics

Author	SHA1	Message	Date
Deeman	301f3b76c3	feat: add scripts/prod_query.py — SSH query tool for prod DuckDB (CI: all checks successful; test (push) 56s, tag (push) 3s) Runs read-only SQL against analytics.duckdb (default) or lakehouse.duckdb on the prod server over SSH. SQL is base64-encoded to avoid shell escaping. Supports TSV (default) and JSON output. Blocks mutation keywords. For lakehouse, works around the DuckDB catalog naming issue (SQLMesh views reference "local" but the file creates catalog "lakehouse") by attaching the file as the "local" catalog. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 16:15:38 +01:00
Deeman	90754b8d9f	chore: move ci.py to ~/.claude/scripts (uv inline script, no project dep) (CI: all checks successful; test (push) 53s, tag (push) 2s) Script now lives globally as a uv inline-dependency script. Removes per-project scripts/ci.py and the msgspec dev dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 15:51:36 +01:00
Deeman	277c92e507	chore: add scripts/ci.py for Gitea CI pipeline status Copies ci.py from beanflows (same script, shared across projects). Adds msgspec dev dependency required by the script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-06 15:38:42 +01:00
Deeman	c3f15535b8	fix(pipeline): handle DuckDB catalog naming in diagnostic script The lakehouse.duckdb file uses catalog "lakehouse" not "local", causing SQLMesh logical views to break. Script now auto-detects the catalog via USE and falls back to physical tables when views fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 17:06:44 +01:00
Deeman	6b7fa45bce	feat(admin): add pipeline diagnostic script + extraction card UX improvements - Add scripts/check_pipeline.py: read-only diagnostic for pricing pipeline row counts, date range analysis, HAVING filter impact, join coverage - Add description field to all 12 workflows in workflows.toml - Parse and display descriptions on extraction status cards - Show spinner + "Running" state with blue-tinted card border - Display start time with "running..." text for active extractions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 15:40:12 +01:00
Deeman	60fa2bc720	test(billing): add Stripe E2E test scripts for sandbox validation - test_stripe_sandbox.py: API-only validation of all 17 products (67 tests) - stripe_e2e_setup.py: webhook endpoint registration via ngrok - stripe_e2e_test.py: live webhook tests with real DB verification (67 tests) - stripe_e2e_checkout_test.py: checkout webhook tests for credit packs, sticky boosts, and business plan PDF purchases (40 tests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 10:50:26 +01:00
Deeman	97c5846d51	feat(extract): GISCO extractor + wire all unscheduled extractors - New gisco.py: proper extractor module replacing scripts/download_gisco_nuts.py. Writes uncompressed .geojson (ST_Read can't handle .gz). Fixed partition path gisco/2024/01/nuts2_boundaries.geojson; cursor tracking skips re-download monthly. - all.py: import + register gisco in EXTRACTORS (9 independent, 1 dep) - pyproject.toml: add extract-gisco entry point - workflows.toml: add census_usa, census_usa_income, eurostat_city_labels, ons_uk, gisco — all monthly, no dependencies - Delete scripts/download_gisco_nuts.py (superseded) Unblocks: stg_nuts2_boundaries, stg_regional_income, stg_income_usa, and 4 downstream models (dim_locations, pseo_city_costs_de, location_opportunity_profile, pseo_country_overview). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-01 15:49:39 +01:00
Deeman	c3531bd75d	feat(data): Phase 2b complete — EU NUTS-2 spatial join + US state income - stg_regional_income: expanded NUTS-1+2 (LENGTH IN 3,4), nuts_code rename, nuts_level - stg_nuts2_boundaries: new — ST_Read GISCO GeoJSON, bbox columns for spatial pre-filter - stg_income_usa: new — Census ACS state-level income staging model - dim_locations: spatial join replaces admin1_to_nuts1 VALUES CTE; us_income CTE with PPS normalisation (income/80610×30000); income cascade: NUTS-2→NUTS-1→US state→country - init_landing_seeds: compress=False for ST_Read files; gisco GeoJSON + census income seeds - CHANGELOG + PROJECT.md updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 11:03:16 +01:00
Deeman	409dc4bfac	feat(data): Phase 2b step 1 — expand stg_regional_income + Census income extractor - stg_regional_income.sql: accept NUTS-1 (3-char) + NUTS-2 (4-char) codes; rename nuts1_code → nuts_code; add nuts_level column; NUTS-2 rows were already in the landing zone but discarded by LENGTH(geo_code) = 3 - scripts/download_gisco_nuts.py: one-time download of GISCO NUTS-2 boundary GeoJSON (NUTS_RG_20M_2021_4326_LEVL_2.geojson, ~5MB) to landing zone; uncompressed because ST_Read cannot read .gz files - census_usa_income.py: new extractor for ACS B19013_001E state-level median household income; follows census_usa.py pattern; 50 states + DC - all.py + pyproject.toml: register census_usa_income extractor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 10:58:12 +01:00
Deeman	5ade38eeaf	feat(data): Phase 2a — NUTS-1 regional income for opportunity score - eurostat.py: add nama_10r_2hhinc dataset config; append filter params to request URL so server pre-filters the large cube before download - stg_regional_income.sql: new staging model — reads nama_10r_2hhinc.json.gz, filters to NUTS-1 codes (3-char), normalises EL→GR / UK→GB - dim_locations.sql: add admin1_to_nuts1 VALUES CTE (16 German Bundesländer) + regional_income CTE; final SELECT uses COALESCE(regional, country) income - init_landing_seeds.py: add empty seed for nama_10r_2hhinc.json.gz Munich/Bayern now scores ~29K PPS vs Chemnitz/Sachsen ~19K PPS instead of both inheriting the same national average (~25.5K PPS). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-27 10:26:15 +01:00
Deeman	b33dd51d76	feat: standardise recheck availability to JSONL output - extract_recheck() now writes availability_{date}_recheck_{HH}.jsonl.gz (one venue per line with date/captured_at_utc/recheck_hour injected); uses compress_jsonl_atomic; removes write_gzip_atomic import - stg_playtomic_availability: add recheck_jsonl CTE (newline_delimited read_json on *.jsonl.gz recheck files); include in all_venues UNION ALL; old recheck_blob CTE kept for transition - init_landing_seeds.py: add JSONL recheck seed alongside blob seed - Docs: README landing structure + data sources table updated; CHANGELOG availability bullets updated; data-sources-inventory paths corrected Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 14:52:47 +01:00
Deeman	ec7f115f16	feat: add init_landing_seeds.py for empty-landing bootstrap Creates minimal .jsonl.gz and .json.gz seed files so all SQLMesh staging models can compile and run before real extraction data arrives. Each seed holds a single null record that the staging model's WHERE clause filters out (tenant_id IS NOT NULL, geoname_id IS NOT NULL, type IS NOT NULL, etc.). Covers both formats (JSONL + blob) for the UNION ALL transition CTEs: playtomic/1970/01/: tenants.{jsonl,json}.gz, availability seeds (morning + recheck); geonames/1970/01/: cities_global.{jsonl,json}.gz; overpass_tennis/1970/01/: courts.{jsonl,json}.gz; overpass/1970/01/: courts.json.gz (padel, unchanged format); eurostat/1970/01/: urb_cpop1.json.gz, ilc_di03.json.gz; eurostat_city_labels/1970/01/: cities_codelist.json.gz; ons_uk/1970/01/: lad_population.json.gz; census_usa/1970/01/: acs5_places.json.gz Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-25 12:24:48 +01:00

12 Commits
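Commit 301f3b76c3 describes three mechanisms in scripts/prod_query.py: base64-encoding SQL for the SSH hop, blocking mutation keywords, and attaching lakehouse.duckdb as the "local" catalog. The sketch below shows one way those pieces could fit together. It is not the script's actual code. The function names, the keyword list, the remote `duckdb` CLI invocation with its `-readonly` flag, and the use of `base64 -d` on the server are all assumptions.

```python
import base64
import re
import shlex
from typing import List

# Hypothetical blocklist: the commit says mutation keywords are blocked
# but does not list them.
MUTATION_KEYWORDS = (
    "INSERT", "UPDATE", "DELETE", "MERGE", "DROP", "CREATE", "ALTER",
    "TRUNCATE", "COPY", "ATTACH", "DETACH", "INSTALL", "LOAD",
)
# Whole-word, case-insensitive match, so a column like created_at does not trip CREATE.
_MUTATION_RE = re.compile(r"\b(?:" + "|".join(MUTATION_KEYWORDS) + r")\b", re.IGNORECASE)


def check_read_only(sql: str) -> None:
    """Raise ValueError if the SQL contains a blocked keyword."""
    match = _MUTATION_RE.search(sql)
    if match:
        raise ValueError(f"blocked keyword: {match.group(0).upper()}")


def build_remote_command(sql: str, host: str, db_path: str,
                         attach_as_local: bool = False) -> List[str]:
    """Build an argv list (for subprocess.run) that runs `sql` on `host` over SSH."""
    # Validate the caller's SQL before adding our own preamble, which itself uses ATTACH.
    check_read_only(sql)
    if attach_as_local:
        # Assumed lakehouse workaround: attach the file under the catalog name the
        # SQLMesh views reference, then make it the default catalog.
        escaped = db_path.replace("'", "''")
        sql = f"ATTACH '{escaped}' AS \"local\" (READ_ONLY); USE \"local\"; {sql}"
        duckdb_cmd = "duckdb"  # in-memory session; the attached file is read-only
    else:
        duckdb_cmd = f"duckdb -readonly {shlex.quote(db_path)}"
    # Base64 output only contains [A-Za-z0-9+/=], so it passes through the remote
    # shell unquoted no matter what quotes or semicolons the SQL contains.
    encoded = base64.b64encode(sql.encode("utf-8")).decode("ascii")
    return ["ssh", host, f"echo {encoded} | base64 -d | {duckdb_cmd}"]
```

The returned list would typically be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. Whole-word matching keeps false positives low, but it would still reject a column literally named `load`; a stricter implementation might strip string literals and comments before scanning.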
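Commit c3f15535b8 makes the diagnostic script fall back to physical tables when SQLMesh logical views fail. The sketch below covers only that fallback step, not the catalog auto-detection via USE. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere. The function name and the view and table names are hypothetical. Against DuckDB, the caught exception would typically be narrowed to its catalog-error class rather than left at `Exception`.

```python
import sqlite3
from typing import Any, List, Tuple, Type


def query_with_fallback(conn: Any, view_sql: str, physical_sql: str,
                        errors: Tuple[Type[BaseException], ...] = (Exception,),
                        ) -> Tuple[List[tuple], str]:
    """Run view_sql; if it raises one of `errors`, run physical_sql instead.

    Returns the rows plus a label naming the source that answered, so a
    diagnostic report can flag that the logical view was unusable.
    """
    try:
        return conn.execute(view_sql).fetchall(), "view"
    except errors:
        return conn.execute(physical_sql).fetchall(), "physical"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Only the physical table exists. The missing view stands in for a view
    # whose catalog reference does not resolve.
    conn.execute("CREATE TABLE physical_prices (venue_id INTEGER, price REAL)")
    conn.execute("INSERT INTO physical_prices VALUES (1, 20.0), (2, 24.0)")
    rows, source = query_with_fallback(
        conn,
        "SELECT COUNT(*) FROM prices_view",
        "SELECT COUNT(*) FROM physical_prices",
        errors=(sqlite3.OperationalError,),
    )
    print(source, rows)
```

Returning the source label, rather than silently switching, keeps the report honest about which layer the row counts came from.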
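Commit 97c5846d51 says the GISCO extractor writes to a fixed partition path and uses cursor tracking to skip re-downloading within a month. The commit does not describe the cursor format. The sketch below therefore assumes a `YYYY-MM` string recorded after each successful download. `month_cursor` and `should_download` are hypothetical names. The target stays plain .geojson because, per the commit, ST_Read cannot read .gz files.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Fixed partition path named in the commit, relative to an assumed landing root.
GISCO_RELATIVE_PATH = Path("gisco/2024/01/nuts2_boundaries.geojson")


def month_cursor(today: date) -> str:
    """Cursor value for the month containing `today`, e.g. '2026-03'."""
    return f"{today.year:04d}-{today.month:02d}"


def should_download(last_cursor: Optional[str], today: date, target: Path) -> bool:
    """Download if the GeoJSON is missing or the stored cursor is not this month's."""
    if not target.exists():
        return True
    return last_cursor != month_cursor(today)
```

A caller would check `should_download(cursor, date.today(), landing_root / GISCO_RELATIVE_PATH)` and store `month_cursor(date.today())` only after the file has been written successfully.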
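Commit c3531bd75d moves dim_locations to an income cascade (NUTS-2, then NUTS-1, then US state, then country) and rescales US incomes with income/80610×30000. The same logic is restated below in Python only to make the arithmetic and fallback order concrete. The model itself is SQL, and the function names are hypothetical. The commit does not say what 80610 and 30000 represent, so the constants are applied exactly as written.

```python
from typing import Optional

# Constants taken verbatim from the commit message (income/80610×30000).
US_INCOME_DIVISOR = 80610
PPS_SCALE = 30000


def us_income_to_pps(median_income_usd: float) -> float:
    """Rescale a US state median household income onto the PPS scale."""
    return median_income_usd / US_INCOME_DIVISOR * PPS_SCALE


def resolve_income(nuts2: Optional[float], nuts1: Optional[float],
                   us_state_usd: Optional[float], country: Optional[float]) -> Optional[float]:
    """First non-null of NUTS-2, NUTS-1, US state (rescaled), country, like SQL COALESCE."""
    us_state = us_income_to_pps(us_state_usd) if us_state_usd is not None else None
    for value in (nuts2, nuts1, us_state, country):
        if value is not None:
            return value
    return None
```

For example, a state median of 80610 maps to 30000 and half that income maps to 15000. Because the cascade tests `is not None`, a genuine zero is kept rather than skipped, matching COALESCE semantics.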
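Commits 5ade38eeaf and 409dc4bfac both change how stg_regional_income selects and labels geo codes. A NUTS code is a two-letter country prefix plus one character per level. The filter LENGTH IN (3, 4) therefore keeps NUTS-1 and NUTS-2 rows, and the length alone gives nuts_level. The Python sketch below mirrors that rule together with the EL→GR / UK→GB normalisation. It assumes the normalisation rewrites the code's country prefix, which the commits do not spell out. All function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Eurostat writes Greece as EL and the United Kingdom as UK; ISO 3166-1 alpha-2 uses GR and GB.
EUROSTAT_TO_ISO2 = {"EL": "GR", "UK": "GB"}


def nuts_level(geo_code: str) -> Optional[int]:
    """3 characters -> NUTS-1, 4 characters -> NUTS-2, anything else -> None."""
    return {3: 1, 4: 2}.get(len(geo_code))


def normalise_country_prefix(geo_code: str) -> str:
    """Swap a Eurostat country prefix for its ISO code, keeping the regional suffix."""
    prefix = geo_code[:2]
    return EUROSTAT_TO_ISO2.get(prefix, prefix) + geo_code[2:]


def stage_regional_codes(codes: Iterable[str]) -> List[Tuple[str, int]]:
    """Mimic WHERE LENGTH(geo_code) IN (3, 4), tagging each kept code with nuts_level."""
    staged = []
    for code in codes:
        level = nuts_level(code)
        if level is not None:
            staged.append((normalise_country_prefix(code), level))
    return staged
```

Under the old filter, LENGTH(geo_code) = 3, a code such as DE21 (Oberbayern) was dropped even though it was already in the landing zone. With the widened filter it is kept as a NUTS-2 row alongside DE2 (Bayern).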
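Commit b33dd51d76 switches recheck output to availability_{date}_recheck_{HH}.jsonl.gz. Each file holds one venue per line, with date, captured_at_utc and recheck_hour injected, and is written via compress_jsonl_atomic. That helper's signature is not shown, so the function below is a hypothetical stand-in. It illustrates the usual atomic pattern: write to a temporary file in the target directory, then rename it into place with `os.replace` so readers never see a half-written file. The venue record shape is an assumption.

```python
import gzip
import json
import os
import tempfile
from pathlib import Path
from typing import Iterable, Mapping


def write_recheck_jsonl(venues: Iterable[Mapping], out_dir: Path, date: str,
                        captured_at_utc: str, recheck_hour: int) -> Path:
    """Write availability_{date}_recheck_{HH}.jsonl.gz atomically and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    final_path = out_dir / f"availability_{date}_recheck_{recheck_hour:02d}.jsonl.gz"
    # Temp file in the same directory, so os.replace() is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            for venue in venues:
                # One venue per line, with the capture context injected into each record.
                record = {**venue, "date": date, "captured_at_utc": captured_at_utc,
                          "recheck_hour": recheck_hour}
                gz.write((json.dumps(record) + "\n").encode("utf-8"))
        os.replace(tmp_name, final_path)
    except BaseException:
        os.unlink(tmp_name)  # never leave a partial temp file behind
        raise
    return final_path
```

Newline-delimited output is what lets the staging model read these files with newline-delimited read_json, one row per venue, instead of unpacking a single blob.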
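Commit ec7f115f16 bootstraps an empty landing zone with gzip seeds. Each seed contains a single record whose key column is null, so the staging model's IS NOT NULL filter removes it. A seed of that kind needs only the standard library. In the minimal sketch below, the helper name `write_seed` and the example record shapes (including any blob layout) are illustrative assumptions, not copied from init_landing_seeds.py.

```python
import gzip
import json
from pathlib import Path
from typing import Any


def write_seed(path: Path, payload: Any) -> None:
    """Write a gzip seed file.

    *.jsonl.gz: `payload` is a list of records, written one JSON object per line.
    *.json.gz:  `payload` is a single JSON document (the blob format).
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        if path.name.endswith(".jsonl.gz"):
            for record in payload:
                f.write(json.dumps(record) + "\n")
        else:
            json.dump(payload, f)
```

For instance, `write_seed(landing_root / "playtomic/1970/01/tenants.jsonl.gz", [{"tenant_id": None}])` would produce a one-line file containing `{"tenant_id": null}`, which a `WHERE tenant_id IS NOT NULL` clause filters out.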