diff --git a/CHANGELOG.md b/CHANGELOG.md
index c2c158e..8184e00 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 
 ### Added
+- **Dual market score system** — split the single market score into two branded scores:
+  - **padelnomics Marktreife-Score™** (market maturity): existing score, refined — only for cities
+    with ≥1 padel venue. Adds a ×0.85 saturation discount when `venues_per_100k > 8`.
+  - **padelnomics Marktpotenzial-Score™** (investment opportunity): new score covering ALL
+    GeoNames locations globally (pop ≥1K), including zero-court locations. Rewards supply gaps,
+    underserved catchment areas, and racket-sport culture via an inverted venue-density signal.
+- **Tennis court Overpass extractor** — `extract-overpass-tennis` downloads all OSM
+  `sport=tennis` nodes/ways/relations globally (~150K+ features). Lands at
+  `overpass_tennis/{year}/{month}/courts.json.gz`. Staged in `stg_tennis_courts`.
+- **`foundation.dim_locations`** — new conformed dimension seeded from GeoNames (all locations
+  ≥1K pop), not from padel venues. Grain `(country_code, geoname_id)`. Enriched with:
+  - `nearest_padel_court_km` via `ST_Distance_Sphere` (DuckDB spatial extension)
+  - `padel_venue_count` / `padel_venues_per_100k` (venues within 5km)
+  - `tennis_courts_within_25km` (courts within 25km)
+- **GeoNames expanded** — extractor switched from `cities15000` (50K+ filter, ~24K rows) to
+  `cities1000` (~140K locations, pop ≥1K). Added `lat`, `lon`, `admin1_code`, `admin2_code`
+  to output. Expanded feature codes to include `PPLA3/4/5` (Gemeinden/cantons).
+- **DuckDB spatial extension** — `extensions: [spatial]` added to `config.yaml`. Enables
+  `ST_Distance_Sphere` for great-circle distance and future map features (bounding box
+  queries, geometry columns).
+- **SOPS secrets** — `GEONAMES_USERNAME=padelnomics` and `CENSUS_API_KEY` added to both
+  `.env.dev.sops` and `.env.prod.sops`.
+- **Methodology page updated** — `/en/market-score` now documents both scores with:
+  Two Scores intro section, component cards for each score (4 Marktreife + 5 Marktpotenzial),
+  score band interpretations, expanded FAQ (7 entries). Section headings use the padelnomics
+  wordmark span (Bricolage Grotesque). Bilingual EN + DE (native-quality German, no calques).
 - **Market Score methodology page** — standalone page at `/{lang}/market-score` explaining
   the padelnomics Market Score (Zillow Zestimate-style). Reveals four input categories
   (demographics, economic strength, demand evidence, data
diff --git a/PROJECT.md b/PROJECT.md
index 957e102..2bb85a1 100644
--- a/PROJECT.md
+++ b/PROJECT.md
@@ -135,7 +135,7 @@
 
 ## In Progress 🔄
 
-_Move here when you start working on it._
+- [ ] **Dual market score system** — Marktreife-Score + Marktpotenzial-Score + expanded data pipeline (merging to master)
 
 ---
 
@@ -155,6 +155,13 @@ _Move here when you start working on it._
 | Submit sitemap to Google Search Console | Set up Google Search Console + Bing Webmaster Tools (SEO hub ready — just add env vars) |
 | Verify Litestream R2 backup running on prod | |
 
+### Gemeinde-level pSEO (follow-up from dual score work)
+
+| 🛠 Tech |
+|--------|
+| Gemeinde-level pSEO article template — consumes `location_opportunity_profile` data, targets "Padel in [Ort]" + "Padel bauen in [Ort]" queries (zero SERP competition confirmed) |
+| "Top 50 underserved locations" ranking page — high-value SEO content, fully programmatic from `location_opportunity_profile` ORDER BY opportunity_score DESC |
+
 ### Week 1–2 — First Revenue
 
 | 🛠 Tech | 📣 Business |
@@ -196,6 +203,9 @@ _Move here when you start working on it._
 - [ ] Padel Hall Accelerator (€999 — report + call + supplier intros)
 
 ### Data & Intelligence
+- [ ] Sports centre Overpass extract (`leisure=sports_centre`) — additional market signal for `dim_locations`
+- [ ] City-level income enrichment (Eurostat NUTS-3 regional income — replaces country-level PPS proxy, higher granularity)
+- [ ] Interactive opportunity map / explorer in web app (map UI over `location_opportunity_profile` — bounding-box queries via DuckDB spatial)
 - [ ] Multi-source data aggregation (add booking platforms beyond Playtomic)
 - [ ] Google Maps signals (reviews, ratings)
 - [ ] Weather + demographic overlays
@@ -246,3 +256,4 @@ _Move here when you start working on it._
 | 2026-02-22 | Credit system over pay-per-lead blast | Suppliers self-select → higher quality perception; scales without manual intervention |
 | 2026-02-22 | No soft email gate on planner | Planner already captures emails at natural points (scenario save → login, quote wizard step 9). Gate would add friction without meaningful list value. Revisit if data shows a gap. |
 | 2026-02-22 | Wipe test suppliers before launch | 5 `example.com` entries from seed_dev_data.py — empty directory with "Be the first" CTA is better than obviously fake data |
+| 2026-02-24 | Split market score into two branded scores | Marktreife-Score (existing market maturity, cities with ≥1 venue) vs Marktpotenzial-Score (greenfield opportunity, all GeoNames locations globally). SERP analysis confirmed zero competition for hyperlocal Gemeinde-level market intelligence pages. |
diff --git a/docs/data-sources-inventory.md b/docs/data-sources-inventory.md
index f95f3a0..ea5f370 100644
--- a/docs/data-sources-inventory.md
+++ b/docs/data-sources-inventory.md
@@ -13,7 +13,8 @@ Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytic
 
 | Source | Category | Status | Score | Credentials | Pipeline refs |
 |--------|----------|--------|-------|-------------|---------------|
-| OpenStreetMap / Overpass | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
+| OpenStreetMap / Overpass (padel) | Court locations | ✅ Ingested | 5 | None | `extract-overpass` → `stg_padel_courts` |
+| OpenStreetMap / Overpass (tennis) | Court locations | ✅ Ingested | 4 | None | `extract-overpass-tennis` → `stg_tennis_courts` |
 | Playtomic — tenants | Court locations | ✅ Ingested | 5 | None | `extract-playtomic-tenants` → `stg_playtomic_venues/resources/opening_hours` |
 | Playtomic — availability | Pricing / utilisation | ✅ Ingested | 5 | None | `extract-playtomic-availability` → `stg_playtomic_availability` |
 | Eurostat `urb_cpop1` | Demographics — EU city population | ✅ Ingested | 5 | None | `extract-eurostat` → `stg_population` |
@@ -21,7 +22,7 @@ Purpose: Identify and track data sources feeding the Padelnomics DuckDB analytic
 | Eurostat SDMX city labels | Demographics — EU city lookup | ✅ Ingested | 4 | None | `extract-eurostat-city-labels` → `stg_city_labels` |
 | ONS UK mid-year estimates | Demographics — UK population | ✅ Ingested | 4 | None | `extract-ons-uk` → `stg_population_uk` |
 | US Census ACS 5-year | Demographics — US population | ✅ Ingested† | 3 | `CENSUS_API_KEY` (free) | `extract-census-usa` → `stg_population_usa` |
-| GeoNames cities15000 | Demographics — global fallback | ✅ Ingested† | 3 | `GEONAMES_USERNAME` (free) | `extract-geonames` → `stg_population_geonames` |
+| GeoNames cities1000 | Demographics — global locations ≥1K pop | ✅ Ingested† | 4 | `GEONAMES_USERNAME=padelnomics` (free) | `extract-geonames` → `stg_population_geonames` → `dim_locations` |
 | ECB / Frankfurter.app | FX rates | 🔲 Planned | 4 | None | `extract-fx` → `stg_fx_rates` (proposed) |
 | FIP World Padel Report | Market reports | 🔲 Planned | 4 | None (PDF) | Annual seed table |
 | PadelAPI.org | Tournament data | 🔲 Planned | 3 | Free-tier token | 50k req/mo |
diff --git a/transform/sqlmesh_padelnomics/CLAUDE.md b/transform/sqlmesh_padelnomics/CLAUDE.md
index e11855b..e0b2096 100644
--- a/transform/sqlmesh_padelnomics/CLAUDE.md
+++ b/transform/sqlmesh_padelnomics/CLAUDE.md
@@ -55,15 +55,16 @@ Grain must match reality — use `QUALIFY ROW_NUMBER()` to enforce it.
 | Dimension | Grain | Used by |
 |-----------|-------|---------|
 | `foundation.dim_venues` | `venue_id` | `dim_cities`, `dim_venue_capacity`, `fct_daily_availability` (via capacity join) |
-| `foundation.dim_cities` | `city_slug` | `serving.city_market_profile` → all pSEO serving models |
+| `foundation.dim_cities` | `(country_code, city_slug)` | `serving.city_market_profile` → all pSEO serving models |
+| `foundation.dim_locations` | `(country_code, geoname_id)` | `serving.location_opportunity_profile` — all GeoNames locations (pop ≥1K), incl. zero-court locations |
 | `foundation.dim_venue_capacity` | `tenant_id` | `foundation.fct_daily_availability` |
 
 ## Source integration map
 
 ```
 stg_playtomic_venues   ─┐
-stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ─→ city_market_profile
-stg_padel_courts       ─┘              └→ dim_venue_capacity
+stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ──────────────→ city_market_profile
+stg_padel_courts       ─┘              └→ dim_venue_capacity         (Marktreife-Score)
                                                            ↓
 stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_availability
                                                            ↓
@@ -71,8 +72,33 @@ stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_a
                                                            ↓
 stg_population ──→ dim_cities ─────────────────────────────┘
 stg_income     ──→ dim_cities
+
+stg_population_geonames ─┐
+stg_padel_courts        ─┤→ dim_locations ──→ location_opportunity_profile
+stg_tennis_courts       ─┤                    (Marktpotenzial-Score)
+stg_income              ─┘
 ```
 
+## Distance calculation pattern (ST_Distance_Sphere)
+
+Use a bounding-box pre-filter before calling `ST_Distance_Sphere` to avoid full cross-joins (locations with no court inside the box drop out of the inner join):
+
+```sql
+-- Nearest padel court (km) per location
+SELECT l.geoname_id,
+       MIN(ST_Distance_Sphere(
+           ST_Point(l.lat, l.lon), ST_Point(p.lat, p.lon)  -- ST_Distance_Sphere expects [lat, lon] axis order
+       ) / 1000.0) AS nearest_km
+FROM locations l
+JOIN padel_courts p
+  ON  ABS(l.lat - p.lat) < 0.5  -- ~55 km N-S pre-filter
+  AND ABS(l.lon - p.lon) < 0.5  -- E-W width shrinks with cos(lat): ~55 km × cos(lat)
+GROUP BY l.geoname_id
+```
+
+Requires `extensions: [spatial]` in `config.yaml` (already set). The spatial extension must be
+installed and loaded (`INSTALL spatial; LOAD spatial;`) before `ST_Distance_Sphere` / `ST_Point` are available.
+
 ## Common pitfalls
 
 - **Don't add business logic to staging.** Even a CASE statement renaming values = business