docs: update docs and PROJECT.md for dual score pipeline

Task 8: documentation updates for the dual market score feature.

- CHANGELOG.md: comprehensive [Unreleased] entries for all additions
  (Marktpotenzial-Score, tennis courts, dim_locations, GeoNames expansion,
  DuckDB spatial, SOPS secrets, methodology page updates)
- docs/data-sources-inventory.md: add tennis courts Overpass row, update
  GeoNames entry (cities1000, username=padelnomics, higher score)
- transform/sqlmesh_padelnomics/CLAUDE.md: add dim_locations to conformed
  dimensions table, update source integration map with new pipeline branch,
  document ST_Distance_Sphere bounding-box pattern
- PROJECT.md: add dual score to In Progress, add Gemeinde pSEO + top-50
  ranking page to Next Up, add data backlog items (sports_centre, NUTS-3,
  opportunity map), add Decisions Log entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-24 17:12:22 +01:00
parent caec0c4410
commit 405efcfd19
4 changed files with 70 additions and 6 deletions

@@ -55,15 +55,16 @@ Grain must match reality — use `QUALIFY ROW_NUMBER()` to enforce it.
| Dimension | Grain | Used by |
|-----------|-------|---------|
| `foundation.dim_venues` | `venue_id` | `dim_cities`, `dim_venue_capacity`, `fct_daily_availability` (via capacity join) |
| `foundation.dim_cities` | `city_slug` | `serving.city_market_profile` → all pSEO serving models |
| `foundation.dim_cities` | `(country_code, city_slug)` | `serving.city_market_profile` → all pSEO serving models |
| `foundation.dim_locations` | `(country_code, geoname_id)` | `serving.location_opportunity_profile` — all GeoNames locations (pop ≥1K), incl. zero-court locations |
| `foundation.dim_venue_capacity` | `tenant_id` | `foundation.fct_daily_availability` |
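
The grain change for `dim_cities` from `city_slug` to `(country_code, city_slug)` can be illustrated with a small sketch. It mimics in plain Python what `QUALIFY ROW_NUMBER() OVER (PARTITION BY <grain> ORDER BY ...) = 1` does in SQL. The sample rows, the `population` column, and the helper name `dedupe_by_grain` are hypothetical and not taken from the project. The assumption being illustrated is that the same slug can occur in more than one country.

```python
def dedupe_by_grain(rows, key_cols, order_col):
    """Keep one row per grain key, preferring the highest order_col value.

    Rough Python analogue of:
      QUALIFY ROW_NUMBER() OVER (PARTITION BY key_cols ORDER BY order_col DESC) = 1
    """
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in best or row[order_col] > best[key][order_col]:
            best[key] = row
    return list(best.values())


# Hypothetical rows: the same slug appears in two countries.
rows = [
    {"country_code": "AA", "city_slug": "springfield", "population": 1000},
    {"country_code": "BB", "city_slug": "springfield", "population": 2000},
]

# Slug-only grain collapses two distinct cities into one row.
by_slug = dedupe_by_grain(rows, ("city_slug",), "population")

# Composite grain keeps both cities.
by_country_and_slug = dedupe_by_grain(rows, ("country_code", "city_slug"), "population")
```

With the slug-only key, one of the two cities silently disappears. With the composite key, both survive, which is the behaviour the updated grain column describes.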
## Source integration map
```
stg_playtomic_venues ─┐
stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ─→ city_market_profile
stg_padel_courts ─┘ └→ dim_venue_capacity
stg_playtomic_resources─┤→ dim_venues ─┬→ dim_cities ──────────────→ city_market_profile
stg_padel_courts ─┘ └→ dim_venue_capacity (Marktreife-Score)
stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_availability
@@ -71,8 +72,33 @@ stg_playtomic_availability ──→ fct_availability_slot ──→ fct_daily_a
stg_population ──→ dim_cities ─────────────────────────────┘
stg_income ──→ dim_cities
stg_population_geonames ─┐
stg_padel_courts ─┤→ dim_locations ──→ location_opportunity_profile
stg_tennis_courts ─┤ (Marktpotenzial-Score)
stg_income ─┘
```
## Distance calculation pattern (ST_Distance_Sphere)
Use a bounding-box pre-filter before calling `ST_Distance_Sphere` to avoid full cross-joins:
```sql
-- Nearest padel court (km) per location
SELECT l.geoname_id,
       MIN(ST_Distance_Sphere(
           ST_Point(l.lon, l.lat), ST_Point(p.lon, p.lat)
       ) / 1000.0) AS nearest_km
FROM locations l
JOIN padel_courts p
  ON ABS(l.lat - p.lat) < 0.5  -- ~55km pre-filter
 AND ABS(l.lon - p.lon) < 0.5
GROUP BY l.geoname_id
```
Requires `extensions: [spatial]` in `config.yaml` (already set). `ST_Distance_Sphere` and
`ST_Point` are only available after `INSTALL spatial; LOAD spatial;` has run in the DuckDB session.
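
As a sanity check on the `0.5` degree pre-filter, here is a minimal pure-Python sketch of the same idea: a cheap bounding-box test first, then a haversine distance only for the surviving candidates. It does not use DuckDB. The function names and the Earth radius constant are assumptions made for illustration, not part of the pipeline.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bounding_box(lat1, lon1, lat2, lon2, max_deg=0.5):
    """Cheap pre-filter, equivalent to the two ABS(...) < 0.5 join conditions."""
    return abs(lat1 - lat2) < max_deg and abs(lon1 - lon2) < max_deg


def nearest_km(location, courts, max_deg=0.5):
    """Nearest court distance in km, or None if no court passes the pre-filter."""
    lat, lon = location
    candidates = [
        haversine_km(lat, lon, c_lat, c_lon)
        for c_lat, c_lon in courts
        if in_bounding_box(lat, lon, c_lat, c_lon, max_deg)
    ]
    return min(candidates) if candidates else None


# 0.5 degrees of latitude is roughly 55.6 km everywhere.
half_degree_lat_km = haversine_km(50.0, 8.0, 50.5, 8.0)

# 0.5 degrees of longitude shrinks with latitude (about 36 km at 50 degrees north).
half_degree_lon_km = haversine_km(50.0, 8.0, 50.0, 8.5)

# A court 2 degrees away is excluded by the box, so no distance is returned.
no_match = nearest_km((50.0, 8.0), [(52.0, 8.0)])
```

Two things follow from the sketch. The "~55km" comment holds for latitude, while the longitude side of the box is narrower away from the equator. And because the SQL uses an inner `JOIN`, a location with no court inside the box produces no row at all, which corresponds to the `None` case here. Whether such locations need a row with a `NULL` distance depends on the downstream model.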
## Common pitfalls
- **Don't add business logic to staging.** Even a CASE statement renaming values = business