Part A: Data Layer — Sprints 1-5

Sprint 1 — Eurostat SDMX city labels (unblocks EU population):
- New extractor: eurostat_city_labels.py — fetches ESTAT/CITIES codelist (city_code → city_name mapping) with ETag dedup
- New staging model: stg_city_labels.sql — grain city_code
- Updated dim_cities.sql — joins Eurostat population via city code lookup; replaces hardcoded 0::BIGINT population

Sprint 2 — Market score formula v2:
- city_market_profile.sql: 30pt population (LN/1M), 25pt income PPS (/200), 30pt demand (occupancy or density), 15pt data confidence
- Moved venue_pricing_benchmarks join into base CTE so median_occupancy_rate is available to the scoring formula

Sprint 3 — US Census ACS extractor:
- New extractor: census_usa.py — ACS 5-year place population (vintage 2023)
- New staging model: stg_population_usa.sql — grain (place_fips, ref_year)

Sprint 4 — ONS UK extractor:
- New extractor: ons_uk.py — 2021 Census LAD population via ONS beta API
- New staging model: stg_population_uk.sql — grain (lad_code, ref_year)

Sprint 5 — GeoNames global extractor:
- New extractor: geonames.py — cities15000.zip bulk download, filtered to ≥50K pop
- New staging model: stg_population_geonames.sql — grain geoname_id
- dim_cities: 5-source population cascade (Eurostat > Census > ONS > GeoNames > 0) with case/whitespace-insensitive city name matching

Registered all 4 new CLI entrypoints in pyproject.toml and all.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
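The population cascade in dim_cities can be pictured as a first-non-null lookup across sources in priority order. The Python below is only a sketch of that idea, not the SQL in dim_cities.sql: the names `SOURCE_PRIORITY`, `normalize_city_name`, `build_index`, and `resolve_population` are hypothetical, the population figures are placeholders, and "case/whitespace-insensitive" is assumed to mean trimming, collapsing whitespace runs, and case-folding.

```python
from typing import Dict, Mapping

# Priority order taken from the commit message; the final fallback is 0.
SOURCE_PRIORITY = ("eurostat", "census", "ons", "geonames")


def normalize_city_name(name: str) -> str:
    """Build a match key: trim, collapse whitespace runs, case-fold (assumed rule)."""
    return " ".join(name.split()).casefold()


def build_index(raw: Mapping[str, Mapping[str, int]]) -> Dict[str, Dict[str, int]]:
    """Re-key every source table by the normalized city name."""
    return {
        source: {normalize_city_name(city): pop for city, pop in table.items()}
        for source, table in raw.items()
    }


def resolve_population(city: str, index: Mapping[str, Mapping[str, int]]) -> int:
    """Return the first non-null population in priority order, else 0."""
    key = normalize_city_name(city)
    for source in SOURCE_PRIORITY:
        population = index.get(source, {}).get(key)
        if population is not None:
            return population
    return 0


# Placeholder figures, chosen only to make the priority order visible.
index = build_index({
    "eurostat": {"Madrid": 100},
    "geonames": {"  madrid ": 200, "Austin": 300},
})
madrid = resolve_population("  MADRID", index)    # Eurostat wins over GeoNames
austin = resolve_population("austin ", index)     # only GeoNames has it
unknown = resolve_population("Nowhere", index)    # falls through to 0
```

In SQL the same behaviour would typically be a `COALESCE` over left joins on a normalized name column; whether dim_cities.sql does exactly that is not shown here.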
29 lines
1017 B
TOML
[project]
name = "padelnomics_extract"
version = "0.2.0"
description = "Data extraction pipelines for padelnomics"
requires-python = ">=3.11"
dependencies = [
    "niquests>=3.14.0",
    "python-dotenv>=1.0.0",
]

[project.scripts]
extract = "padelnomics_extract.all:main"
extract-overpass = "padelnomics_extract.overpass:main"
extract-eurostat = "padelnomics_extract.eurostat:main"
extract-playtomic-tenants = "padelnomics_extract.playtomic_tenants:main"
extract-playtomic-availability = "padelnomics_extract.playtomic_availability:main"
extract-playtomic-recheck = "padelnomics_extract.playtomic_availability:main_recheck"
extract-eurostat-city-labels = "padelnomics_extract.eurostat_city_labels:main"
extract-census-usa = "padelnomics_extract.census_usa:main"
extract-ons-uk = "padelnomics_extract.ons_uk:main"
extract-geonames = "padelnomics_extract.geonames:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/padelnomics_extract"]