Compare commits

37 Commits

Author SHA1 Message Date
Deeman
f9e22a72dd merge: fix CI — update proxy tests for 2-tier design
All checks were successful
CI / test (push) Successful in 54s
CI / tag (push) Successful in 3s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:36:35 +01:00
Deeman
ce466e3f7f test(proxy): update supervisor tests for 2-tier proxy (no Webshare)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:36:30 +01:00
Deeman
563bd1fb2e merge: tiered-proxy-tenants — gisco extractor, proxy fixes, recheck datetime fix
Some checks failed
CI / test (push) Failing after 46s
CI / tag (push) Has been skipped
- feat: GISCO NUTS-2 extractor module (replaces standalone script)
- feat: wire 5 unscheduled extractors into workflows.toml
- fix: add load_dotenv() to _shared.py so .env proxies are picked up
- fix: recheck datetime parsing (HH:MM:SS slot times need start_date prefix)
- fix: graceful 0-venue early return in recheck
- fix(proxy): remove Webshare free tier — DC tier 1, residential tier 2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:12:17 +01:00
Deeman
b980b8f567 fix(proxy): remove Webshare free tier — DC tier 1, residential tier 2
Free Webshare proxies were timing out and exhausting the circuit breaker
before datacenter/residential proxies got a chance to run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:12:08 +01:00
Deeman
0733f1c2a1 docs(scratch): rename guide → question bank with full gap analysis
Transforms the raw question bank into an annotated gap analysis document:
- Every section tagged ANSWERED / PARTIAL / GAP
- Summary table of 13 gaps across 3 tiers with impact and feasibility
- Inline actionable notes linking to research files, planner inputs, and backlog

Key findings captured:
- Tier 1 gaps: subsidies/grants, buyer segmentation, indoor-vs-outdoor decision
  framework, OPEX benchmark display
- Tier 2 gaps: booking platform strategy, depreciation/tax shield, legal/regulatory
  checklist (DE), supplier selection framework, staffing plan template
- Tier 3 gaps: zero-court pSEO pages, pre-opening playbook, drive-time isochrones

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 21:30:27 +01:00
Deeman
320777d24c update env vars
All checks were successful
CI / test (push) Successful in 50s
CI / tag (push) Successful in 2s
2026-03-01 21:28:45 +01:00
Deeman
92930ac717 fix(extract): handle 0-result recheck gracefully — skip file write
When all proxy tiers are exhausted and 0 venues are fetched, the working
file is empty and compress_jsonl_atomic asserts non-empty. Return early
with a warning instead of crashing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 21:25:09 +01:00
Deeman
0cfc841c08 merge: fix recheck 0-result crash
All checks were successful
CI / test (push) Successful in 51s
CI / tag (push) Successful in 3s
2026-03-01 21:25:09 +01:00
Deeman
36deaba00e merge: fix recheck slot datetime parsing
All checks were successful
CI / test (push) Successful in 50s
CI / tag (push) Successful in 2s
2026-03-01 19:47:49 +01:00
Deeman
9608b7f601 feat(admin): replace all native confirm() with styled dialog + fix pipeline tabs scrollbar
Some checks failed
CI / tag (push) Has been cancelled
CI / test (push) Has been cancelled
- Add global htmx:confirm handler in base_admin.html that intercepts
  hx-confirm attributes and shows #confirm-dialog instead of window.confirm()
- Convert 4 pipeline HTMX buttons (Run Transform, Run Export, Run Full
  Pipeline, Run extractor) from onclick+confirm() to hx-confirm
- Convert 4 affiliate form/list delete buttons from onclick+confirm()
  to confirmAction() via event.preventDefault()
- Add scrollbar-width:none + ::-webkit-scrollbar{display:none} to
  .pipeline-tabs to suppress spurious horizontal scrollbar

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 19:47:34 +01:00
Deeman
0811b30cbd fix(extract): recheck slot datetime parsing — was silently skipping all slots
start_time is "HH:MM:SS" (time only), not a full ISO datetime. Combining
with resource's start_date to get "YYYY-MM-DDTHH:MM:SS" before parsing.
The ValueError was silently caught on every slot → 0 venues found → recheck
never actually ran since it was first deployed.
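
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the extractor's actual code: the helper name `parse_slot_start` and its two string arguments are assumptions made for the example.

```python
from datetime import datetime


def parse_slot_start(start_date: str, start_time: str) -> datetime:
    """Combine a date ("YYYY-MM-DD") with a time-only value ("HH:MM:SS").

    Hypothetical helper: the real extractor may name and structure this
    differently. The point is that the time string alone is not a full
    ISO datetime, so it is prefixed with the date before parsing.
    """
    return datetime.fromisoformat(f"{start_date}T{start_time}")


slot_start = parse_slot_start("2026-03-01", "19:30:00")
```

Letting a `ValueError` from this helper propagate, or at least logging it, would have surfaced the original bug instead of silently dropping every slot.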

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 19:47:01 +01:00
Deeman
7d2950928e fix(infra): add R2_ENDPOINT to prod secrets for landing backup
All checks were successful
CI / test (push) Successful in 50s
CI / tag (push) Successful in 3s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 18:41:42 +01:00
Deeman
65e51d2972 fix(infra): switch landing backup to shared r2-landing rclone remote
All checks were successful
CI / test (push) Successful in 52s
CI / tag (push) Successful in 3s
Replace inline LITESTREAM_R2_* credentials in the backup service with
the named [r2-landing] rclone remote and R2_LANDING_* env vars, matching
the beanflows pattern. Add rclone.conf setup to bootstrap_supervisor.sh
so the remote is written from env on each bootstrap run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 18:36:57 +01:00
Deeman
c5d872ec55 chore(secrets): add R2_LANDING_* vars for landing-zone backup bucket…
All checks were successful
CI / test (push) Successful in 49s
CI / tag (push) Successful in 2s
2026-03-01 17:32:51 +01:00
Deeman
75305935bd merge: GISCO extractor + wire all extractors + load_dotenv fix
All checks were successful
CI / test (push) Successful in 50s
CI / tag (push) Successful in 3s
2026-03-01 17:08:42 +01:00
Deeman
99cb0ac005 chore: remove .gitlab-ci.yml (GitLab now backup-only mirror)
All checks were successful
CI / test (push) Successful in 50s
CI / tag (push) Successful in 2s
CI runs on Gitea only. GitLab is a passive push mirror — no runners,
no tagging, no deploy involvement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 17:06:09 +01:00
Deeman
a15c32d398 fix(extract): load .env automatically via load_dotenv()
PROXY_URLS_* and other secrets were defined in .env but never loaded,
causing availability to run in slow serial mode (1 req/s) instead of
parallel mode with proxies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:41:59 +01:00
Deeman
97c5846d51 feat(extract): GISCO extractor + wire all unscheduled extractors
- New gisco.py: proper extractor module replacing scripts/download_gisco_nuts.py.
  Writes uncompressed .geojson (ST_Read can't handle .gz). Fixed partition path
  gisco/2024/01/nuts2_boundaries.geojson; cursor tracking skips re-download monthly.
- all.py: import + register gisco in EXTRACTORS (9 independent, 1 dep)
- pyproject.toml: add extract-gisco entry point
- workflows.toml: add census_usa, census_usa_income, eurostat_city_labels,
  ons_uk, gisco — all monthly, no dependencies
- Delete scripts/download_gisco_nuts.py (superseded)

Unblocks: stg_nuts2_boundaries, stg_regional_income, stg_income_usa,
and 4 downstream models (dim_locations, pseo_city_costs_de,
location_opportunity_profile, pseo_country_overview).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 15:49:39 +01:00
Deeman
0d903ec926 chore(changelog): document stale-tier circuit breaker fix
All checks were successful
CI / test (push) Successful in 51s
CI / tag (push) Successful in 3s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:43:18 +01:00
Deeman
42c49e383c fix(proxy): ignore stale-tier failures in record_failure()
With parallel workers, threads that fetch a proxy just before escalation
can report failures after the tier has already changed — those failures
were silently counting against the new tier, immediately exhausting it
before it ever got tried (Rayobyte being skipped entirely in favour of
DataImpulse because 10 in-flight Webshare failures hit the threshold).

Fix: build a proxy_url → tier_idx reverse map at construction time and
skip the tier-level circuit breaker when the failing proxy belongs to an
already-escalated tier.
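
A simplified sketch of the reverse-map idea, under stated assumptions: `make_tier_breaker`, its return shape, and the `state` layout are hypothetical and only illustrate the stale-failure check, not the real `proxy.py` implementation.

```python
import threading


def make_tier_breaker(tiers: list[list[str]], threshold: int = 10):
    """Hypothetical, simplified tier-level circuit breaker."""
    # Reverse map built once at construction time: proxy URL -> tier index.
    tier_of = {url: idx for idx, tier in enumerate(tiers) for url in tier}
    state = {"active": 0, "failures": 0}
    lock = threading.Lock()

    def record_failure(proxy_url: str) -> None:
        with lock:
            tier_idx = tier_of.get(proxy_url)
            # A failure from a tier we already escalated past is stale:
            # it must not count against the currently active tier.
            if tier_idx is not None and tier_idx < state["active"]:
                return
            state["failures"] += 1
            if state["failures"] >= threshold:
                state["active"] += 1  # escalate to the next tier
                state["failures"] = 0

    def active_tier() -> int:
        with lock:
            return state["active"]

    return record_failure, active_tier
```

In this sketch an `active_tier()` value equal to `len(tiers)` would mean every tier has been exhausted.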

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:43:05 +01:00
Deeman
1c0edff3e5 chore(changelog): document visual upgrades for longform articles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:29:21 +01:00
Deeman
8a28b94ec2 merge: visual upgrades for longform articles (timeline, callouts, cards, severity pills)
2026-03-01 14:28:57 +01:00
Deeman
9b54f2d544 fix(secrets): add http:// scheme to proxy URLs in dev + prod SOPS
All checks were successful
CI / test (push) Successful in 51s
CI / tag (push) Successful in 3s
PROXY_URLS_DATACENTER was missing the scheme prefix, causing SSL
handshake failures on the Rayobyte HTTP-only proxy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:28:35 +01:00
Deeman
08bd2b2989 chore(changelog): document proxy URL scheme validation fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:26:57 +01:00
Deeman
81a57db272 fix(proxy): skip URLs without scheme in load_proxy_tiers()
Validates each URL in PROXY_URLS_DATACENTER / PROXY_URLS_RESIDENTIAL:
logs a warning and skips any entry missing an http:// or https:// scheme
instead of passing malformed URLs that cause SSL or connection errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:26:41 +01:00
Deeman
bce6b2d340 feat(articles): visual upgrades — timeline, callouts, cards, severity pills
Add 4 reusable CSS article components and apply them across 6 cornerstone articles:

CSS (input.css):
- article-timeline: horizontal phase diagram with numbered cards, collapses to vertical on mobile
- article-callout (warning/tip/info): left-bordered callout boxes with icon and title
- article-cards: 2-col grid of accent-topped cards (success/failure/neutral/established/growth/emerging)
- severity: inline pill badges (high/medium-high/medium/low-medium/low) for risk tables

Articles updated:
- padel-hall-build-guide-en + padel-halle-bauen-de: ASCII code block → timeline HTML; 3 bold/blockquote warnings → callout boxes; success/failure patterns → 4 cards
- padel-hall-investment-risks-en + padel-halle-risiken-de: risk overview table severity → pills; personal guarantee section → callout; risk management section → 4 cards
- padel-hall-location-guide-en + padel-standort-analyse-de: market maturity paragraphs → 3 stage cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:24:11 +01:00
Deeman
f92d863781 feat(pipeline): live extraction status + Transform tab
Adds HTMX live polling to the Overview tab (stops when quiet) and a new
Transform tab for managing the SQLMesh + export steps of the ELT pipeline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 13:47:17 +01:00
Deeman
a3dd37b1be chore(changelog): document pipeline transform tab + live status feature
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 13:47:07 +01:00
Deeman
e5cbcf462e feat(pipeline): live extraction status + Transform tab
- worker: add run_transform, run_export, run_pipeline task handlers
  - run_transform: sqlmesh plan prod --auto-apply, 2h timeout
  - run_export: export_serving.py, 10min timeout
  - run_pipeline: sequential extract → transform → export, stops on first failure

- pipeline_routes: refactor overview into _render_overview_partial() helper,
  make pipeline_trigger_extract() HTMX-aware (returns partial on HX-Request),
  add _fetch_pipeline_tasks(), _format_duration() helpers,
  add pipeline_transform() + pipeline_trigger_transform() with concurrency guard

- pipeline_overview.html: wrap in self-polling div (every 5s while any_running),
  convert Run buttons to hx-post targeting #pipeline-overview-content

- pipeline.html: add pulse animation for .status-dot.running, add Transform tab
  button, rewire header "Run Pipeline" button to enqueue run_pipeline task

- pipeline_transform.html: new partial — status cards for transform + export,
  "Run Full Pipeline" card, recent runs table with duration + error details

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 13:46:11 +01:00
Deeman
169092c8ea fix(admin): make pipeline data view responsive on mobile
All checks were successful
CI / test (push) Successful in 50s
CI / tag (push) Successful in 2s
- Tab bar: add overflow-x:auto so 5 tabs scroll on narrow screens
- Overview grid: replace hardcoded 1fr 1fr with .pipeline-two-col (stacks below 640px)
- Overview tables: wrap Serving Tables + Landing Zone in overflow-x:auto divs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 13:16:58 +01:00
Deeman
6ae16f6c1f feat(proxy): per-proxy dead tracking in tiered cycler
All checks were successful
CI / test (push) Successful in 51s
CI / tag (push) Successful in 3s
2026-03-01 12:37:00 +01:00
Deeman
8b33daa4f3 feat(content): remove artificial 500-article generation cap
- fetch_template_data: default limit=0 (all rows); skip LIMIT clause when 0
- generate_articles: default limit=0
- worker handle_generate_articles: default to 0 instead of 500
- Remove "limit": 500 from all 4 enqueue payloads
- template_generate GET handler: use count_template_data() instead of fetch(limit=501) probe
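
The "limit=0 means all rows" convention from the list above might look like this. It is a sketch only: `build_template_query`, the `?` placeholder style, and the example table name are assumptions, not the project's actual query builder.

```python
def build_template_query(table: str, limit: int = 0) -> tuple[str, list[int]]:
    """Build a SELECT that only carries a LIMIT clause when limit > 0.

    Hypothetical helper. `table` is assumed to be a trusted identifier
    (it is interpolated, not bound as a parameter).
    """
    sql = f"SELECT * FROM {table}"
    params: list[int] = []
    if limit > 0:
        sql += " LIMIT ?"
        params.append(limit)
    return sql, params


all_rows = build_template_query("serving_rows")        # no LIMIT clause
capped = build_template_query("serving_rows", 500)     # LIMIT bound to 500
```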

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:33:58 +01:00
Deeman
a898a06575 feat(proxy): per-proxy dead tracking in tiered cycler
Add proxy_failure_limit param to make_tiered_cycler (default 3).
Individual proxies hitting the limit are marked dead and permanently
skipped. next_proxy() auto-escalates when all proxies in the active
tier are dead. Both mechanisms coexist: per-proxy dead tracking removes
broken individuals; tier-level threshold catches systemic failure.

- proxy.py: dead_proxies set + proxy_failure_counts dict in state;
  next_proxy skips dead proxies with bounded loop; record_failure/
  record_success accept optional proxy_url; dead_proxy_count() added
- playtomic_tenants.py: pass proxy_url to record_success/record_failure
- playtomic_availability.py: _worker returns (proxy_url, result);
  serial loops in extract + extract_recheck capture proxy_url
- test_supervisor.py: 11 new tests in TestTieredCyclerDeadProxyTracking

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:28:54 +01:00
Deeman
219554b7cb fix(extract): use tiered cycler in playtomic_tenants
Previously the tenants extractor flattened all proxy tiers into a single
round-robin list, bypassing the circuit breaker entirely. When the free
Webshare tier runs out of bandwidth (402), all 20 free proxies fail and
the batch crashes — the paid datacenter/residential proxies are never tried.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:13:58 +01:00
Deeman
1aedf78ec6 fix(extract): use tiered cycler in playtomic_tenants
Previously the tenants extractor flattened all proxy tiers into a single
round-robin list, bypassing the circuit breaker entirely. When the free
Webshare tier runs out of bandwidth (402), all 20 free proxies fail and
the batch crashes — the paid datacenter/residential proxies are never tried.

Changes:
- Replace make_round_robin_cycler with make_tiered_cycler (same as availability)
- Add _fetch_page_via_cycler: retries per page across tiers, records
  success/failure in cycler so circuit breaker can escalate
- Fix batch_size to BATCH_SIZE=20 constant (was len(all_proxies) ≈ 22)
- Check cycler.is_exhausted() before each batch; catch RuntimeError mid-batch
  and write partial results rather than crashing with nothing
- CIRCUIT_BREAKER_THRESHOLD from env (default 10), matching availability

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:13:50 +01:00
Deeman
8f2ffd432b fix(admin): correct docker volume mount + pipeline_routes repo root
All checks were successful
CI / test (push) Successful in 50s
CI / tag (push) Successful in 2s
- docker-compose.prod.yml: fix volume mount for all 6 web containers
  from /opt/padelnomics/data (stale) → /data/padelnomics (live supervisor output);
  add LANDING_DIR=/app/data/pipeline/landing so extraction/landing stats work
- pipeline_routes.py: fix _REPO_ROOT parents[5] → parents[4] so workflows.toml
  is found in dev and pipeline overview shows workflow schedules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 11:41:29 +01:00
Deeman
c9dec066f7 fix(admin): mobile UX fixes — contrast, scroll, responsive grids
- CSS: `.nav-mobile a` → `.nav-mobile a:not(.nav-auth-btn)` to fix Sign
  Out button showing slate text instead of white on mobile
- base_admin.html: add `overflow-y: hidden` + `scrollbar-width: none` to
  `.admin-subnav` to eliminate ghost 1px scrollbar on Content tab row
- routes.py: pass `outreach_email=EMAIL_ADDRESSES["outreach"]` to outreach
  template so sending domain is no longer hardcoded
- outreach.html: display dynamic `outreach_email`; replace inline
  `repeat(6,1fr)` grid with responsive `.pipeline-status-grid` (2→3→6 cols)
- index.html: replace inline `repeat(5,1fr)` Lead/Supplier Funnel grids
  with responsive `.funnel-grid` class (2 cols mobile, 5 cols md+)
- pipeline.html: replace inline `repeat(4,1fr)` stat grid with responsive
  `.pipeline-stat-grid` (2 cols mobile, 4 cols md+)
- 4 partials (lead/email/supplier/outreach results): wrap `<table>` in
  `<div style="overflow-x:auto">` so tables scroll on narrow screens

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 11:20:46 +01:00
44 changed files with 1913 additions and 362 deletions

View File

@@ -3,6 +3,8 @@ APP_NAME=ENC[AES256_GCM,data:Vic/MJYoxZo8JAI=,iv:n1SEGQaGeZtYMtLmDRFiljDBbNKFvCz
SECRET_KEY=ENC[AES256_GCM,data:a3Bhj3gSQaE3llRWBYzpjoFDhhhSsNee67jXJs7+qn4=,iv:yvrx78X5Ut4DBSlmBnIn09ESVc/tuDiwiV4njmjcvko=,tag:cbFUTAEpX+isQD9FCVllsw==,type:str]
BASE_URL=ENC[AES256_GCM,data:LcbPDZf9Pwcuv7RxN9xhNfa9Tufi,iv:cOdjW9nNe+BuDXh+dL4b5LFQL2mKBiKV0FaEsDGMAQc=,tag:3uAn3AIwsztIfGpkQLD5Fg==,type:str]
DEBUG=ENC[AES256_GCM,data:qrEGkA==,iv:bCyEDWiEzolHo4vabiyYTsqM0eUaBmNbXYYu4wCsaeE=,tag:80gnDNbdZHRWVEYtuA1M2Q==,type:str]
#ENC[AES256_GCM,data:YB5h,iv:2HFpvHNebAB9M/44rtPk/QpFV9hNKOlV/099OSjPnOA=,tag:BVj8vGy6K3LW/wb1vcZ+Ug==,type:comment]
GITEA_TOKEN=ENC[AES256_GCM,data:aIM7vQXxFbz7FDdXEdwtelvmXAdLgJfWNCSPeK//NlveQrU5cLDt8w==,iv:9qhjk52ZAs+y5WwP5WebMUwHhu6JNdHzAsEOpznrwBw=,tag:WnCDA4hAccMFs6vXVVKqxw==,type:str]
#ENC[AES256_GCM,data:YmlGAWpXxRCqam3oTWtGxHDXC+svEXI4HyUxrm/8OcKTuJsYPcL1WcnYqrP5Mf5lU5qPezEXUrrgZy8vjVW6qAbb0IA2PMM4Kg==,iv:dx6Dn99dJgjwyvUp8NAygXjRQ50yKYFeC73Oqt9WvmY=,tag:6JLF2ixSAv39VkKt6+cecQ==,type:comment]
ADMIN_EMAILS=ENC[AES256_GCM,data:hlG8b32WlD4ems3VKQ==,iv:wWO08dmX4oLhHulXg4HUG0PjRnFiX19RUTkTvjqIw5I=,tag:KMjXsBt7aE/KqlCfV+fdMg==,type:str]
#ENC[AES256_GCM,data:b2wQxnL8Q2Bp,iv:q8ep3yUPzCumpZpljoVL2jbcPdsI5c2piiZ0x5k10Mw=,tag:IbjkT0Mjgu9n+6FGiPVihg==,type:comment]
@@ -58,7 +60,7 @@ NTFY_TOKEN=
#ENC[AES256_GCM,data:BCyQYjRnTx8yW9A=,iv:4OPCP+xzRLUJrpoFewVnbZRKnZH4sAbV76SM//2k5wU=,tag:HxwEp7VFVZUN/VjPiL/+Vw==,type:comment]
RECHECK_WINDOW_MINUTES=ENC[AES256_GCM,data:YWM=,iv:iY5+uMazLAFdwyLT7Gr7MaF1QHBIgHuoi6nF2VbSsOA=,tag:dc6AmuJdTQ55gVe16uzs6A==,type:str]
PROXY_URLS_RESIDENTIAL=ENC[AES256_GCM,data:lfmlsjXFtL+zo40SNFLiFKaZiYvE7CNH+zRwjMK5pqPfCs0TlMX+Y9e1KmzAS+y/cI69TP5sgMPRBzER0Jn7RvH0KA==,iv:jBN/4/K5L5886G4rSzxt8V8u/57tAuj3R76haltzqeU=,tag:Xe6o9eg2PodfktDqmLgVNA==,type:str]
PROXY_URLS_DATACENTER=ENC[AES256_GCM,data:X6xpxz5u8Xh3OXjkIz3UwqH847qLvY9cVWVktW5B+lqhmXAKTzoTzHds8vlRGJf5Up9Yx44XcigbvuK33ZJDSq9ovkAIbY55OK4=,iv:3hHyFD+H9HMzQ/27bPjGr59+7yWmEneUdN9XPQasCig=,tag:oBXsSuV5idB7HqNrNOruwg==,type:str]
PROXY_URLS_DATACENTER=ENC[AES256_GCM,data:Eec0X65EMsV2PD3Qvn+JjGqYaHtLupn0k99H918vmuRuAinP3rv/pwEoyKHmygazrUExg7U2PUELycyzq3lU6RIGtO+r0pRAn/n0S8RwdoZS,iv:T+bfbvULwSLRVD/hyW7rDN8tLLBf1FQkwCEbpiuBB+0=,tag:W/YHfl5U2yaA7ZOXgAFw+Q==,type:str]
WEBSHARE_DOWNLOAD_URL=ENC[AES256_GCM,data:1D9VRZ3MCXPQWfiMH8+CLcrxeYnVVcQgZDvt5kltvbSTuSHQ2hHDmZpBkTOMIBJnw4JLZ2JQKHgG4OaYDtsM2VltFPnfwaRgVI9G5PSenR3o4PeQmYO1AqWOmjn19jPxNXRhEXdupP9UT+xQNXoBJsl6RR20XOpMA5AipUHmSjD0UIKXoZLU,iv:uWUkAydac//qrOTPUThuOLKAKXK4xcZmK9qBVFwpqt4=,tag:1vYhukBW9kEuSXCLAiZZmQ==,type:str]
CIRCUIT_BREAKER_THRESHOLD=
#ENC[AES256_GCM,data:ZcX/OEbrMfKizIQYq3CYGnvzeTEX7KsmQaz2+Jj1rG5tbTy2aljQBIEkjtiwuo8NsNAD+FhIGRGVfBmKe1CAKME1MuiCbgSG,iv:4BSkeD3jZFawP09qECcqyuiWcDnCNSgbIjBATYhazq4=,tag:Ep1d2Uk700MOlWcLWaQ/ig==,type:comment]
@@ -71,7 +73,7 @@ GEONAMES_USERNAME=ENC[AES256_GCM,data:aSkVdLNrhiF6tlg=,iv:eemFGwDIv3EG/P3lVHGZj9
CENSUS_API_KEY=ENC[AES256_GCM,data:qqG971573aGq9MiHI2xLlanKKFwjfcNNoMXtm8LNbyh0rMbQN2XukQ==,iv:az2i0ldH75nHGah4DeOxaXmDbVYqmC1c77ptZqFA9BI=,tag:zoDdKj9bR7fgIDo1/dEU2g==,type:str]
sops_age__list_0__map_enc=-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBxNWNmUzVNUGdWRnE0ZFpF\nM0JQZWZ3UDdEVzlwTmIxakxOZXBkT2x2ZlNrClRtV2M3S2daSGxUZmFDSWQ2Nmh4\neU51QndFcUxlSE00RFovOVJTcDZmUUUKLS0tIDcvL3hRMDRoMWZZSXljNzA3WG5o\nMWFic21MV0krMzlIaldBTVU0ZDdlTE0K7euGQtA+9lHNws+x7TMCArZamm9att96\nL8cXoUDWe5fNI5+M1bXReqVfNwPTwZsV6j/+ZtYKybklIzWz02Ex4A==\n-----END AGE ENCRYPTED FILE-----\n
sops_age__list_0__map_recipient=age1f5002gj4s78jju45jd28kuejtcfhn5cdujz885fl7z2p9ym68pnsgky87a
sops_lastmodified=2026-02-28T15:50:46Z
sops_mac=ENC[AES256_GCM,data:HiLZTLa+p3mqa4hw+tKOK27F/bsJOy4jmDi8MHToi6S7tRfBA/TzcEzXvXUIkkwAixN73NQHvBVeRnbcEsApVpkaxH1OqnjvvyT+B3YFkTEtxczaKGWlCvbqFZNmXYsFvGR9njaWYWsTQPkRIjrroXrSrhr7uxC8F40v7ByxJKo=,iv:qj2IpzWRIh/mM1HtjjkNbyFuhtORKXslVnf/vdEC9Uw=,tag:fr9CZsL74HxRJLXn9eS0xQ==,type:str]
sops_lastmodified=2026-03-01T13:34:16Z
sops_mac=ENC[AES256_GCM,data:JLfGLbNTEcI6M/sUA5Zez6cfEUObgnUBmX52560PzBmeLZt0F5Y5QpeojIBqEDMuNB0hp1nnPI59WClLJtQ12VlHo9TkL3x9uCNUG+KneQrn1bTmJpA3cwNkWTzIm4l+TGbJbd4FpKJ9H0v1w+sqoKOgG8DqbtOeVdUfsVspAso=,iv:UqYxooXkEtx+y7fYzl+GFncpkjz8dcP7o9fp+kFf6w4=,tag:/maSb1aZGo+Ia8eGpB7PYw==,type:str]
sops_unencrypted_suffix=_unencrypted
sops_version=3.12.1

View File

@@ -39,11 +39,11 @@ ALERT_WEBHOOK_URL=ENC[AES256_GCM,data:4sXQk8zklruC525J279TUUatdDJQ43qweuoPhtpI82
NTFY_TOKEN=ENC[AES256_GCM,data:YlOxhsRJ8P1y4kk6ugWm41iyRCsM6oAWjvbU9lGcD0A=,iv:JZXOvi3wTOPV9A46c7fMiqbszNCvXkOgh9i/H1hob24=,tag:8xnPimgy7sesOAnxhaXmpg==,type:str]
SUPERVISOR_GIT_PULL=ENC[AES256_GCM,data:mg==,iv:KgqMVYj12FjOzWxtA1T0r0pqCDJ6MtHzMjE+4W/W+s4=,tag:czFaOqhHG8nqrQ8AZ8QiGw==,type:str]
#ENC[AES256_GCM,data:hzAZvCWc4RTk290=,iv:RsSI4OpAOQGcFVpfXDZ6t705yWmlO0JEWwWF5uQu9As=,tag:UPqFtA2tXiSa0vzJAv8qXg==,type:comment]
PROXY_URLS_RESIDENTIAL=ENC[AES256_GCM,data:x/F0toXDc8stsUNxaepCmxq1+WuacqqPtdc+R5mxTwcAzsKxCdwt8KpBZWMvz7ku4tHDGsKD949QAX2ANXP9oCMTgW0=,iv:6G9gE9/v7GaYj8aqVTmMrpw6AcQK9yMSCAohNdAD1Ws=,tag:2Jimr1ldVSfkh8LPEwdN3w==,type:str]
PROXY_URLS_DATACENTER=ENC[AES256_GCM,data:6BfXBYmyHpgZU/kJWpZLf8eH5VowVK1n0r6GzFTNAx/OmyaaS1RZVPC1JPkPBnTwEmo0WHYRW8uiUdkABmH9F5ZqqlsAesyfW7zvU9r7yD+D7w==,iv:3CBn2qCoTueQy8xVcQqZS4E3F0qoFYnNbzTZTpJ1veo=,tag:wC3Ecl4uNTwPiT23ATvRZg==,type:str]
PROXY_URLS_RESIDENTIAL=ENC[AES256_GCM,data:vxRcXQ/8TUTCtr6hKWBD1zVF47GFSfluIHZ8q0tt8SqQOWDdDe2D7Of6boy/kG3lqlpl7TjqMGJ7fLORcr0klKCykQ==,iv:YjegXXtIXm2qr0a3ZHRHxj3L1JoGZ1iQXkVXQupGQ2E=,tag:kahoHRskXbzplZasWOeiig==,type:str]
PROXY_URLS_DATACENTER=ENC[AES256_GCM,data:23TgU6oUeO7J+MFkraALQ5/RO38DZ3ib5oYYJr7Lj3KXQSlRsgwA+bJlweI5gcUpFphnPXvmwFGiuL6AeY8LzAQ3bx46dcZa5w9LfKw2PMFt,iv:AGXwYLqWjT5VmU02qqada3PbdjfC0mLK2sPruO0uru8=,tag:Z2IS/JPOqWX+x0LZYwyArA==,type:str]
WEBSHARE_DOWNLOAD_URL=ENC[AES256_GCM,data:/N77CFf6tJWCk7HrnBOm2Q1ynx7XoblzfbzJySeCjrxqiu4r+CB90aDkaPahlQKI00DUZih3pcy7WhnjdAwI30G5kJZ3P8H8/R0tP7OBK1wPVbsJq8prQJPFOAWewsS4KWNtSURZPYSCxslcBb7DHLX6ZAjv6A5KFOjRK2N8usR9sIabrCWh,iv:G3Ropu/JGytZK/zKsNGFjjSu3Wt6fvHaAqI9RpUHvlI=,tag:fv6xuS94OR+4xfiyKrYELA==,type:str]
PROXY_CONCURRENCY=ENC[AES256_GCM,data:vdEZ,iv:+eTNQO+s/SsVDBLg1/+fneMzEEsFkuEFxo/FcVV+mWc=,tag:i/EPwi/jOoWl3xW8H0XMdw==,type:str]
RECHECK_WINDOW_MINUTES=ENC[AES256_GCM,data:L2s=,iv:fV3mCKmK5fxUmIWRePELBDAPTb8JZqasVIhnAl55kYw=,tag:XL+PO6sblz/7WqHC3dtk1w==,type:str]
PROXY_CONCURRENCY=ENC[AES256_GCM,data:WWpx,iv:4RdNHXPXxFS5Yf1qa1NbaZgXydhKiiiEiMhkhQxD3xE=,tag:6UOQmBqj+9WlcxFooiTL+A==,type:str]
RECHECK_WINDOW_MINUTES=ENC[AES256_GCM,data:9wQ=,iv:QS4VfelUDdaDbIUC8SJBuy09VpiWM9QQcYliQ7Uai+I=,tag:jwkJY95qXPPrgae8RhKPSg==,type:str]
#ENC[AES256_GCM,data:RC+t2vqLwLjapdAUql8rQls=,iv:Kkiz3ND0g0MRAgcPJysIYMzSQS96Rq+3YP5yO7yWfIY=,tag:Y6TbZd81ihIwn+U515qd1g==,type:comment]
GSC_SERVICE_ACCOUNT_PATH=ENC[AES256_GCM,data:Vki6yHk+gd4n,iv:rxzKvwrGnAkLcpS41EZ097E87NrIpNZGFfl4iXFvr40=,tag:EZkBJpCq5rSpKYVC4H3JHQ==,type:str]
GSC_SITE_URL=ENC[AES256_GCM,data:K0i1xRym+laMP6kgOMEfUyoAn2eNgQ==,iv:kyb+grzFq1e5CG/0NJRO3LkSXexOuCK07uJYApAdWsA=,tag:faljHqYjGTgrR/Zbh27/Yw==,type:str]
@@ -52,13 +52,18 @@ BING_SITE_URL=ENC[AES256_GCM,data:M33VI97DyxH8gRR3ZUXoXg4QrEv5og==,iv:GxZtwfbBVi
#ENC[AES256_GCM,data:OTUMKNkRW0zrupNppXthwE1oieILhNjM+cjx5hFn69g=,iv:48ID2qtSe9ggD2X+G/iUqp3v2uwEc7fZw8lxHIvVXmk=,tag:okBn0Npk1K9dDOFWA/AB1A==,type:comment]
GEONAMES_USERNAME=ENC[AES256_GCM,data:UXd/S2TzXPiGmLY=,iv:OMURM5E6SFEsaqroUlH76DEnr7C/ujNk9UQnbWT0hK4=,tag:VsjjS12QDbudiEhdAQ/OCQ==,type:str]
CENSUS_API_KEY=ENC[AES256_GCM,data:9RbKlxSD17LqIuuNXaOKSgZ8LnFh9Wbze3XHgpctfV/1TqBMZTIedQ==,iv:WwsmR3HLUEcgUpLliGRaUPhGM9vFNPMGXSAQQ6+9UVc=,tag:R4EMNy5MxxvK0UTaCL0umA==,type:str]
#ENC[AES256_GCM,data:SL402gYB8ngjqkrG03FmaA==,iv:I326cYnOWdFnaUwnSfP+s2p9oCDCnqDzUJuPOzSFJc0=,tag:MBW5AqAaq4hTMmNXq1tXKw==,type:comment]
R2_LANDING_BUCKET=ENC[AES256_GCM,data:yZXLNQb8yN9nQPdxqmqv61fLWbRYCjjOqQ==,iv:fAwBLC/EuU0lgYOxZSkTagWyeQCdEadjssapxpCEGjA=,tag:VUmuVw76WZAaukp71Desag==,type:str]
R2_LANDING_ACCESS_KEY_ID=ENC[AES256_GCM,data:Y6y+U1ayhpFDcoaDjl7hyMVjU3gVvtORAH5gbd+HXbM=,iv:ra9kuch1DT+2tfz140bvxQRIXypsdiUrX1QYQ59gNRI=,tag:Wt85qliUMFvgbvoUrOXT7A==,type:str]
R2_LANDING_SECRET_ACCESS_KEY=ENC[AES256_GCM,data:99wB9aKSq2GihW9FOwBSMgHYzNKBHlol2Mf2kg4Ma6Fr4Cr21t/blzPxNQ7YRdeKk6ypFgViXlS4BJz9nC+v0g==,iv:/AmbXtj/uSGcMp+NBhN5tiVb2U56tvO5e1UpG2/ijPo=,tag:Qg2Tt11DUJPyeYcq9iSVnQ==,type:str]
R2_ENDPOINT=ENC[AES256_GCM,data:PBWTzUfhc/qVZ4n3GqJdZu8W7Ee0+FpsgikWVxgptQ3BJ2rQ4ewDuEB05inB1Agz1sB42VEBAsTtR3c5waPPRNs=,iv:ILZ0999fsPYYzVQYuIgAxpyystcplnykVoT5RpSEW2w=,tag:FxFOjQ+YcZuLf+jJr2OVFQ==,type:str]
sops_age__list_0__map_enc=-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBaUVk0UEVqdmtsM3VzQnpZ\nZjJDZ1lsM0VqWFpVVXUvNzdQcCtHbVJLNjFnCmhna01vTkVBaFQ5ZVlXeGhYNXdH\ncWJ5Qi9PdkxLaHBhQnR3cmtoblkxdEUKLS0tIDhHamY4NXhxOG9YN1NpbTN1aVRh\nOHVKcEN1d0QwQldVTDlBWUU4SDVDWlUKRJU+CTfTzIx6LLKin9sTXAHPVAfiUerZ\nCqYVFncsCJE3TbMI424urQj7kragPoGl1z4++yqAXNTRxfZIY4KTkg==\n-----END AGE ENCRYPTED FILE-----\n
sops_age__list_0__map_recipient=age1f5002gj4s78jju45jd28kuejtcfhn5cdujz885fl7z2p9ym68pnsgky87a
sops_age__list_1__map_enc=-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBmVEticFRVemlzZnlzek4x\nbWJ0d0h5ejJVUk5remo1VkdxNjVpdllqbFhFClc1UXlNd09xVVA5MnltMlN5MWRy\nYUlNRmNybHh1RGdPVC9yWlYrVmRTdkkKLS0tIHBUbU9qSDMrVGVHZDZGSFdpWlBh\nT3NXTGl0SmszaU9hRmU5bXI0cDRoRW8KLvbNYsBEwz+ITKvn7Yn+iNHiRzyyjtQt\no9/HupykJ3WjSdleGz7ZN6UiPGelHp0D/rzSASTYaI1+0i0xZ4PUoQ==\n-----END AGE ENCRYPTED FILE-----\n
sops_age__list_1__map_recipient=age1wjepykv3glvsrtegu25tevg7vyn3ngpl607u3yjc9ucay04s045s796msw
sops_age__list_2__map_enc=-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBFeHhaOURNZnRVMEwxNThu\nUjF4Q0kwUXhTUE1QSzZJbmpubnh3RnpQTmdvCjRmWWxpNkxFUmVGb3NRbnlydW5O\nWEg3ZXJQTU4vcndzS2pUQXY3Q0ttYjAKLS0tIE9IRFJ1c2ZxbGVHa2xTL0swbGN1\nTzgwMThPUDRFTWhuZHJjZUYxOTZrU00KY62qrNBCUQYxwcLMXFEnLkwncxq3BPJB\nKm4NzeHBU87XmPWVrgrKuf+PH1mxJlBsl7Hev8xBTy7l6feiZjLIvQ==\n-----END AGE ENCRYPTED FILE-----\n
sops_age__list_2__map_recipient=age1c783ym2q5x9tv7py5d28uc4k44aguudjn03g97l9nzs00dd9tsrqum8h4d
sops_lastmodified=2026-03-01T00:26:54Z
sops_mac=ENC[AES256_GCM,data:DdcABGVm9KbAcFrF0iuZlAaugsouNs7Hon2mZISaHs15/2H/Pd9FniXW3KeQ0+/NdZFQkz/h3i3bVFampcpFS1AxuOE5+1/IgWn8sKtaqPc7E9y8g6lxMnwTkUX2z+n/Q2nR8KAcO9IyE0GNjIluMWkxPWQuLzlRYDOjRN4/1e0=,iv:rm+6lXhYu6VUmrdCIrU0BRN2/ooa21Fw1ESWxr7vATg=,tag:GZmLLZf/LQaNeNNAAEg5bA==,type:str]
sops_lastmodified=2026-03-01T20:26:09Z
sops_mac=ENC[AES256_GCM,data:IxzU6VehA0iHgpIEqDSoMywKyKONI6jSr/6Amo+g3JI72awJtk6ft0ppfDWZjeHhL0ixfnvgqMNwai+1e0V/U8hSP8/FqYKEVpAO0UGJfBPKP3pbw+tx3WJQMF5dIh2/UVNrKvoACZq0IDJfXlVqalCnRMQEHGtKVTIT3fn8m6c=,iv:0w0ohOBsqTzuoQdtt6AI5ZdHEKw9+hI73tycBjDSS0o=,tag:Guw7LweA4m4Nw+3kSuZKWA==,type:str]
sops_unencrypted_suffix=_unencrypted
sops_version=3.12.1

View File

@@ -1,31 +0,0 @@
stages:
  - test
  - tag
test:
  stage: test
  image: python:3.12-slim
  before_script:
    - pip install uv
  script:
    - uv sync
    - uv run pytest web/tests/ -x -q -p no:faulthandler
    - uv run ruff check web/src/ web/tests/
  rules:
    - if: $CI_COMMIT_BRANCH == "master"
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
tag:
  stage: tag
  image:
    name: alpine/git
    entrypoint: [""]
  script:
    - git tag "v${CI_PIPELINE_IID}"
    - git push "https://gitlab-ci-token:${CI_JOB_TOKEN}@${CI_SERVER_HOST}/${CI_PROJECT_PATH}.git" "v${CI_PIPELINE_IID}"
  rules:
    - if: $CI_COMMIT_BRANCH == "master"
# Deployment is handled by the on-server supervisor (src/padelnomics/supervisor.py).
# It polls git every 60s, fetches tags, and deploys only when a new passing tag exists.
# No CI secrets needed — zero SSH keys, zero deploy credentials.

View File

@@ -6,7 +6,37 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Changed
- **Admin: styled confirm dialog for all destructive actions** — replaced all native `window.confirm()` calls with the existing `#confirm-dialog` styled `<dialog>`. A new global `htmx:confirm` handler intercepts HTMX confirmation prompts and shows the dialog; form-submit buttons on affiliate pages were updated to use `confirmAction()`. Affected: pipeline Transform tab (Run Transform, Run Export, Run Full Pipeline), pipeline Overview tab (Run extractor), affiliate product delete, affiliate program delete (both form and list variants).
- **Pipeline tabs: no scrollbar** — added `scrollbar-width: none` and `::-webkit-scrollbar { display: none }` to `.pipeline-tabs` to suppress the spurious horizontal scrollbar on narrow viewports.
### Fixed
- **Stale-tier failures no longer exhaust the next proxy tier** — with parallel workers, threads that fetched a proxy just before tier escalation reported failures after the tier changed, immediately blowing through the new tier's circuit breaker before it ever got tried (Rayobyte was skipped entirely). `record_failure(proxy_url)` now checks which tier the proxy belongs to and ignores the circuit breaker when the proxy is from an already-escalated tier.
- **Proxy URL scheme validation in `load_proxy_tiers()`** — URLs in `PROXY_URLS_DATACENTER` / `PROXY_URLS_RESIDENTIAL` that are missing an `http://` or `https://` scheme are now logged as a warning and skipped, rather than being passed through and causing SSL handshake failures or connection errors at request time. Also fixed a missing `http://` prefix in the dev `.env` `PROXY_URLS_DATACENTER` entry.
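
The scheme check could be sketched like this. The function name `filter_proxy_urls` and the comma-separated env format are assumptions for illustration; the real `load_proxy_tiers()` may parse its input differently.

```python
import logging

logger = logging.getLogger(__name__)


def filter_proxy_urls(raw: str) -> list[str]:
    """Keep only entries that start with http:// or https://.

    Hypothetical helper; assumes `raw` is a comma-separated list such as
    the value of PROXY_URLS_DATACENTER.
    """
    valid: list[str] = []
    for url in (part.strip() for part in raw.split(",")):
        if not url:
            continue  # tolerate trailing commas and empty entries
        if not url.startswith(("http://", "https://")):
            # Deliberately not logging the URL itself: proxy URLs
            # usually embed credentials.
            logger.warning("Skipping proxy URL without http(s):// scheme")
            continue
        valid.append(url)
    return valid
```

Skipping at load time moves the failure from an opaque SSL handshake error at request time to a single warning at startup.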
### Changed
- **Per-proxy dead tracking in tiered cycler** — `make_tiered_cycler` now accepts a `proxy_failure_limit` parameter (default 3). Individual proxies that hit the limit are marked dead and permanently skipped by `next_proxy()`. If all proxies in the active tier are dead, `next_proxy()` auto-escalates to the next tier without needing the tier-level threshold. `record_failure(proxy_url)` and `record_success(proxy_url)` accept an optional `proxy_url` argument for per-proxy tracking; callers without `proxy_url` are fully backward-compatible. New `dead_proxy_count()` callable exposed for monitoring.
  - `extract/padelnomics_extract/src/padelnomics_extract/proxy.py`: added per-proxy state (`proxy_failure_counts`, `dead_proxies`), updated `next_proxy`/`record_failure`/`record_success`, added `dead_proxy_count`
  - `extract/padelnomics_extract/src/padelnomics_extract/playtomic_tenants.py`: `_fetch_page_via_cycler` passes `proxy_url` to `record_success`/`record_failure`
  - `extract/padelnomics_extract/src/padelnomics_extract/playtomic_availability.py`: `_worker` returns `(proxy_url, result)` tuple; serial loops in `extract` and `extract_recheck` capture `proxy_url` before passing to `record_success`/`record_failure`
  - `web/tests/test_supervisor.py`: 11 new tests in `TestTieredCyclerDeadProxyTracking` covering dead proxy skipping, auto-escalation, `dead_proxy_count`, backward compat, and thread safety
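The per-proxy tracking and the stale-tier guard can be sketched roughly as follows. `TieredCyclerSketch` is a hypothetical, simplified class, not the code in `proxy.py`; the `tier_threshold` parameter and the choice to reset a proxy's failure count on success are assumptions.

```python
import threading


class TieredCyclerSketch:
    """Illustrative only: round-robin within a tier, escalate when a tier is exhausted."""

    def __init__(self, tiers, tier_threshold=10, proxy_failure_limit=3):
        self._tiers = [list(t) for t in tiers]
        self._tier_threshold = tier_threshold
        self._proxy_failure_limit = proxy_failure_limit
        self._lock = threading.Lock()
        self._active = 0           # index of the active tier
        self._cursor = 0           # round-robin position within the active tier
        self._tier_failures = 0    # failures counted against the active tier
        self._proxy_failures = {}  # proxy_url -> failure count
        self._dead = set()         # proxies that hit proxy_failure_limit

    def _escalate(self):
        self._active += 1
        self._cursor = 0
        self._tier_failures = 0

    def next_proxy(self):
        with self._lock:
            while self._active < len(self._tiers):
                live = [p for p in self._tiers[self._active] if p not in self._dead]
                if live:
                    proxy = live[self._cursor % len(live)]
                    self._cursor += 1
                    return proxy
                self._escalate()  # every proxy in this tier is dead
            return None  # all tiers exhausted

    def record_failure(self, proxy_url=None):
        with self._lock:
            exhausted = self._active >= len(self._tiers)
            if proxy_url is not None:
                count = self._proxy_failures.get(proxy_url, 0) + 1
                self._proxy_failures[proxy_url] = count
                if count >= self._proxy_failure_limit:
                    self._dead.add(proxy_url)
                # Stale report: the proxy belongs to a tier we already left,
                # so it must not count against the current tier's breaker.
                if exhausted or proxy_url not in self._tiers[self._active]:
                    return
            elif exhausted:
                return
            self._tier_failures += 1
            if self._tier_failures >= self._tier_threshold:
                self._escalate()

    def record_success(self, proxy_url=None):
        with self._lock:
            self._tier_failures = 0
            if proxy_url is not None:
                self._proxy_failures[proxy_url] = 0  # assumption: success resets the count

    def dead_proxy_count(self):
        with self._lock:
            return len(self._dead)
```

Calling `record_failure()` without a URL still feeds the tier-level counter, which is one way to keep older callers working unchanged.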
### Added
- **Visual upgrades for longform articles** — 4 reusable CSS article components added to `input.css` and applied across 6 cornerstone articles (EN + DE):
  - `article-timeline`: horizontal numbered phase diagram with connecting lines; collapses to vertical stack on mobile. Replaces ASCII art code blocks in build guide articles.
  - `article-callout` (warning/tip/info variants): left-bordered callout box with icon, title, and body. Replaces `>` blockquotes and bold-text warnings in build and risk guides.
  - `article-cards`: 2-column card grid with colored accent bars (success/failure/neutral/established/growth/emerging). Replaces sequential bold-text pattern paragraphs in build, risk, and location guides.
  - `severity` pills: inline colored badge for High/Medium-High/Medium/Low-Medium/Low. Applied to risk overview tables in both risk guide articles.
  - Articles updated: `padel-hall-build-guide-en`, `padel-halle-bauen-de`, `padel-hall-investment-risks-en`, `padel-halle-risiken-de`, `padel-hall-location-guide-en`, `padel-standort-analyse-de`
- **Pipeline Transform tab + live extraction status** — new "Transform" tab in the pipeline admin with status cards for SQLMesh transform and export-serving tasks, a "Run Full Pipeline" button, and a recent run history table. The Overview tab now auto-polls every 5 s while an extraction task is pending and stops automatically when quiet. Per-extractor "Run" buttons use HTMX in-place updates instead of redirects. The header "Run Pipeline" button now enqueues the full ELT pipeline (extract → transform → export) instead of extraction only. Three new worker task handlers: `run_transform` (sqlmesh plan prod --auto-apply, 2 h timeout), `run_export` (export_serving.py, 10 min timeout), `run_pipeline` (sequential, stops on first failure). Concurrency guard prevents double-enqueuing the same step.
  - `web/src/padelnomics/worker.py`: `handle_run_transform`, `handle_run_export`, `handle_run_pipeline`
  - `web/src/padelnomics/admin/pipeline_routes.py`: `_render_overview_partial()`, `_fetch_pipeline_tasks()`, `_format_duration()`, `pipeline_transform()`, `pipeline_trigger_transform()`; `pipeline_trigger_extract()` now HTMX-aware
  - `web/src/padelnomics/admin/templates/admin/pipeline.html`: pulse animation on `.status-dot.running`, Transform tab button, rewired header button
  - `web/src/padelnomics/admin/templates/admin/partials/pipeline_overview.html`: self-polling wrapper, HTMX Run buttons
  - `web/src/padelnomics/admin/templates/admin/partials/pipeline_transform.html`: new file
- **Affiliate programs management** — centralised retailer config (`affiliate_programs` table) with URL template + tracking tag + commission %. Products now use a program dropdown + product identifier (e.g. ASIN) instead of manually baking full URLs. URL is assembled at redirect time via `build_affiliate_url()`, so changing a tag propagates instantly to all products. Legacy products (baked `affiliate_url`) continue to work via fallback. Amazon OneLink configured in the Associates dashboard handles geo-redirect to local marketplaces — no per-country programs needed.
  - `web/src/padelnomics/migrations/versions/0027_affiliate_programs.py`: `affiliate_programs` table, nullable `program_id` + `product_identifier` columns on `affiliate_products`, seeds "Amazon" program, backfills ASINs from existing URLs
  - `web/src/padelnomics/affiliate.py`: `get_all_programs()`, `get_program()`, `get_program_by_slug()`, `build_affiliate_url()`; `get_product()` JOINs program for redirect assembly; `_parse_product()` extracts `_program` sub-dict
@@ -17,6 +47,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
  - 15 new tests in `web/tests/test_affiliate.py` (41 total)
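Assembling the URL at redirect time, with a fallback for legacy products, might look like the sketch below. The function name `build_affiliate_url_sketch`, the `url_template` and `tracking_tag` keys, and the `{product_id}` / `{tag}` placeholders are assumptions for illustration; only `_program`, `product_identifier`, and `affiliate_url` are named in the changelog.

```python
from urllib.parse import quote


def build_affiliate_url_sketch(product: dict) -> str:
    """Assemble a redirect URL from the program template, else use the baked URL."""
    program = product.get("_program")
    identifier = product.get("product_identifier")
    if program and identifier:
        # Hypothetical template shape: "https://shop.example/dp/{product_id}?tag={tag}"
        return program["url_template"].format(
            product_id=quote(identifier, safe=""),
            tag=quote(program["tracking_tag"], safe=""),
        )
    # Legacy product: the full URL was stored on the product itself.
    return product["affiliate_url"]


example_program = {
    "url_template": "https://shop.example/dp/{product_id}?tag={tag}",
    "tracking_tag": "example-21",
}
example_product = {"_program": example_program, "product_identifier": "B000000000"}
legacy_product = {"affiliate_url": "https://shop.example/legacy?tag=old-21"}
```

Because the tag lives on the program rather than on each product, changing `tracking_tag` in one place changes every URL assembled afterwards.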
### Fixed
- **Data Platform admin view showing stale/zero row counts** — Docker web containers were mounting `/opt/padelnomics/data` (stale copy) instead of `/data/padelnomics` (live supervisor output). Fixed volume mount in all 6 containers (blue/green × app/worker/scheduler) and added `LANDING_DIR=/app/data/pipeline/landing` so extraction stats and landing zone file stats are visible to the web app.
- **`workflows.toml` never found in dev** — `_REPO_ROOT` in `pipeline_routes.py` used `parents[5]` (one level too far up) instead of `parents[4]`. Workflow schedules now display correctly on the pipeline overview tab in dev.
- **Article preview frontmatter bug** — `_rebuild_article()` in `admin/routes.py` now strips YAML frontmatter before passing markdown to `mistune.html()`, preventing raw `title:`, `slug:` etc. from appearing as visible text in article previews.
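Two of the fixes above are easy to illustrate in isolation. The sketch below assumes frontmatter is a leading block delimited by `---` lines and uses a hypothetical checkout path `/repo`; `strip_frontmatter` is an illustrative helper, not the code in `_rebuild_article()`.

```python
import re
from pathlib import PurePosixPath

# Matches a leading '---' ... '---' block only at the very start of the text.
_FRONTMATTER_RE = re.compile(
    r"\A---[ \t]*\n(?:.*?\n)?---[ \t]*(?:\n|\Z)", re.DOTALL
)


def strip_frontmatter(markdown_text: str) -> str:
    """Remove a leading YAML frontmatter block; leave other text untouched."""
    return _FRONTMATTER_RE.sub("", markdown_text, count=1)


# Path depth: parents[0] is the containing directory, so for a module at
# web/src/padelnomics/admin/pipeline_routes.py the repo root is parents[4].
module_path = PurePosixPath("/repo/web/src/padelnomics/admin/pipeline_routes.py")
repo_root = module_path.parents[4]  # /repo
too_far = module_path.parents[5]    # / (one level above the repo)
```

Anchoring the pattern at the start of the text means a `---` horizontal rule later in the article body is not mistaken for frontmatter.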
### Added


@@ -17,15 +17,48 @@ This guide walks through all five phases and 23 steps between your initial marke
## The 5 Phases at a Glance
```
Phase 1          Phase 2          Phase 3          Phase 4          Phase 5
Feasibility  →   Planning &   →   Construction →   Pre-         →   Operations &
& Concept        Design           / Conversion     Opening          Optimization
Month 1–3        Month 3–6        Month 6–12       Month 10–13      Ongoing
Steps 1–5        Steps 6–11       Steps 12–16      Steps 17–20      Steps 21–23
```
<div class="article-timeline">
<div class="article-timeline__phase">
<div class="article-timeline__num">1</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Feasibility &amp; Concept</div>
<div class="article-timeline__subtitle">Market research, concept, site scouting</div>
<div class="article-timeline__meta">Month 1–3 · Steps 1–5</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">2</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Planning &amp; Design</div>
<div class="article-timeline__subtitle">Architect, permits, financing</div>
<div class="article-timeline__meta">Month 3–6 · Steps 6–11</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">3</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Construction</div>
<div class="article-timeline__subtitle">Build, courts, IT systems</div>
<div class="article-timeline__meta">Month 6–12 · Steps 12–16</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">4</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Pre-Opening</div>
<div class="article-timeline__subtitle">Hiring, marketing, soft launch</div>
<div class="article-timeline__meta">Month 10–13 · Steps 17–20</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">5</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Operations</div>
<div class="article-timeline__subtitle">Revenue streams, optimization</div>
<div class="article-timeline__meta">Ongoing · Steps 21–23</div>
</div>
</div>
</div>
---
@@ -105,7 +138,12 @@ Deliverables from this phase:
- **MEP design (mechanical, electrical, plumbing):** Heating, ventilation, air conditioning, electrical, drainage — typically the most expensive trade package in a sports hall conversion
- **Fire safety strategy**
> **The most expensive planning mistake in padel hall builds:** underestimating HVAC complexity and budget. Large indoor courts need precise temperature and humidity control — not just for player comfort, but for playing surface longevity and air quality. Courts installed in a poorly climate-controlled building will degrade faster and generate complaints. Budget for it properly from the start, not as a value-engineering target.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">The most expensive planning mistake in padel hall builds</span>
<p>Underestimating HVAC complexity and budget. Large indoor courts need precise temperature and humidity control — not just for player comfort, but for playing surface longevity and air quality. Courts installed in a poorly climate-controlled building will degrade faster and generate complaints. Budget for it properly from the start, not as a value-engineering target.</p>
</div>
</div>
### Step 8: Court Supplier Selection
@@ -160,7 +198,12 @@ Courts are installed after the building envelope is weathertight. This is a hard
Glass panels, artificial turf, and court metalwork must not be exposed to construction dust, moisture, and site traffic. Projects that try to accelerate schedules by installing courts before the building is properly enclosed regularly end up with surface contamination, glass damage, and voided manufacturer warranties.
> **The most common construction mistake on padel hall projects:** rushing court installation sequencing under schedule pressure. The pressure to hit an opening date is real — but installing courts into an unenclosed building is one of the most reliable ways to add cost and delay, not reduce them. Hold the sequence.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">The most common construction mistake on padel hall projects</span>
<p>Rushing court installation sequencing under schedule pressure. The pressure to hit an opening date is real — but installing courts into an unenclosed building is one of the most reliable ways to add cost and delay, not reduce them. Hold the sequence.</p>
</div>
</div>
Allow two to four weeks for court installation per batch, depending on the manufacturer's crew capacity. Build this explicitly into your master program.
@@ -174,7 +217,12 @@ Decide early: which booking platform, which point-of-sale system, and whether yo
Access control systems must be coordinated with the electrical design. Adding them in the final stages of construction is possible but costs more.
> **The most common pre-opening mistake:** the booking system isn't fully configured, tested, and working on day one. A broken booking flow, failed test payments, or a QR code that leads to an error page on opening day kills your launch momentum in a way that's difficult to recover from. Test the system end-to-end — including real bookings, real payments, and real cancellations — two to four weeks before opening.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">The most common pre-opening mistake</span>
<p>The booking system isn't fully configured, tested, and working on day one. A broken booking flow, failed test payments, or a QR code that leads to an error page on opening day kills your launch momentum in a way that's difficult to recover from. Test the system end-to-end — including real bookings, real payments, and real cancellations — two to four weeks before opening.</p>
</div>
</div>
### Step 16: Inspections and Certifications
@@ -248,13 +296,36 @@ Court bookings are your core revenue, but rarely your only opportunity:
Patterns emerge when you observe padel hall projects across a market over time.
**Projects that go over budget** almost always cut at the wrong place early — too little HVAC budget, no construction contingency, a cheap general contractor without adequate contractual protection. The savings on the way in become much larger costs on the way out.
**Projects that slip their schedule** consistently underestimate the regulatory process. Permits, noise assessments, and change-of-use applications take time that money cannot buy once you've started too late. Start conversations with authorities before you need the approvals, not when you need them.
**Projects that open weakly** started marketing too late and tested the booking system too late. An empty calendar on day one and a broken booking page create impressions that stick longer than the opening week.
**Projects that succeed long-term** treat all three phases — planning, build, and opening — with equal rigor, and invest early and consistently in community and repeat customers.
<div class="article-cards">
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that go over budget</span>
<p class="article-card__body">Almost always cut at the wrong place early — too little HVAC budget, no construction contingency, a cheap general contractor without adequate contractual protection. The savings on the way in become much larger costs on the way out.</p>
</div>
</div>
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that slip their schedule</span>
<p class="article-card__body">Consistently underestimate the regulatory process. Permits, noise assessments, and change-of-use applications take time that money cannot buy once you've started too late. Start conversations with authorities before you need the approvals.</p>
</div>
</div>
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that open weakly</span>
<p class="article-card__body">Started marketing too late and tested the booking system too late. An empty calendar on day one and a broken booking page create impressions that stick longer than the opening week.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that succeed long-term</span>
<p class="article-card__body">Treat all three phases — planning, build, and opening — with equal rigor, and invest early and consistently in community and repeat customers.</p>
</div>
</div>
</div>
Building a padel hall is complex, but it is a solved problem. The failures are nearly always the same failures. So are the successes.


@@ -21,20 +21,20 @@ This article covers the 14 risks that don't get enough airtime in investor discu
| # | Risk | Category | Severity |
|---|------|----------|----------|
| 1 | Trend / fad risk | Strategic | High |
| 2 | Construction cost overruns | Construction & Development | High |
| 3 | Construction delays | Construction & Development | High |
| 4 | Landlord risk: sale, insolvency, non-renewal | Property & Lease | High |
| 5 | New competitor in your catchment | Competition | Medium–High |
| 6 | Key-person dependency | Operations | Medium |
| 7 | Staff retention and wage pressure | Operations | Medium |
| 8 | Court surface and maintenance cycles | Operations | Medium |
| 9 | Energy price volatility | Financial | Medium |
| 10 | Interest rate risk | Financial | Medium |
| 11 | Personal guarantee exposure | Financial | High |
| 12 | Customer concentration | Financial | Medium |
| 13 | Noise complaints and regulatory restrictions | Regulatory & Legal | Medium |
| 14 | Booking platform dependency | Regulatory & Legal | Low–Medium |
| 1 | Trend / fad risk | Strategic | <span class="severity severity--high">High</span> |
| 2 | Construction cost overruns | Construction & Development | <span class="severity severity--high">High</span> |
| 3 | Construction delays | Construction & Development | <span class="severity severity--high">High</span> |
| 4 | Landlord risk: sale, insolvency, non-renewal | Property & Lease | <span class="severity severity--high">High</span> |
| 5 | New competitor in your catchment | Competition | <span class="severity severity--medium-high">Medium–High</span> |
| 6 | Key-person dependency | Operations | <span class="severity severity--medium">Medium</span> |
| 7 | Staff retention and wage pressure | Operations | <span class="severity severity--medium">Medium</span> |
| 8 | Court surface and maintenance cycles | Operations | <span class="severity severity--medium">Medium</span> |
| 9 | Energy price volatility | Financial | <span class="severity severity--medium">Medium</span> |
| 10 | Interest rate risk | Financial | <span class="severity severity--medium">Medium</span> |
| 11 | Personal guarantee exposure | Financial | <span class="severity severity--high">High</span> |
| 12 | Customer concentration | Financial | <span class="severity severity--medium">Medium</span> |
| 13 | Noise complaints and regulatory restrictions | Regulatory & Legal | <span class="severity severity--medium">Medium</span> |
| 14 | Booking platform dependency | Regulatory & Legal | <span class="severity severity--low-medium">Low–Medium</span> |
---
@@ -137,9 +137,12 @@ Your costs will increase three to five percent per year. Whether you can pass th
## The Risk No One Talks About: Personal Guarantees
**This section gets skipped in almost every padel hall investment conversation. That's a serious mistake.**
Banks financing a single-asset leisure facility without corporate backing will almost universally require personal guarantees from the principal shareholders. Not as an unusual request — as standard terms for this type of deal.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">This section gets skipped in almost every padel hall investment conversation. That's a serious mistake.</span>
<p>Banks financing a single-asset leisure facility without corporate backing will almost universally require personal guarantees from the principal shareholders. Not as an unusual request — as standard terms for this type of deal.</p>
</div>
</div>
Here is what that means in practice:
@@ -180,13 +183,36 @@ Building a parallel booking capability — even a simple direct booking option
The investors who succeed long-term in padel aren't the ones who found a risk-free opportunity. There isn't one. They're the ones who went in with their eyes open.
**They modeled the bad scenarios before assuming the good ones.** A business plan that shows only the base case isn't a planning tool — it's wishful thinking. Explicit downside modeling — 40% utilization, six-month delay, new competitor in year three — is the baseline, not an optional exercise.
**They built structural buffers into the plan.** Liquid reserves covering at least six months of fixed costs. Construction contingency treated as a budget line, not a hedge. These aren't comfort margins; they're operational requirements.
**They got the contractual foundations right from the start.** Lease terms. Financing conditions. Guarantee scope. The cost of good legal and financial advice at the planning stage is trivial relative to the downside exposure it addresses.
**They planned for competition.** Not by hoping it wouldn't come, but by building a product — community, quality, service — that gives existing customers a reason to stay when someone cheaper opens nearby.
<div class="article-cards">
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Model the bad scenarios first</span>
<p class="article-card__body">A business plan showing only the base case isn't a planning tool — it's wishful thinking. Explicit downside modeling — 40% utilization, six-month delay, new competitor in year three — is the baseline, not an optional exercise.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Build structural buffers in</span>
<p class="article-card__body">Liquid reserves covering at least six months of fixed costs. Construction contingency treated as a budget line, not a hedge. These aren't comfort margins; they're operational requirements.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Get the contractual foundations right</span>
<p class="article-card__body">Lease terms. Financing conditions. Guarantee scope. The cost of good legal and financial advice at the planning stage is trivial relative to the downside exposure it addresses.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Plan for competition</span>
<p class="article-card__body">Not by hoping it won't come, but by building a product — community, quality, service — that gives existing customers a reason to stay when someone cheaper opens nearby.</p>
</div>
</div>
</div>
---


@@ -148,11 +148,29 @@ The matrix also reveals where trade-offs are being made explicitly, which makes
The 8 criteria above evaluate specific sites. But before shortlisting sites, it is worth stepping back to read the stage of the overall market — because the right operational strategy differs fundamentally depending on where a city sits in its padel development cycle.
**Established markets**: Booking platforms show consistent peak-hour sell-out across most venues. Waiting lists are common. Demand is validated beyond doubt. The challenge here is elevated rent, elevated build costs, and entrenched operators who have already captured community loyalty. New entrants need a genuine differentiation angle — a superior facility specification, a better location within the city, or an F&B and coaching product that existing venues don't offer. Entry costs are high; returns, if execution is strong, are also high. Munich is the canonical German example.
**Growth markets**: Demand is clearly building — booking availability tightens at weekends, new facilities are announced regularly, and the sport is gaining local media visibility. Supply hasn't caught up, so identifiable gaps still exist in specific districts or the surrounding hinterland. The risk profile is lower than in emerging markets, but the window for securing good real estate at reasonable rent is narrowing. The premium for moving decisively goes to those who arrive before the obvious sites are taken.
**Emerging markets**: Limited current supply, a small but growing player base, and padel not yet mainstream enough to generate organic walk-in demand. Entry costs — rent especially — are lower. The constraint is that demand must be actively created rather than captured. Operators who succeed here invest in community: beginner programmes, local leagues, school partnerships, conversions from tennis clubs. The time to first profitability is longer, but the competitive position built in the first two years is often decisive for the long term.
<div class="article-cards">
<div class="article-card article-card--established">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Established markets</span>
<p class="article-card__body">Booking platforms show consistent peak-hour sell-out. Demand is validated. The challenge: elevated rent, high build costs, entrenched operators. New entrants need a genuine differentiation angle — superior spec, better location, or F&B and coaching that existing venues don't offer. Entry costs are high; returns, if execution is strong, are also high. Munich is the canonical German example.</p>
</div>
</div>
<div class="article-card article-card--growth">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Growth markets</span>
<p class="article-card__body">Demand is clearly building — booking availability tightens at weekends, new facilities are announced regularly. Supply hasn't caught up; identifiable gaps still exist. The risk profile is lower, but the window for securing good real estate at reasonable rent is narrowing. The premium goes to those who arrive before the obvious sites are taken.</p>
</div>
</div>
<div class="article-card article-card--emerging">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Emerging markets</span>
<p class="article-card__body">Limited supply, a small but growing player base, padel not yet mainstream. Entry costs — rent especially — are lower. The constraint: demand must be actively created rather than captured. Operators who succeed invest in community: beginner programmes, local leagues, school partnerships. Time to profitability is longer, but the competitive position built in the first two years is often decisive.</p>
</div>
</div>
</div>
Before committing to a site search in any city, calibrate where it sits on this spectrum. The 8-criteria framework then tells you whether a specific site works; market maturity tells you what kind of operator and strategy is required to make it work at all.


@@ -17,15 +17,48 @@ Dieser Leitfaden zeigt Ihnen alle 5 Phasen und 23 Schritte, die zwischen Ihrer e
## The 5 Phases at a Glance
```
Phase 1          Phase 2          Phase 3          Phase 4          Phase 5
Feasibility  →   Planning &   →   Construction →   Pre-         →   Operations &
& Concept        Design           / Conversion     Opening          Optimization
Month 1–3        Month 3–6        Month 6–12       Month 10–13      Ongoing
Steps 1–5        Steps 6–11       Steps 12–16      Steps 17–20      Steps 21–23
```
<div class="article-timeline">
<div class="article-timeline__phase">
<div class="article-timeline__num">1</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Feasibility &amp; Concept</div>
<div class="article-timeline__subtitle">Market analysis, concept, site search</div>
<div class="article-timeline__meta">Month 1–3 · Steps 1–5</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">2</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Planning &amp; Design</div>
<div class="article-timeline__subtitle">Architect, permits, financing</div>
<div class="article-timeline__meta">Month 3–6 · Steps 6–11</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">3</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Construction / Conversion</div>
<div class="article-timeline__subtitle">Shell construction, courts, IT systems</div>
<div class="article-timeline__meta">Month 6–12 · Steps 12–16</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">4</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Pre-Opening</div>
<div class="article-timeline__subtitle">Staffing, marketing, soft launch</div>
<div class="article-timeline__meta">Month 10–13 · Steps 17–20</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">5</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Operations &amp; Optimization</div>
<div class="article-timeline__subtitle">Revenue, community, optimization</div>
<div class="article-timeline__meta">Ongoing · Steps 21–23</div>
</div>
</div>
</div>
---
@@ -104,7 +137,12 @@ Was in dieser Phase entsteht:
- MEP design (building services): heating, ventilation, air conditioning, electrical, plumbing. In sports halls these are often the most cost-intensive trades
- Fire safety concept
**Common mistake in this phase:** Building services are underestimated. A large indoor hall needs precise temperature and humidity control: for playing quality, for the longevity of the surface, and for player comfort. A poor HVAC system is a permanent construction site.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">Common mistake in this phase</span>
<p>Building services are underestimated. A large indoor hall needs precise temperature and humidity control: for playing quality, for the longevity of the surface, and for player comfort. A poor HVAC system is a permanent construction site.</p>
</div>
</div>
### Step 8: Court Supplier Selection
@@ -155,7 +193,12 @@ Verhandeln Sie Festpreise, wo möglich. Lesen Sie die Risikoverteilung in den Ve
Courts are installed after the building envelope is complete. This is a hard sequence, not a recommendation. Glass elements must not be exposed to moisture, dust, and site traffic before the building is sealed.
**A common and avoidable mistake:** Projects under time pressure try to bring court installation forward. The result is damaged surfaces, glass damage, contamination of the playing surface, and warranty problems with the manufacturer. Hold the sequence, without exception.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">A common and avoidable mistake</span>
<p>Projects under time pressure try to bring court installation forward. The result is damaged surfaces, glass damage, contamination of the playing surface, and warranty problems with the manufacturer. Hold the sequence, without exception.</p>
</div>
</div>
Court installation takes two to four weeks per batch, depending on the manufacturer and its parallel crew capacity. Build this into the overall schedule.
@@ -169,7 +212,12 @@ Frühzeitig entscheiden: Playtomic, Matchi, ein anderes System oder eine Hybridl
Access control (if desired) must be coordinated with the electrical design. Adding it in the final construction phase costs extra.
**The most common mistake shortly before opening:** On opening day the booking system is not yet properly configured, test payments fail, and the QR code at the entrance leads to an error page. Opening buzz is a one-time asset. Test the system fully two to four weeks beforehand, including real bookings, real payments, and real cancellations.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">The most common mistake shortly before opening</span>
<p>On opening day the booking system is not yet properly configured, test payments fail, and the QR code at the entrance leads to an error page. Opening buzz is a one-time asset. Test the system fully two to four weeks beforehand, including real bookings, real payments, and real cancellations.</p>
</div>
</div>
### Step 16: Inspections and Certifications
@@ -243,13 +291,36 @@ Die Court-Buchung ist Ihr Kernangebot — aber nicht die einzige Einnahmequelle:
Anyone who observes dozens of padel hall projects across Europe sees patterns on both sides:
**Projects that go over budget** almost always cut in the wrong place early: too little building-services budget, no construction contingency, an overly cheap general contractor without adequate contractual protection.
**Projects that slip their schedule** underestimated the regulatory processes. Permits, noise assessments, and change-of-use applications take time, and that time cannot be bought once you have started too late.
**Projects that open weakly** started marketing too late and tested the booking system too late. An empty calendar on opening day and a broken booking page create impressions that stick.
**Projects that succeed long-term** treated all three phases (planning, build, opening) with the same care and invested early in community and regular customers.
<div class="article-cards">
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that go over budget</span>
<p class="article-card__body">Almost always cut in the wrong place early: too little building-services budget, no construction contingency, an overly cheap general contractor without adequate contractual protection.</p>
</div>
</div>
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that slip their schedule</span>
<p class="article-card__body">Underestimated the regulatory processes. Permits, noise assessments, and change-of-use applications take time, and that time cannot be bought once you have started too late.</p>
</div>
</div>
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that open weakly</span>
<p class="article-card__body">Started marketing too late and tested the booking system too late. An empty calendar on opening day and a broken booking page create impressions that stick.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that succeed long-term</span>
<p class="article-card__body">Treat all three phases (planning, build, opening) with the same care and invest early in community and regular customers.</p>
</div>
</div>
</div>
Building a padel hall is complex, but it is not an unsolved problem. The mistakes that make projects fail are almost always the same. So are the decisions that make them succeed.


@@ -21,20 +21,20 @@ Dieser Artikel zeigt Ihnen die 14 Risiken, über die in Investorenrunden zu weni
| # | Risk | Category | Severity |
|---|------|----------|----------|
| 1 | Trend or fad | Strategic | High |
| 2 | Construction cost overruns | Construction & Development | High |
| 3 | Delays during construction | Construction & Development | High |
| 4 | Landlord problem: sale, insolvency, no renewal | Property & Lease | High |
| 5 | New competition in the catchment area | Competition | Medium-High |
| 6 | Key-person dependency | Operations | Medium |
| 7 | Skilled-labor shortage and wage pressure | Operations | Medium |
| 8 | Maintenance cycles for surface, glass, artificial turf | Operations | Medium |
| 9 | Energy price volatility | Finance | Medium |
| 10 | Interest rate risk | Finance | Medium |
| 11 | Personal guarantee | Finance | High |
| 12 | Customer concentration | Finance | Medium |
| 13 | Noise complaints and regulatory requirements | Regulatory & Legal | Medium |
| 14 | Booking platform dependency | Regulatory & Legal | Low-Medium |
| 1 | Trend or fad | Strategic | <span class="severity severity--high">High</span> |
| 2 | Construction cost overruns | Construction & Development | <span class="severity severity--high">High</span> |
| 3 | Delays during construction | Construction & Development | <span class="severity severity--high">High</span> |
| 4 | Landlord problem: sale, insolvency, no renewal | Property & Lease | <span class="severity severity--high">High</span> |
| 5 | New competition in the catchment area | Competition | <span class="severity severity--medium-high">Medium-High</span> |
| 6 | Key-person dependency | Operations | <span class="severity severity--medium">Medium</span> |
| 7 | Skilled-labor shortage and wage pressure | Operations | <span class="severity severity--medium">Medium</span> |
| 8 | Maintenance cycles for surface, glass, artificial turf | Operations | <span class="severity severity--medium">Medium</span> |
| 9 | Energy price volatility | Finance | <span class="severity severity--medium">Medium</span> |
| 10 | Interest rate risk | Finance | <span class="severity severity--medium">Medium</span> |
| 11 | Personal guarantee | Finance | <span class="severity severity--high">High</span> |
| 12 | Customer concentration | Finance | <span class="severity severity--medium">Medium</span> |
| 13 | Noise complaints and regulatory requirements | Regulatory & Legal | <span class="severity severity--medium">Medium</span> |
| 14 | Booking platform dependency | Regulatory & Legal | <span class="severity severity--low-medium">Low-Medium</span> |
---
@@ -133,9 +133,14 @@ Ihre Kosten steigen jedes Jahr um drei bis fünf Prozent. Können Sie diese Stei
## Special Box: Personal Guarantee, the Underestimated Risk No. 1
**This topic is left out of almost every conversation about padel hall investments. That is a mistake.**
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">This topic is left out of almost every conversation about padel hall investments. That is a mistake.</span>
<p>Banks that lend to a standalone facility without group backing almost always require, in practice, a personal guarantee from the main shareholder or shareholders.</p>
</div>
</div>
Banks that lend to a standalone facility without group backing almost always require, in practice, a personal guarantee from the main shareholder or shareholders. That means: if the company runs into payment difficulties, it is not the GmbH alone that is liable; you are personally liable. With your home. With your savings. With your securities portfolio.
That means: if the company runs into payment difficulties, it is not the GmbH alone that is liable; you are personally liable. With your home. With your savings. With your securities portfolio.
The structure then typically looks like this:
@@ -176,13 +181,36 @@ Mittel- bis langfristig sollten Sie eine eigene Buchungsfähigkeit aufbauen —
No one can eliminate every risk. But the investors who succeed in the long run do the following:
**They work through the bad scenarios before assuming the good one.** A business plan that shows only the base case is not a tool; it is wishful thinking. Work through it explicitly: what happens at 40 percent utilization? With a six-month construction delay? With a new competitor in year three?
**They build in buffers, not as a comfort cushion but as an operational necessity.** Liquid reserves of at least six months of fixed costs are not a luxury.
**They secure leases and financing terms carefully from the start.** The cost of good legal and financial advice is negligible compared with the downside.
**They plan for competition.** Not by hoping for no competition, but by building a product that retains regular customers through quality, community, and service.
<div class="article-cards">
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Work through the bad scenarios first</span>
<p class="article-card__body">A business plan that shows only the base case is not a tool; it is wishful thinking. What happens at 40 percent utilization? With a six-month construction delay? With a new competitor in year three?</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Buffers as an operational necessity</span>
<p class="article-card__body">Liquid reserves of at least six months of fixed costs are not a luxury but a requirement. The construction cost contingency is a budget line, not an optional cushion.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Secure contracts from the start</span>
<p class="article-card__body">Lease, financing terms, scope of the guarantee. The cost of good legal and financial advice in the planning phase is negligible compared with the downside.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Plan for competition</span>
<p class="article-card__body">Not by hoping for no competition, but by building a product that retains regular customers through quality, community, and service.</p>
</div>
</div>
</div>
---


@@ -138,11 +138,29 @@ Das Ergebnis ist ein Gesamtscore pro Standort, der einen strukturierten Vergleic
The eight criteria above evaluate specific properties. Before you start searching for a property, however, it pays to take a step back: what development phase is the market in your target city in? The answer determines which operator strategy has any prospect of success at all.
**Established markets**: Booking platforms show consistently full utilization at peak times, waiting lists are common, and demand is proven beyond any doubt. The challenge no longer lies in demand; it lies in competition. Established operators have built brand loyalty, affordable sites were taken long ago, and construction and rental costs reflect the demand situation. Anyone entering such a market needs a genuine differentiator: a better location within the city, a superior hall profile, or a food-and-beverage and coaching offering that the existing facilities lack. The entry investment is high, but so is the earnings potential with consistent execution. Munich is the paradigmatic example for Germany.
**Growth markets**: Demand is visibly growing: booking slots fill up on weekends, new facilities open regularly, and the sport is reaching local media. Supply has not yet fully caught up with demand; in certain districts or in the surrounding area, gaps in coverage are visible. The risk profile is lower than in early markets, but the window for attractive sites on reasonable terms is closing. Those who wait until the market is obviously attractive pay a premium for that knowledge, in the form of higher rents, less choice, and more competition at entry.
**Early markets**: Little current supply, a small but growing player base, and a sport that is not yet widely known. The conditions for low-cost market entry are in place, but demand has to be actively built, not skimmed. Rents are lower and the choice of sites is larger. The limiting factor is patience and marketing capability: beginner courses, club partnerships, local leagues, and the conversion of existing tennis clubs are the instruments operators in early markets use to build community and, with it, utilization. The road to first profitability is longer, but the competitive position built in the first two years of operation often proves structurally durable.
<div class="article-cards">
<div class="article-card article-card--established">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Established markets</span>
<p class="article-card__body">Booking platforms show consistently full utilization at peak times, and waiting lists are common. The challenge lies in competition: established operators have built brand loyalty, and affordable sites are taken. New entrants need a genuine differentiator. The entry investment is high, and so is the earnings potential with consistent execution. Munich is the paradigmatic example.</p>
</div>
</div>
<div class="article-card article-card--growth">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Growth markets</span>
<p class="article-card__body">Demand is visibly growing: booking slots fill up and new facilities open. Supply has not yet caught up with demand, and gaps in coverage are visible. The window for attractive sites on reasonable terms is closing. Those who wait pay the premium of an obviously attractive market.</p>
</div>
</div>
<div class="article-card article-card--emerging">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Early markets</span>
<p class="article-card__body">Little supply, a small but growing player base. Rents are lower and the choice of sites is larger, but demand has to be actively built. Beginner courses, club partnerships, local leagues, and the conversion of tennis clubs are the central instruments. The road to profitability is longer; the competitive position built along the way often proves durable.</p>
</div>
</div>
</div>
Before you search for specific properties in a city, you should classify its market maturity. The criteria catalog shows whether a particular property is suitable; market maturity shows which operator profile and which strategy are prerequisites for success in the first place.


@@ -60,9 +60,10 @@ services:
environment:
- DATABASE_PATH=/app/data/app.db
- SERVING_DUCKDB_PATH=/app/data/pipeline/analytics.duckdb
- LANDING_DIR=/app/data/pipeline/landing
volumes:
- app-data:/app/data
- /opt/padelnomics/data:/app/data/pipeline:ro
- /data/padelnomics:/app/data/pipeline:ro
networks:
- net
healthcheck:
@@ -82,9 +83,10 @@ services:
environment:
- DATABASE_PATH=/app/data/app.db
- SERVING_DUCKDB_PATH=/app/data/pipeline/analytics.duckdb
- LANDING_DIR=/app/data/pipeline/landing
volumes:
- app-data:/app/data
- /opt/padelnomics/data:/app/data/pipeline:ro
- /data/padelnomics:/app/data/pipeline:ro
networks:
- net
@@ -98,9 +100,10 @@ services:
environment:
- DATABASE_PATH=/app/data/app.db
- SERVING_DUCKDB_PATH=/app/data/pipeline/analytics.duckdb
- LANDING_DIR=/app/data/pipeline/landing
volumes:
- app-data:/app/data
- /opt/padelnomics/data:/app/data/pipeline:ro
- /data/padelnomics:/app/data/pipeline:ro
networks:
- net
@@ -115,9 +118,10 @@ services:
environment:
- DATABASE_PATH=/app/data/app.db
- SERVING_DUCKDB_PATH=/app/data/pipeline/analytics.duckdb
- LANDING_DIR=/app/data/pipeline/landing
volumes:
- app-data:/app/data
- /opt/padelnomics/data:/app/data/pipeline:ro
- /data/padelnomics:/app/data/pipeline:ro
networks:
- net
healthcheck:
@@ -137,9 +141,10 @@ services:
environment:
- DATABASE_PATH=/app/data/app.db
- SERVING_DUCKDB_PATH=/app/data/pipeline/analytics.duckdb
- LANDING_DIR=/app/data/pipeline/landing
volumes:
- app-data:/app/data
- /opt/padelnomics/data:/app/data/pipeline:ro
- /data/padelnomics:/app/data/pipeline:ro
networks:
- net
@@ -153,9 +158,10 @@ services:
environment:
- DATABASE_PATH=/app/data/app.db
- SERVING_DUCKDB_PATH=/app/data/pipeline/analytics.duckdb
- LANDING_DIR=/app/data/pipeline/landing
volumes:
- app-data:/app/data
- /opt/padelnomics/data:/app/data/pipeline:ro
- /data/padelnomics:/app/data/pipeline:ro
networks:
- net


@@ -21,6 +21,7 @@ extract-census-usa = "padelnomics_extract.census_usa:main"
extract-census-usa-income = "padelnomics_extract.census_usa_income:main"
extract-ons-uk = "padelnomics_extract.ons_uk:main"
extract-geonames = "padelnomics_extract.geonames:main"
extract-gisco = "padelnomics_extract.gisco:main"
[build-system]
requires = ["hatchling"]


@@ -11,9 +11,12 @@ from datetime import UTC, datetime
from pathlib import Path
import niquests
from dotenv import load_dotenv
from .utils import end_run, open_state_db, start_run
load_dotenv()
LANDING_DIR = Path(os.environ.get("LANDING_DIR", "data/landing"))
HTTP_TIMEOUT_SECONDS = 30


@@ -7,7 +7,7 @@ A graphlib.TopologicalSorter schedules them: tasks with no unmet dependencies
run immediately in parallel; each completion may unlock new tasks.
Current dependency graph:
- All 8 non-availability extractors have no dependencies (run in parallel)
- All 9 non-availability extractors have no dependencies (run in parallel)
- playtomic_availability depends on playtomic_tenants (starts as soon as
tenants finishes, even if other extractors are still running)
"""
@@ -26,6 +26,8 @@ from .eurostat_city_labels import EXTRACTOR_NAME as EUROSTAT_CITY_LABELS_NAME
from .eurostat_city_labels import extract as extract_eurostat_city_labels
from .geonames import EXTRACTOR_NAME as GEONAMES_NAME
from .geonames import extract as extract_geonames
from .gisco import EXTRACTOR_NAME as GISCO_NAME
from .gisco import extract as extract_gisco
from .ons_uk import EXTRACTOR_NAME as ONS_UK_NAME
from .ons_uk import extract as extract_ons_uk
from .overpass import EXTRACTOR_NAME as OVERPASS_NAME
@@ -50,6 +52,7 @@ EXTRACTORS: dict[str, tuple] = {
CENSUS_USA_INCOME_NAME: (extract_census_usa_income, []),
ONS_UK_NAME: (extract_ons_uk, []),
GEONAMES_NAME: (extract_geonames, []),
GISCO_NAME: (extract_gisco, []),
TENANTS_NAME: (extract_tenants, []),
AVAILABILITY_NAME: (extract_availability, [TENANTS_NAME]),
}
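The scheduling described in the module docstring can be illustrated with the standard library alone. The snippet below is a simplified sketch, not the project's runner: it groups tasks into waves instead of unlocking each task the moment its dependency finishes, the function name `schedule_waves` is hypothetical, and the `geonames` key is an assumed extractor name used only for the example.

```python
from graphlib import TopologicalSorter


def schedule_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)  # maps task -> list of prerequisites
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking dependents
    return waves


# Hypothetical subset of the dependency graph (names assumed for illustration).
EXAMPLE_DEPS = {
    "geonames": [],
    "gisco": [],
    "playtomic_tenants": [],
    "playtomic_availability": ["playtomic_tenants"],
}
```

Calling `schedule_waves(EXAMPLE_DEPS)` puts the three independent extractors in the first wave and `playtomic_availability` in the second. A runner that wants availability to start as soon as tenants finishes would call `done()` per task from worker callbacks rather than per wave.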


@@ -0,0 +1,95 @@
"""GISCO NUTS-2 boundary GeoJSON extractor.
Downloads NUTS-2 boundary polygons from Eurostat GISCO. The file is stored
uncompressed because DuckDB's ST_Read cannot read gzipped files.
NUTS classification is revised every few years (this extractor pins the 2021
revision). The partition path is fixed (2024/01, when the dataset was first
ingested) rather than derived from the run date. Cursor tracking still uses
year_month to avoid re-downloading on every monthly run.
Landing: {LANDING_DIR}/gisco/2024/01/nuts2_boundaries.geojson (~5 MB, uncompressed)
"""
import sqlite3
from pathlib import Path
import niquests
from ._shared import HTTP_TIMEOUT_SECONDS, run_extractor, setup_logging
from .utils import get_last_cursor
logger = setup_logging("padelnomics.extract.gisco")
EXTRACTOR_NAME = "gisco"
# NUTS 2021 revision, 20M scale (1:20,000,000), WGS84 (EPSG:4326), LEVL_2 only.
# 20M resolution gives simplified polygons that are fast for point-in-polygon
# matching without sacrificing accuracy at the NUTS-2 boundary level.
GISCO_URL = (
"https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/"
"NUTS_RG_20M_2021_4326_LEVL_2.geojson"
)
# Fixed partition: NUTS boundaries are a static reference file, not time-series data.
# The 2024/01 partition reflects when this NUTS 2021 dataset was first ingested.
DEST_REL = Path("gisco/2024/01/nuts2_boundaries.geojson")
_GISCO_TIMEOUT_SECONDS = HTTP_TIMEOUT_SECONDS * 4 # ~5 MB; generous for slow upstreams
def extract(
landing_dir: Path,
year_month: str,
conn: sqlite3.Connection,
session: niquests.Session,
) -> dict:
"""Download NUTS-2 GeoJSON. Skips if already run this month or file exists."""
last_cursor = get_last_cursor(conn, EXTRACTOR_NAME)
if last_cursor == year_month:
logger.info("already ran for %s — skipping", year_month)
return {"files_written": 0, "files_skipped": 1, "bytes_written": 0}
dest = landing_dir / DEST_REL
if dest.exists():
logger.info("file already exists (skipping download): %s", dest)
return {
"files_written": 0,
"files_skipped": 1,
"bytes_written": 0,
"cursor_value": year_month,
}
dest.parent.mkdir(parents=True, exist_ok=True)
logger.info("GET %s", GISCO_URL)
resp = session.get(GISCO_URL, timeout=_GISCO_TIMEOUT_SECONDS)
resp.raise_for_status()
content = resp.content
assert len(content) > 100_000, (
f"GeoJSON too small ({len(content)} bytes) — download may have failed"
)
assert b'"FeatureCollection"' in content, "Response does not look like GeoJSON"
# Write uncompressed — ST_Read requires a plain file, not .gz
tmp = dest.with_suffix(".geojson.tmp")
tmp.write_bytes(content)
tmp.rename(dest)
size_mb = len(content) / 1_000_000
logger.info("written %s (%.1f MB)", dest, size_mb)
return {
"files_written": 1,
"files_skipped": 0,
"bytes_written": len(content),
"cursor_value": year_month,
}
def main() -> None:
run_extractor(EXTRACTOR_NAME, extract)
if __name__ == "__main__":
main()
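One caveat about the validation in this extractor: `assert` statements are stripped when Python runs with `-O`, so the size and content checks would silently disappear in an optimized run. A minimal sketch of the same checks with explicit exceptions, plus the temp-file-then-rename write, might look like the following. The helper names `validate_geojson` and `write_atomic` are hypothetical and not part of the module.

```python
from pathlib import Path

MIN_GEOJSON_BYTES = 100_000  # same lower bound as the assert in the diff


def validate_geojson(content: bytes, min_bytes: int = MIN_GEOJSON_BYTES) -> None:
    """Raise ValueError if the payload is too small or lacks a FeatureCollection."""
    if len(content) <= min_bytes:
        raise ValueError(f"GeoJSON too small ({len(content)} bytes)")
    if b'"FeatureCollection"' not in content:
        raise ValueError("response does not look like GeoJSON")


def write_atomic(dest: Path, content: bytes) -> int:
    """Write to a sibling .tmp file, then move it into place in one step."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_suffix(dest.suffix + ".tmp")
    tmp.write_bytes(content)
    tmp.replace(dest)  # overwrites dest if present; readers never see a partial file
    return len(content)
```

`Path.replace` is used here instead of `Path.rename` because it overwrites an existing destination on every platform; in the extractor above the destination is known not to exist at that point, so either call works there.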


@@ -213,9 +213,10 @@ def _fetch_venues_parallel(
completed_count = 0
lock = threading.Lock()
def _worker(tenant_id: str) -> dict | None:
def _worker(tenant_id: str) -> tuple[str | None, dict | None]:
proxy_url = cycler["next_proxy"]()
return _fetch_venue_availability(tenant_id, start_min_str, start_max_str, proxy_url)
result = _fetch_venue_availability(tenant_id, start_min_str, start_max_str, proxy_url)
return proxy_url, result
with ThreadPoolExecutor(max_workers=worker_count) as pool:
for batch_start in range(0, len(tenant_ids), PARALLEL_BATCH_SIZE):
@@ -231,17 +232,17 @@ def _fetch_venues_parallel(
batch_futures = {pool.submit(_worker, tid): tid for tid in batch}
for future in as_completed(batch_futures):
result = future.result()
proxy_url, result = future.result()
with lock:
completed_count += 1
if result is not None:
venues_data.append(result)
cycler["record_success"]()
cycler["record_success"](proxy_url)
if on_result is not None:
on_result(result)
else:
venues_errored += 1
cycler["record_failure"]()
cycler["record_failure"](proxy_url)
if completed_count % 500 == 0:
logger.info(
@@ -336,16 +337,17 @@ def extract(
else:
logger.info("Serial mode: 1 worker, %d venues", len(venues_to_process))
for i, tenant_id in enumerate(venues_to_process):
proxy_url = cycler["next_proxy"]()
result = _fetch_venue_availability(
tenant_id, start_min_str, start_max_str, cycler["next_proxy"](),
tenant_id, start_min_str, start_max_str, proxy_url,
)
if result is not None:
new_venues_data.append(result)
cycler["record_success"]()
cycler["record_success"](proxy_url)
_on_result(result)
else:
venues_errored += 1
cycler["record_failure"]()
cycler["record_failure"](proxy_url)
if cycler["is_exhausted"]():
logger.error("All proxy tiers exhausted — writing partial results")
break
@@ -432,8 +434,10 @@ def _find_venues_with_upcoming_slots(
if not start_time_str:
continue
try:
# Parse "2026-02-24T17:00:00" format
slot_start = datetime.fromisoformat(start_time_str).replace(tzinfo=UTC)
# start_time is "HH:MM:SS"; combine with resource's start_date
start_date = resource.get("start_date", "")
full_dt = f"{start_date}T{start_time_str}" if start_date else start_time_str
slot_start = datetime.fromisoformat(full_dt).replace(tzinfo=UTC)
if window_start <= slot_start < window_end:
tenant_ids.add(tid)
break # found one upcoming slot, no need to check more
@@ -500,13 +504,14 @@ def extract_recheck(
venues_data = []
venues_errored = 0
for tid in venues_to_recheck:
result = _fetch_venue_availability(tid, start_min_str, start_max_str, cycler["next_proxy"]())
proxy_url = cycler["next_proxy"]()
result = _fetch_venue_availability(tid, start_min_str, start_max_str, proxy_url)
if result is not None:
venues_data.append(result)
cycler["record_success"]()
cycler["record_success"](proxy_url)
else:
venues_errored += 1
cycler["record_failure"]()
cycler["record_failure"](proxy_url)
if cycler["is_exhausted"]():
logger.error("All proxy tiers exhausted — writing partial recheck results")
break
@@ -517,6 +522,10 @@ def extract_recheck(
dest_dir = landing_path(landing_dir, "playtomic", year, month)
dest = dest_dir / f"availability_{target_date}_recheck_{recheck_hour:02d}.jsonl.gz"
if not venues_data:
logger.warning("Recheck fetched 0 venues (%d errors) — skipping file write", venues_errored)
return {"files_written": 0, "files_skipped": 0, "bytes_written": 0}
captured_at = datetime.now(UTC).strftime("%Y-%m-%dT%H:%M:%SZ")
working_path = dest.with_suffix("").with_suffix(".working.jsonl")
with open(working_path, "w") as f:
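The recheck datetime fix in this file combines a resource's `start_date` with a slot's `HH:MM:SS` start time before parsing. A minimal standalone sketch of that idea follows; the function names and the `minutes` window parameter are hypothetical, and timestamps are assumed to be UTC, as in the diff. `timezone.utc` is used here; `datetime.UTC` is an alias for it on Python 3.11 and later.

```python
from datetime import datetime, timedelta, timezone


def parse_slot_start(start_date: str, start_time: str) -> datetime:
    """Combine 'YYYY-MM-DD' and 'HH:MM:SS' into a UTC-aware datetime."""
    combined = f"{start_date}T{start_time}"
    return datetime.fromisoformat(combined).replace(tzinfo=timezone.utc)


def is_upcoming(slot_start: datetime, window_start: datetime, minutes: int) -> bool:
    """True if the slot starts inside the half-open window [start, start + minutes)."""
    window_end = window_start + timedelta(minutes=minutes)
    return window_start <= slot_start < window_end
```

Parsing the time string on its own cannot yield a full datetime, which is why the date prefix is needed before the window comparison can work.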


@@ -10,11 +10,11 @@ API notes (discovered 2026-02):
- `size=100` is the maximum effective page size
- ~14K venues globally as of Feb 2026
Parallel mode: when PROXY_URLS is set, fires batch_size = len(proxy_urls)
pages concurrently. Each page gets its own fresh session + proxy. Pages beyond
the last one return empty lists (safe — just triggers the done condition).
Without proxies, falls back to single-threaded with THROTTLE_SECONDS between
pages.
Parallel mode: when proxy tiers are configured, fires BATCH_SIZE pages
concurrently. Each page gets its own fresh session + proxy from the tiered
cycler. On failure the cycler escalates from the datacenter tier to the
residential tier. Without proxies, falls back to single-threaded with
THROTTLE_SECONDS between pages.
Rate: 1 req / 2 s per IP (see docs/data-sources-inventory.md §1.2).
@@ -22,6 +22,7 @@ Landing: {LANDING_DIR}/playtomic/{year}/{month}/tenants.jsonl.gz
"""
import json
import os
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
@@ -31,7 +32,7 @@ from pathlib import Path
import niquests
from ._shared import HTTP_TIMEOUT_SECONDS, run_extractor, setup_logging, ua_for_proxy
from .proxy import load_proxy_tiers, make_round_robin_cycler
from .proxy import load_proxy_tiers, make_tiered_cycler
from .utils import compress_jsonl_atomic, landing_path
logger = setup_logging("padelnomics.extract.playtomic_tenants")
@@ -42,6 +43,9 @@ PLAYTOMIC_TENANTS_URL = "https://api.playtomic.io/v1/tenants"
THROTTLE_SECONDS = 2
PAGE_SIZE = 100
MAX_PAGES = 500 # safety bound — ~50K venues max, well above current ~14K
BATCH_SIZE = 20 # concurrent pages per batch (fixed, independent of proxy count)
CIRCUIT_BREAKER_THRESHOLD = int(os.environ.get("CIRCUIT_BREAKER_THRESHOLD") or "10")
MAX_PAGE_ATTEMPTS = 5 # max retries per individual page before giving up
def _fetch_one_page(proxy_url: str | None, page: int) -> tuple[int, list[dict]]:
@@ -61,22 +65,57 @@ def _fetch_one_page(proxy_url: str | None, page: int) -> tuple[int, list[dict]]:
return (page, tenants)
def _fetch_pages_parallel(pages: list[int], next_proxy) -> list[tuple[int, list[dict]]]:
"""Fetch multiple pages concurrently. Returns [(page_num, tenants_list), ...]."""
def _fetch_page_via_cycler(cycler: dict, page: int) -> tuple[int, list[dict]]:
"""Fetch a single page, retrying across proxy tiers via the circuit breaker.
On each attempt, pulls the next proxy from the active tier. Records
success/failure so the circuit breaker can escalate tiers. Raises
RuntimeError if all tiers are exhausted or MAX_PAGE_ATTEMPTS is exceeded.
"""
last_exc: Exception | None = None
for attempt in range(MAX_PAGE_ATTEMPTS):
proxy_url = cycler["next_proxy"]()
if proxy_url is None: # all tiers exhausted
raise RuntimeError(f"All proxy tiers exhausted fetching page {page}")
try:
result = _fetch_one_page(proxy_url, page)
cycler["record_success"](proxy_url)
return result
except Exception as exc:
last_exc = exc
logger.warning(
"Page %d attempt %d/%d failed (proxy=%s): %s",
page,
attempt + 1,
MAX_PAGE_ATTEMPTS,
proxy_url,
exc,
)
cycler["record_failure"](proxy_url)
if cycler["is_exhausted"]():
raise RuntimeError(f"All proxy tiers exhausted fetching page {page}") from exc
raise RuntimeError(f"Page {page} failed after {MAX_PAGE_ATTEMPTS} attempts") from last_exc
def _fetch_pages_parallel(pages: list[int], cycler: dict) -> list[tuple[int, list[dict]]]:
"""Fetch multiple pages concurrently using the tiered cycler.
Returns [(page_num, tenants_list), ...]. Raises if any page exhausts all tiers.
"""
with ThreadPoolExecutor(max_workers=len(pages)) as pool:
futures = [pool.submit(_fetch_one_page, next_proxy(), p) for p in pages]
futures = [pool.submit(_fetch_page_via_cycler, cycler, p) for p in pages]
return [f.result() for f in as_completed(futures)]
def extract(
landing_dir: Path,
year_month: str, # noqa: ARG001 — unused; tenants uses ISO week partition instead
year_month: str, # noqa: ARG001 — unused; tenants uses daily partition instead
conn: sqlite3.Connection,
session: niquests.Session,
) -> dict:
"""Fetch all Playtomic venues via global pagination. Returns run metrics.
Partitioned by ISO week (e.g. 2026/W09) so each weekly run produces a
Partitioned by day (e.g. 2026/03/01) so each daily run produces a
fresh file. _load_tenant_ids() in playtomic_availability globs across all
partitions and picks the most recent one.
"""
@@ -89,12 +128,16 @@ def extract(
return {"files_written": 0, "files_skipped": 1, "bytes_written": 0}
tiers = load_proxy_tiers()
all_proxies = [url for tier in tiers for url in tier]
next_proxy = make_round_robin_cycler(all_proxies) if all_proxies else None
batch_size = len(all_proxies) if all_proxies else 1
cycler = make_tiered_cycler(tiers, CIRCUIT_BREAKER_THRESHOLD) if tiers else None
batch_size = BATCH_SIZE if cycler else 1
if next_proxy:
logger.info("Parallel mode: %d pages per batch (%d proxies across %d tier(s))", batch_size, len(all_proxies), len(tiers))
if cycler:
logger.info(
"Parallel mode: %d pages/batch, %d tier(s), threshold=%d",
batch_size,
cycler["tier_count"](),
CIRCUIT_BREAKER_THRESHOLD,
)
else:
logger.info("Serial mode: 1 page at a time (no proxies)")
@@ -104,15 +147,33 @@ def extract(
done = False
while not done and page < MAX_PAGES:
if cycler and cycler["is_exhausted"]():
logger.error(
"All proxy tiers exhausted — stopping at page %d (%d venues collected)",
page,
len(all_tenants),
)
break
batch_end = min(page + batch_size, MAX_PAGES)
pages_to_fetch = list(range(page, batch_end))
if next_proxy and len(pages_to_fetch) > 1:
if cycler and len(pages_to_fetch) > 1:
logger.info(
"Fetching pages %d-%d in parallel (%d workers, total so far: %d)",
page, batch_end - 1, len(pages_to_fetch), len(all_tenants),
page,
batch_end - 1,
len(pages_to_fetch),
len(all_tenants),
)
results = _fetch_pages_parallel(pages_to_fetch, next_proxy)
try:
results = _fetch_pages_parallel(pages_to_fetch, cycler)
except RuntimeError:
logger.error(
"Proxy tiers exhausted mid-batch — writing partial results (%d venues)",
len(all_tenants),
)
break
else:
# Serial: reuse the shared session, throttle between pages
page_num = pages_to_fetch[0]
@@ -126,7 +187,7 @@ def extract(
)
results = [(page_num, tenants)]
# Process pages in order so the done-detection on < PAGE_SIZE is deterministic
# Process pages in order so done-detection on < PAGE_SIZE is deterministic
for p, tenants in sorted(results):
new_count = 0
for tenant in tenants:
@@ -137,7 +198,11 @@ def extract(
new_count += 1
logger.info(
"page=%d got=%d new=%d total=%d", p, len(tenants), new_count, len(all_tenants),
"page=%d got=%d new=%d total=%d",
p,
len(tenants),
new_count,
len(all_tenants),
)
# Last page — fewer than PAGE_SIZE results means we've exhausted the list
@@ -146,7 +211,7 @@ def extract(
break
page = batch_end
if not next_proxy:
if not cycler:
time.sleep(THROTTLE_SECONDS)
# Write each tenant as a JSONL line, then compress atomically


@@ -3,10 +3,9 @@
Proxies are configured via environment variables. When unset, all functions
return None/no-op — extractors fall back to direct requests.
Three-tier escalation: free → datacenter → residential.
Tier 1 (free): WEBSHARE_DOWNLOAD_URL — auto-fetched from Webshare API
Tier 2 (datacenter): PROXY_URLS_DATACENTER — comma-separated paid DC proxies
Tier 3 (residential): PROXY_URLS_RESIDENTIAL — comma-separated paid residential proxies
Two-tier escalation: datacenter → residential.
Tier 1 (datacenter): PROXY_URLS_DATACENTER — comma-separated paid DC proxies
Tier 2 (residential): PROXY_URLS_RESIDENTIAL — comma-separated paid residential proxies
Tiered circuit breaker:
Active tier is used until consecutive failures >= threshold, then escalates
@@ -69,27 +68,26 @@ def fetch_webshare_proxies(download_url: str, max_proxies: int = MAX_WEBSHARE_PR
def load_proxy_tiers() -> list[list[str]]:
"""Assemble proxy tiers in escalation order: free → datacenter → residential.
"""Assemble proxy tiers in escalation order: datacenter → residential.
Tier 1 (free): fetched from WEBSHARE_DOWNLOAD_URL if set.
Tier 2 (datacenter): PROXY_URLS_DATACENTER (comma-separated).
Tier 3 (residential): PROXY_URLS_RESIDENTIAL (comma-separated).
Tier 1 (datacenter): PROXY_URLS_DATACENTER (comma-separated).
Tier 2 (residential): PROXY_URLS_RESIDENTIAL (comma-separated).
Empty tiers are omitted. Returns [] if no proxies configured anywhere.
"""
tiers: list[list[str]] = []
webshare_url = os.environ.get("WEBSHARE_DOWNLOAD_URL", "").strip()
if webshare_url:
free_proxies = fetch_webshare_proxies(webshare_url)
if free_proxies:
tiers.append(free_proxies)
for var in ("PROXY_URLS_DATACENTER", "PROXY_URLS_RESIDENTIAL"):
raw = os.environ.get(var, "")
urls = [u.strip() for u in raw.split(",") if u.strip()]
if urls:
tiers.append(urls)
valid = []
for url in urls:
if not url.startswith(("http://", "https://")):
logger.warning("%s contains URL without scheme, skipping: %s", var, url[:60])
continue
valid.append(url)
if valid:
tiers.append(valid)
return tiers
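The scheme check added to `load_proxy_tiers()` can be isolated as a small pure function, which makes the parsing rules easy to test without touching the environment. This is an illustrative sketch only: the name `parse_proxy_urls` is hypothetical, and unlike the code in the diff it drops invalid entries silently rather than logging a warning.

```python
def parse_proxy_urls(raw: str) -> list[str]:
    """Split a comma-separated proxy list, keeping only http:// or https:// URLs."""
    urls = [u.strip() for u in raw.split(",") if u.strip()]
    return [u for u in urls if u.startswith(("http://", "https://"))]
```

Entries without a scheme are rejected because an HTTP client generally cannot tell how to connect to a bare `host:port` string.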
@@ -134,8 +132,8 @@ def make_sticky_selector(proxy_urls: list[str]):
return select_proxy
def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
"""Thread-safe N-tier proxy cycler with circuit breaker.
def make_tiered_cycler(tiers: list[list[str]], threshold: int, proxy_failure_limit: int = 3) -> dict:
"""Thread-safe N-tier proxy cycler with circuit breaker and per-proxy dead tracking.
Uses tiers[0] until consecutive failures >= threshold, then escalates
to tiers[1], then tiers[2], etc. Once all tiers are exhausted,
@@ -144,13 +142,28 @@ def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
Failure counter resets on each escalation — the new tier gets a fresh start.
Once exhausted, further record_failure() calls are no-ops.
Per-proxy dead tracking (when proxy_failure_limit > 0):
Individual proxies are marked dead after proxy_failure_limit failures and
skipped by next_proxy(). If all proxies in the active tier are dead,
next_proxy() auto-escalates to the next tier. Both mechanisms coexist:
per-proxy dead tracking removes broken individuals; tier-level threshold
catches systemic failure even before any single proxy hits the limit.
Stale-failure protection:
With parallel workers, some threads may fetch a proxy just before the tier
escalates and report failure after. record_failure(proxy_url) checks which
tier the proxy belongs to and ignores the tier-level circuit breaker if the
proxy is from an already-escalated tier. This prevents in-flight failures
from a dead tier instantly exhausting the freshly-escalated one.
Returns a dict of callables:
next_proxy() -> str | None — URL from the active tier, or None
record_success() -> None — resets consecutive failure counter
record_failure() -> bool — True if just escalated to next tier
next_proxy() -> str | None — URL from active tier (skips dead), or None
record_success(proxy_url=None) -> None — resets consecutive failure counter
record_failure(proxy_url=None) -> bool — True if just escalated to next tier
is_exhausted() -> bool — True if all tiers exhausted
active_tier_index() -> int — 0-based index of current tier
tier_count() -> int — total number of tiers
dead_proxy_count() -> int — number of individual proxies marked dead
Edge cases:
Empty tiers list: next_proxy() always returns None, is_exhausted() True.
@@ -158,32 +171,97 @@ def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
"""
assert threshold > 0, f"threshold must be positive, got {threshold}"
assert isinstance(tiers, list), f"tiers must be a list, got {type(tiers)}"
assert proxy_failure_limit >= 0, f"proxy_failure_limit must be >= 0, got {proxy_failure_limit}"
# Reverse map: proxy URL -> tier index. Used in record_failure to ignore
# "in-flight" failures from workers that fetched a proxy before escalation —
# those failures belong to the old tier and must not count against the new one.
proxy_to_tier_idx: dict[str, int] = {
url: tier_idx
for tier_idx, tier in enumerate(tiers)
for url in tier
}
lock = threading.Lock()
cycles = [itertools.cycle(t) for t in tiers]
state = {
"active_tier": 0,
"consecutive_failures": 0,
"proxy_failure_counts": {}, # proxy_url -> int
"dead_proxies": set(), # proxy URLs marked dead
}
def next_proxy() -> str | None:
with lock:
idx = state["active_tier"]
if idx >= len(cycles):
return None
return next(cycles[idx])
# Try each remaining tier (bounded: at most len(tiers) escalations)
for _ in range(len(tiers) + 1):
idx = state["active_tier"]
if idx >= len(cycles):
return None
def record_success() -> None:
tier_proxies = tiers[idx]
tier_len = len(tier_proxies)
# Find a live proxy in this tier (bounded: try each proxy at most once)
for _ in range(tier_len):
candidate = next(cycles[idx])
if candidate not in state["dead_proxies"]:
return candidate
# All proxies in this tier are dead — auto-escalate
state["consecutive_failures"] = 0
state["active_tier"] += 1
new_idx = state["active_tier"]
if new_idx < len(tiers):
logger.warning(
"All proxies in tier %d are dead — auto-escalating to tier %d/%d",
idx + 1,
new_idx + 1,
len(tiers),
)
else:
logger.error(
"All proxies in all %d tier(s) are dead — no more fallbacks",
len(tiers),
)
return None # safety fallback
def record_success(proxy_url: str | None = None) -> None:
with lock:
state["consecutive_failures"] = 0
if proxy_url is not None:
state["proxy_failure_counts"][proxy_url] = 0
def record_failure() -> bool:
def record_failure(proxy_url: str | None = None) -> bool:
"""Increment failure counter. Returns True if just escalated to next tier."""
with lock:
# Per-proxy dead tracking (additional to tier-level circuit breaker)
if proxy_url is not None and proxy_failure_limit > 0:
count = state["proxy_failure_counts"].get(proxy_url, 0) + 1
state["proxy_failure_counts"][proxy_url] = count
if count >= proxy_failure_limit and proxy_url not in state["dead_proxies"]:
state["dead_proxies"].add(proxy_url)
logger.warning(
"Proxy %s marked dead after %d consecutive failures",
proxy_url,
count,
)
# Tier-level circuit breaker (existing behavior)
idx = state["active_tier"]
if idx >= len(tiers):
# Already exhausted — no-op
return False
# Ignore failures from proxies that belong to an already-escalated tier.
# With parallel workers, some threads fetch a proxy just before escalation
# and report back after — those stale failures must not penalise the new tier.
if proxy_url is not None:
proxy_tier = proxy_to_tier_idx.get(proxy_url)
if proxy_tier is not None and proxy_tier < idx:
return False
state["consecutive_failures"] += 1
if state["consecutive_failures"] < threshold:
return False
@@ -219,6 +297,10 @@ def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
def tier_count() -> int:
return len(tiers)
def dead_proxy_count() -> int:
with lock:
return len(state["dead_proxies"])
return {
"next_proxy": next_proxy,
"record_success": record_success,
@@ -226,4 +308,5 @@ def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
"is_exhausted": is_exhausted,
"active_tier_index": active_tier_index,
"tier_count": tier_count,
"dead_proxy_count": dead_proxy_count,
}
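The stale-failure rule described in the docstring can be illustrated with a much smaller stand-in. This is a simplified sketch, not the project's `make_tiered_cycler`: the class name `TinyTieredCycler`, the example proxy URLs, and the escalation step after the threshold check (which the hunk elides) are assumptions, and per-proxy dead tracking is left out entirely.

```python
import itertools
import threading


class TinyTieredCycler:
    """Hypothetical, reduced model of a tiered proxy cycler (non-empty tiers assumed)."""

    def __init__(self, tiers, threshold):
        self._tiers = tiers
        self._threshold = threshold
        self._cycles = [itertools.cycle(t) for t in tiers]
        # Reverse map: proxy URL -> index of the tier it belongs to.
        self._tier_of = {url: i for i, tier in enumerate(tiers) for url in tier}
        self._active = 0
        self._failures = 0
        self._lock = threading.Lock()

    def next_proxy(self):
        with self._lock:
            if self._active >= len(self._tiers):
                return None  # all tiers exhausted
            return next(self._cycles[self._active])

    def record_failure(self, proxy_url=None):
        """Return True only when this call escalates to the next tier."""
        with self._lock:
            if self._active >= len(self._tiers):
                return False  # already exhausted: no-op
            tier = self._tier_of.get(proxy_url)
            if tier is not None and tier < self._active:
                return False  # stale report from an already-escalated tier
            self._failures += 1
            if self._failures < self._threshold:
                return False
            # Assumed escalation step: move on and give the new tier a fresh counter.
            self._active += 1
            self._failures = 0
            return True


cycler = TinyTieredCycler([["http://dc1:8080"], ["http://res1:8080"]], threshold=2)
in_flight = cycler.next_proxy()               # fetched while tier 0 is active
cycler.record_failure(in_flight)
escalated = cycler.record_failure(in_flight)  # threshold reached: tier 1 becomes active
late = cycler.record_failure(in_flight)       # late report from tier 0: ignored
```

Without the reverse map, the third call would count against the residential tier and, with a threshold this low, push it halfway to exhaustion before it served a single request.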


@@ -54,6 +54,40 @@ chmod 600 "${REPO_DIR}/.env"
sudo -u "${SERVICE_USER}" bash -c "cd ${REPO_DIR} && ${UV} sync --all-packages"
# ── rclone config (r2-landing remote) ────────────────────────────────────────
_env_get() { grep -E "^${1}=" "${REPO_DIR}/.env" 2>/dev/null | head -1 | cut -d= -f2- | tr -d '"'"'" || true; }
R2_LANDING_KEY=$(_env_get R2_LANDING_ACCESS_KEY_ID)
R2_LANDING_SECRET=$(_env_get R2_LANDING_SECRET_ACCESS_KEY)
R2_ENDPOINT=$(_env_get R2_ENDPOINT)
if [ -n "${R2_LANDING_KEY}" ] && [ -n "${R2_LANDING_SECRET}" ] && [ -n "${R2_ENDPOINT}" ]; then
RCLONE_CONF_DIR="/home/${SERVICE_USER}/.config/rclone"
RCLONE_CONF="${RCLONE_CONF_DIR}/rclone.conf"
sudo -u "${SERVICE_USER}" mkdir -p "${RCLONE_CONF_DIR}"
grep -v '^\[r2-landing\]' "${RCLONE_CONF}" 2>/dev/null > "${RCLONE_CONF}.tmp" || true
cat >> "${RCLONE_CONF}.tmp" <<EOF
[r2-landing]
type = s3
provider = Cloudflare
access_key_id = ${R2_LANDING_KEY}
secret_access_key = ${R2_LANDING_SECRET}
endpoint = ${R2_ENDPOINT}
acl = private
no_check_bucket = true
EOF
mv "${RCLONE_CONF}.tmp" "${RCLONE_CONF}"
chown "${SERVICE_USER}:${SERVICE_USER}" "${RCLONE_CONF}"
chmod 600 "${RCLONE_CONF}"
echo "$(date '+%H:%M:%S') ==> rclone [r2-landing] remote configured."
else
echo "$(date '+%H:%M:%S') ==> R2_LANDING_* not set — skipping rclone config."
fi
# ── Systemd services ──────────────────────────────────────────────────────────
cp "${REPO_DIR}/infra/landing-backup/padelnomics-landing-backup.service" /etc/systemd/system/


@@ -7,15 +7,5 @@ Wants=network-online.target
Type=oneshot
User=padelnomics_service
EnvironmentFile=/opt/padelnomics/.env
Environment=LANDING_DIR=/data/padelnomics/landing
ExecStart=/usr/bin/rclone sync ${LANDING_DIR} :s3:${LITESTREAM_R2_BUCKET}/padelnomics/landing \
--s3-provider Cloudflare \
--s3-access-key-id ${LITESTREAM_R2_ACCESS_KEY_ID} \
--s3-secret-access-key ${LITESTREAM_R2_SECRET_ACCESS_KEY} \
--s3-endpoint https://${LITESTREAM_R2_ENDPOINT} \
--s3-no-check-bucket \
--exclude ".state.sqlite*"
StandardOutput=journal
StandardError=journal
SyslogIdentifier=padelnomics-landing-backup
ExecStart=/bin/sh -c 'exec /usr/bin/rclone sync /data/padelnomics/landing/ r2-landing:${R2_LANDING_BUCKET}/padelnomics/ --log-level INFO --exclude ".state.sqlite*"'
TimeoutStartSec=1800


@@ -39,3 +39,23 @@ module = "padelnomics_extract.playtomic_availability"
entry = "main_recheck"
schedule = "0,30 6-23 * * *"
depends_on = ["playtomic_availability"]
[census_usa]
module = "padelnomics_extract.census_usa"
schedule = "monthly"
[census_usa_income]
module = "padelnomics_extract.census_usa_income"
schedule = "monthly"
[eurostat_city_labels]
module = "padelnomics_extract.eurostat_city_labels"
schedule = "monthly"
[ons_uk]
module = "padelnomics_extract.ons_uk"
schedule = "monthly"
[gisco]
module = "padelnomics_extract.gisco"
schedule = "monthly"


@@ -1,4 +1,35 @@
# Building a Padel Hall — Complete Guide
# Padel Hall — Question Bank & Gap Analysis
> **What this file is**: A structured question bank covering the full universe of questions a padel hall entrepreneur needs to answer — from concept to exit. It is **not** an article for publication.
>
> **Purpose**: Gap analysis — identify which questions Padelnomics already answers (planner, city articles, pipeline data, business plan PDF) and which are unanswered gaps we could fill to improve product value.
>
> **Coverage legend**:
> - `ANSWERED` — fully covered by the planner, city articles, or BP export
> - `PARTIAL` — partially addressed; notable gap or missing depth
> - `GAP` — not addressed at all; actionable opportunity
---
## Gap Analysis Summary
| Tier | Gap | Estimated Impact | Status |
|------|-----|-----------------|--------|
| 1 | Subsidies & grants (Germany) | High | Not in product; data exists in `research/padel-hall-economics.md` |
| 1 | Buyer segmentation (sports club / commercial / hotel / franchise) | High | Not in planner; segmentation table exists in research |
| 1 | Indoor vs outdoor decision framework | High | Planner models both; no comparison table or decision guide |
| 1 | OPEX benchmarks shown inline | Medium-High | Planner has inputs; defaults not visually benchmarked |
| 2 | Booking platform strategy (Playtomic vs Matchi vs custom) | Medium | Zero guidance; we scrape Playtomic so know it well |
| 2 | Depreciation & tax shield | Medium | All calcs pre-tax; Germany: 30% effective, 7yr courts |
| 2 | Legal & regulatory checklist (Germany) | Medium | Only permit cost line; Bauantrag, TA Lärm, GmbH etc. missing |
| 2 | Court supplier selection framework | Medium | Supplier directory exists; no evaluation criteria |
| 2 | Staffing plan template | Medium | BP has narrative field; no structured role × FTE × salary |
| 3 | Zero-court location pages (white-space pSEO) | High data value | `location_opportunity_profile` scores them; none published |
| 3 | Pre-opening / marketing playbook | Low-Medium | Out of scope; static article possible |
| 3 | Catchment area isochrones (drive-time) | Low | Heavy lift; `nearest_padel_court_km` is straight-line only |
| 3 | Trend/fad risk quantification | Low | Inherently speculative |
---
## Table of Contents
@@ -16,6 +47,8 @@
### Market & Demand
> **COVERAGE: PARTIAL** — Venue counts, density (venues/100K), Market Score, and Opportunity Score per city are all answered by pipeline data (`location_opportunity_profile`) and surfaced in city articles. Missing: actual player counts, competitor utilization rates, household income / age demographics for the catchment area. No drive-time isochrone analysis (Tier 3 gap).
- How many padel players are in your target area? Is the sport growing locally or are you betting on future adoption?
- What's the competitive landscape — how many existing courts within a 20–30 minute drive radius? Are they full? What are their peak/off-peak utilization rates?
- What's the demographic profile of your catchment area (income, age, sports participation)?
@@ -23,6 +56,8 @@
### Site & Location
> **COVERAGE: GAP** — The planner has a rent/land cost input and a `own` toggle for buy vs lease, but there is no guidance on site selection criteria (ceiling height, column spacing, zoning classification, parking ratios). A static article or checklist would cover this. See also Tier 2 gap: legal/regulatory checklist.
- Do you want to build new (greenfield), convert an existing building (warehouse, industrial hall), or add to an existing sports complex?
- What zoning and building regulations apply? Is a padel hall classified as sports, leisure, commercial?
- What's the required ceiling height? (Minimum ~8–10m for indoor padel, ideally 10m+)
@@ -30,6 +65,8 @@
### Product & Scope
> **COVERAGE: PARTIAL** — Court count is fully answered (planner supports 1–12 courts, sensitivity analysis included). Ancillary revenue streams (coaching, F&B, pro shop, events, memberships, corporate) are modelled. Indoor vs outdoor is modelled but there is no structured decision framework comparing CAPEX, revenue ceiling, seasonal risk, noise, and permits (Tier 1 gap #3). Quality level / positioning is not addressed.
- How many courts? (Typically 4–8 is the sweet spot for a standalone hall; fewer than 4 struggles with profitability, more than 8 requires very strong demand)
- Indoor only, outdoor, or hybrid with a retractable/seasonal structure?
- What ancillary offerings: pro shop, café/bar/lounge, fitness area, changing rooms, padel school/academy?
@@ -37,6 +74,8 @@
### Financial
> **COVERAGE: ANSWERED** — All four questions are directly answered by the planner: equity/debt split, rent/land cost, real peak/off-peak prices per city (from Playtomic via `planner_defaults`), utilization ramp curve (Year 1–5), and breakeven utilization (sensitivity grid).
- What's your total budget, and what's the split between equity and debt?
- What rental or land purchase cost can you sustain?
- What are realistic court booking prices in your market?
@@ -45,6 +84,8 @@
### Legal & Organizational
> **COVERAGE: GAP** — Only a permit cost line item exists in CAPEX. No entity guidance (GmbH vs UG vs Verein), no permit checklist, no license types, no insurance guidance. A Germany-first legal/regulatory checklist (Bauantrag, Nutzungsänderung, TA Lärm, Gewerbeerlaubnis, §4 Nr. 22 UStG sports VAT exemption) would be high-value static content (Tier 2 gap #7). Buyer segmentation (sports club vs. commercial) affects entity choice and grant eligibility (Tier 1 gap #2).
- What legal entity will you use?
- Do you need partners (operational, financial, franchise)?
- What permits, licenses, and insurance do you need?
@@ -56,6 +97,10 @@
### Phase 1: Feasibility & Concept (Month 1–3)
> **COVERAGE: ANSWERED** — This phase is fully supported. Market research → city articles (venue density, Market Score, Opportunity Score). Concept development → planner inputs. Location scouting → city articles + planner. Preliminary financial model → planner. Go/no-go → planner output (EBITDA, IRR, NPV).
>
> Missing: Buyer segmentation (Tier 1 gap #2) — the planner treats all users identically. A "project type" selector (sports club / commercial / hotel / franchise) would adjust CAPEX defaults, grant eligibility, and entity guidance.
1. **Market research**: Survey local players, visit competing facilities, analyze demographics within a 15–20 minute drive radius. Talk to padel coaches and club organizers.
2. **Concept development**: Define your number of courts, target audience, service level, and ancillary revenue streams.
3. **Location scouting**: Identify 3–5 candidate sites. Evaluate each on accessibility, visibility, size, ceiling height (if conversion), zoning, and cost.
@@ -64,6 +109,8 @@
### Phase 2: Planning & Design (Month 3–6)
> **COVERAGE: PARTIAL** — Detailed financial model (step 9) and financing (step 10) are fully answered by the planner (DSCR, covenants, sensitivity). Court supplier selection (step 8) has a partial answer: a supplier directory exists in the product but there is no evaluation framework (Tier 2 gap #8: origin, price/court, warranty, glass type, installation, lead time). Permit process (step 11) is a gap (Tier 2 gap #7). Site security and architect hiring are operational advice, out of scope.
6. **Secure the site**: Sign a letter of intent or option agreement for purchase or lease.
7. **Hire an architect** experienced in sports facilities. They'll produce floor plans, elevations, structural assessments (for conversions), and MEP (mechanical, electrical, plumbing) layouts.
8. **Padel court supplier selection**: Get quotes from manufacturers (e.g., Mondo, Padelcreations, MejorSet). Courts come as prefabricated modules — coordinate dimensions, drainage, lighting, and glass specifications with your architect.
@@ -73,6 +120,8 @@
### Phase 3: Construction / Conversion (Month 6–12)
> **COVERAGE: PARTIAL** — Booking system (step 15) is partially addressed: booking system cost is a planner input, but there is no guidance on platform selection (Playtomic vs Matchi vs custom) despite this being a real decision with revenue and data implications (Tier 2 gap #5). Construction, installation, fit-out, and inspections are operational steps outside Padelnomics' scope.
12. **Tender and contract construction**: Either a general contractor or construction management approach. Key trades: structural/civil, flooring, HVAC (critical for indoor comfort), electrical (LED court lighting to specific lux standards), plumbing.
13. **Install padel courts**: Usually done after the building shell is complete. Courts take 2–4 weeks to install per batch.
14. **Fit-out ancillary areas**: Reception, changing rooms, lounge/bar, pro shop.
@@ -81,6 +130,8 @@
### Phase 4: Pre-Opening (Month 10–13)
> **COVERAGE: PARTIAL** — Staffing plan (step 17): the BP export has a `staffing_plan` narrative field, but there is no structured template with role × FTE × salary defaults. Research benchmarks (€9.9–14.2K/month for 2–3 FTE + manager) could pre-fill this based on court count (Tier 2 gap #9). Marketing playbook (step 18): not addressed; could be a static article (Tier 3 gap #11). Soft/grand opening: out of scope.
17. **Hire staff**: Manager, reception, coaches, cleaning, potentially F&B staff.
18. **Marketing launch**: Social media, local partnerships (sports clubs, corporate wellness), opening event, introductory pricing.
19. **Soft opening**: Invite local players, influencers, press for a trial period.
@@ -88,6 +139,8 @@
### Phase 5: Operations & Optimization (Ongoing)
> **COVERAGE: PARTIAL** — Utilization monitoring and financial review are covered by the planner model. Upsell streams (coaching, equipment, F&B, memberships) are all revenue line items. Community building and dynamic pricing strategy are not addressed — these are operational, not data-driven, and are out of scope.
21. **Monitor utilization** by court, time slot, and day. Adjust pricing dynamically.
22. **Build community**: Leagues, tournaments, social events, corporate bookings.
23. **Upsell**: Coaching, equipment, food/beverage, memberships.
@@ -97,6 +150,8 @@
## Plans You Need to Create
> **COVERAGE: PARTIAL** — Business Plan and Financial Plan are both fully answered (planner + BP PDF export with 15+ narrative sections). Architectural Plans, Marketing Plan, and Legal/Permit Plan are outside the product's scope. Operational Plan is partial: staffing and booking system inputs exist but lack depth (Tier 2 gaps #5, #9).
- **Business Plan** — the master document covering market analysis, concept, operations plan, management team, and financials. This is what banks and investors want to see.
- **Architectural Plans** — floor plans, cross-sections, elevations, structural drawings, MEP plans. Required for permits and construction.
- **Financial Plan** — the core of your business plan. Includes investment budget, funding plan, P&L forecast (3–5 years), cash flow forecast, and sensitivity analysis.
@@ -112,6 +167,8 @@
### Investment Budget (CAPEX)
> **COVERAGE: ANSWERED** — The planner covers all 15+ CAPEX line items for both lease (`rent`) and purchase (`own`) scenarios. Subsidies and grants are **not** modelled (Tier 1 gap #1): `research/padel-hall-economics.md` documents Landessportbund grants (35% for sports clubs), KfW 150 loans, and a real example of €258K → €167K net after grant (padel-court.de). A "Fördermittel" (grants) section in the BP or a callout in DE city articles would surface this.
| Item | Estimate |
|---|---|
| Building lease deposit or land | €50,000–€200,000 |
@@ -131,6 +188,8 @@ Realistic midpoint for a solid 6-court hall: **~€1.2–1.5M**.
### Revenue Model
> **COVERAGE: ANSWERED** — Court utilization × price per hour is the core model. Real peak/off-peak prices per city are pre-filled via `planner_defaults` from Playtomic data. Ramp curve (Year 1–5 utilization), 6 ancillary streams, and monthly seasonal curve are all modelled.
Core driver: **court utilization × price per hour**.
- 6 courts × 15 bookable hours/day × 365 days = **32,850 court-hours/year** (theoretical max)
@@ -149,6 +208,8 @@ Core driver: **court utilization × price per hour**.
### Operating Costs (OPEX)
> **COVERAGE: PARTIAL** — All OPEX line items exist as planner inputs. The defaults are reasonable but are not visually benchmarked against market data (Tier 1 gap #4). Research benchmarks from `research/padel-hall-economics.md` §7: electricity €2.5–4.5K/month, staff €9.9–14.2K/month for 2–3 FTE + manager, rent €8–15K/month. Showing "typical range for your market" next to each OPEX input field would improve trust in the defaults.
| Cost Item | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| Rent / lease | €120k | €123k | €127k |
@@ -164,6 +225,8 @@ Core driver: **court utilization × price per hour**.
### Profitability
> **COVERAGE: ANSWERED** — EBITDA, EBITDA margin, debt service, and free cash flow after debt are all computed by the planner for all 60 months.
| Metric | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| **EBITDA** | €310k | €577k | €759k |
@@ -173,6 +236,8 @@ Core driver: **court utilization × price per hour**.
### Key Metrics to Track
> **COVERAGE: ANSWERED** — Payback period, IRR (equity + project), NPV, MOIC, DSCR per year, breakeven utilization, and revenue per available hour are all computed and displayed.
- **Payback period**: Typically 3–5 years for a well-run padel hall
- **ROI on equity**: If you put in €500k equity and generate €300k+ annual free cash flow by year 3, that's a 60%+ cash-on-cash return
- **Breakeven utilization**: Usually around 35–40% — below which you lose money
@@ -180,12 +245,18 @@ Core driver: **court utilization × price per hour**.
### Sensitivity Analysis
> **COVERAGE: ANSWERED** — 12-step utilization sensitivity and 8-step price sensitivity are both shown as grids, each including DSCR values.
Model what happens if utilization is 10% lower than planned, if the average price drops by €5, or if construction costs overrun by 20%. This is what banks want to see — that you survive the downside.
---
## How to Decide Where to Build
> **COVERAGE: PARTIAL overall** — The product answers competition mapping (venue density, Opportunity Score) and rent/cost considerations (planner input). Missing: drive-time catchment analysis (Tier 3 gap #12 — would need isochrone API), accessibility/visibility/building suitability assessment (static checklist possible), growth trajectory (no new-development data), and regulatory environment (Tier 2 gap #7).
>
> **Tier 3 opportunity**: `location_opportunity_profile` scores thousands of GeoNames locations including zero-court towns. Only venues with existing courts get a public article. Generating pSEO pages for top-scoring zero-court locations would surface "build here" recommendations (white-space pages).
1. **Catchment area analysis**: Draw a 15-minute and 30-minute drive-time radius around candidate sites. Analyze population density, household income, age distribution (25–55 is the core padel demographic), and existing sports participation rates.
2. **Competition mapping**: Map every existing padel facility within 30 minutes. Call them, check their booking systems — are courts booked out at peak? If competitors are running at 80%+ utilization, that's a strong signal of unmet demand.
@@ -208,70 +279,104 @@ Model what happens if utilization is 10% lower than planned, if the average pric
### NPV & IRR
> **COVERAGE: ANSWERED** — Both equity IRR and project IRR are computed. NPV is shown with the WACC input. Hurdle rate is a user input.
Discount your projected free cash flows at your WACC (or required return on equity if all-equity financed) to get a net present value. The IRR tells you whether the project clears your hurdle rate. For a padel hall, you'd typically want an unlevered IRR of 15–25% to justify the risk of a single-asset, operationally intensive business. Compare this against alternative uses of your capital.
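A minimal sketch of the two calculations, assuming annual cash flows with the initial investment at t = 0. The function names, the bisection bracket, and the example cash flows (in €k) are illustrative assumptions, not planner internals.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs today (t = 0)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-9):
    """IRR by bisection; assumes NPV changes sign exactly once on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    f_hi = npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on the bracket")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0


# Hypothetical project: 1,200 invested, then five years of free cash flow (all in €k).
flows = [-1200.0, 150.0, 300.0, 400.0, 450.0, 450.0]
hurdle = 0.15
project_irr = irr(flows)
clears_hurdle = project_irr >= hurdle
value_at_hurdle = npv(hurdle, flows)
```

The two checks agree by construction for a conventional cash-flow pattern (one outflow followed by inflows): the project clears the hurdle exactly when its NPV at the hurdle rate is non-negative.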
### WACC & Cost of Capital
> **COVERAGE: ANSWERED** — WACC is a planner input used in NPV calculations. Debt cost and equity cost are separately configurable.
If you're blending debt and equity, calculate your weighted average cost of capital properly. Bank debt for a sports facility might run 4–7% depending on jurisdiction and collateral. Your equity cost should reflect the illiquidity premium and operational risk — this isn't a passive real estate investment, it's an operating business. A reasonable cost of equity might be 12–20%.
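As a rough sketch, the textbook WACC formula can be written as below. The function name and the example capital structure are assumptions; `tax_rate` defaults to zero to mirror a pre-tax model and can be set to apply the interest tax shield.

```python
def wacc(equity, debt, cost_of_equity, cost_of_debt, tax_rate=0.0):
    """Weighted average cost of capital; the debt cost is reduced by the tax shield."""
    total = equity + debt
    if total <= 0:
        raise ValueError("total capital must be positive")
    return (equity / total) * cost_of_equity + (debt / total) * cost_of_debt * (1.0 - tax_rate)


# Hypothetical structure: €500k equity at 15%, €700k bank debt at 5%.
blended = wacc(equity=500_000, debt=700_000, cost_of_equity=0.15, cost_of_debt=0.05)
```

With these assumed inputs the blended rate is a little over 9%, well below the cost of equity, which is why the leverage level chosen has a visible effect on NPV.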
### Terminal Value
> **COVERAGE: ANSWERED** — Terminal value is computed as EBITDA × exit multiple at the end of the hold period. MOIC and value bridge are displayed.
If you model 5 years of explicit cash flows, you need a terminal value. You can use a perpetuity growth model (FCF year 5 × (1+g) / (WACC - g)) or an exit multiple. For the exit multiple approach, think about what a buyer would pay — likely 4–7x EBITDA for a mature, well-run single-location padel hall, potentially higher if it's part of a multi-site rollout story.
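Both approaches fit in a few lines. This is a sketch with hypothetical function names and example inputs; the perpetuity formula is only meaningful when WACC exceeds the growth rate, so that case is rejected explicitly.

```python
def terminal_value_perpetuity(fcf_final, wacc, growth):
    """Gordon growth: FCF in the final explicit year x (1 + g) / (WACC - g)."""
    if wacc <= growth:
        raise ValueError("WACC must exceed the perpetual growth rate")
    return fcf_final * (1.0 + growth) / (wacc - growth)


def terminal_value_exit_multiple(ebitda_final, multiple):
    """Exit-multiple approach: final-year EBITDA x assumed buyer multiple."""
    return ebitda_final * multiple


# Hypothetical year-5 figures in €k.
tv_growth = terminal_value_perpetuity(fcf_final=100.0, wacc=0.10, growth=0.02)
tv_multiple = terminal_value_exit_multiple(ebitda_final=200.0, multiple=5.0)
```

Comparing the two results is a useful sanity check: a large gap usually means the growth rate or the multiple is out of line with the other assumptions.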
### Lease vs. Buy
> **COVERAGE: ANSWERED** — The `own` toggle in the planner changes the entire CAPEX/OPEX structure: land purchase replaces lease deposit, mortgage replaces rent, and property appreciation is modelled in terminal value.
A critical capital allocation decision. Buying the property ties up far more capital but gives you residual asset value and eliminates landlord risk. Leasing preserves capital for operations and expansion but exposes you to rent increases and lease termination risk. Model both scenarios and compare the risk-adjusted NPV. Also consider sale-and-leaseback if you build on owned land.
### Operating Leverage
> **COVERAGE: ANSWERED** — The sensitivity grids explicitly show how a 10% utilization swing affects EBITDA and DSCR.
A padel hall has high fixed costs (rent, staff base, debt service) and relatively low variable costs. This means profitability is extremely sensitive to utilization. Model the operating leverage explicitly — a 10% swing in utilization might cause a 25–30% swing in EBITDA. This is both the opportunity and the risk.
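The effect can be sketched with a toy model. The 32,850 court-hours figure is the 6-court theoretical maximum from the Revenue Model section; the price, variable-cost share, and fixed-cost level are hypothetical values chosen only for illustration.

```python
def ebitda(utilization, court_hours=32_850, price=40.0,
           variable_cost_share=0.10, fixed_costs=370_000.0):
    """Toy EBITDA: revenue minus proportional variable costs minus fixed costs."""
    revenue = court_hours * utilization * price
    return revenue * (1.0 - variable_cost_share) - fixed_costs


def ebitda_swing(base_utilization, relative_change):
    """Relative change in EBITDA for a relative change in utilization."""
    base = ebitda(base_utilization)
    shifted = ebitda(base_utilization * (1.0 + relative_change))
    return (shifted - base) / base


# A 10% relative rise in utilization, from 50% to 55%.
swing_up = ebitda_swing(0.50, 0.10)
swing_down = ebitda_swing(0.50, -0.10)
```

With these assumed inputs, a 10% change in utilization moves EBITDA by roughly 27% in either direction; raising the fixed-cost figure makes the swing larger.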
### Depreciation & Tax Shield
> **COVERAGE: GAP** — All planner calculations are pre-tax (Tier 2 gap #6). Adding a depreciation schedule and effective tax rate would materially improve the financial model for Germany: 7-year depreciation for courts/equipment, ~30% effective tax rate (15% KSt + 14% GewSt). This would require jurisdiction selection (start with Germany only). Non-trivial but the most common user geography.
Padel courts depreciate over 7–10 years, building fit-out over 10–15 years, equipment over 3–5 years. The depreciation tax shield is meaningful. Interest expense on debt is also tax-deductible. Model your effective tax rate and the present value of these shields — they improve your after-tax returns materially.
### Working Capital Cycle
> **COVERAGE: ANSWERED** — Pre-opening cash burn and ramp-up period are modelled in the 60-month cash flow. Working capital reserve is a CAPEX line item.
Padel halls are generally working-capital-light (customers pay at booking or on arrival, you pay suppliers on 30–60 day terms). But model the initial ramp-up period where you're carrying costs before revenue reaches steady state. The pre-opening cash burn and first 6–12 months of sub-breakeven operation is where most of your working capital risk sits.
### Scenario & Sensitivity Analysis
> **COVERAGE: ANSWERED** — Utilization sensitivity (12 steps) and price sensitivity (8 steps) grids are shown, both with DSCR. Bear/base/bull narrative is covered in the BP export.
Model three scenarios (bear/base/bull) varying utilization, pricing, and cost overruns simultaneously. Identify the breakeven utilization rate precisely. A Monte Carlo simulation on the key variables (utilization, average price, construction cost, ramp-up speed) gives you a probability distribution of outcomes rather than a single point estimate.
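A Monte Carlo run can be sketched with the standard library alone. Everything numeric here is an assumption for illustration: the triangular ranges, the 10% variable-cost share, and the choice to vary only utilization, price, and fixed costs (construction cost and ramp-up speed are left out to keep the example short).

```python
import random


def simulate_ebitda(n_runs=10_000, seed=42):
    """Draw steady-state annual EBITDA outcomes for a hypothetical 6-court hall."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    court_hours = 32_850       # 6 courts x 15 h/day x 365 days
    outcomes = []
    for _ in range(n_runs):
        # random.triangular takes (low, high, mode)
        utilization = rng.triangular(0.30, 0.70, 0.50)
        price = rng.triangular(30.0, 50.0, 40.0)
        fixed_costs = rng.triangular(330_000.0, 450_000.0, 370_000.0)
        revenue = court_hours * utilization * price
        outcomes.append(revenue * 0.90 - fixed_costs)
    return outcomes


def share_below(outcomes, level):
    """Fraction of simulated outcomes that fall below a given level."""
    return sum(1 for x in outcomes if x < level) / len(outcomes)


outcomes = simulate_ebitda()
loss_probability = share_below(outcomes, 0.0)
# Probability of not covering a hypothetical €220k annual debt service.
shortfall_probability = share_below(outcomes, 220_000.0)
```

The output is a distribution, so the useful questions become probabilities (how often EBITDA is negative, how often debt service is not covered) instead of a single point estimate.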
### Exit Strategy & Valuation
> **COVERAGE: ANSWERED** — Hold period, exit EBITDA multiple, terminal value, MOIC, and value bridge are all displayed in the planner.
Think about this upfront. Are you building to hold and cash-flow, or building to sell to a consolidator or franchise operator? The exit multiple depends heavily on whether you've built a transferable business (brand, systems, trained staff, long lease) or an owner-dependent operation. Multi-site operators and franchise groups trade at higher multiples (6–10x EBITDA) than single sites.
### Optionality Value
> **COVERAGE: GAP** — Real option value (second location, franchise, repurposing) is mentioned in the BP narrative but not quantified. Out of scope for the planner; noting as a caveat in the BP export text would be sufficient.
A successful first hall gives you the option to expand — second location, franchise model, or selling the playbook. This real option has value that a static DCF doesn't capture. Similarly, if you own the land/building, you have conversion optionality (the building could be repurposed if padel demand fades).
### Counterparty & Concentration Risk
> **COVERAGE: PARTIAL** — The planner models this implicitly (single-site, single-sport), and DSCR warnings flag over-leverage. No explicit counterparty risk section. Mentioning it in the BP risk narrative would be low-effort coverage.
You're exposed to a single landlord (lease risk), a single location (demand risk), and potentially a single sport (trend risk). A bank or sophisticated investor will flag all three. Mitigants include long lease terms with caps on escalation, diversified revenue streams (F&B, events, coaching), and contractual protections.
### Subsidies & Grants
> **COVERAGE: GAP — Tier 1 priority.** `research/padel-hall-economics.md` documents: Landessportbund grants (up to 35% CAPEX for registered sports clubs), KfW 150 low-interest loans, and a worked example: €258K gross → €167K net CAPEX after grant. The planner has no grants input. Quick wins: (a) add a "Fördermittel" accordion section to DE city articles; (b) add a grant percentage input to the planner CAPEX section (reduces total investment and boosts IRR). Note: grant eligibility depends on buyer type (Tier 1 gap #2) — sports clubs qualify, commercial operators typically do not.
Many municipalities and national sports bodies offer grants or subsidized loans for sports infrastructure. In some European countries, this can cover 10–30% of CAPEX. Factor this into your funding plan — it's essentially free equity that boosts your returns.
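The arithmetic behind a grant input is simple enough to sketch. The function name is hypothetical; the €258K figure and the 35% share come from the research note cited in the coverage annotation above.

```python
def net_capex(gross_capex, grant_share):
    """CAPEX remaining after a grant covering grant_share of the gross amount."""
    if not 0.0 <= grant_share <= 1.0:
        raise ValueError("grant_share must be between 0 and 1")
    return gross_capex * (1.0 - grant_share)


# A 35% grant on €258K gross CAPEX leaves about €167.7K to fund.
after_grant = net_capex(258_000.0, 0.35)
```

This is in line with the €258K → €167K worked example, which suggests that example assumes the full 35% share.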
### VAT & Tax Structuring
> **COVERAGE: GAP** — Not modelled. Germany-specific: court rental may qualify for §4 Nr. 22 UStG sports VAT exemption (0% VAT) if operated by a non-commercial entity; commercial operators pay 19% VAT on court rental. F&B is 19% (or 7% eat-in). Getting this wrong materially affects revenue net-of-VAT. Worth a callout in the legal/regulatory article (Tier 2 gap #7).
Depending on your jurisdiction, court rental may be VAT-exempt or reduced-rate (sports exemption), while F&B is standard-rated. This affects pricing strategy and cash flow. The entity structure (single GmbH, holding structure, partnership) has implications for profit extraction, liability, and eventual exit taxation. Worth getting tax advice early.
### Insurance & Business Interruption
> **COVERAGE: PARTIAL** — Insurance is a planner OPEX line item. No guidance on coverage types or BI insurance sizing. Low priority to expand.
Price in comprehensive insurance — property, liability, business interruption. A fire or structural issue that shuts you down for 3 months could be existential without BI coverage. This is a real cost that's often underestimated.
### Covenant Compliance
> **COVERAGE: ANSWERED** — DSCR is computed for each of the 5 years and shown with a warning band. LTV warnings are also displayed.
If you take bank debt, you'll likely face covenants — DSCR (debt service coverage ratio) minimums of 1.2–1.5x, leverage caps, possibly revenue milestones. Model your covenant headroom explicitly. Breaching a covenant in year 1 during ramp-up is a real risk if you've over-leveraged.
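Covenant headroom can be sketched as the gap between each year's DSCR and the covenant minimum. The example below is illustrative only: the cash flow and debt service figures are hypothetical, the 1.2x minimum is the lower end of the range above, and the function names are not taken from the planner.

```python
def dscr(cash_available: float, debt_service: float) -> float:
    """Debt service coverage ratio: cash available for debt service / debt service."""
    if debt_service <= 0:
        raise ValueError("debt_service must be positive")
    return cash_available / debt_service


def covenant_headroom(cash_by_year, debt_service, minimum=1.2):
    """Return (year, dscr, headroom) tuples; negative headroom means a breach."""
    rows = []
    for year, cash in enumerate(cash_by_year, start=1):
        ratio = dscr(cash, debt_service)
        rows.append((year, ratio, ratio - minimum))
    return rows


# Hypothetical ramp-up: weak year 1, stabilising afterwards.
cash_by_year = [90_000, 130_000, 160_000, 170_000, 175_000]
annual_debt_service = 100_000

for year, ratio, headroom in covenant_headroom(cash_by_year, annual_debt_service):
    status = "BREACH" if headroom < 0 else "ok"
    print(f"Year {year}: DSCR {ratio:.2f}x, headroom {headroom:+.2f}x ({status})")
```

With these inputs year 1 falls below the minimum, which is the ramp-up breach the paragraph warns about.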
### Inflation Sensitivity
> **COVERAGE: ANSWERED** — The planner has separate `revenue_growth_rate` and `opex_growth_rate` inputs, allowing asymmetric inflation scenarios.
Energy costs, staff wages, and maintenance all inflate. Can you pass these through via price increases without killing utilization? Model a scenario where costs inflate at 3–5% but you can only raise prices by 2–3%.
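The asymmetric scenario can be sketched by compounding revenue and OPEX at different rates and watching the operating margin. The parameter names mirror the planner's `revenue_growth_rate` and `opex_growth_rate` inputs; the starting revenue and OPEX are hypothetical, and the function itself is an illustration rather than the planner's implementation.

```python
def project_margins(revenue, opex, revenue_growth_rate, opex_growth_rate, years):
    """Operating margin for year 0..years under separate compound growth rates."""
    margins = []
    for year in range(years + 1):
        rev = revenue * (1 + revenue_growth_rate) ** year
        cost = opex * (1 + opex_growth_rate) ** year
        margins.append((rev - cost) / rev)
    return margins


# Hypothetical hall: 500K revenue, 350K OPEX, prices +2.5%/yr, costs +4%/yr.
margins = project_margins(500_000, 350_000, 0.025, 0.04, years=5)
for year, margin in enumerate(margins):
    print(f"Year {year}: operating margin {margin:.1%}")
```

Because costs compound faster than prices, the margin shrinks every year even though revenue keeps growing in nominal terms.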
### Residual / Liquidation Value
> **COVERAGE: PARTIAL** — Terminal/exit value is modelled (EBITDA multiple). A true liquidation scenario (courts resale, lease termination penalties, building write-off) is not separately modelled. Sufficient for the current product.
In a downside scenario, what are your assets worth? Padel courts have some resale value. Building improvements are largely sunk. If you've leased, your downside is limited to equity invested plus any personal guarantees. If you've bought property, the real estate retains value but may take time to sell. Model the liquidation scenario honestly.
---
@@ -280,24 +385,34 @@ In a downside scenario, what are your assets worth? Padel courts have some resal
### Existential Risks
> **COVERAGE: PARTIAL** — Trend/fad risk is acknowledged in the BP narrative but not quantified (Tier 3 gap #13). FIP/Playtomic data (7,187 new courts globally in 2024, +26% YoY new clubs) exists but long-term quantification is inherently speculative. Force majeure/pandemic risk is not addressed; a reserve fund input (CAPEX working capital) provides partial mitigation modelling.
- **Trend / Fad Risk**: Padel is booming now, but so was squash in the 1980s. You're locking in a 10–15 year investment thesis on a sport that may plateau or decline. The key question is whether padel reaches self-sustaining critical mass in your market or stays a novelty. If utilization drops from 65% to 35% in year 5 because the hype fades, your entire model breaks. This is largely unhedgeable.
- **Force Majeure / Pandemic Risk**: COVID shut down indoor sports facilities for months. Insurance may not cover it. Having enough cash reserves or credit facilities to survive 3–6 months of zero revenue is prudent.
### Construction & Development Risks
> **COVERAGE: PARTIAL** — A contingency/overrun percentage is a planner CAPEX input. Delay cost (carrying costs during construction) is not explicitly modelled.
- **Construction Cost Overruns & Delays**: Sports facility builds routinely overrun by 15–30%. Every month of delay is a month of carrying costs (rent, debt service, staff already hired) with zero revenue. Build a contingency buffer of at least 15–20% of CAPEX and negotiate fixed-price construction contracts where possible.
### Property & Lease Risks
> **COVERAGE: GAP** — No lease-term inputs or landlord risk guidance. The `own` toggle handles the buy scenario. A callout in the BP template about minimum lease length (15+ years, renewal options) would be useful but is low priority.
- **Landlord Risk**: If you're leasing, you're spending €500k+ fitting out someone else's building. What happens if the landlord sells, goes bankrupt, or refuses to renew? You need a long lease (15+ years), with options to renew, and ideally a step-in right or compensation clause for tenant improvements.
### Competitive Risks
> **COVERAGE: PARTIAL** — City articles show existing venue density and Opportunity Score. The planner does not model a "competitor opens nearby" scenario. A simple sensitivity scenario (utilization drop) is the best proxy available in the current model.
- **Cannibalization from New Entrants**: Your success is visible — full courts, long waitlists. This attracts competitors. Someone opens a new hall 10 minutes away, and your utilization drops from 70% to 50%. There's no real moat in padel besides location, community loyalty, and service quality. Model what happens when a competitor opens nearby in year 3.
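The "competitor opens nearby in year 3" case can be approximated with the utilization-drop proxy mentioned above. This is a rough sketch under stated assumptions: court count, opening hours, and hourly price are hypothetical, and the only figures taken from the text are the 70% to 50% utilization drop and the year-3 entry.

```python
def annual_court_revenue(courts, hours_per_day, price_per_hour, utilization, days=365):
    """Court rental revenue for one year at a given average utilization."""
    return courts * hours_per_day * days * price_per_hour * utilization


def revenue_with_competitor(base_util, shocked_util, entry_year, years, **hall):
    """Yearly revenue where utilization drops from `entry_year` onwards."""
    revenues = []
    for year in range(1, years + 1):
        utilization = base_util if year < entry_year else shocked_util
        revenues.append(annual_court_revenue(utilization=utilization, **hall))
    return revenues


# Hypothetical hall: 6 courts, 14 bookable hours per day, 30 EUR per court hour.
scenario = revenue_with_competitor(
    base_util=0.70, shocked_util=0.50, entry_year=3, years=5,
    courts=6, hours_per_day=14, price_per_hour=30.0,
)
for year, revenue in enumerate(scenario, start=1):
    print(f"Year {year}: court revenue {revenue:,.0f}")
```

Court revenue falls in proportion to utilization, while most of the cost base (rent, debt service, staff) stays fixed, which is why a 20-point utilization drop hits profit much harder than revenue.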
### Operational Risks
> **COVERAGE: PARTIAL** — Court maintenance OPEX and maintenance reserve are planner inputs. F&B, staffing, and booking platform risks are not addressed. See Tier 2 gaps #5 (booking platform strategy) and #9 (staffing plan). Seasonality is fully modelled (12-month outdoor seasonal curve; monthly cash flow).
- **Key Person Dependency**: If the whole operation depends on one founder-operator or one star coach who brings all the members, that's a fragility. Illness, burnout, or departure can crater the business.
- **Staff Retention & Labor Market**: Good facility managers, coaches, and front-desk staff with a hospitality mindset are hard to find and keep. Turnover is expensive and disruptive. In tight labor markets, wage pressure can erode margins.
@@ -310,6 +425,8 @@ In a downside scenario, what are your assets worth? Padel courts have some resal
### Financial Risks
> **COVERAGE: PARTIAL** — Energy volatility: energy OPEX is a modelled input with growth rate, but no locking/hedging guidance. Financing environment: debt rate is a planner input; stress-test at +2% is covered by the sensitivity grid indirectly. Personal guarantee and customer concentration: not addressed (out of scope for data-driven product). Inflation pass-through: answered (separate revenue vs OPEX growth rates).
- **Energy Price Volatility**: Indoor padel halls consume significant energy. Energy costs spiking can destroy margins. Consider locking in energy contracts, investing in solar panels, or using LED lighting and efficient HVAC to reduce exposure.
- **Financing Environment**: If interest rates rise between when you plan the project and when you draw down the loan, your debt service costs increase. Lock in rates where possible, or stress-test your model at rates 2% higher than current.
@@ -322,22 +439,32 @@ In a downside scenario, what are your assets worth? Padel courts have some resal
### Regulatory & Legal Risks
> **COVERAGE: GAP — Tier 2 priority.** Noise complaints (TA Lärm), injury liability, and permit risks are all unaddressed. A Germany-first regulatory checklist article would cover: Bauantrag, Nutzungsänderung, TA Lärm compliance, GmbH vs UG formation, Gewerbeerlaubnis, §4 Nr. 22 UStG sports VAT, and Gaststättengesetz (liquor license). High value for Phase 1/2 users who are evaluating feasibility.
- **Noise Complaints**: Padel is loud — the ball hitting glass walls generates significant noise. Neighbors can complain and municipal authorities can impose operating hour restrictions or require expensive sound mitigation. Check local noise ordinances thoroughly before committing.
- **Injury Liability**: Padel involves glass walls, fast-moving balls, and quick lateral movement. Player injuries happen. Proper insurance, waiver systems, and court maintenance protocols are essential.
### Technology & Platform Risks
> **COVERAGE: GAP — Tier 2 priority.** Booking platform dependency is a real decision point for operators (Playtomic commission ~15–20%, data ownership implications, competitor steering risk). We scrape Playtomic and know it intimately. A standalone article "Playtomic vs Matchi vs eigenes System" or a section in the BP template would address this. The booking system commission rate is already a planner input — we could link to a decision guide from there.
- **Booking Platform Dependency**: If you rely on a third-party booking platform like Playtomic, you're giving them access to your customer relationships and paying commission. They could raise fees, change terms, or steer demand to competitors.
### Reputational Risks
> **COVERAGE: GAP** — Not addressed. Out of scope for a data-driven product; operational advice.
- **Brand / Reputation Risk**: One viral negative review, a hygiene issue, a safety incident, or a social media complaint can disproportionately hurt a local leisure business.
### Currency & External Risks
> **COVERAGE: GAP** — FX risk from Spanish/Italian manufacturers is not modelled. Minor; most German buyers pay in EUR. Note in BP template as a caveat if importing outside Eurozone.
- **Currency Risk**: Relevant if importing courts or equipment from another currency zone — padel court manufacturers are often Spanish or Italian, so FX moves can affect CAPEX if you're outside the Eurozone.
### Opportunity Cost
> **COVERAGE: PARTIAL** — IRR and NPV implicitly address opportunity cost (you enter the hurdle rate as WACC/cost of equity). No explicit comparison against passive investment alternatives is shown. Sufficient for current product.
The capital, time, and energy you put into this project could go elsewhere. If you could earn 8–10% passively in diversified investments, a padel hall needs to deliver meaningfully more on a risk-adjusted basis to justify the concentration, illiquidity, and personal time commitment.

View File

@@ -1,81 +0,0 @@
"""Download NUTS-2 boundary GeoJSON from Eurostat GISCO.

One-time (or on NUTS revision) download of NUTS-2 boundary polygons used for
spatial income resolution in dim_locations. Stored uncompressed because DuckDB's
ST_Read function cannot read gzipped files.

NUTS classification changes approximately every 7 years. Current revision: 2021.

Output: {LANDING_DIR}/gisco/2024/01/nuts2_boundaries.geojson (~5MB, uncompressed)

Usage:
    uv run python scripts/download_gisco_nuts.py [--landing-dir data/landing]

Idempotent: skips download if the file already exists.
"""

import argparse
import sys
from pathlib import Path

import niquests

# NUTS 2021 revision, 20M scale (1:20,000,000), WGS84 (EPSG:4326), LEVL_2 only.
# 20M resolution gives simplified polygons that are fast for point-in-polygon
# matching without sacrificing accuracy at the NUTS-2 boundary level.
GISCO_URL = (
    "https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/"
    "NUTS_RG_20M_2021_4326_LEVL_2.geojson"
)

# Fixed partition: NUTS boundaries are a static reference file, not time-series data.
# Use the NUTS revision year as the partition to make the source version explicit.
DEST_REL_PATH = "gisco/2024/01/nuts2_boundaries.geojson"

HTTP_TIMEOUT_SECONDS = 120


def download_nuts_boundaries(landing_dir: Path) -> None:
    dest = landing_dir / DEST_REL_PATH
    if dest.exists():
        print(f"Already exists (skipping): {dest}")
        return

    dest.parent.mkdir(parents=True, exist_ok=True)
    print(f"Downloading NUTS-2 boundaries from GISCO...")
    print(f" URL: {GISCO_URL}")

    with niquests.Session() as session:
        resp = session.get(GISCO_URL, timeout=HTTP_TIMEOUT_SECONDS)
        resp.raise_for_status()

    content = resp.content
    assert len(content) > 100_000, (
        f"GeoJSON too small ({len(content)} bytes) — download may have failed"
    )
    assert b'"FeatureCollection"' in content, "Response does not look like GeoJSON"

    # Write uncompressed — ST_Read requires a plain file
    tmp = dest.with_suffix(".geojson.tmp")
    tmp.write_bytes(content)
    tmp.rename(dest)

    size_mb = len(content) / 1_000_000
    print(f" Written: {dest} ({size_mb:.1f} MB)")
    print("Done. Run SQLMesh plan to rebuild stg_nuts2_boundaries.")


def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--landing-dir", default="data/landing", type=Path)
    args = parser.parse_args()

    if not args.landing_dir.is_dir():
        print(f"Error: landing dir does not exist: {args.landing_dir}", file=sys.stderr)
        sys.exit(1)

    download_nuts_boundaries(args.landing_dir)


if __name__ == "__main__":
    main()

View File

@@ -6,7 +6,9 @@ Operational visibility for the data extraction and transformation pipeline:
/admin/pipeline/overview → HTMX tab: extraction status, serving freshness, landing stats
/admin/pipeline/extractions → HTMX tab: filterable extraction run history
/admin/pipeline/extractions/<id>/mark-stale → POST: mark stuck "running" row as failed
/admin/pipeline/extract/trigger → POST: enqueue full extraction run
/admin/pipeline/extract/trigger → POST: enqueue extraction run (HTMX-aware)
/admin/pipeline/transform → HTMX tab: SQLMesh + export status, run history
/admin/pipeline/transform/trigger → POST: enqueue transform/export/pipeline step
/admin/pipeline/catalog → HTMX tab: data catalog (tables, columns, sample data)
/admin/pipeline/catalog/<table> → HTMX partial: table detail (columns + sample)
/admin/pipeline/query → HTMX tab: SQL query editor
@@ -18,6 +20,7 @@ Data sources:
- analytics.duckdb (DuckDB read-only via analytics.execute_user_query)
- LANDING_DIR/ (filesystem scan for file sizes + dates)
- infra/supervisor/workflows.toml (schedule definitions — tomllib, stdlib)
- app.db tasks table (run_transform, run_export, run_pipeline task rows)
"""
import asyncio
import json
@@ -49,7 +52,7 @@ _LANDING_DIR = os.environ.get("LANDING_DIR", "data/landing")
_SERVING_DUCKDB_PATH = os.environ.get("SERVING_DUCKDB_PATH", "data/analytics.duckdb")
# Repo root: web/src/padelnomics/admin/ → up 4 levels
_REPO_ROOT = Path(__file__).resolve().parents[5]
_REPO_ROOT = Path(__file__).resolve().parents[4]
_WORKFLOWS_TOML = _REPO_ROOT / "infra" / "supervisor" / "workflows.toml"
# A "running" row older than this is considered stale/crashed.
@@ -626,10 +629,8 @@ async def pipeline_dashboard():
# ── Overview tab ─────────────────────────────────────────────────────────────
@bp.route("/overview")
@role_required("admin")
async def pipeline_overview():
"""HTMX tab: extraction status per source, serving freshness, landing zone."""
async def _render_overview_partial():
"""Build and render the pipeline overview partial (shared by GET and POST triggers)."""
latest_runs, landing_stats, workflows, serving_meta = await asyncio.gather(
asyncio.to_thread(_fetch_latest_per_extractor_sync),
asyncio.to_thread(_get_landing_zone_stats_sync),
@@ -650,6 +651,13 @@ async def pipeline_overview():
"stale": _is_stale(run) if run else False,
})
# Treat pending extraction tasks as "running" (queued or active).
from ..core import fetch_all as _fetch_all # noqa: PLC0415
pending_extraction = await _fetch_all(
"SELECT id FROM tasks WHERE task_name = 'run_extraction' AND status = 'pending' LIMIT 1"
)
any_running = bool(pending_extraction)
# Compute landing zone totals
total_landing_bytes = sum(s["total_bytes"] for s in landing_stats)
@@ -677,10 +685,18 @@ async def pipeline_overview():
total_landing_bytes=total_landing_bytes,
serving_tables=serving_tables,
last_export=last_export,
any_running=any_running,
format_bytes=_format_bytes,
)
@bp.route("/overview")
@role_required("admin")
async def pipeline_overview():
"""HTMX tab: extraction status per source, serving freshness, landing zone."""
return await _render_overview_partial()
# ── Extractions tab ────────────────────────────────────────────────────────────
@@ -745,7 +761,11 @@ async def pipeline_mark_stale(run_id: int):
@role_required("admin")
@csrf_protect
async def pipeline_trigger_extract():
"""Enqueue an extraction run — all extractors, or a single named one."""
"""Enqueue an extraction run — all extractors, or a single named one.
HTMX-aware: if the HX-Request header is present, returns the overview partial
directly so the UI can update in-place without a redirect.
"""
from ..worker import enqueue
form = await request.form
@@ -757,11 +777,15 @@ async def pipeline_trigger_extract():
await flash(f"Unknown extractor '{extractor}'.", "warning")
return redirect(url_for("pipeline.pipeline_dashboard"))
await enqueue("run_extraction", {"extractor": extractor})
await flash(f"Extractor '{extractor}' queued. Check the task queue for progress.", "success")
else:
await enqueue("run_extraction")
await flash("Extraction run queued. Check the task queue for progress.", "success")
is_htmx = request.headers.get("HX-Request") == "true"
if is_htmx:
return await _render_overview_partial()
msg = f"Extractor '{extractor}' queued." if extractor else "Extraction run queued."
await flash(f"{msg} Check the task queue for progress.", "success")
return redirect(url_for("pipeline.pipeline_dashboard"))
@@ -847,6 +871,156 @@ async def pipeline_lineage_schema(model: str):
)
# ── Transform tab ─────────────────────────────────────────────────────────────

_TRANSFORM_TASK_NAMES = ("run_transform", "run_export", "run_pipeline")


async def _fetch_pipeline_tasks() -> dict:
    """Fetch the latest task row for each transform task type, plus recent run history.

    Returns:
        {
            "latest": {"run_transform": row|None, "run_export": row|None, "run_pipeline": row|None},
            "history": [row, ...],  # last 20 rows across all three task types, newest first
        }
    """
    from ..core import fetch_all as _fetch_all  # noqa: PLC0415

    # Latest row per task type (may be pending, complete, or failed)
    latest_rows = await _fetch_all(
        """
        SELECT t.*
        FROM tasks t
        INNER JOIN (
            SELECT task_name, MAX(id) AS max_id
            FROM tasks
            WHERE task_name IN ('run_transform', 'run_export', 'run_pipeline')
            GROUP BY task_name
        ) latest ON t.id = latest.max_id
        """
    )
    latest: dict = {"run_transform": None, "run_export": None, "run_pipeline": None}
    for row in latest_rows:
        latest[row["task_name"]] = dict(row)

    history = await _fetch_all(
        """
        SELECT id, task_name, status, created_at, completed_at, error
        FROM tasks
        WHERE task_name IN ('run_transform', 'run_export', 'run_pipeline')
        ORDER BY id DESC
        LIMIT 20
        """
    )
    return {"latest": latest, "history": [dict(r) for r in history]}


def _format_duration(created_at: str | None, completed_at: str | None) -> str:
    """Human-readable duration between created_at and completed_at, or '' if unavailable."""
    if not created_at or not completed_at:
        return ""
    try:
        fmt = "%Y-%m-%d %H:%M:%S"
        start = datetime.strptime(created_at, fmt)
        end = datetime.strptime(completed_at, fmt)
        delta = int((end - start).total_seconds())
        if delta < 0:
            return ""
        if delta < 60:
            return f"{delta}s"
        return f"{delta // 60}m {delta % 60}s"
    except ValueError:
        return ""


async def _render_transform_partial():
    """Build and render the transform tab partial."""
    task_data = await _fetch_pipeline_tasks()
    latest = task_data["latest"]
    history = task_data["history"]

    # Enrich history rows with duration
    for row in history:
        row["duration"] = _format_duration(row.get("created_at"), row.get("completed_at"))
        # Truncate error for display
        if row.get("error"):
            row["error_short"] = row["error"][:120]
        else:
            row["error_short"] = None

    any_running = any(
        t is not None and t["status"] == "pending" for t in latest.values()
    )
    serving_meta = await asyncio.to_thread(_load_serving_meta)

    return await render_template(
        "admin/partials/pipeline_transform.html",
        latest=latest,
        history=history,
        any_running=any_running,
        serving_meta=serving_meta,
        format_duration=_format_duration,
    )


@bp.route("/transform")
@role_required("admin")
async def pipeline_transform():
    """HTMX tab: SQLMesh transform + export status, run history."""
    return await _render_transform_partial()


@bp.route("/transform/trigger", methods=["POST"])
@role_required("admin")
@csrf_protect
async def pipeline_trigger_transform():
    """Enqueue a transform, export, or full pipeline task.

    form field `step`: 'transform' | 'export' | 'pipeline'
    Concurrency guard: rejects if the same task type is already pending.
    HTMX-aware: returns the transform partial for HTMX requests.
    """
    from ..core import fetch_one as _fetch_one  # noqa: PLC0415
    from ..worker import enqueue

    form = await request.form
    step = (form.get("step") or "").strip()

    step_to_task = {
        "transform": "run_transform",
        "export": "run_export",
        "pipeline": "run_pipeline",
    }
    if step not in step_to_task:
        await flash(f"Unknown step '{step}'.", "warning")
        return redirect(url_for("pipeline.pipeline_dashboard"))

    task_name = step_to_task[step]

    # Concurrency guard: reject if same task type is already pending
    existing = await _fetch_one(
        "SELECT id FROM tasks WHERE task_name = ? AND status = 'pending' LIMIT 1",
        (task_name,),
    )
    if existing:
        is_htmx = request.headers.get("HX-Request") == "true"
        if is_htmx:
            return await _render_transform_partial()
        await flash(f"A '{step}' task is already queued (task #{existing['id']}).", "warning")
        return redirect(url_for("pipeline.pipeline_dashboard"))

    await enqueue(task_name)

    is_htmx = request.headers.get("HX-Request") == "true"
    if is_htmx:
        return await _render_transform_partial()

    await flash(f"'{step}' task queued. Check the task queue for progress.", "success")
    return redirect(url_for("pipeline.pipeline_dashboard"))

# ── Catalog tab ───────────────────────────────────────────────────────────────

View File

@@ -169,7 +169,6 @@ async def pseo_generate_gaps(slug: str):
"template_slug": slug,
"start_date": date.today().isoformat(),
"articles_per_day": 500,
"limit": 500,
})
await flash(
f"Queued generation for {len(gaps)} missing articles in '{config['name']}'.",

View File

@@ -1865,7 +1865,7 @@ async def template_preview(slug: str, row_key: str):
@csrf_protect
async def template_generate(slug: str):
"""Generate articles from template + DuckDB data."""
from ..content import fetch_template_data, load_template
from ..content import count_template_data, load_template
try:
config = load_template(slug)
@@ -1873,8 +1873,7 @@ async def template_generate(slug: str):
await flash("Template not found.", "error")
return redirect(url_for("admin.templates"))
data_rows = await fetch_template_data(config["data_table"], limit=501)
row_count = len(data_rows)
row_count = await count_template_data(config["data_table"])
if request.method == "POST":
form = await request.form
@@ -1888,7 +1887,6 @@ async def template_generate(slug: str):
"template_slug": slug,
"start_date": start_date.isoformat(),
"articles_per_day": articles_per_day,
"limit": 500,
})
await flash(
f"Article generation queued for '{config['name']}'. "
@@ -1923,7 +1921,6 @@ async def template_regenerate(slug: str):
"template_slug": slug,
"start_date": date.today().isoformat(),
"articles_per_day": 500,
"limit": 500,
})
await flash("Regeneration queued. The worker will process it in the background.", "success")
return redirect(url_for("admin.template_detail", slug=slug))
@@ -2729,7 +2726,6 @@ async def rebuild_all():
"template_slug": t["slug"],
"start_date": date.today().isoformat(),
"articles_per_day": 500,
"limit": 500,
})
# Manual articles still need inline rebuild
@@ -3037,6 +3033,7 @@ async def outreach():
current_search=search,
current_follow_up=follow_up,
page=page,
outreach_email=EMAIL_ADDRESSES["outreach"],
)

View File

@@ -229,7 +229,7 @@ document.addEventListener('DOMContentLoaded', function() {
<form method="post" action="{{ url_for('admin.affiliate_delete', product_id=product_id) }}" style="margin:0">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<button type="submit" class="btn-outline"
onclick="return confirm('Delete this product? This cannot be undone.')">Delete</button>
onclick="event.preventDefault(); confirmAction('Delete this product? This cannot be undone.', this.closest('form'))">Delete</button>
</form>
{% endif %}
</div>

View File

@@ -123,7 +123,7 @@ document.addEventListener('DOMContentLoaded', function() {
<form method="post" action="{{ url_for('admin.affiliate_program_delete', program_id=program_id) }}" style="margin:0">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<button type="submit" class="btn-outline"
onclick="return confirm('Delete this program? Blocked if products reference it.')">Delete</button>
onclick="event.preventDefault(); confirmAction('Delete this program? Blocked if products reference it.', this.closest('form'))">Delete</button>
</form>
{% endif %}
</div>

View File

@@ -40,8 +40,10 @@
.admin-subnav {
display: flex; align-items: stretch; padding: 0 2rem;
background: #fff; border-bottom: 1px solid #E2E8F0;
flex-shrink: 0; overflow-x: auto; gap: 0;
flex-shrink: 0; overflow-x: auto; overflow-y: hidden; gap: 0;
scrollbar-width: none;
}
.admin-subnav::-webkit-scrollbar { display: none; }
.admin-subnav a {
display: flex; align-items: center; gap: 5px;
padding: 0 1px; margin: 0 13px 0 0; height: 42px;
@@ -242,5 +244,19 @@ function confirmAction(message, form) {
document.getElementById('confirm-cancel').addEventListener('click', function() { dialog.close(); }, { once: true });
dialog.showModal();
}
// Intercept hx-confirm to use the styled dialog instead of window.confirm()
document.body.addEventListener('htmx:confirm', function(evt) {
var dialog = document.getElementById('confirm-dialog');
if (!dialog) return; // fallback: let HTMX use native confirm
evt.preventDefault();
document.getElementById('confirm-msg').textContent = evt.detail.question;
var ok = document.getElementById('confirm-ok');
var newOk = ok.cloneNode(true);
ok.replaceWith(newOk);
newOk.addEventListener('click', function() { dialog.close(); evt.detail.issueRequest(true); }, { once: true });
document.getElementById('confirm-cancel').addEventListener('click', function() { dialog.close(); }, { once: true });
dialog.showModal();
});
</script>
{% endblock %}

View File

@@ -3,6 +3,19 @@
{% block title %}Admin Dashboard - {{ config.APP_NAME }}{% endblock %}
{% block admin_head %}
<style>
.funnel-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 0.75rem;
}
@media (min-width: 768px) {
.funnel-grid { grid-template-columns: repeat(5, 1fr); }
}
</style>
{% endblock %}
{% block admin_content %}
<header class="flex justify-between items-center mb-8">
<div>
@@ -47,7 +60,7 @@
<!-- Lead Funnel -->
<p class="text-xs font-semibold text-slate uppercase tracking-wider mb-2">Lead Funnel</p>
<div style="display:grid;grid-template-columns:repeat(5,1fr);gap:0.75rem" class="mb-8">
<div class="funnel-grid mb-8">
<div class="card text-center border-l-4 border-l-electric" style="padding:0.75rem">
<p class="text-xs text-slate">Planner Users</p>
<p class="text-xl font-bold text-navy">{{ stats.planner_users }}</p>
@@ -72,7 +85,7 @@
<!-- Supplier Stats -->
<p class="text-xs font-semibold text-slate uppercase tracking-wider mb-2">Supplier Funnel</p>
<div style="display:grid;grid-template-columns:repeat(5,1fr);gap:0.75rem" class="mb-8">
<div class="funnel-grid mb-8">
<div class="card text-center border-l-4 border-l-accent" style="padding:0.75rem">
<p class="text-xs text-slate">Claimed Suppliers</p>
<p class="text-xl font-bold text-navy">{{ stats.suppliers_claimed }}</p>

View File

@@ -2,13 +2,30 @@
{% set admin_page = "outreach" %}
{% block title %}Outreach Pipeline - Admin - {{ config.APP_NAME }}{% endblock %}
{% block admin_head %}
<style>
.pipeline-status-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 0.75rem;
margin-bottom: 1.5rem;
}
@media (min-width: 640px) {
.pipeline-status-grid { grid-template-columns: repeat(3, 1fr); }
}
@media (min-width: 1024px) {
.pipeline-status-grid { grid-template-columns: repeat(6, 1fr); }
}
</style>
{% endblock %}
{% block admin_content %}
<header class="flex justify-between items-center mb-6">
<div>
<h1 class="text-2xl">Outreach</h1>
<p class="text-sm text-slate mt-1">
{{ pipeline.total }} supplier{{ 's' if pipeline.total != 1 else '' }} in pipeline
&middot; Sending domain: <span class="mono text-xs">hello.padelnomics.io</span>
&middot; Sending from: <span class="mono text-xs">{{ outreach_email }}</span>
</p>
</div>
<div class="flex gap-2">
@@ -18,7 +35,7 @@
</header>
<!-- Pipeline cards -->
<div style="display:grid;grid-template-columns:repeat(6,1fr);gap:0.75rem;margin-bottom:1.5rem">
<div class="pipeline-status-grid">
{% set status_colors = {
'prospect': '#E2E8F0',
'contacted': '#DBEAFE',

View File

@@ -24,7 +24,7 @@
<form method="post" action="{{ url_for('admin.affiliate_program_delete', program_id=prog.id) }}" style="display:inline">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<button type="submit" class="btn-outline btn-sm"
onclick="return confirm('Delete {{ prog.name }}? This is blocked if products reference it.')">Delete</button>
onclick="event.preventDefault(); confirmAction('Delete {{ prog.name }}? This is blocked if products reference it.', this.closest('form'))">Delete</button>
</form>
</td>
</tr>

View File

@@ -23,7 +23,7 @@
<form method="post" action="{{ url_for('admin.affiliate_delete', product_id=product.id) }}" style="display:inline">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<button type="submit" class="btn-outline btn-sm"
onclick="return confirm('Delete {{ product.name }}?')">Delete</button>
onclick="event.preventDefault(); confirmAction('Delete {{ product.name }}?', this.closest('form'))">Delete</button>
</form>
</td>
</tr>

View File

@@ -1,5 +1,6 @@
{% if emails %}
<div class="card">
<div style="overflow-x:auto">
<table class="table">
<thead>
<tr>
@@ -38,6 +39,7 @@
{% endfor %}
</tbody>
</table>
</div>
</div>
{% else %}
<div class="card text-center" style="padding:2rem">

View File

@@ -25,6 +25,7 @@
{% if leads %}
<div class="card">
<div style="overflow-x:auto">
<table class="table">
<thead>
<tr>
@@ -58,6 +59,7 @@
{% endfor %}
</tbody>
</table>
</div>
</div>
<!-- Pagination -->

View File

@@ -1,5 +1,6 @@
{% if suppliers %}
<div class="card">
<div style="overflow-x:auto">
<table class="table">
<thead>
<tr>
@@ -19,6 +20,7 @@
{% endfor %}
</tbody>
</table>
</div>
</div>
{% else %}
<div class="card text-center" style="padding:2rem">

View File

@@ -1,4 +1,11 @@
<!-- Pipeline Overview Tab: extraction status, serving freshness, landing zone -->
<!-- Pipeline Overview Tab: extraction status, serving freshness, landing zone
Self-polls every 5s while any extraction task is pending, stops when quiet. -->
<div id="pipeline-overview-content"
hx-get="{{ url_for('pipeline.pipeline_overview') }}"
hx-target="this"
hx-swap="outerHTML"
{% if any_running %}hx-trigger="every 5s"{% endif %}>
<!-- Extraction Status Grid -->
<div class="card mb-4">
@@ -26,12 +33,14 @@
{% if stale %}
<span class="badge-warning" style="font-size:10px;padding:1px 6px;margin-left:auto">stale</span>
{% endif %}
<form method="post" action="{{ url_for('pipeline.pipeline_trigger_extract') }}" class="m-0 ml-auto">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<input type="hidden" name="extractor" value="{{ wf.name }}">
<button type="button" class="btn btn-sm" style="padding:2px 8px;font-size:11px"
onclick="confirmAction('Run {{ wf.name }} extractor?', this.closest('form'))">Run</button>
</form>
<button type="button"
class="btn btn-sm ml-auto"
style="padding:2px 8px;font-size:11px"
hx-post="{{ url_for('pipeline.pipeline_trigger_extract') }}"
hx-target="#pipeline-overview-content"
hx-swap="outerHTML"
hx-vals='{"extractor": "{{ wf.name }}", "csrf_token": "{{ csrf_token() }}"}'
hx-confirm="Run {{ wf.name }} extractor?">Run</button>
</div>
<p class="text-xs text-slate">{{ wf.schedule_label }}</p>
{% if run %}
@@ -57,7 +66,7 @@
</div>
<!-- Two-column row: Serving Freshness + Landing Zone -->
<div style="display:grid;grid-template-columns:1fr 1fr;gap:1rem">
<div class="pipeline-two-col">
<!-- Serving Freshness -->
<div class="card">
@@ -68,6 +77,7 @@
</p>
{% endif %}
{% if serving_tables %}
<div style="overflow-x:auto">
<table class="table" style="font-size:0.8125rem">
<thead>
<tr>
@@ -86,6 +96,7 @@
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<p class="text-sm text-slate">No serving tables found — run the pipeline first.</p>
{% endif %}
@@ -99,6 +110,7 @@
</span>
</p>
{% if landing_stats %}
<div style="overflow-x:auto">
<table class="table" style="font-size:0.8125rem">
<thead>
<tr>
@@ -119,6 +131,7 @@
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<p class="text-sm text-slate">
Landing zone empty or not found at <code>data/landing</code>.
@@ -127,3 +140,5 @@
</div>
</div>
</div>{# end #pipeline-overview-content #}

View File

@@ -0,0 +1,197 @@
<!-- Pipeline Transform Tab: SQLMesh + export status, run history
Self-polls every 5s while any transform/export task is pending. -->
<div id="pipeline-transform-content"
hx-get="{{ url_for('pipeline.pipeline_transform') }}"
hx-target="this"
hx-swap="outerHTML"
{% if any_running %}hx-trigger="every 5s"{% endif %}>
<!-- Status Cards: Transform + Export -->
<div class="pipeline-two-col mb-4">
<!-- SQLMesh Transform -->
{% set tx = latest['run_transform'] %}
<div class="card">
<p class="card-header">SQLMesh Transform</p>
<div class="flex items-center gap-2 mb-3">
{% if tx is none %}
<span class="status-dot pending"></span>
<span class="text-sm text-slate">Never run</span>
{% elif tx.status == 'pending' %}
<span class="status-dot running"></span>
<span class="text-sm text-slate">Running…</span>
{% elif tx.status == 'complete' %}
<span class="status-dot ok"></span>
<span class="text-sm text-slate">Complete</span>
{% else %}
<span class="status-dot failed"></span>
<span class="text-sm text-danger">Failed</span>
{% endif %}
</div>
{% if tx %}
<p class="text-xs text-slate mono">
Started: {{ (tx.created_at or '')[:19] or '—' }}
</p>
{% if tx.completed_at %}
<p class="text-xs text-slate mono">
Finished: {{ tx.completed_at[:19] }}
</p>
{% endif %}
{% if tx.status == 'failed' and tx.error %}
<details class="mt-2">
<summary class="text-xs text-danger cursor-pointer">Error</summary>
<pre class="text-xs mt-1 p-2 bg-gray-50 rounded overflow-auto" style="max-height:8rem;white-space:pre-wrap">{{ tx.error[:400] }}</pre>
</details>
{% endif %}
{% endif %}
<div class="mt-3">
<button type="button"
class="btn btn-sm"
{% if any_running %}disabled{% endif %}
hx-post="{{ url_for('pipeline.pipeline_trigger_transform') }}"
hx-target="#pipeline-transform-content"
hx-swap="outerHTML"
hx-vals='{"step": "transform", "csrf_token": "{{ csrf_token() }}"}'
hx-confirm="Run SQLMesh transform (prod --auto-apply)?">
Run Transform
</button>
</div>
</div>
<!-- Export Serving -->
{% set ex = latest['run_export'] %}
<div class="card">
<p class="card-header">Export Serving</p>
<div class="flex items-center gap-2 mb-3">
{% if ex is none %}
<span class="status-dot pending"></span>
<span class="text-sm text-slate">Never run</span>
{% elif ex.status == 'pending' %}
<span class="status-dot running"></span>
<span class="text-sm text-slate">Running…</span>
{% elif ex.status == 'complete' %}
<span class="status-dot ok"></span>
<span class="text-sm text-slate">Complete</span>
{% else %}
<span class="status-dot failed"></span>
<span class="text-sm text-danger">Failed</span>
{% endif %}
</div>
{% if ex %}
<p class="text-xs text-slate mono">
Started: {{ (ex.created_at or '')[:19] or '—' }}
</p>
{% if ex.completed_at %}
<p class="text-xs text-slate mono">
Finished: {{ ex.completed_at[:19] }}
</p>
{% endif %}
{% if serving_meta %}
<p class="text-xs text-slate mt-1">
Last export: <span class="font-semibold mono">{{ (serving_meta.exported_at_utc or '')[:19].replace('T', ' ') or '—' }}</span>
</p>
{% endif %}
{% if ex.status == 'failed' and ex.error %}
<details class="mt-2">
<summary class="text-xs text-danger cursor-pointer">Error</summary>
<pre class="text-xs mt-1 p-2 bg-gray-50 rounded overflow-auto" style="max-height:8rem;white-space:pre-wrap">{{ ex.error[:400] }}</pre>
</details>
{% endif %}
{% endif %}
<div class="mt-3">
<button type="button"
class="btn btn-sm"
{% if any_running %}disabled{% endif %}
hx-post="{{ url_for('pipeline.pipeline_trigger_transform') }}"
hx-target="#pipeline-transform-content"
hx-swap="outerHTML"
hx-vals='{"step": "export", "csrf_token": "{{ csrf_token() }}"}'
hx-confirm="Export serving tables (lakehouse → analytics.duckdb)?">
Run Export
</button>
</div>
</div>
</div>
<!-- Run Full Pipeline -->
{% set pl = latest['run_pipeline'] %}
<div class="card mb-4">
<div class="flex items-center justify-between flex-wrap gap-3">
<div>
<p class="font-semibold text-navy text-sm">Full Pipeline</p>
<p class="text-xs text-slate mt-1">Runs extract → transform → export sequentially</p>
{% if pl %}
<p class="text-xs text-slate mono mt-1">
Last: {{ (pl.created_at or '')[:19] or '—' }}
{% if pl.status == 'complete' %}<span class="badge-success ml-2">Complete</span>{% endif %}
{% if pl.status == 'pending' %}<span class="badge-warning ml-2">Running…</span>{% endif %}
{% if pl.status == 'failed' %}<span class="badge-danger ml-2">Failed</span>{% endif %}
</p>
{% endif %}
</div>
<button type="button"
class="btn btn-sm"
{% if any_running %}disabled{% endif %}
hx-post="{{ url_for('pipeline.pipeline_trigger_transform') }}"
hx-target="#pipeline-transform-content"
hx-swap="outerHTML"
hx-vals='{"step": "pipeline", "csrf_token": "{{ csrf_token() }}"}'
hx-confirm="Run full ELT pipeline (extract → transform → export)?">
Run Full Pipeline
</button>
</div>
</div>
<!-- Recent Runs -->
<div class="card">
<p class="card-header">Recent Runs</p>
{% if history %}
<div style="overflow-x:auto">
<table class="table" style="font-size:0.8125rem">
<thead>
<tr>
<th>#</th>
<th>Step</th>
<th>Started</th>
<th>Duration</th>
<th>Status</th>
<th>Error</th>
</tr>
</thead>
<tbody>
{% for row in history %}
<tr>
<td class="text-xs text-slate">{{ row.id }}</td>
<td class="mono text-xs">{{ row.task_name | replace('run_', '') }}</td>
<td class="mono text-xs text-slate">{{ (row.created_at or '')[:19] or '—' }}</td>
<td class="mono text-xs text-slate">{{ row.duration or '—' }}</td>
<td>
{% if row.status == 'complete' %}
<span class="badge-success">Complete</span>
{% elif row.status == 'failed' %}
<span class="badge-danger">Failed</span>
{% else %}
<span class="badge-warning">Running…</span>
{% endif %}
</td>
<td>
{% if row.error_short %}
<details>
<summary class="text-xs text-danger cursor-pointer">Error</summary>
<pre class="text-xs mt-1 p-2 bg-gray-50 rounded overflow-auto" style="max-width:24rem;white-space:pre-wrap">{{ row.error_short }}</pre>
</details>
{% else %}—{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<p class="text-sm text-slate">No transform runs yet.</p>
{% endif %}
</div>
</div>{# end #pipeline-transform-content #}
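
Review note: the template above reads `latest`, `any_running`, `history`, `row.duration`, and `row.error_short` from its render context, but the view that builds them is not part of this diff. The following is a minimal sketch of how such a context could be derived, assuming task rows are dicts ordered newest first with `task_name`, `status`, `created_at`, and optional `completed_at` / `error` keys. The helper names `build_transform_context` and `_duration`, the row shape, and the 120-character error cut are all hypothetical, not the PR's code.

```python
from datetime import datetime

# Task names the Transform tab reports on (taken from the template above).
TRANSFORM_TASKS = ("run_transform", "run_export", "run_pipeline")


def _duration(row: dict) -> str:
    """Format completed_at - created_at as e.g. '1m 05s'; '' if unfinished."""
    if not row.get("created_at") or not row.get("completed_at"):
        return ""
    # Timestamps are cut to 19 chars (YYYY-MM-DD HH:MM:SS), as the template does.
    start = datetime.fromisoformat(row["created_at"][:19])
    end = datetime.fromisoformat(row["completed_at"][:19])
    seconds = int((end - start).total_seconds())
    return f"{seconds // 60}m {seconds % 60:02d}s"


def build_transform_context(rows: list[dict]) -> dict:
    """Derive `latest`, `any_running`, and `history` from task rows.

    Hypothetical helper: assumes `rows` is ordered newest first.
    """
    latest = {name: None for name in TRANSFORM_TASKS}
    history = []
    for row in rows:
        name = row["task_name"]
        if name not in latest:
            continue  # unrelated task types are not shown on this tab
        if latest[name] is None:
            latest[name] = row  # first hit is the newest run of that task
        history.append(
            {
                **row,
                "duration": _duration(row),
                "error_short": (row.get("error") or "")[:120],
            }
        )
    # The tab self-polls and disables its buttons while anything is pending.
    any_running = any(
        r is not None and r["status"] == "pending" for r in latest.values()
    )
    return {"latest": latest, "any_running": any_running, "history": history}
```

Keeping `any_running` as a single flag matches how the template uses it: one value gates both the `hx-trigger="every 5s"` attribute and the `disabled` state of all three buttons.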

@@ -1,5 +1,6 @@
{% if suppliers %}
<div class="card">
<div style="overflow-x:auto">
<table class="table">
<thead>
<tr>
@@ -47,6 +48,7 @@
{% endfor %}
</tbody>
</table>
</div>
</div>
{% else %}
<div class="card text-center" style="padding:2rem">

@@ -4,9 +4,20 @@
{% block admin_head %}
<style>
.pipeline-stat-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 0.75rem;
}
@media (min-width: 768px) {
.pipeline-stat-grid { grid-template-columns: repeat(4, 1fr); }
}
.pipeline-tabs {
display: flex; gap: 0; border-bottom: 2px solid #E2E8F0; margin-bottom: 1.5rem;
overflow-x: auto; -webkit-overflow-scrolling: touch; scrollbar-width: none;
}
.pipeline-tabs::-webkit-scrollbar { display: none; }
.pipeline-tabs button {
padding: 0.625rem 1.25rem; font-size: 0.8125rem; font-weight: 600;
color: #64748B; background: none; border: none; cursor: pointer;
@@ -23,7 +34,19 @@
.status-dot.failed { background: #EF4444; }
.status-dot.stale { background: #D97706; }
.status-dot.running { background: #3B82F6; }
@keyframes pulse-dot { 0%,100%{opacity:1} 50%{opacity:0.4} }
.status-dot.running { animation: pulse-dot 1.5s ease-in-out infinite; }
.status-dot.pending { background: #CBD5E1; }
.pipeline-two-col {
display: grid;
grid-template-columns: 1fr;
gap: 1rem;
}
@media (min-width: 640px) {
.pipeline-two-col { grid-template-columns: 1fr 1fr; }
}
</style>
{% endblock %}
@@ -34,10 +57,11 @@
<p class="text-sm text-slate mt-1">Extraction status, data catalog, and ad-hoc query editor</p>
</div>
<div class="flex gap-2">
-<form method="post" action="{{ url_for('pipeline.pipeline_trigger_extract') }}" class="m-0">
+<form method="post" action="{{ url_for('pipeline.pipeline_trigger_transform') }}" class="m-0">
 <input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
+<input type="hidden" name="step" value="pipeline">
 <button type="button" class="btn btn-sm"
-onclick="confirmAction('Enqueue a full extraction run? This will run all extractors in the background.', this.closest('form'))">
+onclick="confirmAction('Run full ELT pipeline (extract → transform → export)? This runs in the background.', this.closest('form'))">
Run Pipeline
</button>
</form>
@@ -46,7 +70,7 @@
</header>
<!-- Health stat cards -->
-<div style="display:grid;grid-template-columns:repeat(4,1fr);gap:0.75rem" class="mb-6">
+<div class="pipeline-stat-grid mb-6">
<div class="card text-center" style="padding:0.875rem">
<p class="text-xs text-slate">Total Runs</p>
<p class="text-2xl font-bold text-navy metric">{{ summary.total | default(0) }}</p>
@@ -97,6 +121,10 @@
hx-get="{{ url_for('pipeline.pipeline_lineage') }}"
hx-target="#pipeline-tab-content" hx-swap="innerHTML"
hx-trigger="click">Lineage</button>
<button data-tab="transform"
hx-get="{{ url_for('pipeline.pipeline_transform') }}"
hx-target="#pipeline-tab-content" hx-swap="innerHTML"
hx-trigger="click">Transform</button>
</div>
<!-- Tab content (Overview loads on page load) -->

@@ -123,17 +123,19 @@ async def get_table_columns(data_table: str) -> list[dict]:
 async def fetch_template_data(
     data_table: str,
     order_by: str | None = None,
-    limit: int = 500,
+    limit: int = 0,
 ) -> list[dict]:
-    """Fetch all rows from a DuckDB serving table."""
+    """Fetch rows from a DuckDB serving table. limit=0 means all rows."""
     assert "." in data_table, "data_table must be schema-qualified"
     _validate_table_name(data_table)
     order_clause = f"ORDER BY {order_by} DESC" if order_by else ""
-    return await fetch_analytics(
-        f"SELECT * FROM {data_table} {order_clause} LIMIT ?",
-        [limit],
-    )
+    if limit:
+        return await fetch_analytics(
+            f"SELECT * FROM {data_table} {order_clause} LIMIT ?",
+            [limit],
+        )
+    return await fetch_analytics(f"SELECT * FROM {data_table} {order_clause}")
 async def count_template_data(data_table: str) -> int:
@@ -290,7 +292,7 @@ async def generate_articles(
     start_date: date,
     articles_per_day: int,
     *,
-    limit: int = 500,
+    limit: int = 0,
     base_url: str = "https://padelnomics.io",
     task_id: int | None = None,
 ) -> int:
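
Review note: the `limit=0 means all rows` change can be checked in isolation by separating query construction from execution. The sketch below is an illustration only; `build_select` is a hypothetical helper, not a function in this PR, and it assumes the table and column names were already validated (as `_validate_table_name` does in the diff), since identifiers are interpolated rather than bound.

```python
from __future__ import annotations


def build_select(
    data_table: str,
    order_by: str | None = None,
    limit: int = 0,
) -> tuple[str, list[int]]:
    """Return (sql, params) for a serving-table read.

    limit=0 means "no LIMIT clause", mirroring the new default in the diff.
    Hypothetical helper: identifiers must be validated by the caller.
    """
    assert limit >= 0, "limit must be zero (all rows) or positive"
    parts = [f"SELECT * FROM {data_table}"]
    if order_by:
        parts.append(f"ORDER BY {order_by} DESC")
    params: list[int] = []
    if limit:
        # Only the row count is passed as a bound parameter.
        parts.append("LIMIT ?")
        params.append(limit)
    return " ".join(parts), params


if __name__ == "__main__":
    print(build_select("serving.city_stats"))
    print(build_select("serving.city_stats", "population", 10))
```

Building the clause list and joining it also avoids the double space that the f-string form leaves in the SQL when `order_by` is empty, which is harmless but makes logged queries slightly harder to compare.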

@@ -218,9 +218,7 @@
.nav-bar[data-navopen="true"] .nav-mobile {
display: flex;
}
-.nav-mobile a,
-.nav-mobile button.nav-auth-btn,
-.nav-mobile a.nav-auth-btn {
+.nav-mobile a:not(.nav-auth-btn) {
display: block;
padding: 0.625rem 0;
border-bottom: 1px solid #F1F5F9;
@@ -230,15 +228,18 @@
text-decoration: none;
transition: color 0.15s;
}
-.nav-mobile a:last-child { border-bottom: none; }
-.nav-mobile a:hover { color: #1D4ED8; }
+.nav-mobile a:not(.nav-auth-btn):last-child { border-bottom: none; }
+.nav-mobile a:not(.nav-auth-btn):hover { color: #1D4ED8; }
/* nav-auth-btn in mobile menu: override block style, keep button colours */
.nav-mobile a.nav-auth-btn,
.nav-mobile button.nav-auth-btn {
display: inline-flex;
margin-top: 0.5rem;
padding: 6px 16px;
border-bottom: none;
width: auto;
align-self: flex-start;
color: #fff;
}
.nav-mobile .nav-mobile-section {
font-size: 0.6875rem;
@@ -569,6 +570,270 @@
@apply px-4 pb-4 text-slate-dark;
}
/* ── Article Timeline (phase/process diagrams) ── */
.article-timeline {
display: flex;
gap: 0;
margin: 1.5rem 0 2rem;
position: relative;
overflow-x: auto;
padding-bottom: 0.5rem;
}
.article-timeline__phase {
flex: 1;
min-width: 130px;
display: flex;
flex-direction: column;
align-items: center;
position: relative;
}
/* Connecting line between phases */
.article-timeline__phase + .article-timeline__phase::before {
content: '';
position: absolute;
top: 22px;
left: calc(-50% + 22px);
right: calc(50% + 22px);
height: 2px;
background: #CBD5E1;
z-index: 0;
}
.article-timeline__phase + .article-timeline__phase::after {
content: '';
position: absolute;
top: 10px;
left: calc(-50% + 18px);
font-size: 1rem;
line-height: 1;
color: #94A3B8;
z-index: 1;
}
.article-timeline__num {
width: 44px;
height: 44px;
border-radius: 50%;
background: #0F172A;
color: #fff;
display: flex;
align-items: center;
justify-content: center;
font-size: 0.75rem;
font-weight: 700;
font-family: var(--font-display);
flex-shrink: 0;
position: relative;
z-index: 2;
}
.article-timeline__card {
margin-top: 0.75rem;
background: #F8FAFC;
border: 1px solid #E2E8F0;
border-radius: 12px;
padding: 0.75rem 0.875rem;
text-align: center;
width: 100%;
}
.article-timeline__title {
font-weight: 700;
font-size: 0.8125rem;
color: #0F172A;
line-height: 1.3;
margin-bottom: 0.25rem;
font-family: var(--font-display);
}
.article-timeline__subtitle {
font-size: 0.75rem;
color: #64748B;
margin-bottom: 0.375rem;
line-height: 1.3;
}
.article-timeline__meta {
font-size: 0.6875rem;
color: #94A3B8;
line-height: 1.4;
}
/* Mobile: vertical timeline */
@media (max-width: 600px) {
.article-timeline {
flex-direction: column;
gap: 0.75rem;
overflow-x: visible;
}
.article-timeline__phase {
flex-direction: row;
align-items: flex-start;
min-width: auto;
gap: 0.75rem;
}
.article-timeline__phase + .article-timeline__phase::before {
content: '';
position: absolute;
top: calc(-0.375rem);
left: 21px;
right: auto;
width: 2px;
height: 0.75rem;
background: #CBD5E1;
}
.article-timeline__phase + .article-timeline__phase::after {
content: '';
position: absolute;
top: calc(-0.3rem);
left: 15px;
font-size: 0.9rem;
transform: rotate(90deg);
}
.article-timeline__card {
margin-top: 0;
text-align: left;
flex: 1;
}
.article-timeline__num {
flex-shrink: 0;
}
}
/* ── Article Callout Boxes ── */
.article-callout {
display: flex;
gap: 0.875rem;
padding: 1rem 1.25rem;
border-radius: 12px;
border-left: 4px solid;
margin: 1.5rem 0;
}
.article-callout::before {
font-size: 1.1rem;
flex-shrink: 0;
line-height: 1.5;
}
.article-callout__body {
flex: 1;
}
.article-callout__title {
font-weight: 700;
font-size: 0.875rem;
margin-bottom: 0.375rem;
display: block;
}
.article-callout p {
font-size: 0.875rem;
line-height: 1.6;
margin: 0;
color: inherit;
}
.article-callout--warning {
background: #FFFBEB;
border-color: #D97706;
color: #78350F;
}
.article-callout--warning::before {
content: '⚠';
color: #D97706;
}
.article-callout--warning .article-callout__title {
color: #92400E;
}
.article-callout--tip {
background: #F0FDF4;
border-color: #16A34A;
color: #14532D;
}
.article-callout--tip::before {
content: '💡';
}
.article-callout--tip .article-callout__title {
color: #166534;
}
.article-callout--info {
background: #EFF6FF;
border-color: #1D4ED8;
color: #1E3A5F;
}
.article-callout--info::before {
content: '';
color: #1D4ED8;
}
.article-callout--info .article-callout__title {
color: #1E40AF;
}
/* ── Article Cards (2-col comparison grid) ── */
.article-cards {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 1rem;
margin: 1.5rem 0;
}
@media (max-width: 580px) {
.article-cards {
grid-template-columns: 1fr;
}
}
.article-card {
border-radius: 12px;
border: 1px solid #E2E8F0;
overflow: hidden;
background: #fff;
}
.article-card__accent {
height: 4px;
}
.article-card--success .article-card__accent { background: #16A34A; }
.article-card--failure .article-card__accent { background: #EF4444; }
.article-card--neutral .article-card__accent { background: #1D4ED8; }
.article-card--established .article-card__accent { background: #0F172A; }
.article-card--growth .article-card__accent { background: #1D4ED8; }
.article-card--emerging .article-card__accent { background: #16A34A; }
.article-card__inner {
padding: 1rem 1.125rem;
}
.article-card__title {
font-weight: 700;
font-size: 0.875rem;
color: #0F172A;
margin-bottom: 0.5rem;
font-family: var(--font-display);
display: block;
}
.article-card__body {
font-size: 0.8125rem;
color: #475569;
line-height: 1.6;
margin: 0;
}
/* ── Severity Pills (risk table badges) ── */
.severity {
display: inline-block;
padding: 0.125rem 0.5rem;
border-radius: 9999px;
font-size: 0.6875rem;
font-weight: 700;
letter-spacing: 0.03em;
white-space: nowrap;
}
.severity--high {
background: #FEE2E2;
color: #991B1B;
}
.severity--medium-high {
background: #FEF3C7;
color: #92400E;
}
.severity--medium {
background: #FEF9C3;
color: #713F12;
}
.severity--low-medium {
background: #ECFDF5;
color: #065F46;
}
.severity--low {
background: #F0FDF4;
color: #166534;
}
/* Inline HTMX loading indicator for search forms.
Opacity is handled by HTMX's built-in .htmx-indicator CSS.
This class only adds positioning and the spin animation. */

@@ -735,6 +735,107 @@ async def handle_run_extraction(payload: dict) -> None:
logger.info("Extraction completed: %s", result.stdout[-300:] if result.stdout else "(no output)")
@task("run_transform")
async def handle_run_transform(payload: dict) -> None:
"""Run SQLMesh transform (prod plan --auto-apply) in the background.
Shells out to `uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod --auto-apply`.
2-hour absolute timeout — same as extraction.
"""
import subprocess
from pathlib import Path
repo_root = Path(__file__).resolve().parents[4]
result = await asyncio.to_thread(
subprocess.run,
["uv", "run", "sqlmesh", "-p", "transform/sqlmesh_padelnomics", "plan", "prod", "--auto-apply"],
capture_output=True,
text=True,
timeout=7200,
cwd=str(repo_root),
)
if result.returncode != 0:
raise RuntimeError(
f"SQLMesh transform failed (exit {result.returncode}): {result.stderr[:500]}"
)
logger.info("SQLMesh transform completed: %s", result.stdout[-300:] if result.stdout else "(no output)")
@task("run_export")
async def handle_run_export(payload: dict) -> None:
"""Export serving tables from lakehouse.duckdb → analytics.duckdb.
Shells out to `uv run python src/padelnomics/export_serving.py`.
10-minute absolute timeout.
"""
import subprocess
from pathlib import Path
repo_root = Path(__file__).resolve().parents[4]
result = await asyncio.to_thread(
subprocess.run,
["uv", "run", "python", "src/padelnomics/export_serving.py"],
capture_output=True,
text=True,
timeout=600,
cwd=str(repo_root),
)
if result.returncode != 0:
raise RuntimeError(
f"Export failed (exit {result.returncode}): {result.stderr[:500]}"
)
logger.info("Export completed: %s", result.stdout[-300:] if result.stdout else "(no output)")
@task("run_pipeline")
async def handle_run_pipeline(payload: dict) -> None:
"""Run full ELT pipeline: extract → transform → export, stopping on first failure."""
import subprocess
from pathlib import Path
repo_root = Path(__file__).resolve().parents[4]
steps = [
(
"extraction",
["uv", "run", "--package", "padelnomics_extract", "extract"],
7200,
),
(
"transform",
["uv", "run", "sqlmesh", "-p", "transform/sqlmesh_padelnomics", "plan", "prod", "--auto-apply"],
7200,
),
(
"export",
["uv", "run", "python", "src/padelnomics/export_serving.py"],
600,
),
]
for step_name, cmd, timeout_seconds in steps:
logger.info("Pipeline step starting: %s", step_name)
result = await asyncio.to_thread(
subprocess.run,
cmd,
capture_output=True,
text=True,
timeout=timeout_seconds,
cwd=str(repo_root),
)
if result.returncode != 0:
raise RuntimeError(
f"Pipeline failed at {step_name} (exit {result.returncode}): {result.stderr[:500]}"
)
logger.info(
"Pipeline step complete: %s: %s",
step_name,
result.stdout[-200:] if result.stdout else "(no output)",
)
logger.info("Full pipeline complete (extract → transform → export)")
@task("generate_articles")
async def handle_generate_articles(payload: dict) -> None:
"""Generate articles from a template in the background."""
@@ -745,7 +846,7 @@ async def handle_generate_articles(payload: dict) -> None:
     slug = payload["template_slug"]
     start_date = date_cls.fromisoformat(payload["start_date"])
     articles_per_day = payload.get("articles_per_day", 3)
-    limit = payload.get("limit", 500)
+    limit = payload.get("limit", 0)
     task_id = payload.get("_task_id")
     count = await generate_articles(
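
Review note: `handle_run_pipeline` above runs its steps in order and raises on the first non-zero exit. The same control flow can be exercised without `uv` or SQLMesh installed. The sketch below is a simplified, synchronous stand-in: `run_steps` is a hypothetical name, and converting `subprocess.TimeoutExpired` into `RuntimeError` is a variation of mine, since the handlers in the diff let the timeout exception propagate as-is.

```python
import subprocess
import sys


def run_steps(steps: list[tuple[str, list[str], int]]) -> list[str]:
    """Run (name, cmd, timeout_seconds) steps in order; stop at the first failure.

    Returns the names of the steps that completed. Hypothetical helper that
    mirrors the loop in handle_run_pipeline, minus asyncio.to_thread.
    """
    completed: list[str] = []
    for name, cmd, timeout_seconds in steps:
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_seconds
            )
        except subprocess.TimeoutExpired as exc:
            # Assumption: report timeouts the same way as non-zero exits.
            raise RuntimeError(
                f"Pipeline timed out at {name} after {timeout_seconds}s"
            ) from exc
        if result.returncode != 0:
            raise RuntimeError(
                f"Pipeline failed at {name} (exit {result.returncode}): "
                f"{result.stderr[:500]}"
            )
        completed.append(name)
    return completed


if __name__ == "__main__":
    ok = [sys.executable, "-c", "print('ok')"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_steps([("extraction", ok, 30), ("transform", ok, 30)]))
    try:
        run_steps([("extraction", ok, 30), ("transform", bad, 30), ("export", ok, 30)])
    except RuntimeError as err:
        print(err)  # the export step is never started
```

Since the three handlers repeat the same `subprocess.run` arguments, a shared helper along these lines would also remove the duplicated command lists between `handle_run_transform`, `handle_run_export`, and `handle_run_pipeline`.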

@@ -286,24 +286,20 @@ class TestLoadProxyTiers:
         assert len(tiers) == 1
         assert tiers[0] == ["http://res1:8080"]
-    def test_three_tiers_correct_order(self, monkeypatch):
+    def test_two_tiers_correct_order(self, monkeypatch):
         self._clear_proxy_env(monkeypatch)
-        with patch("padelnomics_extract.proxy.fetch_webshare_proxies", return_value=["http://user:pass@1.2.3.4:1080"]):
-            monkeypatch.setenv("WEBSHARE_DOWNLOAD_URL", "http://example.com/list")
-            monkeypatch.setenv("PROXY_URLS_DATACENTER", "http://dc1:8080")
-            monkeypatch.setenv("PROXY_URLS_RESIDENTIAL", "http://res1:8080")
-            tiers = load_proxy_tiers()
-        assert len(tiers) == 3
-        assert tiers[0] == ["http://user:pass@1.2.3.4:1080"]  # free
-        assert tiers[1] == ["http://dc1:8080"]  # datacenter
-        assert tiers[2] == ["http://res1:8080"]  # residential
+        monkeypatch.setenv("PROXY_URLS_DATACENTER", "http://dc1:8080")
+        monkeypatch.setenv("PROXY_URLS_RESIDENTIAL", "http://res1:8080")
+        tiers = load_proxy_tiers()
+        assert len(tiers) == 2
+        assert tiers[0] == ["http://dc1:8080"]  # datacenter (tier 1)
+        assert tiers[1] == ["http://res1:8080"]  # residential (tier 2)
-    def test_webshare_fetch_failure_skips_tier(self, monkeypatch):
+    def test_webshare_env_var_is_ignored(self, monkeypatch):
         self._clear_proxy_env(monkeypatch)
-        with patch("padelnomics_extract.proxy.fetch_webshare_proxies", return_value=[]):
-            monkeypatch.setenv("WEBSHARE_DOWNLOAD_URL", "http://example.com/list")
-            monkeypatch.setenv("PROXY_URLS_DATACENTER", "http://dc1:8080")
-            tiers = load_proxy_tiers()
+        monkeypatch.setenv("WEBSHARE_DOWNLOAD_URL", "http://example.com/list")
+        monkeypatch.setenv("PROXY_URLS_DATACENTER", "http://dc1:8080")
+        tiers = load_proxy_tiers()
         assert len(tiers) == 1
         assert tiers[0] == ["http://dc1:8080"]
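
Review note: the updated tests pin down the observable behaviour of `load_proxy_tiers` after the Webshare removal: datacenter first, residential second, empty tiers skipped, `WEBSHARE_DOWNLOAD_URL` ignored. The real function is not shown in this diff, so the following is only a sketch of logic that would satisfy those tests. The name `load_proxy_tiers_sketch`, the injectable `env` argument, and the comma-separated URL format are assumptions.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Tier order after this change: datacenter (tier 1), then residential (tier 2).
TIER_ENV_VARS = ("PROXY_URLS_DATACENTER", "PROXY_URLS_RESIDENTIAL")


def load_proxy_tiers_sketch(env: Mapping[str, str] | None = None) -> list[list[str]]:
    """Build the ordered list of proxy tiers from environment variables.

    Assumption: each variable holds a comma-separated list of proxy URLs.
    Unset or empty variables contribute no tier at all, so a residential-only
    setup yields a single tier. WEBSHARE_DOWNLOAD_URL is never read.
    """
    env = os.environ if env is None else env
    tiers: list[list[str]] = []
    for var in TIER_ENV_VARS:
        urls = [u.strip() for u in env.get(var, "").split(",") if u.strip()]
        if urls:
            tiers.append(urls)
    return tiers


if __name__ == "__main__":
    print(
        load_proxy_tiers_sketch(
            {
                "WEBSHARE_DOWNLOAD_URL": "http://example.com/list",
                "PROXY_URLS_DATACENTER": "http://dc1:8080",
            }
        )
    )
```

Passing the environment as a mapping keeps the sketch testable without `monkeypatch`; the tests in the diff set real environment variables instead.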
@@ -500,3 +496,131 @@ class TestTieredCyclerNTier:
t.join()
assert errors == [], f"Thread safety errors: {errors}"
class TestTieredCyclerDeadProxyTracking:
"""Per-proxy dead tracking: individual proxies marked dead are skipped."""
def test_dead_proxy_skipped_in_next_proxy(self):
"""After a proxy hits the failure limit it is never returned again."""
tiers = [["http://dead", "http://live"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
# Mark http://dead as dead
cycler["record_failure"]("http://dead")
# next_proxy must always return the live one
for _ in range(6):
assert cycler["next_proxy"]() == "http://live"
def test_dead_proxy_count_increments(self):
tiers = [["http://a", "http://b", "http://c"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=2)
assert cycler["dead_proxy_count"]() == 0
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 0 # only 1 failure, limit is 2
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 1
cycler["record_failure"]("http://b")
cycler["record_failure"]("http://b")
assert cycler["dead_proxy_count"]() == 2
def test_auto_escalates_when_all_proxies_in_tier_dead(self):
"""If all proxies in the active tier are dead, next_proxy auto-escalates."""
tiers = [["http://t0a", "http://t0b"], ["http://t1"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
# Kill all proxies in tier 0
cycler["record_failure"]("http://t0a")
cycler["record_failure"]("http://t0b")
# next_proxy should transparently escalate and return tier 1 proxy
assert cycler["next_proxy"]() == "http://t1"
def test_auto_escalates_updates_active_tier_index(self):
"""Auto-escalation via dead proxies bumps active_tier_index."""
tiers = [["http://t0a", "http://t0b"], ["http://t1"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
cycler["record_failure"]("http://t0a")
cycler["record_failure"]("http://t0b")
cycler["next_proxy"]() # triggers auto-escalation
assert cycler["active_tier_index"]() == 1
def test_returns_none_when_all_tiers_exhausted_by_dead_proxies(self):
tiers = [["http://t0"], ["http://t1"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
cycler["record_failure"]("http://t0")
cycler["record_failure"]("http://t1")
assert cycler["next_proxy"]() is None
def test_record_success_resets_per_proxy_counter(self):
"""Success resets the failure count so proxy is not marked dead."""
tiers = [["http://a", "http://b"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=3)
# Two failures — not dead yet
cycler["record_failure"]("http://a")
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 0
# Success resets the counter
cycler["record_success"]("http://a")
# Two more failures — still not dead (counter was reset)
cycler["record_failure"]("http://a")
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 0
# Third failure after reset — now dead
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 1
def test_dead_proxy_stays_dead_after_success(self):
"""Once marked dead, a proxy is not revived by record_success."""
tiers = [["http://a", "http://b"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 1
cycler["record_success"]("http://a")
assert cycler["dead_proxy_count"]() == 1
# http://a is still skipped
for _ in range(6):
assert cycler["next_proxy"]() == "http://b"
def test_backward_compat_no_proxy_url(self):
"""Calling record_failure/record_success without proxy_url still works."""
tiers = [["http://t0"], ["http://t1"]]
cycler = make_tiered_cycler(tiers, threshold=2)
cycler["record_failure"]()
cycler["record_failure"]() # escalates
assert cycler["active_tier_index"]() == 1
cycler["record_success"]()
assert cycler["dead_proxy_count"]() == 0 # no per-proxy tracking happened
def test_proxy_failure_limit_zero_disables_per_proxy_tracking(self):
"""proxy_failure_limit=0 disables per-proxy dead tracking entirely."""
tiers = [["http://a", "http://b"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=0)
for _ in range(100):
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 0
def test_thread_safety_with_per_proxy_tracking(self):
"""Concurrent record_failure(proxy_url) calls don't corrupt state."""
import threading as _threading
tiers = [["http://t0a", "http://t0b", "http://t0c"], ["http://t1a"]]
cycler = make_tiered_cycler(tiers, threshold=50, proxy_failure_limit=5)
errors = []
lock = _threading.Lock()
def worker():
try:
for _ in range(30):
p = cycler["next_proxy"]()
if p is not None:
cycler["record_failure"](p)
cycler["record_success"](p)
except Exception as e:
with lock:
errors.append(e)
threads = [_threading.Thread(target=worker) for _ in range(10)]
for t in threads:
t.start()
for t in threads:
t.join()
assert errors == [], f"Thread safety errors: {errors}"
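
Review note: `TestTieredCyclerDeadProxyTracking` specifies the per-proxy dead tracking through the dict-of-callables interface, but the implementation of `make_tiered_cycler` is outside this diff. The sketch below is one way to satisfy the tests shown, written from the tests alone. The default `proxy_failure_limit=3`, the round-robin cursor, and the choice to treat an index past the last tier as "exhausted" are assumptions, not the project's actual code.

```python
import threading


def make_tiered_cycler(tiers, threshold, proxy_failure_limit=3):
    """Sketch of a tiered proxy cycler with per-proxy dead tracking.

    tiers: list of lists of proxy URLs, cheapest tier first.
    threshold: consecutive failures before escalating to the next tier.
    proxy_failure_limit: failures before one proxy is marked dead (0 disables).
    """
    lock = threading.Lock()
    state = {"tier": 0, "tier_failures": 0, "cursor": 0}
    proxy_failures = {}  # proxy URL -> failures since its last success
    dead = set()  # proxies that reached the limit; never revived

    def _escalate():
        state["tier"] += 1
        state["tier_failures"] = 0
        state["cursor"] = 0

    def next_proxy():
        with lock:
            while state["tier"] < len(tiers):
                live = [p for p in tiers[state["tier"]] if p not in dead]
                if live:
                    proxy = live[state["cursor"] % len(live)]
                    state["cursor"] += 1
                    return proxy
                _escalate()  # every proxy in this tier is dead
            return None  # all tiers exhausted

    def record_failure(proxy_url=None):
        with lock:
            if proxy_url is not None and proxy_failure_limit > 0:
                count = proxy_failures.get(proxy_url, 0) + 1
                proxy_failures[proxy_url] = count
                if count >= proxy_failure_limit:
                    dead.add(proxy_url)
            state["tier_failures"] += 1
            if state["tier_failures"] >= threshold and state["tier"] < len(tiers):
                _escalate()

    def record_success(proxy_url=None):
        with lock:
            state["tier_failures"] = 0
            if proxy_url is not None:
                # Resets the counter only; a dead proxy stays in `dead`.
                proxy_failures[proxy_url] = 0

    def active_tier_index():
        with lock:
            return state["tier"]

    def dead_proxy_count():
        with lock:
            return len(dead)

    return {
        "next_proxy": next_proxy,
        "record_failure": record_failure,
        "record_success": record_success,
        "active_tier_index": active_tier_index,
        "dead_proxy_count": dead_proxy_count,
    }
```

A single lock around all state keeps the concurrent test simple to reason about: the tier counter, the per-proxy counters, and the dead set always change together, so no worker can observe a proxy that is half-way to being marked dead.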