Compare commits

..

17 Commits

Author SHA1 Message Date
Deeman
0d903ec926 chore(changelog): document stale-tier circuit breaker fix
All checks were successful
CI / test (push) Successful in 51s
CI / tag (push) Successful in 3s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:43:18 +01:00
Deeman
42c49e383c fix(proxy): ignore stale-tier failures in record_failure()
With parallel workers, threads that fetch a proxy just before escalation
can report failures after the tier has already changed — those failures
were silently counting against the new tier, immediately exhausting it
before it ever got tried (Rayobyte being skipped entirely in favour of
DataImpulse because 10 in-flight Webshare failures hit the threshold).

Fix: build a proxy_url → tier_idx reverse map at construction time and
skip the tier-level circuit breaker when the failing proxy belongs to an
already-escalated tier.
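The fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual `proxy.py` implementation — the state layout and the `make_tiered_cycler` signature are assumptions:

```python
import threading

def make_tiered_cycler(tiers, threshold=10):
    """Sketch of the stale-tier fix: failures reported for proxies from an
    already-escalated tier are ignored by the circuit breaker."""
    state = {
        "tier_idx": 0,
        "failures": 0,
        # Reverse map built once at construction time: proxy URL -> tier index.
        "tier_of": {url: i for i, tier in enumerate(tiers) for url in tier},
        "lock": threading.Lock(),
    }

    def record_failure(proxy_url=None):
        with state["lock"]:
            # A worker may report a failure for a proxy it fetched before
            # escalation; don't count that against the *current* tier.
            if proxy_url is not None and \
                    state["tier_of"].get(proxy_url, state["tier_idx"]) < state["tier_idx"]:
                return
            state["failures"] += 1
            if state["failures"] >= threshold and state["tier_idx"] < len(tiers) - 1:
                state["tier_idx"] += 1
                state["failures"] = 0

    return state, record_failure
```

With this guard, ten in-flight failures from the previous tier no longer exhaust the freshly activated one.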

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:43:05 +01:00
Deeman
1c0edff3e5 chore(changelog): document visual upgrades for longform articles
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:29:21 +01:00
Deeman
8a28b94ec2 merge: visual upgrades for longform articles (timeline, callouts, cards, severity pills) 2026-03-01 14:28:57 +01:00
Deeman
9b54f2d544 fix(secrets): add http:// scheme to proxy URLs in dev + prod SOPS
All checks were successful
CI / test (push) Successful in 51s
CI / tag (push) Successful in 3s
PROXY_URLS_DATACENTER was missing the scheme prefix, causing SSL
handshake failures on the Rayobyte HTTP-only proxy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:28:35 +01:00
Deeman
08bd2b2989 chore(changelog): document proxy URL scheme validation fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:26:57 +01:00
Deeman
81a57db272 fix(proxy): skip URLs without scheme in load_proxy_tiers()
Validates each URL in PROXY_URLS_DATACENTER / PROXY_URLS_RESIDENTIAL:
logs a warning and skips any entry missing an http:// or https:// scheme
instead of passing malformed URLs that cause SSL or connection errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:26:41 +01:00
Deeman
bce6b2d340 feat(articles): visual upgrades — timeline, callouts, cards, severity pills
Add 4 reusable CSS article components and apply them across 6 cornerstone articles:

CSS (input.css):
- article-timeline: horizontal phase diagram with numbered cards, collapses to vertical on mobile
- article-callout (warning/tip/info): left-bordered callout boxes with icon and title
- article-cards: 2-col grid of accent-topped cards (success/failure/neutral/established/growth/emerging)
- severity: inline pill badges (high/medium-high/medium/low-medium/low) for risk tables

Articles updated:
- padel-hall-build-guide-en + padel-halle-bauen-de: ASCII code block → timeline HTML; 3 bold/blockquote warnings → callout boxes; success/failure patterns → 4 cards
- padel-hall-investment-risks-en + padel-halle-risiken-de: risk overview table severity → pills; personal guarantee section → callout; risk management section → 4 cards
- padel-hall-location-guide-en + padel-standort-analyse-de: market maturity paragraphs → 3 stage cards

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 14:24:11 +01:00
Deeman
f92d863781 feat(pipeline): live extraction status + Transform tab
Adds HTMX live polling to the Overview tab (stops when quiet) and a new
Transform tab for managing the SQLMesh + export steps of the ELT pipeline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 13:47:17 +01:00
Deeman
a3dd37b1be chore(changelog): document pipeline transform tab + live status feature
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 13:47:07 +01:00
Deeman
e5cbcf462e feat(pipeline): live extraction status + Transform tab
- worker: add run_transform, run_export, run_pipeline task handlers
  - run_transform: sqlmesh plan prod --auto-apply, 2h timeout
  - run_export: export_serving.py, 10min timeout
  - run_pipeline: sequential extract → transform → export, stops on first failure

- pipeline_routes: refactor overview into _render_overview_partial() helper,
  make pipeline_trigger_extract() HTMX-aware (returns partial on HX-Request),
  add _fetch_pipeline_tasks(), _format_duration() helpers,
  add pipeline_transform() + pipeline_trigger_transform() with concurrency guard

- pipeline_overview.html: wrap in self-polling div (every 5s while any_running),
  convert Run buttons to hx-post targeting #pipeline-overview-content

- pipeline.html: add pulse animation for .status-dot.running, add Transform tab
  button, rewire header "Run Pipeline" button to enqueue run_pipeline task

- pipeline_transform.html: new partial — status cards for transform + export,
  "Run Full Pipeline" card, recent runs table with duration + error details
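The sequential run_pipeline handler described above can be sketched like this. The step callables are stand-ins for the real task handlers, and the return shape is an assumption:

```python
def handle_run_pipeline(run_extract, run_transform, run_export):
    """Run extract -> transform -> export in order; the first step that
    fails stops the chain and is reported."""
    for name, step in [("extract", run_extract),
                       ("transform", run_transform),
                       ("export", run_export)]:
        ok = step()
        if not ok:
            return {"status": "failed", "step": name}
    return {"status": "ok"}
```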

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 13:46:11 +01:00
Deeman
169092c8ea fix(admin): make pipeline data view responsive on mobile
All checks were successful
CI / test (push) Successful in 50s
CI / tag (push) Successful in 2s
- Tab bar: add overflow-x:auto so 5 tabs scroll on narrow screens
- Overview grid: replace hardcoded 1fr 1fr with .pipeline-two-col (stacks below 640px)
- Overview tables: wrap Serving Tables + Landing Zone in overflow-x:auto divs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 13:16:58 +01:00
Deeman
6ae16f6c1f feat(proxy): per-proxy dead tracking in tiered cycler
All checks were successful
CI / test (push) Successful in 51s
CI / tag (push) Successful in 3s
2026-03-01 12:37:00 +01:00
Deeman
8b33daa4f3 feat(content): remove artificial 500-article generation cap
- fetch_template_data: default limit=0 (all rows); skip LIMIT clause when 0
- generate_articles: default limit=0
- worker handle_generate_articles: default to 0 instead of 500
- Remove "limit": 500 from all 4 enqueue payloads
- template_generate GET handler: use count_template_data() instead of fetch(limit=501) probe
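The "limit=0 means all rows" convention can be sketched as a query builder — illustrative only; the real `fetch_template_data` signature and SQL dialect may differ:

```python
def build_template_query(table: str, limit: int = 0) -> tuple[str, list]:
    """limit=0 fetches all rows: the LIMIT clause is only appended
    when a positive limit is given."""
    sql = f"SELECT * FROM {table}"  # table name comes from trusted config
    params: list = []
    if limit > 0:
        sql += " LIMIT ?"
        params.append(limit)
    return sql, params
```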

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:33:58 +01:00
Deeman
a898a06575 feat(proxy): per-proxy dead tracking in tiered cycler
Add proxy_failure_limit param to make_tiered_cycler (default 3).
Individual proxies hitting the limit are marked dead and permanently
skipped. next_proxy() auto-escalates when all proxies in the active
tier are dead. Both mechanisms coexist: per-proxy dead tracking removes
broken individuals; tier-level threshold catches systemic failure.

- proxy.py: dead_proxies set + proxy_failure_counts dict in state;
  next_proxy skips dead proxies with bounded loop; record_failure/
  record_success accept optional proxy_url; dead_proxy_count() added
- playtomic_tenants.py: pass proxy_url to record_success/record_failure
- playtomic_availability.py: _worker returns (proxy_url, result);
  serial loops in extract + extract_recheck capture proxy_url
- test_supervisor.py: 11 new tests in TestTieredCyclerDeadProxyTracking
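The per-proxy mechanism described above can be sketched as follows — a simplified, single-threaded illustration (the real cycler also carries the tier-level threshold and locking):

```python
def make_dead_tracking_cycler(tiers, proxy_failure_limit=3):
    """Proxies hitting the failure limit are marked dead and skipped forever;
    when every proxy in the active tier is dead, next_proxy auto-escalates."""
    state = {"tier_idx": 0, "pos": 0, "failures": {}, "dead": set()}

    def record_failure(proxy_url):
        state["failures"][proxy_url] = state["failures"].get(proxy_url, 0) + 1
        if state["failures"][proxy_url] >= proxy_failure_limit:
            state["dead"].add(proxy_url)

    def next_proxy():
        while state["tier_idx"] < len(tiers):
            alive = [p for p in tiers[state["tier_idx"]] if p not in state["dead"]]
            if alive:
                proxy = alive[state["pos"] % len(alive)]
                state["pos"] += 1
                return proxy
            # Whole tier is dead: escalate without waiting for the threshold.
            state["tier_idx"] += 1
            state["pos"] = 0
        raise RuntimeError("all proxy tiers exhausted")

    def dead_proxy_count():
        return len(state["dead"])

    return next_proxy, record_failure, dead_proxy_count
```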

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:28:54 +01:00
Deeman
219554b7cb fix(extract): use tiered cycler in playtomic_tenants
Previously the tenants extractor flattened all proxy tiers into a single
round-robin list, bypassing the circuit breaker entirely. When the free
Webshare tier runs out of bandwidth (402), all 20 free proxies fail and
the batch crashes — the paid datacenter/residential proxies are never tried.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:13:58 +01:00
Deeman
1aedf78ec6 fix(extract): use tiered cycler in playtomic_tenants
Previously the tenants extractor flattened all proxy tiers into a single
round-robin list, bypassing the circuit breaker entirely. When the free
Webshare tier runs out of bandwidth (402), all 20 free proxies fail and
the batch crashes — the paid datacenter/residential proxies are never tried.

Changes:
- Replace make_round_robin_cycler with make_tiered_cycler (same as availability)
- Add _fetch_page_via_cycler: retries per page across tiers, records
  success/failure in cycler so circuit breaker can escalate
- Fix batch_size to BATCH_SIZE=20 constant (was len(all_proxies) ≈ 22)
- Check cycler.is_exhausted() before each batch; catch RuntimeError mid-batch
  and write partial results rather than crashing with nothing
- CIRCUIT_BREAKER_THRESHOLD from env (default 10), matching availability
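The per-page retry described in `_fetch_page_via_cycler` can be sketched like this. `fetch(proxy_url)` is a placeholder for the real HTTP call, and the attempt cap is an assumption:

```python
def fetch_page_via_cycler(fetch, next_proxy, record_success, record_failure,
                          max_attempts=5):
    """Try proxies from the cycler until one succeeds, recording each
    outcome so the circuit breaker can escalate across tiers."""
    last_exc = None
    for _ in range(max_attempts):
        proxy_url = next_proxy()
        try:
            result = fetch(proxy_url)
        except Exception as exc:  # e.g. 402 bandwidth exhausted, SSL errors
            record_failure(proxy_url)
            last_exc = exc
            continue
        record_success(proxy_url)
        return result
    raise RuntimeError("page fetch failed on all attempts") from last_exc
```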

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:13:50 +01:00
22 changed files with 1492 additions and 181 deletions

View File

@@ -58,7 +58,7 @@ NTFY_TOKEN=
#ENC[AES256_GCM,data:BCyQYjRnTx8yW9A=,iv:4OPCP+xzRLUJrpoFewVnbZRKnZH4sAbV76SM//2k5wU=,tag:HxwEp7VFVZUN/VjPiL/+Vw==,type:comment]
RECHECK_WINDOW_MINUTES=ENC[AES256_GCM,data:YWM=,iv:iY5+uMazLAFdwyLT7Gr7MaF1QHBIgHuoi6nF2VbSsOA=,tag:dc6AmuJdTQ55gVe16uzs6A==,type:str]
PROXY_URLS_RESIDENTIAL=ENC[AES256_GCM,data:lfmlsjXFtL+zo40SNFLiFKaZiYvE7CNH+zRwjMK5pqPfCs0TlMX+Y9e1KmzAS+y/cI69TP5sgMPRBzER0Jn7RvH0KA==,iv:jBN/4/K5L5886G4rSzxt8V8u/57tAuj3R76haltzqeU=,tag:Xe6o9eg2PodfktDqmLgVNA==,type:str]
PROXY_URLS_DATACENTER=ENC[AES256_GCM,data:X6xpxz5u8Xh3OXjkIz3UwqH847qLvY9cVWVktW5B+lqhmXAKTzoTzHds8vlRGJf5Up9Yx44XcigbvuK33ZJDSq9ovkAIbY55OK4=,iv:3hHyFD+H9HMzQ/27bPjGr59+7yWmEneUdN9XPQasCig=,tag:oBXsSuV5idB7HqNrNOruwg==,type:str]
PROXY_URLS_DATACENTER=ENC[AES256_GCM,data:Eec0X65EMsV2PD3Qvn+JjGqYaHtLupn0k99H918vmuRuAinP3rv/pwEoyKHmygazrUExg7U2PUELycyzq3lU6RIGtO+r0pRAn/n0S8RwdoZS,iv:T+bfbvULwSLRVD/hyW7rDN8tLLBf1FQkwCEbpiuBB+0=,tag:W/YHfl5U2yaA7ZOXgAFw+Q==,type:str]
WEBSHARE_DOWNLOAD_URL=ENC[AES256_GCM,data:1D9VRZ3MCXPQWfiMH8+CLcrxeYnVVcQgZDvt5kltvbSTuSHQ2hHDmZpBkTOMIBJnw4JLZ2JQKHgG4OaYDtsM2VltFPnfwaRgVI9G5PSenR3o4PeQmYO1AqWOmjn19jPxNXRhEXdupP9UT+xQNXoBJsl6RR20XOpMA5AipUHmSjD0UIKXoZLU,iv:uWUkAydac//qrOTPUThuOLKAKXK4xcZmK9qBVFwpqt4=,tag:1vYhukBW9kEuSXCLAiZZmQ==,type:str]
CIRCUIT_BREAKER_THRESHOLD=
#ENC[AES256_GCM,data:ZcX/OEbrMfKizIQYq3CYGnvzeTEX7KsmQaz2+Jj1rG5tbTy2aljQBIEkjtiwuo8NsNAD+FhIGRGVfBmKe1CAKME1MuiCbgSG,iv:4BSkeD3jZFawP09qECcqyuiWcDnCNSgbIjBATYhazq4=,tag:Ep1d2Uk700MOlWcLWaQ/ig==,type:comment]
@@ -71,7 +71,7 @@ GEONAMES_USERNAME=ENC[AES256_GCM,data:aSkVdLNrhiF6tlg=,iv:eemFGwDIv3EG/P3lVHGZj9
CENSUS_API_KEY=ENC[AES256_GCM,data:qqG971573aGq9MiHI2xLlanKKFwjfcNNoMXtm8LNbyh0rMbQN2XukQ==,iv:az2i0ldH75nHGah4DeOxaXmDbVYqmC1c77ptZqFA9BI=,tag:zoDdKj9bR7fgIDo1/dEU2g==,type:str]
sops_age__list_0__map_enc=-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBxNWNmUzVNUGdWRnE0ZFpF\nM0JQZWZ3UDdEVzlwTmIxakxOZXBkT2x2ZlNrClRtV2M3S2daSGxUZmFDSWQ2Nmh4\neU51QndFcUxlSE00RFovOVJTcDZmUUUKLS0tIDcvL3hRMDRoMWZZSXljNzA3WG5o\nMWFic21MV0krMzlIaldBTVU0ZDdlTE0K7euGQtA+9lHNws+x7TMCArZamm9att96\nL8cXoUDWe5fNI5+M1bXReqVfNwPTwZsV6j/+ZtYKybklIzWz02Ex4A==\n-----END AGE ENCRYPTED FILE-----\n
sops_age__list_0__map_recipient=age1f5002gj4s78jju45jd28kuejtcfhn5cdujz885fl7z2p9ym68pnsgky87a
sops_lastmodified=2026-02-28T15:50:46Z
sops_mac=ENC[AES256_GCM,data:HiLZTLa+p3mqa4hw+tKOK27F/bsJOy4jmDi8MHToi6S7tRfBA/TzcEzXvXUIkkwAixN73NQHvBVeRnbcEsApVpkaxH1OqnjvvyT+B3YFkTEtxczaKGWlCvbqFZNmXYsFvGR9njaWYWsTQPkRIjrroXrSrhr7uxC8F40v7ByxJKo=,iv:qj2IpzWRIh/mM1HtjjkNbyFuhtORKXslVnf/vdEC9Uw=,tag:fr9CZsL74HxRJLXn9eS0xQ==,type:str]
sops_lastmodified=2026-03-01T13:26:08Z
sops_mac=ENC[AES256_GCM,data:WmbT6tCUEoCDyKu673NQoJNzmCiilpG8yDVGl6ObxTOYleWt+1DVdPS+XUV+0Wd4bfkEhGTEfXAyy+wfoCVfYnenMuDGjXUUdsvqrOX6nnNCJ8nIntL46LfbRsbVrU6eeYGu/TaTyfouWjkk6pqlxffNSS6rrEFNZE4Q+v58+EI=,iv:TuCEmK6YJXsYISbN4mbuVbS6OvUNuhPRLstjjNkkrPk=,tag:hWLS036q7H5lMNpR6gZBVA==,type:str]
sops_unencrypted_suffix=_unencrypted
sops_version=3.12.1

View File

@@ -39,8 +39,8 @@ ALERT_WEBHOOK_URL=ENC[AES256_GCM,data:4sXQk8zklruC525J279TUUatdDJQ43qweuoPhtpI82
NTFY_TOKEN=ENC[AES256_GCM,data:YlOxhsRJ8P1y4kk6ugWm41iyRCsM6oAWjvbU9lGcD0A=,iv:JZXOvi3wTOPV9A46c7fMiqbszNCvXkOgh9i/H1hob24=,tag:8xnPimgy7sesOAnxhaXmpg==,type:str]
SUPERVISOR_GIT_PULL=ENC[AES256_GCM,data:mg==,iv:KgqMVYj12FjOzWxtA1T0r0pqCDJ6MtHzMjE+4W/W+s4=,tag:czFaOqhHG8nqrQ8AZ8QiGw==,type:str]
#ENC[AES256_GCM,data:hzAZvCWc4RTk290=,iv:RsSI4OpAOQGcFVpfXDZ6t705yWmlO0JEWwWF5uQu9As=,tag:UPqFtA2tXiSa0vzJAv8qXg==,type:comment]
PROXY_URLS_RESIDENTIAL=ENC[AES256_GCM,data:x/F0toXDc8stsUNxaepCmxq1+WuacqqPtdc+R5mxTwcAzsKxCdwt8KpBZWMvz7ku4tHDGsKD949QAX2ANXP9oCMTgW0=,iv:6G9gE9/v7GaYj8aqVTmMrpw6AcQK9yMSCAohNdAD1Ws=,tag:2Jimr1ldVSfkh8LPEwdN3w==,type:str]
PROXY_URLS_DATACENTER=ENC[AES256_GCM,data:6BfXBYmyHpgZU/kJWpZLf8eH5VowVK1n0r6GzFTNAx/OmyaaS1RZVPC1JPkPBnTwEmo0WHYRW8uiUdkABmH9F5ZqqlsAesyfW7zvU9r7yD+D7w==,iv:3CBn2qCoTueQy8xVcQqZS4E3F0qoFYnNbzTZTpJ1veo=,tag:wC3Ecl4uNTwPiT23ATvRZg==,type:str]
PROXY_URLS_RESIDENTIAL=ENC[AES256_GCM,data:vxRcXQ/8TUTCtr6hKWBD1zVF47GFSfluIHZ8q0tt8SqQOWDdDe2D7Of6boy/kG3lqlpl7TjqMGJ7fLORcr0klKCykQ==,iv:YjegXXtIXm2qr0a3ZHRHxj3L1JoGZ1iQXkVXQupGQ2E=,tag:kahoHRskXbzplZasWOeiig==,type:str]
PROXY_URLS_DATACENTER=ENC[AES256_GCM,data:23TgU6oUeO7J+MFkraALQ5/RO38DZ3ib5oYYJr7Lj3KXQSlRsgwA+bJlweI5gcUpFphnPXvmwFGiuL6AeY8LzAQ3bx46dcZa5w9LfKw2PMFt,iv:AGXwYLqWjT5VmU02qqada3PbdjfC0mLK2sPruO0uru8=,tag:Z2IS/JPOqWX+x0LZYwyArA==,type:str]
WEBSHARE_DOWNLOAD_URL=ENC[AES256_GCM,data:/N77CFf6tJWCk7HrnBOm2Q1ynx7XoblzfbzJySeCjrxqiu4r+CB90aDkaPahlQKI00DUZih3pcy7WhnjdAwI30G5kJZ3P8H8/R0tP7OBK1wPVbsJq8prQJPFOAWewsS4KWNtSURZPYSCxslcBb7DHLX6ZAjv6A5KFOjRK2N8usR9sIabrCWh,iv:G3Ropu/JGytZK/zKsNGFjjSu3Wt6fvHaAqI9RpUHvlI=,tag:fv6xuS94OR+4xfiyKrYELA==,type:str]
PROXY_CONCURRENCY=ENC[AES256_GCM,data:vdEZ,iv:+eTNQO+s/SsVDBLg1/+fneMzEEsFkuEFxo/FcVV+mWc=,tag:i/EPwi/jOoWl3xW8H0XMdw==,type:str]
RECHECK_WINDOW_MINUTES=ENC[AES256_GCM,data:L2s=,iv:fV3mCKmK5fxUmIWRePELBDAPTb8JZqasVIhnAl55kYw=,tag:XL+PO6sblz/7WqHC3dtk1w==,type:str]
@@ -58,7 +58,7 @@ sops_age__list_1__map_enc=-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb2
sops_age__list_1__map_recipient=age1wjepykv3glvsrtegu25tevg7vyn3ngpl607u3yjc9ucay04s045s796msw
sops_age__list_2__map_enc=-----BEGIN AGE ENCRYPTED FILE-----\nYWdlLWVuY3J5cHRpb24ub3JnL3YxCi0+IFgyNTUxOSBFeHhaOURNZnRVMEwxNThu\nUjF4Q0kwUXhTUE1QSzZJbmpubnh3RnpQTmdvCjRmWWxpNkxFUmVGb3NRbnlydW5O\nWEg3ZXJQTU4vcndzS2pUQXY3Q0ttYjAKLS0tIE9IRFJ1c2ZxbGVHa2xTL0swbGN1\nTzgwMThPUDRFTWhuZHJjZUYxOTZrU00KY62qrNBCUQYxwcLMXFEnLkwncxq3BPJB\nKm4NzeHBU87XmPWVrgrKuf+PH1mxJlBsl7Hev8xBTy7l6feiZjLIvQ==\n-----END AGE ENCRYPTED FILE-----\n
sops_age__list_2__map_recipient=age1c783ym2q5x9tv7py5d28uc4k44aguudjn03g97l9nzs00dd9tsrqum8h4d
sops_lastmodified=2026-03-01T00:26:54Z
sops_mac=ENC[AES256_GCM,data:DdcABGVm9KbAcFrF0iuZlAaugsouNs7Hon2mZISaHs15/2H/Pd9FniXW3KeQ0+/NdZFQkz/h3i3bVFampcpFS1AxuOE5+1/IgWn8sKtaqPc7E9y8g6lxMnwTkUX2z+n/Q2nR8KAcO9IyE0GNjIluMWkxPWQuLzlRYDOjRN4/1e0=,iv:rm+6lXhYu6VUmrdCIrU0BRN2/ooa21Fw1ESWxr7vATg=,tag:GZmLLZf/LQaNeNNAAEg5bA==,type:str]
sops_lastmodified=2026-03-01T13:25:41Z
sops_mac=ENC[AES256_GCM,data:EL9Bgo0pWWECeHaaM1bHtkvwBgBmS3P2cX+6oahHKmLEJLI7P7fiomP7G8SdrfUyNpZaP9d4LlfwZSuCPqH6rP8jzF67oNkfXfd/xK4OW2U2TqSvouCMzlhqVQgS4HHl5EgvOI488WEIZko7KK2A1rxnpkm8C29WG9d9G64LKvw=,iv:XzsNm3CXnlC6SIef63BdddALjGustp8czHQCWOtjXBQ=,tag:zll0db6K1+M4brOpfVWnhg==,type:str]
sops_unencrypted_suffix=_unencrypted
sops_version=3.12.1

View File

@@ -6,7 +6,33 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]
### Fixed
- **Stale-tier failures no longer exhaust the next proxy tier** — with parallel workers, threads that fetched a proxy just before tier escalation reported failures after the tier changed, immediately blowing through the new tier's circuit breaker before it ever got tried (Rayobyte was skipped entirely). `record_failure(proxy_url)` now checks which tier the proxy belongs to and ignores the circuit breaker when the proxy is from an already-escalated tier.
- **Proxy URL scheme validation in `load_proxy_tiers()`** — URLs in `PROXY_URLS_DATACENTER` / `PROXY_URLS_RESIDENTIAL` that are missing an `http://` or `https://` scheme are now logged as a warning and skipped, rather than being passed through and causing SSL handshake failures or connection errors at request time. Also fixed a missing `http://` prefix in the dev `.env` `PROXY_URLS_DATACENTER` entry.
### Changed
- **Per-proxy dead tracking in tiered cycler** — `make_tiered_cycler` now accepts a `proxy_failure_limit` parameter (default 3). Individual proxies that hit the limit are marked dead and permanently skipped by `next_proxy()`. If all proxies in the active tier are dead, `next_proxy()` auto-escalates to the next tier without needing the tier-level threshold. `record_failure(proxy_url)` and `record_success(proxy_url)` accept an optional `proxy_url` argument for per-proxy tracking; callers without `proxy_url` are fully backward-compatible. New `dead_proxy_count()` callable exposed for monitoring.
- `extract/padelnomics_extract/src/padelnomics_extract/proxy.py`: added per-proxy state (`proxy_failure_counts`, `dead_proxies`), updated `next_proxy`/`record_failure`/`record_success`, added `dead_proxy_count`
- `extract/padelnomics_extract/src/padelnomics_extract/playtomic_tenants.py`: `_fetch_page_via_cycler` passes `proxy_url` to `record_success`/`record_failure`
- `extract/padelnomics_extract/src/padelnomics_extract/playtomic_availability.py`: `_worker` returns `(proxy_url, result)` tuple; serial loops in `extract` and `extract_recheck` capture `proxy_url` before passing to `record_success`/`record_failure`
- `web/tests/test_supervisor.py`: 11 new tests in `TestTieredCyclerDeadProxyTracking` covering dead proxy skipping, auto-escalation, `dead_proxy_count`, backward compat, and thread safety
### Added
- **Visual upgrades for longform articles** — 4 reusable CSS article components added to `input.css` and applied across 6 cornerstone articles (EN + DE):
- `article-timeline`: horizontal numbered phase diagram with connecting lines; collapses to vertical stack on mobile. Replaces ASCII art code blocks in build guide articles.
- `article-callout` (warning/tip/info variants): left-bordered callout box with icon, title, and body. Replaces `>` blockquotes and bold-text warnings in build and risk guides.
- `article-cards`: 2-column card grid with colored accent bars (success/failure/neutral/established/growth/emerging). Replaces sequential bold-text pattern paragraphs in build, risk, and location guides.
- `severity` pills: inline colored badge for High/Medium-High/Medium/Low-Medium/Low. Applied to risk overview tables in both risk guide articles.
- Articles updated: `padel-hall-build-guide-en`, `padel-halle-bauen-de`, `padel-hall-investment-risks-en`, `padel-halle-risiken-de`, `padel-hall-location-guide-en`, `padel-standort-analyse-de`
- **Pipeline Transform tab + live extraction status** — new "Transform" tab in the pipeline admin with status cards for SQLMesh transform and export-serving tasks, a "Run Full Pipeline" button, and a recent run history table. The Overview tab now auto-polls every 5 s while an extraction task is pending and stops automatically when quiet. Per-extractor "Run" buttons use HTMX in-place updates instead of redirects. The header "Run Pipeline" button now enqueues the full ELT pipeline (extract → transform → export) instead of extraction only. Three new worker task handlers: `run_transform` (sqlmesh plan prod --auto-apply, 2 h timeout), `run_export` (export_serving.py, 10 min timeout), `run_pipeline` (sequential, stops on first failure). Concurrency guard prevents double-enqueuing the same step.
- `web/src/padelnomics/worker.py`: `handle_run_transform`, `handle_run_export`, `handle_run_pipeline`
- `web/src/padelnomics/admin/pipeline_routes.py`: `_render_overview_partial()`, `_fetch_pipeline_tasks()`, `_format_duration()`, `pipeline_transform()`, `pipeline_trigger_transform()`; `pipeline_trigger_extract()` now HTMX-aware
- `web/src/padelnomics/admin/templates/admin/pipeline.html`: pulse animation on `.status-dot.running`, Transform tab button, rewired header button
- `web/src/padelnomics/admin/templates/admin/partials/pipeline_overview.html`: self-polling wrapper, HTMX Run buttons
- `web/src/padelnomics/admin/templates/admin/partials/pipeline_transform.html`: new file
- **Affiliate programs management** — centralised retailer config (`affiliate_programs` table) with URL template + tracking tag + commission %. Products now use a program dropdown + product identifier (e.g. ASIN) instead of manually baking full URLs. URL is assembled at redirect time via `build_affiliate_url()`, so changing a tag propagates instantly to all products. Legacy products (baked `affiliate_url`) continue to work via fallback. Amazon OneLink configured in the Associates dashboard handles geo-redirect to local marketplaces — no per-country programs needed.
- `web/src/padelnomics/migrations/versions/0027_affiliate_programs.py`: `affiliate_programs` table, nullable `program_id` + `product_identifier` columns on `affiliate_products`, seeds "Amazon" program, backfills ASINs from existing URLs
- `web/src/padelnomics/affiliate.py`: `get_all_programs()`, `get_program()`, `get_program_by_slug()`, `build_affiliate_url()`; `get_product()` JOINs program for redirect assembly; `_parse_product()` extracts `_program` sub-dict
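The redirect-time assembly in `build_affiliate_url()` might look roughly like this — the template placeholders and the program dict shape are assumptions, not the actual schema:

```python
def build_affiliate_url(program: dict, product_identifier: str) -> str:
    """Assemble the affiliate URL at redirect time from the program's URL
    template and tracking tag plus the product's identifier (e.g. an ASIN)."""
    return program["url_template"].format(
        id=product_identifier, tag=program["tracking_tag"]
    )

# Changing the program's tag changes every product's redirect at once.
amazon = {
    "url_template": "https://www.amazon.com/dp/{id}?tag={tag}",
    "tracking_tag": "padelnomics-20",
}
```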

View File

@@ -17,15 +17,48 @@ This guide walks through all five phases and 23 steps between your initial marke
## The 5 Phases at a Glance
```
Phase 1          Phase 2          Phase 3           Phase 4          Phase 5
Feasibility  →   Planning &   →   Construction  →   Pre-         →   Operations &
& Concept        Design           / Conversion      Opening          Optimization
Month 1–3        Month 3–6        Month 6–12        Month 10–13      Ongoing
Steps 1–5        Steps 6–11       Steps 12–16       Steps 17–20      Steps 21–23
```
<div class="article-timeline">
<div class="article-timeline__phase">
<div class="article-timeline__num">1</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Feasibility &amp; Concept</div>
<div class="article-timeline__subtitle">Market research, concept, site scouting</div>
<div class="article-timeline__meta">Month 1–3 · Steps 1–5</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">2</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Planning &amp; Design</div>
<div class="article-timeline__subtitle">Architect, permits, financing</div>
<div class="article-timeline__meta">Month 3–6 · Steps 6–11</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">3</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Construction</div>
<div class="article-timeline__subtitle">Build, courts, IT systems</div>
<div class="article-timeline__meta">Month 6–12 · Steps 12–16</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">4</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Pre-Opening</div>
<div class="article-timeline__subtitle">Hiring, marketing, soft launch</div>
<div class="article-timeline__meta">Month 10–13 · Steps 17–20</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">5</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Operations</div>
<div class="article-timeline__subtitle">Revenue streams, optimization</div>
<div class="article-timeline__meta">Ongoing · Steps 21–23</div>
</div>
</div>
</div>
---
@@ -105,7 +138,12 @@ Deliverables from this phase:
- **MEP design (mechanical, electrical, plumbing):** Heating, ventilation, air conditioning, electrical, drainage — typically the most expensive trade package in a sports hall conversion
- **Fire safety strategy**
> **The most expensive planning mistake in padel hall builds:** underestimating HVAC complexity and budget. Large indoor courts need precise temperature and humidity control — not just for player comfort, but for playing surface longevity and air quality. Courts installed in a poorly climate-controlled building will degrade faster and generate complaints. Budget for it properly from the start, not as a value-engineering target.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">The most expensive planning mistake in padel hall builds</span>
<p>Underestimating HVAC complexity and budget. Large indoor courts need precise temperature and humidity control — not just for player comfort, but for playing surface longevity and air quality. Courts installed in a poorly climate-controlled building will degrade faster and generate complaints. Budget for it properly from the start, not as a value-engineering target.</p>
</div>
</div>
### Step 8: Court Supplier Selection
@@ -160,7 +198,12 @@ Courts are installed after the building envelope is weathertight. This is a hard
Glass panels, artificial turf, and court metalwork must not be exposed to construction dust, moisture, and site traffic. Projects that try to accelerate schedules by installing courts before the building is properly enclosed regularly end up with surface contamination, glass damage, and voided manufacturer warranties.
> **The most common construction mistake on padel hall projects:** rushing court installation sequencing under schedule pressure. The pressure to hit an opening date is real — but installing courts into an unenclosed building is one of the most reliable ways to add cost and delay, not reduce them. Hold the sequence.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">The most common construction mistake on padel hall projects</span>
<p>Rushing court installation sequencing under schedule pressure. The pressure to hit an opening date is real — but installing courts into an unenclosed building is one of the most reliable ways to add cost and delay, not reduce them. Hold the sequence.</p>
</div>
</div>
Allow two to four weeks for court installation per batch, depending on the manufacturer's crew capacity. Build this explicitly into your master program.
@@ -174,7 +217,12 @@ Decide early: which booking platform, which point-of-sale system, and whether yo
Access control systems must be coordinated with the electrical design. Adding them in the final stages of construction is possible but costs more.
> **The most common pre-opening mistake:** the booking system isn't fully configured, tested, and working on day one. A broken booking flow, failed test payments, or a QR code that leads to an error page on opening day kills your launch momentum in a way that's difficult to recover from. Test the system end-to-end — including real bookings, real payments, and real cancellations — two to four weeks before opening.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">The most common pre-opening mistake</span>
<p>The booking system isn't fully configured, tested, and working on day one. A broken booking flow, failed test payments, or a QR code that leads to an error page on opening day kills your launch momentum in a way that's difficult to recover from. Test the system end-to-end — including real bookings, real payments, and real cancellations — two to four weeks before opening.</p>
</div>
</div>
### Step 16: Inspections and Certifications
@@ -248,13 +296,36 @@ Court bookings are your core revenue, but rarely your only opportunity:
Patterns emerge when you observe padel hall projects across a market over time.
**Projects that go over budget** almost always cut at the wrong place early — too little HVAC budget, no construction contingency, a cheap general contractor without adequate contractual protection. The savings on the way in become much larger costs on the way out.
**Projects that slip their schedule** consistently underestimate the regulatory process. Permits, noise assessments, and change-of-use applications take time that money cannot buy once you've started too late. Start conversations with authorities before you need the approvals, not when you need them.
**Projects that open weakly** started marketing too late and tested the booking system too late. An empty calendar on day one and a broken booking page create impressions that stick longer than the opening week.
**Projects that succeed long-term** treat all three phases — planning, build, and opening — with equal rigor, and invest early and consistently in community and repeat customers.
<div class="article-cards">
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that go over budget</span>
<p class="article-card__body">Almost always cut at the wrong place early — too little HVAC budget, no construction contingency, a cheap general contractor without adequate contractual protection. The savings on the way in become much larger costs on the way out.</p>
</div>
</div>
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that slip their schedule</span>
<p class="article-card__body">Consistently underestimate the regulatory process. Permits, noise assessments, and change-of-use applications take time that money cannot buy once you've started too late. Start conversations with authorities before you need the approvals.</p>
</div>
</div>
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that open weakly</span>
<p class="article-card__body">Started marketing too late and tested the booking system too late. An empty calendar on day one and a broken booking page create impressions that stick longer than the opening week.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that succeed long-term</span>
<p class="article-card__body">Treat all three phases — planning, build, and opening — with equal rigor, and invest early and consistently in community and repeat customers.</p>
</div>
</div>
</div>
Building a padel hall is complex, but it is a solved problem. The failures are nearly always the same failures. So are the successes.

View File

@@ -21,20 +21,20 @@ This article covers the 14 risks that don't get enough airtime in investor discu
| # | Risk | Category | Severity |
|---|------|----------|----------|
| 1 | Trend / fad risk | Strategic | High |
| 2 | Construction cost overruns | Construction & Development | High |
| 3 | Construction delays | Construction & Development | High |
| 4 | Landlord risk: sale, insolvency, non-renewal | Property & Lease | High |
| 5 | New competitor in your catchment | Competition | Medium-High |
| 6 | Key-person dependency | Operations | Medium |
| 7 | Staff retention and wage pressure | Operations | Medium |
| 8 | Court surface and maintenance cycles | Operations | Medium |
| 9 | Energy price volatility | Financial | Medium |
| 10 | Interest rate risk | Financial | Medium |
| 11 | Personal guarantee exposure | Financial | High |
| 12 | Customer concentration | Financial | Medium |
| 13 | Noise complaints and regulatory restrictions | Regulatory & Legal | Medium |
| 14 | Booking platform dependency | Regulatory & Legal | Low-Medium |
| 1 | Trend / fad risk | Strategic | <span class="severity severity--high">High</span> |
| 2 | Construction cost overruns | Construction & Development | <span class="severity severity--high">High</span> |
| 3 | Construction delays | Construction & Development | <span class="severity severity--high">High</span> |
| 4 | Landlord risk: sale, insolvency, non-renewal | Property & Lease | <span class="severity severity--high">High</span> |
| 5 | New competitor in your catchment | Competition | <span class="severity severity--medium-high">Medium-High</span> |
| 6 | Key-person dependency | Operations | <span class="severity severity--medium">Medium</span> |
| 7 | Staff retention and wage pressure | Operations | <span class="severity severity--medium">Medium</span> |
| 8 | Court surface and maintenance cycles | Operations | <span class="severity severity--medium">Medium</span> |
| 9 | Energy price volatility | Financial | <span class="severity severity--medium">Medium</span> |
| 10 | Interest rate risk | Financial | <span class="severity severity--medium">Medium</span> |
| 11 | Personal guarantee exposure | Financial | <span class="severity severity--high">High</span> |
| 12 | Customer concentration | Financial | <span class="severity severity--medium">Medium</span> |
| 13 | Noise complaints and regulatory restrictions | Regulatory & Legal | <span class="severity severity--medium">Medium</span> |
| 14 | Booking platform dependency | Regulatory & Legal | <span class="severity severity--low-medium">Low–Medium</span> |
---
@@ -137,9 +137,12 @@ Your costs will increase three to five percent per year. Whether you can pass th
## The Risk No One Talks About: Personal Guarantees
**This section gets skipped in almost every padel hall investment conversation. That's a serious mistake.**
Banks financing a single-asset leisure facility without corporate backing will almost universally require personal guarantees from the principal shareholders. Not as an unusual request — as standard terms for this type of deal.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">This section gets skipped in almost every padel hall investment conversation. That's a serious mistake.</span>
<p>Banks financing a single-asset leisure facility without corporate backing will almost universally require personal guarantees from the principal shareholders. Not as an unusual request — as standard terms for this type of deal.</p>
</div>
</div>
Here is what that means in practice:
@@ -180,13 +183,36 @@ Building a parallel booking capability — even a simple direct booking option
The investors who succeed long-term in padel aren't the ones who found a risk-free opportunity. There isn't one. They're the ones who went in with their eyes open.
**They modeled the bad scenarios before assuming the good ones.** A business plan that shows only the base case isn't a planning tool — it's wishful thinking. Explicit downside modeling — 40% utilization, six-month delay, new competitor in year three — is the baseline, not an optional exercise.
**They built structural buffers into the plan.** Liquid reserves covering at least six months of fixed costs. Construction contingency treated as a budget line, not a hedge. These aren't comfort margins; they're operational requirements.
**They got the contractual foundations right from the start.** Lease terms. Financing conditions. Guarantee scope. The cost of good legal and financial advice at the planning stage is trivial relative to the downside exposure it addresses.
**They planned for competition.** Not by hoping it wouldn't come, but by building a product — community, quality, service — that gives existing customers a reason to stay when someone cheaper opens nearby.
<div class="article-cards">
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Model the bad scenarios first</span>
<p class="article-card__body">A business plan showing only the base case isn't a planning tool — it's wishful thinking. Explicit downside modeling — 40% utilization, six-month delay, new competitor in year three — is the baseline, not an optional exercise.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Build structural buffers in</span>
<p class="article-card__body">Liquid reserves covering at least six months of fixed costs. Construction contingency treated as a budget line, not a hedge. These aren't comfort margins; they're operational requirements.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Get the contractual foundations right</span>
<p class="article-card__body">Lease terms. Financing conditions. Guarantee scope. The cost of good legal and financial advice at the planning stage is trivial relative to the downside exposure it addresses.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Plan for competition</span>
<p class="article-card__body">Not by hoping it won't come, but by building a product — community, quality, service — that gives existing customers a reason to stay when someone cheaper opens nearby.</p>
</div>
</div>
</div>
---

View File

@@ -148,11 +148,29 @@ The matrix also reveals where trade-offs are being made explicitly, which makes
The 8 criteria above evaluate specific sites. But before shortlisting sites, it is worth stepping back to read the stage of the overall market — because the right operational strategy differs fundamentally depending on where a city sits in its padel development cycle.
**Established markets**: Booking platforms show consistent peak-hour sell-out across most venues. Waiting lists are common. Demand is validated beyond doubt. The challenge here is elevated rent, elevated build costs, and entrenched operators who have already captured community loyalty. New entrants need a genuine differentiation angle — a superior facility specification, a better location within the city, or an F&B and coaching product that existing venues don't offer. Entry costs are high; returns, if execution is strong, are also high. Munich is the canonical German example.
**Growth markets**: Demand is clearly building — booking availability tightens at weekends, new facilities are announced regularly, and the sport is gaining local media visibility. Supply hasn't caught up, so identifiable gaps still exist in specific districts or the surrounding hinterland. The risk profile is lower than in emerging markets, but the window for securing good real estate at reasonable rent is narrowing. The premium for moving decisively goes to those who arrive before the obvious sites are taken.
**Emerging markets**: Limited current supply, a small but growing player base, and padel not yet mainstream enough to generate organic walk-in demand. Entry costs — rent especially — are lower. The constraint is that demand must be actively created rather than captured. Operators who succeed here invest in community: beginner programmes, local leagues, school partnerships, conversions from tennis clubs. The time to first profitability is longer, but the competitive position built in the first two years is often decisive for the long term.
<div class="article-cards">
<div class="article-card article-card--established">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Established markets</span>
<p class="article-card__body">Booking platforms show consistent peak-hour sell-out. Demand is validated. The challenge: elevated rent, high build costs, entrenched operators. New entrants need a genuine differentiation angle — superior spec, better location, or F&B and coaching that existing venues don't offer. Entry costs are high; returns, if execution is strong, are also high. Munich is the canonical German example.</p>
</div>
</div>
<div class="article-card article-card--growth">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Growth markets</span>
<p class="article-card__body">Demand is clearly building — booking availability tightens at weekends, new facilities are announced regularly. Supply hasn't caught up; identifiable gaps still exist. The risk profile is lower, but the window for securing good real estate at reasonable rent is narrowing. The premium goes to those who arrive before the obvious sites are taken.</p>
</div>
</div>
<div class="article-card article-card--emerging">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Emerging markets</span>
<p class="article-card__body">Limited supply, a small but growing player base, padel not yet mainstream. Entry costs — rent especially — are lower. The constraint: demand must be actively created rather than captured. Operators who succeed invest in community: beginner programmes, local leagues, school partnerships. Time to profitability is longer, but the competitive position built in the first two years is often decisive.</p>
</div>
</div>
</div>
Before committing to a site search in any city, calibrate where it sits on this spectrum. The 8-criteria framework then tells you whether a specific site works; market maturity tells you what kind of operator and strategy is required to make it work at all.

View File

@@ -17,15 +17,48 @@ This guide shows you all 5 phases and 23 steps that lie between your
## The 5 Phases at a Glance
```
Phase 1          Phase 2        Phase 3        Phase 4        Phase 5
Feasibility   →  Planning &  →  Build /     →  Pre-        →  Operations &
& Concept        Design         Conversion     Opening        Optimization
Month 1–3        Month 3–6      Month 6–12     Month 10–13    ongoing
Steps 1–5        Steps 6–11     Steps 12–16    Steps 17–20    Steps 21–23
```
<div class="article-timeline">
<div class="article-timeline__phase">
<div class="article-timeline__num">1</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Feasibility &amp; Concept</div>
<div class="article-timeline__subtitle">Market analysis, concept, site search</div>
<div class="article-timeline__meta">Month 1–3 · Steps 1–5</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">2</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Planning &amp; Design</div>
<div class="article-timeline__subtitle">Architect, permits, financing</div>
<div class="article-timeline__meta">Month 3–6 · Steps 6–11</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">3</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Build / Conversion</div>
<div class="article-timeline__subtitle">Structural work, courts, IT systems</div>
<div class="article-timeline__meta">Month 6–12 · Steps 12–16</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">4</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Pre-Opening</div>
<div class="article-timeline__subtitle">Staffing, marketing, soft launch</div>
<div class="article-timeline__meta">Month 10–13 · Steps 17–20</div>
</div>
</div>
<div class="article-timeline__phase">
<div class="article-timeline__num">5</div>
<div class="article-timeline__card">
<div class="article-timeline__title">Operations &amp; Optimization</div>
<div class="article-timeline__subtitle">Revenue, community, optimization</div>
<div class="article-timeline__meta">ongoing · Steps 21–23</div>
</div>
</div>
</div>
---
@@ -104,7 +137,12 @@ What this phase produces:
- MEP planning (building services): heating, ventilation, air conditioning, electrical, plumbing — for sports halls these are often the most cost-intensive trades
- Fire protection concept
**Common mistake in this phase:** Building services get underestimated. A large indoor hall needs precise temperature and humidity control — for playing quality, for the longevity of the surface, and for player comfort. A poor HVAC system is a problem that never goes away.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">Common mistake in this phase</span>
<p>Building services get underestimated. A large indoor hall needs precise temperature and humidity control — for playing quality, for the longevity of the surface, and for player comfort. A poor HVAC system is a problem that never goes away.</p>
</div>
</div>
### Step 8: Choose a Court Supplier
@@ -155,7 +193,12 @@ Negotiate fixed prices where possible. Read the risk allocation in the contracts
Courts are installed after the building envelope is complete — that is a hard sequencing constraint, not a recommendation. Glass elements must not be exposed to moisture, dust, and site traffic before the building is weathertight.
**A common and avoidable mistake:** Projects under schedule pressure try to pull court installation forward. The result: damaged surfaces, broken glass, contamination of the playing surface, and warranty disputes with the manufacturer. Stick to the sequence — without exception.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">A common and avoidable mistake</span>
<p>Projects under schedule pressure try to pull court installation forward. The result: damaged surfaces, broken glass, contamination of the playing surface, and warranty disputes with the manufacturer. Stick to the sequence — without exception.</p>
</div>
</div>
Depending on the manufacturer and parallel capacity, court installation takes two to four weeks per batch. Build that into the overall schedule.
@@ -169,7 +212,12 @@ Decide early: Playtomic, Matchi, another system, or a hybrid solution
Access control (if desired) must be coordinated with the electrical planning. Adding it in the final construction phase comes at a premium.
**The most common mistake shortly before opening:** On opening day the booking system still isn't configured properly, test payments fail, and the QR code at the entrance leads to an error page. Opening buzz is a one-time asset. Test the system fully two to four weeks in advance — including real bookings, real payments, and real cancellations.
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">The most common mistake shortly before opening</span>
<p>On opening day the booking system still isn't configured properly, test payments fail, and the QR code at the entrance leads to an error page. Opening buzz is a one-time asset. Test the system fully two to four weeks in advance — including real bookings, real payments, and real cancellations.</p>
</div>
</div>
### Step 16: Inspections and Certifications
@@ -243,13 +291,36 @@ Court bookings are your core offering — but not your only revenue source:
Anyone who watches dozens of padel hall projects across Europe sees patterns on both sides:
**The projects that run over budget** almost always cut costs early in the wrong places — too little budget for building services, no construction contingency, a cut-price general contractor without adequate contractual safeguards.
**The projects that slip on schedule** underestimated the regulatory processes. Permits, noise assessments, and change-of-use approvals take time — and that time cannot be bought back once you start too late.
**The projects that launch weakly** started marketing too late and tested the booking system too late. An empty calendar on opening day and a broken booking page create impressions that stick.
**The projects that succeed long-term** treated all three phases — planning, build, opening — with the same care and invested early in community and repeat customers.
<div class="article-cards">
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that run over budget</span>
<p class="article-card__body">Almost always cut costs early in the wrong places — too little budget for building services, no construction contingency, a cut-price general contractor without adequate contractual safeguards.</p>
</div>
</div>
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that slip on schedule</span>
<p class="article-card__body">Underestimated the regulatory processes. Permits, noise assessments, and change-of-use approvals take time — and that time cannot be bought back once you start too late.</p>
</div>
</div>
<div class="article-card article-card--failure">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that launch weakly</span>
<p class="article-card__body">Started marketing too late and tested the booking system too late. An empty calendar on opening day and a broken booking page create impressions that stick.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Projects that succeed long-term</span>
<p class="article-card__body">Treat all three phases — planning, build, opening — with the same care and invest early in community and repeat customers.</p>
</div>
</div>
</div>
Building a padel hall is complex — but not an unsolved problem. The mistakes that sink projects are almost always the same ones. So are the decisions that make them succeed.

View File

@@ -21,20 +21,20 @@ This article shows you the 14 risks that get too little attention in investor rounds
| # | Risk | Category | Severity |
|---|------|----------|----------|
| 1 | Trend / fad risk | Strategic | High |
| 2 | Construction cost overruns | Construction & Development | High |
| 3 | Delays during construction | Construction & Development | High |
| 4 | Landlord risk: sale, insolvency, non-renewal | Property & Lease | High |
| 5 | New competition in the catchment area | Competition | Medium–High |
| 6 | Key-person dependency | Operations | Medium |
| 7 | Skilled-labor shortage and wage pressure | Operations | Medium |
| 8 | Maintenance cycles for surface, glass, artificial turf | Operations | Medium |
| 9 | Energy price volatility | Financial | Medium |
| 10 | Interest rate risk | Financial | Medium |
| 11 | Personal guarantee | Financial | High |
| 12 | Customer concentration | Financial | Medium |
| 13 | Noise complaints and regulatory requirements | Regulatory & Legal | Medium |
| 14 | Booking platform dependency | Regulatory & Legal | Low–Medium |
| 1 | Trend / fad risk | Strategic | <span class="severity severity--high">High</span> |
| 2 | Construction cost overruns | Construction & Development | <span class="severity severity--high">High</span> |
| 3 | Delays during construction | Construction & Development | <span class="severity severity--high">High</span> |
| 4 | Landlord risk: sale, insolvency, non-renewal | Property & Lease | <span class="severity severity--high">High</span> |
| 5 | New competition in the catchment area | Competition | <span class="severity severity--medium-high">Medium–High</span> |
| 6 | Key-person dependency | Operations | <span class="severity severity--medium">Medium</span> |
| 7 | Skilled-labor shortage and wage pressure | Operations | <span class="severity severity--medium">Medium</span> |
| 8 | Maintenance cycles for surface, glass, artificial turf | Operations | <span class="severity severity--medium">Medium</span> |
| 9 | Energy price volatility | Financial | <span class="severity severity--medium">Medium</span> |
| 10 | Interest rate risk | Financial | <span class="severity severity--medium">Medium</span> |
| 11 | Personal guarantee | Financial | <span class="severity severity--high">High</span> |
| 12 | Customer concentration | Financial | <span class="severity severity--medium">Medium</span> |
| 13 | Noise complaints and regulatory requirements | Regulatory & Legal | <span class="severity severity--medium">Medium</span> |
| 14 | Booking platform dependency | Regulatory & Legal | <span class="severity severity--low-medium">Low–Medium</span> |
---
@@ -133,9 +133,14 @@ Your costs rise three to five percent every year. Can you pass these increases on
## Special Box: Personal Guarantees — the Underestimated Risk No. 1
**This topic is left out of almost every conversation about padel hall investments. That is a mistake.**
<div class="article-callout article-callout--warning">
<div class="article-callout__body">
<span class="article-callout__title">This topic is left out of almost every conversation about padel hall investments. That is a mistake.</span>
<p>Banks providing capital to a single-asset facility without corporate backing will, in practice, almost always require a personal guarantee from the principal shareholder(s).</p>
</div>
</div>
Banks providing capital to a single-asset facility without corporate backing will, in practice, almost always require a personal guarantee from the principal shareholder(s). That means: if the company runs into payment difficulties, the GmbH is not liable alone — you are liable personally. With your home. With your savings. With your portfolio.
That means: if the company runs into payment difficulties, the GmbH is not liable alone — you are liable personally. With your home. With your savings. With your portfolio.
The structure then typically looks like this:
@@ -176,13 +181,36 @@ Over the medium to long term you should build your own booking capability —
No one can eliminate every risk. But the investors who succeed long-term do the following:
**They model the bad scenarios before assuming the good one.** A business plan that shows only the base case is not a tool — it is wishful thinking. Run the numbers explicitly: what happens at 40 percent utilization? With a construction delay of six months? With a new competitor in year three?
**They build in buffers, not as comfort padding but as an operational necessity.** Liquid reserves of at least six months of fixed costs are not a luxury.
**They secure lease agreements and financing conditions carefully from the start.** The cost of good legal and financial advice is vanishingly small compared to the downside.
**They plan for competition.** Not by hoping no competitor shows up, but by building a product that retains regulars — through quality, community, and service.
<div class="article-cards">
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Model the bad scenarios first</span>
<p class="article-card__body">A business plan that shows only the base case is not a tool — it is wishful thinking. What happens at 40 percent utilization? With six months of construction delay? With a new competitor in year three?</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Buffers as an operational necessity</span>
<p class="article-card__body">Liquid reserves of at least six months of fixed costs are not a luxury but an obligation. Construction contingency is a budget line — not optional padding.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Secure the contracts from the start</span>
<p class="article-card__body">Lease, financing conditions, guarantee scope. The cost of good legal and financial advice at the planning stage is vanishingly small compared to the downside.</p>
</div>
</div>
<div class="article-card article-card--success">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Plan for competition</span>
<p class="article-card__body">Not by hoping no competitor shows up, but by building a product that retains regulars — through quality, community, and service.</p>
</div>
</div>
</div>
---

View File

@@ -138,11 +138,29 @@ The result is an overall score per site that enables a structured comparison
The eight criteria above evaluate specific properties. Before you begin the property search, though, it is worth stepping back: which development stage is the market in your target city at? The answer determines which operator strategy has any prospect of success at all.
**Established markets**: Booking platforms show consistent full utilization at peak times, waiting lists are common, and demand is proven beyond any doubt. The challenge no longer lies in demand — it lies in competition. Established operators have built brand loyalty, the affordable sites are long gone, and construction and rental costs reflect the demand. Anyone entering such a market needs a genuine differentiation angle: a better location within the city, a superior facility profile, or an F&B and coaching offer the existing venues lack. The entry investment is high — as is the earnings potential with rigorous execution. Munich is the paradigmatic example for Germany.
**Growth markets**: Demand is visibly growing — booking slots fill up at weekends, new facilities open regularly, and the sport is reaching local media. Supply has not yet fully caught up; gaps are identifiable in certain districts or the surrounding region. The risk profile is lower than in early markets, but the window for attractive sites on reasonable terms is closing. Those who wait until the market is obviously attractive pay a premium for that knowledge — in the form of higher rents, less choice, and more competition at entry.
**Early markets**: Limited current supply, a small but growing player base, and a sport that is not yet widely known — the conditions for a low-cost market entry are there, but demand must be actively built rather than skimmed off. Rents are lower and the choice of sites greater. The limiting factor is patience and marketing capability: beginner courses, club partnerships, local leagues, and the conversion of existing tennis clubs are the instruments operators in early markets use to build community and thus utilization. The path to first profitability is longer — but the competitive position built in the first two years of operation often proves structurally durable.
<div class="article-cards">
<div class="article-card article-card--established">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Established markets</span>
<p class="article-card__body">Booking platforms show consistent full utilization at peak times; waiting lists are common. The challenge lies in competition: established operators have built brand loyalty, and the affordable sites are gone. New entrants need a genuine differentiation angle. The entry investment is high — as is the earnings potential with rigorous execution. Munich is the paradigmatic example.</p>
</div>
</div>
<div class="article-card article-card--growth">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Growth markets</span>
<p class="article-card__body">Demand is visibly growing — booking slots fill up, new facilities open. Supply has not yet caught up; gaps are identifiable. The window for attractive sites on reasonable terms is closing. Those who wait pay the premium of an obviously attractive market.</p>
</div>
</div>
<div class="article-card article-card--emerging">
<div class="article-card__accent"></div>
<div class="article-card__inner">
<span class="article-card__title">Early markets</span>
<p class="article-card__body">Limited supply, a small but growing player base. Rents are lower and the choice of sites greater — but demand must be actively built. Beginner courses, club partnerships, local leagues, and the conversion of tennis clubs are the central instruments. The path to profitability is longer; the competitive position built often proves durable.</p>
</div>
</div>
</div>
Before searching for properties in a specific city, gauge its market maturity first. The criteria catalogue shows whether a given property is suitable; market maturity shows which operator profile and strategy are the precondition for success in the first place.

View File

@@ -213,9 +213,10 @@ def _fetch_venues_parallel(
completed_count = 0
lock = threading.Lock()
def _worker(tenant_id: str) -> dict | None:
def _worker(tenant_id: str) -> tuple[str | None, dict | None]:
proxy_url = cycler["next_proxy"]()
return _fetch_venue_availability(tenant_id, start_min_str, start_max_str, proxy_url)
result = _fetch_venue_availability(tenant_id, start_min_str, start_max_str, proxy_url)
return proxy_url, result
with ThreadPoolExecutor(max_workers=worker_count) as pool:
for batch_start in range(0, len(tenant_ids), PARALLEL_BATCH_SIZE):
@@ -231,17 +232,17 @@ def _fetch_venues_parallel(
batch_futures = {pool.submit(_worker, tid): tid for tid in batch}
for future in as_completed(batch_futures):
result = future.result()
proxy_url, result = future.result()
with lock:
completed_count += 1
if result is not None:
venues_data.append(result)
cycler["record_success"]()
cycler["record_success"](proxy_url)
if on_result is not None:
on_result(result)
else:
venues_errored += 1
cycler["record_failure"]()
cycler["record_failure"](proxy_url)
if completed_count % 500 == 0:
logger.info(
@@ -336,16 +337,17 @@ def extract(
else:
logger.info("Serial mode: 1 worker, %d venues", len(venues_to_process))
for i, tenant_id in enumerate(venues_to_process):
proxy_url = cycler["next_proxy"]()
result = _fetch_venue_availability(
tenant_id, start_min_str, start_max_str, cycler["next_proxy"](),
tenant_id, start_min_str, start_max_str, proxy_url,
)
if result is not None:
new_venues_data.append(result)
cycler["record_success"]()
cycler["record_success"](proxy_url)
_on_result(result)
else:
venues_errored += 1
cycler["record_failure"]()
cycler["record_failure"](proxy_url)
if cycler["is_exhausted"]():
logger.error("All proxy tiers exhausted — writing partial results")
break
@@ -500,13 +502,14 @@ def extract_recheck(
venues_data = []
venues_errored = 0
for tid in venues_to_recheck:
result = _fetch_venue_availability(tid, start_min_str, start_max_str, cycler["next_proxy"]())
proxy_url = cycler["next_proxy"]()
result = _fetch_venue_availability(tid, start_min_str, start_max_str, proxy_url)
if result is not None:
venues_data.append(result)
cycler["record_success"]()
cycler["record_success"](proxy_url)
else:
venues_errored += 1
cycler["record_failure"]()
cycler["record_failure"](proxy_url)
if cycler["is_exhausted"]():
logger.error("All proxy tiers exhausted — writing partial recheck results")
break

View File

@@ -10,11 +10,11 @@ API notes (discovered 2026-02):
- `size=100` is the maximum effective page size
- ~14K venues globally as of Feb 2026
- Parallel mode: when PROXY_URLS is set, fires batch_size = len(proxy_urls)
- pages concurrently. Each page gets its own fresh session + proxy. Pages beyond
- the last one return empty lists (safe — just triggers the done condition).
- Without proxies, falls back to single-threaded with THROTTLE_SECONDS between
- pages.
+ Parallel mode: when proxy tiers are configured, fires BATCH_SIZE pages
+ concurrently. Each page gets its own fresh session + proxy from the tiered
+ cycler. On failure the cycler escalates through free → datacenter →
+ residential tiers. Without proxies, falls back to single-threaded with
+ THROTTLE_SECONDS between pages.
Rate: 1 req / 2 s per IP (see docs/data-sources-inventory.md §1.2).
@@ -22,6 +22,7 @@ Landing: {LANDING_DIR}/playtomic/{year}/{month}/tenants.jsonl.gz
"""
import json
import os
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
@@ -31,7 +32,7 @@ from pathlib import Path
import niquests
from ._shared import HTTP_TIMEOUT_SECONDS, run_extractor, setup_logging, ua_for_proxy
- from .proxy import load_proxy_tiers, make_round_robin_cycler
+ from .proxy import load_proxy_tiers, make_tiered_cycler
from .utils import compress_jsonl_atomic, landing_path
logger = setup_logging("padelnomics.extract.playtomic_tenants")
@@ -42,6 +43,9 @@ PLAYTOMIC_TENANTS_URL = "https://api.playtomic.io/v1/tenants"
THROTTLE_SECONDS = 2
PAGE_SIZE = 100
MAX_PAGES = 500 # safety bound — ~50K venues max, well above current ~14K
BATCH_SIZE = 20 # concurrent pages per batch (fixed, independent of proxy count)
CIRCUIT_BREAKER_THRESHOLD = int(os.environ.get("CIRCUIT_BREAKER_THRESHOLD") or "10")
MAX_PAGE_ATTEMPTS = 5 # max retries per individual page before giving up
def _fetch_one_page(proxy_url: str | None, page: int) -> tuple[int, list[dict]]:
@@ -61,22 +65,57 @@ def _fetch_one_page(proxy_url: str | None, page: int) -> tuple[int, list[dict]]:
return (page, tenants)
- def _fetch_pages_parallel(pages: list[int], next_proxy) -> list[tuple[int, list[dict]]]:
- """Fetch multiple pages concurrently. Returns [(page_num, tenants_list), ...]."""
def _fetch_page_via_cycler(cycler: dict, page: int) -> tuple[int, list[dict]]:
"""Fetch a single page, retrying across proxy tiers via the circuit breaker.
On each attempt, pulls the next proxy from the active tier. Records
success/failure so the circuit breaker can escalate tiers. Raises
RuntimeError if all tiers are exhausted or MAX_PAGE_ATTEMPTS is exceeded.
"""
last_exc: Exception | None = None
for attempt in range(MAX_PAGE_ATTEMPTS):
proxy_url = cycler["next_proxy"]()
if proxy_url is None: # all tiers exhausted
raise RuntimeError(f"All proxy tiers exhausted fetching page {page}")
try:
result = _fetch_one_page(proxy_url, page)
cycler["record_success"](proxy_url)
return result
except Exception as exc:
last_exc = exc
logger.warning(
"Page %d attempt %d/%d failed (proxy=%s): %s",
page,
attempt + 1,
MAX_PAGE_ATTEMPTS,
proxy_url,
exc,
)
cycler["record_failure"](proxy_url)
if cycler["is_exhausted"]():
raise RuntimeError(f"All proxy tiers exhausted fetching page {page}") from exc
raise RuntimeError(f"Page {page} failed after {MAX_PAGE_ATTEMPTS} attempts") from last_exc
def _fetch_pages_parallel(pages: list[int], cycler: dict) -> list[tuple[int, list[dict]]]:
"""Fetch multiple pages concurrently using the tiered cycler.
Returns [(page_num, tenants_list), ...]. Raises if any page exhausts all tiers.
"""
with ThreadPoolExecutor(max_workers=len(pages)) as pool:
- futures = [pool.submit(_fetch_one_page, next_proxy(), p) for p in pages]
+ futures = [pool.submit(_fetch_page_via_cycler, cycler, p) for p in pages]
return [f.result() for f in as_completed(futures)]
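The retry loop in `_fetch_page_via_cycler` can be exercised without the network by stubbing both the page fetch and the cycler. A hypothetical harness (the dict-of-callables shape and MAX_PAGE_ATTEMPTS match the diff; the stub names, proxy URLs, and tenant data are made up):

```python
MAX_PAGE_ATTEMPTS = 5  # matches the diff

def fetch_page_via_cycler(cycler, page, fetch_one_page):
    """Retry one page across proxies; record outcomes so the breaker can escalate."""
    last_exc = None
    for _attempt in range(MAX_PAGE_ATTEMPTS):
        proxy_url = cycler["next_proxy"]()
        if proxy_url is None:  # all tiers exhausted
            raise RuntimeError(f"All proxy tiers exhausted fetching page {page}")
        try:
            result = fetch_one_page(proxy_url, page)
            cycler["record_success"](proxy_url)
            return result
        except Exception as exc:
            last_exc = exc
            cycler["record_failure"](proxy_url)
            if cycler["is_exhausted"]():
                raise RuntimeError(f"All proxy tiers exhausted fetching page {page}") from exc
    raise RuntimeError(f"Page {page} failed after {MAX_PAGE_ATTEMPTS} attempts") from last_exc

calls = []
def flaky_fetch(proxy_url, page):
    # First two proxies refuse the connection, the third succeeds
    calls.append(proxy_url)
    if len(calls) < 3:
        raise ConnectionError("proxy refused")
    return (page, [{"tenant_id": "t1"}])

proxies = iter(["http://p1", "http://p2", "http://p3"])
stub_cycler = {
    "next_proxy": lambda: next(proxies, None),
    "record_success": lambda url: None,
    "record_failure": lambda url: False,
    "is_exhausted": lambda: False,
}
result = fetch_page_via_cycler(stub_cycler, 7, flaky_fetch)
print(result)  # (7, [{'tenant_id': 't1'}])
```

The per-page retry means a single bad proxy costs one attempt, not the whole batch; only tier exhaustion aborts the run.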
def extract(
landing_dir: Path,
- year_month: str, # noqa: ARG001 — unused; tenants uses ISO week partition instead
+ year_month: str, # noqa: ARG001 — unused; tenants uses daily partition instead
conn: sqlite3.Connection,
session: niquests.Session,
) -> dict:
"""Fetch all Playtomic venues via global pagination. Returns run metrics.
- Partitioned by ISO week (e.g. 2026/W09) so each weekly run produces a
+ Partitioned by day (e.g. 2026/03/01) so each daily run produces a
fresh file. _load_tenant_ids() in playtomic_availability globs across all
partitions and picks the most recent one.
"""
@@ -89,12 +128,16 @@ def extract(
return {"files_written": 0, "files_skipped": 1, "bytes_written": 0}
tiers = load_proxy_tiers()
all_proxies = [url for tier in tiers for url in tier]
- next_proxy = make_round_robin_cycler(all_proxies) if all_proxies else None
- batch_size = len(all_proxies) if all_proxies else 1
+ cycler = make_tiered_cycler(tiers, CIRCUIT_BREAKER_THRESHOLD) if tiers else None
+ batch_size = BATCH_SIZE if cycler else 1
- if next_proxy:
- logger.info("Parallel mode: %d pages per batch (%d proxies across %d tier(s))", batch_size, len(all_proxies), len(tiers))
+ if cycler:
+ logger.info(
+ "Parallel mode: %d pages/batch, %d tier(s), threshold=%d",
+ batch_size,
+ cycler["tier_count"](),
+ CIRCUIT_BREAKER_THRESHOLD,
+ )
else:
logger.info("Serial mode: 1 page at a time (no proxies)")
@@ -104,15 +147,33 @@ def extract(
done = False
while not done and page < MAX_PAGES:
if cycler and cycler["is_exhausted"]():
logger.error(
"All proxy tiers exhausted — stopping at page %d (%d venues collected)",
page,
len(all_tenants),
)
break
batch_end = min(page + batch_size, MAX_PAGES)
pages_to_fetch = list(range(page, batch_end))
- if next_proxy and len(pages_to_fetch) > 1:
+ if cycler and len(pages_to_fetch) > 1:
logger.info(
"Fetching pages %d-%d in parallel (%d workers, total so far: %d)",
- page, batch_end - 1, len(pages_to_fetch), len(all_tenants),
+ page,
+ batch_end - 1,
+ len(pages_to_fetch),
+ len(all_tenants),
)
- results = _fetch_pages_parallel(pages_to_fetch, next_proxy)
+ try:
+ results = _fetch_pages_parallel(pages_to_fetch, cycler)
+ except RuntimeError:
+ logger.error(
+ "Proxy tiers exhausted mid-batch — writing partial results (%d venues)",
+ len(all_tenants),
+ )
+ break
else:
# Serial: reuse the shared session, throttle between pages
page_num = pages_to_fetch[0]
@@ -126,7 +187,7 @@ def extract(
)
results = [(page_num, tenants)]
- # Process pages in order so the done-detection on < PAGE_SIZE is deterministic
+ # Process pages in order so done-detection on < PAGE_SIZE is deterministic
for p, tenants in sorted(results):
new_count = 0
for tenant in tenants:
@@ -137,7 +198,11 @@ def extract(
new_count += 1
logger.info(
- "page=%d got=%d new=%d total=%d", p, len(tenants), new_count, len(all_tenants),
+ "page=%d got=%d new=%d total=%d",
+ p,
+ len(tenants),
+ new_count,
+ len(all_tenants),
)
# Last page — fewer than PAGE_SIZE results means we've exhausted the list
@@ -146,7 +211,7 @@ def extract(
break
page = batch_end
- if not next_proxy:
+ if not cycler:
time.sleep(THROTTLE_SECONDS)
# Write each tenant as a JSONL line, then compress atomically

View File

@@ -88,8 +88,14 @@ def load_proxy_tiers() -> list[list[str]]:
for var in ("PROXY_URLS_DATACENTER", "PROXY_URLS_RESIDENTIAL"):
raw = os.environ.get(var, "")
urls = [u.strip() for u in raw.split(",") if u.strip()]
- if urls:
- tiers.append(urls)
+ valid = []
+ for url in urls:
+ if not url.startswith(("http://", "https://")):
+ logger.warning("%s contains URL without scheme, skipping: %s", var, url[:60])
+ continue
+ valid.append(url)
+ if valid:
+ tiers.append(valid)
return tiers
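The effect of the scheme check is easy to demonstrate standalone. A minimal sketch using only the stdlib (the env-var names match the diff; the proxy credentials are placeholders):

```python
import logging
import os

logger = logging.getLogger("proxy")

def load_proxy_tiers_sketch() -> list[list[str]]:
    """Parse comma-separated proxy URL env vars into tiers, skipping schemeless entries."""
    tiers: list[list[str]] = []
    for var in ("PROXY_URLS_DATACENTER", "PROXY_URLS_RESIDENTIAL"):
        raw = os.environ.get(var, "")
        urls = [u.strip() for u in raw.split(",") if u.strip()]
        valid = [u for u in urls if u.startswith(("http://", "https://"))]
        for u in urls:
            if u not in valid:
                logger.warning("%s contains URL without scheme, skipping: %s", var, u[:60])
        if valid:
            tiers.append(valid)
    return tiers

# One schemeless entry (the pre-fix SOPS shape) and one correct entry
os.environ["PROXY_URLS_DATACENTER"] = "user:pass@1.2.3.4:8080, http://user:pass@5.6.7.8:8080"
os.environ["PROXY_URLS_RESIDENTIAL"] = ""
print(load_proxy_tiers_sketch())  # only the http:// entry survives
```

Skipping rather than raising keeps a partially misconfigured tier usable; a tier whose entries are all schemeless simply disappears from the list.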
@@ -134,8 +140,8 @@ def make_sticky_selector(proxy_urls: list[str]):
return select_proxy
- def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
- """Thread-safe N-tier proxy cycler with circuit breaker.
+ def make_tiered_cycler(tiers: list[list[str]], threshold: int, proxy_failure_limit: int = 3) -> dict:
+ """Thread-safe N-tier proxy cycler with circuit breaker and per-proxy dead tracking.
Uses tiers[0] until consecutive failures >= threshold, then escalates
to tiers[1], then tiers[2], etc. Once all tiers are exhausted,
@@ -144,13 +150,28 @@ def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
Failure counter resets on each escalation — the new tier gets a fresh start.
Once exhausted, further record_failure() calls are no-ops.
Per-proxy dead tracking (when proxy_failure_limit > 0):
Individual proxies are marked dead after proxy_failure_limit failures and
skipped by next_proxy(). If all proxies in the active tier are dead,
next_proxy() auto-escalates to the next tier. Both mechanisms coexist:
per-proxy dead tracking removes broken individuals; tier-level threshold
catches systemic failure even before any single proxy hits the limit.
Stale-failure protection:
With parallel workers, some threads may fetch a proxy just before the tier
escalates and report failure after. record_failure(proxy_url) checks which
tier the proxy belongs to and ignores the tier-level circuit breaker if the
proxy is from an already-escalated tier. This prevents in-flight failures
from a dead tier instantly exhausting the freshly-escalated one.
Returns a dict of callables:
- next_proxy() -> str | None — URL from the active tier, or None
- record_success() -> None — resets consecutive failure counter
- record_failure() -> bool — True if just escalated to next tier
+ next_proxy() -> str | None — URL from active tier (skips dead), or None
+ record_success(proxy_url=None) -> None — resets consecutive failure counter
+ record_failure(proxy_url=None) -> bool — True if just escalated to next tier
is_exhausted() -> bool — True if all tiers exhausted
active_tier_index() -> int — 0-based index of current tier
tier_count() -> int — total number of tiers
dead_proxy_count() -> int — number of individual proxies marked dead
Edge cases:
Empty tiers list: next_proxy() always returns None, is_exhausted() True.
@@ -158,32 +179,97 @@ def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
"""
assert threshold > 0, f"threshold must be positive, got {threshold}"
assert isinstance(tiers, list), f"tiers must be a list, got {type(tiers)}"
assert proxy_failure_limit >= 0, f"proxy_failure_limit must be >= 0, got {proxy_failure_limit}"
# Reverse map: proxy URL -> tier index. Used in record_failure to ignore
# "in-flight" failures from workers that fetched a proxy before escalation —
# those failures belong to the old tier and must not count against the new one.
proxy_to_tier_idx: dict[str, int] = {
url: tier_idx
for tier_idx, tier in enumerate(tiers)
for url in tier
}
lock = threading.Lock()
cycles = [itertools.cycle(t) for t in tiers]
state = {
"active_tier": 0,
"consecutive_failures": 0,
"proxy_failure_counts": {}, # proxy_url -> int
"dead_proxies": set(), # proxy URLs marked dead
}
def next_proxy() -> str | None:
with lock:
- idx = state["active_tier"]
- if idx >= len(cycles):
- return None
- return next(cycles[idx])
+ # Try each remaining tier (bounded: at most len(tiers) escalations)
+ for _ in range(len(tiers) + 1):
+ idx = state["active_tier"]
+ if idx >= len(cycles):
+ return None
- def record_success() -> None:
tier_proxies = tiers[idx]
tier_len = len(tier_proxies)
# Find a live proxy in this tier (bounded: try each proxy at most once)
for _ in range(tier_len):
candidate = next(cycles[idx])
if candidate not in state["dead_proxies"]:
return candidate
# All proxies in this tier are dead — auto-escalate
state["consecutive_failures"] = 0
state["active_tier"] += 1
new_idx = state["active_tier"]
if new_idx < len(tiers):
logger.warning(
"All proxies in tier %d are dead — auto-escalating to tier %d/%d",
idx + 1,
new_idx + 1,
len(tiers),
)
else:
logger.error(
"All proxies in all %d tier(s) are dead — no more fallbacks",
len(tiers),
)
return None # safety fallback
def record_success(proxy_url: str | None = None) -> None:
with lock:
state["consecutive_failures"] = 0
if proxy_url is not None:
state["proxy_failure_counts"][proxy_url] = 0
- def record_failure() -> bool:
+ def record_failure(proxy_url: str | None = None) -> bool:
"""Increment failure counter. Returns True if just escalated to next tier."""
with lock:
# Per-proxy dead tracking (additional to tier-level circuit breaker)
if proxy_url is not None and proxy_failure_limit > 0:
count = state["proxy_failure_counts"].get(proxy_url, 0) + 1
state["proxy_failure_counts"][proxy_url] = count
if count >= proxy_failure_limit and proxy_url not in state["dead_proxies"]:
state["dead_proxies"].add(proxy_url)
logger.warning(
"Proxy %s marked dead after %d consecutive failures",
proxy_url,
count,
)
# Tier-level circuit breaker (existing behavior)
idx = state["active_tier"]
if idx >= len(tiers):
# Already exhausted — no-op
return False
# Ignore failures from proxies that belong to an already-escalated tier.
# With parallel workers, some threads fetch a proxy just before escalation
# and report back after — those stale failures must not penalise the new tier.
if proxy_url is not None:
proxy_tier = proxy_to_tier_idx.get(proxy_url)
if proxy_tier is not None and proxy_tier < idx:
return False
state["consecutive_failures"] += 1
if state["consecutive_failures"] < threshold:
return False
@@ -219,6 +305,10 @@ def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
def tier_count() -> int:
return len(tiers)
def dead_proxy_count() -> int:
with lock:
return len(state["dead_proxies"])
return {
"next_proxy": next_proxy,
"record_success": record_success,
@@ -226,4 +316,5 @@ def make_tiered_cycler(tiers: list[list[str]], threshold: int) -> dict:
"is_exhausted": is_exhausted,
"active_tier_index": active_tier_index,
"tier_count": tier_count,
"dead_proxy_count": dead_proxy_count,
}
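The interaction between escalation and the stale-failure guard is easiest to see end-to-end. A condensed single-threaded sketch of the cycler (tier-level breaker and reverse map only; locking and per-proxy dead tracking omitted; the proxy URLs are placeholders):

```python
import itertools

def make_tiered_cycler_sketch(tiers, threshold):
    # Reverse map lets record_failure tell which tier a failing proxy came from
    proxy_to_tier = {url: i for i, tier in enumerate(tiers) for url in tier}
    cycles = [itertools.cycle(t) for t in tiers]
    state = {"active": 0, "fails": 0}

    def next_proxy():
        return next(cycles[state["active"]]) if state["active"] < len(cycles) else None

    def record_failure(proxy_url=None):
        if state["active"] >= len(tiers):
            return False  # already exhausted
        # Stale-failure guard: ignore failures from an already-escalated tier
        if proxy_url is not None and proxy_to_tier.get(proxy_url, state["active"]) < state["active"]:
            return False
        state["fails"] += 1
        if state["fails"] < threshold:
            return False
        state["fails"] = 0
        state["active"] += 1
        return True  # just escalated

    return {"next_proxy": next_proxy, "record_failure": record_failure,
            "active_tier_index": lambda: state["active"]}

cycler = make_tiered_cycler_sketch([["http://ws1", "http://ws2"], ["http://rb1"]], threshold=2)
in_flight = [cycler["next_proxy"]() for _ in range(4)]  # four workers grab tier-0 proxies
cycler["record_failure"](in_flight[0])
cycler["record_failure"](in_flight[1])  # threshold hit → escalate to tier 1
# The two remaining in-flight tier-0 failures no longer count against tier 1
cycler["record_failure"](in_flight[2])
cycler["record_failure"](in_flight[3])
print(cycler["active_tier_index"]())  # tier 1 is still active, not exhausted
```

Without the reverse-map check, the last two calls would push the counter back to the threshold and skip tier 1 before it ever served a request, which is exactly the Rayobyte-skipped-for-DataImpulse failure mode described in the commit message.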

View File

@@ -6,7 +6,9 @@ Operational visibility for the data extraction and transformation pipeline:
/admin/pipeline/overview → HTMX tab: extraction status, serving freshness, landing stats
/admin/pipeline/extractions → HTMX tab: filterable extraction run history
/admin/pipeline/extractions/<id>/mark-stale → POST: mark stuck "running" row as failed
- /admin/pipeline/extract/trigger → POST: enqueue full extraction run
+ /admin/pipeline/extract/trigger → POST: enqueue extraction run (HTMX-aware)
/admin/pipeline/transform → HTMX tab: SQLMesh + export status, run history
/admin/pipeline/transform/trigger → POST: enqueue transform/export/pipeline step
/admin/pipeline/catalog → HTMX tab: data catalog (tables, columns, sample data)
/admin/pipeline/catalog/<table> → HTMX partial: table detail (columns + sample)
/admin/pipeline/query → HTMX tab: SQL query editor
@@ -18,6 +20,7 @@ Data sources:
- analytics.duckdb (DuckDB read-only via analytics.execute_user_query)
- LANDING_DIR/ (filesystem scan for file sizes + dates)
- infra/supervisor/workflows.toml (schedule definitions — tomllib, stdlib)
- app.db tasks table (run_transform, run_export, run_pipeline task rows)
"""
import asyncio
import json
@@ -626,10 +629,8 @@ async def pipeline_dashboard():
# ── Overview tab ─────────────────────────────────────────────────────────────
- @bp.route("/overview")
- @role_required("admin")
- async def pipeline_overview():
- """HTMX tab: extraction status per source, serving freshness, landing zone."""
+ async def _render_overview_partial():
+ """Build and render the pipeline overview partial (shared by GET and POST triggers)."""
latest_runs, landing_stats, workflows, serving_meta = await asyncio.gather(
asyncio.to_thread(_fetch_latest_per_extractor_sync),
asyncio.to_thread(_get_landing_zone_stats_sync),
@@ -650,6 +651,13 @@ async def pipeline_overview():
"stale": _is_stale(run) if run else False,
})
# Treat pending extraction tasks as "running" (queued or active).
from ..core import fetch_all as _fetch_all # noqa: PLC0415
pending_extraction = await _fetch_all(
"SELECT id FROM tasks WHERE task_name = 'run_extraction' AND status = 'pending' LIMIT 1"
)
any_running = bool(pending_extraction)
# Compute landing zone totals
total_landing_bytes = sum(s["total_bytes"] for s in landing_stats)
@@ -677,10 +685,18 @@ async def pipeline_overview():
total_landing_bytes=total_landing_bytes,
serving_tables=serving_tables,
last_export=last_export,
any_running=any_running,
format_bytes=_format_bytes,
)
@bp.route("/overview")
@role_required("admin")
async def pipeline_overview():
"""HTMX tab: extraction status per source, serving freshness, landing zone."""
return await _render_overview_partial()
# ── Extractions tab ────────────────────────────────────────────────────────────
@@ -745,7 +761,11 @@ async def pipeline_mark_stale(run_id: int):
@role_required("admin")
@csrf_protect
async def pipeline_trigger_extract():
- """Enqueue an extraction run — all extractors, or a single named one."""
+ """Enqueue an extraction run — all extractors, or a single named one.
+ HTMX-aware: if the HX-Request header is present, returns the overview partial
+ directly so the UI can update in-place without a redirect.
+ """
from ..worker import enqueue
form = await request.form
@@ -757,11 +777,15 @@ async def pipeline_trigger_extract():
await flash(f"Unknown extractor '{extractor}'.", "warning")
return redirect(url_for("pipeline.pipeline_dashboard"))
await enqueue("run_extraction", {"extractor": extractor})
- await flash(f"Extractor '{extractor}' queued. Check the task queue for progress.", "success")
else:
await enqueue("run_extraction")
- await flash("Extraction run queued. Check the task queue for progress.", "success")
+ is_htmx = request.headers.get("HX-Request") == "true"
+ if is_htmx:
+ return await _render_overview_partial()
+ msg = f"Extractor '{extractor}' queued." if extractor else "Extraction run queued."
+ await flash(f"{msg} Check the task queue for progress.", "success")
return redirect(url_for("pipeline.pipeline_dashboard"))
@@ -847,6 +871,156 @@ async def pipeline_lineage_schema(model: str):
)
# ── Transform tab ─────────────────────────────────────────────────────────────
_TRANSFORM_TASK_NAMES = ("run_transform", "run_export", "run_pipeline")
async def _fetch_pipeline_tasks() -> dict:
"""Fetch the latest task row for each transform task type, plus recent run history.
Returns:
{
"latest": {"run_transform": row|None, "run_export": row|None, "run_pipeline": row|None},
"history": [row, ...], # last 20 rows across all three task types, newest first
}
"""
from ..core import fetch_all as _fetch_all # noqa: PLC0415
# Latest row per task type (may be pending, complete, or failed)
latest_rows = await _fetch_all(
"""
SELECT t.*
FROM tasks t
INNER JOIN (
SELECT task_name, MAX(id) AS max_id
FROM tasks
WHERE task_name IN ('run_transform', 'run_export', 'run_pipeline')
GROUP BY task_name
) latest ON t.id = latest.max_id
"""
)
latest: dict = {"run_transform": None, "run_export": None, "run_pipeline": None}
for row in latest_rows:
latest[row["task_name"]] = dict(row)
history = await _fetch_all(
"""
SELECT id, task_name, status, created_at, completed_at, error
FROM tasks
WHERE task_name IN ('run_transform', 'run_export', 'run_pipeline')
ORDER BY id DESC
LIMIT 20
"""
)
return {"latest": latest, "history": [dict(r) for r in history]}
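The `MAX(id)` self-join above is the standard latest-row-per-group pattern. A self-contained sqlite3 sketch with hypothetical task rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (id INTEGER PRIMARY KEY, task_name TEXT, status TEXT);
INSERT INTO tasks (task_name, status) VALUES
  ('run_transform', 'complete'),
  ('run_export',    'failed'),
  ('run_transform', 'pending');   -- newer run_transform supersedes the first
""")

rows = conn.execute("""
    SELECT t.task_name, t.status
    FROM tasks t
    INNER JOIN (
        SELECT task_name, MAX(id) AS max_id
        FROM tasks
        GROUP BY task_name
    ) latest ON t.id = latest.max_id
    ORDER BY t.task_name
""").fetchall()
print(rows)  # [('run_export', 'failed'), ('run_transform', 'pending')]
```

Grouping on `MAX(id)` rather than `MAX(created_at)` avoids ties between rows created in the same second, since the autoincrement id is strictly monotonic.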
def _format_duration(created_at: str | None, completed_at: str | None) -> str:
"""Human-readable duration between created_at and completed_at, or '' if unavailable."""
if not created_at or not completed_at:
return ""
try:
fmt = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime(created_at, fmt)
end = datetime.strptime(completed_at, fmt)
delta = int((end - start).total_seconds())
if delta < 0:
return ""
if delta < 60:
return f"{delta}s"
return f"{delta // 60}m {delta % 60}s"
except ValueError:
return ""
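Assuming SQLite's default `%Y-%m-%d %H:%M:%S` timestamp format, the helper behaves as follows (a standalone copy for illustration):

```python
from datetime import datetime

def format_duration(created_at, completed_at):
    """Human-readable duration, or '' when timestamps are missing or malformed."""
    if not created_at or not completed_at:
        return ""
    try:
        fmt = "%Y-%m-%d %H:%M:%S"
        delta = int((datetime.strptime(completed_at, fmt)
                     - datetime.strptime(created_at, fmt)).total_seconds())
        if delta < 0:
            return ""  # clock skew or bad data: render nothing rather than nonsense
        return f"{delta}s" if delta < 60 else f"{delta // 60}m {delta % 60}s"
    except ValueError:
        return ""

print(format_duration("2026-03-01 14:00:00", "2026-03-01 14:01:30"))  # 1m 30s
print(format_duration("2026-03-01 14:00:00", "2026-03-01 14:00:45"))  # 45s
print(format_duration("not-a-timestamp", "2026-03-01 14:00:00"))      # (empty)
```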
async def _render_transform_partial():
"""Build and render the transform tab partial."""
task_data = await _fetch_pipeline_tasks()
latest = task_data["latest"]
history = task_data["history"]
# Enrich history rows with duration
for row in history:
row["duration"] = _format_duration(row.get("created_at"), row.get("completed_at"))
# Truncate error for display
if row.get("error"):
row["error_short"] = row["error"][:120]
else:
row["error_short"] = None
any_running = any(
t is not None and t["status"] == "pending" for t in latest.values()
)
serving_meta = await asyncio.to_thread(_load_serving_meta)
return await render_template(
"admin/partials/pipeline_transform.html",
latest=latest,
history=history,
any_running=any_running,
serving_meta=serving_meta,
format_duration=_format_duration,
)
@bp.route("/transform")
@role_required("admin")
async def pipeline_transform():
"""HTMX tab: SQLMesh transform + export status, run history."""
return await _render_transform_partial()
@bp.route("/transform/trigger", methods=["POST"])
@role_required("admin")
@csrf_protect
async def pipeline_trigger_transform():
"""Enqueue a transform, export, or full pipeline task.
form field `step`: 'transform' | 'export' | 'pipeline'
Concurrency guard: rejects if the same task type is already pending.
HTMX-aware: returns the transform partial for HTMX requests.
"""
from ..core import fetch_one as _fetch_one # noqa: PLC0415
from ..worker import enqueue
form = await request.form
step = (form.get("step") or "").strip()
step_to_task = {
"transform": "run_transform",
"export": "run_export",
"pipeline": "run_pipeline",
}
if step not in step_to_task:
await flash(f"Unknown step '{step}'.", "warning")
return redirect(url_for("pipeline.pipeline_dashboard"))
task_name = step_to_task[step]
# Concurrency guard: reject if same task type is already pending
existing = await _fetch_one(
"SELECT id FROM tasks WHERE task_name = ? AND status = 'pending' LIMIT 1",
(task_name,),
)
if existing:
is_htmx = request.headers.get("HX-Request") == "true"
if is_htmx:
return await _render_transform_partial()
await flash(f"A '{step}' task is already queued (task #{existing['id']}).", "warning")
return redirect(url_for("pipeline.pipeline_dashboard"))
await enqueue(task_name)
is_htmx = request.headers.get("HX-Request") == "true"
if is_htmx:
return await _render_transform_partial()
await flash(f"'{step}' task queued. Check the task queue for progress.", "success")
return redirect(url_for("pipeline.pipeline_dashboard"))
# ── Catalog tab ───────────────────────────────────────────────────────────────

View File

@@ -169,7 +169,6 @@ async def pseo_generate_gaps(slug: str):
"template_slug": slug,
"start_date": date.today().isoformat(),
"articles_per_day": 500,
- "limit": 500,
})
await flash(
f"Queued generation for {len(gaps)} missing articles in '{config['name']}'.",

View File

@@ -1865,7 +1865,7 @@ async def template_preview(slug: str, row_key: str):
@csrf_protect
async def template_generate(slug: str):
"""Generate articles from template + DuckDB data."""
- from ..content import fetch_template_data, load_template
+ from ..content import count_template_data, load_template
try:
config = load_template(slug)
@@ -1873,8 +1873,7 @@ async def template_generate(slug: str):
await flash("Template not found.", "error")
return redirect(url_for("admin.templates"))
- data_rows = await fetch_template_data(config["data_table"], limit=501)
- row_count = len(data_rows)
+ row_count = await count_template_data(config["data_table"])
if request.method == "POST":
form = await request.form
@@ -1888,7 +1887,6 @@ async def template_generate(slug: str):
"template_slug": slug,
"start_date": start_date.isoformat(),
"articles_per_day": articles_per_day,
- "limit": 500,
})
await flash(
f"Article generation queued for '{config['name']}'. "
@@ -1923,7 +1921,6 @@ async def template_regenerate(slug: str):
"template_slug": slug,
"start_date": date.today().isoformat(),
"articles_per_day": 500,
- "limit": 500,
})
await flash("Regeneration queued. The worker will process it in the background.", "success")
return redirect(url_for("admin.template_detail", slug=slug))
@@ -2729,7 +2726,6 @@ async def rebuild_all():
"template_slug": t["slug"],
"start_date": date.today().isoformat(),
"articles_per_day": 500,
- "limit": 500,
})
# Manual articles still need inline rebuild

View File

@@ -1,4 +1,11 @@
- <!-- Pipeline Overview Tab: extraction status, serving freshness, landing zone -->
+ <!-- Pipeline Overview Tab: extraction status, serving freshness, landing zone
+ Self-polls every 5s while any extraction task is pending, stops when quiet. -->
<div id="pipeline-overview-content"
hx-get="{{ url_for('pipeline.pipeline_overview') }}"
hx-target="this"
hx-swap="outerHTML"
{% if any_running %}hx-trigger="every 5s"{% endif %}>
<!-- Extraction Status Grid -->
<div class="card mb-4">
@@ -26,12 +33,14 @@
{% if stale %}
<span class="badge-warning" style="font-size:10px;padding:1px 6px;margin-left:auto">stale</span>
{% endif %}
- <form method="post" action="{{ url_for('pipeline.pipeline_trigger_extract') }}" class="m-0 ml-auto">
- <input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
- <input type="hidden" name="extractor" value="{{ wf.name }}">
- <button type="button" class="btn btn-sm" style="padding:2px 8px;font-size:11px"
- onclick="confirmAction('Run {{ wf.name }} extractor?', this.closest('form'))">Run</button>
- </form>
+ <button type="button"
+ class="btn btn-sm ml-auto"
+ style="padding:2px 8px;font-size:11px"
+ hx-post="{{ url_for('pipeline.pipeline_trigger_extract') }}"
+ hx-target="#pipeline-overview-content"
+ hx-swap="outerHTML"
+ hx-vals='{"extractor": "{{ wf.name }}", "csrf_token": "{{ csrf_token() }}"}'
+ onclick="if (!confirm('Run {{ wf.name }} extractor?')) return false;">Run</button>
</div>
<p class="text-xs text-slate">{{ wf.schedule_label }}</p>
{% if run %}
@@ -57,7 +66,7 @@
</div>
<!-- Two-column row: Serving Freshness + Landing Zone -->
- <div style="display:grid;grid-template-columns:1fr 1fr;gap:1rem">
+ <div class="pipeline-two-col">
<!-- Serving Freshness -->
<div class="card">
@@ -68,6 +77,7 @@
</p>
{% endif %}
{% if serving_tables %}
<div style="overflow-x:auto">
<table class="table" style="font-size:0.8125rem">
<thead>
<tr>
@@ -86,6 +96,7 @@
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<p class="text-sm text-slate">No serving tables found — run the pipeline first.</p>
{% endif %}
@@ -99,6 +110,7 @@
</span>
</p>
{% if landing_stats %}
<div style="overflow-x:auto">
<table class="table" style="font-size:0.8125rem">
<thead>
<tr>
@@ -119,6 +131,7 @@
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<p class="text-sm text-slate">
Landing zone empty or not found at <code>data/landing</code>.
@@ -127,3 +140,5 @@
</div>
</div>
</div>{# end #pipeline-overview-content #}

View File

@@ -0,0 +1,197 @@
<!-- Pipeline Transform Tab: SQLMesh + export status, run history
Self-polls every 5s while any transform/export task is pending. -->
<div id="pipeline-transform-content"
hx-get="{{ url_for('pipeline.pipeline_transform') }}"
hx-target="this"
hx-swap="outerHTML"
{% if any_running %}hx-trigger="every 5s"{% endif %}>
<!-- Status Cards: Transform + Export -->
<div class="pipeline-two-col mb-4">
<!-- SQLMesh Transform -->
{% set tx = latest['run_transform'] %}
<div class="card">
<p class="card-header">SQLMesh Transform</p>
<div class="flex items-center gap-2 mb-3">
{% if tx is none %}
<span class="status-dot pending"></span>
<span class="text-sm text-slate">Never run</span>
{% elif tx.status == 'pending' %}
<span class="status-dot running"></span>
<span class="text-sm text-slate">Running…</span>
{% elif tx.status == 'complete' %}
<span class="status-dot ok"></span>
<span class="text-sm text-slate">Complete</span>
{% else %}
<span class="status-dot failed"></span>
<span class="text-sm text-danger">Failed</span>
{% endif %}
</div>
{% if tx %}
<p class="text-xs text-slate mono">
Started: {{ (tx.created_at or '')[:19] or '—' }}
</p>
{% if tx.completed_at %}
<p class="text-xs text-slate mono">
Finished: {{ tx.completed_at[:19] }}
</p>
{% endif %}
{% if tx.status == 'failed' and tx.error %}
<details class="mt-2">
<summary class="text-xs text-danger cursor-pointer">Error</summary>
<pre class="text-xs mt-1 p-2 bg-gray-50 rounded overflow-auto" style="max-height:8rem;white-space:pre-wrap">{{ tx.error[:400] }}</pre>
</details>
{% endif %}
{% endif %}
<div class="mt-3">
<button type="button"
class="btn btn-sm"
{% if any_running %}disabled{% endif %}
hx-post="{{ url_for('pipeline.pipeline_trigger_transform') }}"
hx-target="#pipeline-transform-content"
hx-swap="outerHTML"
hx-vals='{"step": "transform", "csrf_token": "{{ csrf_token() }}"}'
onclick="if (!confirm('Run SQLMesh transform (prod --auto-apply)?')) return false;">
Run Transform
</button>
</div>
</div>
<!-- Export Serving -->
{% set ex = latest['run_export'] %}
<div class="card">
<p class="card-header">Export Serving</p>
<div class="flex items-center gap-2 mb-3">
{% if ex is none %}
<span class="status-dot pending"></span>
<span class="text-sm text-slate">Never run</span>
{% elif ex.status == 'pending' %}
<span class="status-dot running"></span>
<span class="text-sm text-slate">Running…</span>
{% elif ex.status == 'complete' %}
<span class="status-dot ok"></span>
<span class="text-sm text-slate">Complete</span>
{% else %}
<span class="status-dot failed"></span>
<span class="text-sm text-danger">Failed</span>
{% endif %}
</div>
{% if ex %}
<p class="text-xs text-slate mono">
Started: {{ (ex.created_at or '')[:19] or '—' }}
</p>
{% if ex.completed_at %}
<p class="text-xs text-slate mono">
Finished: {{ ex.completed_at[:19] }}
</p>
{% endif %}
{% if serving_meta %}
<p class="text-xs text-slate mt-1">
Last export: <span class="font-semibold mono">{{ (serving_meta.exported_at_utc or '')[:19].replace('T', ' ') or '—' }}</span>
</p>
{% endif %}
{% if ex.status == 'failed' and ex.error %}
<details class="mt-2">
<summary class="text-xs text-danger cursor-pointer">Error</summary>
<pre class="text-xs mt-1 p-2 bg-gray-50 rounded overflow-auto" style="max-height:8rem;white-space:pre-wrap">{{ ex.error[:400] }}</pre>
</details>
{% endif %}
{% endif %}
<div class="mt-3">
<button type="button"
class="btn btn-sm"
{% if any_running %}disabled{% endif %}
hx-post="{{ url_for('pipeline.pipeline_trigger_transform') }}"
hx-target="#pipeline-transform-content"
hx-swap="outerHTML"
hx-vals='{"step": "export", "csrf_token": "{{ csrf_token() }}"}'
onclick="if (!confirm('Export serving tables (lakehouse → analytics.duckdb)?')) return false;">
Run Export
</button>
</div>
</div>
</div>
<!-- Run Full Pipeline -->
{% set pl = latest['run_pipeline'] %}
<div class="card mb-4">
<div class="flex items-center justify-between flex-wrap gap-3">
<div>
<p class="font-semibold text-navy text-sm">Full Pipeline</p>
<p class="text-xs text-slate mt-1">Runs extract → transform → export sequentially</p>
{% if pl %}
<p class="text-xs text-slate mono mt-1">
Last: {{ (pl.created_at or '')[:19] or '—' }}
{% if pl.status == 'complete' %}<span class="badge-success ml-2">Complete</span>{% endif %}
{% if pl.status == 'pending' %}<span class="badge-warning ml-2">Running…</span>{% endif %}
{% if pl.status == 'failed' %}<span class="badge-danger ml-2">Failed</span>{% endif %}
</p>
{% endif %}
</div>
<button type="button"
class="btn btn-sm"
{% if any_running %}disabled{% endif %}
hx-post="{{ url_for('pipeline.pipeline_trigger_transform') }}"
hx-target="#pipeline-transform-content"
hx-swap="outerHTML"
hx-vals='{"step": "pipeline", "csrf_token": "{{ csrf_token() }}"}'
onclick="if (!confirm('Run full ELT pipeline (extract → transform → export)?')) return false;">
Run Full Pipeline
</button>
</div>
</div>
<!-- Recent Runs -->
<div class="card">
<p class="card-header">Recent Runs</p>
{% if history %}
<div style="overflow-x:auto">
<table class="table" style="font-size:0.8125rem">
<thead>
<tr>
<th>#</th>
<th>Step</th>
<th>Started</th>
<th>Duration</th>
<th>Status</th>
<th>Error</th>
</tr>
</thead>
<tbody>
{% for row in history %}
<tr>
<td class="text-xs text-slate">{{ row.id }}</td>
<td class="mono text-xs">{{ row.task_name | replace('run_', '') }}</td>
<td class="mono text-xs text-slate">{{ (row.created_at or '')[:19] or '—' }}</td>
<td class="mono text-xs text-slate">{{ row.duration or '—' }}</td>
<td>
{% if row.status == 'complete' %}
<span class="badge-success">Complete</span>
{% elif row.status == 'failed' %}
<span class="badge-danger">Failed</span>
{% else %}
<span class="badge-warning">Running…</span>
{% endif %}
</td>
<td>
{% if row.error_short %}
<details>
<summary class="text-xs text-danger cursor-pointer">Error</summary>
<pre class="text-xs mt-1 p-2 bg-gray-50 rounded overflow-auto" style="max-width:24rem;white-space:pre-wrap">{{ row.error_short }}</pre>
</details>
{% else %}—{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<p class="text-sm text-slate">No transform runs yet.</p>
{% endif %}
</div>
</div>{# end #pipeline-transform-content #}

View File

@@ -15,6 +15,7 @@
.pipeline-tabs {
display: flex; gap: 0; border-bottom: 2px solid #E2E8F0; margin-bottom: 1.5rem;
overflow-x: auto; -webkit-overflow-scrolling: touch;
}
.pipeline-tabs button {
padding: 0.625rem 1.25rem; font-size: 0.8125rem; font-weight: 600;
@@ -32,7 +33,19 @@
.status-dot.failed { background: #EF4444; }
.status-dot.stale { background: #D97706; }
.status-dot.running { background: #3B82F6; }
@keyframes pulse-dot { 0%,100%{opacity:1} 50%{opacity:0.4} }
.status-dot.running { animation: pulse-dot 1.5s ease-in-out infinite; }
.status-dot.pending { background: #CBD5E1; }
.pipeline-two-col {
display: grid;
grid-template-columns: 1fr;
gap: 1rem;
}
@media (min-width: 640px) {
.pipeline-two-col { grid-template-columns: 1fr 1fr; }
}
</style>
{% endblock %}
@@ -43,10 +56,11 @@
<p class="text-sm text-slate mt-1">Extraction status, data catalog, and ad-hoc query editor</p>
</div>
<div class="flex gap-2">
- <form method="post" action="{{ url_for('pipeline.pipeline_trigger_extract') }}" class="m-0">
+ <form method="post" action="{{ url_for('pipeline.pipeline_trigger_transform') }}" class="m-0">
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
<input type="hidden" name="step" value="pipeline">
<button type="button" class="btn btn-sm"
- onclick="confirmAction('Enqueue a full extraction run? This will run all extractors in the background.', this.closest('form'))">
+ onclick="confirmAction('Run full ELT pipeline (extract → transform → export)? This runs in the background.', this.closest('form'))">
Run Pipeline
</button>
</form>
@@ -106,6 +120,10 @@
hx-get="{{ url_for('pipeline.pipeline_lineage') }}"
hx-target="#pipeline-tab-content" hx-swap="innerHTML"
hx-trigger="click">Lineage</button>
<button data-tab="transform"
hx-get="{{ url_for('pipeline.pipeline_transform') }}"
hx-target="#pipeline-tab-content" hx-swap="innerHTML"
hx-trigger="click">Transform</button>
</div>
<!-- Tab content (Overview loads on page load) -->

View File

@@ -123,17 +123,19 @@ async def get_table_columns(data_table: str) -> list[dict]:
async def fetch_template_data(
data_table: str,
order_by: str | None = None,
- limit: int = 500,
+ limit: int = 0,
) -> list[dict]:
- """Fetch all rows from a DuckDB serving table."""
+ """Fetch rows from a DuckDB serving table. limit=0 means all rows."""
assert "." in data_table, "data_table must be schema-qualified"
_validate_table_name(data_table)
order_clause = f"ORDER BY {order_by} DESC" if order_by else ""
- return await fetch_analytics(
- f"SELECT * FROM {data_table} {order_clause} LIMIT ?",
- [limit],
- )
+ if limit:
+ return await fetch_analytics(
+ f"SELECT * FROM {data_table} {order_clause} LIMIT ?",
+ [limit],
+ )
+ return await fetch_analytics(f"SELECT * FROM {data_table} {order_clause}")
async def count_template_data(data_table: str) -> int:
@@ -290,7 +292,7 @@ async def generate_articles(
start_date: date,
articles_per_day: int,
*,
- limit: int = 500,
+ limit: int = 0,
base_url: str = "https://padelnomics.io",
task_id: int | None = None,
) -> int:

View File

@@ -570,6 +570,270 @@
@apply px-4 pb-4 text-slate-dark;
}
/* ── Article Timeline (phase/process diagrams) ── */
.article-timeline {
display: flex;
gap: 0;
margin: 1.5rem 0 2rem;
position: relative;
overflow-x: auto;
padding-bottom: 0.5rem;
}
.article-timeline__phase {
flex: 1;
min-width: 130px;
display: flex;
flex-direction: column;
align-items: center;
position: relative;
}
/* Connecting line between phases */
.article-timeline__phase + .article-timeline__phase::before {
content: '';
position: absolute;
top: 22px;
left: calc(-50% + 22px);
right: calc(50% + 22px);
height: 2px;
background: #CBD5E1;
z-index: 0;
}
.article-timeline__phase + .article-timeline__phase::after {
content: '';
position: absolute;
top: 10px;
left: calc(-50% + 18px);
font-size: 1rem;
line-height: 1;
color: #94A3B8;
z-index: 1;
}
.article-timeline__num {
width: 44px;
height: 44px;
border-radius: 50%;
background: #0F172A;
color: #fff;
display: flex;
align-items: center;
justify-content: center;
font-size: 0.75rem;
font-weight: 700;
font-family: var(--font-display);
flex-shrink: 0;
position: relative;
z-index: 2;
}
.article-timeline__card {
margin-top: 0.75rem;
background: #F8FAFC;
border: 1px solid #E2E8F0;
border-radius: 12px;
padding: 0.75rem 0.875rem;
text-align: center;
width: 100%;
}
.article-timeline__title {
font-weight: 700;
font-size: 0.8125rem;
color: #0F172A;
line-height: 1.3;
margin-bottom: 0.25rem;
font-family: var(--font-display);
}
.article-timeline__subtitle {
font-size: 0.75rem;
color: #64748B;
margin-bottom: 0.375rem;
line-height: 1.3;
}
.article-timeline__meta {
font-size: 0.6875rem;
color: #94A3B8;
line-height: 1.4;
}
/* Mobile: vertical timeline */
@media (max-width: 600px) {
.article-timeline {
flex-direction: column;
gap: 0.75rem;
overflow-x: visible;
}
.article-timeline__phase {
flex-direction: row;
align-items: flex-start;
min-width: auto;
gap: 0.75rem;
}
.article-timeline__phase + .article-timeline__phase::before {
content: '';
position: absolute;
top: calc(-0.375rem);
left: 21px;
right: auto;
width: 2px;
height: 0.75rem;
background: #CBD5E1;
}
.article-timeline__phase + .article-timeline__phase::after {
content: '→';
position: absolute;
top: calc(-0.3rem);
left: 15px;
font-size: 0.9rem;
transform: rotate(90deg);
}
.article-timeline__card {
margin-top: 0;
text-align: left;
flex: 1;
}
.article-timeline__num {
flex-shrink: 0;
}
}
/* ── Article Callout Boxes ── */
.article-callout {
display: flex;
gap: 0.875rem;
padding: 1rem 1.25rem;
border-radius: 12px;
border-left: 4px solid;
margin: 1.5rem 0;
}
.article-callout::before {
font-size: 1.1rem;
flex-shrink: 0;
line-height: 1.5;
}
.article-callout__body {
flex: 1;
}
.article-callout__title {
font-weight: 700;
font-size: 0.875rem;
margin-bottom: 0.375rem;
display: block;
}
.article-callout p {
font-size: 0.875rem;
line-height: 1.6;
margin: 0;
color: inherit;
}
.article-callout--warning {
background: #FFFBEB;
border-color: #D97706;
color: #78350F;
}
.article-callout--warning::before {
content: '⚠';
color: #D97706;
}
.article-callout--warning .article-callout__title {
color: #92400E;
}
.article-callout--tip {
background: #F0FDF4;
border-color: #16A34A;
color: #14532D;
}
.article-callout--tip::before {
content: '💡';
}
.article-callout--tip .article-callout__title {
color: #166534;
}
.article-callout--info {
background: #EFF6FF;
border-color: #1D4ED8;
color: #1E3A5F;
}
.article-callout--info::before {
content: 'ℹ';
color: #1D4ED8;
}
.article-callout--info .article-callout__title {
color: #1E40AF;
}
/* ── Article Cards (2-col comparison grid) ── */
.article-cards {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 1rem;
margin: 1.5rem 0;
}
@media (max-width: 580px) {
.article-cards {
grid-template-columns: 1fr;
}
}
.article-card {
border-radius: 12px;
border: 1px solid #E2E8F0;
overflow: hidden;
background: #fff;
}
.article-card__accent {
height: 4px;
}
.article-card--success .article-card__accent { background: #16A34A; }
.article-card--failure .article-card__accent { background: #EF4444; }
.article-card--neutral .article-card__accent { background: #1D4ED8; }
.article-card--established .article-card__accent { background: #0F172A; }
.article-card--growth .article-card__accent { background: #1D4ED8; }
.article-card--emerging .article-card__accent { background: #16A34A; }
.article-card__inner {
padding: 1rem 1.125rem;
}
.article-card__title {
font-weight: 700;
font-size: 0.875rem;
color: #0F172A;
margin-bottom: 0.5rem;
font-family: var(--font-display);
display: block;
}
.article-card__body {
font-size: 0.8125rem;
color: #475569;
line-height: 1.6;
margin: 0;
}
/* ── Severity Pills (risk table badges) ── */
.severity {
display: inline-block;
padding: 0.125rem 0.5rem;
border-radius: 9999px;
font-size: 0.6875rem;
font-weight: 700;
letter-spacing: 0.03em;
white-space: nowrap;
}
.severity--high {
background: #FEE2E2;
color: #991B1B;
}
.severity--medium-high {
background: #FEF3C7;
color: #92400E;
}
.severity--medium {
background: #FEF9C3;
color: #713F12;
}
.severity--low-medium {
background: #ECFDF5;
color: #065F46;
}
.severity--low {
background: #F0FDF4;
color: #166534;
}
/* Inline HTMX loading indicator for search forms.
Opacity is handled by HTMX's built-in .htmx-indicator CSS.
This class only adds positioning and the spin animation. */

View File

@@ -735,6 +735,107 @@ async def handle_run_extraction(payload: dict) -> None:
logger.info("Extraction completed: %s", result.stdout[-300:] if result.stdout else "(no output)")
@task("run_transform")
async def handle_run_transform(payload: dict) -> None:
"""Run SQLMesh transform (prod plan --auto-apply) in the background.
Shells out to `uv run sqlmesh -p transform/sqlmesh_padelnomics plan prod --auto-apply`.
2-hour absolute timeout — same as extraction.
"""
import subprocess
from pathlib import Path
repo_root = Path(__file__).resolve().parents[4]
result = await asyncio.to_thread(
subprocess.run,
["uv", "run", "sqlmesh", "-p", "transform/sqlmesh_padelnomics", "plan", "prod", "--auto-apply"],
capture_output=True,
text=True,
timeout=7200,
cwd=str(repo_root),
)
if result.returncode != 0:
raise RuntimeError(
f"SQLMesh transform failed (exit {result.returncode}): {result.stderr[:500]}"
)
logger.info("SQLMesh transform completed: %s", result.stdout[-300:] if result.stdout else "(no output)")
@task("run_export")
async def handle_run_export(payload: dict) -> None:
"""Export serving tables from lakehouse.duckdb → analytics.duckdb.
Shells out to `uv run python src/padelnomics/export_serving.py`.
10-minute absolute timeout.
"""
import subprocess
from pathlib import Path
repo_root = Path(__file__).resolve().parents[4]
result = await asyncio.to_thread(
subprocess.run,
["uv", "run", "python", "src/padelnomics/export_serving.py"],
capture_output=True,
text=True,
timeout=600,
cwd=str(repo_root),
)
if result.returncode != 0:
raise RuntimeError(
f"Export failed (exit {result.returncode}): {result.stderr[:500]}"
)
logger.info("Export completed: %s", result.stdout[-300:] if result.stdout else "(no output)")
@task("run_pipeline")
async def handle_run_pipeline(payload: dict) -> None:
"""Run full ELT pipeline: extract → transform → export, stopping on first failure."""
import subprocess
from pathlib import Path
repo_root = Path(__file__).resolve().parents[4]
steps = [
(
"extraction",
["uv", "run", "--package", "padelnomics_extract", "extract"],
7200,
),
(
"transform",
["uv", "run", "sqlmesh", "-p", "transform/sqlmesh_padelnomics", "plan", "prod", "--auto-apply"],
7200,
),
(
"export",
["uv", "run", "python", "src/padelnomics/export_serving.py"],
600,
),
]
for step_name, cmd, timeout_seconds in steps:
logger.info("Pipeline step starting: %s", step_name)
result = await asyncio.to_thread(
subprocess.run,
cmd,
capture_output=True,
text=True,
timeout=timeout_seconds,
cwd=str(repo_root),
)
if result.returncode != 0:
raise RuntimeError(
f"Pipeline failed at {step_name} (exit {result.returncode}): {result.stderr[:500]}"
)
logger.info(
"Pipeline step complete: %s%s",
step_name,
result.stdout[-200:] if result.stdout else "(no output)",
)
logger.info("Full pipeline complete (extract → transform → export)")
@task("generate_articles")
async def handle_generate_articles(payload: dict) -> None:
"""Generate articles from a template in the background."""
@@ -745,7 +846,7 @@ async def handle_generate_articles(payload: dict) -> None:
slug = payload["template_slug"]
start_date = date_cls.fromisoformat(payload["start_date"])
articles_per_day = payload.get("articles_per_day", 3)
- limit = payload.get("limit", 500)
+ limit = payload.get("limit", 0)
task_id = payload.get("_task_id")
count = await generate_articles(

View File

@@ -500,3 +500,131 @@ class TestTieredCyclerNTier:
t.join()
assert errors == [], f"Thread safety errors: {errors}"
class TestTieredCyclerDeadProxyTracking:
"""Per-proxy dead tracking: individual proxies marked dead are skipped."""
def test_dead_proxy_skipped_in_next_proxy(self):
"""After a proxy hits the failure limit it is never returned again."""
tiers = [["http://dead", "http://live"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
# Mark http://dead as dead
cycler["record_failure"]("http://dead")
# next_proxy must always return the live one
for _ in range(6):
assert cycler["next_proxy"]() == "http://live"
def test_dead_proxy_count_increments(self):
tiers = [["http://a", "http://b", "http://c"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=2)
assert cycler["dead_proxy_count"]() == 0
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 0 # only 1 failure, limit is 2
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 1
cycler["record_failure"]("http://b")
cycler["record_failure"]("http://b")
assert cycler["dead_proxy_count"]() == 2
def test_auto_escalates_when_all_proxies_in_tier_dead(self):
"""If all proxies in the active tier are dead, next_proxy auto-escalates."""
tiers = [["http://t0a", "http://t0b"], ["http://t1"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
# Kill all proxies in tier 0
cycler["record_failure"]("http://t0a")
cycler["record_failure"]("http://t0b")
# next_proxy should transparently escalate and return tier 1 proxy
assert cycler["next_proxy"]() == "http://t1"
def test_auto_escalates_updates_active_tier_index(self):
"""Auto-escalation via dead proxies bumps active_tier_index."""
tiers = [["http://t0a", "http://t0b"], ["http://t1"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
cycler["record_failure"]("http://t0a")
cycler["record_failure"]("http://t0b")
cycler["next_proxy"]() # triggers auto-escalation
assert cycler["active_tier_index"]() == 1
def test_returns_none_when_all_tiers_exhausted_by_dead_proxies(self):
tiers = [["http://t0"], ["http://t1"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
cycler["record_failure"]("http://t0")
cycler["record_failure"]("http://t1")
assert cycler["next_proxy"]() is None
def test_record_success_resets_per_proxy_counter(self):
"""Success resets the failure count so proxy is not marked dead."""
tiers = [["http://a", "http://b"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=3)
# Two failures — not dead yet
cycler["record_failure"]("http://a")
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 0
# Success resets the counter
cycler["record_success"]("http://a")
# Two more failures — still not dead (counter was reset)
cycler["record_failure"]("http://a")
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 0
# Third failure after reset — now dead
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 1
def test_dead_proxy_stays_dead_after_success(self):
"""Once marked dead, a proxy is not revived by record_success."""
tiers = [["http://a", "http://b"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=1)
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 1
cycler["record_success"]("http://a")
assert cycler["dead_proxy_count"]() == 1
# http://a is still skipped
for _ in range(6):
assert cycler["next_proxy"]() == "http://b"
def test_backward_compat_no_proxy_url(self):
"""Calling record_failure/record_success without proxy_url still works."""
tiers = [["http://t0"], ["http://t1"]]
cycler = make_tiered_cycler(tiers, threshold=2)
cycler["record_failure"]()
cycler["record_failure"]() # escalates
assert cycler["active_tier_index"]() == 1
cycler["record_success"]()
assert cycler["dead_proxy_count"]() == 0 # no per-proxy tracking happened
def test_proxy_failure_limit_zero_disables_per_proxy_tracking(self):
"""proxy_failure_limit=0 disables per-proxy dead tracking entirely."""
tiers = [["http://a", "http://b"]]
cycler = make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=0)
for _ in range(100):
cycler["record_failure"]("http://a")
assert cycler["dead_proxy_count"]() == 0
def test_thread_safety_with_per_proxy_tracking(self):
"""Concurrent record_failure(proxy_url) calls don't corrupt state."""
import threading as _threading
tiers = [["http://t0a", "http://t0b", "http://t0c"], ["http://t1a"]]
cycler = make_tiered_cycler(tiers, threshold=50, proxy_failure_limit=5)
errors = []
lock = _threading.Lock()
def worker():
try:
for _ in range(30):
p = cycler["next_proxy"]()
if p is not None:
cycler["record_failure"](p)
cycler["record_success"](p)
except Exception as e:
with lock:
errors.append(e)
threads = [_threading.Thread(target=worker) for _ in range(10)]
for t in threads:
t.start()
for t in threads:
t.join()
assert errors == [], f"Thread safety errors: {errors}"
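These tests assume the closure-style API of make_tiered_cycler. The following is a minimal sketch of a cycler satisfying them, including the stale-tier guard from the record_failure fix; the real implementation may differ in details such as rotation order and how the reverse map is stored:

```python
import threading


def make_tiered_cycler(tiers, threshold=10, proxy_failure_limit=3):
    """Closure-based tiered proxy cycler with per-proxy dead tracking.

    Sketch only: failures from a proxy in an already-escalated tier are
    ignored by the tier-level circuit breaker (the stale-tier guard).
    """
    lock = threading.Lock()
    active = [0]                    # index of the active tier
    positions = [0] * len(tiers)    # round-robin cursor per tier
    tier_failures = [0]             # failures counted against the active tier
    counts = {}                     # proxy_url -> consecutive failure count
    dead = set()                    # proxies past proxy_failure_limit
    # Reverse map built at construction time, as in the stale-tier fix
    tier_of = {p: i for i, tier in enumerate(tiers) for p in tier}

    def _live(tier):
        return [p for p in tier if p not in dead]

    def next_proxy():
        with lock:
            while active[0] < len(tiers):
                live = _live(tiers[active[0]])
                if live:
                    proxy = live[positions[active[0]] % len(live)]
                    positions[active[0]] += 1
                    return proxy
                active[0] += 1      # whole tier dead: auto-escalate
                tier_failures[0] = 0
            return None             # every tier exhausted

    def record_failure(proxy_url=None):
        with lock:
            if proxy_url is not None and proxy_failure_limit:
                if proxy_url not in dead:
                    counts[proxy_url] = counts.get(proxy_url, 0) + 1
                    if counts[proxy_url] >= proxy_failure_limit:
                        dead.add(proxy_url)
            # Stale-tier guard: skip the tier-level circuit breaker when the
            # failing proxy belongs to a tier we have already escalated past.
            if proxy_url is not None and tier_of.get(proxy_url, active[0]) < active[0]:
                return
            tier_failures[0] += 1
            if tier_failures[0] >= threshold and active[0] + 1 < len(tiers):
                active[0] += 1
                tier_failures[0] = 0

    def record_success(proxy_url=None):
        with lock:
            tier_failures[0] = 0
            if proxy_url is not None and proxy_url not in dead:
                counts[proxy_url] = 0  # reset counter, but never revive the dead

    return {
        "next_proxy": next_proxy,
        "record_failure": record_failure,
        "record_success": record_success,
        "dead_proxy_count": lambda: len(dead),
        "active_tier_index": lambda: active[0],
    }
```

Under this sketch, the bug the fix addresses is visible: without the `tier_of` check, ten in-flight failures reported for tier-0 proxies after escalation would immediately trip the breaker on the freshly activated tier.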