The handler called evt.preventDefault() unconditionally, so auto-poll
requests (hx-trigger="every 5s", no hx-confirm) caused an empty dialog
to pop up every 5 seconds. Add an early return when evt.detail.question
is falsy so only actual hx-confirm interactions are intercepted.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Free Webshare proxies were timing out and exhausting the circuit breaker
before datacenter/residential proxies got a chance to run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When all proxy tiers are exhausted and 0 venues are fetched, the working
file is empty and compress_jsonl_atomic asserts non-empty. Return early
with a warning instead of crashing.
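
A minimal sketch of the guard, not the actual change. `finalize_working_file` is a hypothetical caller, and the `compress_jsonl_atomic` below is a stand-in that only reproduces the non-empty assertion described above:

```python
import gzip
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)


def compress_jsonl_atomic(src: Path, dest: Path) -> None:
    """Stand-in for the real helper: asserts non-empty input, then gzips."""
    assert src.stat().st_size > 0, "working file must not be empty"
    tmp = dest.with_name(dest.name + ".tmp")
    with src.open("rb") as fin, gzip.open(tmp, "wb") as fout:
        fout.write(fin.read())
    os.replace(tmp, dest)  # rename is atomic on the same filesystem


def finalize_working_file(src: Path, dest: Path) -> bool:
    """Hypothetical caller: skip compression when nothing was fetched."""
    if not src.exists() or src.stat().st_size == 0:
        logger.warning("0 venues fetched, skipping compression of %s", src)
        return False
    compress_jsonl_atomic(src, dest)
    return True
```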
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add global htmx:confirm handler in base_admin.html that intercepts
hx-confirm attributes and shows #confirm-dialog instead of window.confirm()
- Convert 4 pipeline HTMX buttons (Run Transform, Run Export, Run Full
Pipeline, Run extractor) from onclick+confirm() to hx-confirm
- Convert 4 affiliate form/list delete buttons from onclick+confirm()
to confirmAction() via event.preventDefault()
- Add scrollbar-width:none + ::-webkit-scrollbar{display:none} to
.pipeline-tabs to suppress spurious horizontal scrollbar
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
start_time is "HH:MM:SS" (time only), not a full ISO datetime. Combine it
with the resource's start_date to get "YYYY-MM-DDTHH:MM:SS" before parsing.
The ValueError was silently caught on every slot → 0 venues found, so the
recheck has never actually run since it was first deployed.
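
The combination step can be sketched as below. `parse_slot_start` is a hypothetical helper name; only the start_date and start_time fields come from the message above:

```python
from datetime import datetime


def parse_slot_start(start_date: str, start_time: str) -> datetime:
    """Join "YYYY-MM-DD" and "HH:MM:SS" into one ISO string, then parse it."""
    return datetime.fromisoformat(f"{start_date}T{start_time}")


slot_start = parse_slot_start("2025-03-01", "18:30:00")
```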
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace inline LITESTREAM_R2_* credentials in the backup service with
the named [r2-landing] rclone remote and R2_LANDING_* env vars, matching
the beanflows pattern. Add rclone.conf setup to bootstrap_supervisor.sh
so the remote is written from env on each bootstrap run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI runs on Gitea only. GitLab is a passive push mirror — no runners,
no tagging, no deploy involvement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PROXY_URLS_* and other secrets were defined in .env but never loaded,
causing availability to run in slow serial mode (1 req/s) instead of
parallel mode with proxies.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
With parallel workers, threads that fetch a proxy just before escalation
can report failures after the tier has already changed. Those failures
were silently counted against the new tier, exhausting it before it was
ever tried (Rayobyte was skipped entirely in favour of DataImpulse
because 10 in-flight Webshare failures hit the threshold).
Fix: build a proxy_url → tier_idx reverse map at construction time and
skip the tier-level circuit breaker when the failing proxy belongs to an
already-escalated tier.
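
A single-threaded sketch of the idea, with locking omitted. `TieredBreaker` and its attribute names are hypothetical and not the real cycler:

```python
from typing import Dict, List


class TieredBreaker:
    """Hypothetical sketch of the tier-level circuit breaker."""

    def __init__(self, tiers: List[List[str]], threshold: int) -> None:
        self.tiers = tiers
        self.threshold = threshold
        self.active_tier = 0
        self.tier_failures = 0
        # Reverse map built once at construction: proxy_url -> tier index.
        self.tier_of: Dict[str, int] = {
            url: idx for idx, tier in enumerate(tiers) for url in tier
        }

    def record_failure(self, proxy_url: str) -> None:
        tier_idx = self.tier_of.get(proxy_url)
        if tier_idx is not None and tier_idx < self.active_tier:
            return  # stale failure from an already-escalated tier
        self.tier_failures += 1
        if self.tier_failures >= self.threshold:
            self.active_tier += 1  # escalate to the next tier
            self.tier_failures = 0
```

Late failures from a tier that has already been left behind no longer touch the counter of the tier that replaced it.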
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PROXY_URLS_DATACENTER was missing the scheme prefix, causing SSL
handshake failures on the Rayobyte HTTP-only proxy.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Validates each URL in PROXY_URLS_DATACENTER / PROXY_URLS_RESIDENTIAL:
logs a warning and skips any entry missing an http:// or https:// scheme
instead of passing malformed URLs that cause SSL or connection errors.
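
A minimal sketch of the check, assuming the env value is a comma-separated list. `parse_proxy_urls` is a hypothetical name:

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def parse_proxy_urls(raw: str) -> List[str]:
    """Keep only entries that start with http:// or https://."""
    valid: List[str] = []
    for entry in (part.strip() for part in raw.split(",")):
        if not entry:
            continue  # tolerate trailing commas and blank entries
        if not entry.startswith(("http://", "https://")):
            # The entry itself is not logged: it may contain credentials.
            logger.warning("Skipping proxy URL without http(s):// scheme")
            continue
        valid.append(entry)
    return valid
```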
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds HTMX live polling to the Overview tab (stops when quiet) and a new
Transform tab for managing the SQLMesh + export steps of the ELT pipeline.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add proxy_failure_limit param to make_tiered_cycler (default 3).
Individual proxies hitting the limit are marked dead and permanently
skipped. next_proxy() auto-escalates when all proxies in the active
tier are dead. Both mechanisms coexist: per-proxy dead tracking removes
broken individuals; tier-level threshold catches systemic failure.
- proxy.py: dead_proxies set + proxy_failure_counts dict in state;
next_proxy skips dead proxies with bounded loop; record_failure/
record_success accept optional proxy_url; dead_proxy_count() added
- playtomic_tenants.py: pass proxy_url to record_success/record_failure
- playtomic_availability.py: _worker returns (proxy_url, result);
serial loops in extract + extract_recheck capture proxy_url
- test_supervisor.py: 11 new tests in TestTieredCyclerDeadProxyTracking
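
The per-proxy half of the mechanism can be sketched as follows. `DeadProxyCycler` is a hypothetical, single-threaded illustration; the tier-level threshold and record_success are left out:

```python
from typing import Dict, List, Optional, Set


class DeadProxyCycler:
    """Hypothetical sketch of per-proxy dead tracking with auto-escalation."""

    def __init__(self, tiers: List[List[str]], proxy_failure_limit: int = 3):
        self.tiers = tiers
        self.proxy_failure_limit = proxy_failure_limit
        self.active_tier = 0
        self.cursor = 0
        self.dead_proxies: Set[str] = set()
        self.proxy_failure_counts: Dict[str, int] = {}

    def next_proxy(self) -> Optional[str]:
        while self.active_tier < len(self.tiers):
            tier = self.tiers[self.active_tier]
            # Bounded loop: inspect each proxy in the tier at most once.
            for _ in range(len(tier)):
                proxy = tier[self.cursor % len(tier)]
                self.cursor += 1
                if proxy not in self.dead_proxies:
                    return proxy
            # Every proxy in the active tier is dead: escalate.
            self.active_tier += 1
            self.cursor = 0
        return None  # all tiers exhausted

    def record_failure(self, proxy_url: Optional[str] = None) -> None:
        if proxy_url is None:
            return  # caller did not say which proxy failed
        count = self.proxy_failure_counts.get(proxy_url, 0) + 1
        self.proxy_failure_counts[proxy_url] = count
        if count >= self.proxy_failure_limit:
            self.dead_proxies.add(proxy_url)  # permanently skipped

    def dead_proxy_count(self) -> int:
        return len(self.dead_proxies)
```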
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the tenants extractor flattened all proxy tiers into a single
round-robin list, bypassing the circuit breaker entirely. When the free
Webshare tier runs out of bandwidth (402), all 20 free proxies fail and
the batch crashes — the paid datacenter/residential proxies are never tried.
Changes:
- Replace make_round_robin_cycler with make_tiered_cycler (same as availability)
- Add _fetch_page_via_cycler: retries per page across tiers, records
success/failure in cycler so circuit breaker can escalate
- Fix batch_size to BATCH_SIZE=20 constant (was len(all_proxies) ≈ 22)
- Check cycler.is_exhausted() before each batch; catch RuntimeError mid-batch
and write partial results rather than crashing with nothing
- CIRCUIT_BREAKER_THRESHOLD from env (default 10), matching availability
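
A sequential sketch of the batch loop under stated assumptions: `fetch_all_pages` and its callback parameters are hypothetical, and the real extractor may fetch each batch concurrently:

```python
from typing import Callable, List

BATCH_SIZE = 20  # fixed constant, independent of how many proxies exist


def fetch_all_pages(
    page_numbers: List[int],
    fetch_page: Callable[[int], dict],
    is_exhausted: Callable[[], bool],
) -> List[dict]:
    """Return whatever was fetched before the proxies ran out."""
    results: List[dict] = []
    for start in range(0, len(page_numbers), BATCH_SIZE):
        if is_exhausted():
            break  # nothing left to try, keep what we have
        batch = page_numbers[start:start + BATCH_SIZE]
        try:
            for page in batch:
                results.append(fetch_page(page))
        except RuntimeError:
            break  # cycler ran dry mid-batch: partial results survive
    return results
```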
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docker-compose.prod.yml: fix volume mount for all 6 web containers
from /opt/padelnomics/data (stale) → /data/padelnomics (live supervisor output);
add LANDING_DIR=/app/data/pipeline/landing so extraction/landing stats work
- pipeline_routes.py: fix _REPO_ROOT parents[5] → parents[4] so workflows.toml
is found in dev and pipeline overview shows workflow schedules
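
How an off-by-one in the parents index overshoots the repo root, shown on a purely hypothetical layout (the real location of pipeline_routes.py is not stated above):

```python
from pathlib import PurePosixPath

# Hypothetical layout, for illustration only.
module = PurePosixPath("/repo/web/src/app/admin/pipeline_routes.py")

# parents[0] is the containing directory; each further index climbs one level.
repo_root = module.parents[4]  # /repo
too_far = module.parents[5]    # / (one level above the repo)
```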
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace ABS() bbox predicates with BETWEEN in all three spatial CTEs
(nearest_padel, padel_local, tennis_nearby). BETWEEN enables DuckDB's
IEJoin (inequality join) which is O((N+M) log M) vs the previous O(N×M)
nested-loop cross-join.
Add country pre-filters to restrict the left side from ~140K global
locations to ~20K rows for padel/tennis CTEs (~8 countries each).
Expected: ~50-200x speedup on the spatial CTE portion of the model.
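
The rewrite relies on the two predicate forms selecting the same rows. A small check of that equivalence, written in Python rather than SQL; the function names, the single delta for both axes, and the sample coordinates are assumptions:

```python
def abs_bbox(lat, lon, lat0, lon0, d):
    """Old form: ABS(lat - lat0) <= d AND ABS(lon - lon0) <= d."""
    return abs(lat - lat0) <= d and abs(lon - lon0) <= d


def between_bbox(lat, lon, lat0, lon0, d):
    """New form: lat BETWEEN lat0 - d AND lat0 + d (same for lon)."""
    return (lat0 - d <= lat <= lat0 + d) and (lon0 - d <= lon <= lon0 + d)


# Sample points around a centre at (40.0, -3.5) with a 0.5 degree box.
points = [(40.0, -3.5), (40.5, -3.5), (40.75, -3.5), (40.0, -4.25)]
old = [abs_bbox(lat, lon, 40.0, -3.5, 0.5) for lat, lon in points]
new = [between_bbox(lat, lon, 40.0, -3.5, 0.5) for lat, lon in points]
```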
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
stg_population_geonames → dim_locations → location_opportunity_profile
all had 0 rows in prod because the GeoNames extractor was never
scheduled. First run will backfill cities1000 to the landing zone.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Centralises retailer config in affiliate_programs table (URL template,
tracking tag, commission %). Products now use program dropdown + product
identifier instead of manual URL baking. URL assembled at redirect time
via build_affiliate_url() — changing a tag propagates to all products
instantly. Backward compatible: legacy baked-URL products fall through
unchanged. Amazon OneLink (configured in Associates dashboard) handles
geo-redirect to local marketplaces with no additional programs needed.
Also fixes _rebuild_article() frontmatter rendering bug.
Commits: fix frontmatter, migration 0027, program CRUD functions,
redirect update, admin CRUD + templates, product form update, tests.
41 tests, all passing. Ruff clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the manual affiliate URL field with a program selector and
product identifier input. JS toggles visibility between program mode and
manual (custom URL) mode. retailer field is auto-populated from the
program name on save. INSERT/UPDATE statements include new program_id
and product_identifier columns. Validation accepts program+ID or manual
URL as the URL source.
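
The validation rule can be sketched roughly as follows. `resolve_url_source` is a hypothetical helper and the error text is illustrative:

```python
from typing import Optional


def resolve_url_source(
    program_id: Optional[int],
    product_identifier: str,
    affiliate_url: str,
) -> str:
    """Return "program" or "manual"; raise ValueError if neither is usable."""
    if program_id is not None and product_identifier.strip():
        return "program"
    if affiliate_url.strip():
        return "manual"
    raise ValueError("Provide a program and product identifier, or a manual URL")
```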
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds program list, create, edit, delete routes with appropriate guards
(delete blocked if products reference the program). Adds "Programs" tab
to the affiliate subnav. New templates: affiliate_programs.html,
affiliate_program_form.html, partials/affiliate_program_results.html.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Program-based products now get URLs assembled from the template at
redirect time. Changing a program's tracking_tag propagates instantly
to all its products without rebuilding. Legacy products (no program_id)
still use their baked affiliate_url via fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds get_all_programs(), get_program(), get_program_by_slug() for admin
CRUD. Adds build_affiliate_url() that assembles URLs from program template
+ product identifier, with fallback to baked affiliate_url for legacy
products. Updates get_product() to JOIN affiliate_programs so _program
dict is available at redirect time. _parse_product() extracts program
fields into nested _program key.
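
A rough sketch of the assembly and fallback, not the real build_affiliate_url(). The url_template key and the {product_id} / {tag} placeholder names are assumptions, and URL-encoding is omitted:

```python
from typing import Any, Dict


def build_affiliate_url_sketch(product: Dict[str, Any]) -> str:
    """Assemble the URL from the program template, or fall back to the baked URL."""
    program = product.get("_program")
    if not program or not product.get("product_identifier"):
        return product["affiliate_url"]  # legacy product: baked URL, unchanged
    return program["url_template"].format(
        product_id=product["product_identifier"],
        tag=program["tracking_tag"],
    )


program_product = {
    "product_identifier": "B00TEST",
    "affiliate_url": "",
    "_program": {
        "url_template": "https://shop.example/dp/{product_id}?tag={tag}",
        "tracking_tag": "padel-21",
    },
}
legacy_product = {"affiliate_url": "https://shop.example/legacy?tag=old", "_program": None}
```

Because the tag is read at call time, editing tracking_tag on the program changes every assembled URL without touching the products.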
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes a bug where manual article previews rendered raw frontmatter
(title:, slug:, etc.) as visible text. Now strips the --- block using
the existing _FRONTMATTER_RE before passing the body to mistune.html().
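
A sketch of the stripping step. The pattern below is an assumed stand-in, since the real _FRONTMATTER_RE is not shown here, and the mistune call is left out:

```python
import re

# Assumed pattern: a leading block delimited by lines of "---".
FRONTMATTER_RE = re.compile(r"\A---\s*\n.*?\n---\s*\n", re.DOTALL)


def strip_frontmatter(markdown_text: str) -> str:
    """Drop a leading frontmatter block, leaving only the article body."""
    return FRONTMATTER_RE.sub("", markdown_text, count=1)


body = strip_frontmatter("---\ntitle: A\nslug: a\n---\n# Body\n")
```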
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>