fix(staging): enforce grain dedup in resources + opening_hours + skip old blob in tenants

Both stg_playtomic_resources and stg_playtomic_opening_hours declared a grain but had no
QUALIFY ROW_NUMBER() dedup to enforce it. When both tenants.json.gz (old) and tenants.jsonl.gz
(new) exist for the same month, the UNION ALL over the two formats returns every row twice.

Fixes:
- stg_playtomic_resources: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, resource_id)
- stg_playtomic_opening_hours: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, day_of_week)
- playtomic_tenants.py: skip the write if either the old blob or the new JSONL already
  exists for the month, preventing the same-month dual-format writes that cause the duplication
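
The skip rule in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in playtomic_tenants.py: the function name `should_skip_month` and the idea of one directory per month are assumptions; only the two file names come from this commit.

```python
from pathlib import Path

# File names taken from the commit message; everything else here is hypothetical.
OLD_BLOB = "tenants.json.gz"
NEW_JSONL = "tenants.jsonl.gz"


def should_skip_month(month_dir: Path) -> bool:
    """Return True if either format already exists for this month.

    Checking both names means a month that still holds the old blob never
    receives a second copy of the same data in the new JSONL format.
    """
    return (month_dir / OLD_BLOB).exists() or (month_dir / NEW_JSONL).exists()
```

An extractor following this sketch would call `should_skip_month(...)` before writing and return early when it is true.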

Row counts after fix: ~43.8K resources, ~93.4K opening_hours (previously ~87.6K and ~186.8K, i.e. exactly double).
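
One way to confirm the grain holds is to look for keys that occur more than once. The sketch below uses an in-memory SQLite table purely as a stand-in for a staging model (SQLite has no QUALIFY clause, so it is not the real warehouse); the sample rows are invented and the helper name `duplicate_keys` is hypothetical.

```python
import sqlite3


def duplicate_keys(conn, table, key_cols):
    """Return the key tuples that appear more than once in `table`."""
    cols = ", ".join(key_cols)
    sql = (
        f"SELECT {cols} FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


# Invented sample data: (t1, r1) is loaded twice, as if both formats were read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_playtomic_resources (tenant_id TEXT, resource_id TEXT)")
conn.executemany(
    "INSERT INTO stg_playtomic_resources VALUES (?, ?)",
    [("t1", "r1"), ("t1", "r1"), ("t1", "r2"), ("t2", "r1")],
)
dupes = duplicate_keys(conn, "stg_playtomic_resources", ["tenant_id", "resource_id"])
```

An empty result for both models' grain columns would indicate the dedup is doing its job.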

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deeman
2026-02-25 13:41:23 +01:00
parent b177d2c377
commit a86f1ecd3a
3 changed files with 10 additions and 0 deletions

@@ -104,3 +104,6 @@ SELECT
FROM unpivoted
WHERE opening_time IS NOT NULL
AND closing_time IS NOT NULL
-- Enforce grain: if both old blob and new JSONL exist for the same month,
-- the UNION ALL produces duplicate (tenant_id, day_of_week) pairs — deduplicate.
QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, day_of_week ORDER BY tenant_id) = 1

@@ -68,3 +68,6 @@ SELECT
FROM unnested
WHERE (resource_json ->> 'resource_id') IS NOT NULL
AND (resource_json ->> 'sport_id') = 'PADEL'
-- Enforce grain: if both old blob and new JSONL exist for the same month,
-- the UNION ALL produces duplicate (tenant_id, resource_id) pairs — deduplicate.
QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, resource_json ->> 'resource_id' ORDER BY tenant_id) = 1