fix(staging): enforce grain dedup in resources + opening_hours + skip old blob in tenants
Both stg_playtomic_resources and stg_playtomic_opening_hours lacked QUALIFY ROW_NUMBER() dedup despite declaring a grain. When both tenants.json.gz (old) and tenants.jsonl.gz (new) exist for the same month, the UNION ALL produced exactly 2× rows.

Fixes:
- stg_playtomic_resources: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, resource_id)
- stg_playtomic_opening_hours: QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, day_of_week)
- playtomic_tenants.py: skip if old blob OR new JSONL already exists for the month, preventing same-month dual-format writes that trigger the duplicates

Row counts after fix: ~43.8K resources, ~93.4K opening_hours (was 87.6K, 186.8K).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
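The playtomic_tenants.py change is not part of the diff shown here. As a rough illustration of the skip rule described above, a check along these lines would refuse to write when either format is already present for the month. The function name, the `existing_keys` set, and the `prefix/YYYY/MM/` layout are assumptions made for this sketch, not taken from the repository.

```python
from pathlib import PurePosixPath


def should_skip_month(existing_keys, landing_prefix, year, month):
    """Return True if either tenants file already exists for the month.

    Hypothetical layout: <landing_prefix>/<YYYY>/<MM>/tenants.json.gz (old)
    and <landing_prefix>/<YYYY>/<MM>/tenants.jsonl.gz (new).
    """
    base = PurePosixPath(landing_prefix) / f"{year:04d}" / f"{month:02d}"
    old_blob = str(base / "tenants.json.gz")
    new_jsonl = str(base / "tenants.jsonl.gz")
    # Skip on EITHER format so one month never ends up with both files.
    return old_blob in existing_keys or new_jsonl in existing_keys
```

Checking both names is what prevents the dual-format month: a check on the new name alone would still write a JSONL file next to an existing old blob.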
@@ -104,3 +104,6 @@ SELECT
 FROM unpivoted
 WHERE opening_time IS NOT NULL
 AND closing_time IS NOT NULL
+-- Enforce grain: if both old blob and new JSONL exist for the same month,
+-- the UNION ALL produces duplicate (tenant_id, day_of_week) pairs — deduplicate.
+QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, day_of_week ORDER BY tenant_id) = 1

@@ -68,3 +68,6 @@ SELECT
 FROM unnested
 WHERE (resource_json ->> 'resource_id') IS NOT NULL
 AND (resource_json ->> 'sport_id') = 'PADEL'
+-- Enforce grain: if both old blob and new JSONL exist for the same month,
+-- the UNION ALL produces duplicate (tenant_id, resource_id) pairs — deduplicate.
+QUALIFY ROW_NUMBER() OVER (PARTITION BY tenant_id, resource_json ->> 'resource_id' ORDER BY tenant_id) = 1
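
For readers less familiar with QUALIFY, the effect of `ROW_NUMBER() ... = 1` per partition can be approximated in plain Python as "keep one row per grain key". This is an illustrative sketch only: the helper name and the sample rows are made up, and it keeps the first row seen. In the SQL above the ORDER BY column is also a partition column, so all rows in a partition tie and the surviving row is whichever the engine numbers first. That is harmless only if the duplicated rows are identical, which the commit message suggests but the diff does not prove.

```python
def dedup_by_grain(rows, key_columns):
    """Keep the first row seen for each combination of key_columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up sample: the same month read once from each file format.
old_blob_rows = [
    {"tenant_id": "t1", "day_of_week": "MONDAY", "opening_time": "08:00"},
    {"tenant_id": "t1", "day_of_week": "TUESDAY", "opening_time": "08:00"},
]
new_jsonl_rows = [dict(r) for r in old_blob_rows]

unioned = old_blob_rows + new_jsonl_rows  # stands in for the UNION ALL
deduped = dedup_by_grain(unioned, ["tenant_id", "day_of_week"])
```

With both formats present, `unioned` has twice the rows of `deduped`, matching the 2× inflation the commit message reports.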