Performance optimization on auto_cdc_flow

yit337 · 05-11-2026

I have a fact streaming table that is updated with SCD2 records from the change data feed (CDF) of a silver table. The join is on pk (a hash key generated from the dimensions' business keys) and factory_code (60 unique values). On each incremental run, it reads all the existing data from the gold model.

Why does this happen, and how can I improve it?

I have already enabled liquid clustering on factory_code. It doesn't make sense to set liquid clustering on my hash keys because they are uniformly distributed across files.

Based on the Query History, most of the time is spent on 'Time taken to materialize source (or determine it's not needed)'.

The percentages of files and bytes pruned are both 0%.
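
To illustrate what I think is going on, here is a toy Python sketch. It is not my actual pipeline: the file count, row counts, key names, and the MD5-based `hash_key` helper are all made up, and I am assuming file skipping relies on per-file min/max statistics of the join column. With hash keys scattered uniformly, every file's [min, max] range spans almost the whole key space, so a small batch of changed keys still overlaps every file.

```python
import hashlib
import random


def hash_key(business_key: str) -> int:
    """Hypothetical stand-in for pk: a 64-bit integer taken from an MD5 digest."""
    digest = hashlib.md5(business_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_files(num_files: int, rows_per_file: int, seed: int = 0):
    """Return one (min_key, max_key) pair per simulated data file."""
    rng = random.Random(seed)
    stats = []
    for _ in range(num_files):
        keys = [hash_key(f"bk-{rng.randrange(10**9)}") for _ in range(rows_per_file)]
        stats.append((min(keys), max(keys)))
    return stats


def files_to_read(stats, incoming_keys) -> int:
    """Count files that cannot be skipped: at least one incoming key
    falls inside the file's [min, max] range."""
    return sum(
        1 for lo, hi in stats if any(lo <= k <= hi for k in incoming_keys)
    )


files = build_files(num_files=50, rows_per_file=1_000)
incoming = [hash_key(f"changed-{i}") for i in range(100)]  # one small CDF batch
touched = files_to_read(files, incoming)
print(f"{touched} of {len(files)} files must be read")
```

In this toy setup no file can be skipped on pk, which would be consistent with the 0% pruning I see, but I don't know whether this is really what the engine does internally.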