Synapse PySpark Delta Lake MERGE: SCD Type 2 without a primary key

sunil_ksheersag — Fri, 08 Dec 2023 15:15:51 GMT

Problem
I have a set of rows coming from a previous process that has no primary key. The columns that could form a composite key are themselves subject to change, so they are not a good composite key; the only thing that makes a row unique is the whole row (all key columns and all value columns together). I need to implement SCD Type 2 on this data. The environment is Synapse PySpark, using the Delta Lake MERGE command and related features.
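When the whole row is the only identity, a common workaround is to treat each distinct row hash as its own version. Current versions whose hash no longer appears in the incoming data are closed, and hashes that are not already current are opened as new versions. The sketch below models this in plain Python so the logic can be checked without a Spark session. It assumes each load is a full snapshot of the source. The record layout (`hash`, `data`, `valid_from`, `valid_to`, `is_current`) and the function names are hypothetical. In Delta Lake the same idea would map to a MERGE on the hash column: insert when not matched, and expire when not matched by source (`whenNotMatchedBySourceUpdate`, available only in newer Delta Lake releases, so check your Synapse runtime).

```python
from __future__ import annotations

import hashlib
from datetime import date


def row_hash(row: dict) -> str:
    # The whole row is the identity: hash every column in a fixed (sorted) order.
    # The \x1f separator and repr() keep "ab"+"c" distinct from "a"+"bc" and None from "None".
    payload = "\x1f".join(f"{k}={row[k]!r}" for k in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def scd2_snapshot_merge(target: list[dict], incoming: list[dict], as_of: date) -> list[dict]:
    """Apply a full incoming snapshot to an SCD2 history (hypothetical record layout)."""
    incoming_by_hash = {row_hash(r): r for r in incoming}
    current_hashes = {t["hash"] for t in target if t["is_current"]}

    result = []
    for t in target:
        if t["is_current"] and t["hash"] not in incoming_by_hash:
            # The row vanished from the source (deleted or changed): close this version.
            t = {**t, "valid_to": as_of, "is_current": False}
        result.append(t)

    for h, r in incoming_by_hash.items():
        if h not in current_hashes:
            # A row not currently open: start a new version.
            result.append({"hash": h, "data": r, "valid_from": as_of,
                           "valid_to": None, "is_current": True})
    return result
```

This has a trade-off. A changed value produces one closed version and one new version, with nothing linking the two. That is the same limitation the row-hash attempt runs into.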

What I tried
Using a row hash: without a primary or composite key, the challenge is to identify which rows have changed or been updated. Any updated value changes the row hash, so the updated row shows up as a brand-new row instead of a new version of an existing one.
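The plain-Python sketch below illustrates the effect. It diffs two snapshots by full-row hash, which is the same idea as `sha2(concat_ws(...), 256)` over all columns in PySpark. The function names and sample rows are hypothetical. A single changed value shows up as one removed row plus one added row.

```python
import hashlib


def row_hash(row: tuple) -> str:
    # Full-row hash: every column participates, so any value change yields a new hash.
    return hashlib.sha256("\x1f".join(map(repr, row)).encode("utf-8")).hexdigest()


def diff_snapshots(old_rows, new_rows):
    # Compare two snapshots purely by hash; there is no key to pair rows on.
    old = {row_hash(r): r for r in old_rows}
    new = {row_hash(r): r for r in new_rows}
    removed = [old[h] for h in old.keys() - new.keys()]
    added = [new[h] for h in new.keys() - old.keys()]
    return removed, added


old_rows = [("K1", "A", 100), ("K2", "B", 200)]
new_rows = [("K1", "A", 100), ("K2", "B", 250)]  # one value changed
removed, added = diff_snapshots(old_rows, new_rows)
```

The hash alone cannot tell whether `("K2", "B", 200)` and `("K2", "B", 250)` are two versions of the same entity. Pairing them needs some attribute that stays stable across updates. If the hash is built in Spark with `concat_ws`, note that it skips nulls, so nulls should be replaced with a sentinel value first to keep the hash column-position aware.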

Please suggest how this problem can be solved. If you have any questions, please write back.

