ManojkMohan
Honored Contributor II

@jorperort 

Merging your composite PK columns into a single-column primary key would not by itself eliminate the duplicates. The underlying problem is that distributed Spark tasks can be retried independently, so the same logical row may be inserted more than once. A retried insert collides with the key whether that key is one column or several.
 
Using a staging table followed by a controlled MERGE operation is still the most robust and recommended approach to:
 
Guarantee consistent writes without PK violations
 
Handle concurrent write attempts reliably
 
Avoid issues caused by retries from distributed Spark tasks
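
To make the pattern concrete, here is a minimal sketch of the staging-then-merge flow. It uses an in-memory SQLite database only so that it runs anywhere. SQLite has no MERGE statement, so INSERT ... ON CONFLICT DO UPDATE stands in for it; on your actual target you would issue that engine's MERGE instead. The table names (orders, staging_orders), the columns and the helper load_via_staging are hypothetical, not taken from your setup.

```python
import sqlite3


def load_via_staging(conn, rows):
    """Land rows in a keyless staging table, then upsert them into the target.

    Hypothetical helper: table and column names are illustrative only.
    """
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM staging_orders")
        # Staging has no primary key, so duplicate rows from retries land here safely.
        conn.executemany(
            "INSERT INTO staging_orders (order_id, line_no, qty) VALUES (?, ?, ?)",
            rows,
        )
        # Collapse duplicates per composite key, then upsert into the target.
        # MAX(qty) is an arbitrary tie-break; retried rows are identical anyway.
        conn.execute(
            """
            INSERT INTO orders (order_id, line_no, qty)
            SELECT order_id, line_no, MAX(qty)
            FROM staging_orders
            WHERE 1 = 1
            GROUP BY order_id, line_no
            ON CONFLICT (order_id, line_no) DO UPDATE SET qty = excluded.qty
            """
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER, line_no INTEGER, qty INTEGER, "
    "PRIMARY KEY (order_id, line_no))"
)
conn.execute(
    "CREATE TABLE staging_orders (order_id INTEGER, line_no INTEGER, qty INTEGER)"
)

# The last row simulates a record re-sent by a retried task.
batch = [(1, 1, 5), (1, 2, 3), (1, 1, 5)]
load_via_staging(conn, batch)
load_via_staging(conn, batch)  # replaying the whole batch is also safe

result = conn.execute(
    "SELECT order_id, line_no, qty FROM orders ORDER BY order_id, line_no"
).fetchall()
```

The point of the design is that the Spark tasks only ever append to a table without a key constraint, so retries cannot fail there. The single merge step, run once after the write completes, is the only place the composite key is enforced.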