NhatHoang
Valued Contributor II

Hi,

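In my experience, dropDuplicates() keeps one row from each group of duplicates, but which one is not guaranteed. It depends on partitioning and row order, so the surviving row can look random.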

Therefore, you should define explicit logic for choosing which of the duplicated rows to keep.