I have two tables with unique IDs:
Table A:        Table B:
ID  val         ID  val
1   10          1   10
2   11          2   10
3   13          3   13
I then merge these two tables into a single table that should contain only unique IDs. The exact merge logic is more or less irrelevant to my problem, which is this: roughly one run in every 90–100, the operation silently produces a result with duplicate IDs (no error is raised). I persisted the table in the hope that it would change something, but it didn't.
Can someone please suggest possible causes for a problem like this, and some advice on how to debug it? I'm stuck because the problem occurs so rarely. I'm using PySpark on a standard multi-node cluster.