Dimitry
Valued Contributor

Hi @Coffee77 

It's not a problem with the SQL (I can generate the batch ID by other means). It is a sudden problem with Spark that could hit my other existing queries out of nowhere. I don't understand how an existing field (the batch ID), which is visibly correct for my test set of records (which fits entirely on screen), becomes a random number under DISTINCT or GROUP BY. It frightens me that a SQL query can suddenly become unreliable.

The only thing I can recall is vacuuming some of the tables before running the query... but how could that affect the results so badly?