pvignesh92
Honored Contributor

Hi @Matthew Elsham​, As @Lakshay Goel​ pointed out, I would believe Windows would work better as it will first partition them based on the partition key and then your aggregation happens within that partition on a single worker. But it's good to see your query plan for both these cases and understand with your data what works best for you.

One good blog I checked was this - https://blog.knoldus.com/using-windows-in-spark-to-avoid-joins/. Please have a look.

View solution in original post