Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-21-2023 05:35 AM
Hi @Matthew Elsham, As @Lakshay Goel pointed out, I would believe Windows would work better as it will first partition them based on the partition key and then your aggregation happens within that partition on a single worker. But it's good to see your query plan for both these cases and understand with your data what works best for you.
One good blog I checked was this - https://blog.knoldus.com/using-windows-in-spark-to-avoid-joins/. Please have a look.