Re: Cosine similarity between all rows pairwise on...

john_odwyer · 09-22-2021

The issue is probably the self join across 100 million rows. I can't be sure without seeing the code and understanding the problem better, but you may want to consider using window functions instead of the join.

https://blog.knoldus.com/using-windows-in-spark-to-avoid-joins/
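
To give a rough sense of why a self join at this size hurts, here is a back-of-the-envelope sketch in plain Python (no Spark needed). It assumes the join has no filtering condition, so every row is paired with every other row; `pair_counts` is just a helper name I made up for illustration, not anything from Spark.

```python
def pair_counts(n_rows: int) -> tuple:
    """Return (rows from an unfiltered self join, unique unordered pairs)."""
    # An unfiltered self join pairs every row with every row, itself included.
    all_pairs = n_rows * n_rows
    # Keeping each unordered pair once and dropping self-pairs leaves n*(n-1)/2.
    unique_pairs = n_rows * (n_rows - 1) // 2
    return all_pairs, unique_pairs


n = 100_000_000  # row count mentioned in this thread
all_pairs, unique_pairs = pair_counts(n)
print(f"self join rows: {all_pairs:.2e}")     # self join rows: 1.00e+16
print(f"unique pairs:   {unique_pairs:.2e}")  # unique pairs:   5.00e+15
```

Even if you only keep each pair once, that is still on the order of 10^15 similarity computations, so anything that avoids materializing the full pairwise join is worth a look.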
