I have the same problem. I have a big RDD and I want to calculate the similarity between its elements. When I call cartesian on this big RDD, it causes a lot of shuffling. Is there any way around this? Can we compare the elements without using cartesian?
v...