Is this answer still the best recommendation for computing a large cartesian product in Spark? In particular, I am curious about the use of 'join' rather than 'cartesian'. Why not simply 'val joined = customers.cartesian(products)'?