Re: Is there a better method to join two dataframe...

TejuNC · ‎01-23-2017

This is expected behavior. The DataFrame.join method is equivalent to a SQL join like this:

SELECT * FROM a JOIN b ON joinExprs

If you want to ignore duplicate columns, just drop them or select the columns of interest afterwards. If you want to disambiguate, you can access them through the parent DataFrames:

val a: DataFrame = ???
val b: DataFrame = ???
val joinExprs: Column = ???

a.join(b, joinExprs).select(a("id"), b("foo"))
// drop equivalent
a.alias("a").join(b.alias("b"), joinExprs).drop(b("id")).drop(a("foo"))

or use aliases:

// As for now aliases don't work with drop
a.alias("a").join(b.alias("b"), joinExprs).select($"a.id", $"b.foo")
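
For anyone who wants to try this end to end, here is a minimal sketch. It assumes Spark SQL is on the classpath and uses a local SparkSession. The sample data, the object name JoinDisambiguation, and the equality join condition are made up for illustration and are not from the original question.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}

object JoinDisambiguation {
  def main(args: Array[String]): Unit = {
    // Local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-disambiguation")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: both frames use the column names "id" and "foo".
    val a: DataFrame = Seq((1, "a1"), (2, "a2")).toDF("id", "foo")
    val b: DataFrame = Seq((1, "b1"), (3, "b3")).toDF("id", "foo")
    val joinExprs: Column = a("id") === b("id")

    // A plain join keeps the columns of both sides, so the names repeat.
    val joined = a.join(b, joinExprs)
    println(joined.columns.mkString(", "))

    // Option 1: select through the parent DataFrames.
    joined.select(a("id"), b("foo")).show()

    // Option 2: drop the unwanted columns by reference.
    joined.drop(b("id")).drop(a("foo")).show()

    // Option 3: alias both sides and select by qualified name.
    a.alias("a").join(b.alias("b"), $"a.id" === $"b.id")
      .select($"a.id", $"b.foo")
      .show()

    spark.stop()
  }
}
```

All three options should leave one id column and one foo column; which side each one comes from depends on the references you pass to select or drop.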