hansonkx
New Contributor II

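For those of you looking for a not-too-complicated solution, you can use two of Spark's built-in functions, soundex and levenshtein. The join below uses levenshtein to pair up accounts whose names differ by fewer than 3 single-character edits: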

 

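import org.apache.spark.sql.functions.levenshtein

// Pair accounts whose names are within an edit distance of 2, skipping rows with the same id.
// =!= replaces !==, which is deprecated since Spark 2.0.
val newDF = accountDF.join(
  accountDF2,
  levenshtein(accountDF("name"), accountDF2("name")) < 3 && (accountDF("id") =!= accountDF2("id"))
)
newDF.show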
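
The post mentions soundex but only shows levenshtein, so here is a rough sketch of how the two might be combined. The sample rows, the local SparkSession, and the name `matches` are made up for illustration; only `soundex`, `levenshtein`, and `=!=` come from Spark itself. The idea is to use soundex equality as the join key and keep levenshtein as an extra filter. A join on levenshtein alone has no equality condition, so Spark generally has to compare every pair of rows; adding the soundex equality should let it use a regular equi-join instead, at the cost of missing pairs that sound different.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{levenshtein, soundex}

// Local session for experimenting; in spark-shell this returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("fuzzy-name-match")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the two account tables.
val accountDF  = Seq((1, "Robert"), (2, "Smith"), (3, "Alice")).toDF("id", "name")
val accountDF2 = Seq((1, "Robert"), (4, "Rupert"), (5, "Smyth"), (6, "Bob")).toDF("id", "name")

val matches = accountDF.join(
  accountDF2,
  // Equality on the phonetic code gives Spark an equi-join key.
  soundex(accountDF("name")) === soundex(accountDF2("name")) &&
    // Edit distance then filters the candidates that sound alike.
    levenshtein(accountDF("name"), accountDF2("name")) < 3 &&
    // Skip rows that are the same account.
    (accountDF("id") =!= accountDF2("id"))
)

matches.show()
```

With this sample data the Robert/Rupert and Smith/Smyth pairs should come through, while the Robert/Robert row is dropped because the ids are equal. Whether the soundex pre-filter is acceptable depends on your data: it only helps for names that differ in spelling but not in sound.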