I followed the same approach as in the article above, but it did not work for me.

Both df1 and df2 have the same set of 1006 columns. The joined result was created with 2012 columns.

scala> df1.join(df2, Seq("file_name","post_evar30") )

res24: org.apache.spark.sql.DataFrame = [file_name: string, post_evar30: string ... 2012 more fields]
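
The same behaviour can be reproduced on a much smaller case. This is only a minimal sketch: the local SparkSession, the DataFrames small1 and small2, and the non-key columns a and b are hypothetical stand-ins for df1 and df2, not the real data.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for the sketch; inside spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("join-repro").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for df1 and df2: both have exactly the same column set.
val small1 = Seq(("f1", "e1", 1, 2)).toDF("file_name", "post_evar30", "a", "b")
val small2 = Seq(("f1", "e1", 3, 4)).toDF("file_name", "post_evar30", "a", "b")

// Joining on a Seq of column names keeps a single copy of each key column,
// but every non-key column from both sides is still carried into the result.
val joined = small1.join(small2, Seq("file_name", "post_evar30"))

// Six columns: the two keys once, then a and b from small1, then a and b from small2.
val joinedColumns = joined.columns.toSeq
```

So with identical column sets on both sides, only the columns listed in the Seq are de-duplicated, and the remaining columns appear twice under the same names.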