I have two files, and I created two DataFrames, prod1 and prod2, from them. I need to find the records whose column values do not match between the two DataFrames, along with the names of the mismatching columns.
id_sk is the primary key. All the columns are of string datatype.
dataframe 1 (prod1)
id_sk | uuid|name
1|10|a
2|20|b
3|30|c
dataframe 2 (prod2)
id_sk | uuid|name
2|20|b-upd
3|30-up|c
4|40|d
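For reference, the two sample DataFrames above could be built like this (a minimal sketch, assuming a `SparkSession` named `spark` is already in scope):

```scala
import spark.implicits._

// Sample data from the question; all columns are strings.
val prod1 = Seq(
  ("1", "10", "a"),
  ("2", "20", "b"),
  ("3", "30", "c")
).toDF("id_sk", "uuid", "name")

val prod2 = Seq(
  ("2", "20", "b-upd"),
  ("3", "30-up", "c"),
  ("4", "40", "d")
).toDF("id_sk", "uuid", "name")
```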
I need the result DataFrame in the following format:
id|col_name|values
2|name|b,b-upd
3|uuid|30,30-up
I did an inner join and compared the records that did not match.
I am getting the result as follows:
id_sk | uuid_prod1|uuid_prod2|name_prod1|name_prod2
2|20|20|b|b-upd
3|30|30-up|c|c
val common_rec = prod1.join(prod2, prod1("id_sk") === prod2("id_sk"), "inner")
  .select(
    prod1("id_sk").alias("id_sk_prod1"),
    prod1("uuid").alias("uuid_prod1"),
    prod2("uuid").alias("uuid_prod2"),
    prod1("name").alias("name_prod1"),
    prod2("name").alias("name_prod2"))

// spark.sql can only see the DataFrame after it is registered as a temp view
common_rec.createOrReplaceTempView("common_rec")
val compare = spark.sql("select ... from common_rec where col_prod1 <> col_prod2")
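One way to get from the wide joined result to the requested `id|col_name|values` shape is to filter each compared column separately and union the results. This is a sketch, not the only approach; the column names (`uuid`, `name`) mirror the sample data and would need adjusting for the real schema:

```scala
import org.apache.spark.sql.functions._

val joined = prod1.alias("p1").join(prod2.alias("p2"), Seq("id_sk"), "inner")

// Non-key columns to compare between the two sides.
val compareCols = Seq("uuid", "name")

// For each column, keep rows where the two sides differ and emit
// (id, col_name, "value_prod1,value_prod2"), then union everything.
val diffs = compareCols.map { c =>
  joined
    .filter(col(s"p1.$c") =!= col(s"p2.$c"))
    .select(
      col("id_sk").alias("id"),
      lit(c).alias("col_name"),
      concat_ws(",", col(s"p1.$c"), col(s"p2.$c")).alias("values"))
}.reduce(_ union _)

diffs.show()
```

Because the join uses `Seq("id_sk")`, the key column appears only once in the output, so no alias disambiguation is needed for it. For many columns, the same `map`/`reduce(_ union _)` pattern scales without hand-writing one query per column.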