Hello @Gaurav Rupnar,
The following code snippet reproduces what I described.
Compare the query plan with and without the cache() call on the res DataFrame (comment out res.cache() and res.unpersist() to see the difference):
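// Needed for toDF outside a notebook or spark-shell, where it is already in scope
import spark.implicits._
// Use a large shuffle partition count and disable broadcast joins so the join shuffles
spark.conf.set("spark.sql.shuffle.partitions", 2000)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
val factData = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).toDF("value")
val dimData = Seq(1, 2, 3).toDF("value")
val res = factData.join(dimData, Seq("value"))
// Comment out the next line (and the unpersist below) to compare plans
res.cache()
// The noop sink runs the full query without writing any output
res.write.format("noop").mode("append").save()
res.unpersist()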
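
If you would rather see both plans side by side than toggle the comment, here is a minimal sketch. It assumes Spark 3.0 or later (for explain("formatted")) and a spark-shell or notebook session where `spark` is already defined. The names `uncached` and `cached` are mine and are not part of the snippet above. Note that with adaptive query execution enabled, explain() before an action shows only the initial plan, so the final plan is best checked in the SQL tab of the Spark UI after the write has run.

```scala
// Sketch only: assumes an existing SparkSession named `spark` (spark-shell / notebook).
import spark.implicits._

spark.conf.set("spark.sql.shuffle.partitions", 2000)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

val factData = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).toDF("value")
val dimData = Seq(1, 2, 3).toDF("value")

// 1) Plan without caching. Run this first, before anything with the
//    same logical plan has been cached.
val uncached = factData.join(dimData, Seq("value"))
uncached.explain("formatted")
uncached.write.format("noop").mode("append").save()

// 2) Plan with caching. cache() is lazy; the write below materializes it.
val cached = factData.join(dimData, Seq("value"))
cached.cache()
cached.explain("formatted")
cached.write.format("noop").mode("append").save()

// Release the cached data once the comparison is done.
cached.unpersist()
```

The uncached case is run first on purpose: Spark matches cached data by logical plan, so once the join has been cached, an identical join built later would also read from the cache.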