Showing results for 
Search instead for 
Did you mean: 

Unexpected performance behaviors due to changes in the Spark engine or Databricks runtime



We have recently upgraded our cluster from Databricks Runtime 10.4 LTS which includes Apache Spark 3.2.1 to to Databricks Runtime 13.3 LTSincludes Apache Spark 3.2.1 powered by Apache Spark 3.3.0 and noticed that one of our jobs runtime has dramatically increased (actually it was terminated before it even finished).

On 3.2.1 the job would run aprox. 40 mins, while on 3.3.0 it was terminated after 3 hours. According to driver logs, it stuck on one stage which had 160000 tasks (TaskSetManager: Finished task 18443.0 in stage 204.0 in 8746 ms on (executor 1) (17678/160000)) while the run on 3.2.1 would never reach that number - max. a few hundred tasks. Also worth mentioning that disk was expanding to the hights I never seen.

I was able to find the function where behaviour of 3.3.0 has changed compared to 3.2.1. It is crossjoin.

Has anyone experienced such situation or maybe have suggestion why it could have happened? I have a feeling that it has something to do with new or changed default spark properties, but could not identify which ones.

Also, spark.conf.set("", "true").



Seems that broadcasting the smaller table in crossjoin did the magic.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.