cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Unexpected performance behaviors due to changes in the Spark engine or Databricks runtime

Direo
Contributor

Hi!

We have recently upgraded our cluster from Databricks Runtime 10.4 LTS which includes Apache Spark 3.2.1 to to Databricks Runtime 13.3 LTSincludes Apache Spark 3.2.1 powered by Apache Spark 3.3.0 and noticed that one of our jobs runtime has dramatically increased (actually it was terminated before it even finished).

On 3.2.1 the job would run aprox. 40 mins, while on 3.3.0 it was terminated after 3 hours. According to driver logs, it stuck on one stage which had 160000 tasks (TaskSetManager: Finished task 18443.0 in stage 204.0 in 8746 ms on (executor 1) (17678/160000)) while the run on 3.2.1 would never reach that number - max. a few hundred tasks. Also worth mentioning that disk was expanding to the hights I never seen.

I was able to find the function where behaviour of 3.3.0 has changed compared to 3.2.1. It is crossjoin.

Has anyone experienced such situation or maybe have suggestion why it could have happened? I have a feeling that it has something to do with new or changed default spark properties, but could not identify which ones.

Also, spark.conf.set("spark.databricks.io.cache.enabled", "true").

1 REPLY 1

Direo
Contributor

Seems that broadcasting the smaller table in crossjoin did the magic.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group