Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unexpected performance behaviors due to changes in the Spark engine or Databricks runtime

Direo
Contributor

Hi!

We recently upgraded our cluster from Databricks Runtime 10.4 LTS (Apache Spark 3.2.1) to Databricks Runtime 13.3 LTS (powered by Apache Spark 3.3.0) and noticed that the runtime of one of our jobs has increased dramatically (in fact, it was terminated before it finished).

On 3.2.1 the job would run for approx. 40 minutes, while on 3.3.0 it was terminated after 3 hours. According to the driver logs, it got stuck on one stage that had 160000 tasks (TaskSetManager: Finished task 18443.0 in stage 204.0 in 8746 ms on (executor 1) (17678/160000)), while the run on 3.2.1 would never reach that number: a few hundred tasks at most. It is also worth mentioning that disk usage grew to heights I had never seen before.

I was able to find the operation whose behaviour changed in 3.3.0 compared to 3.2.1. It is a crossJoin.

Has anyone experienced a similar situation, or does anyone have a suggestion as to why it could have happened? I have a feeling that it has something to do with new or changed default Spark properties, but I could not identify which ones.

Also, the job sets spark.conf.set("spark.databricks.io.cache.enabled", "true").

1 REPLY

Direo
Contributor

It seems that broadcasting the smaller table in the crossJoin did the magic.
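
For anyone landing here later, a minimal PySpark sketch of what broadcasting the smaller side of a cross join can look like. The function name and the `large_df` / `small_df` arguments are hypothetical placeholders, not taken from the actual job; the sketch only assumes that the smaller table comfortably fits in memory on each executor.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for type checking; the function itself relies on DataFrame methods.
    from pyspark.sql import DataFrame


def cross_join_with_broadcast(large_df: DataFrame, small_df: DataFrame) -> DataFrame:
    """Cross join two DataFrames, hinting Spark to broadcast the smaller one.

    Hypothetical helper: it assumes small_df is small enough to be copied
    to every executor.
    """
    # DataFrame.hint("broadcast") marks small_df as a broadcast candidate,
    # equivalent in intent to pyspark.sql.functions.broadcast(small_df).
    return large_df.crossJoin(small_df.hint("broadcast"))
```

Usage would be along the lines of `result = cross_join_with_broadcast(large_df, small_df)`. Calling `result.explain()` afterwards should show whether the hint was honoured; with the broadcast in place the physical plan would typically contain a BroadcastNestedLoopJoin instead of a CartesianProduct.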
