Data Engineering
Slow-running Spark job: unknown Spark stages created by the Databricks compute cluster

anil_reddaboina
New Contributor II

Hi Team,

Recently we migrated our Spark jobs from a self-hosted Spark (YARN) cluster to Databricks.

We are currently using Databricks Workflows with job compute clusters and the Spark JAR task type. When we run the job in Databricks, we observed that it creates extra job stages, as shown in the image below. These extra stages take a significant amount of time and are delaying the overall job runtime.
Databricks Runtime: 16.1
Instance type: Standard_E16ds_v4
Can you please share your suggestions?

[Attached image: databricks_new_stages.png]

2 REPLIES

Brahmareddy
Honored Contributor III

Hi Anil,

How are you doing today? As per my understanding, when you move Spark jobs from a self-hosted YARN cluster to Databricks and run them as Spark JARs on job compute clusters, it's normal to see a few extra stages in the job execution plan. These stages are usually due to Databricks' built-in features like adaptive query execution (AQE), automatic optimizations, or internal tracking. While these help with performance tuning, they can sometimes increase the total runtime if not tuned well.

I'd suggest temporarily disabling AQE (set spark.sql.adaptive.enabled to false) and reviewing the job stages in the Spark UI to see what's taking the time. Also, double-check whether broadcast joins or data skew might be causing shuffle delays. Using compute pools can also reduce cold-start delays if you're launching new clusters for each run. A bit of tuning here can make a big difference; happy to help further if you share a specific job plan or logs!
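For example, here is a minimal sketch of how you might run that comparison (Scala, assuming a Spark JAR entry point with its own SparkSession; the conf keys are standard Spark settings, and the object/app names are just placeholders):

import org.apache.spark.sql.SparkSession

object AqeComparisonJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("aqe-comparison").getOrCreate()

    // Temporarily disable adaptive query execution for this run,
    // then compare stage counts and durations in the Spark UI.
    spark.conf.set("spark.sql.adaptive.enabled", "false")

    // If broadcast joins are suspected, the threshold can also be adjusted
    // for experimentation (Spark's default is 10 MB).
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", (10L * 1024 * 1024).toString)

    // ... existing job logic goes here ...

    spark.stop()
  }
}

The same spark.sql.adaptive.enabled setting can also go in the job cluster's Spark config instead of in code, which avoids rebuilding the JAR just for the test.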

Regards,

Brahma

anil_reddaboina
New Contributor II

Hey Brahma,
Thanks for your reply. As a first step, I will disable the AQE config and test it.

We are using node pools with the job compute cluster type, so it is not spinning up a new cluster for each job.

I'm also setting the two configs below. Do you think they could cause any side effects?

"spark.databricks.io.cache.enabled": "true",
"spark.databricks.io.cache.maxDiskUsage": "50g",

 

Thanks,

Anil 
