Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Slow-running Spark job due to unknown Spark stages created by the Databricks compute cluster

anil_reddaboina
New Contributor II

Hi Team,

We recently migrated our Spark jobs from a self-hosted Spark (YARN) cluster to Databricks.

We are currently using Databricks Workflows with job compute clusters and the Spark JAR task type. When we run the job in Databricks, we observe that it creates extra job stages, as shown in the image below. These extra stages take a significant amount of time, which delays the total job runtime.
Databricks Runtime: 16.1
Instance type: Standard_E16ds_v4
Can you please share your suggestions?

databricks_new_stages.png


2 REPLIES

Brahmareddy
Esteemed Contributor

Hi Anil,

How are you doing today? As per my understanding, when you move Spark jobs from a self-hosted YARN cluster to Databricks and run them as Spark JAR tasks on job compute clusters, it is normal to see a few extra stages in the job execution plan. These stages usually come from Databricks' built-in features such as adaptive query execution (AQE), automatic optimizations, or internal tracking. While these help with performance tuning, they can sometimes increase the total runtime if not tuned well.

I'd suggest temporarily disabling AQE (set spark.sql.adaptive.enabled to false) and reviewing the job stages in the Spark UI to see what is taking the time. Also, double-check whether broadcast joins or data skew might be causing shuffle delays. Using compute pools can also reduce cold-start delays if you're launching a new cluster for each run. A bit of tuning here can make a big difference. Happy to help further if you share a specific job plan or logs!
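If it helps, here is a rough Scala sketch of how that experiment could look inside the JAR's entry point. The object name and the list of configs printed are only illustrative, not anything specific to your job, and you could just as well set spark.sql.adaptive.enabled to false in the cluster's Spark config instead of in code.

import org.apache.spark.sql.SparkSession

object AqeExperiment {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aqe-experiment")
      .getOrCreate()

    // Temporarily disable AQE for this run only, so the stage graph is
    // easier to compare against the old YARN runs.
    spark.conf.set("spark.sql.adaptive.enabled", "false")

    // Print the settings that most often explain extra or skipped stages,
    // so you can compare them between the two environments.
    Seq(
      "spark.sql.adaptive.enabled",
      "spark.sql.adaptive.coalescePartitions.enabled",
      "spark.sql.autoBroadcastJoinThreshold",
      "spark.sql.shuffle.partitions"
    ).foreach { key =>
      println(s"$key = ${spark.conf.getOption(key).getOrElse("<not set>")}")
    }

    // ... your existing job logic goes here ...

    spark.stop()
  }
}

If the extra stages disappear with AQE off but the runtime does not improve, the time is more likely going into shuffles or cache warm-up than into AQE itself.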

Regards,

Brahma

anil_reddaboina
New Contributor II

Hey Brahma,
Thanks for your reply. As a first step, I will disable the AQE config and test it.

We are using node pools with the job compute cluster type, so it's not spinning up a new cluster for each job.

I'm also setting the two configs below. Do you think they could cause any side effects?

"spark.databricks.io.cache.enabled": "true",
"spark.databricks.io.cache.maxDiskUsage": "50g",

 

Thanks,

Anil