03-17-2023 04:29 AM
Hi Team,
We have a complex ETL job running in Databricks that takes about 6 hours. The cluster has the below configuration:
Min workers: 16
Max workers: 24
Worker and driver node type: Standard_DS14_v2 (16 cores, 128 GB RAM)
I have monitored the job's progress in the Spark UI for an hour, and my observations are below:
- The jobs are progressing and are not stuck for long periods.
- The worker nodes scaled up to 24 (the configured max workers).
- Shuffle reads/writes involve a large amount of data. (I ran this job with spark.sql.shuffle.partitions set to 4000; see the sketch below.)
We expect the job to complete within 4 hours. Any suggestions to optimize its performance?
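For reference, here is how those shuffle settings can be applied, as a minimal Scala sketch assuming the usual `spark` session available in a Databricks notebook; the AQE lines are an assumption I would still need to test, not something already enabled in this job:

```scala
// Minimal sketch: apply the shuffle-partition setting mentioned above on an
// existing SparkSession named `spark` (available by default in Databricks).
spark.conf.set("spark.sql.shuffle.partitions", "4000")

// Assumption to test: Adaptive Query Execution can coalesce small shuffle
// partitions at runtime, which often helps shuffle-heavy jobs.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
```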
Regards,
Rajesh.
03-17-2023 07:35 AM
Hi @Rajesh Kannan R, can you check in the Spark UI which Spark jobs the application spends most of its time on? Also, look for any failed Spark jobs in the Spark UI.
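If it helps while the job is running, a small Scala sketch like the one below (using Spark's standard SparkStatusTracker and assuming the default `sc` SparkContext in a Databricks notebook) can complement the Spark UI by printing the progress and failed-task counts of the currently active stages:

```scala
// Sketch: print task progress and failed-task counts for the active stages,
// as a quick complement to browsing the Spark UI while the job runs.
val tracker = sc.statusTracker
tracker.getActiveStageIds().foreach { stageId =>
  tracker.getStageInfo(stageId).foreach { stage =>
    println(f"stage ${stage.stageId()}%5d  ${stage.name()}  " +
      f"tasks ${stage.numCompletedTasks()}/${stage.numTasks()}  " +
      f"failed ${stage.numFailedTasks()}")
  }
}
```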
03-17-2023 07:47 AM
Hi Lakshay,
Thank you for replying. One thing I noticed in the job descriptions in the Spark UI: each job with the description below takes an average of 15 minutes.
"save at StoreTransform.scala"
I am not sure whether it is custom code or Databricks code.
Regards,
Rajesh.
03-17-2023 07:55 AM
Hi @Rajesh Kannan R, it looks like custom code. Could you please share a task-level screenshot of one of these stages?
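For context, a job description like "save at StoreTransform.scala" normally points at a DataFrameWriter .save() call inside a file named StoreTransform.scala in your own codebase, roughly like the sketch below (the object name is taken from the Spark UI label, but the format, path, and structure are illustrative assumptions, not details from your job):

```scala
// Illustrative sketch: the kind of write, inside a file named StoreTransform.scala,
// that shows up in the Spark UI as "save at StoreTransform.scala".
import org.apache.spark.sql.{DataFrame, SaveMode}

object StoreTransform {
  def write(df: DataFrame, outputPath: String): Unit = {
    df.write
      .mode(SaveMode.Overwrite)
      .format("delta")     // format is an assumption; could also be "parquet"
      .save(outputPath)    // this .save() is the call the Spark UI job description references
  }
}
```

Knowing which write this corresponds to in your code will make the task-level screenshot easier to interpret.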
03-17-2023 08:03 AM
Hi Lakshay,
Unfortunately, I didn't capture it. I will share one the next time I run the job.
Regards,
Rajesh.
03-17-2023 12:06 PM
Sure. You can also try the below suggestions:
03-20-2023 11:54 PM
Hi @Lakshay Goel,
It will take a couple of days to test these recommendations. I will run the job with the new recommendations and update this thread.
Regards,
Rajesh.