
How and when to capture the thread dump of the Spark driver?

User16869510359
Esteemed Contributor

What is the best way to capture a thread dump of the Spark driver process? Also, when should I capture one?

1 ACCEPTED SOLUTION

User16869510359
Esteemed Contributor

Steps to collect an executor thread dump:

  • Go to the cluster where the job is running and open the Spark UI.
  • Navigate to the stuck task by clicking the long-running job -> the long-running stage -> its tasks.
  • On the tasks page, note the "Task ID" and the "Host" of the stuck task.
  • Open the "Executors" tab in the Spark UI and click "Thread dump" for the executor whose host IP matches the one you noted.
  • This screen lists the active threads. Click the thread whose name contains the Task ID you noted on the tasks page.
  • The thread's stack trace shows which class and which function are currently executing.
  • To confirm that a task is stuck, take a thread dump every 2 minutes, 5 to 6 times. If the task's thread shows the same stack trace in every dump, the task is stuck.
  • The cause may be an external library you attached that is blocking the thread, or the job may be running in an infinite loop. Choose the corrective action based on the cause.
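The "compare 5 to 6 dumps" step above can be done by eye in the Spark UI, but a small script makes the comparison less error-prone. The following is only a sketch under assumptions: it expects each dump to have already been copied into a dictionary mapping thread name to stack trace text (that shape, the function name `unchanged_threads`, and the sample data are all hypothetical and not part of Spark or Databricks).

```python
from typing import Dict, List


def unchanged_threads(snapshots: List[Dict[str, str]], name_contains: str = "") -> List[str]:
    """Return names of threads whose stack trace is identical in every snapshot.

    snapshots: one dict per thread dump, mapping thread name -> stack trace text
               (an assumed shape; adapt it to however you saved the dumps).
    name_contains: optional substring filter, e.g. the task thread name prefix,
                   so that idle pool threads are not reported.
    """
    if len(snapshots) < 2:
        return []  # a single dump cannot show a lack of progress

    first = snapshots[0]
    stuck = []
    for name, stack in first.items():
        if name_contains not in name:
            continue
        # The thread must exist with the very same stack in all later dumps.
        if all(snap.get(name) == stack for snap in snapshots[1:]):
            stuck.append(name)
    return sorted(stuck)


# Hypothetical dumps taken 2 minutes apart (shortened to one frame each).
dumps = [
    {"Executor task launch worker for task 42": "com.example.Lib.waitForever(Lib.java:10)",
     "Executor task launch worker for task 43": "org.example.Parser.read(Parser.java:5)",
     "dispatcher-event-loop-1": "java.lang.Thread.sleep(Native Method)"},
    {"Executor task launch worker for task 42": "com.example.Lib.waitForever(Lib.java:10)",
     "Executor task launch worker for task 43": "org.example.Parser.parse(Parser.java:77)",
     "dispatcher-event-loop-1": "java.lang.Thread.sleep(Native Method)"},
    {"Executor task launch worker for task 42": "com.example.Lib.waitForever(Lib.java:10)",
     "Executor task launch worker for task 43": "org.example.Writer.flush(Writer.java:31)",
     "dispatcher-event-loop-1": "java.lang.Thread.sleep(Native Method)"},
]

print(unchanged_threads(dumps, name_contains="Executor task launch worker"))
# → ['Executor task launch worker for task 42']
```

Filtering by thread name matters because idle threads (such as the event loop thread in the sample data) also look identical from dump to dump without indicating a problem.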


2 REPLIES

User16869510359
Esteemed Contributor

For the Spark driver, the process is the same: select the driver on the Executors page and open its thread dump.
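If clicking through the UI repeatedly is inconvenient, open-source Spark's monitoring REST API also documents a thread dump endpoint per executor, and the driver appears in the executor list with the ID `driver`. The sketch below only builds that URL. The base URL and application ID shown are hypothetical, and whether the endpoint is reachable from your environment (for example behind the Databricks Spark UI proxy) is an assumption you would need to verify.

```python
from urllib.parse import quote


def thread_dump_url(base_url: str, app_id: str, executor_id: str = "driver") -> str:
    """Build the Spark REST API URL that returns an executor's thread dump.

    base_url: root of the Spark UI (hypothetical example: http://localhost:4040).
    app_id: Spark application ID, as shown in the Spark UI.
    executor_id: executor ID from the Executors page; "driver" selects the driver.
    """
    base = base_url.rstrip("/")
    return (
        f"{base}/api/v1/applications/{quote(app_id, safe='')}"
        f"/executors/{quote(executor_id, safe='')}/threads"
    )


# Hypothetical values for illustration only.
print(thread_dump_url("http://localhost:4040", "app-20240101000000-0000"))
# → http://localhost:4040/api/v1/applications/app-20240101000000-0000/executors/driver/threads
```

Passing an executor ID such as `"3"` instead of the default targets that executor rather than the driver.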

A thread dump is a snapshot of what every thread in the JVM is doing at that moment. Thread dumps are very useful for debugging issues where the JVM process is stuck or making extremely slow progress.

Thread dump collection is considered an advanced troubleshooting technique.
