If you observe a hung job, thread dumps are crucial for determining the root cause, so collect them before you cancel the job.
To collect a thread dump from the Spark driver or an executor:
- Go to the cluster where the SQL query or job is running and click "Spark UI".
- Click the "Executors" tab in the Spark UI, then click "Thread dump" in the row for the driver or the executor you want to inspect.
- This page lists the threads that are currently active in that driver or executor.
- Click "Expand All" to show the full stack trace of every thread. Newer versions also show a "Download" button next to "Expand All"; click it to save the dump, or right-click the page and save it as an HTML file.
- To confirm that a thread is stuck, take a thread dump every 1-2 minutes, 5-6 times in total. A thread whose stack trace stays the same across all of the dumps is likely stuck.
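The repeated-dump check in the last step can be sketched in Python. This is a minimal sketch, assuming you already have some way to capture one dump as a mapping from thread name to its state and stack-trace text (for example, by parsing the saved HTML pages). The names `collect_dumps`, `stuck_threads`, and the `fetch_dump` callable are hypothetical helpers for illustration, not part of Spark or Databricks.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical representation of one dump: thread name -> (thread state, stack trace text).
ThreadDump = Dict[str, Tuple[str, str]]


def collect_dumps(
    fetch_dump: Callable[[], ThreadDump],
    iterations: int = 6,
    interval_s: float = 90.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[ThreadDump]:
    """Take `iterations` dumps, waiting `interval_s` seconds between consecutive ones."""
    dumps: List[ThreadDump] = []
    for i in range(iterations):
        if i > 0:
            sleep(interval_s)  # no wait before the first dump
        dumps.append(fetch_dump())
    return dumps


def stuck_threads(dumps: List[ThreadDump]) -> List[str]:
    """Return the names of threads whose state and stack are identical in every dump."""
    if len(dumps) < 2:
        raise ValueError("need at least two dumps to compare")
    first = dumps[0]
    return sorted(
        name
        for name, snapshot in first.items()
        # A thread missing from a later dump has finished, so it is not stuck.
        if all(dump.get(name) == snapshot for dump in dumps[1:])
    )
```

Idle pool threads that are parked waiting for work also keep an identical stack from dump to dump, so treat the result as a list of candidates and focus on threads running task code, typically named with the prefix "Executor task launch worker".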