The images below are the DAG and the text execution summary from the Spark UI, and I'm having a hard time interpreting them. I have two questions.
1. In the Text Execution Summary, Duration total for WholeStageCodegen (2) says 38.3 m (0 ms, 2.7 m, 16.9 m (stage 2631.0: task 17953)). As I understand it, the three values in parentheses are the (min, med, max) across the 6 tasks. Is this correct? If so, how can the total completion time be 515935 ms (about 8.6 min), as shown in the upper left corner, when the longest task took 16.9 min, which is longer than the whole process took?
2. According to the DAG image, the whole process is executed in the following order.
- WholeStageCodegen(1)
- PhotonResultStage
- WholeStageCodegen(2)
- InMemoryTableScan(1)
- PhotonShuffleMapStage
- PhotonResultStage
- WholeStageCodegen(1)
- ResultQueryStage(17)
- AdaptiveSparkPlan(26)
I want to know how long each of the steps above takes to execute, but the Duration total values in the Text Execution Summary clearly don't add up to the completion time of the whole process, which is 8.6 min. How can I accurately determine the execution time of each step, or is there no way to know?
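
To make the mismatch in both questions concrete, here is the arithmetic as a small Python sketch. It uses only the numbers from the screenshots; the variable and function names are my own (hypothetical), and it assumes nothing about how Spark actually computes these metrics.

```python
# Figures copied from the Spark UI screenshots described in this question.
MS_PER_MIN = 60_000

completion_ms = 515_935       # completion time shown in the upper left corner
duration_total_min = 38.3     # "Duration total" for WholeStageCodegen (2)
task_min_min, task_med_min, task_max_min = 0.0, 2.7, 16.9  # (min, med, max)

# Convert the completion time to minutes so all values share one unit.
completion_min = completion_ms / MS_PER_MIN


def exceeds_completion(minutes: float) -> bool:
    """Return True if a reported duration is longer than the whole query."""
    return minutes > completion_min


print(f"completion time: {completion_min:.1f} min")
print(f"median task longer than query?    {exceeds_completion(task_med_min)}")
print(f"max task longer than query?       {exceeds_completion(task_max_min)}")
print(f"duration total longer than query? {exceeds_completion(duration_total_min)}")
```

Both the longest task and the Duration total come out longer than the completion time of the whole process, which is exactly what I can't explain.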