How to interpret Spark UI
02-10-2025 02:27 AM
The images below are the DAG and the text execution summary from the Spark UI, and I'm having a hard time interpreting them. I have two questions.
1. In the text execution summary, the Duration total for WholeStageCodegen (2) says 38.3 m (0 ms, 2.7 m, 16.9 m (stage 2631.0: task 17953)). My understanding is that (min, med, max) shows those values among the 6 tasks. Is this correct? If so, how can the total completion time be 515935 ms (about 8.6 min), as shown in the upper left corner, when the longest task took 16.9 min, which is longer than the whole process took?
2. According to the DAG image, the whole process executes in the following order.
- WholeStageCodegen(1)
- PhotonResultsStage
- WholeStageCodegen(2)
- InMemoryTableScan(1)
- PhotonShuffleMapsStage
- PhotonResultStage
- WholeStageCodegen(1)
- ResultQueryStage(17)
- AdaptiveSparkPlan(26)
I want to know how long each of the above steps takes to execute, but if you look at the Duration totals in the text execution summary, their sum clearly doesn't match the completion time of the whole process, which is 8.6 min. How can I accurately figure out the execution time of each step, or is there no way to know that?
- Labels: Spark
02-10-2025 04:38 AM
Hello @Junda,
The Duration total for WholeStageCodegen (2) indicates 38.3 minutes, with the breakdown (0 ms, 2.7 m, 16.9 m) giving the (min, med, max) values among the tasks. Your understanding is correct: these values represent the minimum, median, and maximum durations among the tasks. The discrepancy arises because the total completion time of 515935 ms (about 8.6 minutes) shown in the upper left corner represents the overall time taken for the entire job, not just the longest task. The longest task duration (16.9 minutes) can exceed the total job duration because of parallel task execution: multiple tasks run concurrently, so the total job duration is not simply the sum of the individual task durations.
The execution order you took from the DAG image includes multiple stages such as WholeStageCodegen, PhotonResultsStage, InMemoryTableScan, etc. The reason the sum of the Duration totals in the text execution summary doesn't match the overall completion time of 8.6 minutes is, again, the parallel execution of tasks and stages. Each stage can have many tasks running in parallel, and the total job duration reflects the time from the start of the first task to the end of the last task, not the sum of all individual task durations.
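To make the parallelism point concrete, here is a minimal sketch with made-up task intervals (not numbers from your job) that computes both quantities: the summed task time, which is what a "Duration total" aggregates, and the wall-clock time of the whole run.

```python
# Minimal sketch (hypothetical intervals, not from the job above) showing why
# the summed task time can far exceed the job's wall-clock time when tasks
# run concurrently on different executor cores.

# Each task is a (start, end) pair in seconds on a shared clock.
tasks = [
    (0, 300),    # task 1: 5 min
    (0, 290),    # task 2 runs at the same time on another core
    (10, 310),   # task 3
    (15, 305),   # task 4
]

total_task_time = sum(end - start for start, end in tasks)   # what "Duration total" sums
wall_clock = max(end for _, end in tasks) - min(start for start, _ in tasks)

print(f"sum of task durations: {total_task_time} s")   # 1180 s (~19.7 min)
print(f"job wall-clock time:   {wall_clock} s")        # 310 s (~5.2 min)
```

With four cores working at once, roughly 20 minutes of task time fits inside a 5-minute wall-clock window, which is the same shape as your 38.3 m total versus 8.6 m job duration.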
02-11-2025 12:33 AM
@Junda wrote: (original question quoted above)
The Spark UI's Duration total for a stage reflects the durations of the tasks within that stage. However, because tasks execute in parallel, the total job completion time is not simply the sum of the longest task durations across stages. To accurately determine the execution time of each stage, you'll need to analyze more detailed logs or use profiling tools; the Spark UI provides valuable insights, but it has limitations in directly revealing stage-level execution times.
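One way to get per-stage wall-clock windows programmatically is Spark's monitoring REST API, which exposes submission and completion timestamps for each stage. Below is a minimal sketch assuming a directly reachable Spark UI; the host, port, and application index are placeholders, and on Databricks the UI is proxied, so the base URL will differ.

```python
# Sketch: read per-stage timing from Spark's monitoring REST API
# (https://spark.apache.org/docs/latest/monitoring.html).
# The base URL is a placeholder for a locally reachable driver UI.
import requests

BASE = "http://localhost:4040/api/v1"

# Take the first (here: only) running application's id.
app_id = requests.get(f"{BASE}/applications").json()[0]["id"]

for stage in requests.get(f"{BASE}/applications/{app_id}/stages").json():
    # submissionTime/completionTime bound the stage's wall-clock window;
    # executorRunTime (ms) is the summed task time, which can be much larger.
    print(
        stage["stageId"],
        stage["status"],
        stage.get("submissionTime"),
        stage.get("completionTime"),
        stage.get("executorRunTime"),
    )
```

The difference between completionTime and submissionTime gives each stage's wall-clock span, which is the quantity the Duration totals in the SQL summary do not directly report.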
02-12-2025 02:56 AM
Hi @Alberto_Umana and @jay250terry, thank you for your replies. I know that Spark executes tasks in parallel and that the sum of the individual task execution times does not correspond to the overall job duration.
What I don't get from the text execution summary attached to my previous post is how a single task's execution time (16.9 minutes in this case) can exceed the total job execution time. The file attached to this post shows a visual image of parallel task execution, and I understand that the sum of the task execution times can exceed the total job execution time because the tasks run in parallel. However, even with parallel execution, isn't it impossible for a single task's execution time to exceed the total job duration? It would be understandable if the single longest task took at most 8.6 minutes, but anything beyond that doesn't make sense.
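The invariant being described here can be stated directly: if each task's duration is a plain wall-clock interval lying inside the job's window, then the longest single task can never exceed the job duration. A small sketch with made-up numbers (it says nothing about what the 16.9 m metric actually measures in this job, only what would hold under plain wall-clock semantics):

```python
# Sketch of the poster's invariant: if every task's (start, end) interval lies
# inside the job's wall-clock window, no single task duration can exceed the
# window's length. Intervals are illustrative, not from the job in question.
tasks = [(0, 120), (30, 400), (50, 510)]          # seconds on a shared clock

job_start = min(start for start, _ in tasks)
job_end = max(end for _, end in tasks)
wall_clock = job_end - job_start                   # 510 s

longest_task = max(end - start for start, end in tasks)   # 460 s

# Under these semantics the invariant always holds.
assert longest_task <= wall_clock
print(f"wall clock: {wall_clock} s, longest task: {longest_task} s")
```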

