Debugging difference between "task time" and execution time for SQL query
10-23-2024 11:35 AM
I have a pretty large SQL query that has the following stats from the query profiler:
Tasks total time: 1.93s
Executing: 27s
According to the query profiler, this gap can be caused by tasks waiting for available nodes.
How should I approach this to figure out where this is happening?
10-23-2024 12:37 PM
Hi nengen,
Could you share more information so we can help you?
10-25-2024 04:01 AM
I have a pretty complex and large SQL query that does a lot of joins on CTEs. Due to the nature of the data, these have to be cross joins, so I suspect that is why it is slow. I was hoping to pinpoint where the tasks are waiting for available nodes, or where the query spends most of its wall-clock time. I tried the query profiler, but it seems to show the execution time of the individual tasks rather than the whole process.
10-25-2024 06:34 AM
@nengen Try using EXPLAIN EXTENDED: it provides a detailed breakdown of the logical and physical plans of a query in Spark SQL.
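As a minimal sketch (the table and CTE names here are placeholders, not from your actual query), you just prefix the statement:

```sql
-- Placeholder tables/CTEs; substitute your real query.
EXPLAIN EXTENDED
WITH recent_orders AS (
  SELECT * FROM orders WHERE order_date >= '2024-01-01'
)
SELECT c.customer_id, o.order_id
FROM customers c
CROSS JOIN recent_orders o;
```

The output shows the parsed, analyzed, optimized, and physical plans, which lets you see which joins become broadcast exchanges and whether filters are pushed down to the scans.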
Based on the EXPLAIN EXTENDED output, here are a few things to consider:
- Broadcast Exchange: If the join causes data skew, consider switching to a sort-merge join (see the sketch after this list).
- FileScan: If the scan is slow, consider partitioning or caching the data to improve performance.
- Filter Pushdown: Ensure the most restrictive filters are applied early to reduce the amount of data processed.
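For the first point, a minimal sketch, assuming the skewed join is on a table named orders (a placeholder name): you can hint Spark toward a sort-merge join, or disable automatic broadcast joins for the session:

```sql
-- Hint Spark to use a sort-merge join for this join (table names are placeholders)
SELECT /*+ MERGE(o) */ o.order_id, c.customer_id
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Alternatively, disable automatic broadcast joins session-wide
SET spark.sql.autoBroadcastJoinThreshold = -1;
```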
Please review the full EXPLAIN EXTENDED output for more details.

