Ashwin_DSA
Databricks Employee

Hi @Malthe,

The discrepancy you’re seeing between Aggregated Task Time (1.94h) and the actual Execution Time (~2.45m) is actually a sign of healthy parallelism, not an error. Here is how to interpret those two numbers:

  • Aggregated task time (1.94h) is the sum of the time spent by every individual task across all CPU cores in your cluster. If 10 cores each work for 2 minutes, you have spent 20 minutes of task time.
  • Execution time (2.45m) is the wall-clock time: the actual duration you waited for the query to finish.
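
To make the relationship concrete, here is a rough back-of-the-envelope sketch. It assumes 2.45m means 2.45 decimal minutes (if it is really 2 min 45 s, the result comes out a little lower), and the function name `average_parallelism` is just an illustrative helper, not a Databricks API.

```python
def average_parallelism(aggregated_task_hours: float, execution_minutes: float) -> float:
    """Average number of tasks running at once, implied by the two metrics."""
    aggregated_task_minutes = aggregated_task_hours * 60
    return aggregated_task_minutes / execution_minutes


# The toy example from the bullet above: 10 cores each busy for 2 minutes.
toy_task_minutes = 10 * 2  # 20 minutes of aggregated task time

# Numbers from the query profile (2.45m read as decimal minutes, an assumption).
implied = average_parallelism(1.94, 2.45)
print(f"Roughly {implied:.1f} tasks running concurrently on average")
```

This is only an average over the whole query. The real concurrency will vary from stage to stage.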

The gap looks large because the query is doing a significant amount of work. The IO metrics show around 9 billion rows read from approximately 50K files. To process that many rows and scan around 20 terabytes of data in under 3 minutes, your serverless warehouse distributed the load across a large number of concurrent tasks. The 1.94 hours is simply the total effort of all those parallel workers added together.
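
For a sense of scale, the same figures can be turned into an approximate throughput. This is a sketch only: the inputs are the rounded values quoted above, 20 terabytes is taken as decimal TB, and 2.45m is read as decimal minutes, so treat the output as an order-of-magnitude estimate.

```python
ROWS_READ = 9e9                 # "around 9 billion" rows
BYTES_SCANNED = 20e12           # "around 20 terabytes", taken as decimal TB (assumption)
EXECUTION_SECONDS = 2.45 * 60   # 2.45m read as decimal minutes (assumption)

rows_per_second = ROWS_READ / EXECUTION_SECONDS
gb_per_second = BYTES_SCANNED / EXECUTION_SECONDS / 1e9

print(f"~{rows_per_second / 1e6:.0f} million rows/s, ~{gb_per_second:.0f} GB/s")
```

No single core can sustain rates like these, which is why the work has to be spread across many parallel tasks.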

On a positive note, this is a great sign. 🙂 It shows that almost all of the heavy lifting was handled by the high-performance C++ engine rather than the standard Spark JVM.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***