Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Aggregated task time not accounted for in executions

Malthe
Valued Contributor II

The following serverless query has an aggregated task time of 1.94h, but each of the two executions runs in an aggregated time of just a few minutes.

How is one supposed to make sense of that?

Screenshot 2026-03-06 at 09.52.01.png


12 REPLIES

saurabh18cs
Honored Contributor III

Hi, this is interesting. However, I noted this definition for aggregated task time:

Combined time it took to execute the query across all cores of all nodes. It can be significantly longer than the wall-clock duration if multiple tasks executed in parallel. It can be shorter than the wall-clock duration if tasks waited for available nodes.

So basically, the aggregated task time is not the execution time but the sum of all individual task times. For example, if you have 5 tasks that each ran for 1 minute in parallel, the execution time might be 1 minute but the aggregated task time would be 5 minutes.
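The arithmetic behind that example can be sketched in a few lines (illustrative numbers only):

```python
# Illustrative numbers from the example above: 5 tasks, 1 minute each,
# all running fully in parallel.
task_durations_min = [1.0] * 5

# Aggregated task time sums every task's duration across all cores.
aggregated_task_time = sum(task_durations_min)  # 5.0 minutes

# Wall-clock duration is bounded by the longest task when all run in parallel.
wall_clock = max(task_durations_min)  # 1.0 minute
```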

 

Malthe
Valued Contributor II

This query resulted in two executions. I have highlighted one in the screenshot, where it says that the time shown there is also aggregated (across cores). So how can two executions, each with an aggregated time of a few minutes, add up to almost two hours?

From the system billing table the query cost about 8 DBU, which are basically unaccounted for.

Ashwin_DSA
Databricks Employee

Hi @Malthe,

The discrepancy you’re seeing between Aggregated Task Time (1.94h) and the actual Execution Time (~2.45m) is actually a sign of healthy parallelism, not an error. Here is how to interpret those two numbers:

  • Aggregated task time (1.94h) is the total sum of work done by every individual CPU core across your cluster. If 10 cores each work for 2 minutes, you have technically spent 20 minutes of task time.
  • Execution time (2.45m) is the wall-clock duration: the actual time you waited for the query to finish.

The gap looks large because significant work is happening. The IO metrics show roughly 9 billion rows read across approximately 50K files. To process that many rows and scan around 20 terabytes of data in under 3 minutes, your serverless warehouse distributed the load across a large number of concurrent tasks. The 1.94 hours is simply the total effort of all those parallel workers added together.
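A quick back-of-the-envelope check of those figures shows the implied average parallelism (illustrative arithmetic only, not an exact scheduler count):

```python
# Figures quoted above: 1.94h aggregated task time, ~2.45min wall clock.
aggregated_task_time_min = 1.94 * 60  # convert hours to core-minutes
wall_clock_min = 2.45                 # observed execution time in minutes

# Average parallelism ≈ total core-minutes / wall-clock minutes.
avg_parallelism = aggregated_task_time_min / wall_clock_min
print(round(avg_parallelism))  # roughly 48 cores busy on average
```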

On a positive note, this is a great sign. 🙂 It shows that almost all the heavy lifting was handled by the high-performance C++ Photon engine rather than the standard Spark JVM.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Malthe
Valued Contributor II

@Ashwin_DSA the job does not scan terabytes of data, all of that is pruned away (see the screenshot); it actually reads just 28 GB of data.

But again, the time is not accounted for, despite the user interface saying that the executions are shown in aggregated time as well. It just doesn't match. And there's also no way to debug this–you simply can't see where the time is spent.

Ashwin_DSA
Databricks Employee

Hi @Malthe,

Sorry, I should have been more precise with the terminology. You are right: it is not reading those terabytes of data from the storage layer into the CPU.

Your snapshot shows around 20 terabytes of bytes pruned. Although the cluster didn't actually read the data content of those 20 terabytes, it still had to spend task time processing the metadata for approximately 260K pruned files, and reading that metadata takes time. Then there is the time to read the other 50K files: even if each contributed only a small slice toward the 28 GB total, the overhead of opening 50K separate cloud storage connections is significant. I believe the time you describe as unaccounted for is spent here.

I appreciate what you are saying about the UI not being explicit about this. Having done some research, can you try going a level deeper than the summary table? This may help pinpoint exactly where those hours went.

  1. Click the Merge Into... box in the center of your screen.
  2. Scroll down to the Task Metrics table.
  3. Look for something like Scheduler Delay and Metadata Operations.

If Scheduler Delay is high, the time was spent waiting for the cluster to coordinate the 50,000 file reads.

If Task Time is high but Data Read is low, the time was spent opening file headers to check stats.

Let me know how that goes.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Malthe
Valued Contributor II

@Ashwin_DSA there are no such metrics available here:

Screenshot 2026-03-06 at 18.05.30.png

Also interesting is that the other execution just has 0's for every single value. No time spent, 0 for everything.

I might be able to get more detailed information if writing first to a staging table, then doing the merge. I believe merge operations hide most of the query execution details currently.

Ashwin_DSA
Databricks Employee

Thanks, @Malthe for checking that. 

Since I can't replicate your exact environment, I can only assume that the missing time and the associated DBU cost are probably due to parallelism. 

The fact that Photon accounts for 95% of the task time reinforces that it’s performing vectorised compute across a lot of data... and that the cluster is simply doing a large amount of work in parallel.

Regarding your earlier comment about the DBU cost being unaccounted for: DBUs are a measure of compute used over time (roughly, cores × time × SKU factor). So if your query runs for a few minutes on a serverless cluster with multiple cores, it's completely plausible to see something like 8 DBUs, even though:

  • the query only ran for a few minutes end‑to‑end, and
  • the UI shows 1.9 hours of aggregated task time.

In other words, those 8 DBUs are accounted for: they correspond to the parallel work the cluster did, just expressed in a different unit. If you want to sanity-check further, compare the bytes/rows processed in the query profile with the DBUs in the billing table for the same statement ID. If it still looks off, it's worth opening a support ticket so the internal metrics for that specific run can be reviewed.
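That "cores × time × SKU factor" relationship can be expressed as a trivial sketch; the rate used in the example below is a made-up placeholder, not a published Databricks price:

```python
# Trivial sketch of "DBUs ≈ cores × time × SKU factor". The rate passed in
# is a made-up placeholder, not a published Databricks price.
def estimate_dbus(core_count: int, hours: float, dbu_per_core_hour: float) -> float:
    """Estimate DBUs as core-hours scaled by a per-SKU rate."""
    return core_count * hours * dbu_per_core_hour
```

For example, `estimate_dbus(4, 2, 1.0)` gives 8.0 under a placeholder rate of 1 DBU per core-hour.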

Hope this helps.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Malthe
Valued Contributor II

I understand the difference between wall-clock and aggregated time across cores, it's documented under query profiling.

According to the pricing docs, presently a DBU is roughly equivalent to a DS3 v2 instance with 4 cores. What I mean by unaccounted for is that we have a query that runs for a total aggregated time of 1.94h with a wall-clock runtime of 4-5 minutes. The math checks out because that's roughly 24 core hours and matches the expectations of 8 DBUs with the 4-core instance mentioned above.

The problem is that the execution breakdown accounts for just a fraction of that and it purports to be the complete picture. It's basically totally broken for this type of query.

I changed the processing now to first materialize to a staging table, then doing the merge. This reduces the runtime to 40 minutes, but the query details are still lacking any fidelity with the actual aggregated runtime:

Screenshot 2026-03-07 at 08.10.31.png

Observations:

  1. The total time across executions (in the dropdown) doesn't match the total aggregated query time and it also does not match the wall-clock time.
  2. Within each execution, the purported aggregated time spent in the various boxes does not match the reported total execution time. In the screenshot, we see a single task with an aggregated time of 38.34s, but the whole execution is reported to take 35.23s.

Many task types contribute no metrics at all:

Screenshot 2026-03-07 at 08.14.55.png

In summary, there's basically no transparency here. The total cost of a query simply can't be traced back to its constituents in any meaningful way. That's a significant setback from classic compute where the Spark UI along with basic compute metrics did in fact account for the time spent and the resulting cost.

Ashwin_DSA
Databricks Employee

Hi @Malthe,

Thanks for sharing the extra snapshots and observations. 

You’re absolutely right that the core‑hour math and the 8 DBUs line up. The issue you’re hitting is that the query profile UI isn’t a fully additive cost‑accounting view, especially for complex DML and serverless.

A few clarifications based on how the system works today:

  • The “Aggregated task time” metric is the sum of task time across all cores and all tasks/attempts for the query, and it’s expected to diverge from wall‑clock when there’s parallelism or waiting for resources. That’s the number that matches your core‑hour/DBU math.

  • The execution dropdown and per‑operator boxes are aggregated differently:
    • Some non‑Photon and high‑level operations (like INSERT / MERGE wrappers) are grouped and share metrics with a parent operator, so the child or parent can show “0 ms / 0 bytes” even though work happened underneath.
    • By default, metrics for certain operations are hidden, and only show up when you enable verbose mode in the profile.
    • Because of those aggregation choices, the sum of execution times and box times doesn't necessarily equal the global aggregated task time or wall-clock, which is why it feels non-reconcilable.

non-photon.jpg

 

  • For a more Spark‑UI‑style breakdown, you can:
    • Open the run in the Spark UI from the query profile (kebab menu → Open in Spark UI).
    • Use the Query History API together with the system billing tables to correlate query/task metrics with DBUs at whatever granularity you need.
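To make that correlation concrete, here is a heavily hedged Python sketch. The endpoint path and the `include_metrics` parameter come from the SQL Query History API, but the response-field names used below (`res`, `task_total_time_ms`) are assumptions to verify against the actual payload in your workspace:

```python
import json
import urllib.request

# Hedged sketch: verify the response-field names (res, task_total_time_ms)
# against your workspace's actual Query History API payload before use.

def fetch_queries(host: str, token: str) -> list:
    """List recent queries with metrics from the Query History API."""
    url = f"https://{host}/api/2.0/sql/history/queries?include_metrics=true"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("res", [])

def task_core_hours(metrics: dict) -> float:
    """Convert the (assumed) task_total_time_ms metric into core-hours,
    the unit that should line up with the DBU math in the billing tables."""
    return metrics["task_total_time_ms"] / (1000 * 60 * 60)
```

A query showing 1.94h of aggregated task time would come back as roughly 1.94 core-hours from `task_core_hours`, which can then be compared against the DBUs billed over the same window.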

So from a billing perspective, the DBUs you’re seeing are consistent with the aggregated task time. From a transparency perspective, your criticism is fair. The current query profile is optimised for finding bottlenecks, not yet for reconstructing every billed minute. Feedback like this is exactly what we’re using to improve the serverless profiling and cost‑attribution experience.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

SteveOstrowski
Databricks Employee

Hi @Malthe,

There are two separate things going on here, and they are likely independent of each other. Let me break them down.

TASK TIME vs EXECUTION TIME DISCREPANCY

The gap between aggregated task time (1.39m) and total execution time (4.14m) is expected behavior and is not caused by the segfault. Aggregated task time only measures the time that Spark tasks spend actively computing on executors. It does not include:

- Scheduler delay (time between task submission and actual execution)
- Serialization and deserialization of task results
- Shuffle read/write I/O wait time
- Driver-side orchestration overhead
- Spark Connect gRPC round-trip communication (this is especially relevant on serverless)

On serverless compute, all Spark interactions go through the Spark Connect API over gRPC. This adds a communication layer between your driver code and the Spark execution engine. For a MERGE operation inside foreachBatch, there are multiple planning, optimization, and coordination steps that happen on the driver side and over gRPC that do not count as "task time" but do count toward total execution duration.

Additionally, the MERGE operation itself has phases (reading the target table, computing the join, writing deletes/updates/inserts) where the time between phases is driver coordination, not active task execution.

You can verify this by looking at the Spark UI for that run. Under the SQL/DataFrame tab, you will see the query plan and can see time spent in different stages. The "Duration" column in the Stages tab will also show scheduling delay and other overhead separately from task compute time.

SEGMENTATION FAULT AND gRPC CHANNEL CLOSED ERRORS

The "Fatal Python error: Segmentation fault" combined with the StreamingQueryListenerBus gRPC "Channel closed" warnings points to an issue in the serverless runtime itself, not in your PySpark code. Since you confirmed you are using vanilla PySpark with no external libraries on environment version 5, this is a runtime-level crash.

The gRPC "Channel closed" messages are a symptom of the crash, not the cause. When the underlying process crashes (SIGSEGV), the Spark Connect gRPC channel is forcibly closed, which triggers those StreamingQueryListenerBus warnings.

A few things to check and try:

1. Trigger type: if you are using Trigger.Once, note that it has been deprecated since Databricks Runtime 11.3 LTS. Switch to Trigger.AvailableNow, which is the supported trigger for incremental batch processing on serverless.
https://docs.databricks.com/en/structured-streaming/triggers.html

2. Caching in foreachBatch: the Databricks documentation recommends caching the batch DataFrame before a MERGE to avoid reading the input multiple times. However, on serverless compute, DataFrame caching (df.cache(), spark.catalog.cacheTable()) is not supported and will throw exceptions. If your foreachBatch code includes any caching calls, remove them.
https://docs.databricks.com/en/structured-streaming/foreach.html

3. MERGE idempotency: make sure your MERGE statement inside foreachBatch is idempotent, as restarts can re-apply the same batch.
https://docs.databricks.com/en/structured-streaming/delta-lake.html

4. Batch size: if the MERGE is processing a very large batch, try limiting the input with maxBytesPerTrigger or maxFilesPerTrigger to reduce per-batch memory pressure. Heavy memory use during native Delta I/O operations can contribute to OS-level process termination that surfaces as SIGSEGV.

5. File a support ticket: since this is a native crash in the managed serverless runtime with vanilla PySpark, the root cause is most likely in the runtime itself. Open a support ticket and include the workspace URL, the job run URL, and the exact environment version. The engineering team can examine the crash dump and determine whether this is a known issue with a fix in a newer environment version.
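Items 1 (Trigger.AvailableNow) and 4 (rate limiting) above can be sketched roughly as follows. `source_table`, the checkpoint path, and the limit values are placeholders, and the snippet assumes a PySpark `SparkSession` (`spark`) and a `foreachBatch` callable (`merge_batch`) are supplied by the caller:

```python
# Sketch only: "source_table", the checkpoint path, and the limit values
# below are placeholders; adjust them to your pipeline.
rate_limit_options = {
    "maxFilesPerTrigger": "1000",  # cap files ingested per micro-batch
    "maxBytesPerTrigger": "10g",   # cap bytes ingested per micro-batch
}

def start_stream(spark, merge_batch):
    """Start an incremental-batch stream that applies a MERGE per batch."""
    return (
        spark.readStream
        .options(**rate_limit_options)
        .table("source_table")
        .writeStream
        .trigger(availableNow=True)  # supported replacement for Trigger.Once
        .option("checkpointLocation", "/tmp/checkpoints/demo")
        .foreachBatch(merge_batch)
        .start()
    )
```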

The task time discrepancy is normal and unrelated to the crash. The segfault is the issue that needs attention, and a support ticket is the best path to resolution for a native crash on serverless.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Malthe
Valued Contributor II

@Ashwin_DSA by the way, "Open in Spark UI" is available only for SQL Warehouse; it's not available for serverless notebooks or jobs.

Ashwin_DSA
Databricks Employee

Hi @Malthe,

Good callout on Spark UI access. You are right. The "Open in Spark UI" link is only documented and supported for Databricks SQL queries on SQL warehouses, not for serverless notebooks or jobs. So for pure serverless jobs/notebooks, you don’t currently have that fallback to the classic Spark UI, which makes the gaps in the query profile UI even more visible.

I agree this still falls short of the "every billed minute is traceable" experience you are used to from classic Spark UI, especially on serverless. Your feedback about making the serverless profiling view reconcile cleanly with DBUs (and exposing more of the hidden/grouped work) is completely valid and is being passed on to the product team.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***