Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT Query History

ankit001mittal
New Contributor III

Hi guys,

I can see that DLT pipelines have a query history section where we can see the duration of each table and the number of rows read.

[Screenshot: DLT pipeline query history showing per-table duration and rows read]

Is this information stored somewhere in the system catalog? Can I query it?

8 REPLIES

Walter_C
Databricks Employee

This might be available in the query history system table https://docs.databricks.com/aws/en/admin/system-tables/query-history 
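If your queries do land in that table, a minimal sketch of the kind of lookup you could run (column names taken from the linked docs; adjust the filter and limit to your needs):

```sql
-- Hypothetical lookup against the query history system table
-- (columns per the linked system.query.history docs).
SELECT
  statement_text,
  executed_by,
  start_time,
  end_time,
  total_duration_ms,
  read_rows
FROM system.query.history
WHERE start_time >= current_date() - INTERVAL 7 DAYS
ORDER BY total_duration_ms DESC
LIMIT 50;
```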

ankit001mittal
New Contributor III

Hi @Walter_C ,
From what I understood from this article, it doesn't include records for DLT pipelines yet:

The query history table includes records for queries run using SQL warehouses or serverless compute for notebooks and jobs. The table includes account-wide records from all workspaces in the same region from which you access the table.

 

aayrm5
Honored Contributor

@ankit001mittal 
As mentioned in an earlier response, you can use the event log to check how many records were processed in each update.
Here's the query I've written to select the info from the event_log:

[Screenshot: SQL query selecting row metrics from the DLT event_log]

[Screenshot: query output showing records processed in each update]

Your query may end up more complex, as I ran my DLT pipeline only once to produce this output, but the information does exist in the DLT event log.
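For reference, a rough sketch along the same lines (the event_log() table-valued function, origin.flow_name, and the details:flow_progress JSON path follow the standard DLT event log schema; my_catalog.my_schema.my_table is a placeholder for one of your pipeline's tables):

```sql
-- Rows written per flow (table) per update, pulled from the DLT event log.
SELECT
  origin.update_id,
  origin.flow_name,
  timestamp,
  details:flow_progress.metrics.num_output_rows::bigint AS num_output_rows
FROM event_log(TABLE(my_catalog.my_schema.my_table))
WHERE event_type = 'flow_progress'
  AND details:flow_progress.metrics.num_output_rows IS NOT NULL
ORDER BY timestamp;
```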

Let me know if any questions,

Riz

Hi @aayrm5 ,

Thanks for sharing your query. I have also managed to write a query that retrieves all the upserts into an object/table in DLT pipelines, but I am struggling to figure out how to break down the timestamps from the logs to determine how much time was spent waiting for resources and how much time was spent executing each table/view in the pipeline.

Hi @ankit001mittal 

You have timestamp and message columns in the event log; you can use these to calculate the time taken.

For example, to calculate the time taken to allocate resources, you could use a function such as datediff to compute the difference between the timestamps of two records; check the snip below:

[Screenshot: datediff between two event log timestamps to get resource allocation time]
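A rough sketch of that calculation, assuming the standard update_progress state events (the WAITING_FOR_RESOURCES and INITIALIZING state names and the placeholder table are assumptions to verify against your own event log):

```sql
-- Approximate time spent waiting for resources per update:
-- gap between the WAITING_FOR_RESOURCES event and the next INITIALIZING event.
WITH states AS (
  SELECT
    origin.update_id,
    timestamp,
    details:update_progress.state AS state
  FROM event_log(TABLE(my_catalog.my_schema.my_table))
  WHERE event_type = 'update_progress'
)
SELECT
  update_id,
  datediff(SECOND,
           min(CASE WHEN state = 'WAITING_FOR_RESOURCES' THEN timestamp END),
           min(CASE WHEN state = 'INITIALIZING' THEN timestamp END)) AS wait_seconds
FROM states
GROUP BY update_id;
```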

Similarly, for the time taken to execute each table, you will have messages such as `<Table_Name>` is RUNNING and `<Table_Name>` is COMPLETED. Get the difference between those timestamps.
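A sketch of that per-table calculation; instead of parsing the message text, this version keys off the flow_progress status field (the RUNNING/COMPLETED statuses and origin.flow_name are taken from the usual event log schema, so verify them against your logs):

```sql
-- Per-table execution time: difference between each flow's RUNNING and COMPLETED events.
WITH flow_events AS (
  SELECT
    origin.update_id,
    origin.flow_name,
    details:flow_progress.status AS status,
    timestamp
  FROM event_log(TABLE(my_catalog.my_schema.my_table))
  WHERE event_type = 'flow_progress'
)
SELECT
  update_id,
  flow_name,
  datediff(SECOND,
           min(CASE WHEN status = 'RUNNING'   THEN timestamp END),
           max(CASE WHEN status = 'COMPLETED' THEN timestamp END)) AS run_seconds
FROM flow_events
GROUP BY update_id, flow_name
ORDER BY run_seconds DESC;
```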

Riz

ankit001mittal
New Contributor III

Also, in the DLT logs I can see this:

{"flow_progress":{"metrics":{"executor_time_ms":9004,"executor_cpu_time_ms":3248}}}

Do you know what both executor_time_ms and executor_cpu_time_ms represent?

Hey @ankit001mittal 

The detail `{"flow_progress":{"metrics":{"executor_time_ms":9004,"executor_cpu_time_ms":3248}}}` belongs to the event_type - cluster resources. It likely represents how much time an executor in the cluster took to finish the tasks assigned to it. I wouldn't use this time to calculate the overall time taken to execute the table.

Depending on your cluster size, your table execution would have been divided into numerous tasks in the Spark job and submitted to the executors, so it would be hard to collate which executor is taking care of which table.

Riz

aayrm5
Honored Contributor

Hey @ankit001mittal - if any of the above responses answered your questions, kindly mark it as a solution.

Thanks,

Riz
