04-22-2025 06:14 AM
Hi guys,
I can see that DLT pipelines have a query history section where we can see the duration of each table and the number of rows read.
Is this information stored somewhere in the system catalog? Can I query it?
04-22-2025 08:41 AM
This might be available in the query history system table https://docs.databricks.com/aws/en/admin/system-tables/query-history
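A minimal sketch of querying it (the table name `system.query.history` and the columns are from the linked docs; this assumes the system table is enabled for your account):

```sql
-- Slowest statements over the last 7 days, with rows read.
SELECT
  statement_id,
  executed_by,
  statement_text,
  total_duration_ms,
  read_rows
FROM system.query.history
WHERE start_time >= current_timestamp() - INTERVAL 7 DAYS
ORDER BY total_duration_ms DESC
LIMIT 20;
```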
04-29-2025 02:15 AM
Hi @Walter_C ,
From what I understood from that article, it doesn't include records for DLT pipelines yet:
"The query history table includes records for queries run using SQL warehouses or serverless compute for notebooks and jobs. The table includes account-wide records from all workspaces in the same region from which you access the table."
04-29-2025 04:45 AM
@ankit001mittal
As mentioned in an earlier response, you can use the event log to check how many records were processed in each update.
I've written a query to select this info from the event log.
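A minimal sketch of that kind of query (the `event_log()` TVF, the `flow_progress` event type, and the `num_output_rows` metric are from the DLT docs; the fully qualified table name is a placeholder):

```sql
-- Rows written per flow (i.e. per table), pulled from flow_progress events.
SELECT
  origin.flow_name                                      AS table_name,
  timestamp,
  details:flow_progress.metrics.num_output_rows::bigint AS num_output_rows
FROM event_log(TABLE(my_catalog.my_schema.my_dlt_table)) -- placeholder name
WHERE event_type = 'flow_progress'
  AND details:flow_progress.metrics.num_output_rows IS NOT NULL
ORDER BY timestamp;
```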
Your query could be more complex; I've run my DLT pipeline only once to produce this info, but the info does exist in the DLT event log.
Let me know if you have any questions.
04-29-2025 05:31 AM
Hi @aayrm5 ,
Thanks for sharing your query. I've also managed to write a query that retrieves all the upserts into an object/table in the DLT pipelines, but I'm struggling to figure out how to break down the timestamps from the logs to determine how much time was spent waiting for resources and how much was spent executing each table/view in the pipeline.
04-29-2025 09:43 PM
Ideally, you have timestamp and message columns in the event log; you can use these to calculate the time taken.
For example, to calculate the time taken to allocate resources, you could use a function such as datediff to get the difference between the timestamps of two records; see the sketch below:
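This assumes update-lifecycle messages such as WAITING_FOR_RESOURCES and INITIALIZING; adjust the filters to the messages you actually see, and note the table name is a placeholder:

```sql
-- Time between "waiting for resources" and the update starting to initialize.
WITH ev AS (
  SELECT timestamp, message
  FROM event_log(TABLE(my_catalog.my_schema.my_dlt_table)) -- placeholder name
)
SELECT datediff(
         SECOND,
         (SELECT min(timestamp) FROM ev WHERE message ILIKE '%WAITING_FOR_RESOURCES%'),
         (SELECT min(timestamp) FROM ev WHERE message ILIKE '%INITIALIZING%')
       ) AS resource_wait_seconds;
```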
Similarly, for the time taken to execute each table, you'll have messages like `<Table_Name>` is RUNNING and `<Table_Name>` is COMPLETED; take the difference between their timestamps.
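A minimal sketch of the per-table version, assuming `flow_progress` messages of the form "Flow <name> is RUNNING" / "Flow <name> is COMPLETED":

```sql
-- Execution time per table: latest COMPLETED minus earliest RUNNING message.
WITH ev AS (
  SELECT origin.flow_name AS table_name, timestamp, message
  FROM event_log(TABLE(my_catalog.my_schema.my_dlt_table)) -- placeholder name
  WHERE event_type = 'flow_progress'
)
SELECT
  table_name,
  datediff(
    SECOND,
    min(CASE WHEN message ILIKE '%is RUNNING%'   THEN timestamp END),
    max(CASE WHEN message ILIKE '%is COMPLETED%' THEN timestamp END)
  ) AS execution_seconds
FROM ev
GROUP BY table_name;
```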
04-29-2025 05:43 AM
Also, in the DLT logs I can see this:
{"flow_progress":{"metrics":{"executor_time_ms":9004,"executor_cpu_time_ms":3248}}}
Do you know what executor_time_ms and executor_cpu_time_ms represent?
04-29-2025 09:48 PM
Hey @ankit001mittal
The detail `{"flow_progress":{"metrics":{"executor_time_ms":9004,"executor_cpu_time_ms":3248}}}` belongs to the cluster-resources event type. It might represent how much time an executor in the cluster took to finish the tasks assigned to it; I wouldn't use this time to calculate the overall time taken to execute a table.
Depending on your cluster size, your table's execution would have been divided into numerous tasks in the Spark job and submitted to the executors, so it would be hard to collate which executor is taking care of which table.
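If you still want to inspect those metrics per event, a minimal sketch for extracting them from the `details` JSON (the field paths follow the snippet above; the table name is a placeholder):

```sql
-- Executor wall-clock vs CPU time as reported in flow_progress metrics.
SELECT
  timestamp,
  details:flow_progress.metrics.executor_time_ms::bigint     AS executor_time_ms,
  details:flow_progress.metrics.executor_cpu_time_ms::bigint AS executor_cpu_time_ms
FROM event_log(TABLE(my_catalog.my_schema.my_dlt_table)) -- placeholder name
WHERE details:flow_progress.metrics.executor_time_ms IS NOT NULL;
```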
04-30-2025 04:51 AM
Hey @ankit001mittal - if any of the above responses answered your questions, kindly mark it as a solution.
Thanks,