Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT Query History

ankit001mittal
New Contributor III

Hi guys,

I can see that DLT pipelines have a query history section where we can see the duration of each table and the number of rows read.

[Screenshot: DLT pipeline query history showing per-table duration and rows read]

Is this information stored somewhere in the system catalog? Can I query it?

8 REPLIES

Walter_C
Databricks Employee

This might be available in the query history system table https://docs.databricks.com/aws/en/admin/system-tables/query-history 
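If your queries do land in that table, a minimal sketch of the kind of lookup you could run (column names taken from the linked docs; adjust the filter and limit to your needs):

```sql
-- Hypothetical lookup against the query history system table
-- (columns per the linked system.query.history docs).
SELECT
  statement_text,
  executed_by,
  start_time,
  end_time,
  total_duration_ms,
  read_rows
FROM system.query.history
WHERE start_time >= current_date() - INTERVAL 7 DAYS
ORDER BY total_duration_ms DESC
LIMIT 50;
```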

ankit001mittal
New Contributor III

Hi @Walter_C ,
From what I understood from this article, it doesn't include records for DLT pipelines yet:

The query history table includes records for queries run using SQL warehouses or serverless compute for notebooks and jobs. The table includes account-wide records from all workspaces in the same region from which you access the table.

 

aayrm5
Honored Contributor

@ankit001mittal 
As mentioned in an earlier response, you can use the event log to check how many records were processed in each update.
Here's the query I've written to select the info from the event_log:

[Screenshot: SQL query selecting row metrics from the DLT event_log]

[Screenshot: query output showing records processed in each update]

Your query may end up more complex, as I ran my DLT pipeline only once to produce this output, but the information does exist in the DLT event log.
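For reference, a rough sketch along the same lines (the event_log() table-valued function, origin.flow_name, and the details:flow_progress JSON path follow the standard DLT event log schema; my_catalog.my_schema.my_table is a placeholder for one of your pipeline's tables):

```sql
-- Rows written per flow (table) per update, pulled from the DLT event log.
SELECT
  origin.update_id,
  origin.flow_name,
  timestamp,
  details:flow_progress.metrics.num_output_rows::bigint AS num_output_rows
FROM event_log(TABLE(my_catalog.my_schema.my_table))
WHERE event_type = 'flow_progress'
  AND details:flow_progress.metrics.num_output_rows IS NOT NULL
ORDER BY timestamp;
```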

Let me know if any questions,

Riz

Hi @aayrm5 ,

Thanks for sharing your query. I have also managed to write a query that retrieves all the upserts into an object/table in DLT pipelines, but I am struggling to figure out how to break down the timestamps from the logs to determine how much time was spent waiting for resources and how much time was spent executing each table/view in the pipeline.

Hi @ankit001mittal 

You have timestamp and message columns in the event log; you can use these to calculate the time taken.

For example, to calculate the time taken to allocate resources, you could use a function such as datediff to compute the difference between the timestamps of two records; check the snip below:

[Screenshot: datediff between two event log timestamps to get resource allocation time]
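A rough sketch of that calculation, assuming the standard update_progress state events (the WAITING_FOR_RESOURCES and INITIALIZING state names and the placeholder table are assumptions to verify against your own event log):

```sql
-- Approximate time spent waiting for resources per update:
-- gap between the WAITING_FOR_RESOURCES event and the next INITIALIZING event.
WITH states AS (
  SELECT
    origin.update_id,
    timestamp,
    details:update_progress.state AS state
  FROM event_log(TABLE(my_catalog.my_schema.my_table))
  WHERE event_type = 'update_progress'
)
SELECT
  update_id,
  datediff(SECOND,
           min(CASE WHEN state = 'WAITING_FOR_RESOURCES' THEN timestamp END),
           min(CASE WHEN state = 'INITIALIZING' THEN timestamp END)) AS wait_seconds
FROM states
GROUP BY update_id;
```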

Similarly, for the time taken to execute each table, you will have messages such as `<Table_Name>` is RUNNING and `<Table_Name>` is COMPLETED. Get the difference between those timestamps.
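A sketch of that per-table calculation; instead of parsing the message text, this version keys off the flow_progress status field (the RUNNING/COMPLETED statuses and origin.flow_name are taken from the usual event log schema, so verify them against your logs):

```sql
-- Per-table execution time: difference between each flow's RUNNING and COMPLETED events.
WITH flow_events AS (
  SELECT
    origin.update_id,
    origin.flow_name,
    details:flow_progress.status AS status,
    timestamp
  FROM event_log(TABLE(my_catalog.my_schema.my_table))
  WHERE event_type = 'flow_progress'
)
SELECT
  update_id,
  flow_name,
  datediff(SECOND,
           min(CASE WHEN status = 'RUNNING'   THEN timestamp END),
           max(CASE WHEN status = 'COMPLETED' THEN timestamp END)) AS run_seconds
FROM flow_events
GROUP BY update_id, flow_name
ORDER BY run_seconds DESC;
```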

Riz

ankit001mittal
New Contributor III

Also, in the DLT logs I can see this:

{"flow_progress":{"metrics":{"executor_time_ms":9004,"executor_cpu_time_ms":3248}}}

Do you know what both executor_time_ms and executor_cpu_time_ms represent?

Hey @ankit001mittal 

The detail `{"flow_progress":{"metrics":{"executor_time_ms":9004,"executor_cpu_time_ms":3248}}}` belongs to the event_type - cluster resources. It likely represents how much time an executor in the cluster took to finish the tasks assigned to it. I wouldn't use this time to calculate the overall time taken to execute the table.

Depending on your cluster size, your table execution would have been divided into numerous tasks in the Spark job and submitted to the executors, so it would be hard to collate which executor is taking care of which table.

Riz

aayrm5
Honored Contributor

Hey @ankit001mittal - if any of the above responses answered your questions, kindly mark it as a solution.

Thanks,

Riz
