3 weeks ago
I'm currently testing materialized views and I need some help understanding the refresh behavior. Specifically, I want to know if my materialized view is querying the full table (performing a full refresh) or just doing an incremental refresh.
From some tests I did on a medium-sized table (similar to the sample table created in this guide), I observed the following:
Based on these observations, I suppose that a delete operation triggers a full refresh. However, apart from the execution time, I have nothing else to validate my theory.
Does anyone have any tips or methods for identifying whether a materialized view is doing a full refresh or an incremental refresh in Databricks? Thanks!
3 weeks ago
Hi @guiferviz,
To determine the type of refresh used, you can query the Delta Live Tables event log. Look for the event_type called planning_information to see the technique used for the refresh. The techniques include:
You can use the following SQL query to check the refresh type
SELECT timestamp, message
FROM event_log(TABLE(<fully-qualified-table-name>))
WHERE event_type = 'planning_information'
ORDER BY timestamp DESC;
Replace <fully-qualified-table-name> with the fully qualified name of your materialized view, including the catalog and schema.
3 weeks ago
Hi @guiferviz,
To determine the type of refresh used, you can query the Delta Live Tables event log. Look for the event_type called planning_information to see the technique used for the refresh. The techniques include:
You can use the following SQL query to check the refresh type
SELECT timestamp, message
FROM event_log(TABLE(<fully-qualified-table-name>))
WHERE event_type = 'planning_information'
ORDER BY timestamp DESC;
Replace <fully-qualified-table-name> with the fully qualified name of your materialized view, including the catalog and schema.
3 weeks ago
Hi @Alberto_Umana, thanks a lot for your answer!! It works really well!
I just wanted to point out that I’ve noticed different values in the `message` column. Here’s what I’ve come across so far:
The values are quite self-explanatory, and the `details` column explain why a complete recompute is sometimes preferred over a group aggregate. Thanks again for your help!
2 weeks ago
There are certain limitations with the event log table. Only the owner of the materialized view (MV) has access to view these logs. Additionally, in client environments where deployment and creation are managed using a service principal, the necessary details may not be accessible.
3 weeks ago
In Databricks, determining whether a materialized view is performing a full or incremental refresh typically depends on how the underlying table operations and refresh mechanisms are configured. From your observations, the execution times suggest that new rows may trigger incremental refreshes, while deletions might lead to full refreshes. Incremental refreshes usually depend on metadata tracking changes, while full refreshes rebuild the entire materialized view.
To confirm this behavior, you can monitor the query execution plan or logs during the refresh operation. Tools like the Databricks Query History or Apache Spark UI can help you inspect the operations. Check whether the materialized view refresh scans only updated rows or the entire table. Additionally, review the materialized view's settings or configurations, as some platforms offer explicit options to control the refresh type. If incremental refresh is not supported for delete operations, the system may default to a full refresh.
For further insights, you could enable logging for the underlying queries or run tests with verbose output to see exactly what data is being processed during the refresh. Comparing execution plans between different types of changes (insert, delete, update) can provide clarity.
2 weeks ago
Hi @jack533, thanks for your answer. The issue I'm facing is that I'm using materialized views in a serverless compute environment, which prevents me from seeing the query plan. Here’s the only thing I can get from the query history:
It always shows 0 files read, which makes sense since the refresh command does not read any files; it just triggers a refresh. However, I am not able to view the refresh query in the query history. The solution proposed by Alberto is the only method I've found to get more insight into the refreshes.
2 weeks ago
I am glad it helped you @guiferviz! if you have any other questions let me know!
2 weeks ago
Thank you so much for the solution.
2 weeks ago
To validate the status of your materialized view (MV) refresh, run a DESCRIBE EXTENDED command and check the row corresponding to the "last refresh status type."
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group