cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

How to Determine if Materialized View is Performing Full or Incremental Refresh?

guiferviz
New Contributor III

I'm currently testing materialized views and I need some help understanding the refresh behavior. Specifically, I want to know if my materialized view is querying the full table (performing a full refresh) or just doing an incremental refresh.

From some tests I did on a medium-sized table (similar to the sample table created in this guide), I observed the following:

  • When I add new rows, the refresh takes about 10 seconds.
  • A full refresh takes about 1 minute.
  • When I delete some rows, the refresh also takes about 1 minute.

Based on these observations, I suppose that a delete operation triggers a full refresh. However, apart from the execution time, I have nothing else to validate my theory.

Does anyone have any tips or methods for identifying whether a materialized view is doing a full refresh or an incremental refresh in Databricks? Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Alberto_Umana
Databricks Employee
Databricks Employee

Hi @guiferviz,

To determine the type of refresh used, you can query the Delta Live Tables event log. Look for the event_type called planning_information to see the technique used for the refresh. The techniques include:

 

  • FULL_RECOMPUTE: Indicates a full refresh.
  • ROW_BASED or PARTITION_OVERWRITE: Indicates an incremental refresh.

You can use the following SQL query to check the refresh type

 

SELECT timestamp, message

FROM event_log(TABLE(<fully-qualified-table-name>))

WHERE event_type = 'planning_information'

ORDER BY timestamp DESC;

 

Replace <fully-qualified-table-name> with the fully qualified name of your materialized view, including the catalog and schema.

View solution in original post

8 REPLIES 8

Alberto_Umana
Databricks Employee
Databricks Employee

Hi @guiferviz,

To determine the type of refresh used, you can query the Delta Live Tables event log. Look for the event_type called planning_information to see the technique used for the refresh. The techniques include:

 

  • FULL_RECOMPUTE: Indicates a full refresh.
  • ROW_BASED or PARTITION_OVERWRITE: Indicates an incremental refresh.

You can use the following SQL query to check the refresh type

 

SELECT timestamp, message

FROM event_log(TABLE(<fully-qualified-table-name>))

WHERE event_type = 'planning_information'

ORDER BY timestamp DESC;

 

Replace <fully-qualified-table-name> with the fully qualified name of your materialized view, including the catalog and schema.

Hi @Alberto_Umana, thanks a lot for your answer!! It works really well!

I just wanted to point out that Iโ€™ve noticed different values in the `message` column. Hereโ€™s what Iโ€™ve come across so far:

  • Flow 'amount_view' has been planned in DLT to be executed as GROUP_AGGREGATE.
  • Flow 'amount_view' has been planned in DLT to be executed as COMPLETE_RECOMPUTE.
  • Flow 'amount_view' has been planned in DLT to be executed as NO_OP.
  • Flow 'amount_view' has been planned in DLT to be executed as COMPLETE_RECOMPUTE. Another option is GROUP_AGGREGATE, but COMPLETE_RECOMPUTE was chosen for this run due to its better performance.

The values are quite self-explanatory, and the `details` column explain why a complete recompute is sometimes preferred over a group aggregate. Thanks again for your help!

There are certain limitations with the event log table. Only the owner of the materialized view (MV) has access to view these logs. Additionally, in client environments where deployment and creation are managed using a service principal, the necessary details may not be accessible.

jack533
New Contributor III

In Databricks, determining whether a materialized view is performing a full or incremental refresh typically depends on how the underlying table operations and refresh mechanisms are configured. From your observations, the execution times suggest that new rows may trigger incremental refreshes, while deletions might lead to full refreshes. Incremental refreshes usually depend on metadata tracking changes, while full refreshes rebuild the entire materialized view.

To confirm this behavior, you can monitor the query execution plan or logs during the refresh operation. Tools like the Databricks Query History or Apache Spark UI can help you inspect the operations. Check whether the materialized view refresh scans only updated rows or the entire table. Additionally, review the materialized view's settings or configurations, as some platforms offer explicit options to control the refresh type. If incremental refresh is not supported for delete operations, the system may default to a full refresh.

For further insights, you could enable logging for the underlying queries or run tests with verbose output to see exactly what data is being processed during the refresh. Comparing execution plans between different types of changes (insert, delete, update) can provide clarity.

guiferviz
New Contributor III

Hi @jack533, thanks for your answer. The issue I'm facing is that I'm using materialized views in a serverless compute environment, which prevents me from seeing the query plan. Hereโ€™s the only thing I can get from the query history:

guiferviz_0-1735673169911.png

It always shows 0 files read, which makes sense since the refresh command does not read any files; it just triggers a refresh. However, I am not able to view the refresh query in the query history. The solution proposed by Alberto is the only method I've found to get more insight into the refreshes.

Alberto_Umana
Databricks Employee
Databricks Employee

I am glad it helped you @guiferviz! if you have any other questions let me know!

DelaneyClark
New Contributor II

Thank you so much for the solution.

TejeshS
New Contributor

To validate the status of your materialized view (MV) refresh, run a DESCRIBE EXTENDED command and check the row corresponding to the "last refresh status type."

  • RECOMPUTE indicates a full load execution was completed.
  • NO_OPERATION means no operation was performed during the refresh.
  • INCREMENTAL or ROW_BASED signifies that incremental processing was carried out

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group