Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
How can we disable incremental refresh for a Materialized View when using Databricks DLT

surajitDE
Contributor

How can we disable incremental refresh for a Materialized View when using Databricks Delta Live Tables (DLT)?

I am using serverless compute; here is the code:

import dlt

@dlt.table(
    name="orders_destination_table_testing_16"
)
def orders_final():
    return dlt.read("orders_destination_table_testing_15")
Surajit Metya
2 ACCEPTED SOLUTIONS

aleksandra_ch
Databricks Employee

Hi @surajitDE ,

You can set the refresh policy to FULL:  

from pyspark import pipelines as dp

@dp.materialized_view(
    name="orders_destination_table_testing_16",
    refresh_policy="full"
)
def orders_final():
    # Body filled in from the question's pipeline; `spark` is provided
    # by the pipeline runtime.
    return spark.read.table("orders_destination_table_testing_15")
 
Best regards, 


SteveOstrowski
Databricks Employee

Hi @surajitDE,

First, a quick naming note: Delta Live Tables (DLT) has been renamed to Lakeflow Spark Declarative Pipelines (SDP). The functionality is the same, just a new name.

MATERIALIZED VIEW REFRESH BEHAVIOR

For materialized views in SDP, the pipeline optimizer automatically decides whether to perform an incremental or full refresh based on cost efficiency. The key point is that materialized views return the same results regardless of which refresh mode is used. The optimizer picks the approach that is most efficient for the given data and query, so there is typically no need to disable incremental refresh since the output is identical either way.

This is different from streaming tables, where a full refresh clears checkpoints and reprocesses everything from scratch, which can produce different results depending on source data retention.

HOW TO FORCE A FULL REFRESH WHEN NEEDED

If you still want to explicitly trigger a full refresh for a specific materialized view, you have several options:

1. Via the UI:

 - On the pipeline monitoring page, click the dropdown next to "Refresh failed tables"
 - Select "Select tables for refresh"
 - Click the tables you want to refresh
 - Click the dropdown next to "Refresh selection" and choose "Full Refresh selection"

2. Via the REST API, use the full_refresh_selection parameter:

curl -X POST \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"full_refresh_selection": ["orders_destination_table_testing_16"]}' \
https://<instance>/api/2.0/pipelines/<pipeline-id>/updates
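If you would rather issue the same call from Python, the request can be sketched as below. INSTANCE, PIPELINE_ID, and TOKEN are placeholders you must substitute, and the `requests` call is shown as a comment so the sketch stays dependency-free; the payload mirrors the curl body above.

```python
import json

# Hypothetical placeholders -- substitute your workspace URL,
# pipeline ID, and personal access token.
INSTANCE = "https://<instance>"
PIPELINE_ID = "<pipeline-id>"
TOKEN = "<token>"

# Same request body as the curl example: full-refresh only the named table.
payload = {"full_refresh_selection": ["orders_destination_table_testing_16"]}

url = f"{INSTANCE}/api/2.0/pipelines/{PIPELINE_ID}/updates"
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

# With the `requests` package installed, the call itself would be:
#   import requests
#   resp = requests.post(url, headers=headers, data=json.dumps(payload))
#   resp.raise_for_status()
print(url)
print(json.dumps(payload))
```

Omitting `full_refresh_selection` (or using `refresh_selection`) triggers a normal update instead, so this parameter is the only thing that distinguishes the full-refresh request.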

3. Via the Databricks CLI:

databricks pipelines start-update <pipeline-id> --full-refresh-selection orders_destination_table_testing_16

PREVENTING FULL REFRESH (OPPOSITE DIRECTION)

For reference, if you ever need to prevent full refresh on a materialized view (for example, to protect against accidental data loss if source data has been deleted), you can set the table property pipelines.reset.allowed to false:

import dlt

@dlt.table(
    name="orders_destination_table_testing_16",
    table_properties={"pipelines.reset.allowed": "false"}
)
def orders_final():
    return dlt.read("orders_destination_table_testing_15")

SUMMARY

There is no built-in setting to permanently disable incremental refresh and force every pipeline update to do a full refresh for a materialized view. The optimizer handles this automatically, and since materialized views produce the same results either way, it is generally best to let the optimizer choose. When you do need a full refresh on demand, use the UI, REST API, or CLI options above.

Documentation references:
https://docs.databricks.com/en/delta-live-tables/updates.html
https://docs.databricks.com/en/delta-live-tables/properties.html

* This reply was drafted with an agent system I built, which researches responses against the documentation set and prior memory available to me. I personally review each draft for obvious issues, monitor the system's reliability, and update the response when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

