cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
Pravas007
Databricks Employee
Databricks Employee

You created a materialized view. You assumed it refreshed incrementally. Then, at 6 AM, a refresh on a billion-row source ran a full recompute, and your monthly Databricks bill grew a leg.

This is the surprise every team hits at least once. By default, Databricks picks the most cost-effective refresh strategy for each MV: incremental when possible, full when necessary. That cost-based default is good. It is also opaque. The day someone changes a query, adds a UDF, or refactors a CTE, the MV silently slips from incremental into full, and nothing in the pipeline tells you. The bill tells you.

Today's announcement closes that gap. The new REFRESH POLICY clause for materialized views (GA) lets you tell Databricks exactly how aggressively to enforce incremental refresh and what to do when incremental refresh is no longer possible. There are four policy values, and each one maps onto a different operating posture: leave it to the cost model, prefer incremental, require it, or always go full. There's also a new EXPLAIN CREATE MATERIALIZED VIEW command that previews incremental eligibility before you commit, so you stop finding out at runtime that something in your query was blocking incremental all along.

In this post, we'll walk through where the surprise full refreshes come from, how REFRESH POLICY gives you the levers to control them, how EXPLAIN lets you check eligibility before you create the MV, and we'll break a working MV on purpose to show what the new error message looks like in practice.

The Challenge: when "incremental" silently becomes "full"

If you build dashboards or downstream pipelines on materialized views, you have probably hit one of these:

  • Window function landed without `PARTITION BY`. Window aggregates without partitioning aren't incrementalizable today. The query still runs; it just recomputes the whole MV every refresh.
  • The third join showed up. The incremental engine supports up to two joins; three or more fall back to full. Same with deep aggregate nesting, more than 4 levels default to full.
  • The source table properties changed, or row tracking was disabled on a Delta source. Either way, change tracking is unavailable, and every refresh is full.
  • Small refactor put an aggregate above the top-level `GROUP BY`. A "HAVING" clause with a complex expression, a wrapped subquery, or an unwrapping of a JOIN can all knock the query out of incremental.

What makes these so painful is that nothing in the pipeline breaks. The query still compiles. The refresh still runs. The numbers in the dashboard are still correct. The only signal that anything changed is the cost line on the next billing cycle, and by then, you've been paying for full recomputes for a week.

REFRESH POLICY at a glance: four policies, one clause

REFRESH POLICY is a single clause you add to a CREATE MATERIALIZED VIEW statement. It takes one of four values, and that value tells Databricks exactly what behavior you want when incremental refresh is and is not possible.

 

Policy

When incremental is possible

When incremental is not possible

On creation with a non-incrementalizable query

When to use this

AUTO (default)

Picks incremental or full based on a cost model

Performs a full refresh

Performs a full refresh

Most workloads. Let the cost model decide the recommended default.

INCREMENTAL

Performs incremental refresh

Falls back to full refresh

Statement fails

Override the cost model to incrementalize the MV, but you don't want it to fail and would rather prioritize data delivery. 

INCREMENTAL STRICT

Performs incremental refresh

Refresh fails with an error

Statement fails

When the MV should never fully refresh, and you'd rather get a notification than a surprise bill. Regulatory MVs, hot dashboards on large sources, fixed-cost contracts. Also, re-initialization protection against accidental query edits.

FULL

Performs a full refresh anyway

Performs a full refresh

Performs a full refresh

When you've measured, and it's full, it's cheaper or simpler for this MV. Small, dim tables; sources where most rows change every refresh.

The other three rows are where the new control lives.

INCREMENTAL is the "soft" option. You're saying you want incremental whenever possible, and you'll accept a full refresh as a fallback if something goes sideways at runtime, but you don't want to accidentally create an MV that can never be incremental in the first place. The create statement fails fast on a query that the planner can't incrementalize, so the friction shows up at development time rather than production time. At refresh time, both transient and permanent issues fall back to full.

INCREMENTAL STRICT is the "hard" option. You're saying you want incremental, period. If the engine can't deliver, fail the refresh and page someone. This is the policy for MVs where an unexpected full recompute is genuinely unacceptable: a regulatory MV with a 5-minute SLA, a 2 TB table where a full refresh costs more than the team's monthly Databricks bill, a downstream MV that cascades full refreshes through a dozen dependents. INCREMENTAL STRICT also doubles as re-initialization protection: if someone edits the query definition next quarter and that edit would force a full refresh, the next refresh fails with a clear reason ("Query Definition Change", "Schema Evolution", etc.). You can revert the edit and keep your incremental state, or if you actually do want the recompute, explicitly run REFRESH MATERIALIZED VIEW <name> FULL.

FULL is the "I measured this" option. You ran the comparison, full is cheaper for this particular MV, and you don't want the system second-guessing you. Small dimension tables fall into this bucket. So do MVs over fast-churning source data where most rows have changed by the time you refresh. It's not common, but it does exist.

Syntax at a glance

The clause goes in the CREATE MATERIALIZED VIEW statement, between the name and the query body:

CREATE MATERIALIZED VIEW IF NOT EXISTS my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL STRICT
AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;

That is the entire surface area. One clause, four values.

If you have an existing MV, you can also alter the policy on it without rebuilding:

ALTER MATERIALIZED VIEW my_catalog.sales.daily_revenue
SET REFRESH POLICY INCREMENTAL STRICT;

Check before you commit: EXPLAIN CREATE MATERIALIZED VIEW

Before you set INCREMENTAL or INCREMENTAL STRICT and discover at create time that the query isn't incrementalizable, you can check eligibility beforehand. The EXPLAIN CREATE MATERIALIZED VIEW command returns a structured analysis of whether your query qualifies for incremental refresh and, if not, what the blocker is.

EXPLAIN CREATE MATERIALIZED VIEW
SELECT k, sum(v) FROM source.src_schema.table GROUP BY k;

The output has three sections:

== Incremental Update Eligibility ==
The Materialized View can be incrementally refreshed.
== Detailed Incrementalization Info ==
No issues detected.
== Physical Plan ==
...

When the query is not eligible, the second section names the exact blocker:

== Incremental Update Eligibility ==
Query structure is not incrementalizable. See Detailed Incrementalization Info.
== Detailed Incrementalization Info ==
ROW_TRACKING_NOT_ENABLED
Table name: table1
Prevents incrementalization: Yes

The blocker is usually one DDL statement away from being fixed:

ALTER TABLE source.src_schema.table1
SET TBLPROPERTIES (delta.enableRowTracking = true);

Re-run EXPLAIN. Once structural eligibility passes, you can set REFRESH POLICY INCREMENTAL STRICT on the real CREATE and trust the policy to enforce what you just verified.

One thing worth noting: EXPLAIN confirms structural eligibility, not the runtime decision. Under the default AUTO policy, the cost model can still choose a full recompute at runtime if it judges the changeset large enough that full is actually cheaper for this run. INCREMENTAL and INCREMENTAL STRICT override the cost model. EXPLAIN is the tool you use to confirm that the structural prerequisites are in place before you commit to either of those.

Hands-on: build an MV that's incremental, then break it

Let's build a working incremental MV, prove it is incremental, then deliberately break the incremental property and watch INCREMENTAL STRICT catch it. Every step below runs in the Databricks SQL Query Editor on a serverless SQL warehouse.

Step 1: Create a source table with row tracking enabled

Row tracking is what lets the planner identify which rows have changed between refreshes. Enable it explicitly on the source:

CREATE OR REPLACE TABLE my_catalog.sales.orders (
 order_id    BIGINT,
 customer_id INT,
 order_date  DATE,
 total_amount DECIMAL(10,2)
)
TBLPROPERTIES (delta.enableRowTracking = true);
INSERT INTO my_catalog.sales.orders VALUES
 (1, 101, '2026-04-30', 120.00),
 (2, 102, '2026-04-30',  85.50),
 (3, 101, '2026-05-01', 240.00),
 (4, 103, '2026-05-01',  60.75);

Step 2: Verify eligibility with EXPLAIN

Before committing the MV, ask Databricks whether the query qualifies:

EXPLAIN CREATE MATERIALIZED VIEW my_catalog.sales.daily_revenue AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;

Expect == Incremental Update Eligibility == The Materialized View can be incrementally refreshed. with no issues.

Screenshot 2026-05-05 at 10.10.42 AM.png

Step 3: Create the MV with INCREMENTAL STRICT 

CREATE MATERIALIZED VIEW my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL STRICT
AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;

This MV is a clean incremental candidate: a single Delta source with row tracking, a top-level GROUP BY, no window functions, and no UDFs. Databricks compiles it as GROUP_AGGREGATE, one of the supported incremental techniques.

Step 4: Add some data and refresh

INSERT INTO my_catalog.sales.orders VALUES
 (5, 102, '2026-05-02', 410.00),
 (6, 104, '2026-05-03',  75.25);
REFRESH MATERIALIZED VIEW my_catalog.sales.daily_revenue;
SELECT * FROM my_catalog.sales.daily_revenue ORDER BY order_date;

You will see four daily totals. The refresh processed only the two new rows, not all six.

Step 5: Prove it was incremental (the event log)

event_log() returns one row per planning decision per refresh. Filter for the planning event:

SELECT timestamp, message
FROM event_log(TABLE(my_catalog.sales.daily_revenue))
WHERE event_type = 'planning_information'
ORDER BY timestamp DESC
LIMIT 1;

The message field on the latest row will name the technique. For our query, expect to see GROUP_AGGREGATE. If the planner chose FULL_RECOMPUTE, that's a tell; it means something in the query no longer qualifies for incremental.Screenshot 2026-05-15 at 12.20.10 PM.png

Step 6 — Break it on purpose

Now let's see INCREMENTAL STRICT do its job. Add a non-deterministic expression to the MV RAND() is the canonical example. Replace the MV definition:

CREATE OR REPLACE MATERIALIZED VIEW my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL STRICT
AS
SELECT order_date,
      sum(total_amount) AS revenue,
      rand() AS sample_jitter
FROM my_catalog.sales.orders
GROUP BY order_date;

The statement fails. The error class is MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE and the sub-code names the exact reason:

[MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE] Cannot create the materialized view with IncrementalStrict refresh policy because it is not supported for incremental refresh. Reason:
- Expressions Rand are not deterministic. Consider rewriting the query to avoid using them.

This is the exact behavior we wanted. The bad MV never gets created. The team that was about to merge this change sees the error in their PR pipeline, fixes the query, and ships an MV that actually does what they meant.

If you want the same protection on an existing MV that's already in production, switch its policy with ALTER MATERIALIZED VIEW … SET REFRESH POLICY INCREMENTAL STRICT. From that point forward, any edit that breaks incrementalization causes the next refresh to fail instead of silently performing a full recompute.

For the curious, decoding the planning_information event log

This section is for readers who want to debug the why behind an MV slipping out of incremental. It is also the section you bookmark and share with the engineer who wrote the UDF.

The planning_information event log entry returns a message that identifies the technique used. There are eight values you'll see in practice:

 

Technique

What it means

ROW_BASED

The planner identified the changed rows in the source and applied them directly. The cheapest pattern.

PARTITION_OVERWRITE

A whole partition changed; the MV rewrote the partition slice rather than every row.

WINDOW_FUNCTION

A window aggregate with PARTITION BY was incrementalized via window-state reuse.

APPEND_ONLY

The source only saw inserts; the MV was appended without checking for updates.

GROUP_AGGREGATE

A top-level GROUP BY was incrementalized using the delta aggregation state.

GENERIC_AGGREGATE

A more complex aggregate path, still incremental.

FULL_RECOMPUTE

The MV was recomputed from scratch. The bill signal.

NO_OP

Nothing changed in the source; the MV was unchanged.

If the technique is FULL_RECOMPUTE and your policy was INCREMENTAL, you got the graceful fallback you opted into. If the technique is FULL_RECOMPUTE and your policy was AUTO, the cost model picked full for that run; sometimes that's right, but it's worth a sanity check, especially if the changeset was small.

When INCREMENTAL STRICT rejects an MV at create time or fails a refresh, the error message names a sub-code under MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE. Each sub-code points at a fixable cause, and you can usually act on it without needing to escalate:

 

Sub-code

Cause

Fix

AGGREGATE_NOT_TOP_NODE

A complex expression sits above the GROUP BY

Simplify; lift the aggregate to the top

EXPRESSION_NOT_DETERMINSTIC

RAND(), current_timestamp(), or similar in the projection

Remove or precompute

INPUT_NOT_IN_DELTA

A source table is not Delta

Convert to Delta or load via staging

OPERATOR_NOT_INCREMENTALIZABLE

An unsupported operator (some complex joins)

Restructure, or accept AUTO/FULL

ROW_TRACKING_NOT_ENABLED

The Delta source lacks row tracking

ALTER TABLE … SET TBLPROPERTIES ('delta.enableRowTracking' = 'true')

SUBQUERY_EXPRESSION_NOT_INCREMENTALIZABLE

A correlated/non-incrementalizable subquery

Rewrite as a join

UDF_NOT_DETERMINISTIC

A UDF was not declared deterministic

Declare it deterministic or remove

WINDOW_WITHOUT_PARTITION_BY

A window spec missing PARTITION BY

Add the partition column

Most of these are five-minute fixes once you know what the sub-code is asking for.

Practical rules of thumb for keeping MVs incremental

 

Pattern

Recommended?

Note

Simple SELECT with filters

Yes

Best candidate

1–2 joins

Yes

Best candidate

3+ joins

No

Falls back to full refresh today

Top-level GROUP BY aggregates

Yes

Delta aggregation state used

2+ levels of nested aggregates

No

Falls back to full refresh

Window functions, without Partition

No

Supported with WINDOW PARTITION BY

Time function in leaf filters is incrementalizable. For example:


CREATE MATERIALIZED VIEW mv AS

SELECT val FROM T

WHERE date > CURRENT_DATE() - 3

Yes

Putting time functions as projections is not supported, e.g.


CREATE MATERIALIZED VIEW mv AS

SELECT val, CURRENT_DATE() date

FROM T

Changeset > ~15% of MV size

Reconsider

Full refresh is often cheaper

The single habit that pays off most consistently here is layering complex queries into multiple MVs. Rather than one MV that does three joins and a sum, build the sum in its own MV and join it in a second one. Each layer remains cleanly incrementalizable, dependencies are clearly visible in the DAG, and the planner has a much easier time at each level.

A reasonable starting posture: develop on INCREMENTAL so create-time mistakes show up early, then move to INCREMENTAL STRICT for the production MVs where surprise full refreshes are not OK. Reach for FULL only after you've actually measured and confirmed it's the cheaper option.

Requirements caveats

  • Compute: Serverless SQL Warehouse, or SDP Pro / Advanced edition
  • Source tables: Delta with row tracking enabled for many incremental techniques

Get started

A good way to start is to pick one MV from your own warehouse, the one you'd most hate to see drift into a full refresh, and run the event_log() query against it. If the latest planning_information says FULL_RECOMPUTE and you didn't expect that, you've just found a cost line worth investigating. Run EXPLAIN CREATE MATERIALIZED VIEW against the query, read the sub-code that comes back, fix the blocker, and switch the policy to INCREMENTAL STRICT so the next regression fails loudly instead of silently.

If you try this and hit something interesting, or if the error message your MV throws doesn't quite match what's documented, drop it in the comments. We read them.