Databricks Community

Pravas007

You created a materialized view. You assumed it refreshed incrementally. Then, at 6 a.m., a refresh on a billion-row source ran a full recompute, and your monthly Databricks bill grew a leg.

This is the surprise every team hits at least once. By default, Databricks picks the most cost-effective refresh strategy for each MV, incremental when it can, full when it has to. That cost-based default is good. It is also opaque. The day someone changes a query, adds a UDF, or refactors a CTE, the MV silently slips from incremental into full, and nothing in the pipeline tells you. The bill tells you.

Today's announcement closes that gap. The new REFRESH POLICY clause for materialized views (GA) lets you tell Databricks exactly how aggressively to enforce incremental refresh and what to do when incremental refresh is no longer possible. There are four policy values, and each one maps onto a different operating posture: leave it to the cost model, prefer incremental, require it, or always go full. There's also a new EXPLAIN CREATE MATERIALIZED VIEW command that previews incremental eligibility before you commit, so you stop finding out at runtime that something in your query was blocking incremental all along.

In this post, we'll walk through where the surprise full refreshes come from, how REFRESH POLICY gives you the levers to control them, how EXPLAIN lets you check eligibility before you create the MV, and we'll break a working MV on purpose to show what the new error message looks like in practice.

The Challenge: when "incremental" silently becomes "full"

If you build dashboards or downstream pipelines on materialized views, you have probably hit one of these:

Window function landed without `PARTITION BY`. Window aggregates without partitioning aren't incrementalizable today. The query still runs; it just recomputes the whole MV every refresh.
The source table properties changed, or row tracking was disabled on a Delta source. Either way, change tracking is unavailable, and every refresh is full.
The cost model makes a suboptimal decision because we see a very different refresh pattern, e.g. data volume.

What makes these so painful is that nothing in the pipeline breaks. The query still compiles. The refresh still runs. The numbers in the dashboard are still correct. The only signal that anything changed is the cost line on the next billing cycle, and by then, you've been paying for full recomputes for a week.

REFRESH POLICY at a glance: four policies, one clause

REFRESH POLICY is a single clause you add to a CREATE MATERIALIZED VIEW statement. It takes one of four values, and that value tells Databricks exactly what behavior you want when incremental refresh is and is not possible.

Policy	When incremental is possible	When incremental is not possible	On create with a non-incrementalizable query	When to use this
AUTO (default)	Picks incremental or full based on a cost model	Performs a full refresh	Performs a full refresh	Most workloads because the cost model improves over time and takes into account historical execution statistics.
INCREMENTAL	Performs incremental refresh	Falls back to full refresh	Statement fails	Override the cost model to incrementalize the MV, but you don't want it to fail and would rather prioritize data delivery.
INCREMENTAL	Performs incremental refresh	Refresh fails with an error	Statement fails	When the MV should never fully refresh, and you'd rather get a notification than a surprise bill. Regulatory MVs, hot dashboards on large sources, fixed-cost contracts. Also, re-initialization protection against accidental query edits.
FULL	Performs a full refresh anyway	Performs a full refresh	Performs a full refresh	When you've measured, and it's full, it's cheaper or simpler for this MV.

The other three rows are where the new control lives.

INCREMENTAL is the "soft" option. You're saying you want incremental whenever possible, and you'll accept a full refresh as a fallback if something goes sideways at runtime, but you don't want to accidentally create an MV that can never be incremental in the first place. The create statement fails fast on a query that the engine can't incrementalize, so the friction shows up at development time rather than in production. At refresh time, both transient and permanent issues fall back to full.

INCREMENTAL is the "hard" option. You're saying you want incremental, period. If the engine can't deliver, fail the refresh and page someone. This is the policy for MVs where an unexpected full recompute is genuinely unacceptable: a regulatory MV with a 5-minute SLA, a 2 TB table where a full refresh costs more than the team's monthly Databricks bill, a downstream MV that cascades full refreshes through a dozen dependents. INCREMENTAL also doubles as re-initialization protection: if someone edits the query definition next quarter and that edit would force a full refresh, the next refresh fails with a clear reason ("Query Definition Change", "Schema Evolution", etc.). You can revert the edit and keep your incremental state, or if you actually do want the recompute, explicitly run REFRESH MATERIALIZED VIEW <name> FULL.

FULL is the "I measured this" option. You ran the comparison, full is cheaper for this particular MV and its downstream closure, and you don't want the system second-guessing you. So do MVs over fast-churning source data where most rows have changed by the time you refresh. It's not common, but it does exist.

Additionally, before choosing full recompute, the user needs to verify the downstream impact, and one option could be to run the refresh more frequently to keep changes in check. If those options are not effective, they can choose a full refresh.

Syntax at a glance

The clause goes in the CREATE MATERIALIZED VIEW statement, between the name and the query body:

CREATE MATERIALIZED VIEW IF NOT EXISTS my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL
AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;

That is the entire surface area. One clause, four values.

If you have an existing MV, you can also alter the policy on it without rebuilding:

ALTER MATERIALIZED VIEW my_catalog.sales.daily_revenue
SET REFRESH POLICY INCREMENTAL;

Check before you commit: EXPLAIN CREATE MATERIALIZED VIEW

Before you set INCREMENTAL or INCREMENTAL and discover at create time that the query isn't incrementalizable, you can check eligibility beforehand. The EXPLAIN CREATE MATERIALIZED VIEW command returns a structured analysis of whether your query qualifies for incremental refresh and, if not, what the blocker is.

EXPLAIN CREATE MATERIALIZED VIEW
SELECT k, sum(v) FROM source.src_schema.table GROUP BY k;

The output has three sections:

== Incremental Update Eligibility ==
The Materialized View can be incrementally refreshed.

== Detailed Incrementalization Info ==
No issues detected.

== Physical Plan ==
...

When the query is not eligible, the second section names the exact blocker:

== Incremental Update Eligibility ==
Query structure is not incrementalizable. See Detailed Incrementalization Info.

== Detailed Incrementalization Info ==
ROW_TRACKING_NOT_ENABLED
Table name: table1
Prevents incrementalization: Yes

The blocker is usually one DDL statement away from being fixed:

ALTER TABLE source.src_schema.table1
SET TBLPROPERTIES (delta.enableRowTracking = true);

Re-run EXPLAIN. Once structural eligibility passes, you can set REFRESH POLICY INCREMENTAL on the real CREATE and trust the policy to enforce what you just verified.

One thing worth noting: EXPLAIN confirms structural eligibility, not the runtime decision. Under the default AUTO policy, the cost model can still choose a full recompute at runtime if it judges the changeset large enough that full is actually cheaper for this run. INCREMENTAL overrides the cost model. EXPLAIN is the tool you use to confirm that the structural prerequisites are in place before you commit to either of those.

Hands-on: build an MV that's incremental, then break it

Let's build a working incremental MV, prove it is incremental, then deliberately break the incremental property and watch INCREMENTAL catch it. Every step below runs in the Databricks SQL Query Editor on a serverless SQL warehouse.

Step 1 — Create a source table with row tracking enabled

Row tracking is what lets the planner identify which rows have changed between refreshes. Enable it explicitly on the source:

CREATE OR REPLACE TABLE my_catalog.sales.orders (
 order_id    BIGINT,
 customer_id INT,
 order_date  DATE,
 total_amount DECIMAL(10,2)
)
TBLPROPERTIES (delta.enableRowTracking = true);

INSERT INTO my_catalog.sales.orders VALUES
 (1, 101, '2026-04-30', 120.00),
 (2, 102, '2026-04-30',  85.50),
 (3, 101, '2026-05-01', 240.00),
 (4, 103, '2026-05-01',  60.75);

Step 2 — Verify eligibility with EXPLAIN

Before committing the MV, ask Databricks whether the query qualifies:

EXPLAIN CREATE MATERIALIZED VIEW my_catalog.sales.daily_revenue AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;

Expect == Incremental Update Eligibility == The Materialized View can be incrementally refreshed. with no issues.

Screenshot 2026-05-20 at 8.27.01 PM.png

Step 3 — Create the MV with INCREMENTAL

CREATE MATERIALIZED VIEW my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL
AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;

This MV is a clean incremental candidate: a single Delta source with row tracking and a top-level GROUP BY. Databricks compiles it as GROUP_AGGREGATE, one of the supported incremental techniques.

Step 4 — Add some data and refresh

INSERT INTO my_catalog.sales.orders VALUES
 (5, 102, '2026-05-02', 410.00),
 (6, 104, '2026-05-03',  75.25);

REFRESH MATERIALIZED VIEW my_catalog.sales.daily_revenue;

SELECT * FROM my_catalog.sales.daily_revenue ORDER BY order_date;

You will see four daily totals. The refresh processed only the two new rows, not all six.

Step 5 — Prove it was incremental (the event log)

event_log() returns one row per planning decision per refresh. Filter for the planning event:

SELECT timestamp, message
FROM event_log(TABLE(my_catalog.sales.daily_revenue))
WHERE event_type = 'planning_information'
ORDER BY timestamp DESC
LIMIT 1;

The message field on the latest row will name the technique. For our query, expect to see GROUP_AGGREGATE. If the planner chose FULL_RECOMPUTE, that's a tell; it means something in the query no longer qualifies for incremental. Screenshot 2026-05-15 at 12.20.10 PM.png

Step 6 — Break it on purpose

Now let's see INCREMENTAL do its job. Add a non-deterministic expression to the MV RAND() is the canonical example. Replace the MV definition:

CREATE OR REPLACE MATERIALIZED VIEW my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL
AS
SELECT order_date,
      sum(total_amount) AS revenue,
      rand() AS sample_jitter
FROM my_catalog.sales.orders
GROUP BY order_date;

The statement fails. The error class is MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE, and the sub-code names the exact reason:

[MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE] Cannot create the materialized view with IncrementalStrict refresh policy because it is not supported for incremental refresh. Reason:
- Expressions Rand are not deterministic. Consider rewriting the query to avoid using them.

This is the exact behavior we wanted. The bad MV never gets created. The team that was about to merge this change sees the error in their PR pipeline, fixes the query, and ships an MV that actually does what they meant.

If you want the same protection on an existing MV that's already in production, switch its policy with ALTER MATERIALIZED VIEW … SET REFRESH POLICY INCREMENTAL. From that point forward, any edit that breaks incrementalization causes the next refresh to fail instead of silently performing a full recompute.

For the curious — decoding the planning_information event log

This section is for readers who want to debug the why behind an MV slipping out of incremental. It is also the section you bookmark and share with the engineer who wrote the UDF.

The planning_information event log entry returns a message that identifies the technique used. There are eight values you'll see in practice:

Technique	What it means
ROW_BASED	The planner identified the changed rows in the source and applied them directly.
PARTITION_OVERWRITE	A whole partition changed; the MV rewrote the partition slice rather than every row.
WINDOW_FUNCTION	A window aggregate with PARTITION BY was incrementalized via window-state reuse.
APPEND_ONLY	The source only saw inserts; the MV was appended without checking for updates.
GROUP_AGGREGATE	A top-level GROUP BY was incrementalized using the delta aggregation state. (only changes to the sources are read)
GENERIC_AGGREGATE	Also an incremental top-level GROUP BY. (Might read portions of the sources in addition to the changes)
FULL_RECOMPUTE	The MV was recomputed from scratch. The bill signal.
NO_OP	Nothing changed in the source; the MV was unchanged.

If the technique is FULL_RECOMPUTE and your policy was INCREMENTAL, you got the graceful fallback you opted into. If the technique is FULL_RECOMPUTE and your policy was AUTO, the cost model picked full for that run; sometimes that's right, but it's worth a sanity check, especially if the changeset was small.

When INCREMENTAL rejects an MV at create time or fails a refresh, the error message names a sub-code under MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE. Each sub-code points at a fixable cause, and you can usually act on it without needing to escalate:

Sub-code	Cause	Fix
AGGREGATE_NOT_TOP_NODE	A complex expression sits above the GROUP BY	Simplify; consider materializing the aggregate
EXPRESSION_NOT_DETERMINSTIC	RAND(), current_timestamp(), or similar in the projection	Remove or precompute
INPUT_NOT_IN_DELTA	A source table is not Delta	Convert to Delta or load via staging
OPERATOR_NOT_INCREMENTALIZABLE	An unsupported operator	Restructure, or accept AUTO/FULL
ROW_TRACKING_NOT_ENABLED	The Delta source lacks row tracking	ALTER TABLE … SET TBLPROPERTIES ('delta.enableRowTracking' = 'true')
SUBQUERY_EXPRESSION_NOT_INCREMENTALIZABLE	A subquery expression which cannot be re-written as a join	Rewrite the subquery as a join or materialize any expensive non-correlated subquery as a separate MV.
UDF_NOT_DETERMINISTIC	A UDF was not declared deterministic	Declare it deterministic or remove
WINDOW_WITHOUT_PARTITION_BY	A window spec missing PARTITION BY	Add the partition column

Most of these are five-minute fixes once you know what the sub-code is asking for.

Practical rules of thumb for keeping MVs incremental

Pattern	Recommendation	Note
Simple SELECT with filters	Yes	Good candidate
5+ joins	Discretionary	Evaluate using incremental refresh policy or consider simplifying the MV definition
Top-level GROUP BY aggregates	Yes	Well supported with GROUP_AGGREGATE and GENERIC_AGGREGATE
Several layers of nested aggregates	Discretionary	Evaluate using incremental refresh policy or consider materializing expensive aggregates
Window functions, without Partition	No	Supported with WINDOW PARTITION BY
Time function filters on sources. For example: CREATE MATERIALIZED VIEW mv AS SELECT val FROM T WHERE date > CURRENT_DATE() - INTERVAL 3 DAYS	Yes	Most efficient when filters can be pushed down to the source tables.
Time function usage in projections. For eg. CREATE MATERIALIZED VIEW mv AS SELECT val, CURRENT_DATE() date FROM T	No	Engine has to rewrite each row upon refresh
Tables with large relative changes	Discretionary	Full refresh is often cheaper. Consider downstream impact before overriding the cost model with incremental refresh policy.

The single habit that pays off most consistently here is layering complex queries into multiple MVs.

A reasonable starting posture: develop on INCREMENTAL so create-time mistakes show up early, then move to INCREMENTAL for the production MVs where surprise full refreshes are not OK. Reach for FULL only after you've actually measured and confirmed it's the cheaper option.

Requirements caveats

Compute: Serverless SQL Warehouse, or SDP Pro / Advanced edition
Source tables: Delta with row tracking enabled for many incremental techniques

Get started

A good way to start is to pick one MV from your own warehouse, the one you'd most hate to see drift into a full refresh, and run the event_log() query against it. If the latest planning_information says FULL_RECOMPUTE and you didn't expect that, you've just found a cost line worth investigating. Run EXPLAIN CREATE MATERIALIZED VIEW against the query, read the sub-code that comes back, fix the blocker, and switch the policy to INCREMENTAL so the next regression fails loudly instead of silently.

If you try this and hit something interesting, or if the error message your MV throws doesn't quite match what's documented, drop it in the comments. We read them.