You created a materialized view. You assumed it refreshed incrementally. Then, at 6 a.m., a refresh on a billion-row source ran a full recompute, and your monthly Databricks bill grew a leg.
This is the surprise every team hits at least once. By default, Databricks picks the most cost-effective refresh strategy for each MV, incremental when it can, full when it has to. That cost-based default is good. It is also opaque. The day someone changes a query, adds a UDF, or refactors a CTE, the MV silently slips from incremental into full, and nothing in the pipeline tells you. The bill tells you.
Today's announcement closes that gap. The new REFRESH POLICY clause for materialized views (GA) lets you tell Databricks exactly how aggressively to enforce incremental refresh and what to do when incremental refresh is no longer possible. There are four policy values, and each one maps onto a different operating posture: leave it to the cost model, prefer incremental, require it, or always go full. There's also a new EXPLAIN CREATE MATERIALIZED VIEW command that previews incremental eligibility before you commit, so you stop finding out at runtime that something in your query was blocking incremental all along.
In this post, we'll walk through where the surprise full refreshes come from, how REFRESH POLICY gives you the levers to control them, how EXPLAIN lets you check eligibility before you create the MV, and we'll break a working MV on purpose to show what the new error message looks like in practice.
If you build dashboards or downstream pipelines on materialized views, you have probably hit one of these:
What makes these so painful is that nothing in the pipeline breaks. The query still compiles. The refresh still runs. The numbers in the dashboard are still correct. The only signal that anything changed is the cost line on the next billing cycle, and by then, you've been paying for full recomputes for a week.
REFRESH POLICY is a single clause you add to a CREATE MATERIALIZED VIEW statement. It takes one of four values, and that value tells Databricks exactly what behavior you want when incremental refresh is and is not possible.
|
Policy |
When incremental is possible |
When incremental is not possible |
On create with a non-incrementalizable query |
When to use this |
|
AUTO (default) |
Picks incremental or full based on a cost model |
Performs a full refresh |
Performs a full refresh |
Most workloads because the cost model improves over time and takes into account historical execution statistics. |
|
INCREMENTAL |
Performs incremental refresh |
Falls back to full refresh |
Statement fails |
Override the cost model to incrementalize the MV, but you don't want it to fail and would rather prioritize data delivery. |
|
INCREMENTAL |
Performs incremental refresh |
Refresh fails with an error |
Statement fails |
When the MV should never fully refresh, and you'd rather get a notification than a surprise bill. Regulatory MVs, hot dashboards on large sources, fixed-cost contracts. Also, re-initialization protection against accidental query edits. |
|
FULL |
Performs a full refresh anyway |
Performs a full refresh |
Performs a full refresh |
When you've measured, and it's full, it's cheaper or simpler for this MV. |
The other three rows are where the new control lives.
INCREMENTAL is the "soft" option. You're saying you want incremental whenever possible, and you'll accept a full refresh as a fallback if something goes sideways at runtime, but you don't want to accidentally create an MV that can never be incremental in the first place. The create statement fails fast on a query that the engine can't incrementalize, so the friction shows up at development time rather than in production. At refresh time, both transient and permanent issues fall back to full.
INCREMENTAL is the "hard" option. You're saying you want incremental, period. If the engine can't deliver, fail the refresh and page someone. This is the policy for MVs where an unexpected full recompute is genuinely unacceptable: a regulatory MV with a 5-minute SLA, a 2 TB table where a full refresh costs more than the team's monthly Databricks bill, a downstream MV that cascades full refreshes through a dozen dependents. INCREMENTAL also doubles as re-initialization protection: if someone edits the query definition next quarter and that edit would force a full refresh, the next refresh fails with a clear reason ("Query Definition Change", "Schema Evolution", etc.). You can revert the edit and keep your incremental state, or if you actually do want the recompute, explicitly run REFRESH MATERIALIZED VIEW <name> FULL.
FULL is the "I measured this" option. You ran the comparison, full is cheaper for this particular MV and its downstream closure, and you don't want the system second-guessing you. So do MVs over fast-churning source data where most rows have changed by the time you refresh. It's not common, but it does exist.
Additionally, before choosing full recompute, the user needs to verify the downstream impact, and one option could be to run the refresh more frequently to keep changes in check. If those options are not effective, they can choose a full refresh.
The clause goes in the CREATE MATERIALIZED VIEW statement, between the name and the query body:
CREATE MATERIALIZED VIEW IF NOT EXISTS my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL
AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;
That is the entire surface area. One clause, four values.
If you have an existing MV, you can also alter the policy on it without rebuilding:
ALTER MATERIALIZED VIEW my_catalog.sales.daily_revenue
SET REFRESH POLICY INCREMENTAL;
Before you set INCREMENTAL or INCREMENTAL and discover at create time that the query isn't incrementalizable, you can check eligibility beforehand. The EXPLAIN CREATE MATERIALIZED VIEW command returns a structured analysis of whether your query qualifies for incremental refresh and, if not, what the blocker is.
EXPLAIN CREATE MATERIALIZED VIEW
SELECT k, sum(v) FROM source.src_schema.table GROUP BY k;
The output has three sections:
== Incremental Update Eligibility ==
The Materialized View can be incrementally refreshed.== Detailed Incrementalization Info ==
No issues detected.== Physical Plan ==
...
When the query is not eligible, the second section names the exact blocker:
== Incremental Update Eligibility ==
Query structure is not incrementalizable. See Detailed Incrementalization Info.== Detailed Incrementalization Info ==
ROW_TRACKING_NOT_ENABLED
Table name: table1
Prevents incrementalization: Yes
The blocker is usually one DDL statement away from being fixed:
ALTER TABLE source.src_schema.table1
SET TBLPROPERTIES (delta.enableRowTracking = true);
Re-run EXPLAIN. Once structural eligibility passes, you can set REFRESH POLICY INCREMENTAL on the real CREATE and trust the policy to enforce what you just verified.
One thing worth noting: EXPLAIN confirms structural eligibility, not the runtime decision. Under the default AUTO policy, the cost model can still choose a full recompute at runtime if it judges the changeset large enough that full is actually cheaper for this run. INCREMENTAL overrides the cost model. EXPLAIN is the tool you use to confirm that the structural prerequisites are in place before you commit to either of those.
Let's build a working incremental MV, prove it is incremental, then deliberately break the incremental property and watch INCREMENTAL catch it. Every step below runs in the Databricks SQL Query Editor on a serverless SQL warehouse.
Row tracking is what lets the planner identify which rows have changed between refreshes. Enable it explicitly on the source:
CREATE OR REPLACE TABLE my_catalog.sales.orders (
order_id BIGINT,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10,2)
)
TBLPROPERTIES (delta.enableRowTracking = true);INSERT INTO my_catalog.sales.orders VALUES
(1, 101, '2026-04-30', 120.00),
(2, 102, '2026-04-30', 85.50),
(3, 101, '2026-05-01', 240.00),
(4, 103, '2026-05-01', 60.75);
Before committing the MV, ask Databricks whether the query qualifies:
EXPLAIN CREATE MATERIALIZED VIEW my_catalog.sales.daily_revenue AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;
Expect == Incremental Update Eligibility == The Materialized View can be incrementally refreshed. with no issues.
Step 3 — Create the MV with INCREMENTAL
CREATE MATERIALIZED VIEW my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL
AS
SELECT order_date, sum(total_amount) AS revenue
FROM my_catalog.sales.orders
GROUP BY order_date;
This MV is a clean incremental candidate: a single Delta source with row tracking and a top-level GROUP BY. Databricks compiles it as GROUP_AGGREGATE, one of the supported incremental techniques.
INSERT INTO my_catalog.sales.orders VALUES
(5, 102, '2026-05-02', 410.00),
(6, 104, '2026-05-03', 75.25);REFRESH MATERIALIZED VIEW my_catalog.sales.daily_revenue;SELECT * FROM my_catalog.sales.daily_revenue ORDER BY order_date;
You will see four daily totals. The refresh processed only the two new rows, not all six.
event_log() returns one row per planning decision per refresh. Filter for the planning event:
SELECT timestamp, message
FROM event_log(TABLE(my_catalog.sales.daily_revenue))
WHERE event_type = 'planning_information'
ORDER BY timestamp DESC
LIMIT 1;
The message field on the latest row will name the technique. For our query, expect to see GROUP_AGGREGATE. If the planner chose FULL_RECOMPUTE, that's a tell; it means something in the query no longer qualifies for incremental.
Now let's see INCREMENTAL do its job. Add a non-deterministic expression to the MV RAND() is the canonical example. Replace the MV definition:
CREATE OR REPLACE MATERIALIZED VIEW my_catalog.sales.daily_revenue
REFRESH POLICY INCREMENTAL
AS
SELECT order_date,
sum(total_amount) AS revenue,
rand() AS sample_jitter
FROM my_catalog.sales.orders
GROUP BY order_date;
The statement fails. The error class is MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE, and the sub-code names the exact reason:
[MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE] Cannot create the materialized view with IncrementalStrict refresh policy because it is not supported for incremental refresh. Reason:
- Expressions Rand are not deterministic. Consider rewriting the query to avoid using them.
This is the exact behavior we wanted. The bad MV never gets created. The team that was about to merge this change sees the error in their PR pipeline, fixes the query, and ships an MV that actually does what they meant.
If you want the same protection on an existing MV that's already in production, switch its policy with ALTER MATERIALIZED VIEW … SET REFRESH POLICY INCREMENTAL. From that point forward, any edit that breaks incrementalization causes the next refresh to fail instead of silently performing a full recompute.
This section is for readers who want to debug the why behind an MV slipping out of incremental. It is also the section you bookmark and share with the engineer who wrote the UDF.
The planning_information event log entry returns a message that identifies the technique used. There are eight values you'll see in practice:
|
Technique |
What it means |
|
ROW_BASED |
The planner identified the changed rows in the source and applied them directly. |
|
PARTITION_OVERWRITE |
A whole partition changed; the MV rewrote the partition slice rather than every row. |
|
WINDOW_FUNCTION |
A window aggregate with PARTITION BY was incrementalized via window-state reuse. |
|
APPEND_ONLY |
The source only saw inserts; the MV was appended without checking for updates. |
|
GROUP_AGGREGATE |
A top-level GROUP BY was incrementalized using the delta aggregation state. (only changes to the sources are read) |
|
GENERIC_AGGREGATE |
Also an incremental top-level GROUP BY. (Might read portions of the sources in addition to the changes) |
|
FULL_RECOMPUTE |
The MV was recomputed from scratch. The bill signal. |
|
NO_OP |
Nothing changed in the source; the MV was unchanged. |
If the technique is FULL_RECOMPUTE and your policy was INCREMENTAL, you got the graceful fallback you opted into. If the technique is FULL_RECOMPUTE and your policy was AUTO, the cost model picked full for that run; sometimes that's right, but it's worth a sanity check, especially if the changeset was small.
When INCREMENTAL rejects an MV at create time or fails a refresh, the error message names a sub-code under MATERIALIZED_VIEW_NOT_INCREMENTALIZABLE. Each sub-code points at a fixable cause, and you can usually act on it without needing to escalate:
|
Sub-code |
Cause |
Fix |
|
AGGREGATE_NOT_TOP_NODE |
A complex expression sits above the GROUP BY |
Simplify; consider materializing the aggregate |
|
EXPRESSION_NOT_DETERMINSTIC |
RAND(), current_timestamp(), or similar in the projection |
Remove or precompute |
|
INPUT_NOT_IN_DELTA |
A source table is not Delta |
Convert to Delta or load via staging |
|
OPERATOR_NOT_INCREMENTALIZABLE |
An unsupported operator |
Restructure, or accept AUTO/FULL |
|
ROW_TRACKING_NOT_ENABLED |
The Delta source lacks row tracking |
ALTER TABLE … SET TBLPROPERTIES ('delta.enableRowTracking' = 'true') |
|
SUBQUERY_EXPRESSION_NOT_INCREMENTALIZABLE |
A subquery expression which cannot be re-written as a join |
Rewrite the subquery as a join or materialize any expensive non-correlated subquery as a separate MV. |
|
UDF_NOT_DETERMINISTIC |
A UDF was not declared deterministic |
Declare it deterministic or remove |
|
WINDOW_WITHOUT_PARTITION_BY |
A window spec missing PARTITION BY |
Add the partition column |
Most of these are five-minute fixes once you know what the sub-code is asking for.
|
Pattern |
Recommendation |
Note |
|
Simple SELECT with filters |
Yes |
Good candidate |
|
5+ joins |
Discretionary |
Evaluate using incremental refresh policy or consider simplifying the MV definition |
|
Top-level GROUP BY aggregates |
Yes |
Well supported with GROUP_AGGREGATE and GENERIC_AGGREGATE |
|
Several layers of nested aggregates |
Discretionary |
Evaluate using incremental refresh policy or consider materializing expensive aggregates |
|
Window functions, without Partition |
No |
Supported with WINDOW PARTITION BY |
|
Time function filters on sources. For example: CREATE MATERIALIZED VIEW mv AS SELECT val FROM T WHERE date > CURRENT_DATE() - INTERVAL 3 DAYS |
Yes |
Most efficient when filters can be pushed down to the source tables. |
|
Time function usage in projections. For eg. SELECT val, CURRENT_DATE() date FROM T |
No |
Engine has to rewrite each row upon refresh |
|
Tables with large relative changes |
Discretionary |
Full refresh is often cheaper. Consider downstream impact before overriding the cost model with incremental refresh policy. |
The single habit that pays off most consistently here is layering complex queries into multiple MVs.
A reasonable starting posture: develop on INCREMENTAL so create-time mistakes show up early, then move to INCREMENTAL for the production MVs where surprise full refreshes are not OK. Reach for FULL only after you've actually measured and confirmed it's the cheaper option.
A good way to start is to pick one MV from your own warehouse, the one you'd most hate to see drift into a full refresh, and run the event_log() query against it. If the latest planning_information says FULL_RECOMPUTE and you didn't expect that, you've just found a cost line worth investigating. Run EXPLAIN CREATE MATERIALIZED VIEW against the query, read the sub-code that comes back, fix the blocker, and switch the policy to INCREMENTAL so the next regression fails loudly instead of silently.
If you try this and hit something interesting, or if the error message your MV throws doesn't quite match what's documented, drop it in the comments. We read them.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.