cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Incremental Load on Materialized Views

Dharinip
Contributor

Is incremental load possible on Materialized views. I would like to get some tutorials or videos on how to perform incremental refresh on MVs in gold layers. Also is it mandatory to have PKs for performing incremental loads in MVs.

1 ACCEPTED SOLUTION

Accepted Solutions

Hi @Dharinip , excuse me for the super late reply! I had been very busy last month...

Thanks for providing the additional details. It seems you would like to have both SCD Type 1 and 2.

Are you using a DLT pipeline to implement this? I assume so, because my previous demo example didn't resonate with you, which used a standalone materialized view approach.

So, I just created a simple DLT pipeline based on your example. Having a bronze table (client_raw), apply changes into a SCD type 2 streaming table (client_history), then create a MV (client_current). The client_current MV should be updated incrementally using a Serverless used. Related doc can be found here.

I've uploaded the sample DLT source code here. I hope it suites your need. https://gist.github.com/koji-kawamura-db/999b7fb9e114a236c15442776c5c0a0f

BTW, creating both type 1 and 2 as two Streaming Tables might be an alternative solution. So, the sample pipeline includes such a route just in case.

The DLT pipeline looks like this:

koji_kawamura_0-1746186062932.png

 

 

View solution in original post

4 REPLIES 4

koji_kawamura
Databricks Employee
Databricks Employee

Hi @Dharinip 

Yes, Databricks materialized views support incremental updates. You can refer this doc page for details. It uses a DLT pipeline internally and a serverless SQL warehouse is required. It utilizes row tracking and PKs are not mandatory.

I created a very simple example notebook to demonstrate updating a materialized view incrementally.

My question is: can materialized views be updated incrementally. For example:

In my case, we store the data in Iron layer and it gets flattened in bronze and silver layer as separate tables. The required transformations happen from silver to gold layer.

All the Iron, bronze and silver layers are streaming tables as SCD type 2 and gets incrementally loaded. 

The Gold layer are materialized views. Every time the pipeline runs, the materialized views are fully refreshed. But we wanted to have those incrementally loaded.
Example:

SILVER        
         
Table: Client        
         
ClientIDClientNameAddressStatusCreatedDateStartDateEnddateRunDate 
1ABCFloridaNew03.15.202503.15.202503.16.202503.15.2025 
1ABCMiamiNew03.16.202503.16.202503.17.202503.16.2025 
1ABCMiamiActive03.17.202503.17.2025NULL03.17.2025 
2QWEKansasActive03.15.202503.15.202503.18.202503.15.2025 
2QWEKansasInactive03.18.202503.18.2025NULL03.18.2025 
3GHJBuffaloActive03.18.202503.18.2025NULL03.18.2025 
         
         
GOLD        
         
Materialized  View: Client       
         
IDNAMEAddressStatusCreatedDateStartDateEnddateRunDate 
1ABCMiamiActive03.17.202503.17.2025NULL03.17.2025These should also be incrementally loaded.
2QWEKansasInactive03.18.202503.18.2025NULL03.18.2025
3GHJBuffaloActive03.18.202503.18.2025NULL03.18.2025



Hi @Dharinip , excuse me for the super late reply! I had been very busy last month...

Thanks for providing the additional details. It seems you would like to have both SCD Type 1 and 2.

Are you using a DLT pipeline to implement this? I assume so, because my previous demo example didn't resonate with you, which used a standalone materialized view approach.

So, I just created a simple DLT pipeline based on your example. Having a bronze table (client_raw), apply changes into a SCD type 2 streaming table (client_history), then create a MV (client_current). The client_current MV should be updated incrementally using a Serverless used. Related doc can be found here.

I've uploaded the sample DLT source code here. I hope it suites your need. https://gist.github.com/koji-kawamura-db/999b7fb9e114a236c15442776c5c0a0f

BTW, creating both type 1 and 2 as two Streaming Tables might be an alternative solution. So, the sample pipeline includes such a route just in case.

The DLT pipeline looks like this:

koji_kawamura_0-1746186062932.png

 

 

Dharinip
Contributor

This is great. Thank you so much.

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now