Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Event-driven Architecture with Lake Monitoring without "Trigger on Arrival" on DABs

tana_sakakimiya
New Contributor II

AWS databricks

I want to build data quality monitoring and an event-driven architecture without a file-arrival trigger, running once at deploy time instead.
I plan to create a job that is triggered once at deploy.
The job runs these tasks sequentially:
1. Run a script that creates external tables (if they don't exist) to load Delta-format data from S3 into a landing schema, configured with properties such as:
- delta.enableChangeDataFeed = true
- delta.enableRowTracking = true
- delta.enableDeletionVectors = true
(to enable incremental updates in downstream materialized views)
2. DLT task:
- create materialized views in a bronze schema with expectations (warn), triggered on update
- create materialized views in a silver schema with expectations (drop), triggered on update
- create a materialized view for a data profile based on the DAMA framework, triggered on a schedule; it pulls the data quality metrics produced by the Lakehouse Monitoring feature
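The two steps above could be sketched roughly like this in SQL (schema names, table names, constraint conditions, and the S3 path are all hypothetical placeholders):

```sql
-- Step 1: register an external Delta table over S3 data in a landing schema,
-- with the table properties mentioned above.
CREATE TABLE IF NOT EXISTS landing.orders
USING DELTA
LOCATION 's3://my-bucket/landing/orders'
TBLPROPERTIES (
  'delta.enableChangeDataFeed'  = 'true',
  'delta.enableRowTracking'     = 'true',
  'delta.enableDeletionVectors' = 'true'
);

-- Step 2 (DLT pipeline): bronze materialized view with a warning expectation
-- (violating rows are kept but counted) ...
CREATE OR REFRESH MATERIALIZED VIEW bronze.orders
(CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL))
AS SELECT * FROM landing.orders;

-- ... then a silver materialized view that drops violating rows.
CREATE OR REFRESH MATERIALIZED VIEW silver.orders
(CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW)
AS SELECT * FROM bronze.orders;
```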

Does this make sense, and is it realistic?

1 ACCEPTED SOLUTION

Accepted Solutions

tana_sakakimiya
New Contributor II

I found out that a materialized view can't be incrementally updated when it references data from an external location.

This architecture doesn't work.

View solution in original post

6 REPLIES 6


BS_THE_ANALYST
Esteemed Contributor

@tana_sakakimiya just out of curiosity, where did you find this out? 

I'm looking at the docs right now for incremental refreshes for materialized views: https://docs.databricks.com/aws/en/optimizations/incremental-refresh 

This section seems to say external is supported?

BS_THE_ANALYST_0-1757853998989.png


Could you point to where it says otherwise? I appreciate I might be chucking a red herring out there 🙂

All the best,
BS

 

tana_sakakimiya
New Contributor II

@BS_THE_ANALYST 

I appreciate your response.

I found this in the Azure documentation:
Incremental refresh for materialized views - Azure Databricks | Microsoft Learn

I'm not sure whether I'm misunderstanding or not.
It says:
"Sources such as volumes, external locations, and foreign catalogs are not supported." 

So I think external tables are not supported. What do you think? 

Thank you. 

tana_sakakimiya_0-1757855618179.png

 

tana_sakakimiya
New Contributor II

Maybe it works only when the data stored in S3 is in Delta format.

BS_THE_ANALYST
Esteemed Contributor

@tana_sakakimiya  I do understand the confusion. In the screenshot I sent you, it looks like it should work. In the screenshot you sent me, it looks like it shouldn't. I guess our best hope is Delta format when using an External Location.

Could you give that a try and see if it gives some success 🙏. Fingers crossed.

All the best,
BS

BS_THE_ANALYST
Esteemed Contributor

@tana_sakakimiya ah, I think I see the difference. 

My screenshot says that "external tables" backed by Delta Lake will work. This means you'll need to have the table already created in Databricks from your external location, i.e. make an external table. 

Perhaps you could include that as part of your pipeline? External Location -> External Table -> Execute Rest of Pipeline 🤔.
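That suggested flow could look something like this in SQL (schema, table names, and the S3 path are hypothetical): register the Delta data at the external location as an external table first, so the downstream materialized view references a table rather than the location directly.

```sql
-- Hypothetical: create an external table over the Delta data first ...
CREATE TABLE IF NOT EXISTS landing.events
USING DELTA
LOCATION 's3://my-bucket/landing/events';

-- ... then have the materialized view read the external table, which the
-- incremental-refresh docs list as a supported Delta source.
CREATE OR REFRESH MATERIALIZED VIEW bronze.events
AS SELECT * FROM landing.events;
```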

All the best,
BS