Hi all,
I am building a real-time dashboard using a Databricks Delta Live Tables pipeline, with the following steps:
Bronze table: Using Databricks Auto Loader, records from new files are incrementally ingested into a bronze table.
Silver table: Using the read_stream function (Spark Structured Streaming), we create the silver table by filtering the bronze records and selecting only the fields that are required.
Gold table: Using the read function, which reads the complete table, we create the gold table as a materialized view, using an aggregate function (SUM) and a GROUP BY clause.
Problem :
The bronze and silver tables are ingested incrementally. For the gold table, however, the entire table is recomputed every time a new record arrives in the silver table.
What I want is for only the groups (GROUP BY keys) affected by the new records to be updated, with all other rows in the gold table left untouched.
I have also tried using a streaming table instead of a materialized view for gold, but in that case the entire table is recomputed as well.
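To make the desired behaviour concrete, here is a small plain-Python sketch. It is not DLT code and not my actual pipeline. The function names and the sample (key, amount) data are made up for illustration only. It contrasts a full recompute of the SUM per group with an update that only touches the groups present in the new batch.

```python
def full_recompute(all_records):
    """What happens today: every group's SUM is rebuilt from all records."""
    totals = {}
    for key, amount in all_records:
        totals[key] = totals.get(key, 0) + amount
    return totals


def incremental_update(totals, new_records):
    """What I want: only groups that appear in the new batch are updated.

    Mutates `totals` in place and returns the set of group keys touched.
    """
    touched = set()
    for key, amount in new_records:
        totals[key] = totals.get(key, 0) + amount
        touched.add(key)
    return touched


# Hypothetical data: (group key, amount) pairs standing in for silver rows.
existing = [("A", 10), ("B", 5), ("A", 7)]
new_batch = [("B", 3)]

gold = full_recompute(existing)                 # initial gold state
touched = incremental_update(gold, new_batch)   # only group "B" is updated
```

In this sketch only group "B" is rewritten when the new batch arrives, while group "A" keeps its existing total. That is the behaviour I am hoping to get from the gold table.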
Any help would be appreciated.