Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Live table - Adding streaming to existing table

Manzilla
New Contributor II

Currently, the bronze table ingests JSON files using the @dlt.table decorator on a spark.readStream call.

A daily batch job does some transformation on the bronze data and stores the results in the silver table.

New Process

Bronze stays the same.

A stream has been created to ingest the bronze table into a view where the data transformation occurs; that view is then used as the source for the silver table, which is updated with dlt.apply_changes.
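For reference, the setup described above might look roughly like this. This is a minimal sketch, not the actual pipeline: the paths, table, view, and column names (`bronze_events`, `event_id`, `event_ts`, etc.) are all hypothetical.

```python
import dlt
from pyspark.sql import functions as F

# Bronze: ingest raw JSON files as a stream (hypothetical source path).
@dlt.table(name="bronze_events")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events/")  # hypothetical landing location
    )

# View: stream from bronze and apply the transformations.
@dlt.view(name="bronze_events_transformed")
def bronze_events_transformed():
    return (
        dlt.read_stream("bronze_events")
        .withColumn("event_date", F.to_date("event_ts"))
    )

# Silver: streaming target table kept up to date via apply_changes.
dlt.create_streaming_table("silver_events")

dlt.apply_changes(
    target="silver_events",
    source="bronze_events_transformed",
    keys=["event_id"],              # hypothetical primary key
    sequence_by=F.col("event_ts"),  # hypothetical ordering column
)
```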

dlt.apply_changes adds 4 hidden columns for tracking. My question is: what will happen when this runs for the first time against the production data?

Will the stream associated with the silver process look at the entire bronze table and reprocess it, or will it start from the current date/time and move forward?

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @Manzilla. When using Delta Live Tables’ dlt.apply_changes for change data capture (CDC), it’s essential to understand how it works.

Let’s break down the process and address your specific scenario:

  1. CDC with Delta Live Tables:

    • Delta Live Tables simplifies CDC using the APPLY CHANGES API. Unlike the traditional MERGE INTO statement, which can be error-prone due to out-of-sequence records, the APPLY CHANGES API handles out-of-sequence records automatically.
    • You specify a column in the source data that represents the proper ordering of records (usually a monotonically increasing value). Delta Live Tables uses this column to handle data that arrives out of order.
    • For SCD Type 2 changes (historical tracking), Delta Live Tables propagates sequencing values to the ...
  2. Your Scenario:

    • You’ve created a stream to ingest the bronze table into a view where data transformation occurs. This transformed view serves as the source for the silver table.
    • When you run dlt.apply_changes against the production data for the first time, here’s what happens:
      • The stream associated with the silver process will process changes based on the specified keys and sequencing.
      • It won’t reprocess the entire bronze table. Instead, it will start from the current date/time and move forward, capturing changes since the last checkpoint.
      • The hidden columns added by dlt.apply_changes (such as version columns) help track the changes and ensure correct processing.
In summary, your stream will process changes incrementally, starting from the current state, rather than reprocessing the entire bronze table. This approach ensures efficient and accurate CDC for your silver table. If you encounter any issues or need further assistance, feel free to ask! 😊🚀
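To illustrate point 1 above: the ordering column is supplied through the sequence_by parameter, and if you want SCD Type 2 history tracking, stored_as_scd_type=2 enables it (this is what populates the hidden __START_AT / __END_AT columns in the target). A minimal sketch with hypothetical table and column names:

```python
import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("silver_customers")

dlt.apply_changes(
    target="silver_customers",
    source="bronze_customers_view",   # hypothetical transformed view
    keys=["customer_id"],             # key(s) used to match change records
    sequence_by=col("updated_at"),    # ordering column; lets DLT handle
                                      # out-of-sequence records automatically
    stored_as_scd_type=2,             # keep history; adds __START_AT / __END_AT
)
```

With stored_as_scd_type=1 (the default), the target keeps only the latest row per key instead of a history.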

 

Manzilla
New Contributor II

Thank you, that's what I understood too. It is just nice to get validation from someone else who works with this.
