Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Live Tables - Adding streaming to existing table

Manzilla
New Contributor II

Currently, the bronze table ingests JSON files using the @dlt.table decorator on a spark.readStream function.

A daily batch job does some transformation on bronze data and stores results in the silver table.

New Process

Bronze stays the same.

A stream now ingests the bronze table into a view where the data transformation occurs; that view is the source for the silver table, which is updated with dlt.apply_changes.

dlt.apply_changes adds four hidden columns for tracking. My question is: what will happen when this runs for the first time against the production data?

Will the stream that feeds the silver process read the entire bronze table and reprocess it, or will it start from the current date/time and move forward?
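For reference, a minimal sketch of the pipeline described above. All table, view, path, key, and column names here are illustrative assumptions, not taken from the original post, and this code runs only inside a Databricks Delta Live Tables pipeline:

```python
# Hypothetical DLT pipeline sketch: bronze streaming ingest -> transforming
# view -> silver target updated with dlt.apply_changes.
# Names (paths, keys, columns) are placeholders.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze: raw JSON ingested as a stream")
def bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events/")  # illustrative source path
    )

@dlt.view(comment="Transformed bronze rows feeding the silver table")
def bronze_transformed():
    # Example transformation; replace with the real business logic.
    return dlt.read_stream("bronze").withColumn("amount", col("amount").cast("double"))

dlt.create_streaming_table("silver")

dlt.apply_changes(
    target="silver",
    source="bronze_transformed",
    keys=["id"],                  # illustrative primary key
    sequence_by=col("event_ts"),  # illustrative ordering column
)
```

The view sits between bronze and silver so the transformation logic stays separate from the CDC merge that apply_changes performs.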

2 REPLIES

Manzilla
New Contributor II

Thank you, that's what I understood too. It's nice to get validation from someone else who works with this.

Sidhant07
Databricks Employee

When you use `dlt.apply_changes` to update the silver table, it adds four hidden columns for tracking changes. These columns include `event_time`, `read_version`, `commit_version`, and `is_deleted`.

When you run this process for the first time against the production data, the stream that feeds the silver process will not reprocess the entire bronze table.

This is because stream processing in Delta Live Tables (DLT) is designed to process new data as it arrives, rather than reprocessing all the data each time. A checkpoint records the stream's progress and determines where processing resumes the next time the stream starts.

So, when you run the stream for the first time against the production data, it will start processing from the current date/time and move forward, using the checkpoint to keep track of its progress. It will not reprocess the entire bronze table unless you explicitly configure it to do so.
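The checkpoint mechanism can be illustrated with a small, Spark-free toy. This is not Spark or DLT code, just a sketch of the idea: the reader persists the last offset it processed, so each subsequent run picks up only the records added since the previous run instead of rescanning everything. All names here are invented for illustration:

```python
# Toy illustration of streaming checkpoints: a stored offset determines
# where the next run resumes, so restarts do not reprocess consumed data.
import json
import os
import tempfile

def read_new_records(source, checkpoint_path):
    """Return records added since the last run, then advance the checkpoint."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]
    new = source[offset:]
    with open(checkpoint_path, "w") as f:
        json.dump({"offset": len(source)}, f)
    return new

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
table = ["r1", "r2", "r3"]

first = read_new_records(table, ckpt)   # no checkpoint yet: sees existing rows
table += ["r4", "r5"]
second = read_new_records(table, ckpt)  # resumes from the stored offset
print(first, second)  # ['r1', 'r2', 'r3'] ['r4', 'r5']
```

Note that in this toy, a run with no checkpoint starts from offset zero; in Spark you can control where a first run begins with options such as a starting version or timestamp on the source table.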
