Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT full refresh

PassionateDBD
New Contributor II

Running a task with full refresh in Delta Live Tables removes the existing data and reloads it from scratch. We are ingesting data from an Event Hub topic and from files. The Event Hub topic retains messages for seven days after arrival. If we were to run a full refresh at some point in the future, wouldn't we lose all data except the last seven days' worth, since older data can no longer be read from the Event Hub topic?
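To make the concern concrete, here is a toy Python model (not DLT code). It assumes a 7-day retention window as described above and treats a full refresh as "truncate, then replay whatever the topic still holds". The helpers still_in_topic and full_refresh are hypothetical names for illustration only.

```python
from datetime import datetime, timedelta

# Assumption taken from the post: the Event Hub topic keeps messages for 7 days.
RETENTION = timedelta(days=7)


def still_in_topic(events, now, retention=RETENTION):
    """Return the events a consumer can still read from the topic at `now`.

    `events` is a list of (arrival_time, payload) tuples. Anything that arrived
    before `now - retention` has already been purged by the broker.
    """
    cutoff = now - retention
    return [e for e in events if e[0] >= cutoff]


def full_refresh(events, now):
    """Toy full refresh: truncate the table, then rebuild it by replaying
    whatever the source can still deliver at `now`."""
    return still_in_topic(events, now)


start = datetime(2024, 1, 1)
# One message per day for 30 days; the incremental pipeline read each one on arrival.
arrived = [(start + timedelta(days=d), f"msg-{d}") for d in range(30)]
table_before = list(arrived)

table_after = full_refresh(arrived, now=start + timedelta(days=30))
print(len(table_before), len(table_after))  # 30 7
```

Under these assumptions the table held 30 messages before the refresh and only the 7 still retained by the topic afterwards, so the other 23 are gone for good.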

If my interpretation is correct, is there a way to "mark" tables as the "original source" of the data so that they are not recalculated even when we run a full refresh? I would expect this to be a common case: the bronze, silver, and gold layers may need to be reloaded, but the landing layer should keep the full history of the data in its original form.

1 REPLY

JesseS
New Contributor II

I know this is a bit after the fact, but in case you haven't solved it yet: I came across this in the Databricks documentation. You can set the table property pipelines.reset.allowed to false on a table to prevent it from being cleared and reloaded during a full refresh. Ref: https://docs.databricks.com/en/optimizations/incremental-refresh.html
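In a real pipeline the property goes on the landing (bronze) table definition, for example table_properties={"pipelines.reset.allowed": "false"} in the Python @dlt.table decorator, or TBLPROPERTIES ("pipelines.reset.allowed" = "false") in SQL. Because the dlt module only runs inside a Databricks pipeline, the sketch below is a plain-Python toy model of the intended behavior rather than DLT code; Table and run_full_refresh are hypothetical names, and the 30-message history with 7 still retained is an assumed scenario.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    rows: list
    # Stand-in for the table's TBLPROPERTIES; values are strings, as in DLT.
    properties: dict = field(default_factory=dict)

    @property
    def reset_allowed(self) -> bool:
        # Defaults to allowed unless the property is explicitly "false".
        return self.properties.get("pipelines.reset.allowed", "true").lower() != "false"


def run_full_refresh(landing, downstream, replayable):
    """Toy model of a pipeline-wide full refresh (hypothetical helper).

    landing:    the table that ingests from the source.
    downstream: list of (table, transform) pairs in dependency order.
    replayable: the rows the source can still deliver right now.
    """
    if landing.reset_allowed:
        # Truncate and reload: only what the topic still retains comes back.
        landing.rows = list(replayable)
    upstream = landing.rows
    for table, transform in downstream:
        if table.reset_allowed:
            # Recompute this table from its (possibly preserved) upstream.
            table.rows = transform(upstream)
        upstream = table.rows


history = [f"msg-{d}" for d in range(30)]  # everything ingested so far
retained = history[-7:]                    # what the topic still holds

protected = Table("landing", list(history), {"pipelines.reset.allowed": "false"})
silver = Table("silver", [])
run_full_refresh(protected, [(silver, lambda rows: [r.upper() for r in rows])], retained)

unprotected = Table("landing_unprotected", list(history))
run_full_refresh(unprotected, [], retained)

print(len(protected.rows), len(silver.rows), len(unprotected.rows))  # 30 30 7
```

In this model the downstream silver table is still fully recomputed; it simply rebuilds from the preserved landing table instead of from the topic, which matches the bronze/silver/gold reload scenario described in the question.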

Hope it helps, and hope it helps any future Googlers! 🙂
