cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

History Retention in DLT table (Delta Live Table)

JothyGanesan
New Contributor III

Hi All

We have the DLT table as our curated layer with apply changes. The DLT pipelines runs on continuous mode for streaming real time data ingestion. There is a requirement as per regulatory to retain only 1 year data in the DLT table and to move the history to another Delta table. 

How to execute the Delete command without impacting the regular real time load? The delete can be run once in a day and the data to be moved to the new history delta table and not a DLT

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Esteemed Contributor III

@JothyGanesan , it should not have an impact. Delta supports concurrent operation. And since you will be deleting old rows there is low risk that you hit some error due to concurrency, because probably you won't need to modify the row you're going to delete at the same time. You can read about isolation level here:

Isolation levels and write conflicts on Databricks | Databricks Documentation

View solution in original post

5 REPLIES 5

szymon_dybczak
Esteemed Contributor III

Hi @JothyGanesan ,

You can write a notebook that will be responsible for archiving all the data that is older than 1 year. Then you can run a process that will delete the data from DLT table. How to do this is described at following article:

Transform data with pipelines - Azure Databricks | Microsoft Learn

 

One question, is your table a source from some downstream tables as well? Do you need to propagate those deletes to other tables?

JothyGanesan
New Contributor III

Thank you @szymon_dybczak 

This is a continuous mode target table as per DLT pipeline and it is on apply changes mode. This does not have any further downstream tables for propogation.

If we have a separate notebook to run the delete for this DLT will it not impact the actual data ingestion of this table?

szymon_dybczak
Esteemed Contributor III

@JothyGanesan , it should not have an impact. Delta supports concurrent operation. And since you will be deleting old rows there is low risk that you hit some error due to concurrency, because probably you won't need to modify the row you're going to delete at the same time. You can read about isolation level here:

Isolation levels and write conflicts on Databricks | Databricks Documentation

szymon_dybczak
Esteemed Contributor III

And if the answer was helpful to you, @JothyGanesan please consider marking it as accepted solution. This will help others with similar problem find correct answer faster.

Ranga_naik1180
New Contributor III

@szymon_dybczak  ..Can you please suggest how we can delete records ..we have a scd2 target table(Silver) on top of that scd2 table we are having another scd2 table (Gold_layer) target table ..idea was if i delete a row in silver table how it can propogate that delete from silver to gold..can you please provide a sample code

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local communityโ€”sign up today to get started!

Sign Up Now