โ08-11-2025 12:58 AM
Hi All
We have the DLT table as our curated layer with apply changes. The DLT pipelines runs on continuous mode for streaming real time data ingestion. There is a requirement as per regulatory to retain only 1 year data in the DLT table and to move the history to another Delta table.
How to execute the Delete command without impacting the regular real time load? The delete can be run once in a day and the data to be moved to the new history delta table and not a DLT
โ08-11-2025 03:22 AM - edited โ08-11-2025 03:23 AM
@JothyGanesan , it should not have an impact. Delta supports concurrent operation. And since you will be deleting old rows there is low risk that you hit some error due to concurrency, because probably you won't need to modify the row you're going to delete at the same time. You can read about isolation level here:
Isolation levels and write conflicts on Databricks | Databricks Documentation
โ08-11-2025 01:45 AM - edited โ08-11-2025 02:00 AM
Hi @JothyGanesan ,
You can write a notebook that will be responsible for archiving all the data that is older than 1 year. Then you can run a process that will delete the data from DLT table. How to do this is described at following article:
Transform data with pipelines - Azure Databricks | Microsoft Learn
One question, is your table a source from some downstream tables as well? Do you need to propagate those deletes to other tables?
โ08-11-2025 03:19 AM
Thank you @szymon_dybczak
This is a continuous mode target table as per DLT pipeline and it is on apply changes mode. This does not have any further downstream tables for propogation.
If we have a separate notebook to run the delete for this DLT will it not impact the actual data ingestion of this table?
โ08-11-2025 03:22 AM - edited โ08-11-2025 03:23 AM
@JothyGanesan , it should not have an impact. Delta supports concurrent operation. And since you will be deleting old rows there is low risk that you hit some error due to concurrency, because probably you won't need to modify the row you're going to delete at the same time. You can read about isolation level here:
Isolation levels and write conflicts on Databricks | Databricks Documentation
โ08-11-2025 04:18 AM
And if the answer was helpful to you, @JothyGanesan please consider marking it as accepted solution. This will help others with similar problem find correct answer faster.
a month ago
@szymon_dybczak ..Can you please suggest how we can delete records ..we have a scd2 target table(Silver) on top of that scd2 table we are having another scd2 table (Gold_layer) target table ..idea was if i delete a row in silver table how it can propogate that delete from silver to gold..can you please provide a sample code
Passionate about hosting events and connecting people? Help us grow a vibrant local communityโsign up today to get started!
Sign Up Now