Deleting Records from DLT Bronze and Silver Tables
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-25-2024 08:01 AM
I have a pipeline that generates two DLT streaming tables: a Bronze table and a Silver table. I need to delete specific records from both tables. I've read an article (https://www.databricks.com/blog/handling-right-be-forgotten-gdpr-and-ccpa-using-delta-live-tables-dl...) suggesting that with two streaming tables, the only option is to delete the data in the Bronze table and then perform a full refresh of the Silver table.
However, the Silver table is very large, and I'd like to avoid a full refresh. Are there any alternative solutions available?
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-01-2024 08:58 PM
Hi @Gianfranco,
How are you doing it today?
As per my understanding, Consider using a selective delete approach on both the Bronze and Silver tables to avoid a full refresh. Instead of deleting data and refreshing the entire Silver table, you could delete the specific records in both tables by writing an appropriate delete query directly against each Delta table. This way, you're only removing the required records without needing to rebuild the Silver table from scratch. Another option might be to implement Change Data Capture (CDC), so you can track and update only the affected records, reducing the need for full table operations. Additionally, explore using vacuum or optimize commands to maintain table performance after deletions.
Give a try and see if it works.
Good day.
Regards,
Brahma
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-01-2024 11:40 PM
Hi @Gianfranco ,
Try APPLY CHANGES. This will work well in your scenario, as it supports DELETE operation as well:
https://docs.databricks.com/en/delta-live-tables/cdc.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-27-2024 08:29 PM
Remove records using the DELETE operation in both Bronze & Silver tables.
After doing each delete step, you can Optimize the table which rewrites the parquet files for that table behind the scenes to improve the data layout (Read more about optimize here: https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-optimize). It is a simple command: OPTIMIZE <Table Name>
Also, as you move on in this process, you can run the VACUUM command to clean up old versions of the data and free up storage space: https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-vacuum
Hope this helps.
Data Architect | MS/MBA
Data + AI/ML/GenAI
17x Databricks Credentials

