cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Deleting Records from DLT Bronze and Silver Tables

Gianfranco
New Contributor II

I have a pipeline that generates two DLT streaming tables: a Bronze table and a Silver table. I need to delete specific records from both tables. I've read an article (https://www.databricks.com/blog/handling-right-be-forgotten-gdpr-and-ccpa-using-delta-live-tables-dl...) suggesting that with two streaming tables, the only option is to delete the data in the Bronze table and then perform a full refresh of the Silver table.

However, the Silver table is very large, and I'd like to avoid a full refresh. Are there any alternative solutions available?

Thanks

3 REPLIES 3

Brahmareddy
Honored Contributor

Hi @Gianfranco,

How are you doing it today?

As per my understanding, Consider using a selective delete approach on both the Bronze and Silver tables to avoid a full refresh. Instead of deleting data and refreshing the entire Silver table, you could delete the specific records in both tables by writing an appropriate delete query directly against each Delta table. This way, you're only removing the required records without needing to rebuild the Silver table from scratch. Another option might be to implement Change Data Capture (CDC), so you can track and update only the affected records, reducing the need for full table operations. Additionally, explore using vacuum or optimize commands to maintain table performance after deletions.

Give a try and see if it works.

Good day.

Regards,

Brahma

filipniziol
Contributor III

Hi @Gianfranco ,

Try APPLY CHANGES. This will work well in your scenario, as it supports DELETE operation as well:
https://docs.databricks.com/en/delta-live-tables/cdc.html

karthickrs
New Contributor

Remove records using the DELETE operation in both Bronze & Silver tables.

After doing each delete step, you can Optimize the table which rewrites the parquet files for that table behind the scenes to improve the data layout (Read more about optimize here: https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-optimize). It is a simple command: OPTIMIZE <Table Name>

Also, as you move on in this process, you can run the VACUUM command to clean up old versions of the data and free up storage space: https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-vacuum

Hope this helps.

Karthick Ramachandran Seshadri
Data Architect | MS/MBA
Data + AI/ML/GenAI
17x Databricks Credentials

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group