Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Retention Period for Parquet Data in e.g. S3 After Dropping a Managed Delta Table

Volker
Contributor

Hey community,

I have a question regarding the data retention policy for managed Delta tables stored in, e.g., Amazon S3. Specifically:

  • When a managed Delta table is dropped, what is the retention period for the underlying Parquet data files in S3 before they are permanently deleted?

I understand that Unity Catalog supports the UNDROP TABLE command to recover dropped managed tables within 7 days. However, I am interested in understanding the total duration the data remains in S3 before it is permanently removed.
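
The UNDROP part can be expressed as a small check against the recovery window. This is a minimal Python sketch. It assumes the 7-day window stated above and treats exactly 7 days as still recoverable, although the exact boundary is an assumption. `can_undrop`, `undrop_statement` and the table name `main.sales.orders` are hypothetical. The helpers only build the SQL text and do not call Databricks.

```python
from datetime import datetime, timedelta, timezone

# Recovery window for UNDROP TABLE as stated in the post (assumed inclusive).
UNDROP_WINDOW = timedelta(days=7)


def can_undrop(dropped_at: datetime, now: datetime) -> bool:
    """Return True if a table dropped at `dropped_at` is still inside the UNDROP window."""
    return now - dropped_at <= UNDROP_WINDOW


def undrop_statement(full_name: str) -> str:
    """Build the SQL that recovers a dropped managed table by its three-level name."""
    return f"UNDROP TABLE {full_name}"


dropped = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(can_undrop(dropped, dropped + timedelta(days=6)))  # True
print(can_undrop(dropped, dropped + timedelta(days=8)))  # False
print(undrop_statement("main.sales.orders"))  # UNDROP TABLE main.sales.orders
```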

In the past, the documentation mentioned 30 days, but I cannot find this information in the current documentation. I assume it was updated, since the German version of the Azure documentation still mentions 30 days: https://learn.microsoft.com/de-de/azure/databricks/sql/language-manual/sql-ref-syntax-ddl-drop-table

Additionally, is there a way to configure this retention period, or expedite the deletion process if immediate removal of data is required?

Thank you!

4 REPLIES

Alberto_Umana
Databricks Employee

Hi @Volker,

The default retention period for managed Delta table data files in Unity Catalog is 30 days. I would check whether there is a setting that reduces it so the files are removed immediately.

Volker
Contributor

Thank you for the quick response!
It would be great if this default retention period were mentioned in the docs again.

Alberto_Umana
Databricks Employee

No problem.

Please see https://learn.microsoft.com/en-us/azure/databricks/delta/table-properties, which lists the default retention periods for Delta tables (30 days for table history, 7 days for data files that are no longer referenced):

  • delta.logRetentionDuration = "interval <interval>": controls how long the history for a table is kept. The default is interval 30 days.
  • delta.deletedFileRetentionDuration = "interval <interval>": determines the threshold VACUUM uses to remove data files no longer referenced in the current table version. The default is interval 7 days.
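
The two properties above can be set with `ALTER TABLE ... SET TBLPROPERTIES`. `VACUUM` takes its retention threshold in hours. The following Python sketch only builds the SQL strings. The helper names and `main.sales.orders` are hypothetical. Both the properties and `VACUUM` act on a table that still exists, and the thread does not establish that they change how long the files of an already dropped table stay in S3. Retaining less than 7 days with `VACUUM` is blocked by default unless `spark.databricks.delta.retentionDurationCheck.enabled` is set to `false`.

```python
def interval_literal(days: int) -> str:
    """Format a day count the way the docs write these properties, e.g. 'interval 30 days'."""
    return f"interval {days} days"


def set_retention_sql(table: str, log_days: int = 30, deleted_file_days: int = 7) -> str:
    """Build ALTER TABLE ... SET TBLPROPERTIES for the two retention properties.

    Defaults mirror the documented defaults quoted above (30 and 7 days).
    """
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval_literal(log_days)}', "
        f"'delta.deletedFileRetentionDuration' = '{interval_literal(deleted_file_days)}')"
    )


def vacuum_sql(table: str, retain_days: int) -> str:
    """Build a VACUUM statement; the RETAIN clause is expressed in hours."""
    hours = retain_days * 24
    return f"VACUUM {table} RETAIN {hours} HOURS"


print(set_retention_sql("main.sales.orders"))
print(vacuum_sql("main.sales.orders", 7))  # VACUUM main.sales.orders RETAIN 168 HOURS
```

Keeping the defaults in the function signature makes it explicit which values a table gets when nothing is overridden.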

Volker
Contributor

Thanks for the resources!

So, to adjust how long Parquet files are stored in the S3 bucket after I drop a table, I would need to adjust the delta.logRetentionDuration, right?
And since dropping a Delta table marks the files for deletion after 7 days, I would need to wait 37 days for the files to be permanently deleted if I have the default settings, right?
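
The arithmetic in this question can be made explicit with a small date calculation. It is only a sketch of the hypothesis that the 7-day and 30-day windows run back to back. Nothing in the thread confirms that hypothesis, and the 30-day figure from the first reply may instead be the total. `latest_purge_date`, `single_window_date` and the example date are hypothetical.

```python
from datetime import date, timedelta


def latest_purge_date(dropped_on: date, first_days: int, second_days: int) -> date:
    """Latest purge date IF the two windows run back to back (unconfirmed hypothesis)."""
    return dropped_on + timedelta(days=first_days + second_days)


def single_window_date(dropped_on: date, days: int) -> date:
    """Purge date if a single window (e.g. the 30 days from the first reply) is the total."""
    return dropped_on + timedelta(days=days)


dropped_on = date(2024, 1, 1)
print(latest_purge_date(dropped_on, 7, 30))  # 2024-02-07 (37 days)
print(single_window_date(dropped_on, 30))  # 2024-01-31 (30 days)
```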
