
Dropping old Delta log files seems not to be working

a_user12
New Contributor III
 
I have a Delta table on which I set the following property:
delta.logRetentionDuration: "interval 1 days"

I performed some table operations, and in the _delta_log folder I now see files such as:

00000000000000000000.json
00000000000000000001.json
00000000000000000002.json

After one day, I performed some more operations and saw that compacted log files are being created every 10 commits. However, I expected the old JSON log files (older than one day) to be removed. Why are they still there?

Accepted Solution

K_Anudeep
Databricks Employee

Hello @a_user12,

delta.logRetentionDuration is the interval after which Delta log files become eligible for removal from the transaction log. Once the retention duration is exceeded, Delta Lake follows a set of internal rules to decide which log files it can actually clean up.

Setting delta.logRetentionDuration alone does not immediately remove Delta log files. Databricks uses specific internal cleanup logic: a log file is deleted only when it is older than the retention interval and the checkpoint files needed to reconstruct the table without it are present. In other words, configuring the retention duration does not by itself trigger deletion. Delta Lake manages log deletion asynchronously, as part of periodic checkpointing (by default, every 10 commits), and deletes only the files that are eligible under the retention rules.

Anudeep
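The eligibility rule described in the answer can be illustrated with a small, simplified model in Python. This is a sketch under stated assumptions, not Delta Lake's or Databricks' actual cleanup code: the helpers `parse_version` and `expired_log_files` are hypothetical, only plain commit files (`<version>.json`) and single-file checkpoints (`<version>.checkpoint.parquet`) are modelled, and the real implementation applies further rules that this sketch omits. The model treats a log file as deletable only if it is older than `now - retention` and its version is lower than the latest checkpoint.

```python
import re
from datetime import datetime, timedelta

# Hypothetical, simplified model (not the real Delta Lake code).
# Only plain commit files and single-file checkpoints are recognised;
# compacted logs, multi-part checkpoints and _last_checkpoint are ignored.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint\.parquet$")


def parse_version(name):
    """Return (version, kind) for a commit or checkpoint file name, else None."""
    for pattern, kind in ((COMMIT_RE, "commit"), (CHECKPOINT_RE, "checkpoint")):
        match = pattern.match(name)
        if match:
            return int(match.group(1)), kind
    return None


def expired_log_files(files, now, retention):
    """Return the log files this model would delete at checkpoint time.

    files: dict mapping file name -> last-modified datetime.
    A file counts as expired only if it is older than `now - retention`
    AND its version is below the latest checkpoint, so the table state can
    still be rebuilt from that checkpoint plus the later commits.
    """
    parsed = {name: parse_version(name) for name in files}
    checkpoint_versions = [
        info[0] for info in parsed.values() if info and info[1] == "checkpoint"
    ]
    if not checkpoint_versions:
        return []  # no checkpoint yet, so nothing is eligible in this model
    latest_checkpoint = max(checkpoint_versions)
    cutoff = now - retention
    return sorted(
        name
        for name, info in parsed.items()
        if info and info[0] < latest_checkpoint and files[name] < cutoff
    )


# Usage: three commits from two days ago, then a checkpoint at version 10.
now = datetime(2024, 1, 3, 12)
files = {f"{v:020d}.json": datetime(2024, 1, 1, 9 + v) for v in range(3)}
print(expired_log_files(files, now, timedelta(days=1)))  # prints []
files[f"{10:020d}.json"] = datetime(2024, 1, 3, 11)
files[f"{10:020d}.checkpoint.parquet"] = datetime(2024, 1, 3, 11)
print(expired_log_files(files, now, timedelta(days=1)))  # prints the three old commit files
```

In this model, files that are past the retention interval still survive until a newer checkpoint exists, which is the behaviour the answer describes: the retention setting only makes files eligible, and the cleanup itself happens when a checkpoint is written.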
