Hi team.
Once a day we overwrite the last X months of data in our tables, so every run generates a large amount of history. We don't use time travel, so we don't need that history.
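For context, the daily load is roughly this (a simplified sketch; df, full, and the load_date column are placeholders for our actual dataframe, table name, and date column):

from datetime import date, timedelta

cutoff = (date.today() - timedelta(days=90)).isoformat()
# overwrite only the rows in the last X months, leaving older partitions untouched
df.write.format("delta") \
    .mode("overwrite") \
    .option("replaceWhere", f"load_date >= '{cutoff}'") \
    .saveAsTable(full)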
What we've done:
SET spark.databricks.delta.retentionDurationCheck.enabled = false
ALTER TABLE table_name SET TBLPROPERTIES ('delta.logRetentionDuration'='interval 48 HOURS', 'delta.deletedFileRetentionDuration'='interval 48 HOURS')
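We apply these from a notebook, roughly like this (a sketch; tables is a placeholder for our list of table names):

spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
for full in tables:
    spark.sql(f"""
        ALTER TABLE {full} SET TBLPROPERTIES (
            'delta.logRetentionDuration' = 'interval 48 hours',
            'delta.deletedFileRetentionDuration' = 'interval 48 hours'
        )
    """)
    # sanity check that the properties actually landed on the table
    spark.sql(f"SHOW TBLPROPERTIES {full}").show(truncate=False)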
We then run the following commands on the Delta tables (full is a Python variable holding the table name):
print(" OPTIMIZE,Vacuum")
spark.sql("REORG TABLE {} APPLY ( PURGE )".format(full))
spark.sql("OPTIMIZE {}".format(full))
spark.sql("VACUUM {} RETAIN 48 HOURS".format(full))
After this, my understanding is that only about 48 hours of history should remain and the table size should shrink.
But after the run, the history stays as it was and the file size is unchanged.
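For completeness, this is roughly how I check the result (a sketch; DESCRIBE DETAIL reports the files tracked in the current table snapshot):

# list the retained table versions
spark.sql(f"DESCRIBE HISTORY {full}").select("version", "timestamp", "operation").show(truncate=False)
# numFiles / sizeInBytes of the current snapshot
spark.sql(f"DESCRIBE DETAIL {full}").select("numFiles", "sizeInBytes").show()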
Can you give me some additional information on what I'm doing wrong, or have I misunderstood the concept?