We have a small table that undergoes a merge operation on a daily basis. As a result, the table currently has 83 versions.
When trying to query this table, we receive the following error:
DeltaFileNotFoundException: dbfs:/mnt/XXXXX/warehouse/XXXXXXXXX.db/XXXXXXXXX /_delta_log/00000000000000000000.json: Unable to reconstruct state at version 83 as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=30 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)
When reviewing DBFS using the %fs command, we found that the json and crc files for versions 0 and 1 were indeed missing. All the other files, corresponding to versions 2 to 83, were inside the _delta_log folder. However, since the files:
00000000000000000000.crc
00000000000000000000.json
00000000000000000001.crc
00000000000000000001.json
… were missing, Databricks was unable to query the table.
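The manual check described above can be sketched in Python as follows. This is only a minimal sketch: the helper name `summarize_delta_log` is hypothetical, and the sample listing is rebuilt from the description above (json and crc files for versions 2 to 83), not copied from the real folder. It does not include checkpoint files, so the empty checkpoint result says nothing about the actual table.

```python
import re

# Commit files use a 20-digit zero-padded version, e.g. 00000000000000000002.json
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Checkpoints: <version>.checkpoint.parquet, or multi-part <version>.checkpoint.<i>.<n>.parquet
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def summarize_delta_log(file_names, latest_version):
    """Hypothetical helper: return (missing commit versions, checkpoint versions)."""
    commits = set()
    checkpoints = set()
    for name in file_names:
        match = COMMIT_RE.match(name)
        if match:
            commits.add(int(match.group(1)))
            continue
        match = CHECKPOINT_RE.match(name)
        if match:
            checkpoints.add(int(match.group(1)))
    # Every version from 0 to latest_version that has no .json commit file
    missing = [v for v in range(latest_version + 1) if v not in commits]
    return missing, sorted(checkpoints)


# Sample listing reconstructed from the description (assumption, not real output)
sample = [f"{v:020d}.{ext}" for v in range(2, 84) for ext in ("json", "crc")]
missing, checkpoints = summarize_delta_log(sample, latest_version=83)
print(missing)      # [0, 1]
print(checkpoints)  # [] (the sample contains no checkpoint files)
```

On Databricks, the file names could presumably be collected with something like `[f.name for f in dbutils.fs.ls(log_path)]`, where `log_path` stands for the table's _delta_log folder.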
How is it possible that these log files are deleted automatically?
Is it related to the following fragment (Reference: https://github.com/delta-io/delta/blob/master/PROTOCOL.md)?
So far, the only solution for this kind of problem has been to reconstruct the table. However, the problem recurs occasionally, affecting a different table each time.
How can I prevent this error from happening again in the future?