We have a small table that undergoes a merge operation daily. As a result, the table currently has 83 versions.
When trying to query this table, we receive the following error:
DeltaFileNotFoundException: dbfs:/mnt/XXXXX/warehouse/XXXXXXXXX.db/XXXXXXXXX /_delta_log/00000000000000000000.json: Unable to reconstruct state at version 83 as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=30 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)
When reviewing DBFS with the %fs command, we found that the json and crc files for versions 0 and 1 were indeed missing. All the other files, corresponding to versions 2 to 83, were present in the _delta_log folder. However, since the files:
00000000000000000000.crc
00000000000000000000.json
00000000000000000001.crc
00000000000000000001.json
… were missing, Databricks was unable to query the table.
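To make the check above repeatable, here is a minimal sketch of how a _delta_log listing could be inspected. It assumes the usual Delta naming (20-digit zero-padded `<version>.json` commits and `<version>.checkpoint...` checkpoint files) and that a version can be rebuilt either from every commit starting at 0, or from a checkpoint plus every later commit. The function names `summarize_delta_log` and `can_reconstruct` are hypothetical helpers, not part of any Databricks or Delta API, and the sketch does not verify that a multi-part checkpoint is complete.

```python
import re

# Assumed naming: commits are <20 digits>.json, checkpoints are <20 digits>.checkpoint.<...>
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint\..*$")


def summarize_delta_log(file_names):
    """Return (commit_versions, checkpoint_versions) found in a _delta_log listing."""
    commits, checkpoints = set(), set()
    for name in file_names:
        m = COMMIT_RE.match(name)
        if m:
            commits.add(int(m.group(1)))
            continue
        m = CHECKPOINT_RE.match(name)
        if m:
            checkpoints.add(int(m.group(1)))
    return commits, checkpoints


def can_reconstruct(file_names, target_version):
    """Return (ok, missing_commits) for rebuilding state at target_version.

    Starts from the newest checkpoint at or below target_version if one
    exists, otherwise from version 0, and requires every later commit file.
    """
    commits, checkpoints = summarize_delta_log(file_names)
    usable = [v for v in checkpoints if v <= target_version]
    start = max(usable) + 1 if usable else 0
    missing = [v for v in range(start, target_version + 1) if v not in commits]
    return (not missing, missing)


# Listing that mirrors the situation described: versions 2..83 present, no checkpoint.
listing = [f"{v:020d}.{ext}" for v in range(2, 84) for ext in ("json", "crc")]
ok, missing = can_reconstruct(listing, 83)
print(ok, missing)

# In a Databricks notebook the names could come from dbutils.fs.ls, e.g.
# (the path below is a placeholder, not the real table location):
# names = [f.name for f in dbutils.fs.ls("dbfs:/mnt/<table-path>/_delta_log")]
```

With a listing like ours (no checkpoint file and commits 0 and 1 gone), the check reports versions 0 and 1 as missing, which matches the error. If a checkpoint at or below version 83 were present along with all later commits, the same check would pass.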
How is it possible that these log files were deleted automatically?
Is it related to the following fragment (Reference: https://github.com/delta-io/delta/blob/master/PROTOCOL.md)?
![rgualans_0-1705078443409.png rgualans_0-1705078443409.png](/t5/image/serverpage/image-id/5808i8C1BCED61632A18B/image-size/medium/is-moderation-mode/true?v=v2&px=400)
So far, the only solution for this kind of problem has been to reconstruct the table. However, the problem recurs occasionally, affecting a different table each time.
How can I prevent this error from happening again in the future?