The DeltaFileNotFoundException: [DELTA_TRUNCATED_TRANSACTION_LOG] error you're encountering is related to Delta Lake's retention policy for transaction log files and checkpoint files.
This error occurs because Delta Lake is trying to reconstruct the table's state at version 899, but the transaction log files needed for that version have already been removed by the log retention policy. This usually happens when there is no checkpoint file available for the requested version or for the versions immediately preceding it.
Delta tables rely on periodic checkpoints to avoid a full log replay. If older checkpoint files or their corresponding JSON commit files have been removed by the retention policies, operations that require time travel or version-specific processing might fail.
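To make the checkpoint and log replay relationship concrete, here is a minimal sketch in plain Python. It is not Delta Lake's actual implementation, and the function name and inputs are hypothetical. It assumes a version can be rebuilt either from a checkpoint at or before it plus every later commit up to that version, or from every commit starting at version 0.

```python
def can_reconstruct(version, checkpoints, commits):
    """Return True if `version` could be rebuilt from the files that remain.

    checkpoints: versions that still have a checkpoint file (hypothetical input)
    commits: versions that still have a JSON commit file (hypothetical input)
    """
    commits = set(commits)
    # Try the newest checkpoint at or before the target version first.
    usable = sorted((c for c in checkpoints if c <= version), reverse=True)
    for cp in usable:
        # Every commit after the checkpoint, up to the target, must exist.
        if all(v in commits for v in range(cp + 1, version + 1)):
            return True
    # No usable checkpoint: every commit from version 0 must still exist.
    return all(v in commits for v in range(0, version + 1))


# Version 899 is requested, but only commits 900+ and a checkpoint at 910 remain.
truncated = can_reconstruct(899, checkpoints=[910], commits=range(900, 920))
# A checkpoint at 890 with commits 891+ still present would be enough.
recoverable = can_reconstruct(899, checkpoints=[890], commits=range(891, 920))
print(truncated, recoverable)
```

The first case mirrors the situation in this thread: nothing at or before version 899 is left to start from, so the read fails.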
The default settings for Delta Lake retain:
- Transaction log entries for 30 days (delta.logRetentionDuration)
- Checkpoint files for 2 days (delta.checkpointRetentionDuration)
If the requested version is older than the retention period, its files may no longer exist, which results in a DeltaFileNotFoundException error.
If you have S3 Versioning enabled on AWS, Soft Delete enabled on Azure, Soft Delete enabled on GCP, or a similar backup mechanism that periodically saves a copy of the files, you should be able to recover your files.
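You could also increase delta.logRetentionDuration and delta.checkpointRetentionDuration on the source Delta table so that older versions stay available longer. Note that this does not restore files that have already been removed.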
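As a rough illustration, the two properties can be set with an ALTER TABLE ... SET TBLPROPERTIES statement. The sketch below only builds the statement. The helper name, the table name, and the 60 day and 7 day values are placeholders I picked for the example, not recommendations, so adjust them to your own table and needs.

```python
def retention_sql(table_name,
                  log_retention="interval 60 days",
                  checkpoint_retention="interval 7 days"):
    """Build an ALTER TABLE statement that sets both retention properties.

    The default intervals are example values only (assumption).
    """
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{log_retention}', "
        f"'delta.checkpointRetentionDuration' = '{checkpoint_retention}')"
    )


# Hypothetical table name; replace it with your source Delta table.
statement = retention_sql("my_catalog.my_schema.source_table")
print(statement)

# In a Databricks notebook, where `spark` is the active SparkSession,
# the statement could then be run with:
# spark.sql(statement)
```

Longer retention keeps more log and checkpoint files in storage, so it is worth choosing values that match how far back your readers actually need to go.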
This KB article should be helpful: https://kb.databricks.com/delta/deltafilenotfoundexception-when-reading-a-table/
Thanks.