02-04-2025 08:06 PM
Dear Databricks Experts,
I am encountering a recurring issue while working with Delta streaming tables in my system. The error message is as follows:
com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: [DELTA_TRUNCATED_TRANSACTION_LOG] gs://cimb-prod-lakehouse/bronze-layer/icoredb/dpb_revi_loan/_delta_log/00000000000000000000.json: Unable to reconstruct state at version 899 as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=3 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)
I would greatly appreciate any insights or suggestions on how to resolve this issue and prevent it from occurring in the future.
Thank you!
02-05-2025 08:50 PM
Hi, does anyone have any suggestions for this topic?
02-06-2025 06:45 AM
Without knowing the read patterns it's hard to say what the checkpointing issue is. But I'd recommend leaving the default retention periods for log and checkpoint locations if your table's not updated that often. I'd rarely recommend lower than 7 days unless you had some very large fast pipeline.
I've also never seen someone set checkpoint retention differently from log retention. Not saying it's wrong, just never seen it before.
I'd also recommend looking into predictive optimisation - it's a great way to manage stale files without having to think about it much.
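For anyone wanting to try the retention advice above, here is a minimal sketch of how the two table properties could be restored to their defaults or set explicitly. The helper names and the table name `bronze.dpb_revi_loan` are hypothetical, chosen only for illustration. The property keys are the ones quoted in the error message, and the statements use standard `ALTER TABLE ... SET/UNSET TBLPROPERTIES` syntax. The helpers only build the SQL strings, so they can be checked without a cluster.

```python
def unset_retention_sql(table_name: str) -> str:
    """Build a statement that removes the retention overrides,
    so the table falls back to the Delta defaults."""
    return (
        f"ALTER TABLE {table_name} UNSET TBLPROPERTIES IF EXISTS ("
        "'delta.logRetentionDuration', "
        "'delta.checkpointRetentionDuration')"
    )


def set_retention_sql(table_name: str, log_days: int, checkpoint_days: int) -> str:
    """Build a statement that sets both retention periods explicitly."""
    if log_days < 1 or checkpoint_days < 1:
        raise ValueError("retention periods must be at least 1 day")
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = 'interval {log_days} days', "
        f"'delta.checkpointRetentionDuration' = 'interval {checkpoint_days} days')"
    )


# Hypothetical table name; replace with your own catalog/schema/table.
unset_stmt = unset_retention_sql("bronze.dpb_revi_loan")
set_stmt = set_retention_sql("bronze.dpb_revi_loan", log_days=7, checkpoint_days=7)
print(unset_stmt)
print(set_stmt)
```

In a notebook, either statement would be run with `spark.sql(unset_stmt)` or `spark.sql(set_stmt)`. Note that extending retention does not bring back log files that have already been cleaned up. A stream that still needs a truncated version will probably have to be restarted from a fresh checkpoint.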
02-06-2025 07:56 PM
The reason I set log retention and checkpoint retention to less than 7 days is that, with the default values, my pipeline fails with a 'Listing file' error that we don't know how to fix yet. Reducing the retention periods is a temporary workaround.
02-06-2025 07:42 PM
Hi @holly ,
Thanks for the suggestions. I will try applying them and check the results.