The error you are hitting, "the transaction log has failed integrity checks", indicates metadata corruption or an inconsistency in the Delta transaction log (_delta_log) of your Databricks Delta Lake table. It typically blocks write and maintenance operations such as OPTIMIZE, DELETE, INSERT, MERGE, and schema evolution, while read-only queries may still succeed, because the query engine can tolerate many log problems during regular reads but not when writing or updating metadata.
What Causes Transaction Log Integrity Checks to Fail?
- Manual interference: Direct changes in the _delta_log directory (such as moving, renaming, or deleting JSON/CRC files) outside of supported APIs.
- Storage-layer instability: Issues in the underlying cloud storage (S3/ADLS/Blob), such as eventual consistency or filesystem caching.
- Process interruption: Terminated write/merge jobs that leave partial state behind.
- Concurrent operations: Unusual concurrency patterns or forced interruptions.
- Bugs/edge cases: Less common, but bugs in Delta Lake can occasionally leave corrupt states after job failures or crashes, especially with third-party Delta implementations.
Risks of Disabling the Integrity Check
Setting spark.databricks.delta.state.corruptionIsFatal to false simply suppresses the error and lets the engine skip the check; it does NOT repair the log or the underlying corruption (a hedged example of setting the flag follows the list below). While this can restore write access, you risk:
- Data loss if underlying files are missing or inconsistent
- Inability to audit or "time travel" reliably, as older table states may be gone or corrupt
- Harder troubleshooting later
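A minimal sketch of how the flag is typically toggled, assuming a Databricks notebook where the spark handle is pre-defined; treat it as a temporary diagnostic measure, not a fix:

```python
# Sketch only: assumes a Databricks notebook, where `spark` is the pre-defined SparkSession.

# Suppress the fatal integrity check for this session; this does NOT repair the log.
spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", "false")

# Writes may now proceed, but any missing or inconsistent log state is still there,
# so validate results carefully and revert the flag as soon as you can:
# spark.conf.set("spark.databricks.delta.state.corruptionIsFatal", "true")
```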
Safer Remediation Steps
1. Pinpoint Corruption
- Use DESCRIBE HISTORY <table> and look for out-of-order, missing, or duplicate versions.
- Inspect the _delta_log directory for missing, truncated, or outright corrupted JSON/CRC files (a short sketch of both checks follows this list).
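A hedged sketch of both checks, assuming a Databricks notebook (pre-defined spark and dbutils handles) and hypothetical table/path names:

```python
# Sketch only: `spark` and `dbutils` are the handles pre-defined in Databricks notebooks;
# the table name and storage path below are hypothetical placeholders.

# 1) Review the commit history for gaps, duplicates, or out-of-order versions.
history = spark.sql("DESCRIBE HISTORY my_schema.my_table")
history.select("version", "timestamp", "operation").orderBy("version").show(50, truncate=False)

# 2) List the _delta_log directory and look for missing or zero-length JSON/CRC files.
log_path = "/mnt/data/my_table/_delta_log"  # placeholder path
for f in dbutils.fs.ls(log_path):
    print(f.name, f.size)
```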
2. Restore or Repair Transaction Log
If you have a backup, restoring an older (healthy) state via the Delta Lake RESTORE command or by copying _delta_log from backup can recover the timeline.
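For the RESTORE path, a minimal sketch assuming the log is still readable at an earlier, healthy version (RESTORE depends on that) and a hypothetical table name:

```python
# Sketch only: hypothetical table name; RESTORE only works if the transaction log
# is still readable at the target version.

# Find a healthy version to roll back to (e.g., the last commit before the corruption appeared).
spark.sql("DESCRIBE HISTORY my_schema.my_table").show(truncate=False)

# Roll the table back to that version (TIMESTAMP AS OF also works with a timestamp).
spark.sql("RESTORE TABLE my_schema.my_table TO VERSION AS OF 42")
```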
- Databricks also provides FSCK REPAIR TABLE, which removes transaction-log entries for data files that can no longer be found in the underlying storage (see the sketch after this list); VACUUM only cleans up unreferenced data files and does not repair the log.
- Sometimes, Databricks Support can manually repair your _delta_log if you are an enterprise customer.
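A hedged FSCK sketch, again with a hypothetical table name; run the dry run first to preview what would be removed:

```python
# Sketch only: FSCK REPAIR TABLE is a Databricks SQL command; the table name is a placeholder.

# Preview which log entries point at data files that no longer exist in storage.
spark.sql("FSCK REPAIR TABLE my_schema.my_table DRY RUN").show(truncate=False)

# Remove those dangling entries from the transaction log (rows in the missing files are lost).
spark.sql("FSCK REPAIR TABLE my_schema.my_table")
```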
3. Export and Reload
If all else fails, copying the data out (just as you did) and then recreating the table is the only way to guarantee integrity going forward, at the cost of losing transaction history; a minimal sketch of one export-and-reload pattern follows.
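This sketch assumes reads still succeed (as in your case) and uses hypothetical table and path names; if the Delta read itself fails, reading the underlying Parquet data files directly is a fallback, with the caveat that it may pick up deleted or uncommitted files:

```python
# Sketch only: hypothetical names; validate row counts before dropping the original table.

# 1) Export: snapshot the still-readable table to a plain Parquet location.
df = spark.read.table("my_schema.my_table")   # fallback: spark.read.parquet("<table data path>")
df.write.mode("overwrite").parquet("/mnt/recovery/my_table_export")

# 2) Reload: recreate a fresh Delta table from the export (transaction history starts over).
spark.read.parquet("/mnt/recovery/my_table_export") \
    .write.format("delta").mode("overwrite").saveAsTable("my_schema.my_table_rebuilt")
```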
4. Prevent Recurrence
- Never modify files inside _delta_log directly.
- Use only supported Databricks APIs for writes/merges.
- Address any storage-layer anomalies.
- Use table access controls and monitoring (see the sketch after this list).
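As one illustration of tightening write access, a hedged sketch using Databricks SQL privileges; the group and table names are hypothetical, and exact privilege names depend on whether you use Unity Catalog or legacy table ACLs:

```python
# Sketch only: placeholder principal and table names; check your governance model
# (Unity Catalog vs. legacy table ACLs) for the exact privileges available to you.

# Allow analysts to read the table but not write to it.
spark.sql("GRANT SELECT ON TABLE my_schema.my_table TO `analysts`")
spark.sql("REVOKE MODIFY ON TABLE my_schema.my_table FROM `analysts`")
```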
Consult Databricks Support
If retaining your table history is business-critical, raise a support ticket with Databricks, referencing the table path, workspace ID, exact error message, and a description of all recent operations. They have internal tools that can repair many types of log corruption you cannot fix yourself.
In summary:
Do not rely on disabling corruptionIsFatal as a permanent solution—it hides symptoms, not causes. For enterprise/critical tables, escalate to Databricks Support for possible transaction log repair. For non-critical tables, or if support cannot help, copying/reloading will restore your table’s health but reset its log. Prevent further incidents by reviewing processes and ensuring only supported APIs and reliable storage layers are involved.