Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

corrupted delta logs

Khaja_Zaffer
Contributor

ERROR: 

DeltaVersionsNotContiguousException: Versions (0, 2) are not contiguous. This can happen when files have been manually removed from the Delta log. Please contact Databricks support to repair the table.

Cause of the error:

You are getting this error message because the .json commit files under the _delta_log folder are not contiguous, i.e., one or more version files are missing.
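To make the "continuity" idea concrete, here is a minimal sketch (not Databricks code, just an illustration) that scans a `_delta_log` folder and reports the first missing version. It assumes the standard Delta naming convention of 20-digit, zero-padded commit files such as `00000000000000000000.json`:

```python
import os
import re

def delta_versions(log_dir):
    """Collect the integer versions of the commit JSON files in _delta_log."""
    versions = []
    for name in os.listdir(log_dir):
        m = re.fullmatch(r"(\d{20})\.json", name)
        if m:
            versions.append(int(m.group(1)))
    return sorted(versions)

def find_gap(versions):
    """Return the first missing version, or None if the log is contiguous."""
    start = versions[0] if versions else 0
    for expected, actual in enumerate(versions, start=start):
        if actual != expected:
            return expected  # this version is missing from the log
    return None
```

With only `0.json` and `2.json` present, `find_gap` reports version 1 as missing, which is exactly the situation that triggers `DeltaVersionsNotContiguousException`.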

What is the _delta_log folder here? As we know, a plain data lake lacks ACID guarantees, which are atomicity, consistency, isolation, and durability.

WHY are ACID guarantees required?

ACID transactions ensure the highest possible data reliability and integrity. They ensure that your data never falls into an inconsistent state because of an operation that only partially completes. For example, without ACID transactions, if you were writing some data to a database table, but the power went out unexpectedly, it's possible that only some of your data would have been saved, while some of it would not. Now your database is in an inconsistent state that is very difficult and time-consuming to recover from.
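The atomicity part of that idea can be sketched in a few lines of plain Python (an illustration of the concept, not how Delta Lake is implemented): stage the write in a temporary file, then rename it into place, so a crash mid-write never leaves a half-written file behind.

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data so the target file is either fully updated or untouched.

    os.replace is an atomic rename on POSIX filesystems, so readers never
    observe a partially written file at `path`.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)  # a crash here only loses the temp file
        os.replace(tmp, path)  # the atomic "commit" step
    except BaseException:
        os.remove(tmp)
        raise
```

Delta Lake applies the same commit-or-nothing principle at the table level: a write only becomes visible once its commit JSON lands in _delta_log.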

So, in simple layman's terms: if you have Parquet files in a data lake and the table directory also contains a _delta_log folder, then it is a Delta table.

Because in _delta_log we have the table schema and a short summary of the data files, i.e., min and max values, record counts, etc., which also makes reads quicker. To learn more about Delta Lake: https://delta.io/blog/delta-lake-optimize/
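To illustrate what those per-file summaries look like, here is a simplified, hypothetical commit entry modeled on the JSON-lines format Delta uses (field values are made up for the example): each line of a commit file is one action, and "add" actions carry a stats string with record counts and min/max values.

```python
import json

# A simplified, made-up "add" action like those in _delta_log/*.json.
# Note that "stats" is itself a JSON-encoded string inside the action.
commit_line = json.dumps({
    "add": {
        "path": "part-00000.snappy.parquet",
        "size": 1024,
        "stats": json.dumps({
            "numRecords": 3,
            "minValues": {"id": 1},
            "maxValues": {"id": 7},
        }),
    }
})

action = json.loads(commit_line)
stats = json.loads(action["add"]["stats"])
print(stats["numRecords"], stats["minValues"]["id"], stats["maxValues"]["id"])
# → 3 1 7
```

A reader can compare these min/max ranges against a query predicate and skip files that cannot contain matching rows, which is why reads get quicker.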

 

So, do we have a solution for this error? Yes we do, because we are engineers.

Solution

We can use the following two approaches to recover the table up to the last contiguous Delta version:

  • Approach 1:

    In this case we have 0.json and 2.json (1.json is missing). By manually deleting the 2.json file, we can query the table again. In general, delete all commit JSON files with versions higher than the point where continuity breaks.

  • Approach 2: (recommended only if there are no merge or delete operations on the Delta table, i.e., data is only appended or overwritten)

    This approach helps to retrieve the delta table without dropping it.

    1. Remove the _delta_log folder

      %sh
      rm -r /dbfs/user/hive/warehouse/<TABLENAME>/_delta_log
    2. Without the _delta_log folder, the directory is now treated as a plain Parquet table.

    3. Convert this parquet table to a delta table

      %sql
      CONVERT TO DELTA parquet.`/user/hive/warehouse/<TABLENAME>/`;

 

Thank you so much for reading!!


1 REPLY 1

szymon_dybczak
Esteemed Contributor III

Thanks for sharing @Khaja_Zaffer!