
DeltaFileNotFoundException in a multi cluster conflict

ammarchalifah
New Contributor

I have several parallel data pipelines running in different Airflow DAGs. Each pipeline executes two dbt selectors on a dedicated Databricks cluster; one of them is a common selector that is executed in every DAG. This common selector includes a test that is defined in dbt.

To visualize this setup:

----- AIRFLOW ----

> DAG A:

----- > dbt run model A

----- > dbt test common model

> DAG B:

----- > dbt run model B

----- > dbt test common model

However, I now get an error in the `dbt test common model` stage. DAG A and DAG B use different clusters, but both interact with the same model and underlying object. I receive this error:

com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: s3://s3-bucket/common_model_test_name/_delta_log/00000000000000000000.json: Unable to reconstruct state at version 2 as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=30 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)

I don't know how to resolve this issue. A blog post I read says I should clear the Delta log cache, but I'm not sure how to do that without access to the cluster. Could anyone help me understand what's going on and how to resolve it?

Thank you

1 REPLY

Anonymous
Not applicable

@Ammar Ammar:

The error message you're seeing suggests that the Delta Lake transaction log for the common model's test table has been truncated or deleted, either manually or because of the retention policies on the table. Log cleanup is driven by age rather than size: entries older than the retention period are removed, and a reader that then needs one of those entries (here, version 0) to rebuild the table state fails with this exception.
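To make "unable to reconstruct state at version 2" more concrete, here is a minimal sketch of the check a reader effectively has to pass. It assumes the usual `_delta_log` naming (20-digit, zero-padded version numbers, `.json` for commits, `.checkpoint.parquet` for single-file checkpoints) and ignores multi-part checkpoints. The function name `can_reconstruct` and the file lists are made up for illustration; this is not the Delta implementation.

```python
import re

# Commit files look like 00000000000000000002.json
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Single-file checkpoints look like 00000000000000000010.checkpoint.parquet
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint\.parquet$")


def can_reconstruct(log_files, version):
    """Return True if `version` can be rebuilt from the listed log files.

    A version is rebuildable when every commit from the latest usable
    checkpoint (or from version 0 if there is none) up to `version` exists.
    """
    commits = set()
    checkpoints = set()
    for name in log_files:
        match = COMMIT_RE.match(name)
        if match:
            commits.add(int(match.group(1)))
            continue
        match = CHECKPOINT_RE.match(name)
        if match:
            checkpoints.add(int(match.group(1)))

    # Start replaying right after the newest checkpoint at or before `version`.
    usable = [c for c in checkpoints if c <= version]
    start = max(usable) + 1 if usable else 0
    return all(v in commits for v in range(start, version + 1))


# Hypothetical listing that mirrors the error: commit 0 is gone, no checkpoint.
truncated = ["00000000000000000001.json", "00000000000000000002.json"]
print(can_reconstruct(truncated, 2))
```

With the listing above the check fails, because replay has to start at version 0 and that commit file is missing. If a checkpoint at version 1 were present, versions 0 and 1 would no longer be needed as JSON commits.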

To fix this issue, you can try the following steps:

  1. Confirm that the Delta Log has been truncated or deleted. You can do this by checking the cluster logs or running a query against the common model's test table to see if it fails with the same error message. If it has been deleted, you will need to recreate the table and reload the data.
  2. If the Delta Log has not been deleted, you can try clearing the Delta Log's cache by running the following command in a Databricks notebook:
%sql
CLEAR CACHE

This clears the cache on the cluster where the command runs. Each cluster keeps its own cache, so it needs to be run on the cluster that reports the error. If you don't have access to that cluster, you may need to ask your Databricks administrator to run this command for you.

  3. If clearing the cache doesn't work, you can try setting the retention policies for the Delta log and checkpoint files to longer durations, so that they don't get deleted before your pipelines have a chance to run. The durations named in your error message are Delta table properties, so they are set on the table (for example with ALTER TABLE ... SET TBLPROPERTIES) rather than in the cluster's Spark configuration. The values below are only examples that are longer than the 30-day and 2-day settings shown in your error:

delta.logRetentionDuration = "interval 60 days"
delta.checkpointRetentionDuration = "interval 7 days"

Longer retention keeps older log and checkpoint files around, so a reader working from an older version of the table is less likely to hit missing files. Note that spark.databricks.delta.retentionDurationCheck.enabled is a different setting: it guards VACUUM against very short retention periods and does not warn you before log files are cleaned up.
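As a rough sketch of how the retention change from step 3 could be expressed as table properties, the helper below builds the SQL statement as a string. `delta.logRetentionDuration` and `delta.checkpointRetentionDuration` are Delta table properties; the function name `retention_ddl`, the table name, and the 60-day and 7-day values are illustrative assumptions and do not come from this thread.

```python
def retention_ddl(table, log_days, checkpoint_days):
    """Build an ALTER TABLE statement that sets Delta retention properties.

    `table` is inserted as-is, so only pass trusted identifiers.
    """
    if log_days <= 0 or checkpoint_days <= 0:
        raise ValueError("retention periods must be positive")
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = 'interval {log_days} days', "
        f"'delta.checkpointRetentionDuration' = 'interval {checkpoint_days} days')"
    )


# Hypothetical table name and durations, chosen only for the example.
statement = retention_ddl("my_schema.common_model_test", 60, 7)
print(statement)
```

In a Databricks notebook the resulting string could be passed to `spark.sql(...)`; that call is left out here so the snippet runs without a cluster. Keep in mind that longer retention only affects future cleanup and cannot bring back log files that are already gone.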

I hope this helps you resolve your issue!
