DeltaFileNotFoundException in a multi-cluster conflict

ammarchalifah
New Contributor

I have several parallel data pipelines running in different Airflow DAGs. Each pipeline executes two dbt selectors on its own dedicated Databricks cluster. One of the two is a common selector that is executed in all DAGs, and it includes a test that is defined in dbt.

To visualize this setup:

----- AIRFLOW ----

> DAG A:

----- > dbt run model A

----- > dbt test common model

> DAG B:

----- > dbt run model B

----- > dbt test common model
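
To make the overlap concrete, here is a rough sketch of the same layout in plain Python. This is not my actual Airflow code; the DAG ids and step strings are placeholder names I made up for illustration. It only shows which step ends up being run by more than one DAG.

```python
from collections import defaultdict

# Hypothetical model of the setup: each DAG id maps to the dbt steps it runs, in order.
# The ids and step strings are illustrative placeholders, not real job names.
DAGS = {
    "dag_a": ["dbt run model_a", "dbt test common_model"],
    "dag_b": ["dbt run model_b", "dbt test common_model"],
}


def shared_steps(dags):
    """Return the steps that appear in more than one DAG, with the DAGs that run them."""
    owners = defaultdict(list)
    for dag_id, steps in dags.items():
        for step in steps:
            owners[step].append(dag_id)
    return {step: ids for step, ids in owners.items() if len(ids) > 1}


if __name__ == "__main__":
    for step, dag_ids in shared_steps(DAGS).items():
        print(f"{step!r} is run by: {', '.join(dag_ids)}")
```

So the only step that two clusters can hit at the same time is the common test.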

However, I now get an error in the `dbt test common model` stage. DAG A and DAG B use different clusters, but both interact with the same model and the same underlying object. I receive this error:

com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: s3://s3-bucket/common_model_test_name/_delta_log/00000000000000000000.json: Unable to reconstruct state at version 2 as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=30 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)
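
In case it helps with reading the message, this is a small sketch that pulls the relevant pieces out of it with a regular expression. The pattern and the field names are my own and only assume the message layout shown above; I have not checked them against other variants of this exception.

```python
import re

# The error message exactly as received.
ERROR = (
    "com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: "
    "s3://s3-bucket/common_model_test_name/_delta_log/00000000000000000000.json: "
    "Unable to reconstruct state at version 2 as the transaction log has been "
    "truncated due to manual deletion or the log retention policy "
    "(delta.logRetentionDuration=30 days) and checkpoint retention policy "
    "(delta.checkpointRetentionDuration=2 days)"
)

# Assumed layout: <table path>/_delta_log/<zero-padded version>.json: ... at version <n>
PATTERN = re.compile(
    r"(?P<table>s3://\S+?)/_delta_log/(?P<log_file>\d+)\.json: "
    r"Unable to reconstruct state at version (?P<version>\d+)"
)


def parse_delta_error(message):
    """Extract the table path, the missing commit file version, and the requested version."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return {
        "table_path": match["table"],
        "missing_commit_version": int(match["log_file"]),
        "requested_version": int(match["version"]),
    }


if __name__ == "__main__":
    print(parse_delta_error(ERROR))
```

Read this way, the reader is trying to reconstruct version 2 of the table at `s3://s3-bucket/common_model_test_name`, but the commit file for version 0 is not there.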

I don't know how to resolve this issue. A blog post I read says I should clear the Delta log cache, but I'm not sure how to do that without access to the cluster. Could anyone help me understand what's going on and how to resolve it?

Thank you