DeltaFileNotFoundException in a multi-cluster conflict
04-11-2023 09:30 AM
I have several parallel data pipelines running in different Airflow DAGs. Each pipeline executes two dbt selectors on its own dedicated Databricks cluster; one of the two is a common selector that runs in every DAG. This common selector includes a test that is defined in dbt.
To visualize this setup:
----- AIRFLOW ----
> DAG A:
----- > dbt run model A
----- > dbt test common model
> DAG B:
----- > dbt run model B
----- > dbt test common model
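The same layout as a minimal plain-Python sketch, to make the overlap explicit. The selector names (`model_a`, `model_b`, `common_model`) and the DAG names are hypothetical placeholders, not the real ones from my project, and I am assuming each step is invoked with dbt's `--selector` flag. No Airflow or Databricks code is involved here; it only models which commands each DAG issues.

```python
from collections import Counter

# Hypothetical name for the selector that every DAG runs.
COMMON_SELECTOR = "common_model"


def dag_tasks(dag_selector):
    """Return the two dbt commands one DAG runs, in order."""
    return [
        ["dbt", "run", "--selector", dag_selector],
        ["dbt", "test", "--selector", COMMON_SELECTOR],
    ]


# Each DAG runs on its own cluster (hypothetical DAG and selector names).
dags = {
    "dag_a": dag_tasks("model_a"),
    "dag_b": dag_tasks("model_b"),
}


def shared_commands(dags):
    """Return the commands issued by more than one DAG, sorted."""
    counts = Counter(
        " ".join(cmd) for tasks in dags.values() for cmd in tasks
    )
    return sorted(cmd for cmd, n in counts.items() if n > 1)


print(shared_commands(dags))
```

The only command both DAGs share is the test on the common selector, which is the stage where two clusters can end up touching the same object at the same time.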
However, I now get an error in the `dbt test common model` stage. DAG A and DAG B use different clusters, but both interact with the same model and the same underlying object. The error is:
com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: s3://s3-bucket/common_model_test_name/_delta_log/00000000000000000000.json: Unable to reconstruct state at version 2 as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=30 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)
I don't know how to resolve this issue. A blog post I read says I should clear the Delta log cache, but I'm not sure how to do that when I don't have access to the cluster. Could anyone help me understand what's going on and how to fix it?
Thank you