DeltaFileNotFoundException: No file found in the directory (sudden task failure)

Juju
New Contributor II

Hi all,

I am currently running a job that upserts a table by reading from the Delta change data feed of my silver table. Here is the relevant snippet of code:

from datetime import datetime, timedelta

# Read the change data feed for the table and keep only changes
# committed within the last hour.
rds_changes = spark.read.format("delta") \
  .option("readChangeFeed", "true") \
  .option("startingVersion", 0) \
  .table("main.default.gold_table") \
  .where(f"_commit_timestamp >= '{(datetime.now() - timedelta(hours=1)).strftime('%Y-%m-%d %H:%M:%S')}'")

Here is the error returned:

com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: No file found in the directory: s3://databricks-workspace-stack-70da1-metastore-bucket/60ed403c-0a54-4f42-8b8a-73b8cea1bdc3/tables/6d4a9b3d-f88b-436e-be1b-09852f605f4c/_delta_log.

I have done the following:

  • Verified that the _delta_log folder has not been deleted, by checking S3 directly
  • Confirmed that I can query the table directly and run `DESCRIBE HISTORY gold_table` on it without any issue

Does anyone have any idea why this happens? The job was working fine previously without any changes.

3 REPLIES

Kaniz_Fatma
Community Manager (Accepted Solution)

Hi @Juju, the error message you're encountering, com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException, indicates that the Delta transaction log is missing from the directory Spark is trying to read.

Let's explore some potential solutions to address this issue:

Check Delta Log Truncation or Deletion:

  • Delta Lake periodically cleans up transaction log entries older than the table's log retention interval (delta.logRetentionDuration, 30 days by default). If the log has been truncated, a change data feed read starting at version 0 can no longer find the commit files it needs, which produces exactly this error. You can inspect the table's retention settings and remaining history to confirm; see the sketch below.
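A quick way to check whether log truncation is in play, assuming the table name from the snippet above (these are standard Spark SQL commands; the retention properties may simply not be set, in which case the Delta defaults apply):

# Show table properties; delta.logRetentionDuration controls how long
# transaction log entries are kept (default: interval 30 days).
spark.sql("SHOW TBLPROPERTIES main.default.gold_table").show(truncate=False)

# DESCRIBE HISTORY lists the versions that are still retained; a change
# feed read from version 0 needs those early versions to still be available.
spark.sql("DESCRIBE HISTORY main.default.gold_table") \
    .select("version", "timestamp", "operation") \
    .show(truncate=False)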

Spark Configuration Options:

  • Use a new checkpoint directory: create a new checkpoint directory for your job. However, this might not be feasible if you still need to process the existing data.
  • Set spark.sql.files.ignoreMissingFiles to true: this property lets Spark ignore missing files during processing. It won't reprocess data from the beginning; instead, it will resume from where the last checkpoint left off (see the sketch after this list).
  • Adjusting these settings may help you avoid data loss while resolving the issue.
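If you want to try the ignoreMissingFiles route, here is a minimal sketch of how the setting could be applied for the current session. Note that the snippet in the question is a batch read, so the checkpoint-directory suggestion mainly applies if the job also has a streaming leg, and this setting only suppresses the missing-file failure; it does not restore truncated history:

# Tell Spark to skip files that appear in table metadata but no longer
# exist in storage, instead of failing the whole job.
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

# The change feed read from the original post can then be retried unchanged.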

Remember to carefully evaluate the impact of any changes on your existing data and processing flow. If possible, test these solutions in a controlled environment to minimize disruptions. Good luck, and I hope this helps you resolve the issue! 🚀

Juju
New Contributor II

Hey @Kaniz_Fatma, found the issue: it was caused by a truncated delta log. Thanks for the help!
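For anyone else who hits this: since the read above only needs the last hour of changes anyway, one way to keep it away from truncated history is to bound the change feed read itself instead of starting at version 0. A rough sketch (startingTimestamp is a standard change data feed option; whether it fits depends on your pipeline):

from datetime import datetime, timedelta

# Start the change feed at a recent timestamp rather than version 0,
# so the read never needs log entries that may already have been cleaned up.
start_ts = (datetime.now() - timedelta(hours=1)).strftime('%Y-%m-%d %H:%M:%S')

rds_changes = spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingTimestamp", start_ts) \
    .table("main.default.gold_table")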

Kaniz_Fatma
Community Manager

Hi @Juju, we value your perspective! It's great to hear that your query has been successfully resolved. Thank you for your contribution.




 
