Error while reading file <file path>. [DEFAULT_FILE_NOT_FOUND]

The_raj
New Contributor

Hi,

I have a workflow with 5 notebooks in it. One of the notebooks is failing with the error below. I have tried refreshing the table, but I am still facing the same issue. When I run the notebook manually, it works fine. Can someone please help me find a permanent solution for this?

Job aborted due to stage failure: Task 736 in stage 92.0 failed 4 times, most recent failure: Lost task 736.3 in stage 92.0 (TID 3715) (executor 18): com.databricks.sql.io.FileReadException: Error while reading file <path>. [DEFAULT_FILE_NOT_FOUND] It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. If disk cache is stale or the underlying files have been removed, you can invalidate disk cache manually by restarting the cluster.

@Hubert_Dudek1 @werners1 @Prabakar @Debayan @daniel_sahal

1 REPLY

Kaniz
Community Manager

Hi @The_raj , 

The error message you are encountering indicates a failure during the execution of a Spark job on Databricks. Specifically, Task 736 in Stage 92.0 failed multiple times, and the most recent failure was a "DEFAULT_FILE_NOT_FOUND" error while reading a file at a specific <path>.

The error message provides some helpful suggestions to resolve the issue:

  1. Refresh Table: It suggests explicitly invalidating the cache in Spark by running the 'REFRESH TABLE tableName' command in SQL. If you use cached tables or views in your Spark job, refreshing them may help if the underlying data has changed (a PySpark sketch of these options follows this list).

  2. Recreate Dataset/DataFrame: Another option is to recreate the Dataset/DataFrame involved in the operation. This can help if there are any inconsistencies between the cached data and the actual data in the storage.

  3. Invalidate Disk Cache: If the disk cache is stale or the underlying files have been removed, the error message suggests invalidating the disk cache manually by restarting the cluster. This can help ensure that the data is read fresh from the storage.
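
For reference, here is a minimal PySpark sketch of those options as they might be run from a notebook cell. The table name (my_schema.my_table) is a placeholder, not something taken from your job, and `spark` is the SparkSession that Databricks notebooks provide automatically. Note that this only clears Spark's own cache; the disk cache itself is only invalidated by restarting the cluster, as noted above.

    # Minimal sketch -- replace my_schema.my_table with your actual table name.

    # 1. Invalidate Spark's cached metadata/data for the table (same effect as the SQL command).
    spark.catalog.refreshTable("my_schema.my_table")

    # 2. Recreate the DataFrame instead of reusing a previously cached one.
    df = spark.table("my_schema.my_table")

    # 3. Heavier option: drop everything cached in this Spark session.
    spark.catalog.clearCache()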

Here are the steps you can take to address the issue:

  1. Refresh Table: If you are using any cached tables or views in your Spark job, run the following command in SQL to refresh them before re-executing the job:

     

    REFRESH TABLE tableName;
  2. Recreate Dataset/DataFrame: If you are working with intermediate DataFrames or Datasets, try recreating them before rerunning the problematic operation.

  3. Invalidate Disk Cache: If refreshing the table or recreating the DataFrame doesn't resolve the issue, consider restarting the cluster to invalidate the disk cache. In Databricks, you can restart the cluster by following these steps (a scripted alternative is sketched after this list):

    • Go to the Databricks workspace.
    • Select the cluster the job runs on.
    • Click "Restart" to initiate the restart.
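
If you would rather script the restart than click through the UI, below is a hedged sketch using the Databricks Clusters REST API (POST /api/2.0/clusters/restart). The workspace URL, personal access token, and cluster ID are placeholders you would supply yourself:

    import requests

    # Placeholders -- substitute your own workspace URL, token, and cluster ID.
    workspace_url = "https://<your-workspace>.cloud.databricks.com"
    token = "<personal-access-token>"
    cluster_id = "<cluster-id>"

    resp = requests.post(
        f"{workspace_url}/api/2.0/clusters/restart",
        headers={"Authorization": f"Bearer {token}"},
        json={"cluster_id": cluster_id},
    )
    resp.raise_for_status()  # fails loudly if the restart request is rejected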

After performing these steps, rerun your Spark job to see if the issue is resolved. If the problem persists, you may need to investigate why the file is not found at the specified path: there could be an issue with the data source or with the path configuration in your code. Check whether the file exists at the specified location and verify your Spark job's path settings (a quick check is sketched below).
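
To check quickly whether the file or directory from the error message actually exists, you can run something like the following in a notebook cell. The path is a placeholder; use the <path> reported in the FileReadException. dbutils is available by default in Databricks notebooks.

    # Placeholder path -- use the <path> from the FileReadException instead.
    path = "dbfs:/mnt/data/my_table/"

    try:
        files = dbutils.fs.ls(path)  # lists the directory (or single file) at the path
        print(f"Found {len(files)} entries under {path}")
    except Exception as e:
        print(f"Path not found or not accessible: {e}")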
