
Recover files from previous cluster execution

carlosna
New Contributor II

I saved a file with results by simply opening it via fopen("filename.csv", "a") and appending to it.

Once the execution ended (and the cluster shut down), I couldn't retrieve the file.

I found that the file was stored in "/databricks/driver", a folder that is emptied when the cluster shuts down.

Is there any way I can retrieve it?

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @carlosna, unfortunately, if the file was saved to the /databricks/driver location, it is lost when the cluster shuts down. The driver node of a Databricks cluster is ephemeral: its storage is attached to the machine hosting the driver node and is destroyed when that node is terminated.

To prevent this from happening, save your results to a more durable location, such as DBFS (Databricks File System) or a cloud storage service such as Azure Blob Storage or AWS S3.
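For example, while a cluster is still running, you can rescue a driver-local file by copying it into DBFS. A minimal sketch, assuming a Databricks notebook where dbutils is available; the destination path is illustrative:

    # Copy a driver-local file into persistent DBFS storage
    # before the cluster terminates.
    # "file:/" addresses the driver's local filesystem;
    # "dbfs:/" addresses the persistent Databricks File System.
    dbutils.fs.cp(
        "file:/databricks/driver/filename.csv",
        "dbfs:/FileStore/results/filename.csv"
    )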

You can use DBFS to save data through an S3-like interface. DBFS provides an abstraction over the underlying cloud storage and lets you store different types of data, such as notebooks, libraries, and data files. Files in DBFS persist independently of the cluster's lifecycle, so they remain available even after the cluster has terminated.
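For future runs, you can keep the same open-and-append pattern and simply target the /dbfs FUSE mount, which most standard cluster configurations expose on the driver. A minimal sketch; the path and sample rows are illustrative:

    # Writing through the /dbfs mount stores the file in DBFS,
    # so it survives cluster shutdown (path is illustrative).
    with open("/dbfs/FileStore/results/filename.csv", "a") as f:
        f.write("run_id,value\n")
        f.write("1,0.42\n")

You can then verify the file from any later cluster, for example with dbutils.fs.ls("dbfs:/FileStore/results/").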


2 REPLIES


Kaniz
Community Manager

Hi @carlosna, thank you for taking the time to mark the most suitable solution. It's great to hear that your query has been resolved.

