Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Recover files from previous cluster execution

carlosna
New Contributor II

I saved a results file by simply opening it with fopen("filename.csv", "a") and writing to it.

Once the execution ended (and the cluster shut down) I couldn't retrieve the file.

I found that the file was stored in "/databricks/driver", and that folder empties when the cluster shuts down.

Is there any way I can retrieve it?

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz_Fatma
Community Manager

Hi @carlosna , Unfortunately, if the file was saved to the /databricks/driver location, it will be lost when the cluster is shut down. The driver node of a Databricks cluster is ephemeral, meaning that its storage is attached to the machine hosting the driver node, and is destroyed when the node is terminated.

To prevent this from happening, you should save your results to a more reliable location, such as DBFS (Databricks File System) or a cloud storage service such as Azure Blob Storage or AWS S3.

You can use DBFS to save data through an S3-like interface. DBFS provides an abstraction over the underlying cloud storage, allowing you to store different types of data such as notebooks, libraries, and data files. The files are persisted independently of the cluster's lifecycle, so files saved in DBFS remain available even after the cluster has terminated.
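On Databricks, DBFS is also exposed to driver-side code through the /dbfs FUSE mount, so plain Python file I/O can target it directly. A minimal sketch of appending results to a persistent location rather than the ephemeral driver disk (the helper name and the /dbfs/tmp path are hypothetical examples, not from the original post):

```python
import csv
import os

def append_results(path, rows):
    """Append rows to a CSV file, creating parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)

# On a Databricks cluster, a path under the /dbfs FUSE mount persists
# across cluster restarts (hypothetical example path):
# append_results("/dbfs/tmp/results/filename.csv", [["id", "value"], [1, 0.5]])

# If a file already sits on the driver's local disk, it can be copied to
# DBFS before the cluster terminates, e.g. with dbutils:
# dbutils.fs.cp("file:/databricks/driver/filename.csv", "dbfs:/tmp/filename.csv")
```

The key design point is simply the target path: anything under the driver's local filesystem (such as /databricks/driver) is destroyed with the node, while anything written under /dbfs is backed by cloud storage.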


2 REPLIES


Kaniz_Fatma
Community Manager

Hi @carlosna , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.



