Data Engineering

How To Save a File as a Pickle Object to the Databricks File System

Rasputin312
New Contributor II

I tried running this code:

```
import pickle

def save_file(name, obj):
    # Write the pickled object to a local file
    with open(name, 'wb') as f:
        pickle.dump(obj, f)
```

One file was saved to the local file system, but the second was too large, so I need to save it to DBFS. Unfortunately, I don't see any method that lets me do that. Everything I've found refers to saving DataFrames, and this is not a DataFrame; it's a plain Python object.

1 ACCEPTED SOLUTION


JissMathew
Contributor III

To save a Python object to the Databricks File System (DBFS), you can write the file through the /dbfs FUSE mount, which exposes DBFS paths to standard Python file APIs. Since you are dealing with a Python object and not a DataFrame, you can use the pickle module to serialize the object and then write the bytes to DBFS. Here's how you can modify your code to achieve this:

First, ensure you have imported the necessary module:

```
import pickle
```

Then write the serialized object to DBFS. The built-in open function works here because prefixing the path with /dbfs routes the write through the FUSE mount:

```
def save_file_to_dbfs(dbfs_path, obj):
    # Serialize the object to a byte stream
    serialized_obj = pickle.dumps(obj)

    # Write the serialized object to a file in DBFS
    with open('/dbfs' + dbfs_path, 'wb') as f:
        f.write(serialized_obj)

my_object = {'key': 'value'}  # Replace with your actual object
dbfs_file_path = '/FileStore/my_object.pkl'  # Path in DBFS
save_file_to_dbfs(dbfs_file_path, my_object)
```
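
To read the object back later, the same /dbfs mount works in reverse. Here is a minimal sketch, assuming the file was written as above (load_file_from_dbfs is just an illustrative name, not a Databricks API):

```
import pickle

def load_file_from_dbfs(dbfs_path):
    # Read the pickled bytes back through the /dbfs FUSE mount
    with open('/dbfs' + dbfs_path, 'rb') as f:
        return pickle.load(f)

restored = load_file_from_dbfs('/FileStore/my_object.pkl')
print(restored)  # {'key': 'value'}
```

If the /dbfs mount is not available on your cluster configuration, an alternative is to pickle the object to a local path first and then copy it into DBFS with dbutils.fs.cp('file:/tmp/my_object.pkl', 'dbfs:/FileStore/my_object.pkl').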

Jiss Mathew
India.

