
Inconsistent behavior while loading pickle file

mh-hsn
New Contributor II

I have a pickle file "vectorizer.pkl" and I am seeing inconsistent behavior when trying to load it: sometimes it loads successfully, and sometimes I get an error. Here is how I load the file:

import os
from joblib import load

tmp_path = client.download_artifacts(run_id=run_id, path='')
vectorizer = load(os.path.join(tmp_path, 'vectorizer.pickle'))

The error that I get is:

ConnectException error
This is often caused by an OOM error that causes the connection to the Python REPL to be closed. Check your query's memory usage.

There are two things to note about the above issue:

  • The pickle file is only 7.5 MB.
  • No other process is running on the cluster.
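To illustrate why the file size alone should not be the problem, here is a minimal, self-contained sketch (using a hypothetical stand-in object rather than my real vectorizer) that writes a small joblib pickle and checks its on-disk size before loading:

```python
import os
import tempfile

from joblib import dump, load

# Hypothetical stand-in for the real vectorizer: any small picklable object.
tmp_dir = tempfile.mkdtemp()
pkl_path = os.path.join(tmp_dir, "vectorizer.pickle")
dump({"vocab": ["a", "b", "c"]}, pkl_path)

# Check the on-disk size before loading; a file of only a few MB
# should not, by itself, exhaust driver memory.
size_mb = os.path.getsize(pkl_path) / (1024 * 1024)
obj = load(pkl_path)
```

A 7.5 MB file loaded this way uses a small multiple of its size in memory, nowhere near enough to OOM a Standard_D64s_v3 driver.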

I have experienced the same inconsistent behavior on multiple clusters. Here are the specs of two of them:

My old cluster (my main cluster):

9.1-LTS ML (includes Apache Spark 3.1.2, Scala 2.12)
Worker type: Standard_D16s_v3 (min 1, max 8)
Driver type: Standard_D64s_v3
Spot instances = True

My new cluster (created just to reproduce the error):

9.1-LTS ML (includes Apache Spark 3.1.2, Scala 2.12)
Worker type: Standard_DS3_v2 (min 1, max 8)
Driver type: Standard_DS3_v2
Spot instances = True

A bit more information about the experiment I performed after creating the new cluster. The first time I tried to load the pickle file on it, I got the following error:

joblib.load RecursionError: maximum recursion depth exceeded while calling a Python object

When I searched for this error, I came across a few threads suggesting increasing the recursion limit, so I added the following two lines to my code:

import sys
sys.setrecursionlimit(30000)

After adding those two lines, I got the same error as on my main cluster:

ConnectException error
This is often caused by an OOM error that causes the connection to the Python REPL to be closed. Check your query's memory usage.
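As an aside, if raising the recursion limit does turn out to help, a slightly safer pattern (a sketch, not specific to Databricks) is to scope the change so the higher limit does not leak into the rest of the notebook:

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    # Temporarily raise the recursion limit and restore the original
    # value afterwards, even if the load raises.
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(max(limit, old))
    try:
        yield
    finally:
        sys.setrecursionlimit(old)


with recursion_limit(30000):
    # e.g. vectorizer = load(os.path.join(tmp_path, 'vectorizer.pickle'))
    pass
```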

The next day, when I executed the same code (without the two newly added lines) on my new cluster, it ran just fine, i.e. it was able to load the pickle file.

I am currently experiencing the same inconsistent behavior on both clusters. On my main cluster, my parent notebook calls a child notebook twice, and the child notebook loads the pickle file. In a failed workflow run, the first call loaded the file just fine, but the second call, later in the parent notebook, ran into the error.

0 REPLIES