Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Pickle/joblib.dump a pre-processing function defined in a notebook

aswanson
New Contributor

I've built a custom MLflow model class that I know works. As part of a given run, the model class uses `joblib.dump` to store necessary parameters on Databricks DBFS before logging them as artifacts in the MLflow run. This works fine for functions defined within the libraries imported by the custom model class, but I run into SPARK-5063 `CONTEXT_ONLY_VALID_ON_DRIVER` errors if the model parameters include functions defined in the notebook.

This extends to trivial Python functions defined in the notebook, such as:

```python
import joblib

def tmpfun(val):
    return val + 'bar'

joblib.dump(tmpfun, 'tmp.pkl')
```

It seems like the Spark context is being captured in the function's serialized state somehow, but I have no idea how to isolate the required functions so that they can be loaded later to rebuild the model.
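One way to see what a notebook-defined function drags along when serialized is to inspect the names it resolves from enclosing scopes and module globals; anything captured there (for example, a notebook global holding the SparkContext) gets pulled into, or breaks, the pickle. A minimal stdlib sketch, not specific to Databricks (the `suffix` variable here just stands in for any notebook global):

```python
import inspect

def referenced_globals(fn):
    # Names the function resolves from enclosing scopes or module globals.
    # In a notebook, any of these could be an unpicklable object such as
    # a SparkContext, which would trigger SPARK-5063 on dump.
    cv = inspect.getclosurevars(fn)
    names = {}
    names.update({k: type(v).__name__ for k, v in cv.nonlocals.items()})
    names.update({k: type(v).__name__ for k, v in cv.globals.items()})
    return names

# A hypothetical notebook global captured by the function below.
suffix = 'bar'

def tmpfun(val):
    return val + suffix

print(referenced_globals(tmpfun))  # prints {'suffix': 'str'}
```

If the report comes back empty, the function is self-contained and the failure is more likely the by-reference pickling of `__main__` functions than closure capture.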

