Pickle/joblib.dump a pre-processing function defined in a notebook
02-24-2025 02:45 PM
I've built a custom MLflow model class that I know works. As part of a given run, the model class uses `joblib.dump` to store the necessary parameters on Databricks DBFS before logging them as artifacts in the MLflow run. This works fine for functions defined within the libraries the custom model class imports, but I run into SPARK-5063 / CONTEXT_ONLY_VALID_ON_DRIVER errors whenever a function defined in the notebook itself ends up in the model parameters.
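For context, the flow looks roughly like this (a minimal sketch; the parameter names and the path are placeholders, not my actual code):
```
import joblib
import mlflow

# Hypothetical pre-processing parameters; stand-ins for the real ones.
params = {"scaler_mean": 0.0, "scaler_std": 1.0}

with mlflow.start_run():
    # DBFS is reachable from the driver via the /dbfs FUSE mount.
    local_path = "/dbfs/tmp/model_params.joblib"
    joblib.dump(params, local_path)
    mlflow.log_artifact(local_path)
```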
This extends to trivial Python functions defined in the notebook; even something as simple as this fails (an illustrative stand-in, since any notebook-defined function of this shape hits the error):
```
def add_one(x):
    # Trivial notebook-defined function; no Spark usage at all.
    return x + 1
```
It seems as though the Spark context is being captured in the function's serialized state somehow, but I have no idea how to isolate the required functions so that they can be loaded later to rebuild the model.
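My rough mental model of what's going wrong, sketched with cloudpickle (this assumes notebook-defined functions get serialized by value, which drags referenced notebook globals like `spark` along; `clean_fn` and `leaky_fn` are made-up names):
```
import cloudpickle

# Assumes a Databricks notebook, where `spark` is a predefined global.

def clean_fn(x):
    # Self-contained: touches nothing from the notebook's global scope,
    # so serializing it by value carries no driver-only objects.
    return x + 1

def leaky_fn(x):
    # References the notebook-global `spark` session, so by-value
    # serialization tries to pickle the SparkContext behind it, which
    # raises SPARK-5063 / CONTEXT_ONLY_VALID_ON_DRIVER.
    return spark.range(x).count()

cloudpickle.dumps(clean_fn)   # serializes fine
cloudpickle.dumps(leaky_fn)   # fails with the driver-only context error
```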

