I've been getting this error pretty regularly while working with mlflow:
"It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
I have a class that extends mlflow.pyfunc.PythonModel. It has a method used only during training (not during prediction) that takes a Spark DataFrame and applies some filters to produce the training dataset. Only when I remove this method does the model save successfully.
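For context, here's a minimal sketch of the shape of the class (class name, column names, and the filter logic are simplified placeholders; the real training method is more involved):

```python
import mlflow.pyfunc


class MyModel(mlflow.pyfunc.PythonModel):
    def prepare_training_data(self, spark_df):
        # Training-only helper: takes a Spark DataFrame, applies some filters,
        # and returns the training set. This is never called from predict().
        return spark_df.filter(spark_df["target"].isNotNull()).toPandas()

    def predict(self, context, model_input):
        # Prediction path works purely on pandas input; no Spark objects here.
        return model_input


# With prepare_training_data defined, this raises the SPARK-5063 error above;
# removing that one method lets the save go through.
mlflow.pyfunc.save_model(path="my_model", python_model=MyModel())
```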
I was just wondering how mlflow determines whether a class accesses the SparkContext.
Edit: this is really frustrating. It feels like mlflow is designed not to work with DataRobot time-aware modeling.