Runtime error using MLflow and Spark on Databricks
07-07-2022 08:49 AM
Here is some model I created:
import mlflow
import pandas as pd

class SomeModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, input):
        # do fancy ML stuff
        # log results
        pandas_df = pd.DataFrame(...insert predictions here...)
        spark_df = spark.createDataFrame(pandas_df)
        spark_df.write.saveAsTable('tablename', mode='append')
Later in my code, I try to log the model like this:
with mlflow.start_run(run_name="SomeModel_run"):
    model = SomeModel()
    mlflow.pyfunc.log_model("somemodel", python_model=model)
Unfortunately, it gives me this error message:
RuntimeError: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
The error is caused by this line:
mlflow.pyfunc.log_model("somemodel", python_model=model)
If I comment it out, my model makes its predictions and logs the results to my table.
Alternatively, if I remove the lines in my predict function where I use Spark to create a DataFrame and save the table, I am able to log my model.
How do I go about resolving this issue? I need my model to not only write to the table but also be logged.
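A minimal sketch of one possible workaround, assuming the failure happens because log_model pickles the model and, with it, the notebook's global spark session (which holds the SparkContext): keep Spark objects out of the model's state and resolve the session lazily inside predict, for example via SparkSession.builder.getOrCreate(). The placeholder predictions and table name below are illustrative only, not the original notebook's code.

import mlflow
import pandas as pd
from pyspark.sql import SparkSession

class SomeModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # do fancy ML stuff (placeholder predictions, illustrative only)
        pandas_df = pd.DataFrame({"prediction": [0.0]})

        # Resolve the session at call time instead of referencing the
        # notebook's global `spark`, so no SparkContext is captured when
        # the model object is pickled by log_model.
        spark = SparkSession.builder.getOrCreate()
        spark.createDataFrame(pandas_df).write.saveAsTable('tablename', mode='append')
        return pandas_df

with mlflow.start_run(run_name="SomeModel_run"):
    mlflow.pyfunc.log_model("somemodel", python_model=SomeModel())

Alternatively, predict could simply return the pandas DataFrame and the caller could create the Spark DataFrame and write the table outside the model, which matches the observation above that removing the Spark calls lets the model log cleanly.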
- Labels:
  - MLflow
  - MLflow Model
  - PySpark
  - Python
  - Spark
12-17-2022 11:26 PM
This is something new; we will have to explore it. Do you have any docs that you are following here?
06-07-2023 08:08 AM
Any updates on this? I am running into the same issue.
@Patrick Tawil were you able to solve this problem? If so, do you mind sharing?

