Runtime error using MLflow and Spark on Databricks

New Contributor III

Here is a model I created:

class SomeModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, input):
        # do fancy ML stuff
        # log results
        pandas_df = pd.DataFrame(...insert predictions here...)
        spark_df = spark.createDataFrame(pandas_df)
        spark_df.write.saveAsTable('tablename', mode='append')

I'm trying to log my model in this manner by calling it later in my code:

with mlflow.start_run(run_name="SomeModel_run"):
    model = SomeModel()
    mlflow.pyfunc.log_model("somemodel", python_model=model)

Unfortunately, it gives me this error message:

RuntimeError: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

The error is caused by this line:

mlflow.pyfunc.log_model("somemodel", python_model=model)

If I comment it out, my model makes its predictions and logs the results in my table.

Alternatively, if I remove the lines in my predict function where I call Spark to create a DataFrame and save the table, I am able to log my model.

How do I go about resolving this issue? I need my model to not only write to the table but also be logged.


Esteemed Contributor III

This is something new that we will have to explore. Do you have any docs that you are following here?

New Contributor III

Any updates on this? I am running into the same issue

@Patrick Tawil were you able to solve this problem? If so, do you mind sharing?
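
A possible explanation, not confirmed anywhere in this thread: log_model has to serialize the model object, and the predict method in the question refers to the notebook's driver-side spark session, so the serializer may end up trying to pickle something that holds the SparkContext. That would match the observation that logging works once the Spark calls are removed from predict. A workaround worth trying is to look the session up inside predict, for example with SparkSession.builder.getOrCreate(), instead of relying on the notebook-level spark variable. The sketch below only illustrates the idea in plain Python. DriverOnlySession, get_or_create_session, EagerModel, LazyModel, and state_is_picklable are hypothetical stand-ins, not Spark or MLflow APIs, and pickle stands in for whatever serializer log_model actually uses.

```python
import pickle
import threading


class DriverOnlySession:
    """Hypothetical stand-in for a Spark session: usable, but not picklable."""

    def __init__(self):
        # A thread lock cannot be pickled, much like a SparkContext.
        self._lock = threading.Lock()

    def write(self, rows, sink):
        # Stand-in for "create a DataFrame and append it to a table".
        with self._lock:
            sink.extend(rows)


_SESSION = None


def get_or_create_session():
    """Hypothetical stand-in for SparkSession.builder.getOrCreate()."""
    global _SESSION
    if _SESSION is None:
        _SESSION = DriverOnlySession()
    return _SESSION


class EagerModel:
    """Keeps the session as model state, so serializing drags it along."""

    def __init__(self):
        self.session = get_or_create_session()

    def predict(self, rows, sink):
        self.session.write(rows, sink)


class LazyModel:
    """Looks the session up at predict time, so the model carries no session."""

    def predict(self, rows, sink):
        get_or_create_session().write(rows, sink)


def state_is_picklable(model):
    """Return True if the model's instance state survives pickle.dumps."""
    try:
        pickle.dumps(vars(model))
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
```

With these stand-ins, state_is_picklable(EagerModel()) is False and state_is_picklable(LazyModel()) is True, while both models can still write through the session when predict is called. Whether the same change fixes the real pyfunc model depends on the assumption above about how the session gets captured, so treat it as something to test rather than a confirmed fix.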

