boskicl

I also attempted this:

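import mlflow

# Point the MLflow client at the Unity Catalog model registry
mlflow.set_registry_uri("databricks-uc")

# Wrap the registered model (resolved through its "production" alias) as a Spark UDF
loaded_model = mlflow.pyfunc.spark_udf(
    spark,
    model_uri=f"models:/{model_name}@production",
    result_type="double"
)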
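With the registry URI set to "databricks-uc", model_name has to be the full three-level Unity Catalog name (catalog.schema.model), so the alias URI above expands as shown below. This is a minimal sketch: uc_alias_uri is a hypothetical helper, not an MLflow function, and "main", "ml" and "my_regressor" are placeholder names.

```python
# Hypothetical helper (not part of MLflow): builds a Unity Catalog model URI
# that resolves a registered model through an alias such as "production".
def uc_alias_uri(catalog: str, schema: str, model: str, alias: str) -> str:
    # Unity Catalog registered models use a three-level name: catalog.schema.model
    for part in (catalog, schema, model, alias):
        if not part or "." in part or "@" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return f"models:/{catalog}.{schema}.{model}@{alias}"


print(uc_alias_uri("main", "ml", "my_regressor", "production"))
# → models:/main.ml.my_regressor@production
```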


The model loaded without errors in that cell, and I then tried to generate predictions like this:

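from pyspark.sql.functions import col, struct

# Assemble the input columns into a single "features" vector column
assemble_transform = assembler.transform(allNewRels)

# Score each row with the MLflow UDF, keeping the key columns plus the prediction
preds_final_df = (
    assemble_transform
    .withColumn("prediction", loaded_model(struct(col("features"))))
    .select("id", "second_id", "prediction")
)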

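However, when I tried to save that DataFrame to a Delta table, a Python worker failed. Because Spark evaluates lazily, the UDF only actually ran at that point, which is why the earlier cell appeared to succeed. The error was: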

pyspark.errors.exceptions.captured.IllegalArgumentException: requirement failed: Column features must be of type class org.apache.spark.ml.linalg.VectorUDT:struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was actually class org.apache.spark.sql.types.StructType:struct<indices:array<int>,size:bigint,type:bigint,values:array<double>>.

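My guess is that the model can't be used this way because it is a Spark ML regressor that requires its features column to be a VectorUDT, and by the time that column reaches the model inside the UDF it has turned into a plain struct.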
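The error message itself shows the mismatch: the features column arrived as a plain struct carrying the same four fields Spark ML uses to store a vector (type, size, indices, values), but not tagged as VectorUDT and with size and type widened to bigint. The pure-Python sketch below only illustrates how those four fields encode a vector, assuming the Spark ML convention of type 0 for sparse and 1 for dense; struct_to_dense is a hypothetical helper, not a Spark or MLflow API, and it does not by itself fix the UDF.

```python
from typing import List, Optional, Sequence


def struct_to_dense(vtype: int, size: Optional[int],
                    indices: Optional[Sequence[int]],
                    values: Sequence[float]) -> List[float]:
    """Rebuild a dense feature list from the four fields of a serialized Spark ML vector.

    Assumed encoding: type 1 = dense (only values is set),
    type 0 = sparse (size, indices and values are set).
    """
    if vtype == 1:
        return [float(v) for v in values]
    if vtype == 0:
        if size is None or indices is None:
            raise ValueError("sparse vector needs size and indices")
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        dense = [0.0] * int(size)
        # Scatter the stored non-zero values into their positions
        for i, v in zip(indices, values):
            dense[int(i)] = float(v)
        return dense
    raise ValueError(f"unknown vector type: {vtype}")


# Sparse vector of size 5 with non-zeros at positions 1 and 3
print(struct_to_dense(0, 5, [1, 3], [2.5, 4.0]))  # → [0.0, 2.5, 0.0, 4.0, 0.0]
# Dense vector
print(struct_to_dense(1, None, None, [1.0, 2.0]))  # → [1.0, 2.0]
```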

I'm posting this to document what I tried and where it fell short.