Re: Are UDFs necessary for applying models from ML...

Hubert-Dudek · ‎01-25-2023

MLlib (the RDD-based API) is in maintenance mode; the DataFrame-based Spark ML API is what is mainly used now. Training a model does not require UDFs in most cases (see https://www.databricks.com/spark/getting-started-with-apache-spark/machine-learning), and in any case a UDF normally runs in a distributed way, on the executors. Where UDFs are handy is inference: for example, when you append data to your table, you can use a UDF to run predictions with a model logged or registered in MLflow (even on a real-time stream):

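import mlflow

# Wrap the logged model as a Spark UDF; run_id is the MLflow run that logged it
predict = mlflow.pyfunc.spark_udf(spark, model_uri=f"runs:/{run_id}/model")
# Score every row, passing all columns of testDF as model inputs
predDF = testDF.withColumn("prediction", predict(*testDF.columns))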
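
For the streaming case mentioned above, here is a rough sketch of how the same UDF could be applied to a stream. It is only an illustration under some assumptions: the function name score_stream, the table names, and the checkpoint path are all hypothetical, the model is assumed to be logged under the artifact path "model" as in the snippet above, and readStream.table / toTable assume Spark 3.1 or later.

```python
def score_stream(spark, run_id, source_table, target_table, checkpoint_path):
    """Score rows appended to source_table and write them to target_table.

    Hypothetical helper: the name and all parameters are illustrative only.
    """
    # Imported lazily so the function can be defined where mlflow is absent
    import mlflow

    # Same call as in the batch example: wrap the logged model as a Spark UDF
    predict = mlflow.pyfunc.spark_udf(spark, model_uri=f"runs:/{run_id}/model")

    # Read the source table as a stream (assumes Spark 3.1+)
    stream_df = spark.readStream.table(source_table)

    # Spark applies the UDF to each micro-batch on the executors
    scored_df = stream_df.withColumn("prediction", predict(*stream_df.columns))

    # Start the streaming write; the checkpoint location tracks progress
    return (
        scored_df.writeStream
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )
```

You would call it with something like score_stream(spark, run_id, "my_input_table", "my_predictions", "/tmp/checkpoints/preds"), where those names are placeholders. It returns the streaming query handle, so you can stop or monitor it later.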

My blog: https://databrickster.medium.com/