Are UDFs necessary for applying models from ML libraries at scale?

anvil
Databricks Partner

Hello,

I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that scikit-learn models can be applied faster, in a distributed manner, when they are wrapped in pandas UDFs or used with the mapInPandas() method.
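For context, here is roughly the pattern I mean. This is only a minimal sketch, not the course's code: the toy training data, the column names `x1`/`x2`, and the helper names `predict_batches` and `score_with_spark` are placeholders I made up for illustration.

```python
from typing import Iterator

import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy model fitted on the driver; stands in for any fitted scikit-learn estimator.
# The data and column names are illustrative assumptions.
train = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "x2": [1.0, 0.0, 1.0, 0.0]})
model = LinearRegression().fit(train[["x1", "x2"]], [1.0, 2.0, 4.0, 5.0])


def predict_batches(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Score each pandas batch; Spark feeds this the batches of one partition."""
    for pdf in batches:
        out = pdf[["x1", "x2"]].copy()
        out["prediction"] = model.predict(out[["x1", "x2"]])
        yield out


def score_with_spark(spark):
    """Run the single-node model on every partition in parallel via mapInPandas."""
    sdf = spark.createDataFrame(train)
    return sdf.mapInPandas(
        predict_batches, schema="x1 double, x2 double, prediction double"
    )
```

On a cluster I would call `score_with_spark(spark).show()`. The fitted model is captured by the function and shipped to the executors, so each partition is scored independently.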

Spark MLlib models don't need this kind of refactoring since they are built for distributed execution, but I was wondering whether this kind of UDF is also necessary for other libraries such as TensorFlow, PyTorch, spaCy, Keras, etc.

Thank you!