Are UDFs necessary for applying models from ML libraries at scale?

anvil · 01-24-2023

Hello,

I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that scikit-learn models can be applied faster, in a distributed manner, when they are wrapped in pandas UDFs or used with the mapInPandas() method.
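To make the pattern concrete, here is a minimal sketch of what I mean by the mapInPandas() approach. `ThresholdModel`, the column name `x`, and the helper `score_with_spark` are hypothetical names made up for illustration; the stand-in model only mimics the `predict` interface of a fitted single-node model and is not taken from the course.

```python
from typing import Iterator

import pandas as pd


class ThresholdModel:
    """Hypothetical stand-in for a fitted single-node model (e.g. scikit-learn)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, features: pd.DataFrame) -> list:
        # Label a row 1.0 when feature "x" exceeds the threshold, else 0.0.
        return [1.0 if v > self.threshold else 0.0 for v in features["x"]]


model = ThresholdModel(threshold=0.5)


def predict_batches(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # Spark hands each partition to this function as an iterator of pandas
    # DataFrames; the single-node model is applied one batch at a time.
    for batch in batches:
        out = batch[["x"]].copy()
        out["prediction"] = model.predict(batch)
        yield out


def score_with_spark(spark_df):
    # Assumption: spark_df is a Spark DataFrame with a double column "x".
    return spark_df.mapInPandas(predict_batches, schema="x double, prediction double")
```

Because `predict_batches` only depends on pandas, it can be tried locally on an iterator of pandas DataFrames before passing it to Spark.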

Spark MLlib models don't need this kind of refactoring, since they are built for distributed execution, but I was wondering whether this kind of UDF is necessary for other libraries such as TensorFlow, PyTorch, spaCy, Keras, etc.

Thank you!

Are UDFs necessary for applying models from ML libraries at scale?