Are UDFs necessary for applying models from ML libraries at scale?
01-24-2023 01:14 PM
Hello,
I recently finished the "Scalable Machine Learning with Apache Spark" course and saw that scikit-learn models can be applied faster, in a distributed manner, when they are used in pandas UDFs or with the mapInPandas() method.
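For reference, here is a minimal sketch of the pattern I mean. It is not the course's exact code: the toy LinearRegression model, the column names x1 and x2, and the helper names predict_batches and score_with_spark are my own assumptions for illustration.

```python
from typing import Iterator

import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy model fitted on the driver (assumption: a real model would usually
# be loaded from storage or a model registry instead).
X_train = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0], "x2": [1.0, 0.0, 1.0, 0.0]})
y_train = 2.0 * X_train["x1"] + 3.0 * X_train["x2"] + 1.0
model = LinearRegression().fit(X_train, y_train)

FEATURES = ["x1", "x2"]


def predict_batches(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Score pandas batches one at a time.

    mapInPandas hands this function an iterator of pandas DataFrames and
    expects an iterator of pandas DataFrames back.
    """
    for pdf in batches:
        out = pdf[FEATURES].copy()
        out["prediction"] = model.predict(pdf[FEATURES])
        yield out


def score_with_spark(spark_df):
    """Apply the model to a Spark DataFrame.

    Assumes spark_df has double columns x1 and x2. The schema string
    describes the columns that predict_batches yields.
    """
    return spark_df.mapInPandas(
        predict_batches, schema="x1 double, x2 double, prediction double"
    )
```

Because predict_batches only deals with pandas objects, it can be checked locally with something like `list(predict_batches(iter([some_pandas_df])))` before calling score_with_spark on a cluster.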
Spark MLlib models don't need this kind of refactoring since they are built for distributed execution, but I was wondering whether this kind of UDF is also necessary for other libraries such as TensorFlow, PyTorch, spaCy, Keras, etc.
Thank you!