Why should I use Spark MLlib for ML vs other available libraries?

User16826992666
Valued Contributor
 
1 ACCEPTED SOLUTION

sean_owen
Honored Contributor II

You don't have to. If your data set isn't huge, Spark ML may offer little advantage over other libraries. Other distributed modeling libraries also run on Spark, such as XGBoost, and Horovod with TensorFlow, Keras, or PyTorch. Spark ML is a good choice when you have a very large data set and need a fairly basic algorithm such as logistic regression.
