
Alternatives for XGBoost

Dk_1802
New Contributor III

Are there any alternatives to XGBoost that work well with PySpark?

Accepted Solutions

spark_ds
New Contributor III

XGBoost is now an option in PySpark pipelines (link here), but PySpark ML also supports a number of alternatives, including:

  1. Gradient-Boosted Trees (GBTs): A boosting ensemble technique that builds trees sequentially, with each new tree trained to correct the errors of the trees before it. Use GBTClassifier from pyspark.ml.classification.
  2. Random Forest: A bagging ensemble method that trains many decision trees on random subsets of the data and combines their predictions. Use RandomForestClassifier from pyspark.ml.classification.
  3. Decision Tree Classifier: A single decision tree is a simple yet effective machine learning algorithm. Use DecisionTreeClassifier from pyspark.ml.classification.

 


2 REPLIES

Vinay_M_R
Valued Contributor II

Spark MLlib GBT: Spark MLlib, the machine learning library included with Apache Spark, provides its own implementation of gradient-boosted trees (GBTs). It offers functionality similar to XGBoost and can be used directly in PySpark ML pipelines without any external libraries.
