Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

mradassaad
by New Contributor III
  • 3938 Views
  • 2 replies
  • 1 kudos

Resolved! Tuning `CrossValidator` spark job performance

I am running a 3-fold cross validation of an ML pipeline that uses `GBTClassifier` as the final step. It takes 18 hours to run, and I am looking for feedback on how to improve the performance, as I expect this to go faster. For context here is the...

Latest Reply
cchalc
New Contributor III
  • 1 kudos

Hello @Assaad Mrad, so this looks like trying to decide between putting the pipeline in the cross validator or the cross validator in the pipeline. Since you are doing the polynomial expansion as part of the pipeline you might want to consider putt...
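
To make the trade-off between those two arrangements concrete, here is a rough count of `fit` calls under each one. It is a simplified model, not a measurement of the job in the post: it assumes the cross validator fits its estimator once per fold per parameter combination and then once more on the full data, and the grid size of 4 is a made-up number (the post only states 3 folds). The function and key names are illustrative only. It counts `fit` calls, not compute time; because Spark evaluates lazily, transform work such as polynomial expansion may still be recomputed unless the intermediate DataFrame is cached.

```python
def count_fits(n_folds: int, n_param_maps: int) -> dict:
    """Count fit calls for two layouts under a simplified model."""
    # One fit per (fold, parameter combination), plus the final refit
    # of the best parameters on the full dataset.
    cv_fits = n_folds * n_param_maps + 1
    return {
        # Whole pipeline handed to the cross validator as its estimator:
        # every fitted stage, feature steps included, is refit each time.
        "pipeline_in_cv": {
            "feature_stage_fits": cv_fits,
            "classifier_fits": cv_fits,
        },
        # Cross validator used as the last stage of the pipeline:
        # feature stages are fit once, only the classifier is refit.
        "cv_in_pipeline": {
            "feature_stage_fits": 1,
            "classifier_fits": cv_fits,
        },
    }


# 3 folds as in the post; 4 parameter combinations is a hypothetical grid size.
counts = count_fits(n_folds=3, n_param_maps=4)
for layout, fits in counts.items():
    print(layout, fits)
```

One caveat on the second layout: feature stages that learn from data (scalers, indexers) are then fit on rows that later serve as validation folds, so it is safest for stateless transformers.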

1 More Replies
Wayne
by New Contributor III
  • 1646 Views
  • 2 replies
  • 3 kudos
Latest Reply
Wayne
New Contributor III
  • 3 kudos

No error; I am just seeing EXPAND DISK events in the cluster event logs. This is just a regular Spark application. I am not sure whether the cloud storage matters; the Spark application uses it for input and output.

1 More Replies