cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

User16826992666
by Valued Contributor
  • 2042 Views
  • 1 replies
  • 0 kudos

Resolved! What's the difference between SparkML and Spark MLlib?

I have heard people talk about SparkML but when reading documentation it talks about MLlib. I don't understand the difference, could anyone help me understand this?

  • 2042 Views
  • 1 replies
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

They're not really different. Before DataFrames in Spark, older implementations of ML algorithms build on the RDD API. This is generally called "Spark MLlib". After DataFrames, some newer implementations were added as wrappers on top of the old ones ...

  • 0 kudos
Joseph_B
by New Contributor III
  • 539 Views
  • 1 replies
  • 0 kudos

When doing hyperparameter tuning with Hyperopt, when should I use SparkTrials? Does it work with both single-machine ML (like sklearn) and distributed ML (like Apache Spark ML)?

I want to know how to use Hyperopt in different situations:Tuning a single-machine algorithm from scikit-learn or single-node TensorFlowTuning a distributed algorithm from Spark ML or distributed TensorFlow / Horovod

  • 539 Views
  • 1 replies
  • 0 kudos
Latest Reply
Joseph_B
New Contributor III
  • 0 kudos

The right question to ask is indeed: Is the algorithm you want to tune single-machine or distributed?If it's a single-machine algorithm like any from scikit-learn, then you can use SparkTrials with Hyperopt to distribute hyperparameter tuning.If it's...

  • 0 kudos
z160896
by New Contributor II
  • 6540 Views
  • 2 replies
  • 0 kudos

why spark very slow with large number of dataframe columns

scala Spark App: I have a dataset of 130x14000. I read from a parquet file with SparkSession. Then used for Spark ML Random Forest model (using pipeline). It takes 7 hours to complete! for reading the parquet file takes about 1 minute. If I implemen...

  • 6540 Views
  • 2 replies
  • 0 kudos
Latest Reply
EliasHaydar
New Contributor II
  • 0 kudos

I've already answered a similar question on StackOverflow so I'll repeat what a I said there. The following may not solve your problem completely but it should give you some pointer to start. The first problem that you are facing is the disproportio...

  • 0 kudos
1 More Replies
Labels