Data Engineering

Forum Posts

Pavan1
by New Contributor II
  • 1347 Views
  • 3 replies
  • 2 kudos

Resolved! Does Spark MLlib support Generalized Additive Modeling? How does one go about implementing GAM models in Spark?

I want to implement a GAM (generalized additive model) in Spark. Based on my research on online forums, I could not find an implementation of GAMs for Spark. Has anyone in this community attempted this? Does Spark MLlib support GAM?

Latest Reply
Pavan1
New Contributor II
  • 2 kudos

Hi @Kaniz Fatma, thanks for sharing this. We ended up using the pyGAM library in Python for this. This PDF is a good introduction. I will share my learnings once we complete our experiments.

2 More Replies
admo
by New Contributor III
  • 3399 Views
  • 5 replies
  • 7 kudos

Scaling issue for inference with a spark.mllib model

Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job:
# 1 - Load the base data (~1 billion lines of ~6 columns)
interaction = build_initial_df()...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

It is hard to analyze this without the Spark UI and more detailed information, but here are a few tips: look for data skew, since some partitions can be very big and some small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit...
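
The skew check mentioned in this reply can be sketched in plain Python. This is only an illustration, not the poster's method: `skew_report` and the 5x threshold are hypothetical, and the per-partition row counts are assumed to have been collected already (in PySpark, one way to get them is to group by `pyspark.sql.functions.spark_partition_id()` and count).

```python
from statistics import median


def skew_report(partition_sizes, threshold=5.0):
    """Summarize how unevenly rows are spread across partitions.

    partition_sizes: row counts per partition (hypothetical input, assumed
    to have been collected from the job beforehand).
    threshold: assumed cutoff; a max/median ratio above it is flagged.
    """
    if not partition_sizes:
        raise ValueError("partition_sizes must not be empty")
    largest = max(partition_sizes)
    typical = median(partition_sizes)
    if typical == 0:
        # Avoid dividing by zero when at least half the partitions are empty.
        ratio = float("inf") if largest > 0 else 1.0
    else:
        ratio = largest / typical
    return {
        "max": largest,
        "median": typical,
        "ratio": ratio,
        "skewed": ratio > threshold,
    }


# Made-up counts: one partition holds far more rows than the others.
report = skew_report([1_000, 1_200, 900, 1_100, 50_000])
print(report["skewed"])
```

A high ratio only suggests where to look; whether repartitioning on a different key helps depends on the job.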

4 More Replies
User16826992666
by Valued Contributor
  • 992 Views
  • 1 reply
  • 0 kudos
Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

You don't have to. If you don't have a huge data set, there may not be much value in Spark ML over anything else. There are also other distributed modeling libraries that work on Spark, like xgboost and Horovod + TF, Keras, PyTorch. Spark ML is a goo...

User16826992666
by Valued Contributor
  • 8500 Views
  • 2 replies
  • 0 kudos

Why do Spark MLlib models only accept a vector column as input?

In other libraries I can just use the feature columns themselves as inputs. Why do I need to make a vector out of my features when I use MLlib?

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

Yeah, it's more a design choice. Rather than have every implementation take column(s) params, this is handled once in VectorAssembler for all of them. One way or the other, most implementations need a vector of inputs anyway. VectorAssembler can do s...
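
To illustrate the idea in this reply, here is a plain-Python sketch of what "assembling" means: several feature columns are merged into one vector per row, so a model only needs to know about a single input column. This is not Spark's implementation; `assemble` and the sample rows are hypothetical. In PySpark the corresponding transformer is `pyspark.ml.feature.VectorAssembler`, configured with `inputCols` and `outputCol`.

```python
def assemble(rows, input_cols, output_col="features"):
    """Return copies of rows with input_cols merged into one list column."""
    assembled = []
    for row in rows:
        # Collect the selected columns, in order, as a single numeric vector.
        vector = [float(row[col]) for col in input_cols]
        new_row = dict(row)  # keep the original columns untouched
        new_row[output_col] = vector
        assembled.append(new_row)
    return assembled


# Made-up rows standing in for a DataFrame.
rows = [
    {"age": 34, "income": 52000.0, "label": 1},
    {"age": 51, "income": 87000.0, "label": 0},
]
prepared = assemble(rows, ["age", "income"])
print(prepared[0]["features"])
```

With the features gathered in one place, any model in this sketch would only need the name of the single `features` column, which is the design choice the reply describes.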

1 More Replies
User16826992666
by Valued Contributor
  • 2000 Views
  • 1 reply
  • 0 kudos

Resolved! What's the difference between SparkML and Spark MLlib?

I have heard people talk about SparkML, but the documentation talks about MLlib. I don't understand the difference. Could anyone help me understand this?

Latest Reply
sean_owen
Honored Contributor II
  • 0 kudos

They're not really different. Before DataFrames in Spark, older implementations of ML algorithms were built on the RDD API. This is generally called "Spark MLlib". After DataFrames, some newer implementations were added as wrappers on top of the old ones ...
