by Pavan1 • New Contributor II
- 1256 Views
- 3 replies
- 2 kudos
I want to implement a GAM (generalized additive model) in Spark. From my research on online forums, I could not find an implementation of GAMs on Spark. Has anyone in this community attempted this? Does Spark MLlib support GAMs?
Latest Reply
Hi @Kaniz Fatma, thanks for sharing this. We ended up using the pyGAM library in Python for this. This PDF is a good introduction. I will share my learnings once we complete our experiments.
by admo • New Contributor III
- 1730 Views
- 5 replies
- 7 kudos
Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job:
# 1 - Load the base data (~1 billion lines of ~6 columns)
interaction = build_initial_df()...
Latest Reply
It is hard to analyze without the Spark UI and more detailed information, but here are a few tips: look for data skew, where some partitions are very big and others small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit...
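One way to check for skew from code rather than the Spark UI is to count rows per partition and compare the largest partition to the median. The sketch below is only an illustration: the helper names `skew_ratio` and `partition_sizes` are hypothetical, and `df` is assumed to be any PySpark DataFrame (for example the `interaction` DataFrame from the question).

```python
from statistics import median


def skew_ratio(partition_sizes):
    """Return the largest partition size divided by the median size.

    A value close to 1.0 means the partitions are balanced.
    """
    sizes = [s for s in partition_sizes if s > 0]
    if not sizes:
        return 1.0
    return max(sizes) / median(sizes)


def partition_sizes(df):
    """Count rows in each non-empty partition of a PySpark DataFrame."""
    # Imported lazily so skew_ratio() can be used without Spark installed.
    from pyspark.sql.functions import spark_partition_id

    rows = df.groupBy(spark_partition_id().alias("pid")).count().collect()
    return [row["count"] for row in rows]
```

Calling `skew_ratio(partition_sizes(interaction))` gives a single number to watch. What counts as "too skewed" depends on the job, but a ratio far above 1 suggests that repartitioning on a different key may be worth trying.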
- 8366 Views
- 2 replies
- 0 kudos
In other libraries I can just use the feature columns themselves as inputs. Why do I need to make a vector out of my features when I use MLlib?
Latest Reply
Yeah, it's more a design choice. Rather than have every implementation take column(s) params, this is handled once in VectorAssembler for all of them. One way or the other, most implementations need a vector of inputs anyway. VectorAssembler can do s...
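As a rough illustration of that design, the sketch below packs several numeric columns into a single vector column with `VectorAssembler`. The function name `assemble_features` and the default column name `features` are assumptions made for this example; `df` is assumed to be a PySpark DataFrame whose `feature_cols` are numeric.

```python
def assemble_features(df, feature_cols, output_col="features"):
    """Return `df` with `feature_cols` combined into one vector column."""
    # Imported lazily so this function can be defined without Spark installed.
    from pyspark.ml.feature import VectorAssembler

    assembler = VectorAssembler(
        inputCols=list(feature_cols),  # the separate input columns
        outputCol=output_col,          # the single vector column estimators read
    )
    return assembler.transform(df)
```

An estimator such as `LogisticRegression(featuresCol="features", labelCol="label")` can then be fit on the result, so the column-to-vector step is written once instead of inside every algorithm.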
- 1910 Views
- 1 reply
- 0 kudos
I have heard people talk about SparkML, but the documentation talks about MLlib. I don't understand the difference. Could anyone help me understand this?
Latest Reply
They're not really different. Before DataFrames existed in Spark, the older implementations of ML algorithms were built on the RDD API. This is generally called "Spark MLlib". After DataFrames, some newer implementations were added as wrappers on top of the old ones ...