Data Engineering

Forum Posts

by Pavan1 • New Contributor II

02-14-2022 11:18:38 PM

2391 Views
1 reply
2 kudos

Does Spark MLlib support Generalized Additive Modeling? How does one go about implementing GAM models in Spark?

I want to implement a generalized additive model (GAM) in Spark. From my research on online forums, I could not find an implementation of GAMs on Spark. Has anyone in this community attempted this? Does Spark MLlib support GAMs?

Latest Reply

Pavan1
New Contributor II

07-11-2022 11:49:59 PM

2 kudos

Hi @Kaniz Fatma, thanks for sharing this. We ended up using the pyGAM library in Python for this. This PDF is a good introduction. I will share what I learn once we complete our experiments.

by admo • New Contributor III

03-17-2022 2:11:05 AM

9177 Views
4 replies
7 kudos

Scaling issue for inference with a spark.mllib model

Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job:

# 1 - Load the base data (~1 billion lines of ~6 columns)
interaction = build_initial_df()...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

03-17-2022 3:42:49 AM

7 kudos

It is hard to analyze without the Spark UI and more detailed information, but here are a few tips anyway: look for data skew. Some partitions can be very big and some small because of incorrect partitioning. You can use the Spark UI to check that, but also debug your code a bit...
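To make the skew tip concrete, here is a minimal sketch of one way to summarize partition sizes once you have them. The helper name `skew_report` and the sample sizes are made up for illustration, and the Spark call in the comment is only one possible way to collect per-partition row counts, not something taken from the original thread.

```python
from statistics import median


def skew_report(partition_sizes):
    """Summarize how unevenly rows are spread across partitions."""
    if not partition_sizes:
        raise ValueError("partition_sizes must not be empty")
    largest = max(partition_sizes)
    typical = median(partition_sizes)
    # Ratio of the largest partition to the median one. Values far above 1
    # suggest that a few partitions carry most of the work.
    ratio = float("inf") if typical == 0 else largest / typical
    return {
        "partitions": len(partition_sizes),
        "min": min(partition_sizes),
        "median": typical,
        "max": largest,
        "max_to_median": ratio,
    }


# Hypothetical row counts per partition. With PySpark they could be
# collected with something like:
#   sizes = df.rdd.mapPartitions(lambda rows: [sum(1 for _ in rows)]).collect()
sizes = [1000, 1100, 950, 1050, 48000]
report = skew_report(sizes)
print(report)
```

A large `max_to_median` value is only a hint. The Spark UI task durations for the same stage are still the better place to confirm that one partition is the bottleneck.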

3 More Replies

by User16826992666 • Valued Contributor

06-15-2021 1:57:14 PM

1736 Views
1 reply
0 kudos

Resolved! Why should I use Spark MLlib for ML vs other available libraries?

Latest Reply

sean_owen
Databricks Employee

06-17-2021 4:08:08 PM

0 kudos

You don't have to. If you don't have a huge data set, there may not be much value in Spark ML over anything else. There are also other distributed modeling libraries that work on Spark, such as XGBoost, and Horovod with TensorFlow, Keras, or PyTorch. Spark ML is a goo...

by User16826992666 • Valued Contributor

06-15-2021 2:10:04 PM

9597 Views
2 replies
0 kudos

Why do Spark MLlib models only accept a vector column as input?

In other libraries I can just use the feature columns themselves as inputs. Why do I need to make a vector out of my features when I use MLlib?

Latest Reply

sean_owen
Databricks Employee

06-17-2021 4:05:12 PM

0 kudos

Yeah, it's more a design choice. Rather than have every implementation take column(s) params, this is handled once in VectorAssembler for all of them. One way or the other, most implementations need a vector of inputs anyway. VectorAssembler can do s...
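As a rough illustration of the point about VectorAssembler, the sketch below shows what "assembling" means in plain Python, followed by the equivalent PySpark call. The column names, the sample row, and both function names are hypothetical. The Spark function is not executed here and assumes PySpark is installed.

```python
def assemble_row(row, input_cols):
    """Plain-Python analogue: pack the chosen columns of one row into one list."""
    return [float(row[name]) for name in input_cols]


def assemble_with_spark(df, input_cols, output_col="features"):
    """Add a vector column built from input_cols to a Spark DataFrame.

    Requires PySpark. The import is local so the rest of this sketch
    runs without it.
    """
    from pyspark.ml.feature import VectorAssembler

    assembler = VectorAssembler(inputCols=input_cols, outputCol=output_col)
    return assembler.transform(df)


# Hypothetical feature columns and row, for illustration only.
feature_cols = ["age", "income", "clicks"]
row = {"age": 34, "income": 52000.0, "clicks": 7, "user_id": "u1"}
vector = assemble_row(row, feature_cols)
print(vector)
```

An estimator such as `LogisticRegression` would then read the assembled column through its `featuresCol` parameter, so the column selection is done once rather than in every model.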

1 More Replies

by User16826992666 • Valued Contributor

06-17-2021 8:02:38 AM

3541 Views
1 reply
0 kudos

Resolved! What's the difference between SparkML and Spark MLlib?

I have heard people talk about SparkML, but the documentation talks about MLlib. I don't understand the difference. Could anyone help me understand this?

Latest Reply

sean_owen
Databricks Employee

06-17-2021 11:23:47 AM

0 kudos

They're not really different. Before DataFrames in Spark, older implementations of ML algorithms were built on the RDD API. This is generally called "Spark MLlib". After DataFrames, some newer implementations were added as wrappers on top of the old ones ...
