Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Pavan1
by New Contributor II
  • 2391 Views
  • 1 reply
  • 2 kudos

Does Spark MLlib support Generalized Additive Modeling? How does one go about implementing GAM models in Spark?

I want to implement a GAM (generalized additive model) in Spark. From my research on online forums, I could not find an implementation of GAMs for Spark. Has anyone in this community attempted this? Does Spark MLlib support GAMs?

Latest Reply
Pavan1
New Contributor II
  • 2 kudos

Hi @Kaniz Fatma, thanks for sharing this. We ended up using the pyGAM library in Python for this. This PDF is a good introduction. I will share what I learn once we complete our experiments.

admo
by New Contributor III
  • 9177 Views
  • 4 replies
  • 7 kudos

Scaling issue for inference with a spark.mllib model

Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job:
# 1 - Load the base data (~1 billion lines of ~6 columns)
interaction = build_initial_df()...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

It is hard to analyze without the Spark UI and more detailed information, but here are a few tips anyway: look for data skew, where some partitions are very big and others small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit...
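As a rough illustration of the skew check suggested above, here is a minimal plain-Python sketch. It assumes you already have a list of row counts per partition (in Spark these could come from grouping on `spark_partition_id()` and counting). The function name `partition_skew` and the sample counts are made up for illustration and are not from the original thread.

```python
from statistics import median


def partition_skew(rows_per_partition):
    """Return (largest, median, ratio) for a list of per-partition row counts."""
    if not rows_per_partition:
        raise ValueError("need at least one partition")
    largest = max(rows_per_partition)
    mid = median(rows_per_partition)
    # A ratio near 1 means evenly sized partitions. A large ratio means a few
    # partitions hold most of the rows and will dominate the stage runtime.
    ratio = largest / mid if mid else float("inf")
    return largest, mid, ratio


# Hypothetical counts: seven similar partitions and one oversized one.
counts = [1_000, 1_100, 950, 1_050, 1_020, 980, 1_000, 250_000]
largest, mid, ratio = partition_skew(counts)
print(f"largest={largest} median={mid} ratio={ratio:.1f}")
```

A ratio far above 1, as in this made-up sample, would suggest repartitioning on a better-distributed key before running inference.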

3 More Replies
User16826992666
by Valued Contributor
  • 1736 Views
  • 1 reply
  • 0 kudos
Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

You don't have to. If you don't have a huge data set, there may not be much value in Spark ML over anything else. There are also other distributed modeling libraries that work on Spark, like xgboost, and Horovod + TF, Keras, PyTorch. Spark ML is a goo...

User16826992666
by Valued Contributor
  • 9597 Views
  • 2 replies
  • 0 kudos

Why do Spark MLlib models only accept a vector column as input?

In other libraries I can just use the feature columns themselves as inputs. Why do I need to make a vector out of my features when I use MLlib?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

Yeah, it's more a design choice. Rather than have every implementation take column(s) params, this is handled once in VectorAssembler for all of them. One way or the other, most implementations need a vector of inputs anyway. VectorAssembler can do s...
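To make the design choice concrete, here is a small plain-Python sketch of the idea, not Spark's actual API. One `assemble` step turns named columns into a vector, and every model only needs to understand vectors. In Spark the assembling step is done by `pyspark.ml.feature.VectorAssembler`. The names `assemble`, `LinearScorer`, and `ThresholdRule` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def assemble(row: Row, input_cols: Sequence[str]) -> List[float]:
    """Pack the named feature columns of one row into a single vector."""
    return [float(row[c]) for c in input_cols]


class LinearScorer:
    """Toy model that sees only a feature vector, never column names."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0):
        self.weights = list(weights)
        self.bias = bias

    def predict(self, features: Sequence[float]) -> float:
        if len(features) != len(self.weights):
            raise ValueError("feature vector has the wrong length")
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


class ThresholdRule:
    """Second toy model that reuses the same vector input unchanged."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, features: Sequence[float]) -> int:
        return 1 if sum(features) > self.threshold else 0


# Column selection happens once, here, rather than inside each model.
row = {"age": 30.0, "income": 50.0, "clicks": 3.0}
features = assemble(row, ["age", "clicks"])

print(LinearScorer([0.5, 2.0], bias=1.0).predict(features))
print(ThresholdRule(20.0).predict(features))
```

Neither toy model takes a column-name parameter; swapping the feature set only changes the single `assemble` call.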

1 More Replies
User16826992666
by Valued Contributor
  • 3541 Views
  • 1 reply
  • 0 kudos

Resolved! What's the difference between SparkML and Spark MLlib?

I have heard people talk about SparkML, but the documentation talks about MLlib. I don't understand the difference. Could anyone help me understand this?

Latest Reply
sean_owen
Databricks Employee
  • 0 kudos

They're not really different. Before DataFrames in Spark, older implementations of ML algorithms were built on the RDD API. This is generally called "Spark MLlib". After DataFrames, some newer implementations were added as wrappers on top of the old ones ...
