Data Engineering

by admo • New Contributor III

03-17-2022 2:11:05 AM

12130 Views
4 replies
7 kudos

Scaling issue for inference with a spark.mllib model

Hello, I'm writing this because I have tried a lot of different directions to get a simple model inference working, with no success. Here is the outline of the job:

# 1 - Load the base data (~1 billion lines of ~6 columns)
interaction = build_initial_df()...


Latest Reply

Hubert-Dudek
Databricks MVP

03-17-2022 3:42:49 AM

7 kudos

It is hard to analyze without the Spark UI and more detailed information, but here are a few tips: look for data skew, where some partitions are very big and some are small because of incorrect partitioning. You can use the Spark UI to do that, but also debug your code a bit...
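As a rough illustration of the skew check suggested above, here is a minimal sketch, not taken from the original reply. The helper names `partition_row_counts` and `skew_ratio` are hypothetical; `spark_partition_id` is a standard PySpark function. It counts rows per partition and compares the largest partition with the median one.

```python
from statistics import median


def partition_row_counts(df):
    """Return {partition_id: row_count} for a PySpark DataFrame (hypothetical helper)."""
    # Imported lazily so skew_ratio below can be used without PySpark installed.
    from pyspark.sql.functions import spark_partition_id

    rows = df.groupBy(spark_partition_id().alias("pid")).count().collect()
    return {row["pid"]: row["count"] for row in rows}


def skew_ratio(counts):
    """Ratio of the largest partition to the median partition size."""
    sizes = list(counts.values())
    if not sizes:
        return 0.0
    mid = median(sizes)
    return max(sizes) / mid if mid else float("inf")


# Made-up partition sizes for illustration: partition 3 is far larger than the rest.
example_counts = {0: 1000, 1: 1100, 2: 950, 3: 25000}
ratio = skew_ratio(example_counts)
print(f"largest partition is {ratio:.1f}x the median")
```

On a real job, `partition_row_counts(interaction)` would supply the counts; a ratio well above 1 suggests that repartitioning on a better-distributed key may be worth trying.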


by Joseph_B • Databricks Employee

06-24-2021 1:29:49 PM

3294 Views
1 reply
0 kudos

How can I use Databricks to "automagically" distribute scikit-learn model training?

Is there a way to automatically distribute training and model tuning across a Spark cluster, if I want to keep using scikit-learn?


Latest Reply

Joseph_B
Databricks Employee

06-24-2021 1:42:11 PM

0 kudos

It depends on what you mean by "automagically." If you want to keep using scikit-learn, there are ways to distribute parts of training and tuning with minimal effort. However, there is no "magic" way to distribute training an individual model in scik...
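To make "distributing parts of tuning" concrete, here is a minimal sketch under stated assumptions; it is not the method from the truncated reply. Each hyperparameter candidate in a scikit-learn grid search is an independent fit, so joblib can fan the fits out to workers while each individual model still trains on a single worker. The dataset, estimator, and grid are arbitrary choices for illustration, and the local `threading` backend stands in for a cluster backend.

```python
from joblib import parallel_backend
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small built-in dataset, chosen only to keep the example self-contained.
X, y = load_iris(return_X_y=True)

# Six candidates x three folds = 18 independent fits that can run in parallel.
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=3)

# Assumption: "threading" is a local stand-in. On a Spark cluster, a
# Spark-backed joblib backend (for example, the one registered by the
# joblib-spark package) could be named here instead.
with parallel_backend("threading", n_jobs=2):
    search.fit(X, y)

best_params = search.best_params_
print(best_params, round(search.best_score_, 3))
```

Only the search over candidates is parallelized here; each `SVC` fit remains single-node, which matches the caveat that there is no "magic" way to distribute training of an individual model.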


Databricks Community
