Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

how to make distributed predictions with sklearn model?

ChaseM
New Contributor II

So I have an sklearn-style model which predicts on a pandas df. The data to predict on is a Spark df. Simply converting the whole thing to pandas at once and predicting is not an option due to time and memory constraints.

Is there a way to chunk a Spark df, use the worker nodes to convert the chunks to pandas and predict on them, and then get all the predictions back on the driver node?

I'm a bit new to the Databricks ecosystem, so sorry if the question isn't phrased in the best way, but hopefully I got the goal across.

Thank you so much in advance!

2 REPLIES

Kaniz
Community Manager

Hi @ChaseM, you can chunk a Spark DataFrame, convert each chunk to a pandas DataFrame, and predict on each chunk in parallel using the worker nodes in Databricks.

ChaseM
New Contributor II

Right, that's exactly what I'm trying to do, but I have no idea how to do it!

I can chunk the spark df with the following:

def df_in_chunks(df, row_count):
    """
    in: df
    out: [df1, df2, ..., df100]
    """
    count = df.count()
    if count > row_count:
        num_chunks = count // row_count
        chunk_percent = 1 / num_chunks  # 1% would become 0.01
        return df.randomSplit([chunk_percent] * num_chunks, seed=1234)
    return [df]

So I have a list of Spark dfs, but if I do "for df in dfs: df_pd = df.toPandas(); model.predict(df_pd)" it runs serially, not in parallel. Do you have any suggestions on how to make it parallel?

Thank you so much!
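
One possible approach, sketched here rather than taken from the thread: instead of splitting the Spark df manually, let Spark hand each worker its partitions as pandas DataFrames via `DataFrame.mapInPandas` (available in Spark 3.0 and later) and call `model.predict` there. The names `make_predict_fn`, `predict_distributed`, `feature_cols`, and `output_col` are hypothetical. The sketch assumes the model can be pickled and shipped to the workers and that its predictions are numeric.

```python
from typing import Iterator

import pandas as pd


def make_predict_fn(model, feature_cols, output_col="prediction"):
    """Build a function with the iterator-in, iterator-out shape mapInPandas expects.

    `model` is assumed to be any object with an sklearn-style predict(X) method.
    """

    def predict_batches(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
        for pdf in batches:
            # Each `pdf` is a pandas chunk of one Spark partition, on a worker.
            preds = model.predict(pdf[feature_cols])
            out = pdf.copy()
            # Assumption: predictions are numeric, so they fit a double column.
            out[output_col] = pd.Series(preds, index=pdf.index, dtype="float64")
            yield out

    return predict_batches


def predict_distributed(spark_df, model, feature_cols, output_col="prediction"):
    """Return a Spark DataFrame with one extra column holding the predictions."""
    # Imported lazily so the pandas-only part above can be tried without Spark.
    from pyspark.sql.types import DoubleType, StructField, StructType

    # Output schema: all input columns plus the new prediction column.
    schema = StructType(
        spark_df.schema.fields + [StructField(output_col, DoubleType())]
    )
    return spark_df.mapInPandas(
        make_predict_fn(model, feature_cols, output_col), schema=schema
    )
```

A call would look something like `preds = predict_distributed(spark_df, model, ["f1", "f2"])`, where the column names are placeholders. The result is still a Spark df, so it can be written to a table, or brought to the driver with `preds.select("prediction").toPandas()` if the predictions alone fit in driver memory. For a large model, broadcasting it once with `spark.sparkContext.broadcast` instead of capturing it in the closure may reduce serialization overhead.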
