Machine Learning

Forum Posts


by rgbuckley • New Contributor III

06-12-2023 11:44:30 AM

13055 Views
5 replies
6 kudos

Resolved! Fix Hanging Task in Databricks

I am applying a pandas UDF to a grouped DataFrame in Databricks. When I do this, a couple of tasks hang forever, while the rest complete quickly. I start by repartitioning my dataset so that each group is in one partition: group_factors = ['a','b','c'] #m...
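
For reference, the grouped pattern in the post can be sketched with PySpark 3.x's `groupBy(...).applyInPandas`, which shuffles the data so that each call of the function receives all rows of exactly one group. The grouping columns `a`, `b`, `c` come from the post; the `value` column, the output schema, and the names `summarize_group` and `run_grouped` are hypothetical.

```python
import pandas as pd

# Grouping columns taken from the post.
GROUP_FACTORS = ["a", "b", "c"]

# Output schema as a DDL string; the column types are assumptions.
OUTPUT_SCHEMA = "a string, b string, c string, n long, mean_value double"


def summarize_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical per-group work: row count and mean of an assumed 'value' column.

    applyInPandas calls this once per (a, b, c) group with all of that group's rows.
    """
    keys = {col: pdf[col].iloc[0] for col in GROUP_FACTORS}
    return pd.DataFrame(
        [{**keys, "n": len(pdf), "mean_value": float(pdf["value"].mean())}]
    )


def run_grouped(sdf):
    """Apply summarize_group per group; sdf is assumed to be a pyspark.sql.DataFrame."""
    return sdf.groupBy(*GROUP_FACTORS).applyInPandas(summarize_group, schema=OUTPUT_SCHEMA)
```

Because the per-group function is plain pandas, it can be called on a single group's rows locally to check its behavior and runtime before running it on the cluster.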

Latest Reply

rgbuckley
New Contributor III

06-15-2023 8:49:13 AM

6 kudos

Thank you, Suteja. I had been monitoring the resources and never reached capacity on any of them. The data was evenly distributed across partitions and groups as well. I did end up taking your advice in (1). I set a timer and killed the process if the group took...
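
The reply doesn't show how the timer was implemented. One way to kill a group that runs too long, sketched here with only the Python standard library, is to run the per-group work in a child process and terminate it after a deadline. It assumes a POSIX worker where the `fork` start method is available; `run_with_timeout` and `slow_group_job` are hypothetical names, and how a timed-out group is then reported (for example, as a sentinel row in the UDF output) is left open.

```python
import multiprocessing as mp
import time


def run_with_timeout(func, args=(), timeout_s=60.0):
    """Run func(*args) in a child process and kill it if it exceeds timeout_s.

    Returns ("ok", result), ("error", message) or ("timeout", None).
    Assumes a POSIX system where the "fork" start method is available.
    """
    ctx = mp.get_context("fork")
    reader, writer = ctx.Pipe(duplex=False)

    def target():
        # Runs in the child: send either the result or the error back to the parent.
        try:
            writer.send(("ok", func(*args)))
        except Exception as exc:
            writer.send(("error", repr(exc)))
        finally:
            writer.close()

    proc = ctx.Process(target=target)
    proc.start()
    writer.close()  # the parent only reads from the pipe

    if reader.poll(timeout_s):
        try:
            outcome = reader.recv()
        except EOFError:
            # The child exited without sending anything (e.g. it crashed).
            outcome = ("error", "child exited without a result")
        proc.join()
        return outcome

    # Deadline passed: kill the child process and report a timeout.
    proc.terminate()
    proc.join()
    return ("timeout", None)


def slow_group_job(seconds):
    """Hypothetical stand-in for one group's work: sleep, then return the input."""
    time.sleep(seconds)
    return seconds
```

A child process is used rather than a thread because Python offers no way to forcibly stop a thread, so a hung thread would keep the task alive.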

by ryojikn • New Contributor III

01-15-2023 8:26:07 PM

2016 Views
1 replies
1 kudos

Error on pandas UDF usage in Databricks when sc.broadcasting a random forest loaded from Kedro MLFlow Logger DataSet: cannot pickle '_thread.RLock' object

I'm trying to broadcast a random forest (sklearn 1.2.0) recently loaded from MLflow and use a pandas UDF to predict with the model. However, the same code works perfectly on Spark 2.4 on our on-prem cluster. I thought it was due to Spark 2.4 to 3 changes, an...
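
The post doesn't say which object carries the lock. A common source of this error is an object that keeps a `threading.RLock` as an attribute (for example, `logging.Handler` instances create one), since lock objects cannot be pickled and broadcasting pickles the value. The stdlib-only sketch below reproduces the error with a hypothetical `ModelWrapper` and shows one workaround: drop the lock in `__getstate__` and recreate it in `__setstate__`. Broadcasting only the underlying sklearn estimator, or loading the model inside the UDF, may avoid the problem without touching the wrapper.

```python
import pickle
import threading


class ModelWrapper:
    """Hypothetical wrapper holding a fitted model plus a lock (e.g. for logging)."""

    def __init__(self, model):
        self.model = model
        self._lock = threading.RLock()


class PicklableModelWrapper(ModelWrapper):
    """Same wrapper, but drops the lock when pickled and recreates it afterwards."""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["_lock"]  # lock objects cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.RLock()  # fresh lock in the unpickled copy


def pickle_error(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except (TypeError, pickle.PicklingError) as exc:
        return str(exc)
```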

Latest Reply

ryojikn
New Contributor III

01-30-2023 5:03:31 AM

1 kudos

Anyone?

by yopbibo • Contributor II

08-29-2022 12:44:54 AM

1410 Views
1 replies
0 kudos

Sending R functions to worker nodes

Hi! If I need many workers to distribute regular pandas work, I would use a pandas_udf (having regular Python crunch a slice of my data on each node, then combining all results back on the driver node). Is there something equivalent for R? Thanks,

by Dan_Z • Databricks Employee

10-22-2021 9:06:35 AM

799 Views
0 replies
0 kudos

spark.apache.org

mapInPandas is one of the most powerful Spark functions. It uses Apache Arrow's in-memory columnar format to split Spark DataFrames into batches and feed them to a function that takes an iterator of pandas DataFrames as input and returns an iterator of pandas DataFrames as output. Check it out here: https://spa...
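
As an illustration of that contract, here is a minimal sketch assuming PySpark 3.x's `DataFrame.mapInPandas`: the function receives an iterator of pandas DataFrames and yields pandas DataFrames matching the declared schema. The `id`/`value` columns, the schema, and the names `add_doubled` and `run_map_in_pandas` are hypothetical.

```python
from typing import Iterator

import pandas as pd

# Output schema as a DDL string; the column names and types are assumptions.
OUTPUT_SCHEMA = "id long, value double, doubled double"


def add_doubled(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Function for mapInPandas: consume pandas batches and yield pandas batches."""
    for pdf in batches:
        # Each pdf is one chunk of rows; add a derived column and pass it on.
        yield pdf.assign(doubled=pdf["value"] * 2)


def run_map_in_pandas(sdf):
    """sdf is assumed to be a pyspark.sql.DataFrame with columns id and value."""
    return sdf.mapInPandas(add_doubled, schema=OUTPUT_SCHEMA)
```

Because the function is plain pandas, it can be tested locally by passing an iterator of small DataFrames. On the cluster, the batch size is governed by `spark.sql.execution.arrow.maxRecordsPerBatch`, and the output of the function does not need to have the same number of rows as its input.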

Databricks Community
