- 13055 Views
- 5 replies
- 6 kudos
Latest Reply
Thank you Suteja. I had monitored the resources and never reached capacity on any of them. The data was also evenly distributed across partitions and groups. I did end up taking your advice in (1): I set a timer and killed the process if the group took...
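The workaround mentioned in this reply (a timer that kills the work for a group that runs too long) might look roughly like the following in plain Python. It is a sketch under stated assumptions, not the poster's actual code: it assumes each group's work can run in its own child process, and `run_with_timeout` is a hypothetical helper. It relies on `subprocess.run(..., timeout=...)`, which kills the child and raises `TimeoutExpired` once the timer expires.

```python
import subprocess
import sys
from typing import Optional


def run_with_timeout(code: str, timeout_s: float) -> Optional[str]:
    """Run a Python snippet in a child process and kill it if it runs too long.

    Returns the child's stdout, or None if the timer expired and the child
    was killed. Hypothetical helper for illustration only.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry, subprocess.run kills the child
            check=True,         # raise CalledProcessError if the child fails
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Usage: a fast "group" finishes; a slow one is killed after 1 second.
print(run_with_timeout("print(sum(range(10)))", timeout_s=30))
print(run_with_timeout("import time; time.sleep(60)", timeout_s=1))
```

Returning `None` for a timed-out group lets the caller skip or log that group instead of blocking the whole job.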
- 2016 Views
- 1 replies
- 1 kudos
I'm trying to broadcast a random forest model (sklearn 1.2.0) that I recently loaded from MLflow and use it in a Pandas UDF to make predictions. However, the same code works perfectly on Spark 2.4 on our on-prem cluster. I thought it was due to the changes from Spark 2.4 to 3, an...
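The setup described in this question (broadcast a fitted model, then call it from a Pandas UDF) can be sketched as below. It only illustrates the pattern and does not diagnose the truncated error. Assumptions: `model` is any object with a scikit-learn-style `predict` method (for example one loaded with `mlflow.sklearn.load_model`), the two feature columns `f1` and `f2` are hypothetical, predictions fit in a double, and the helper names are illustrative. It uses the Spark 3 type-hinted `pandas_udf` style.

```python
from typing import Any

import pandas as pd


def predict_batch(model: Any, *features: pd.Series) -> pd.Series:
    """Score one batch with any object exposing a scikit-learn-style predict()."""
    X = pd.concat(features, axis=1)  # one column per feature Series
    # Pass a plain array so the call does not depend on column names.
    return pd.Series(model.predict(X.to_numpy()), index=X.index)


def make_predict_udf(spark, model):
    """Broadcast `model` once and wrap predict_batch in a Series-to-Series pandas UDF.

    Assumes exactly two numeric feature columns (hypothetical f1, f2).
    """
    # Imported here so predict_batch can be used without a Spark installation.
    from pyspark.sql.functions import pandas_udf

    bc_model = spark.sparkContext.broadcast(model)  # sent to each executor once

    @pandas_udf("double")
    def predict_udf(f1: pd.Series, f2: pd.Series) -> pd.Series:
        # bc_model.value returns the broadcast model on the executor.
        return predict_batch(bc_model.value, f1, f2)

    return predict_udf
```

On a cluster, something like `df.withColumn("prediction", make_predict_udf(spark, model)("f1", "f2"))` would add the predictions. Keeping `predict_batch` free of Spark imports makes it easy to check against plain pandas data first.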
- 1410 Views
- 1 replies
- 0 kudos
Hi! If I need to use many workers to distribute regular pandas code, I would use a pandas_udf (regular Python crunching a slice of my data on each node, with all results combined back on the driver node). Is there something equivalent for R? Thanks,
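The Python pattern the question describes (plain pandas code crunching one slice of the data on each worker, with the results combined on the driver) can be sketched with PySpark's grouped `applyInPandas`. This is a hedged illustration only: the `key`/`value` columns, the toy data, and both function names are hypothetical, and it assumes Spark 3.0+ with pandas and PyArrow installed.

```python
import pandas as pd


def summarize_slice(pdf: pd.DataFrame) -> pd.DataFrame:
    """Plain pandas code that Spark runs once per group, on an executor."""
    return pd.DataFrame(
        {"key": [pdf["key"].iloc[0]], "mean_value": [pdf["value"].mean()]}
    )


def run_on_spark(spark) -> pd.DataFrame:
    """Fan the groups out to the workers, then bring the results back to the driver."""
    df = spark.createDataFrame(
        [("a", 1.0), ("a", 3.0), ("b", 5.0)], "key string, value double"
    )
    result = df.groupBy("key").applyInPandas(
        summarize_slice, schema="key string, mean_value double"
    )
    return result.toPandas()  # collect the combined results on the driver
```

Because `summarize_slice` is ordinary pandas code, it can be tested locally on a single DataFrame before running it through Spark.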
by Dan_Z • Databricks Employee
- 799 Views
- 0 replies
- 0 kudos
mapInPandas is one of the most powerful Spark functions. It uses Apache Arrow, an in-memory columnar format, to split Spark DataFrames into batches and feed them to a function that takes an iterator of pandas DataFrames as input and returns an iterator of pandas DataFrames as output. Check it out here: https://spa...
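A minimal `mapInPandas` sketch, assuming Spark 3.0+ with pandas and PyArrow available. The column names `a`, `b`, `total` and the function names are hypothetical. The function receives an iterator of pandas DataFrames (one per Arrow batch) and yields pandas DataFrames that must match the declared output schema.

```python
from typing import Iterator

import pandas as pd


def add_total(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Receive batches as pandas DataFrames and yield transformed ones."""
    for pdf in batches:
        out = pdf.copy()
        out["total"] = out["a"] + out["b"]
        yield out  # output batches may have a different number of rows


def run_on_spark(spark):
    df = spark.createDataFrame([(1, 2), (3, 4)], "a long, b long")
    # The output schema has to be declared up front.
    return df.mapInPandas(add_total, schema="a long, b long, total long")
```

Because the function sees a whole iterator of batches, any expensive setup placed before the `for` loop runs once per partition rather than once per batch.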