Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.

Forum Posts

by rgbuckley (New Contributor III)
  • 12352 Views
  • 5 replies
  • 6 kudos

Resolved! Fix Hanging Task in Databricks

I am applying a pandas UDF to a grouped DataFrame in Databricks. When I do this, a couple of tasks hang forever, while the rest complete quickly. I start by repartitioning my dataset so that each group is in one partition: group_factors = ['a','b','c'] #m...

Attachments: Spark UI for compute cluster; stderr for hanging task; stdout for hanging task
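The grouped pattern described in the post can be sketched with `groupBy(...).applyInPandas(...)`. This is a minimal sketch, not the poster's code: the grouping columns `a`, `b`, `c` come from the post, while the numeric `value` column, the output schema, and the helper names (`summarize_group`, `run_grouped`, `group_sizes`) are hypothetical. `groupBy` already shuffles rows so that each group is handled by a single task (and loaded into memory as one pandas DataFrame), so the explicit `repartition` is optional. Counting rows per group is a cheap way to rule out skew when only a few tasks stall.

```python
import pandas as pd

# Grouping columns from the post; "value" is a hypothetical numeric column.
GROUP_FACTORS = ["a", "b", "c"]

# Hypothetical output schema; it must match the DataFrame summarize_group returns.
OUTPUT_SCHEMA = "a string, b string, c string, n_rows long, value_mean double"


def summarize_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Runs once per group on a worker; pdf holds all rows of that group."""
    keys = pdf[GROUP_FACTORS].iloc[0]
    return pd.DataFrame(
        {
            "a": [keys["a"]],
            "b": [keys["b"]],
            "c": [keys["c"]],
            "n_rows": [len(pdf)],
            "value_mean": [float(pdf["value"].mean())],
        }
    )


def run_grouped(sdf):
    """Spark wiring for a Spark DataFrame sdf (hypothetical helper)."""
    # groupBy shuffles each group to one task; the repartition mirrors the post.
    return (
        sdf.repartition(*GROUP_FACTORS)
        .groupBy(*GROUP_FACTORS)
        .applyInPandas(summarize_group, schema=OUTPUT_SCHEMA)
    )


def group_sizes(sdf):
    """Rows per group, largest first, to check for skew (hypothetical helper)."""
    return sdf.groupBy(*GROUP_FACTORS).count().orderBy("count", ascending=False)
```

Because `summarize_group` is plain pandas, it can be tested locally on a small DataFrame first, which helps separate bugs in the per-group logic from cluster-side problems.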
Latest Reply
rgbuckley
New Contributor III
  • 6 kudos

Thank you, Suteja. I had monitored the resources and never reached capacity on any of them. The data was evenly distributed across partitions and groups as well. I did end up taking your advice in (1): I set a timer and killed the process if the group took...
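The reply is truncated, so the exact timer it used is not shown. One possible way to cap the time spent on a single group is a `SIGALRM`-based time limit around the per-group work. This sketch assumes a Unix worker and that the function runs on the main thread of its Python process, since Python only lets the main thread install signal handlers; `GroupTimeout`, `time_limit`, and `process_with_budget` are hypothetical names.

```python
import signal
from contextlib import contextmanager


class GroupTimeout(Exception):
    """Raised when processing a single group exceeds its time budget."""


@contextmanager
def time_limit(seconds: float):
    # Unix-only sketch: signal.signal raises ValueError outside the main thread.
    def _handler(signum, frame):
        raise GroupTimeout(f"group exceeded {seconds} s")

    previous = signal.signal(signal.SIGALRM, _handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # one-shot timer
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def process_with_budget(func, data, seconds, fallback):
    """Return func(data), or fallback if it runs longer than `seconds`."""
    try:
        with time_limit(seconds):
            return func(data)
    except GroupTimeout:
        return fallback
```

The handler only runs between Python bytecodes, so a call stuck inside native code may not be interrupted until it returns; in that case killing a separate process is the more reliable option.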

by ryojikn (New Contributor III)
  • 1895 Views
  • 1 reply
  • 1 kudos

Error using a pandas UDF in Databricks when broadcasting (sc.broadcast) a random forest loaded from a Kedro MLflow logger DataSet: cannot pickle '_thread.RLock' object

I'm trying to broadcast a random forest (sklearn 1.2.0) recently loaded from MLflow and use a pandas UDF to predict with the model. However, the same code works perfectly on Spark 2.4 on our on-prem cluster. I thought it was due to the Spark 2.4 to 3 changes, an...
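The message `cannot pickle '_thread.RLock' object` is raised when Python tries to pickle an object that holds a lock, and both `sc.broadcast` and UDF closures rely on pickling. A plausible but unconfirmed cause here is that the broadcast object is a wrapper (for example a dataset or logger object) rather than the bare sklearn estimator. The sketch below reproduces the error with the standard library using a hypothetical `ModelHolder` class and finds which attribute blocks pickling; `find_unpicklable`, `picklable_payload`, and the `model` attribute name are also assumptions.

```python
import pickle
import threading


class ModelHolder:
    """Hypothetical stand-in for a wrapper that keeps a lock next to the model."""

    def __init__(self, model):
        self.model = model
        self._lock = threading.RLock()  # locks cannot be pickled


def find_unpicklable(obj):
    """Return the attribute names of obj whose values fail to pickle."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except (TypeError, pickle.PicklingError, AttributeError):
            bad.append(name)
    return bad


def picklable_payload(obj, attr="model"):
    """Unwrap obj to its inner model (hypothetical attr) and check it pickles."""
    inner = getattr(obj, attr, obj)
    pickle.dumps(inner)  # fail fast on the driver instead of inside Spark
    return inner
```

On the driver, broadcasting only the unwrapped estimator (for example `bc = spark.sparkContext.broadcast(picklable_payload(loaded_obj))`, then `bc.value.predict(...)` inside the pandas UDF, where `loaded_obj` is hypothetical) keeps the lock out of what Spark serializes. Loading the model inside the UDF is another option.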

Latest Reply
ryojikn
New Contributor III
  • 1 kudos

Anyone?

by yopbibo (Contributor II)
  • 1354 Views
  • 1 reply
  • 0 kudos

Sending R functions to worker nodes

Hi! If I need many workers to distribute regular pandas work, I would use a pandas UDF (regular Python crunches a slice of my data on each node, and all results are combined back on the driver node). Is there something equivalent for R? Thanks,

by Dan_Z (Databricks Employee)
  • 743 Views
  • 0 replies
  • 0 kudos

spark.apache.org

mapInPandas is one of the most powerful Spark functions. It uses Apache Arrow, an in-memory columnar format, to split a Spark DataFrame into batches and feed them to a function that takes an iterator of pandas DataFrames as input and yields pandas DataFrames as output. Check it out here: https://spa...
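A minimal `mapInPandas` sketch, assuming a Spark DataFrame with hypothetical columns `qty` (long) and `price` (double); the schema string and function names are also illustrative. The function receives an iterator of pandas DataFrames for a partition and yields pandas DataFrames that match the declared schema.

```python
from typing import Iterator

import pandas as pd

# Hypothetical output schema; each yielded DataFrame must match it.
SCHEMA = "qty long, price double, total double"


def add_total(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Called per partition; each item is one Arrow batch converted to pandas."""
    for pdf in batches:
        out = pdf[pdf["qty"] > 0].copy()  # drop non-positive quantities
        out["total"] = out["qty"] * out["price"]
        yield out


def run(sdf):
    """Apply add_total to a Spark DataFrame sdf (hypothetical helper)."""
    return sdf.mapInPandas(add_total, schema=SCHEMA)
```

Because `add_total` is a plain generator over pandas DataFrames, it can be exercised locally by passing it an iterator over a list of DataFrames. Yielding batch by batch also means the whole partition never needs to be held in memory at once.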
