Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

tytytyc26
by New Contributor II
  • 2518 Views
  • 3 replies
  • 0 kudos

Resolved! Problem with accessing element using Pandas UDF in Image Processing

Hi everyone, I have been stuck on this for a very long time. I am not very familiar with using Spark for image processing. I was trying to resize images that are loaded into a Spark DataFrame. However, it keeps throwing an error saying that I am not able to access the element...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Yan Chong Tan: The error you are facing occurs because you are trying to access the attribute "width" of a string object in the resize_image function. Specifically, input_dim is a string object, but you are trying to access its width attr...

2 More Replies
sarosh
by New Contributor
  • 8062 Views
  • 2 replies
  • 1 kudos

ModuleNotFoundError / SerializationError when executing over databricks-connect

I am running into the following error when I run a model fitting process over databricks-connect. It looks like worker nodes are unable to access modules from the project's parent directory. Note that the program runs successfully up to this point; n...

Latest Reply
Manjunath
Databricks Employee
  • 1 kudos

@Sarosh Ahmad, could you try adding the zip of the module with addPyFile, like below: spark.sparkContext.addPyFile("src.zip")
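For illustration, here is a minimal sketch of how the zip mentioned in the reply could be built and registered. It assumes the project's modules live in a package directory named src (matching the "src.zip" in the reply) and that the session exposes spark.sparkContext; the helper names zip_package and ship_package are hypothetical and not part of any Databricks or Spark API.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_package(package_dir: str, zip_path: str) -> List[str]:
    """Zip a package directory, keeping the package name as the top-level entry."""
    package = Path(package_dir).resolve()
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package.rglob("*.py")):
            # Store paths relative to the package's parent (e.g. "src/utils.py")
            # so that "import src.utils" can resolve from the archive.
            arcname = path.relative_to(package.parent).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names


def ship_package(spark, package_dir: str = "src", zip_path: str = "src.zip") -> None:
    """Build the zip, then register it with the given session (hypothetical helper)."""
    zip_package(package_dir, zip_path)
    # addPyFile distributes the archive to the workers for tasks run afterwards.
    spark.sparkContext.addPyFile(zip_path)
```

Calling ship_package(spark) before the model fitting step would be one way to try this; whether it resolves the original error depends on how the project imports its modules.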

1 More Replies
user_b22ce5eeAl
by New Contributor II
  • 1689 Views
  • 2 replies
  • 0 kudos

pandas udf type grouped map fails

Hello, I am trying to get the SHAP values for my whole dataset using a pandas UDF for each category of a categorical variable. It runs well when I run it on a few categories, but when I run the function on the whole dataset, my job fails. I see ...

Latest Reply
Jackson
New Contributor II
  • 0 kudos

I want to use data.groupby.apply() to apply a function to each row of my PySpark DataFrame per group. I used the Grouped Map Pandas UDFs. However, I can't figure out how to add another argument to my function. I tried using the ar...
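One common way to pass an extra argument to a per-group function is to bind it in a closure, so that Spark only ever sees a one-argument function. The sketch below is an assumption about what could work here, not what the poster tried: the column names key and value, the offset parameter, and the helper names are all hypothetical.

```python
import pandas as pd


def center_values(pdf: pd.DataFrame, offset: float) -> pd.DataFrame:
    """Per-group logic that needs one extra argument (offset is hypothetical)."""
    out = pdf.copy()
    out["value"] = out["value"] - out["value"].mean() + offset
    return out


def make_group_fn(offset: float):
    """Return a one-argument function with offset already bound."""
    def group_fn(pdf: pd.DataFrame) -> pd.DataFrame:
        return center_values(pdf, offset)
    return group_fn


def apply_per_group(sdf, offset: float):
    """Apply the bound function per group on a Spark DataFrame (Spark 3.0+)."""
    # The schema string describes the columns the function returns.
    return sdf.groupBy("key").applyInPandas(
        make_group_fn(offset), schema="key string, value double"
    )
```

The per-group function can be checked on a plain pandas DataFrame first, which keeps debugging separate from the Spark job.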

1 More Replies
twotwoiscute
by New Contributor
  • 1590 Views
  • 0 replies
  • 0 kudos

PySpark pandas_udf slower than single thread

I used @pandas_udf to write a function to speed up the process (parsing XML files) and then compared its speed with single-threaded code. Surprisingly, using @pandas_udf is two times slower than the single-threaded code. And the number of xml files I need to p...

User16826994223
by Honored Contributor III
  • 540 Views
  • 0 replies
  • 0 kudos

Spark 3.0 Pandas UDF: Old vs New Pandas UDF interface

This slide shows the difference between the old and the new Pandas UDF interface in Spark 3.0. The same here. The new interface can also be used for the existing Grouped Aggregate Pandas UDFs. In addition, the old Pandas U...
