Data Engineering

Forum Posts

by tytytyc26 • New Contributor II

04-02-2023 8:59:59 PM

2908 Views
3 replies
0 kudos

Resolved! Problem with accessing element using Pandas UDF in Image Processing

Hi everyone, I have been stuck on this for a very long time. I am not very familiar with using Spark for image processing. I was trying to resize images that are loaded into a Spark DF. However, it keeps throwing an error saying that I am not able to access the element...

Latest Reply

Anonymous
Not applicable

04-17-2023 6:48:04 AM

0 kudos

@Yan Chong Tan: The error you are facing occurs because you are trying to access the attribute "width" of a string object in the resize_image function. Specifically, input_dim is a string object, but you are trying to access its width attr...
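
To illustrate the kind of error described in this reply, here is a minimal sketch in plain Python. The original resize_image code is not shown in the thread, so the `Dim` container, the `target_size` helper, and the "WIDTHxHEIGHT" string format are all hypothetical and only stand in for whatever input_dim actually held.

```python
from collections import namedtuple

# Hypothetical container for target dimensions; not taken from the original post.
Dim = namedtuple("Dim", ["width", "height"])


def target_size(input_dim):
    """Return (width, height) from a Dim or from an assumed 'WIDTHxHEIGHT' string."""
    if isinstance(input_dim, str):
        # A plain string has no .width attribute, so it has to be parsed explicitly.
        width, height = (int(part) for part in input_dim.lower().split("x"))
        return width, height
    return input_dim.width, input_dim.height


# Reproduce the class of error from the reply: attribute access on a str.
try:
    "224x224".width
except AttributeError as exc:
    error_message = str(exc)
```

Checking the type of the value before touching its attributes is one way to make such a function tolerate both inputs; the actual fix in the thread may differ.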

by sarosh • New Contributor

09-27-2021 1:36:38 PM

8947 Views
2 replies
1 kudos

ModuleNotFoundError / SerializationError when executing over databricks-connect

I am running into the following error when I run a model fitting process over databricks-connect. It looks like worker nodes are unable to access modules from the project's parent directory. Note that the program runs successfully up to this point; n...

Latest Reply

Manjunath
Databricks Employee

11-02-2021 12:01:43 AM

1 kudos

@Sarosh Ahmad, could you try adding the zip of the module with addPyFile, like below:
spark.sparkContext.addPyFile("src.zip")
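
The reply does not say how src.zip is built, so here is one possible sketch using only the standard library. The `zip_package` helper is hypothetical, and it assumes the module lives in a folder named src whose name should stay the top-level entry inside the archive so that imports such as `import src.something` can resolve on the workers.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip the .py files under package_dir, keeping the folder name as the archive root."""
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to the parent so "src/..." is preserved.
                    archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# With a live session (assumed here to be named `spark`), the archive would be shipped with:
# spark.sparkContext.addPyFile(zip_package("src", "src.zip"))
```

The addPyFile call is left as a comment because it needs a running Spark session; the zipping step itself can be run anywhere.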

by user_b22ce5eeAl • New Contributor II

08-13-2021 7:07:18 AM

1997 Views
2 replies
0 kudos

pandas udf type grouped map fails

Hello, I am trying to get the shap values for my whole dataset using pandas udf for each category of a categorical variable. It runs well when I run it on a few categories but when I want to run the function on the whole dataset my job fails. I see ...

Latest Reply

Jackson
New Contributor II

08-16-2021 9:01:03 PM

0 kudos

I want to use data.groupby.apply() to apply a function to each row of my PySpark DataFrame per group. I used the Grouped Map Pandas UDFs. However, I can't figure out how to add another argument to my function. I tried using the ar...
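
One common way to give a per-group function an extra argument is to bind that argument ahead of time, so the function handed to the grouping API still takes only the group data. The sketch below shows the idea with functools.partial on plain Python lists; `scale_group`, `factor`, and the sample groups are hypothetical and are not from the thread.

```python
from functools import partial


def scale_group(rows, factor):
    """Multiply each row's 'value' by factor; stands in for a per-group function."""
    return [{**row, "value": row["value"] * factor} for row in rows]


# Bind the extra argument up front; the result takes only the group data.
scale_by_ten = partial(scale_group, factor=10)

# Hypothetical pre-grouped data, keyed by group name.
groups = {"a": [{"value": 1}, {"value": 2}], "b": [{"value": 3}]}
result = {key: scale_by_ten(rows) for key, rows in groups.items()}
```

In PySpark 3.0 and later, a function bound the same way but operating on a pandas DataFrame could be passed to df.groupBy(...).applyInPandas(func, schema); whether that fits the poster's case depends on details the excerpt cuts off.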

by twotwoiscute • New Contributor

07-14-2021 8:00:55 PM

1741 Views
0 replies
0 kudos

PySpark pandas_udf slower than single thread

I used @pandas_udf to write a function to speed up the process (parsing XML files) and then compared its speed with single-threaded code. Surprisingly, using @pandas_udf is two times slower than the single-threaded code. And the number of xml files I need to p...

by User16826994223 • Honored Contributor III

06-17-2021 12:26:20 AM

637 Views
0 replies
0 kudos

Spark 3.0 Pandas UDF: Old vs New Pandas UDF interface

Spark 3.0 Pandas UDF: Old vs New Pandas UDF interface. This slide shows the difference between the old and the new interface. The same here. The new interface can also be used for the existing Grouped Aggregate Pandas UDFs. In addition, the old Pandas U...
