Problem with accessing element using Pandas UDF in Image Processing

tytytyc26
New Contributor II

Hi everyone,

I have been stuck on this for a very long time, and I'm not very familiar with using Spark for image processing. I am trying to resize images that are loaded into a Spark DataFrame, but the code keeps throwing an error saying that I cannot access the element in my UDF. I have tried both a regular UDF and a pandas UDF, and both throw the same error.

from PIL import Image
from resizeimage import resizeimage
import numpy as np
import pandas as pd
from pyspark.sql.functions import struct, pandas_udf, col
 
dataset = spark.read.format("image").load("/databricks-datasets/x/xx")
modified_dataset = dataset.select("image.*").select("origin", "data", struct("width", "height").alias("image_dim"))
 
def resize_image(data, input_dim):
  input_width = input_dim.width
  input_height = input_dim.height
  output_width = input_width * 0.5
  output_height = input_height * 0.5
  img = Image.frombytes("RGB", [input_width, input_height], bytes(data))
  img = resizeimage.resize_cover(img, [output_width, output_height])
  img = np.asarray(img)
  img = bytearray(img)
  return img
 
@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
  return pd.Series([resize_image(i, j) for i,j in zip(img_data, input_dim)])
 
modified_dataset = modified_dataset.withColumn("thumbnail", resize_image_udf(col("data"), col("image_dim")))
modified_dataset.collect()
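For reference, Spark's image data source stores each image as a struct with origin, height, width, nChannels, mode and data fields, where data holds the raw pixels row by row in OpenCV order (BGR for three channels, BGRA for four). Image.frombytes("RGB", ...) therefore only matches three-channel images, and PIL will treat the blue channel as red; that is harmless for resizing alone but matters if you display or save the result with PIL. Below is a minimal NumPy-only sketch of decoding one row, assuming that documented layout and 8-bit channels. The helper name spark_image_to_rgb is hypothetical, and using it would require keeping nChannels in the select() above.

```python
import numpy as np


def spark_image_to_rgb(data, height, width, n_channels):
    """Decode one row of Spark's image data source into a NumPy array.

    Hypothetical helper. Assumes 8-bit channels stored row-major in OpenCV
    order: 1 channel = grayscale, 3 = BGR, 4 = BGRA.
    Returns a (height, width) array for grayscale, else (height, width, 3) RGB.
    """
    pixels = np.frombuffer(bytes(data), dtype=np.uint8)
    if n_channels == 1:
        # Grayscale: one byte per pixel, nothing to reorder
        return pixels.reshape(height, width)
    pixels = pixels.reshape(height, width, n_channels)
    # Take channels 2, 1, 0 (R, G, B); for BGRA this also drops alpha
    return np.ascontiguousarray(pixels[:, :, 2::-1])
```

The result can be passed to Image.fromarray(...) instead of Image.frombytes, which avoids having to pick the PIL mode by hand.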

Below is the error I get:

PythonException: An exception was thrown from a UDF: 'AttributeError: 'str' object has no attribute 'width'', from , line 9. Full traceback below:
Traceback (most recent call last):
  File "", line 21, in resize_image_udf
  File "", line 21, in 
  File "", line 9, in resize_image
AttributeError: 'str' object has no attribute 'width'

Does anyone have any idea where I have gone wrong? Any advice is appreciated!

Accepted Solutions

Anonymous

@Yan Chong Tan:

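The error happens because input_dim is not a struct inside your UDF. In a pandas UDF, a struct column such as image_dim is passed as a pandas DataFrame with one column per field (width and height), not as a Series of row objects. Iterating over a DataFrame yields its column labels rather than its rows, so zip(img_data, input_dim) hands resize_image the strings "width" and "height", and "width".width raises AttributeError: 'str' object has no attribute 'width'.

To fix this, iterate over the rows of the DataFrame instead. DataFrame.itertuples() yields one namedtuple per row, so input_dim.width and input_dim.height work as written once the loop in the UDF becomes:

zip(img_data, input_dim.itertuples(index=False))

The rest of your code (loading the images and calling withColumn) stays the same. Here are the modified resize_image function and UDF:

import numpy as np
from resizeimage import resizeimage

def resize_image(data, input_dim):
    # input_dim is a namedtuple row from itertuples(), so attribute access works
    input_width = input_dim.width
    input_height = input_dim.height
    output_width = int(input_width * 0.5)  # pixel sizes must be integers
    output_height = int(input_height * 0.5)
    # Assumes 3-channel images (Spark stores the bytes in BGR order)
    img = Image.frombytes("RGB", (input_width, input_height), bytes(data))
    img = resizeimage.resize_cover(img, [output_width, output_height])
    return np.asarray(img).tobytes()

@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
    # The struct column arrives as a pandas DataFrame; walk it row by row
    rows = input_dim.itertuples(index=False)
    return pd.Series([resize_image(i, j) for i, j in zip(img_data, rows)])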

This should resolve the issue you were facing.
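The diagnosis can be checked without Spark. The sketch below builds a small pandas Series and DataFrame that stand in for one batch of the data and image_dim columns (the values are made up for illustration, assuming the struct arrives as a DataFrame as described above), then compares the original zip() loop with the itertuples() version:

```python
import pandas as pd

# Hypothetical stand-ins for one batch received by the pandas UDF:
# the binary column arrives as a Series, the struct column as a DataFrame.
img_data = pd.Series([b"img-a", b"img-b", b"img-c"])
input_dim = pd.DataFrame({"width": [640, 800, 1024], "height": [480, 600, 768]})

# Original loop: iterating a DataFrame yields its column labels, and zip()
# stops after the two labels, so j is a str and j.width would fail.
labels = [j for _, j in zip(img_data, input_dim)]
print(labels)  # ['width', 'height']

# Fixed loop: itertuples() yields one namedtuple per row with .width/.height.
dims = [(j.width, j.height) for _, j in zip(img_data, input_dim.itertuples(index=False))]
print(dims)  # [(640, 480), (800, 600), (1024, 768)]
```

Selecting the columns directly, as in zip(img_data, input_dim["width"], input_dim["height"]), would work equally well; itertuples() was chosen here only because it keeps the original resize_image signature unchanged.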


jeanne_choo
New Contributor II

tytytyc26
New Contributor II

This looks really awesome. Thanks so much Suteja!
