04-02-2023 08:59 PM
Hi everyone,
I have been stuck on this for a very long time. I am not very familiar with using Spark for image processing. I am trying to resize images that are loaded into a Spark DataFrame, but it keeps throwing an error saying that I cannot access the element in my UDF. I have tried both a regular UDF and a pandas UDF, and both throw the same error.
from PIL import Image
import pandas as pd
from pyspark.sql.functions import struct, pandas_udf, col

dataset = spark.read.format("image").load("/databricks-datasets/x/xx")
modified_dataset = dataset.select("image.*").select("origin", "data", struct("width", "height").alias("image_dim"))

def resize_image(data, input_dim):
    input_width = input_dim.width
    input_height = input_dim.height
    output_width = input_width * 0.5
    output_height = input_height * 0.5
    img = Image.frombytes("RGB", [input_width, input_height], bytes(data))
    img = resizeimage.resize_cover(img, [output_width, output_height])
    img = np.asarray(img)
    img = bytearray(img)
    return img

@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
    return pd.Series([resize_image(i, j) for i,j in zip(img_data, input_dim)])

modified_dataset = modified_dataset.withColumn("thumbnail", resize_image_udf(col("data"), col("image_dim")))
modified_dataset.collect()
Below is the error I get:
PythonException: An exception was thrown from a UDF: 'AttributeError: 'str' object has no attribute 'width'', from , line 9. Full traceback below:
Traceback (most recent call last):
File "", line 21, in resize_image_udf
File "", line 21, in
File "", line 9, in resize_image
AttributeError: 'str' object has no attribute 'width'
Does anyone have any idea where I have gone wrong? Any advice is much appreciated!
04-17-2023 06:48 AM
@Yan Chong Tan :
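The error you are facing occurs because resize_image receives a string where it expects the image_dim struct. In a pandas UDF, a struct column is passed in as a pandas DataFrame with one column per field, not as a Series of rows. Iterating over a DataFrame yields its column labels, so zip(img_data, input_dim) pairs the images with the strings "width" and "height", and input_dim.width then fails because strings have no width attribute.
To fix this error, iterate over the rows of input_dim instead of its column labels, for example with the
itertuples() method, like this:
    rows = input_dim.itertuples(index=False)
    return pd.Series([resize_image(i, j) for i, j in zip(img_data, rows)])
Each row is a named tuple, so input_dim.width and input_dim.height work inside resize_image. Then, you can proceed with the rest of your code as before. Here's the modified code. I also switched to integer division so the target size is a whole number of pixels, and I am assuming numpy is imported as np and that resizeimage comes from the python-resize-image package:
def resize_image(data, input_dim):
    # input_dim is one row of the image_dim struct (a named tuple)
    input_width = input_dim.width
    input_height = input_dim.height
    # Halve each dimension, keeping whole-pixel sizes
    output_width = input_width // 2
    output_height = input_height // 2
    img = Image.frombytes("RGB", [input_width, input_height], bytes(data))
    img = resizeimage.resize_cover(img, [output_width, output_height])
    img = np.asarray(img)
    img = bytearray(img)
    return img

@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
    # input_dim arrives as a pandas DataFrame; itertuples() yields one row per image
    rows = input_dim.itertuples(index=False)
    return pd.Series([resize_image(i, j) for i, j in zip(img_data, rows)])
This should resolve the issue you were facing.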
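If it helps to see where the 'str' object comes from, here is a small sketch that runs without Spark. It assumes, as described in the PySpark pandas UDF documentation, that a struct column reaches the UDF as a pandas DataFrame; the variable names mirror the UDF arguments and the sample values are made up for illustration.

```python
import pandas as pd

# Stand-ins for the two UDF inputs (illustrative values only):
# a binary column arrives as a Series, a struct column as a DataFrame
# with one column per struct field.
img_data = pd.Series([b"\x00" * 12, b"\x00" * 24])
input_dim = pd.DataFrame({"width": [2, 4], "height": [2, 2]})

# Iterating a DataFrame yields its column labels, not its rows,
# so each "dimension" here is just a string.
pairs_by_label = list(zip(img_data, input_dim))
print([dim for _, dim in pairs_by_label])

# itertuples() yields one named tuple per row, so .width and .height work.
pairs_by_row = list(zip(img_data, input_dim.itertuples(index=False)))
print([(int(dim.width), int(dim.height)) for _, dim in pairs_by_row])
```

The first print shows the column labels being paired with the images, which is what triggers the AttributeError; the second shows one (width, height) pair per image.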
04-04-2023 04:24 PM
Cross-posting the answer from Stack Overflow here https://stackoverflow.com/questions/75915968/problem-with-accessing-element-using-pandas-udf-in-imag...
04-17-2023 06:40 PM
This looks really awesome. Thanks so much Suteja!