04-02-2023 08:59 PM
Hi everyone,
I have been stuck on this for a long time. I am not very familiar with using Spark for image processing. I am trying to resize images that are loaded into a Spark DataFrame, but my UDF keeps throwing an error saying it cannot access an element of the struct column. I have tried both a regular UDF and a pandas UDF, and both throw the same error.
from PIL import Image
import pandas as pd
from pyspark.sql.functions import struct, pandas_udf, col

dataset = spark.read.format("image").load("/databricks-datasets/x/xx")
modified_dataset = dataset.select("image.*").select("origin", "data", struct("width", "height").alias("image_dim"))

def resize_image(data, input_dim):
    input_width = input_dim.width
    input_height = input_dim.height
    output_width = input_width * 0.5
    output_height = input_height * 0.5
    img = Image.frombytes("RGB", [input_width, input_height], bytes(data))
    img = resizeimage.resize_cover(img, [output_width, output_height])
    img = np.asarray(img)
    img = bytearray(img)
    return img

@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
    return pd.Series([resize_image(i, j) for i,j in zip(img_data, input_dim)])

modified_dataset = modified_dataset.withColumn("thumbnail", resize_image_udf(col("data"), col("image_dim")))
modified_dataset.collect()
Below is the error I get.
PythonException: An exception was thrown from a UDF: 'AttributeError: 'str' object has no attribute 'width'', from , line 9. Full traceback below:
Traceback (most recent call last):
File "", line 21, in resize_image_udf
File "", line 21, in
File "", line 9, in resize_image
AttributeError: 'str' object has no attribute 'width'
Does anyone have any idea where I have gone wrong? Any advice is much appreciated!
- Labels: Pandas udf
Accepted Solutions

04-17-2023 06:48 AM
@Yan Chong Tan :
The error comes from how a pandas UDF receives a struct column. The image_dim struct is passed to resize_image_udf as a pandas DataFrame with one column per field ("width" and "height"), not as a Series of rows. Iterating over a DataFrame, which is what zip(img_data, input_dim) does, yields its column labels, so each j is the string "width" or "height". Accessing j.width inside resize_image then fails, because strings have no width attribute.
To fix this, read the width and height columns from that DataFrame and zip them with the image data, so that resize_image receives plain integers. The output size should also be an integer for Pillow, so halve it with integer division instead of multiplying by 0.5.
Here are the modified functions, assuming resizeimage comes from the python-resize-image package:
from resizeimage import resizeimage

def resize_image(data, input_width, input_height):
    # Pillow expects integer pixel sizes
    output_width = input_width // 2
    output_height = input_height // 2
    img = Image.frombytes("RGB", (input_width, input_height), bytes(data))
    img = resizeimage.resize_cover(img, [output_width, output_height])
    return img.tobytes()

@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
    # input_dim is a pandas DataFrame with "width" and "height" columns
    return pd.Series([resize_image(d, int(w), int(h)) for d, w, h in zip(img_data, input_dim["width"], input_dim["height"])])
This should resolve the AttributeError you were facing.
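For anyone hitting the same AttributeError: as far as I understand, a pandas UDF receives a struct column as a pandas DataFrame, and iterating a DataFrame yields its column labels rather than its rows. Below is a minimal sketch in plain pandas (no Spark needed). The sample widths, heights, and byte strings are made up for illustration and only stand in for the image_dim and data columns.

```python
import pandas as pd

# Stand-in for the struct column "image_dim" as a pandas UDF would see it:
# one DataFrame with a column per struct field (values are made up).
input_dim = pd.DataFrame({"width": [640, 800], "height": [480, 600]})

# Stand-in for the binary "data" column (placeholder bytes, not real images).
img_data = pd.Series([b"first", b"second"])

# Iterating a DataFrame yields its column labels, so zip pairs each image
# with a string. This is where the 'str' object in the traceback comes from.
labels = [j for _, j in zip(img_data, input_dim)]

# Iterating the field columns instead yields one (width, height) pair per row.
dims = [(int(w), int(h)) for w, h in zip(input_dim["width"], input_dim["height"])]

print(labels)
print(dims)
```

Inside the UDF, zipping img_data with input_dim["width"] and input_dim["height"] in the same way gives per-row integers that can be passed on to resize_image.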
04-04-2023 04:24 PM
Cross-posting the answer from Stack Overflow here https://stackoverflow.com/questions/75915968/problem-with-accessing-element-using-pandas-udf-in-imag...

04-17-2023 06:40 PM
This looks really awesome. Thanks so much Suteja!

