Hi everyone,
I have been stuck on this for a long time. I am not very familiar with using Spark for image processing. I am trying to resize images that are loaded into a Spark DataFrame, but it keeps throwing an error saying that I cannot access the struct fields inside my UDF. I have tried both a regular UDF and a pandas UDF, and both throw the same error.
from PIL import Image
import pandas as pd
from pyspark.sql.functions import struct, pandas_udf, col

dataset = spark.read.format("image").load("/databricks-datasets/x/xx")
modified_dataset = dataset.select("image.*").select("origin", "data", struct("width", "height").alias("image_dim"))

def resize_image(data, input_dim):
    input_width = input_dim.width
    input_height = input_dim.height
    output_width = input_width * 0.5
    output_height = input_height * 0.5
    img = Image.frombytes("RGB", [input_width, input_height], bytes(data))
    img = resizeimage.resize_cover(img, [output_width, output_height])
    img = np.asarray(img)
    img = bytearray(img)
    return img

@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
    return pd.Series([resize_image(i, j) for i, j in zip(img_data, input_dim)])

modified_dataset = modified_dataset.withColumn("thumbnail", resize_image_udf(col("data"), col("image_dim")))
modified_dataset.collect()
Below is the error I get.
PythonException: An exception was thrown from a UDF: 'AttributeError: 'str' object has no attribute 'width'', from , line 9. Full traceback below:
Traceback (most recent call last):
  File "", line 21, in resize_image_udf
  File "", line 21, in <listcomp>
  File "", line 9, in resize_image
AttributeError: 'str' object has no attribute 'width'
Does anyone have any idea where I have gone wrong? Any advice is much appreciated!
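One likely cause, sketched under assumptions: when a pandas UDF takes a struct column such as image_dim, Spark hands it to the function as a pandas DataFrame (one column per struct field) rather than a Series. Iterating a DataFrame directly yields its column labels, so zip(img_data, input_dim) would pair each image with the strings "width" and "height", which would explain the AttributeError in the traceback. The snippet below tries to reproduce this with plain pandas and no Spark; the sample bytes and dimensions are made up purely for illustration.

```python
import pandas as pd

# Stand-ins for one batch passed to the pandas UDF (illustrative values only):
# the binary column arrives as a Series, the struct column as a DataFrame.
img_data = pd.Series([b"\x00" * 12, b"\x00" * 6])
input_dim = pd.DataFrame({"width": [2, 2], "height": [2, 1]})

# Iterating a DataFrame yields its column labels, not its rows,
# so each image ends up paired with a plain string.
paired_wrong = list(zip(img_data, input_dim))
second_items = [dim for _, dim in paired_wrong]

# itertuples() yields one namedtuple per row, so attribute access works.
paired_rows = list(zip(img_data, input_dim.itertuples(index=False)))
sizes = [(int(dim.width), int(dim.height)) for _, dim in paired_rows]

print(second_items)
print(sizes)
```

If this is indeed what is happening, replacing zip(img_data, input_dim) with zip(img_data, input_dim.itertuples(index=False)) inside resize_image_udf should let resize_image read input_dim.width and input_dim.height as intended. Passing width and height as two separate columns instead of a struct would be another option.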