
Problem with accessing element using Pandas UDF in Image Processing

tytytyc26
New Contributor II

Hi everyone,

I have been stuck on this for a very long time. I'm not very familiar with using Spark for image processing. I am trying to resize images that are loaded into a Spark DataFrame, but my UDF keeps failing because it cannot access the fields of the struct I pass in. I have tried both a regular UDF and a pandas UDF, and both throw the same error.

from PIL import Image
import numpy as np
import pandas as pd
from pyspark.sql.functions import struct, pandas_udf, col
from resizeimage import resizeimage  # python-resize-image package
 
dataset = spark.read.format("image").load("/databricks-datasets/x/xx")
modified_dataset = dataset.select("image.*").select("origin", "data", struct("width", "height").alias("image_dim"))
 
def resize_image(data, input_dim):
  input_width = input_dim.width
  input_height = input_dim.height
  output_width = input_width * 0.5
  output_height = input_height * 0.5
  img = Image.frombytes("RGB", [input_width, input_height], bytes(data))
  img = resizeimage.resize_cover(img, [output_width, output_height])
  img = np.asarray(img)
  img = bytearray(img)
  return img
 
@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
  return pd.Series([resize_image(i, j) for i,j in zip(img_data, input_dim)])
 
modified_dataset = modified_dataset.withColumn("thumbnail", resize_image_udf(col("data"), col("image_dim")))
modified_dataset.collect()

Below is the error I get:

PythonException: An exception was thrown from a UDF: 'AttributeError: 'str' object has no attribute 'width'', from , line 9. Full traceback below:
Traceback (most recent call last):
  File "", line 21, in resize_image_udf
  File "", line 21, in 
  File "", line 9, in resize_image
AttributeError: 'str' object has no attribute 'width'

Does anyone have any idea where I have gone wrong? Any advice is much appreciated!

Accepted Solution

Anonymous
Not applicable

@Yan Chong Tan:

The error occurs because input_dim inside resize_image is a string, not a struct, and strings have no width attribute. In a pandas UDF, a struct column such as image_dim is passed in as a pandas DataFrame with one column per field (width and height). Iterating over a DataFrame yields its column labels, so zip(img_data, input_dim) pairs each image with the string "width" or "height" rather than with that image's dimensions.

To fix this error, zip over the width and height columns of that DataFrame instead of over the DataFrame itself, like this:

zip(img_data, input_dim["width"], input_dim["height"])

Then pass plain integers to resize_image. Here are the modified functions:

def resize_image(data, input_width, input_height):
    # Halve each dimension, keeping whole-pixel (integer) sizes
    output_width = input_width // 2
    output_height = input_height // 2
    img = Image.frombytes("RGB", (input_width, input_height), bytes(data))
    img = resizeimage.resize_cover(img, [output_width, output_height])
    img = np.asarray(img)
    img = bytearray(img)
    return img

@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
    # input_dim is a pandas DataFrame with "width" and "height" columns
    return pd.Series([
        resize_image(data, int(w), int(h))
        for data, w, h in zip(img_data, input_dim["width"], input_dim["height"])
    ])

This should resolve the issue you were facing.
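If you want to check the struct-as-DataFrame behaviour without a Spark cluster, here is a small sketch that assumes only pandas and Pillow are installed. It mimics what the pandas UDF receives: a Series of raw RGB bytes and a DataFrame standing in for the image_dim struct. resize_batch is a hypothetical helper name, and it uses Pillow's plain Image.resize in place of resizeimage.resize_cover, so it scales without cropping.

```python
import pandas as pd
from PIL import Image

# Stand-in for what a pandas UDF receives for struct("width", "height"):
# a DataFrame with one column per struct field (two tiny example images).
dims = pd.DataFrame({"width": [4, 6], "height": [2, 4]})

# Two all-black RGB images: 4x2 and 6x4 pixels, 3 bytes per pixel.
images = pd.Series([bytes(4 * 2 * 3), bytes(6 * 4 * 3)])

# Iterating a DataFrame yields its column labels, not its rows.
labels = [label for label in dims]
print(labels)  # ['width', 'height']

# So zipping the images with the DataFrame pairs each image with a str,
# which is where "'str' object has no attribute 'width'" comes from.
pair_types = [type(dim).__name__ for _, dim in zip(images, dims)]
print(pair_types)  # ['str', 'str']


def resize_batch(img_data: pd.Series, input_dim: pd.DataFrame) -> pd.Series:
    """Hypothetical stand-in for the pandas UDF body: halve every image."""
    out = []
    # Zip over the struct's columns so each step gets one image's width/height
    for data, w, h in zip(img_data, input_dim["width"], input_dim["height"]):
        w, h = int(w), int(h)  # numpy integers -> plain Python ints
        img = Image.frombytes("RGB", (w, h), bytes(data))
        # Plain Pillow resize, used here instead of resizeimage.resize_cover
        small = img.resize((w // 2, h // 2))
        out.append(small.tobytes())
    return pd.Series(out)


thumbs = resize_batch(images, dims)
print([len(t) for t in thumbs])  # [6, 18]
```

Each thumbnail has (width // 2) * (height // 2) * 3 bytes, so the 4x2 image shrinks to 6 bytes and the 6x4 image to 18 bytes.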



jeanne_choo
Databricks Employee

tytytyc26
New Contributor II

This looks really awesome. Thanks so much, Suteja!
