<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Problem with accessing element using Pandas UDF in Image Processing in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6585#M2651</link>
    <description>Problem with accessing element using Pandas UDF in Image Processing</description>
    <pubDate>Mon, 03 Apr 2023 03:59:59 GMT</pubDate>
    <dc:creator>tytytyc26</dc:creator>
    <dc:date>2023-04-03T03:59:59Z</dc:date>
    <item>
      <title>Problem with accessing element using Pandas UDF in Image Processing</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6585#M2651</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;I have been stuck on this for a very long time, and I am not very familiar with using Spark for image processing. I am trying to resize images that are loaded into a Spark DataFrame. However, it keeps throwing an error saying that I cannot access the element in my UDF. I have tried both a regular UDF and a pandas UDF, and both throw the same error.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from PIL import Image
import pandas as pd
from pyspark.sql.functions import struct, pandas_udf, col

dataset = spark.read.format("image").load("/databricks-datasets/x/xx")
modified_dataset = dataset.select("image.*").select("origin", "data", struct("width", "height").alias("image_dim"))

def resize_image(data, input_dim):
  input_width = input_dim.width
  input_height = input_dim.height
  output_width = input_width * 0.5
  output_height = input_height * 0.5
  img = Image.frombytes("RGB", [input_width, input_height], bytes(data))
  img = resizeimage.resize_cover(img, [output_width, output_height])
  img = np.asarray(img)
  img = bytearray(img)
  return img

@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
  return pd.Series([resize_image(i, j) for i,j in zip(img_data, input_dim)])

modified_dataset = modified_dataset.withColumn("thumbnail", resize_image_udf(col("data"), col("image_dim")))
modified_dataset.collect()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Below is the error I get:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;PythonException: An exception was thrown from a UDF: 'AttributeError: 'str' object has no attribute 'width'', from , line 9. Full traceback below:
Traceback (most recent call last):
  File "", line 21, in resize_image_udf
  File "", line 21, in 
  File "", line 9, in resize_image
AttributeError: 'str' object has no attribute 'width'&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Does anyone have any idea where I have gone wrong? Any advice is much appreciated!&lt;/P&gt;</description>
      <pubDate>Mon, 03 Apr 2023 03:59:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6585#M2651</guid>
      <dc:creator>tytytyc26</dc:creator>
      <dc:date>2023-04-03T03:59:59Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with accessing element using Pandas UDF in Image Processing</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6586#M2652</link>
      <description>&lt;P&gt;Cross-posting the answer from Stack Overflow here &lt;A href="https://stackoverflow.com/questions/75915968/problem-with-accessing-element-using-pandas-udf-in-image-processing/75925105#75925105" target="test_blank"&gt;https://stackoverflow.com/questions/75915968/problem-with-accessing-element-using-pandas-udf-in-image-processing/75925105#75925105&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Apr 2023 23:24:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6586#M2652</guid>
      <dc:creator>jeanne_choo</dc:creator>
      <dc:date>2023-04-04T23:24:01Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with accessing element using Pandas UDF in Image Processing</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6587#M2653</link>
      <description>&lt;P&gt;@Yan Chong Tan:&lt;/P&gt;&lt;P&gt;The error occurs because, inside a pandas UDF, a struct column such as image_dim arrives as a pandas DataFrame with one column per struct field, not as a Series of rows. Iterating over a DataFrame, which is what zip(img_data, input_dim) does, yields its column names, so each j is the string "width" or "height", and j.width raises the AttributeError. Note that getField() will not help here: it is a method of Spark Column expressions and is not available on the pandas objects passed into the UDF.&lt;/P&gt;&lt;P&gt;To fix this, iterate over the rows of the struct DataFrame instead, for example with itertuples():&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;@pandas_udf("binary")
def resize_image_udf(img_data, input_dim):
    # input_dim is a pandas DataFrame with "width" and "height" columns
    rows = input_dim.itertuples(index=False)
    return pd.Series([resize_image(i, j) for i, j in zip(img_data, rows)])&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Each j is then a single row whose fields are plain attributes. Here is the modified resize_image function, which also keeps the pixel sizes as integers and returns bytes for the binary column. It assumes import numpy as np and from resizeimage import resizeimage have been added to the imports:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def resize_image(data, input_dim):
    # input_dim is one row of the struct column
    input_width = int(input_dim.width)
    input_height = int(input_dim.height)
    # PIL needs integer pixel sizes, so use floor division
    output_width = input_width // 2
    output_height = input_height // 2
    img = Image.frombytes("RGB", (input_width, input_height), bytes(data))
    img = resizeimage.resize_cover(img, [output_width, output_height])
    return np.asarray(img).tobytes()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This should resolve the issue you were facing.&lt;/P&gt;</description>
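The 'str' object error in this thread can be reproduced without a cluster. The sketch below is only an illustration, assuming (as the PySpark documentation describes for Series-to-Series pandas UDFs) that a struct column is handed to the UDF as a pandas DataFrame. The names img_data and input_dim mirror the UDF arguments, and their contents are made-up stand-ins, not data from the original post.

```python
import pandas as pd

# Hypothetical stand-ins for the two UDF inputs: a Series of raw image bytes
# and the struct column, represented here as a DataFrame.
img_data = pd.Series([b"\x00" * 12, b"\x00" * 24])
input_dim = pd.DataFrame({"width": [2, 4], "height": [2, 2]})

# Iterating a DataFrame yields its column labels, which reproduces the bug:
# every j is a plain string, so j.width would raise AttributeError.
labels = [j for _, j in zip(img_data, input_dim)]

# itertuples() yields one namedtuple per row, so .width and .height work.
sizes = [
    (int(j.width), int(j.height))
    for _, j in zip(img_data, input_dim.itertuples(index=False))
]

print(labels)  # ['width', 'height']
print(sizes)   # [(2, 2), (4, 2)]
```

Passing the width and height columns separately, for example zip(img_data, input_dim["width"], input_dim["height"]), would be an equivalent way to pair each image with its own dimensions.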
      <pubDate>Mon, 17 Apr 2023 13:48:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6587#M2653</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-17T13:48:04Z</dc:date>
    </item>
    <item>
      <title>Re: Problem with accessing element using Pandas UDF in Image Processing</title>
      <link>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6588#M2654</link>
      <description>&lt;P&gt;This looks really awesome. Thanks so much Suteja!&lt;/P&gt;</description>
      <pubDate>Tue, 18 Apr 2023 01:40:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/problem-with-accessing-element-using-pandas-udf-in-image/m-p/6588#M2654</guid>
      <dc:creator>tytytyc26</dc:creator>
      <dc:date>2023-04-18T01:40:09Z</dc:date>
    </item>
  </channel>
</rss>

