Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
pyspark SQL cannot resolve 'explode()' due to data type mismatch

KevinXu
New Contributor III

Running a PySpark script, I get the following error depending on which XML file I query:

cannot resolve 'explode(...)' due to data type mismatch

The pyspark code:

from pyspark.sql import SparkSession
 
JOB_NAME = "Complex file to delimited files transformer"
 
spark = SparkSession.builder.appName(JOB_NAME)\
    .config("spark.scheduler.mode", "FAIR")\
    .config('spark.jars.packages', 'com.databricks:spark-xml_2.12:0.12.0')\
    .getOrCreate()
 
sql_script = "select create_date, item['_id'], item['_VALUE'] from my_data lateral view explode(items.item) t as item"
 
# works fine
read_options = {"rowTag": "my_data"}
df = spark.read\
    .format("xml")\
    .options(**read_options)\
    .load("./xml")
df.createOrReplaceTempView("my_data")
spark.sql(sql_script).show()
 
# Error
df2 = spark.read\
    .format("xml")\
    .options(**read_options)\
    .load("./xml/test2.xml")
df2.createOrReplaceTempView("my_data")
spark.sql(sql_script).show()

The XML files are in the ./xml folder.

test1.xml:

<my_data><create_date>2021-05-01</create_date><items><item id="1">item 1</item><item id="2">item 2</item></items>
</my_data>

test2.xml:

<my_data><create_date>2021-06-01</create_date><items><item id="3">item 3</item></items>
</my_data>

Expected result: the same SQL statement should work consistently and never error, even when a run happens to produce only one <item> inside <items>.

1 REPLY

KevinXu
New Contributor III

The error is on line 10 of the script:

sql_script = "select create_date, item['_id'], item['_VALUE'] from my_data lateral view explode(items.item) t as item"
