Translating SQL Value Function For XML To Databricks SQL
01-08-2025 09:04 AM
I'm trying to translate this line of a SQL query that evaluates XML into Databricks SQL.
01-08-2025 09:09 AM - edited 01-08-2025 09:19 AM
Hi @BNV,
You can register a UDF or pandas UDF in a Databricks notebook to parse the XML data with standard Python libraries, or write the function in Scala or Java.
In a SQL warehouse, you can create a custom SQL UDF; see Introducing SQL User-Defined Functions | Databricks Blog for more details.
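For example, here is a minimal sketch of a SQL UDF that wraps the built-in xpath_string function for one fixed path (the function name, path, and table name are made-up illustrations, not your actual schema):
CREATE FUNCTION get_color(xml STRING)
RETURNS STRING
RETURN xpath_string(xml, '/XMLData/Values/ValueDefinition[@colID="10"]/@Value');
-- usage: SELECT get_color(ExData) FROM my_table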
Regards,
Hari Prasad
01-08-2025 09:14 AM
Thank you, but I'm not very familiar with pandas. This might be out of my realm of knowledge.
Are you saying pandas provides this functionality, including that SQL value function, or that I would need to create a UDF to parse the XML myself (which sounds quite difficult)?
01-08-2025 09:33 AM - edited 01-08-2025 09:34 AM
@BNV, you can use the xpath SQL function to parse the XML; it works in both notebooks and SQL warehouses. See the Spark SQL docs for more details: https://spark.apache.org/docs/3.5.4/api/sql/#xpath
Here is a simple example.
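As a starting point, the basic call shape from the Spark docs looks like this (the XML string here is just a toy value):
SELECT xpath('<a><b>b1</b><b>b2</b></a>', 'a/b/text()');
-- returns ["b1","b2"]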
Regards,
Hari Prasad
01-08-2025 01:23 PM
This might be a good start but I do get an error ("Invalid XPath") when trying to access the column as the xpath. Is it not possible to use a column as the xpath?
01-08-2025 07:55 PM - edited 01-08-2025 08:03 PM
Could you share a sample or mocked-up value showing how your XML looks?
Meanwhile, you can give the query below a try.
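Something along these lines, as a sketch only, assuming the XML sits in a string column (called ExData here) of your table (my_table is a placeholder name):
SELECT xpath_string(ExData, '/XMLData/Values/ValueDefinition[@colID="10"]/@Value') AS value
FROM my_table;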
Regards,
Hari Prasad
01-09-2025 08:56 AM
Hi. Thank you for replying. My XML sample is in the original post above if it helps.
It doesn't seem like the "_string" version works, because it's saying:
AnalysisException: Can't extract value from xpath_string(ExData#251799, /XMLData/Values/ValueDefinition[@colID="10"]/@Value')[1]): need struct type but got string
01-09-2025 09:20 AM
@BNV, the SQL code below worked for me; I'm able to extract Red for colID=10:
select
  xpath(
    '''
    <XMLData>
      <Values>
        <ValueDefinition colID="10" Value="Red"/>
        <ValueDefinition colID="20" Value="Square"/>
        <ValueDefinition colID="3" Value=""/>
      </Values>
    </XMLData>''',
    '//ValueDefinition[@colID="10"]/@Value'
  )[0] as value
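The same pattern should work against a column instead of a literal (the XML argument can be a column; as far as I know the path argument still has to be a string literal, so it can't come from a column). Assuming a hypothetical table xml_table with a string column ExData holding the XML:
SELECT xpath(ExData, '//ValueDefinition[@colID="10"]/@Value')[0] AS value
FROM xml_table;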
Regards,
Hari Prasad
01-09-2025 09:28 AM
@BNV, for slightly more complex querying that converts the XML into rows, use the query below. For your case, just replace the XML string with the column containing the XML value; Spark will handle it.
select c_value as colID, v_value as value
from (
  select
    posexplode(
      xpath(
        '''
        <XMLData>
          <Values>
            <ValueDefinition colID="10" Value="Red"/>
            <ValueDefinition colID="20" Value="Square"/>
            <ValueDefinition colID="3" Value=""/>
          </Values>
        </XMLData>''',
        '//ValueDefinition/@colID'
      )
    ) as (c_index, c_value),
    posexplode(
      xpath(
        '''
        <XMLData>
          <Values>
            <ValueDefinition colID="10" Value="Red"/>
            <ValueDefinition colID="20" Value="Square"/>
            <ValueDefinition colID="3" Value=""/>
          </Values>
        </XMLData>''',
        '//ValueDefinition/@Value'
      )
    ) as (v_index, v_value)
)
where c_index = v_index
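If the XML lives in a column rather than a literal, the same idea can also be written with LATERAL VIEW, exploding each xpath result from the table (xml_table and ExData are assumed names for illustration):
select c_value as colID, v_value as value
from xml_table
lateral view posexplode(xpath(ExData, '//ValueDefinition/@colID')) c as c_index, c_value
lateral view posexplode(xpath(ExData, '//ValueDefinition/@Value')) v as v_index, v_value
where c_index = v_index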
Regards,
Hari Prasad
01-08-2025 10:02 PM
With Databricks Runtime 14.3 and higher, it is possible to read XML natively using the Spark read method.
For example:
df = spark.read.option("rowTag", "books").format("xml").load(xmlPath)
df.printSchema()
df.show(truncate=False)
Have a look at the docs: https://docs.databricks.com/en/query/formats/xml.html
01-08-2025 11:13 PM
Yes, XML parsing is now supported directly in Databricks Runtime 14.3 and higher; before that you could have leveraged the spark-xml library JARs to parse it.
You can still use xpath for the case where one of the data columns holds an XML value in a dataset, since @BNV is looking for a SQL-based approach.
Regards,
Hari Prasad

