cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Translating SQL Value Function For XML To Databricks SQL

BNV
Visitor

Trying to translate this line of a SQL query that evaluates XML to Databricks SQL.

SELECT
   MyColumn.value('(/XMLData/Values/ValueDefinition[@colID="10"]/@Value)[1]', 'VARCHAR(max)') as Color
 
The XML looks like this:
<XMLData><Values><ValueDefinition colID="10" Value="Red"/><ValueDefinition colID="20" Value="Square"/><ValueDefinition colID="3" Value=""/></Values></XMLData>
 
Databricks SQL doesn't seem to support this function ("AnalysisException: Undefined function: MyColumn.value").
 
Is there anything I can substitute within the query line that will work?
6 REPLIES 6

hari-prasad
Contributor II

Hi @BNV,

You can leverage UDF or pandasUDF to register user defined functions to customize to parse the XML data using standard python libraries or even in Scala or Java in Databricks notebooks.

In SQL warehouse, you can create custom SQL UDF, follow this link for more Introducing SQL User-Defined Functions | Databricks Blog.



Regards,
Hari Prasad

Thank you but I'm not very familiar with Pandas. This might be out of my realm of knowledge.

Are you saying Pandas would have this functionality including using SQL and that SQL function or that I would need to create a UDF to parse XML (which sounds quite difficult).

hari-prasad
Contributor II

@BNV, you can leverage xpath SQL function which can parse the XML which works in both Notebook and SQL warehouse, follow this Spark SQL doc for more details https://spark.apache.org/docs/3.5.4/api/sql/#xpath

here is a sample example

hariprasad_0-1736357597347.png

 



Regards,
Hari Prasad

BNV
Visitor

This might be a good start but I do get an error ("Invalid XPath") when trying to access the column as the xpath. Is it not possible to use a column as the xpath?

SELECT xpath(MyColumn.value'/XMLData/Values/ValueDefinition[@colID="10"]/@Value)[1]') as Remarks



hari-prasad
Contributor II

you can share a sample or mocked value, how your xml looks?

mean while you can give a try with below query

SELECT xpath_string(MyColumn.value, '/XMLData/Values/ValueDefinition[@colID="10"]/@Value')[1] as Remarks


Regards,
Hari Prasad

Stefan-Koch
Contributor III

Since Spark Runtime 14.3 and higher, it is possible to read XML using the Spark Read method.

For example:

df = spark.read.option("rowTag", "books").format("xml").load(xmlPath)
df.printSchema()
df.show(truncate=False)

 

Have a look at the docu: https://docs.databricks.com/en/query/formats/xml.html

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group