14 hours ago
Trying to translate this line of a SQL query that evaluates XML to Databricks SQL.
13 hours ago - last edited 13 hours ago
Hi @BNV,
You can use a UDF or a pandas UDF to register a user-defined function that parses the XML data with standard Python libraries. In Databricks notebooks you can also write the function in Scala or Java.
In a SQL warehouse, you can create a custom SQL UDF. For more, see Introducing SQL User-Defined Functions | Databricks Blog.
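To give a rough idea of how small such a function can be, here is a minimal sketch that uses only the Python standard library. The function name `xml_find_all_text` and the sample XML are made up for illustration, and `xml.etree.ElementTree` supports only a limited subset of XPath, so treat this as a starting point rather than a drop-in replacement for your query.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def xml_find_all_text(xml_text: Optional[str], path: str) -> List[str]:
    """Return the text of every element matching `path`, or [] on bad input."""
    if not xml_text:
        return []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Malformed XML: return an empty list instead of failing the whole query.
        return []
    # `path` is evaluated relative to the root element (limited XPath subset).
    return [el.text or "" for el in root.findall(path)]


# Hypothetical sample value, not taken from the original question.
sample = "<order><item>pen</item><item>ink</item></order>"
print(xml_find_all_text(sample, "item"))  # ['pen', 'ink']

# In a Databricks notebook (where `spark` is predefined) the function could
# then be registered for use from SQL, for example:
#   spark.udf.register("xml_find_all_text", xml_find_all_text, "array<string>")
```

Because the path is an ordinary function argument here, it can differ from row to row once the function is registered as a UDF.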
13 hours ago
Thank you but I'm not very familiar with Pandas. This might be out of my realm of knowledge.
Are you saying that Pandas has this functionality, including from SQL with that SQL function, or that I would need to create a UDF to parse the XML (which sounds quite difficult)?
13 hours ago - last edited 13 hours ago
@BNV, you can use the xpath SQL function to parse the XML. It works in both notebooks and SQL warehouses. See the Spark SQL docs for more details: https://spark.apache.org/docs/3.5.4/api/sql/#xpath
Here is a sample:
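The sample from this reply did not survive in the thread, so the following is only a sketch of what such a call can look like, not the original query. It reuses the example XML strings from the Spark SQL function docs for `xpath` and `xpath_string`. The names `XPATH_QUERY` and `run_xpath_demo` are made up, and `spark` is assumed to be the SparkSession that Databricks notebooks predefine.

```python
# Two built-in Spark SQL XPath functions applied to literal XML strings:
#   xpath(xml, path)        -> array of strings for every matching node
#   xpath_string(xml, path) -> text of the first matching node
XPATH_QUERY = """
SELECT
  xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>', 'a/b/text()') AS all_b,
  xpath_string('<a><b>b</b><c>cc</c></a>', 'a/c') AS first_c
"""


def run_xpath_demo(spark):
    """Run the demo query; `spark` is an existing SparkSession."""
    return spark.sql(XPATH_QUERY)


# In a notebook:
#   run_xpath_demo(spark).show(truncate=False)
```

The SELECT statement inside `XPATH_QUERY` can also be pasted directly into the SQL editor of a SQL warehouse. To apply it to a table, replace the first argument with the name of the column that holds the XML.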
9 hours ago
This might be a good start but I do get an error ("Invalid XPath") when trying to access the column as the xpath. Is it not possible to use a column as the xpath?
3 hours ago - last edited 3 hours ago
Can you share a sample or mocked value that shows what your XML looks like?
Meanwhile, you can try the query below.
32m ago
In Databricks Runtime 14.3 and above, you can read XML files directly with spark.read.
For example:
df = spark.read.option("rowTag", "books").format("xml").load(xmlPath)
df.printSchema()
df.show(truncate=False)
Have a look at the docs: https://docs.databricks.com/en/query/formats/xml.html
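To make the `rowTag` option a bit more concrete, here is a small sketch. The sample XML, the helper name `read_xml_rows`, and the file path in the usage comment are all hypothetical. `spark` is assumed to be the SparkSession that Databricks notebooks predefine.

```python
# Hypothetical file contents, for illustration only.
SAMPLE_XML = """<books>
  <book id="1"><title>First</title></book>
  <book id="2"><title>Second</title></book>
</books>"""


def read_xml_rows(spark, xml_path, row_tag="book"):
    """Read an XML file so that each <row_tag> element becomes one DataFrame row.

    `spark` is an existing SparkSession; `xml_path` is a placeholder for
    wherever the XML file is actually stored.
    """
    return spark.read.format("xml").option("rowTag", row_tag).load(xml_path)


# In a notebook, assuming SAMPLE_XML has been saved to a (hypothetical) path:
#   df = read_xml_rows(spark, "/path/to/books.xml")
#   df.printSchema()
#   df.show(truncate=False)
```

With a file like the sample above, I would expect `rowTag` set to "book" to give one row per book, while "books" would treat the whole document as a single row with the books nested inside it. Which one you want depends on how your XML is laid out.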