
How to load XML files with spark-xml?

leaw
New Contributor III

Hello,

I cannot load XML files.

First, I tried to install the Maven library com.databricks:spark-xml_2.12:0.14.0, as described in the documentation, but I could not find it. The only package I can see is HyukjinKwon:spark-xml:0.1.1-s_2.10, and installing it fails with this error: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: Library resolution failed because unresolved dependency: com.databricks:spark-xml_2.12:0.17.0: not found

Then I tried to install the library from DBFS using a JAR file. I tried spark_xml_2_12_0_15_0.jar and spark_xml_2_12_0_17_0.jar. This got me a little further, but now I get this error: java.lang.NoClassDefFoundError: scala/$less$colon$less

My cluster Runtime Version is: 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)

I need to read my XML files from a notebook. Thank you in advance for your help.

1 ACCEPTED SOLUTION

Accepted Solutions

leaw
New Contributor III

I think I have resolved my issue by downloading and adding the latest version of the JAR file for Scala 2.12, but I don't know whether it is a long-term solution.

(Yesterday it was working, then it was not, then it was again, so it is not very stable.)

leaw_1-1705321241586.png

If anybody else faces this problem, I would be grateful if you shared your experience with reading XML files in Databricks.


7 REPLIES

Lakshay
Databricks Employee

Hi @leaw, you can install the Maven library on your cluster as shown below:

Screenshot 2024-01-13 at 1.15.45 AM.png

After that, you just need to follow the documentation: https://docs.databricks.com/en/query/formats/xml.html

leaw
New Contributor III

Thanks for your answer. I had already tried this, but I get an error because this library is not available in my Databricks environment.

leaw_0-1705320875596.png

 

Lakshay
Databricks Employee

Hi @leaw, the option I suggested should have downloaded the JAR directly from Maven, but it seems it was unable to download it because of some issue.

Lakshay
Databricks Employee

Anyway, glad to know that you were able to find an alternate solution.

Frustrated_DE
New Contributor III

Hi All,

I installed spark-xml_2.13-0.17.0.jar on runtime 14.2 (Scala 2.12) and am also getting the error below when attempting to read XML. Any advice on how to resolve it would be appreciated.

"java.lang.NoClassDefFoundError: scala/$less$colon$less"

Frustrated_DE
New Contributor III

It was a Scala version mismatch (a 2.13 JAR on a Scala 2.12 runtime), my bad! Sorted.
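For later readers: as far as I know, `scala/$less$colon$less` is the encoded name of `scala.<:<`, which is a top-level class only from Scala 2.13 onward, so this error usually points to a JAR compiled for 2.13 running on a 2.12 runtime. Below is a small sketch for checking the `_2.xx` suffix of an artifact name against the runtime's Scala version before installing. The function names are my own and not part of any library.

```python
import re
from typing import Optional

# Matches the Scala binary-version suffix in names such as
# "spark-xml_2.13-0.17.0.jar" or "com.databricks:spark-xml_2.12:0.17.0".
_SCALA_SUFFIX = re.compile(r"_(\d+\.\d+)(?=[-:.]|$)")


def scala_binary_version(artifact: str) -> Optional[str]:
    """Return the Scala binary version encoded in an artifact name, if any."""
    match = _SCALA_SUFFIX.search(artifact)
    return match.group(1) if match else None


def matches_runtime(artifact: str, runtime_scala: str) -> bool:
    """True if the artifact's Scala suffix equals the runtime's major.minor."""
    runtime_binary = ".".join(runtime_scala.split(".")[:2])
    return scala_binary_version(artifact) == runtime_binary


# The JAR from the post above against a Scala 2.12 runtime:
print(matches_runtime("spark-xml_2.13-0.17.0.jar", "2.12"))
```

The runtime's Scala version is listed next to the Databricks Runtime version in the cluster configuration, for example "13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)".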
