How to load xml files with spark-xml ?

leaw
New Contributor III

Hello,

I cannot load xml files.

First, I tried to install the Maven library com.databricks:spark-xml_2.12:0.14.0 as described in the documentation, but I could not find it. The only one available to me is HyukjinKwon:spark-xml:0.1.1-s_2.10, and installing it fails with this error: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: Library resolution failed because unresolved dependency: com.databricks:spark-xml_2.12:0.17.0: not found

Then I tried to install the library from DBFS using a JAR file. I tried spark_xml_2_12_0_15_0.jar and spark_xml_2_12_0_17_0.jar. This got me a little further, but now I get this error: java.lang.NoClassDefFoundError: scala/$less$colon$less

My cluster Runtime Version is: 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)

I need to read my XML files from a notebook. Thank you in advance for your help.

1 ACCEPTED SOLUTION

Accepted Solutions

leaw
New Contributor III

I think I have resolved my issue by downloading and adding the latest version of the JAR file for Scala 2.12, but I don't know whether it is a long-term solution.

(Yesterday it worked, then it did not, then it did again, so it is not very stable.)

leaw_1-1705321241586.png

If anybody else faces this problem, I would be grateful if you shared your experience reading XML files in Databricks.


7 REPLIES

Lakshay
Esteemed Contributor

Hi @leaw, you can install the Maven library on your cluster as shown below:

Screenshot 2024-01-13 at 1.15.45 AM.png

After that, you just need to follow the documentation: https://docs.databricks.com/en/query/formats/xml.html
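For anyone landing here, a minimal sketch of what the read looks like once a matching spark-xml JAR is attached to the cluster. The helper name `read_xml`, the example path, and the row tag `book` are only illustrative assumptions; `rowTag` should be whatever element wraps one record in your files.

```python
def read_xml(spark, path, row_tag):
    """Read an XML file into a DataFrame through the spark-xml data source.

    `spark` is the SparkSession that Databricks notebooks provide.
    `row_tag` names the XML element that should become one row.
    """
    return (
        spark.read.format("xml")       # short name registered by spark-xml
        .option("rowTag", row_tag)     # element treated as a single record
        .load(path)
    )

# In a notebook (hypothetical path and tag):
# df = read_xml(spark, "dbfs:/tmp/books.xml", "book")
# df.printSchema()
```

If the library is missing or was built for the wrong Scala version, the failure usually shows up at the `load` call, not at install time.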

leaw
New Contributor III

Thanks for your answer. I had already tried this, but I get an error because this library is not available in my Databricks workspace.

leaw_0-1705320875596.png

 

Lakshay
Esteemed Contributor

Hi @leaw, the option I suggested should have downloaded the JAR directly from Maven, but it seems that some issue is preventing the download.

Lakshay
Esteemed Contributor

Anyway, glad to know that you were able to find an alternate solution.

Frustrated_DE
New Contributor II

Hi All,

I installed spark-xml_2.13-0.17.0.jar on runtime 14.2 (Scala 2.12) and I am also getting the error below when attempting to read XML. Any advice on how to resolve it would be appreciated.

"java.lang.NoClassDefFoundError: scala/$less$colon$less"

Frustrated_DE
New Contributor II

Mismatch on Scala version, my bad! Sorted
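For context on why the mismatch produces that particular error: the class `scala.<:<` (encoded as `scala/$less$colon$less`) exists as a top-level class in Scala 2.13 but not in 2.12, so a `_2.13` JAR on a Scala 2.12 runtime fails with `NoClassDefFoundError`. Below is a small sketch for checking an artifact name before installing it. The function names are hypothetical, and it only understands the standard `spark-xml_<scala>` naming, not renamed files such as `spark_xml_2_12_0_17_0.jar`.

```python
import re

# Matches the Scala suffix in names such as "spark-xml_2.12-0.17.0.jar"
# or in the Maven coordinate "com.databricks:spark-xml_2.12:0.17.0".
_SCALA_SUFFIX_RE = re.compile(r"spark-xml_(\d+\.\d+)")


def scala_suffix(artifact: str) -> str:
    """Return the Scala binary version encoded in a spark-xml artifact name."""
    match = _SCALA_SUFFIX_RE.search(artifact)
    if match is None:
        raise ValueError(f"no Scala suffix found in {artifact!r}")
    return match.group(1)


def matches_cluster(artifact: str, cluster_scala: str) -> bool:
    """Compare the artifact's Scala suffix with the cluster's Scala version.

    Only major.minor is compared, so "2.12.15" counts as "2.12".
    """
    cluster_binary = ".".join(cluster_scala.split(".")[:2])
    return scala_suffix(artifact) == cluster_binary


# The case from this thread: a 2.13 build on a Scala 2.12 runtime.
print(matches_cluster("spark-xml_2.13-0.17.0.jar", "2.12"))             # False
print(matches_cluster("com.databricks:spark-xml_2.12:0.17.0", "2.12"))  # True
```

The Scala version to compare against is the one listed in the cluster's runtime description, for example "13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)".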
