
Can we read XML files into DataFrames in Spark?

Srikanth_Gupta_
Valued Contributor
 
4 replies

Ryan_Chynoweth
Honored Contributor III

Yes, you can use the spark-xml library: https://github.com/databricks/spark-xml

Srikanth_Gupta_
Valued Contributor

Yes, we can read it using the code snippet below:

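val df = spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "message") // element name only, without angle brackets
      .load("sample.xml")

display(df) // available in Databricks notebooks; use df.show() elsewhere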

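Specifying rowTag is important: it names the XML element that is treated as one row (the default is ROW), so without the right value the actual content of the file is not read.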
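To see what rowTag controls, here is a minimal sketch, assuming Spark 3.x with spark-xml 0.12.0 (Scala 2.12) on the classpath and a local session instead of a Databricks cluster. The sample file, the temporary path, and the names RowTagDemo, writeSample, and readMessages are hypothetical. Each <message> element becomes one row, and attributes get spark-xml's default "_" prefix, so the id attribute shows up as the column _id.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

object RowTagDemo {
  // Hypothetical sample document: two <message> elements inside an <inbox> root.
  val sampleXml: String =
    """<inbox>
      |  <message id="1"><text>hello</text></message>
      |  <message id="2"><text>world</text></message>
      |</inbox>""".stripMargin

  // Writes the sample to a temporary file and returns its path.
  def writeSample(): Path = {
    val path = Files.createTempFile("sample", ".xml")
    Files.write(path, sampleXml.getBytes(StandardCharsets.UTF_8))
  }

  // rowTag takes the bare element name ("message"), not "<message>".
  def readMessages(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "message")
      .load(path)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a Databricks notebook already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rowTag-demo")
      .getOrCreate()
    try {
      val df = readMessages(spark, writeSample().toString)
      // One row per <message>; the id attribute appears as _id, the child element as text.
      df.printSchema()
      df.show()
    } finally spark.stop()
  }
}
```

In a notebook, only the readMessages call (or the snippet above) is needed, since the session already exists.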

Please read this for more details.

Mooune_DBU
Valued Contributor

Yes, of course, you can use the

Or use:

val df = spark.read
      .format("xml")
      .load("my_file.xml")

More info on the spark-xml API here.
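The snippet above uses the short name "xml", which spark-xml registers as an alias for com.databricks.spark.xml, and leaves rowTag at its default of ROW. A minimal sketch, assuming Spark 3.x with spark-xml 0.12.0 on the classpath (and no other XML source registered under the same short name), that reads a hypothetical file through both names with an explicit rowTag. The file contents and the names ShortNameDemo, writeSample, and read are illustrative only. Because each <message> here holds text directly alongside an attribute, spark-xml puts the text in its default _VALUE column.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ShortNameDemo {
  // Hypothetical file: each row element carries an attribute and plain text content.
  val sampleXml: String =
    """<inbox>
      |  <message id="1">hello</message>
      |  <message id="2">world</message>
      |</inbox>""".stripMargin

  // Writes the sample to a temporary file and returns its path as a string.
  def writeSample(): String = {
    val path = Files.createTempFile("my_file", ".xml")
    Files.write(path, sampleXml.getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Same reader for either format name; rowTag is set because the default row element is ROW.
  def read(spark: SparkSession, format: String, path: String): DataFrame =
    spark.read
      .format(format)
      .option("rowTag", "message")
      .load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("xml-short-name")
      .getOrCreate()
    try {
      val path = writeSample()
      val short = read(spark, "xml", path)
      val full = read(spark, "com.databricks.spark.xml", path)
      // Text content lands in _VALUE and the attribute in _id.
      short.show()
      println(short.count() == full.count())
    } finally spark.stop()
  }
}
```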

sean_owen
Honored Contributor II

Note that you will need to install the spark-xml library (https://github.com/databricks/spark-xml) to make this work. For example, you can create a library in the workspace that references the Maven coordinates com.databricks:spark-xml_2.12:0.12.0 and then attach it to a cluster.
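Outside a Databricks workspace, for instance in a local sbt project, the same artifact can be declared as a dependency. This build.sbt fragment is a sketch: the Scala and Spark versions are assumptions and should be matched to the cluster you deploy to.

```scala
// build.sbt (sketch): versions below are assumptions; align them with your cluster.
scalaVersion := "2.12.15"

libraryDependencies ++= Seq(
  // Supplied by the cluster at runtime, so it is not bundled into the application jar.
  "org.apache.spark" %% "spark-sql" % "3.1.2" % "provided",
  // %% appends _2.12, giving the same artifact as com.databricks:spark-xml_2.12:0.12.0.
  "com.databricks" %% "spark-xml" % "0.12.0"
)
```

For an interactive session, spark-shell's --packages flag accepts the same coordinate.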
