Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Reading XML files with multiple row tags

shubham7
New Contributor II

I have multiple XML files in a folder that I am reading into a DataFrame in a Databricks notebook cell. Each file has one rootTag and multiple rowTags. Can I read all of the rowTags into a single Spark DataFrame (PySpark)? Any reference or suggested approach would be greatly appreciated.

Thanks


szymon_dybczak
Esteemed Contributor III

Hi @shubham7,

I'm not sure I understood your requirements correctly, but something like this could work for you.
I have 2 XML files in my volume, both with the following structure:

<root>
  <row>
    <id>2</id>
    <value>foo2</value>
  </row>
  <row>
    <id>3</id>
    <value>bar3</value>
  </row>
</root>


<root>
  <row>
    <id>1</id>
    <value>foo</value>
  </row>
  <row>
    <id>2</id>
    <value>bar</value>
  </row>
</root>



To load those files into a single DataFrame, I used the following code:


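df = (
    spark.read.format("xml")
    .option("rootTag", "root")  # rootTag is documented as a write option; reading is driven by rowTag
    .option("rowTag", "row")    # each <row> element becomes one DataFrame row
    .load("/Volumes/workspace/default/my_volume/*.xml")  # the glob picks up every XML file in the volume
)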
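To see why this yields one DataFrame with four rows, here is a plain-Python illustration of the `rowTag` idea (standard library only; it is not how Spark implements the XML reader): every `<row>` element in every file becomes one record, and its child elements become the columns. Spark additionally infers column types, so `id` would typically come out as a numeric column rather than a string as it does here. The function name `rows_for_tag` is hypothetical.

```python
import xml.etree.ElementTree as ET


def rows_for_tag(xml_docs, row_tag):
    """Hypothetical illustration: return one dict per <row_tag> element found
    in the given XML strings. Each child element becomes a column, and its
    text is kept as a string (no type inference)."""
    records = []
    for doc in xml_docs:
        root = ET.fromstring(doc)
        # ElementTree's iter() matches the tag at any depth below the root.
        for elem in root.iter(row_tag):
            records.append({child.tag: child.text for child in elem})
    return records
```

Calling `rows_for_tag` with the contents of the two sample files and `"row"` returns four dictionaries, one per `<row>` element, in file order.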

shubham7
New Contributor II

You are correct, but I have N different rowTags. How can I read all of them into one DataFrame?
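One possible approach, sketched under stated assumptions rather than offered as a confirmed recipe: read the same files once per row tag with the XML reader used in the earlier reply, add a column recording which row tag each record came from, and stack the results with `unionByName(..., allowMissingColumns=True)` (PySpark 3.1+), so columns that exist only for some tags are filled with nulls. The helper names `discover_row_tags` and `read_all_row_tags` and the `row_tag` column are hypothetical. `discover_row_tags` assumes the files have no XML namespaces, are small enough to parse on the driver, and are reachable through the local file system (Unity Catalog volumes are usually accessible under `/Volumes`). If you already know the N tags, pass them to `read_all_row_tags` directly.

```python
import glob
import xml.etree.ElementTree as ET
from functools import reduce


def discover_row_tags(pattern):
    """Hypothetical helper: return the sorted, distinct tag names of the direct
    children of each file's root element (e.g. <root><order/><customer/></root>
    -> ["customer", "order"]). Assumes no namespaces and small files, because
    every file is parsed in full on the driver."""
    tags = set()
    for path in glob.glob(pattern):
        root = ET.parse(path).getroot()
        tags.update(child.tag for child in root)
    return sorted(tags)


def read_all_row_tags(spark, path, row_tags, tag_column="row_tag"):
    """Hypothetical helper: read `path` once per row tag and stack the results
    into one DataFrame, with `tag_column` recording the source row tag."""
    if not row_tags:
        raise ValueError("row_tags must contain at least one tag")

    # Imported here so discover_row_tags can be used without PySpark installed.
    from pyspark.sql import functions as F

    frames = [
        spark.read.format("xml")
        .option("rowTag", tag)               # one pass over the files per row tag
        .load(path)
        .withColumn(tag_column, F.lit(tag))  # remember which rowTag produced the row
        for tag in row_tags
    ]
    # Columns missing from some frames are filled with nulls (PySpark 3.1+).
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        frames,
    )
```

Usage, assuming the volume path from the earlier reply: `tags = discover_row_tags("/Volumes/workspace/default/my_volume/*.xml")`, then `df = read_all_row_tags(spark, "/Volumes/workspace/default/my_volume/*.xml", tags)`. Each tag triggers a separate pass over the files, so the cost grows with N. If the same column name has incompatible types under different tags (for example, nested structs with different fields), the union can fail, and you would need to cast or rename those columns before combining.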