Databricks Community

shubham7 · 06-30-2025

I have multiple XML files in a folder, and I am reading them into a DataFrame in a Databricks cell. Each file has one rootTag and multiple rowTags. Can I read all the rowTags into a single Spark DataFrame (PySpark)? Any reference or approach would be greatly appreciated.

Thanks

szymon_dybczak · 06-30-2025

Hi @shubham7,

I'm not sure I understood your requirements correctly, but something like this might work for you.
I have two XML files in my volume, both with the same structure:

<root>
  <row>
    <id>2</id>
    <value>foo2</value>
  </row>
  <row>
    <id>3</id>
    <value>bar3</value>
  </row>
</root>


<root>
  <row>
    <id>1</id>
    <value>foo</value>
  </row>
  <row>
    <id>2</id>
    <value>bar</value>
  </row>
</root>
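If you want to reproduce this setup, the two files can be generated with plain Python. This is a minimal sketch: the directory `my_volume` and the file names `file1.xml` and `file2.xml` are hypothetical stand-ins, since the post doesn't name the files. On Databricks you would point `SAMPLE_DIR` at your own volume path instead, assuming it is writable from your cluster.

```python
from pathlib import Path

# Hypothetical local directory standing in for the Databricks volume
# /Volumes/workspace/default/my_volume used in the answer.
SAMPLE_DIR = Path("my_volume")

# Hypothetical file names; the (id, value) pairs match the two samples above.
SAMPLES = {
    "file1.xml": [("2", "foo2"), ("3", "bar3")],
    "file2.xml": [("1", "foo"), ("2", "bar")],
}


def write_samples(directory: Path) -> list[Path]:
    """Write each sample file as <root> containing one <row> per (id, value)."""
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in SAMPLES.items():
        body = "".join(
            f"  <row>\n    <id>{row_id}</id>\n    <value>{value}</value>\n  </row>\n"
            for row_id, value in rows
        )
        path = directory / name
        path.write_text(f"<root>\n{body}</root>\n", encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    for path in write_samples(SAMPLE_DIR):
        print(path)
```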

To load those files into a single DataFrame, I used the following code:

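# spark is the SparkSession that Databricks notebooks provide.
# The *.xml glob loads every matching file into one DataFrame, and rowTag
# names the element that becomes a row; rootTag mainly matters when writing XML.
df = (
    spark.read.format("xml")
    .option("rootTag", "root")
    .option("rowTag", "row")
    .load("/Volumes/workspace/default/my_volume/*.xml")
)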
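That reads a single rowTag. If the files mix several different row elements under the same root (for example `<customer>` and `<order>`), one option is to read each tag separately and combine the results by column name. The sketch below shows that idea in plain Python with `xml.etree.ElementTree` rather than Spark, so it runs anywhere; `read_row_tags`, the `row_tag` column, and the `customer`/`order` tags are hypothetical names chosen for illustration. It flattens only one level of child elements and ignores attributes.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable, Optional


def read_row_tags(paths: Iterable[Path], row_tags: set) -> list[dict[str, Optional[str]]]:
    """Collect every element whose tag is in row_tags into one list of dicts.

    Each record gets a 'row_tag' field naming the element it came from, and all
    records share the union of column names, with None where a value is missing.
    """
    records = []
    for path in paths:
        root = ET.parse(path).getroot()
        for elem in root.iter():
            if elem.tag in row_tags:
                record = {"row_tag": elem.tag}
                # Flatten one level: each child element becomes a column.
                for child in elem:
                    record[child.tag] = child.text
                records.append(record)
    # Build the combined column list in first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Align every record to the combined columns (missing values become None).
    return [{col: record.get(col) for col in columns} for record in records]


if __name__ == "__main__":
    sample = Path("mixed.xml")  # hypothetical file with two kinds of row elements
    sample.write_text(
        "<root>"
        "<customer><id>1</id><name>Ann</name></customer>"
        "<order><id>10</id><amount>5.0</amount></order>"
        "</root>",
        encoding="utf-8",
    )
    for row in read_row_tags([sample], {"customer", "order"}):
        print(row)
```

On Databricks, one way to express the same pattern in PySpark is to call `spark.read.format("xml").option("rowTag", tag).load(path)` once per tag, add the tag as a column with `pyspark.sql.functions.lit(tag)`, and combine the frames with `DataFrame.unionByName(other, allowMissingColumns=True)` (Spark 3.1+), which fills columns missing on either side with nulls. For large inputs the Spark route is preferable, since the plain-Python version parses everything on a single machine.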