Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Databricks XML - Bypassing rootTag and rowTag

RobsonNLPT
Contributor II

I think the current conversion of a DataFrame to XML needs to be improved.

My DataFrame schema is a fully nested schema built from structs, but when I write it out as XML I run into the following issues:

1) I can't add elements to the root

2) rootTag and rowTag are required

In the end I remove the first level of the hierarchy (the rowTag) using string methods or manually. The rowTag is already part of the DataFrame's nested schema, so requiring it again doesn't make sense.
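To illustrate the string-based cleanup mentioned above, here is a minimal sketch in plain Scala. The object name `StripWrapper` and the sample string are hypothetical, and the helper is deliberately naive: it assumes the wrapper tag carries no attributes and that no real data element shares its name.

```scala
import java.util.regex.Pattern

object StripWrapper {
  // Hypothetical helper: deletes every opening and closing tag named `tag`
  // (attribute-free form only) and keeps whatever sat between them.
  def stripTag(xml: String, tag: String): String = {
    val quoted = Pattern.quote(tag) // treat the tag name literally inside the regex
    xml.replaceAll(s"</?$quoted>", "")
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample of what a write with the default tags might look like.
    val written = "<ROWS><ROW><Record><age>18</age></Record></ROW></ROWS>"
    println(stripTag(written, "ROW"))
  }
}
```

This only works on XML that fits in memory as a single string, which is part of why it feels like a workaround rather than a solution.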

 

 

 

 

4 REPLIES

Hi Kaniz. I will test your suggestions, but I think the documentation provided by Databricks / Spark should cover these topics in depth. I've seen lots of posts on the web about this.

Thank you

Hi Kaniz. I tested option("rowTag", "") using the library com.databricks:spark-xml_2.12:0.17.0 and also the native Azure Databricks format (runtime 14.3), but in both cases I got the error "requirement failed: 'rowTag' option should not be empty string".

 

sandip_a
Databricks Employee

Here is one way to use the struct field name as the rowTag:

 

 
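import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// A single top-level struct column named "Record".
val schema = new StructType().add("Record",
  new StructType().add("age", IntegerType).add("name", StringType))
val data = Seq(Row(Row(18, "John Doe")), Row(Row(19, "Mary Doe")))

val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
// Reuse the struct's field name as the rowTag and write only its children,
// so each row becomes <Record>...</Record> with no extra nesting level.
val rowTag = schema.fields.head.name
df.coalesce(1).select(s"$rowTag.*").write.mode("Overwrite").option("rowTag", rowTag).xml("/tmp/xml_test")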

If the XML file generated above is read back, it will have a flattened schema with two fields ('age' and 'name') instead of a single struct column.

Hi. In this case rootTag is also required; otherwise it defaults to "ROWS".

I have an attribute and extra elements at root level, before the rows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root x="1">
 <rat1>434343</rat1>
 <rat2>
  <x>4</x>
  <y>6</y>
 </rat2>
 <rows>
  <row>
   <a>5</a>
   <b>5</b>
  </row>
  <row>
   <a>5</a>
   <b>5</b>
  </row>
 </rows>
</root>

Ideally I could bypass rootTag and rowTag, since my DataFrame already has the full nested structure. The behaviour should be the same as with the JSON libraries.
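Until something like that exists, one possible workaround is to render the row elements separately and assemble the root by hand. The sketch below is plain Scala, not a Spark or spark-xml feature: `RootWrapper` and `wrap` are hypothetical names, the `<rows>` wrapper is hard-coded to match the example above, and the row and header fragments are assumed to be already well-formed XML strings.

```scala
object RootWrapper {
  // Escapes the characters that may not appear raw inside a double-quoted attribute value.
  private def escapeAttr(v: String): String =
    v.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;")

  // Hypothetical helper: builds one document from a root tag, its attributes,
  // pre-rendered root-level elements, and pre-rendered <row> elements.
  def wrap(rootTag: String,
           attrs: Seq[(String, String)],
           header: Seq[String],
           rows: Seq[String]): String = {
    val attrText = attrs.map { case (k, v) => " " + k + "=\"" + escapeAttr(v) + "\"" }.mkString
    val body = (header ++ Seq("<rows>") ++ rows ++ Seq("</rows>")).mkString("\n")
    s"<$rootTag$attrText>\n$body\n</$rootTag>"
  }

  def main(args: Array[String]): Unit = {
    val doc = wrap(
      "root",
      Seq("x" -> "1"),
      Seq("<rat1>434343</rat1>", "<rat2><x>4</x><y>6</y></rat2>"),
      Seq("<row><a>5</a><b>5</b></row>", "<row><a>5</a><b>5</b></row>"))
    println(doc)
  }
}
```

The obvious drawback is that the final assembly happens on the driver as a string, so this is only reasonable when the rows fit comfortably in memory.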

 

 

 
