
Databricks Spark XML parser: support for namespaces declared at the ancestor level

Ben_Spark
New Contributor III

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option.

When I parse an XML file using the "rowValidationXSDPath" option, the parser doesn't recognize the prefixes/namespaces declared at the root level.

For this to work, I have to move the namespace declarations down to the level of the rowTag element.

Example

<RootTag xmlns:myPrefix1="http:....." xmlns:myPrefix2="http:....." ... >

<myPrefix1:ParentMember>

<myPrefix2:ChildMember>

............

</myPrefix2:ChildMember>

</myPrefix1:ParentMember>

</RootTag>

Reading the above structure with the rowValidationXSDPath option ends with the following error: the prefix "myPrefix2" for element "myPrefix2:ChildMember" is not bound.
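
A minimal Python sketch of why this kind of error can appear, assuming the validator sees each row as a standalone fragment (the spark-xml documentation describes rowValidationXSDPath as validating each row individually, but the library's internals are not shown in this thread). The namespace URIs are hypothetical placeholders, since the post elides the real ones, and Python's standard-library parser is used only to illustrate the namespace rule; it is not what spark-xml runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the original post elides the real ones.
DOC = (
    '<RootTag xmlns:myPrefix1="http://example.com/ns1" '
    'xmlns:myPrefix2="http://example.com/ns2">'
    "<myPrefix1:ParentMember>"
    "<myPrefix2:ChildMember>value</myPrefix2:ChildMember>"
    "</myPrefix1:ParentMember>"
    "</RootTag>"
)

# Cut the row out of the document as raw text. The xmlns declarations
# live on the root start tag, so they are NOT part of this fragment.
ROW_OPEN = "<myPrefix1:ParentMember>"
ROW_CLOSE = "</myPrefix1:ParentMember>"
start = DOC.index(ROW_OPEN)
end = DOC.index(ROW_CLOSE) + len(ROW_CLOSE)
row_fragment = DOC[start:end]


def parses_standalone(fragment: str) -> bool:
    """Return True if the text is well-formed, namespace-aware XML on its own."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # An undeclared prefix surfaces here as an "unbound prefix" error.
        return False
```

With these inputs, `parses_standalone(DOC)` succeeds because the prefixes are bound on the root, while `parses_standalone(row_fragment)` fails because the fragment carries no declarations of its own. That mirrors the "is not bound" message quoted above.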

I know this was a bug in previous versions, but I'm wondering whether it was also fixed for the case where the rowValidationXSDPath option is enabled.

Thank you in advance for your help.

1 ACCEPTED SOLUTION


Ben_Spark
New Contributor III

Hi,

Sorry for the late response; I got busy looking for a permanent solution to this problem.

In the end we are giving up on XSD-path validation (the rowValidationXSDPath option). It does not work when the prefixes' namespaces are declared at the ancestor level.

Thank you anyway for your help and support.


9 REPLIES

Kaniz
Community Manager

Hi @Ben Ben, this article describes how to read and write an XML file as an Apache Spark data source.

Ben_Spark
New Contributor III

Hi Kaniz

Thank you for your answer.

I'm aware of the article, and reading an XML file without the XSD is not an issue.

The problem is that I need to validate each row against an XSD using rowValidationXSDPath, which does not support prefixes at the row level when their namespaces are declared at an ancestor level.
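
The workaround mentioned in the question, moving the namespace declarations down to the row tag, could be automated as a pre-processing step before the file is handed to Spark. The sketch below is one hypothetical way to do that in plain Python; `push_namespaces_to_rows` and the namespace URIs are illustrative names, not part of spark-xml. It is regex-based, so it assumes the root start tag has no `>` inside attribute values, that rows do not already redeclare the same prefixes, and that the file fits in memory.

```python
import re
import xml.etree.ElementTree as ET

# Matches one namespace declaration, e.g. ` xmlns:p="uri"` or ` xmlns="uri"`.
XMLNS_ATTR = re.compile(r"""\sxmlns(?::[\w.-]+)?\s*=\s*(?:"[^"]*"|'[^']*')""")


def push_namespaces_to_rows(xml_text: str, root_tag: str, row_tag: str) -> str:
    """Copy the xmlns declarations on the root start tag onto every row start tag."""
    root_start = re.search(r"<%s\b[^>]*>" % re.escape(root_tag), xml_text)
    if root_start is None:
        raise ValueError("root tag %r not found" % root_tag)
    declarations = "".join(XMLNS_ATTR.findall(root_start.group(0)))
    # Match only opening row tags; closing tags start with "</" and are skipped.
    row_start = re.compile(r"<%s(?=[\s/>])" % re.escape(row_tag))
    return row_start.sub(lambda m: m.group(0) + declarations, xml_text)


# Hypothetical sample document; the URIs are placeholders.
doc = (
    '<RootTag xmlns:myPrefix1="http://example.com/ns1" '
    'xmlns:myPrefix2="http://example.com/ns2">'
    "<myPrefix1:ParentMember><myPrefix2:ChildMember>v</myPrefix2:ChildMember>"
    "</myPrefix1:ParentMember></RootTag>"
)
patched = push_namespaces_to_rows(doc, "RootTag", "myPrefix1:ParentMember")

# The row, cut out as raw text, now carries its own declarations and parses alone.
row_begin = patched.index("<myPrefix1:ParentMember")
row_end = patched.index("</myPrefix1:ParentMember>") + len("</myPrefix1:ParentMember>")
row = ET.fromstring(patched[row_begin:row_end])
```

Redeclaring a prefix with the same URI on a descendant is legal XML, so the patched document as a whole still parses. Whether this is enough for rowValidationXSDPath to accept the rows depends on the XSD and the spark-xml version, which this thread does not settle.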

Kaniz
Community Manager

Hi @Ben Ben, you can validate individual rows against an XSD schema using rowValidationXSDPath. You can use the utility com.databricks.spark.xml.util.XSDToSchema to extract a Spark DataFrame schema from some XSD files.

It supports only simple, complex, and sequence types and only basic XSD functionality, and it is experimental.
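
For reference, a hedged PySpark sketch of how the rowValidationXSDPath option is typically passed to the reader. The option names `rowTag` and `rowValidationXSDPath` come from this thread; the helper names, file paths, and the assumption that an existing `SparkSession` already has the spark-xml package attached are illustrative only. This shows the call shape and does not address the namespace problem discussed here.

```python
from typing import Dict


def xml_reader_options(row_tag: str, xsd_path: str) -> Dict[str, str]:
    """Build the reader options discussed in this thread."""
    return {
        "rowTag": row_tag,                 # element that delimits one row
        "rowValidationXSDPath": xsd_path,  # XSD each row is validated against
    }


def read_validated_xml(spark, xml_path: str, row_tag: str, xsd_path: str):
    """Read xml_path through the "xml" data source with per-row XSD validation.

    `spark` is assumed to be an existing SparkSession with spark-xml available;
    nothing is imported from pyspark here so the helper stays importable
    without a cluster.
    """
    reader = spark.read.format("xml")
    for key, value in xml_reader_options(row_tag, xsd_path).items():
        reader = reader.option(key, value)
    return reader.load(xml_path)
```

A call might look like `read_validated_xml(spark, "data.xml", "myPrefix1:ParentMember", "schema.xsd")`, where both file names are placeholders.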

If you wish to submit a feature request, please go ahead and share your ideas. We would love to hear them.

Kaniz
Community Manager

Hi @Ben Ben, would you like to raise a feature request?

Dan_Z
Honored Contributor

Hey @Ben Ben, so Spark-XML is not a package maintained by Databricks. It seems like the community doesn't have any input here. I'd suggest you reach out to the package maintainers by opening an issue on their GitHub here: https://github.com/databricks/spark-xml.

Ben_Spark
New Contributor III

Thank you, Dan, for your feedback and proposal.

For now I will parse the XML file differently. I really have no time to raise a ticket and follow up on it.

Kaniz
Community Manager

Hi @Ben Ben, just a friendly follow-up. Do you still need help, or did @Dan Zafar's response help you find the solution? Please let us know.

Kaniz
Community Manager

Hi @Ben Ben, thank you for providing the solution here.
