
Databricks Spark-XML parser: support for namespaces declared at the ancestor level

Ben_Spark
New Contributor III

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option.

When I parse an XML file using the "rowValidationXSDPath" option, the parser does not recognize the prefixes/namespaces declared at the root level.

For this to work, I have to move the namespace declarations down to the level of the rowTag element.

Example

<RootTag xmlns:myPrefix1="http:....." xmlns:myPrefix2="http:....." ... >
  <myPrefix1:ParentMember>
    <myPrefix2:ChildMember>
      ............
    </myPrefix2:ChildMember>
  </myPrefix1:ParentMember>
</RootTag>

Reading the above structure with the rowValidationXSDPath option ends with the following error: the prefix "myPrefix2" for element "myPrefix2:ChildMember" is not bound.
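The error is consistent with each row being validated as a standalone fragment, cut off from the namespace declarations on its ancestors. The sketch below reproduces that effect with Python's standard library only. It does not use Spark-XML, and it assumes (this is not confirmed in the thread) that the row is extracted as raw text before validation. The namespace URIs are placeholders, since the originals are elided above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mirroring the structure above; the URIs are placeholders.
DOC = """<RootTag xmlns:myPrefix1="http://example.com/ns1" xmlns:myPrefix2="http://example.com/ns2">
  <myPrefix1:ParentMember>
    <myPrefix2:ChildMember>value</myPrefix2:ChildMember>
  </myPrefix1:ParentMember>
</RootTag>"""

# The whole document parses: prefixes resolve against the root declarations.
ET.fromstring(DOC)

# Cut the row out as raw text, as a row-oriented reader might do (assumption).
ROW_OPEN = "<myPrefix1:ParentMember>"
ROW_CLOSE = "</myPrefix1:ParentMember>"
start = DOC.index(ROW_OPEN)
end = DOC.index(ROW_CLOSE) + len(ROW_CLOSE)
row_fragment = DOC[start:end]


def parse_error(fragment):
    """Return the parser's error message, or None if the fragment parses."""
    try:
        ET.fromstring(fragment)
        return None
    except ET.ParseError as exc:
        return str(exc)


# The isolated row no longer carries the xmlns declarations, so parsing fails.
error = parse_error(row_fragment)
print(error)
```

The fragment is well-formed only in the context of its ancestors, which is why moving the declarations down to the row element makes the problem disappear.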

I know this was a bug in previous versions, but I'm wondering whether it was also fixed for the case where the rowValidationXSDPath option is enabled.

Thank you in advance for your help.

Accepted Solution

Ben_Spark
New Contributor III

Hi,

Sorry for the late response; I got busy looking for a permanent solution to this problem.

In the end, we are giving up on the rowValidationXSDPath option. It does not work when prefixes/namespaces are declared at the ancestor level.

Thank you anyway for your help and support.


4 REPLIES

Ben_Spark
New Contributor III

Hi Kaniz,

Thank you for your answer.

I'm aware of the article, and reading an XML file without the XSD is not an issue.

The problem is that I need to validate each row against an XSD using rowValidationXSDPath, which does not support prefixes at the row level when the namespace declarations are at the ancestor level.
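One way to apply the workaround described in the question (declaring the namespaces at the rowTag level) without editing files by hand is to pre-process the XML text. This is a minimal sketch using only Python's standard library. The function name push_down_namespaces and the sample document are hypothetical, and whether Spark-XML then validates the rewritten file is not verified here.

```python
import re
import xml.etree.ElementTree as ET

# Matches xmlns="..." and xmlns:prefix="..." attributes inside a start tag.
NS_DECL = re.compile(r"""(?<![\w:.-])xmlns(?::[\w.-]+)?\s*=\s*(?:"[^"]*"|'[^']*')""")


def push_down_namespaces(xml_text, root_tag, row_tag):
    """Copy the namespace declarations of the root start tag onto every row start tag.

    Limitations of this sketch: attribute values must not contain '>', and the
    row tags must not already declare the same prefixes.
    """
    root_open = re.search(r"<%s\b[^>]*>" % re.escape(root_tag), xml_text)
    if root_open is None:
        raise ValueError("root tag %r not found" % root_tag)
    declarations = NS_DECL.findall(root_open.group(0))
    if not declarations:
        return xml_text
    injected = " " + " ".join(declarations)
    # The lookahead ensures the full tag name matches; closing tags start with '</'.
    row_open = re.compile(r"<%s(?=[\s/>])" % re.escape(row_tag))
    return row_open.sub(lambda match: match.group(0) + injected, xml_text)


# Hypothetical input; the namespace URIs are placeholders.
DOC = """<RootTag xmlns:myPrefix1="http://example.com/ns1" xmlns:myPrefix2="http://example.com/ns2">
  <myPrefix1:ParentMember>
    <myPrefix2:ChildMember>value</myPrefix2:ChildMember>
  </myPrefix1:ParentMember>
</RootTag>"""

fixed = push_down_namespaces(DOC, "RootTag", "myPrefix1:ParentMember")

# The row now parses on its own because it carries its own declarations.
ROW_CLOSE = "</myPrefix1:ParentMember>"
start = fixed.index("<myPrefix1:ParentMember")
end = fixed.index(ROW_CLOSE) + len(ROW_CLOSE)
row = ET.fromstring(fixed[start:end])
```

Redeclaring a prefix on a descendant is legal XML, so the rewritten document as a whole stays well-formed. A regular expression is used instead of an XML serializer so that the original prefixes are left untouched.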

Dan_Z
Databricks Employee

Hey @Ben Ben, Spark-XML is not a package maintained by Databricks. It seems the community doesn't have any input here. I'd suggest reaching out to the package maintainers by opening an issue on their GitHub: https://github.com/databricks/spark-xml.

Ben_Spark
New Contributor III

Thank you, Dan, for your feedback and proposal.

For now I will parse the XML file differently. I really have no time to raise a ticket and follow up on it.
