04-14-2022 03:11 AM
I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option.
When I parse an XML file using the "rowValidationXSDPath" option, the parser doesn't recognize the prefixes/namespaces declared at the root level.
For this to work, I have to move the namespace declarations down to the level of the rowTag element.
Example
<RootTag xmlns:myPrefix1="http:....." xmlns:myPrefix2="http:....." ... >
  <myPrefix1:ParentMember>
    <myPrefix2:ChildMember>
    ............
    </myPrefix2:ChildMember>
  </myPrefix1:ParentMember>
</RootTag>
Reading the above structure using the rowValidationXSDPath option ends with the following error: the prefix "myPrefix2" for element "myPrefix2:ChildMember" is not bound.
I know this was a bug in previous versions, but I'm wondering whether it was also fixed for the case where the rowValidationXSDPath option is enabled.
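For reference, the workaround of moving the namespace declarations down to the row element could be sketched as a preprocessing step in plain Python. This is only an illustration using the standard library, not part of Spark-XML: the function name rows_with_local_namespaces, the sample document, and the urn:example:... URIs are made up for the example, and it assumes the row tag is written with a prefix.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: namespaces are declared only on the root element.
SAMPLE = """<RootTag xmlns:myPrefix1="urn:example:one" xmlns:myPrefix2="urn:example:two">
  <myPrefix1:ParentMember>
    <myPrefix2:ChildMember>value</myPrefix2:ChildMember>
  </myPrefix1:ParentMember>
</RootTag>"""


def rows_with_local_namespaces(xml_text, row_tag):
    """Return each row element as a standalone string that declares its own namespaces.

    Assumes row_tag is given in "prefix:LocalName" form.
    """
    data = xml_text.encode("utf-8")

    # Collect the prefix -> URI bindings declared in the document and register
    # them so ElementTree reuses the original prefixes when serializing.
    # Note: register_namespace changes a process-wide registry.
    prefixes = {}
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        prefixes[prefix] = uri
        ET.register_namespace(prefix, uri)

    # Translate "prefix:LocalName" into ElementTree's "{uri}LocalName" form.
    prefix, _, local = row_tag.partition(":")
    qualified = "{%s}%s" % (prefixes[prefix], local)

    root = ET.fromstring(data)
    rows = []
    for row in root.iter(qualified):
        row.tail = None  # drop the whitespace that followed the element
        # Serializing a sub-element on its own makes ElementTree emit the
        # xmlns declarations it needs directly on that element.
        rows.append(ET.tostring(row, encoding="unicode"))
    return rows
```

Calling rows_with_local_namespaces(SAMPLE, "myPrefix1:ParentMember") returns one string per row, each carrying its own xmlns:myPrefix1 and xmlns:myPrefix2 declarations, which could then be written back out before reading the file with Spark-XML.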
Thank you in advance for your help.
05-11-2022 06:34 AM
Hi,
Sorry for the late response; I got busy looking for a permanent solution to this problem.
In the end, we are giving up on the rowValidationXSDPath option. It does not work when the namespace prefixes are declared at an ancestor level.
Thank you anyway for your help and support.
04-18-2022 04:25 AM
Hi @Ben Ben , This article describes how to read and write an XML file as an Apache Spark data source.
04-18-2022 04:40 AM
Hi Kaniz
Thank you for your answer.
I'm aware of the article, and reading an XML file without the XSD is not an issue.
The problem is that I need to validate my "row" against an XSD using rowValidationXSDPath, which does not support prefixes at the row level when the namespace is declared at an ancestor level.
04-18-2022 08:43 AM
Hi @Ben Ben , You can validate individual rows against an XSD schema using rowValidationXSDPath. You can use the utility com.databricks.spark.xml.util.XSDToSchema to extract a Spark DataFrame schema from some XSD files.
It supports only simple, complex, and sequence types and only basic XSD functionality, and it is experimental.
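As a rough illustration of the validation option, a read with rowValidationXSDPath might look like the sketch below. The helper names xml_reader_options and read_validated_xml and all paths are hypothetical; the sketch assumes an existing SparkSession with the spark-xml package attached, and it does not change the namespace behaviour discussed in this thread.

```python
def xml_reader_options(row_tag, xsd_path):
    """Build the Spark-XML reader options for per-row XSD validation."""
    return {
        "rowTag": row_tag,                 # element treated as one row
        "rowValidationXSDPath": xsd_path,  # XSD each row is validated against
    }


def read_validated_xml(spark, xml_path, row_tag, xsd_path):
    """Read xml_path with Spark-XML, validating every row against xsd_path.

    `spark` is assumed to be an existing SparkSession with the spark-xml
    package available on the cluster.
    """
    return (
        spark.read.format("com.databricks.spark.xml")
        .options(**xml_reader_options(row_tag, xsd_path))
        .load(xml_path)
    )
```

A call such as read_validated_xml(spark, "/path/to/file.xml", "myPrefix1:ParentMember", "/path/to/row.xsd") would return a DataFrame; both paths here are placeholders.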
If you wish to submit a feature request, please go ahead and share your ideas. We would love to hear them.
04-26-2022 03:27 AM
Hi @Ben Ben , Would you like to raise a feature request?
05-04-2022 09:47 AM
Hey @Ben Ben , so Spark-XML is not a package maintained by Databricks. It seems like the community doesn't have any inputs here. I'd suggest you reach out to the package maintainers via an Issue on their GitHub here: https://github.com/databricks/spark-xml.
05-11-2022 06:38 AM
Thank you, Dan, for your feedback and proposal.
For now, I will parse the XML file differently. I really have no time to raise a ticket and follow up on it.
05-11-2022 04:34 AM
Hi @Ben Ben , Just a friendly follow-up. Do you still need help, or did @Dan Zafar 's response help you find the solution? Please let us know.
05-13-2022 06:35 AM
Hi @Ben Ben , Thank you for providing the solution here.