Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ben_Spark
by New Contributor III
  • 7197 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks Spark XML parser: support for namespaces declared at the ancestor level

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option. When I parse an XML file using the "rowValidationXSDPath" option, the parser can't recognize the prefixes/namespaces declared at the root level. For this to...

Latest Reply
Ben_Spark
New Contributor III
  • 2 kudos

Hi, sorry for the late response; I got busy looking for a permanent solution to this problem. In the end we are giving up on the XSDPath parser. This option does not work when namespace prefixes are declared at the ancestor level. Thank you anyway for ...

3 More Replies
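
The symptom described in this thread (prefixes declared on an ancestor are not recognized when rows are validated) can be illustrated with a small sketch. It uses only Python's standard library, not Spark-XML, and the document, the namespace URI, and the element names are all hypothetical. Whether "rowValidationXSDPath" fails for exactly this reason is an assumption; the sketch only shows that a row cut out of its document as text loses a namespace declaration made on the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the "ns" prefix is declared once, on the root element.
DOC = (
    '<ns:books xmlns:ns="http://example.com/books">'
    "<ns:book><ns:title>A</ns:title></ns:book>"
    "<ns:book><ns:title>B</ns:title></ns:book>"
    "</ns:books>"
)

# Parsed as a whole, the prefix resolves because the declaration is in scope.
root = ET.fromstring(DOC)
titles = [t.text for t in root.iter("{http://example.com/books}title")]

# A single row taken out as raw text no longer carries the root's xmlns.
ROW_TEXT = "<ns:book><ns:title>A</ns:title></ns:book>"


def parses(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML on its own."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


row_alone_ok = parses(ROW_TEXT)

# Re-declaring the namespace on the row element makes it self-contained.
ROW_FIXED = ROW_TEXT.replace(
    "<ns:book>", '<ns:book xmlns:ns="http://example.com/books">', 1
)
row_fixed_ok = parses(ROW_FIXED)
```

If the cause is the same in a real pipeline, declaring the namespace on each row element (or preprocessing rows to add it) would be one thing to try, though the thread does not confirm that this works with the XSD option.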
SreedharVengala
by New Contributor III
  • 11674 Views
  • 2 replies
  • 1 kudos

Parsing deeply nested XML in Databricks

Hi guys, can someone point me to libraries for parsing XML files in Databricks using Python / Scala? Any links to blogs / documentation will be helpful. I looked into https://docs.databricks.com/data/data-sources/xml.html. I want to parse XSDs; it seems this is exp...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Sreedhar Vengala - I heard back from the team. As you noted, the feature is still experimental and not supported at this time. I would like to assure you that the team is aware of this. I have no information about a time frame to make this a support...

1 More Replies
User16783853501
by Databricks Employee
  • 1166 Views
  • 1 replies
  • 0 kudos

What types of files does Auto Loader support for streaming ingestion? I see good support for CSV and JSON; how can I ingest files like XML, Avro, Parquet, etc.? Would XML rely on Spark-XML?


Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Please raise a feature request via the ideas portal for XML support in Auto Loader. As a workaround, you could look at reading this with wholeTextFiles (which loads the data into a PairRDD with one record per input file) and parsing it with from_xml from ...
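
The shape of that workaround (one record per input file, then a parse step per record) can be sketched without a cluster. This is a plain-Python stand-in using only the standard library, not Spark: `whole_text_files` and `parse_record` are hypothetical helpers that mimic the two steps, and the file names and element names are made up for illustration. In Spark, the first step would be `wholeTextFiles` and the second the XML parsing function the reply refers to.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def whole_text_files(directory):
    """Yield (path, content) pairs, one pair per XML file in the directory."""
    for path in sorted(Path(directory).glob("*.xml")):
        yield str(path), path.read_text(encoding="utf-8")


def parse_record(content):
    """Parse one whole-file XML string into a flat dict of child tag -> text."""
    root = ET.fromstring(content)
    return {child.tag: child.text for child in root}


# Hypothetical input: two small files, each holding a single <order> record.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.xml").write_text(
        "<order><id>1</id><item>pen</item></order>", encoding="utf-8"
    )
    Path(tmp, "b.xml").write_text(
        "<order><id>2</id><item>ink</item></order>", encoding="utf-8"
    )
    # Step 1: one (path, content) record per file. Step 2: parse each record.
    rows = [parse_record(content) for _, content in whole_text_files(tmp)]
```

Reading each file whole keeps every document intact for the parser, at the cost of holding a complete file in memory per record, so this pattern suits many small files better than a few very large ones.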

Srikanth_Gupta_
by Databricks Employee
  • 2560 Views
  • 4 replies
  • 1 kudos
Latest Reply
sean_owen
Databricks Employee
  • 1 kudos

Note that you will need to install the spark-xml library to make this work: https://github.com/databricks/spark-xml. For example, you can create a Library in the workspace that references com.databricks:spark-xml_2.12:0.12.0 and then attach it to a clu...

3 More Replies
FrancisLau1897
by New Contributor
  • 21227 Views
  • 7 replies
  • 0 kudos

Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML

Both the following commands fail:

df1 = sqlContext.read.format("xml").load(loadPath)
df2 = sqlContext.read.format("com.databricks.spark.xml").load(loadPath)

with the following error message: java.lang.ClassNotFoundException: Failed to find data sour...

Latest Reply
alvaroagx
New Contributor II
  • 0 kudos

Hi, if you are getting this error, it is because the com.sun.xml.bind library is now obsolete. You need to add the org.jvnet.jaxb2.maven package as a library from Maven Central and attach it to a cluster. Then you will be able to use xml...

6 More Replies