<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28650#M20427</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;If you are getting this error, it is because the com.sun.xml.bind library is now obsolete.&lt;/P&gt;
&lt;P&gt;You need to install the org.jvnet.jaxb2.maven package as a library from Maven Central and attach it to your cluster.&lt;/P&gt;
&lt;P&gt;Then you will be able to use spark-xml.&lt;/P&gt;
&lt;P&gt;For further reference, see this page: &lt;A href="https://datamajor.net/how-to-convert-dataframes-into-xml-files-on-spark/" target="test_blank"&gt;https://datamajor.net/how-to-convert-dataframes-into-xml-files-on-spark/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Please let me know if you have more issues with this library.&lt;/P&gt;</description>
    <pubDate>Wed, 09 Jun 2021 23:39:44 GMT</pubDate>
    <dc:creator>alvaroagx</dc:creator>
    <dc:date>2021-06-09T23:39:44Z</dc:date>
    <item>
      <title>Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28643#M20420</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Both of the following commands fail:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;df1 = sqlContext.read.format("xml").load(loadPath)
df2 = sqlContext.read.format("com.databricks.spark.xml").load(loadPath)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;with the following error message:&lt;/P&gt;
&lt;P&gt;java.lang.ClassNotFoundException: Failed to find data source: xml. Please find packages at &lt;A href="http://spark.apache.org/third-party-projects.html" target="test_blank"&gt;http://spark.apache.org/third-party-projects.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;I read several articles on this forum, but none had a resolution. I thought Databricks already had the XML library installed. This is on a DBC cluster with "4.2 (includes Apache Spark 2.3.1, Scala 2.11)".&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Aug 2018 17:35:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28643#M20420</guid>
      <dc:creator>FrancisLau1897</dc:creator>
      <dc:date>2018-08-03T17:35:22Z</dc:date>
    </item>
    <item>
      <title>Re: Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28644#M20421</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;You must add the spark-xml library to your cluster. No, it is not preinstalled in any runtime.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 30 Dec 2018 20:28:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28644#M20421</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2018-12-30T20:28:58Z</dc:date>
    </item>
    <item>
      <title>Re: Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28645#M20422</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I've installed the spark-xml library using the Databricks Spark Packages interface, and it shows as attached to the cluster, but I get the same error (even after restarting the cluster). Is there something I'm missing when installing the library?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Jan 2019 05:46:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28645#M20422</guid>
      <dc:creator>msft_Ted</dc:creator>
      <dc:date>2019-01-04T05:46:39Z</dc:date>
    </item>
    <item>
      <title>Re: Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28646#M20423</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hm, it seems to work for me. I attached com.databricks:spark-xml:0.5.0 to a new runtime 5.1 cluster and successfully executed a command like the one below. Did the library attach successfully? That should be all there is to it.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;display(spark.read.option("rowTag", "book").format("xml").load("/dbfs/tmp/sean.owen/books.xml"))&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Jan 2019 15:11:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28646#M20423</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2019-01-04T15:11:13Z</dc:date>
    </item>
    <item>
      <title>Re: Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28647#M20424</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;That was the issue: the Spark Packages version is 0.1.1, while the Maven Central version is 0.5.0. Switching to the Maven package made the whole thing work.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Jan 2019 22:09:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28647#M20424</guid>
      <dc:creator>msft_Ted</dc:creator>
      <dc:date>2019-01-04T22:09:24Z</dc:date>
    </item>
    <item>
      <title>Re: Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28648#M20425</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Putting this as a top-level comment. Credit to &lt;A href="https://users/28712/srowen.html" target="_blank"&gt;@srowen&lt;/A&gt; for the answer: use the Maven Central library (version 0.5.0) instead of the Spark Packages version (0.1.1).&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Jan 2019 22:11:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28648#M20425</guid>
      <dc:creator>msft_Ted</dc:creator>
      <dc:date>2019-01-04T22:11:08Z</dc:date>
    </item>
    <item>
      <title>Re: Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28649#M20426</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Adding further details to the existing comments: the latest package versions can be found on Maven Central.&lt;/P&gt;
&lt;P&gt;Example: com.databricks:spark-xml_2.12:0.9.0 is the latest as of today. Here 2.12 is the Scala version the artifact is built for, so choose the jar that matches your cluster's configuration.&lt;/P&gt;
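&lt;P&gt;To illustrate how such a coordinate is put together, here is a minimal Python sketch. The helper name spark_xml_coordinate is hypothetical (it is not part of spark-xml), and it assumes the artifact suffix is the Scala major.minor version, as in the example above.&lt;/P&gt;

```python
# Hypothetical helper for illustration only; not part of spark-xml.
def spark_xml_coordinate(scala_version, library_version):
    """Build a Maven coordinate of the form group:artifact_scala:version."""
    # Assumption: the artifact suffix is the Scala major.minor version,
    # so a full version such as "2.12.10" is shortened to "2.12".
    scala_binary = ".".join(scala_version.split(".")[:2])
    return "com.databricks:spark-xml_{}:{}".format(scala_binary, library_version)


# Coordinate matching the example in this post.
print(spark_xml_coordinate("2.12.10", "0.9.0"))
```

&lt;P&gt;The resulting string is the kind of Maven coordinate you would attach to the cluster as a library.&lt;/P&gt;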
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 19 May 2020 15:34:01 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28649#M20426</guid>
      <dc:creator>VISWANATHANRENG</dc:creator>
      <dc:date>2020-05-19T15:34:01Z</dc:date>
    </item>
    <item>
      <title>Re: Getting "java.lang.ClassNotFoundException: Failed to find data source: xml" error when loading XML</title>
      <link>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28650#M20427</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;If you are getting this error, it is because the com.sun.xml.bind library is now obsolete.&lt;/P&gt;
&lt;P&gt;You need to install the org.jvnet.jaxb2.maven package as a library from Maven Central and attach it to your cluster.&lt;/P&gt;
&lt;P&gt;Then you will be able to use spark-xml.&lt;/P&gt;
&lt;P&gt;For further reference, see this page: &lt;A href="https://datamajor.net/how-to-convert-dataframes-into-xml-files-on-spark/" target="test_blank"&gt;https://datamajor.net/how-to-convert-dataframes-into-xml-files-on-spark/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Please let me know if you have more issues with this library.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jun 2021 23:39:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/getting-quot-java-lang-classnotfoundexception-failed-to-find/m-p/28650#M20427</guid>
      <dc:creator>alvaroagx</dc:creator>
      <dc:date>2021-06-09T23:39:44Z</dc:date>
    </item>
  </channel>
</rss>

