<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: spark-xml not working with Databricks Connect and Pyspark in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-xml-not-working-with-databricks-connect-and-pyspark/m-p/13804#M8409</link>
    <description>&lt;P&gt;Are you adding spark-xml as a dependency 'locally'? You're doing it right, and the name of the data source doesn't matter; both are correct. You do not need to install JARs manually.&lt;/P&gt;</description>
    <pubDate>Sun, 10 Oct 2021 16:26:09 GMT</pubDate>
    <dc:creator>sean_owen</dc:creator>
    <dc:date>2021-10-10T16:26:09Z</dc:date>
    <item>
      <title>spark-xml not working with Databricks Connect and Pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-xml-not-working-with-databricks-connect-and-pyspark/m-p/13802#M8407</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I currently have a cluster configured in Databricks with spark-xml (version com.databricks:spark-xml_2.12:0.13.0), which was installed using Maven. The spark-xml library itself works fine with Pyspark when I am using it in a notebook within the Databricks web app.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I often use Databricks Connect with Pyspark for development though, more specifically using VS Code. Again, Databricks Connect works fine when I am performing commands on the cluster such as spark.read.csv.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, when I try to run my spark-xml code from within VS Code, I receive the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;java.lang.ClassNotFoundException: Failed to find data source: xml. Please find packages at &lt;A href="http://spark.apache.org/third-party-projects.html" target="_blank"&gt;http://spark.apache.org/third-party-projects.html&lt;/A&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I have tried both of the read formats below with no luck. I have also tried placing the spark-xml JAR file that matches the version in Databricks within my Pyspark jars, but again it did not work.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = spark.read.format('xml')
&amp;nbsp;
df = spark.read.format('com.databricks.spark.xml')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Any ideas on how I can get my local Databricks Connect venv to recognise the xml data source would be much appreciated!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 10 Oct 2021 00:35:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-xml-not-working-with-databricks-connect-and-pyspark/m-p/13802#M8407</guid>
      <dc:creator>brendan-b</dc:creator>
      <dc:date>2021-10-10T00:35:37Z</dc:date>
    </item>
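    <!-- Editor's note: a minimal sketch of one way to make the same spark-xml package
         visible to the local PySpark client used for development, assuming a plain
         SparkSession builder; whether Databricks Connect applies this setting on the
         client side is an assumption, and the rowTag and path below are illustrative only.

         from pyspark.sql import SparkSession

         spark = (
             SparkSession.builder
             # Pull the same Maven coordinate that is installed on the cluster, so the
             # local process can also resolve the 'xml' data source class.
             .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.13.0")
             .getOrCreate()
         )

         df = (
             spark.read.format("xml")
             .option("rowTag", "item")       # XML element treated as one row
             .load("dbfs:/tmp/example.xml")  # illustrative path, not from the thread
         )
    -->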
    <item>
      <title>Re: spark-xml not working with Databricks Connect and Pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-xml-not-working-with-databricks-connect-and-pyspark/m-p/13804#M8409</link>
      <description>&lt;P&gt;Are you adding spark-xml as a dependency 'locally'? You're doing it right, and the name of the data source doesn't matter; both are correct. You do not need to install JARs manually.&lt;/P&gt;</description>
      <pubDate>Sun, 10 Oct 2021 16:26:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-xml-not-working-with-databricks-connect-and-pyspark/m-p/13804#M8409</guid>
      <dc:creator>sean_owen</dc:creator>
      <dc:date>2021-10-10T16:26:09Z</dc:date>
    </item>
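    <!-- Editor's note: a small sketch of the point in the reply above that either data
         source name works once the library is on the classpath; the rowTag and file name
         are illustrative and assume an existing 'spark' session.

         # Short name registered by spark-xml:
         df_short = spark.read.format("xml").option("rowTag", "book").load("books.xml")

         # Fully qualified name; resolves to the same reader:
         df_full = (
             spark.read.format("com.databricks.spark.xml")
             .option("rowTag", "book")
             .load("books.xml")
         )
    -->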
    <item>
      <title>Re: spark-xml not working with Databricks Connect and Pyspark</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-xml-not-working-with-databricks-connect-and-pyspark/m-p/13805#M8410</link>
      <description>&lt;P&gt;@Sean Owen I do not believe I have. Do you have any documentation on how to install spark-xml locally? I have tried the following with no luck. Is this what you are referring to?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;$PYSPARK_HOME/bin/pyspark --packages com.databricks:spark-xml_2.12:0.13.0&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Oct 2021 21:55:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-xml-not-working-with-databricks-connect-and-pyspark/m-p/13805#M8410</guid>
      <dc:creator>brendan-b</dc:creator>
      <dc:date>2021-10-10T21:55:33Z</dc:date>
    </item>
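    <!-- Editor's note: a hedged sketch related to installing the library locally for
         Databricks Connect. 'databricks-connect get-jar-dir' is a documented command that
         prints the directory of Spark jars the local client uses; whether dropping a
         matching spark-xml_2.12-0.13.0.jar (plus any of its dependencies) into that
         directory resolves the ClassNotFoundException here is an assumption.

         import subprocess

         # Ask the Databricks Connect CLI where its local Spark jars live.
         jar_dir = subprocess.check_output(
             ["databricks-connect", "get-jar-dir"], text=True
         ).strip()
         print(jar_dir)  # copy the matching spark-xml JAR into this directory
    -->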
  </channel>
</rss>

