<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml ? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12536#M7336</link>
    <description>&lt;P&gt;Please install spark-xml from Maven. Since it comes from Maven, you need to install it on the cluster you are using via the cluster settings (alternatively via the API or CLI).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://mvnrepository.com/artifact/com.databricks/spark-xml" target="test_blank"&gt;https://mvnrepository.com/artifact/com.databricks/spark-xml&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 26 Jul 2022 12:51:39 GMT</pubDate>
    <dc:creator>Hubert-Dudek</dc:creator>
    <dc:date>2022-07-26T12:51:39Z</dc:date>
    <item>
      <title>Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml ?</title>
      <link>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12533#M7333</link>
      <description>&lt;P&gt;When writing unit tests with unittest / pytest in PySpark, reading mock data sources with built-in formats like CSV or JSON (spark.read.format("json")) works just fine.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But reading XML files with spark.read.format("com.databricks.spark.xml") in a unit test does not work out of the box:&lt;/P&gt;&lt;P&gt;java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;NOTE: the unit tests do NOT run on a Databricks cluster, but locally against a Hadoop winutils directory.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there any way to implement this, or should I use one of Python's built-in XML libraries?&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jul 2022 09:30:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12533#M7333</guid>
      <dc:creator>Michael_Galli</dc:creator>
      <dc:date>2022-07-26T09:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml ?</title>
      <link>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12534#M7334</link>
      <description>&lt;P&gt;I suppose you run Spark locally? com.databricks.spark.xml is a library for Spark.&lt;/P&gt;&lt;P&gt;It is not installed by default, so you have to add it to your local Spark installation.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jul 2022 12:10:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12534#M7334</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-07-26T12:10:29Z</dc:date>
    </item>
    <item>
      <title>Re: Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml ?</title>
      <link>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12535#M7335</link>
      <description>&lt;P&gt;This is correct. The following worked for me:&lt;/P&gt;&lt;P&gt;SparkSession.builder.(..).config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.12.0")&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jul 2022 12:49:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12535#M7335</guid>
      <dc:creator>Michael_Galli</dc:creator>
      <dc:date>2022-07-26T12:49:58Z</dc:date>
    </item>
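    <!-- The config("spark.jars.packages", ...) call above can be sketched as a complete local test setup. This is a hedged sketch, not confirmed by the thread: it assumes pyspark is installed locally with a working JVM, that Maven is reachable to download the package at session start, and that the rowTag value ("book") and the fixture path (tests/fixtures/books.xml) are hypothetical placeholders for your own mock XML.

    ```python
    # Minimal sketch of a local SparkSession for unit tests that can read XML
    # via com.databricks.spark.xml. Assumes pyspark is installed and Maven is
    # reachable; rowTag and the fixture path are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")                # local mode, no Databricks cluster
        .appName("xml-unittest")
        # Resolves the spark-xml jar from Maven when the session starts.
        .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.12.0")
        .getOrCreate()
    )

    df = (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", "book")          # placeholder row tag for the mock XML
        .load("tests/fixtures/books.xml")  # hypothetical test fixture path
    )
    df.printSchema()
    ```

    In a pytest suite the session would typically be created once in a shared fixture, since spark.jars.packages only takes effect at session creation, not on an already-running session. -->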
    <item>
      <title>Re: Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml ?</title>
      <link>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12536#M7336</link>
      <description>&lt;P&gt;Please install spark-xml from Maven. Since it comes from Maven, you need to install it on the cluster you are using via the cluster settings (alternatively via the API or CLI).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://mvnrepository.com/artifact/com.databricks/spark-xml" target="test_blank"&gt;https://mvnrepository.com/artifact/com.databricks/spark-xml&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jul 2022 12:51:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12536#M7336</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-07-26T12:51:39Z</dc:date>
    </item>
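    <!-- The cluster-install route mentioned above (API or CLI) can also be scripted. A hedged sketch using the legacy Databricks CLI's libraries command; CLUSTER_ID is a placeholder, and the exact flags should be verified against your CLI version:

    ```shell
    # Install spark-xml from Maven onto a running cluster via the legacy
    # Databricks CLI. CLUSTER_ID is a placeholder; verify the flags with
    # `databricks libraries install --help` for your CLI version.
    databricks libraries install \
      --cluster-id CLUSTER_ID \
      --maven-coordinates com.databricks:spark-xml_2.12:0.12.0
    ```

    This applies only to a real Databricks cluster; for the local-SparkSession unit-test case in this thread it is not needed. -->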
    <item>
      <title>Re: Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml ?</title>
      <link>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12537#M7337</link>
      <description>&lt;P&gt;See above, I already found the solution. There is no cluster, only a local Spark session.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jul 2022 13:19:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unittest-in-pyspark-how-to-read-xml-with-maven-com-databricks/m-p/12537#M7337</guid>
      <dc:creator>Michael_Galli</dc:creator>
      <dc:date>2022-07-26T13:19:05Z</dc:date>
    </item>
  </channel>
</rss>

