Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unit tests in PySpark - how to read XML with Maven com.databricks.spark.xml?

Michael_Galli
Contributor III

When writing unit tests with unittest / pytest in PySpark, reading mock data sources with built-in formats like CSV and JSON (spark.read.format("json")) works just fine.
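
For context, a minimal local test setup of the kind described here might look like the sketch below (a pytest fixture; file name and the JSON record are illustrative):

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession for unit tests; no Databricks cluster involved
    return (SparkSession.builder
            .master("local[2]")
            .appName("unit-tests")
            .getOrCreate())

def test_read_json(spark, tmp_path):
    # Built-in formats such as JSON work without any extra packages
    path = tmp_path / "mock.json"
    path.write_text('{"id": 1, "name": "a"}\n')
    df = spark.read.format("json").load(str(path))
    assert df.count() == 1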

But when reading XMLs with spark.read.format("com.databricks.spark.xml") in a unit test, this does not work out of the box:

java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml.

NOTE: the unit tests do NOT run on a Databricks cluster, but locally, against a Hadoop winutils setup.

Is there any way to implement this, or should I use one of Python's built-in XML libraries?

1 ACCEPTED SOLUTION

This is correct. The following worked for me:

SparkSession.builder.(..).config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.12.0")
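
Written out as a complete sketch (the master setting, rowTag value, and file path are illustrative; the package version is the one from the post):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[2]")
         .appName("xml-unit-tests")
         # Pulls the spark-xml package from Maven when the session starts
         .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.12.0")
         .getOrCreate())

df = (spark.read.format("com.databricks.spark.xml")
      .option("rowTag", "record")   # rowTag must match the XML element that represents one row
      .load("tests/resources/mock.xml"))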


4 REPLIES

-werners-
Esteemed Contributor III

I suppose you run Spark locally? com.databricks.spark.xml is a library for Spark.

It is not installed by default, so you have to add it to your local Spark installation.

Hubert-Dudek
Esteemed Contributor III

Please install spark-xml from Maven. As it comes from Maven, you need to install it on the cluster you are using via the cluster's library settings (or alternatively via the API or CLI).

https://mvnrepository.com/artifact/com.databricks/spark-xml
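
For the cluster case, a rough sketch of installing the Maven coordinate through the Libraries REST API (workspace URL, token, and cluster ID are placeholders; none of this is needed for a purely local session):

import requests

# Placeholder values: replace with your workspace URL, token, and cluster ID
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"
payload = {
    "cluster_id": "<cluster-id>",
    "libraries": [
        {"maven": {"coordinates": "com.databricks:spark-xml_2.12:0.12.0"}}
    ],
}

# Installs the library on the running cluster
resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()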

See above, I already found the solution. There is no cluster, only a local Spark session.
