Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml ?

Michael_Galli
Contributor II

When writing unit tests with unittest / pytest in PySpark, reading mock data sources with built-in formats like CSV and JSON (spark.read.format("json")) works just fine.

But when reading XML files with spark.read.format("com.databricks.spark.xml") in a unit test, this does not work out of the box:

java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml.

NOTE: the unit tests do NOT run on a Databricks cluster, but locally against a Hadoop winutils directory.

Is there any way to make this work, or should I fall back to one of Python's built-in XML libraries?
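For context, the test setup looks roughly like this; the fixture name, rowTag value and file path are illustrative, not from the actual project:

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Plain local session: built-in sources such as csv and json work,
    # but com.databricks.spark.xml is not on the classpath.
    return (
        SparkSession.builder
        .master("local[2]")
        .appName("xml-unit-test")
        .getOrCreate()
    )

def test_read_xml(spark):
    # Fails with java.lang.ClassNotFoundException: Failed to find data source:
    # com.databricks.spark.xml, because the package is not installed locally.
    df = (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", "record")
        .load("tests/resources/sample.xml")
    )
    assert df.count() > 0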


4 REPLIES

-werners-
Esteemed Contributor III

I suppose you run Spark locally? com.databricks.spark.xml is an external library for Spark.

It is not installed by default, so you have to add it to your local Spark installation.

This is correct. The following worked for me:

SparkSession.builder.(..).config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.12.0")
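Put together, a minimal working sketch of the local session looks like this; the rowTag value and file path are assumptions, and the Scala suffix (here 2.12) has to match your local PySpark build:

from pyspark.sql import SparkSession

# Local session that pulls spark-xml from Maven on startup.
spark = (
    SparkSession.builder
    .master("local[2]")
    .appName("xml-unit-test")
    .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.12.0")
    .getOrCreate()
)

# rowTag and the path are placeholders for this sketch.
df = (
    spark.read.format("com.databricks.spark.xml")
    .option("rowTag", "record")
    .load("tests/resources/sample.xml")
)
df.show()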

Hubert-Dudek
Esteemed Contributor III

Please install spark-xml from Maven. Since it is a Maven package, you need to install it on the cluster you are using, via the cluster's library settings (alternatively via the API or CLI).

https://mvnrepository.com/artifact/com.databricks/spark-xml
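If a cluster were involved, the API route mentioned above would look roughly like this (Libraries API 2.0; host, token and cluster ID are placeholders, and it does not apply to the purely local setup described in the next reply):

import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

# Install the Maven coordinates on a running cluster via the Libraries API.
resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": "<cluster-id>",
        "libraries": [
            {"maven": {"coordinates": "com.databricks:spark-xml_2.12:0.12.0"}}
        ],
    },
)
resp.raise_for_status()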

See above, I already found the solution. There is no cluster involved, only a local Spark session.
