When writing unit tests for PySpark with unittest / pytest, reading mock data sources in built-in formats such as CSV or JSON (spark.read.format("json")) works just fine.
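For context, a minimal sketch of the kind of test that works, assuming a pytest fixture that builds a local SparkSession (the fixture name and file path are illustrative):

```python
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession for unit tests; no cluster required.
    session = (
        SparkSession.builder
        .master("local[2]")
        .appName("unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()

def test_read_json_mock(spark):
    # Built-in formats resolve without any extra packages.
    df = spark.read.format("json").load("tests/resources/mock.json")
    assert df.count() > 0
```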
But reading XML files with spark.read.format("com.databricks.spark.xml") in a unit test does not work out of the box:
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.xml.
NOTE: the unit tests do NOT run on a Databricks cluster; they run locally, with a Hadoop winutils setup.
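For reference, the failing call looks like this, reusing the fixture above (the XML path and rowTag value are illustrative):

```python
def test_read_xml_mock(spark):
    # Fails locally with the ClassNotFoundException above, because the
    # spark-xml data source is not on the classpath outside Databricks.
    df = (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", "row")  # rowTag is an assumption about the mock file
        .load("tests/resources/mock.xml")
    )
    assert df.count() > 0
```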
Is there any way to make this work, or should I fall back to one of Python's built-in XML libraries?