Error reading XML data using the spark-xml library
01-20-2025 02:48 AM
Hi, I would appreciate any help with an error I get when loading an XML file with the spark-xml library.
My environment:
Databricks Runtime 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)
Library: com.databricks:spark-xml_2.12:0.15.0
Running in a Databricks notebook.
When running this script:
I have checked, and the file does exist in the blob storage.
Can the library connect directly to the blob storage?
If so, what is the path format, and what is the best practice?
01-20-2025 04:37 AM
Hi @citizenX7042,
The error indicates an issue with the configuration value for fs.azure.account.key.
Can you test with the code below?
from pyspark.sql.functions import regexp_extract, input_file_name

# Set the storage account key
spark.conf.set("fs.azure.account.key.<your-storage-account-name>.dfs.core.windows.net", "<your-storage-account-key>")

# Define the file path
single_file = "abfss://external-sources@<your-storage-account-name>.dfs.core.windows.net/Bronze/Tribe_Report/20241210/visa-10079563/cards-11-15967860899208-10079563-20241210.xml"

# Load the single file
raw_df_single = (
    spark.read.format("com.databricks.spark.xml")  # XML format
    .option("rowTag", "Card")  # Specify the row tag for parsing
    .load(single_file)  # Load the single file
    .withColumn("@FileName", regexp_extract(input_file_name(), r"([^/]+)$", 1))  # Extract file name
)

# Show a preview of the data
raw_df_single.show()
01-20-2025 04:39 AM
01-29-2025 01:37 AM
Hi @Alberto_Umana, I am facing the same issue. Reading the XML file as text with spark.read.text() works, but reading it in XML format fails. I'm authenticating with a service principal (SPN), and the config is correct: I can read JSON files from the same folder, and I can read the XML file as text, as mentioned.
It also works if I use the mounted path to the file, but not when I use the abfss path.
Could the issue be that the spark-xml library cannot work directly with abfss?
I have the following installed on my cluster: com.databricks:spark-xml_2.12:0.15.0
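One possible explanation, not confirmed in this thread: spark-xml reads files through the RDD-based Hadoop API, and settings applied with spark.conf.set in a notebook are session scoped and not visible to that API. The sketch below builds the standard ABFS OAuth settings for a service principal and copies them into the Hadoop configuration. The helper names and arguments are hypothetical, and spark._jsc is an internal PySpark handle that may be unavailable on some cluster access modes.

```python
def spn_oauth_settings(account, client_id, client_secret, tenant_id):
    """Build the ABFS OAuth settings for one storage account.

    All four arguments are placeholders you supply yourself; nothing here
    is taken from the original posts.
    """
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def apply_to_hadoop_conf(spark, settings):
    """Copy the settings into the SparkContext-level Hadoop configuration.

    Assumption: spark._jsc is reachable on your cluster. It is an internal
    handle, not a public API.
    """
    hadoop_conf = spark._jsc.hadoopConfiguration()
    for key, value in settings.items():
        hadoop_conf.set(key, value)
```

In a notebook this would be called as apply_to_hadoop_conf(spark, spn_oauth_settings(...)) before the spark-xml read. An alternative that avoids the internal handle is to put the same keys, prefixed with spark.hadoop., into the cluster's Spark config.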
3 weeks ago
UPDATE:
It is now possible to read XML files directly, without the spark-xml library: https://docs.databricks.com/en/query/formats/xml.html
Make sure your Databricks Runtime is 14.3 or above, and remove the spark-xml Maven library from your cluster.
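As a minimal sketch of what the native reader looks like on Databricks Runtime 14.3 or above, assuming the same rowTag ("Card") used earlier in the thread: the format name changes from "com.databricks.spark.xml" to "xml". The helper names abfss_path and read_xml are hypothetical and only keep the example self-contained.

```python
def abfss_path(container, account, relative_path):
    """Build an abfss:// URI for a file in ADLS Gen2 (hypothetical helper)."""
    return (
        f"abfss://{container}@{account}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def read_xml(spark, path, row_tag="Card"):
    """Read XML with the built-in reader; no spark-xml Maven library needed.

    Assumption: the cluster runs Databricks Runtime 14.3 or above and the
    session is already authenticated against the storage account.
    """
    return (
        spark.read.format("xml")     # built-in XML data source
        .option("rowTag", row_tag)   # element that maps to one row
        .load(path)
    )
```

In a notebook, where spark is already defined, usage would look like df = read_xml(spark, abfss_path("external-sources", "<your-storage-account-name>", "Bronze/...")) followed by df.show(), with the placeholders filled in with your own values.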

