EXCEL_DATA_SOURCE_NOT_ENABLED Excel data source is not enabled in this cluster

der
Valued Contributor

I want to read an Excel xlsx file on DBR 17.3. On the Cluster the library dev.mauch:spark-excel_2.13:4.0.0_0.31.2 is installed. 

V1 Implementation works fine:

df = spark.read.format("dev.mauch.spark.excel").schema(schema).load(excel_file)
display(df)

V2 throws [EXCEL_DATA_SOURCE_NOT_ENABLED] Excel data source is not enabled in this cluster. SQLSTATE: 56038

df = spark.read.format("excel").schema(schema).load(excel_file)
display(df)

https://learn.microsoft.com/en-gb/azure/databricks/error-messages/error-classes#excel_data_source_no...

https://docs.databricks.com/aws/en/error-messages/excel-file-error-error-class

I wonder, why do we have specific excel error codes and how do we enable the excel data source? 

 

 

der
Valued Contributor

If I build the spark-excel library with another short name (example "excelv2"), everything works fine. 

https://github.com/nightscape/spark-excel/issues/896#issuecomment-3486861693

mmayorga
Databricks Employee
Databricks Employee

hi @der 

Thank you for reaching out with your question, very interesting behavior.

I attempted to reproduce your issue using DBR 17.3 LTS but was unable to install the specific library version “dev.mauch:spark-excel_2.13:4.0.0_0.31.2.” However, I installed the next available version, “dev.mauch:spark-excel_2.13:4.0.0_0.32.2-prerelease0.”

With this version, I was able to use the “spark.read.format(“excel”)” function without any issue.

spark.read.format("excel").load(excel_file)

Aside from the links you shared, I couldn’t find documentation explaining why the error occurred with the “Excel” format. It appears that the issue may have been resolved in this newer version, possibly due to a missing internal registration.

I recommend updating your dev.mauch:spark-excel library to the newer version.

Alternatively, you can handle Excel (xlsx) files using the Pandas API on Spark, with the read_excel function, then convert it to a spark dataframe using the code below; just ensure you have “openpyxl” installed (using pip install)

import pyspark.pandas as ps

df = ps.read_excel(excel_file)
sp_df = df.to_spark()
display(sp_df)

 I hope this helps and if it does, please Accept as Solution.
Thank you!

der
Valued Contributor

Hi @mmayorga 

Thank you for your response. I can install both version, but both fail.

My current Azure Databricks cluster configuration json:

{
    "cluster_name": "test spark-excel",
    "spark_version": "17.3.x-scala2.13",
    "node_type_id": "Standard_F4s",
    "autotermination_minutes": 60,
    "single_user_name": "abc@company.com",
    "data_security_mode": "DATA_SECURITY_MODE_DEDICATED",
    "runtime_engine": "STANDARD",
    "kind": "CLASSIC_PREVIEW",
    "is_single_node": true
}

 Package installation:

der_0-1762419380757.png

How is your setup?

der
Valued Contributor

@mmayorga 

Another thing is the specific error message.

https://learn.microsoft.com/en-gb/azure/databricks/error-messages/error-classes#excel_data_source_no...

I still not get it, why there is this specific "excel" error message?

if i run:

spark.read.format("test").load(excel_file)

 The error message is different:

Py4JJavaError: An error occurred while calling o495.load. : org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: test. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version. SQLSTATE: 42K02

mmayorga
Databricks Employee
Databricks Employee

hi @der 

First of all thank you for your patience and for providing more information about your case.

Use of ".format("excel")"

I replicated equally your cluster config in Azure. Without installing any library, I was able to run and load the xlsx file using ".format("excel")". Unfortunately the documentation does not mention about it, and at this point I'm not using "dev.mauch.spark.excel" library

mmayorga_0-1762445991717.png

Then I proceeded to check the "SparkUI" and identified within the "SQL / Dataframe Properties" for "spark.databricks.sql.excel.enabled" with value "true".

You can check this configuration by executing the following code:

spark.conf.get("spark.databricks.sql.excel.enabled")

Next I disabled it to test it:

spark.conf.set("spark.databricks.sql.excel.enabled","false")

With this configuration now I was able to replicate the issue:

mmayorga_1-1762446510138.png

With this said, please on your cluster, enable this configuration and try again to read the file

spark.conf.set("spark.databricks.sql.excel.enabled","true")

 Use of ".format("dev.mauch.spark.excel)""

Then I installed the "dev.mauch.spark.excel" and was also able to use it as well.

mmayorga_2-1762447091028.png

Conclusion

This behavior highlights that these two implementations are separated and that ".format("excel")" leverages "spark.databricks.sql.excel" instead of the expected "dev.mauch.spark.excel". I agree that is a bit confusing while updating API versions where both use "excel" as format value.

 I hope this helps and if it does, please Accept as Solution.

der
Valued Contributor

Hi @mmayorga 

Thank you very much for your insights. 

I am not sure if the format "excel" has something to do with open source spark. I think this is more a Databricks thing spark.databricks.sql.excel.enabled

Current state:

Databricks RuntimeSpark Excel
default of config spark.databricks.sql.excel.enabled
V2 Reader .format("excel")
16.4.x-scala2.12
com.crealytics:spark-excel_2.12:3.5.0_0.20.3
 
cannot be found
works
17.3.x-scala2.13dev.mauch:spark-excel_2.13:4.0.0_0.31.2false does not work

How to use V2 data source .format("excel") of dev.mauch:spark-excel_2.13:4.0.0_0.31.2 with 17.3.x-scala2.13?

 

der
Valued Contributor

I reached out to Databricks support and they fixed it with December 2025 maintenance update. Now the open source excel reader and the new build in should work.

https://learn.microsoft.com/en-gb/azure/databricks/query/formats/excel 

View solution in original post