Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

spark.databricks.sql.excel.enabled false at cluster level

der
Valued Contributor

The native Databricks Excel data source is now GA:
https://www.reddit.com/r/databricks/comments/1t4un82/native_excel_support_is_now_ga/
https://docs.databricks.com/aws/en/query/formats/excel

However, as long as it cannot read from a start address other than A1 without a full range being specified (i.e. with just .option("dataAddress", "Sheet1!E10")), we are sticking with the open-source solution. Unfortunately, it is not possible to deactivate the native Databricks Excel source at cluster level ☹️

Compute → <YOUR CLUSTER> → Configuration → Advanced → Spark:

spark.databricks.sql.excel.enabled false

spark.conf.get("spark.databricks.sql.excel.enabled")

'true'

If you set the same value at notebook level, it works:

spark.conf.set("spark.databricks.sql.excel.enabled", "false")
spark.conf.get("spark.databricks.sql.excel.enabled")

'false'

EXCEL_DATA_SOURCE_NOT_ENABLED Excel data source is not enabled in this cluster 

@mmayorga 

4 REPLIES

der
Valued Contributor

I forgot to add the job error we get when the setting is added to the cluster Spark configuration:

[MULTIPLE_EXCEL_DATA_SOURCE] Detected multiple Excel data sources with the name excel (dev.mauch.spark.excel.v2.ExcelDataSource, org.apache.spark.sql.execution.datasources.excel.ExcelFileFormat). Please specify the fully qualified class name or remove dev.mauch.spark.excel.v2.ExcelDataSource from the classpath. SQLSTATE: 42710
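The error message itself hints at one possible way around the name clash: address the open-source reader by its fully qualified class name instead of the short name excel. A minimal sketch, assuming spark.read.format accepts the class name printed in the error and that the open-source reader takes a start cell in dataAddress as described in the original post. read_excel_oss is a hypothetical helper name, and this has not been verified on a cluster.

```python
# Class name copied from the MULTIPLE_EXCEL_DATA_SOURCE error message above.
OSS_EXCEL_SOURCE = "dev.mauch.spark.excel.v2.ExcelDataSource"


def read_excel_oss(spark, path, data_address="Sheet1!E10"):
    """Read an Excel file via the open-source reader's fully qualified name.

    Hypothetical helper: `spark` is the active SparkSession, and
    `data_address` is passed through unchanged as the `dataAddress` option.
    """
    return (
        spark.read
        .format(OSS_EXCEL_SOURCE)  # fully qualified name, not the short name "excel"
        .option("dataAddress", data_address)
        .load(path)
    )
```

In a notebook this would be called as read_excel_oss(spark, path), with path pointing at your own workbook.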

nidhin
New Contributor II

Just wondering, why do you need it cluster-wide instead of at notebook or session level? Is there a specific job requirement? Notebook level works!

der
Valued Contributor

Defining it once on the cluster vs. defining it in every notebook session: it gives consistency across all notebooks. And why should it not work at cluster level?

szymon_dybczak
Esteemed Contributor III

Hi @der,

Most likely because spark.databricks.sql.excel.enabled is a Databricks SQL/session-level internal config, not a SparkConf setting.

This specific key appears to be read from the Spark SQL session config, so setting it after the notebook session starts works:

spark.conf.set("spark.databricks.sql.excel.enabled", "false")

But putting this in the cluster Spark config:

spark.databricks.sql.excel.enabled false

is ignored when Databricks initializes the SQL session.
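If the key really is only honored at session level, one way to keep things consistent is to wrap the call in a small shared helper, so each notebook or job needs just one line. A rough sketch, assuming spark is the active SparkSession. disable_native_excel is a hypothetical name, and the check simply reads the value back the same way the original post does.

```python
EXCEL_FLAG = "spark.databricks.sql.excel.enabled"


def disable_native_excel(spark):
    """Turn off the native Excel source for this session and verify it.

    Hypothetical helper: raises if the session still reports another value,
    so a silently ignored setting fails fast instead of surfacing later.
    """
    spark.conf.set(EXCEL_FLAG, "false")
    value = spark.conf.get(EXCEL_FLAG)
    if str(value).lower() != "false":
        raise RuntimeError(f"{EXCEL_FLAG} is still {value!r} in this session")
    return value
```

Calling disable_native_excel(spark) at the top of each notebook keeps the setting in one place, though it is still per session and not a true cluster-level switch.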