2 weeks ago
The native Databricks Excel data source is GA:
https://www.reddit.com/r/databricks/comments/1t4un82/native_excel_support_is_now_ga/
https://docs.databricks.com/aws/en/query/formats/excel
However, as long as it is not possible to read from a start address other than A1 without specifying a full range (for example .option("dataAddress", "Sheet1!E10")), we will stick with the open source solution. Unfortunately, it is not possible to deactivate the native Databricks Excel source at the cluster level ☹️
Compute → <YOUR CLUSTER> → Configuration → Advanced → Spark: spark.databricks.sql.excel.enabled false
spark.conf.get("spark.databricks.sql.excel.enabled")
'true'
If you set the same option at the notebook level, it works:
spark.conf.set("spark.databricks.sql.excel.enabled", "false")
spark.conf.get("spark.databricks.sql.excel.enabled")
'false'
EXCEL_DATA_SOURCE_NOT_ENABLED Excel data source is not enabled in this cluster
2 weeks ago
Hi @der ,
This is most likely because spark.databricks.sql.excel.enabled is a Databricks SQL/session-level internal config, not a SparkConf setting.
This specific key appears to be read from the Spark SQL session config, so setting it after the notebook session starts works:
spark.conf.set("spark.databricks.sql.excel.enabled", "false")
The same key in the cluster-level Spark config, on the other hand,
spark.databricks.sql.excel.enabled false
is ignored when Databricks initializes the SQL session.
2 weeks ago
I forgot to include the job error we get when we set the option in the cluster Spark configuration:
[MULTIPLE_EXCEL_DATA_SOURCE] Detected multiple Excel data sources with the name excel (dev.mauch.spark.excel.v2.ExcelDataSource, org.apache.spark.sql.execution.datasources.excel.ExcelFileFormat). Please specify the fully qualified class name or remove dev.mauch.spark.excel.v2.ExcelDataSource from the classpath. SQLSTATE: 42710
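The error message itself suggests one possible workaround: refer to the open source reader by its fully qualified class name instead of the short name excel. A minimal sketch of that idea follows. The helper name read_excel_oss and its default start cell are hypothetical, and I have not verified that this resolves the conflict on every runtime.

```python
# Class name copied from the MULTIPLE_EXCEL_DATA_SOURCE message above.
OSS_EXCEL_SOURCE = "dev.mauch.spark.excel.v2.ExcelDataSource"


def read_excel_oss(spark, path, data_address="Sheet1!E10"):
    """Read an Excel file via the open source reader, addressed by class name.

    `spark` is the notebook's SparkSession. The helper name and the default
    start cell are illustrative assumptions, not part of any library.
    """
    return (
        spark.read.format(OSS_EXCEL_SOURCE)   # avoids the ambiguous short name "excel"
        .option("dataAddress", data_address)  # start cell, as in the first post
        .load(path)
    )
```

In a notebook this would be called as read_excel_oss(spark, "<path to your .xlsx>").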
2 weeks ago
Just wondering, why do you need it cluster-wide instead of at the notebook or session level? Is there a specific job requirement? The notebook level works!
2 weeks ago
Defining it once on the cluster versus defining it in every notebook session: we want consistency across all notebooks. And why should it not work at the cluster level?
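Until the cluster-level setting is honored, one way to keep the session-level workaround consistent is a small shared helper that every notebook calls first. This is only a sketch: the function name disable_native_excel and the constant EXCEL_FLAG are hypothetical, and the config key is the one quoted in this thread.

```python
EXCEL_FLAG = "spark.databricks.sql.excel.enabled"  # key taken from this thread


def disable_native_excel(spark):
    """Turn the native Excel source off for the current session and verify it.

    `spark` is the notebook's SparkSession. Raises if the value did not stick,
    so a silently ignored setting is noticed right away.
    """
    spark.conf.set(EXCEL_FLAG, "false")
    current = spark.conf.get(EXCEL_FLAG)
    if current != "false":
        raise RuntimeError(f"{EXCEL_FLAG} is still {current!r}")
    return current
```

Keeping the key in one place also makes it easy to remove the workaround everywhere later.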
a week ago
Thx @szymon_dybczak
I hope we can remove this setting from our entire code base as soon as they implement the start address change.