How do you read an Excel spreadsheet with Databricks

LPlates — Thu, 10 Mar 2022 11:15:28 GMT

My cluster has Scala 2.12

I've installed Maven Library com.crealytics:spark-excel_2.12:0.14.0

I get this error:

java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC cell

when I try to run the following:

%python
excelFileName = "/mnt/dlstor/raw/sales/Budget vols FY 21-22 FY 22-23.xlsx"
excelWorksheetName = "'22-23'!A2"
isHeaderOn = "true"
isInferSchemaOn = "true"

df = spark.read.format("com.crealytics.spark.excel") \
    .option("header", isHeaderOn) \
    .option("inferSchema", isInferSchemaOn) \
    .option("treatEmptyValuesAsNulls", "true") \
    .option("dataAddress", excelWorksheetName) \
    .load(excelFileName)

display(df)

I couldn't find a similar post. Any suggestions would be gratefully received.

Regards

LPlates — Fri, 11 Mar 2022 15:39:34 GMT

Okay, I've 'resolved' my issue.

I changed isHeaderOn="true" to isHeaderOn="false" and was able to load the DataFrame.

Anonymous — Sat, 19 Nov 2022 10:12:13 GMT

Another approach that may help in your case is to use pandas to read the Excel file and then convert the pandas DataFrame to a PySpark DataFrame 🙂
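For anyone who wants to try the pandas route, here is a minimal sketch rather than a tested solution. It assumes the openpyxl package is installed on the cluster (pandas needs it for .xlsx files) and that the mount is reachable through the local /dbfs path. The helper names to_local_path and read_excel_via_pandas are made up for illustration and are not part of any library.

```python
import pandas as pd


def to_local_path(dbfs_path: str) -> str:
    """Map a DBFS path (e.g. /mnt/...) to the /dbfs/... local path that pandas can open."""
    if dbfs_path.startswith("dbfs:/"):
        return "/dbfs/" + dbfs_path[len("dbfs:/"):]
    if dbfs_path.startswith("/dbfs/"):
        return dbfs_path
    return "/dbfs" + dbfs_path


def read_excel_via_pandas(spark, dbfs_path, sheet_name, header_row=0):
    """Read one worksheet with pandas, then convert it to a Spark DataFrame."""
    # pandas goes through the local file API, so it needs the /dbfs/... form of the path.
    pdf = pd.read_excel(
        to_local_path(dbfs_path),
        sheet_name=sheet_name,
        header=header_row,  # 0-based index of the row holding the column names
    )
    # Column names read from numeric header cells are not strings; make them strings.
    pdf.columns = [str(c) for c in pdf.columns]
    return spark.createDataFrame(pdf)
```

With the file from the question, where the header sits in row 2 of sheet 22-23, the call would look something like read_excel_via_pandas(spark, "/mnt/dlstor/raw/sales/Budget vols FY 21-22 FY 22-23.xlsx", sheet_name="22-23", header_row=1). Note that pandas loads the whole sheet on the driver, so this is only suitable for files that fit in driver memory, and columns with mixed cell types may still need cleaning before the conversion succeeds.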