
How do you read an Excel spreadsheet with Databricks

LPlates
New Contributor III

My cluster has Scala 2.12

I've installed the Maven library com.crealytics:spark-excel_2.12:0.14.0

I get an error

java.lang.IllegalStateException: Cannot get a STRING value from a NUMERIC cell

when trying to execute the following

%python
excelFileName="/mnt/dlstor/raw/sales/Budget vols FY 21-22 FY 22-23.xlsx"
excelWorksheetName="'22-23'!A2"
isHeaderOn="true"
isInferSchemaOn="true"

df = spark.read.format("com.crealytics.spark.excel") \
        .option("header", isHeaderOn) \
        .option("inferSchema", isInferSchemaOn) \
        .option("treatEmptyValuesAsNulls", "true") \
        .option("dataAddress", excelWorksheetName) \
        .load(excelFileName)

display(df)

I couldn't find a similar post. Any suggestions would be gratefully received.

Regards

1 ACCEPTED SOLUTION

LPlates
New Contributor III

Okay, I've 'resolved' my issue

I changed the isHeaderOn="true" to isHeaderOn="false" and was able to load the dataframe.
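
With the header option turned off, the header row is presumably loaded as an ordinary data row and the columns get generic names. The following is a minimal sketch, not part of the original fix, of how the first row could be turned back into column names. The helpers `header_names` and `promote_first_row` are hypothetical, and the sketch assumes that `df.first()` returns the spreadsheet's header row, which Spark does not strictly guarantee.

```python
def header_names(first_row):
    """Turn the values of a header row into unique, non-empty column names."""
    names, used = [], set()
    for i, value in enumerate(first_row):
        # Numeric header cells (e.g. 2022.0) are converted to text;
        # empty cells get a positional placeholder.
        name = str(value).strip() if value is not None else ""
        if not name:
            name = f"col_{i}"
        candidate, n = name, 1
        while candidate in used:          # de-duplicate repeated headers
            candidate = f"{name}_{n}"
            n += 1
        used.add(candidate)
        names.append(candidate)
    return names


def promote_first_row(df):
    """Rename the columns of a Spark DataFrame that was loaded with header=false.

    Assumption: the first row returned by Spark is the header row.
    The header row itself stays in the data and still has to be filtered out.
    """
    first = df.first()
    if first is None:                     # empty DataFrame: nothing to rename
        return df
    return df.toDF(*header_names(list(first)))
```

In a notebook this might be used as `df = promote_first_row(df)` after the load, followed by a filter that drops the row holding the header values.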

3 REPLIES

Kaniz
Community Manager

Awesome!

Thank you for sharing the solution with us, @Mike Knox!

Anonymous
Not applicable

Another approach that may help in your case is to use pandas to read the Excel file and then convert the pandas DataFrame to a PySpark DataFrame 🙂
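
A rough sketch of that pandas route is below. It assumes the cluster exposes DBFS through the local `/dbfs` mount, that pandas has an Excel engine such as openpyxl installed, and that the header sits on the second row of the sheet (mirroring the `A2` data address in the question). The helpers `to_local_path` and `read_excel_via_pandas` are hypothetical names introduced for illustration.

```python
def to_local_path(path):
    """Map a DBFS path to the /dbfs local mount so pandas can open it.

    Assumption: the cluster provides the /dbfs mount.
    """
    if path.startswith("/dbfs/"):
        return path
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return "/dbfs/" + path.lstrip("/")


def read_excel_via_pandas(spark, path, sheet_name, header_row=0):
    """Read one worksheet with pandas and convert it to a Spark DataFrame."""
    import pandas as pd  # needs pandas plus an Excel engine such as openpyxl

    pdf = pd.read_excel(to_local_path(path), sheet_name=sheet_name, header=header_row)
    # Header cells may be numeric; use string column names throughout.
    pdf.columns = [str(c) for c in pdf.columns]
    return spark.createDataFrame(pdf)


# Possible usage in a notebook (header_row=1 means the second row holds the header):
# df = read_excel_via_pandas(
#     spark,
#     "/mnt/dlstor/raw/sales/Budget vols FY 21-22 FY 22-23.xlsx",
#     "22-23",
#     header_row=1,
# )
# display(df)
```

Because pandas loads the whole sheet on the driver, this is likely to suit small and medium workbooks better than very large ones, and columns with mixed cell types may need cleaning before the conversion to Spark succeeds.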
