cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

How to read excel file using databricks

PraveenSaini
New Contributor

0

I have a excel file as source file and i want to read data from excel file and convert data in data frame using databricks. I have already added maven dependence for Excel file format. when i a tring below code it is giving error .(Error: java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory) But file is available. Please help me on this code.

val df = spark.read.format("com.crealytics.spark.excel")

.option("location", "/FileStore/tables/Airline.xlsx")

.option("useHeader", "true")

.option("treatEmptyValuesAsNulls", "false")

.option("inferSchema", "false")

.option("addColorColumns", "false")

.load("/FileStore/tables/Airline.xlsx")

34 REPLIES 34

ashish1
New Contributor III

Hi,

You can try -

val df = spark.read
          .format("org.zuinnote.spark.office.excel")
          .option("read.spark.useHeader", "true")  
          .load("dbfs:/FileStore/tables/Airline.xlsx") 

@ashish@databricks.com.. Hi Ashish... I'm getting error java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.FileFormat.$init$(Lorg/apache/spark/sql/execution/datasources/FileFormat;) when I used your logic..

I have installed spark_hadoopoffice_ds_2_12_1_3_1.jar for the above class.. Please help

 I have the same problem, did you solve it?

For me the problem was the library was for scala 2.12 and my cluster was running scale 2.11 (should've been spark_hadoopoffice_ds_2_11_1_3_1)

Datab
New Contributor II

No thanks 

MounicaVemulapa
New Contributor III

@praveen.. Hi Praveen.. Did you get any workaround for this.. I'm facing the same issue.

Saphira
New Contributor II

There should be nothing wrong with your code, the same code (except for the file name) works for me. Can you confirm that using: dbutils.fs.ls("dbfs:/FileStore/tables") prints at least your FileInfo, and that your cluster shows status 'installed' for the library with maven coordinates "com.crealytics:spark-excel_2.11:0.11.1" ?

vikrantm
New Contributor II

also tried with suggested library, but installation of "com.crealytics:spark-excel_2.11:0.11.1" is failing continuously. (tried for latest versions also).

Saphira
New Contributor II

Does it give the error while installing : ?

AttributeError: module 'lib' has no attribute 'SSL_ST_INIT'

vikrantm
New Contributor II

Yes it gives below error while installing on cluster :

Library resolution failed. Cause: java.lang.RuntimeException: org.tukaani:xz download failed. at com.databricks.libraries.server.MavenInstaller.$anonfun$resolveDependencyPaths$5(MavenLibraryResolver.scala:253) at scala.collection.MapLike.getOrElse(MapLike.scala:131) at scala.collection.MapLike.getOrElse$(MapLike.scala:129) at

.

.

.

LeiSun1992
New Contributor II

(1) login in your databricks account, click clusters, then double click the cluster you want to work with.

(2) click Libraries , click Install New

(3) click Maven,In Coordinates , paste this line

 com.crealytics:spark-excel_2.11:0.12.2

to intall libs.

(4) After the lib installation is over, open a notebook to read excel file as follow code shows, it can work!

val sparkDF = spark.read.format("com.crealytics.spark.excel")
.option("useHeader", "true")
.option("inferSchema", "true")
.load("/mnt/lsTest/test.xlsx")<br>display(sparkDF.collect())
<br>

LeiSun1992
New Contributor II

The lib u use is out of date.

you have to install the latest lib.

(1) login in your databricks account, click clusters, then double click the cluster you want to work with.

(2) click Libraries , click Install New

(3) click Maven,In Coordinates , paste this line

com.crealytics:spark-excel_2.11:0.12.2

to intall libs.

SakthivelNachim
New Contributor II

This works as expected with com.crealytics:spark-excel_2.11:0.12.5 libray.

val df_excel= spark.read. format("com.crealytics.spark.excel"). option("useHeader", "true"). option("treatEmptyValuesAsNulls", "false"). option("inferSchema", "false"). option("addColorColumns", "false").load(file_path) display(df_excel)

PrekshaPunwani
New Contributor II

dropping the ".xlsx" from the file path worked for me!

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.