Databricks Community

PraveenSaini · ‎05-07-2019

I have an Excel file as a source and I want to read it into a DataFrame using Databricks. I have already added the Maven dependency for the Excel file format. When I try the code below, it gives an error (java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory)), but the file is available. Please help me with this code.

val df = spark.read.format("com.crealytics.spark.excel")
  .option("location", "/FileStore/tables/Airline.xlsx")
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "false")
  .option("inferSchema", "false")
  .option("addColorColumns", "false")
  .load("/FileStore/tables/Airline.xlsx")

ashish1 · ‎05-07-2019

Hi,

You can try -

val df = spark.read
          .format("org.zuinnote.spark.office.excel")
          .option("read.spark.useHeader", "true")  
          .load("dbfs:/FileStore/tables/Airline.xlsx")
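One difference between this snippet and the original question is how the path is written. On Databricks, Spark readers normally address DBFS with the dbfs:/ scheme, while code that goes through local file APIs sees the same files under the /dbfs mount. Whether that is what caused the FileNotFoundException above is only an assumption. The sketch below shows the two spellings side by side; DbfsPaths, toSparkUri and toLocalPath are hypothetical helper names, not part of any library.

```scala
object DbfsPaths {
  // Strip a leading "dbfs:" scheme or "/dbfs" mount prefix and any leading slashes,
  // leaving the path relative to the DBFS root.
  private def relative(path: String): String = {
    val rest =
      if (path.startsWith("dbfs:")) path.stripPrefix("dbfs:")
      else if (path == "/dbfs" || path.startsWith("/dbfs/")) path.drop("/dbfs".length)
      else path
    rest.dropWhile(_ == '/')
  }

  // Form typically passed to Spark readers, e.g. spark.read...load(...)
  def toSparkUri(path: String): String = "dbfs:/" + relative(path)

  // Form seen by local file APIs on the driver via the /dbfs mount
  def toLocalPath(path: String): String = "/dbfs/" + relative(path)
}
```

For example, DbfsPaths.toSparkUri("/FileStore/tables/Airline.xlsx") gives the same dbfs:/ path used in the load call above.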

MounicaVemulapa · ‎06-11-2019

@ashish@databricks.com Hi Ashish, I'm getting the error java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.FileFormat.$init$(Lorg/apache/spark/sql/execution/datasources/FileFormat;) when I use your code.

I have installed spark_hadoopoffice_ds_2_12_1_3_1.jar for the above class. Please help.

darkfenixx1 · ‎06-27-2019

I have the same problem. Did you solve it?

ttration · ‎09-24-2019

For me the problem was that the library was built for Scala 2.12 while my cluster was running Scala 2.11 (it should have been spark_hadoopoffice_ds_2_11_1_3_1).
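A quick way to catch this kind of mismatch is to compare the Scala suffix of the library's artifact name with the Scala version the cluster runs. The following is a minimal sketch, assuming the library follows the usual `artifact_<scalaBinaryVersion>` naming convention in its Maven coordinates; ScalaSuffixCheck and its methods are hypothetical helper names, not part of Spark or Databricks.

```scala
object ScalaSuffixCheck {
  // Reduce a full Scala version such as "2.11.12" to its binary version "2.11".
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // Extract the Scala suffix from a "group:artifact:version" coordinate,
  // e.g. "com.crealytics:spark-excel_2.11:0.12.2" -> Some("2.11").
  def artifactScalaSuffix(coordinate: String): Option[String] =
    coordinate.split(':') match {
      case Array(_, artifact, _*) if artifact.contains("_") =>
        Some(artifact.substring(artifact.lastIndexOf('_') + 1))
      case _ => None
    }

  // True when the artifact suffix equals the cluster's Scala binary version.
  def matchesCluster(coordinate: String, clusterScalaVersion: String): Boolean =
    artifactScalaSuffix(coordinate).contains(binaryVersion(clusterScalaVersion))
}
```

In a notebook, the running Scala version can be read from the standard library with scala.util.Properties.versionNumberString and passed as the second argument to matchesCluster.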

Datab · ‎09-14-2023

No thanks

MounicaVemulapa · ‎06-11-2019

@praveen Hi Praveen, did you find a workaround for this? I'm facing the same issue.

Saphira · ‎06-13-2019

There should be nothing wrong with your code; the same code (except for the file name) works for me. Can you confirm that dbutils.fs.ls("dbfs:/FileStore/tables") prints at least the FileInfo for your file, and that your cluster shows the status 'installed' for the library with Maven coordinates "com.crealytics:spark-excel_2.11:0.11.1"?

vikrantm · ‎09-24-2019

I also tried the suggested library, but installation of "com.crealytics:spark-excel_2.11:0.11.1" keeps failing (I tried the latest versions as well).

Saphira · ‎09-24-2019

Does it give this error while installing?

AttributeError: module 'lib' has no attribute 'SSL_ST_INIT'

vikrantm · ‎09-24-2019

Yes, it gives the error below while installing on the cluster:

Library resolution failed. Cause: java.lang.RuntimeException: org.tukaani:xz download failed. at com.databricks.libraries.server.MavenInstaller.$anonfun$resolveDependencyPaths$5(MavenLibraryResolver.scala:253) at scala.collection.MapLike.getOrElse(MapLike.scala:131) at scala.collection.MapLike.getOrElse$(MapLike.scala:129) at

LeiSun1992 · ‎11-18-2019

(1) Log in to your Databricks account, click Clusters, then double-click the cluster you want to work with.

(2) Click Libraries, then click Install New.

(3) Click Maven and, in Coordinates, paste this line to install the library:

com.crealytics:spark-excel_2.11:0.12.2

(4) After the library installation finishes, open a notebook and read the Excel file as the following code shows. It works!

val sparkDF = spark.read.format("com.crealytics.spark.excel")
  .option("useHeader", "true")
  .option("inferSchema", "true")
  .load("/mnt/lsTest/test.xlsx")
display(sparkDF)

LeiSun1992 · ‎11-18-2019

The library you are using is out of date. You have to install the latest version.

(1) Log in to your Databricks account, click Clusters, then double-click the cluster you want to work with.

(2) Click Libraries, then click Install New.

(3) Click Maven and, in Coordinates, paste this line to install the library:

com.crealytics:spark-excel_2.11:0.12.2

SakthivelNachim · ‎02-23-2020

This works as expected with the com.crealytics:spark-excel_2.11:0.12.5 library.

val df_excel = spark.read
  .format("com.crealytics.spark.excel")
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "false")
  .option("inferSchema", "false")
  .option("addColorColumns", "false")
  .load(file_path)
display(df_excel)