<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to read excel file using databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28162#M19985</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;You can try -&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val df = spark.read
          .format("org.zuinnote.spark.office.excel")
          .option("read.spark.useHeader", "true")  
          .load("dbfs:/FileStore/tables/Airline.xlsx") &lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 07 May 2019 14:46:39 GMT</pubDate>
    <dc:creator>ashish1</dc:creator>
    <dc:date>2019-05-07T14:46:39Z</dc:date>
    <item>
      <title>How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28161#M19984</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have an Excel file as a source and I want to read it into a DataFrame using Databricks. I have already added the Maven dependency for the Excel file format. When I try the code below, it gives an error (java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory)), but the file is available. Please help me with this code.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val df = spark.read.format("com.crealytics.spark.excel")
  .option("location", "/FileStore/tables/Airline.xlsx")
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "false")
  .option("inferSchema", "false")
  .option("addColorColumns", "false")
  .load("/FileStore/tables/Airline.xlsx")&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 May 2019 12:14:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28161#M19984</guid>
      <dc:creator>PraveenSaini</dc:creator>
      <dc:date>2019-05-07T12:14:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28162#M19985</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;You can try -&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val df = spark.read
          .format("org.zuinnote.spark.office.excel")
          .option("read.spark.useHeader", "true")  
          .load("dbfs:/FileStore/tables/Airline.xlsx") &lt;/CODE&gt;&lt;/PRE&gt; 
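&lt;P&gt;The original error names a bare path (/FileStore/...), while the snippet above uses the dbfs:/ scheme. One possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the file was looked up on the driver's local disk instead of in DBFS. The Python sketch below shows the two spellings Databricks commonly uses for the same DBFS file. The helper names are hypothetical and not part of any library.&lt;/P&gt;

```python
# Hypothetical helpers (not part of Databricks or Spark) that convert between
# the two common spellings of a DBFS location.


def to_spark_path(path):
    """Return the dbfs:/ URI form, the form passed to spark.read...load()."""
    if path.startswith("dbfs:/"):
        return path
    if path.startswith("/dbfs/"):
        # Strip the local mount prefix, keeping the leading slash of the rest.
        return "dbfs:" + path[len("/dbfs"):]
    return "dbfs:/" + path.lstrip("/")


def to_local_path(path):
    """Return the /dbfs/ mount form, the form used by local file APIs."""
    spark_path = to_spark_path(path)
    return "/dbfs/" + spark_path[len("dbfs:/"):]


print(to_spark_path("/FileStore/tables/Airline.xlsx"))
print(to_local_path("/FileStore/tables/Airline.xlsx"))
```

&lt;P&gt;to_spark_path gives the string to pass to .load(), and to_local_path gives the string to hand to code that opens files directly on the driver.&lt;/P&gt;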
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 May 2019 14:46:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28162#M19985</guid>
      <dc:creator>ashish1</dc:creator>
      <dc:date>2019-05-07T14:46:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28163#M19986</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@praveen Hi Praveen, did you find a workaround for this? I'm facing the same issue.&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2019 08:36:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28163#M19986</guid>
      <dc:creator>MounicaVemulapa</dc:creator>
      <dc:date>2019-06-11T08:36:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28164#M19987</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@ashish@databricks.com Hi Ashish, when I used your code I got this error: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.FileFormat.$init$(Lorg/apache/spark/sql/execution/datasources/FileFormat;)&lt;/P&gt;
&lt;P&gt;I have installed spark_hadoopoffice_ds_2_12_1_3_1.jar for the above class. Please help.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2019 08:39:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28164#M19987</guid>
      <dc:creator>MounicaVemulapa</dc:creator>
      <dc:date>2019-06-11T08:39:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28165#M19988</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;There should be nothing wrong with your code; the same code (except for the file name) works for me. Can you confirm that dbutils.fs.ls("dbfs:/FileStore/tables") lists at least the FileInfo for your file, and that your cluster shows the status 'installed' for the library with Maven coordinates "com.crealytics:spark-excel_2.11:0.11.1"?&lt;/P&gt; 
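&lt;P&gt;The first check above can be scripted. Below is a minimal Python sketch; file_is_listed is a hypothetical helper, and it assumes each entry returned by dbutils.fs.ls exposes a name attribute, as FileInfo does. The listing function is passed in as an argument so the helper can also be tried outside a notebook, here with made-up sample data.&lt;/P&gt;

```python
from collections import namedtuple


def file_is_listed(list_dir, directory, file_name):
    """Return True if file_name appears among the entries of directory.

    list_dir is any callable shaped like dbutils.fs.ls: it takes a directory
    path and returns entries that have a `name` attribute.
    """
    return any(entry.name == file_name for entry in list_dir(directory))


# Stand-in for dbutils.fs.ls so the sketch runs outside Databricks.
# The single entry below is made-up sample data.
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name", "size"])


def fake_ls(directory):
    return [FakeFileInfo(directory + "/Airline.xlsx", "Airline.xlsx", 1024)]


print(file_is_listed(fake_ls, "dbfs:/FileStore/tables", "Airline.xlsx"))
print(file_is_listed(fake_ls, "dbfs:/FileStore/tables", "Missing.xlsx"))
```

&lt;P&gt;In a notebook, the call would be file_is_listed(dbutils.fs.ls, "dbfs:/FileStore/tables", "Airline.xlsx").&lt;/P&gt;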
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2019 09:52:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28165#M19988</guid>
      <dc:creator>Saphira</dc:creator>
      <dc:date>2019-06-13T09:52:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28166#M19989</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have the same problem, did you solve it?&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jun 2019 23:11:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28166#M19989</guid>
      <dc:creator>darkfenixx1</dc:creator>
      <dc:date>2019-06-27T23:11:30Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28167#M19990</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I also tried the suggested library, but installation of "com.crealytics:spark-excel_2.11:0.11.1" keeps failing (I tried the latest versions as well).&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Sep 2019 09:42:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28167#M19990</guid>
      <dc:creator>vikrantm</dc:creator>
      <dc:date>2019-09-24T09:42:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28168#M19991</link>
      <description>&lt;P&gt;Does it give this error while installing?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;AttributeError: module 'lib' has no attribute 'SSL_ST_INIT'&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Sep 2019 09:50:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28168#M19991</guid>
      <dc:creator>Saphira</dc:creator>
      <dc:date>2019-09-24T09:50:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28169#M19992</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Yes, it gives the error below while installing on the cluster:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;Library resolution failed. Cause: java.lang.RuntimeException: org.tukaani:xz download failed.
  at com.databricks.libraries.server.MavenInstaller.$anonfun$resolveDependencyPaths$5(MavenLibraryResolver.scala:253)
  at scala.collection.MapLike.getOrElse(MapLike.scala:131)
  at scala.collection.MapLike.getOrElse$(MapLike.scala:129)
  at ...&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Sep 2019 10:14:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28169#M19992</guid>
      <dc:creator>vikrantm</dc:creator>
      <dc:date>2019-09-24T10:14:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28170#M19993</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;For me the problem was that the library was built for Scala 2.12 while my cluster was running Scala 2.11 (it should have been spark_hadoopoffice_ds_2_11_1_3_1).&lt;/P&gt; 
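&lt;P&gt;The mismatch described above can be caught before installing. The Python sketch below compares the Scala suffix of a Maven coordinate with the cluster's Scala version. The function names are hypothetical, and the sketch assumes the artifact follows the usual name_scalaVersion convention, as in com.crealytics:spark-excel_2.11:0.11.1.&lt;/P&gt;

```python
def scala_binary_version(scala_version):
    """Reduce a full Scala version such as "2.11.12" to its binary form "2.11"."""
    major, minor = scala_version.split(".")[:2]
    return major + "." + minor


def artifact_scala_version(coordinate):
    """Extract the Scala suffix from a group:artifact:version coordinate."""
    artifact = coordinate.split(":")[1]
    # The suffix is whatever follows the last underscore in the artifact name.
    return artifact.rsplit("_", 1)[1]


def matches_cluster(coordinate, cluster_scala_version):
    """Return True if the artifact was built for the cluster's Scala version."""
    return artifact_scala_version(coordinate) == scala_binary_version(cluster_scala_version)


print(matches_cluster("com.crealytics:spark-excel_2.11:0.11.1", "2.11.12"))
print(matches_cluster("com.crealytics:spark-excel_2.12:3.2.1_0.16.4", "2.11.12"))
```

&lt;P&gt;The cluster's Scala version is shown next to the Databricks Runtime version on the cluster configuration page.&lt;/P&gt;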
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Sep 2019 13:00:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28170#M19993</guid>
      <dc:creator>ttration</dc:creator>
      <dc:date>2019-09-24T13:00:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28171#M19994</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;(1) Log in to your Databricks account, click Clusters, then double-click the cluster you want to work with.&lt;/P&gt;
&lt;P&gt;(2) Click Libraries, then click Install New.&lt;/P&gt;
&lt;P&gt;(3) Click Maven and, in Coordinates, paste this line to install the library:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;com.crealytics:spark-excel_2.11:0.12.2&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;(4) After the library installation finishes, open a notebook and read the Excel file as the following code shows. It works!&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val sparkDF = spark.read.format("com.crealytics.spark.excel")
  .option("useHeader", "true")   // treat the first row as column names
  .option("inferSchema", "true") // infer column types from the data
  .load("/mnt/lsTest/test.xlsx")
// Pass the DataFrame itself; collect() would pull every row to the driver.
display(sparkDF)&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Nov 2019 02:52:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28171#M19994</guid>
      <dc:creator>LeiSun1992</dc:creator>
      <dc:date>2019-11-19T02:52:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28172#M19995</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;The library you are using is out of date.&lt;/P&gt;
&lt;P&gt;You have to install the latest version.&lt;/P&gt;
&lt;P&gt;(1) Log in to your Databricks account, click Clusters, then double-click the cluster you want to work with.&lt;/P&gt;
&lt;P&gt;(2) Click Libraries, then click Install New.&lt;/P&gt;
&lt;P&gt;(3) Click Maven and, in Coordinates, paste this line to install the library:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;com.crealytics:spark-excel_2.11:0.12.2&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Nov 2019 02:55:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28172#M19995</guid>
      <dc:creator>LeiSun1992</dc:creator>
      <dc:date>2019-11-19T02:55:36Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28173#M19996</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;This works as expected with the com.crealytics:spark-excel_2.11:0.12.5 library.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;val df_excel = spark.read
  .format("com.crealytics.spark.excel")
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "false")
  .option("inferSchema", "false")
  .option("addColorColumns", "false")
  .load(file_path)
display(df_excel)&lt;/CODE&gt;&lt;/PRE&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 23 Feb 2020 13:46:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28173#M19996</guid>
      <dc:creator>SakthivelNachim</dc:creator>
      <dc:date>2020-02-23T13:46:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28174#M19997</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Dropping the ".xlsx" from the file path worked for me!&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Jul 2020 07:32:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28174#M19997</guid>
      <dc:creator>PrekshaPunwani</dc:creator>
      <dc:date>2020-07-22T07:32:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28175#M19998</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Don’t worry, you have several other options for opening an Excel file without Excel. Here they are, so please check them out:&lt;/P&gt;
&lt;P&gt;&lt;A target="_blank" href="https://"&gt;http://www.repairmsexcel.com/blog/open-excel-files-without-excel&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Dec 2020 08:37:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28175#M19998</guid>
      <dc:creator>edwards142</dc:creator>
      <dc:date>2020-12-11T08:37:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28176#M19999</link>
      <description>&lt;P&gt;First of all, check your Spark and Scala versions.&lt;/P&gt;&lt;P&gt;Then install the library using the Maven coordinates that match those versions.&lt;/P&gt;&lt;P&gt;This link lists the Maven coordinates you can use:&lt;/P&gt;&lt;P&gt;&lt;A href="https://mvnrepository.com/artifact/com.crealytics/spark-excel_2.12" alt="https://mvnrepository.com/artifact/com.crealytics/spark-excel_2.12" target="_blank"&gt;https://mvnrepository.com/artifact/com.crealytics/spark-excel_2.12&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Selected Cluster --&amp;gt; Libraries --&amp;gt; Install New --&amp;gt; Maven&lt;/P&gt;&lt;P&gt;Coordinates: com.crealytics:spark-excel_2.12:3.2.1_0.16.4&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For PySpark, use the following code:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df2 = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/FileStore/shared_uploads/abc@gmail.com/book.xlsx")
)
display(df2)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 17 May 2022 14:00:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28176#M19999</guid>
      <dc:creator>Devarsh</dc:creator>
      <dc:date>2022-05-17T14:00:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28177#M20000</link>
      <description>&lt;P&gt;Another approach that may help in your case is to use pandas to read the Excel file and then convert the pandas DataFrame to a PySpark DataFrame &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 19 Nov 2022 10:16:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28177#M20000</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-19T10:16:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28178#M20001</link>
      <description>&lt;P&gt;This really worked. However, I see this error for larger Excel files:&lt;/P&gt;&lt;P&gt;shadeio.poi.util.RecordFormatException: Tried to allocate an array of length 208,933,193, but the maximum length for this record type is 100,000,000.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Nov 2022 19:25:52 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/28178#M20001</guid>
      <dc:creator>Ananth</dc:creator>
      <dc:date>2022-11-30T19:25:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/44886#M27743</link>
      <description>&lt;P&gt;No thanks&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2023 05:44:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/44886#M27743</guid>
      <dc:creator>Datab</dc:creator>
      <dc:date>2023-09-15T05:44:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to read excel file using databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/44888#M27744</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;# Example: Show the first 5 rows of the DataFrame
df.head()&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;// For Scala
// Example: Show the first 5 rows of the DataFrame
df.show(5)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Step 7: Perform Data Visualization (Optional)&lt;/STRONG&gt; If you wish to visualize the data, Databricks provides various plotting libraries and visualization tools to present your findings effectively.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 8: Save or Export Results (Optional)&lt;/STRONG&gt; After performing your analysis, if you want to save the processed data or export the results, Databricks supports various formats such as Parquet, CSV, JSON, etc.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2023 05:51:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-read-excel-file-using-databricks/m-p/44888#M27744</guid>
      <dc:creator>Gaurav_Databric</dc:creator>
      <dc:date>2023-09-15T05:51:16Z</dc:date>
    </item>
  </channel>
</rss>

