<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Error when reading Excel file: &amp;quot;java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument&amp;quot; in Machine Learning</title>
    <link>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21638#M1195</link>
    <description>&lt;P&gt;Hi @Mohammad Saber,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I reproduced it, and it seems to be an error related to the spark-excel version.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can you try installing the latest version: com.crealytics:spark-excel_2.12:3.3.1_0.18.5&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This should fix the issue.&lt;/P&gt;</description>
    <pubDate>Sat, 19 Nov 2022 12:18:27 GMT</pubDate>
    <dc:creator>DavideAnghileri</dc:creator>
    <dc:date>2022-11-19T12:18:27Z</dc:date>
    <item>
      <title>Error when reading Excel file: "java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument"</title>
      <link>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21636#M1193</link>
      <description>&lt;P&gt;Hi, &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I want to read an Excel file with the following code:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;filepath_xlsx = "dbfs:/FileStore/data.xlsx"
&amp;nbsp;
sampleDF = (
    spark.read.format("com.crealytics.spark.excel")
    .option("Header", "true")
    .option("inferSchema", "false")
    .option("treatEmptyValuesAsNulls", "false")
    .load(filepath_xlsx)
)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, I get the error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument
&amp;nbsp;
Py4JJavaError                             Traceback (most recent call last)
&amp;lt;command-3204971640072140&amp;gt; in &amp;lt;cell line: 2&amp;gt;()
      1 # Read excel file
----&amp;gt; 2 sample1DF = (spark.read.format("com.crealytics.spark.excel")
      3   .option("Header", "true")
      4   .option("inferSchema", "false")
      5   .option("treatEmptyValuesAsNulls", "false")
&amp;nbsp;
/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---&amp;gt; 48                 res = func(*args, **kwargs)
     49                 logger.log_success(
     50                     module_name, class_name, function_name, time.perf_counter() - start, signature
&amp;nbsp;
/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    175         self.options(**options)
    176         if isinstance(path, str):
--&amp;gt; 177             return self._df(self._jreader.load(path))
    178         elif path is not None:
    179             if type(path) != list:
&amp;nbsp;
/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1319 
   1320         answer = self.gateway_client.send_command(command)
-&amp;gt; 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323 
&amp;nbsp;
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    194     def deco(*a: Any, **kw: Any) -&amp;gt; Any:
    195         try:
--&amp;gt; 196             return f(*a, **kw)
    197         except Py4JJavaError as e:
    198             converted = convert_exception(e.java_exception)
&amp;nbsp;
/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--&amp;gt; 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)
&amp;nbsp;
Py4JJavaError: An error occurred while calling o918.load.
: java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument
	at shadeio.poi.xssf.usermodel.XSSFVMLDrawing.read(XSSFVMLDrawing.java:135)
	at shadeio.poi.xssf.usermodel.XSSFVMLDrawing.&amp;lt;init&amp;gt;(XSSFVMLDrawing.java:123)
	at shadeio.poi.ooxml.POIXMLFactory.createDocumentPart(POIXMLFactory.java:61)
	at shadeio.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:661)
	at shadeio.poi.ooxml.POIXMLDocumentPart.read(POIXMLDocumentPart.java:678)
	at shadeio.poi.ooxml.POIXMLDocument.load(POIXMLDocument.java:165)
	at shadeio.poi.xssf.usermodel.XSSFWorkbook.&amp;lt;init&amp;gt;(XSSFWorkbook.java:274)
	at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:118)
	at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.create(XSSFWorkbookFactory.java:98)
	at shadeio.poi.xssf.usermodel.XSSFWorkbookFactory.create(XSSFWorkbookFactory.java:36)
	at shadeio.poi.ss.usermodel.WorkbookFactory.lambda$create$2(WorkbookFactory.java:224)
	at shadeio.poi.ss.usermodel.WorkbookFactory.wp(WorkbookFactory.java:329)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:224)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:185)
	at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55)
	at scala.Option.fold(Option.scala:251)
	at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15)
	at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50)
	at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
	at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
	at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
	at scala.Option.getOrElse(Option.scala:189)
	at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
	at com.crealytics.spark.excel.ExcelRelation.&amp;lt;init&amp;gt;(ExcelRelation.scala:36)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:368)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:324)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:324)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I installed "com.crealytics:spark-excel_2.12:3.2.1_0.16.4" on the cluster, following this &lt;A href="https://community.databricks.com/s/question/0D53f00001HKHeOCAX/how-to-read-excel-file-using-databricks" alt="https://community.databricks.com/s/question/0D53f00001HKHeOCAX/how-to-read-excel-file-using-databricks" target="_blank"&gt;thread&lt;/A&gt;, but I still get the error.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;B&gt;Environment: &lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Single node cluster&lt;/LI&gt;&lt;LI&gt;11.2 ML (includes Apache Spark 3.3.0, Scala 2.12)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any idea how to solve this issue?&lt;/P&gt;</description>
      <pubDate>Sat, 19 Nov 2022 08:03:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21636#M1193</guid>
      <dc:creator>Mado</dc:creator>
      <dc:date>2022-11-19T08:03:26Z</dc:date>
    </item>
    <item>
      <title>Re: Error when reading Excel file: "java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument"</title>
      <link>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21637#M1194</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Which language did you use to run that code? Could you share your data file so that I can reproduce the issue on my side?&lt;/P&gt;</description>
      <pubDate>Sat, 19 Nov 2022 08:14:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21637#M1194</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-11-19T08:14:44Z</dc:date>
    </item>
    <item>
      <title>Re: Error when reading Excel file: "java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument"</title>
      <link>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21638#M1195</link>
      <description>&lt;P&gt;Hi @Mohammad Saber,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I reproduced it, and it seems to be an error related to the spark-excel version.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can you try installing the latest version: com.crealytics:spark-excel_2.12:3.3.1_0.18.5&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This should fix the issue.&lt;/P&gt;</description>
      <pubDate>Sat, 19 Nov 2022 12:18:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21638#M1195</guid>
      <dc:creator>DavideAnghileri</dc:creator>
      <dc:date>2022-11-19T12:18:27Z</dc:date>
    </item>
    <item>
      <title>Re: Error when reading Excel file: "java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument"</title>
      <link>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21639#M1196</link>
      <description>&lt;P&gt;@davide.anghileri&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks a lot. It worked. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I then tried to read another Excel file (with several sheets &amp;amp; a multi-row header), and this time I get the error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;org.apache.poi.ooxml.POIXMLException: Strict OOXML isn't currently supported, please see bug #57699
&amp;nbsp;
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
&amp;lt;command-496434324351845&amp;gt; in &amp;lt;cell line: 4&amp;gt;()
      2 
      3 # Read excel file
----&amp;gt; 4 sampleDF_xlsx = (spark.read.format("com.crealytics.spark.excel")
      5   .option("Header", "true")
      6   .option("inferSchema", "false")
&amp;nbsp;
/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---&amp;gt; 48                 res = func(*args, **kwargs)
     49                 logger.log_success(
     50                     module_name, class_name, function_name, time.perf_counter() - start, signature
&amp;nbsp;
/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    175         self.options(**options)
    176         if isinstance(path, str):
--&amp;gt; 177             return self._df(self._jreader.load(path))
    178         elif path is not None:
    179             if type(path) != list:
&amp;nbsp;
/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1319 
   1320         answer = self.gateway_client.send_command(command)
-&amp;gt; 1321         return_value = get_return_value(
   1322             answer, self.gateway_client, self.target_id, self.name)
   1323 
&amp;nbsp;
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    194     def deco(*a: Any, **kw: Any) -&amp;gt; Any:
    195         try:
--&amp;gt; 196             return f(*a, **kw)
    197         except Py4JJavaError as e:
    198             converted = convert_exception(e.java_exception)
&amp;nbsp;
/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--&amp;gt; 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)
&amp;nbsp;
Py4JJavaError: An error occurred while calling o634.load.
: org.apache.poi.ooxml.POIXMLException: Strict OOXML isn't currently supported, please see bug #57699
	at org.apache.poi.ooxml.POIXMLDocumentPart.getPartFromOPCPackage(POIXMLDocumentPart.java:757)
	at org.apache.poi.ooxml.POIXMLDocumentPart.&amp;lt;init&amp;gt;(POIXMLDocumentPart.java:151)
	at org.apache.poi.ooxml.POIXMLDocumentPart.&amp;lt;init&amp;gt;(POIXMLDocumentPart.java:141)
	at org.apache.poi.ooxml.POIXMLDocument.&amp;lt;init&amp;gt;(POIXMLDocument.java:60)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.&amp;lt;init&amp;gt;(XSSFWorkbook.java:254)
	at org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:118)
	at org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.create(XSSFWorkbookFactory.java:98)
	at org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.create(XSSFWorkbookFactory.java:36)
	at org.apache.poi.ss.usermodel.WorkbookFactory.lambda$create$2(WorkbookFactory.java:224)
	at org.apache.poi.ss.usermodel.WorkbookFactory.wp(WorkbookFactory.java:329)
	at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:224)
	at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:185)
	at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$3(WorkbookReader.scala:107)
	at scala.Option.fold(Option.scala:251)
	at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:107)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:34)
	at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:33)
	at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:92)
	at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:48)
	at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:48)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:121)
	at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:120)
	at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:189)
	at scala.Option.getOrElse(Option.scala:189)
	at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:188)
	at com.crealytics.spark.excel.ExcelRelation.&amp;lt;init&amp;gt;(ExcelRelation.scala:52)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:52)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:29)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:24)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:368)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:324)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:324)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)
&amp;nbsp;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 19 Nov 2022 12:54:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21639#M1196</guid>
      <dc:creator>Mado</dc:creator>
      <dc:date>2022-11-19T12:54:40Z</dc:date>
    </item>
    <item>
      <title>Re: Error when reading Excel file: "java.lang.NoClassDefFoundError: shadeio/poi/schemas/vmldrawing/XmlDocument"</title>
      <link>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21640#M1197</link>
      <description>&lt;P&gt;For this dataset, I also tried reading it as a binary file, as below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import io
import pandas as pd
&amp;nbsp;
# Load the raw bytes of the workbook through Spark's binaryFile source
xldf_xlsx = (
    spark.read.format("binaryFile")
    .option("pathGlobFilter", "*.xls*")
    .load(filepath_xlsx)
)
&amp;nbsp;
# Wrap the bytes of the first file in a file-like object for pandas
excel_content = xldf_xlsx.head(1)[0].content
file_like_obj = io.BytesIO(excel_content)
xl = pd.ExcelFile(file_like_obj, engine="openpyxl")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;And the last line gives the error:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
&amp;lt;command-496434324351842&amp;gt; in &amp;lt;cell line: 3&amp;gt;()
      1 excel_content = xldf_xlsx.head(1)[0].content
      2 file_like_obj = io.BytesIO(excel_content)
----&amp;gt; 3 xl = pd.ExcelFile(file_like_obj, engine="openpyxl")
&amp;nbsp;
/databricks/python/lib/python3.9/site-packages/pandas/io/excel/_base.py in __init__(self, path_or_buffer, engine, storage_options)
   1231         self.storage_options = storage_options
   1232 
-&amp;gt; 1233         self._reader = self._engines[engine](self._io, storage_options=storage_options)
   1234 
   1235     def __fspath__(self):
&amp;nbsp;
/databricks/python/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py in __init__(self, filepath_or_buffer, storage_options)
    520         """
    521         import_optional_dependency("openpyxl")
--&amp;gt; 522         super().__init__(filepath_or_buffer, storage_options=storage_options)
    523 
    524     @property
&amp;nbsp;
/databricks/python/lib/python3.9/site-packages/pandas/io/excel/_base.py in __init__(self, filepath_or_buffer, storage_options)
    418             self.handles.handle.seek(0)
    419             try:
--&amp;gt; 420                 self.book = self.load_workbook(self.handles.handle)
    421             except Exception:
    422                 self.close()
&amp;nbsp;
/databricks/python/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py in load_workbook(self, filepath_or_buffer)
    531         from openpyxl import load_workbook
    532 
--&amp;gt; 533         return load_workbook(
    534             filepath_or_buffer, read_only=True, data_only=True, keep_links=False
    535         )
&amp;nbsp;
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/openpyxl/reader/excel.py in load_workbook(filename, read_only, keep_vba, data_only, keep_links)
    315     reader = ExcelReader(filename, read_only, keep_vba,
    316                         data_only, keep_links)
--&amp;gt; 317     reader.read()
    318     return reader.wb
&amp;nbsp;
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/openpyxl/reader/excel.py in read(self)
    276         self.read_manifest()
    277         self.read_strings()
--&amp;gt; 278         self.read_workbook()
    279         self.read_properties()
    280         self.read_theme()
&amp;nbsp;
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/openpyxl/reader/excel.py in read_workbook(self)
    148         wb_part = _find_workbook_part(self.package)
    149         self.parser = WorkbookParser(self.archive, wb_part.PartName[1:], keep_links=self.keep_links)
--&amp;gt; 150         self.parser.parse()
    151         wb = self.parser.wb
    152         wb._sheets = []
&amp;nbsp;
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/openpyxl/reader/workbook.py in parse(self)
     47         src = self.archive.read(self.workbook_part_name)
     48         node = fromstring(src)
---&amp;gt; 49         package = WorkbookPackage.from_tree(node)
     50         if package.properties.date1904:
     51             self.wb.epoch = CALENDAR_MAC_1904
&amp;nbsp;
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py in from_tree(cls, node)
     81             if hasattr(desc, 'from_tree'):
     82                 #descriptor manages conversion
---&amp;gt; 83                 obj = desc.from_tree(el)
     84             else:
     85                 if hasattr(desc.expected_type, "from_tree"):
&amp;nbsp;
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/openpyxl/descriptors/sequence.py in from_tree(self, node)
     83 
     84     def from_tree(self, node):
---&amp;gt; 85         return [self.expected_type.from_tree(el) for el in node]
     86 
     87 
&amp;nbsp;
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/openpyxl/descriptors/sequence.py in &amp;lt;listcomp&amp;gt;(.0)
     83 
     84     def from_tree(self, node):
---&amp;gt; 85         return [self.expected_type.from_tree(el) for el in node]
     86 
     87 
&amp;nbsp;
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py in from_tree(cls, node)
    101                 attrib[tag] = obj
    102 
--&amp;gt; 103         return cls(**attrib)
    104 
    105 
&amp;nbsp;
TypeError: __init__() missing 1 required positional argument: 'id'&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 19 Nov 2022 12:56:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/machine-learning/error-when-reading-excel-file-quot-java-lang/m-p/21640#M1197</guid>
      <dc:creator>Mado</dc:creator>
      <dc:date>2022-11-19T12:56:34Z</dc:date>
    </item>
  </channel>
</rss>

