02-04-2022 01:23 PM
I am using Azure DBX 9.1 LTS and successfully installed the following library on the cluster using Maven coordinates:
com.crealytics:spark-excel_2.12:3.2.0_0.16.0
When I executed the following line:
excelSDF = (
    spark.read.format("excel")
    .option("dataAddress", "'Sheet1'!A1:C4")
    .option("header", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .load(excel_sample)
)
I get the following exception thrown:
Py4JJavaError: An error occurred while calling o438.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.<init>(UnsynchronizedByteArrayOutputStream.java:51)
at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:107)
at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:122)
at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:72)
at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:43)
at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:388)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
I then tried to install the following dependency library through the Azure Databricks Cluster Libraries web UI with these Maven coordinates, but the installation failed:
org.apache.commons:commons-io:2.11.0
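For reference, a quick way to confirm which commons-io build the driver JVM actually resolved is the rough sketch below; it assumes a Python notebook where the Py4J gateway is exposed as spark._jvm, and relies only on standard JVM reflection (the byteArray(int) method named in the stack trace is simply absent from older commons-io releases).
jvm = spark._jvm
# Ask the driver JVM which JAR org.apache.commons.io.IOUtils was loaded from.
ioutils_cls = jvm.java.lang.Class.forName("org.apache.commons.io.IOUtils")
jar_location = ioutils_cls.getProtectionDomain().getCodeSource().getLocation()
print(jar_location.toString())  # the JAR path normally embeds the commons-io version
Note that commons-io is published on Maven Central under the commons-io:commons-io coordinates rather than org.apache.commons:commons-io, which may be why the coordinates above failed to resolve.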
Questions:
Thanks.
Update 01:
Update 02:
Py4JJavaError: An error occurred while calling o386.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
Update 03:
Update 04:
excelSDF = (
    spark.read.format("excel")
    .option("dataAddress", "'Sheet1'!A1:C4")
    .option("header", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .load(excel_sample)
)
display(excelSDF)
NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V
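Since these NoSuchMethodError / NoClassDefFoundError failures generally mean the spark-excel build was compiled against a different Spark or Scala version than the cluster is running, a small sanity check in a notebook cell is sketched below (the Scala-version line goes through the Py4J gateway and is an assumption, not an official PySpark API).
print(spark.version)  # Spark version of the cluster, e.g. 3.1.2 on DBR 9.1 LTS
print(spark.sparkContext._jvm.scala.util.Properties.versionString())  # Scala version on the JVM side
The library's Scala suffix (_2.12) and its Spark build (e.g. 3.2.0_0.16.0) should line up with what these print.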
02-16-2022 02:54 PM
Hi @Jim Huang, please try installing
com.crealytics:spark-excel_2.12:0.14.0
instead of
com.crealytics:spark-excel_2.12:3.2.0_0.16.0
It will work. Please let me know if that helped.
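With the 0.14.x line you can also address the reader through the V1-style source name com.crealytics.spark.excel instead of the short "excel" name. A rough sketch with the same options as your snippet (excel_sample is your existing path variable):
excelSDF = (
    spark.read.format("com.crealytics.spark.excel")
    .option("dataAddress", "'Sheet1'!A1:C4")
    .option("header", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .load(excel_sample)
)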
02-04-2022 11:24 PM
Hi @Jim Huang! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer first; otherwise, I will get back to you soon. Thanks.
04-15-2022 08:24 AM
Using the older library as suggested worked in DBR 10.4 LTS. Thank you.
On a separate note, I am still curious to understand the changes in the underlying DataSource V2 API.
04-17-2022 09:46 PM
Cool!
03-15-2022 09:41 PM
This is a library dependency conflict. You need to exclude the conflicting dependency to get it working. @Jim Huang
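If you manage libraries through the Libraries REST API rather than the web UI, the Maven spec accepts an exclusions list next to the coordinates. A rough sketch follows; the workspace URL, token, cluster ID, and the choice of commons-io:commons-io as the artifact to exclude are placeholders and assumptions, not values confirmed in this thread.
import requests

# Sketch: install spark-excel with a Maven exclusion via the Databricks Libraries API.
# Host, token, cluster ID, and the excluded artifact are placeholders to adapt.
host = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

payload = {
    "cluster_id": "<cluster-id>",
    "libraries": [
        {
            "maven": {
                "coordinates": "com.crealytics:spark-excel_2.12:3.2.0_0.16.0",
                # assumption: exclude the transitive commons-io so the version
                # you install separately takes precedence on the classpath
                "exclusions": ["commons-io:commons-io"],
            }
        }
    ],
}

resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()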
04-14-2022 03:48 PM
Thank you for providing another option to address this issue.
I have follow-up questions:
Thanks!
11-09-2023 04:32 AM
Hi @dataslicer, were you able to solve this issue?
I am using Databricks Runtime 9.1 LTS with Spark 3.1.2 and Scala 2.12, and I have installed com.crealytics:spark-excel-2.12.17-3.1.2_2.12:3.1.2_0.18.1. It was working fine, but now I am facing the same exception as you. Could you please help?
Thank you.