02-04-2022 01:23 PM
I am using Azure Databricks Runtime (DBR) 9.1 LTS and successfully installed the following library on the cluster using Maven coordinates:
com.crealytics:spark-excel_2.12:3.2.0_0.16.0
When I execute the following line:
excelSDF = (
    spark.read.format("excel")
    .option("dataAddress", "'Sheet1'!A1:C4")
    .option("header", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .load(excel_sample)
)
the following exception is thrown:
Py4JJavaError: An error occurred while calling o438.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.<init>(UnsynchronizedByteArrayOutputStream.java:51)
at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:107)
at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:122)
at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:72)
at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:43)
at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:388)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
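For anyone reading the trace above: the `(I)[B` after `IOUtils.byteArray` is a JVM method descriptor, meaning a method that takes an `int` and returns a `byte[]`. A `NoSuchMethodError` with such a descriptor typically means the class found on the cluster's classpath is a different version from the one the calling code was compiled against. Below is a minimal sketch of a descriptor decoder; the function names `decode_descriptor` and `_read_type` are my own, not part of any library.

```python
# Sketch: decode a JVM method descriptor such as "(I)[B" into readable Java types.
# Helper names are illustrative only; they do not come from Spark or spark-excel.

PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _read_type(desc, pos):
    """Read one type starting at desc[pos]; return (java_type, next_pos)."""
    dims = 0
    while desc[pos] == "[":  # each '[' adds one array dimension
        dims += 1
        pos += 1
    if desc[pos] == "L":  # object type, written as L<internal/name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:  # single-letter primitive (or V for void)
        name = PRIMITIVES[desc[pos]]
        pos += 1
    return name + "[]" * dims, pos


def decode_descriptor(desc):
    """Return (parameter_types, return_type) for a descriptor like '(I)[B'."""
    if not desc.startswith("("):
        raise ValueError("method descriptor must start with '('")
    pos = 1
    params = []
    while desc[pos] != ")":
        java_type, pos = _read_type(desc, pos)
        params.append(java_type)
    return_type, _ = _read_type(desc, pos + 1)
    return params, return_type


print(decode_descriptor("(I)[B"))  # (['int'], 'byte[]')
```

The same function can be pointed at any other descriptor that shows up in a `NoSuchMethodError` message to see exactly which overload the JVM failed to find.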
When I tried to install the following dependency through the Azure Databricks cluster Libraries web UI using these Maven coordinates, the installation failed.
org.apache.commons:commons-io:2.11.0
Questions:
Thanks.
Update 01:
Update 02:
Py4JJavaError: An error occurred while calling o386.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
Update 03:
Update 04:
excelSDF = (
    spark.read.format("excel")
    .option("dataAddress", "'Sheet1'!A1:C4")
    .option("header", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .load(excel_sample)
)
display(excelSDF)
NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V
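A `NoSuchMethodError` on a Spark-internal constructor such as `FailureSafeParser.<init>` is consistent with a library built against a different Spark version than the one the cluster runs. The coordinates in this thread (`3.2.0_0.16.0`, `3.1.2_0.18.1`) look like they follow a `<spark version>_<library version>` pattern, so a quick sanity check might look like the sketch below. That reading of the version string is an assumption, the major.minor comparison is only a heuristic, and both function names are hypothetical.

```python
# Sketch: compare the Spark version embedded in a spark-excel Maven coordinate
# with the Spark version of the running cluster.
# Assumption: the version part has the form "<sparkVersion>_<libraryVersion>".


def parse_spark_excel_coordinate(coordinate):
    """Split 'group:artifact_scalaVersion:sparkVersion_libVersion' into parts."""
    group, artifact, version = coordinate.split(":")
    name, _, scala_version = artifact.rpartition("_")
    spark_version, _, lib_version = version.partition("_")
    return {
        "group": group,
        "name": name,
        "scala": scala_version,
        "spark": spark_version,
        "library": lib_version,
    }


def built_for_runtime(coordinate, runtime_spark_version):
    """Heuristic: True when major.minor of the build target matches the runtime."""
    built_for = parse_spark_excel_coordinate(coordinate)["spark"]
    return built_for.split(".")[:2] == runtime_spark_version.split(".")[:2]


coordinate = "com.crealytics:spark-excel_2.12:3.2.0_0.16.0"
print(parse_spark_excel_coordinate(coordinate))
# "3.1.2" is the Spark version reported for DBR 9.1 LTS elsewhere in this thread.
print(built_for_runtime(coordinate, "3.1.2"))  # False
```

In a notebook, `spark.version` can be passed as the second argument instead of a hard-coded string.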
04-15-2022 08:24 AM
Using the older library as suggested worked in DBR 10.4 LTS. Thank you.
On a separate note, my curiosity in understanding the changes in the underlying datasource v2 API is ongoing. 😀
03-15-2022 09:41 PM
This is a library dependency issue. You need to exclude the conflicting dependency to get it working. @Jim Huang
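For reference, exclusions can be given either in the Exclusions field of the cluster Libraries UI or in the `exclusions` list of a Maven library in the Databricks Libraries API (`POST /api/2.0/libraries/install`). The sketch below only builds the request body and does not send it. The reply above does not say which artifact to exclude, so `some.group:some-artifact` and the cluster id are placeholders, and the helper names are hypothetical.

```python
# Sketch: build the JSON body for a Databricks Libraries API install request
# with a Maven exclusion. Nothing is sent; this only constructs the payload.
import json


def maven_library_spec(coordinates, exclusions=()):
    """Build one Maven library entry; exclusions use 'groupId:artifactId'."""
    spec = {"coordinates": coordinates}
    if exclusions:
        spec["exclusions"] = list(exclusions)
    return {"maven": spec}


def install_request(cluster_id, libraries):
    """Wrap library entries in the body expected by the install endpoint."""
    return {"cluster_id": cluster_id, "libraries": list(libraries)}


body = install_request(
    "0000-000000-example",  # placeholder cluster id
    [
        maven_library_spec(
            "com.crealytics:spark-excel_2.12:3.2.0_0.16.0",
            exclusions=["some.group:some-artifact"],  # placeholder, not a recommendation
        )
    ],
)
print(json.dumps(body, indent=2))
```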
04-14-2022 03:48 PM
Thank you for providing another option to address this issue.
I have follow-up questions:
Thanks!
11-09-2023 04:32 AM
Hi @dataslicer, were you able to solve this issue?
I am using Databricks Runtime 9.1 LTS with Spark 3.1.2 and Scala 2.12. I have installed com.crealytics:spark-excel-2.12.17-3.1.2_2.12:3.1.2_0.18.1. It was working fine, but I am now facing the same exception as you. Could you please help?
Thank you.