02-04-2022 01:23 PM
I am using Azure Databricks Runtime 9.1 LTS and successfully installed the following library on the cluster using Maven coordinates:
com.crealytics:spark-excel_2.12:3.2.0_0.16.0
When I executed the following line:
excelSDF = (
    spark.read.format("excel")
    .option("dataAddress", "'Sheet1'!A1:C4")
    .option("header", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .load(excel_sample)
)
I get the following exception:
Py4JJavaError: An error occurred while calling o438.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.<init>(UnsynchronizedByteArrayOutputStream.java:51)
at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:107)
at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:122)
at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:72)
at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:43)
at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:388)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
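For anyone hitting the same trace: a NoSuchMethodError on org.apache.commons.io.IOUtils.byteArray(int) usually means an older commons-io on the cluster classpath is shadowing the newer release that the shaded POI inside spark-excel was compiled against (byteArray(int) was only added in commons-io 2.5). A minimal diagnostic sketch, using plain Java reflection through the py4j gateway (nothing here is spark-excel specific):

# Sketch: find out which commons-io jar the driver JVM actually loaded.
jvm = spark._jvm
ioutils_cls = jvm.java.lang.Class.forName("org.apache.commons.io.IOUtils")
print(ioutils_cls.getProtectionDomain().getCodeSource().getLocation())
# If this prints a jar older than commons-io 2.5, the runtime's bundled
# version is shadowing the one spark-excel needs.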
When I tried to install the following dependency as a cluster library through the Azure Databricks Libraries web UI, using the Maven coordinates below, the installation failed:
org.apache.commons:commons-io:2.11.0
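A side note for later readers: even when a newer commons-io installs cleanly as a cluster library, the runtime's bundled copy can still win on the driver classpath. One workaround pattern is a cluster-scoped init script that copies the newer jar into /databricks/jars so it sits on the same base classpath as the runtime's own jars. A sketch only: the DBFS paths are placeholders, the jar must be uploaded first, and the script must be registered in the cluster's init-script settings.

# Sketch: generate a cluster-scoped init script that copies a newer
# commons-io jar onto the base classpath. All paths are placeholders.
dbutils.fs.put(
    "dbfs:/databricks/init-scripts/newer-commons-io.sh",
    """#!/bin/bash
cp /dbfs/FileStore/jars/commons-io-2.11.0.jar /databricks/jars/""",
    True,  # overwrite if the script already exists
)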
Questions:
Thanks.
Update 01:
Update 02:
Py4JJavaError: An error occurred while calling o386.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
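For what it's worth, that NoClassDefFoundError points at a different mismatch: org.apache.spark.sql.sources.v2.ReadSupport is the Spark 2.x DataSource V2 interface and was removed in Spark 3.x, so it typically shows up when the installed spark-excel build targets Spark 2.x. A quick notebook sanity check (py4j reflection again; sketch only):

# Sketch: confirm the runtime's Spark version, and that the Spark 2.x
# DataSource V2 interface really is absent from the classpath.
print(spark.version)  # DBR 9.1 LTS reports 3.1.2

try:
    spark._jvm.java.lang.Class.forName(
        "org.apache.spark.sql.sources.v2.ReadSupport")
    print("Spark 2.x DSv2 interface found (unexpected on Spark 3.x)")
except Exception:
    print("ReadSupport absent, as expected on Spark 3.x")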
Update 03:
Update 04:
excelSDF = (
    spark.read.format("excel")
    .option("dataAddress", "'Sheet1'!A1:C4")
    .option("header", "true")
    .option("treatEmptyValuesAsNulls", "true")
    .option("inferSchema", "true")
    .load(excel_sample)
)
display(excelSDF)
NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V
04-15-2022 08:24 AM
Using the older library as suggested worked in DBR 10.4 LTS. Thank you.
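For later readers: the spark-excel version string encodes the Spark version it was built against (the part before the underscore, so 3.2.0_0.16.0 targets Spark 3.2.0), which is why the coordinate has to line up with the runtime. An illustration only, assuming DBR 10.4 LTS (Spark 3.2.1) and an example coordinate rather than the specific one suggested in this thread:

# Illustration: on DBR 10.4 LTS, spark.version reports 3.2.1, so a build
# such as com.crealytics:spark-excel_2.12:3.2.1_<plugin-version> matches.
# With a matching build installed, the original read works unchanged:
excelSDF = (
    spark.read.format("excel")
    .option("dataAddress", "'Sheet1'!A1:C4")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(excel_sample)
)
display(excelSDF)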
On a separate note, I am still curious about what changed in the underlying DataSource V2 API. 😀
03-15-2022 09:41 PM
This is the conflicting library dependency. You need to exclude it when installing the package to get it working. @Jim Huang
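To make the exclusion concrete: the Maven dialog in the cluster Libraries UI has an Exclusions field, and the same install can be scripted through the Databricks Libraries API 2.0. A minimal sketch, assuming the dependency to exclude is the commons-io from the stack trace; the workspace URL, token, and cluster ID are placeholders:

# Sketch: install spark-excel with its transitive commons-io excluded,
# via the Libraries API 2.0. Credentials below are placeholders.
import requests

host = "https://<workspace>.azuredatabricks.net"  # placeholder
token = "<personal-access-token>"                 # placeholder

payload = {
    "cluster_id": "<cluster-id>",                 # placeholder
    "libraries": [{
        "maven": {
            "coordinates": "com.crealytics:spark-excel_2.12:3.2.0_0.16.0",
            # Drop the transitive dependency that clashes with the
            # commons-io already bundled in the Databricks Runtime:
            "exclusions": ["org.apache.commons:commons-io"],
        }
    }],
}

resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()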
04-14-2022 03:48 PM
Thank you for providing another option to address this issue.
I have follow-up questions:
Thanks!
11-09-2023 04:32 AM
Hi @dataslicer, were you able to solve this issue?
I am using Databricks Runtime 9.1 LTS with Spark 3.1.2 and Scala 2.12, and I have installed com.crealytics:spark-excel-2.12.17-3.1.2_2.12:3.1.2_0.18.1. It was working fine, but now I am facing the same exception as you. Could you please help?
Thank you.