<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Successfully installed Maven:Coordinates:com.crealytics:spark-excel_2.12:3.2.0_0.16.0 on Azure DBX 9.1 LTS runtime but getting error for missing dependency: org.apache.commons.io.IOUtils.byteArray(I) in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29446#M21172</link>
    <description>&lt;P&gt;Thread from Databricks Community (Data Engineering): com.crealytics:spark-excel_2.12:3.2.0_0.16.0 installs successfully on Azure Databricks 9.1 LTS, but reading an Excel file fails with java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B. The full post and replies follow as items.&lt;/P&gt;</description>
    <pubDate>Fri, 04 Feb 2022 21:23:22 GMT</pubDate>
    <dc:creator>dataslicer</dc:creator>
    <dc:date>2022-02-04T21:23:22Z</dc:date>
    <item>
      <title>Successfully installed Maven:Coordinates:com.crealytics:spark-excel_2.12:3.2.0_0.16.0 on Azure DBX 9.1 LTS runtime but getting error for missing dependency: org.apache.commons.io.IOUtils.byteArray(I)</title>
      <link>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29446#M21172</link>
      <description>&lt;P&gt;I am using Azure DBX 9.1 LTS and successfully installed the following library on the cluster using &lt;A href="https://mvnrepository.com/artifact/com.crealytics/spark-excel_2.12" alt="https://mvnrepository.com/artifact/com.crealytics/spark-excel_2.12" target="_blank"&gt;Maven coordinates&lt;/A&gt;: &lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;com.crealytics:spark-excel_2.12:3.2.0_0.16.0&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I executed the following line:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;excelSDF = spark.read.format("excel").option("dataAddress", "'Sheet1'!A1:C4").option("header", "true").option("treatEmptyValuesAsNulls", "true").option("inferSchema", "true").load(excel_sample)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I get the following exception thrown:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;Py4JJavaError: An error occurred while calling o438.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
	at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
	at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.&amp;lt;init&amp;gt;(UnsynchronizedByteArrayOutputStream.java:51)
	at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
	at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
	at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
	at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:107)
	at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:122)
	at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:72)
	at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
	at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
	at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:388)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:295)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:251)
	at java.lang.Thread.run(Thread.java:748)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;When I tried to install the dependency library via the Azure Databricks Cluster Libraries web UI using the following &lt;A href="https://mvnrepository.com/artifact/commons-io/commons-io" alt="https://mvnrepository.com/artifact/commons-io/commons-io" target="_blank"&gt;Maven coordinates&lt;/A&gt;, it failed.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;org.apache.commons:commons-io:2.11.0&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Questions:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is there a safeguard in Databricks preventing the installation of this package?&lt;/LI&gt;&lt;LI&gt;How can users of the `spark-excel` library address this dependency on a Databricks cluster?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Update 01:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;This seems to be a &lt;A href="https://github.com/crealytics/spark-excel/issues/467" alt="https://github.com/crealytics/spark-excel/issues/467" target="_blank"&gt;known open issue&lt;/A&gt; that others in the community are also facing.&lt;UL&gt;&lt;LI&gt;&lt;A href="https://github.com/crealytics/spark-excel/issues/467" alt="https://github.com/crealytics/spark-excel/issues/467" target="_blank"&gt;https://github.com/crealytics/spark-excel/issues/467&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;The temporary workaround from that thread is to revert to Data Source API v1.0.&lt;/LI&gt;&lt;LI&gt;The desired goal is to use Data Source API v2.0.
&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Update 02:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;I tried another approach:&lt;UL&gt;&lt;LI&gt;Downloaded the binary (&lt;A href="https://dlcdn.apache.org//commons/io/binaries/commons-io-2.11.0-bin.tar.gz" alt="https://dlcdn.apache.org//commons/io/binaries/commons-io-2.11.0-bin.tar.gz" target="_blank"&gt;commons-io-2.11.0-bin.tar.gz&lt;/A&gt;) &lt;A href="https://commons.apache.org/proper/commons-io/download_io.cgi" alt="https://commons.apache.org/proper/commons-io/download_io.cgi" target="_blank"&gt;directly from Apache Commons&lt;/A&gt; and extracted the jar&lt;/LI&gt;&lt;LI&gt;Uploaded the jar to the Azure Databricks Spark cluster as a JAR library&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;After the Spark cluster was restarted with the additional libraries (installed successfully), a new error popped up complaining that &lt;B&gt;org/apache/spark/sql/sources/v2/ReadSupport&lt;/B&gt; is missing; that class is not part of the commons-io 2.11 jar.&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;&lt;CODE&gt;Py4JJavaError: An error occurred while calling o386.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport&lt;/CODE&gt;&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;The missing class appears to be packaged in the spark-sql jar.&lt;/LI&gt;&lt;LI&gt;There seems to be some dependency weirdness with the DataSourceV2 classes.&lt;/LI&gt;&lt;LI&gt;The dependency nightmare seems to be nested and never-ending.&lt;/LI&gt;&lt;LI&gt;Hopefully the experts can weigh in on this.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Update 03:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;A quick search regarding &lt;A href="https://spark.apache.org/docs/2.4.5/api/java/index.html?org/apache/spark/sql/sources/v2/DataSourceV2.html" alt="https://spark.apache.org/docs/2.4.5/api/java/index.html?org/apache/spark/sql/sources/v2/DataSourceV2.html" target="_blank"&gt;DataSourceV2&lt;/A&gt; shows this is an API that only exists in the Spark 2.x branch. Databricks 9.1 LTS runs Spark 3.1.2. With this limited knowledge, I believe the spark-excel library is somehow referring to a stale / deprecated Spark 2.x API.&lt;UL&gt;&lt;LI&gt;Does anyone know how to determine which custom jar may still be calling this old DataSourceV2 API?&lt;UL&gt;&lt;LI&gt;Once that offending jar is isolated, how can it be overridden so that the correct Spark API is used?&lt;/LI&gt;&lt;LI&gt;I am not fully confident this is the root cause; I am sharing the hypothesis to see if progress can be made here.
&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;B&gt;&lt;U&gt;Update 04:&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;I have tried several different versions of the library and they all throw exceptions in different call stacks:&lt;UL&gt;&lt;LI&gt;com.crealytics:spark-excel_2.12:3.1.2_0.16.0&lt;UL&gt;&lt;LI&gt;java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;com.crealytics:spark-excel_2.12:3.1.2_0.15.2&lt;UL&gt;&lt;LI&gt;java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;com.crealytics:spark-excel_2.12:0.14.0&lt;UL&gt;&lt;LI&gt;Does not throw any exception when completing this one-line command&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;&lt;CODE&gt;excelSDF = spark.read.format("excel").option("dataAddress", "'Sheet1'!A1:C4").option("header", "true").option("treatEmptyValuesAsNulls", "true").option("inferSchema", "true").load(excel_sample)&lt;/CODE&gt;&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;However, when I executed the following line of code in the next Cmd cell,&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;&lt;CODE&gt;display(excelSDF)&lt;/CODE&gt;&lt;/PRE&gt;&lt;UL&gt;&lt;LI&gt;I get a different exception:&lt;/LI&gt;&lt;/UL&gt;&lt;PRE&gt;&lt;CODE&gt;NoSuchMethodError:
org.apache.spark.sql.catalyst.util.FailureSafeParser.&amp;lt;init&amp;gt;(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V&lt;/CODE&gt;&lt;/PRE&gt;</description>
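The Update 01 workaround (reverting to the Data Source API v1 path) can be sketched as below. This is a hedged sketch, assuming a Databricks notebook where `spark` is defined and a V1-compatible spark-excel build (e.g. 0.14.0) is installed; the V1 source is selected by its fully-qualified format name rather than the short "excel" alias, and the helper name `read_excel_v1` is ours.

```python
# Sketch of the Data Source V1 fallback from Update 01 (assumes spark-excel 0.14.x).
# The V1 reader is addressed by its fully-qualified format name.
V1_FORMAT = "com.crealytics.spark.excel"

def read_excel_v1(spark, path, data_address="'Sheet1'!A1:C4"):
    """Read an Excel range via the spark-excel V1 API instead of format('excel')."""
    return (
        spark.read.format(V1_FORMAT)
        .option("dataAddress", data_address)
        .option("header", "true")
        .option("treatEmptyValuesAsNulls", "true")
        .option("inferSchema", "true")
        .load(path)
    )
```

In a notebook this would be called as read_excel_v1(spark, excel_sample); the options mirror the ones in the original V2 snippet.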
      <pubDate>Fri, 04 Feb 2022 21:23:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29446#M21172</guid>
      <dc:creator>dataslicer</dc:creator>
      <dc:date>2022-02-04T21:23:22Z</dc:date>
    </item>
    <item>
      <title>Re: Successfully installed Maven:Coordinates:com.crealytics:spark-excel_2.12:3.2.0_0.16.0 on Azure DBX 9.1 LTS runtime but getting error for missing dependency: org.apache.commons.io.IOUtils.byteArray(I)</title>
      <link>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29449#M21175</link>
      <description>&lt;P&gt;This is a library dependency conflict. You need to exclude the conflicting dependency to get it working. @Jim Huang&lt;/P&gt;</description>
      <pubDate>Wed, 16 Mar 2022 04:41:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29449#M21175</guid>
      <dc:creator>Atanu</dc:creator>
      <dc:date>2022-03-16T04:41:19Z</dc:date>
    </item>
    <item>
      <title>Re: Successfully installed Maven:Coordinates:com.crealytics:spark-excel_2.12:3.2.0_0.16.0 on Azure DBX 9.1 LTS runtime but getting error for missing dependency: org.apache.commons.io.IOUtils.byteArray(I)</title>
      <link>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29450#M21176</link>
      <description>&lt;P&gt;Thank you for providing another option to address this issue.&lt;/P&gt;&lt;P&gt;I have follow-up questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Which dependency should be excluded in this situation?&lt;/LI&gt;&lt;LI&gt;How can such a dependency be excluded in the Databricks runtime environment?&lt;OL&gt;&lt;LI&gt;Is there a reference you can provide regarding this approach?&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
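To illustrate the exclusion mechanics being asked about here: in Maven terms an exclusion is declared under the depending artifact, and in the Databricks Cluster Libraries UI the equivalent is entering groupId:artifactId (for example commons-io:commons-io) in the Maven "Exclusions" field. The sketch below is a hypothetical pom-style rendering; picking commons-io as the artifact to exclude is an inference from this thread's stack traces, not something the earlier reply specifies, and whether an exclusion alone resolves the runtime conflict depends on which jar wins on the cluster classpath.

```xml
<!-- Hypothetical pom-style sketch of excluding a transitive dependency
     (commons-io chosen only as an inference from the stack traces above). -->
<dependency>
  <groupId>com.crealytics</groupId>
  <artifactId>spark-excel_2.12</artifactId>
  <version>3.2.0_0.16.0</version>
  <exclusions>
    <exclusion>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```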
      <pubDate>Thu, 14 Apr 2022 22:48:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29450#M21176</guid>
      <dc:creator>dataslicer</dc:creator>
      <dc:date>2022-04-14T22:48:48Z</dc:date>
    </item>
    <item>
      <title>Re: Successfully installed Maven:Coordinates:com.crealytics:spark-excel_2.12:3.2.0_0.16.0 on Azure DBX 9.1 LTS runtime but getting error for missing dependency: org.apache.commons.io.IOUtils.byteArray(I)</title>
      <link>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29451#M21177</link>
      <description>&lt;P&gt;Using the older library as suggested worked in DBR 10.4 LTS. Thank you.&lt;/P&gt;&lt;P&gt;On a separate note, I remain curious about the changes in the underlying Data Source V2 API. &lt;span class="lia-unicode-emoji" title=":grinning_face:"&gt;😀&lt;/span&gt;&lt;/P&gt;</description>
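Regarding the open question from Update 03 (which jar a suspect class is actually loaded from): one way to probe this from a PySpark notebook is to ask the driver JVM through Py4J. A minimal sketch, assuming a Databricks notebook where `spark` is defined; the helper name `jar_of` is ours.

```python
# Hypothetical diagnostic: report which jar the driver JVM loaded a class from,
# e.g. to check whether a stale commons-io or a Spark 2.x artifact is on the path.
def jar_of(spark, class_name):
    jvm = spark._jvm  # Py4J gateway to the driver JVM
    cls = jvm.java.lang.Class.forName(class_name)
    src = cls.getProtectionDomain().getCodeSource()
    # Bootstrap/JDK classes have no code source; report None for those.
    return None if src is None else src.getLocation().toString()
```

For example, jar_of(spark, "org.apache.commons.io.IOUtils") would show which commons-io jar actually wins on the driver classpath.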
      <pubDate>Fri, 15 Apr 2022 15:24:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/29451#M21177</guid>
      <dc:creator>dataslicer</dc:creator>
      <dc:date>2022-04-15T15:24:08Z</dc:date>
    </item>
    <item>
      <title>Re: Successfully installed Maven:Coordinates:com.crealytics:spark-excel_2.12:3.2.0_0.16.0 on Azure D</title>
      <link>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/50739#M28882</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/24360"&gt;@dataslicer&lt;/a&gt;, were you able to solve this issue?&lt;/P&gt;&lt;P&gt;I am using the 9.1 LTS Databricks runtime with Spark 3.1.2 and Scala 2.12. I have installed com.crealytics:spark-excel-2.12.17-3.1.2_2.12:3.1.2_0.18.1. It was working fine but now I am facing the same exception as you. Could you please help?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Nov 2023 12:32:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/successfully-installed-maven-coordinates-com-crealytics-spark/m-p/50739#M28882</guid>
      <dc:creator>RamRaju</dc:creator>
      <dc:date>2023-11-09T12:32:08Z</dc:date>
    </item>
  </channel>
</rss>

