<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Install maven package on job cluster in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/43912#M27578</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for replying. I tried the second approach, using an init script. Apart from the Maven package, I also installed a Python package via the script. I imported the Python package in a notebook and it worked. But when I tried to read the Excel file, it gave the error below:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Traceback (most recent call last):
  File "&amp;lt;command-709773507629376&amp;gt;", line 4, in &amp;lt;module&amp;gt;
    df = (spark.read.format("com.crealytics.spark.excel") \
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 48, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/readwriter.py", line 302, in load
    return self._df(self._jreader.load(path))
  File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions.py", line 228, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o602.load.
: java.lang.NoSuchMethodError: scala.collection.immutable.Seq.map(Lscala/Function1;)Ljava/lang/Object;
	at com.crealytics.spark.excel.Utils$MapIncluding.unapply(Utils.scala:28)
	at com.crealytics.spark.excel.WorkbookReader$.apply(WorkbookReader.scala:68)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:39)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:29)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:24)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:382)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:378)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:334)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:334)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)&lt;/LI-CODE&gt;&lt;P&gt;Below is the code in the notebook that I am using:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Read the Excel file from the storage account
try:
    sheet_name = "LPT_Control_" + str(as_of_date) + "!A1"
    df = (spark.read.format("com.crealytics.spark.excel")
                .option("header", "true")
                .option("treatEmptyValuesAsNulls", "false")
                .option("dataAddress", sheet_name)
                .option("inferSchema", "true")
                .load(&amp;lt;path-to-storageaccount-file&amp;gt;))
except Exception as e:
    import traceback
    traceback.print_exc()
    print(f"Error occurred while reading Excel file: {e}")&lt;/LI-CODE&gt;&lt;P&gt;How do I check whether the Maven package is installed or not? And is there any configuration I need to set once the package is installed?&lt;/P&gt;&lt;P&gt;Waiting to hear back from you soon.&lt;/P&gt;</description>
    <pubDate>Thu, 07 Sep 2023 06:31:57 GMT</pubDate>
    <dc:creator>nikhilkumawat</dc:creator>
    <dc:date>2023-09-07T06:31:57Z</dc:date>
    <item>
      <title>Install maven package on job cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/43770#M27540</link>
      <description>&lt;P&gt;I have a&amp;nbsp;&lt;SPAN&gt;single-user cluster, and I have created a workflow that reads an Excel file from an Azure storage account.&amp;nbsp;For reading the Excel file I am using the&amp;nbsp;&lt;STRONG&gt;com.crealytics:spark-excel_2.13:3.4.1_0.19.0&lt;/STRONG&gt;&amp;nbsp;library on a single-user all-purpose cluster. I have already installed this library on the cluster. Attaching a screenshot for reference.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Now I want to run the workflow on a job cluster, but I don't know how to install this Maven library on a job cluster. Is there something I can do with an init script? Or are there multiple ways to install a Maven package on a job cluster?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So can you please help me with this? Any help would really be appreciated.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 06 Sep 2023 09:56:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/43770#M27540</guid>
      <dc:creator>nikhilkumawat</dc:creator>
      <dc:date>2023-09-06T09:56:53Z</dc:date>
    </item>
    <item>
      <title>Re: Install maven package on job cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/43912#M27578</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for replying. I tried the second approach, using an init script. Apart from the Maven package, I also installed a Python package via the script. I imported the Python package in a notebook and it worked. But when I tried to read the Excel file, it gave the error below:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;Traceback (most recent call last):
  File "&amp;lt;command-709773507629376&amp;gt;", line 4, in &amp;lt;module&amp;gt;
    df = (spark.read.format("com.crealytics.spark.excel") \
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 48, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/readwriter.py", line 302, in load
    return self._df(self._jreader.load(path))
  File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions.py", line 228, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o602.load.
: java.lang.NoSuchMethodError: scala.collection.immutable.Seq.map(Lscala/Function1;)Ljava/lang/Object;
	at com.crealytics.spark.excel.Utils$MapIncluding.unapply(Utils.scala:28)
	at com.crealytics.spark.excel.WorkbookReader$.apply(WorkbookReader.scala:68)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:39)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:29)
	at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:24)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:382)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:378)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:334)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:334)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:750)&lt;/LI-CODE&gt;&lt;P&gt;Below is the code in the notebook that I am using:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Read the Excel file from the storage account
try:
    sheet_name = "LPT_Control_" + str(as_of_date) + "!A1"
    df = (spark.read.format("com.crealytics.spark.excel")
                .option("header", "true")
                .option("treatEmptyValuesAsNulls", "false")
                .option("dataAddress", sheet_name)
                .option("inferSchema", "true")
                .load(&amp;lt;path-to-storageaccount-file&amp;gt;))
except Exception as e:
    import traceback
    traceback.print_exc()
    print(f"Error occurred while reading Excel file: {e}")&lt;/LI-CODE&gt;&lt;P&gt;How do I check whether the Maven package is installed or not? And is there any configuration I need to set once the package is installed?&lt;/P&gt;&lt;P&gt;Waiting to hear back from you soon.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Sep 2023 06:31:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/43912#M27578</guid>
      <dc:creator>nikhilkumawat</dc:creator>
      <dc:date>2023-09-07T06:31:57Z</dc:date>
    </item>
    <item>
      <title>Re: Install maven package on job cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/45397#M27865</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any update on the above-mentioned issue regarding the Maven package?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Sep 2023 04:34:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/45397#M27865</guid>
      <dc:creator>nikhilkumawat</dc:creator>
      <dc:date>2023-09-20T04:34:24Z</dc:date>
    </item>
    <item>
      <title>Re: Install maven package on job cluster</title>
      <link>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/45413#M27875</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you elaborate on a few more things:&lt;/P&gt;&lt;P&gt;1. When spark-shell installs a Maven package, what is the default location where it downloads the JAR file?&lt;/P&gt;&lt;P&gt;2. As far as I know, the default location for JARs is "/databricks/jars/", from where Spark picks up all the packages. So does spark-shell install the JAR in a different place? If yes, please suggest how I can get Spark to use JARs from that location.&lt;/P&gt;&lt;P&gt;Waiting to hear from you soon.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Sep 2023 08:53:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/install-maven-package-on-job-cluster/m-p/45413#M27875</guid>
      <dc:creator>nikhilkumawat</dc:creator>
      <dc:date>2023-09-20T08:53:02Z</dc:date>
    </item>
  </channel>
</rss>