Hi @Retired_mod
Thanks for replying. I tried the second approach of using an init script. Apart from the Maven package, I also installed a Python package via the script. I imported the Python package in the notebook, so that part worked. But when I tried to read an Excel file, it gave the error below:
Traceback (most recent call last):
  File "<command-709773507629376>", line 4, in <module>
    df = (spark.read.format("com.crealytics.spark.excel") \
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 48, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/readwriter.py", line 302, in load
    return self._df(self._jreader.load(path))
  File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions.py", line 228, in deco
    return f(*a, **kw)
  File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o602.load.
: java.lang.NoSuchMethodError: scala.collection.immutable.Seq.map(Lscala/Function1;)Ljava/lang/Object;
    at com.crealytics.spark.excel.Utils$MapIncluding.unapply(Utils.scala:28)
    at com.crealytics.spark.excel.WorkbookReader$.apply(WorkbookReader.scala:68)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:39)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:29)
    at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:24)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:382)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:378)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:334)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:334)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:306)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
    at java.lang.Thread.run(Thread.java:750)
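For context, the init script follows the usual pattern of copying the jar onto the cluster and pip-installing the Python package. A simplified sketch (the jar name, version, and package name below are placeholders, not my exact ones):

```shell
#!/bin/bash
# Cluster-scoped init script (simplified sketch; names and versions are placeholders).

# Copy the spark-excel jar (previously uploaded to DBFS) into the directory
# that Databricks adds to the driver/executor classpath.
cp /dbfs/FileStore/jars/spark-excel_2.12-3.3.1_0.18.5.jar /databricks/jars/

# Install the Python package on every node of the cluster.
/databricks/python/bin/pip install openpyxl
```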
Below is the code in the notebook that I am using:
# Read the Excel file from the storage account
try:
    sheet_name = "LPT_Control_" + str(as_of_date) + "!A1"
    df = (spark.read.format("com.crealytics.spark.excel")
          .option("header", "true")
          .option("treatEmptyValuesAsNulls", "false")
          .option("dataAddress", sheet_name)
          .option("inferSchema", "true")
          .load(<path-to-storageaccount-file>))
except Exception as e:
    import traceback
    traceback.print_exc()
    print(f"Error occurred while reading the Excel file: {e}")
How do I check whether the Maven package is installed or not? And is there any configuration I need to set once the package is installed?
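For what it's worth, the only check I could come up with myself was looking for the jar on the driver's filesystem from a notebook cell. This assumes cluster-installed jars end up under /databricks/jars, which may be wrong:

```python
import glob

# Look for the spark-excel jar on the driver node.
# Assumption: cluster-installed jars land under /databricks/jars; if that
# directory does not exist, glob simply returns an empty list.
matches = glob.glob("/databricks/jars/*excel*")
print(matches if matches else "no spark-excel jar found under /databricks/jars")
```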
Looking forward to your reply.