Data Engineering

Intermittently unavailable: Maven library com.crealytics:spark-excel_2.12:3.5.0_0.20.3

sudhakargen
New Contributor II

The issue is that the package com.crealytics:spark-excel_2.12:3.5.0_0.20.3 is intermittently unavailable: most of the time the Excel import works, but occasionally it fails with an exception (org.apache.spark.SparkClassNotFoundException).

I have installed the Maven package com.crealytics:spark-excel_2.12:3.5.0_0.20.3 on a Databricks cluster (14.2) with spark_version "14.2.x-scala2.12" and "effective_spark_version" "14.2.x-photon-scala2.12". I'm using the databricks-connect Python library to import files from Azure Blob Storage from another application. Any help is appreciated.
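For context, the import is roughly the following (a minimal sketch with a placeholder storage path, assuming databricks-connect is already configured against this cluster):

# Minimal sketch: read an Excel file through the spark-excel data source.
# The abfss:// path below is a placeholder, not the real location.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("abfss://container@account.dfs.core.windows.net/path/file.xlsx")
)
df.show()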

 

ERROR _handle_rpc_error GRPC Error received
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", line 1235, in _analyze
    resp = self._stub.AnalyzePlan(req, metadata=self._builder.metadata())
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1030, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
  status = StatusCode.INTERNAL
  details = "[DATA_SOURCE_NOT_FOUND] Failed to find data source: com.crealytics.spark.excel. Please find packages at `https://spark.apache.org/third-party-projects.html`. SQLSTATE: 42K02"
  debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"[DATA_SOURCE_NOT_FOUND] Failed to find data source: com.crealytics.spark.excel. Please find packages at `https://spark.apache.org/third-party-projects.html`. SQLSTATE: 42K02", grpc_status:13, created_time:"2024-01-19T06:45:01.99118045+00:00"}"

 

 

2 REPLIES

Debayan
Esteemed Contributor III

Hi, it looks like the source is not able to reach `https://spark.apache.org/third-party-projects.html`. Could you please try downloading the package and installing it locally? Also, is there any dependency error? Have you tried installing the package through the normal cluster libraries?
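If it helps, the package can also be pinned as a cluster library through the Libraries API, so it is reinstalled automatically every time the cluster starts. A rough sketch (the workspace URL, token, and cluster ID below are placeholders):

# Rough sketch: install the Maven coordinate as a cluster library via the
# Databricks Libraries API. host, token, and cluster_id are placeholders.
import requests

host = "https://<workspace>.azuredatabricks.net"
token = "<personal-access-token>"
cluster_id = "<cluster-id>"

resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": cluster_id,
        "libraries": [
            {"maven": {"coordinates": "com.crealytics:spark-excel_2.12:3.5.0_0.20.3"}}
        ],
    },
)
resp.raise_for_status()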

sudhakargen
New Contributor II

"Looks like the issue is source is not able to reach" - Can you please let me know what you mean by this.

The libraries installed on the Databricks cluster are shown below. I have a 14.2 cluster on which I have installed the Maven library (com.crealytics:spark-excel_2.12:3.5.0_0.20.3). FYI, we auto-terminate the cluster after X minutes of inactivity.

Now the problem is that when I try to import data from an Excel file I get this error intermittently, i.e. not always. This always worked without issues when I was using a 13.2 cluster; since upgrading to the 14.2 Databricks cluster I'm seeing this error (org.apache.spark.SparkClassNotFoundException). I'm using databricks-connect to import data from external sources.

[screenshot attachment: sudhakargen_0-1705903897810.png]
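Given the auto-termination, one thing worth ruling out is a race between the cluster restarting and the Maven library finishing installation. A hypothetical guard (not something from this thread; host, token, and cluster_id are placeholders as above) would poll the Libraries API until the package reports INSTALLED before attempting the read:

# Hypothetical guard: wait until the spark-excel cluster library reports
# INSTALLED before issuing the Excel read. All credentials are placeholders.
import time
import requests

def wait_for_spark_excel(host, token, cluster_id, timeout_s=300):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        resp = requests.get(
            f"{host}/api/2.0/libraries/cluster-status",
            headers={"Authorization": f"Bearer {token}"},
            params={"cluster_id": cluster_id},
        )
        resp.raise_for_status()
        for lib in resp.json().get("library_statuses", []):
            coords = lib.get("library", {}).get("maven", {}).get("coordinates", "")
            if coords.startswith("com.crealytics:spark-excel") and lib.get("status") == "INSTALLED":
                return
        time.sleep(10)
    raise TimeoutError("spark-excel library never reached INSTALLED state")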

 

 
