Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Intermittently unavailable: Maven library com.crealytics:spark-excel_2.12:3.5.0_0.20.3

sudhakargen
New Contributor II

The issue is that the package com.crealytics:spark-excel_2.12:3.5.0_0.20.3 is intermittently unavailable: most of the time the Excel import works, but occasionally it fails with an exception (org.apache.spark.SparkClassNotFoundException).

I have installed the Maven package com.crealytics:spark-excel_2.12:3.5.0_0.20.3 on a Databricks cluster (14.2) with spark_version "14.2.x-scala2.12" and effective_spark_version "14.2.x-photon-scala2.12". I'm using the databricks-connect Python library to import files from Azure Blob Storage from another application. Any help is appreciated.
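For reference, the failing read via databricks-connect is roughly of the shape below. This is a minimal sketch; the abfss:// path and the options are illustrative assumptions, not taken from the original post.

# Minimal sketch of the Excel read through Databricks Connect.
# The storage path and the read options are placeholders/assumptions.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

df = (
    spark.read.format("com.crealytics.spark.excel")  # the data source the error says it cannot find
    .option("header", "true")       # treat the first row as column names
    .option("inferSchema", "true")  # let spark-excel infer column types
    .load("abfss://container@account.dfs.core.windows.net/path/to/file.xlsx")
)
df.show()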

 

ERROR _handle_rpc_error GRPC Error received
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", line 1235, in _analyze
    resp = self._stub.AnalyzePlan(req, metadata=self._builder.metadata())
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1030, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INTERNAL
    details = "[DATA_SOURCE_NOT_FOUND] Failed to find data source: com.crealytics.spark.excel. Please find packages at `https://spark.apache.org/third-party-projects.html`. SQLSTATE: 42K02"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"[DATA_SOURCE_NOT_FOUND] Failed to find data source: com.crealytics.spark.excel. Please find packages at `https://spark.apache.org/third-party-projects.html`. SQLSTATE: 42K02", grpc_status:13, created_time:"2024-01-19T06:45:01.99118045+00:00"}"
>

 

 

2 REPLIES

Debayan
Databricks Employee

Hi, it looks like the source is not able to reach `https://spark.apache.org/third-party-projects.html`. Could you please try downloading the package and installing it locally? Also, is there any dependency error? Have you tried installing the package through the normal cluster libraries?
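As a hedged sketch of those two suggestions (assuming the Databricks SDK for Python, databricks-sdk, is available; the cluster ID, repository URL and JAR path below are placeholders, not values from this thread):

# Sketch only: install the library either from Maven or from a locally downloaded JAR.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library, MavenLibrary

w = WorkspaceClient()
cluster_id = "0123-456789-abcdefgh"  # placeholder cluster ID

# Option 1: install the Maven coordinate as a normal cluster library,
# pinning the repository explicitly.
w.libraries.install(
    cluster_id=cluster_id,
    libraries=[
        Library(
            maven=MavenLibrary(
                coordinates="com.crealytics:spark-excel_2.12:3.5.0_0.20.3",
                repo="https://repo1.maven.org/maven2/",
            )
        )
    ],
)

# Option 2: download the JAR yourself, upload it (e.g. to DBFS), and install that
# local copy so the cluster does not resolve it from Maven on every restart.
w.libraries.install(
    cluster_id=cluster_id,
    libraries=[Library(jar="dbfs:/FileStore/jars/spark-excel_2.12-3.5.0_0.20.3.jar")],
)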

sudhakargen
New Contributor II

"Looks like the issue is source is not able to reach" - Can you please let me know what you mean by this.

The libraries installed on the Databricks cluster are shown below. I have a 14.2 cluster on which I have installed the Maven library com.crealytics:spark-excel_2.12:3.5.0_0.20.3. We auto-terminate the cluster after X minutes of inactivity (JFYI).

Now the problem is that when I try to import data from an Excel file I get this error intermittently, i.e. not always. This always worked without any issues when I was using a 13.2 cluster; since upgrading to a 14.2 Databricks cluster I'm seeing this error (org.apache.spark.SparkClassNotFoundException). I'm using databricks-connect to import data from external sources.

(Screenshot: sudhakargen_0-1705903897810.png - libraries installed on the cluster)
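Since the cluster auto-terminates and the failure shows up intermittently right after it comes back, one hedged client-side workaround (a sketch, not something from this thread) is to retry the read while the cluster is still reinstalling its libraries after a restart. The path, attempt count and delay below are placeholders.

# Sketch: retry the Excel read while libraries are still being (re)installed
# after an auto-terminate/restart cycle.
import time
from databricks.connect import DatabricksSession

def read_excel_with_retry(path: str, attempts: int = 5, delay_s: int = 30):
    spark = DatabricksSession.builder.getOrCreate()
    last_err = None
    for _ in range(attempts):
        try:
            return (
                spark.read.format("com.crealytics.spark.excel")
                .option("header", "true")
                .load(path)
            )
        except Exception as err:
            # Only retry the specific "data source not found" failure; surface anything else.
            if "DATA_SOURCE_NOT_FOUND" not in str(err):
                raise
            last_err = err
            time.sleep(delay_s)
    raise last_err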

 

 
