
databricks-connect: Error: Cluster MASKED is in unexpected state Pending.

AFox
Contributor

Is there a way to make databricks-connect wait for the cluster to be running?

Details:

databricks-connect==13.1.0, and the Python minor version of the cluster and of the environment are both 3.10.

If the cluster is not running, the code below starts it, but every command after that fails because the session does not wait for the cluster to be ready:

 

import os

from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

# Get a Spark session using the Databricks SDK's Config class:
config = Config(
    host=os.environ.get("DATABRICKS_HOST"),
    token=os.environ.get("DATABRICKS_TOKEN"),
    cluster_id=os.environ.get("DATABRICKS_CLUSTER_ID"),
)
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

 

Any command that uses `spark` after that fails like this:

 

pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.FAILED_PRECONDITION
	details = "INVALID_STATE: Cluster [MASKED] is in unexpected state Pending."
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2023-07-06T18:57:01.084365359+00:00", grpc_status:9, grpc_message:"INVALID_STATE: Cluster [MASKED] is in unexpected state Pending."}"

 

If the cluster is already running, everything works as expected.

I am trying to set up a test CI job, so this is a pain: I have to either manually make sure the cluster is running or restart the job once it is.

7 REPLIES

Prabakar
Esteemed Contributor III

If you want to use PySpark UDFs, the minor version of Python installed on your development machine must match the minor version of Python included with the Databricks Runtime installed on the cluster.

Please refer to the documentation ("Databricks Connect | Databricks on AWS") and check whether your setup meets the required configuration.

For Databricks Runtime 13.0 and higher, Databricks Connect is now built on open-source Spark Connect. 

Yes, I am using 13.1.0, and the Python minor version of the cluster and of the environment are both 3.10. Sorry, I should have put that in the question.

Anonymous
Not applicable

Hi @AFox 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

The question has not been answered. databricks-connect does not wait for the selected cluster to start. This needs to be an option or the tool is not nearly as useful.

AFox
Contributor

FYI for anyone who finds this: this seems to be resolved in databricks-connect 14+.
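
For anyone pinned to an older version, one possible workaround is to poll the cluster state until it is running before issuing Spark commands. The following is only a minimal sketch and not something confirmed in this thread: `wait_until_running` and `cluster_state_getter` are hypothetical helper names, the timeout and poll interval are arbitrary, and it assumes the Databricks SDK's `WorkspaceClient().clusters.get(...)` reports the state as `RUNNING` once the cluster is ready and `ERROR` if startup failed.

```python
import time
from typing import Callable


def wait_until_running(
    get_state: Callable[[], str],
    timeout_s: float = 1200.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_state() until it returns "RUNNING".

    Raises RuntimeError if the state is "ERROR" and TimeoutError if the
    cluster is still not running after timeout_s seconds. Any other state
    (for example "PENDING") is treated as "keep waiting".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "RUNNING":
            return
        if state == "ERROR":
            raise RuntimeError("Cluster entered state ERROR while starting")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Cluster still in state {state} after {timeout_s}s")
        sleep(poll_s)


def cluster_state_getter(cluster_id: str) -> Callable[[], str]:
    """Hypothetical helper: return a callable that reads the cluster state.

    Assumes the databricks-sdk package is installed and that credentials are
    available to WorkspaceClient (e.g. DATABRICKS_HOST / DATABRICKS_TOKEN).
    """
    from databricks.sdk import WorkspaceClient  # imported lazily on purpose

    client = WorkspaceClient()
    return lambda: client.clusters.get(cluster_id=cluster_id).state.value
```

In a CI job this would be called as `wait_until_running(cluster_state_getter(os.environ["DATABRICKS_CLUSTER_ID"]))` before the first command that uses `spark`. Note that the sketch only waits; it does not start the cluster itself.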

Kaniz
Community Manager

Hi @AFox , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.




 
