Data Engineering

databricks-connect: Error: Cluster MASKED is in unexpected state Pending.

AFox
Contributor

Is there a way to make databricks-connect wait for the cluster to be running?

Details:

databricks-connect==13.1.0, and the Python minor version of both the cluster and the local environment is 3.10.

If the cluster is not running, this will start it, but any commands afterwards fail because it does not wait for the cluster to be ready:

 

import os

from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

# Get a Spark session using the Databricks SDK's Config class:
config = Config(
    host=os.environ.get("DATABRICKS_HOST"),
    token=os.environ.get("DATABRICKS_TOKEN"),
    cluster_id=os.environ.get("DATABRICKS_CLUSTER_ID"),
)
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()

 

Any commands that use `spark` afterwards fail like this:

 

pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.FAILED_PRECONDITION
	details = "INVALID_STATE: Cluster [MASKED] is in unexpected state Pending."
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2023-07-06T18:57:01.084365359+00:00", grpc_status:9, grpc_message:"INVALID_STATE: Cluster [MASKED] is in unexpected state Pending."}"

 

If the cluster is already running, everything works as expected.

I am trying to set up a test CI job, so this is a pain: I have to either manually make sure the cluster is running or restart the job once it is.

6 REPLIES

Prabakar
Databricks Employee

If you want to use PySpark UDFs, it's important that your development machine's installed minor version of Python match the minor version of Python that is included with the Databricks Runtime installed on the cluster.

Please refer to the documentation and check whether your setup meets the required configuration: Databricks Connect | Databricks on AWS

For Databricks Runtime 13.0 and higher, Databricks Connect is now built on open-source Spark Connect. 

Yes, I am using 13.1.0, and the Python minor version of both the cluster and the environment is 3.10. Sorry, I should have put that in the question.

Anonymous
Not applicable

Hi @AFox 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

The question has not been answered. databricks-connect does not wait for the selected cluster to start. This needs to be an option or the tool is not nearly as useful.

AFox
Contributor

FYI for anyone who finds this: this seems to be resolved in databricks-connect 14+.
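
For anyone stuck on a version before 14, one possible workaround is to poll the cluster state yourself before running the first command. This is only a rough sketch, not something taken from the databricks-connect docs. The helper names `wait_until_running` and `databricks_state_getter`, the set of "still starting" states, and the timeout and poll values are all my own assumptions. The state lookup assumes the `databricks-sdk` package is installed and that `WorkspaceClient` picks up `DATABRICKS_HOST` and `DATABRICKS_TOKEN` from the environment.

```python
import time
from typing import Callable

# Assumption: states treated as "still starting". PENDING is the state from
# the error message above; RESTARTING is included as a guess.
TRANSIENT_STATES = {"PENDING", "RESTARTING"}


def wait_until_running(
    get_state: Callable[[], str],
    timeout_s: float = 1200.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until get_state() returns "RUNNING".

    Raises RuntimeError on a state that is neither RUNNING nor transient,
    and TimeoutError if the cluster is still starting after timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state().upper()
        if state == "RUNNING":
            return
        if state not in TRANSIENT_STATES:
            raise RuntimeError(f"Cluster is in state {state}; giving up")
        if clock() >= deadline:
            raise TimeoutError(f"Cluster still {state} after {timeout_s}s")
        sleep(poll_s)


def databricks_state_getter(cluster_id: str) -> Callable[[], str]:
    """Hypothetical wiring to the Databricks SDK (imported lazily)."""
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # assumes credentials come from the environment
    return lambda: w.clusters.get(cluster_id).state.value
```

In the snippet from the question, this would go right after `getOrCreate()` and before the first `spark` command, for example `wait_until_running(databricks_state_getter(os.environ["DATABRICKS_CLUSTER_ID"]))`. Note that the sketch raises if the cluster reports `TERMINATED`, so it only helps once something has already triggered the start. The SDK may also offer a ready-made helper for this (`clusters.ensure_cluster_is_running`, if your SDK version has it), which would be worth checking before rolling your own loop.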
