databricks-connect: Error: Cluster [MASKED] is in unexpected state Pending.
07-06-2023 01:30 PM - edited 07-06-2023 02:44 PM
Is there a way to make databricks-connect wait for the cluster to be running?
Details:
databricks-connect==13.1.0, and the Python minor version of both the cluster and the local environment is 3.10.
If the cluster is not running, the code below starts it, but any subsequent commands fail because it does not wait for the cluster to be ready:
```python
import os

from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

# Get a Spark session using the Databricks SDK's Config class:
config = Config(
    host=os.environ.get("DATABRICKS_HOST"),
    token=os.environ.get("DATABRICKS_TOKEN"),
    cluster_id=os.environ.get("DATABRICKS_CLUSTER_ID"),
)
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
```
Any subsequent commands using `spark` fail with:
```
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.FAILED_PRECONDITION
    details = "INVALID_STATE: Cluster [MASKED] is in unexpected state Pending."
    debug_error_string = "UNKNOWN:Error received from peer {created_time:"2023-07-06T18:57:01.084365359+00:00", grpc_status:9, grpc_message:"INVALID_STATE: Cluster [MASKED] is in unexpected state Pending."}"
```
If the cluster is already running, everything works as expected.
I am trying to set up a CI test job, so this is a real pain: I have to either make sure manually that the cluster is running or restart the job once it is.
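One possible workaround, sketched under assumptions: before building the session, use the Databricks SDK (the `databricks-sdk` package that provides the `Config` class in the snippet above) to make sure the cluster is running. This assumes your installed SDK version exposes `WorkspaceClient(...).clusters.ensure_cluster_is_running(cluster_id)`, which starts a terminated cluster and blocks until it is running; please verify that in your version. `get_spark_when_ready` and `config_kwargs_from_env` are hypothetical helper names, and the fail-fast check on missing environment variables is an addition for CI, not part of the original code.

```python
import os
from typing import Mapping

# The three variables the original snippet reads from the environment.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")


def config_kwargs_from_env(env: Mapping[str, str]) -> dict:
    """Collect Config(...) keyword arguments, failing fast if a variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "host": env["DATABRICKS_HOST"],
        "token": env["DATABRICKS_TOKEN"],
        "cluster_id": env["DATABRICKS_CLUSTER_ID"],
    }


def get_spark_when_ready(env: Mapping[str, str] = os.environ):
    """Hypothetical helper: make sure the cluster is RUNNING, then build the session."""
    # Imported lazily so this module loads even where these packages are absent.
    from databricks.connect import DatabricksSession
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.core import Config

    kwargs = config_kwargs_from_env(env)
    config = Config(**kwargs)
    # Assumed SDK helper: starts the cluster if needed and waits until it is RUNNING.
    WorkspaceClient(config=config).clusters.ensure_cluster_is_running(kwargs["cluster_id"])
    return DatabricksSession.builder.sdkConfig(config).getOrCreate()
```

In a CI job, `spark = get_spark_when_ready()` would replace the `getOrCreate()` line, so every later command runs only after the cluster has left the `Pending` state.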
07-06-2023 02:10 PM
Are you using db-connect 13?
07-06-2023 02:15 PM
If you want to use PySpark UDFs, it’s important that the minor version of Python installed on your development machine matches the minor version of Python included with the Databricks Runtime on the cluster.
Please refer to the documentation (Databricks Connect | Databricks on AWS) and check whether your setup meets the required configuration.
For Databricks Runtime 13.0 and higher, Databricks Connect is now built on open-source Spark Connect.
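The local half of that requirement can be checked with a few lines of standard-library Python. This is a sketch: `check_python_minor` is a hypothetical helper, and the expected `"3.10"` is the cluster-side version reported in this thread, so substitute the Python version your Databricks Runtime ships with.

```python
import sys
from typing import Optional, Sequence


def check_python_minor(expected: str, actual: Optional[Sequence[int]] = None) -> bool:
    """Return True if the local Python major.minor equals `expected`, e.g. "3.10"."""
    if actual is None:
        actual = sys.version_info
    expected_major, expected_minor = (int(part) for part in expected.split("."))
    return (actual[0], actual[1]) == (expected_major, expected_minor)


# "3.10" is the cluster-side Python minor version reported in this thread.
if not check_python_minor("3.10"):
    print(
        f"Local Python {sys.version_info.major}.{sys.version_info.minor} "
        "does not match the cluster's 3.10; PySpark UDFs may fail."
    )
```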
07-06-2023 02:32 PM
Yes, I am using 13.1.0, and the Python minor version is 3.10 on both the cluster and my environment. Sorry, I should have put that in the question.
07-12-2023 02:41 AM
Hi @AFox
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!
07-12-2023 09:22 AM
The question has not been answered. databricks-connect does not wait for the selected cluster to start. This needs to be an option or the tool is not nearly as useful.
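Until such an option exists, another workaround is to retry the first real query until the `INVALID_STATE ... unexpected state Pending` error quoted in the question stops appearing. A minimal sketch: `retry_until_cluster_ready` and `is_cluster_pending_error` are hypothetical helpers, and matching on the error message text is a heuristic, not a documented contract of databricks-connect.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_cluster_pending_error(exc: Exception) -> bool:
    """Heuristic: match the INVALID_STATE message quoted in this thread."""
    message = str(exc)
    return "INVALID_STATE" in message and "unexpected state" in message


def retry_until_cluster_ready(
    action: Callable[[], T],
    timeout_s: float = 1200.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call action() repeatedly until it stops failing with the pending-cluster error."""
    deadline = clock() + timeout_s
    while True:
        try:
            return action()
        except Exception as exc:
            # Re-raise unrelated errors immediately, and the pending error once time is up.
            if not is_cluster_pending_error(exc) or clock() >= deadline:
                raise
            sleep(interval_s)
```

With the original snippet, this would be called right after `getOrCreate()`, e.g. `retry_until_cluster_ready(lambda: spark.sql("SELECT 1").collect())`. The `sleep` and `clock` parameters are injectable only so the loop can be exercised without a real cluster.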
11-21-2023 11:28 AM
FYI for anyone who finds this: this seems to be resolved in databricks-connect 14+.
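To keep one CI script working across versions, the manual wait could be gated on the installed package version. This sketch assumes, based only on the report above, that releases from 14 onward wait for the cluster themselves; `installed_connect_version` and `needs_manual_cluster_wait` are hypothetical names, while `importlib.metadata.version` is from the Python standard library.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Major version from which, per the last reply in this thread, the wait seems built in.
FIXED_IN_MAJOR = 14


def installed_connect_version(dist_name: str = "databricks-connect") -> Optional[str]:
    """Return the installed package version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def needs_manual_cluster_wait(installed: str) -> bool:
    """True for releases older than FIXED_IN_MAJOR, e.g. "13.1.0"."""
    return int(installed.split(".")[0]) < FIXED_IN_MAJOR
```

Keep in mind that Databricks Connect versions are meant to line up with the cluster's Databricks Runtime version, so moving to 14+ may also mean running the cluster on Databricks Runtime 14 or later.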

