
Notebook cells stuck on "waiting to run" when using Cluster Libraries

jannikj
New Contributor

Hey,

we're observing the following problem when trying to run a notebook on a cluster with libraries installed:
All notebook cells are stuck in "Waiting to run" (also ones containing only a '1+1' statement).

When running the cluster without installing the packages, notebook execution works fine.

We're using a private setup with public network access disabled in Azure Databricks.

We tried this on both a Shared Cluster and a Single User Cluster.
The runtime version is 13.3 LTS with Worker/Driver Type of Standard_DS3_v2 and 2 workers.
Termination is configured after 30 minutes of inactivity.

The packages we are trying to install are spark-xml and a custom-written, pure-Python package without dependencies. Both packages are uploaded as files: to the workspace (for the Single User Cluster) and to a Unity Catalog Volume (for the Shared Cluster).

In the Event log, the clusters show the expected STARTING -> RUNNING -> DRIVER HEALTHY messages. The checkmarks in the Compute UI and on the library page are all green.

One thing that caught my eye when checking the Driver Logs:
For a cluster without packages installed (i.e. running normally), the Standard error looks like this:

ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
Wed Nov 15 08:28:57 2023 Connection to spark from PID  1157
Wed Nov 15 08:28:57 2023 Initialized gateway on port 39503
Wed Nov 15 08:28:58 2023 Connected to spark.

On a cluster with packages installed, the last three lines are missing.
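For anyone comparing driver logs the same way, here is a minimal sketch that checks a saved copy of the Standard error output for the three lines seen on the healthy cluster. The marker strings are copied from the log above; the function name `missing_gateway_markers` is made up for illustration, and how you obtain the log text (for example, copy and paste from the Driver Logs tab) is left open.

```python
# Marker substrings taken from the healthy driver log quoted above.
GATEWAY_MARKERS = (
    "Connection to spark from PID",
    "Initialized gateway on port",
    "Connected to spark.",
)


def missing_gateway_markers(stderr_text: str) -> list:
    """Return the expected markers that do not appear in the given stderr text."""
    return [marker for marker in GATEWAY_MARKERS if marker not in stderr_text]
```

An empty list means all three lines are present. If any marker is returned, the Python REPL most likely never finished connecting to Spark, which matches the "Waiting to run" symptom.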

Also, after a few minutes, the Event log shows the message METASTORE DOWN for the cluster with libraries installed.

I would greatly appreciate any thoughts on this! 

1 REPLY

nkraj
New Contributor III

This can happen if the metastore client fails to initialize. When Python libraries or JARs are installed on the cluster, the REPL creation step adds the installed libraries with a Spark addJar operation. That operation initializes the metastore client, and it can get stuck if there is any problem with the initialization.

Kindly also verify the metastore connectivity.
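One way to check basic reachability is a plain TCP connection test. This is a rough sketch, not an official diagnostic. The helper name `can_connect` is made up for illustration, and the metastore host and port are not given in this thread, so you need to supply the values for your own workspace. Since notebooks hang on the affected cluster, run it on a cluster without the libraries attached that sits in the same network, or from a shell on the driver.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("<your-metastore-host>", 3306)`. The host is a placeholder, and port 3306 assumes a MySQL-backed Hive metastore, which may not match your setup. A `False` result in a private setup with public network access disabled points toward a missing route, firewall rule, or private endpoint for the metastore. A `True` result only shows the port is reachable, not that the metastore client can authenticate.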
