Databricks Community

jannikj · 11-15-2023

Hey,

We're observing the following problem when running a notebook on a cluster with libraries installed:
All notebook cells are stuck in "Waiting to run", even ones containing only a '1+1' statement.

When running the cluster without installing the packages, notebook execution works fine.

We're using a private setup with public network access disabled in Azure Databricks.

We tried this on both a Shared Cluster and a Single User Cluster.
The runtime version is 13.3 LTS with Worker/Driver Type of Standard_DS3_v2 and 2 workers.
Termination is configured after 30 minutes of inactivity.

The packages we are trying to install are spark-xml and a custom-written plain Python package without dependencies. Both packages are uploaded as files, to the workspace for the Single User Cluster and to a Unity Catalog volume for the Shared Cluster.

In the Event log, the clusters show the expected STARTING -> RUNNING -> DRIVER HEALTHY messages. The checkmarks in the Compute UI and on the library page are all green.

One thing that caught my eye when checking the Driver Logs:
For a cluster without packages installed (i.e. running normally), the standard error output looks like this:

ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
Wed Nov 15 08:28:57 2023 Connection to spark from PID  1157
Wed Nov 15 08:28:57 2023 Initialized gateway on port 39503
Wed Nov 15 08:28:58 2023 Connected to spark.

On a cluster with packages installed, the last three lines are missing.
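
To make this comparison less manual, a downloaded copy of the driver's standard error can be scanned for those three lines. The following is only a sketch: the marker strings are copied from the excerpt above, the function name `missing_repl_markers` is made up for illustration, and it assumes the wording of these log lines does not change between runtimes.

```python
# Marker substrings copied from the stderr excerpt in this post.
# Assumption: other runtime versions word these lines the same way.
EXPECTED_MARKERS = (
    "Connection to spark from PID",
    "Initialized gateway on port",
    "Connected to spark.",
)


def missing_repl_markers(stderr_text):
    """Return the expected markers that never appear in the given stderr text."""
    return [marker for marker in EXPECTED_MARKERS if marker not in stderr_text]
```

An empty list means all three lines were found. A non-empty list names the lines that never appeared, which matches what we saw on the cluster with libraries installed.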

Also, after a few minutes, the Event log shows the message METASTORE DOWN for the cluster with libraries installed.

I would greatly appreciate any thoughts on this!

nkraj · 02-07-2024

This can happen if the metastore client fails to initialize. With Python libraries or JARs added, the REPL creation step adds the installed libraries with the Spark addJar operation. This initializes the metastore client and can get stuck if there is any problem with the initialization.

Kindly also verify the metastore connectivity.

BhawaniD · 09-12-2024

Did you manage to fix this issue? I am facing a similar situation while running a notebook to read the XML files from the storage account.

jannikj · 09-12-2024

Hi,

Actually, we were able to fix the issue, but I completely forgot about my post.
The problem was that some Databricks URLs were not accessible from the cluster due to firewall restrictions.
We had to make sure that all URLs from this list for the region our cluster is in were reachable: IP addresses and domains for Azure Databricks services and assets - Azure Databricks | Microsoft Lea...

This fixed the issue and the cluster now runs normally.
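
For anyone who wants to check reachability from a machine inside the same network before changing firewall rules, here is a minimal sketch. It only tests whether a TCP connection can be opened. The host names and the port in the example are placeholders, not real Databricks endpoints, and should be replaced with the entries for your region from the list mentioned above. The function name `check_endpoints` is made up for this example.

```python
import socket


def check_endpoints(endpoints, timeout=5.0):
    """Try a TCP connect to each (host, port) pair and report the outcome.

    Returns a dict mapping (host, port) to "ok" or to a "failed: ..." message.
    """
    results = {}
    for host, port in endpoints:
        try:
            # create_connection resolves the name and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = "ok"
        except OSError as exc:  # DNS failures, refused connections, timeouts
            results[(host, port)] = f"failed: {exc}"
    return results


# Hypothetical placeholders; substitute the domains and ports for your region.
EXAMPLE_ENDPOINTS = [
    ("control-plane.example.invalid", 443),
    ("artifact-storage.example.invalid", 443),
]
```

Calling `check_endpoints(EXAMPLE_ENDPOINTS)` returns one entry per endpoint. Note that "ok" only means the TCP handshake succeeded, so a proxy that blocks traffic at a higher layer would not be detected by this check.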

Notebook cells stuck on "waiting to run" when using Cluster Libraries
